This implementation improves performance of SkMutex acquire / release pair from 42ns -> 13 ns.
SkSharedMutex and SkSpinlock have the same performance.
It also removes specialized windows and linux/mac code.
BUG=skia:
Review URL: https://codereview.chromium.org/1359733002