The main meat of things is in SkThreadPool. We can now give SkThreadPool a
type for each thread to create and destroy on its local stack. It's TLS
without going through SkTLS.
I've split the DM tasks into CpuTasks that run on threads with no TLS, and
GpuTasks that run on threads with a thread local GrContextFactory.
The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.
Upshot: default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.
BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/179233005
git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81