Like yesterday's change to run CPU-parent child tasks serially in-thread, this
reduces peak memory usage by improving the temporal locality of the bitmaps we
create.
E.g., let's say we start with tasks A, B, C, and D:
Queue: [ A B C D ]
Running A creates A' and A", which depend on a bitmap created by A.
Queue: [ B C D A' A" * ]
That bitmap now has to sit around pointlessly in RAM while B, C, and D run, and
can only be destroyed at *. If instead we push dependent child tasks to the
front of the queue, the queue and the bitmap's lifetime look like this:
Queue: [ A' A" * B C D ]
In practice the first ordering is much, much worse, because the queue is often
several thousand tasks long: hundreds of megabytes of bitmaps can pile up
pointlessly for tens of seconds.
To make this work we add addNext() to SkThreadPool and its cousin DMTaskRunner.
I also took the opportunity to swap head and tail in the thread pool
implementation so it better matches the comments and intuition: we always pop
the head, add() puts new work at the tail, and addNext() puts it at the head.
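Sketched with a deque, the two calls differ only in which end of the queue they
touch. This is an illustrative toy, not the real SkThreadPool internals; the
class name, members, and locking below are assumptions, and SkRunnable is only
stubbed out.

  #include <deque>
  #include <mutex>

  // Stand-in for Skia's SkRunnable interface.
  struct SkRunnable {
      virtual ~SkRunnable() {}
      virtual void run() = 0;
  };

  class ToyQueue {
  public:
      // Normal work: append at the tail.
      void add(SkRunnable* r) {
          std::lock_guard<std::mutex> lock(fMutex);
          fQueue.push_back(r);
      }

      // Dependent child work: jump to the head, right behind the parent.
      void addNext(SkRunnable* r) {
          std::lock_guard<std::mutex> lock(fMutex);
          fQueue.push_front(r);
      }

      // Worker threads always pop from the head.
      SkRunnable* popOrNull() {
          std::lock_guard<std::mutex> lock(fMutex);
          if (fQueue.empty()) { return nullptr; }
          SkRunnable* r = fQueue.front();
          fQueue.pop_front();
          return r;
      }

  private:
      std::mutex fMutex;
      std::deque<SkRunnable*> fQueue;
  };

DM's parent tasks then hand A' and A" to addNext() rather than add(), so the
children run, and A's bitmap dies, before B, C, and D ever start.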
Before
Debug: 49s, 1403352k peak
Release: 16s, 2064008k peak
After
Debug: 49s, 1234788k peak
Release: 15s, 1903424k peak
BUG=skia:2478
R=bsalomon@google.com, borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/263803003
git-svn-id: http://skia.googlecode.com/svn/trunk@14506 2bbb7eff-a529-9590-31e7-b0007b416f81
The meat of this change is in SkThreadPool. We can now give SkThreadPool a
type for each thread to create and destroy on its local stack. It's TLS
without going through SkTLS.
I've split the DM tasks into CpuTasks that run on threads with no TLS, and
GpuTasks that run on threads with a thread-local GrContextFactory.
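Roughly, the pattern is this. It's a minimal sketch with made-up names, not the
real SkThreadPool; the only point is that the per-thread object lives on the
worker's own stack and is handed to every runnable that worker runs.

  #include <atomic>
  #include <deque>
  #include <mutex>
  #include <thread>
  #include <vector>

  template <typename PerThread>
  class ToyLocalPool {
  public:
      struct Runnable {
          virtual ~Runnable() {}
          virtual void run(PerThread* local) = 0;  // Every task gets its thread's local.
      };

      explicit ToyLocalPool(int threads) {
          for (int i = 0; i < threads; i++) {
              fThreads.emplace_back([this] { this->loop(); });
          }
      }

      void add(Runnable* r) {
          std::lock_guard<std::mutex> lock(fMutex);
          fQueue.push_back(r);
      }

      void wait() {
          fDone = true;
          for (auto& t : fThreads) { t.join(); }
      }

  private:
      void loop() {
          PerThread local;   // One per worker, on its stack: TLS without going through SkTLS.
          while (true) {
              Runnable* r = nullptr;
              {
                  std::lock_guard<std::mutex> lock(fMutex);
                  if (!fQueue.empty()) { r = fQueue.front(); fQueue.pop_front(); }
              }
              if (r) {
                  r->run(&local);
              } else if (fDone) {
                  return;                     // Queue drained and we're told to stop.
              } else {
                  std::this_thread::yield();  // Real code blocks on a condition variable.
              }
          }
      }                                       // `local` is destroyed as the thread exits.

      std::mutex fMutex;
      std::deque<Runnable*> fQueue;
      std::vector<std::thread> fThreads;
      std::atomic<bool> fDone{false};
  };

GPU threads would instantiate something like ToyLocalPool<GrContextFactory>;
CPU threads get a pool whose per-thread type carries no state at all.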
The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.
Upshot: default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.
BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/179233005
git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81
There's a scenario that we're currently not allowing for, but I'd really like to use in DM:
1) client calls add(SomeRunnable*) several times
2) client calls wait()
3) any of the runnables added by the client _themselves_ call add(SomeOtherRunnable*)
4-inf) maybe those SomeOtherRunnables in turn call add(SomeCrazyThirdRunnable*), etc.
Right now, in step 3) of this scenario we'll assert in debug mode (add() gets
called while we're waiting to stop) and do strange, unspecified things in
release mode.
The old thread pool had basically two states: running, and waiting to stop. If
a thread saw we were waiting to stop and the queue was empty, that thread shut
down. This didn't account for any work other threads might still be doing;
they could have been just about to add to the queue.
So now we have three states: running, waiting, and halting. When the client
calls wait() (or the destructor triggers), we move into waiting. When a thread
notices we're _really_ done (the queue is empty and no threads are still
active), we move into halting. The halting state is what actually tells the
threads to stop; wait() is patiently join()ing on them.
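Here's a self-contained sketch of that state machine using made-up names and
standard-library threads; the real SkThreadPool differs, and the lambda-based
API, locking, and busy-thread counter here are all assumptions.

  #include <condition_variable>
  #include <deque>
  #include <functional>
  #include <mutex>
  #include <thread>
  #include <vector>

  class ToyPool {
  public:
      explicit ToyPool(int n) {
          for (int i = 0; i < n; i++) {
              fThreads.emplace_back([this] { this->loop(); });
          }
      }

      ~ToyPool() {
          if (!fThreads.empty() && fThreads.front().joinable()) {
              this->wait();   // Mirror the real pool: the destructor also triggers wait().
          }
      }

      void add(std::function<void()> work) {
          std::unique_lock<std::mutex> lock(fMutex);
          fQueue.push_back(std::move(work));
          fWorkAvailable.notify_one();
      }

      void wait() {
          {
              std::unique_lock<std::mutex> lock(fMutex);
              fState = kWaiting;                   // Client is done adding top-level work.
              fWorkAvailable.notify_all();
          }
          for (auto& t : fThreads) { t.join(); }   // Returns once we reach kHalting.
      }

  private:
      enum State { kRunning, kWaiting, kHalting };

      void loop() {
          while (true) {
              std::function<void()> work;
              {
                  std::unique_lock<std::mutex> lock(fMutex);
                  while (fQueue.empty() && fState != kHalting) {
                      // We're really done only if we're waiting, the queue is empty,
                      // AND no thread is still busy (it might add() more work).
                      if (kWaiting == fState && 0 == fBusyThreads) {
                          fState = kHalting;
                          fWorkAvailable.notify_all();
                          break;
                      }
                      fWorkAvailable.wait(lock);
                  }
                  if (fQueue.empty()) { return; }  // Halting and nothing left to run.
                  work = std::move(fQueue.front());
                  fQueue.pop_front();
                  fBusyThreads++;
              }
              work();                              // May itself call add().
              {
                  std::unique_lock<std::mutex> lock(fMutex);
                  fBusyThreads--;
                  fWorkAvailable.notify_all();     // Let idle threads re-check the state.
              }
          }
      }

      std::mutex fMutex;
      std::condition_variable fWorkAvailable;
      std::deque<std::function<void()>> fQueue;
      std::vector<std::thread> fThreads;
      State fState = kRunning;
      int fBusyThreads = 0;
  };

  // Matches the scenario above: work added from inside other work.
  void example() {
      ToyPool pool(4);
      pool.add([&pool] {                               // 1) client adds SomeRunnable
          pool.add([] { /* SomeOtherRunnable */ });    // 3) which adds more work itself
      });
      pool.wait();                                     // 2) returns only after the work
  }                                                    //    added in 3) has finished too

The busy-thread count is what lets a worker tell "the queue is empty right now"
apart from "the queue is empty and no one is left to refill it", which is
exactly the distinction step 3) above needs.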
BUG=
R=bungeman@google.com, bsalomon@google.com
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/26389005
git-svn-id: http://skia.googlecode.com/svn/trunk@11852 2bbb7eff-a529-9590-31e7-b0007b416f81