Like yesterday's change to run CPU-parent child tasks serially in thread, this
reduces peak memory usage by improving the temporaly locality of the bitmaps we
create.
E.g. Let's say we start with tasks A B C and D
Queue: [ A B C D ]
Running A creates A' and A", which depend on a bitmap created by A.
Queue: [ B C D A' A" * ]
That bitmap now needs sit around in RAM while B C and D run pointlessly and can
only be destroyed at *. If instead we do this and push dependent child tasks
to the front of the queue, the queue and bitmap lifetime looks like this:
Queue: [ A' A" * B C D ]
This is much, much worse in practice because the queue is often several thousand
tasks long. 100s of megs of bitmaps can pile up for 10s of seconds pointlessly.
To make this work we add addNext() to SkThreadPool and its cousin DMTaskRunner.
I also took the opportunity to swap head and tail in the threadpool
implementation so it matches the comments and intuition better: we always pop
the head, add() puts it at the tail, addNext() at the head.
Before
Debug: 49s, 1403352k peak
Release: 16s, 2064008k peak
After
Debug: 49s, 1234788k peak
Release: 15s, 1903424k peak
BUG=skia:2478
R=bsalomon@google.com, borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/263803003
git-svn-id: http://skia.googlecode.com/svn/trunk@14506 2bbb7eff-a529-9590-31e7-b0007b416f81
These tasks tend to do similar things with similar sized bitmaps, so running
them serially means we tend to hold 2x bitmaps at a time (golden and
comparison) instead of (1+k)x bitmaps (golden and k concurrent comparisons).
We still migrate GPU task's children over to the main CPU thread pool,
because they'll run faster there and free up capacity on the GPU thread.
Before
Debug: 54s, 2.9G peak
Release: 13s, 2.4G peak
After
Debug: 48s, 1.5G peak
Release: 15s, 2.0G peak
BUG=skia:2478
R=borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/261593008
git-svn-id: http://skia.googlecode.com/svn/trunk@14486 2bbb7eff-a529-9590-31e7-b0007b416f81
The main meat of things is in SkThreadPool. We can now give SkThreadPool a
type for each thread to create and destroy on its local stack. It's TLS
without going through SkTLS.
I've split the DM tasks into CpuTasks that run on threads with no TLS, and
GpuTasks that run on threads with a thread local GrContextFactory.
The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.
Upshot: default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.
BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/179233005
git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81
Also:
- make GrMemoryPoolBenches threadsafe
- some tweaks to various DM code
- rename GM::shortName() to getName() to match benches and tests
On my desktop, (289 GMs, 617 benches) x 4 configs, 227 tests takes 46s in Debug, 14s in Release. (Still minutes faster than running tests && bench && gm.) GPU singlethreading is definitely the limiting factor again; going to reexamine whether that's helpful to thread it again.
BUG=skia:
R=reed@google.com, bsalomon@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178473006
git-svn-id: http://skia.googlecode.com/svn/trunk@13603 2bbb7eff-a529-9590-31e7-b0007b416f81
- refactor GYPs and a few flags
- make GPU tests grab a thread-local GrContextFactory when needed as we do in DM for GMs
- add a few more UI features to make DM more like tests
I believe this makes the program 'tests' obsolete.
It should be somewhat faster to run the two sets together than running the old binaries serially:
- serial: tests 20s (3m18s CPU), dm 21s (3m01s CPU)
- together: 27s (6m21s CPU)
Next up is to incorporate benches. I'm only planning there on a single-pass sanity check, so that won't obsolete the program 'bench' just yet.
Tested: out/Debug/tests && out/Debug/dm && echo ok
BUG=skia:
Committed: http://code.google.com/p/skia/source/detail?r=13586R=reed@google.com, bsalomon@google.com, mtklein@google.com, tfarina@chromium.org
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178273002
git-svn-id: http://skia.googlecode.com/svn/trunk@13592 2bbb7eff-a529-9590-31e7-b0007b416f81
- refactor GYPs and a few flags
- make GPU tests grab a thread-local GrContextFactory when needed as we do in DM for GMs
- add a few more UI features to make DM more like tests
I believe this makes the program 'tests' obsolete.
It should be somewhat faster to run the two sets together than running the old binaries serially:
- serial: tests 20s (3m18s CPU), dm 21s (3m01s CPU)
- together: 27s (6m21s CPU)
Next up is to incorporate benches. I'm only planning there on a single-pass sanity check, so that won't obsolete the program 'bench' just yet.
Tested: out/Debug/tests && out/Debug/dm && echo ok
BUG=skia:
R=reed@google.com, bsalomon@google.com, mtklein@google.com, tfarina@chromium.org
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178273002
git-svn-id: http://skia.googlecode.com/svn/trunk@13586 2bbb7eff-a529-9590-31e7-b0007b416f81
This is sort of the near-minimal proof-of-concept skeleton.
- It can run existing GMs.
- It supports most configs (just not PDF).
- --replay is the only "fancy" feature it currently supports
Hopefully you will be disturbed by its speed.
BUG=
R=epoger@google.com
Review URL: https://codereview.chromium.org/22839016
git-svn-id: http://skia.googlecode.com/svn/trunk@11802 2bbb7eff-a529-9590-31e7-b0007b416f81