Reason for revert:
Leaks, leaks, leaks.
Original issue's description:
> SkThreadPool ~~> SkTaskGroup
>
> SkTaskGroup is like SkThreadPool except the threads stay in
> one global pool. Each SkTaskGroup itself is tiny (4 bytes)
> and its wait() method applies only to tasks add()ed to that
> instance, not the whole thread pool.
>
> This means we don't need to bring up new thread pools when
> tests themselves want to use multithreading (e.g. pathops,
> quilt). We just create a new SkTaskGroup and wait for that
> to complete. This should be more efficient, and allow us
> to expand where we use threads to really latency sensitive
> places. E.g. we can probably now use these in nanobench
> for CPU .skp rendering.
>
> Now that all threads are sharing the same pool, I think we
> can remove most of the custom mechanism pathops tests use
> to control threading. They'll just ride on the global pool
> with all other tests now.
>
> This (temporarily?) removes the GPU multithreading feature
> from DM, which we don't use.
>
> On my desktop, DM runs a little faster (57s -> 55s) in
> Debug, and a lot faster in Release (36s -> 24s). The bots
> show speedups of similar proportions, cutting more than a
> minute off the N4/Release and Win7/Debug runtimes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9c7207b5dc71dc5a96a2eb107d401133333d5b6fR=caryclark@google.com, bsalomon@google.com, bungeman@google.com, reed@google.com, mtklein@chromium.orgTBR=bsalomon@google.com, bungeman@google.com, caryclark@google.com, mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/533393002
SkTaskGroup is like SkThreadPool except the threads stay in
one global pool. Each SkTaskGroup itself is tiny (4 bytes)
and its wait() method applies only to tasks add()ed to that
instance, not the whole thread pool.
This means we don't need to bring up new thread pools when
tests themselves want to use multithreading (e.g. pathops,
quilt). We just create a new SkTaskGroup and wait for that
to complete. This should be more efficient, and allow us
to expand where we use threads to really latency sensitive
places. E.g. we can probably now use these in nanobench
for CPU .skp rendering.
Now that all threads are sharing the same pool, I think we
can remove most of the custom mechanism pathops tests use
to control threading. They'll just ride on the global pool
with all other tests now.
This (temporarily?) removes the GPU multithreading feature
from DM, which we don't use.
On my desktop, DM runs a little faster (57s -> 55s) in
Debug, and a lot faster in Release (36s -> 24s). The bots
show speedups of similar proportions, cutting more than a
minute off the N4/Release and Win7/Debug runtimes.
BUG=skia:
R=caryclark@google.com, bsalomon@google.com, bungeman@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/531653002
For now this only creates a degenerate bounding box hierarchy where all ops
just have maximal bounds. I will flesh out FillBounds in future CL(s).
Not quite sure why QuadTree and TileGrid aren't drawing right---haven't even
looked at the diffs yet---so I've disabled those test modes for now. RTree
seems fine, so that'll at least get us coverage for all this new plumbing.
BUG=skia:
R=robertphillips@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/454123003
Allow GM results to be compared across machines and platforms by
standardizing the fonts used by all tests.
This adds runtime flags to DM to use either the system font context (the
default), the fonts in the resources directory ( --resourceFonts ) or a set
of canonical paths generated from the fonts ( --portableFonts ).
This CL should leave the current DM results unchanged by default.
If the portable font data or resource font is missing when DM is run, it
falls back to using the system font context.
The create_test_font tool generates the paths and metrics read by DM
with the --portableFonts flag set, and generates the font substitution
tables read by DM with the --resourceFonts flag set.
If DM is run in SkDebug mode with the --reportUsedChars flag set, it
generates the corresponding data compiled into the create_test_font tool.
All GM tests set their typeface information by calling either
sk_tool_utils::set_portable_typeface or
sk_tool_utils::portable_typeface .
(The former takes the paint, the latter returns a SkTypeface.) These calls
can be removed in the future when the Font Manager can be superceded.
BUG=skia:2687
R=mtklein@google.com
Review URL: https://codereview.chromium.org/407183003
Share command flags between dm and unit tests.
Also, allow dm's core to be included by itself and iOSShell.
Command line flags that are the same (or nearly the same) in DM
and in skia_tests have been moved to common_flags. Authors,
please check to see that the shared common flag is correct for
the tool.
For iOS, the 'tool_main' entry point has a wrapper to allow multiple
tools to be statically linked in the iOSShell.
Since SkCommandLineFlags::Parse can only be called once, these calls
are disabled in the IOS build.
Since the iOS app directory is dynamically assigned a name, use '@' to
select it. (This is the same convention chosen by the Mobile Harness
iOS file system utilities.)
Move the heart of dm.gyp into dm.gypi so that it can be included by
itself and iOSShell.gyp.
Add tools/flags/SkCommonFlags.* to define and declare common
command line flags.
Add support for dm to iOSShell.
BUG=skia:
R=scroggo@google.com, mtklein@google.com, jvanverth@google.com, bsalomon@google.com
Author: caryclark@google.com
Review URL: https://codereview.chromium.org/389653004
Replay isn't that helpful of a test any more now that we have the more
stringent Quilt tests. Quilt was missing bounding box hierarchies, though,
while Replay was sort of testing RTree (pointlessly, as it was drawing without
any clip). Now Quilt does everything, testing RTree, QuadTree, and TileGrid.
Quilt mode now falls back to drawing all at once (i.e. Replay) for GMs that
don't tile perfectly. Still a TODO to make this check more flexible than exact
pixel matches.
Two GMs fail when using a BBH:
- imageresizetiled
- resizeimagefilter
We think we're not adjusting the bounds of save layers by their paint.
This is probably a bug, but one to be fixed separately from adding new tests.
BUG=skia:
R=robertphillips@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/377373003
Now that we're drawing tiles threaded like implside painting, remove the checks
that those lock counts are balanced. They're just not right for anyone anymore.
SkBitmaps themselves are not threadsafe (even const ones), so shallow copy them
on playback of an SkRecord. (The underlying SkPixelRefs are threadsafe.)
Simplify quilt drawing by using SkBitmap::extractSubset. No need for locking.
Bump up to 256x256 tiles. 16x16 tiles just murders performance (way too much
contention). This has the nice side effect of letting us enable a bunch more
GMs for quilt mode; they drew wrong with small tiles but exactly right with large.
BUG=171776
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/371023005
Always build the tools with JSON, but either build our own
or use the system's.
Rename skia_build_json_writer to skia_use_system_jsoncpp,
since we now always build with JSON.
Remove SK_BUILD_JSON_WRITER, which was only there so
we could build without JSON it in the framework.
BUG=skia:2448
R=djsollen@google.com, reed@google.com
Author: scroggo@google.com
Review URL: https://codereview.chromium.org/303913002
I'm soon going to have SkRecorder start calling getTotalMatrix(), which
would be broken in write-only mode. That change is big and nebulous,
but it's clear kWriteOnly needs to go, so we might as well kill it now.
My notes in bench_playback about kWriteOnly mode being important were
probably overly cautious. I now think this is a fair enough comparison
even re-recording into a read-write canvas.
BUG=skia:2378
R=fmalita@chromium.org, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/290653004
git-svn-id: http://skia.googlecode.com/svn/trunk@14963 2bbb7eff-a529-9590-31e7-b0007b416f81
Like yesterday's change to run CPU-parent child tasks serially in thread, this
reduces peak memory usage by improving the temporaly locality of the bitmaps we
create.
E.g. Let's say we start with tasks A B C and D
Queue: [ A B C D ]
Running A creates A' and A", which depend on a bitmap created by A.
Queue: [ B C D A' A" * ]
That bitmap now needs sit around in RAM while B C and D run pointlessly and can
only be destroyed at *. If instead we do this and push dependent child tasks
to the front of the queue, the queue and bitmap lifetime looks like this:
Queue: [ A' A" * B C D ]
This is much, much worse in practice because the queue is often several thousand
tasks long. 100s of megs of bitmaps can pile up for 10s of seconds pointlessly.
To make this work we add addNext() to SkThreadPool and its cousin DMTaskRunner.
I also took the opportunity to swap head and tail in the threadpool
implementation so it matches the comments and intuition better: we always pop
the head, add() puts it at the tail, addNext() at the head.
Before
Debug: 49s, 1403352k peak
Release: 16s, 2064008k peak
After
Debug: 49s, 1234788k peak
Release: 15s, 1903424k peak
BUG=skia:2478
R=bsalomon@google.com, borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/263803003
git-svn-id: http://skia.googlecode.com/svn/trunk@14506 2bbb7eff-a529-9590-31e7-b0007b416f81
These tasks tend to do similar things with similar sized bitmaps, so running
them serially means we tend to hold 2x bitmaps at a time (golden and
comparison) instead of (1+k)x bitmaps (golden and k concurrent comparisons).
We still migrate GPU task's children over to the main CPU thread pool,
because they'll run faster there and free up capacity on the GPU thread.
Before
Debug: 54s, 2.9G peak
Release: 13s, 2.4G peak
After
Debug: 48s, 1.5G peak
Release: 15s, 2.0G peak
BUG=skia:2478
R=borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/261593008
git-svn-id: http://skia.googlecode.com/svn/trunk@14486 2bbb7eff-a529-9590-31e7-b0007b416f81
Before this change, when limited to 4G, pathops threaded tests were the weak
link RAM-consumption-wise (in thread-local font caches) up until about 12
cores, where other problems begin to pile up too.
Tested by running:
ulimit -Sv 4194304
out/Debug/dm --threads N [--pathOpsSingleThread]
After this, we're _probably_ good to go on 32-bit machines with 8 cores.
BUG=skia:2478
R=borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/265593003
git-svn-id: http://skia.googlecode.com/svn/trunk@14463 2bbb7eff-a529-9590-31e7-b0007b416f81
- Rename TileGrid -> Quilt to avoid the name overload.
- Tag all failing GMs with kSkipTiled_Flag.
You may be wondering, do any GMs pass? Yes, some do! And that trends towards all of them as we increase --quiltTile.
Two GMs only fail in --quilt mode in 565. Otherwise all GMs which fail are skipped, and those which don't fail aren't. (The 8888 variants of those two GMs are skipped even though they pass.)
BUG=skia:2477
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/256373002
git-svn-id: http://skia.googlecode.com/svn/trunk@14457 2bbb7eff-a529-9590-31e7-b0007b416f81
This CL sets the stage for retracting the SkPicture::kOptimizeForClippedPlayback_RecordingFlag flag
from the public API (more work needs to be done in Blink & Chrome). In the new world the only way
to set this flag (and thus instantiate an SkPicture-derived
class) is by passing a factory to the SkPictureRecorder class. This is to get all clients always using
factories so that we can then change the factory call used (i.e., so the factory just creates a BBH) and
do away with the SkPicture-derived classes.
BUG=skia:2315
R=reed@google.com
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/239703006
git-svn-id: http://skia.googlecode.com/svn/trunk@14221 2bbb7eff-a529-9590-31e7-b0007b416f81
Record performance as measured by bench_record (out/Release/bench_record --skr) improves by at least 1.9x, at most 6.7x, arithmetic mean 2.6x, geometric mean 3.0x. So, good.
Correctness as measured by DM (out/Debug/dm --skr) is ~ok. One GM (shadertext2) fails because we're assuming all paint effects are immutable, but SkShaders are still mutable.
To do after this CL:
- measure playback speed
- catch up feature-wise to SkPicture
- match today's playback speed
BUG=skia:
R=robertphillips@google.com, bsalomon@google.com, reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/206313003
git-svn-id: http://skia.googlecode.com/svn/trunk@14010 2bbb7eff-a529-9590-31e7-b0007b416f81
DM will now read and write a custom image format, which begins first with a
normal PNG, but also includes the bitmap's raw pixels. Most tools (file
browser, Preview, Chrome, Skia image decoder) will read the file as PNG. DM
skips the encoded PNG and reads the raw pixels instead.
This scheme allows perfect pixel comparisons that never go through PM -> UPM or
UPM -> PM conversions, as long as you compare on a machine with the same native
pixel format as the machine which generated the images. Since we've only ever
used this to do same-machine comparisons, that's more than good enough for now.
We could convert as needed to a standardized PM pixel format if we find we want
these files to be portable.
DM -w output now does increase to ~1.3G. If that turns out to be annoyingly
big, I'm sure I can come up with a simple pixelwise RLE instead of writing
uncompressed pixels.
BUG=skia:
R=reed@google.com, bsalomon@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/185343002
git-svn-id: http://skia.googlecode.com/svn/trunk@13638 2bbb7eff-a529-9590-31e7-b0007b416f81
The main meat of things is in SkThreadPool. We can now give SkThreadPool a
type for each thread to create and destroy on its local stack. It's TLS
without going through SkTLS.
I've split the DM tasks into CpuTasks that run on threads with no TLS, and
GpuTasks that run on threads with a thread local GrContextFactory.
The old CpuTask and GpuTask have been renamed to CpuGMTask and GpuGMTask.
Upshot: default run of out/Debug/dm goes from ~45 seconds to ~20 seconds.
BUG=skia:
R=bsalomon@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/179233005
git-svn-id: http://skia.googlecode.com/svn/trunk@13632 2bbb7eff-a529-9590-31e7-b0007b416f81
Also:
- make GrMemoryPoolBenches threadsafe
- some tweaks to various DM code
- rename GM::shortName() to getName() to match benches and tests
On my desktop, (289 GMs, 617 benches) x 4 configs, 227 tests takes 46s in Debug, 14s in Release. (Still minutes faster than running tests && bench && gm.) GPU singlethreading is definitely the limiting factor again; going to reexamine whether that's helpful to thread it again.
BUG=skia:
R=reed@google.com, bsalomon@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178473006
git-svn-id: http://skia.googlecode.com/svn/trunk@13603 2bbb7eff-a529-9590-31e7-b0007b416f81
- refactor GYPs and a few flags
- make GPU tests grab a thread-local GrContextFactory when needed as we do in DM for GMs
- add a few more UI features to make DM more like tests
I believe this makes the program 'tests' obsolete.
It should be somewhat faster to run the two sets together than running the old binaries serially:
- serial: tests 20s (3m18s CPU), dm 21s (3m01s CPU)
- together: 27s (6m21s CPU)
Next up is to incorporate benches. I'm only planning there on a single-pass sanity check, so that won't obsolete the program 'bench' just yet.
Tested: out/Debug/tests && out/Debug/dm && echo ok
BUG=skia:
Committed: http://code.google.com/p/skia/source/detail?r=13586R=reed@google.com, bsalomon@google.com, mtklein@google.com, tfarina@chromium.org
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178273002
git-svn-id: http://skia.googlecode.com/svn/trunk@13592 2bbb7eff-a529-9590-31e7-b0007b416f81
- refactor GYPs and a few flags
- make GPU tests grab a thread-local GrContextFactory when needed as we do in DM for GMs
- add a few more UI features to make DM more like tests
I believe this makes the program 'tests' obsolete.
It should be somewhat faster to run the two sets together than running the old binaries serially:
- serial: tests 20s (3m18s CPU), dm 21s (3m01s CPU)
- together: 27s (6m21s CPU)
Next up is to incorporate benches. I'm only planning there on a single-pass sanity check, so that won't obsolete the program 'bench' just yet.
Tested: out/Debug/tests && out/Debug/dm && echo ok
BUG=skia:
R=reed@google.com, bsalomon@google.com, mtklein@google.com, tfarina@chromium.org
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/178273002
git-svn-id: http://skia.googlecode.com/svn/trunk@13586 2bbb7eff-a529-9590-31e7-b0007b416f81
We've already decoded the PNGs themselves into unpremultiplied bitmaps with native byte order. SkColor is just not the right choice unless you get lucky.
dm -w /tmp/dm && dm -r /tmp/dm still works on Linux, and it's much closer to working on Mac:
0 tasks left, 2 failed
Failures:
matrixconvolution_gpu
colormatrix_gpu
BUG=skia:
R=reed@google.com
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/139943002
git-svn-id: http://skia.googlecode.com/svn/trunk@13101 2bbb7eff-a529-9590-31e7-b0007b416f81
PNGs store unpremultiplied colors, so we have to convert back and forth with
SkBitmap. This is lossy. GM solves this problem by stripping the alpha
channel before writing the PNG.
This flips it around, converting the GM's output to unpremultiplied as needed. This way each pixel goes from premul to unpremul once, never back.
Tested:
out/Release/dm -w /tmp/w --config 565 8888 gpu
out/Release/dm -r /tmp/w --config 565 8888 gpu
BUG=
R=bsalomon@google.com
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/122923003
git-svn-id: http://skia.googlecode.com/svn/trunk@12926 2bbb7eff-a529-9590-31e7-b0007b416f81
DM writes out its images in a hierarchy that's a little different than GM,
so this can't read GM's output. But it can read its own, written with -w.
Example usage:
$ out/Release/dm -w /tmp/baseline
$ out/Release/dm -r /tmp/baseline -w /tmp/new
(and optionally)
$ mkdir /tmp/diff; out/Release/skdiff /tmp/baseline /tmp/new /tmp/diff
GM's IndividualImageExpectationsSource and Expectations are a little too eager
about decoding and hashing the expected images, so I took the opportunity to
add DM::Expectations that mostly replaces skiagm::ExpectationsSource and
skiagm::Expectations in DM. It mainly exists to move the image decoding and
comparison off the main thread, which would otherwise be a major speed
bottleneck.
I tried to use skiagm code where possible. One notable place where I differed
is in this new feature. When -r is a directory of images, DM does no hashing.
It considerably faster to read the expected file into an SkBitmap and do a
byte-for-byte comparison than to hash the two bitmaps and check those.
The example usage above isn't quite working 100% yet. Expectations on some GMs
fail, even with no binary change. I haven't pinned down whether this is due to
- a bug in DM
- flaky GMs
- unthreadsafe GMs
- flaky image decoding
- unthreadsafe image decoding
- something else
but I intend to. Leon, Derek and I have suspected PNG decoding isn't
threadsafe, but are as yet unable to prove it.
I also seem to be able to cause malloc to fail on my laptop if I run too many
configs at once, though I never seem to be using more than ~1G of RAM. Will
track that down too.
BUG=
R=reed@google.com, bsalomon@google.com
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/108963002
git-svn-id: http://skia.googlecode.com/svn/trunk@12596 2bbb7eff-a529-9590-31e7-b0007b416f81
This is sort of the near-minimal proof-of-concept skeleton.
- It can run existing GMs.
- It supports most configs (just not PDF).
- --replay is the only "fancy" feature it currently supports
Hopefully you will be disturbed by its speed.
BUG=
R=epoger@google.com
Review URL: https://codereview.chromium.org/22839016
git-svn-id: http://skia.googlecode.com/svn/trunk@11802 2bbb7eff-a529-9590-31e7-b0007b416f81