This fixes a bug where we run some Android bots with --nocpu, and the
current behavior disables the (CPU-bound) WriteTasks the GPU bound GM
runs spawn off. The WriteTasks don't run and we end up with "null" in
our .json files.
Tested locally: out/Release/dm --nocpu -w /tmp/out; ls /tmp/out
dm.json gpu/
BUG=skia:2938
R=jcgregorio@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/578033002
This has the nice property of being able to double-check hashes after the fact.
mtklein@mtklein ~/skia (hash-png)> md5sum bad/8888/3x3bitmaprect.png
deede70ab2f34067d461fb4a93332d4c bad/8888/3x3bitmaprect.png
mtklein@mtklein ~/skia (hash-png)> grep 3x3bitmaprect_8888 bad/dm.json
"3x3bitmaprect_8888" : "deede70ab2f34067d461fb4a93332d4c",
I have checked that no two premultiplied colors map to the same unpremultiplied
color (math nerds: unpremultiplication is injective), so a change in
premultiplied SkBitmap will always imply a change in the encoded
unpremultiplied .png. This means, it's safe to hash .pngs; we won't miss
subtle changes.
BUG=skia:
R=jcgregorio@google.com, stephana@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/549203003
DM's striking off into its own JSON world. This gets strawman implementations
in place for writing and reading a JSON file mapping test name to hashes.
For what it's worth, I basically want to change _all_ these pieces,
- MD5 is slow and we can replace it with something faster,
- JSON schema needs room to grow more data,
- it'd be nice to hash once instead of twice when reading and writing,
- this code wants lots of refactoring,
but this gives us a starting platform to work on these bits at our leisure.
E.x. file for now:
mtklein@mtklein ~/skia (dm)> cat good/dm.json
{
"3x3bitmaprect_565" : "fc70d985fbfbe70e3a3c9dc626d4f5bc",
"3x3bitmaprect_8888" : "df1591dde35907399734ea19feb76663",
"3x3bitmaprect_gpu" : "df1591dde35907399734ea19feb76663",
"aaclip_565" : "1862798689b838a7ab0dc0652b9ace3a",
"aaclip_8888" : "47bb314329f0ce243f1d83fd583decb7",
"aaclip_gpu" : "75f72412d0ef4815770202d297246e7d",
...
BUG=skia:
R=jcgregorio@google.com, stephana@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/546873002
SkTaskGroup is like SkThreadPool except the threads stay in
one global pool. Each SkTaskGroup itself is tiny (4 bytes)
and its wait() method applies only to tasks add()ed to that
instance, not the whole thread pool.
This means we don't need to bring up new thread pools when
tests themselves want to use multithreading (e.g. pathops,
quilt). We just create a new SkTaskGroup and wait for that
to complete. This should be more efficient, and allow us
to expand where we use threads to really latency sensitive
places. E.g. we can probably now use these in nanobench
for CPU .skp rendering.
Now that all threads are sharing the same pool, I think we
can remove most of the custom mechanism pathops tests use
to control threading. They'll just ride on the global pool
with all other tests now.
This (temporarily?) removes the GPU multithreading feature
from DM, which we don't use.
On my desktop, DM runs a little faster (57s -> 55s) in
Debug, and a lot faster in Release (36s -> 24s). The bots
show speedups of similar proportions, cutting more than a
minute off the N4/Release and Win7/Debug runtimes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/9c7207b5dc71dc5a96a2eb107d401133333d5b6fR=caryclark@google.com, bsalomon@google.com, bungeman@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/531653002
Reason for revert:
Leaks, leaks, leaks.
Original issue's description:
> SkThreadPool ~~> SkTaskGroup
>
> SkTaskGroup is like SkThreadPool except the threads stay in
> one global pool. Each SkTaskGroup itself is tiny (4 bytes)
> and its wait() method applies only to tasks add()ed to that
> instance, not the whole thread pool.
>
> This means we don't need to bring up new thread pools when
> tests themselves want to use multithreading (e.g. pathops,
> quilt). We just create a new SkTaskGroup and wait for that
> to complete. This should be more efficient, and allow us
> to expand where we use threads to really latency sensitive
> places. E.g. we can probably now use these in nanobench
> for CPU .skp rendering.
>
> Now that all threads are sharing the same pool, I think we
> can remove most of the custom mechanism pathops tests use
> to control threading. They'll just ride on the global pool
> with all other tests now.
>
> This (temporarily?) removes the GPU multithreading feature
> from DM, which we don't use.
>
> On my desktop, DM runs a little faster (57s -> 55s) in
> Debug, and a lot faster in Release (36s -> 24s). The bots
> show speedups of similar proportions, cutting more than a
> minute off the N4/Release and Win7/Debug runtimes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9c7207b5dc71dc5a96a2eb107d401133333d5b6fR=caryclark@google.com, bsalomon@google.com, bungeman@google.com, reed@google.com, mtklein@chromium.orgTBR=bsalomon@google.com, bungeman@google.com, caryclark@google.com, mtklein@chromium.org, reed@google.com
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Author: mtklein@google.com
Review URL: https://codereview.chromium.org/533393002
SkTaskGroup is like SkThreadPool except the threads stay in
one global pool. Each SkTaskGroup itself is tiny (4 bytes)
and its wait() method applies only to tasks add()ed to that
instance, not the whole thread pool.
This means we don't need to bring up new thread pools when
tests themselves want to use multithreading (e.g. pathops,
quilt). We just create a new SkTaskGroup and wait for that
to complete. This should be more efficient, and allow us
to expand where we use threads to really latency sensitive
places. E.g. we can probably now use these in nanobench
for CPU .skp rendering.
Now that all threads are sharing the same pool, I think we
can remove most of the custom mechanism pathops tests use
to control threading. They'll just ride on the global pool
with all other tests now.
This (temporarily?) removes the GPU multithreading feature
from DM, which we don't use.
On my desktop, DM runs a little faster (57s -> 55s) in
Debug, and a lot faster in Release (36s -> 24s). The bots
show speedups of similar proportions, cutting more than a
minute off the N4/Release and Win7/Debug runtimes.
BUG=skia:
R=caryclark@google.com, bsalomon@google.com, bungeman@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/531653002
For now this only creates a degenerate bounding box hierarchy where all ops
just have maximal bounds. I will flesh out FillBounds in future CL(s).
Not quite sure why QuadTree and TileGrid aren't drawing right---haven't even
looked at the diffs yet---so I've disabled those test modes for now. RTree
seems fine, so that'll at least get us coverage for all this new plumbing.
BUG=skia:
R=robertphillips@google.com, mtklein@google.com, reed@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/454123003
Allow GM results to be compared across machines and platforms by
standardizing the fonts used by all tests.
This adds runtime flags to DM to use either the system font context (the
default), the fonts in the resources directory ( --resourceFonts ) or a set
of canonical paths generated from the fonts ( --portableFonts ).
This CL should leave the current DM results unchanged by default.
If the portable font data or resource font is missing when DM is run, it
falls back to using the system font context.
The create_test_font tool generates the paths and metrics read by DM
with the --portableFonts flag set, and generates the font substitution
tables read by DM with the --resourceFonts flag set.
If DM is run in SkDebug mode with the --reportUsedChars flag set, it
generates the corresponding data compiled into the create_test_font tool.
All GM tests set their typeface information by calling either
sk_tool_utils::set_portable_typeface or
sk_tool_utils::portable_typeface .
(The former takes the paint, the latter returns a SkTypeface.) These calls
can be removed in the future when the Font Manager can be superceded.
BUG=skia:2687
R=mtklein@google.com
Review URL: https://codereview.chromium.org/407183003
Share command flags between dm and unit tests.
Also, allow dm's core to be included by itself and iOSShell.
Command line flags that are the same (or nearly the same) in DM
and in skia_tests have been moved to common_flags. Authors,
please check to see that the shared common flag is correct for
the tool.
For iOS, the 'tool_main' entry point has a wrapper to allow multiple
tools to be statically linked in the iOSShell.
Since SkCommandLineFlags::Parse can only be called once, these calls
are disabled in the IOS build.
Since the iOS app directory is dynamically assigned a name, use '@' to
select it. (This is the same convention chosen by the Mobile Harness
iOS file system utilities.)
Move the heart of dm.gyp into dm.gypi so that it can be included by
itself and iOSShell.gyp.
Add tools/flags/SkCommonFlags.* to define and declare common
command line flags.
Add support for dm to iOSShell.
BUG=skia:
R=scroggo@google.com, mtklein@google.com, jvanverth@google.com, bsalomon@google.com
Author: caryclark@google.com
Review URL: https://codereview.chromium.org/389653004
Replay isn't that helpful of a test any more now that we have the more
stringent Quilt tests. Quilt was missing bounding box hierarchies, though,
while Replay was sort of testing RTree (pointlessly, as it was drawing without
any clip). Now Quilt does everything, testing RTree, QuadTree, and TileGrid.
Quilt mode now falls back to drawing all at once (i.e. Replay) for GMs that
don't tile perfectly. Still a TODO to make this check more flexible than exact
pixel matches.
Two GMs fail when using a BBH:
- imageresizetiled
- resizeimagefilter
We think we're not adjusting the bounds of save layers by their paint.
This is probably a bug, but one to be fixed separately from adding new tests.
BUG=skia:
R=robertphillips@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/377373003
Now that we're drawing tiles threaded like implside painting, remove the checks
that those lock counts are balanced. They're just not right for anyone anymore.
SkBitmaps themselves are not threadsafe (even const ones), so shallow copy them
on playback of an SkRecord. (The underlying SkPixelRefs are threadsafe.)
Simplify quilt drawing by using SkBitmap::extractSubset. No need for locking.
Bump up to 256x256 tiles. 16x16 tiles just murders performance (way too much
contention). This has the nice side effect of letting us enable a bunch more
GMs for quilt mode; they drew wrong with small tiles but exactly right with large.
BUG=171776
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/371023005
Always build the tools with JSON, but either build our own
or use the system's.
Rename skia_build_json_writer to skia_use_system_jsoncpp,
since we now always build with JSON.
Remove SK_BUILD_JSON_WRITER, which was only there so
we could build without JSON it in the framework.
BUG=skia:2448
R=djsollen@google.com, reed@google.com
Author: scroggo@google.com
Review URL: https://codereview.chromium.org/303913002
I'm soon going to have SkRecorder start calling getTotalMatrix(), which
would be broken in write-only mode. That change is big and nebulous,
but it's clear kWriteOnly needs to go, so we might as well kill it now.
My notes in bench_playback about kWriteOnly mode being important were
probably overly cautious. I now think this is a fair enough comparison
even re-recording into a read-write canvas.
BUG=skia:2378
R=fmalita@chromium.org, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/290653004
git-svn-id: http://skia.googlecode.com/svn/trunk@14963 2bbb7eff-a529-9590-31e7-b0007b416f81
Like yesterday's change to run CPU-parent child tasks serially in thread, this
reduces peak memory usage by improving the temporaly locality of the bitmaps we
create.
E.g. Let's say we start with tasks A B C and D
Queue: [ A B C D ]
Running A creates A' and A", which depend on a bitmap created by A.
Queue: [ B C D A' A" * ]
That bitmap now needs sit around in RAM while B C and D run pointlessly and can
only be destroyed at *. If instead we do this and push dependent child tasks
to the front of the queue, the queue and bitmap lifetime looks like this:
Queue: [ A' A" * B C D ]
This is much, much worse in practice because the queue is often several thousand
tasks long. 100s of megs of bitmaps can pile up for 10s of seconds pointlessly.
To make this work we add addNext() to SkThreadPool and its cousin DMTaskRunner.
I also took the opportunity to swap head and tail in the threadpool
implementation so it matches the comments and intuition better: we always pop
the head, add() puts it at the tail, addNext() at the head.
Before
Debug: 49s, 1403352k peak
Release: 16s, 2064008k peak
After
Debug: 49s, 1234788k peak
Release: 15s, 1903424k peak
BUG=skia:2478
R=bsalomon@google.com, borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/263803003
git-svn-id: http://skia.googlecode.com/svn/trunk@14506 2bbb7eff-a529-9590-31e7-b0007b416f81
These tasks tend to do similar things with similar sized bitmaps, so running
them serially means we tend to hold 2x bitmaps at a time (golden and
comparison) instead of (1+k)x bitmaps (golden and k concurrent comparisons).
We still migrate GPU task's children over to the main CPU thread pool,
because they'll run faster there and free up capacity on the GPU thread.
Before
Debug: 54s, 2.9G peak
Release: 13s, 2.4G peak
After
Debug: 48s, 1.5G peak
Release: 15s, 2.0G peak
BUG=skia:2478
R=borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/261593008
git-svn-id: http://skia.googlecode.com/svn/trunk@14486 2bbb7eff-a529-9590-31e7-b0007b416f81
Before this change, when limited to 4G, pathops threaded tests were the weak
link RAM-consumption-wise (in thread-local font caches) up until about 12
cores, where other problems begin to pile up too.
Tested by running:
ulimit -Sv 4194304
out/Debug/dm --threads N [--pathOpsSingleThread]
After this, we're _probably_ good to go on 32-bit machines with 8 cores.
BUG=skia:2478
R=borenet@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/265593003
git-svn-id: http://skia.googlecode.com/svn/trunk@14463 2bbb7eff-a529-9590-31e7-b0007b416f81
- Rename TileGrid -> Quilt to avoid the name overload.
- Tag all failing GMs with kSkipTiled_Flag.
You may be wondering, do any GMs pass? Yes, some do! And that trends towards all of them as we increase --quiltTile.
Two GMs only fail in --quilt mode in 565. Otherwise all GMs which fail are skipped, and those which don't fail aren't. (The 8888 variants of those two GMs are skipped even though they pass.)
BUG=skia:2477
R=reed@google.com, mtklein@google.com
Author: mtklein@chromium.org
Review URL: https://codereview.chromium.org/256373002
git-svn-id: http://skia.googlecode.com/svn/trunk@14457 2bbb7eff-a529-9590-31e7-b0007b416f81
This CL sets the stage for retracting the SkPicture::kOptimizeForClippedPlayback_RecordingFlag flag
from the public API (more work needs to be done in Blink & Chrome). In the new world the only way
to set this flag (and thus instantiate an SkPicture-derived
class) is by passing a factory to the SkPictureRecorder class. This is to get all clients always using
factories so that we can then change the factory call used (i.e., so the factory just creates a BBH) and
do away with the SkPicture-derived classes.
BUG=skia:2315
R=reed@google.com
Author: robertphillips@google.com
Review URL: https://codereview.chromium.org/239703006
git-svn-id: http://skia.googlecode.com/svn/trunk@14221 2bbb7eff-a529-9590-31e7-b0007b416f81