If we track how many pending batches a kickoff()
has in flight, we can endStep() it properly when
that number hits zero.
This double sync.WaitGroup trick is pretty neat.
Now we're thinking with portals...
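A minimal sketch of the double sync.WaitGroup idea. It assumes each batch just runs on its own goroutine (fm_driver actually feeds batches to a worker queue), and endStep is a hypothetical callback standing in for the real td call. The per-kickoff WaitGroup counts batches in flight. The shared one also covers the helper goroutine that calls endStep, so waiting on it can't return before endStep has run.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// kickoff starts one goroutine per batch. pending counts this kickoff's
// batches in flight; once it drains, a helper goroutine calls endStep.
// all tracks the batches and that helper, so all.Wait() can't return
// before endStep has been called.
func kickoff(name string, batches []func(), all *sync.WaitGroup, endStep func(string)) {
	var pending sync.WaitGroup
	for _, batch := range batches {
		pending.Add(1)
		all.Add(1)
		go func(run func()) {
			defer all.Done()
			defer pending.Done() // runs first (defers are LIFO)
			run()
		}(batch)
	}
	all.Add(1)
	go func() {
		defer all.Done()
		pending.Wait() // every batch from this kickoff has finished
		endStep(name)
	}()
}

// demo runs one kickoff of n batches and reports how many batches had
// finished by the time endStep was called.
func demo(n int) int64 {
	var all sync.WaitGroup
	var finished, seenAtEnd int64
	batches := make([]func(), n)
	for i := range batches {
		batches[i] = func() { atomic.AddInt64(&finished, 1) }
	}
	kickoff("gms", batches, &all, func(string) {
		seenAtEnd = atomic.LoadInt64(&finished)
	})
	all.Wait()
	return seenAtEnd
}

func main() {
	fmt.Println(demo(8)) // prints 8
}
```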
Added some comments to keep myself from falling into
the trap of assuming we'll have runtime.NumCPU()
batches... rounding the batch size up means we'll
sometimes have fewer.
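To see why rounding the batch size up can leave fewer than runtime.NumCPU() batches, here's a hypothetical split helper (not the real fm_driver code). With 10 sources and 8 CPUs the batch size is ceil(10/8) = 2, so we only get 5 batches. Rounding up can never produce more batches than CPUs.

```go
package main

import "fmt"

// split divides sources into batches of size ceil(len(sources)/cpus).
// Because the size is rounded up, the result can hold fewer than cpus
// batches, but never more.
func split(sources []string, cpus int) [][]string {
	if len(sources) == 0 || cpus < 1 {
		return nil
	}
	size := (len(sources) + cpus - 1) / cpus // round up
	var batches [][]string
	for len(sources) > 0 {
		n := size
		if n > len(sources) {
			n = len(sources)
		}
		batches = append(batches, sources[:n])
		sources = sources[n:]
	}
	return batches
}

func main() {
	sources := make([]string, 10)
	fmt.Println(len(split(sources, 8))) // prints 5
}
```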
Change-Id: If50615c204485862462c240b9bbdfd4ddbad43b2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366142
Reviewed-by: Eric Boren <borenet@google.com>
It's nice to see it in the task log, and to be able to see
it's not there when we're not working with Gold (*SAN) bots.
(One trybot of each kind here.)
Change-Id: Ibb4aa20badf95ef603f3890e1c8248cad675507f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366143
Reviewed-by: Eric Boren <borenet@google.com>
Group batches from a single kickoff() into another mid-level step:

    Top-Level
        kickoff --some flags
            batch sources...
                batch (exec)
            batch other sources ...
                batch (exec)
                rerun (exec)
                rerun (exec)
            batch yet other sources ...
                batch (exec)
                rerun (exec)
        kickoff --some other flags
            ...
Big question: is it okay for the kickoff steps to td.EndStep() while their
children are still running (or haven't even started) on other goroutines?
Change-Id: I77ad2274e35cea0151be0cca6c690eafc4f8983e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366140
Reviewed-by: Eric Boren <borenet@google.com>
There are bots (*SAN) that won't ever be uploading to Gold,
so *bot != "" doesn't really describe the right condition.
We could do this logic inside fm_driver.go based on *bot,
but I kind of want the flexibility to use --gold to upload things like
local ad-hoc runs or sanitizer runs if we want.
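A sketch of keeping the two conditions separate, using Go's standard flag package. The -bot and --gold names come from these messages; the options struct and parseOptions helper are hypothetical. A *SAN bot can pass -bot without --gold, and a local run can pass --gold without -bot.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the two independent switches: which bot we are (if any)
// and whether to talk to Gold at all.
type options struct {
	bot  string
	gold bool
}

// parseOptions parses args into options without touching the global
// flag.CommandLine, so it can be called more than once.
func parseOptions(args []string) (options, error) {
	fs := flag.NewFlagSet("fm_driver", flag.ContinueOnError)
	var o options
	fs.StringVar(&o.bot, "bot", "", "bot name, e.g. FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All")
	fs.BoolVar(&o.gold, "gold", false, "use Gold for known hashes and uploads")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions([]string{"-bot", "FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All"})
	fmt.Println(o.bot != "", o.gold, err) // prints true false <nil>
}
```

The Go flag package accepts both -gold and --gold, so either spelling works here.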
Change-Id: Id972d8b0097616c5b2802bc99c2718fdd1568fe3
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366139
Reviewed-by: Mike Klein <mtklein@google.com>
Why have other bots when we can do it all here?
Change-Id: I6a3f3c2ed5d19a3b8ecf59f44cc0d2f6076bba7f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366138
Reviewed-by: Mike Klein <mtklein@google.com>
There's no need to tick wg up and down when running reruns, and as
written it's possible for the overall fm_driver program to exit before
one last call to endStep() has happened. Simply calling wg.Done() once
per item dequeued outside worker() fixes both.
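A sketch of that shape with hypothetical Work, worker, and drive names (not the actual fm_driver code). Failed batches are rerun one source at a time inside worker(), on the same goroutine. The loop that dequeues items calls wg.Done() exactly once per item after worker() returns, so wg.Wait() covers everything, including the reruns.

```go
package main

import (
	"fmt"
	"sync"
)

// Work is a hypothetical batch of sources for one FM invocation.
type Work struct {
	Sources []string
}

// worker runs one batch. If it fails, each source is rerun individually
// right here, so the reruns need no wg.Add()/wg.Done() of their own.
// run reports success and must be safe for concurrent use.
func worker(w Work, run func(sources []string) bool) (failures int) {
	if run(w.Sources) {
		return 0
	}
	for _, s := range w.Sources {
		if !run([]string{s}) {
			failures++
		}
	}
	return failures
}

// drive starts a fixed pool of workers, feeds them every batch, and waits.
func drive(batches []Work, workers int, run func([]string) bool) int {
	if workers < 1 {
		workers = 1
	}
	queue := make(chan Work)
	var wg sync.WaitGroup
	var mu sync.Mutex
	failures := 0
	for i := 0; i < workers; i++ {
		go func() {
			for w := range queue {
				f := worker(w, run)
				mu.Lock()
				failures += f
				mu.Unlock()
				wg.Done() // once per dequeued item, outside worker()
			}
		}()
	}
	for _, b := range batches {
		wg.Add(1)
		queue <- b
	}
	wg.Wait()
	close(queue)
	return failures
}

func main() {
	run := func(sources []string) bool {
		for _, s := range sources {
			if s == "bad" {
				return false
			}
		}
		return true
	}
	batches := []Work{{Sources: []string{"a", "b"}}, {Sources: []string{"c", "bad"}}, {Sources: []string{"bad"}}}
	fmt.Println(drive(batches, 4, run)) // prints 2
}
```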
Change-Id: I0fb0acc5a3f2c624dfc14f875fa094db6dd40838
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366137
Reviewed-by: Mike Klein <mtklein@google.com>
If a batch fails, we've got to rerun everything (or at least everything
from the failure on), but when there are merely unknown hashes, there's
no need to rerun sources that already produced hashes we know.
Small tweak to FM to keep all the printed source names exactly what's
passed in, keeping the whole path for skp/svg/image files. This means
zero bookkeeping needed to know what to rerun when parsing that output.
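A sketch of the rerun decision. toRerun is hypothetical, and it assumes one FM output line per source with the source name first and the hash last (FM's real output format may differ). Since FM now prints source names exactly as they were passed in, they can be requeued without any mapping.

```go
package main

import (
	"fmt"
	"strings"
)

// toRerun decides what to rerun after a batch finishes. A failed batch
// reruns every source; otherwise only sources whose hash isn't in known.
// Assumes each output line is "<source> ... <hash>".
func toRerun(sources []string, failed bool, stdout string, known map[string]bool) []string {
	if failed {
		return sources
	}
	var rerun []string
	for _, line := range strings.Split(stdout, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue // blank or unrecognized line
		}
		source, hash := fields[0], fields[len(fields)-1]
		if !known[hash] {
			rerun = append(rerun, source) // printed name is passed back as-is
		}
	}
	return rerun
}

func main() {
	known := map[string]bool{"h1": true}
	stdout := "skps/desk.skp h1\nsvgs/tiger.svg h2\n"
	fmt.Println(toRerun([]string{"skps/desk.skp", "svgs/tiger.svg"}, false, stdout, known))
	// prints [svgs/tiger.svg]
}
```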
Change-Id: I1e7ed3ee51158b68a6bdd3152560f3a282109576
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/365818
Reviewed-by: Mike Klein <mtklein@google.com>
Now with ctx scoping fixed,
and steps nested just how I like them.
Change-Id: Ifa43a432faddbafaae118ab0b16f710b695b5377
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/365504
Reviewed-by: Eric Boren <borenet@google.com>
This also marks the glorious return of td.FailStep() as the answer to my
question "now how do I find my failures in this giant list?"
Change-Id: I15f98862d77942f2e289dc626da8643789a91d48
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364838
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Calling td.FailStep() as written here doesn't really do anything except
hide the more useful summary error, e.g. "484 runs of build/fm failed
after retries." Maybe it'll become useful again if I add step nesting?
Change-Id: I23eb59afce8559f4b0e549f31873577939fc7ca7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364497
Reviewed-by: Mike Klein <mtklein@google.com>
td.FailStep() isn't enough to fail the bot,
so go back to a call to td.Fatal() when failures>0.
Change-Id: Ib2be7b15200376ab8a16e4a1b69d98fde0630673
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364471
Reviewed-by: Mike Klein <mtklein@google.com>
Even with all the workarounds (deleted here), calling td methods still
costs a fair chunk of CPU work. Instead of sneakily working around that,
just never call them when running locally.
Change-Id: I2e421a5d585c86a6315d56867a29bdcdc9d45479
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364461
Reviewed-by: Mike Klein <mtklein@google.com>
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I319f2b80aec95f51ff9fe3db341bb7bf0d82d971
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364015
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
No need for this extra parallelism, and it's extra contention.
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I5c0d52def5043555f313e99713335aa66b269e22
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364014
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
I've pulled most of this from the BonusConfigs smorgasbord,
skipping a few redundant ones (do we really need all combos
of {8888,f16}x{srgb,narrow,p3,rec2020}?).
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I56f684eb593f4e54d74f592e08508662bd7daa35
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363998
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Kicking off goroutines willy-nilly wasn't a good idea,
but some of the other work was nice and can be kept even
with the safer pool-of-goroutines strategy.
- use exec.Silent to skip some wasted formatting work
  if we're just going to send it all to /dev/null
- rearrange so we don't need both a todo list and a queue
  of work... just get the workers going and have kickoff()
  feed the queue directly
- straighten out the worker logic flow to make it understandable
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I4b27db4b9d41cf05a1c9dee9409ebd664f566567
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364011
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This reverts commits 8ef3c539a2
and 4b09de3c90.
It turns out controlling the scheduling is a good idea;
I keep running into exec failures and process limits.
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: Ia72f446965e5093fbf996e78d9513c15dedae3d9
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364006
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Instead of making a list of work to do and then
later kicking off goroutines to do it, just start
the work as soon as it's ready to go.
Change-Id: I6bd8a031958ae440ba7f72609d9dfb867ebb2490
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363436
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Instead of standing goroutines pulling from a work queue, just kick off
individual goroutines. No need to write a scheduler on top of another
scheduler.
Local runs put this at the same wall time as before while saving a
little user/sys CPU time. Bot runs typically took 25-50s before this
change and now take 40s, 28s, and 29s, so they're about the same too.
We can choose whether to handle re-runs on the same goroutine or kick
off new ones. I've chosen here to run them on the same goroutine (see
the commented /*go*/), mostly because the bots quickly exhaust their
user process limits when the reruns are all spawning FM processes in
parallel. I think that means we don't need the extra parallelism. As
far as I have seen, whether we kick off a goroutine or not has had no
impact on wall/user/sys at all, so might as well not for now.
Change-Id: If2990e07a402dee8c5706f537f503421013a5586
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363376
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
When running as a bot (even locally if you want), grab the known hashes
from Gold and scan through FM's stdout looking for unknown hashes.
If we do find unknown hashes, requeue the batch for individual reruns
like we do on failure, and print the command and new hash if those
singleton reruns also (re)produce an unknown hash.
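A sketch of the two halves, with hypothetical helper names. It assumes the known hashes arrive as newline-separated text, and runOne stands in for launching FM on a single source and reading back its hash. parseKnown builds the set; checkSingletons reruns each source alone and reports the command and hash for anything still unknown.

```go
package main

import (
	"fmt"
	"strings"
)

// parseKnown turns a newline-separated list of hashes into a set.
func parseKnown(text string) map[string]bool {
	known := map[string]bool{}
	for _, line := range strings.Split(text, "\n") {
		if h := strings.TrimSpace(line); h != "" {
			known[h] = true
		}
	}
	return known
}

// checkSingletons reruns each source on its own via runOne and returns
// "<command> <hash>" for every source whose hash is still unknown.
func checkSingletons(cmd, sources []string, runOne func(source string) string, known map[string]bool) []string {
	var report []string
	for _, s := range sources {
		if h := runOne(s); !known[h] {
			full := append(append([]string{}, cmd...), s) // copy so cmd isn't mutated
			report = append(report, strings.Join(full, " ")+" "+h)
		}
	}
	return report
}

func main() {
	known := parseKnown("aaa\nbbb\n")
	hashes := map[string]string{"x.skp": "aaa", "y.svg": "ccc"}
	runOne := func(s string) string { return hashes[s] }
	for _, line := range checkSingletons([]string{"fm"}, []string{"x.skp", "y.svg"}, runOne, known) {
		fmt.Println(line) // prints "fm y.svg ccc"
	}
}
```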
Eventually, I'll have singleton runs write out .pngs and upload them to
Gold when the hash is unknown.
Change-Id: I835881e6e6260e4dbe84de8d03d16921881aae1c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363039
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Building job strings and then parsing them into Work structs
seems a bit roundabout when there's no job string to start with.
Instead reorient so that we're building a list of Work, and create
those Work units directly when possible instead of via job strings.
Should be no practical change here.
Change-Id: I48f1eec8ab7ccbe2c46fc62174cd3625c51d3732
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363038
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
We can now mimic bots locally by running fm_driver.go like this,
go run **/fm_driver.go -bot $BOT out/fm
where $BOT is like FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All.
As a demo, skip aarectmodes and GoodHash on Debian10.
Change-Id: Iec215182dce9f05b8aa6807e837daa0618e2669f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362316
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
This lets us pass a job on the command line,
go run infra/bots/task_drivers/fm_driver/fm_driver.go out/fm tests b=cpu
or use -script to pass a file or stdin,
cat << EOF | go run infra/bots/task_drivers/fm_driver/fm_driver.go -script - out/fm
b=cpu tests
gms skvm=false b=cpu w=$out/vanilla
gms skvm=true b=cpu w=$out/skvm
#gms skvm=true b=cpu w=$out/dp3 gamut=p3 tf=srgb
#gms skvm=true b=cpu w=$out/linear gamut=srgb tf=linear
#gms skvm=true b=cpu w=$out/rec2020 gamut=rec2020 tf=rec2020
EOF
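A sketch of reading such a script, with a hypothetical Work type and parseScript helper. It takes one job per line, skips blank lines and lines commented out with '#', and, as an assumption for this sketch, treats tokens containing '=' as flags and everything else as sources. Note that in the unquoted heredoc above, $out is expanded by the shell before fm_driver ever sees it.

```go
package main

import (
	"fmt"
	"strings"
)

// Work is a hypothetical parsed job: sources to run plus key=value flags.
type Work struct {
	Sources []string
	Flags   []string
}

// parseScript reads one job per line, skipping blank and '#' lines.
func parseScript(text string) []Work {
	var jobs []Work
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		var w Work
		for _, tok := range strings.Fields(line) {
			if strings.Contains(tok, "=") {
				w.Flags = append(w.Flags, tok) // e.g. b=cpu
			} else {
				w.Sources = append(w.Sources, tok) // e.g. gms, tests
			}
		}
		jobs = append(jobs, w)
	}
	return jobs
}

func main() {
	script := "b=cpu tests\ngms skvm=false b=cpu w=/tmp/vanilla\n#gms skvm=true b=cpu w=/tmp/dp3\n"
	for _, w := range parseScript(script) {
		fmt.Println(w.Sources, w.Flags)
	}
}
```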
(This CL will make the one FM bot temporarily do nothing,
but the next CL should fix it.)
Change-Id: I1f3badac78a0f61698179c1afec37b3020539fff
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362216
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
Update comments and small tweaks as I remember how this works.
Change-Id: I4a279781e512fc707b96226e62a2831a1d0683e5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362196
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
The skia gn/ninja build step and the emscripten build step were using
different sets of defines. This violated the assumptions of a couple of
tests.
Change-Id: Id5364c0e1281b2e4024685fe8f106ee55c4961cb
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/338343
Commit-Queue: Nathaniel Nifong <nifong@google.com>
Reviewed-by: Kevin Lubick <kjlubick@google.com>
The build process was broken a few weeks ago and never fixed.
Thanks to metzman@ for the suggested fix!
Change-Id: Id3e0370896cd59b72b484accae107a2e0c9d36e1
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/343896
Reviewed-by: Kevin Lubick <kjlubick@google.com>
There are currently many tests skipped, but many more pass.
This changes the built binary to have a lot of debugging logic
in it so we should be able to get backtraces on those crashes
more easily when debugging.
gmtests.html was removed, as it was superseded by run-wasm-gm-tests
and make run_local.
Bug: skia:10812, skia:10869
Change-Id: I72ab34d3db83a654dc8829831b3ecb795fe23d43
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/329170
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Nathaniel Nifong <nifong@google.com>
Commit-Queue: Kevin Lubick <kjlubick@google.com>
To load in the resources, we have the Node JS script
find all files in the provided resources directory and serve
that as a JSON file (the HTML JS can't list files easily).
The HTML JS reads that file, then loads all those files as
ArrayBuffers.
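The listing step is done by a Node JS script here. Purely as an illustration of the same idea, this is a Go analog (a swapped-in language, not the actual script): walk the resources directory, collect file paths relative to it with forward slashes, and encode them as a JSON array the page can fetch. The helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/fs"
	"path/filepath"
)

// listResources walks root and returns every non-directory entry's path
// relative to root, using forward slashes so names match on every OS.
func listResources(root string) ([]string, error) {
	var names []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // e.g. root doesn't exist
		}
		if d.IsDir() {
			return nil
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		names = append(names, filepath.ToSlash(rel))
		return nil
	})
	return names, err
}

// toJSON encodes the list as a JSON array of strings.
func toJSON(names []string) (string, error) {
	b, err := json.Marshal(names)
	return string(b), err
}

func main() {
	names, err := listResources("resources")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	s, _ := toJSON(names)
	fmt.Println(s)
}
```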
After the testing WASM and the resources have all loaded, we pre-load
them into the WASM memory, keyed by name. This is
just a map of name -> SkData. The WASM code can't (easily)
make fetch calls, so rather than load these resources on demand
like we would in a real file system, we pre-load them all
and serve them from RAM. For simplicity (and consistency with
the known_hashes), this map is a global.
Finally, to connect the resources to the GMs, we overwrite the
gResourceFactory (defined in ResourceFactory.h) which is used
by tools/Resources.cpp to load any resource file (in theory).
One more change is to write some progress steps to window._log
so it can be read by puppeteer and dumped to disk to aid in
debugging.
Bug: skia:10812
Change-Id: Ie22c7f4b8d7cbbd18173b4e2ed755105c1b45249
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/328901
Reviewed-by: Leandro Lovisolo <lovisolo@google.com>
This loads in the known digests from Gold, starts the test
harness (which runs the GMs using puppeteer) and then uses
goldctl to upload the results to Gold when finished.
This will fail (and should not be landed) until
https://skia-review.googlesource.com/c/buildbot/+/328156
makes it into goldctl and the cipd build.
Bug: skia:10812
Change-Id: I89e5cf188d8f2adeba4ff676525d9bfbdcb46d5a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/328380
Reviewed-by: Leandro Lovisolo <lovisolo@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
This picks up a new version of the CIPD deps and a new helper for
writing task drivers.
Bug: skia:10812
Change-Id: I8b9b57acd4d8eee9cdea86008da1f3039af0cdc9
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/328496
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Kevin Lubick <kjlubick@google.com>
Next step is to add the following task:
Test-Ubuntu18-EMCC-Golo-GPU-QuadroP400-wasm-Release-All-WasmGMTests_WebGL2
Bug: skia:10812
Change-Id: Ibe45b7205cebd30f0e7904ea6d93a01ea3df87fe
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/324617
Commit-Queue: Kevin Lubick <kjlubick@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
This is too slow at the moment to run on the CQ (~50 minutes), but
metzman@ is planning on caching a bulk of the work needed before
we can compile the fuzzers.
Bug: skia:10713
Change-Id: I664b8afbdb9fa57a4bce3aa479ffce3c70b684ee
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/317283
Reviewed-by: Eric Boren <borenet@google.com>
Also add a skiplist to bypass an skp that is (hopefully temporarily)
timing out.
Change-Id: I912bd901a985ae86adfed13db102a1e7fee5686a
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/320060
Reviewed-by: Kevin Lubick <kjlubick@google.com>
Bug: skia:10563
Change-Id: I64f12f57de4482cb6676ad8dc9b96d150cc4b3de
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/312337
Commit-Queue: Ravi Mistry <rmistry@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Bug: skia:10554
Change-Id: I27650520a5fbda0d391b597533dde14ec2bb32a7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/306600
Commit-Queue: Ravi Mistry <rmistry@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Change-Id: I0bfd891b63b894aca29f7e566315a11a62768445
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/306517
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ravi Mistry <rmistry@google.com>
Bug: skia:10477
Change-Id: Ie3a68dc718ef17d7e185638757903ee480910639
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/304063
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ravi Mistry <rmistry@google.com>
Bug: skia:10477
Change-Id: I46a48185977a409225583aea58f5fd31cf306d4c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/303266
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ravi Mistry <rmistry@google.com>
Bug: skia:10477
Change-Id: Ibf9bcb1d03a6003d00b124db8d826c7952842fef
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/300780
Commit-Queue: Ravi Mistry <rmistry@google.com>
Reviewed-by: Eric Boren <borenet@google.com>