
36 Commits

Author SHA1 Message Date
Kevin Lubick
66125eac15 [canvaskit] Specify gold url and bucket for uploading
A previous change to goldctl removed the special-casing for
Skia, so we need to specify it ourselves.

Change-Id: If4d122daa4ee4bb865b628b7c6ee1cbe5d44d670
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/409396
Reviewed-by: Ravi Mistry <rmistry@google.com>
2021-05-18 12:49:08 +00:00
Mike Klein
158cab563f track flags in a map
I got to thinking that seeing flags like,

    out/fm --nonativeFonts -b cpu --nativeFonts -s ...

is kind of confusing, and I'm also trying to figure out
how to identify these runs to Gold.  I think the answer
to both might be to track a map[string]string for flags,
allowing overrides rather than just appending, and then
that flag map ends up being the identifying properties.

Change-Id: Ie5f80ee8b145c205edc768ae871eb70a3e1bc5b7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/378355
Reviewed-by: Eric Boren <borenet@google.com>
2021-03-03 14:22:16 +00:00
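
The flag-map idea in 158cab563f can be sketched roughly as follows. This is a minimal illustration and not the code from fm_driver.go: the `Flags` type, `With`, and `Args` are hypothetical names, and the rendering as `--key value` pairs is a simplification (FM's real flag spellings, such as `-b` or `--nonativeFonts`, are not modeled).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Flags maps a flag name (without leading dashes) to its value.
// Hypothetical type for illustration; not the one in fm_driver.go.
type Flags map[string]string

// With returns a copy of f with the overrides applied, so a later
// setting replaces an earlier one instead of being appended after it.
func (f Flags) With(overrides Flags) Flags {
	merged := Flags{}
	for k, v := range f {
		merged[k] = v
	}
	for k, v := range overrides {
		merged[k] = v
	}
	return merged
}

// Args renders the map as command-line arguments in sorted key order,
// so the same map always yields the same command line.
func (f Flags) Args() []string {
	keys := make([]string, 0, len(f))
	for k := range f {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := []string{}
	for _, k := range keys {
		args = append(args, "--"+k, f[k])
	}
	return args
}

func main() {
	defaults := Flags{"b": "cpu", "nativeFonts": "false"}
	run := defaults.With(Flags{"nativeFonts": "true"})
	fmt.Println(strings.Join(run.Args(), " "))
}
```

Because each flag appears once, the merged map can double as the set of identifying properties for a run, which is the use the commit message has in mind for Gold.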
Mike Klein
bc4a36af7c add --race to FM
Try to uncover races by running parallel replicas.

The default --race 0 should keep FM working as before, but now with
--race ≥2 we'll actively try to race replicas, syncing between tests.
--race 1 is almost pointless, just changing the thread tests run on but
without any interesting concurrency.

Rearrange a bit how fm_driver decides what flags to pass to which
invocations of FM, so individual runs can easily override defaults (e.g.
--nativeFonts overriding the usual --nonativeFonts).  Use that here to
set --race 0 for unit tests; many unit tests are not reentrant.

Change-Id: Ida451626c093793b0805d3036beb185e7d54f27e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/376761
Reviewed-by: Brian Osman <brianosman@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-03-02 20:43:05 +00:00
Mike Klein
6cac02f5c7 add missing failStep()
If a re-run fails today, that leaf exec.Run() step will fail, and its
grandparent (representing a giant bunch of sources and a set of flags)
fails, and the whole great-grandparent task fails, but the exec.Run()'s
parent (a small batch of those sources) stays green.

Here's a before-this-CL example:
https://task-driver.skia.org/td/y2nmMwJAyuK2kmCNGtrk?ifNotFound=https%3A%2F%2Fchromium-swarm.appspot.com%2Ftask%3Fid%3D520ccdac1ff16010

Here's one where the failures propagate up right, I think:
https://task-driver.skia.org/td/Gb2Pta6M2bZAJpQYvRXO?ifNotFound=https%3A%2F%2Fchromium-swarm.appspot.com%2Ftask%3Fid%3D520cfbaeb2f06010

I don't suspect anything's wrong beyond bookkeeping.

Change-Id: Ib138f62f6663bbba1804ccafb28756f8c4c4d3ea
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/378401
Reviewed-by: Eric Boren <borenet@google.com>
2021-03-02 19:17:28 +00:00
Mike Klein
a9e62e893b add flags for clipping to FM
tabl_mozilla.skp is uselessly large without a clip.

Change-Id: I6e8ab8c31e790b6629be01e6eeb2e8d60c6ff56f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/378360
Reviewed-by: Brian Osman <brianosman@google.com>
2021-03-02 19:06:24 +00:00
Mike Klein
ae8ba01835 further refine reruns
On batch failure we're rerunning every source in the batch, while we
really only need to rerun sources that we don't know succeeded.

If for example we run sources "foo", "bar", and "baz", and foo produces
a known hash, then bar crashes, we only need to rerun bar and baz.  The
batch run was enough to demonstrate foo's good.

Change-Id: I17634a6095906bcc2ad0bd33bb78eba000654b5e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/369456
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-11 18:29:45 +00:00
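
The foo/bar/baz example in ae8ba01835 amounts to a simple filter. A minimal sketch, assuming the driver has already collected the set of sources seen to produce a known hash (the function and parameter names here are hypothetical):

```go
package main

import "fmt"

// needRerun returns the sources from a failed batch that still need an
// individual rerun: everything not already seen to produce a known hash.
// Input order is preserved.
func needRerun(batch []string, knownGood map[string]bool) []string {
	rerun := []string{}
	for _, src := range batch {
		if !knownGood[src] {
			rerun = append(rerun, src)
		}
	}
	return rerun
}

func main() {
	batch := []string{"foo", "bar", "baz"}
	// foo printed a known hash before bar crashed the batch.
	knownGood := map[string]bool{"foo": true}
	fmt.Println(needRerun(batch, knownGood)) // prints [bar baz]
}
```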
Mike Klein
510e45c223 minor fm_driver tweaks
Move the definition of the Work struct to just before it's used,
and show one of the sources as an example at the kickoff-level step.

These are just cosmetic/refactors.

Change-Id: Ib23b9379683b9867e097c8d68ef8736013719cee
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/369356
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-11 17:33:06 +00:00
Mike Klein
0ebdb37b55 plumb errors up to mid-level steps
As is, on failure the top-level task fails correctly,
then the next level steps all confusingly look green,
then the next two levels down are correctly failed and red.

I think this is because worker(ctx) fails `ctx`, which we make like
this,

    ctx := startStep(w.Ctx, td.Props(strings.Join(w.Sources, " ")))

but nothing ever fails that `w.Ctx`.  This should fix that.

Compare trybot runs here with https://task-driver.skia.org/td/BMCjZbc5ki1cXbkM0oZp?ifNotFound=https%3A%2F%2Fchromium-swarm.appspot.com%2Ftask%3Fid%3D51aa8032a90a8810

Change-Id: Idfbd933b9027cac423a3a2cc5b0513c894d60e63
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/369265
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-11 17:24:21 +00:00
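
The bookkeeping problem described in 0ebdb37b55 can be illustrated with a toy model. This sketch deliberately does not use the real task-driver (td) package; `step`, `startStep`, `failStep`, and `failUpTo` are hypothetical stand-ins that only assume what the commit message says, namely that failing one step does not by itself fail the step it was started from.

```go
package main

import "fmt"

// step is a toy stand-in for a task-driver step; it is NOT the td API.
type step struct {
	name   string
	parent *step
	failed bool
}

func startStep(parent *step, name string) *step {
	return &step{name: name, parent: parent}
}

// failStep marks only the given step as failed; ancestors are untouched.
func failStep(s *step) { s.failed = true }

// failUpTo marks s and each ancestor as failed, stopping after top.
func failUpTo(s, top *step) {
	for cur := s; cur != nil; cur = cur.parent {
		cur.failed = true
		if cur == top {
			return
		}
	}
}

func main() {
	root := startStep(nil, "task")
	kick := startStep(root, "kickoff --some flags")
	batch := startStep(kick, "batch foo bar baz")
	run := startStep(batch, "exec")

	failStep(run)
	fmt.Println(batch.failed) // prints false: the mid-level step still looks green

	failUpTo(run, root)
	fmt.Println(batch.failed, kick.failed, root.failed) // prints true true true
}
```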
Mike Klein
420c8a505e end kickoff() step at the right time
If we track how many pending batches a kickoff()
has in flight, we can endStep() it properly when
that number hits zero.

This double sync.WaitGroup trick is pretty neat.
Now we're thinking with portals...

Added some comments to prevent myself falling in
the trap of assuming we'll have runtime.NumCPU()
batches... rounding the batch size up means we'll
sometimes have fewer.

Change-Id: If50615c204485862462c240b9bbdfd4ddbad43b2
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366142
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-04 17:19:43 +00:00
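
One way the double sync.WaitGroup arrangement from 420c8a505e might look is sketched below, together with the rounded-up batch size that can leave fewer batches than CPUs. This is an assumption-laden illustration, not the driver's code: `batches`, `kickoff`, and the fixed count of 4 (standing in for runtime.NumCPU()) are all hypothetical, and printing a line stands in for ending the step.

```go
package main

import (
	"fmt"
	"sync"
)

// batches splits sources into at most n batches, rounding the batch size up.
// Because of the rounding there can be fewer than n batches.
func batches(sources []string, n int) [][]string {
	if n < 1 || len(sources) == 0 {
		return nil
	}
	size := (len(sources) + n - 1) / n // ceiling division
	out := [][]string{}
	for start := 0; start < len(sources); start += size {
		end := start + size
		if end > len(sources) {
			end = len(sources)
		}
		out = append(out, sources[start:end])
	}
	return out
}

func main() {
	var all sync.WaitGroup // tracks every batch from every kickoff

	kickoff := func(name string, sources []string) {
		var pending sync.WaitGroup // tracks batches from this kickoff only
		for _, b := range batches(sources, 4) {
			all.Add(1)
			pending.Add(1)
			go func(b []string) {
				defer all.Done()
				defer pending.Done()
				_ = b // a real driver would run FM on this batch here
			}(b)
		}
		// End this kickoff's step once its own batches have drained.
		all.Add(1)
		go func() {
			defer all.Done()
			pending.Wait()
			fmt.Println("end step:", name)
		}()
	}

	kickoff("gms", []string{"a", "b", "c", "d", "e", "f"})
	kickoff("tests", []string{"x", "y"})
	all.Wait() // the program exits only after every step has ended
}
```

With six sources and a target of four batches, the batch size rounds up to two, so only three batches are produced.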
Mike Klein
1dea436a38 make fetching Gold hashes a step
It's nice to see it in the task log, and to be able to see
it's not there when we're not working with Gold (*SAN) bots.
(One trybot of each kind here.)

Change-Id: Ibb4aa20badf95ef603f3890e1c8248cad675507f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366143
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-04 16:19:17 +00:00
Mike Klein
f38f20ecf7 another layer of step nesting
Group batches from a single kickoff() into another mid-level step:

Top-Level
    kickoff --some flags
        batch sources...
            batch (exec)
        batch other sources ...
            batch (exec)
            rerun (exec)
            rerun (exec)
        batch yet other sources ...
            batch (exec)
            rerun (exec)
    kickoff --some other flags
        ...

Big question: is it okay for a kickoff step to td.EndStep() while its
kids are still running (or haven't even started) on other goroutines?

Change-Id: I77ad2274e35cea0151be0cca6c690eafc4f8983e
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366140
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-04 14:47:56 +00:00
Mike Klein
c481dd69bb split --gold off from --bot
There are bots (*SAN) that won't ever be uploading to Gold,
so *bot != "" doesn't really describe the right condition.

We could do this logic inside fm_driver.go based on *bot,
but I kind of want the flexibility to do things like upload
local ad-hoc runs or sanitizer runs if we want using --gold.

Change-Id: Id972d8b0097616c5b2802bc99c2718fdd1568fe3
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366139
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-04 12:31:45 +00:00
Mike Klein
3c8444e18e NativeFonts, the fm_driver way
Why have other bots when we can do it all here?

Change-Id: I6a3f3c2ed5d19a3b8ecf59f44cc0d2f6076bba7f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366138
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-04 11:02:26 +00:00
Mike Klein
6738e2b9c7 rearrange wg use
There's no need to tick wg up and down when running reruns, and as
written it's possible for the overall fm_driver program to exit before
one last call to endStep() has happened.  Simply calling wg.Done() once
per item dequeued outside worker() fixes both.

Change-Id: I0fb0acc5a3f2c624dfc14f875fa094db6dd40838
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366137
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-04 10:45:52 +00:00
Mike Klein
fa962f583a rerun only what makes unknown hashes
If a batch fails, we've got to rerun everything (or at least from the
failure on), but when it's merely unknown hashes, no need to rerun
what's produced hashes we know already.

Small tweak to FM to keep all the printed source names exactly what's
passed in, keeping the whole path for skp/svg/image files.  This means
zero bookkeeping needed to know what to rerun when parsing that output.

Change-Id: I1e7ed3ee51158b68a6bdd3152560f3a282109576
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/365818
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-04 10:22:52 +00:00
Mike Klein
574a453b15 introduce steps
Now with ctx scoping fixed,
and steps nested just how I like them.

Change-Id: Ifa43a432faddbafaae118ab0b16f710b695b5377
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/365504
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-04 10:22:10 +00:00
Mike Klein
9e189aab1e don't shuffle sources destructively
Change-Id: Ib9bcb811068bfaff518c97db59cfd13e237d7f74
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/366136
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-04 10:16:57 +00:00
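
Commit 9e189aab1e has no body, so the following is only a guess at the general technique its title names: shuffle a copy and leave the caller's slice alone. The `shuffled` helper and its seed parameter are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffled returns a shuffled copy of sources, leaving the input untouched.
func shuffled(sources []string, seed int64) []string {
	out := make([]string, len(sources))
	copy(out, sources)
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	sources := []string{"foo", "bar", "baz"}
	fmt.Println(shuffled(sources, 1))
	fmt.Println(sources) // prints [foo bar baz]: the original order survives
}
```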
Mike Klein
840472529d wire up images/skps/svgs
This also marks the glorious return of td.FailStep() as the answer to my
question "now how do I find my failures in this giant list?"

Change-Id: I15f98862d77942f2e289dc626da8643789a91d48
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364838
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
2021-02-03 17:18:19 +00:00
Mike Klein
805eee00d9 don't td.FailStep() quite yet
Calling td.FailStep() as written here doesn't really do anything except
hide the more useful summary error, e.g. "484 runs of build/fm failed
after retries."  Maybe it'll become useful again if I add step nesting?

Change-Id: I23eb59afce8559f4b0e549f31873577939fc7ca7
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364497
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-02 19:31:56 +00:00
Mike Klein
129bc16b56 fixup, fail on bots
td.FailStep() isn't enough to fail the bot,
so go back to a call to td.Fatal() when failures>0.

Change-Id: Ib2be7b15200376ab8a16e4a1b69d98fde0630673
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364471
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-02 16:44:33 +00:00
Mike Klein
b97a9de755 silo away td when -local
Even with all the workarounds (deleted here), calling td methods still
costs a fair chunk of CPU work.  Instead of sneakily working around it,
just never call it when run locally.

Change-Id: I2e421a5d585c86a6315d56867a29bdcdc9d45479
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364461
Reviewed-by: Mike Klein <mtklein@google.com>
2021-02-02 15:07:22 +00:00
Mike Klein
a6c692b884 easier to read with all flags first
Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I319f2b80aec95f51ff9fe3db341bb7bf0d82d971
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364015
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-02-01 22:36:52 +00:00
Mike Klein
876c25c9a3 keep reruns on the same goroutine
No need for this extra parallelism, and it's extra contention.

Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I5c0d52def5043555f313e99713335aa66b269e22
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364014
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-02-01 22:31:01 +00:00
Mike Klein
e1926e8942 test more with FM
I've pulled most of this from the BonusConfigs smorgasbord,
skipping a few redundant ones (do we really need all combos
of {8888,f16}x{srgb,narrow,p3,rec2020}?).

Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I56f684eb593f4e54d74f592e08508662bd7daa35
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363998
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-02-01 22:29:50 +00:00
Mike Klein
65bf8200d2 restore good bits of fm_driver simplifications
Kicking off goroutines willy-nilly wasn't a good idea,
but some of the other work was nice and can be kept even
with the safer pool-of-goroutines strategy.

  - use exec.Silent to skip some burned formatting work
    if we're just going to send it all to /dev/null

  - rearrange so we don't need both a todo list and a queue
    of work... just get the workers going and
    have kickoff() feed the queue directly

  - straighten out worker logic flow to make it understandable

Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: I4b27db4b9d41cf05a1c9dee9409ebd664f566567
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364011
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-02-01 22:27:36 +00:00
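
The pool-of-goroutines strategy that 65bf8200d2 keeps, with the producer feeding the queue directly, can be sketched as below. This is a generic bounded worker pool under stated assumptions rather than the fm_driver implementation: `Work` here is a hypothetical one-field struct, and `startPool` is an invented helper.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Work is a hypothetical unit of work: some sources to run together.
type Work struct {
	Sources []string
}

// startPool starts a fixed number of workers pulling from a queue. It
// returns the queue to feed and a function that closes the queue and
// waits for the workers to drain it.
func startPool(workers int, do func(Work)) (chan<- Work, func()) {
	if workers < 1 {
		workers = 1
	}
	queue := make(chan Work)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range queue {
				do(w)
			}
		}()
	}
	wait := func() {
		close(queue)
		wg.Wait()
	}
	return queue, wait
}

func main() {
	var mu sync.Mutex
	ran := 0
	queue, wait := startPool(runtime.NumCPU(), func(w Work) {
		mu.Lock()
		ran += len(w.Sources)
		mu.Unlock()
	})
	// The producer feeds the queue directly; there is no separate todo list.
	for _, src := range []string{"foo", "bar", "baz"} {
		queue <- Work{Sources: []string{src}}
	}
	wait()
	fmt.Println("sources run:", ran)
}
```

Fixing the number of workers caps how many child processes can be alive at once, which is the property the revert in 779a7b83b5 was after.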
Mike Klein
779a7b83b5 Revert fm_driver simplifications
This reverts commits 8ef3c539a2
and 4b09de3c90.

It turns out controlling the scheduling is a good idea;
I keep running into exec failures and process limits.

Cq-Include-Trybots: luci.skia.skia.primary:FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All,FM-Win2019-Clang-GCE-CPU-AVX2-x86_64-Debug-All
Change-Id: Ia72f446965e5093fbf996e78d9513c15dedae3d9
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/364006
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-02-01 22:04:32 +00:00
Mike Klein
8ef3c539a2 simplify further
Instead of making a list of work to do and then
later kicking off goroutines to do it, just start
the work as soon as it's ready to go.

Change-Id: I6bd8a031958ae440ba7f72609d9dfb867ebb2490
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363436
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-31 20:00:51 +00:00
Mike Klein
4b09de3c90 simplify fm_driver
Instead of standing goroutines pulling from a work queue, just kick off
individual goroutines.  No need to write a scheduler on top of another
scheduler.

Local runs put this at the same wall time as before while saving a
little user/sys CPU time.  Bot runs typically took 25-50s before this
change, and now 40s, 28s, and 29s, so the same there too.

We can choose whether to handle re-runs on the same goroutine or kick
off new ones.  I've chosen here to run them on the same goroutine (see
the commented /*go*/), mostly because the bots quickly exhaust their
user process limits when the reruns are all spawning FM processes in
parallel.  I think that means we don't need the extra parallelism.  As
far as I have seen, whether we kick off a goroutine or not has had no
impact on wall/user/sys at all, so might as well not for now.

Change-Id: If2990e07a402dee8c5706f537f503421013a5586
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363376
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-31 19:09:31 +00:00
Mike Klein
66436eaad1 fetch known hashes
When running as a bot (even locally if you want), grab the known hashes
from Gold and scan through FM's stdout looking for unknown hashes.

If we do find unknown hashes, requeue the batch for individual reruns
like we do on failure, and print the command and new hash if those
singleton reruns also (re)produce an unknown hash.

Eventually, I'll have singleton runs write out .pngs and upload them to
Gold when the hash is unknown.

Change-Id: I835881e6e6260e4dbe84de8d03d16921881aae1c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363039
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-30 21:50:19 +00:00
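
The scan described in 66436eaad1 might look something like the sketch below. The output format is an assumption made purely for illustration: each relevant line is taken to be "<source> <hash>", which may not match what FM actually prints, and `unknownHashes` is a hypothetical helper. Fetching the known hashes from Gold is not shown.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// unknownHashes scans output lines of the ASSUMED form "<source> <hash>"
// and returns source -> hash for every hash not present in known.
func unknownHashes(stdout string, known map[string]bool) map[string]string {
	unknown := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(stdout))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) != 2 {
			continue // not a "<source> <hash>" line
		}
		source, hash := fields[0], fields[1]
		if !known[hash] {
			unknown[source] = hash
		}
	}
	return unknown
}

func main() {
	known := map[string]bool{"aaaa": true}
	stdout := "foo aaaa\nbar bbbb\n"
	for source, hash := range unknownHashes(stdout, known) {
		fmt.Printf("%s produced unknown hash %s\n", source, hash)
	}
}
```

The sources returned here are the ones a driver would requeue for singleton reruns.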
Mike Klein
9040e4d720 refactor to build a list of Work directly
Building job strings and then parsing them into Work structs
seems a bit roundabout when there's no job string to start with.
Instead reorient so that we're building a list of Work, and create
those Work units directly when possible instead of via job strings.

Should be no practical change here.

Change-Id: I48f1eec8ab7ccbe2c46fc62174cd3625c51d3732
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/363038
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-30 17:53:31 +00:00
Mike Klein
78d1adcb22 let fm_driver.go decide what to do for each bot
We can now mimic bots locally by running fm_driver.go like this,

    go run **/fm_driver.go -bot $BOT out/fm

where $BOT is like FM-Debian10-Clang-GCE-CPU-AVX2-x86_64-Debug-All.

As a demo, skip aarectmodes and GoodHash on Debian10.

Change-Id: Iec215182dce9f05b8aa6807e837daa0618e2669f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362316
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-29 20:30:49 +00:00
Mike Klein
8716cfdc6a don't hardcode jobs in fm_driver
This lets us pass a job on the command line,

    go run infra/bots/task_drivers/fm_driver/fm_driver.go out/fm tests b=cpu

or use -script to pass a file or stdin,

    cat << EOF | go run infra/bots/task_drivers/fm_driver/fm_driver.go -script - out/fm
    b=cpu tests
    gms skvm=false b=cpu w=$out/vanilla
    gms skvm=true  b=cpu w=$out/skvm
    #gms skvm=true  b=cpu w=$out/dp3      gamut=p3      tf=srgb
    #gms skvm=true  b=cpu w=$out/linear   gamut=srgb    tf=linear
    #gms skvm=true  b=cpu w=$out/rec2020  gamut=rec2020 tf=rec2020
    EOF

(This CL will make the one FM bot temporarily do nothing,
but the next CL should fix it.)

Change-Id: I1f3badac78a0f61698179c1afec37b3020539fff
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362216
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-29 19:47:18 +00:00
Mike Klein
75bd058766 refamiliarize fm_driver.go
Update comments and small tweaks as I remember how this works.

Change-Id: I4a279781e512fc707b96226e62a2831a1d0683e5
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/362196
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2021-01-29 18:39:28 +00:00
Mike Klein
33951266c5 improve QOL of local fm_driver runs
The default Task Driver logging gets in the way on the console, so I've
sent it to /dev/null for local runs.  We control the horizontal and the
vertical.

Instead, print out each isolated failure and how to reproduce it:

    out/fm -i resources -b cpu -s ducky_yuv_blend #failed:
    	Resource "resources/images/ducky.jpg" not found.
    	../tools/fm/fm.cpp:573: fatal error: "Image(s) failed to load."

    	Signal 5:
    	_sigtramp (+0x1d)
    	sk_abort_no_print() (+0x5)
    	std::__1::__function::__func<...
    	_GLOBAL__sub_I_fm.cpp (+0x0)
    	main (+0x12d5)

    out/fm -i resources -b cpu --skvm true -s ducky_yuv_blend #failed:
    	Resource "resources/images/ducky.jpg" not found.
    	../tools/fm/fm.cpp:573: fatal error: "Image(s) failed to load."

    	Signal 5:
    	_sigtramp (+0x1d)
    	sk_abort_no_print() (+0x5)
    	std::__1::__function::__func<...
    	_GLOBAL__sub_I_fm.cpp (+0x0)
    	main (+0x12d5)

    2 runs of out/fm failed after retries.
    exit status 1

Bot runs still look ok,
https://task-driver.skia.org/td/TEaSLB6jtmRq5XDBUIwS?ifNotFound=https%3A%2F%2Fchromium-swarm.appspot.com%2Ftask%3Fid%3D4bea6af25506c010

Change-Id: I56adacdaeed5545785a3096a4e495eb524db442f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/287017
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-05-01 21:54:47 +00:00
Mike Klein
5cf3c9c1db another round of work on fm_driver
 - Simplify command line to match fm_bot better.
 - Support all sources, not just GMS.
 - Isolate failures with retries, propagating only
   isolated failures to the root.

Example failures:
    https://task-driver.skia.org/td/XzxO7p58FfDJGOvHxIhJ?ifNotFound=https%3A%2F%2Fchromium-swarm.appspot.com%2Ftask%3Fid%3D4be5e88a3f7a3110

Change-Id: Ic1c5f05042cd86487cc8d0c992a6101b0e50dd33
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286644
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-04-30 23:05:26 +00:00
Mike Klein
298bda131c add fm driver
PS 6 trying StartStep/EndStep/FailStep()
PS 7 better usage?
PS 8 goes back to td.Fatal* for top-level failures

Failures seem to be working ok as of PS 8,
but I am puzzled why PS 7 wasn't correct... much prefer it.

Also set max_attempts to 1... this driver will handle flakiness itself.

Change-Id: I7de6809920bfaf1d878d654c9cf5b7861a64d23f
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/286118
Reviewed-by: Mike Klein <mtklein@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>
2020-04-30 14:02:00 +00:00