Reason for revert:
xfermodes and xfermodes2 show major diffs on Nexus 5 and Daisy (both ARMv7 w/NEON). Nexus 9 and SSE all look fine...
Original issue's description:
> SoftLight with SkPMFloat
>
> SSE speeds up about 4.5x over existing integer SSE,
> NEON speeds up about 3x over serial integer code.
>
> We expect 1-2 bit component diffs in the usual GMs.
>
> Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
> which I'll now try to lift in Chrome.
>
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/3e47d49b46b3ab62071218ef3dd44642c9713e04TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1221683002
SSE speeds up about 4.5x over existing integer SSE,
NEON speeds up about 3x over serial integer code.
We expect 1-2 bit component diffs in the usual GMs.
Still guarded by SK_SUPPORT_LEGACY_XFERMODES,
which I'll now try to lift in Chrome.
BUG=skia:
Review URL: https://codereview.chromium.org/1221493002
Changes verbose mode to print both the table and the individual sample
values. No need to hold back information in verbose mode.
BUG=skia:
Review URL: https://codereview.chromium.org/1208763003
These structs are always implemented as
struct uintNNxMx4_t {
uintNNxM val[4];
};
So, the first set of braces is for the struct, the second for val.
BUG=skia:
Review URL: https://codereview.chromium.org/1221453002
Both 25-35% faster with SSE.
With NEON, Burn measures as a ~10% regression, Dodge a huge 2.9x improvement.
The Burn regression is somewhat artificial: we're drawing random colored rects onto an opaque white dst, so we're heavily biased toward the (d==da) fast path in the serial code. In the vector code there's no short-circuiting and we always pay a fixed cost for ColorBurn regardless of src or dst content.
Dodge's fast paths, in contrast, only trigger when (s==sa) or (d==0), neither of which happens any more than randomly in our benchmark. I don't think (d==0) should happen at all. Similarly, the (s==0) Burn fast path is really only going to happen as often as SkRandom allows.
In practice, the existing Burn benchmark is hitting its fast path 100% of the time. So I actually feel really great that this only dings the benchmark by 10%.
Chrome's still guarded by SK_SUPPORT_LEGACY_XFERMODES, which I'll lift after finishing the last xfermode, SoftLight.
BUG=skia:
Review URL: https://codereview.chromium.org/1214443002
Adds a nanobench mode that takes samples for a fixed amount of time,
rather than taking a fixed amount of samples.
BUG=skia:
Review URL: https://codereview.chromium.org/1204153002
this also exposes nine-patch drawing directly to devices, and creates a shared iterator for unrolling a nine-patch into single rect->rect draws.
BUG=skia:
Review URL: https://codereview.chromium.org/1211583003
vcvt_n_f32_u32 and _u32_f32 work in power-of-2 fixed point, so (...,8)
meant 'please multiply or divide by 256'. We need to use 255. :(
BUG=skia:
Review URL: https://codereview.chromium.org/1204363002
Now that Sk4px exists, there's a lot less sense in eeking out every
cycle of speed from SkPMFloat: if we need to go _really_ fast, we
should use Sk4px. SkPMFloat's going to be used for things that are
already slow: large-range intermediates, divides, sqrts, etc.
A [0,1] range is easier to work with, and can even be faster if we
eliminate enough *255 and *1/255 steps. This is particularly true
on ARM, where NEON can do the *255 and /255 steps for us while
converting float<->int.
We have lots of experimental SkPMFloat <-> SkPMColor APIs that
I'm now removing. Of the existing APIs, roundClamp() is the sanest,
so I've kept only that, now called round(). The 4-at-a-time APIs
never panned out, so they're gone.
There will be small diffs on:
colormatrix coloremoji colorfilterimagefilter fadefilter imagefilters_xfermodes imagefilterscropexpand imagefiltersgraph tileimagefilter
BUG=skia:
Review URL: https://codereview.chromium.org/1201343004
I'm getting this error, which is fixed by this change:
[75/76] LINK skpdiff
FAILED: c++ -m64 -pie -Wl,-rpath=\$ORIGIN/lib/ -Wl,-rpath-link=lib/ -o skpdiff -Wl,--start-group obj/tools/skpdiff/skpdiff.skpdiff_main.o obj/tools/skpdiff/skpdiff.SkDiffContext.o obj/tools/skpdiff/skpdiff.SkImageDiffer.o obj/tools/skpdiff/skpdiff.SkPMetric.o obj/tools/skpdiff/skpdiff.skpdiff_util.o obj/tools/skpdiff/skpdiff.SkDifferentPixelsMetric_cpu.o obj/gyp/libflags.a obj/gyp/libpicture_utils.a -Wl,--end-group lib/libskia.so -lrt -lz
/usr/bin/ld: obj/tools/skpdiff/skpdiff.SkDiffContext.o: undefined reference to symbol 'pthread_mutexattr_settype@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
BUG=skia:
Review URL: https://codereview.chromium.org/1206333002
Fixed-function NVPR codepaths were removed a while ago. Only NVPR API
version 1.3 (PathFragmentInputGen) was left working. Remove
backwards-compatibility code that was left behind.
Remove some NVPR API function typedefs that were left from initial
commits.
Remove PathCoords function pointer from GrGLInterface, it has
never been called and causes problems in the future, since it will
not be implemented in the Chromium pseudo extension.
Review URL: https://codereview.chromium.org/1177243004
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddc
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot
Review URL: https://codereview.chromium.org/1196713004
Mangle external function names to avoid conflict with libjpeg
Take advantage of direct color conversion (RGBA, BGRA, 565)
Prepare to use jpeg_skip_scanlines (when it is upstreamed)
BUG=skia:
Review URL: https://codereview.chromium.org/1180983002
It's possible this rarely breaks, but I do write ARMv8 code frequently
enough that it'll help prevent broken trees.
BUG=skia:
Review URL: https://codereview.chromium.org/1209763003
Reason for revert:
64-bit ARM build failures.
Original issue's description:
> Implement four more xfermodes with Sk4px.
>
> HardLight, Overlay, Darken, and Lighten are all
> ~2x faster with SSE, ~25% faster with NEON.
>
> This covers all previously-implemented NEON xfermodes.
> 3 previous SSE xfermodes remain. Those need division
> and sqrt, so I'm planning on using SkPMFloat for them.
> It'll help the readability and NEON speed if I move that
> into [0,1] space first.
>
> The main new concept here is c.thenElse(t,e), which behaves like
> (c ? t : e) except, of course, both t and e are evaluated. This allows
> us to emulate conditionals with vectors.
>
> This also removes the concept of SkNb. Instead of a standalone bool
> vector, each SkNi or SkNf will just return their own types for
> comparisons. Turns out to be a lot more manageable this way.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/b9d4163bebab0f5639f9c5928bb5fc15f472dddcTBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1205703008
HardLight, Overlay, Darken, and Lighten are all
~2x faster with SSE, ~25% faster with NEON.
This covers all previously-implemented NEON xfermodes.
3 previous SSE xfermodes remain. Those need division
and sqrt, so I'm planning on using SkPMFloat for them.
It'll help the readability and NEON speed if I move that
into [0,1] space first.
The main new concept here is c.thenElse(t,e), which behaves like
(c ? t : e) except, of course, both t and e are evaluated. This allows
us to emulate conditionals with vectors.
This also removes the concept of SkNb. Instead of a standalone bool
vector, each SkNi or SkNf will just return their own types for
comparisons. Turns out to be a lot more manageable this way.
BUG=skia:
Review URL: https://codereview.chromium.org/1196713004
Modern processors support flush-to-zero and denormalized-are-zero
for floating point operations (with ARM NEON unable to disable them).
However, iOS on ARM is the only current system which defaults
processes to using both all the time. However, this is only iOS on
ARM, iOS on x86 (the simulator) does not. Correctly defining this
allows the math tests to run error free in the simulator.
Review URL: https://codereview.chromium.org/1203533003