Tidy up a little while I'm in here:
1) SIMD headers are now included by SkTypes.h as appropriate.
2) _mm_cvtss_f32() is pithier and generates the same code.
Looks like this is the only code checking for SSE wrong. After this CL:
~/skia (sse) $ git grep __SSE
include/core/SkPreConfig.h: #if defined(__SSE4_2__)
include/core/SkPreConfig.h: #elif defined(__SSE4_1__)
include/core/SkPreConfig.h: #elif defined(__SSE3__)
include/core/SkPreConfig.h: #elif defined(__SSE2__)
every other check is in SkPreConfig.h where it belongs.
This is going to affect some GMs subtly on Windows.
BUG=chromium:511458
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1248503004
This is split off of https://codereview.chromium.org/1225923010/ (Start tightening correspondence betweeen GrDrawContext and GrRenderTarget). It:
fixes some style nits
replaces some passing of GrContext with GrTextureProvider & GrDrawContext
does a bit of the finer grained creation of GrDrawContexts
Review URL: https://codereview.chromium.org/1245183002
This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
Clearly, we can optimize these loads and stores. That's a TODO.
The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
This will cause minor GM diffs, but I don't think any layout test changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066
Review URL: https://codereview.chromium.org/1245673002
Command buffer will expose GL_CHROMIUM_framebuffer_multisample and
GL_CHROMIUM_map_sub, added support for these to enable interface
validation to succeed.
Review URL: https://codereview.chromium.org/1248853003
The Android framework was failing on conditions in
libjpeg-turbo.gyp, even though libjpeg-turbo is listed
in dependencies! for the framework (maybe because I
forgot to add export_dependent_settings!). This is fixed
by rearranging the gyp file.
BUG=skia:
Review URL: https://codereview.chromium.org/1249003002
This allows codecs that support subsets natively (i.e. WEBP) to do so.
Add a field on SkCodec::Options representing the subset.
Add a method on SkCodec to find a valid subset which approximately
matches a desired subset.
Implement subset decodes in SkWebpCodec.
Add a test in DM for decoding subsets.
Notice that we only start on even boundaries. This is due to the
way libwebp's API works. SkWEBPImageDecoder does not take this into
account, which results in visual artifacts.
FIXME: Subsets with scaling are not pixel identical, but close. (This
may be fine, though - they are not perceptually different. We'll just
need to mark another set of images in gold as valid, once
https://skbug.com/4038 is fixed, so we can tests scaled webp without
generating new images on each run.)
Review URL: https://codereview.chromium.org/1240143002
Reason for revert:
NEON 565 gold images have gone ugly. This is what I get for writing and testing SSE and just writing NEON.
E.g. colortype_xfermodes, dstreadshuffle, bigbitmaprect, pictures, textbloblooper, aaxfermodes (only Plus)
Original issue's description:
> 565 support for SIMD xfermodes
>
> This uses the most basic approach possible:
> - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
> - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
>
> Clearly, we can optimize these loads and stores. That's a TODO.
>
> The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
>
> The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
>
> This will cause minor GM diffs, but I don't think any layout test changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
>
> Committed: https://skia.googlesource.com/skia/+/860dcaa2ddfdadc050af4f943a84a9d499315066TBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1248893004
This uses the most basic approach possible:
- to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
- to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
Clearly, we can optimize these loads and stores. That's a TODO.
The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
This will cause minor GM diffs, but I don't think any layout test changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0
Review URL: https://codereview.chromium.org/1245673002
If we ever want to allow the command buffer as a skia gles2 backend,
we need a more up to date version of ANGLE, specifically there are
4 defines that differ between newer and older versions of ANGLE which
we use in skia, I've updated these in this change.
I'm not quite sure if what I've done for the 'angle_path' is correct,
I tried setting it to a path relative to skia, and to '<(DEPTH)', both
of which do not compile correctly, only '../' worked.
Committed: https://skia.googlesource.com/skia/+/db0b1e796ddbd08e6be8a666537318b1c0e2ce56
Review URL: https://codereview.chromium.org/1244843003
Visual Studio 2015 has additional warnings around noexcept and
disabling exceptions, which can be worked around with the
(undocumented) _HAS_EXCEPTIONS macro.
Visual Studio 2013 and 2015 have roundf in math.h, so use it to
avoid extra work and casts.
We avoid using cmath, as it undefs isfinite on gcc, but Visual Studio
2015 no longer provides overloads of copysign from math.h (which is
actually correct). As a result, use copysignf (which is available in
math.h in 2013 and 2015) directly.
Review URL: https://codereview.chromium.org/1244173005
Reason for revert:
Compile error that the try bots didn't catch :(
Original issue's description:
> ANGLE deps roll
>
> If we ever want to allow the command buffer as a skia gles2 backend,
> we need a more up to date version of ANGLE, specifically there are
> 4 defines that differ between newer and older versions of ANGLE which
> we use in skia, I've updated these in this change.
>
> I'm not quite sure if what I've done for the 'angle_path' is correct,
> I tried setting it to a path relative to skia, and to '<(DEPTH)', both
> of which do not compile correctly, only '../' worked.
>
> Committed: https://skia.googlesource.com/skia/+/db0b1e796ddbd08e6be8a666537318b1c0e2ce56TBR=bsalomon@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1245223007
If we ever want to allow the command buffer as a skia gles2 backend,
we need a more up to date version of ANGLE, specifically there are
4 defines that differ between newer and older versions of ANGLE which
we use in skia, I've updated these in this change.
I'm not quite sure if what I've done for the 'angle_path' is correct,
I tried setting it to a path relative to skia, and to '<(DEPTH)', both
of which do not compile correctly, only '../' worked.
Review URL: https://codereview.chromium.org/1244843003
These handwritten xfermodes for Clear, Src, DstIn, and DstOut are actually dead
code: they're all covered by Sk4pxXfermode, which we'd already have returned.
Tidies up the xfermode creation logic to make this clearer.
This cuts 20-40K off SkXfermode.o, depending on the platform.
BUG=skia:
Review URL: https://codereview.chromium.org/1249773004
This deduplicates a few pieces of code:
- we end up with one copy of each xfer32() driver loop instead of one per xfermode;
- we end up with two* copies of each xfermode implementation instead of ten**.
* For a given Mode: Mode() itself and xfer_aa<Mode>().
** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
Performance is affected in proportion to the original xfermode speed:
fast modes like Plus take the largest proportional hit, and slow modes
like HardLight or SoftLight see essentially no hit at all.
This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
TBR=reed@google.com
No public API changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49da
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon-Trybot
Review URL: https://codereview.chromium.org/1242743004
Reason for revert:
http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon/builds/1168/steps/build%20most/logs/stdio
Original issue's description:
> De-templatize Sk4pxXfermode code a bit.
>
> This deduplicates a few pieces of code:
> - we end up with one copy of each xfer32() driver loop instead of one per xfermode;
> - we end up with two* copies of each xfermode implementation instead of ten**.
>
> * For a given Mode: Mode() itself and xfer_aa<Mode>().
> ** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
>
> This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
>
> If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
>
> Performance is affected in proportion to the original xfermode speed:
> fast modes like Plus take the largest proportional hit, and slow modes
> like HardLight or SoftLight see essentially no hit at all.
>
> This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
>
> For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
>
> TBR=reed@google.com
> No public API changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49daTBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1245273005
It appears that the Adreno compiler is even more twitchy about gl_FragCoord handling than expected.
BUG=skia:4078
Review URL: https://codereview.chromium.org/1246773003
This deduplicates a few pieces of code:
- we end up with one copy of each xfer32() driver loop instead of one per xfermode;
- we end up with two* copies of each xfermode implementation instead of ten**.
* For a given Mode: Mode() itself and xfer_aa<Mode>().
** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
Performance is affected in proportion to the original xfermode speed:
fast modes like Plus take the largest proportional hit, and slow modes
like HardLight or SoftLight see essentially no hit at all.
This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
TBR=reed@google.com
No public API changes.
BUG=skia:
Review URL: https://codereview.chromium.org/1242743004
It turns out that gyp (kind of) has support for cross
compiling with a different host and target. We simply
need to specify CC_host and CC_target instead of CC.
Making this change allows us to compile yasm on a Linux
host for Android.
We run into problems on Mac because
the linker on a Mac host requires different command line
arguments than the linker on the Android target. In
looking through the code for gyp itself and speaking to
Ben, it doesn't appear to me that gyp supports passing
different arguments to host and target linkers.
I would imagine that we would have similar problems on
Windows.
Below is a link to a CL that would fix this issue in gyp.
It looks like it has been dropped for a long time.
Thanks to Ben for this link!
https://chromiumcodereview.appspot.com/10795044/
Also I'm adding a link to the build instructions for Chrome
(thanks again Ben). It looks like they only support
building for Android from Linux.
https://code.google.com/p/chromium/wiki/AndroidBuildInstructions
My next steps are:
1) Getting in touch with Torne or someone else with gyp to
see if people are aware of this issue or interested in
fixing it.
2) Deciding if skia should care about this issue.
3) Deciding if skia should work around this issue.
It'd be really great to hear your thoughts on (2) and (3).
My first thought is that we shouldn't care because, as
long as we always compile the production copy of skia for
Android on Linux, we will get the fast code. Is this
a valid conclusion? Is there a way to write Android apps
on Mac that accidentally use the slower code?
If we do care, there are workarounds:
For Mac, we can check in a yasm binary - it's a little
smaller than the one I am deleting in this CL :-/
For Windows, we *might* be able to use the yasm.exe binary
already in externals (we get this from DEPS because this is
how chromium uses yasm on Windows).
Are there other platforms that we care about?
Let me know what you think!
BUG=skia:4028
DOCS_PREVIEW= https://skia.org/?cl=1239333002
Review URL: https://codereview.chromium.org/1239333002
Reason for revert:
942930d (included in this roll) introduced a 140 kB sizes regression in
libskia.so. Please investigate and reland if this regression is necessary.
Original issue's description:
> 565 support for SIMD xfermodes
>
> This uses the most basic approach possible:
> - to load an Sk4px from 565, convert to SkPMColors on the stack serially then load those SkPMColors.
> - to store an Sk4px to 565, store to SkPMColors on the stack then convert to 565 serially.
>
> Clearly, we can optimize these loads and stores. That's a TODO.
>
> The code using SkPMFloat is the same idea but a little more long-term viable, as we're only operating on one pixel at a time anyway. We could probably write 565 <-> SkPMFloat methods, but I'd rather not until it's really compelling.
>
> The speedups are varied but similar across SSE and NEON: a few uninteresting, many 50% faster, some 2x faster, and SoftLight ~4x faster.
>
> This will cause minor GM diffs, but I don't think any layout test changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/942930dcaa51f66d82cdaf46ae62efebd16c8cd0TBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1242973004
This change is motivated by a recent switch in how chromium handles
<video> color spaces, making rec709 more commonly used. This will
allow video -> canvas copies to take the fast GPU path when we're using
709, just as we do with 601 and jpeg.
Chromium-side change: https://codereview.chromium.org/1236313002
Review URL: https://codereview.chromium.org/1241723005