Bug: skia:
Change-Id: I2adb983d68625d327e7c00e53b6ae4703b46252f
Reviewed-on: https://skia-review.googlesource.com/104761
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
The last version had a problem with the no simd compilation.
TBR=mtklein@google.com
Change-Id: I139388cf3bf1b55cb4a49133a98be129fd9711c6
Reviewed-on: https://skia-review.googlesource.com/80983
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Change-Id: Ie3bada767f0ba945cb17f174f179510768eb178d
Reviewed-on: https://skia-review.googlesource.com/77583
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
Bug: skia:
Change-Id: I0377e6a1dd8259e944f7902a5c68af524fa588c7
Reviewed-on: https://skia-review.googlesource.com/79382
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
and test it.
Change-Id: Ib0c2cf93c63d8d3c36a7d4d60bbec4ecede29bc7
Reviewed-on: https://skia-review.googlesource.com/78480
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Change-Id: I0e94ef1620b54405a23470507e2b2c4bb54731c9
Reviewed-on: https://skia-review.googlesource.com/72860
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Herb Derby <herb@google.com>
Change-Id: I1b00ba2720648b75fce47d3f4d0f56fb8f2cd171
Reviewed-on: https://skia-review.googlesource.com/67041
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
Add mulHi to base SkNx, and specialize implementations for Sk4u for
neon and sse.
Add casts for converting from uint8_t by 4 to uint32_t by 4.
Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I29a32e2ad9812a47fff841ceca334e562362836f
Reviewed-on: https://skia-review.googlesource.com/57960
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
created new file src/core/SkColorData.h for
internal consumption. Note that many of the
functions there are unused as well.
Bug: skia: 6898
R: reed@google.com
Change-Id: I25bfd5a9c21f53558c4ca65a77eb5d322d897c6d
Reviewed-on: https://skia-review.googlesource.com/46848
Commit-Queue: Cary Clark <caryclark@google.com>
Reviewed-by: Mike Reed <reed@google.com>
For the blur_1.50_normal_low_quality benchmark, this code goes from about 120us to 85us.
The original implementation executes at about 95us.
This changed in controlled by the flag:
SK_SUPPORT_LEGACY_SLOW_SMALL_BLUR
BUG=chromium:759070
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: If722cb8ffd8c47a94b7a6b4e6dd26fd1474b6209
Reviewed-on: https://skia-review.googlesource.com/45300
Commit-Queue: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
When Skia's built with an interestingly advanced instruction set
baseline like SSSE3 or SSE4.1, we end up with two distinct copies of
some SkOpts functions, one default in SkOpts.o and one specialization
from SkOpts_{ssse3,sse41}.o. These functions are static, and so are
technically unrelated, even though they're the same code compiled with
the same instructions available. They're going to be identical.
What we want here is to remove static but mark them as inline instead.
In this case inline means "if the linker sees multiple copies of this,
that's cool, just pick any one arbitrarily". That's just what we want.
Now, when I disassemble a binary before and after this change, I do see
the redundant routines removed. However, the file size change is
minimal... I suspect that this must mean the linker has noticed that we
had identical code and physically folded the two logically independent
routines. I don't know how prevalent this optimization is, though, so
it doesn't hurt to give it more of a "one copy please" hint with inline.
There may also be a difference here between the binary size (~unchanged)
and the in-memory layout of that binary?
Change-Id: Id9c8f0ffc84aa1c9a066c22b623d34adab281857
Reviewed-on: https://skia-review.googlesource.com/37501
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Ben Wagner <bungeman@google.com>
This is mostly dead code.
In order to make it truly dead, we need to opt drawing unpremul images
into SkRasterPipelineBlitter. They had been handled by
SkLinearBitmapPipeline, but can't be draw by SkBitmapProcLegacyShader.
Drawing unpremul images is tested by the GM all_variants_8888, which
gave us trouble last time around (serialize-8888 drew right, 8888 wrong)
but now draws fine. I think this was probably also the root of the
revert, drawing some unpremul image in Chrome's tests somewhere.
Change-Id: I453f9df44ade807316935921cbae82961e2f08aa
Reviewed-on: https://skia-review.googlesource.com/24862
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Bug: skia:
Change-Id: If6f0d0a57463bf99a66d674e65a62ce3931d0116
Reviewed-on: https://skia-review.googlesource.com/24644
Commit-Queue: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Bug: skia:
Change-Id: Id82a58f2f4d947894bb710cb3190c873b20b98eb
Reviewed-on: https://skia-review.googlesource.com/23404
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
This reverts commit d9b1fe02a6.
Reason for revert: try to fix chrome roll
Original change's description:
> remove a bit more dead code
>
> Change-Id: I61484672e88d6bb4f75833ee89e7178c4f34d610
> Reviewed-on: https://skia-review.googlesource.com/20780
> Commit-Queue: Mike Klein <mtklein@google.com>
> Reviewed-by: Herb Derby <herb@google.com>
TBR=mtklein@chromium.org,mtklein@google.com,herb@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Change-Id: I03dcd344dfb138261d9421b0692d12e4ed431100
Reviewed-on: https://skia-review.googlesource.com/20822
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Reed <reed@google.com>
Change-Id: I61484672e88d6bb4f75833ee89e7178c4f34d610
Reviewed-on: https://skia-review.googlesource.com/20780
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Herb Derby <herb@google.com>
This is mostly reindentation to make each platform's code clear, and
rewriting comments to match the usual way we orient vector comments
these days (in memory order).
Change-Id: I7ceb98c5af88980e74b6a124507e0ef1900fc731
Reviewed-on: https://skia-review.googlesource.com/19663
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
enables lots of code to delete
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Bug: skia:
Change-Id: I13631ead68a9232bd8c13c5ef54727f44def26ca
Reviewed-on: https://skia-review.googlesource.com/19278
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Mike Reed <reed@google.com>
Most importantly, remove the undefined behavior implied by "delta".
Change-Id: I8f9740804ec74dd40b049eafd4f0d51b36ce3237
Reviewed-on: https://skia-review.googlesource.com/18140
Reviewed-by: Herb Derby <herb@google.com>
Change-Id: Icc570d8a8f1df1dea202e1d234433491122b9b67
Reviewed-on: https://skia-review.googlesource.com/18135
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Change ARM implementation of alpha blending to work on 8 pixels at a
time (using NEON). Also improve the accuracy of alpha blending by using
a formula based on SkMulDiv255Round rather than SkPMSrcOver.
Note that a number of variations of this code were considered. Here are
some notes:
- A 16 pixels at a time version was considered. This performs well for
the case of extreme alpha (all-opaque or all-transparent pixels), but
performs worst than the 8 pixels version when there are frequent
transitions of alpha. Also gcc 6.2.1 seems to have troubles with
register pressure when using this version.
- If the branch to detect the fully-opaque or fully-transparent cases
is removed, then the performance increases significantly for images
which are all partially transparent (especially on ARM Cortex A72),
but can significantly decrease for images that are almost fully
opaque or fully transparent.
This implementation is a compromise to the effects described above.
This patch produces a ~10% improvement on the nanobench's sub-scores
repeatTile_BGRA_8888_A, constXTile_MM_filter_trans, constXTile_CC_trans,
constXTile_RR_filter_trans when running on ARM Cortex A72. Improvements
of greater magnitude (20% to 30%) are observed when running on ARM
Cortex A53.
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I1f0c9f549057613bbffd26e6651f3beeb0019af9
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/16520
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This lets the compiler generate AVX versions with wider writes.
Change-Id: Ia63825e70c72bdb4d14bef97d8b4ea4be54c9d84
Reviewed-on: https://skia-review.googlesource.com/17715
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This is not an important format, and the code is dead or close to it.
The code is an occasional maintenance burden so I'd like it gone.
Change-Id: I4ad921533abf3211e6a81e6e475b848795eea060
Reviewed-on: https://skia-review.googlesource.com/15600
Reviewed-by: Mike Reed <reed@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
On a simple benchmark the CRC32 version is about 3x faster on
ARM Cortex A57 (Aarch32) than the Murmur3 scalar version.
BUG=skia:
Change-Id: I71515e8463a33924998b837ff9f32202690dd2fe
Reviewed-on: https://skia-review.googlesource.com/15480
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
ARMv8 specifies that an IT block should be followed by only one 16-bit instruction.
* SkFloatToFix is back to a C implementation that mirrors the assembly code.
* S32A_D565_Opaque_neon switched the usage of the temporary 'ip' register to let
the compiler choose what is best in the context of the IT block. And replaced
'keep_dst' by 'ip' where low register or high register does not matter.
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: If587110a0c74b637ae99460419d46cf969c694fc
Reviewed-on: https://skia-review.googlesource.com/9346
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Change-Id: I5de3c8daae80e437b3553ab6afcee7120a1bb775
Reviewed-on: https://skia-review.googlesource.com/14038
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
I still plan to replace this more thoroughly with a different blitter,
but for now just implement it using callback.
This is the last stage not supported by SkJumper! Will follow up by
removing all of SkRasterPipeline_opts.h and anything that indicates
SkJumper might not work.
Change-Id: I96ba2bb0a26266f3b658e5f3153ec7d5bbd46799
Reviewed-on: https://skia-review.googlesource.com/14037
Reviewed-by: Florin Malita <fmalita@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Looks like the color-space images have this well tested (even without
lab_to_xyz) and the diffs look like rounding/FMA.
The old plan to keep loads and stores outside callback was:
1) awkward, with too many pointers and pointers to pointers to track
2) misguided... load and store stages march ahead by x,
working at ptr+0, ptr+8, ptr+16, etc. while callback
always wants to be working at the same spot in the buffer.
I spent a frustrating day in lldb to understood 2). :/
So now the stage always store4's its pixels to a buffer in the context
before the callback, and when the callback returns it load4's them back
from a pointer in the context, defaulting to that same buffer.
Instead of passing a void* into the callback, we pass the context
itself. This lets us subclass the context and add our own data...
C-compatible object-oriented programming.
Change-Id: I7a03439b3abd2efb000a6973631a9336452e9a43
Reviewed-on: https://skia-review.googlesource.com/13985
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This lets us temporarily escape to piece of code outside
SkRasterPipeline. We should be able to use this to replace
- parametric_{r,g,b,a}
- table_{r,g,b,a}
- color_lookup_table
- shader_adapter*
* We want to obsolete shader_adapter for other reasons anyway,
but we _could_ replace it with this if we want to.
Change-Id: I42b657b3c19c679796ed1876856cae0c8471307e
Reviewed-on: https://skia-review.googlesource.com/12102
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Matt Sarett <msarett@google.com>
This splits SkImageShaderContext into three parts:
- SkJumper_GatherCtx: always, already done
- SkJumper_SamplerCtx: when bilinear or bicubic
- MiscCtx: other little bits (the matrix, paint color, tiling limits)
Thanks for the snazzy allocator that allows this Herb!
Both SkJumper and SkRasterPipeline_opts.h should be speaking all the
same types now.
I've copied the comments about bilinear/bicubic to SkJumper with little
typo fixes and clarifications.
Change-Id: I4ba7b7c02feba3f65f5292169a22c060e34933c6
Reviewed-on: https://skia-review.googlesource.com/13269
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
SkJumper_GatherCtx is a prefix of SkImageShaderContext, so
this is a no-op. It helps to keep things straight, and I
do want to split apart the GatherCtx from a new SamplingCtx.
Change-Id: I9c5f436b096624c2809e1f810e9bcd6c6b00b883
Reviewed-on: https://skia-review.googlesource.com/13264
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
We can't realistically use AVX and SkNx together because of ODR
problems, so remove the code that may tempt us to try.
Remaining code paths using AVX:
- one intrinsics-only routine in SkOpts_hsw.cpp
- SkJumper
Change-Id: I0d2d03b47ea4a0eec27f2de2b28a4c3d1ff8376f
Reviewed-on: https://skia-review.googlesource.com/13121
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
N=8 on that bot.
CQ_INCLUDE_TRYBOTS=skia.primary:Build-Ubuntu-Clang-x86_64-Release-Fast
Change-Id: If54ae800b50d9dffb9f983b23ff6f522657943b1
Reviewed-on: https://skia-review.googlesource.com/13061
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This moves all the values that gather_8888, gather_a8, etc. need to
the front of SkImageShaderContext, and dereferences the color table.
This should be a no-op, but will make these stages easier to write
in SkJumper.
Change-Id: I0dff97d5113d14e941e7b717cd85f0036764eb88
Reviewed-on: https://skia-review.googlesource.com/11492
Reviewed-by: Herb Derby <herb@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
These can't really be done with SkJumper.
Change-Id: Ic357f00695eacd2766f6dfb9a3be13b0c07c3650
Reviewed-on: https://skia-review.googlesource.com/11386
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
This also subtlely allows clients to convert between F32 and F16.
BUG=skia:
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: Ied5f2295fce00c69d8cf85730be899f3f8597915
Reviewed-on: https://skia-review.googlesource.com/9914
Reviewed-by: Mike Reed <reed@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Matt Sarett <msarett@google.com>