The artifacts previously thought to require MSAA can be handled by
(1) converting near-linear quadratics into lines, and (2) ensuring all
quadratic segments are monotonic with respect to the vector of their
closing edge [P2 -> P0].
No. 1 was already in effect.
No. 2 is implemented by this change.
Now we only fall back on soft MSAA for the two corner pixels.
This change also does some generic housekeeping in the quadratic
processor.
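For illustration only, here is one way to read condition (2). This is a sketch, not the code in the quadratic processor, and every name in it is hypothetical. It assumes "monotonic with respect to the closing edge" means the tangent's projection onto v = P0 - P2 never changes sign. That projection is linear in t, so there is at most one chop point, and de Casteljau subdivision splits the curve there.

```cpp
#include <array>

struct Vec2 { float x, y; };

static Vec2  operator-(Vec2 a, Vec2 b) { return {a.x - b.x, a.y - b.y}; }
static float dot(Vec2 a, Vec2 b)       { return a.x * b.x + a.y * b.y; }
static Vec2  lerp(Vec2 a, Vec2 b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t};
}

// B'(t) is proportional to (1-t)*(P1-P0) + t*(P2-P1). Projected onto the
// closing-edge vector v = P0 - P2 this is linear in t, so it has at most
// one root. Returns true and writes the root to *t if the projection
// changes sign inside (0,1); returns false if the curve is already
// monotonic with respect to v.
bool find_monotonic_chop(Vec2 p0, Vec2 p1, Vec2 p2, float* t) {
    Vec2 v = p0 - p2;
    float a = dot(p1 - p0, v);  // projection of the tangent at t=0
    float b = dot(p2 - p1, v);  // projection of the tangent at t=1
    if (a * b >= 0) {
        return false;           // same sign (or zero): nothing to chop
    }
    *t = a / (a - b);           // solves (1-t)*a + t*b == 0
    return true;
}

// de Casteljau split at t. Points [0..2] are the first quadratic and
// points [2..4] are the second; they share the split point at index 2.
std::array<Vec2, 5> chop_quad(Vec2 p0, Vec2 p1, Vec2 p2, float t) {
    Vec2 ab = lerp(p0, p1, t);
    Vec2 bc = lerp(p1, p2, t);
    return {{p0, ab, lerp(ab, bc, t), bc, p2}};
}
```

With P0=(0,0), P1=(3,1), P2=(1,0) the curve overshoots P2 in x before coming back, and the sketch reports a chop at t=0.6, where the tangent is perpendicular to the closing edge.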
Bug: skia:
Change-Id: Ib3309c2ed86d3d8bec5f451125a69326e82eeb1c
Reviewed-on: https://skia-review.googlesource.com/29721
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Greg Daniel <egdaniel@google.com>
Google3 now has that guard flag removed.
Bug: skia:
Change-Id: I6dede8c815e9f55bd769daef3982fd2fa8a7d6be
Reviewed-on: https://skia-review.googlesource.com/31201
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Yuqian Li <liyuqian@google.com>
Our SkThreadedBMPDevice is very experimental so I didn't handle this
edge case earlier. Maybe it's now a good time to fix it.
Bug: skia:
Change-Id: Ie3938475449c1341d34200ff3afe4589836950fc
Reviewed-on: https://skia-review.googlesource.com/31203
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Yuqian Li <liyuqian@google.com>
Bug: skia:6881
Change-Id: I8c1e4be16f4a79e9aa6fb663337476d0c0fe8c1c
Reviewed-on: https://skia-review.googlesource.com/31024
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Currently only the hard-stop specializations support tiling.
Consolidate the tiling code and expand to kTwo_ColorType,
kThree_ColorType also.
Change-Id: I0c04997f563a7150a486ccc03f8121099a651c0b
Reviewed-on: https://skia-review.googlesource.com/30780
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Bug: skia:5419
Change-Id: I75a67d35c94821bf7de80b63eb835b20f2772ddd
Reviewed-on: https://skia-review.googlesource.com/31241
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
Adds support for basic Texture creation.
Bug: skia:
Change-Id: I9a3f15bef1c88054c19e952e231cad94ad69f296
Reviewed-on: https://skia-review.googlesource.com/30781
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Greg Daniel <egdaniel@google.com>
This will make it much easier to experiment with upgrading the OS to
Debian-9.1 without causing failures on master.
No-Try: true
Change-Id: Id43b055841ec3ceb42133c9dd7b630f12d1b45c6
Reviewed-on: https://skia-review.googlesource.com/31001
Reviewed-by: Eric Boren <borenet@google.com>
Commit-Queue: Ben Wagner <benjaminwagner@google.com>
Bug: skia:
Change-Id: Icfaf04a541138700da906d96dfc2d90e4e00379d
Reviewed-on: https://skia-review.googlesource.com/31150
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Reed <reed@google.com>
Without this fix, the newly added GM draws incorrectly.
Change-Id: Ic159ab3201c10369ad5f8151186245d8d076cc25
Reviewed-on: https://skia-review.googlesource.com/30484
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
I believe this was originally added to make Raster & GPU rendering more similar. I think we've moved on from there.
Change-Id: Ic980f3308fbd427e5857b720488c91383a32a149
Reviewed-on: https://skia-review.googlesource.com/30761
Reviewed-by: Stan Iliev <stani@google.com>
Commit-Queue: Robert Phillips <robertphillips@google.com>
Perf showed that DAA is slow with MSVC. Disable it until I find
out why.
Bug: skia:
Change-Id: If30c24e97fa42e3a7ce143a1b1d06e4a3f278d13
TBR: mtklein@google.com
Reviewed-on: https://skia-review.googlesource.com/30584
Reviewed-by: Mike Klein <mtklein@chromium.org>
Reviewed-by: Yuqian Li <liyuqian@google.com>
Commit-Queue: Yuqian Li <liyuqian@google.com>
- Bring back some previously deleted macros and helper types.
- Automatically inject base_type information into snapshot events,
  to allow simpler tracking of polymorphic object types.
- Fix JSON formatting of pointer values (they were serializing as bool).
Bug: skia:
Change-Id: Iac7803f72ce5396ffd2fbcb5a36d76745c5e3f3e
Reviewed-on: https://skia-review.googlesource.com/28220
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Brian Osman <brianosman@google.com>
It's funny how now that I'm on a machine that doesn't support AVX2,
it's suddenly important for me that pack() is optimized for SSE!
This is basically the same as this morning, without any weird AVX2
pack ordering issues. This replaces something like
    movdqa 2300(%rip), %xmm0
    pshufb %xmm0, %xmm3
    pshufb %xmm0, %xmm2
    punpcklqdq %xmm3, %xmm2
(This is SSE4.1; the SSE2 version is worse.)
with
    psrlw $8, %xmm3
    psrlw $8, %xmm2
    packuswb %xmm3, %xmm2
(SSE2 and SSE4.1 both.)
It's always nice to not need to load a shuffle mask out of memory.
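As a rough sketch of the shift-then-pack sequence above written with intrinsics (the function name and array interface are made up for illustration and are not pack() itself): after the 16-bit logical shift every lane is in [0,255], so the saturation in packuswb never triggers and no shuffle mask is needed. A scalar fallback is included so the sketch also builds without SSE2.

```cpp
#include <cstdint>
#if defined(__SSE2__)
    #include <emmintrin.h>
#endif

// Hypothetical helper: keep the high byte of each of sixteen 16-bit lanes
// and narrow the result to sixteen bytes, preserving lane order.
void shift_and_pack(const uint16_t src[16], uint8_t dst[16]) {
#if defined(__SSE2__)
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 0));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 8));
    lo = _mm_srli_epi16(lo, 8);  // psrlw $8
    hi = _mm_srli_epi16(hi, 8);  // psrlw $8
    // packuswb: lanes of lo land in bytes 0-7, lanes of hi in bytes 8-15.
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), _mm_packus_epi16(lo, hi));
#else
    for (int i = 0; i < 16; i++) {
        dst[i] = static_cast<uint8_t>(src[i] >> 8);
    }
#endif
}
```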
Change-Id: I56fb30b31fcedc0ee84a4a71c483a597c8dc1622
Reviewed-on: https://skia-review.googlesource.com/30583
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Bug: skia:
Change-Id: I505e5c339947e9fc8bbec6acefc48ee9f47c96d2
Reviewed-on: https://skia-review.googlesource.com/30581
Reviewed-by: Brian Osman <brianosman@google.com>
Commit-Queue: Brian Osman <brianosman@google.com>
This makes loading them much simpler in 8-bit mode.
Change-Id: I35ff34ebd0b93425c4e39e055bf4ade8cf8561e1
Reviewed-on: https://skia-review.googlesource.com/30621
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
My next step is to change the uniform_color context to
    struct {
        float r,g,b,a;
        uint32_t rgba;
    };
so that it's trivial to load in both float and 8-bit pipelines.
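A minimal sketch of how such a context might be filled in, so the float stages read r,g,b,a and the 8-bit stages read rgba. The struct and function names are hypothetical, and the byte order (r in the low byte) and round-to-nearest conversion are assumptions, not something this change specifies.

```cpp
#include <cstdint>

// Hypothetical name; mirrors the layout described above.
struct UniformColorCtx {
    float    r, g, b, a;
    uint32_t rgba;
};

// Convert a channel in [0,1] to a byte, rounding to nearest.
static uint32_t to_byte(float x) {
    return static_cast<uint32_t>(x * 255.0f + 0.5f);
}

// Assumes inputs are already clamped to [0,1] and that r occupies the
// low byte of the packed value.
UniformColorCtx make_uniform_color(float r, float g, float b, float a) {
    UniformColorCtx ctx{r, g, b, a, 0};
    ctx.rgba = to_byte(r) <<  0
             | to_byte(g) <<  8
             | to_byte(b) << 16
             | to_byte(a) << 24;
    return ctx;
}
```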
Change-Id: If9bdde353ced3bf9eb0c63204b4770ed614ad16b
Reviewed-on: https://skia-review.googlesource.com/30481
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
The color appended here is both uniform and constant, and it's the
constantness that makes this custom append method useful over just
append(SkRasterPipeline::uniform_color, ...).
Uniform colors that are not constant have to be loaded from the pointer
each time (the caller might have changed the color out-of-band), but
constant uniform colors can be analyzed once and implemented with
specializations like black_color and white_color.
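To make the distinction concrete, a sketch of the kind of one-time analysis that constantness allows. The enum and function are stand-ins written for this note, not the actual append method; only the stage names black_color, white_color and uniform_color come from the description above.

```cpp
// Stand-ins for the corresponding SkRasterPipeline stages.
enum class Stage { black_color, white_color, uniform_color };

struct Color4f { float r, g, b, a; };

// Because a constant color cannot change after this call, it is safe to
// inspect it once here and pick a cheaper stage when one applies. A merely
// uniform color would always have to take the generic uniform_color path.
Stage choose_constant_color_stage(const Color4f& c) {
    if (c.r == 0 && c.g == 0 && c.b == 0 && c.a == 1) {
        return Stage::black_color;
    }
    if (c.r == 1 && c.g == 1 && c.b == 1 && c.a == 1) {
        return Stage::white_color;
    }
    return Stage::uniform_color;
}
```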
Change-Id: I3cfc00ccc578dd915367bca7113010557181224c
Reviewed-on: https://skia-review.googlesource.com/30560
Commit-Queue: Mike Klein <mtklein@google.com>
Commit-Queue: Florin Malita <fmalita@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>
This is a consistent, very small speedup for srcover.
    SkRasterPipeline_run
        Before: 30.4057ns
        After:  30.1089ns
i.e. a 1% speedup on the bench, maybe 3-4% improvement in srcover itself.
The only reason I'd send this out now is that this will slightly change
some pixels, so it's a good thing to sneak in before rebaselining.
It's possible that other blend modes would benefit from the same, but
I've only looked at srcover (and I've also changed dstover so that it
doesn't look funny).
Change-Id: Ic056ca0912d76648d43a78e0052176fd0f7934f1
Reviewed-on: https://skia-review.googlesource.com/30281
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
__builtin_convertvector(..., U8x4) is producing a fairly long
sequence of code to convert U16x4 to U8x4 on HSW:
    vextracti128 $0x1,%ymm2,%xmm3
    vmovdqa 0x1848(%rip),%xmm4
    vpshufb %xmm4,%xmm3,%xmm3
    vpshufb %xmm4,%xmm2,%xmm2
    vpunpcklqdq %xmm3,%xmm2,%xmm2
    vextracti128 $0x1,%ymm0,%xmm3
    vpshufb %xmm4,%xmm3,%xmm3
    vpshufb %xmm4,%xmm0,%xmm0
    vpunpcklqdq %xmm3,%xmm0,%xmm0
    vinserti128 $0x1,%xmm2,%ymm0,%ymm0
We can do much better with _mm256_packus_epi16:
    vinserti128 $0x1,%xmm0,%ymm2,%ymm3
    vperm2i128 $0x31,%ymm0,%ymm2,%ymm0
    vpackuswb %ymm0,%ymm3,%ymm0
vpackuswb packs the values in a somewhat surprising order,
which the first two instructions get us lined up for.
This is a pretty noticeable speedup, 7-8% on some benchmarks.
The same sort of change could be made for SSE2 and SSE4.1 also
using _mm_packus_epi16, but the difference for that change is
much less dramatic. Might as well stick to focusing on HSW.
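For anyone puzzled by the "surprising order": the 256-bit vpackuswb packs within each 128-bit half independently. Below is a portable scalar model of that behavior and of the two lane moves that precede it. It is a sketch for explanation only; the type and function names are invented here, and it runs on plain arrays rather than AVX2 registers.

```cpp
#include <array>
#include <cstdint>

using U16x16 = std::array<uint16_t, 16>;  // models a ymm register of 16-bit lanes
using U8x32  = std::array<uint8_t, 32>;   // models a ymm register of bytes

// packuswb treats each 16-bit lane as signed and saturates to [0,255].
static uint8_t saturate(uint16_t x) {
    int16_t s = static_cast<int16_t>(x);
    return s < 0 ? 0 : s > 255 ? 255 : static_cast<uint8_t>(s);
}

// Scalar model of 256-bit vpackuswb: each 128-bit half of the result holds
// 8 lanes of a followed by 8 lanes of b, taken from that same half.
U8x32 packus_model(const U16x16& a, const U16x16& b) {
    U8x32 r{};
    for (int half = 0; half < 2; half++) {
        for (int i = 0; i < 8; i++) {
            r[16 * half + i]     = saturate(a[8 * half + i]);
            r[16 * half + i + 8] = saturate(b[8 * half + i]);
        }
    }
    return r;
}

// Scalar model of building a register from one 128-bit half of x and one
// 128-bit half of y (the job done by vinserti128 / vperm2i128 above).
U16x16 combine_halves(const U16x16& x, int xhalf, const U16x16& y, int yhalf) {
    U16x16 r{};
    for (int i = 0; i < 8; i++) {
        r[i]     = x[8 * xhalf + i];
        r[i + 8] = y[8 * yhalf + i];
    }
    return r;
}

// Line up the halves first so the packed bytes come out in lane order.
U8x32 pack_in_order(const U16x16& lo, const U16x16& hi) {
    U16x16 q02 = combine_halves(lo, 0, hi, 0);  // quarters 0 and 2
    U16x16 q13 = combine_halves(lo, 1, hi, 1);  // quarters 1 and 3
    return packus_model(q02, q13);
}
```

Calling packus_model(lo, hi) directly yields lanes 0-7, 16-23, 8-15, 24-31; pack_in_order yields 0-31 in sequence.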
Change-Id: I0d6765bd67e0d024d658a61d19e6f6826b4d392c
Reviewed-on: https://skia-review.googlesource.com/30420
Reviewed-by: Florin Malita <fmalita@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
Bug: skia:6916
Change-Id: I16badf80c3b34e517b8baab161150c9434f325aa
Reviewed-on: https://skia-review.googlesource.com/30100
Commit-Queue: Ravi Mistry <rmistry@google.com>
Reviewed-by: Eric Boren <borenet@google.com>
Like _lowp.cpp, it's only meant to be compiled offline.
Change-Id: I0d4f7c1fd8fa880ffd084c1e332f6a33def6e26f
Reviewed-on: https://skia-review.googlesource.com/30262
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Mike Klein <mtklein@chromium.org>
I think we can replace a lot of legacy code with an SkRasterPipeline
backend that works in 8-bit and stays interlaced. Think of this as a
"lowerp" replacement for lowp.
I'm having some trouble getting ARMv8 working.
ARMv7 should be fine, but I want to turn it on separately from x86.
I haven't looked at 32-bit x86 yet, but that's also on the todo list.
Open questions to follow up on:
- is it better to fold every multiply back down to 8-bit
  (as seen here), or to allow intermediates to accumulate
  in 16-bit and divide by 255 when done/needed?
- is it better to pass tightly packed 8-bit vectors between stages (as
  seen here), or to keep the 8-bit values unpacked in 16-bit lanes?
- should we make V wider than 1 register?
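To make the first question concrete, a scalar sketch of the two options for a simple lerp. This is illustration only: the function names are hypothetical, and it uses an exact round-to-nearest divide by 255 rather than whatever approximation the backend actually uses.

```cpp
#include <cstdint>

// Round-to-nearest x/255 for x in [0, 255*255].
static uint8_t div255(unsigned x) {
    return static_cast<uint8_t>((x + 127) / 255);
}

// Option A: fold every product back down to 8 bits right away.
// Two roundings, but everything between multiplies stays 8-bit.
uint8_t lerp_fold(uint8_t s, uint8_t d, uint8_t a) {
    return static_cast<uint8_t>(div255(s * a) + div255(d * (255 - a)));
}

// Option B: let the products accumulate in 16 bits and divide once.
// s*a + d*(255-a) <= 255*255 = 65025, so the sum fits in 16 bits.
uint8_t lerp_wide(uint8_t s, uint8_t d, uint8_t a) {
    uint16_t sum = static_cast<uint16_t>(s * a + d * (255 - a));
    return div255(sum);
}
```

With exact rounding the two results can differ by at most one code value, so the choice is mostly about instruction count and register pressure rather than visible quality.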
GMs look good. All diffs invisible and plausibly due to the 15->8 bit
precision drop. A quick bench run showed this running in about 0.75x
the time of the existing lowp backend.
Change-Id: I24aa46ff1d19c0b9b8dc192d5b1821cab0b8843c
Reviewed-on: https://skia-review.googlesource.com/29886
Commit-Queue: Mike Klein <mtklein@chromium.org>
Reviewed-by: Florin Malita <fmalita@chromium.org>