Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
Currently this is not actually hooked into the system.
To give some context, in a follow up CL I'll add this to GrDrawTarget.
For this I will move the gpu onDraw command to the GpuCommandBuffer as well.
For GL this will end up just being a pass through to a non virtual draw(...)
on GrGLGpu, and for vulkan it will mostly do what it currently does but
adding commands to the secondary command buffer instead.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2038583002
Review-Url: https://codereview.chromium.org/2038583002
Replaces targetHasUnifiedMultisampling with a simpler "useHWAA". Now
the code that creates a pipeline builder needs to decide on its own
whether it should enable multisampling, rather than relying on the
builder to try and guess.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041283002
Review-Url: https://codereview.chromium.org/2041283002
Fail out in a couple of new places when the input data is very
large and exceeds the limits of the pathops machinery.
Most of the change here plumbs in a way to exclude an assert in
one of these exceptional cases. The current SkAddIntersection
implementation and the inner functions it calls has no way to
report an error to the root caller for an early exit, so rather
than add that in, exclude the assert when the test that would
trigger it runs (allowing the test to otherwise ensure that it
properly fails).
TBR=reed@google.com
BUG=617586,617635
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046713003
Review-Url: https://codereview.chromium.org/2046713003
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same for `#include <intrin.h>` that was found in SkMath.h.
Those functions that needed it are moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164aTBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
Due to performance regression on various GPUs, we're only going to use the
new draw-call based mip-mapper when necessary. Of the bots where we have
test coverage, that means Intel and Mac-NVIDIA. We also had failures on
our AMD 7770 bots - I'm upgrading the drivers on those two machines, and
I'm leaving them out of the whitelist for now.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2042313002
Review-Url: https://codereview.chromium.org/2042313002
Don't add an edge if the bottom vertex was already added, or
if an island vertex has a left poly but no right poly.
(Sorry for the lack of test, but the only reduction I could create was still a huge path and only crashes in Chrome.)
BUG=617907
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2043873005
Review-Url: https://codereview.chromium.org/2043873005
Since we're already approximating the sRGB gamma curve with a sqrt(), we might
as well approximate with it a faster approximate sqrt(). On Intel, this
.rsqrt().invert() version is 2-3x faster than .sqrt() (~3x faster on older
machines, ~2x faster on newer machines).
This should provide ~11 bits of precision, suspiciously exactly enough.
Running dm --config srgb, there are diffs, but none perceptible.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046063002
Review-Url: https://codereview.chromium.org/2046063002
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
This change is a refactorization to open up more flexable use for render passes
in the future. Once we start bundling draw calls into a single render pass we
will need to start using specific load and store ops instead of the default
currently of load and store everything.
We still will need to simply get a compatible render pass for creation of framebuffers
and pipeline objects. These render passes don't need to know load and store ops. Thus
in this change, I'm defaulting the "compatible" render pass to always be the one
which both loads and stores. All other load/store combinations will be created lazily
when requested for a specific draw (future change).
The CompatibleRPHandle is a way for us to avoid analysing the RenderTarget every time
we want to get a new GrVkRenderPass.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1977403002
Review-Url: https://codereview.chromium.org/1977403002
Reason for revert:
Still causing problems in Google3, e.g.
https://test.corp.google.com/ui#cl=124138817&flags=CAMQBQ==&id=OCL:124138817:BASE:124139560:1465227435491:219ffbdb&t=//third_party/skia/HEAD:dm
Original issue's description:
> Make SkPngCodec decode progressively.
>
> This is a step towards using SkCodec in Chromium, where progressive
> decoding is necessary.
>
> Switch from using png_read_row (which expects all the data to be
> available) to png_process_data, which uses callbacks when rows are
> available.
>
> Create a new API for SkCodec, which supports progressive decoding and
> scanline decoding. Future changes will switch the other clients off of
> startScanlineDecode and get/skip-Scanlines to the new API.
>
> Remove SkCodec::kNone_ScanlineOrder, which was only used for interlaced
> PNG images. In the new API, interlaced PNG fits kTopDown. Also remove
> updateCurrScanline(), which was only used by the old implementation for
> interlaced PNG.
>
> DMSrcSink:
> - In CodecSrc::kScanline_Mode, use the new method for scanline decoding
> for the supported formats (just PNG and PNG-in-ICO for now).
>
> fuzz.cpp:
> - Remove reference to kNone_ScanlineOrder
>
> SkCodec:
> - Add new APIs:
> - startIncrementalDecode
> - incrementalDecode
> - Remove kNone_SkScanlineOrder and updateCurrScanline()
>
> SkPngCodec:
> - Implement new APIs
> - Switch from sk_read_fn/png_read_row etc to png_process_data
> - Expand AutoCleanPng's role to decode the header and create the
> SkPngCodec
> - Make the interlaced PNG decoder report how many lines were
> initialized during an incomplete decode
> - Make initializeSwizzler return a bool instead of an SkCodec::Result
> (It only returned kSuccess or kInvalidInput anyway)
>
> SkIcoCodec:
> - Implement the new APIs; supported for PNG in ICO
>
> SkSampledCodec:
> - Call the new method for decoding scanlines, and fall back to the old
> method if the new version is unimplemented
> - Remove references to kNone_SkScanlineOrder
>
> tests/CodecPartial:
> - Add a test which decodes part of an image, then finishes the decode,
> and compares it to the straightforward method
>
> tests/CodecTest:
> - Add a test which decodes all scanlines using the new method
> - Repurpose the Codec_stripes test to decode using the new method in
> sections rather than all at once
> - In the method check(), add a parameter for whether the image supports
> the new method of scanline decoding, and be explicit about whether an
> image supports incomplete
> - Test incomplete PNG decodes. We should have been doing it anyway for
> non-interlaced (except for an image that is too small - one row), but
> the new method supports interlaced incomplete as well
> - Make test_invalid_parameters test the new method
> - Add a test to ensure that it's safe to fall back to scanline decoding without
> rewinding
>
> BUG=skia:4211
>
> The new version was generally faster than the old version (but not significantly so).
>
> Some raw performance differences can be found at https://docs.google.com/a/google.com/spreadsheets/d/1Gis3aRCEa72qBNDRMgGDg3jD-pMgO-FXldlNF9ejo4o/
>
> Design doc can be found at https://docs.google.com/a/google.com/document/d/11Mn8-ePDKwVEMCjs3nWwSjxcSpJ_Cu8DF57KNtUmgLM/
>
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1997703003
>
> Committed: https://skia.googlesource.com/skia/+/a4b09a117d4d1ba5dda372e6a2323e653766539e
>
> Committed: https://skia.googlesource.com/skia/+/30e78c9737ff4861dc4e3fa1e4cd010680ed6965
>
> Committed: https://skia.googlesource.com/skia/+/6fb2391b2cc83ee2160b4e994faa8128975acc1fTBR=reed@google.com,msarett@google.com,scroggo@chromium.org
# Not skipping CQ checks because original CL landed more than 1 days ago.
BUG=skia:4211
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2044573002
Review-Url: https://codereview.chromium.org/2044573002
Crash is caused by ldxc1 instruction, which traps when double values are
not aligned on 8-byte boundaries. Problem was tracked to SkChunkAlloc which
produces pointers aligned on 4-byte boundaries leading to misalignment.
This change makes sure that SkChunkAlloc will produce pointers that are
aligned to 8 bytes.
Appropriate tests are added to tests/MemsetTest.cpp
TEST=Build Chromium with Clang and run on MIPS32r2 platform
TEST=./out/Debug/dm --match Memset
BUG=130022
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1849183004
Review-Url: https://codereview.chromium.org/1849183004