Reason for revert:
Appears to have broken Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug
Original issue's description:
> Disable tail calls inside Simple GM functions.
>
> I haven't found any way to turn off the particular optimization (-foptimize-sibling-calls)
> per-function, but I can control optimization settings coarsely:
>
> - on GCC, we can pick a particular -O level, so I've picked -O1 which does not
> enable -foptimize-sibling-calls
> - on Clang, we can only disable all optimization for a function
> - have no idea about MSVC
>
> This should make sure the simple GM functions, e.g. all_bitmap_configs_GM(),
> show up on stack traces when we crash.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2050473006
>
> Committed: https://skia.googlesource.com/skia/+/eee3ced68f787aadc47fa274ca8e13b354ec920aTBR=reed@google.com,mtklein@chromium.org
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86-Debug-Trybot
Review-Url: https://codereview.chromium.org/2051193002
This fixes serialize-8888 for 2 GMs on Mac that I'm now unblacklisting.
I think another was already fixed, and two more were Windows-only.
Seems safe to use encoded_size=1 as another sentinel here (like we already
use =0); I can't imagine any encoded image format that can encode an image
in a single byte.
I suspect this is the root of the referenced bug too,
but this is a good idea even if not.
BUG=chromium:601851
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2039813007
CQ_EXTRA_TRYBOTS=client.skia:Test-Win-MSVC-GCE-CPU-AVX2-x86_64-Release-Trybot,Test-Mac-Clang-MacMini6.2-CPU-AVX-x86_64-Release-Trybot;client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot
Review-Url: https://codereview.chromium.org/2039813007
I haven't found any way to turn off the particular optimization (-foptimize-sibling-calls)
per-function, but I can control optimization settings coarsely:
- on GCC, we can pick a particular -O level, so I've picked -O1 which does not
enable -foptimize-sibling-calls
- on Clang, we can only disable all optimization for a function
- have no idea about MSVC
This should make sure the simple GM functions, e.g. all_bitmap_configs_GM(),
show up on stack traces when we crash.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2050473006
Review-Url: https://codereview.chromium.org/2050473006
SkMipMap only deals with the levels it generates.
That is to day, it deals with mipmap levels 1-x, not 0-x.
Other functions reflect thing when indexing.
They go from 0 to x-1 (giving the index into SkMipMap's contents).
ComputeLevelSize should also follow that same indexing.
BUG=578304
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2042843005
Review-Url: https://codereview.chromium.org/2042843005
Instead of two synchronization systems (in_signal_handler, gMutex),
we can just use one. This simplifies the signal handler logic to:
- first thread through grabs the lock, prints what's running and a stack trace,
then exits
- all other threads just sit waiting on that lock until exit kills them
Previously I think all threads were racing to exit, which can kill the process
before the printing thread is done. That truncated the output, which is dumb.
Plus...
refactor slightly so that crash_handler() shows up at the top of the stack
trace rather than some odd name for a lambda inside setup_crash_handler().
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2051863002
Review-Url: https://codereview.chromium.org/2051863002
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Review-Url: https://codereview.chromium.org/2046013002
Currently this is not actually hooked into the system.
To give some context, in a follow up CL I'll add this to GrDrawTarget.
For this I will move the gpu onDraw command to the GpuCommandBuffer as well.
For GL this will end up just being a pass through to a non virtual draw(...)
on GrGLGpu, and for vulkan it will mostly do what it currently does but
adding commands to the secondary command buffer instead.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2038583002
Review-Url: https://codereview.chromium.org/2038583002
Replaces targetHasUnifiedMultisampling with a simpler "useHWAA". Now
the code that creates a pipeline builder needs to decide on its own
whether it should enable multisampling, rather than relying on the
builder to try and guess.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041283002
Review-Url: https://codereview.chromium.org/2041283002
Fail out in a couple of new places when the input data is very
large and exceeds the limits of the pathops machinery.
Most of the change here plumbs in a way to exclude an assert in
one of these exceptional cases. The current SkAddIntersection
implementation and the inner functions it calls has no way to
report an error to the root caller for an early exit, so rather
than add that in, exclude the assert when the test that would
trigger it runs (allowing the test to otherwise ensure that it
properly fails).
TBR=reed@google.com
BUG=617586,617635
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046713003
Review-Url: https://codereview.chromium.org/2046713003
$ git grep -l '<windows.h>' include src
include/private/SkLeanWindows.h
$ git grep -l SkLeanWindows.h | grep '\.h$'
include/ports/SkTypeface_win.h
include/utils/win/SkHRESULT.h
include/utils/win/SkTScopedComPtr.h
include/views/SkEvent.h
src/core/SkMathPriv.h
src/ports/SkTypeface_win_dw.h
src/utils/SkThreadUtils_win.h
src/utils/win/SkWGL.h
The same for `#include <intrin.h>` that was found in SkMath.h.
Those functions that needed it are moved to SkMathPriv.h.
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2041943002
CQ_INCLUDE_TRYBOTS=tryserver.chromium.win:win_chromium_compile_dbg_ng,win_chromium_compile_rel_ng
Review-Url: https://codereview.chromium.org/2041943002
Reason for revert:
Appears to have broken the ARMv7 aspect of the Google3 roll in bizarre seemingly-unrelated ways.
Original issue's description:
> Move immintrin/arm_neon includes to where they are used.
>
> On my Mac (so, immintrin), this improves compile time, both wall and cpu,
> by about 16%. To test I ran this on an SSD with files hot in their caches:
>
> $ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
> ninja -C out/Release -t clean && \
> time ninja -C out/Release
>
> Before: 159 wall / 3367 cpu
> 159 wall / 3368 cpu
>
> After: 137 wall / 2860 cpu
> 136 wall / 2863 cpu
>
> I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
> That made no signficant difference, so I've kept immintrin for its simplicity.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> TBR=reed@google.com
> No public API changes.
>
> Committed: https://skia.googlesource.com/skia/+/12dfaaa53c23f3d03050bde8f64136ac1f44164aTBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review-Url: https://codereview.chromium.org/2046213002
Due to performance regression on various GPUs, we're only going to use the
new draw-call based mip-mapper when necessary. Of the bots where we have
test coverage, that means Intel and Mac-NVIDIA. We also had failures on
our AMD 7770 bots - I'm upgrading the drivers on those two machines, and
I'm leaving them out of the whitelist for now.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2042313002
Review-Url: https://codereview.chromium.org/2042313002
Don't add an edge if the bottom vertex was already added, or
if an island vertex has a left poly but no right poly.
(Sorry for the lack of test, but the only reduction I could create was still a huge path and only crashes in Chrome.)
BUG=617907
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2043873005
Review-Url: https://codereview.chromium.org/2043873005
Since we're already approximating the sRGB gamma curve with a sqrt(), we might
as well approximate with it a faster approximate sqrt(). On Intel, this
.rsqrt().invert() version is 2-3x faster than .sqrt() (~3x faster on older
machines, ~2x faster on newer machines).
This should provide ~11 bits of precision, suspiciously exactly enough.
Running dm --config srgb, there are diffs, but none perceptible.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046063002
Review-Url: https://codereview.chromium.org/2046063002
On my Mac (so, immintrin), this improves compile time, both wall and cpu,
by about 16%. To test I ran this on an SSD with files hot in their caches:
$ env CC=/usr/bin/clang CXX=/usr/bin/clang++ ./gyp_skia && \
ninja -C out/Release -t clean && \
time ninja -C out/Release
Before: 159 wall / 3367 cpu
159 wall / 3368 cpu
After: 137 wall / 2860 cpu
136 wall / 2863 cpu
I also tried further refining immintrin down to emmintrin / tmmintrin / smmintrin etc.
That made no signficant difference, so I've kept immintrin for its simplicity.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2045633002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
TBR=reed@google.com
No public API changes.
Review-Url: https://codereview.chromium.org/2045633002
This change is a refactorization to open up more flexable use for render passes
in the future. Once we start bundling draw calls into a single render pass we
will need to start using specific load and store ops instead of the default
currently of load and store everything.
We still will need to simply get a compatible render pass for creation of framebuffers
and pipeline objects. These render passes don't need to know load and store ops. Thus
in this change, I'm defaulting the "compatible" render pass to always be the one
which both loads and stores. All other load/store combinations will be created lazily
when requested for a specific draw (future change).
The CompatibleRPHandle is a way for us to avoid analysing the RenderTarget every time
we want to get a new GrVkRenderPass.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1977403002
Review-Url: https://codereview.chromium.org/1977403002