Tidy up a little while I'm in here:
1) SIMD headers are now included by SkTypes.h as appropriate.
2) _mm_cvtss_f32() is pithier and generates the same code.
Looks like this is the only code checking for SSE wrong. After this CL:
~/skia (sse) $ git grep __SSE
include/core/SkPreConfig.h: #if defined(__SSE4_2__)
include/core/SkPreConfig.h: #elif defined(__SSE4_1__)
include/core/SkPreConfig.h: #elif defined(__SSE3__)
include/core/SkPreConfig.h: #elif defined(__SSE2__)
every other check is in SkPreConfig.h where it belongs.
This is going to affect some GMs subtly on Windows.
BUG=chromium:511458
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1248503004
This is split off of https://codereview.chromium.org/1225923010/ (Start tightening correspondence betweeen GrDrawContext and GrRenderTarget). It:
fixes some style nits
replaces some passing of GrContext with GrTextureProvider & GrDrawContext
does a bit of the finer grained creation of GrDrawContexts
Review URL: https://codereview.chromium.org/1245183002
This allows codecs that support subsets natively (i.e. WEBP) to do so.
Add a field on SkCodec::Options representing the subset.
Add a method on SkCodec to find a valid subset which approximately
matches a desired subset.
Implement subset decodes in SkWebpCodec.
Add a test in DM for decoding subsets.
Notice that we only start on even boundaries. This is due to the
way libwebp's API works. SkWEBPImageDecoder does not take this into
account, which results in visual artifacts.
FIXME: Subsets with scaling are not pixel identical, but close. (This
may be fine, though - they are not perceptually different. We'll just
need to mark another set of images in gold as valid, once
https://skbug.com/4038 is fixed, so we can tests scaled webp without
generating new images on each run.)
Review URL: https://codereview.chromium.org/1240143002
Visual Studio 2015 has additional warnings around noexcept and
disabling exceptions, which can be worked around with the
(undocumented) _HAS_EXCEPTIONS macro.
Visual Studio 2013 and 2015 have roundf in math.h, so use it to
avoid extra work and casts.
We avoid using cmath, as it undefs isfinite on gcc, but Visual Studio
2015 no longer provides overloads of copysign from math.h (which is
actually correct). As a result, use copysignf (which is available in
math.h in 2013 and 2015) directly.
Review URL: https://codereview.chromium.org/1244173005
This deduplicates a few pieces of code:
- we end up with one copy of each xfer32() driver loop instead of one per xfermode;
- we end up with two* copies of each xfermode implementation instead of ten**.
* For a given Mode: Mode() itself and xfer_aa<Mode>().
** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
Performance is affected in proportion to the original xfermode speed:
fast modes like Plus take the largest proportional hit, and slow modes
like HardLight or SoftLight see essentially no hit at all.
This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
TBR=reed@google.com
No public API changes.
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49da
CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon-Trybot
Review URL: https://codereview.chromium.org/1242743004
Reason for revert:
http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Arm7-Debug-Android_NoNeon/builds/1168/steps/build%20most/logs/stdio
Original issue's description:
> De-templatize Sk4pxXfermode code a bit.
>
> This deduplicates a few pieces of code:
> - we end up with one copy of each xfer32() driver loop instead of one per xfermode;
> - we end up with two* copies of each xfermode implementation instead of ten**.
>
> * For a given Mode: Mode() itself and xfer_aa<Mode>().
> ** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
>
> This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
>
> If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
>
> Performance is affected in proportion to the original xfermode speed:
> fast modes like Plus take the largest proportional hit, and slow modes
> like HardLight or SoftLight see essentially no hit at all.
>
> This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
>
> For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
>
> TBR=reed@google.com
> No public API changes.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/e617e1525916d7ee684142728c0905828caf49daTBR=msarett@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:
Review URL: https://codereview.chromium.org/1245273005
This deduplicates a few pieces of code:
- we end up with one copy of each xfer32() driver loop instead of one per xfermode;
- we end up with two* copies of each xfermode implementation instead of ten**.
* For a given Mode: Mode() itself and xfer_aa<Mode>().
** From unrolling: twice at a stride of 8, once at 4, once at 2, and once at 1, then all again for when we have AA.
This decreases the size of SkXfermode.o from 1.5M to 620K on x86-64 and from 1.3M to 680K on ARMv7+NEON.
If we wanted to, we could eliminate the xfer_aa<Mode>() copy by tagging each Mode() function as __attribute__((noinline)) or its equivalent. This would result in another ~100K space savings.
Performance is affected in proportion to the original xfermode speed:
fast modes like Plus take the largest proportional hit, and slow modes
like HardLight or SoftLight see essentially no hit at all.
This adds SK_VECTORCALL to help keep this code fast on ARMv7 and Windows. I've looked at the ARMv7 generated code... it looks good, even pretty.
For compatibility with SK_VECTORCALL, we now pass the vector-sized arguments by value instead of by reference. Some refactoring now allows us to declare each mode as just a static function instead of a struct, which simplifies things.
TBR=reed@google.com
No public API changes.
BUG=skia:
Review URL: https://codereview.chromium.org/1242743004
This change is motivated by a recent switch in how chromium handles
<video> color spaces, making rec709 more commonly used. This will
allow video -> canvas copies to take the fast GPU path when we're using
709, just as we do with 601 and jpeg.
Chromium-side change: https://codereview.chromium.org/1236313002
Review URL: https://codereview.chromium.org/1241723005
Motivation:
- perf win for clients that overwrite the surface after a snapshot.
- may allow us to eliminate SkDeferredCanvas, as this was its primary advantage.
BUG=skia:
Review URL: https://codereview.chromium.org/1236023004
The new test is disabled by default, as it's quite slow.
We can run it if we suspect problems by passing -x to DM.
This test would have been failing before the bug fix, and now is passing.
Assuming the Priv on the end means it's not considered public API...
TBR=reed@google.com
BUG=skia:4052
Review URL: https://codereview.chromium.org/1228333003
and are treated as convex when they are not.
Allow the SkPath::Iter to leave degenerate path
segments unmolested by passing an additional exact
bool to next().
Treat any non-zero length as significant in addPt().
R=reed@google.com,robertphillips@google.com
BUG=493450
Review URL: https://codereview.chromium.org/1228383002
Implement support for path rendering in Chromium through
CHROMIUM_path_rendering pseudo extension.
The extension defines a new pseudo-gl function,
BindFragmentInputLocation. This behaves similarly to the
BindUniformLocation pseudo-gl function. The idea is to assign fragment
input location to a fragment input before linking the program.
BUG=chromium:344330
Committed: https://skia.googlesource.com/skia/+/eeef46d181f9f8db388ecea81df699fc1b3c9280
Review URL: https://codereview.chromium.org/1192663002
(and a couple presubmit fixes)
This allows us to turn back on -Werror for LLVM coverage builds,
and more generally supports building with Clang 3.7.
No public API changes.
TBR=reed@google.com
BUG=skia:
Review URL: https://codereview.chromium.org/1232463006
Make getScanlineDecoder return a new object each time, which is
owned by the caller, and independent from any existing scanline
decoders and the SkCodec itself.
Since the SkCodec already contains the entire state machine, and it
is used by the scanline decoders, simply create a new SkCodec which
is now owned by the scanline decoder.
Move code that cleans up after using a scanline decoder into its
destructor
One side effect is that creating the first scanline decoder requires
a duplication of the stream and re-reading the header. (With some
more complexity/changes, we could pass the state machine to the
scanline decoder and make the SkCodec recreate its own state machine
instead.) The typical client of the scanline decoder (region decoder)
uses an SkMemoryStream, so the duplication is cheap, although we
should consider the extra time to reread the header/recreate the state
machine. (If/when we use the scanline decoder for other purposes,
where the stream may not be cheaply duplicated, we should consider
passing the state machine.)
One (intended) result of this change is that a client can create a
new scanline decoder in a new thread, and decode different pieces of
the image simultaneously.
In SkPngCodec::decodePalette, use fBitDepth rather than a parameter.
Review URL: https://codereview.chromium.org/1230033004
The error log is as follows:
../../third_party/skia/include/core/SkSpinlock.h:24: error: undefined reference to 'SkPODSpinlock::contendedAcquire()'
collect2: error: ld returned 1 exit status
[mtklein added below here]
Despite the presence in include/ and the added SK_API, this file is not part of Skia's public API... it's just used by files which are.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1229003004
The proportion of time spent doing useful work is well over 99% in acquire(),
so outlining it doesn't hurt speed at all, and makes it much easier to pick out
on a profile.
It'd be about 50/50 work/overhead if we outlined the extremely-cheap release().
I also tried outlining some SkRefCnt methods with similar mixed results.
BUG=skia:
No public API changes.
TBR=reed@google.com
Review URL: https://codereview.chromium.org/1212253013
Follow up to the split between SkImageGenerator and SkCodec. Now that
SkCodec does not inherit from SkImageGenerator, SkImageGenerator no
longer needs Options or Result, which were added for SkCodec. Remove
them, but keep them behind a flag, since Chromium has its own
subclasses of SkImageGenerator which assume the old signature for
onGetPixels.
Review URL: https://codereview.chromium.org/1226023003
SkImageGenerator makes some assumptions that are not necessarily valid
for SkCodec. For example, SkCodec does not assume that it can always be
rewound.
We also have an ongoing question of what an SkCodec should report as
its default settings (i.e. the return from getInfo). It makes sense for
an SkCodec to report that its pixels are unpremultiplied, if that is
the case for the underlying data, but if a client of SkImageGenerator
uses the default settings (as many do), they will receive
unpremultiplied pixels which cannot (currently) be drawn with Skia. We
may ultimately decide to revisit SkCodec reporting an SkImageInfo, but
I have left it unchanged for now.
Import features of SkImageGenerator used by SkCodec into SkCodec.
I have left SkImageGenerator unchanged for now, but it no longer needs
Result or Options. This will require changes to Chromium.
Manually handle the lifetime of fScanlineDecoder, so SkScanlineDecoder.h
can include SkCodec.h (where Result is), and SkCodec.h does not need
to include it (to delete fScanlineDecoder).
In many places, make the following simple changes:
- Now include SkScanlineDecoder.h, which is no longer included by
SkCodec.h
- Use the enums in SkCodec, rather than SkImageGenerator
- Stop including SkImageGenerator.h where no longer needed
Review URL: https://codereview.chromium.org/1220733013
For some users of SkPictureRecorder, the cull rect is more efficiently
determined while drawing is in progress, rather than when recording starts.
The existing API requires the cull rect at start time, even though the
information is not used for any culling purpose until the end of recording.
This patch provides a means to reset the cull rect when recording ends,
allowing users to update the rect based on information learned during
drawing and for the new rect to be used as the culling bound. A valid
bound is still required on the beginRecording call because
it sizes the underlying canvas and sets the aspect ratio for any bounding
box hierarchy. The bounding box factory can also be specified and parameters
that control SkPicture creation.
R=mtklein, reed1
BUG=skia:3919
Review URL: https://codereview.chromium.org/1178673007
Reason for revert:
DEPS roll failing
Original issue's description:
> Implement support for CHROMIUM_path_rendering pseudo extension
>
> Implement support for path rendering in Chromium through
> CHROMIUM_path_rendering pseudo extension.
>
> The extension defines a new pseudo-gl function,
> BindFragmentInputLocation. This behaves similarly to the
> BindUniformLocation pseudo-gl function. The idea is to assign fragment
> input location to a fragment input before linking the program.
>
> BUG=chromium:344330
>
> Committed: https://skia.googlesource.com/skia/+/eeef46d181f9f8db388ecea81df699fc1b3c9280TBR=bsalomon@google.com,joshualitt@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=chromium:344330
Review URL: https://codereview.chromium.org/1223673002
Implement support for path rendering in Chromium through
CHROMIUM_path_rendering pseudo extension.
The extension defines a new pseudo-gl function,
BindFragmentInputLocation. This behaves similarly to the
BindUniformLocation pseudo-gl function. The idea is to assign fragment
input location to a fragment input before linking the program.
BUG=chromium:344330
Review URL: https://codereview.chromium.org/1192663002
Some of this is transitive, like SkRecords.h used by SkMiniRecorder.h
used by (public) SkPictureRecorder.h.
BUG=skia:
Review URL: https://codereview.chromium.org/1217293004
This makes nanobench picture recording benchmarks somewhat useful again,
as opposed to all taking about 5us to run no matter the content.
ATTN Sheriff: this will probably trigger perf.skia.org alerts.
BUG=skia:
Review URL: https://codereview.chromium.org/1219873002
the bottom of the image. In our code before this patch,
this would result in cleanup code in onFinish() never being
called.
We can allow subclasses to take ownership of the
SkScanlineDecoder in order to make sure that it is
finished/deleted before deleting the decode manager.
BUG=skia:
Review URL: https://codereview.chromium.org/1212593003
Fixed-function NVPR codepaths were removed a while ago. Only NVPR API
version 1.3 (PathFragmentInputGen) was left working. Remove
backwards-compatibility code that was left behind.
Remove some NVPR API function typedefs that were left from initial
commits.
Remove PathCoords function pointer from GrGLInterface, it has
never been called and causes problems in the future, since it will
not be implemented in the Chromium pseudo extension.
Avoid failing interface creation even if nvprmsaaXX config is
requested but the driver is not recent enough. The SAN bots have
old driver, but try to run nvprmsaa16 configs. Instead, print
out a warning.
Committed: https://skia.googlesource.com/skia/+/fb8d6884e0e01d0c2f8596adf5af1efb0d08de7e
Committed: https://skia.googlesource.com/skia/+/e35b5d99d8dfcc6b2be844df28cba47436380809
Review URL: https://codereview.chromium.org/1177243004
This patch ensures that when inverting a SkMatrix44, we handle small
floats properly. When inverted these can cause infinite values, but
still evaluate to true in an if condition.
BUG=chromium:498516
Review URL: https://codereview.chromium.org/1209763002
Reason for revert:
Breaks the Ubuntu *SAN bots.
Original issue's description:
> Cleanup legacy NVPR-related definitions
>
> Fixed-function NVPR codepaths were removed a while ago. Only NVPR API
> version 1.3 (PathFragmentInputGen) was left working. Remove
> backwards-compatibility code that was left behind.
>
> Remove some NVPR API function typedefs that were left from initial
> commits.
>
> Remove PathCoords function pointer from GrGLInterface, it has
> never been called and causes problems in the future, since it will
> not be implemented in the Chromium pseudo extension.
>
> Committed: https://skia.googlesource.com/skia/+/fb8d6884e0e01d0c2f8596adf5af1efb0d08de7e
>
> Committed: https://skia.googlesource.com/skia/+/e35b5d99d8dfcc6b2be844df28cba47436380809TBR=joshualitt@google.com,cdalton@nvidia.com,bsalomon@google.com,kkinnunen@nvidia.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
Review URL: https://codereview.chromium.org/1219663005
Fixed-function NVPR codepaths were removed a while ago. Only NVPR API
version 1.3 (PathFragmentInputGen) was left working. Remove
backwards-compatibility code that was left behind.
Remove some NVPR API function typedefs that were left from initial
commits.
Remove PathCoords function pointer from GrGLInterface, it has
never been called and causes problems in the future, since it will
not be implemented in the Chromium pseudo extension.
Committed: https://skia.googlesource.com/skia/+/fb8d6884e0e01d0c2f8596adf5af1efb0d08de7e
Review URL: https://codereview.chromium.org/1177243004
this also exposes nine-patch drawing directly to devices, and creates a shared iterator for unrolling a nine-patch into single rect->rect draws.
BUG=skia:
Review URL: https://codereview.chromium.org/1211583003
Fixed-function NVPR codepaths were removed a while ago. Only NVPR API
version 1.3 (PathFragmentInputGen) was left working. Remove
backwards-compatibility code that was left behind.
Remove some NVPR API function typedefs that were left from initial
commits.
Remove PathCoords function pointer from GrGLInterface, it has
never been called and causes problems in the future, since it will
not be implemented in the Chromium pseudo extension.
Review URL: https://codereview.chromium.org/1177243004
Improves the GPU measuring accuracy of nanobench by using fence syncs.
Fence syncs are very widely supported and available on almost every
platform.
NO_MERGE_BUILDS
BUG=skia:
Review URL: https://codereview.chromium.org/1194783003