Commit Graph

53 Commits

Author SHA1 Message Date
Chris Dalton
89c5e8878e Implement Sk2f::floor
Bug: skia:
Change-Id: Id40e7165a338d321df71a1852b48eb2570ecd75b
Reviewed-on: https://skia-review.googlesource.com/133460
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
2018-06-08 18:41:13 +00:00
Chris Dalton
9f2dab0fdd ccpr: Implement conics
TBR=egdaniel@google.com

Bug: skia:
Change-Id: Idf7811dc285961db52db41c9ff145afda40c274d
Reviewed-on: https://skia-review.googlesource.com/122127
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2018-04-18 20:43:54 +00:00
Chris Dalton
e7fbafe1da Revert "ccpr: Implement conics"
This reverts commit 98b241573e.

Reason for revert: TSAN not happy

Original change's description:
> ccpr: Implement conics
> 
> Bug: skia:
> Change-Id: I4bae8b059072af987abb7b2d9c57fe08f783d680
> Reviewed-on: https://skia-review.googlesource.com/120040
> Commit-Queue: Chris Dalton <csmartdalton@google.com>
> Reviewed-by: Greg Daniel <egdaniel@google.com>

TBR=egdaniel@google.com,bsalomon@google.com,csmartdalton@google.com

Change-Id: Ic29bf660f042c20b7e4492b03400412e378dbb8a
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/121717
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2018-04-16 22:45:43 +00:00
Chris Dalton
98b241573e ccpr: Implement conics
Bug: skia:
Change-Id: I4bae8b059072af987abb7b2d9c57fe08f783d680
Reviewed-on: https://skia-review.googlesource.com/120040
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Greg Daniel <egdaniel@google.com>
2018-04-16 21:57:54 +00:00
Chris Dalton
21f6437764 Implement Sk2f Load2
Bug: skia:
Change-Id: I7d37a76bcb9df9c5a1c22eb1b0277387816df7bb
Reviewed-on: https://skia-review.googlesource.com/120602
Reviewed-by: Mike Klein <mtklein@google.com>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2018-04-12 14:49:00 +00:00
Chris Dalton
e3fda9310a Implement Sk4f min/max
Bug: skia:
Change-Id: Icf235dea81e9f125c1c8590ec87cb3591393036c
Reviewed-on: https://skia-review.googlesource.com/120281
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
2018-04-12 14:29:00 +00:00
Chris Dalton
42f02aa4e3 Implement Sk2f::Store2
Bug: skia:
Change-Id: Ifea5957458e5547ee428809d9599286e70f3f8f9
Reviewed-on: https://skia-review.googlesource.com/119860
Commit-Queue: Mike Klein <mtklein@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
2018-04-09 11:43:04 +00:00
Mike Klein
68ff92f78a specialize arm64 allTrue()/anyTrue()
aarch64 added vector-wise add/mul/min/max instructions.
We can use min and max to implement allTrue() and anyTrue(),
respectively.

(This CL is mostly so I don't forget these intrinsics exist.)

In assembly, these actually compile to two instructions,
the folding operation into a vector register, then a move
from the vector register to a general purpose register.

Change-Id: Ia6a999ac250740de765e871094e911979a8711c7
Reviewed-on: https://skia-review.googlesource.com/116482
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
2018-03-26 18:48:23 +00:00
Chris Dalton
ae18d07a7d Revert "Implement Sk2f::Store2"
This reverts commit 8a8a8e9dd5.

Reason for revert: Needs non-SIMD impl

Original change's description:
> Implement Sk2f::Store2
> 
> Bug: skia:
> Change-Id: Ieedd05ced376a7604936e9d2729fc20a8669496e
> Reviewed-on: https://skia-review.googlesource.com/115531
> Commit-Queue: Chris Dalton <csmartdalton@google.com>
> Reviewed-by: Mike Klein <mtklein@google.com>

TBR=mtklein@google.com,csmartdalton@google.com

Change-Id: I8dfbd87c5871b041a4fc6ef3816f121c72083a20
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: skia:
Reviewed-on: https://skia-review.googlesource.com/116240
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2018-03-23 21:18:01 +00:00
Chris Dalton
8a8a8e9dd5 Implement Sk2f::Store2
Bug: skia:
Change-Id: Ieedd05ced376a7604936e9d2729fc20a8669496e
Reviewed-on: https://skia-review.googlesource.com/115531
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Mike Klein <mtklein@google.com>
2018-03-23 19:28:30 +00:00
Chris Dalton
6f8fa4e6ca Implement Sk2f::Store4
Bug: skia:
Change-Id: I2adb983d68625d327e7c00e53b6ae4703b46252f
Reviewed-on: https://skia-review.googlesource.com/104761
Commit-Queue: Chris Dalton <csmartdalton@google.com>
Reviewed-by: Mike Klein <mtklein@chromium.org>
2018-02-07 05:06:15 +00:00
Chris Dalton
0cb75879a5 Add Store3 to Sk2f
Bug: skia:
Change-Id: I0377e6a1dd8259e944f7902a5c68af524fa588c7
Reviewed-on: https://skia-review.googlesource.com/79382
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2017-12-01 21:12:49 +00:00
Mike Klein
213d821462 add Load2() to Sk4f
and test it.

Change-Id: Ib0c2cf93c63d8d3c36a7d4d60bbec4ecede29bc7
Reviewed-on: https://skia-review.googlesource.com/78480
Reviewed-by: Chris Dalton <csmartdalton@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
2017-12-01 19:26:02 +00:00
Herb Derby
5eb1528234 Add mulHi to SkNx
Add mulHi to base SkNx, and specialize implementations for Sk4u for
neon and sse.

Add casts for converting from uint8_t by 4 to uint32_t by 4.

Cq-Include-Trybots: skia.primary:Test-Debian9-Clang-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Change-Id: I29a32e2ad9812a47fff841ceca334e562362836f
Reviewed-on: https://skia-review.googlesource.com/57960
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Herb Derby <herb@google.com>
2017-10-11 08:21:09 +00:00
Chris Dalton
7732f4f8f2 Add missing methods to neon/sse SkNx implementations
Adds negate, abs, sqrt to Sk2f and/or Sk4f.

Bug: skia:
Change-Id: I0688dae45b32ff94abcc0525ef1f09d666f9c6e9
Reviewed-on: https://skia-review.googlesource.com/39642
Reviewed-by: Mike Klein <mtklein@chromium.org>
Commit-Queue: Chris Dalton <csmartdalton@google.com>
2017-08-28 21:21:36 +00:00
Yuqian Li
7da6ba2d63 Implement Sk4i's abs, min, max
CQ_INCLUDE_TRYBOTS=skia.primary:Test-Debian9-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD
Bug: skia:
Change-Id: Ia9ec3f72095e1c744f88df7bb990d99e0f87d578
Reviewed-on: https://skia-review.googlesource.com/22720
Commit-Queue: Yuqian Li <liyuqian@google.com>
Reviewed-by: Herb Derby <herb@google.com>
2017-07-12 20:20:43 +00:00
Mike Klein
33cbfd75af Make load4 and store4 part of SkNx properly.
Every type now nominally has Load4() and Store4() methods.
The ones that we use are implemented.

BUG=skia:

GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=3046
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Change-Id: I7984f0c2063ef8acbc322bd2e968f8f7eaa0d8fd
Reviewed-on: https://skia-review.googlesource.com/3046
Reviewed-by: Matt Sarett <msarett@google.com>
Commit-Queue: Mike Klein <mtklein@chromium.org>
2016-10-06 17:13:59 +00:00
msarett
c0444615ed Support Float32 output from SkColorSpaceXform
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
  new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47
Review-Url: https://codereview.chromium.org/2339233003
2016-09-16 11:45:59 -07:00
msarett
c71a9b7f53 Revert of Support Float32 output from SkColorSpaceXform (patchset #7 id:140001 of https://codereview.chromium.org/2339233003/ )
Reason for revert:
Hitting an assert

Original issue's description:
> Support Float32 output from SkColorSpaceXform
>
> * Adds Float32 support to SkColorSpaceXform
> * Changes API to allows clients to ask for F32, updates clients to
>   new API
> * Adds Sk4f_load4 and Sk4f_store4 to SkNx
> * Make use of new xform in SkGr.cpp
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
> CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/43d6651111374b5d1e4ddd9030dcf079b448ec47

TBR=brianosman@google.com,mtklein@google.com,scroggo@google.com,mtklein@chromium.org,bsalomon@google.com
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2347473007
2016-09-16 11:01:27 -07:00
msarett
43d6651111 Support Float32 output from SkColorSpaceXform
* Adds Float32 support to SkColorSpaceXform
* Changes API to allows clients to ask for F32, updates clients to
  new API
* Adds Sk4f_load4 and Sk4f_store4 to SkNx
* Make use of new xform in SkGr.cpp

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2339233003
CQ_INCLUDE_TRYBOTS=master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2339233003
2016-09-16 09:51:12 -07:00
mtklein
58e389b051 Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot

Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Review-Url: https://codereview.chromium.org/2145663003
2016-07-15 07:00:11 -07:00
mtklein
64bbad360f Revert of Expand _01 half<->float limitation to _finite. Simplify. (patchset #7 id:120001 of https://codereview.chromium.org/2145663003/ )
Reason for revert:
Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast

Original issue's description:
> Expand _01 half<->float limitation to _finite.  Simplify.
>
>     It's become clear we need to sometimes deal with values <0 or >1.
>     I'm not yet convinced we care about NaN or +-inf.
>
>     We had some fairly clever tricks and optimizations here for NEON
>     and SSE.  I've thrown them out in favor of a single implementation.
>     If we find the specializations mattered, we can certainly figure out
>     how to extend them to this new range/domain.
>
>     This happens to add a vectorized float -> half for ARMv7, which was
>     missing from the _01 version.  (The SSE strategy was not portable to
>     platforms that flush denorm floats to zero.)
>
>     I've tested the full float range for FloatToHalf on my desktop and a 5x.
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
> CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90

TBR=msarett@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review-Url: https://codereview.chromium.org/2151023003
2016-07-14 12:03:04 -07:00
mtklein
3296bee70d Expand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
    I'm not yet convinced we care about NaN or +-inf.

    We had some fairly clever tricks and optimizations here for NEON
    and SSE.  I've thrown them out in favor of a single implementation.
    If we find the specializations mattered, we can certainly figure out
    how to extend them to this new range/domain.

    This happens to add a vectorized float -> half for ARMv7, which was
    missing from the _01 version.  (The SSE strategy was not portable to
    platforms that flush denorm floats to zero.)

    I've tested the full float range for FloatToHalf on my desktop and a 5x.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review-Url: https://codereview.chromium.org/2145663003
2016-07-14 11:02:09 -07:00
mtklein
f8f90e4a85 SkNx refresh
- rearrange a bit
   - fewer macros
   - hooks for all operators
   - add left and right scalar operator overrides
   - add +=, &=, <<=, etc.
   - add SkNx_split() and SkNx_join()
   - simplify the many rsqrt() and invert() options to just what we actually use

This refactoring pointed out that our float <-> int NEON conversions are not specialized, so I've implemented them.  It seems nice that this is an error rather than silently falling back to serial code.

It's unclear to me if split/join want to be external, static methods, or non-static methods (SkNx_join(), Sk4f::Join(), x.join()).  Time will tell?

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1812233003
CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot;client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1812233003
2016-03-21 10:04:46 -07:00
mtklein
7c249e5319 SkNx: kth<...>() -> [...]
Just some syntax cleanup.  No real change: kth<...>() was calling [...] already.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1714363002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1714363002
2016-02-21 10:54:19 -08:00
mtklein
0cf795fd11 fast sk4f <-> sk4i SSE methods
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1707673002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1707673002
2016-02-17 07:23:36 -08:00
mtklein
126626e523 Sk4f: add floor()
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus7-GPU-Tegra3-Arm7-Release-Trybot,Test-Android-GCC-Nexus9-GPU-TegraK1-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1685773002
2016-02-09 15:41:36 -08:00
mtklein
c023210893 Revert of Sk4f: add floor() (patchset #3 id:40001 of https://codereview.chromium.org/1685773002/ )
Reason for revert:
build break must be this, right?

Original issue's description:
> Sk4f: add floor()
>
> BUG=skia:
> GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
> CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/86c6c4935171a1d2d6a9ffbff37ec6dac1326614

TBR=herb@google.com,mtklein@chromium.org
# Skipping CQ checks because original CL landed less than 1 days ago.
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1679343004
2016-02-09 15:10:56 -08:00
mtklein
86c6c49351 Sk4f: add floor()
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685773002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1685773002
2016-02-09 13:46:49 -08:00
mtklein
e4c0beed74 sknx refactoring
- trim unused specializations (Sk4i, Sk2d) and apis (SkNx_dup)
  - expand apis a little
    * v[0] == v.kth<0>()
    * SkNx_shuffle can now convert to different-sized vectors, e.g. Sk2f <-> Sk4f
  - remove anonymous namespace

I believe it's safe to remove the anonymous namespace right now.
We're worried about violating the One Definition Rule; the anonymous namespace protected us from that.

In Release builds, this is mostly moot, as everything tends to inline completely.
In Debug builds, violating the ODR is at worst an inconvenience, time spent trying to figure out why the bot is broken.

Now that we're building with SSE2/NEON everywhere, very few bots have even a chance about getting confused by two definitions of the same type or function.  Where we do compile variants depending on, e.g., SSSE3, we do so in static inline functions.  These are not subject to the ODR.

I plan to follow up with a tedious .kth<...>() -> [...] auto-replace.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1683543002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1683543002
2016-02-09 10:35:28 -08:00
mtklein
629f25a7ef fix float <---> uint16_t conversion, with Mike's tests.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1679713002

patch from issue 1679713002 at patchset 1 (http://crrev.com/1679713002#ps1)
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1676893002
2016-02-08 05:54:38 -08:00
mtklein
c33065a93a add SkNx::abs(), for now only implemented for Sk4f
There's no reason we couldn't implement this for all ints and floats;
just don't want to land unused code.

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1590843003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1590843003
2016-01-15 12:16:40 -08:00
mtklein
6f37b4a475 Unify some SkNx code
- one base case and one N=1 case instead of two each (or three with doubles)
 - use SkNx_cast instead of FromBytes/toBytes
 - 4-at-a-time Sk4f::ToBytes becomes a special standalone Sk4f_ToBytes

If I did everything right, this'll be perf- and pixel- neutral.

https://gold.skia.org/search2?issue=1526523003&unt=true&query=source_type%3Dgm&master=false

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1526523003
2015-12-14 11:25:18 -08:00
mtklein
6c221b4068 Add SkNx_cast().
SkNx_cast() can cast between any of our vector types,
provided they have the same number of elements.

Any types should work with the default implementation,
and we can drop in specializations as needed, like the
SSE and NEON Sk4f -> Sk4i I included here as an example.

To make this work, I made some internal name changes:
    SkNi<N,T> -> SkNx<N, T>
    SkNf<N>   -> SkNx<N, float>
User aliases (Sk4f, Sk16b, etc.) stay the same.
We can land this first (it's PS1) if that makes things easier.

BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1464623002
2015-11-20 13:53:19 -08:00
mtklein
6f797092d2 prune unused SkNx features
- remove float -> int conversion, keeping float -> byte
  - remove support for doubles

I was thinking of specializing Sk8f for AVX.  This will help keep the complexity down.

This may cause minor diffs in radial gradients: toBytes() rounds where castTrunc() truncated.  But I don't see any diffs in Gold.
https://gold.skia.org/search2?issue=1411563008&unt=true&query=source_type%3Dgm&master=false

BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot

Review URL: https://codereview.chromium.org/1411563008
2015-11-09 08:33:53 -08:00
mtklein
a508f3c62d Require Sk4f::toBytes() clamps
BUG=skia:4117

CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.android:Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Release-Trybot

Review URL: https://codereview.chromium.org/1312053004
2015-09-01 06:29:45 -07:00
mtklein
4be181e304 3-15% speedup to HardLight / Overlay xfermodes.
While investigating my bug (skia:4052) I saw this TODO and figured
it'd make me feel better about an otherwise unsuccessful investigation.

This speeds up HardLight and Overlay (same code) by about 15% with SSE, mostly
by rewriting the logic from 1 cheap comparison and 2 expensive div255() calls
to 2 cheap comparisons and 1 expensive div255().

NEON speeds up by a more modest ~3%.

BUG=skia:

Review URL: https://codereview.chromium.org/1230663005
2015-07-14 10:54:19 -07:00
mtklein
e20633ed26 Add a GM that reproduces layout test failures with my new xfermode code.
Inspired by https://storage.googleapis.com/chromium-layout-test-archives/linux_blink_rel/69169/layout-test-results/results.html

I think the root cause is overflow.

Also, adds tests for Sk16b::operator<().  It wasn't wrong, but it was suspect
(used in all three of these xfermode implementations) and so it's best to have
tests.

BUG=skia:

Review URL: https://codereview.chromium.org/1228393006
2015-07-13 12:06:33 -07:00
mtklein
059ac00446 Update some Sk4px APIs.
Mostly this is about ergonomics, making it easier to do good operations and hard / impossible to do bad ones.

- SkAlpha / SkPMColor constructors become static factories.
- Remove div255TruncNarrow(), rename div255RoundNarrow() to div255().  In practice we always want to round, and the narrowing to 8-bit is contextually obvious.
- Rename fastMulDiv255Round() approxMulDiv255() to stress it's approximate-ness over its speed.  Drop Round for the same reason as above... we should always round.
- Add operator overloads so we don't have to keep throwing in seemingly-random Sk4px() or Sk4px::Wide() casts.
- use operator*() for 8-bit x 8-bit -> 16-bit math.  It's always what we want, and there's generally no 8x8->8 alternative.
- MapFoo can take a const Func&.  Don't think it makes a big difference, but nice to do.

BUG=skia:

Review URL: https://codereview.chromium.org/1202013002
2015-06-22 10:39:38 -07:00
mtklein
3747875341 Thorough tests for saturatedAdd and mulDiv255Round.
BUG=skia:3951

Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Release-Trybot

Review URL: https://codereview.chromium.org/1184113003
2015-06-15 10:58:43 -07:00
mtklein
b1bbca8049 Revert of Thorough tests for saturatedAdd and mulDiv255Round. (patchset #1 id:1 of https://codereview.chromium.org/1184113003/)
Reason for revert:
https://code.google.com/p/skia/issues/detail?id=3951

Original issue's description:
> Thorough tests for saturatedAdd and mulDiv255Round.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/ce9d11189a5924b47c3629063b72bae9d466c2c7

TBR=fmalita@chromium.org,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1177123004
2015-06-15 10:03:41 -07:00
mtklein
ce9d11189a Thorough tests for saturatedAdd and mulDiv255Round.
BUG=skia:

Review URL: https://codereview.chromium.org/1184113003
2015-06-15 08:58:22 -07:00
mtklein
f2fe0e0320 Remove overly-promiscuous SkNx syntax sugar.
I haven't figured out a pithy way to have these apply to only classes
originating from SkNx, so let's just remove them.  There aren't too
many use cases, and it's not really any less readable without them.

Semantically, this is a no-op.

BUG=skia:

Review URL: https://codereview.chromium.org/1167153002
2015-06-10 08:57:28 -07:00
mtklein
27e517ae53 add Min to SkNi, specialized for u8 and u16 on SSE and NEON
0x8001 / 0x7fff don't seem to work, but we were close: 0x8000 does.

I plan to use this to implement the Difference xfermode,
and it seems generally handy.

BUG=skia:

Review URL: https://codereview.chromium.org/1133933004
2015-05-14 17:53:04 -07:00
mtklein
d7c014ff03 Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM
This is a logical no-op.  Everything was using the equivalent of rsqrt1() before, and is now after.

BUG=skia:

Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot

Review URL: https://codereview.chromium.org/1109913002
2015-04-27 14:22:32 -07:00
mtklein
9a22f489e8 Revert of Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM (patchset #2 id:20001 of https://codereview.chromium.org/1109913002/)
Reason for revert:
arm64 typos

Original issue's description:
> Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM
>
> This is a logical no-op.  Everything was using the equivalent of rsqrt1() before, and is now after.
>
> BUG=skia:
>
> Committed: https://skia.googlesource.com/skia/+/9de16283fdc8cc0d31a84f503578d0ecea4e8297

TBR=reed@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1105233003
2015-04-27 13:55:53 -07:00
mtklein
9de16283fd Split rsqrt into rsqrt{0,1,2}, with increasing cost and precision on ARM
This is a logical no-op.  Everything was using the equivalent of rsqrt1() before, and is now after.

BUG=skia:

Review URL: https://codereview.chromium.org/1109913002
2015-04-27 13:51:28 -07:00
mtklein
1113da72ec Mike's radial gradient CL with better float -> int.
patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)

This looks quite launchable.  radial_gradient3, min of 100 samples:
  N5:  985µs -> 946µs
  MBP: 395µs -> 279µs

On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time.  Is that something we could do in float math rather than with a lookup table?

BUG=skia:

CQ_EXTRA_TRYBOTS=client.skia.compile:Build-Mac10.8-Clang-Arm7-Debug-Android-Trybot,Build-Ubuntu-GCC-Arm7-Release-Android_NoNeon-Trybot

Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec

Review URL: https://codereview.chromium.org/1109643002
2015-04-27 12:08:01 -07:00
mtklein
8d3e9dff3f Revert of Mike's radial gradient CL with better float -> int. (patchset #7 id:120001 of https://codereview.chromium.org/1109643002/)
Reason for revert:
compile failures.

Original issue's description:
> Mike's radial gradient CL with better float -> int.
>
> patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)
>
> This looks quite launchable.  radial_gradient3, min of 100 samples:
>   N5:  985µs -> 946µs
>   MBP: 395µs -> 279µs
>
> On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time.  Is that something we could do in float math rather than with a lookup table?
>
> BUG=skia:
>
> CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot
>
> Committed: https://skia.googlesource.com/skia/+/abf6c5cf95e921fae59efb487480e5b5081cf0ec

TBR=reed@google.com,robertphillips@google.com,mtklein@chromium.org
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=skia:

Review URL: https://codereview.chromium.org/1109883003
2015-04-27 11:21:16 -07:00
mtklein
abf6c5cf95 Mike's radial gradient CL with better float -> int.
patch from issue 1072303005 at patchset 40001 (http://crrev.com/1072303005#ps40001)

This looks quite launchable.  radial_gradient3, min of 100 samples:
  N5:  985µs -> 946µs
  MBP: 395µs -> 279µs

On my MBP, most of the meat looks like it's now in reading the cache and writing to dst one color at a time.  Is that something we could do in float math rather than with a lookup table?

BUG=skia:

CQ_EXTRA_TRYBOTS=client.skia.android:Test-Android-GCC-Nexus5-CPU-NEON-Arm7-Debug-Trybot,Test-Android-GCC-Nexus9-CPU-Denver-Arm64-Debug-Trybot

Review URL: https://codereview.chromium.org/1109643002
2015-04-27 11:13:53 -07:00