Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.
This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.
This design leaves us room to grow up, e.g. to SkNf<8, SkScalar> == Sk8s, and to grow down too, to things like SkNi<8, uint16_t> == Sk8h.
To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.
You shouldn't feel bad about using any of the typedefs Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good
No public API changes.
TBR=reed@google.com
BUG=skia:3592
Review URL: https://codereview.chromium.org/1048593002
2015-03-30 17:50:27 +00:00
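The specialization scheme described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: VecNf, Vec2f, Vec8f and IsSpecialized() are hypothetical names invented here, not Skia's classes, and the "specialized" version stores two plain floats where a real port would presumably hold a SIMD register.

```cpp
#include <cassert>

// Hypothetical generic fallback: N lanes in a plain array, one lane at a time.
// Any N works without further code, which is the "Just Work" case.
template <int N, typename T>
class VecNf {
public:
    VecNf() {}
    explicit VecNf(T val) { for (int i = 0; i < N; i++) fVals[i] = val; }
    static VecNf Load(const T vals[N]) {
        VecNf v;
        for (int i = 0; i < N; i++) v.fVals[i] = vals[i];
        return v;
    }
    void store(T vals[N]) const { for (int i = 0; i < N; i++) vals[i] = fVals[i]; }
    VecNf operator + (const VecNf& o) const {
        VecNf r;
        for (int i = 0; i < N; i++) r.fVals[i] = fVals[i] + o.fVals[i];
        return r;
    }
    static bool IsSpecialized() { return false; }
private:
    T fVals[N];
};

// Hypothetical full specialization for <2, float>. It exposes the same
// interface, so callers do not need to know which version they got.
template <>
class VecNf<2, float> {
public:
    VecNf() {}
    explicit VecNf(float val) : fA(val), fB(val) {}
    static VecNf Load(const float vals[2]) {
        VecNf v;
        v.fA = vals[0];
        v.fB = vals[1];
        return v;
    }
    void store(float vals[2]) const { vals[0] = fA; vals[1] = fB; }
    VecNf operator + (const VecNf& o) const {
        VecNf r;
        r.fA = fA + o.fA;
        r.fB = fB + o.fB;
        return r;
    }
    static bool IsSpecialized() { return true; }
private:
    float fA, fB;
};

typedef VecNf<2, float> Vec2f;  // picks the specialization
typedef VecNf<8, float> Vec8f;  // picks the generic fallback
```

Callers only see the typedefs, so dropping in a faster specialization later should not require changes at call sites.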
/*
 * Copyright 2015 Google Inc.
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */

#ifndef SkNx_neon_DEFINED
#define SkNx_neon_DEFINED

#include <arm_neon.h>

// Well, this is absurd. The shifts require compile-time constant arguments.

#define SHIFT8(op, v, bits) switch(bits) { \
    case 1: return op(v, 1); case 2: return op(v, 2); case 3: return op(v, 3); \
    case 4: return op(v, 4); case 5: return op(v, 5); case 6: return op(v, 6); \
    case 7: return op(v, 7); \
    } return fVec

#define SHIFT16(op, v, bits) if (bits < 8) { SHIFT8(op, v, bits); } switch(bits) { \
    case 8: return op(v, 8); case 9: return op(v, 9); \
    case 10: return op(v, 10); case 11: return op(v, 11); case 12: return op(v, 12); \
    case 13: return op(v, 13); case 14: return op(v, 14); case 15: return op(v, 15); \
    } return fVec

#define SHIFT32(op, v, bits) if (bits < 16) { SHIFT16(op, v, bits); } switch(bits) { \
    case 16: return op(v, 16); case 17: return op(v, 17); case 18: return op(v, 18); \
    case 19: return op(v, 19); case 20: return op(v, 20); case 21: return op(v, 21); \
    case 22: return op(v, 22); case 23: return op(v, 23); case 24: return op(v, 24); \
    case 25: return op(v, 25); case 26: return op(v, 26); case 27: return op(v, 27); \
    case 28: return op(v, 28); case 29: return op(v, 29); case 30: return op(v, 30); \
    case 31: return op(v, 31); } return fVec
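The dispatch trick behind the SHIFT macros can be illustrated without NEON. In this sketch shl_imm and shl_runtime are hypothetical names, and the template argument merely stands in for an intrinsic operand that must be a compile-time constant; it is not how the NEON intrinsics themselves are declared.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for an intrinsic with an immediate operand: the shift count is a
// template argument, so a runtime variable cannot be passed directly.
template <int kBits>
uint8_t shl_imm(uint8_t v) { return static_cast<uint8_t>(v << kBits); }

// Runtime-to-compile-time dispatch, the same idea as SHIFT8: each case
// calls the operation with a literal count.
uint8_t shl_runtime(uint8_t v, int bits) {
    switch (bits) {
        case 1: return shl_imm<1>(v);
        case 2: return shl_imm<2>(v);
        case 3: return shl_imm<3>(v);
        case 4: return shl_imm<4>(v);
        case 5: return shl_imm<5>(v);
        case 6: return shl_imm<6>(v);
        case 7: return shl_imm<7>(v);
    }
    // No matching case (for example bits == 0): return the value unshifted,
    // mirroring the trailing "return fVec" in the macros.
    return v;
}
```

A switch like this costs a branch at run time, but when the count is a constant at the call site the compiler can usually fold it down to the single matching case.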

template <>
class SkNb<2, 4> {
public:
    SkNb(uint32x2_t vec) : fVec(vec) {}

    SkNb() {}
    bool allTrue() const { return vget_lane_u32(fVec, 0) && vget_lane_u32(fVec, 1); }
    bool anyTrue() const { return vget_lane_u32(fVec, 0) || vget_lane_u32(fVec, 1); }

    uint32x2_t fVec;
};

template <>
class SkNb<4, 4> {
public:
    SkNb(uint32x4_t vec) : fVec(vec) {}

    SkNb() {}
    bool allTrue() const { return vgetq_lane_u32(fVec, 0) && vgetq_lane_u32(fVec, 1)
                               && vgetq_lane_u32(fVec, 2) && vgetq_lane_u32(fVec, 3); }
    bool anyTrue() const { return vgetq_lane_u32(fVec, 0) || vgetq_lane_u32(fVec, 1)
                               || vgetq_lane_u32(fVec, 2) || vgetq_lane_u32(fVec, 3); }

    uint32x4_t fVec;
};

template <>
class SkNf<2, float> {
    typedef SkNb<2, 4> Nb;
public:
    SkNf(float32x2_t vec) : fVec(vec) {}

    SkNf() {}
    explicit SkNf(float val) : fVec(vdup_n_f32(val)) {}
    static SkNf Load(const float vals[2]) { return vld1_f32(vals); }
    SkNf(float a, float b) { fVec = (float32x2_t) { a, b }; }

    void store(float vals[2]) const { vst1_f32(vals, fVec); }

    // Hardware reciprocal estimate plus one Newton-Raphson step.
    // vrecps_f32(a, b) computes 2 - a*b.
    SkNf approxInvert() const {
        float32x2_t est0 = vrecpe_f32(fVec),
                    est1 = vmul_f32(vrecps_f32(est0, fVec), est0);
        return est1;
    }
    // A second Newton-Raphson step on top of approxInvert().
    SkNf invert() const {
        float32x2_t est1 = this->approxInvert().fVec,
                    est2 = vmul_f32(vrecps_f32(est1, fVec), est1);
        return est2;
    }

    SkNf operator + (const SkNf& o) const { return vadd_f32(fVec, o.fVec); }
    SkNf operator - (const SkNf& o) const { return vsub_f32(fVec, o.fVec); }
    SkNf operator * (const SkNf& o) const { return vmul_f32(fVec, o.fVec); }
    SkNf operator / (const SkNf& o) const {
    #if defined(SK_CPU_ARM64)
        return vdiv_f32(fVec, o.fVec);
    #else
        // No vector divide here: multiply by the refined reciprocal instead.
        return vmul_f32(fVec, o.invert().fVec);
    #endif
    }

    Nb operator == (const SkNf& o) const { return vceq_f32(fVec, o.fVec); }
    Nb operator < (const SkNf& o) const { return vclt_f32(fVec, o.fVec); }
    Nb operator > (const SkNf& o) const { return vcgt_f32(fVec, o.fVec); }
    Nb operator <= (const SkNf& o) const { return vcle_f32(fVec, o.fVec); }
    Nb operator >= (const SkNf& o) const { return vcge_f32(fVec, o.fVec); }
    Nb operator != (const SkNf& o) const { return vmvn_u32(vceq_f32(fVec, o.fVec)); }

    static SkNf Min(const SkNf& l, const SkNf& r) { return vmin_f32(l.fVec, r.fVec); }
    static SkNf Max(const SkNf& l, const SkNf& r) { return vmax_f32(l.fVec, r.fVec); }

    // rsqrt0() is the raw hardware estimate; rsqrt1() and rsqrt2() add one and
    // two Newton-Raphson steps. vrsqrts_f32(a, b) computes (3 - a*b) / 2.
    SkNf rsqrt0() const { return vrsqrte_f32(fVec); }
    SkNf rsqrt1() const {
        float32x2_t est0 = this->rsqrt0().fVec;
        return vmul_f32(vrsqrts_f32(fVec, vmul_f32(est0, est0)), est0);
    }
    SkNf rsqrt2() const {
        float32x2_t est1 = this->rsqrt1().fVec;
        return vmul_f32(vrsqrts_f32(fVec, vmul_f32(est1, est1)), est1);
    }

    SkNf sqrt() const {
    #if defined(SK_CPU_ARM64)
        return vsqrt_f32(fVec);
    #else
        // sqrt(x) computed as x * (1 / sqrt(x)).
        return *this * this->rsqrt2();
    #endif
    }

    template <int k> float kth() const {
        SkASSERT(0 <= k && k < 2);
        return vget_lane_f32(fVec, k&1);
    }

    float32x2_t fVec;
};
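The refinement steps used by approxInvert(), invert() and rsqrt1()/rsqrt2() above can be modelled in scalar code. This is a sketch, assuming only the documented arithmetic of the step intrinsics (vrecps computes 2 - a*b, vrsqrts computes (3 - a*b) / 2); the function names are hypothetical, and the starting estimates in any use of it are hand-picked rather than taken from the hardware estimate instructions.

```cpp
#include <cassert>
#include <cmath>

// Scalar model of the reciprocal step: 2 - a*b.
float recip_step(float a, float b) { return 2.0f - a * b; }

// Scalar model of the reciprocal square root step: (3 - a*b) / 2.
float rsqrt_step(float a, float b) { return (3.0f - a * b) * 0.5f; }

// One Newton-Raphson refinement of est toward 1/x,
// in the same operand order as vmul(vrecps(est, x), est).
float refine_recip(float x, float est) { return recip_step(est, x) * est; }

// One Newton-Raphson refinement of est toward 1/sqrt(x),
// in the same operand order as vmul(vrsqrts(x, est*est), est).
float refine_rsqrt(float x, float est) { return rsqrt_step(x, est * est) * est; }
```

Each step roughly squares the relative error, which is presumably why the float version stops after two reciprocal steps while a double version needs more to fill its wider mantissa.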
#if defined(SK_CPU_ARM64)
template <>
class SkNb<2, 8> {
public:
    SkNb(uint64x2_t vec) : fVec(vec) {}

    SkNb() {}
    bool allTrue() const { return vgetq_lane_u64(fVec, 0) && vgetq_lane_u64(fVec, 1); }
    bool anyTrue() const { return vgetq_lane_u64(fVec, 0) || vgetq_lane_u64(fVec, 1); }

    uint64x2_t fVec;
};

template <>
class SkNf<2, double> {
    typedef SkNb<2, 8> Nb;
public:
    SkNf(float64x2_t vec) : fVec(vec) {}

    SkNf() {}
    explicit SkNf(double val) : fVec(vdupq_n_f64(val)) {}
    static SkNf Load(const double vals[2]) { return vld1q_f64(vals); }
    SkNf(double a, double b) { fVec = (float64x2_t) { a, b }; }

    void store(double vals[2]) const { vst1q_f64(vals, fVec); }

    SkNf operator + (const SkNf& o) const { return vaddq_f64(fVec, o.fVec); }
    SkNf operator - (const SkNf& o) const { return vsubq_f64(fVec, o.fVec); }
    SkNf operator * (const SkNf& o) const { return vmulq_f64(fVec, o.fVec); }
    SkNf operator / (const SkNf& o) const { return vdivq_f64(fVec, o.fVec); }

    Nb operator == (const SkNf& o) const { return vceqq_f64(fVec, o.fVec); }
    Nb operator < (const SkNf& o) const { return vcltq_f64(fVec, o.fVec); }
    Nb operator > (const SkNf& o) const { return vcgtq_f64(fVec, o.fVec); }
    Nb operator <= (const SkNf& o) const { return vcleq_f64(fVec, o.fVec); }
    Nb operator >= (const SkNf& o) const { return vcgeq_f64(fVec, o.fVec); }
    Nb operator != (const SkNf& o) const {
        return vreinterpretq_u64_u32(vmvnq_u32(vreinterpretq_u32_u64(vceqq_f64(fVec, o.fVec))));
    }
static SkNf Min(const SkNf& l, const SkNf& r) { return vminq_f64(l.fVec, r.fVec); }
|
|
|
|
static SkNf Max(const SkNf& l, const SkNf& r) { return vmaxq_f64(l.fVec, r.fVec); }
|
|
|
|
|
|
|
|
SkNf sqrt() const { return vsqrtq_f64(fVec); }
|
2015-04-27 21:22:32 +00:00
|
|
|
|
|
|
|
SkNf rsqrt0() const { return vrsqrteq_f64(fVec); }
|
|
|
|
SkNf rsqrt1() const {
|
|
|
|
float64x2_t est0 = this->rsqrt0().fVec;
|
|
|
|
return vmulq_f64(vrsqrtsq_f64(fVec, vmulq_f64(est0, est0)), est0);
|
|
|
|
}
|
|
|
|
SkNf rsqrt2() const {
|
|
|
|
float64x2_t est1 = this->rsqrt1().fVec;
|
|
|
|
return vmulq_f64(vrsqrtsq_f64(fVec, vmulq_f64(est1, est1)), est1);
|
Refactor Sk2x<T> + Sk4x<T> into SkNf<N,T> and SkNi<N,T>
The primary feature this delivers is SkNf and SkNd for arbitrary power-of-two N. Non-specialized types or types larger than 128 bits should now Just Work (and we can drop in a specialization to make them faster). Sk4s is now just a typedef for SkNf<4, SkScalar>; Sk4d is SkNf<4, double>, Sk2f SkNf<2, float>, etc.
This also makes implementing new specializations easier and more encapsulated. We're now using template specialization, which means the specialized versions don't have to leak out so much from SkNx_sse.h and SkNx_neon.h.
This design leaves us room to grow up, e.g to SkNf<8, SkScalar> == Sk8s, and to grown down too, to things like SkNi<8, uint16_t> == Sk8h.
To simplify things, I've stripped away most APIs (swizzles, casts, reinterpret_casts) that no one's using yet. I will happily add them back if they seem useful.
You shouldn't feel bad about using any of the typedef Sk4s, Sk4f, Sk4d, Sk2s, Sk2f, Sk2d, Sk4i, etc. Here's how you should feel:
- Sk4f, Sk4s, Sk2d: feel awesome
- Sk2f, Sk2s, Sk4d: feel pretty good
No public API changes.
TBR=reed@google.com
BUG=skia:3592
Review URL: https://codereview.chromium.org/1048593002
2015-03-30 17:50:27 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
    SkNf approxInvert() const {
        float64x2_t est0 = vrecpeq_f64(fVec),
                    est1 = vmulq_f64(vrecpsq_f64(est0, fVec), est0);
        return est1;
    }

    SkNf invert() const {
        float64x2_t est1 = this->approxInvert().fVec,
                    est2 = vmulq_f64(vrecpsq_f64(est1, fVec), est1),
                    est3 = vmulq_f64(vrecpsq_f64(est2, fVec), est2);
        return est3;
    }

    template <int k> double kth() const {
        SkASSERT(0 <= k && k < 2);
        return vgetq_lane_f64(fVec, k&1);
    }

    float64x2_t fVec;
};
#endif//defined(SK_CPU_ARM64)
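// The approxInvert() and invert() methods above refine a hardware reciprocal
// estimate with Newton-Raphson steps. The block below is a minimal scalar
// sketch of that idea, not the header's implementation: recip_step and
// refine_reciprocal are hypothetical names, and recip_step is only assumed to
// model what vrecps computes per lane (2 - a*b).

```cpp
#include <cassert>
#include <cmath>

// Scalar stand-in for one lane of vrecps: the correction factor 2 - est*x.
static double recip_step(double est, double x) { return 2.0 - est * x; }

// Hypothetical helper: refine a starting estimate of 1/x.
// Each step computes est = est * (2 - x*est), which roughly squares the
// relative error, so a coarse estimate converges in a handful of steps.
static double refine_reciprocal(double x, double est0, int steps) {
    assert(steps >= 0);
    double est = est0;
    for (int i = 0; i < steps; i++) {
        est = recip_step(est, x) * est;
    }
    return est;
}
```

// With x = 3 and a starting estimate of 0.3 (10% relative error), one step
// leaves about 1% error and three steps leave about 1e-8, which is why the
// double-precision invert() above takes more steps than approxInvert().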

template <>
class SkNi<4, int> {
public:
    SkNi(const int32x4_t& vec) : fVec(vec) {}

    SkNi() {}
    explicit SkNi(int val) : fVec(vdupq_n_s32(val)) {}
    static SkNi Load(const int vals[4]) { return vld1q_s32(vals); }
    SkNi(int a, int b, int c, int d) { fVec = (int32x4_t) { a, b, c, d }; }

    void store(int vals[4]) const { vst1q_s32(vals, fVec); }

    SkNi operator + (const SkNi& o) const { return vaddq_s32(fVec, o.fVec); }
    SkNi operator - (const SkNi& o) const { return vsubq_s32(fVec, o.fVec); }
    SkNi operator * (const SkNi& o) const { return vmulq_s32(fVec, o.fVec); }

    SkNi operator << (int bits) const { SHIFT32(vshlq_n_s32, fVec, bits); }
    SkNi operator >> (int bits) const { SHIFT32(vshrq_n_s32, fVec, bits); }

    template <int k> int kth() const {
        SkASSERT(0 <= k && k < 4);
        return vgetq_lane_s32(fVec, k&3);
    }

    int32x4_t fVec;
};
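// The shift operators above take a runtime bit count, while the underlying
// shift intrinsics need a compile-time constant, which is what the SHIFT8,
// SHIFT16 and SHIFT32 macros work around. A minimal portable sketch of the
// same dispatch pattern follows; shl_const and shl_runtime are hypothetical
// names, and shl_const only stands in for an intrinsic such as vshlq_n_s32.

```cpp
#include <cstdint>

// Stand-in for an intrinsic whose shift count must be a constant expression.
template <int kBits>
static uint32_t shl_const(uint32_t v) { return v << kBits; }

// Hypothetical dispatcher: map a runtime count in [1,7] onto the
// constant-count version, one case per supported count.
static uint32_t shl_runtime(uint32_t v, int bits) {
    switch (bits) {
        case 1: return shl_const<1>(v);
        case 2: return shl_const<2>(v);
        case 3: return shl_const<3>(v);
        case 4: return shl_const<4>(v);
        case 5: return shl_const<5>(v);
        case 6: return shl_const<6>(v);
        case 7: return shl_const<7>(v);
    }
    return v;  // any other count (including 0) returns the value unchanged
}
```

// Like SHIFT8, this sketch returns the input untouched for counts it does not
// list; a wider lane type would need more cases, as SHIFT16 and SHIFT32 add.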

template <>
class SkNf<4, float> {
    typedef SkNb<4, 4> Nb;
public:
    SkNf(float32x4_t vec) : fVec(vec) {}

    SkNf() {}
    explicit SkNf(float val) : fVec(vdupq_n_f32(val)) {}
    static SkNf Load(const float vals[4]) { return vld1q_f32(vals); }
    SkNf(float a, float b, float c, float d) { fVec = (float32x4_t) { a, b, c, d }; }

    void store(float vals[4]) const { vst1q_f32(vals, fVec); }

    SkNi<4, int> castTrunc() const { return vcvtq_s32_f32(fVec); }

    SkNf approxInvert() const {
        float32x4_t est0 = vrecpeq_f32(fVec),
                    est1 = vmulq_f32(vrecpsq_f32(est0, fVec), est0);
        return est1;
    }
    SkNf invert() const {
        float32x4_t est1 = this->approxInvert().fVec,
                    est2 = vmulq_f32(vrecpsq_f32(est1, fVec), est1);
        return est2;
    }

    SkNf operator + (const SkNf& o) const { return vaddq_f32(fVec, o.fVec); }
    SkNf operator - (const SkNf& o) const { return vsubq_f32(fVec, o.fVec); }
    SkNf operator * (const SkNf& o) const { return vmulq_f32(fVec, o.fVec); }
    SkNf operator / (const SkNf& o) const {
    #if defined(SK_CPU_ARM64)
        return vdivq_f32(fVec, o.fVec);
    #else
        return vmulq_f32(fVec, o.invert().fVec);
    #endif
    }

    Nb operator == (const SkNf& o) const { return vceqq_f32(fVec, o.fVec); }
    Nb operator < (const SkNf& o) const { return vcltq_f32(fVec, o.fVec); }
    Nb operator > (const SkNf& o) const { return vcgtq_f32(fVec, o.fVec); }
    Nb operator <= (const SkNf& o) const { return vcleq_f32(fVec, o.fVec); }
    Nb operator >= (const SkNf& o) const { return vcgeq_f32(fVec, o.fVec); }
    Nb operator != (const SkNf& o) const { return vmvnq_u32(vceqq_f32(fVec, o.fVec)); }

    static SkNf Min(const SkNf& l, const SkNf& r) { return vminq_f32(l.fVec, r.fVec); }
    static SkNf Max(const SkNf& l, const SkNf& r) { return vmaxq_f32(l.fVec, r.fVec); }

    SkNf rsqrt0() const { return vrsqrteq_f32(fVec); }
    SkNf rsqrt1() const {
        float32x4_t est0 = this->rsqrt0().fVec;
        return vmulq_f32(vrsqrtsq_f32(fVec, vmulq_f32(est0, est0)), est0);
    }
    SkNf rsqrt2() const {
        float32x4_t est1 = this->rsqrt1().fVec;
        return vmulq_f32(vrsqrtsq_f32(fVec, vmulq_f32(est1, est1)), est1);
    }

    SkNf sqrt() const {
    #if defined(SK_CPU_ARM64)
        return vsqrtq_f32(fVec);
    #else
        return *this * this->rsqrt2();
    #endif
    }

    template <int k> float kth() const {
        SkASSERT(0 <= k && k < 4);
        return vgetq_lane_f32(fVec, k&3);
    }

    float32x4_t fVec;
};
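// On targets without a vector square root, sqrt() above is built from the
// refined reciprocal square root: sqrt(x) = x * rsqrt(x). The block below is
// a minimal scalar sketch of that refinement; rsqrt_step, refine_rsqrt and
// sqrt_via_rsqrt are hypothetical names, and rsqrt_step is only assumed to
// model what vrsqrts computes per lane ((3 - a*b) / 2).

```cpp
#include <cassert>
#include <cmath>

// Scalar stand-in for one lane of vrsqrts: (3 - a*b) / 2.
static float rsqrt_step(float a, float b) { return (3.0f - a * b) * 0.5f; }

// Hypothetical helper: refine a starting estimate of 1/sqrt(x).
// Each step computes est = est * (3 - x*est*est) / 2.
static float refine_rsqrt(float x, float est0, int steps) {
    assert(steps >= 0);
    float est = est0;
    for (int i = 0; i < steps; i++) {
        est = rsqrt_step(x, est * est) * est;
    }
    return est;
}

// sqrt(x) recovered as x * (1/sqrt(x)), using two refinement steps to
// mirror the shape of rsqrt2() followed by a multiply.
static float sqrt_via_rsqrt(float x, float est0) {
    return x * refine_rsqrt(x, est0, 2);
}
```

// Starting from 0.45 as an estimate of 1/sqrt(4) = 0.5, two steps bring the
// result of sqrt_via_rsqrt(4.0f, 0.45f) to within about 1e-3 of 2.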

template <>
class SkNi<8, uint16_t> {
public:
    SkNi(const uint16x8_t& vec) : fVec(vec) {}

    SkNi() {}
    explicit SkNi(uint16_t val) : fVec(vdupq_n_u16(val)) {}
    static SkNi Load(const uint16_t vals[8]) { return vld1q_u16(vals); }

    SkNi(uint16_t a, uint16_t b, uint16_t c, uint16_t d,
         uint16_t e, uint16_t f, uint16_t g, uint16_t h) {
        fVec = (uint16x8_t) { a,b,c,d, e,f,g,h };
    }

    void store(uint16_t vals[8]) const { vst1q_u16(vals, fVec); }

    SkNi operator + (const SkNi& o) const { return vaddq_u16(fVec, o.fVec); }
    SkNi operator - (const SkNi& o) const { return vsubq_u16(fVec, o.fVec); }
    SkNi operator * (const SkNi& o) const { return vmulq_u16(fVec, o.fVec); }

    SkNi operator << (int bits) const { SHIFT16(vshlq_n_u16, fVec, bits); }
    SkNi operator >> (int bits) const { SHIFT16(vshrq_n_u16, fVec, bits); }

    template <int k> uint16_t kth() const {
        SkASSERT(0 <= k && k < 8);
        return vgetq_lane_u16(fVec, k&7);
    }

    uint16x8_t fVec;
};

template <>
class SkNi<16, uint8_t> {
public:
    SkNi(const uint8x16_t& vec) : fVec(vec) {}

    SkNi() {}
    explicit SkNi(uint8_t val) : fVec(vdupq_n_u8(val)) {}
    static SkNi Load(const uint8_t vals[16]) { return vld1q_u8(vals); }

    SkNi(uint8_t a, uint8_t b, uint8_t c, uint8_t d,
         uint8_t e, uint8_t f, uint8_t g, uint8_t h,
         uint8_t i, uint8_t j, uint8_t k, uint8_t l,
         uint8_t m, uint8_t n, uint8_t o, uint8_t p) {
        fVec = (uint8x16_t) { a,b,c,d, e,f,g,h, i,j,k,l, m,n,o,p };
    }

    void store(uint8_t vals[16]) const { vst1q_u8(vals, fVec); }

    SkNi saturatedAdd(const SkNi& o) const { return vqaddq_u8(fVec, o.fVec); }

    SkNi operator + (const SkNi& o) const { return vaddq_u8(fVec, o.fVec); }
    SkNi operator - (const SkNi& o) const { return vsubq_u8(fVec, o.fVec); }
    SkNi operator * (const SkNi& o) const { return vmulq_u8(fVec, o.fVec); }

    SkNi operator << (int bits) const { SHIFT8(vshlq_n_u8, fVec, bits); }
    SkNi operator >> (int bits) const { SHIFT8(vshrq_n_u8, fVec, bits); }

    template <int k> uint8_t kth() const {
        SkASSERT(0 <= k && k < 16);
        return vgetq_lane_u8(fVec, k&15);
    }

    uint8x16_t fVec;
};
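// SkNi<16, uint8_t> offers both operator+ (vaddq_u8) and saturatedAdd
// (vqaddq_u8). The per-lane difference can be sketched in scalar code as
// below; wrapping_add and saturated_add are hypothetical names used only for
// illustration.

```cpp
#include <cstdint>

// Plain addition on an 8-bit lane wraps modulo 256.
static uint8_t wrapping_add(uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + b);
}

// Saturating addition clamps at the largest representable value instead.
static uint8_t saturated_add(uint8_t a, uint8_t b) {
    unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return sum > 255u ? static_cast<uint8_t>(255) : static_cast<uint8_t>(sum);
}
```

// For example, 200 + 100 wraps to 44 with wrapping_add but clamps to 255 with
// saturated_add, which is usually the wanted behavior when summing 8-bit
// color channels.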

#undef SHIFT32
#undef SHIFT16
#undef SHIFT8

#endif//SkNx_neon_DEFINED