mtklein
|
a525cb151b
|
skeleton for float <-> half optimized procs
Nothing fancy yet, just calls the serial code in a loop.
I will try to follow this up with at least some of:
- SSE2 version of serial code
- NEON version of serial code
- NEON version using vcvt.f32.f16/vcvt.f16.f32
- F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two are fastest but need runtime detection.
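For illustration only, here is a minimal sketch of what "calls the serial code in a loop" could look like. It is not the code from this change: the names float_to_half, half_to_float, floats_to_halfs and halfs_to_floats are hypothetical, and the scalar conversions assume IEEE 754 binary32/binary16 layouts with round-to-nearest-even.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar conversion: binary32 -> binary16, round-to-nearest-even.
static uint16_t float_to_half(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    uint16_t sign = (uint16_t)((bits >> 16) & 0x8000);
    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t man  = bits & 0x7FFFFF;

    if (exp == 0xFF) {                       // Inf or NaN (NaN keeps a set mantissa bit)
        return (uint16_t)(sign | 0x7C00 | (man ? 0x200 : 0));
    }
    int32_t e = (int32_t)exp - 127 + 15;     // re-bias the exponent for half
    if (e >= 31) return (uint16_t)(sign | 0x7C00);  // too large: infinity
    if (e <= 0) {                            // result is a half subnormal or zero
        if (e < -10) return sign;            // below half of the smallest subnormal
        man |= 0x800000;                     // make the implicit leading 1 explicit
        uint32_t shift   = (uint32_t)(14 - e);
        uint32_t hman    = man >> shift;
        uint32_t rem     = man & ((1u << shift) - 1);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (hman & 1))) hman++;
        return (uint16_t)(sign | hman);
    }
    uint32_t half = ((uint32_t)e << 10) | (man >> 13);
    uint32_t rem  = man & 0x1FFF;
    // A carry out of the mantissa bumps the exponent, which is the right result.
    if (rem > 0x1000 || (rem == 0x1000 && (half & 1))) half++;
    return (uint16_t)(sign | half);
}

// Hypothetical scalar conversion: binary16 -> binary32 (always exact).
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                     // signed zero
        } else {                             // subnormal: normalize the mantissa
            int e = -1;
            do { man <<= 1; e++; } while ((man & 0x400) == 0);
            man &= 0x3FF;
            bits = sign | ((uint32_t)(112 - e) << 23) | (man << 13);
        }
    } else if (exp == 31) {
        bits = sign | 0x7F800000 | (man << 13);   // Inf or NaN
    } else {
        bits = sign | ((exp + 112) << 23) | (man << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Skeleton "optimized" procs: for now they only loop over the serial code.
static void floats_to_halfs(uint16_t* dst, const float* src, size_t n) {
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; i++) dst[i] = float_to_half(src[i]);
}

static void halfs_to_floats(float* dst, const uint16_t* src, size_t n) {
    assert(n == 0 || (dst && src));
    for (size_t i = 0; i < n; i++) dst[i] = half_to_float(src[i]);
}
```

With this shape, an SSE2, NEON or F16C version would only need to replace the loop bodies, handling several elements per iteration and falling back to the scalar calls for any leftover tail.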
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003
Review URL: https://codereview.chromium.org/1686543003
|
2016-02-09 08:18:10 -08:00 |
|