SkMatrix::mapPts() using aacc/bbdd was always worse than using badc():
- On Intel, it was faster than exisiting swizzle, but badc() is 10% faster still (one pshufd instead of two).
- On ARM, existing swizzle < badc() < aacc()+bbdd(), even though aacc() then bbdd() is really a single vtrn instruction.
I will revert SkMatrix.cpp before submitting. Just thought you might like to look.
Will think more and try to gear up Instruments on ARM.
BUG=skia:
Review URL: https://codereview.chromium.org/1012573003
This removes all the existing Sk4x swizzles and adds badc(), which is
both fast on all implementations and currently useful.
BUG=skia:
Review URL: https://codereview.chromium.org/997353005
We don't have control over which way _mm_cvtps_epi32 rounds.
- This makes the SSE SkPMFloat rounding consistent with _neon and _none.
- Sk4f::cast<Sk4i>() is closer to (int)float's behavior. (Correct when >=0).
Add tests that would fail at head.
BUG=skia:
Review URL: https://codereview.chromium.org/1029163002
The bench improves from 39 to 30, about half from porting to Sk2f, half from
x.add(x) instead of x.multiply(two).
Remove Sk4f Load2/store2 now that we have Sk2f.
BUG=skia:
Review URL: https://codereview.chromium.org/1019773004
This doesn't add them to the second-stringer Sk4i. It's unclear we should be
doing that often, and we don't have efficient ways to do it except via floats.
BUG=skia:
Review URL: https://codereview.chromium.org/964603002
work on tests
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu13.10-GCE-NoGPU-x86_64-Debug-ASAN-Trybot,Test-Ubuntu12-ShuttleA-GTX660-x86-Debug-Trybot,Test-Win7-ShuttleA-HD2000-x86_64-Debug-Trybot,Test-Win7-ShuttleA-HD2000-x86-Debug-Trybot
BUG=skia:
Review URL: https://codereview.chromium.org/704923003