- streamline the testing down to just byte multiplies
(that's always where the blend algorithms vary)
- add another approximate multiply, (x*y+255)>>8
- add another variant of the perfect multiply, ((x*y+128)*257)>>16
I've realized ((x*y+128)*257)>>16 might be just as fast in SSE/NEON
as our current (x*y+x)>>8 approximation. Good to be testing it here.
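
For reference, a minimal sketch of the three byte multiplies mentioned above,
next to a plain rounded-division reference. Only the formulas come from this
message; the function names and the exhaustive comparison helper are
hypothetical and are not the code in the actual test.

```cpp
#include <cstdint>

// Hypothetical names throughout; only the formulas are from the notes above.

// Reference: x*y/255 rounded to nearest, done with plain integer division.
// (2*x*y + 255) / 510 == floor(x*y/255 + 0.5); x*y/255 never lands on an
// exact .5, so there are no ties to break.
static inline uint8_t mul_reference(uint8_t x, uint8_t y) {
    return static_cast<uint8_t>((2u * x * y + 255u) / 510u);
}

// The current approximation: (x*y + x) >> 8.
static inline uint8_t mul_approx_plus_x(uint8_t x, uint8_t y) {
    return static_cast<uint8_t>((static_cast<uint32_t>(x) * y + x) >> 8);
}

// The newly added approximation: (x*y + 255) >> 8.
static inline uint8_t mul_approx_plus_255(uint8_t x, uint8_t y) {
    return static_cast<uint8_t>((static_cast<uint32_t>(x) * y + 255u) >> 8);
}

// The new variant of the perfect multiply: ((x*y + 128) * 257) >> 16.
// The largest intermediate is (65025 + 128) * 257, which fits in 32 bits.
static inline uint8_t mul_perfect(uint8_t x, uint8_t y) {
    return static_cast<uint8_t>(((static_cast<uint32_t>(x) * y + 128u) * 257u) >> 16);
}

using ByteMul = uint8_t (*)(uint8_t, uint8_t);

// Counts how many of the 65536 (x, y) byte pairs disagree with the reference.
static int count_mismatches(ByteMul candidate) {
    int mismatches = 0;
    for (int x = 0; x < 256; x++) {
        for (int y = 0; y < 256; y++) {
            const uint8_t bx = static_cast<uint8_t>(x);
            const uint8_t by = static_cast<uint8_t>(y);
            if (candidate(bx, by) != mul_reference(bx, by)) {
                mismatches++;
            }
        }
    }
    return mismatches;
}
```

count_mismatches(mul_perfect) should come back 0, while both approximations
differ from the reference on some inputs, e.g. (1,128) for (x*y+x)>>8 and
(1,1) for (x*y+255)>>8.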
BUG=skia:
Review URL: https://codereview.chromium.org/1453043005
The new test is disabled by default, as it's quite slow.
We can run it if we suspect problems by passing -x to DM.
This test failed before the bug fix and passes now.
Assuming the Priv on the end means it's not considered public API...
TBR=reed@google.com
BUG=skia:4052
Review URL: https://codereview.chromium.org/1228333003