We now search for a block of N adjacent registers that minimizes the
number of spills, where unspillable registers are counted as if
infinitely expensive.

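A rough sketch of that search, for illustration only and not the code in
this change: RegState, find_block, and the cost of one unit per spill are
hypothetical names and assumptions. It scans every window of N adjacent
registers without wrapping and keeps the first window with the lowest
cost.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical per-register state, invented for this sketch.
enum class RegState { Free, Spillable, Unspillable };

// Returns the index of the first register of the cheapest block of n
// adjacent registers, or -1 if every block contains an unspillable one.
int find_block(const std::vector<RegState>& regs, int n) {
    assert(n >= 1);
    const int kInfinite = std::numeric_limits<int>::max();
    int best = -1, best_cost = kInfinite;
    const int count = static_cast<int>(regs.size());
    for (int start = 0; start + n <= count; start++) {  // no wraparound
        int cost = 0;
        for (int r = start; r < start + n; r++) {
            if (regs[r] == RegState::Unspillable) {
                cost = kInfinite;  // treated as infinitely expensive
                break;
            }
            if (regs[r] == RegState::Spillable) {
                cost++;  // assumed cost: one spill per occupied register
            }
        }
        // Strict comparison: ties go to the lowest-numbered block, and a
        // block with infinite cost is never selected.
        if (cost < best_cost) {
            best = start;
            best_cost = cost;
        }
    }
    return best;
}
```

With N=1 this degenerates to picking the first free register, or the first
spillable one if none are free.
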
On arm64, store64 now uses alloc_tmp(2) and st2.4s, and similarly
store128 uses alloc_tmp(4) and st4.4s.

For the purposes of arm64 instructions we could allow the block to wrap
around the register file mod 32, but I figured that probably wasn't
necessary and might be confusing to follow. If we ever find ourselves
right on the edge of needing it, we can circle back.

I'm not sure yet if our register-pair use cases on x86-64 will ever care
that the registers are adjacent, but it doesn't hurt to start that way.

This should behave pretty much the same when N=1, except that we no
longer do any interesting tie-breaking when all registers are occupied.
I left a TODO, but I bet we'll never feel the need to follow up on it.

Change-Id: Ibbdd42858a6daf61401c638435617bfb37d1899c
Reviewed-on: https://skia-review.googlesource.com/c/skia/+/357300
Reviewed-by: Herb Derby <herb@google.com>
Commit-Queue: Mike Klein <mtklein@google.com>