Implement I64x2 multiply using 32-bit multiplies.
This approach uses two fewer cycles (0.88x) on Cortex-A53 and three fewer cycles (0.86x)
on Cortex-A72, compared to moving to general purpose registers and doing two 64-bit multiplies.
Based on a patch by Zhi An Ng.
Bug: v8:8460
Change-Id: I9c8d3bb77f0d751eec2d85823522558b7f173628
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1781696
Commit-Queue: Martyn Capewell <martyn.capewell@arm.com>
Reviewed-by: Zhi An Ng <zhin@chromium.org>
Reviewed-by: Bill Budge <bbudge@chromium.org>
Cr-Commit-Position: refs/heads/master@{#63558}