The simple scaling that only samples every input pixel once, can be
used with downscaling < 2x as well if we just handle the case where the
input can't be in the intermediate buffer.
At the same time the handling of the intermediate buffer has been moved
out of simple scale helper functions so the code can be shared and the
AVX2 optimizations also used for non-argb32pm formats.
Change-Id: I98d225ef8d4f2978480d09110c959b556c563b57
Reviewed-by: Eirik Aavitsland <eirik.aavitsland@qt.io>
Reviewed-by: Lars Knoll <lars.knoll@qt.io>