Assorted tweaking, twisting and tuning to squeeze a few additional cycles
out of the memchr code. Changes include bypassing the shift pairs
(sld,srd) when they are not required, and unrolling the small_loop that
handles short and trailing strings.
Per scrollpipe data measuring aligned strings for 64-bit, these changes
save between five and eight cycles (9-13% overall) for short strings (<32),
Longer aligned strings see slight improvement of 1-3% due to bypassing the
shifts and the instruction rearranging.
This patch provides optimized logb (1.2x on PPC32 and 2.5x on PPC64),
logbf (1.1x on PPC32 and 2.2x on PPC64), and logbl (1.3x on PPC32 and
50% on PPC64) for the POWER7 processor.
This patch tries to organize the implies files for ppc, since there are
a number of processors and most of them are compatible with each other
(backwards compatible).
Having in mind that we start the search for processor-specific files in
the sysdeps/unix/sysv/linux tree
(sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]/fpu to be
exact), we would like to grab any linux-specific code from that tree
prior to going through the other tree (sysdeps/powerpc/...).
For that, i removed the Implies files that were originally inside the
fpu directories and placed then in the non-fpu directories (still inside
the unix/sysv/linux tree). If no processor-specific/linux-specific files
could be found, we "imply" the other tree's (sysdeps/powerpc/...) fpu
directory for that specific processor AND also the non-fpu directory for
that same tree.
If, again, no processor-specific code is found, we read another Implies
file that will point to the most compatible processor that we should
grab code from, and so on, until we reach the power4 processor.
So, in summary, the Implies files will live inside these directories
now:
* sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/[processor]
* sysdeps/powerpc/powerpc[32|64]/[processor]
Practical example of the order we will use to pick power6-specific code
with the new structure.
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/unix/sysv/linux/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power6/fpu ->
sysdeps/powerpc/powerpc[32|64]/power6 ->
sysdeps/powerpc/powerpc[32|64]/power5+/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5+ ->
sysdeps/powerpc/powerpc[32|64]/power5/fpu ->
sysdeps/powerpc/powerpc[32|64]/power5 ->
sysdeps/powerpc/powerpc[32|64]/power4/fpu ->
sysdeps/powerpc/powerpc[32|64]/power4 (from here, it'll go to the
generic path as usual)