Separate benchmarks for the fast and slow implementations of pow and
exp since measuring both together doesn't make sense. Adjust the
iterations for pow and exp accordingly so that they run long enough
for the measurements to be meaningful.
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.