Benchmark programs are generated using parameters from the Makefile,
so it is necessary to rebuild them whenever the parameters in the
Makefile are updated. Hence, added a dependency for the generated C
source on the Makefile so that it gets regenerated when the Makefile
is updated.
Separate benchmarks for the fast and slow implementations of pow and
exp since measuring both together doesn't make sense. Adjust the
iterations for pow and exp accordingly so that they run long enough
for the measurements to be meaningful.
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.