Rather than trying to make it lock-free (which requires double-bookkeeping of
4 atomic ints!), just lock the mutex before calling it.
tst_bench_qthreadpool shows no difference whatsoever between the two
solutions, I get 0.005 msecs per iteration in startRunnables().
Of course looping over calls to activeThreadCount() is a bit slower,
from 0.0002 msecs per iteration to 0.00027 msecs, i.e. 35% more.
But polling activeThreadCount() from the app is a really wrong thing to
do anyway, this benchmark was just for my own curiosity about the
price of a mutex in a function that sums up 4 ints.
What matters is start() performance, which is unchanged (0.00007 msecs
is just noise compared to a 0.005 total, that's 1.4%).
Change-Id: I993444eef8bc68eff9badd581fae3626dfd1cc6d
Reviewed-by: Olivier Goffart <ogoffart@woboq.com>