as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).
Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
On my laptop:
Before:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
3#silesia.tar : 211984896 -> 66683478 (3.179), 97.6 MB/s , 400.7 MB/s
3#enwik8 : 100000000 -> 35643153 (2.806), 76.5 MB/s , 303.2 MB/s
After:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
3#silesia.tar : 211984896 -> 66683478 (3.179), 97.4 MB/s , 435.0 MB/s
3#enwik8 : 100000000 -> 35643153 (2.806), 76.2 MB/s , 338.1 MB/s
Mileage vary, depending on file, and cpu type.
But a generic rule is : x86 benefits less from "long-offset mode" than x64,
maybe due to register pressure.
On "entropy", long-mode is _never_ a win for x86.
On my laptop though, it may, depending on file and compression level
(enwik8 benefits more from "long-mode" than silesia).
This makes it easier to explain that nbWorkers=0 --> single-threaded mode,
while nbWorkers=1 --> asynchronous mode (one mode thread on top of the "main" caller thread).
No need for an additional asynchronous mode flag.
nbWorkers>=2 works the same as nbThreads>=2 previously.
to avoid confusion with blocks.
also:
- jobs are cut into chunks of 512KB now, to reduce nb of mutex calls.
- fix function declaration ZSTD_getBlockSizeMax()
- fix outdated comment
Other job members are accessed directly.
This avoids a full job copy, which would access everything,
including a few members that are supposed to be used by worker only,
uselessly requiring additional locks to avoid race conditions.
writeLastEmptyBlock() must release srcBuffer
as mtctx assumes it's done by job worker.
minor : changed 2 job member names (src->srcBuffer, srcStart->prefixStart) for clarity
replaced by equivalent signal job->consumer == job->srcSize.
created additional functions
ZSTD_writeLastEmptyBlock()
and
ZSTDMT_writeLastEmptyBlock()
required when it's necessary to finish a frame with a last empty job, to create an "end of frame" marker.
It avoids creating a job with srcSize==0.