Merge pull request #39 from Cyan4973/dev

Dev
2015-08-24 00:54:15 +02:00 · 2015-08-24 00:54:15 +02:00 · ee4e957a13
commit ee4e957a13
parent 1eca5f5299 d5b7cb0bb8
19 changed files with 2774 additions and 1766 deletions
--- a/3
+++ b/3
@ -32,7 +32,7 @@
 # ################################################################

 # Version number
-export VERSION=0.0.2
+export VERSION=0.1.0
 export RELEASE=r$(VERSION)

 DESTDIR?=
@ -93,6 +93,7 @@ prg-travis:
 	@cd $(PRGDIR); $(MAKE) -e $(ZSTD_TRAVIS_CI_ENV)

 clangtest: clean
+	clang -v
 	$(MAKE) all CC=clang MOREFLAGS="-Werror -Wconversion -Wno-sign-conversion"

 gpptest: clean
--- a/README.md
+++ b/README.md
@ -1,4 +1,4 @@
- **Zstd**, short for Zstandard, is a new lossless compression algorithm, which provides both good compression ratio _and_ speed for your standard compression needs. "Standard" translates into everyday situations which neither look for highest possible ratio (which LZMA and ZPAQ cover) nor extreme speeds (which LZ4 covers).
+ **Zstd**, short for Zstandard, is a new lossless compression algorithm, which provides both good compression ratio _and_ speed for your standard compression needs. "Standard" translates into everyday situations which look for neither the highest possible ratio nor extreme speed.

 It is provided as a BSD-license package, hosted on Github.

@ -7,40 +7,42 @@ It is provided as a BSD-license package, hosted on Github.
 |master      | [![Build Status](https://travis-ci.org/Cyan4973/zstd.svg?branch=master)](https://travis-ci.org/Cyan4973/zstd) |
 |dev         | [![Build Status](https://travis-ci.org/Cyan4973/zstd.svg?branch=dev)](https://travis-ci.org/Cyan4973/zstd) |

-For a taste of its performance, here are a few benchmark numbers, completed on a Core i5-4300U @ 1.9 GHz, using [fsbench 0.14.3](http://encode.ru/threads/1371-Filesystem-benchmark?p=34029&viewfull=1#post34029), an open-source benchmark program by m^2.
+For a taste of its performance, here are a few benchmark numbers, measured on a Core i7-5600U @ 2.6 GHz, using [fsbench 0.14.3](http://encode.ru/threads/1371-Filesystem-benchmark?p=34029&viewfull=1#post34029), an open-source benchmark program by m^2.

-|Name           | Ratio | C.speed | D.speed |
-|---------------|-------|---------|---------|
-|               |       |   MB/s  |  MB/s   |
-| [zlib 1.2.8 -6](http://www.zlib.net/)| 3.099 |    18   |  275    |
-| **zstd**      |**2.872**|**201**|**498**  |
-| [zlib 1.2.8 -1](http://www.zlib.net/)| 2.730 |    58   |   250   |
-| [LZ4 HC r127](https://github.com/Cyan4973/lz4)| 2.720 |   26    |  1720   |
-| QuickLZ 1.5.1b6|2.237 |  323    |  373    |
-| LZO 2.06      | 2.106 |  351    |  510    |
-| Snappy 1.1.0  | 2.091 |  238    |  964    |
-| [LZ4 r127](https://github.com/Cyan4973/lz4)| 2.084 |  370    | 1590    |
-| LZF 3.6       | 2.077 |  220    |  502    |
+|Name            | Ratio | C.speed | D.speed |
+|----------------|-------|--------:|--------:|
+|                |       |   MB/s  |  MB/s   |
+| [zlib 1.2.8] -6| 3.099 |    21   |   320   |
+| **zstd**       |**2.871**|**255**| **628** |
+| [zlib 1.2.8] -1| 2.730 |    70   |   300   |
+| [LZ4] HC r131  | 2.720 |    25   |  2100   |
+| QuickLZ 1.5.1b6| 2.237 |   370   |   415   |
+| LZO 2.06       | 2.106 |   400   |   580   |
+| Snappy 1.1.0   | 2.091 |   330   |  1100   |
+| [LZ4] r131     | 2.101 |   450   |  2100   |
+| LZF 3.6        | 2.077 |   200   |   560   |
+
+[zlib 1.2.8]:http://www.zlib.net/
+[LZ4]:http://www.lz4.org/

 An interesting feature of zstd is that it can qualify as both a reasonably strong compressor and a fast one.

-Zstd delivers high decompression speed, at around ~500 MB/s per core.
+Zstd delivers high decompression speed, at more than 600 MB/s per core.
 Obviously, your exact mileage will vary depending on your target system.

-Zstd compression speed, on the other hand, can be configured to fit different situations.
-The first, fast, derivative offers ~200 MB/s per core, which is suitable for a few real-time scenarios.
-But similar to [LZ4](https://github.com/Cyan4973/lz4), zstd can offer derivatives trading compression time for compression ratio, while keeping decompression properties intact. "Offline compression", where compression time is of little importance because the content is only compressed once and decompressed many times, is therefore within the scope.
+Zstd compression speed will be configurable to fit different situations.
+The first version offered is the fast one, at ~250 MB/s per core, which is suitable for a few real-time scenarios.
+But similar to [LZ4], zstd can offer derivatives trading compression time for compression ratio, keeping decompression properties intact. "Offline compression", where compression time is of little importance because the content is only compressed once and decompressed many times, is therefore within scope.

 Note that high compression derivatives still have to be developed.
-It's a complex area which will certainly benefit the contributions from a few experts.
+It's a complex area which will require time and benefit from contributions.


 Another property zstd is developed for is configurable memory requirement, with the objective to fit into low-memory configurations, or servers handling many connections in parallel.

-Zstd entropy stage is provided by [FSE (Finite State Entropy)](https://github.com/Cyan4973/FiniteStateEntropy).
+Zstd's entropy stage is provided by [Huff0 and FSE, from the Finite State Entropy library](https://github.com/Cyan4973/FiniteStateEntropy).

-Zstd development is starting. So consider current results merely as early ones. The implementation will gradually evolve and improve overtime, especially during this first year. This is a phase which will depend a lot on user feedback, since these feedback will be key in deciding next priorities or features to add.
+Zstd is still considered experimental at this stage. Specifically, it does not yet guarantee that its current stream/file format will remain supported in future versions of the library. Therefore, only use Zstd in environments where you can control the availability of the decompression library. "Stable" status, including an officially documented format and a long-term support commitment, is projected for sometime in early 2016.

-The "master" branch is reserved for stable release and betas.
-The "dev" branch is the one where all contributions will be merged. If you plan to propose a patch, please commit into the "dev" branch. Direct commit to "master" are not permitted.
-Feature branches will also exist, typically to introduce new requirements, and be temporarily available for testing before merge into "dev" branch.
+### Branch Policy
+The "dev" branch is the one where all contributions will be merged before reaching "master". If you plan to propose a patch, please commit into the "dev" branch or its own feature branch. Direct commits to "master" are not permitted.
--- a/lib/fse.c
+++ b/lib/fse.c
--- a/lib/fse.h
+++ b/lib/fse.h
@ -55,12 +55,11 @@ size_t FSE_decompress(void* dst,  size_t maxDstSize,
 /*
 FSE_compress():
    Compress content of buffer 'src', of size 'srcSize', into destination buffer 'dst'.
-    'dst' buffer must be already allocated, and sized to handle worst case situations.
-    Worst case size evaluation is provided by FSE_compressBound().
-    return : size of compressed data
-    Special values : if return == 0, srcData is not compressible => Nothing is stored within cSrc !!!
-                     if return == 1, srcData is a single byte symbol * srcSize times. Use RLE compression.
-                     if FSE_isError(return), it's an error code.
+    'dst' buffer must be already allocated. Compression runs faster if maxDstSize >= FSE_compressBound(srcSize)
+    return : size of compressed data (<= maxDstSize)
+    Special values : if return == 0, srcData is not compressible => Nothing is stored within dst !!!
+                     if return == 1, srcData is a single byte symbol * srcSize times. Use RLE compression instead.
+                     if FSE_isError(return), compression failed (more details using FSE_getErrorName())

 FSE_decompress():
    Decompress FSE data from buffer 'cSrc', of size 'cSrcSize',
@ -70,7 +69,33 @@ FSE_decompress():

    ** Important ** : FSE_decompress() doesn't decompress non-compressible nor RLE data !!!
    Why ? : making this distinction requires a header.
-    FSE library doesn't manage headers, which are intentionally left to the user layer.
+    Header management is intentionally delegated to the user layer, which can better manage special cases.
+*/
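The special return values of FSE_compress() mean the caller, not the library, decides how each block is stored. A minimal sketch of that dispatch follows; the `block_type` names are hypothetical, and `demo_isError()` is a stand-in for FSE_isError() that assumes (for the demo only) that error codes are the largest `size_t` values, so the sketch builds without fse.h.

```c
#include <stddef.h>

/* Hypothetical names for the ways a caller might store a block;
   fse.h does not define them. */
typedef enum { BLOCK_COMPRESSED, BLOCK_RAW, BLOCK_RLE, BLOCK_ERROR } block_type;

/* Stand-in for FSE_isError(), passed as a pointer so the sketch builds without fse.h. */
typedef unsigned (*is_error_fn)(size_t code);

/* Demo-only assumption: error codes are the largest size_t values. */
static unsigned demo_isError(size_t code) { return code > (size_t)-100; }

static block_type choose_block_type(size_t cSize, is_error_fn isError)
{
    if (isError(cSize)) return BLOCK_ERROR;  /* details via FSE_getErrorName() */
    if (cSize == 0)     return BLOCK_RAW;    /* not compressible: dst holds nothing */
    if (cSize == 1)     return BLOCK_RLE;    /* one byte value repeated srcSize times */
    return BLOCK_COMPRESSED;                 /* dst holds cSize bytes of FSE data */
}
```

The error test comes first on purpose: an error code is a large value, so checking 0 and 1 first would still work, but testing errors first keeps the other branches limited to genuine sizes. Since FSE_decompress() cannot regenerate raw or RLE blocks, the chosen `block_type` has to be recorded in the caller's own header.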
+
+
+/******************************************
+*  Huff0 simple functions
+******************************************/
+size_t HUF_compress(void* dst, size_t maxDstSize,
+              const void* src, size_t srcSize);
+size_t HUF_decompress(void* dst,  size_t maxDstSize,
+                const void* cSrc, size_t cSrcSize);
+/*
+HUF_compress():
+    Compress content of buffer 'src', of size 'srcSize', into destination buffer 'dst'.
+    'dst' buffer must be already allocated. Compression runs faster if maxDstSize >= HUF_compressBound(srcSize)
+    return : size of compressed data (<= maxDstSize)
+    Special values : if return == 0, srcData is not compressible => Nothing is stored within dst !!!
+                     if return == 1, srcData is a single byte symbol * srcSize times. Use RLE compression.
+                     if FSE_isError(return), compression failed (more details using FSE_getErrorName())
+
+HUF_decompress():
+    Decompress Huff0 data from buffer 'cSrc', of size 'cSrcSize',
+    into already allocated destination buffer 'dst', of size 'maxDstSize'.
+    return : size of regenerated data (<= maxDstSize)
+             or an error code, which can be tested using FSE_isError()
+
+    ** Important ** : HUF_decompress() doesn't decompress non-compressible nor RLE data !!!
 */


@ -98,6 +123,8 @@ FSE_compress2():
 */
 size_t FSE_compress2 (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);

+size_t HUF_compress2 (void* dst, size_t dstSize, const void* src, size_t srcSize, unsigned maxSymbolValue, unsigned tableLog);
+

 /******************************************
 *  FSE detailed API
@ -106,18 +133,18 @@ size_t FSE_compress2 (void* dst, size_t dstSize, const void* src, size_t srcSize
 FSE_compress() does the following:
 1. count symbol occurrence from source[] into table count[]
 2. normalize counters so that sum(count[]) == Power_of_2 (2^tableLog)
-3. save normalized counters to memory buffer using writeHeader()
+3. save normalized counters to memory buffer using writeNCount()
 4. build encoding table 'CTable' from normalized counters
 5. encode the data stream using encoding table 'CTable'

 FSE_decompress() does the following:
-1. read normalized counters with readHeader()
+1. read normalized counters with readNCount()
 2. build decoding table 'DTable' from normalized counters
 3. decode the data stream using decoding table 'DTable'

-The following API allows to trigger specific sub-functions for advanced tasks.
+The following API allows targeting specific sub-functions for advanced tasks.
 For example, it's possible to compress several blocks using the same 'CTable',
-or to save and provide normalized distribution using one's own method.
+or to save and provide normalized distribution using an external method.
 */
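Step 1 above (counting symbol occurrences) can be sketched as follows. It follows the contract the tutorial gives for FSE_count(): 'count' has maxSymbolValuePtr[0]+1 cells, maxSymbolValuePtr[0] is lowered to the largest symbol actually present, and the result is the count of the most frequent symbol. `count_symbols()` is a hypothetical, deliberately simple version, not the library's optimized code, and it returns 0 on invalid input instead of an FSE error code.

```c
#include <stddef.h>
#include <string.h>

/* Simplified counterpart of FSE_count(); not the library implementation.
   Returns the count of the most frequent symbol, or 0 if a byte exceeds
   the allowed maximum (or srcSize is 0). */
static size_t count_symbols(unsigned* count, unsigned* maxSymbolValuePtr,
                            const unsigned char* src, size_t srcSize)
{
    unsigned maxSymbolValue = *maxSymbolValuePtr;
    unsigned largest = 0;
    unsigned s;
    size_t i;

    memset(count, 0, (maxSymbolValue + 1) * sizeof(*count));
    for (i = 0; i < srcSize; i++) {
        if (src[i] > maxSymbolValue) return 0;   /* contract violated */
        count[src[i]]++;
    }

    /* shrink maxSymbolValue to the largest symbol actually present */
    while (maxSymbolValue > 0 && count[maxSymbolValue] == 0) maxSymbolValue--;
    *maxSymbolValuePtr = maxSymbolValue;

    for (s = 0; s <= maxSymbolValue; s++)
        if (count[s] > largest) largest = count[s];
    return largest;
}
```

As the tutorial notes, comparing the returned maximum with srcSize is a quick compressibility hint: if it equals srcSize, the block is a single repeated symbol and RLE is the better choice.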

 /* *** COMPRESSION *** */
@ -163,8 +190,8 @@ size_t FSE_writeNCount (void* buffer, size_t bufferSize, const short* normalized

 /*
 Constructor and Destructor of type FSE_CTable
-Not that its size depends on parameters 'tableLog' and 'maxSymbolValue' */
-typedef unsigned FSE_CTable;   /* don't allocate that. It's just a way to be more restrictive than void */
+    Note that its size depends on 'tableLog' and 'maxSymbolValue' */
+typedef unsigned FSE_CTable;   /* don't allocate that. It's just a way to be more restrictive than void* */
 FSE_CTable* FSE_createCTable (unsigned tableLog, unsigned maxSymbolValue);
 void        FSE_freeCTable (FSE_CTable* ct);

@ -173,30 +200,32 @@ FSE_buildCTable():
   Builds 'ct', which must be already allocated, using FSE_createCTable()
   return : 0
            or an errorCode, which can be tested using FSE_isError() */
-size_t   FSE_buildCTable(FSE_CTable* ct, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog);
+size_t FSE_buildCTable(FSE_CTable* ct, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog);

 /*
 FSE_compress_usingCTable():
   Compress 'src' using 'ct' into 'dst' which must be already allocated
-   return : size of compressed data
+   return : size of compressed data (<= maxDstSize)
+            or 0 if compressed data could not fit into 'dst'
            or an errorCode, which can be tested using FSE_isError() */
-size_t FSE_compress_usingCTable (void* dst, size_t dstSize, const void* src, size_t srcSize, const FSE_CTable* ct);
+size_t FSE_compress_usingCTable (void* dst, size_t maxDstSize, const void* src, size_t srcSize, const FSE_CTable* ct);

 /*
 Tutorial :
 ----------
-The first step is to count all symbols. FSE_count() provides one quick way to do this job.
+The first step is to count all symbols. FSE_count() does this job very fast.
 Result will be saved into 'count', a table of unsigned int, which must be already allocated, and have 'maxSymbolValuePtr[0]+1' cells.
 'src' is a table of bytes of size 'srcSize'. All values within 'src' MUST be <= maxSymbolValuePtr[0]
 maxSymbolValuePtr[0] will be updated, with its real value (necessarily <= original value)
 FSE_count() will return the number of occurrences of the most frequent symbol.
+This can be used to know if there is a single symbol within 'src', and to quickly evaluate its compressibility.
 If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()).

 The next step is to normalize the frequencies.
 FSE_normalizeCount() will ensure that sum of frequencies is == 2 ^'tableLog'.
-It also guarantees a minimum of 1 to any Symbol which frequency is >= 1.
-You can use input 'tableLog'==0 to mean "use default tableLog value".
-If you are unsure of which tableLog value to use, you can optionally call FSE_optimalTableLog(),
+It also guarantees a minimum of 1 to any Symbol with frequency >= 1.
+You can use 'tableLog'==0 to mean "use default tableLog value".
+If you are unsure of which tableLog value to use, you can ask FSE_optimalTableLog(),
 which will provide the optimal valid tableLog given sourceSize, maxSymbolValue, and a user-defined maximum (0 means "default").

 The result of FSE_normalizeCount() will be saved into a table,
@ -204,23 +233,23 @@ called 'normalizedCounter', which is a table of signed short.
 'normalizedCounter' must be already allocated, and have at least 'maxSymbolValue+1' cells.
 The return value is tableLog if everything proceeded as expected.
 It is 0 if there is a single symbol within distribution.
-If there is an error(typically, invalid tableLog value), the function will return an ErrorCode (which can be tested using FSE_isError()).
+If there is an error (ex: invalid tableLog value), the function will return an ErrorCode (which can be tested using FSE_isError()).

-'normalizedCounter' can be saved in a compact manner to a memory area using FSE_writeHeader().
-'header' buffer must be already allocated.
+'normalizedCounter' can be saved in a compact manner to a memory area using FSE_writeNCount().
+'buffer' must be already allocated.
 For guaranteed success, buffer size must be at least FSE_headerBound().
-The result of the function is the number of bytes written into 'header'.
-If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()) (for example, buffer size too small).
+The result of the function is the number of bytes written into 'buffer'.
+If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError(); ex : buffer size too small).

 'normalizedCounter' can then be used to create the compression table 'CTable'.
-The space required by 'CTable' must be already allocated. Its size is provided by FSE_sizeof_CTable().
-'CTable' must be aligned of 4 bytes boundaries.
+The space required by 'CTable' must be already allocated, using FSE_createCTable().
 You can then use FSE_buildCTable() to fill 'CTable'.
-In both cases, if there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()).
+If there is an error, both functions will return an ErrorCode (which can be tested using FSE_isError()).

 'CTable' can then be used to compress 'src', with FSE_compress_usingCTable().
 Similar to FSE_count(), the convention is that 'src' is assumed to be a table of char of size 'srcSize'
-The function returns the size of compressed data (without header).
+The function returns the size of compressed data (without header), necessarily <= maxDstSize.
+If it returns '0', compressed data could not fit into 'dst'.
 If there is an error, the function will return an ErrorCode (which can be tested using FSE_isError()).
 */
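To make the normalization requirement concrete (sum of frequencies == 2^tableLog, and at least 1 for every present symbol), here is a naive proportional sketch. `normalize_counts()` is hypothetical and is not the algorithm FSE_normalizeCount() uses: it floors each scaled count, bumps present symbols to 1, and lets the most frequent symbol absorb the rounding error. Unlike the library function, it does not special-case a single-symbol distribution.

```c
#include <stddef.h>

/* Naive normalization sketch (not FSE_normalizeCount()'s algorithm).
   'total' is the sum of count[0..maxSymbolValue].
   Returns tableLog on success, 0 if the distribution cannot fit. */
static unsigned normalize_counts(short* norm, unsigned tableLog,
                                 const unsigned* count, size_t total,
                                 unsigned maxSymbolValue)
{
    const long target = 1L << tableLog;
    long sum = 0;
    unsigned s, largest = 0;

    if (total == 0) return 0;
    for (s = 0; s <= maxSymbolValue; s++) {
        long n = (long)((unsigned long long)count[s] * (unsigned long long)target / total);
        if (count[s] > 0 && n == 0) n = 1;   /* present symbols keep a nonzero probability */
        norm[s] = (short)n;
        sum += n;
        if (count[s] > count[largest]) largest = s;
    }
    norm[largest] = (short)(norm[largest] + (target - sum));   /* absorb rounding error */
    if (norm[largest] < 1) return 0;   /* too many rare symbols for this tableLog */
    return tableLog;
}
```

For example, counts {6, 1, 1} with tableLog 4 scale to {12, 2, 2}, which sums to 16. Five equally rare symbols cannot fit into 2^2 = 4 slots, so that case fails; the real library reports such a situation as an error code instead.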

@ -237,26 +266,25 @@ size_t FSE_readNCount (short* normalizedCounter, unsigned* maxSymbolValuePtr, un

 /*
 Constructor and Destructor of type FSE_DTable
-Note that its size depends on parameters 'tableLog' */
-typedef unsigned FSE_DTable;   /* don't allocate that. It's just a way to be more restrictive than void */
+    Note that its size depends on 'tableLog' */
+typedef unsigned FSE_DTable;   /* don't allocate that. It's just a way to be more restrictive than void* */
 FSE_DTable* FSE_createDTable(unsigned tableLog);
 void        FSE_freeDTable(FSE_DTable* dt);

 /*
 FSE_buildDTable():
   Builds 'dt', which must be already allocated, using FSE_createDTable()
-   return : 1 if 'dt' is compatible with fast mode, 0 otherwise,
+   return : 0,
            or an errorCode, which can be tested using FSE_isError() */
 size_t FSE_buildDTable (FSE_DTable* dt, const short* normalizedCounter, unsigned maxSymbolValue, unsigned tableLog);

 /*
 FSE_decompress_usingDTable():
-   Decompress compressed source 'cSrc' of size 'cSrcSize'
-   using 'dt' into 'dst' which must be already allocated.
-   Use fastMode==1 only if authorized by result of FSE_buildDTable().
+   Decompress compressed source 'cSrc' of size 'cSrcSize' using 'dt'
+   into 'dst' which must be already allocated.
   return : size of regenerated data (necessarily <= maxDstSize)
            or an errorCode, which can be tested using FSE_isError() */
-size_t FSE_decompress_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const FSE_DTable* dt, size_t fastMode);
+size_t FSE_decompress_usingDTable(void* dst, size_t maxDstSize, const void* cSrc, size_t cSrcSize, const FSE_DTable* dt);

 /*
 Tutorial :
@ -266,26 +294,24 @@ Tutorial :
 If block is a single repeated byte, use memset() instead )

 The first step is to obtain the normalized frequencies of symbols.
-This can be performed by reading a header with FSE_readHeader().
-'normalizedCounter' must be already allocated, and have at least 'maxSymbolValuePtr[0]+1' cells of short.
+This can be performed by FSE_readNCount() if it was saved using FSE_writeNCount().
+'normalizedCounter' must be already allocated, and have at least 'maxSymbolValuePtr[0]+1' cells of signed short.
 In practice, that means it's necessary to know 'maxSymbolValue' beforehand,
 or size the table to handle worst case situations (typically 256).
-FSE_readHeader will provide 'tableLog' and 'maxSymbolValue' stored into the header.
-The result of FSE_readHeader() is the number of bytes read from 'header'.
-Note that 'headerSize' must be at least 4 bytes, even if useful information is less than that.
+FSE_readNCount() will provide 'tableLog' and 'maxSymbolValue'.
+The result of FSE_readNCount() is the number of bytes read from 'rBuffer'.
+Note that 'rBufferSize' must be at least 4 bytes, even if useful information is less than that.
 If there is an error, the function will return an error code, which can be tested using FSE_isError().

-The next step is to create the decompression tables 'FSE_DTable' from 'normalizedCounter'.
+The next step is to build the decompression tables 'FSE_DTable' from 'normalizedCounter'.
 This is performed by the function FSE_buildDTable().
 The space required by 'FSE_DTable' must be already allocated using FSE_createDTable().
-The function will return 1 if FSE_DTable is compatible with fastMode, 0 otherwise.
 If there is an error, the function will return an error code, which can be tested using FSE_isError().

 'FSE_DTable' can then be used to decompress 'cSrc', with FSE_decompress_usingDTable().
-Only trigger fastMode if it was authorized by the result of FSE_buildDTable(), otherwise decompression will fail.
-cSrcSize must be correct, otherwise decompression will fail.
-FSE_decompress_usingDTable() result will tell how many bytes were regenerated.
-If there is an error, the function will return an error code, which can be tested using FSE_isError().
+'cSrcSize' must be strictly correct, otherwise decompression will fail.
+FSE_decompress_usingDTable() result will tell how many bytes were regenerated (<=maxDstSize).
+If there is an error, the function will return an error code, which can be tested using FSE_isError() (ex: dst buffer too small).
 */


--- a/lib/fse_static.h
+++ b/lib/fse_static.h
@ -48,12 +48,25 @@ extern "C" {
 /******************************************
 *  Static allocation
 ******************************************/
-#define FSE_MAX_HEADERSIZE 512
-#define FSE_COMPRESSBOUND(size) (size + (size>>7) + FSE_MAX_HEADERSIZE)   /* Macro can be useful for static allocation */
-/* You can statically allocate a CTable as a table of unsigned using below macro */
+/* FSE buffer bounds */
+#define FSE_NCOUNTBOUND 512
+#define FSE_BLOCKBOUND(size) (size + (size>>7))
+#define FSE_COMPRESSBOUND(size) (FSE_NCOUNTBOUND + FSE_BLOCKBOUND(size))   /* Macro version, useful for static allocation */
+
+/* You can statically allocate FSE CTable/DTable as a table of unsigned using below macro */
 #define FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue)   (1 + (1<<(maxTableLog-1)) + ((maxSymbolValue+1)*2))
 #define FSE_DTABLE_SIZE_U32(maxTableLog)                   (1 + (1<<maxTableLog))

+/* Huff0 buffer bounds */
+#define HUF_CTABLEBOUND 129
+#define HUF_BLOCKBOUND(size) (size + (size>>8) + 8)   /* only true if pre-filtered with fast heuristic */
+#define HUF_COMPRESSBOUND(size) (HUF_CTABLEBOUND + HUF_BLOCKBOUND(size))   /* Macro version, useful for static allocation */
+
+/* You can statically allocate Huff0 DTable as a table of unsigned short using below macro */
+#define HUF_DTABLE_SIZE_U16(maxTableLog)   (1 + (1<<maxTableLog))
+#define HUF_CREATE_STATIC_DTABLE(DTable, maxTableLog) \
+        unsigned short DTable[HUF_DTABLE_SIZE_U16(maxTableLog)] = { maxTableLog }
+
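The bound macros above allow all buffers to be sized at compile time. The sketch below copies them from this hunk, with the macro arguments parenthesized defensively; the 1000-byte block size and tableLog 12 are arbitrary choices for the example. Per the header comment, HUF_BLOCKBOUND() only holds if the input was pre-filtered with the fast heuristic.

```c
#include <stddef.h>

/* Copied from the fse_static.h hunk, arguments parenthesized. */
#define FSE_NCOUNTBOUND 512
#define FSE_BLOCKBOUND(size) ((size) + ((size) >> 7))
#define FSE_COMPRESSBOUND(size) (FSE_NCOUNTBOUND + FSE_BLOCKBOUND(size))
#define FSE_CTABLE_SIZE_U32(maxTableLog, maxSymbolValue) \
        (1 + (1 << ((maxTableLog) - 1)) + (((maxSymbolValue) + 1) * 2))
#define FSE_DTABLE_SIZE_U32(maxTableLog) (1 + (1 << (maxTableLog)))

#define HUF_CTABLEBOUND 129
#define HUF_BLOCKBOUND(size) ((size) + ((size) >> 8) + 8)
#define HUF_COMPRESSBOUND(size) (HUF_CTABLEBOUND + HUF_BLOCKBOUND(size))

/* Static buffers for 1000-byte blocks: no malloc needed. */
static unsigned char fseDst[FSE_COMPRESSBOUND(1000)];   /* 512 + 1000 + 7     = 1519 */
static unsigned char hufDst[HUF_COMPRESSBOUND(1000)];   /* 129 + 1000 + 3 + 8 = 1140 */
static unsigned      ctable[FSE_CTABLE_SIZE_U32(12, 255)];  /* 1 + 2048 + 512 = 2561 */
static unsigned      dtable[FSE_DTABLE_SIZE_U32(12)];       /* 1 + 4096       = 4097 */
```

The Huff0 bound is tighter than the FSE one (size>>8 instead of size>>7, and a 129-byte table header instead of 512 bytes for the normalized counters).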

 /******************************************
 *  Error Management
@ -96,6 +109,7 @@ size_t FSE_buildDTable_rle (FSE_DTable* dt, unsigned char symbolValue);
   You will want to enable link-time-optimization to ensure these functions are properly inlined in your binary.
   Visual seems to do it automatically.
   For gcc or clang, you'll need to add -flto flag at compilation and linking stages.
+   If none of these solutions is applicable, include "fse.c" directly.
 */

 typedef struct
@ -104,6 +118,7 @@ typedef struct
    int    bitPos;
    char*  startPtr;
    char*  ptr;
+    char*  endPtr;
 } FSE_CStream_t;

 typedef struct
@ -114,10 +129,10 @@ typedef struct
    unsigned    stateLog;
 } FSE_CState_t;

-void   FSE_initCStream(FSE_CStream_t* bitC, void* dstBuffer);
+size_t FSE_initCStream(FSE_CStream_t* bitC, void* dstBuffer, size_t maxDstSize);
 void   FSE_initCState(FSE_CState_t* CStatePtr, const FSE_CTable* ct);

-void   FSE_encodeSymbol(FSE_CStream_t* bitC, FSE_CState_t* CStatePtr, unsigned char symbol);
+void   FSE_encodeSymbol(FSE_CStream_t* bitC, FSE_CState_t* CStatePtr, unsigned symbol);
 void   FSE_addBits(FSE_CStream_t* bitC, size_t value, unsigned nbBits);
 void   FSE_flushBits(FSE_CStream_t* bitC);

@ -133,17 +148,18 @@ So the first symbol you will encode is the last you will decode, like a LIFO sta

 You will need a few variables to track your CStream. They are :

-FSE_CTable ct;        // Provided by FSE_buildCTable()
-FSE_CStream_t bitC;   // bitStream tracking structure
-FSE_CState_t state;   // State tracking structure (can have several)
+FSE_CTable    ct;         // Provided by FSE_buildCTable()
+FSE_CStream_t bitStream;  // bitStream tracking structure
+FSE_CState_t  state;      // State tracking structure (can have several)


 The first thing to do is to init bitStream and state.
-    FSE_initCStream(&bitC, dstBuffer);
+    size_t errorCode = FSE_initCStream(&bitStream, dstBuffer, maxDstSize);
    FSE_initCState(&state, ct);

+Note that FSE_initCStream() can produce an error code, so its result should be tested using FSE_isError().
 You can then encode your input data, byte after byte.
-FSE_encodeByte() outputs a maximum of 'tableLog' bits at a time.
+FSE_encodeSymbol() outputs a maximum of 'tableLog' bits at a time.
 Remember decoding will be done in reverse direction.
    FSE_encodeSymbol(&bitStream, &state, symbol);

@ -159,8 +175,9 @@ Writing data to memory is a manual operation, performed by the flushBits functio
 Your last FSE encoding operation shall be to flush your last state value(s).
    FSE_flushState(&bitStream, &state);

-Finally, you must then close the bitStream.
-The function returns the size in bytes of CStream.
+Finally, you must close the bitStream.
+The function returns the size of CStream in bytes.
+If data couldn't fit into dstBuffer, it will return a 0 ( == not compressible)
 If there is an error, it returns an errorCode (which can be tested using FSE_isError()).
    size_t size = FSE_closeCStream(&bitStream);
 */
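The "first encoded, last decoded" rule can be seen with a toy bit container. This is a hypothetical illustration of the LIFO ordering only, not FSE's CStream: fields are appended at increasing bit positions and read back starting from the most recent one. It keeps everything in one 64-bit register, with no flushing to memory and no overflow checks.

```c
#include <stdint.h>

/* Toy LIFO bit container (hypothetical, not FSE_CStream_t).
   Total bits written must not exceed 64; each field is 1..32 bits. */
typedef struct { uint64_t bits; unsigned pos; } ToyBitStream;

static void toyBS_add(ToyBitStream* s, uint64_t value, unsigned nbBits)
{
    s->bits |= (value & ((1ULL << nbBits) - 1)) << s->pos;
    s->pos  += nbBits;
}

static uint64_t toyBS_read(ToyBitStream* s, unsigned nbBits)
{
    s->pos -= nbBits;   /* step back over the most recently written field */
    return (s->bits >> s->pos) & ((1ULL << nbBits) - 1);
}
```

Writing 5 (3 bits), 2 (2 bits) and 100 (7 bits) and then reading 7, 2 and 3 bits returns 100, 2 and 5: the reader must know each field's width and consume fields in reverse, which is why the decoder tutorial insists on reverse order.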
@ -194,6 +211,12 @@ unsigned int  FSE_reloadDStream(FSE_DStream_t* bitD);
 unsigned FSE_endOfDStream(const FSE_DStream_t* bitD);
 unsigned FSE_endOfDState(const FSE_DState_t* DStatePtr);

+typedef enum { FSE_DStream_unfinished = 0,
+               FSE_DStream_endOfBuffer = 1,
+               FSE_DStream_completed = 2,
+               FSE_DStream_tooFar = 3 } FSE_DStream_status;  /* result of FSE_reloadDStream() */
+               /* 1,2,4,8 would be better for bitmap combinations, but slows down performance a bit ... ?! */
+
 /*
 Let's now decompose FSE_decompress_usingDTable() into its unitary components.
 You will decode FSE-encoded symbols from the bitStream,
@ -201,16 +224,16 @@ and also any other bitFields you put in, **in reverse order**.

 You will need a few variables to track your bitStream. They are :

-FSE_DStream_t DStream;  // Stream context
-FSE_DState_t DState;    // State context. Multiple ones are possible
-FSE_DTable dt;          // Decoding table, provided by FSE_buildDTable()
-U32 tableLog;           // Provided by FSE_readHeader()
+FSE_DStream_t DStream;    // Stream context
+FSE_DState_t  DState;     // State context. Multiple ones are possible
+FSE_DTable*   DTablePtr;  // Decoding table, provided by FSE_buildDTable()

 The first thing to do is to init the bitStream.
-    errorCode = FSE_initDStream(&DStream, &optionalId, srcBuffer, srcSize);
+    errorCode = FSE_initDStream(&DStream, srcBuffer, srcSize);

-You should then retrieve your initial state(s) :
-    errorCode = FSE_initDState(&DState, &DStream, dt, tableLog);
+You should then retrieve your initial state(s)
+(in reverse flushing order if you have several ones) :
+    errorCode = FSE_initDState(&DState, &DStream, DTablePtr);

 You can then decode your data, symbol after symbol.
 For information the maximum number of bits read by FSE_decodeSymbol() is 'tableLog'.
@ -218,28 +241,28 @@ Keep in mind that symbols are decoded in reverse order, like a LIFO stack (last
    unsigned char symbol = FSE_decodeSymbol(&DState, &DStream);

 You can retrieve any bitfield you eventually stored into the bitStream (in reverse order)
-Note : maximum allowed nbBits is 25
-    unsigned int bitField = FSE_readBits(&DStream, nbBits);
+Note : maximum allowed nbBits is 25, for 32-bits compatibility
+    size_t bitField = FSE_readBits(&DStream, nbBits);

-All above operations only read from local register (which size is controlled by bitD_t==32 bits).
+All above operations only read from local register (which size depends on size_t).
 Refueling the register from memory is manually performed by the reload method.
    endSignal = FSE_reloadDStream(&DStream);

 FSE_reloadDStream() result tells if there is still some more data to read from DStream.
-0 : there is still some data left into the DStream.
-1 : Dstream reached end of buffer, but is not yet fully extracted. It will not load data from memory any more.
-2 : Dstream reached its exact end, corresponding in general to decompression completed.
-3 : Dstream went too far. Decompression result is corrupted.
+FSE_DStream_unfinished : there is still some data left into the DStream.
+FSE_DStream_endOfBuffer : DStream reached end of buffer. Its container may no longer be completely filled.
+FSE_DStream_completed : DStream reached its exact end, corresponding in general to decompression completed.
+FSE_DStream_tooFar : DStream went too far. Decompression result is corrupted.

-When reaching end of buffer(1), progress slowly, notably if you decode multiple symbols per loop,
+When reaching end of buffer (FSE_DStream_endOfBuffer), progress slowly, notably if you decode multiple symbols per loop,
 to properly detect the exact end of stream.
 After each decoded symbol, check if DStream is fully consumed using this simple test :
-    FSE_reloadDStream(&DStream) >= 2
+    FSE_reloadDStream(&DStream) >= FSE_DStream_completed

 When it's done, verify decompression is fully completed, by checking both DStream and the relevant states.
 Checking if DStream has reached its end is performed by :
    FSE_endOfDStream(&DStream);
-Check also the states. There might be some entropy left there, able to decode some high probability (>50%) symbol.
+Check also the states. There might be some symbols left there, if some high probability ones (>50%) are possible.
    FSE_endOfDState(&DState);
 */
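The loop-termination rule above (`FSE_reloadDStream(&DStream) >= FSE_DStream_completed`, followed by a corruption check) can be sketched with a toy reader. The enum is copied from this hunk; `ToyDStream` and the `toyD_*` functions are hypothetical. The toy keeps the whole stream in one 64-bit register, so it never reports `FSE_DStream_endOfBuffer`, and it decodes fixed 4-bit symbols instead of FSE states.

```c
#include <stdint.h>

/* Status values copied from the fse_static.h hunk. */
typedef enum { FSE_DStream_unfinished = 0,
               FSE_DStream_endOfBuffer = 1,
               FSE_DStream_completed = 2,
               FSE_DStream_tooFar = 3 } FSE_DStream_status;

/* Toy stream: all bits already sit in one container (at most 64 bits). */
typedef struct {
    uint64_t container;
    int      bitsLeft;   /* goes negative if the caller reads too far */
} ToyDStream;

static unsigned toyD_readBits(ToyDStream* d, int nbBits)
{
    d->bitsLeft -= nbBits;
    if (d->bitsLeft < 0) return 0;   /* over-read: no valid data left */
    return (unsigned)((d->container >> d->bitsLeft) & ((1u << nbBits) - 1));
}

static FSE_DStream_status toyD_reload(const ToyDStream* d)
{
    if (d->bitsLeft > 0)  return FSE_DStream_unfinished;
    if (d->bitsLeft == 0) return FSE_DStream_completed;
    return FSE_DStream_tooFar;
}

/* Decodes 4-bit symbols until the stream reports completion.
   Returns the number of symbols decoded, or -1 if the stream is corrupted. */
static int toyD_decodeAll(ToyDStream* d, unsigned char* out, int maxOut)
{
    int n = 0;
    while (toyD_reload(d) < FSE_DStream_completed) {
        if (n == maxOut) return -1;   /* dst buffer too small */
        out[n++] = (unsigned char)toyD_readBits(d, 4);
    }
    return (toyD_reload(d) == FSE_DStream_tooFar) ? -1 : n;
}
```

A 12-bit stream decodes into exactly three symbols and ends in `FSE_DStream_completed`. A 10-bit stream cannot be split into 4-bit symbols, so the last read overshoots and the loop ends in `FSE_DStream_tooFar`, which is reported as corruption.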

@ -251,7 +274,7 @@ size_t FSE_readBitsFast(FSE_DStream_t* bitD, unsigned nbBits);
 /* faster, but works only if nbBits >= 1 (otherwise, result will be corrupted) */

 unsigned char FSE_decodeSymbolFast(FSE_DState_t* DStatePtr, FSE_DStream_t* bitD);
-/* faster, but works only if nbBits >= 1 (otherwise, result will be corrupted) */
+/* faster, but works only if nbBits is always >= 1 (otherwise, result will be corrupted) */


 #if defined (__cplusplus)
--- a/lib/zstd.c
+++ b/lib/zstd.c
--- a/lib/zstd.h
+++ b/lib/zstd.h
@ -46,8 +46,8 @@ extern "C" {
 *  Version
 **************************************/
 #define ZSTD_VERSION_MAJOR    0    /* for breaking interface changes  */
-#define ZSTD_VERSION_MINOR    0    /* for new (non-breaking) interface capabilities */
-#define ZSTD_VERSION_RELEASE  2    /* for tweaks, bug-fixes, or development */
+#define ZSTD_VERSION_MINOR    1    /* for new (non-breaking) interface capabilities */
+#define ZSTD_VERSION_RELEASE  0    /* for tweaks, bug-fixes, or development */
 #define ZSTD_VERSION_NUMBER  (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)
 unsigned ZSTD_versionNumber (void);
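With the formula above, v0.0.2 encoded as 2 and v0.1.0 encodes as 100. The sketch below copies the macros from this hunk and adds a hypothetical `split_version()` helper that reverses the formula, which could be used, for instance, to compare ZSTD_versionNumber() at run time with the header the program was built against.

```c
/* Values copied from the zstd.h hunk (v0.1.0). */
#define ZSTD_VERSION_MAJOR    0
#define ZSTD_VERSION_MINOR    1
#define ZSTD_VERSION_RELEASE  0
#define ZSTD_VERSION_NUMBER  (ZSTD_VERSION_MAJOR *100*100 + ZSTD_VERSION_MINOR *100 + ZSTD_VERSION_RELEASE)

/* Hypothetical helper: splits a version number back into its three parts. */
static void split_version(unsigned number, unsigned* major, unsigned* minor, unsigned* release)
{
    *major   = number / (100 * 100);
    *minor   = (number / 100) % 100;
    *release = number % 100;
}
```

The encoding stays unambiguous only while minor and release each remain below 100.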

@ -64,8 +64,8 @@ size_t ZSTD_decompress( void* dst, size_t maxOriginalSize,
 /*
 ZSTD_compress() :
    Compresses 'srcSize' bytes from buffer 'src' into buffer 'dst', of maximum size 'dstSize'.
-    Destination buffer should be sized to handle worst cases situations (input data not compressible).
-    Worst case size evaluation is provided by function ZSTD_compressBound().
+    Destination buffer must be already allocated.
+    Compression runs faster if maxDstSize >= ZSTD_compressBound(srcSize).
    return : the number of bytes written into buffer 'dst'
             or an error code if it fails (which can be tested using ZSTD_isError())

--- a/lib/zstd_static.h
+++ b/lib/zstd_static.h
@ -74,9 +74,9 @@ size_t ZSTD_decompressContinue(ZSTD_Dctx* dctx, void* dst, size_t maxDstSize, co
 **************************************/
 #define ZSTD_LIST_ERRORS(ITEM) \
        ITEM(ZSTD_OK_NoError) ITEM(ZSTD_ERROR_GENERIC) \
-        ITEM(ZSTD_ERROR_wrongMagicNumber) \
-        ITEM(ZSTD_ERROR_wrongSrcSize) ITEM(ZSTD_ERROR_maxDstSize_tooSmall) \
-        ITEM(ZSTD_ERROR_wrongLBlockSize) \
+        ITEM(ZSTD_ERROR_MagicNumber) \
+        ITEM(ZSTD_ERROR_SrcSize) ITEM(ZSTD_ERROR_maxDstSize_tooSmall) \
+        ITEM(ZSTD_ERROR_corruption) \
        ITEM(ZSTD_ERROR_maxCode)

 #define ZSTD_GENERATE_ENUM(ENUM) ENUM,
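ZSTD_LIST_ERRORS is an X-macro. The error list is written once and then expanded with different ITEM macros. Renaming an entry, as this hunk does for MagicNumber, SrcSize and corruption, therefore updates every expansion at once.

The sketch below uses hypothetical MY_* names. Only the enum expansion is visible in this diff. The string table and the my_errorName() lookup are assumptions about how such a list is typically consumed, not zstd's code.

```c
/* Hypothetical error list, following the ZSTD_LIST_ERRORS pattern:
 * the list is written once and expanded several times. */
#define MY_LIST_ERRORS(ITEM) \
        ITEM(MY_OK_NoError) ITEM(MY_ERROR_GENERIC) \
        ITEM(MY_ERROR_SrcSize) ITEM(MY_ERROR_corruption) \
        ITEM(MY_ERROR_maxCode)

/* First expansion: one enumerator per item. */
#define MY_GENERATE_ENUM(ENUM) ENUM,
typedef enum { MY_LIST_ERRORS(MY_GENERATE_ENUM) } my_errorCodes;

/* Second expansion (assumed): one string per item, in the same order. */
#define MY_GENERATE_STRING(STRING) #STRING,
static const char* const my_errorStrings[] = { MY_LIST_ERRORS(MY_GENERATE_STRING) };

/* Map an enumerator to its name; out-of-range codes get a fixed string. */
static const char* my_errorName(my_errorCodes code)
{
    if ((unsigned)code >= (unsigned)MY_ERROR_maxCode) return "unknown error code";
    return my_errorStrings[code];
}
```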
--- a/programs/.gitignore
+++ b/programs/.gitignore
@ -0,0 +1,31 @@
+# local binary (Makefile)
+zstd
+zstd32
+fullbench
+fullbench32
+fuzzer
+fuzzer32
+datagen
+
+# Object files
+*.o
+*.ko
+
+# Libraries
+*.lib
+*.a
+
+# Shared objects (inc. Windows DLLs)
+*.dll
+*.so
+*.so.*
+*.dylib
+
+# Executables
+*.exe
+*.out
+*.app
+
+# Visual solution files
+*.suo
+*.user
--- a/programs/Makefile
+++ b/programs/Makefile
@ -30,7 +30,7 @@
 # fullbench32: Same as fullbench, but forced to compile in 32-bits mode
 # ##########################################################################

-RELEASE?= v0.0.2
+RELEASE?= v0.1.0

 DESTDIR?=
 PREFIX ?= /usr
@ -61,7 +61,7 @@ default: zstd

 all: zstd zstd32 fullbench fullbench32 fuzzer fuzzer32 datagen

-zstd: $(ZSTDDIR)/zstd.c xxhash.c bench.c fileio.c zstdcli.c
+zstd  : $(ZSTDDIR)/zstd.c xxhash.c bench.c fileio.c zstdcli.c
 	$(CC)      $(FLAGS) $^ -o $@$(EXT)

 zstd32: $(ZSTDDIR)/zstd.c xxhash.c bench.c fileio.c zstdcli.c
@ -73,10 +73,10 @@ fullbench  : $(ZSTDDIR)/zstd.c datagen.c fullbench.c
 fullbench32: $(ZSTDDIR)/zstd.c datagen.c fullbench.c
 	$(CC) -m32 $(FLAGS) $^ -o $@$(EXT)

-fuzzer  : $(ZSTDDIR)/zstd.c xxhash.c fuzzer.c
+fuzzer  : $(ZSTDDIR)/zstd.c datagen.c xxhash.c fuzzer.c
 	$(CC)      $(FLAGS) $^ -o $@$(EXT)

-fuzzer32: $(ZSTDDIR)/zstd.c xxhash.c fuzzer.c
+fuzzer32: $(ZSTDDIR)/zstd.c datagen.c xxhash.c fuzzer.c
 	$(CC) -m32 $(FLAGS) $^ -o $@$(EXT)

 datagen : datagen.c datagencli.c
--- a/programs/bench.c
+++ b/programs/bench.c
@ -297,34 +297,37 @@ static int BMK_benchMem(void* srcBuffer, size_t srcSize, char* fileName, int cLe
            milliTime = BMK_GetMilliStart();
            while (BMK_GetMilliStart() == milliTime);
            milliTime = BMK_GetMilliStart();
-            while (BMK_GetMilliSpan(milliTime) < TIMELOOP)
+            for ( ; BMK_GetMilliSpan(milliTime) < TIMELOOP; nbLoops++)
            {
-                ZSTD_decompress(resultBuffer, srcSize, compressedBuffer, cSize);
-                nbLoops++;
+                size_t result = ZSTD_decompress(resultBuffer, srcSize, compressedBuffer, cSize);
+                if (ZSTD_isError(result))
+                {
+                    DISPLAY("\n!!! Decompression error !!! %s  !\n", ZSTD_getErrorName(result));
+                    break;
+                }
            }
            milliTime = BMK_GetMilliSpan(milliTime);

            if ((double)milliTime < fastestD*nbLoops) fastestD = (double)milliTime / nbLoops;
            DISPLAY("%1i-%-14.14s : %9i -> %9i (%5.2f%%),%7.1f MB/s ,%7.1f MB/s\r", loopNb, fileName, (int)srcSize, (int)cSize, ratio, (double)srcSize / fastestC / 1000., (double)srcSize / fastestD / 1000.);
-#endif

            /* CRC Checking */
            crcCheck = XXH64(resultBuffer, srcSize, 0);
            if (crcOrig!=crcCheck)
            {
-                unsigned i = 0;
+                unsigned i;
                DISPLAY("\n!!! WARNING !!! %14s : Invalid Checksum : %x != %x\n", fileName, (unsigned)crcOrig, (unsigned)crcCheck);
-                while (i<srcSize)
+                for (i=0; i<srcSize; i++)
                {
                    if (((BYTE*)srcBuffer)[i] != ((BYTE*)resultBuffer)[i])
                    {
                        printf("\nDecoding error at pos %u   \n", i);
                        break;
                    }
-                    i++;
                }
                break;
            }
+#endif
        }

        if (crcOrig == crcCheck)
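The CRC-failure path above scans for the first differing byte. Below, that scan is factored into a helper; first_mismatch() is a hypothetical name. The helper uses a size_t index. That avoids comparing an unsigned counter with the size_t srcSize, which would never reach positions beyond UINT_MAX on platforms where unsigned is 32 bits.

```c
#include <stddef.h>

/* Hypothetical helper: return the index of the first byte where a and b
 * differ, or n if the first n bytes are identical. */
static size_t first_mismatch(const void* a, const void* b, size_t n)
{
    const unsigned char* pa = (const unsigned char*)a;
    const unsigned char* pb = (const unsigned char*)b;
    size_t i;
    for (i = 0; i < n; i++)
        if (pa[i] != pb[i]) break;
    return i;
}
```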
--- a/programs/datagen.c
+++ b/programs/datagen.c
@ -153,7 +153,7 @@ void RDG_genBlock(void* buffer, size_t buffSize, size_t prefixSize, double match
        memset(buffPtr+pos, 0, size0);
        pos += size0;
        buffPtr[pos-1] = RDG_genChar(seed, lt);
-        return;
+        continue;
    }

    /* init */
@ -200,7 +200,7 @@ void RDG_genBuffer(void* buffer, size_t size, double matchProba, double litProba

 #define RDG_DICTSIZE  (32 KB)
 #define RDG_BLOCKSIZE (128 KB)
-void RDG_genOut(unsigned long long size, double matchProba, double litProba, unsigned seed)
+void RDG_genStdout(unsigned long long size, double matchProba, double litProba, unsigned seed)
 {
    BYTE* buff = (BYTE*)malloc(RDG_DICTSIZE + RDG_BLOCKSIZE);
    U64 total = 0;
--- a/programs/datagen.h
+++ b/programs/datagen.h
@ -26,15 +26,15 @@

 #include <stddef.h>   /* size_t */

-void RDG_genOut(unsigned long long size, double matchProba, double litProba, unsigned seed);
+void RDG_genStdout(unsigned long long size, double matchProba, double litProba, unsigned seed);
 void RDG_genBuffer(void* buffer, size_t size, double matchProba, double litProba, unsigned seed);
 /* RDG_genBuffer
   Generate 'size' bytes of compressible data into 'buffer'.
-   Compressibility can be controlled using 'matchProba'.
-   'LitProba' is optional, and affect variability of individual bytes. If litProba==0.0, default value is used.
+   Compressibility can be controlled using 'matchProba', which is a floating-point value between 0 and 1.
+   'litProba' is optional; it affects the variability of individual bytes. If litProba==0.0, a default value is used.
   Generated data pattern can be modified using different 'seed'.
-   If (matchProba, litProba and seed) are equal, the function always generate the same content.
+   For a given triplet (matchProba, litProba, seed), the function always generates the same content.

-   RDG_genOut
-   Same as RDG_genBuffer, but generate data towards stdout
+   RDG_genStdout
+   Same as RDG_genBuffer, but writes the generated data to stdout
 */
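The contract documented for RDG_genBuffer has two parts: matchProba drives compressibility, and an identical parameter triplet always yields identical output. A toy generator can illustrate both. This is NOT the datagen.c algorithm. toy_rand() and toy_genBuffer() are hypothetical, litProba is omitted, and each match copies a single earlier byte rather than a run.

```c
#include <stddef.h>

/* Toy PRNG (a plain LCG): the output depends only on the seed. */
static unsigned toy_rand(unsigned* state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7FFF;          /* 15 random bits */
}

/* Toy generator: with probability ~matchProba a byte repeats an earlier
 * byte (compressible), otherwise it is a random literal (noise). */
static void toy_genBuffer(unsigned char* buf, size_t size, double matchProba, unsigned seed)
{
    unsigned state = seed;
    const unsigned threshold = (unsigned)(matchProba * 32768);
    size_t pos;
    for (pos = 0; pos < size; pos++)
    {
        if (pos > 0 && toy_rand(&state) < threshold)
        {
            /* match: copy one earlier byte, offset in 1..pos */
            size_t offset = 1 + toy_rand(&state) % pos;
            buf[pos] = buf[pos - offset];
        }
        else
        {
            /* literal: one of 64 printable bytes starting at '0' */
            buf[pos] = (unsigned char)('0' + toy_rand(&state) % 64);
        }
    }
}
```

With matchProba at 1.0, every byte after the first is a copy, so the whole buffer collapses to a single repeated byte. With 0.0, it is pure literal noise.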
--- a/programs/datagencli.c
+++ b/programs/datagencli.c
@ -183,7 +183,7 @@ int main(int argc, char** argv)
    DISPLAYLEVEL(3, "Seed = %u \n", seed);
    if (proba!=COMPRESSIBILITY_DEFAULT) DISPLAYLEVEL(3, "Compressibility : %i%%\n", (U32)(proba*100));

-    RDG_genOut(size, proba, litProba, seed);
+    RDG_genStdout(size, proba, litProba, seed);
    DISPLAYLEVEL(1, "\n");

    return 0;
--- a/programs/fullbench.c
+++ b/programs/fullbench.c
@ -229,9 +229,7 @@ typedef struct
 static size_t g_cSize = 0;

 extern size_t ZSTD_getcBlockSize(const void* src, size_t srcSize, blockProperties_t* bpPtr);
-extern size_t ZSTD_decodeLiteralsBlock(void* ctx, void* dst, size_t maxDstSize, const BYTE** litPtr, const void* src, size_t srcSize);
-extern size_t ZSTD_decodeSeqHeaders(size_t* lastLLPtr, const BYTE** dumpsPtr, FSE_DTable* DTableLL, FSE_DTable* DTableML, FSE_DTable* DTableOffb, const void* src, size_t srcSize);
-
+extern size_t ZSTD_decodeSeqHeaders(int* nbSeq, const BYTE** dumpsPtr, FSE_DTable* DTableLL, FSE_DTable* DTableML, FSE_DTable* DTableOffb, const void* src, size_t srcSize);

 size_t local_ZSTD_compress(void* dst, size_t dstSize, void* buff2, const void* src, size_t srcSize)
 {
@ -245,12 +243,14 @@ size_t local_ZSTD_decompress(void* dst, size_t dstSize, void* buff2, const void*
    return ZSTD_decompress(dst, dstSize, buff2, g_cSize);
 }

+extern size_t ZSTD_decodeLiteralsBlock(void* ctx, void* dst, size_t maxDstSize, const BYTE** litStart, size_t* litSize, const void* src, size_t srcSize);
 size_t local_ZSTD_decodeLiteralsBlock(void* dst, size_t dstSize, void* buff2, const void* src, size_t srcSize)
 {
    U32 ctx[1<<12];
    const BYTE* ll;
+    size_t llSize;
    (void)src; (void)srcSize;
-    ZSTD_decodeLiteralsBlock(ctx, dst, dstSize, &ll, buff2, g_cSize);
+    ZSTD_decodeLiteralsBlock(ctx, dst, dstSize, &ll, &llSize, buff2, g_cSize);
    return (const BYTE*)dst + dstSize - ll;
 }

@ -258,9 +258,9 @@ size_t local_ZSTD_decodeSeqHeaders(void* dst, size_t dstSize, void* buff2, const
 {
    U32 DTableML[1<<11], DTableLL[1<<10], DTableOffb[1<<9];
    const BYTE* dumps;
-    size_t lastllSize;
+    int nbSeq;
    (void)src; (void)srcSize; (void)dst; (void)dstSize;
-    return ZSTD_decodeSeqHeaders(&lastllSize, &dumps, DTableLL, DTableML, DTableOffb, buff2, g_cSize);
+    return ZSTD_decodeSeqHeaders(&nbSeq, &dumps, DTableLL, DTableML, DTableOffb, buff2, g_cSize);
 }

 size_t local_conditionalNull(void* dst, size_t dstSize, void* buff2, const void* src, size_t srcSize)
--- a/programs/fuzzer.c
+++ b/programs/fuzzer.c
@ -47,6 +47,7 @@
 #include <sys/timeb.h>   /* timeb */
 #include <string.h>      /* strcmp */
 #include "zstd_static.h"
+#include "datagen.h"     /* RDG_genBuffer */
 #include "xxhash.h"      /* XXH64 */


@ -138,47 +139,7 @@ unsigned int FUZ_rand(unsigned int* src)
 }


-#define FUZ_RAND15BITS  (FUZ_rand(seed) & 0x7FFF)
-#define FUZ_RANDLENGTH  ( (FUZ_rand(seed) & 3) ? (FUZ_rand(seed) % 15) : (FUZ_rand(seed) % 510) + 15)
-static void FUZ_generateSynthetic(void* buffer, size_t bufferSize, double proba, U32* seed)
-{
-    BYTE* BBuffer = (BYTE*)buffer;
-    unsigned pos = 0;
-    U32 P32 = (U32)(32768 * proba);
-
-    // First Byte
-    BBuffer[pos++] = (BYTE)((FUZ_rand(seed) & 0x3F) + '0');
-
-    while (pos < bufferSize)
-    {
-        // Select : Literal (noise) or copy (within 64K)
-        if (FUZ_RAND15BITS < P32)
-        {
-            // Copy (within 64K)
-            size_t match, end;
-            size_t length = FUZ_RANDLENGTH + 4;
-            size_t offset = FUZ_RAND15BITS + 1;
-            if (offset > pos) offset = pos;
-            if (pos + length > bufferSize) length = bufferSize - pos;
-            match = pos - offset;
-            end = pos + length;
-            while (pos < end) BBuffer[pos++] = BBuffer[match++];
-        }
-        else
-        {
-            // Literal (noise)
-            size_t end;
-            size_t length = FUZ_RANDLENGTH;
-            if (pos + length > bufferSize) length = bufferSize - pos;
-            end = pos + length;
-            while (pos < end) BBuffer[pos++] = (BYTE)((FUZ_rand(seed) & 0x3F) + '0');
-        }
-    }
-}
-
-
-/*
-static unsigned FUZ_highbit(U32 v32)
+static unsigned FUZ_highbit32(U32 v32)
 {
    unsigned nbBits = 0;
    if (v32==0) return 0;
@ -189,7 +150,6 @@ static unsigned FUZ_highbit(U32 v32)
    }
    return nbBits;
 }
-*/


 static int basicUnitTests(U32 seed, double compressibility)
@ -202,7 +162,7 @@ static int basicUnitTests(U32 seed, double compressibility)
    size_t result, cSize;
    U32 testNb=0;

-    // Create compressible test buffer
+    /* Create compressible test buffer */
    CNBuffer = malloc(COMPRESSIBLE_NOISE_LENGTH);
    compressedBuffer = malloc(ZSTD_compressBound(COMPRESSIBLE_NOISE_LENGTH));
    decodedBuffer = malloc(COMPRESSIBLE_NOISE_LENGTH);
@ -212,9 +172,9 @@ static int basicUnitTests(U32 seed, double compressibility)
        testResult = 1;
        goto _end;
    }
-    FUZ_generateSynthetic(CNBuffer, COMPRESSIBLE_NOISE_LENGTH, compressibility, &randState);
+    RDG_genBuffer(CNBuffer, COMPRESSIBLE_NOISE_LENGTH, compressibility, 0., randState);

-    // Basic tests
+    /* Basic tests */
    DISPLAYLEVEL(4, "test%3i : compress %u bytes : ", testNb++, COMPRESSIBLE_NOISE_LENGTH);
    result = ZSTD_compress(compressedBuffer, ZSTD_compressBound(COMPRESSIBLE_NOISE_LENGTH), CNBuffer, COMPRESSIBLE_NOISE_LENGTH);
    if (ZSTD_isError(result)) goto _output_error;
@ -239,37 +199,36 @@ static int basicUnitTests(U32 seed, double compressibility)
    DISPLAYLEVEL(4, "test%3i : decompress with 1 missing byte : ", testNb++);
    result = ZSTD_decompress(decodedBuffer, COMPRESSIBLE_NOISE_LENGTH, compressedBuffer, cSize-1);
    if (!ZSTD_isError(result)) goto _output_error;
-    if (result != (size_t)-ZSTD_ERROR_wrongSrcSize) goto _output_error;
+    if (result != (size_t)-ZSTD_ERROR_SrcSize) goto _output_error;
    DISPLAYLEVEL(4, "OK \n");

    DISPLAYLEVEL(4, "test%3i : decompress with 1 too much byte : ", testNb++);
    result = ZSTD_decompress(decodedBuffer, COMPRESSIBLE_NOISE_LENGTH, compressedBuffer, cSize+1);
    if (!ZSTD_isError(result)) goto _output_error;
-    if (result != (size_t)-ZSTD_ERROR_wrongSrcSize) goto _output_error;
+    if (result != (size_t)-ZSTD_ERROR_SrcSize) goto _output_error;
    DISPLAYLEVEL(4, "OK \n");

    /* Decompression defense tests */
    DISPLAYLEVEL(4, "test%3i : Check input length for magic number : ", testNb++);
    result = ZSTD_decompress(decodedBuffer, COMPRESSIBLE_NOISE_LENGTH, CNBuffer, 3);
    if (!ZSTD_isError(result)) goto _output_error;
-    if (result != (size_t)-ZSTD_ERROR_wrongSrcSize) goto _output_error;
+    if (result != (size_t)-ZSTD_ERROR_SrcSize) goto _output_error;
    DISPLAYLEVEL(4, "OK \n");

    DISPLAYLEVEL(4, "test%3i : Check magic Number : ", testNb++);
    ((char*)(CNBuffer))[0] = 1;
    result = ZSTD_decompress(decodedBuffer, COMPRESSIBLE_NOISE_LENGTH, CNBuffer, 4);
    if (!ZSTD_isError(result)) goto _output_error;
-    if (result != (size_t)-ZSTD_ERROR_wrongMagicNumber) goto _output_error;
    DISPLAYLEVEL(4, "OK \n");

    /* long rle test */
    {
        size_t sampleSize = 0;
        DISPLAYLEVEL(4, "test%3i : Long RLE test : ", testNb++);
-        FUZ_generateSynthetic(CNBuffer, sampleSize, compressibility, &randState);
+        RDG_genBuffer(CNBuffer, sampleSize, compressibility, 0., randState);
        memset((char*)CNBuffer+sampleSize, 'B', 256 KB - 1);
        sampleSize += 256 KB - 1;
-        FUZ_generateSynthetic((char*)CNBuffer+sampleSize, 96 KB, compressibility, &randState);
+        RDG_genBuffer((char*)CNBuffer+sampleSize, 96 KB, compressibility, 0., randState);
        sampleSize += 96 KB;
        cSize = ZSTD_compress(compressedBuffer, ZSTD_compressBound(sampleSize), CNBuffer, sampleSize);
        if (ZSTD_isError(cSize)) goto _output_error;
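The tests in this hunk compare a failed result against (size_t)-ZSTD_ERROR_SrcSize. This suggests errors are returned inside the size_t result as negated error codes, occupying the top of the size_t range.

Below is a minimal sketch of that convention under this assumption. err_make(), err_isError() and err_get() are hypothetical, and they are not claimed to match ZSTD_isError() or ZSTD_getErrorName().

```c
#include <stddef.h>

/* Hypothetical error codes; 0 is reserved for "no error". */
typedef enum { ERR_none = 0, ERR_generic, ERR_srcSize, ERR_maxCode } err_code;

/* Encode an error as its negated code, i.e. one of the largest size_t values.
 * The cast to int matters: it makes the negation signed before conversion. */
static size_t err_make(err_code e) { return (size_t)-(int)e; }

/* Any value above (size_t)-ERR_maxCode is an error, not a byte count. */
static int err_isError(size_t result) { return result > (size_t)-(int)ERR_maxCode; }

/* Recover the positive error code from a failed result. */
static err_code err_get(size_t result) { return (err_code)(0 - result); }
```

Real sizes never come close to SIZE_MAX. The top few values can therefore be reserved for errors without ambiguity, and a single return value carries either a size or an error.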
@ -314,6 +273,7 @@ static const U32 maxSampleLog = 22;

 int fuzzerTests(U32 seed, U32 nbTests, unsigned startTest, double compressibility)
 {
+    BYTE* cNoiseBuffer[5];
    BYTE* srcBuffer;
    BYTE* cBuffer;
    BYTE* dstBuffer;
@ -323,54 +283,172 @@ int fuzzerTests(U32 seed, U32 nbTests, unsigned startTest, double compressibilit
    U32 result = 0;
    U32 testNb = 0;
    U32 coreSeed = seed, lseed = 0;
-    (void)startTest; (void)compressibility;

    /* allocation */
-    srcBuffer = (BYTE*)malloc (srcBufferSize);
+    cNoiseBuffer[0] = (BYTE*)malloc (srcBufferSize);
+    cNoiseBuffer[1] = (BYTE*)malloc (srcBufferSize);
+    cNoiseBuffer[2] = (BYTE*)malloc (srcBufferSize);
+    cNoiseBuffer[3] = (BYTE*)malloc (srcBufferSize);
+    cNoiseBuffer[4] = (BYTE*)malloc (srcBufferSize);
    dstBuffer = (BYTE*)malloc (dstBufferSize);
    cBuffer   = (BYTE*)malloc (cBufferSize);
-    CHECK (!srcBuffer || !dstBuffer || !cBuffer, "Not enough memory, fuzzer tests cancelled");
+    CHECK (!cNoiseBuffer[0] || !cNoiseBuffer[1] || !cNoiseBuffer[2] || !cNoiseBuffer[3] || !cNoiseBuffer[4] || !dstBuffer || !cBuffer,
+           "Not enough memory, fuzzer tests cancelled");

-    /* Create initial sample */
-    FUZ_generateSynthetic(srcBuffer, srcBufferSize, 0.50, &coreSeed);
+    /* Create initial samples */
+    RDG_genBuffer(cNoiseBuffer[0], srcBufferSize, 0.00, 0., coreSeed);    /* pure noise */
+    RDG_genBuffer(cNoiseBuffer[1], srcBufferSize, 0.05, 0., coreSeed);    /* barely compressible */
+    RDG_genBuffer(cNoiseBuffer[2], srcBufferSize, compressibility, 0., coreSeed);
+    RDG_genBuffer(cNoiseBuffer[3], srcBufferSize, 0.95, 0., coreSeed);    /* highly compressible */
+    RDG_genBuffer(cNoiseBuffer[4], srcBufferSize, 1.00, 0., coreSeed);    /* sparse content */
+    srcBuffer = cNoiseBuffer[2];

    /* catch up testNb */
-    for (testNb=0; testNb < startTest; testNb++)
+    for (testNb=1; testNb < startTest; testNb++)
        FUZ_rand(&coreSeed);

    /* test loop */
-    for (testNb=startTest; testNb < nbTests; testNb++)
+    for ( ; testNb <= nbTests; testNb++ )
    {
        size_t sampleSize, sampleStart;
        size_t cSize, dSize, dSupSize;
-        U32 sampleSizeLog;
+        U32 sampleSizeLog, buffNb;
        U64 crcOrig, crcDest;

        /* init */
        DISPLAYUPDATE(2, "\r%6u/%6u   ", testNb, nbTests);
        FUZ_rand(&coreSeed);
        lseed = coreSeed ^ prime1;
+        buffNb = FUZ_rand(&lseed) & 127;
+        if (buffNb & 7) buffNb=2;
+        else
+        {
+            buffNb >>= 3;
+            if (buffNb & 7)
+            {
+                const U32 tnb[2] = { 1, 3 };
+                buffNb = tnb[buffNb >> 3];
+            }
+            else
+            {
+                const U32 tnb[2] = { 0, 4 };
+                buffNb = tnb[buffNb >> 3];
+            }
+        }
+        srcBuffer = cNoiseBuffer[buffNb];
        sampleSizeLog = FUZ_rand(&lseed) % maxSampleLog;
-        sampleSize = (size_t)1<<sampleSizeLog;
+        sampleSize = (size_t)1 << sampleSizeLog;
        sampleSize += FUZ_rand(&lseed) & (sampleSize-1);
        sampleStart = FUZ_rand(&lseed) % (srcBufferSize - sampleSize);
        crcOrig = XXH64(srcBuffer + sampleStart, sampleSize, 0);

-        /* compression tests*/
+        /* compression test */
        cSize = ZSTD_compress(cBuffer, cBufferSize, srcBuffer + sampleStart, sampleSize);
        CHECK(ZSTD_isError(cSize), "ZSTD_compress failed");

-        /* decompression tests*/
+        /* compression failure test : too small dest buffer */
+        if (cSize > 3)
+        {
+            size_t errorCode;
+            const size_t missing = (FUZ_rand(&lseed) % (cSize-2)) + 1;   /* no problem, as cSize > 3 ensures cSize-2 > 0 */
+            const size_t tooSmallSize = cSize - missing;
+            static const U32 endMark = 0x4DC2B1A9;
+            U32 endCheck;
+            memcpy(dstBuffer+tooSmallSize, &endMark, 4);
+            errorCode = ZSTD_compress(dstBuffer, tooSmallSize, srcBuffer + sampleStart, sampleSize);
+            CHECK(!ZSTD_isError(errorCode), "ZSTD_compress should have failed ! (buffer too small)");
+            memcpy(&endCheck, dstBuffer+tooSmallSize, 4);
+            CHECK(endCheck != endMark, "ZSTD_compress : dst buffer overflow");
+        }
+
+        /* successful decompression tests */
        dSupSize = (FUZ_rand(&lseed) & 1) ? 0 : (FUZ_rand(&lseed) & 31) + 1;
        dSize = ZSTD_decompress(dstBuffer, sampleSize + dSupSize, cBuffer, cSize);
        CHECK(dSize != sampleSize, "ZSTD_decompress failed (%s)", ZSTD_getErrorName(dSize));
        crcDest = XXH64(dstBuffer, sampleSize, 0);
        CHECK(crcOrig != crcDest, "dstBuffer corrupted (pos %u / %u)", (U32)findDiff(srcBuffer+sampleStart, dstBuffer, sampleSize), (U32)sampleSize);
+
+        /* truncated src decompression test */
+        {
+            size_t errorCode;
+            const size_t missing = (FUZ_rand(&lseed) % (cSize-2)) + 1;   /* no problem, as cSize > 4 (frameHeaderSize) */
+            const size_t tooSmallSize = cSize - missing;
+            void* cBufferTooSmall = malloc(tooSmallSize);   /* valgrind will catch overflows */
+            CHECK(cBufferTooSmall == NULL, "not enough memory !");
+            memcpy(cBufferTooSmall, cBuffer, tooSmallSize);
+            errorCode = ZSTD_decompress(dstBuffer, dstBufferSize, cBufferTooSmall, tooSmallSize);
+            CHECK(!ZSTD_isError(errorCode), "ZSTD_decompress should have failed ! (truncated src buffer)");
+            free(cBufferTooSmall);
+        }
+
+        /* too small dst decompression test */
+        if (sampleSize > 3)
+        {
+            size_t errorCode;
+            const size_t missing = (FUZ_rand(&lseed) % (sampleSize-2)) + 1;   /* no problem, as sampleSize > 3 ensures sampleSize-2 > 0 */
+            const size_t tooSmallSize = sampleSize - missing;
+            static const BYTE token = 0xA9;
+            dstBuffer[tooSmallSize] = token;
+            errorCode = ZSTD_decompress(dstBuffer, tooSmallSize, cBuffer, cSize);
+            CHECK(!ZSTD_isError(errorCode), "ZSTD_decompress should have failed : %u > %u (dst buffer too small)", (U32)errorCode, (U32)tooSmallSize);
+            CHECK(dstBuffer[tooSmallSize] != token, "ZSTD_decompress : dst buffer overflow");
+        }
+
+        /* noisy src decompression test */
+        if (cSize > 6)
+        {
+            const U32 maxNbBits = FUZ_highbit32((U32)(cSize-4));
+            size_t pos = 4;   /* preserve magic number (too easy to detect) */
+            U32 nbBits = FUZ_rand(&lseed) % maxNbBits;
+            size_t mask = (1<<nbBits) - 1;
+            size_t skipLength = FUZ_rand(&lseed) & mask;
+            pos += skipLength;
+
+            while (pos < cSize)
+            {
+                /* add noise */
+                size_t noiseStart, noiseLength;
+                nbBits = FUZ_rand(&lseed) % maxNbBits;
+                if (nbBits>0) nbBits--;
+                mask = (1<<nbBits) - 1;
+                noiseLength = (FUZ_rand(&lseed) & mask) + 1;
+                if ( pos+noiseLength > cSize ) noiseLength = cSize-pos;
+                noiseStart = FUZ_rand(&lseed) % (srcBufferSize - noiseLength);
+                memcpy(cBuffer + pos, srcBuffer + noiseStart, noiseLength);
+                pos += noiseLength;
+
+                /* keep some original src */
+                nbBits = FUZ_rand(&lseed) % maxNbBits;
+                mask = (1<<nbBits) - 1;
+                skipLength = FUZ_rand(&lseed) & mask;
+                pos += skipLength;
+            }
+
+            /* decompress noisy source */
+            {
+                U32 noiseSrc = FUZ_rand(&lseed) % 5;
+                const U32 endMark = 0xA9B1C3D6;
+                U32 endCheck;
+                size_t errorCode;
+                srcBuffer = cNoiseBuffer[noiseSrc];
+                memcpy(dstBuffer+sampleSize, &endMark, 4);
+                errorCode = ZSTD_decompress(dstBuffer, sampleSize, cBuffer, cSize);
+                /* result *may* be an unlikely success, but even then, it must strictly respect dest buffer boundaries */
+                CHECK((!ZSTD_isError(errorCode)) && (errorCode>sampleSize),
+                      "ZSTD_decompress on noisy src : result is too large : %u > %u (dst buffer)", (U32)errorCode, (U32)sampleSize);
+                memcpy(&endCheck, dstBuffer+sampleSize, 4);
+                CHECK(endMark!=endCheck, "ZSTD_decompress on noisy src : dst buffer overflow");
+            }
+        }
    }
    DISPLAY("\rAll fuzzer tests completed   \n");

 _cleanup:
-    free(srcBuffer);
+    free(cNoiseBuffer[0]);
+    free(cNoiseBuffer[1]);
+    free(cNoiseBuffer[2]);
+    free(cNoiseBuffer[3]);
+    free(cNoiseBuffer[4]);
    free(cBuffer);
    free(dstBuffer);
    return result;
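fuzzerTests() now picks one of five reference buffers per test. The buffNb selection can be re-implemented as a pure function of the 7-bit random draw; select_buffer() and buffer_histogram() are hypothetical names. Written that way, the resulting distribution is easy to check by enumeration.

```c
/* Same branching as the buffNb selection in fuzzerTests(), but taking the
 * random draw as a parameter instead of calling FUZ_rand(). */
static unsigned select_buffer(unsigned r)
{
    unsigned buffNb = r & 127;
    if (buffNb & 7) return 2;          /* 7 draws out of 8 */
    buffNb >>= 3;                      /* now 0..15 */
    if (buffNb & 7)
    {
        static const unsigned tnb[2] = { 1, 3 };
        return tnb[buffNb >> 3];
    }
    else
    {
        static const unsigned tnb[2] = { 0, 4 };
        return tnb[buffNb >> 3];
    }
}

/* Count how often each of the 5 buffers is chosen over all 128 draws. */
static void buffer_histogram(unsigned counts[5])
{
    unsigned r, i;
    for (i = 0; i < 5; i++) counts[i] = 0;
    for (r = 0; r < 128; r++) counts[select_buffer(r)]++;
}
```

Over all 128 draws, the counts are:

- buffer 2 (the user-selected compressibility): 112 times (87.5%)
- buffers 1 and 3 (barely and highly compressible): 7 times each
- buffers 0 and 4 (pure noise and sparse content): once each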
@ -393,8 +471,9 @@ int FUZ_usage(char* programName)
    DISPLAY( " -i#    : Nb of tests (default:%u) \n", nbTestsDefault);
    DISPLAY( " -s#    : Select seed (default:prompt user)\n");
    DISPLAY( " -t#    : Select starting test number (default:0)\n");
-    DISPLAY( " -p#    : Select compressibility in %% (default:%i%%)\n", FUZ_COMPRESSIBILITY_DEFAULT);
+    DISPLAY( " -P#    : Select compressibility in %% (default:%i%%)\n", FUZ_COMPRESSIBILITY_DEFAULT);
    DISPLAY( " -v     : verbose\n");
+    DISPLAY( " -p     : pause at the end\n");
    DISPLAY( " -h     : display help and exit\n");
    return 0;
 }
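The too-small-buffer tests in fuzzerTests() detect overflows with a canary. An end mark is written just past the advertised capacity and checked after the call. Below is a self-contained sketch of the pattern, where copy_bounded() is a hypothetical stand-in for ZSTD_compress() or ZSTD_decompress().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded writer: copies src into dst,
 * or returns (size_t)-1 without writing if dst is too small. */
static size_t copy_bounded(void* dst, size_t dstCapacity, const void* src, size_t srcSize)
{
    if (srcSize > dstCapacity) return (size_t)-1;
    memcpy(dst, src, srcSize);
    return srcSize;
}

/* Canary check in the style of the fuzzer: plant an end mark right after
 * 'capacity' bytes, call the function, then verify the mark survived.
 * 'buffer' must hold at least capacity + sizeof(unsigned) bytes. */
static int respects_capacity(unsigned char* buffer, size_t capacity, const void* src, size_t srcSize)
{
    static const unsigned endMark = 0x4DC2B1A9;
    unsigned endCheck;
    memcpy(buffer + capacity, &endMark, sizeof(endMark));
    (void)copy_bounded(buffer, capacity, src, srcSize);
    memcpy(&endCheck, buffer + capacity, sizeof(endCheck));
    return endCheck == endMark;
}
```

A canary only catches writes that actually reach it. The truncated-src test takes a complementary approach: it allocates an exact-size input buffer so that valgrind can flag out-of-bounds reads.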
--- a/programs/xxhash.c
+++ b/programs/xxhash.c
@ -35,13 +35,26 @@ You can contact the author at :
 /**************************************
 *  Tuning parameters
 **************************************/
-/* Unaligned memory access is automatically enabled for "common" CPU, such as x86.
- * For others CPU, the compiler will be more cautious, and insert extra code to ensure aligned access is respected.
- * If you know your target CPU supports unaligned memory access, you want to force this option manually to improve performance.
- * You can also enable this parameter if you know your input data will always be aligned (boundaries of 4, for U32).
+/* XXH_FORCE_MEMORY_ACCESS
+ * By default, access to unaligned memory is controlled by `memcpy()`, which is safe and portable.
+ * Unfortunately, on some target/compiler combinations, the generated assembly is sub-optimal.
+ * The switch below allows selecting a different access method for improved performance.
+ * Method 0 (default) : use `memcpy()`. Safe and portable.
+ * Method 1 : `__packed` statement. It depends on a compiler extension (i.e., not portable).
+ *            This method is safe if your compiler supports it, and *generally* as fast or faster than `memcpy`.
+ * Method 2 : direct access. This method is portable but violates the C standard.
+ *            It can generate buggy code on targets whose generated assembly depends on alignment.
+ *            But in some circumstances, it's the only known way to get the most performance (e.g., GCC + ARMv6).
+ * See http://stackoverflow.com/a/32095106/646947 for details.
+ * Prefer these methods in priority order (0 > 1 > 2)
 */
-#if defined(__ARM_FEATURE_UNALIGNED) || defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)
-#  define XXH_USE_UNALIGNED_ACCESS 1
+#ifndef XXH_FORCE_MEMORY_ACCESS   /* can be defined externally, on command line for example */
+#  if defined(__GNUC__) && ( defined(__ARM_ARCH_6__) || defined(__ARM_ARCH_6J__) || defined(__ARM_ARCH_6K__) || defined(__ARM_ARCH_6Z__) || defined(__ARM_ARCH_6ZK__) || defined(__ARM_ARCH_6T2__) )
+#    define XXH_FORCE_MEMORY_ACCESS 2
+#  elif defined(__INTEL_COMPILER) || \
+  (defined(__GNUC__) && ( defined(__ARM_ARCH_7__) || defined(__ARM_ARCH_7A__) || defined(__ARM_ARCH_7R__) || defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7S__) ))
+#    define XXH_FORCE_MEMORY_ACCESS 1
+#  endif
 #endif

 /* XXH_ACCEPT_NULL_INPUT_POINTER :
@ -55,12 +68,21 @@ You can contact the author at :
 * By default, xxHash library provides endian-independant Hash values, based on little-endian convention.
 * Results are therefore identical for little-endian and big-endian CPU.
 * This comes at a performance cost for big-endian CPU, since some swapping is required to emulate little-endian format.
- * Should endian-independance be of no importance for your application, you may set the #define below to 1.
- * It will improve speed for Big-endian CPU.
+ * Should endian-independence be of no importance for your application, you may set the #define below to 1,
+ * to improve speed for Big-endian CPU.
 * This option has no impact on Little_Endian CPU.
 */
 #define XXH_FORCE_NATIVE_FORMAT 0

+/* XXH_USELESS_ALIGN_BRANCH :
+ * This is a minor performance trick, only useful with lots of very small keys.
+ * It skips the aligned/unaligned test, because performance is the same either way.
+ * It saves one initial branch per hash.
+ */
+#if defined(__i386) || defined(_M_IX86) || defined(__x86_64__) || defined(_M_X64)
+#  define XXH_USELESS_ALIGN_BRANCH 1
+#endif
+

 /**************************************
 *  Compiler Specific Options
@ -113,20 +135,43 @@ static void* XXH_memcpy(void* dest, const void* src, size_t size) { return memcp
  typedef unsigned long long U64;
 #endif

+
+#if (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS==2))
+
+/* Force direct memory access. Only works on CPUs which support unaligned memory access in hardware */
+static U32 XXH_read32(const void* memPtr) { return *(const U32*) memPtr; }
+static U64 XXH_read64(const void* memPtr) { return *(const U64*) memPtr; }
+
+#elif (defined(XXH_FORCE_MEMORY_ACCESS) && (XXH_FORCE_MEMORY_ACCESS==1))
+
+/* __packed instructions are safer, but compiler-specific, hence potentially problematic for some compilers */
+/* currently only defined for gcc and icc */
+typedef union { U32 u32; U64 u64; } __attribute__((packed)) unalign;
+
+static U32 XXH_read32(const void* ptr) { return ((const unalign*)ptr)->u32; }
+static U64 XXH_read64(const void* ptr) { return ((const unalign*)ptr)->u64; }
+
+#else
+
+/* portable and safe solution. Generally efficient.
+ * see : http://stackoverflow.com/a/32095106/646947
+ */
+
 static U32 XXH_read32(const void* memPtr)
 {
-    U32 val32;
-    memcpy(&val32, memPtr, 4);
-    return val32;
+    U32 val;
+    memcpy(&val, memPtr, sizeof(val));
+    return val;
 }

 static U64 XXH_read64(const void* memPtr)
 {
-    U64 val64;
-    memcpy(&val64, memPtr, 8);
-    return val64;
+    U64 val;
+    memcpy(&val, memPtr, sizeof(val));
+    return val;
 }

+#endif   /* XXH_FORCE_MEMORY_ACCESS */
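Method 0 in the XXH_FORCE_MEMORY_ACCESS comment reads through memcpy(). The sketch below shows that approach next to an endian-independent reference that assembles the bytes manually. read32_memcpy() and read32_le_bytes() are hypothetical names, and the sketch uses C99 <stdint.h> types instead of xxHash's own U32.

```c
#include <stdint.h>
#include <string.h>

/* memcpy()-based read: well-defined at any alignment. Whether it becomes a
 * single load instruction depends on the compiler and target. */
static uint32_t read32_memcpy(const void* memPtr)
{
    uint32_t val;
    memcpy(&val, memPtr, sizeof(val));
    return val;
}

/* Endian-independent reference: assemble 4 bytes as a little-endian value. */
static uint32_t read32_le_bytes(const void* memPtr)
{
    const unsigned char* p = (const unsigned char*)memPtr;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Both functions are safe on unaligned pointers. The difference lies only in the generated code, which is what the XXH_FORCE_MEMORY_ACCESS switch lets users tune.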


 /******************************************
@ -175,8 +220,10 @@ static U64 XXH_swap64 (U64 x)
 *  Architecture Macros
 ***************************************/
 typedef enum { XXH_bigEndian=0, XXH_littleEndian=1 } XXH_endianess;
-#ifndef XXH_CPU_LITTLE_ENDIAN   /* XXH_CPU_LITTLE_ENDIAN can be defined externally, for example using a compiler switch */
-static const int one = 1;
+
+/* XXH_CPU_LITTLE_ENDIAN can be defined externally, for example on the compiler command line */
+#ifndef XXH_CPU_LITTLE_ENDIAN
+    static const int one = 1;
 #   define XXH_CPU_LITTLE_ENDIAN   (*(const char*)(&one))
 #endif
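XXH_CPU_LITTLE_ENDIAN falls back to inspecting the first byte in memory of an int set to 1. Combined with a byte swap, this is enough to read words in the little-endian convention that xxHash hashes by default. The helper names below are hypothetical. swap32() plays the role of XXH_swap32 but is not its code.

```c
#include <stdint.h>

/* Runtime detection, same trick as XXH_CPU_LITTLE_ENDIAN. */
static int cpu_is_little_endian(void)
{
    static const int one = 1;
    return *(const char*)(&one);
}

/* Portable 32-bit byte swap. */
static uint32_t swap32(uint32_t x)
{
    return ((x << 24) & 0xff000000u) | ((x <<  8) & 0x00ff0000u) |
           ((x >>  8) & 0x0000ff00u) | ((x >> 24) & 0x000000ffu);
}

/* Turn a natively loaded word into its little-endian interpretation.
 * Big-endian CPUs pay for a swap here; per the comment above,
 * XXH_FORCE_NATIVE_FORMAT=1 trades endian-independence for skipping it. */
static uint32_t to_little_endian_value(uint32_t native)
{
    return cpu_is_little_endian() ? native : swap32(native);
}
```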

@ -315,7 +362,7 @@ FORCE_INLINE U32 XXH32_endian_align(const void* input, size_t len, U32 seed, XXH
 }


-unsigned XXH32 (const void* input, size_t len, unsigned seed)
+unsigned int XXH32 (const void* input, size_t len, unsigned int seed)
 {
 #if 0
    /* Simple version, good for code maintenance, but unfortunately slow for small inputs */
@ -326,7 +373,7 @@ unsigned XXH32 (const void* input, size_t len, unsigned seed)
 #else
    XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN;

-#  if !defined(XXH_USE_UNALIGNED_ACCESS)
+#  if !defined(XXH_USELESS_ALIGN_BRANCH)
    if ((((size_t)input) & 3) == 0)   /* Input is 4-bytes aligned, leverage the speed benefit */
    {
        if ((endian_detected==XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT)
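XXH_USELESS_ALIGN_BRANCH is defined only on x86/x64. Elsewhere, XXH32 and XXH64 test the input address before choosing the aligned fast path. The test reduces to masking the low bits of the address. The sketch below uses a hypothetical is_aligned() helper and casts to uintptr_t rather than the size_t used in xxhash.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Is 'p' a multiple of 'align'? 'align' must be a power of two
 * (4 for the XXH32 check, 8 for the XXH64 check). */
static int is_aligned(const void* p, size_t align)
{
    return (((uintptr_t)p) & (align - 1)) == 0;
}
```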
@ -466,7 +513,7 @@ unsigned long long XXH64 (const void* input, size_t len, unsigned long long seed
 #else
    XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN;

-#  if !defined(XXH_USE_UNALIGNED_ACCESS)
+#  if !defined(XXH_USELESS_ALIGN_BRANCH)
    if ((((size_t)input) & 7)==0)   /* Input is aligned, let's leverage the speed advantage */
    {
        if ((endian_detected==XXH_littleEndian) || XXH_FORCE_NATIVE_FORMAT)
@ -538,7 +585,7 @@ XXH_errorcode XXH64_freeState(XXH64_state_t* statePtr)

 /*** Hash feed ***/

-XXH_errorcode XXH32_reset(XXH32_state_t* state_in, U32 seed)
+XXH_errorcode XXH32_reset(XXH32_state_t* state_in, unsigned int seed)
 {
    XXH_istate32_t* state = (XXH_istate32_t*) state_in;
    state->seed = seed;
@ -708,7 +755,7 @@ FORCE_INLINE U32 XXH32_digest_endian (const XXH32_state_t* state_in, XXH_endiane
 }


-U32 XXH32_digest (const XXH32_state_t* state_in)
+unsigned int XXH32_digest (const XXH32_state_t* state_in)
 {
    XXH_endianess endian_detected = (XXH_endianess)XXH_CPU_LITTLE_ENDIAN;

--- a/visual/2012/fuzzer/fuzzer.vcxproj
+++ b/visual/2012/fuzzer/fuzzer.vcxproj
@ -161,6 +161,7 @@
  <ItemGroup>
    <ClCompile Include="..\..\..\lib\fse.c" />
    <ClCompile Include="..\..\..\lib\zstd.c" />
+    <ClCompile Include="..\..\..\programs\datagen.c" />
    <ClCompile Include="..\..\..\programs\fuzzer.c" />
    <ClCompile Include="..\..\..\programs\xxhash.c" />
  </ItemGroup>
@ -169,6 +170,7 @@
    <ClInclude Include="..\..\..\lib\fse_static.h" />
    <ClInclude Include="..\..\..\lib\zstd.h" />
    <ClInclude Include="..\..\..\lib\zstd_static.h" />
+    <ClInclude Include="..\..\..\programs\datagen.h" />
    <ClInclude Include="..\..\..\programs\xxhash.h" />
  </ItemGroup>
  <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
--- a/visual/2012/fuzzer/fuzzer.vcxproj.filters
+++ b/visual/2012/fuzzer/fuzzer.vcxproj.filters
@ -27,6 +27,9 @@
    <ClCompile Include="..\..\..\programs\xxhash.c">
      <Filter>Fichiers sources</Filter>
    </ClCompile>
+    <ClCompile Include="..\..\..\programs\datagen.c">
+      <Filter>Fichiers sources</Filter>
+    </ClCompile>
  </ItemGroup>
  <ItemGroup>
    <ClInclude Include="..\..\..\lib\fse.h">
@ -44,5 +47,8 @@
    <ClInclude Include="..\..\..\programs\xxhash.h">
      <Filter>Fichiers d%27en-tête</Filter>
    </ClInclude>
+    <ClInclude Include="..\..\..\programs\datagen.h">
+      <Filter>Fichiers d%27en-tête</Filter>
+    </ClInclude>
  </ItemGroup>
 </Project>