qt5base-lts/tests/auto/corelib/tools/collections
Lars Knoll 5b7c3e31b5 New QHash implementation
A brand new QHash implementation using a faster and more memory efficient data
structure than the old QHash.

A new implementation for QHash. Instead of a node based approach as the old
QHash, this implementation now uses a two stage lookup table. The total
amount of buckets in the table are divided into spans of 128 entries.
Inside each span, we use an array of chars to index into a storage area
for the span.

The storage area for each span is a simple array, that gets (re-)allocated
with size increments of 16 items. This gives an average memory overhead of
8*sizeof(struct{ Key; Value; }) + 128*sizeof(char) + 16 for each span.

To give good performance and avoid too many collisions, the array keeps its
load factor between .25 and .5 (and grows and rehashes if the load factor goes
above .5).

This design allows us to keep the memory overhead of the Hash very small, while
at the same time giving very good performance. The calculated overhead for a
QHash<int, int> comes to 1.7-3.3 bytes per entry and to 2.2-4.3 bytes for
a QHash<ptr, ptr>.

The new implementation also completely splits the QHash and QMultiHash classes.

One behavioral change to note is that the new QHash implementation will not
provide stable references to nodes in the hash when the table needs to grow.

Benchmarking using https://github.com/Tessil/hash-table-shootout shows
very nice performance compared to many different hash table implementation.
Numbers shown below are for a hash<int64, int64> with 1 million entries. These
numbers scale nicely (mostly in a linear fashion with some variation due to
varying load factors) to smaller and larger tables. All numbers are in seconds,
measured with gcc on Linux:

Hash table              random     random     random  random reads   full
                        insertion  insertion  full    full   after   iteration
                                   (reserved) deletes reads  deletes
------------------------------------------------------------------------------
std::unordered_map      0,3842     0,1969     0,4511  0,1300 0,1169  0,0708
google::dense_hash_map  0,1091     0,0846     0,0550  0,0452 0,0754  0,0160
google::sparse_hash_map 0,2888     0,1582     0,0948  0,1020 0,1348  0,0112
tsl::sparse_map         0,1487     0,1013     0,0735  0,0448 0,0505  0,0042
old QHash               0,2886     0,1798     0,5065  0,0840 0,0717  0,1387
new QHash               0,0940     0,0714     0,1494  0,0579 0,0449  0,0146

Numbers for hash<std::string, int64>, with the string having 15 characters:

Hash table              random     random     random  random reads
                        insertion  insertion  full    full   after
                                   (reserved) deletes reads  deletes
--------------------------------------------------------------------
std::unordered_map      0,4993     0,2563     0,5515  0,2950 0,2153
google::dense_hash_map  0,2691     0,1870     0,1547  0,1125 0,1622
google::sparse_hash_map 0,6979     0,3304     0,1884  0,1822 0,2122
tsl::sparse_map         0,4066     0,2586     0,1929  0,1146 0,1095
old QHash               0,3236     0,2064     0,5986  0,2115 0,1666
new QHash               0,2119     0,1652     0,2390  0,1378 0,0965

Memory usage numbers (in MB for a table with 1M entries) also look very nice:

Hash table        Key   int64      std::string (15 chars)
                  Value int64      int64
---------------------------------------------------------
std::unordered_map      44.63      75.35
google::dense_hash_map  32.32      80,60
google::sparse_hash_map 18.08      44.21
tsl::sparse_map         20.44      45,93
old QHash               53.95      69,16
new QHash               23.23      51,32

Fixes: QTBUG-80311
Change-Id: I5679734144bc9bca2102acbe725fcc2fa89f0dff
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2020-04-09 20:02:55 +02:00
..
.gitignore Move tst_collections test to corelib/tools. 2014-05-13 16:08:01 +02:00
collections.pro Remove QLinkedList 2020-02-19 21:01:07 +01:00
tst_collections.cpp New QHash implementation 2020-04-09 20:02:55 +02:00