QHash: tame HasQHashSingleArgOverload ODR violations

qhashfunctions.h defines a catch-all 2-arguments qHash(T, seed) in order to support datatypes that implement a 1-argument overload of qHash (i.e. qHash(Type)). The catch-all calls the 1-argument overload and XORs the result with the seed. The catch-all is constrained on the existence of such a 1-argument overload. This is done in order to make the catch-all SFINAE-friendly; otherwise merely instantiating the catch-all would trigger a hard error. Such an error would make it impossible to build a type trait that detects if one can call qHash(T, size_t) for a given type T. The constraint itself is called HasQHashSingleArgOverload and lives in a private namespace. It has been observed that HasQHashSingleArgOverload misbehaves for some datatypes. For instance, HasQHashSingleArgOverload<int> is actually false, despite qHash(123) being perfectly callable. (The second argument of qHash(int, size_t) is defaulted, so the call *is* possible.) -- Why is HasQHashSingleArgOverload<int> false? This has to do with how HasQHashSingleArgOverload<T> is implemented: as a detection trait that checks if qHash(declval<T>()) is callable. The detection itself is not a problem. Consider this code: template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; class MyClass {}; size_t qHash(MyClass); static_assert(HasQHashSingleArgOverload<MyClass>); // OK Here, the static_assert passes, even if qHash(MyClass) (and MyClass itself) were not defined at all when HasQHashSingleArgOverload was defined. This is nothing but 2-phase lookup at work ([temp.dep.res]): the detection inside HasQHashSingleArgOverload takes into account the qHash overloads available when HasQHashSingleArgOverload was declared, as well as any other overload declared before the "point of instantiation". This means that qHash(MyClass) will be visible and detected. Let's try something slightly different: template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; size_t qHash(int); static_assert(HasQHashSingleArgOverload<int>); // ERROR This one *does not work*. How is it possible? The answer is that 2-phase name lookup combines the names found at definition time with the names _found at instantiation time using argument-dependent lookup only_. `int` is a fundamental type and does not participate in ADL. In the example, HasQHashSingleArgOverload has actually no qHash overloads to even consider, and therefore its detection fails. You can restore detection by moving the declaration of the qHash(int) overload *before* the definition of HasQHashSingleArgOverload, so it's captured at definition time: size_t qHash(int); template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; static_assert(HasQHashSingleArgOverload<int>); // OK! This is why HasQHashSingleArgOverload<int> is currently returning `false`: because HasQHashSingleArgOverload is defined *before* all the qHash(fundamental_type) overloads in qhashfunctions.h. -- Now consider this variation of the above, where we keep the qHash(int) overload after the detector (so, it's not found), but also prepend an Evil class implicitly convertible from int: struct Evil { Evil(int); }; size_t qHash(Evil); template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */; size_t qHash(int); static_assert(HasQHashSingleArgOverload<int>); // OK Now the static_assert passes. HasQHashSingleArgOverload is still not considering qHash(int) (it's declared after), but it's considering qHash(Evil). Can you call *that* one with an int? Yes, after a conversion to Evil. This is extremely fragile and likely an ODR violation (if not ODR, then likely falls into [temp.dep.candidate/1]). -- Does this "really matter" for a type like `int`? The answer is no. If HasQHashSingleArgOverload<int> is true, then a call like qHash(42, 123uz); will have two overloads in its overloads set: 1) qHash(int, size_t) 2) qHash(T, size_t), i.e. the catch-all template. To be pedantic, qHash<int>(const int &, size_t), that is, the instantiation of the catch-all after template type deduction for T (= int) ([over.match.funcs.general/8]). Although it may look like this is ambiguous as both calls have perfect matches for the arguments, 1) is actually a better match than 2) because it is not a template specialization ([over.match.best/2.4]). In other words: qHash(int, size_t) is *always* called when the argument is `int`, no matter the value of HasQHashSingleArgOverload<int>. The catch-all template may be added or not to the overload set, but it's a worse match anyways. -- Now, let's consider this code: enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); This code compiles, although we do not define any qHash overload specifically for enumeration types (nor one is defined by MyEnum's author). Which qHash overload gets called? Again there are two possible overloads available: 1) qHash(int, size_t). E1 can be converted to `int` ([conv.prom/3]), and this overload selected. 2) qHash(T, size_t), which after instantiation, is qHash<MyEnum>(const MyEnum &, size_t). In this case, 2) is a better match than 1), because it does not require any conversion for the arguments. Is 2) a viable overload? Unfortunately the answer here is "it depends", because it's subject to what we've learned before: since the catch-all is constrained by the HasQHashSingleArgOverload trait, names introduced before the trait may exclude or include the overload. This code: #include <qhashfunctions.h> enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); static_assert(HasQHashSingleArgOverload<MyEnum>); // ERROR will fail the static_assert. This means that only qHash(int, size_t) is in the overload set. However, this code: struct Evil { Evil(int); }; size_t qHash(Evil); #include <qhashfunctions.h> enum MyEnum { E1, E2, E3 }; qHash(E1, 42uz); static_assert(HasQHashSingleArgOverload<MyEnum>); // OK will pass the static_assert. qHash(Evil) can be called with an object of type MyEnum after an user-defined conversion sequence ([over.best.ics.general], [over.ics.user]: a standard conversion sequence, made of a lvalue-to-rvalue conversion + a integral promotion, followed by a conversion by constructor [class.conv.ctor]). Therefore, HasQHashSingleArgOverload<MyEnum> is true here; the catch-all template is added to the overload set; and it's a best match for the qHash(E1, 42uz) call. -- Is this a problem? **Yes**, and a huge one: the catch-all template does not yield the same value as the qHash(int, size_t) overload. This means that calculating hash values (e.g. QHash, QSet) will have different results depending on include ordering! A translation unit TU1 may have #include <QSet> #include <Evil> QSet<MyEnum> calculateSet { /* ... */ } And another translation unit TU2 may have #include <Evil> #include <QSet> // different order void use() { QSet<MyEnum> set = calculateSet(); } And now the two TUs cannot exchange QHash/QSet objects as they would hash the contents differently. -- `Evil` actually exists in Qt. The bug report specifies QKeySequence, which has an implicit constructor from int, but one can concoct infinite other examples. -- Congratulations if you've read so far. ========================= === PROPOSED SOLUTION === ========================= 1) Move the HasQHashSingleArgOverload detection after declaring the overloads for all the fundamental types (which we already do anyways). This means that HasQHashSingleArgOverload<fundamental_type> will now be true. It also means that the catch-all becomes available for all fundamental types, but as discussed before, for all of them we have better matches anyways. 2) For unscoped enumeration types, this means however an ABI break: the catch-all template becomes always the best match. Code compiled before this change would call qHash(int, size_t), and code compiled after this change would call the catch-all qHash<Enum>(Enum, size_t); as discussed before, the two don't yield the same results, so mixing old code and new code will break. In order to restore the old behavior, add a qHash overload for enumeration types that forwards the implementation to the integer overloads (using qToUnderlying¹). (Here I'm considering the "old", correct behavior the one that one gets by simply including QHash/QSet, declaring an enumeration and calling qHash on it. In other words, without having Evil around before including QHash.) This avoids an ABI break for most enumeration types, for which one does not explicitly define a qHash overload. It however *introduces* an ABI break for enumeration types for which there is a single-argument qHash(E) overload. This is because - before this change, the catch-all template was called, and that in turn called qHash(E) and XOR'ed the result with the seed; - after this change, the newly introduced qHash overload for enumerations gets called. It's very likely that it would not give the same result as before. I don't have a solution for this, so we'll have to accept the ABI break. Note that if one defines a two-arguments overload for an enum type, then nothing changes there (the overload is still the best match). 3) Make plans to kill the catch-all template, for Qt 7.0 at the latest. We've asked users to provide a two-args qHash overload for a very long time, it's time to stop working around that. 4) Make plans to switch from overloading qHash to specializing std::hash (or equivalent). Specializations don't overload, and we'd get rid of all these troubles with implicit conversions. -- ¹ To nitpick, qToUnderlying may select a *different* overload than the one selected by an implicit conversion. That's because an unscoped enumeration without a fixed underlying type is allowed to have an underlying type U, and implicitly convert to V, with U and V being two different types (!). U is "an integral type that can represent all the enumerator values" ([dcl.enum/7]). V is selected in a specific list in a specific order ([conv.prom]/3). This means that in theory a compiler can take enum E { E1, E2 }, give it `unsigned long long` as underlying type, and still allow for a conversion to `int`. As far as I know, no compiler we use does something as crazy as that, but if it's a concern, it needs to be fixed. [ChangeLog][Deprecation Notice] Support for overloads of qHash with only one argument is going to be removed in Qt 7. Users are encouraged to upgrade to the two-arguments overload. Please refer to the QHash documentation for more information. [ChangeLog][Potentially Binary-Incompatible Changes] If an enumeration type for which a single-argument qHash overload has been declared is being used as a key type in QHash, QMultiHash or QSet, then objects of these types are no longer binary compatible with code compiled against an earlier version of Qt. It is very unlikely that such qHash overloads exist, because enumeration types work out of the box as keys Qt unordered associative containers; users do not need to define qHash overloads for their custom enumerations. Note that there is no binary incompatibity if a *two* arguments qHash overload has been declared instead. Fixes: QTBUG-108032 Fixes: QTBUG-107033 Pick-to: 6.2 6.4 Change-Id: I2ebffb2820c553e5fdc3a341019433793a58e3ab Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io> Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2022-10-28 14:06:12 +02:00 · 2022-10-28 14:06:12 +02:00 · fb4bc5fa26
commit fb4bc5fa26
parent aa05d73076
4 changed files with 46 additions and 16 deletions
--- a/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp
+++ b/src/corelib/doc/snippets/code/src_corelib_tools_qhash.cpp
@ -297,11 +297,11 @@ inline size_t qHash(const std::unordered_set<int> &key, size_t seed = 0)
 //! [31]

 //! [32]
-size_t qHash(K key);
-size_t qHash(const K &key);
-
 size_t qHash(K key, size_t seed);
 size_t qHash(const K &key, size_t seed);
+
+size_t qHash(K key);        // deprecated, do not use
+size_t qHash(const K &key); // deprecated, do not use
 //! [32]

 //! [33]
--- a/src/corelib/tools/qhash.cpp
+++ b/src/corelib/tools/qhash.cpp
@ -1683,10 +1683,15 @@ size_t qHash(long double key, size_t seed) noexcept

    The two-arguments overloads take an unsigned integer that should be used to
    seed the calculation of the hash function. This seed is provided by QHash
-    in order to prevent a family of \l{algorithmic complexity attacks}. If both
-    a one-argument and a two-arguments overload are defined for a key type,
-    the latter is used by QHash (note that you can simply define a
-    two-arguments version, and use a default value for the seed parameter).
+    in order to prevent a family of \l{algorithmic complexity attacks}.
+
+    \note In Qt 6 it is possible to define a \c{qHash()} overload
+    taking only one argument; support for this is deprecated. Starting
+    with Qt 7, it will be mandatory to use a two-arguments overload. If
+    both a one-argument and a two-arguments overload are defined for a
+    key type, the latter is used by QHash (note that you can simply
+    define a two-arguments version, and use a default value for the
+    seed parameter).

    The second way to provide a hashing function is by specializing
    the \c{std::hash} class for the key type \c{K}, and providing a
--- a/src/corelib/tools/qhashfunctions.h
+++ b/src/corelib/tools/qhashfunctions.h
@ -68,14 +68,6 @@ Q_DECL_CONST_FUNCTION constexpr size_t hash(size_t key, size_t seed) noexcept
    }
 }

-template <typename T, typename = void>
-constexpr inline bool HasQHashSingleArgOverload = false;
-
-template <typename T>
-constexpr inline bool HasQHashSingleArgOverload<T, std::enable_if_t<
-    std::is_convertible_v<decltype(qHash(std::declval<const T &>())), size_t>
->> = true;
-
 template <typename T1, typename T2> static constexpr bool noexceptPairHash();
 }

@ -141,6 +133,9 @@ Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(std::nullptr_t, size_t seed
 {
    return seed;
 }
+template <class Enum, std::enable_if_t<std::is_enum_v<Enum>, bool> = true>
+Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(Enum e, size_t seed = 0) noexcept
+{ return QHashPrivate::hash(qToUnderlying(e), seed); }

 // (some) Qt types
 Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(const QChar key, size_t seed = 0) noexcept { return qHash(key.unicode(), seed); }
@ -168,9 +163,26 @@ template <typename Enum>
 Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(QFlags<Enum> flags, size_t seed = 0) noexcept
 { return qHash(flags.toInt(), seed); }

-template <typename T, std::enable_if_t<QHashPrivate::HasQHashSingleArgOverload<T>, bool> = true>
+// ### Qt 7: remove this "catch-all" overload logic, and require users
+// to provide the two-argument version of qHash.
+#if (QT_VERSION < QT_VERSION_CHECK(7, 0, 0))
+// Beware of moving this code from here. It needs to see all the
+// declarations of qHash overloads for C++ fundamental types *before*
+// its own declaration.
+namespace QHashPrivate {
+template <typename T, typename = void>
+constexpr inline bool HasQHashSingleArgOverload = false;
+
+template <typename T>
+constexpr inline bool HasQHashSingleArgOverload<T, std::enable_if_t<
+    std::is_convertible_v<decltype(qHash(std::declval<const T &>())), size_t>
+>> = true;
+}
+
+template <typename T, std::enable_if_t<QHashPrivate::HasQHashSingleArgOverload<T> && !std::is_enum_v<T>, bool> = true>
 size_t qHash(const T &t, size_t seed) noexcept(noexcept(qHash(t)))
 { return qHash(t) ^ seed; }
+#endif // < Qt 7

 template<typename T>
 bool qHashEquals(const T &a, const T &b)
--- a/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
+++ b/tests/auto/corelib/tools/qhashfunctions/tst_qhashfunctions.cpp
@ -53,6 +53,8 @@ private Q_SLOTS:
    void stdPair_string_pairIntInt()    { stdPair_template(QString("Hello"), std::make_pair(42, -47)); } // QTBUG-92910
    void stdPair_int_pairIntPairIntInt() { stdPair_template(1, std::make_pair(2, std::make_pair(3, 4))); }

+    void enum_int_consistent_hash_qtbug108032();
+
 #if QT_DEPRECATED_SINCE(6, 6)
    void setGlobalQHashSeed();
 #endif
@ -372,6 +374,17 @@ void tst_QHashFunctions::stdPair_template(const T1 &t1, const T2 &t2)
    QCOMPARE(qHash(vpair, seed), qHash(vpair, seed));
 }

+void tst_QHashFunctions::enum_int_consistent_hash_qtbug108032()
+{
+    enum E { E1, E2, E3 };
+
+    static_assert(QHashPrivate::HasQHashSingleArgOverload<E>);
+
+    QCOMPARE(qHash(E1, seed), qHash(int(E1), seed));
+    QCOMPARE(qHash(E2, seed), qHash(int(E2), seed));
+    QCOMPARE(qHash(E3, seed), qHash(int(E3), seed));
+}
+
 #if QT_DEPRECATED_SINCE(6, 6)
 void tst_QHashFunctions::setGlobalQHashSeed()
 {