QHash: tame HasQHashSingleArgOverload ODR violations

qhashfunctions.h defines a catch-all 2-arguments qHash(T, seed)
in order to support datatypes that implement a 1-argument overload
of qHash (i.e. qHash(Type)). The catch-all calls the 1-argument
overload and XORs the result with the seed.

The catch-all is constrained on the existence of such a 1-argument
overload. This is done in order to make the catch-all SFINAE-friendly;
otherwise merely instantiating the catch-all would trigger a hard error.
Such an error would make it impossible to build a type trait that
detects if one can call qHash(T, size_t) for a given type T.

The constraint itself is called HasQHashSingleArgOverload and lives in a
private namespace.

It has been observed that HasQHashSingleArgOverload misbehaves for
some datatypes. For instance, HasQHashSingleArgOverload<int> is actually
false, despite qHash(123) being perfectly callable. (The second argument
of qHash(int, size_t) is defaulted, so the call *is* possible.)

--

Why is HasQHashSingleArgOverload<int> false?

This has to do with how HasQHashSingleArgOverload<T> is implemented: as
a detection trait that checks if qHash(declval<T>()) is callable.

The detection itself is not a problem. Consider this code:

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  class MyClass {};
  size_t qHash(MyClass);

  static_assert(HasQHashSingleArgOverload<MyClass>); // OK

Here, the static_assert passes, even if qHash(MyClass) (and MyClass
itself) were not defined at all when HasQHashSingleArgOverload was
defined.

This is nothing but 2-phase lookup at work ([temp.dep.res]): the
detection inside HasQHashSingleArgOverload takes into account the qHash
overloads available when HasQHashSingleArgOverload was declared, as well
as any other overload declared before the "point of instantiation". This
means that qHash(MyClass) will be visible and detected.

Let's try something slightly different:

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  size_t qHash(int);

  static_assert(HasQHashSingleArgOverload<int>); // ERROR

This one *does not work*. How is it possible? The answer is that 2-phase
name lookup combines the names found at definition time with the names
_found at instantiation time using argument-dependent lookup only_.
`int` is a fundamental type and does not participate in ADL. In the
example, HasQHashSingleArgOverload has actually no qHash overloads to
even consider, and therefore its detection fails.

You can restore detection by moving the declaration of the qHash(int)
overload *before* the definition of HasQHashSingleArgOverload, so it's
captured at definition time:

  size_t qHash(int);

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  static_assert(HasQHashSingleArgOverload<int>); // OK!

This is why HasQHashSingleArgOverload<int> is currently returning
`false`: because HasQHashSingleArgOverload is defined *before* all the
qHash(fundamental_type) overloads in qhashfunctions.h.

--

Now consider this variation of the above, where we keep the qHash(int)
overload after the detector (so, it's not found), but also prepend an
Evil class implicitly convertible from int:

  struct Evil { Evil(int); };
  size_t qHash(Evil);

  template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */;

  size_t qHash(int);

  static_assert(HasQHashSingleArgOverload<int>); // OK

Now the static_assert passes. HasQHashSingleArgOverload is still not
considering qHash(int) (it's declared after), but it's considering
qHash(Evil). Can you call *that* one with an int? Yes, after a
conversion to Evil.

This is extremely fragile and likely an ODR violation (if not ODR, then
likely falls into [temp.dep.candidate/1]).

--

Does this "really matter" for a type like `int`? The answer is no. If
HasQHashSingleArgOverload<int> is true, then a call like

  qHash(42, 123uz);

will have two overloads in its overloads set:

1) qHash(int, size_t)
2) qHash(T, size_t), i.e. the catch-all template. To be pedantic,
qHash<int>(const int &, size_t), that is, the instantiation of the
catch-all after template type deduction for T (= int)
([over.match.funcs.general/8]).

Although it may look like this is ambiguous as both calls have perfect
matches for the arguments, 1) is actually a better match than 2) because
it is not a template specialization ([over.match.best/2.4]).

In other words: qHash(int, size_t) is *always* called when the argument
is `int`, no matter the value of HasQHashSingleArgOverload<int>. The
catch-all template may be added or not to the overload set, but it's
a worse match anyways.

--

Now, let's consider this code:

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);

This code compiles, although we do not define any qHash overload
specifically for enumeration types (nor one is defined by MyEnum's
author).

Which qHash overload gets called?  Again there are two possible
overloads available:

1) qHash(int, size_t). E1 can be converted to `int` ([conv.prom/3]),
and this overload selected.

2) qHash(T, size_t), which after instantiation, is qHash<MyEnum>(const
MyEnum &, size_t).

In this case, 2) is a better match than 1), because it does not require
any conversion for the arguments.

Is 2) a viable overload? Unfortunately the answer here is "it depends",
because it's subject to what we've learned before: since the catch-all
is constrained by the HasQHashSingleArgOverload trait, names introduced
before the trait may exclude or include the overload.

This code:

  #include <qhashfunctions.h>

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);
  static_assert(HasQHashSingleArgOverload<MyEnum>); // ERROR

will fail the static_assert. This means that only qHash(int, size_t) is
in the overload set.

However, this code:

  struct Evil { Evil(int); };
  size_t qHash(Evil);

  #include <qhashfunctions.h>

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);
  static_assert(HasQHashSingleArgOverload<MyEnum>); // OK

will pass the static_assert. qHash(Evil) can be called with an object of
type MyEnum after an user-defined conversion sequence
([over.best.ics.general], [over.ics.user]: a standard conversion
sequence, made of a lvalue-to-rvalue conversion + a integral promotion,
followed by a conversion by constructor [class.conv.ctor]).
Therefore, HasQHashSingleArgOverload<MyEnum> is true here; the catch-all
template is added to the overload set; and it's a best match for the
qHash(E1, 42uz) call.

--

Is this a problem? **Yes**, and a huge one: the catch-all template does
not yield the same value as the qHash(int, size_t) overload. This means
that calculating hash values (e.g. QHash, QSet) will have different
results depending on include ordering!

A translation unit TU1 may have

  #include <QSet>
  #include <Evil>

  QSet<MyEnum> calculateSet { /* ... */ }

And another translation unit TU2 may have

  #include <Evil>
  #include <QSet> // different order

  void use() {
    QSet<MyEnum> set = calculateSet();
  }

And now the two TUs cannot exchange QHash/QSet objects as they would
hash the contents differently.

--

`Evil` actually exists in Qt. The bug report specifies QKeySequence,
which has an implicit constructor from int, but one can concoct infinite
other examples.

--

Congratulations if you've read so far.

=========================
=== PROPOSED SOLUTION ===
=========================

1) Move the HasQHashSingleArgOverload detection after declaring the
overloads for all the fundamental types (which we already do anyways).
This means that HasQHashSingleArgOverload<fundamental_type> will now
be true. It also means that the catch-all becomes available for all
fundamental types, but as discussed before, for all of them we have
better matches anyways.

2) For unscoped enumeration types, this means however an ABI break: the
catch-all template becomes always the best match. Code compiled before
this change would call qHash(int, size_t), and code compiled after this
change would call the catch-all qHash<Enum>(Enum, size_t); as discussed
before, the two don't yield the same results, so mixing old code and new
code will break.

In order to restore the old behavior, add a qHash overload for
enumeration types that forwards the implementation to the integer
overloads (using qToUnderlying¹).

(Here I'm considering the "old", correct behavior the one that one gets
by simply including QHash/QSet, declaring an enumeration and calling
qHash on it. In other words, without having Evil around before including
QHash.)

This avoids an ABI break for most enumeration types, for which one
does not explicitly define a qHash overload. It however *introduces*
an ABI break for enumeration types for which there is a single-argument
qHash(E) overload. This is because

- before this change, the catch-all template was called, and that
in turn called qHash(E) and XOR'ed the result with the seed;
- after this change, the newly introduced qHash overload for
enumerations gets called. It's very likely that it would not give
the same result as before.

I don't have a solution for this, so we'll have to accept the ABI
break.

Note that if one defines a two-arguments overload for an enum type,
then nothing changes there (the overload is still the best match).

3) Make plans to kill the catch-all template, for Qt 7.0 at the latest.
We've asked users to provide a two-args qHash overload for a very long
time, it's time to stop working around that.

4) Make plans to switch from overloading qHash to specializing std::hash
(or equivalent). Specializations don't overload, and we'd get rid of
all these troubles with implicit conversions.

--

¹ To nitpick, qToUnderlying may select a *different* overload than
the one selected by an implicit conversion.

That's because an unscoped enumeration without a fixed underlying type
is allowed to have an underlying type U, and implicitly convert to V,
with U and V being two different types (!).

U is "an integral type that can represent all the enumerator values"
([dcl.enum/7]). V is selected in a specific list in a specific order
([conv.prom]/3). This means that in theory a compiler can take enum E {
E1, E2 }, give it `unsigned long long` as underlying type, and still
allow for a conversion to `int`.

As far as I know, no compiler we use does something as crazy as that,
but if it's a concern, it needs to be fixed.

[ChangeLog][Deprecation Notice] Support for overloads of qHash with only
one argument is going to be removed in Qt 7. Users are encouraged to
upgrade to the two-arguments overload. Please refer to the QHash
documentation for more information.

[ChangeLog][Potentially Binary-Incompatible Changes] If an enumeration
type for which a single-argument qHash overload has been declared is
being used as a key type in QHash, QMultiHash or QSet, then objects of
these types are no longer binary compatible with code compiled against
an earlier version of Qt. It is very unlikely that such qHash overloads
exist, because enumeration types work out of the box as keys Qt
unordered associative containers; users do not need to define qHash
overloads for their custom enumerations. Note that there is no binary
incompatibity if a *two* arguments qHash overload has been declared
instead.

Fixes: QTBUG-108032
Fixes: QTBUG-107033
Pick-to: 6.2 6.4
Change-Id: I2ebffb2820c553e5fdc3a341019433793a58e3ab
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
This commit is contained in:
Giuseppe D'Angelo 2022-10-28 14:06:12 +02:00
parent aa05d73076
commit fb4bc5fa26
4 changed files with 46 additions and 16 deletions

View File

@ -297,11 +297,11 @@ inline size_t qHash(const std::unordered_set<int> &key, size_t seed = 0)
//! [31]
//! [32]
size_t qHash(K key);
size_t qHash(const K &key);
size_t qHash(K key, size_t seed);
size_t qHash(const K &key, size_t seed);
size_t qHash(K key); // deprecated, do not use
size_t qHash(const K &key); // deprecated, do not use
//! [32]
//! [33]

View File

@ -1683,10 +1683,15 @@ size_t qHash(long double key, size_t seed) noexcept
The two-arguments overloads take an unsigned integer that should be used to
seed the calculation of the hash function. This seed is provided by QHash
in order to prevent a family of \l{algorithmic complexity attacks}. If both
a one-argument and a two-arguments overload are defined for a key type,
the latter is used by QHash (note that you can simply define a
two-arguments version, and use a default value for the seed parameter).
in order to prevent a family of \l{algorithmic complexity attacks}.
\note In Qt 6 it is possible to define a \c{qHash()} overload
taking only one argument; support for this is deprecated. Starting
with Qt 7, it will be mandatory to use a two-arguments overload. If
both a one-argument and a two-arguments overload are defined for a
key type, the latter is used by QHash (note that you can simply
define a two-arguments version, and use a default value for the
seed parameter).
The second way to provide a hashing function is by specializing
the \c{std::hash} class for the key type \c{K}, and providing a

View File

@ -68,14 +68,6 @@ Q_DECL_CONST_FUNCTION constexpr size_t hash(size_t key, size_t seed) noexcept
}
}
template <typename T, typename = void>
constexpr inline bool HasQHashSingleArgOverload = false;
template <typename T>
constexpr inline bool HasQHashSingleArgOverload<T, std::enable_if_t<
std::is_convertible_v<decltype(qHash(std::declval<const T &>())), size_t>
>> = true;
template <typename T1, typename T2> static constexpr bool noexceptPairHash();
}
@ -141,6 +133,9 @@ Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(std::nullptr_t, size_t seed
{
return seed;
}
template <class Enum, std::enable_if_t<std::is_enum_v<Enum>, bool> = true>
Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(Enum e, size_t seed = 0) noexcept
{ return QHashPrivate::hash(qToUnderlying(e), seed); }
// (some) Qt types
Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(const QChar key, size_t seed = 0) noexcept { return qHash(key.unicode(), seed); }
@ -168,9 +163,26 @@ template <typename Enum>
Q_DECL_CONST_FUNCTION constexpr inline size_t qHash(QFlags<Enum> flags, size_t seed = 0) noexcept
{ return qHash(flags.toInt(), seed); }
template <typename T, std::enable_if_t<QHashPrivate::HasQHashSingleArgOverload<T>, bool> = true>
// ### Qt 7: remove this "catch-all" overload logic, and require users
// to provide the two-argument version of qHash.
#if (QT_VERSION < QT_VERSION_CHECK(7, 0, 0))
// Beware of moving this code from here. It needs to see all the
// declarations of qHash overloads for C++ fundamental types *before*
// its own declaration.
namespace QHashPrivate {
template <typename T, typename = void>
constexpr inline bool HasQHashSingleArgOverload = false;
template <typename T>
constexpr inline bool HasQHashSingleArgOverload<T, std::enable_if_t<
std::is_convertible_v<decltype(qHash(std::declval<const T &>())), size_t>
>> = true;
}
template <typename T, std::enable_if_t<QHashPrivate::HasQHashSingleArgOverload<T> && !std::is_enum_v<T>, bool> = true>
size_t qHash(const T &t, size_t seed) noexcept(noexcept(qHash(t)))
{ return qHash(t) ^ seed; }
#endif // < Qt 7
template<typename T>
bool qHashEquals(const T &a, const T &b)

View File

@ -53,6 +53,8 @@ private Q_SLOTS:
void stdPair_string_pairIntInt() { stdPair_template(QString("Hello"), std::make_pair(42, -47)); } // QTBUG-92910
void stdPair_int_pairIntPairIntInt() { stdPair_template(1, std::make_pair(2, std::make_pair(3, 4))); }
void enum_int_consistent_hash_qtbug108032();
#if QT_DEPRECATED_SINCE(6, 6)
void setGlobalQHashSeed();
#endif
@ -372,6 +374,17 @@ void tst_QHashFunctions::stdPair_template(const T1 &t1, const T2 &t2)
QCOMPARE(qHash(vpair, seed), qHash(vpair, seed));
}
void tst_QHashFunctions::enum_int_consistent_hash_qtbug108032()
{
enum E { E1, E2, E3 };
static_assert(QHashPrivate::HasQHashSingleArgOverload<E>);
QCOMPARE(qHash(E1, seed), qHash(int(E1), seed));
QCOMPARE(qHash(E2, seed), qHash(int(E2), seed));
QCOMPARE(qHash(E3, seed), qHash(int(E3), seed));
}
#if QT_DEPRECATED_SINCE(6, 6)
void tst_QHashFunctions::setGlobalQHashSeed()
{