qt5base-lts/tests/auto/corelib/tools/qhashfunctions
Giuseppe D'Angelo fb4bc5fa26 QHash: tame HasQHashSingleArgOverload ODR violations
qhashfunctions.h defines a catch-all 2-arguments qHash(T, seed)
in order to support datatypes that implement a 1-argument overload
of qHash (i.e. qHash(Type)). The catch-all calls the 1-argument
overload and XORs the result with the seed.

The catch-all is constrained on the existence of such a 1-argument
overload. This is done in order to make the catch-all SFINAE-friendly;
otherwise merely instantiating the catch-all would trigger a hard error.
Such an error would make it impossible to build a type trait that
detects if one can call qHash(T, size_t) for a given type T.

The constraint itself is called HasQHashSingleArgOverload and lives in a
private namespace.

It has been observed that HasQHashSingleArgOverload misbehaves for
some datatypes. For instance, HasQHashSingleArgOverload<int> is actually
false, despite qHash(123) being perfectly callable. (The second argument
of qHash(int, size_t) is defaulted, so the call *is* possible.)

--

Why is HasQHashSingleArgOverload<int> false?

This has to do with how HasQHashSingleArgOverload<T> is implemented: as
a detection trait that checks if qHash(declval<T>()) is callable.

The detection itself is not a problem. Consider this code:

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  class MyClass {};
  size_t qHash(MyClass);

  static_assert(HasQHashSingleArgOverload<MyClass>); // OK

Here, the static_assert passes, even if qHash(MyClass) (and MyClass
itself) were not defined at all when HasQHashSingleArgOverload was
defined.

This is nothing but 2-phase lookup at work ([temp.dep.res]): the
detection inside HasQHashSingleArgOverload takes into account the qHash
overloads available when HasQHashSingleArgOverload was declared, as well
as any other overload declared before the "point of instantiation". This
means that qHash(MyClass) will be visible and detected.

Let's try something slightly different:

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  size_t qHash(int);

  static_assert(HasQHashSingleArgOverload<int>); // ERROR

This one *does not work*. How is it possible? The answer is that 2-phase
name lookup combines the names found at definition time with the names
_found at instantiation time using argument-dependent lookup only_.
`int` is a fundamental type and does not participate in ADL. In the
example, HasQHashSingleArgOverload has actually no qHash overloads to
even consider, and therefore its detection fails.

You can restore detection by moving the declaration of the qHash(int)
overload *before* the definition of HasQHashSingleArgOverload, so it's
captured at definition time:

  size_t qHash(int);

  template <typename T>
    constexpr bool HasQHashSingleArgOverload = /* magic */;

  static_assert(HasQHashSingleArgOverload<int>); // OK!

This is why HasQHashSingleArgOverload<int> is currently returning
`false`: because HasQHashSingleArgOverload is defined *before* all the
qHash(fundamental_type) overloads in qhashfunctions.h.

--

Now consider this variation of the above, where we keep the qHash(int)
overload after the detector (so, it's not found), but also prepend an
Evil class implicitly convertible from int:

  struct Evil { Evil(int); };
  size_t qHash(Evil);

  template <typename T> constexpr bool HasQHashSingleArgOverload = /* magic */;

  size_t qHash(int);

  static_assert(HasQHashSingleArgOverload<int>); // OK

Now the static_assert passes. HasQHashSingleArgOverload is still not
considering qHash(int) (it's declared after), but it's considering
qHash(Evil). Can you call *that* one with an int? Yes, after a
conversion to Evil.

This is extremely fragile and likely an ODR violation (if not ODR, then
likely falls into [temp.dep.candidate/1]).

--

Does this "really matter" for a type like `int`? The answer is no. If
HasQHashSingleArgOverload<int> is true, then a call like

  qHash(42, 123uz);

will have two overloads in its overloads set:

1) qHash(int, size_t)
2) qHash(T, size_t), i.e. the catch-all template. To be pedantic,
qHash<int>(const int &, size_t), that is, the instantiation of the
catch-all after template type deduction for T (= int)
([over.match.funcs.general/8]).

Although it may look like this is ambiguous as both calls have perfect
matches for the arguments, 1) is actually a better match than 2) because
it is not a template specialization ([over.match.best/2.4]).

In other words: qHash(int, size_t) is *always* called when the argument
is `int`, no matter the value of HasQHashSingleArgOverload<int>. The
catch-all template may be added or not to the overload set, but it's
a worse match anyways.

--

Now, let's consider this code:

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);

This code compiles, although we do not define any qHash overload
specifically for enumeration types (nor one is defined by MyEnum's
author).

Which qHash overload gets called?  Again there are two possible
overloads available:

1) qHash(int, size_t). E1 can be converted to `int` ([conv.prom/3]),
and this overload selected.

2) qHash(T, size_t), which after instantiation, is qHash<MyEnum>(const
MyEnum &, size_t).

In this case, 2) is a better match than 1), because it does not require
any conversion for the arguments.

Is 2) a viable overload? Unfortunately the answer here is "it depends",
because it's subject to what we've learned before: since the catch-all
is constrained by the HasQHashSingleArgOverload trait, names introduced
before the trait may exclude or include the overload.

This code:

  #include <qhashfunctions.h>

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);
  static_assert(HasQHashSingleArgOverload<MyEnum>); // ERROR

will fail the static_assert. This means that only qHash(int, size_t) is
in the overload set.

However, this code:

  struct Evil { Evil(int); };
  size_t qHash(Evil);

  #include <qhashfunctions.h>

  enum MyEnum { E1, E2, E3 };
  qHash(E1, 42uz);
  static_assert(HasQHashSingleArgOverload<MyEnum>); // OK

will pass the static_assert. qHash(Evil) can be called with an object of
type MyEnum after an user-defined conversion sequence
([over.best.ics.general], [over.ics.user]: a standard conversion
sequence, made of a lvalue-to-rvalue conversion + a integral promotion,
followed by a conversion by constructor [class.conv.ctor]).
Therefore, HasQHashSingleArgOverload<MyEnum> is true here; the catch-all
template is added to the overload set; and it's a best match for the
qHash(E1, 42uz) call.

--

Is this a problem? **Yes**, and a huge one: the catch-all template does
not yield the same value as the qHash(int, size_t) overload. This means
that calculating hash values (e.g. QHash, QSet) will have different
results depending on include ordering!

A translation unit TU1 may have

  #include <QSet>
  #include <Evil>

  QSet<MyEnum> calculateSet { /* ... */ }

And another translation unit TU2 may have

  #include <Evil>
  #include <QSet> // different order

  void use() {
    QSet<MyEnum> set = calculateSet();
  }

And now the two TUs cannot exchange QHash/QSet objects as they would
hash the contents differently.

--

`Evil` actually exists in Qt. The bug report specifies QKeySequence,
which has an implicit constructor from int, but one can concoct infinite
other examples.

--

Congratulations if you've read so far.

=========================
=== PROPOSED SOLUTION ===
=========================

1) Move the HasQHashSingleArgOverload detection after declaring the
overloads for all the fundamental types (which we already do anyways).
This means that HasQHashSingleArgOverload<fundamental_type> will now
be true. It also means that the catch-all becomes available for all
fundamental types, but as discussed before, for all of them we have
better matches anyways.

2) For unscoped enumeration types, this means however an ABI break: the
catch-all template becomes always the best match. Code compiled before
this change would call qHash(int, size_t), and code compiled after this
change would call the catch-all qHash<Enum>(Enum, size_t); as discussed
before, the two don't yield the same results, so mixing old code and new
code will break.

In order to restore the old behavior, add a qHash overload for
enumeration types that forwards the implementation to the integer
overloads (using qToUnderlying¹).

(Here I'm considering the "old", correct behavior the one that one gets
by simply including QHash/QSet, declaring an enumeration and calling
qHash on it. In other words, without having Evil around before including
QHash.)

This avoids an ABI break for most enumeration types, for which one
does not explicitly define a qHash overload. It however *introduces*
an ABI break for enumeration types for which there is a single-argument
qHash(E) overload. This is because

- before this change, the catch-all template was called, and that
in turn called qHash(E) and XOR'ed the result with the seed;
- after this change, the newly introduced qHash overload for
enumerations gets called. It's very likely that it would not give
the same result as before.

I don't have a solution for this, so we'll have to accept the ABI
break.

Note that if one defines a two-arguments overload for an enum type,
then nothing changes there (the overload is still the best match).

3) Make plans to kill the catch-all template, for Qt 7.0 at the latest.
We've asked users to provide a two-args qHash overload for a very long
time, it's time to stop working around that.

4) Make plans to switch from overloading qHash to specializing std::hash
(or equivalent). Specializations don't overload, and we'd get rid of
all these troubles with implicit conversions.

--

¹ To nitpick, qToUnderlying may select a *different* overload than
the one selected by an implicit conversion.

That's because an unscoped enumeration without a fixed underlying type
is allowed to have an underlying type U, and implicitly convert to V,
with U and V being two different types (!).

U is "an integral type that can represent all the enumerator values"
([dcl.enum/7]). V is selected in a specific list in a specific order
([conv.prom]/3). This means that in theory a compiler can take enum E {
E1, E2 }, give it `unsigned long long` as underlying type, and still
allow for a conversion to `int`.

As far as I know, no compiler we use does something as crazy as that,
but if it's a concern, it needs to be fixed.

[ChangeLog][Deprecation Notice] Support for overloads of qHash with only
one argument is going to be removed in Qt 7. Users are encouraged to
upgrade to the two-arguments overload. Please refer to the QHash
documentation for more information.

[ChangeLog][Potentially Binary-Incompatible Changes] If an enumeration
type for which a single-argument qHash overload has been declared is
being used as a key type in QHash, QMultiHash or QSet, then objects of
these types are no longer binary compatible with code compiled against
an earlier version of Qt. It is very unlikely that such qHash overloads
exist, because enumeration types work out of the box as keys Qt
unordered associative containers; users do not need to define qHash
overloads for their custom enumerations. Note that there is no binary
incompatibity if a *two* arguments qHash overload has been declared
instead.

Fixes: QTBUG-108032
Fixes: QTBUG-107033
Pick-to: 6.2 6.4
Change-Id: I2ebffb2820c553e5fdc3a341019433793a58e3ab
Reviewed-by: Mårten Nordheim <marten.nordheim@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
2022-11-01 01:52:13 +02:00
..
.gitignore tst_QHash: Factor qHash()-related test cases into a separate test 2015-01-05 20:18:51 +01:00
CMakeLists.txt Change the license of all CMakeLists.txt and *.cmake files to BSD 2022-08-23 23:58:42 +02:00
tst_qhashfunctions.cpp QHash: tame HasQHashSingleArgOverload ODR violations 2022-11-01 01:52:13 +02:00