When creating a character class in unicode, case-insensitive mode we use
icu::UnicodeSet::closeOver() to add all characters that case-insensitive
match the characters in the class.
According to the spec only simple case folding shall be performed for
case-insensitive unicode matching, but closeOver() adds all characters
that are equal w.r.t full case folding.
The current approach of just removing strings from the closeOver set is
not enough, as single code point characters still remain in the set if
they were equal only by performing full case folding.
E.g. the characters \u0390 and \u1FD3 both fold to the same string
"\u03B9\u0308\u0301" via full case folding, but they don't have a simple
case folding in common.
To prevent these wrong matches, we calculate the set of all characters
with close overs that are wrong according to the spec at build time and
remove them from the set before adding case-insensitive equivalent
characters.
Bug: v8:13377
Change-Id: I0252c79143f266911691331dd0e1e27044ea8cba
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3952095
Commit-Queue: Patrick Thier <pthier@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/main@{#83791}