v8/test/intl
Iain Ireland 3fab9d05cf [regexp] Fix and unify non-unicode case-folding algorithms
Non-unicode, case-insensitive regexps (e.g. /foo/i, not /foo/iu) use a
case-folding algorithm that doesn't quite match the Unicode
definition. There are two places in irregexp that need to do
case-folding. Prior to this patch, neither of them quite matched the
spec (https://tc39.es/ecma262/#sec-runtime-semantics-canonicalize-ch).

This patch implements the "Canonicalize" algorithm in
src/regexp/special-case.h, and uses it in the relevant places. It
replaces special-case logic around upper-casing / ASCII characters
with the following approach:

1. For most characters, calling UnicodeSet::closeOver on a set
   containing that character will produce the correct set of
   case-insensitive matches.

2. For a small handful of characters (like the sharp S that prompted
   this change), UnicodeSet::closeOver will include some characters
   that should be omitted. For example, although closeOver('ß') =
   "ßẞ", uppercase('ß') is "SS", so step 3.e means that 'ß'
   canonicalizes to itself, and should not match 'ẞ'. In these cases,
   we can skip the closeOver entirely, because it will never add an
   equivalent character. These characters are in the IgnoreSet.

3. For an even smaller handful of characters, UnicodeSet::closeOver
   will produce some characters that should be omitted, but also some
   characters that should be included. For example, closeOver('k') =
   "kKK" (lowercase k, uppercase K, U+212A KELVIN SIGN), but KELVIN
   SIGN should not match either of the other two (step 3.g). To handle
   this, we put such characters in the SpecialAddSet. In these cases,
   we closeOver the original character, but filter out the results
   that do not have the same canonical value.
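
The canonicalization rule that the three cases above rely on can be
sketched in JavaScript. This is an illustration only, not the code in
src/regexp/special-case.h: the function names canonicalize and
filterByCanonical are hypothetical, the input is assumed to be a single
UTF-16 code unit, and the candidate list passed to filterByCanonical is
a hand-written stand-in for what UnicodeSet::closeOver would return.

```javascript
// Sketch of the non-unicode "Canonicalize" rule from the linked spec
// section, for a single UTF-16 code unit `ch` (an assumption of this
// sketch).
function canonicalize(ch) {
  const upper = ch.toUpperCase();
  // If upper-casing yields more than one code unit (e.g. 'ß' -> "SS"),
  // the character canonicalizes to itself.
  if (upper.length !== 1) return ch;
  // A non-ASCII character must not canonicalize to an ASCII one
  // (e.g. U+017F LONG S must not fold to 'S').
  if (ch.charCodeAt(0) >= 128 && upper.charCodeAt(0) < 128) return ch;
  return upper;
}

// Hypothetical helper for the SpecialAddSet case: keep only those
// candidates (standing in for closeOver output) whose canonical value
// equals that of the original character.
function filterByCanonical(ch, candidates) {
  const target = canonicalize(ch);
  return candidates.filter((c) => canonicalize(c) === target);
}

// 'ß' canonicalizes to itself, so it does not match U+1E9E.
console.log(canonicalize('\u00DF') === canonicalize('\u1E9E'));
// U+212A KELVIN SIGN is filtered out of the matches for 'k'.
console.log(filterByCanonical('k', ['k', 'K', '\u212A']));
```

Under these assumptions, 'ß' and 'ẞ' end up with different canonical
values, and filtering the candidates for 'k' leaves only 'k' and 'K'.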

The computation of IgnoreSet and SpecialAddSet happens at build time,
using the pre-existing gen-regexp-special-case.cc step.

R=jgruber@chromium.org

Bug: v8:10248
Change-Id: I00d48b180c83bb8e645cc59eda57b01eab134f0b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2072858
Reviewed-by: Frank Tang <ftang@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#66641}
2020-03-10 11:09:28 +00:00
bigint [Intl] Clean up by removing the following flags 2019-09-12 22:25:41 +00:00
break-iterator [Intl] Remove --harmony-intl-list-format flag from test 2019-02-13 23:22:43 +00:00
collator [Intl] Reland tests of "Validate u extension type" 2019-01-08 01:53:26 +00:00
date-format Fix hour cycle format 2019-10-31 16:18:45 +00:00
displaynames Hide date related types from Intl.DisplayNames 2019-12-19 19:31:09 +00:00
general Implement the localeMatcher: "best fit" 2019-12-18 20:58:08 +00:00
list-format Implement the localeMatcher: "best fit" 2019-12-18 20:58:08 +00:00
locale [Intl] Remove harmony-locale flag 2019-05-11 06:02:42 +00:00
number-format Remove keyword/value "ca" and "nu" from locale 2019-10-30 21:00:08 +00:00
overrides Revert "Make intl/overrides/caching.js more 'robust'" 2017-11-13 19:13:13 +00:00
plural-rules [Intl] Add order check test for Intl.* 2018-12-02 23:37:49 +00:00
relative-time-format Implement the localeMatcher: "best fit" 2019-12-18 20:58:08 +00:00
segmenter Implement the localeMatcher: "best fit" 2019-12-18 20:58:08 +00:00
string Ensure String.prototype.normalize.length is 0 2015-08-05 15:13:45 +00:00
assert.js [Intl] Fix error message to report the right method. 2019-09-12 19:32:31 +00:00
bad-target.js [intl] Remove redundant type checking system 2017-01-09 22:24:57 +00:00
BUILD.gn [build] Add data deps for d8 test suites 2018-03-26 13:44:58 +00:00
default_locale.js [Intl] Remove GetDefaultLocale 2018-12-07 06:27:42 +00:00
intl.status Revert "Fix SEGMAP_ERR by rolling ICU?" 2020-02-06 08:16:26 +00:00
not-constructors.js [intl] Remove new.target check in Intl functions and method 2016-12-20 16:06:19 +00:00
OWNERS Update test/intl OWNERS 2019-10-22 17:29:36 +00:00
regress-4870.js Use stricter type checks in Intl's bound methods 2016-05-18 14:57:58 +00:00
regress-5179.js Avoid calling the builtin String.prototype.split in Intl 2016-07-08 16:53:09 +00:00
regress-7481.js [Intl] Reland tests of "Validate u extension type" 2019-01-08 01:53:26 +00:00
regress-7770.js Add regression test to assert buffer overrun 2019-01-29 00:53:33 +00:00
regress-7982.js [Intl] Remove harmony-locale flag 2019-05-11 06:02:42 +00:00
regress-8030.js Reland test part of "[Intl] Cleans up intl-relative-time-format flag" 2019-01-24 23:17:35 +00:00
regress-8031.js [Intl] Remove --harmony-intl-list-format flag from test 2019-02-13 23:22:43 +00:00
regress-8348.js [Intl] Cutting 43K by removing Unibrow when ICU available 2019-04-03 17:58:51 +00:00
regress-8432.js Drop regress-{8432,8413} from intl.status. 2018-11-18 09:07:16 +00:00
regress-8469.js [Intl] Add regression test for -u-tz- of Intl.DateTimeFormat 2018-12-05 03:02:24 +00:00
regress-8525.js [Intl] Fix numberingSystem for NumberFormat 2018-12-06 11:08:36 +00:00
regress-8604.js Add regression test for v8:8604 2019-04-18 20:48:10 +00:00
regress-8657.js [Intl] Remove harmony-locale flag 2019-05-11 06:02:42 +00:00
regress-8866.js Add regression test for v8:8866 2019-06-06 21:54:10 +00:00
regress-9035.js [Intl] Only use DecimalFormat 2019-03-26 00:28:47 +00:00
regress-9312.js [Intl] Add test cases for %%ALIAS locales 2019-05-31 21:37:08 +00:00
regress-9356.js [Intl] Fix /ſ/i.test('ſ'.toUpperCase()) be false. 2019-09-27 18:13:00 +00:00
regress-9408.js [Intl] Clean up by removing the following flags 2019-09-12 22:25:41 +00:00
regress-9464.js Correct the name of the regression test 2019-09-13 16:14:50 +00:00
regress-9475.js [Intl] Clean up by removing the following flags 2019-09-12 22:25:41 +00:00
regress-9513.js [Intl] Clean up by removing the following flags 2019-09-12 22:25:41 +00:00
regress-9642.js Fix crash under lb_LU locale 2019-08-26 17:07:12 +00:00
regress-9731.js [Intl] Fix /k/i.test('\u212A') 2019-09-27 17:37:50 +00:00
regress-9747.js [Intl] Sync ListFormat to latest spec. 2019-09-19 20:02:53 +00:00
regress-9786.js [Intl] No throwing RangeError when "calendar" and "numberingSystem" are well-formed 2019-10-01 18:04:12 +00:00
regress-9787.js [Intl] No throwing RangeError when "calendar" and "numberingSystem" are well-formed 2019-10-01 18:04:12 +00:00
regress-9788.js [Intl] No throwing RangeError when "calendar" and "numberingSystem" are well-formed 2019-10-01 18:04:12 +00:00
regress-9812.js [Intl] Add test for calendar of formatRange 2019-10-07 22:31:39 +00:00
regress-9849.js Fix crash bug with some numberingSystem and dateStyle/timeStyle 2019-10-21 20:12:54 +00:00
regress-9887.js Remove keyword/value "ca" and "nu" from locale 2019-10-30 21:00:08 +00:00
regress-9912.js Fix the format of date range older than Oct 15 1582 2019-10-29 18:12:25 +00:00
regress-10248.js [regexp] Fix and unify non-unicode case-folding algorithms 2020-03-10 11:09:28 +00:00
regress-527926.js [Intl] Fix output of hour:'2-digit', hour12: true 2019-03-21 07:34:22 +00:00
regress-875643.js [Intl] Convert options arg to Object before processing it 2018-08-31 23:56:33 +00:00
regress-888299.js [Intl] Remove incorrect CHECK 2018-09-26 00:24:28 +00:00
regress-895942.js [Intl] Validate extension keys 2018-11-06 12:11:50 +00:00
regress-900013.js [Intl] Hide Intl["SegmentIterator"] 2018-10-30 16:32:54 +00:00
regress-903566.js [test][cleanup] Revive --time, speed up some tests 2019-09-16 11:24:11 +00:00
regress-917151.js [Intl] Fix CHECK fail in Intl::ToLanguageTag() 2019-01-04 01:33:26 +00:00
regress-925216.js [Intl] Fix DefaultHourCycle to skip hHkK in literal 2019-01-28 22:54:47 +00:00
regress-928068.js [Intl] Fix special case timezone 2019-03-07 23:33:22 +00:00
regress-930304.js [Intl] Fix Null-dereference in CreateICUDateFormatFromCache 2019-02-14 09:53:56 +00:00
regress-966285.js [Intl] Fix Null-der READ IsValidExtension<icu_64::Calendar> 2019-05-24 16:32:09 +00:00
regress-971636.js [Intl] Fix RegExp [\W] with i flag 2019-06-12 06:18:08 +00:00
regress-992694.js [Intl] Fix Hungarian number format grouping 2019-09-13 17:14:41 +00:00
regress-997401.js Remove CHECK which fail while the locale is long. 2019-09-10 19:28:54 +00:00
regress-1003748.js [Intl] Fix m(ax|in)imumFractionDigits for currency 2019-09-17 16:34:00 +00:00
regress-1012579.js Fix crash in creating NumberFormat 2019-10-11 20:42:06 +00:00
regress-1030160.js Fix format of -NAN in toLocaleString() 2020-02-24 17:11:26 +00:00
regress-1041319.js Reland "[Intl] Fix RelativeTimeFormat fatal" 2020-02-18 18:29:08 +00:00
regress-8725514.js [Intl] Throws exception on grandfather and private locale 2019-09-16 20:59:11 +00:00
testcfg.py [test] Support LC_ALL=en_US.UTF-8 in test runner 2019-06-27 11:54:29 +00:00
toStringTag.js [intl] Create the Intl constructors to C++ 2016-12-27 17:10:00 +00:00
utils.js Import intl test suite from v8-i18n project 2013-07-10 10:49:04 +00:00