AuroraMiddleware/v8 - v8 - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
jshin	b348d47bb9	Use ICU case conversion/transliterator for case conversion When I18N is enabled, use ICU's case conversion API and transliteration API [1] to implement String.prototype.to{Upper,Lower}Case and String.prototype.toLocale{Upper,Lower}Case. * ICU-based case conversion was implemented in runtime-i18n.cc/i18n.js * The above 4 functions are overridden with those in i18n.js when --icu_case_mapping flag is turned on. To control the override by the flag, they're overriden in icu-case-mapping.js Previously, toLocale{U,L}Case just called to{U,L}Case so that they didn't support locale-sensitive case conversion for Turkic languages (az, tr), Greek (el) and Lithuanian (lt). Before ICU APIs for the most general case are called, a fast-path for Latin-1 is tried. It's taken from Blink and adopted as necessary. This fast path is always tried for to{U,L}Case. For toLocale{U,L}Case, it's only taken when a locale (explicitly specified or default) is not in {az, el, lt, tr}. With these changes, a build with --icu_case_mapping=true passes a bunch of tests in test262/intl402/Strings/* and intl/* that failed before. Handling of pure ASCII strings (aligned at word boundary) are not as fast as Unibrow's implementation that uses word-by-word case conversion. OTOH, Latin-1 input handling is faster than Unibrow. General Unicode input handling is slower but more accurate. See https://docs.google.com/spreadsheets/d/1KJCJxKc1FxFXjwmYqABS0_2cNdPetvnd8gY8_HGSbrg/edit?usp=sharing for the benchmark. This CL started with http://crrev.com/1544023002#ps200001 by littledan@, but has changed significantly since. [1] See why transliteration API is needed for uppercasing in Greek. http://bugs.icu-project.org/trac/ticket/10582 R=yangguo BUG=v8:4476,v8:4477 LOG=Y TEST=test262/{built-ins,intl402}/Strings/, webkit/fast/js/, mjsunit/string-case, intl/general/case* Review-Url: https://codereview.chromium.org/1812673005 Cr-Commit-Position: refs/heads/master@{#36187}	2016-05-11 19:03:04 +00:00
yangguo@chromium.org	0dd69ec439	Allow identifier code points from supplementary multilingual planes. ES5.1 section 6 ("Source Text"): "Throughout the rest of this document, the phrase “code unit” and the word “character” will be used to refer to a 16-bit unsigned value used to represent a single 16-bit unit of text." This changed in ES6 draft section 10.1 ("Source Text"): "The ECMAScript code is expressed using Unicode, version 5.1 or later. ECMAScript source text is a sequence of code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in source text where permitted by the ECMAScript grammars." This patch is to reflect this spec change. BUG=v8:3617 LOG=Y R=jochen@chromium.org Review URL: https://codereview.chromium.org/640193002 git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24510 ce2b1a6d-e550-0410-aec6-3dcde31c8c00	2014-10-10 07:13:46 +00:00
jochen@chromium.org	8bee9f0c3a	Remove test that v8Intl symbol exists, as we don't define it anymore. R=jkummerow@chromium.org Review URL: https://codereview.chromium.org/21511002 git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@16013 ce2b1a6d-e550-0410-aec6-3dcde31c8c00	2013-08-01 19:20:42 +00:00
jochen@chromium.org	c61c74d24f	Import intl test suite from v8-i18n project BUG=v8:2745 R=jkummerow@chromium.org Review URL: https://codereview.chromium.org/18687003 git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@15584 ce2b1a6d-e550-0410-aec6-3dcde31c8c00	2013-07-10 10:49:04 +00:00

4 Commits