AuroraMiddleware/v8 - v8

Author	SHA1	Message	Date
Toon Verwaest	e0f0d60c57	Fix & reland "[utf8] Rewrite NewStringFromUtf8 using Utf8::ValueOfIncremental" Change-Id: I2c8bd545dc606d76603bdf73f1ea54d4c04842c1 Reviewed-on: https://chromium-review.googlesource.com/c/1456101 Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Commit-Queue: Toon Verwaest <verwaest@chromium.org> Cr-Commit-Position: refs/heads/master@{#59399}	2019-02-06 13:11:11 +00:00
Maya Lekova	ec30cf47c7	Revert "[utf8] Rewrite NewStringFromUtf8 using Utf8::ValueOfIncremental" This reverts commit `73dd9b5527`. Reason for revert: Broke telemetry layout tests - https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7-rel/9936 as can be seen in this roll - https://chromium-review.googlesource.com/c/chromium/src/+/1454259 Original change's description: > [utf8] Rewrite NewStringFromUtf8 using Utf8::ValueOfIncremental > > This is 3-4x faster than using the Utf8Decoder. This matters for proper > parse-time measurements using d8. > > Change-Id: I9870e9fbe400ec022a6eeb20491c80a2a32f8519 > Reviewed-on: https://chromium-review.googlesource.com/c/1451827 > Commit-Queue: Toon Verwaest <verwaest@chromium.org> > Reviewed-by: Leszek Swirski <leszeks@chromium.org> > Reviewed-by: Ulan Degenbaev <ulan@chromium.org> > Cr-Commit-Position: refs/heads/master@{#59347} TBR=ulan@chromium.org,leszeks@chromium.org,verwaest@chromium.org # Not skipping CQ checks because original CL landed > 1 day ago. Change-Id: I3f8faebb61c19a41ee496a571228f53c0d5fc8dd Reviewed-on: https://chromium-review.googlesource.com/c/1454495 Reviewed-by: Maya Lekova <mslekova@chromium.org> Commit-Queue: Yang Guo <yangguo@chromium.org> Cr-Commit-Position: refs/heads/master@{#59378}	2019-02-05 17:08:17 +00:00
Toon Verwaest	73dd9b5527	[utf8] Rewrite NewStringFromUtf8 using Utf8::ValueOfIncremental This is 3-4x faster than using the Utf8Decoder. This matters for proper parse-time measurements using d8. Change-Id: I9870e9fbe400ec022a6eeb20491c80a2a32f8519 Reviewed-on: https://chromium-review.googlesource.com/c/1451827 Commit-Queue: Toon Verwaest <verwaest@chromium.org> Reviewed-by: Leszek Swirski <leszeks@chromium.org> Reviewed-by: Ulan Degenbaev <ulan@chromium.org> Cr-Commit-Position: refs/heads/master@{#59347}	2019-02-04 16:08:19 +00:00
Clemens Hammacher	3ad032b769	[base] Introduce VectorOf helper We often need to create a {Vector} view of data owned by a container like {std::vector}. The canonical way to do this is this: Vector<T>{vec.data(), vec.size()} This pattern is repeating information which can be deduced automatically, like the type T. This CL introduces a {VectorOf} helper which can construct a {Vector} for any container providing a {data()} and {size()} accessor, and uses it to replace the pattern above. R=ishell@chromium.org Bug: v8:8238 Change-Id: Ib3a11662acc82cb83f2b4afd07ba88e579d71dba Reviewed-on: https://chromium-review.googlesource.com/c/1337584 Reviewed-by: Igor Sheludko <ishell@chromium.org> Commit-Queue: Clemens Hammacher <clemensh@chromium.org> Cr-Commit-Position: refs/heads/master@{#57538}	2018-11-15 13:02:22 +00:00
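A minimal sketch of the idea behind the VectorOf helper described in the commit above. The `Vector` class here is a simplified stand-in written for this example, not V8's actual `Vector`, and the `vector_sketch` namespace is hypothetical. The only assumption taken from the commit message is that the helper accepts any container providing `data()` and `size()` and deduces the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace vector_sketch {  // hypothetical namespace, for illustration only

// Simplified stand-in for a non-owning (pointer, length) view.
template <typename T>
class Vector {
 public:
  constexpr Vector(T* data, std::size_t length)
      : data_(data), length_(length) {}

  std::size_t length() const { return length_; }

  T& operator[](std::size_t index) const {
    assert(index < length_);  // bounds check in debug builds
    return data_[index];
  }

 private:
  T* data_;
  std::size_t length_;
};

// Deduces the element type from the container's data() accessor, so callers
// no longer spell out Vector<T>{c.data(), c.size()} by hand. Taking an lvalue
// reference rejects temporaries, which would leave the view dangling.
template <typename Container>
auto VectorOf(Container& c)
    -> Vector<std::remove_pointer_t<decltype(c.data())>> {
  return {c.data(), c.size()};
}

}  // namespace vector_sketch
```

With this sketch, `VectorOf(vec)` on a `std::vector<int>` yields a `Vector<int>`, and on a `const std::vector<int>` it yields a `Vector<const int>`, because the constness comes from the pointer type returned by `data()`.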
Justin Ridgewell	f6b6f71ba2	Consolidate UTF-8 Vector<char> to uc16 decoding into Iterator Too many files know how to deal with decoding, counting, and splitting UTF-8 into uc16 chars. This consolidates several callers who deal with full (Vector<char>, not streaming) bytes by using a UTF-8 Iterator to decode bytes into individual uc16 chars. R=marja@chromium.org Bug: Change-Id: Ia36df3e8c1abd0398415ad23a474557c71c19a01 Reviewed-on: https://chromium-review.googlesource.com/831093 Reviewed-by: Marja Hölttä <marja@chromium.org> Commit-Queue: Justin Ridgewell <jridgewell@google.com> Cr-Commit-Position: refs/heads/master@{#51405}	2018-02-20 20:04:41 +00:00
Justin Ridgewell	cedec225c9	Implement DFA Unicode Decoder This is a separation of the DFA Unicode Decoder from https://chromium-review.googlesource.com/c/v8/v8/+/789560 I attempted to make the DFA's table a bit more explicit in this CL. Still, the linter prevents me from letting me present the array as a "table" in source code. For a better representation, please refer to https://docs.google.com/spreadsheets/d/1L9STtkmWs-A7HdK5ZmZ-wPZ_VBjQ3-Jj_xN9c6_hLKA - - - - - Now for a big copy-paste from 789560: Essentially, reworks a standard FSM (imagine an array of structs) and flattens it out into a single-dimension array. Using Table 3-7 of the Unicode 10.0.0 standard (page 126 of http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf), we can nicely map all bytes into one of 12 character classes: 00. 0x00-0x7F 01. 0x80-0x8F (split from general continuation because this range is not valid after a 0xF0 leading byte) 02. 0x90-0x9F (split from general continuation because this range is not valid after a 0xE0 nor a 0xF4 leading byte) 03. 0xA0-0xBF (the rest of the continuation range) 04. 0xC0-0xC1, 0xF5-0xFF (the joined range of invalid bytes, notice this includes 255 which we use as a known bad byte during hex-to-int decoding) 05. 0xC2-0xDF (leading bytes which require any continuation byte afterwards) 06. 0xE0 (leading byte which requires a 0xA0-0xBF afterwards then any continuation byte after that) 07. 0xE1-0xEC, 0xEE-0xEF (leading bytes which requires any continuation afterwards then any continuation byte after that) 08. 0xED (leading byte which requires a 0x80-0x9F afterwards then any continuation byte after that) 09. 0xF1-F3 (leading bytes which requires any continuation byte afterwards then any continuation byte then any continuation byte) 10. 0xF0 (leading bytes which requires a 0x90-0xBF afterwards then any continuation byte then any continuation byte) 11. 0xF4 (leading bytes which requires a 0x80-0x8F afterwards then any continuation byte then any continuation byte) Note that 0xF0 and 0xF1-0xF3 were swapped so that fewer bytes were needed to represent the transition state ("9, 10, 10, 10" vs. "10, 9, 9, 9"). Using these 12 classes as "transitions", we can map from one state to the next. Each state is defined as some multiple of 12, so that we're always starting at the 0th column of each row of the FSM. From each state, we add the transition and get a index of the new row the FSM is entering. If at any point we encounter a bad byte, the state + bad-byte-transition is guaranteed to map us into the first row of the FSM (which contains no valid exiting transitions). The key differences from Björn's original (or his self-modified) DFA is the "bad" state is now mapped to 0 (or the first row of the FSM) instead of 12 (the second row). This saves ~50 bytes when gzipping, and also speeds up determining if a string is properly encoded (see his sample code at http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance). Finally, I've replace his ternary check with an array access, to make the algorithm branchless. This places a requirement on the caller to 0 out the code point between successful decodings, which it could always have done because it's already branching. R=marja@google.com Bug: Change-Id: I574f208a84dc5d06caba17127b0d41f7ce1a3395 Reviewed-on: https://chromium-review.googlesource.com/805357 Commit-Queue: Justin Ridgewell <jridgewell@google.com> Reviewed-by: Marja Hölttä <marja@chromium.org> Reviewed-by: Mathias Bynens <mathias@chromium.org> Cr-Commit-Position: refs/heads/master@{#50012}	2017-12-11 21:36:13 +00:00
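The flattened state machine described in the commit above can be sketched roughly as follows. This is an illustration reconstructed from the commit message, not V8's source: the `utf8_sketch` namespace, the state names, the order of the rows, and the payload mask table are all assumptions. For brevity `ClassOf` uses comparisons, so this version is not branchless; a decoder built for speed would presumably read the class from a 256-entry lookup table instead.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace utf8_sketch {  // hypothetical namespace, for illustration only

// Character class (0-11) of a byte, following the list in the commit message.
constexpr uint8_t ClassOf(uint8_t b) {
  if (b <= 0x7F) return 0;
  if (b <= 0x8F) return 1;
  if (b <= 0x9F) return 2;
  if (b <= 0xBF) return 3;
  if (b <= 0xC1) return 4;  // 0xC0-0xC1: always invalid
  if (b <= 0xDF) return 5;
  if (b == 0xE0) return 6;
  if (b == 0xED) return 8;
  if (b <= 0xEF) return 7;  // 0xE1-0xEC, 0xEE-0xEF
  if (b == 0xF0) return 10;
  if (b <= 0xF3) return 9;  // 0xF1-0xF3
  if (b == 0xF4) return 11;
  return 4;                 // 0xF5-0xFF: always invalid
}

// Each state is a multiple of 12, i.e. the index of column 0 of its row.
// The reject state is row 0, as in the commit message; the rest is assumed.
enum State : uint8_t {
  kReject = 0, kAccept = 12, kTwoByte = 24, kThreeByte = 36,
  kThreeByteLowMid = 48, kFourByte = 60, kFourByteLow = 72,
  kThreeByteHigh = 84, kFourByteMidHigh = 96,
};

// Flattened table: next state = kTransitions[state + class].
constexpr uint8_t kTransitions[9 * 12] = {
    // class:             0   1   2   3  4   5   6   7   8   9  10  11
    /* kReject          */ 0,  0,  0,  0, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kAccept          */ 12, 0,  0,  0, 0, 24, 84, 36, 48, 60, 96, 72,
    /* kTwoByte         */ 0, 12, 12, 12, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kThreeByte       */ 0, 24, 24, 24, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kThreeByteLowMid */ 0, 24, 24,  0, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kFourByte        */ 0, 36, 36, 36, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kFourByteLow     */ 0, 36,  0,  0, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kThreeByteHigh   */ 0,  0,  0, 24, 0,  0,  0,  0,  0,  0,  0,  0,
    /* kFourByteMidHigh */ 0,  0, 36, 36, 0,  0,  0,  0,  0,  0,  0,  0,
};

// Payload bits carried by a byte of each class (assumed mask table).
constexpr uint8_t kPayloadMask[12] = {0x7F, 0x3F, 0x3F, 0x3F, 0x00, 0x1F,
                                      0x0F, 0x0F, 0x0F, 0x07, 0x07, 0x07};

// Feeds one byte to the machine: array lookups only, no ternary.
inline void Step(uint8_t byte, uint8_t* state, uint32_t* code_point) {
  assert(*state % 12 == 0 && *state <= kFourByteMidHigh);
  const uint8_t cls = ClassOf(byte);
  *code_point = (*code_point << 6) | (byte & kPayloadMask[cls]);
  *state = kTransitions[*state + cls];
}

// Decodes a whole buffer; returns false on malformed or truncated input.
inline bool Decode(const std::string& bytes, std::u32string* out) {
  uint8_t state = kAccept;
  uint32_t code_point = 0;
  for (unsigned char b : bytes) {
    Step(b, &state, &code_point);
    if (state == kReject) return false;
    if (state == kAccept) {
      out->push_back(static_cast<char32_t>(code_point));
      code_point = 0;  // the caller zeroes the accumulator between decodings
    }
  }
  return state == kAccept;
}

}  // namespace utf8_sketch
```

Because the reject row contains only zeros, any byte fed in after an error keeps the machine in the reject state, which matches the "no valid exiting transitions" property in the commit message. Resetting `code_point` in `Decode` corresponds to the stated requirement that the caller zero the code point between successful decodings.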
Mathias Bynens	822be9b238	Normalize casing of hexadecimal digits This patch normalizes the casing of hexadecimal digits in escape sequences of the form `\xNN` and integer literals of the form `0xNNNN`. Previously, the V8 code base used an inconsistent mixture of uppercase and lowercase. Google’s C++ style guide uses uppercase in its examples: https://google.github.io/styleguide/cppguide.html#Non-ASCII_Characters Moreover, uppercase letters more clearly stand out from the lowercase `x` (or `u`) characters at the start, as well as lowercase letters elsewhere in strings. BUG=v8:7109 TBR=marja@chromium.org,titzer@chromium.org,mtrofin@chromium.org,mstarzinger@chromium.org,rossberg@chromium.org,yangguo@chromium.org,mlippautz@chromium.org NOPRESUBMIT=true Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_chromium_rel_ng Change-Id: I790e21c25d96ad5d95c8229724eb45d2aa9e22d6 Reviewed-on: https://chromium-review.googlesource.com/804294 Commit-Queue: Mathias Bynens <mathias@chromium.org> Reviewed-by: Jakob Kummerow <jkummerow@chromium.org> Cr-Commit-Position: refs/heads/master@{#49810}	2017-12-02 01:24:40 +00:00
Marja Hölttä	fcb89f5515	[unicode] Add tests for UTF-8 decoders + minor cleanups. Verify that both UTF-8 decoders (incremental and non-incremental one) match the expectations. Also cleanup / harden the UTF-8 handling code, as suggested in https://chromium-review.googlesource.com/c/v8/v8/+/671020/ . BUG=chromium:765608 Change-Id: I6344d62ca15b75ac8e333421c94c4aa35ab8190d Reviewed-on: https://chromium-review.googlesource.com/681217 Commit-Queue: Marja Hölttä <marja@chromium.org> Reviewed-by: Camillo Bruni <cbruni@chromium.org> Cr-Commit-Position: refs/heads/master@{#48229}	2017-09-29 13:18:52 +00:00
jbroman	9d524bd33d	Fix out-of-range access in unibrow::Utf8::CalculateValue. This code should not access bytes out of the permitted range in order to check the range of a possible UTF-8 value. Instead, the length check should occur before such checks. BUG=chromium:667260, chromium:662822 Review-Url: https://codereview.chromium.org/2520053003 Cr-Commit-Position: refs/heads/master@{#41165}	2016-11-22 09:27:59 +00:00
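The ordering described in the commit above (check the available length first, then inspect the bytes) can be illustrated with a small sketch. The function below is hypothetical and is not the code of `unibrow::Utf8::CalculateValue`; it only shows the principle for a two-byte sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace bounds_sketch {  // hypothetical namespace, for illustration only

// Returns true if `bytes` begins with a well-formed two-byte UTF-8 sequence.
inline bool StartsWithTwoByteSequence(const uint8_t* bytes,
                                      std::size_t length) {
  assert(bytes != nullptr || length == 0);
  // Length check first: bytes[1] must not be read unless it is in range.
  if (length < 2) return false;
  // Lead byte must be 0xC2-0xDF (0xC0 and 0xC1 would be overlong encodings).
  if (bytes[0] < 0xC2 || bytes[0] > 0xDF) return false;
  // The second byte must be a continuation byte (bit pattern 10xxxxxx).
  return (bytes[1] & 0xC0) == 0x80;
}

}  // namespace bounds_sketch
```

Checking the range of `bytes[1]` before confirming `length >= 2` would read past the end of a one-byte buffer, which is the kind of out-of-range access the commit message describes.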

9 Commits