v8/test/cctest/parsing
Marja Hölttä 6389b7e6b6 [unicode] Return (the correct) errors for overlong / surrogate sequences.
This fix is two-fold:

1) Incremental UTF-8 decoding: Unify incorrect UTF-8 handling between V8 and
Blink.

Incremental UTF-8 decoding used to allow some overlong sequences / invalid code
points which Blink treated as errors. This caused the decoder and the Blink
UTF-8 decoder to produce a different number of bytes, resulting in random
failures when scripts were streamed (especially, this was detected by the
skipping inner functions feature which adds CHECKs against expected function
positions).

2) Non-incremental UTF-8 decoding: return the correct amount of invalid characters.

According to the encoding spec ( https://encoding.spec.whatwg.org/#utf-8-decoder
), the first byte of an overlong sequence / invalid code point generates an
invalid character, and the rest of the bytes are not processed (i.e., pushed
back to the byte stream). When they're handled, they will look like lonely
continuation bytes, and will generate an invalid character each.

As a result, an overlong 4-byte sequence should generate 4 invalid characters
(not 1).

This is a potentially breaking change, since the (non-incremental) UTF-8
decoding is exposed via the API (String::NewFromUtf8). The behavioral difference
happens when the client is passing in invalid UTF-8 (containing overlong /
surrogate sequences).

However, afaict, this doesn't change the semantics of any JavaScript program:
according to the ECMAScript spec, the program is a sequence of Unicode code
points, and there's no way to invoke the UTF-8 decoding functionalities from
inside JavaScript. Though, this changes the behavior of d8 when decoding source
files which are invalid UTF-8.

This doesn't change anything related to URI decoding (it already throws
exceptions for overlong sequences / invalid code points).

BUG: chromium:765608, chromium:758236, v8:5516
Bug: 
Change-Id: Ib029f6a8e87186794b092e4e8af32d01cee3ada0
Reviewed-on: https://chromium-review.googlesource.com/671020
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Franziska Hinkelmann <franzih@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48105}
2017-09-21 10:44:40 +00:00
..
test-parse-decision.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-preparser.cc [parser] Skipping inner funcs: fix sloppy block generators. 2017-08-31 05:42:36 +00:00
test-scanner-streams.cc [unicode] Return (the correct) errors for overlong / surrogate sequences. 2017-09-21 10:44:40 +00:00
test-scanner.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00