Commit Graph

21 Commits

Author SHA1 Message Date
Mathias Bynens
822be9b238 Normalize casing of hexadecimal digits
This patch normalizes the casing of hexadecimal digits in escape
sequences of the form `\xNN` and integer literals of the form
`0xNNNN`.

Previously, the V8 code base used an inconsistent mixture of uppercase
and lowercase.

Google’s C++ style guide uses uppercase in its examples:
https://google.github.io/styleguide/cppguide.html#Non-ASCII_Characters

Moreover, uppercase letters more clearly stand out from the lowercase
`x` (or `u`) characters at the start, as well as lowercase letters
elsewhere in strings.

BUG=v8:7109
TBR=marja@chromium.org,titzer@chromium.org,mtrofin@chromium.org,mstarzinger@chromium.org,rossberg@chromium.org,yangguo@chromium.org,mlippautz@chromium.org
NOPRESUBMIT=true

Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_chromium_rel_ng
Change-Id: I790e21c25d96ad5d95c8229724eb45d2aa9e22d6
Reviewed-on: https://chromium-review.googlesource.com/804294
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49810}
2017-12-02 01:24:40 +00:00
Marja Hölttä
6389b7e6b6 [unicode] Return (the correct) errors for overlong / surrogate sequences.
This fix is two-fold:

1) Incremental UTF-8 decoding: Unify incorrect UTF-8 handling between V8 and
Blink.

Incremental UTF-8 decoding used to allow some overlong sequences / invalid code
points which Blink treated as errors. This caused the decoder and the Blink
UTF-8 decoder to produce a different number of bytes, resulting in random
failures when scripts were streamed (especially, this was detected by the
skipping inner functions feature which adds CHECKs against expected function
positions).

2) Non-incremental UTF-8 decoding: return the correct amount of invalid characters.

According to the encoding spec ( https://encoding.spec.whatwg.org/#utf-8-decoder
), the first byte of an overlong sequence / invalid code point generates an
invalid character, and the rest of the bytes are not processed (i.e., pushed
back to the byte stream). When they're handled, they will look like lonely
continuation bytes, and will generate an invalid character each.

As a result, an overlong 4-byte sequence should generate 4 invalid characters
(not 1).

This is a potentially breaking change, since the (non-incremental) UTF-8
decoding is exposed via the API (String::NewFromUtf8). The behavioral difference
happens when the client is passing in invalid UTF-8 (containing overlong /
surrogate sequences).

However, afaict, this doesn't change the semantics of any JavaScript program:
according to the ECMAScript spec, the program is a sequence of Unicode code
points, and there's no way to invoke the UTF-8 decoding functionalities from
inside JavaScript. Though, this changes the behavior of d8 when decoding source
files which are invalid UTF-8.

This doesn't change anything related to URI decoding (it already throws
exceptions for overlong sequences / invalid code points).

BUG: chromium:765608, chromium:758236, v8:5516
Bug: 
Change-Id: Ib029f6a8e87186794b092e4e8af32d01cee3ada0
Reviewed-on: https://chromium-review.googlesource.com/671020
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Franziska Hinkelmann <franzih@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48105}
2017-09-21 10:44:40 +00:00
Marja Hölttä
68310c9f69 [scanner] UTF-8 handling fix (errors near chunk end).
The bug occurred when we detected an erroneous char late, and put the last
character in a chunk into the "incomplete char" buffer. It was not correctly
retrieved when seeking.

BUG=v8:6836

Change-Id: I8ca946dfdb39244c5ca0bdcebe047047010b3a07
Reviewed-on: https://chromium-review.googlesource.com/670729
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48066}
2017-09-18 14:13:26 +00:00
Marja Hölttä
9f21cab8c8 Revert "Reland#2 [parser] Refactor streaming scanner streams."
This reverts commit de9269f3c3.

Something's still wrong in the encoding handling (see bug).

Bug: chromium:763106
Cq-Include-Trybots: master.tryserver.chromium.linux:linux_chromium_rel_ng
Change-Id: Icd19dd42b84b9d090e191375a2942b9941110bcf
Reviewed-on: https://chromium-review.googlesource.com/657386
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47924}
2017-09-08 13:36:04 +00:00
Marja Hölttä
4e45342994 [script streaming] Fix U+feff handling.
U+feff is the UTF BOM but if it occurs inside the text, it's a "zero-width
no-break space". However, the UTF-8 decoder in script streaming still thought
it's a BOM and skipped it. The correct way to handle it would be to create a
U+feff code point instead - the Scanner will then handle it as whitespace.

This is a discrepancy between the Blink UTF-8 decoder and the V8 UTF-8 decoder,
and caused the source positions be off by one. This bug went unnoticed, since
normally off-by-one in this situation doesn't make the code to break.

BUG=chromium:758508,chromium:758236

Change-Id: Ib92a3ee65c402e21b77e42537db2a021cff55379
Reviewed-on: https://chromium-review.googlesource.com/632096
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47583}
2017-08-24 19:35:12 +00:00
Wiktor Garbacz
de9269f3c3 Reland#2 [parser] Refactor streaming scanner streams.
Unify, simplify logic, reduce UTF8 specific handling.

Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.

BUG=v8:6093

Change-Id: I83c6f1e6ad280c28da690da41c466dfcbb7915e6
Reviewed-on: https://chromium-review.googlesource.com/535474
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45994}
2017-06-19 10:18:01 +00:00
Wiktor Garbacz
f4f723e818 [parsing] Fix past the end position for streaming streams.
Also, as this is hard to track down, always DCHECK position after ReadBlock().

Change-Id: Ie32c3a311dd8df91f651b6d82ccacc7c95e6fde0
Reviewed-on: https://chromium-review.googlesource.com/528196
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45811}
2017-06-09 11:35:24 +00:00
Marja Hölttä
4ca7022295 Revert "Reland [parser] Refactor streaming scanner streams."
This reverts commit 7fa071a48b.

Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=729482

Original change's description:
> Reland [parser] Refactor streaming scanner streams.
> 
> Unify, simplify logic, reduce UTF8 specific handling.
> 
> Intend of this is also to have stream views.
> Stream views can be used concurrently by multiple threads, but
> only one thread may fetch new data from the underlying source.
> This together with unified stream view creation is intended to be
> used for parse tasks.
> 
> BUG=v8:6093
> 
> Change-Id: I3bce48185fa2c986d16619a9a8ece3ff4c4f5e60
> Reviewed-on: https://chromium-review.googlesource.com/509489
> Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
> Reviewed-by: Marja Hölttä <marja@chromium.org>
> Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
> Cr-Commit-Position: refs/heads/master@{#45688}

TBR=marja@chromium.org,vogelheim@chromium.org,wiktorg@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
BUG=v8:6093

Change-Id: Iefa7c43a2f6ae3a7f3ef0f77d87b6ae36ae4be99
Reviewed-on: https://chromium-review.googlesource.com/525712
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45725}
2017-06-06 11:42:30 +00:00
Wiktor Garbacz
7fa071a48b Reland [parser] Refactor streaming scanner streams.
Unify, simplify logic, reduce UTF8 specific handling.

Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.

BUG=v8:6093

Change-Id: I3bce48185fa2c986d16619a9a8ece3ff4c4f5e60
Reviewed-on: https://chromium-review.googlesource.com/509489
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Cr-Commit-Position: refs/heads/master@{#45688}
2017-06-02 13:50:08 +00:00
Adam Klein
9397c1b73a Revert "[parser] Refactor streaming scanner streams."
This reverts commit ce538f70c1.

Reason for revert: breaks BOM handling (thus breaking Outlook web apps).

Original change's description:
> [parser] Refactor streaming scanner streams.
> 
> Unify, simplify logic, reduce UTF8 specific handling.
> 
> Intend of this is also to have stream views.
> Stream views can be used concurrently by multiple threads, but
> only one thread may fetch new data from the underlying source.
> This together with unified stream view creation is intended to be
> used for parse tasks.
> 
> BUG=v8:6093
> 
> Change-Id: Ied8e93090c506d4735080298f0fdaeed32043915
> Reviewed-on: https://chromium-review.googlesource.com/501789
> Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
> Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
> Reviewed-by: Marja Hölttä <marja@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#45336}

TBR=marja@chromium.org,vogelheim@chromium.org,jochen@chromium.org,wiktorg@google.com
BUG=v8:6093, chromium:724166

Change-Id: I022a23b8052d20d83a640c07b7864c622548bf90
Reviewed-on: https://chromium-review.googlesource.com/508888
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45404}
2017-05-18 19:28:58 +00:00
Wiktor Garbacz
ce538f70c1 [parser] Refactor streaming scanner streams.
Unify, simplify logic, reduce UTF8 specific handling.

Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.

BUG=v8:6093

Change-Id: Ied8e93090c506d4735080298f0fdaeed32043915
Reviewed-on: https://chromium-review.googlesource.com/501789
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45336}
2017-05-16 11:34:41 +00:00
Wiktor Garbacz
4f5dcd4b4d [parser] Fix chunked utf8 stream handling.
BUG=v8:6377

Change-Id: I5bdd41bdda83d7efe4b37d24d44e2e8c2339a30a
Reviewed-on: https://chromium-review.googlesource.com/500168
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45204}
2017-05-09 18:41:00 +00:00
bbudge
771e86fdf2 Remove Factory::NewStringFromASCII method.
BUG=none

Review-Url: https://codereview.chromium.org/2759513002
Cr-Commit-Position: refs/heads/master@{#43913}
2017-03-17 17:52:50 +00:00
ishell@chromium.org
32971301ea Rename TypeFeedbackVector to FeedbackVector.
... and TypeFeedbackMetadata to FeedbackMetadata.

BUG=

Change-Id: I2556d1c2a8f37b8cf3d532cc98d973b6dc7e9e6c
Reviewed-on: https://chromium-review.googlesource.com/439244
Commit-Queue: Igor Sheludko <ishell@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Michael Stanton <mvstanton@chromium.org>
Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#42999}
2017-02-07 14:46:36 +00:00
vogelheim
10bb974ec3 [scanner] Regression test for Utf-8 BOM handling (crbug.com/685618).
The existing unit test explicitly checked for this case, but was - under
the right circumstances - defeated by the optimization to not re-run the
whole position search if we were close enough.

BUG=chromium:685618

Review-Url: https://codereview.chromium.org/2663883002
Cr-Commit-Position: refs/heads/master@{#42794}
2017-01-30 23:21:03 +00:00
verwaest
ce63eb08f9 [counters] Move waiting for more data from background-parsing into callbacks
BUG=

Review-Url: https://codereview.chromium.org/2549083002
Cr-Commit-Position: refs/heads/master@{#41492}
2016-12-05 15:47:12 +00:00
predrag.rudic
f04a9b4936 Fix 'MIPS: Fix Utf16CharacterStream scanner crash due to missaligned access'
Removed a wrong condition test in  TwoByteExternalBufferedStream. This changed fixes errors that may occur under some conditions.

Review-Url: https://codereview.chromium.org/2469723002
Cr-Commit-Position: refs/heads/master@{#40722}
2016-11-03 12:32:16 +00:00
vogelheim
138127a608 Fix bad-char handling in utf-8 streaming streams. Also add test.
R=jochen@chromium.org
BUG=chromium:651333, v8:4947

Review-Url: https://codereview.chromium.org/2391273002
Cr-Commit-Position: refs/heads/master@{#40004}
2016-10-05 17:18:58 +00:00
vogelheim
a2b8b6e7db Handle Utf-8 BOM at beginning of an Utf-8 stream.
(This should enable to drop the BOM handling in the Blink bindings.)

R=marja@chromium.org
BUG=v8:4947

Review-Url: https://codereview.chromium.org/2354973002
Cr-Commit-Position: refs/heads/master@{#39579}
2016-09-21 08:40:10 +00:00
vogelheim
b36c60cce8 Remove legacy API on Utf16CharacterStream.
BUG=v8:4947

Review-Url: https://codereview.chromium.org/2347883002
Cr-Commit-Position: refs/heads/master@{#39533}
2016-09-20 09:44:00 +00:00
vogelheim
642d6d314c Rework scanner-character-streams.
- Smaller, more consistent streams API (Advance, Back, pos, Seek)
- Remove implementations from the header, in favor of creation functions.

Observe:
- Performance:
  - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a
    body of only a few instructions. I expect most calls to end up there.
  - There used to be performance problems w/ bookmarking, particularly
    with copying too much data on SetBookmark w/ UTF-8 streaming streams.
    All those copies are gone.
  - The old streaming streams implementation used to copy data even for
    2-byte input. It no longer does.
  - The only remaining 'slow' method is the Seek(.) slow case for utf-8
    streaming streams. I don't expect this to be called a lot; and even if,
    I expect it to be offset by the gains in the (vastly more frequent)
    calls to the other methods or the 'fast path'.
  - If it still bothers us, there are several ways to speed it up.
- API & code cleanliness:
  - I want to remove the 'old' API in a follow-up CL, which should mostly
    delete code, or replace it 1:1.
  - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink
    for streaming streams.
  - The "bookmark" is now always implemented (and mostly very fast), so we
    should be able to use it for more things.
- Testing & correctness:
  - The unit tests now cover all stream implementations,
    and are pretty good and triggering all the edge cases.
  - Vastly more DCHECKs of the invariants.

BUG=v8:4947

Review-Url: https://codereview.chromium.org/2314663002
Cr-Commit-Position: refs/heads/master@{#39464}
2016-09-16 08:29:52 +00:00