Utf*Characterstream caches the data pointer of ExternalStrings through
ExternalStringStream, so lock the strings in ExternalStringStream.
Bug: chromium:877044
Cq-Include-Trybots: luci.chromium.try:linux_chromium_rel_ng
Change-Id: I241caaf64e109b33e2f9982573e11c514410509c
Reviewed-on: https://chromium-review.googlesource.com/1194003
Commit-Queue: Benoit L <lizeb@chromium.org>
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Reviewed-by: Toon Verwaest <verwaest@chromium.org>
Reviewed-by: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#55613}
Currently chunked and streaming character streams don't have cloning support as this
requires additional changes on the Blink side.
BUG=v8:8041
Change-Id: I167b3b1cd5ae1ac4038f3715b6a679d3e65d9a85
Reviewed-on: https://chromium-review.googlesource.com/1183429
Reviewed-by: Toon Verwaest <verwaest@chromium.org>
Commit-Queue: Ross McIlroy <rmcilroy@chromium.org>
Cr-Commit-Position: refs/heads/master@{#55444}
This reverts the following 3 CLs:
Revert "[scanner] Templatize scan functions by encoding"
Revert "[asm] Remove invalid static cast of character stream"
Revert "[scanner] Prepare CharacterStreams for specializing scanner and parser by character type"
The original idea behind this work was to avoid copying, converting and
buffering characters to be scanned by specializing the scanner functions. The
additional benefit was for scanner functions to have a bigger window over the
input. Even though we can get a pretty nice speedup from having a larger
window, in practice this rarely helps. The cost is a larger binary.
Since we can't eagerly convert utf8 to utf16 due to memory overhead, we'd also
need to have a specialized version of the scanner just for utf8. That's pretty
complex, and likely won't be better than simply bulk converting and buffering
utf8 as utf16.
Change-Id: Ic3564683932a0097e3f9f51cd88f62c6ac879dcb
Reviewed-on: https://chromium-review.googlesource.com/1183190
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Toon Verwaest <verwaest@chromium.org>
Cr-Commit-Position: refs/heads/master@{#55258}
This way we can avoid reencoding everything to utf16 (buffered) and avoid the
overhead of needing to check the encoding for each character individually.
This may result in a minor asm.js scanning regression due to one-byte tokens
possibly being more common.
Change-Id: I90b51c256d56d4f4fa2d235d7e1e58fc01e43f31
Reviewed-on: https://chromium-review.googlesource.com/1172437
Commit-Queue: Toon Verwaest <verwaest@chromium.org>
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#55217}
This templatizes CharacterStream by char type, and makes them subclass ScannerStream.
Methods that are widely used by tests are marked virtual on ScannerStream and final on
CharacterStream<T> so the specialized scanner will know what to call. ParseInfo passes
around ScannerStream, but the scanner requires the explicit CharacterStream<T>. Since
AdvanceUntil is templatized by FunctionType, I couldn't mark that virtual; so instead
I adjusted those tests to operate directly on ucs2 (not utf8 since we'll drop that in
the future).
In the end no functionality was changed. Some calls became virtual in tests. This is
mainly just preparation.
Change-Id: I0b4def65d3eb8fa5c806027c7e9123a590ebbdb5
Reviewed-on: https://chromium-review.googlesource.com/1156690
Commit-Queue: Toon Verwaest <verwaest@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#54848}
AdvanceUntil allows the Utf16CharacterStream to advance until a charater is found
that passes the check.
Bug: v8:7926
Change-Id: Iae39fb24194aa0ee2f544a55a7847956aa324b64
Reviewed-on: https://chromium-review.googlesource.com/1151303
Commit-Queue: Florian Sattler <sattlerf@google.com>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#54783}
This CL depends on Reland^2 "Avoiding re-externalization of strings"
(Idb1b6d1b29499f66bf8cd704977c40b027f99dbd)..
Previously landed as Ied341ec6268000343d2a577b22f2a483460b01f5 and
I3fe2b294f6e038d77787cf0870d244ba7cc20550
Previously reviewed at https://chromium-review.googlesource.com/1121736 and
https://chromium-review.googlesource.com/1118164
Bug: chromium:845409
Change-Id: Ied50bbcaa22a90ecaf15dca19dbc9aaec1737223
Reviewed-on: https://chromium-review.googlesource.com/1147227
Reviewed-by: Michael Lippautz <mlippautz@chromium.org>
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Commit-Queue: Rodrigo Bruno <rfbpb@google.com>
Cr-Commit-Position: refs/heads/master@{#54712}
The embedder should ultimately be responsible for handling this since they
anyway give us a copy of the data. They can easily make sure that the chunks we
get do not have lonely bytes.
Cq-Include-Trybots: luci.chromium.try:linux_chromium_rel_ng
Change-Id: Ie862107bbbdd00c4d904fbb457a206c2fd52e5d0
Reviewed-on: https://chromium-review.googlesource.com/1127044
Reviewed-by: Ulan Degenbaev <ulan@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Toon Verwaest <verwaest@chromium.org>
Cr-Commit-Position: refs/heads/master@{#54262}
There is no good reason to have the meat of most objects' initialization
logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
this CL changes the protocol between Heap and Factory to be AllocateRaw,
and all object initialization work after (possibly retried) successful
raw allocation happens in the Factory.
This saves about 20KB of binary size on x64.
Original review: https://chromium-review.googlesource.com/c/v8/v8/+/959533
Originally landed as r52416 / f9a2e24bbc
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Change-Id: Id072cbe6b3ed30afd339c7e502844b99ca12a647
Reviewed-on: https://chromium-review.googlesource.com/1000540
Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52492}
This reverts commit f9a2e24bbc.
Reason for revert: gc stress failures not all fixed by follow up.
Original change's description:
> [cleanup] Refactor the Factory
>
> There is no good reason to have the meat of most objects' initialization
> logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
> this CL changes the protocol between Heap and Factory to be AllocateRaw,
> and all object initialization work after (possibly retried) successful
> raw allocation happens in the Factory.
>
> This saves about 20KB of binary size on x64.
>
> Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
> Change-Id: Icbfdc4266d7be8b48d2fe085f03411743dc6a0ca
> Reviewed-on: https://chromium-review.googlesource.com/959533
> Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
> Reviewed-by: Hannes Payer <hpayer@chromium.org>
> Reviewed-by: Yang Guo <yangguo@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#52416}
TBR=jkummerow@chromium.org,yangguo@chromium.org,mstarzinger@chromium.org,hpayer@chromium.org
Change-Id: Idbbc53478742f3e9525eee83342afc6aedae122f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Reviewed-on: https://chromium-review.googlesource.com/999414
Reviewed-by: Michael Achenbach <machenbach@chromium.org>
Commit-Queue: Michael Achenbach <machenbach@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52420}
There is no good reason to have the meat of most objects' initialization
logic in heap.cc, all wrapped by the CALL_HEAP_FUNCTION macro. Instead,
this CL changes the protocol between Heap and Factory to be AllocateRaw,
and all object initialization work after (possibly retried) successful
raw allocation happens in the Factory.
This saves about 20KB of binary size on x64.
Cq-Include-Trybots: luci.v8.try:v8_linux_noi18n_rel_ng
Change-Id: Icbfdc4266d7be8b48d2fe085f03411743dc6a0ca
Reviewed-on: https://chromium-review.googlesource.com/959533
Commit-Queue: Jakob Kummerow <jkummerow@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#52416}
This patch normalizes the casing of hexadecimal digits in escape
sequences of the form `\xNN` and integer literals of the form
`0xNNNN`.
Previously, the V8 code base used an inconsistent mixture of uppercase
and lowercase.
Google’s C++ style guide uses uppercase in its examples:
https://google.github.io/styleguide/cppguide.html#Non-ASCII_Characters
Moreover, uppercase letters more clearly stand out from the lowercase
`x` (or `u`) characters at the start, as well as lowercase letters
elsewhere in strings.
BUG=v8:7109
TBR=marja@chromium.org,titzer@chromium.org,mtrofin@chromium.org,mstarzinger@chromium.org,rossberg@chromium.org,yangguo@chromium.org,mlippautz@chromium.org
NOPRESUBMIT=true
Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_chromium_rel_ng
Change-Id: I790e21c25d96ad5d95c8229724eb45d2aa9e22d6
Reviewed-on: https://chromium-review.googlesource.com/804294
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49810}
This fix is two-fold:
1) Incremental UTF-8 decoding: Unify incorrect UTF-8 handling between V8 and
Blink.
Incremental UTF-8 decoding used to allow some overlong sequences / invalid code
points which Blink treated as errors. This caused the decoder and the Blink
UTF-8 decoder to produce a different number of bytes, resulting in random
failures when scripts were streamed (especially, this was detected by the
skipping inner functions feature which adds CHECKs against expected function
positions).
2) Non-incremental UTF-8 decoding: return the correct amount of invalid characters.
According to the encoding spec ( https://encoding.spec.whatwg.org/#utf-8-decoder
), the first byte of an overlong sequence / invalid code point generates an
invalid character, and the rest of the bytes are not processed (i.e., pushed
back to the byte stream). When they're handled, they will look like lonely
continuation bytes, and will generate an invalid character each.
As a result, an overlong 4-byte sequence should generate 4 invalid characters
(not 1).
This is a potentially breaking change, since the (non-incremental) UTF-8
decoding is exposed via the API (String::NewFromUtf8). The behavioral difference
happens when the client is passing in invalid UTF-8 (containing overlong /
surrogate sequences).
However, afaict, this doesn't change the semantics of any JavaScript program:
according to the ECMAScript spec, the program is a sequence of Unicode code
points, and there's no way to invoke the UTF-8 decoding functionalities from
inside JavaScript. Though, this changes the behavior of d8 when decoding source
files which are invalid UTF-8.
This doesn't change anything related to URI decoding (it already throws
exceptions for overlong sequences / invalid code points).
BUG: chromium:765608, chromium:758236, v8:5516
Bug:
Change-Id: Ib029f6a8e87186794b092e4e8af32d01cee3ada0
Reviewed-on: https://chromium-review.googlesource.com/671020
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Franziska Hinkelmann <franzih@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48105}
The bug occurred when we detected an erroneous char late, and put the last
character in a chunk into the "incomplete char" buffer. It was not correctly
retrieved when seeking.
BUG=v8:6836
Change-Id: I8ca946dfdb39244c5ca0bdcebe047047010b3a07
Reviewed-on: https://chromium-review.googlesource.com/670729
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48066}
U+feff is the UTF BOM but if it occurs inside the text, it's a "zero-width
no-break space". However, the UTF-8 decoder in script streaming still thought
it's a BOM and skipped it. The correct way to handle it would be to create a
U+feff code point instead - the Scanner will then handle it as whitespace.
This is a discrepancy between the Blink UTF-8 decoder and the V8 UTF-8 decoder,
and caused the source positions be off by one. This bug went unnoticed, since
normally off-by-one in this situation doesn't make the code to break.
BUG=chromium:758508,chromium:758236
Change-Id: Ib92a3ee65c402e21b77e42537db2a021cff55379
Reviewed-on: https://chromium-review.googlesource.com/632096
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47583}
Unify, simplify logic, reduce UTF8 specific handling.
Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.
BUG=v8:6093
Change-Id: I83c6f1e6ad280c28da690da41c466dfcbb7915e6
Reviewed-on: https://chromium-review.googlesource.com/535474
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45994}
Also, as this is hard to track down, always DCHECK position after ReadBlock().
Change-Id: Ie32c3a311dd8df91f651b6d82ccacc7c95e6fde0
Reviewed-on: https://chromium-review.googlesource.com/528196
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45811}
This reverts commit 7fa071a48b.
Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=729482
Original change's description:
> Reland [parser] Refactor streaming scanner streams.
>
> Unify, simplify logic, reduce UTF8 specific handling.
>
> Intend of this is also to have stream views.
> Stream views can be used concurrently by multiple threads, but
> only one thread may fetch new data from the underlying source.
> This together with unified stream view creation is intended to be
> used for parse tasks.
>
> BUG=v8:6093
>
> Change-Id: I3bce48185fa2c986d16619a9a8ece3ff4c4f5e60
> Reviewed-on: https://chromium-review.googlesource.com/509489
> Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
> Reviewed-by: Marja Hölttä <marja@chromium.org>
> Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
> Cr-Commit-Position: refs/heads/master@{#45688}
TBR=marja@chromium.org,vogelheim@chromium.org,wiktorg@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
BUG=v8:6093
Change-Id: Iefa7c43a2f6ae3a7f3ef0f77d87b6ae36ae4be99
Reviewed-on: https://chromium-review.googlesource.com/525712
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Commit-Queue: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45725}
Unify, simplify logic, reduce UTF8 specific handling.
Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.
BUG=v8:6093
Change-Id: I3bce48185fa2c986d16619a9a8ece3ff4c4f5e60
Reviewed-on: https://chromium-review.googlesource.com/509489
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Cr-Commit-Position: refs/heads/master@{#45688}
This reverts commit ce538f70c1.
Reason for revert: breaks BOM handling (thus breaking Outlook web apps).
Original change's description:
> [parser] Refactor streaming scanner streams.
>
> Unify, simplify logic, reduce UTF8 specific handling.
>
> Intend of this is also to have stream views.
> Stream views can be used concurrently by multiple threads, but
> only one thread may fetch new data from the underlying source.
> This together with unified stream view creation is intended to be
> used for parse tasks.
>
> BUG=v8:6093
>
> Change-Id: Ied8e93090c506d4735080298f0fdaeed32043915
> Reviewed-on: https://chromium-review.googlesource.com/501789
> Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
> Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
> Reviewed-by: Marja Hölttä <marja@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#45336}
TBR=marja@chromium.org,vogelheim@chromium.org,jochen@chromium.org,wiktorg@google.com
BUG=v8:6093, chromium:724166
Change-Id: I022a23b8052d20d83a640c07b7864c622548bf90
Reviewed-on: https://chromium-review.googlesource.com/508888
Reviewed-by: Adam Klein <adamk@chromium.org>
Commit-Queue: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45404}
Unify, simplify logic, reduce UTF8 specific handling.
Intend of this is also to have stream views.
Stream views can be used concurrently by multiple threads, but
only one thread may fetch new data from the underlying source.
This together with unified stream view creation is intended to be
used for parse tasks.
BUG=v8:6093
Change-Id: Ied8e93090c506d4735080298f0fdaeed32043915
Reviewed-on: https://chromium-review.googlesource.com/501789
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45336}
... and TypeFeedbackMetadata to FeedbackMetadata.
BUG=
Change-Id: I2556d1c2a8f37b8cf3d532cc98d973b6dc7e9e6c
Reviewed-on: https://chromium-review.googlesource.com/439244
Commit-Queue: Igor Sheludko <ishell@chromium.org>
Reviewed-by: Michael Starzinger <mstarzinger@chromium.org>
Reviewed-by: Michael Stanton <mvstanton@chromium.org>
Reviewed-by: Jaroslav Sevcik <jarin@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Reviewed-by: Hannes Payer <hpayer@chromium.org>
Cr-Commit-Position: refs/heads/master@{#42999}
The existing unit test explicitly checked for this case, but was - under
the right circumstances - defeated by the optimization to not re-run the
whole position search if we were close enough.
BUG=chromium:685618
Review-Url: https://codereview.chromium.org/2663883002
Cr-Commit-Position: refs/heads/master@{#42794}
Removed a wrong condition test in TwoByteExternalBufferedStream. This changed fixes errors that may occur under some conditions.
Review-Url: https://codereview.chromium.org/2469723002
Cr-Commit-Position: refs/heads/master@{#40722}
- Smaller, more consistent streams API (Advance, Back, pos, Seek)
- Remove implementations from the header, in favor of creation functions.
Observe:
- Performance:
- All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a
body of only a few instructions. I expect most calls to end up there.
- There used to be performance problems w/ bookmarking, particularly
with copying too much data on SetBookmark w/ UTF-8 streaming streams.
All those copies are gone.
- The old streaming streams implementation used to copy data even for
2-byte input. It no longer does.
- The only remaining 'slow' method is the Seek(.) slow case for utf-8
streaming streams. I don't expect this to be called a lot; and even if,
I expect it to be offset by the gains in the (vastly more frequent)
calls to the other methods or the 'fast path'.
- If it still bothers us, there are several ways to speed it up.
- API & code cleanliness:
- I want to remove the 'old' API in a follow-up CL, which should mostly
delete code, or replace it 1:1.
- In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink
for streaming streams.
- The "bookmark" is now always implemented (and mostly very fast), so we
should be able to use it for more things.
- Testing & correctness:
- The unit tests now cover all stream implementations,
and are pretty good and triggering all the edge cases.
- Vastly more DCHECKs of the invariants.
BUG=v8:4947
Review-Url: https://codereview.chromium.org/2314663002
Cr-Commit-Position: refs/heads/master@{#39464}