Port commit 3e3a027da1
Beside, some registers are changed to callee-saved, and the previous
related save and restore operations are removed.
Bug: v8:11382
Change-Id: Ic3161f8173771c1b7c190c77cbaf2534f52ec422
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3281673
Reviewed-by: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Auto-Submit: Liu yu <liuyu@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77902}
This CL adds an Allocator to SmallVector to control how dynamic
storage is managed. The default value uses the plain old C++
std::allocator<T>, i.e. acts like malloc/free.
For use with zone memory, one can pass a ZoneAllocator as follows:
// Allocates in zone memory.
base::SmallVector<int, kInitialSize, ZoneAllocator<int>>
xs(ZoneAllocator<int>(zone));
Note: this is a follow-up to crrev.com/c/3240823.
Drive-by: hide the internal `reset` function. It doesn't free the
dynamic backing store; that's a surprise and should not be exposed to
external use.
Change-Id: I1f92f184924541e2269493fb52c30f2fdec032be
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3257711
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Igor Sheludko <ishell@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77755}
Capture group names were extended in
https://github.com/tc39/ecma262/pull/1869/fileshttps://github.com/tc39/ecma262/pull/1932/files
RegExpIdentifierName now explicitly enables unicode (+U) for
unicode escape sequences; likewise, surrogate pairs are now allowed
unconditionally.
The implementation simply switches on unicode temporarily while
parsing a capture group name.
Good news everyone, /(?<𝒜>.)/ is now a legal pattern.
Bug: v8:10384
Change-Id: Ida805998eb91ed717b2e05d81d52c1ed61104e3f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3233234
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77722}
.. as a custom data structure with questionable value.
Also: a few drive-by refactors.
Change-Id: I74957b70c4357795dc46ef5520d58b6a78be31b2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240823
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77674}
Unfortunately, CharacterRanges may use 0x10ffff as a marker value
signifying 'highest possible code unit' irrespective of whether the
regexp instance has the unicode flag or not. This value makes it
through RegExpCharacterClass::ToNode unmodified (since no surrogate
desugaring takes place without /u). Correctly mask out the 0xffff
value for purposes of building our uint16_t range array.
Note: It'd be better to never introduce 0x10ffff in the first place,
but given the irregexp pipeline's lack of hackability I hesitate to
change this - we are sure to rely on it implicitly in other spots.
Drive-by: Refactors.
Fixed: chromium:1264508
Bug: v8:11069
Change-Id: Ib3c5780e91f682f1a6d15f26eb4cf03636d93c25
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3256549
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77673}
Character class handling in the irregexp pipeline is quite complex;
codepoints outside the BMP (basic multilingual plane) are only
translated into surrogate pairs when needed, e.g. when the subject
string is two-byte. If not needed, the codepoints simply stay part of
the list of CharacterRanges.
In EmitCharClass, we determine the valid subset of ranges through
ranges_length; until this CL, we forgot to pass that information on to
MakeRangeArray. Do that now by truncating the list of CharacterRanges.
Fixed: chromium:1262423
Change-Id: I5bb5b839e9935890ca2d10908ad66d72c3217178
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240782
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77514}
Port 8bbb44e537
Port 7c08633bf6
Change-Id: Iebc3e223a0a7bc5f31ef0f21d8589e60ccdc0833
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3233695
Auto-Submit: Yahan Lu <yahan@iscas.ac.cn>
Commit-Queue: ji qiu <qiuji@iscas.ac.cn>
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77485}
Port 8bbb44e537
Original Commit Message:
Large character classes may easily be created when unicode
properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
expanded internally into character classes that consist of hundreds
of character ranges. Previously to this CL, we'd emit branching code
for each of these ranges, leading to very large regexp code objects.
This CL adds a new codegen mode for large character classes (where
'large' currently means > 16 ranges). Instead of emitting branching
code inline, the ranges are written into a ByteArray and we call into
the C function IsCharacterInRangeArray for the actual branching logic.
The ByteArray is smaller than emitted code and is deduplicated if the
same character class is matched repeatedly in the same pattern.
Note this mode is *not* implemented for the interpreter, since we
currently don't have a constant pool for irregexp bytecode, and thus
cannot reference ByteArrays.
R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N
Change-Id: I2ded01fa2767e56e72be81b949eefb5fb85b7013
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3231981
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77473}
Large character classes may easily be created when unicode
properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
expanded internally into character classes that consist of hundreds
of character ranges. Previously to this CL, we'd emit branching code
for each of these ranges, leading to very large regexp code objects.
This CL adds a new codegen mode for large character classes (where
'large' currently means > 16 ranges). Instead of emitting branching
code inline, the ranges are written into a ByteArray and we call into
the C function IsCharacterInRangeArray for the actual branching logic.
The ByteArray is smaller than emitted code and is deduplicated if the
same character class is matched repeatedly in the same pattern.
Note this mode is *not* implemented for the interpreter, since we
currently don't have a constant pool for irregexp bytecode, and thus
cannot reference ByteArrays.
Bug: v8:11069
Change-Id: I2d728e42d85114b796c637f791848731a104cd54
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
Reviewed-by: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77463}
- Anonymous namespaces instead of static functions.
- Comments.
- Reserve enough space in the range ZoneList.
Change-Id: Ie79fda770974796cd590a155dc5fd504472e5bc9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3220341
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77391}
.. instead of referring to them through magic chars {s,S,w,W,d,D,n,.,*}.
Change-Id: Ib50937a2a7d4229a021377586a54be3db9ed8c1d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3217196
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77337}
Marking the labels as unused is only needed when we abort code
generation. Otherwise the DCHECKs in the label destructors are useful to
catch bugs.
R=jgruber@chromium.org
Bug: v8:12244
Change-Id: I63198f98a7acd1f2528d31964c01bc6815ba99a9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3205899
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77267}
Replace 'virtual' by 'override' when overriding methods.
This uncovered one method which was unnecessarily virtual:
{RegExpMacroAssemblerARM64::CheckCharacters}.
R=jgruber@chromium.org
Bug: v8:12244
Change-Id: Ia4480b7b234d3d40cc5821c38ef83f74f8421b6b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3204966
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77252}
When aborting code generation, we need to call {AbortedCodeGeneration}
on the {MacroAssembler} contained in the {RegExpMacroAssemblerARM}.
R=jgruber@chromium.org
Bug: chromium:1255368
Change-Id: If37351e8f5715e23affd21ad2de8a8eaad3ea094
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3204965
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77250}
When scanning for capture groups, we have to consider the case that the
current state is inside a character class. In that case skip everything
until the end of the current character class. Otherwise we would wrongly
count open brackets inside the character class as start of a capture
group.
Bug: chromium:1254704
Change-Id: I91d2177c464f7e507413d96216fe570253f17676
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3199871
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77204}
Port 3e3a027da1
Original Commit Message:
Irregexp reentrancy (crrev.com/c/3162604) introduced a bug for global
regexp execution in which each iteration would use a new stack region
(i.e. we forgot to pop the regexp stack pointer when starting a new
iteration).
This CL fixes that by popping the stack pointer on the loop backedge.
At a high level:
- Initialize the backtrack_stackpointer earlier and avoid clobbering
it by setup code.
- Pop it on the loop backedge.
- Slightly refactor Push/Pop operations to avoid unneeded memory
accesses.
R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N
Change-Id: Iafe6814d3695e83fced6a46209accf5e712d56f6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3198391
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77180}
Irregexp reentrancy (crrev.com/c/3162604) introduced a bug for global
regexp execution in which each iteration would use a new stack region
(i.e. we forgot to pop the regexp stack pointer when starting a new
iteration).
This CL fixes that by popping the stack pointer on the loop backedge.
At a high level:
- Initialize the backtrack_stackpointer earlier and avoid clobbering
it by setup code.
- Pop it on the loop backedge.
- Slightly refactor Push/Pop operations to avoid unneeded memory
accesses.
Bug: v8:11382
Change-Id: Ibad6235767e110089a2b346034f923590b286a05
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3194251
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77158}
.. and refactor js-regexp.h.
- Hide the generic DataAt/SetDataAt accessors and replace them by
dedicated accessors. Use the common lower_case naming scheme for
these.
- Shuffle around definitions in js-regexp.h s.t. they are in a
meaningful order.
- Dedupe the source/flags accessors - these fields are stored both
on the instance and on the data array. We keep only accessors for
the instance. Previously, these were disambiguated through naming
oddities (e.g. Pattern() returned data->source).
Change-Id: I3d53c8b095f0d59621ff779608438f7fa5e8c92a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3193534
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77138}
This is a reland of 7d849870ff
Original change's description:
> [regexp] Reorganize and deduplicate in the regexp parser
>
> The parser is organized in a somewhat tricky way s.t. it can be
> hard to map the implementation back to the specified grammar.
>
> In particular, the logic for CharacterClassEscape, ClassEscape,
> and CharacterEscape was implemented twice - once inside a character
> class, once outside.
>
> This CL refactors related logic to have only a single implementation.
>
> As a drive-by, fix one related inconsistency related to \k inside
> a character class.
>
> Fixed: v8:10602
> Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
> Reviewed-by: Mathias Bynens <mathias@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#77114}
Fixed: v8:10602,chromium:1253976
Change-Id: I9e7cc6a34d3be06e1a68895775aa50b0eee78c57
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3193531
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77135}
This reverts commit 7d849870ff.
Reason for revert: Will block roll. Broke error message tests upstream:
https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Blink%20Win/6635/overview
Original change's description:
> [regexp] Reorganize and deduplicate in the regexp parser
>
> The parser is organized in a somewhat tricky way s.t. it can be
> hard to map the implementation back to the specified grammar.
>
> In particular, the logic for CharacterClassEscape, ClassEscape,
> and CharacterEscape was implemented twice - once inside a character
> class, once outside.
>
> This CL refactors related logic to have only a single implementation.
>
> As a drive-by, fix one related inconsistency related to \k inside
> a character class.
>
> Fixed: v8:10602
> Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
> Reviewed-by: Mathias Bynens <mathias@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#77114}
Change-Id: Ic7404d6c9f0e6ea51e8cd8f1ab672856dca0c637
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3190692
Auto-Submit: Shu-yu Guo <syg@chromium.org>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Cr-Commit-Position: refs/heads/main@{#77125}
The parser is organized in a somewhat tricky way s.t. it can be
hard to map the implementation back to the specified grammar.
In particular, the logic for CharacterClassEscape, ClassEscape,
and CharacterEscape was implemented twice - once inside a character
class, once outside.
This CL refactors related logic to have only a single implementation.
As a drive-by, fix one related inconsistency related to \k inside
a character class.
Fixed: v8:10602
Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77114}
Port e301d71ff5
[compiler] Teach InstructionScheduler about protected memory accesses
Port a0ace8a8a5
[wasm] Interpret table.grow result as 32 bit
Port [regexp] Fix UAF in RegExpMacroAssembler
Change-Id: Ieac5e4deae9c6bbf844788d927f5201b906495f6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3189213
Commit-Queue: Ji Qiu <qiuji@iscas.ac.cn>
Reviewed-by: Ji Qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77108}
.. by turning `masm_` into a unique_ptr s.t. it's freed after the
NoRootArrayScope which references it.
Fixed: chromium:1252620
Change-Id: I24580c5a96d76a973b2b083e7a76b95f93bb6068
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3185459
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77082}
Port: bba7c09aad
Original Commit Message:
.. by reusing the regexp stack from potentially multiple nested
irregexp activations.
To do this, we now maintain a stack pointer in RegExpStack. This stack
pointer is synchronized at all boundaries between generated irregexp
code and the outside world, i.e. when entering or returning from
irregexp code, and when calling into C functions such as GrowStack.
Fixed: v8:11382
Change-Id: I0f97363a069c65f4fbe081b2f9fa796f9d950f43
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3179030
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77023}
The argument is no longer in use.
Bug: v8:11382
Change-Id: I7febc7fe7ef17ae462c700f0dba3ca1beade3021
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3173681
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77017}
The proposal has changed and we'll start on the new implementation
from scratch.
Bug: v8:11935, v8:7467
Change-Id: I29e39a414027d80fd91764ce02a05d7c032a41f7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178964
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77016}
.. by reusing the regexp stack from potentially multiple nested
irregexp activations.
To do this, we now maintain a stack pointer in RegExpStack. This stack
pointer is synchronized at all boundaries between generated irregexp
code and the outside world, i.e. when entering or returning from
irregexp code, and when calling into C functions such as GrowStack.
Fixed: v8:11382
Change-Id: I5ed27630c1a64ebf3afb9ddf80fb60ea067c0c40
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3162604
Reviewed-by: Toon Verwaest <verwaest@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Toon Verwaest <verwaest@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77013}
If a stack overflow occurs inside the regexp parser, propagate that
information to the parser.
Bug: v8:896,chromium:1243989
Change-Id: I5ced27ff968ad97764e156643e1980b3a722af1a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3127717
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#76568}