715 Commits

Author SHA1 Message Date
Jakob Gruber
2edff88402 [regexp] Standardize handling of stack overflow crash in ToNode
Use the FatalProcessOutOfMemory function so that tooling recognizes
these crashes as OOMs.

Drive-by: Skip one more test that leads to such stack overflows.

Fixed: v8:12555, chromium:1288456
Bug: v8:12472
Change-Id: Ib9203a4aa0487744f7cea9a212aeeffda579ae23
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3401861
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78692}
2022-01-20 09:04:59 +00:00
Jakob Gruber
abbb54ed5a [regexp] Extend case-insensitive handling in RationalizeConsecutiveAtoms
Apply case-insensitive comparisons not only for the initial character,
but for the entire prefix. This avoids degenerate behavior for patterns
like /aaaa|AAAA|AAAA/i (i.e. generate a single 4-char prefix instead of
four 1-char prefixes).
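
The prefix computation described above can be sketched roughly as follows. This is an illustration only, not the code in RationalizeConsecutiveAtoms: `FoldCase` and `CommonPrefixLengthIgnoreCase` are hypothetical names, and ASCII-only folding via `std::tolower` stands in for the engine's real case canonicalization.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the engine's case canonicalization (ASCII only).
inline char FoldCase(char c) {
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Length of the longest prefix shared by all alternatives when every
// character, not just the first one, is compared case-insensitively.
inline std::size_t CommonPrefixLengthIgnoreCase(
    const std::vector<std::string>& alternatives) {
  if (alternatives.empty()) return 0;
  const std::string& first = alternatives.front();
  std::size_t length = first.size();
  for (const std::string& alt : alternatives) {
    length = std::min(length, alt.size());
    for (std::size_t i = 0; i < length; ++i) {
      if (FoldCase(alt[i]) != FoldCase(first[i])) {
        length = i;  // The first mismatch bounds the shared prefix.
        break;
      }
    }
  }
  return length;
}
```

With the alternatives of /aaaa|AAAA|AAAA/i the sketch reports one shared 4-char prefix, whereas comparing only the initial character case-insensitively would stop after one character.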

Bug: v8:12472
Change-Id: Ib2b49fe73ca846a1b7ec90056cc64bdf5cf33026
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3398114
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78668}
2022-01-18 14:41:22 +00:00
Jakob Gruber
cbddd61d60 [regexp] Periodically check for stack overflow during node generation
Recursive ToNode node generation may overflow the stack for large
graphs. As a quick fix, insert periodic stack overflow checks in
selected ToNode methods.

As a more permanent fix, in the future we could abort gracefully
(instead of crashing on a CHECK), and/or refactor into iterative node
generation.
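
A minimal sketch of the periodic-check idea, under stated assumptions: `Node`, `NodeGenerator`, `MakeChain`, the interval of 16, and the `is_overflowing` probe are all hypothetical, and the sketch returns `false` on overflow (closer to the "abort gracefully" variant) rather than crashing on a CHECK as this CL does.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

struct Node {
  std::unique_ptr<Node> child;  // A degenerate, list-shaped "graph".
};

class NodeGenerator {
 public:
  // `is_overflowing` is a hypothetical probe that reports whether the
  // native stack is close to exhaustion.
  explicit NodeGenerator(std::function<bool()> is_overflowing)
      : is_overflowing_(std::move(is_overflowing)) {}

  // Recursive generation. Returns false if it gave up due to overflow.
  bool Generate(const Node* node) {
    if (node == nullptr) return true;
    // Probe only on every kCheckInterval-th call to keep recursion cheap.
    if (++calls_ % kCheckInterval == 0) {
      ++checks_;
      if (is_overflowing_()) return false;
    }
    return Generate(node->child.get());
  }

  int checks() const { return checks_; }

 private:
  static constexpr uint32_t kCheckInterval = 16;
  std::function<bool()> is_overflowing_;
  uint32_t calls_ = 0;
  int checks_ = 0;
};

// Test helper: builds a chain of `length` nodes.
inline std::unique_ptr<Node> MakeChain(int length) {
  std::unique_ptr<Node> head;
  for (int i = 0; i < length; ++i) {
    auto node = std::make_unique<Node>();
    node->child = std::move(head);
    head = std::move(node);
  }
  return head;
}
```

Checking only every N-th call bounds how far the recursion can run past the limit while keeping the per-call cost to a counter increment.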

Bug: v8:12472
Change-Id: Ie5fbe838c5f6a5192d7d9b44bfe6f6c76a8d26e7
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3398112
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78667}
2022-01-18 12:59:31 +00:00
Lu Yahan
0dbcfe1fde [riscv64] Improve unaligned memory accesses
This commit allows using unaligned loads/stores, which are more efficient
for 2-byte, 4-byte, and 8-byte memory accesses.
Use RISCV_HAS_NO_UNALIGNED to control whether the fast path is enabled.

Change-Id: I1d321e6e5fa5bc31541c8dbfe582881d80743483
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3329803
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Commit-Queue: Yahan Lu <yahan@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#78427}
2021-12-22 01:56:43 +00:00
Lu Yahan
b66334313c [riscv64] use callee save register in regexp
Bug: v8:12502

Change-Id: I8d1b599fc945e276b70901953368768594470204
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3343861
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Commit-Queue: ji qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#78421}
2021-12-21 04:32:02 +00:00
Igor Sheludko
a7db5fcbad [ext-code-space][compiler] Support calling CodeT targets
... in order to avoid Code <-> CodeT conversions in builtins.
This CL changes the meaning of RelocInfo::CODE_TARGET which now expects
CodeT objects as a code target.

In order to reduce code churn this CL makes BUILTIN_CODE and friends
return CodeT instead of Code. In the follow-up CLs BUILTIN_CODET and
friends will be removed.

Bug: v8:11880
Change-Id: Ib8f60973e55c60fc62ba84707471da388f8201b4
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3338483
Reviewed-by: Patrick Thier <pthier@chromium.org>
Reviewed-by: Nico Hartmann <nicohartmann@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Igor Sheludko <ishell@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78393}
2021-12-16 13:45:12 +00:00
Jakob Gruber
2e17aaca2a [regexp] Fix CharacterRange limits again again again
When emitting code, character ranges must only specify ranges which
the actual subject string (one- or two-byte) may contain.

This was not always the case, specifically for ranges with
`from <= kMaxUint8` and `to > kMaxUint8`.

The reason this is so tricky: 1. not all parts of the pipeline know
whether we are compiling for one- or two-byte subjects; 2. for
case-insensitive regexps, an out-of-bounds CharacterRange may have an
in-bounds case equivalent (e.g. /[Ÿ]/i also matches 'ÿ' == \u{ff}),
which only gets added somewhere in the middle of the pipeline.

Our current solution is to clamp immediately before code emission. We
also keep the existing handling/dchecks of the 0x10ffff marker value
which may occur in the two-byte subject case.
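
The clamping step can be illustrated with a small sketch. The `Range` struct and `ClampForSubject` are hypothetical, the constants are defined locally (named after `kMaxUint8` above), and the two-byte branch simply clamps to 0xFFFF, which glosses over the separate 0x10ffff marker handling mentioned above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range {
  uint32_t from;  // Inclusive.
  uint32_t to;    // Inclusive.
};

// Defined locally for this sketch.
constexpr uint32_t kMaxUint8 = 0xFF;
constexpr uint32_t kMaxUint16 = 0xFFFF;

// Restricts ranges to what a one- or two-byte subject string may contain.
inline std::vector<Range> ClampForSubject(const std::vector<Range>& ranges,
                                          bool one_byte_subject) {
  const uint32_t max = one_byte_subject ? kMaxUint8 : kMaxUint16;
  std::vector<Range> clamped;
  for (const Range& range : ranges) {
    if (range.from > max) continue;  // Cannot match any subject character.
    // Covers the tricky case: from <= max but to > max.
    clamped.push_back({range.from, std::min(range.to, max)});
  }
  return clamped;
}
```

Doing this immediately before emission means earlier pipeline stages may still add in-bounds case equivalents of out-of-bounds ranges.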

Bug: v8:11069
Change-Id: Ic7b34a13a900ea2aa3df032daac9236bf5682a42
Fixed: chromium:1275096
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3306569
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#78186}
2021-12-01 15:13:09 +00:00
Ng Zhi An
6c6a602451 [regexp] Fix -Wshadow warnings
Bug: v8:12244,v8:12245
Change-Id: I38c9a767bd17f76bbf269ad79adc6798d94753a2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3273529
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77914}
2021-11-15 22:33:43 +00:00
Liu Yu
5ee6b7a701 [loong][mips][regexp] Fix stack growth for global regexps
Port commit 3e3a027da1

Besides, some registers are changed to callee-saved, and the previously
needed save and restore operations are removed.

Bug: v8:11382

Change-Id: Ic3161f8173771c1b7c190c77cbaf2534f52ec422
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3281673
Reviewed-by: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Auto-Submit: Liu yu <liuyu@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77902}
2021-11-15 12:48:04 +00:00
Ng Zhi An
7ce84cbb37 [regexp] Fix -Wshadow warnings
Bug: v8:12244,v8:12245
Change-Id: I5b908f056222c57e796fb76e86ceea9a77cde77f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3265066
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Zhi An Ng <zhin@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77782}
2021-11-09 01:31:57 +00:00
Jakob Gruber
c9d23462a5 [regexp] Fix yet another invalid use related to range arrays
`Equals` did not properly account for arrays with odd lengths.

Bug: v8:11069
Change-Id: I3264ebef248adcecd59b902bf1521cfddbd5a69d
Fixed: chromium:1267674
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3264218
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77756}
2021-11-08 09:51:53 +00:00
Jakob Gruber
3a858a91fa [base] Extend SmallVector for use with Zone storage
This CL adds an Allocator to SmallVector to control how dynamic
storage is managed. The default value uses the plain old C++
std::allocator<T>, i.e. acts like malloc/free.

For use with zone memory, one can pass a ZoneAllocator as follows:

  // Allocates in zone memory.
  base::SmallVector<int, kInitialSize, ZoneAllocator<int>>
    xs(ZoneAllocator<int>(zone));

Note: this is a follow-up to crrev.com/c/3240823.

Drive-by: hide the internal `reset` function. It doesn't free the
dynamic backing store; that's a surprise and should not be exposed to
external use.

Change-Id: I1f92f184924541e2269493fb52c30f2fdec032be
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3257711
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Igor Sheludko <ishell@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77755}
2021-11-08 07:52:46 +00:00
Jakob Gruber
f67dd50a16 [regexp] Update capture name parsing for recent spec changes
Capture group names were extended in

https://github.com/tc39/ecma262/pull/1869/files
https://github.com/tc39/ecma262/pull/1932/files

RegExpIdentifierName now explicitly enables unicode (+U) for
unicode escape sequences; likewise, surrogate pairs are now allowed
unconditionally.

The implementation simply switches on unicode temporarily while
parsing a capture group name.

Good news everyone, /(?<𝒜>.)/ is now a legal pattern.

Bug: v8:10384
Change-Id: Ida805998eb91ed717b2e05d81d52c1ed61104e3f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3233234
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77722}
2021-11-05 10:09:07 +00:00
Jakob Gruber
4593f3c6c6 [string] Micro-optimize String::Flatten
- Use a StringShape instead of repeatedly querying type.
- Add a shortcut for already-flat strings.
- Unhandlify where possible (all except SlowFlatten).
- Mark String::Flatten and StringShape methods V8_INLINE.
- Add a specialized ConsString::IsFlat overload.

Drive-by: Various (add const, remove this->, helper methods).

Bug: v8:12195
Change-Id: If20df12bc29c29cff2005fdc9bd826ed9f303463
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3259527
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Commit-Queue: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77701}
2021-11-04 10:43:44 +00:00
Jakob Gruber
f5274dfe75 [regexp] Check we've got a ByteArray in the interpreter
Happy hunting.

Bug: chromium:1262676
Change-Id: I0f3a5519cb9ed3dc4787acd61cb437ee8c2bf2d1
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3257716
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Igor Sheludko <ishell@chromium.org>
Reviewed-by: Igor Sheludko <ishell@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77681}
2021-11-03 13:17:39 +00:00
Jakob Gruber
a7e9b8f0a4 [regexp] Remove BufferedZoneList
.. as a custom data structure with questionable value.

Also: a few drive-by refactors.

Change-Id: I74957b70c4357795dc46ef5520d58b6a78be31b2
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240823
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77674}
2021-11-03 09:33:02 +00:00
Jakob Gruber
bfa681ffb9 [regexp] Handle marker value 0x10ffff in MakeRangeArray
Unfortunately, CharacterRanges may use 0x10ffff as a marker value
signifying 'highest possible code unit' irrespective of whether the
regexp instance has the unicode flag or not. This value makes it
through RegExpCharacterClass::ToNode unmodified (since no surrogate
desugaring takes place without /u). Correctly mask out the 0xffff
value for purposes of building our uint16_t range array.
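
As a rough sketch of the masking (the function and constant names are hypothetical, and `assert` stands in for a DCHECK):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kMarker = 0x10FFFF;     // "Highest possible code unit".
constexpr uint32_t kMaxCodeUnit = 0xFFFF;  // Largest uint16_t value.

// Converts a range bound into an entry of a uint16_t range array.
inline uint16_t ToRangeArrayEntry(uint32_t bound) {
  // Without /u, anything above 0xFFFF is expected to be the marker.
  assert(bound <= kMaxCodeUnit || bound == kMarker);
  // 0x10FFFF & 0xFFFF == 0xFFFF; in-range values pass through unchanged.
  return static_cast<uint16_t>(bound & kMaxCodeUnit);
}
```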

Note: It'd be better to never introduce 0x10ffff in the first place,
but given the irregexp pipeline's lack of hackability I hesitate to
change this - we are sure to rely on it implicitly in other spots.

Drive-by: Refactors.

Fixed: chromium:1264508
Bug: v8:11069
Change-Id: Ib3c5780e91f682f1a6d15f26eb4cf03636d93c25
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3256549
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77673}
2021-11-03 09:23:00 +00:00
Jakob Gruber
aa5c6889a9 [regexp] Fix an invalid DCHECK
s/LT/LE/.

Fixed: chromium:1263912
Bug: v8:11069
Change-Id: I0e3378dc62e4912332deeefcfce00f23a2ec63d8
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3247192
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77574}
2021-10-27 13:03:08 +00:00
Jakob Gruber
c1e32791a3 [regexp] Allow empty ranges in GetQuickCheckDetails
A follow-up to crrev.com/c/3240782.

Drive-by: extend JSRegExp printing.

Fixed: chromium:1263327
Bug: v8:11069
Change-Id: Iff64ded27ca93641f0f572df2ce0a9f846948f7f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3245110
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77536}
2021-10-26 08:14:40 +00:00
Jakob Gruber
b7dc9915ff [regexp] Only emit valid ranges in MakeRangeArray
Character class handling in the irregexp pipeline is quite complex;
codepoints outside the BMP (basic multilingual plane) are only
translated into surrogate pairs when needed, e.g. when the subject
string is two-byte. If not needed, the codepoints simply stay part of
the list of CharacterRanges.

In EmitCharClass, we determine the valid subset of ranges through
ranges_length; until this CL, we forgot to pass that information on to
MakeRangeArray. Do that now by truncating the list of CharacterRanges.

Fixed: chromium:1262423
Change-Id: I5bb5b839e9935890ca2d10908ad66d72c3217178
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240782
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77514}
2021-10-25 09:32:49 +00:00
Lu Yahan
6b2809df52 [riscv64][regexp] Compact codegen for large character classes
Port 8bbb44e537
Port 7c08633bf6

Change-Id: Iebc3e223a0a7bc5f31ef0f21d8589e60ccdc0833
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3233695
Auto-Submit: Yahan Lu <yahan@iscas.ac.cn>
Commit-Queue: ji qiu <qiuji@iscas.ac.cn>
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77485}
2021-10-21 01:58:51 +00:00
Milad Fa
841d33a591 PPC/s390: [regexp] Compact codegen for large character classes
Port 8bbb44e537

Original Commit Message:

    Large character classes may easily be created when unicode
    properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
    expanded internally into character classes that consist of hundreds
    of character ranges. Prior to this CL, we'd emit branching code
    for each of these ranges, leading to very large regexp code objects.

    This CL adds a new codegen mode for large character classes (where
    'large' currently means > 16 ranges). Instead of emitting branching
    code inline, the ranges are written into a ByteArray and we call into
    the C function IsCharacterInRangeArray for the actual branching logic.
    The ByteArray is smaller than emitted code and is deduplicated if the
    same character class is matched repeatedly in the same pattern.

    Note this mode is *not* implemented for the interpreter, since we
    currently don't have a constant pool for irregexp bytecode, and thus
    cannot reference ByteArrays.

R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: I2ded01fa2767e56e72be81b949eefb5fb85b7013
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3231981
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77473}
2021-10-20 13:33:50 +00:00
Zhao Jiazhong
58559fb7c1 [loong64][mips][regexp] Compact codegen for large character classes
Port commit 8bbb44e537

Bug: v8:11069
Change-Id: I66532e8410390bc220d7811e320bb44181b00d1f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3234303
Reviewed-by: Liu yu <liuyu@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77468}
2021-10-20 10:53:40 +00:00
Jakob Gruber
8bbb44e537 [regexp] Compact codegen for large character classes
Large character classes may easily be created when unicode
properties (e.g.: /\p{L}/u and /\P{L}/u) are used - these are
expanded internally into character classes that consist of hundreds
of character ranges. Prior to this CL, we'd emit branching code
for each of these ranges, leading to very large regexp code objects.

This CL adds a new codegen mode for large character classes (where
'large' currently means > 16 ranges). Instead of emitting branching
code inline, the ranges are written into a ByteArray and we call into
the C function IsCharacterInRangeArray for the actual branching logic.
The ByteArray is smaller than emitted code and is deduplicated if the
same character class is matched repeatedly in the same pattern.

Note this mode is *not* implemented for the interpreter, since we
currently don't have a constant pool for irregexp bytecode, and thus
cannot reference ByteArrays.
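
To illustrate the lookup that replaces inline branching, here is a sketch of a range-array membership test. It does not reproduce IsCharacterInRangeArray; it assumes an encoding in which the array holds sorted boundaries that alternate between an inclusive range start and an exclusive range end, and `IsInRangeArray` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed encoding: sorted boundaries, alternating inclusive start and
// exclusive end. A trailing unmatched start leaves the last range open.
inline bool IsInRangeArray(const std::vector<uint16_t>& boundaries,
                           uint16_t c) {
  // Count the boundaries that are <= c via binary search.
  const auto it = std::upper_bound(boundaries.begin(), boundaries.end(), c);
  // An odd count means we passed a start without passing its matching end.
  return ((it - boundaries.begin()) % 2) == 1;
}
```

The lookup is logarithmic in the number of ranges and the data is a flat array, which is what makes it more compact than one emitted branch per range.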

Bug: v8:11069
Change-Id: I2d728e42d85114b796c637f791848731a104cd54
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3229377
Reviewed-by: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77463}
2021-10-19 18:20:54 +00:00
Jakob Gruber
a2b9710fd8 [regexp] More cleanups
- Anonymous namespaces instead of static functions.
- Comments.
- Reserve enough space in the range ZoneList.

Change-Id: Ie79fda770974796cd590a155dc5fd504472e5bc9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3220341
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77391}
2021-10-14 07:47:21 +00:00
Liu Yu
21fbf41695 [loong64][mips][regexp][cleanup] Use 'override' instead of 'virtual'
Port commit 7c08633bf6

Bug: v8:12244
Change-Id: Ib6ccca9e8e3e79ec7ba7b6c522f3aa1989ab50ec
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3219706
Auto-Submit: Liu yu <liuyu@loongson.cn>
Reviewed-by: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77372}
2021-10-13 03:06:51 +00:00
Milad Fa
5638b7db6a PPC/s390: [regexp] Add dedicated enums for standard character sets
Port b4aa41d0fc

Original Commit Message:

    .. instead of referring to them through magic chars {s,S,w,W,d,D,n,.,*}.

R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: Id1543bee0fe676876d1d7c7e49d3f4742c9959d9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3216038
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77365}
2021-10-12 17:00:35 +00:00
Lu Yahan
3d5e30cfe1 [riscv64][regexp] Add dedicated enums for standard character sets
Port b4aa41d0fc

Change-Id: Ie60c57d432879da89ac30179b5a462b6f93b220b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3218718
Commit-Queue: ji qiu <qiuji@iscas.ac.cn>
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77362}
2021-10-12 16:01:34 +00:00
Milad Fa
d02005f463 PPC/s390: [regexp] Various refactors
Port 12ecb4f567

Original Commit Message:

    No functional changes.

    - Removed unused Isolate* argument from regexp extrefs.
    - Added const where possible.
    - Removed unused functions.
    - Shuffled declarations for better readability.
    - ...

R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: I58f21f9f75a7c7bb592b7b07dedd9c32ae8a270c
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3216034
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77359}
2021-10-12 15:23:34 +00:00
Liu Yu
728e209030 [loong64][mips][regexp] Add dedicated enums for standard character sets
Port commit b4aa41d0fc

Change-Id: I00e7b81450a1a751b536d29bc4bb4b69ad57b7c6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3218720
Auto-Submit: Liu yu <liuyu@loongson.cn>
Reviewed-by: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77351}
2021-10-12 13:32:34 +00:00
Liu Yu
6b00c94c3c [loong64][mips][regexp] Various refactors
Port commit 12ecb4f567

Change-Id: I7dab9491ad1216515f0a45f026419a55c7cda86a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3218719
Reviewed-by: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Auto-Submit: Liu yu <liuyu@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77343}
2021-10-12 11:57:57 +00:00
Jakob Gruber
b4aa41d0fc [regexp] Add dedicated enums for standard character sets
.. instead of referring to them through magic chars {s,S,w,W,d,D,n,.,*}.
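
A sketch of what such an enum might look like. The enumerator names and `TryParseStandardSet` are illustrative assumptions, not necessarily those used in this CL, and the meanings given for 'n', '.', and '*' are likewise assumed.

```cpp
// Each enumerator keeps the old magic char as its underlying value.
enum class StandardCharacterSet : char {
  kWhitespace = 's',
  kNotWhitespace = 'S',
  kWord = 'w',
  kNotWord = 'W',
  kDigit = 'd',
  kNotDigit = 'D',
  kLineTerminator = 'n',     // Assumed meaning.
  kNotLineTerminator = '.',  // Assumed meaning.
  kEverything = '*',         // Assumed meaning.
};

// Hypothetical helper: maps a magic char to the enum, rejecting others.
inline bool TryParseStandardSet(char c, StandardCharacterSet* out) {
  switch (c) {
    case 's': case 'S': case 'w': case 'W': case 'd':
    case 'D': case 'n': case '.': case '*':
      *out = static_cast<StandardCharacterSet>(c);
      return true;
    default:
      return false;
  }
}
```

A scoped enum lets the compiler reject arbitrary chars at call sites and warn about unhandled cases in switches.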

Change-Id: Ib50937a2a7d4229a021377586a54be3db9ed8c1d
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3217196
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77337}
2021-10-12 09:35:09 +00:00
Lu Yahan
85e287c9cb [riscv64] Port 3217193: [regexp] Various refactors
Change-Id: I2d9cb95d8b04a96f436b6f8eae1ce87d80df7f6f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3218710
Reviewed-by: ji qiu <qiuji@iscas.ac.cn>
Commit-Queue: ji qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77331}
2021-10-12 07:49:57 +00:00
Jakob Gruber
12ecb4f567 [regexp] Various refactors
No functional changes.

- Removed unused Isolate* argument from regexp extrefs.
- Added const where possible.
- Removed unused functions.
- Shuffled declarations for better readability.
- ...

Change-Id: I6d9093052e8de4e33e9411541a691d0bab7b20c9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3217193
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77316}
2021-10-11 13:02:43 +00:00
Milad Fa
3df8a57b8c PPC/s390: [regexp][cleanup] Use 'override' instead of 'virtual'
Port 7c08633bf6

Original Commit Message:

    Replace 'virtual' by 'override' when overriding methods.
    This uncovered one method which was unnecessarily virtual:
    {RegExpMacroAssemblerARM64::CheckCharacters}.

R=clemensb@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: I542aeae836b5b78284291ed39844a5c166ed06ad
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3208811
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77269}
2021-10-06 19:06:08 +00:00
Clemens Backes
d4ac75e812 [regexp][arm64] Only unuse labels on abortion
Marking the labels as unused is only needed when we abort code
generation. Otherwise the DCHECKs in the label destructors are useful to
catch bugs.

R=jgruber@chromium.org

Bug: v8:12244
Change-Id: I63198f98a7acd1f2528d31964c01bc6815ba99a9
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3205899
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77267}
2021-10-06 16:29:34 +00:00
Clemens Backes
7c08633bf6 [regexp][cleanup] Use 'override' instead of 'virtual'
Replace 'virtual' by 'override' when overriding methods.
This uncovered one method which was unnecessarily virtual:
{RegExpMacroAssemblerARM64::CheckCharacters}.

R=jgruber@chromium.org

Bug: v8:12244
Change-Id: Ia4480b7b234d3d40cc5821c38ef83f74f8421b6b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3204966
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77252}
2021-10-06 11:12:22 +00:00
Clemens Backes
bab8254c32 [regexp][arm] Fix regexp assembler abortion
When aborting code generation, we need to call {AbortedCodeGeneration}
on the {MacroAssembler} contained in the {RegExpMacroAssemblerARM}.

R=jgruber@chromium.org

Bug: chromium:1255368
Change-Id: If37351e8f5715e23affd21ad2de8a8eaad3ea094
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3204965
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77250}
2021-10-06 10:02:13 +00:00
Patrick Thier
55374d16ba [regexp] Fix ScanForCaptures when invoked inside a character class.
When scanning for capture groups, we have to consider the case that the
current state is inside a character class. In that case skip everything
until the end of the current character class. Otherwise we would wrongly
count open brackets inside the character class as start of a capture
group.
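
The scanning rule can be sketched as below. This is a simplified model, not the parser's ScanForCaptures: `CountCaptures` is a hypothetical function, and it only recognizes escapes, character classes, plain groups, and `(?<name>` groups.

```cpp
#include <cstddef>
#include <string>

// Counts capture groups in pattern[start..]. `in_class` tells the scanner
// whether `start` already lies inside a character class.
inline int CountCaptures(const std::string& pattern, std::size_t start,
                         bool in_class) {
  int captures = 0;
  for (std::size_t i = start; i < pattern.size(); ++i) {
    const char c = pattern[i];
    if (c == '\\') {  // Skip the escaped character.
      ++i;
      continue;
    }
    if (in_class) {  // Ignore everything up to the closing bracket.
      if (c == ']') in_class = false;
      continue;
    }
    if (c == '[') {
      in_class = true;
      continue;
    }
    if (c != '(') continue;
    if (i + 1 >= pattern.size() || pattern[i + 1] != '?') {
      ++captures;  // Plain capture group.
      continue;
    }
    // "(?<name>" captures; "(?<=" and "(?<!" are lookbehinds.
    const bool named = i + 3 < pattern.size() && pattern[i + 2] == '<' &&
                       pattern[i + 3] != '=' && pattern[i + 3] != '!';
    if (named) ++captures;
  }
  return captures;
}
```

Scanning the tail `(]x(y)` with `in_class` set counts one group; starting the same scan as if outside a class would wrongly count two.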

Bug: chromium:1254704
Change-Id: I91d2177c464f7e507413d96216fe570253f17676
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3199871
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77204}
2021-10-04 06:41:42 +00:00
Milad Fa
9227a8da33 PPC/s390: [regexp] Fix stack growth for global regexps
Port 3e3a027da1

Original Commit Message:

    Irregexp reentrancy (crrev.com/c/3162604) introduced a bug for global
    regexp execution in which each iteration would use a new stack region
    (i.e. we forgot to pop the regexp stack pointer when starting a new
    iteration).

    This CL fixes that by popping the stack pointer on the loop backedge.

    At a high level:

    - Initialize the backtrack_stackpointer earlier and avoid clobbering
      it by setup code.
    - Pop it on the loop backedge.
    - Slightly refactor Push/Pop operations to avoid unneeded memory
      accesses.

R=jgruber@chromium.org, joransiu@ca.ibm.com, junyan@redhat.com, midawson@redhat.com
BUG=
LOG=N

Change-Id: Iafe6814d3695e83fced6a46209accf5e712d56f6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3198391
Reviewed-by: Junliang Yan <junyan@redhat.com>
Commit-Queue: Milad Fa <mfarazma@redhat.com>
Cr-Commit-Position: refs/heads/main@{#77180}
2021-09-30 18:40:23 +00:00
Jakob Gruber
3e3a027da1 [regexp] Fix stack growth for global regexps
Irregexp reentrancy (crrev.com/c/3162604) introduced a bug for global
regexp execution in which each iteration would use a new stack region
(i.e. we forgot to pop the regexp stack pointer when starting a new
iteration).

This CL fixes that by popping the stack pointer on the loop backedge.

At a high level:

- Initialize the backtrack_stackpointer earlier and avoid clobbering
  it by setup code.
- Pop it on the loop backedge.
- Slightly refactor Push/Pop operations to avoid unneeded memory
  accesses.
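
A toy model of the bug and the fix, assuming nothing about the real macro assembler: `BacktrackStack` and `RunGlobalMatch` are hypothetical, and each loop iteration stands for one match attempt of a global regexp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

class BacktrackStack {
 public:
  std::size_t position() const { return entries_.size(); }
  void Push(int value) { entries_.push_back(value); }
  void PopTo(std::size_t position) { entries_.resize(position); }

 private:
  std::vector<int> entries_;
};

// Returns the peak stack depth over all iterations.
inline std::size_t RunGlobalMatch(int iterations, int pushes_per_iteration,
                                  bool pop_on_backedge) {
  BacktrackStack stack;
  // Record the stack pointer once, before the loop.
  const std::size_t base = stack.position();
  std::size_t peak = 0;
  for (int i = 0; i < iterations; ++i) {
    for (int p = 0; p < pushes_per_iteration; ++p) stack.Push(p);
    peak = std::max(peak, stack.position());
    // The fix: restore the stack pointer on the loop backedge.
    if (pop_on_backedge) stack.PopTo(base);
  }
  return peak;
}
```

Without the pop, the peak depth grows linearly with the number of iterations; with it, the depth stays bounded by what a single iteration pushes.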

Bug: v8:11382
Change-Id: Ibad6235767e110089a2b346034f923590b286a05
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3194251
Reviewed-by: Patrick Thier <pthier@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77158}
2021-09-30 07:57:17 +00:00
Jakob Gruber
77906a700c [regexp] Hide the generic JSRegExp::DataAt/SetDataAt accessors
.. and refactor js-regexp.h.

- Hide the generic DataAt/SetDataAt accessors and replace them by
  dedicated accessors. Use the common lower_case naming scheme for
  these.
- Shuffle around definitions in js-regexp.h s.t. they are in a
  meaningful order.
- Dedupe the source/flags accessors - these fields are stored both
  on the instance and on the data array. We keep only accessors for
  the instance. Previously, these were disambiguated through naming
  oddities (e.g. Pattern() returned data->source).

Change-Id: I3d53c8b095f0d59621ff779608438f7fa5e8c92a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3193534
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Maya Lekova <mslekova@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77138}
2021-09-29 11:37:41 +00:00
Jakob Gruber
8965d90362 Reland "[regexp] Reorganize and deduplicate in the regexp parser"
This is a reland of 7d849870ff

Original change's description:
> [regexp] Reorganize and deduplicate in the regexp parser
>
> The parser is organized in a somewhat tricky way s.t. it can be
> hard to map the implementation back to the specified grammar.
>
> In particular, the logic for CharacterClassEscape, ClassEscape,
> and CharacterEscape was implemented twice - once inside a character
> class, once outside.
>
> This CL refactors related logic to have only a single implementation.
>
> As a drive-by, fix one related inconsistency related to \k inside
> a character class.
>
> Fixed: v8:10602
> Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
> Reviewed-by: Mathias Bynens <mathias@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#77114}

Fixed: v8:10602,chromium:1253976
Change-Id: I9e7cc6a34d3be06e1a68895775aa50b0eee78c57
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3193531
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77135}
2021-09-29 07:33:12 +00:00
Shu-yu Guo
294a77eab5 Revert "[regexp] Reorganize and deduplicate in the regexp parser"
This reverts commit 7d849870ff.

Reason for revert: Will block roll. Broke error message tests upstream:

https://ci.chromium.org/ui/p/v8/builders/ci/V8%20Blink%20Win/6635/overview


Original change's description:
> [regexp] Reorganize and deduplicate in the regexp parser
>
> The parser is organized in a somewhat tricky way s.t. it can be
> hard to map the implementation back to the specified grammar.
>
> In particular, the logic for CharacterClassEscape, ClassEscape,
> and CharacterEscape was implemented twice - once inside a character
> class, once outside.
>
> This CL refactors related logic to have only a single implementation.
>
> As a drive-by, fix one related inconsistency related to \k inside
> a character class.
>
> Fixed: v8:10602
> Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
> Reviewed-by: Mathias Bynens <mathias@chromium.org>
> Commit-Queue: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/main@{#77114}

Change-Id: Ic7404d6c9f0e6ea51e8cd8f1ab672856dca0c637
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3190692
Auto-Submit: Shu-yu Guo <syg@chromium.org>
Commit-Queue: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com>
Cr-Commit-Position: refs/heads/main@{#77125}
2021-09-28 16:15:15 +00:00
Jakob Gruber
69e1a42e2b [regexp] Use ZoneVector in RegExpBytecodeGenerator
.. to avoid the expensive malloc call.

Fixed: v8:9455
Change-Id: I6734fe07a3884b228d818f60be83d9e45c2ee383
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3190105
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77118}
2021-09-28 13:44:20 +00:00
Jakob Gruber
7d849870ff [regexp] Reorganize and deduplicate in the regexp parser
The parser is organized in a somewhat tricky way s.t. it can be
hard to map the implementation back to the specified grammar.

In particular, the logic for CharacterClassEscape, ClassEscape,
and CharacterEscape was implemented twice - once inside a character
class, once outside.

This CL refactors related logic to have only a single implementation.

As a drive-by, fix one related inconsistency related to \k inside
a character class.

Fixed: v8:10602
Change-Id: I5858840159694fa6f8d1aa857027db80754e3dfd
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3178966
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77114}
2021-09-28 12:07:35 +00:00
Lu Yahan
3991395843 [riscv64] Fix build error
Port e301d71ff5
 [compiler] Teach InstructionScheduler about protected memory accesses

Port a0ace8a8a5
 [wasm] Interpret table.grow result as 32 bit

Port [regexp] Fix UAF in RegExpMacroAssembler

Change-Id: Ieac5e4deae9c6bbf844788d927f5201b906495f6
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3189213
Commit-Queue: Ji Qiu <qiuji@iscas.ac.cn>
Reviewed-by: Ji Qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77108}
2021-09-28 07:01:56 +00:00
Lu Yahan
64b96fb8df [riscv64] [regexp]: Allow reentrant irregexp execution
Port 3162604 3173681
Bug: v8:11382

Change-Id: Iea5910dfe1f091cb0d202f1abe894562f5c6c63f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3184561
Reviewed-by: Ji Qiu <qiuji@iscas.ac.cn>
Commit-Queue: Ji Qiu <qiuji@iscas.ac.cn>
Cr-Commit-Position: refs/heads/main@{#77105}
2021-09-28 00:24:24 +00:00
Zhao Jiazhong
f5e48df1f2 [mips][loong64][regexp] Fix regexp test failures
Port commit bba7c09aad
  [regexp] Allow reentrant irregexp execution

Port commit 4bbfc4b7a6
  [regexp] Remove the `stack` parameter from regexp matchers

Port commit c1700c56ad
  [regexp] Fix UAF in RegExpMacroAssembler

Bug: v8:11382
Change-Id: Ie2e95d7b19ecbd740e8d8a4130c725416abc114a
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3185562
Reviewed-by: Liu yu <liuyu@loongson.cn>
Commit-Queue: Zhao Jiazhong <zhaojiazhong-hf@loongson.cn>
Cr-Commit-Position: refs/heads/main@{#77090}
2021-09-27 13:49:08 +00:00
Jakob Gruber
c1700c56ad [regexp] Fix UAF in RegExpMacroAssembler
.. by turning `masm_` into a unique_ptr s.t. it's freed after the
NoRootArrayScope which references it.
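
The ordering argument relies on C++ destroying members in reverse declaration order. The sketch below uses hypothetical stand-in types, not the real classes, to show a scope member being destroyed before the `unique_ptr` member it references.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Assembler {
  explicit Assembler(std::vector<std::string>* events) : events(events) {}
  ~Assembler() { events->push_back("~Assembler"); }
  bool root_array_available = true;
  std::vector<std::string>* events;
};

// Stand-in for a scope that flips a flag on the assembler and restores it
// on destruction, so it must not outlive the assembler.
struct NoRootArrayScopeSketch {
  explicit NoRootArrayScopeSketch(Assembler* masm) : masm(masm) {
    masm->root_array_available = false;
  }
  ~NoRootArrayScopeSketch() {
    masm->root_array_available = true;  // Touches the assembler.
    masm->events->push_back("~NoRootArrayScopeSketch");
  }
  Assembler* masm;
};

struct RegExpAssemblerSketch {
  explicit RegExpAssemblerSketch(std::vector<std::string>* events)
      : masm(std::make_unique<Assembler>(events)), scope(masm.get()) {}
  // Destroyed in reverse order: `scope` first, then `masm`.
  std::unique_ptr<Assembler> masm;
  NoRootArrayScopeSketch scope;
};
```

By contrast, a raw pointer deleted in the destructor body would be freed before any member destructor runs, which is one way such a use-after-free could arise.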

Fixed: chromium:1252620
Change-Id: I24580c5a96d76a973b2b083e7a76b95f93bb6068
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3185459
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Patrick Thier <pthier@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Patrick Thier <pthier@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77082}
2021-09-27 09:16:58 +00:00