This is a reland of 4b15b984ad
Updates since original: fix an arithmetic overflow bug, remove an invalid
DCHECK, add a unit test that would trigger that DCHECK.
Original change's description:
> [regexp] Better quick checks on loop entry nodes
>
> Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
> change is inspired by attempting to exit earlier from generated RegExp
> code, when no further matches are possible because any match would be
> too long. The motivating example this time is the following expression,
> which tests whether a string of Unicode playing cards has five of the
> same suit in a row:
>
> /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u
>
> A human reading this expression can readily see that any match requires
> at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
> each repeated option reports its minimum distance to the end of a match
> as zero. This is correct, because the LoopChoiceNode's behavior depends
> on additional state (the loop counter). However, the preceding node, a
> SET_REGISTER action that initializes the loop counter, could confidently
> state that it consumes at least 10 characters. Furthermore, when we try
> to emit a quick check for that action, we could follow only paths from
> the LoopChoiceNode that are possible based on the minimum iteration
> count. This change implements both of those "could"s.
>
> I expect this improvement to apply pretty broadly to expressions that
> use minimum repetition counts and that don't meet the criteria for
> unrolling. In this particular case, I get about 12% improvement on the
> overall UniPoker test, due to reducing the execution time of this
> expression by 85% and the execution time of another similar expression
> that checks for n-of-a-kind by 20%.
>
> Bug: v8:9305
>
> Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62963}
Bug: v8:9305
Change-Id: I992070d383009013881bf778242254c27134b650
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1726674
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Cr-Commit-Position: refs/heads/master@{#63009}
This reverts commit 4b15b984ad.
Reason for revert: UBSan failure (https://logs.chromium.org/logs/v8/buildbucket/cr-buildbucket.appspot.com/8906578530303352544/+/steps/Check/0/logs/regress-126412/0).
Original change's description:
> [regexp] Better quick checks on loop entry nodes
>
> Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
> change is inspired by attempting to exit earlier from generated RegExp
> code, when no further matches are possible because any match would be
> too long. The motivating example this time is the following expression,
> which tests whether a string of Unicode playing cards has five of the
> same suit in a row:
>
> /([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u
>
> A human reading this expression can readily see that any match requires
> at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
> each repeated option reports its minimum distance to the end of a match
> as zero. This is correct, because the LoopChoiceNode's behavior depends
> on additional state (the loop counter). However, the preceding node, a
> SET_REGISTER action that initializes the loop counter, could confidently
> state that it consumes at least 10 characters. Furthermore, when we try
> to emit a quick check for that action, we could follow only paths from
> the LoopChoiceNode that are possible based on the minimum iteration
> count. This change implements both of those "could"s.
>
> I expect this improvement to apply pretty broadly to expressions that
> use minimum repetition counts and that don't meet the criteria for
> unrolling. In this particular case, I get about 12% improvement on the
> overall UniPoker test, due to reducing the execution time of this
> expression by 85% and the execution time of another similar expression
> that checks for n-of-a-kind by 20%.
>
> Bug: v8:9305
>
> Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
> Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
> Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
> Reviewed-by: Jakob Gruber <jgruber@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#62963}
TBR=jgruber@chromium.org,seth.brenith@microsoft.com
Change-Id: Iac085b75e054fdf0d218987cfe449be1f1630545
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: v8:9305
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1725621
Reviewed-by: Leszek Swirski <leszeks@chromium.org>
Commit-Queue: Leszek Swirski <leszeks@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62977}
Like the predecessor change https://crrev.com/c/v8/v8/+/1702125 , this
change is inspired by attempting to exit earlier from generated RegExp
code, when no further matches are possible because any match would be
too long. The motivating example this time is the following expression,
which tests whether a string of Unicode playing cards has five of the
same suit in a row:
/([🂡-🂮]{5})|([🂱-🂾]{5})|([🃁-🃎]{5})|([🃑-🃞]{5})/u
A human reading this expression can readily see that any match requires
at least 10 characters (5 surrogate pairs), but the LoopChoiceNode for
each repeated option reports its minimum distance to the end of a match
as zero. This is correct, because the LoopChoiceNode's behavior depends
on additional state (the loop counter). However, the preceding node, a
SET_REGISTER action that initializes the loop counter, could confidently
state that it consumes at least 10 characters. Furthermore, when we try
to emit a quick check for that action, we could follow only paths from
the LoopChoiceNode that are possible based on the minimum iteration
count. This change implements both of those "could"s.
I expect this improvement to apply pretty broadly to expressions that
use minimum repetition counts and that don't meet the criteria for
unrolling. In this particular case, I get about 12% improvement on the
overall UniPoker test, due to reducing the execution time of this
expression by 85% and the execution time of another similar expression
that checks for n-of-a-kind by 20%.
Bug: v8:9305
Change-Id: I319e381743967bdf83324be75bae943fbb5dd496
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1704941
Commit-Queue: Seth Brenith <seth.brenith@microsoft.com>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62963}
Until now lookbehind assertions have been quantifiable in non-unicode regexps.
This seems to be an oversight in the spec.
R=jgruber@chromium.org
Bug: v8:7462
Change-Id: Iad0db441089c7510dd2c42a861db92c05545ce1e
Reviewed-on: https://chromium-review.googlesource.com/926102
Commit-Queue: Yang Guo <yangguo@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#51392}
Both of these features were shipped in Chrome 62.
Bug: v8:4545, v8:6172
Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: Ie00dcbeded7517a15696d4a78fcfbbf162919923
Reviewed-on: https://chromium-review.googlesource.com/775601
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Commit-Queue: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49453}