Since https://codereview.chromium.org/2777583003, the Boyer-Moore
lookahead (used by the irregexp engine) also looks inside submatches
to narrow down its range of accepted characters at specific offsets.
But the end of a submatch, designated by a PositiveSubmatchSuccess
action node, was not handled correctly. When a submatch terminates,
we have no knowledge of what may follow, and thus must accept any
character at following positions. This is done by the SetRest call
added in this CL.
An example, since this is fairly obscure:
/^.*?Y(((?=B?).)*)Y$/s
The initial non-greedy loop, together with the s flag,
will trigger an attempted Boyer-Moore lookahead. After this follows
an unconditional Y, a *-quantified loop matching any char and
containing a lookahead that matches either 1 B or 0 B's, and an
unconditional trailing Y.
When the BM lookahead scans the subject string for the beginning of
this pattern after the non-greedy loop, it should look for: a Y at
offset 0, and either a B, a Y, or '.' (-> any character) at offset 1.
Prior to this CL this was not the case:
- The lookaround is internally generated as a submatch.
- The optional 'B?' is unrolled into 'either B followed by submatch
end' or 'submatch end'.
- Filling in BM infos terminates when encountering a submatch end.
Thus in the former case we added B to the set of accepted characters
and terminated, while in the latter case we simply terminated.o
This CL ensures that BM will accept any character at any offset at or
exceeding the first encountered submatch end.
Bug: v8:8770
Change-Id: Iff998ba307cd9669203846a9182798b8cf6a85dc
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1679506
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Erik Corry <erikcorry@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Cr-Commit-Position: refs/heads/master@{#62460}