[regexp] Only emit valid ranges in MakeRangeArray

Character class handling in the irregexp pipeline is quite complex;
codepoints outside the BMP (basic multilingual plane) are only
translated into surrogate pairs when needed, e.g. when the subject
string is two-byte. If not needed, the codepoints simply stay part of
the list of CharacterRanges.
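
For illustration, the translation of a single non-BMP codepoint into a
surrogate pair can be sketched as follows. This is a minimal standalone
sketch, not irregexp code; the helper name ToSurrogatePair is hypothetical
and only the standard UTF-16 arithmetic is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of V8): splits a codepoint above U+FFFF
// into its UTF-16 lead and trail surrogate code units.
std::pair<uint16_t, uint16_t> ToSurrogatePair(uint32_t code_point) {
  assert(code_point > 0xFFFF && code_point <= 0x10FFFF);
  const uint32_t offset = code_point - 0x10000;  // 20 significant bits.
  const uint16_t lead = static_cast<uint16_t>(0xD800 + (offset >> 10));
  const uint16_t trail = static_cast<uint16_t>(0xDC00 + (offset & 0x3FF));
  return {lead, trail};
}
```

A codepoint at or below U+FFFF needs no such translation and fits in a
single 16-bit code unit.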

In EmitCharClass, we determine the valid subset of ranges through
ranges_length; until this CL, we forgot to pass that information on to
MakeRangeArray. Do that now by truncating the list of CharacterRanges.
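
The truncation can be sketched roughly as below, using std::vector in place
of V8's ZoneList. The names Range, CountValidRanges and
DropUninterestingRanges are hypothetical, and the loop condition (drop
trailing ranges that start above the largest representable character) is an
assumption, since the diff shows only the tail of that loop.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for CharacterRange; both bounds are inclusive.
struct Range {
  uint32_t from;
  uint32_t to;
};

// Assumes `ranges` is sorted by `from`. Returns how many leading ranges
// start at or below `max_char`; the rest cannot match the subject string.
int CountValidRanges(const std::vector<Range>& ranges, uint32_t max_char) {
  int ranges_length = static_cast<int>(ranges.size());
  while (ranges_length > 0 && ranges[ranges_length - 1].from > max_char) {
    ranges_length--;
  }
  return ranges_length;
}

// Mirrors the idea of ranges->Rewind(ranges_length): the list itself is
// shortened, so later consumers never see the dropped ranges.
void DropUninterestingRanges(std::vector<Range>* ranges, uint32_t max_char) {
  ranges->resize(CountValidRanges(*ranges, max_char));
}
```

Shortening the list itself, rather than carrying a separate length around,
means a later consumer cannot accidentally iterate over the dropped ranges.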

Fixed: chromium:1262423
Change-Id: I5bb5b839e9935890ca2d10908ad66d72c3217178
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3240782
Commit-Queue: Jakob Gruber <jgruber@chromium.org>
Auto-Submit: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/main@{#77514}
Jakob Gruber 2021-10-25 10:52:20 +02:00 committed by V8 LUCI CQ
parent e26aced708
commit b7dc9915ff
3 changed files with 11 additions and 2 deletions

@@ -1234,6 +1234,8 @@ void EmitCharClass(RegExpMacroAssembler* macro_assembler,
     ranges_length--;
   }
+  ranges->Rewind(ranges_length); // Drop all uninteresting ranges.
   if (ranges_length == 0) {
     if (!cc->is_negated()) {
       macro_assembler->GoTo(on_failure);

@@ -144,12 +144,12 @@ Handle<ByteArray> MakeRangeArray(Isolate* isolate,
       isolate->factory()->NewByteArray(size_in_bytes);
   for (int i = 0; i < ranges_length; i++) {
     const CharacterRange& r = ranges->at(i);
-    DCHECK_NE(r.from(), kMaxUInt16);
+    DCHECK_LT(r.from(), kMaxUInt16);
     range_array->set_uint16(i * 2 + 0, r.from());
     if (i == ranges_length - 1 && r.to() == kMaxUInt16) {
       break; // Avoid overflow by leaving the last range open-ended.
     }
-    DCHECK_NE(r.to(), kMaxUInt16);
+    DCHECK_LT(r.to(), kMaxUInt16);
     range_array->set_uint16(i * 2 + 1, r.to() + 1); // Exclusive.
   }
   return range_array;
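
The encoding built by the loop above (inclusive start, exclusive end, last
range optionally left open-ended) can be sketched in standalone form as
below. This is an illustrative sketch rather than V8 code: CodeUnitRange,
EncodeRanges and Contains are hypothetical names, std::vector stands in for
ByteArray, and the lookup shown is one plausible way to consume such an
array, not necessarily how irregexp does it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kMax16 = 0xFFFF;

// Hypothetical stand-in for CharacterRange; both bounds are inclusive.
struct CodeUnitRange {
  uint32_t from;
  uint32_t to;
};

// Flattens sorted, non-overlapping ranges into [from0, to0+1, from1, ...].
// If the last range ends at 0xFFFF, its end is omitted, because to+1 would
// not fit in 16 bits; the range is then treated as open-ended.
std::vector<uint16_t> EncodeRanges(const std::vector<CodeUnitRange>& ranges) {
  std::vector<uint16_t> out;
  for (size_t i = 0; i < ranges.size(); i++) {
    const CodeUnitRange& r = ranges[i];
    assert(r.from < kMax16);  // Mirrors the DCHECK_LT in the diff.
    out.push_back(static_cast<uint16_t>(r.from));
    if (i == ranges.size() - 1 && r.to == kMax16) break;  // Open-ended.
    assert(r.to < kMax16);
    out.push_back(static_cast<uint16_t>(r.to + 1));  // Exclusive end.
  }
  return out;
}

// A code unit is inside the class iff an odd number of boundaries are <= c.
bool Contains(const std::vector<uint16_t>& encoded, uint16_t c) {
  auto it = std::upper_bound(encoded.begin(), encoded.end(), c);
  return ((it - encoded.begin()) % 2) == 1;
}
```

A range whose start lies above 0xFFFF would be silently truncated by the
16-bit store, which is why only the valid subset of ranges should reach this
step.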

@@ -0,0 +1,7 @@
+// Copyright 2021 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+//
+// Flags: --no-regexp-tier-up
+assertNull(/[B\p{S}\p{C}]/iu.exec(""));