v8/test/cctest
Justin Ridgewell cedec225c9 Implement DFA Unicode Decoder
This is a separation of the DFA Unicode Decoder from
https://chromium-review.googlesource.com/c/v8/v8/+/789560

I attempted to make the DFA's table a bit more explicit in this CL. Still, the
linter prevents me from presenting the array as a "table" in source code. For
a better representation, please refer to
https://docs.google.com/spreadsheets/d/1L9STtkmWs-A7HdK5ZmZ-wPZ_VBjQ3-Jj_xN9c6_hLKA

- - - - -

Now for a big copy-paste from 789560:

Essentially, this reworks a standard FSM (imagine an array of structs)
and flattens it out into a single-dimension array. Using Table 3-7 of
the Unicode 10.0.0 standard (page 126 of
http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf), we can nicely
map all bytes into one of 12 character classes:

00. 0x00-0x7F
01. 0x80-0x8F (split from general continuation because this range is not
    valid after a 0xF0 leading byte)
02. 0x90-0x9F (split from general continuation because this range is not
    valid after a 0xE0 nor a 0xF4 leading byte)
03. 0xA0-0xBF (the rest of the continuation range)
04. 0xC0-0xC1, 0xF5-0xFF (the joined range of invalid bytes; notice this
    includes 255, which we use as a known bad byte during hex-to-int
    decoding)
05. 0xC2-0xDF (leading bytes which require any continuation byte
    afterwards)
06. 0xE0 (leading byte which requires a 0xA0-0xBF afterwards then any
    continuation byte after that)
07. 0xE1-0xEC, 0xEE-0xEF (leading bytes which require any continuation
    byte afterwards then any continuation byte after that)
08. 0xED (leading byte which requires a 0x80-0x9F afterwards then any
    continuation byte after that)
09. 0xF1-0xF3 (leading bytes which require any continuation byte
    afterwards then any continuation byte then any continuation byte)
10. 0xF0 (leading byte which requires a 0x90-0xBF afterwards then any
    continuation byte then any continuation byte)
11. 0xF4 (leading byte which requires a 0x80-0x8F afterwards then any
    continuation byte then any continuation byte)
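
As an illustration of the class list above, here is a minimal sketch of a
byte classifier. The name CharacterClass is hypothetical and the range
checks are only a readable stand-in; the CL itself presumably uses a
256-entry lookup array rather than a chain of comparisons.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from the CL: returns the character
// class (0-11) of a byte, following the numbered list above.
inline int CharacterClass(uint8_t b) {
  if (b <= 0x7F) return 0;   // ASCII
  if (b <= 0x8F) return 1;   // continuation 0x80-0x8F
  if (b <= 0x9F) return 2;   // continuation 0x90-0x9F
  if (b <= 0xBF) return 3;   // continuation 0xA0-0xBF
  if (b <= 0xC1) return 4;   // invalid 0xC0-0xC1
  if (b <= 0xDF) return 5;   // two-byte lead 0xC2-0xDF
  if (b == 0xE0) return 6;   // three-byte lead, restricted second byte
  if (b == 0xED) return 8;   // three-byte lead, restricted second byte
  if (b <= 0xEF) return 7;   // 0xE1-0xEC and 0xEE-0xEF
  if (b == 0xF0) return 10;  // four-byte lead, restricted second byte
  if (b <= 0xF3) return 9;   // 0xF1-0xF3
  if (b == 0xF4) return 11;  // four-byte lead, restricted second byte
  return 4;                  // invalid 0xF5-0xFF
}

// A few spot checks against the list above.
inline void CheckCharacterClass() {
  assert(CharacterClass('A') == 0);
  assert(CharacterClass(0xED) == 8);
  assert(CharacterClass(0xF0) == 10);
  assert(CharacterClass(0xFF) == 4);
}
```

Note how 0xF0 maps to class 10 and 0xF1-0xF3 to class 9, matching the
swap described in the next paragraph.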

Note that 0xF0 and 0xF1-0xF3 were swapped so that fewer bytes were
needed to represent the transition state ("9, 10, 10, 10" vs.
"10, 9, 9, 9").

Using these 12 classes as "transitions", we can map from one state to
the next. Each state is defined as some multiple of 12, so that we're
always starting at the 0th column of each row of the FSM. From each
state, we add the transition and get the index of the new row the FSM
is entering.

If at any point we encounter a bad byte, the state + bad-byte-transition
is guaranteed to map us into the first row of the FSM (which contains no
valid exiting transitions).
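
The row-plus-column indexing and the sticky bad state can be sketched as
follows. Everything here is an assumption made for illustration: the state
names, the row order, and the table contents were worked out from the
class list above and are not the table from the CL (see the spreadsheet
for that). CharacterClass is a hypothetical classifier, included so the
snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classifier for the 12 character classes listed above.
inline int CharacterClass(uint8_t b) {
  if (b <= 0x7F) return 0;
  if (b <= 0x8F) return 1;
  if (b <= 0x9F) return 2;
  if (b <= 0xBF) return 3;
  if (b <= 0xC1) return 4;
  if (b <= 0xDF) return 5;
  if (b == 0xE0) return 6;
  if (b == 0xED) return 8;
  if (b <= 0xEF) return 7;
  if (b == 0xF0) return 10;
  if (b <= 0xF3) return 9;
  if (b == 0xF4) return 11;
  return 4;
}

// Each state is the offset of its row in the flat array (a multiple of 12).
// Names and ordering are assumptions, except that the bad state is row 0.
enum State : uint8_t {
  kReject = 0,      // the "bad" state, no exits
  kAccept = 12,     // between code points
  kOneMore = 24,    // expect one continuation byte
  kTwoMore = 36,    // expect two continuation bytes
  kAfterE0 = 48,    // next byte must be 0xA0-0xBF
  kAfterED = 60,    // next byte must be 0x80-0x9F
  kThreeMore = 72,  // expect three continuation bytes
  kAfterF0 = 84,    // next byte must be 0x90-0xBF
  kAfterF4 = 96,    // next byte must be 0x80-0x8F
};

// Flattened FSM: 9 rows of 12 columns, one column per character class.
static const uint8_t kTransitions[9 * 12] = {
  // 0   1   2   3   4   5   6   7   8   9  10  11
     0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  // kReject
    12,  0,  0,  0,  0, 24, 48, 36, 60, 72, 84, 96,  // kAccept
     0, 12, 12, 12,  0,  0,  0,  0,  0,  0,  0,  0,  // kOneMore
     0, 24, 24, 24,  0,  0,  0,  0,  0,  0,  0,  0,  // kTwoMore
     0,  0,  0, 24,  0,  0,  0,  0,  0,  0,  0,  0,  // kAfterE0
     0, 24, 24,  0,  0,  0,  0,  0,  0,  0,  0,  0,  // kAfterED
     0, 36, 36, 36,  0,  0,  0,  0,  0,  0,  0,  0,  // kThreeMore
     0,  0, 36, 36,  0,  0,  0,  0,  0,  0,  0,  0,  // kAfterF0
     0, 36,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  // kAfterF4
};

// Returns true if the buffer is well-formed UTF-8 under this sketch.
inline bool IsValidUtf8(const uint8_t* bytes, size_t length) {
  assert(bytes != nullptr || length == 0);
  uint8_t state = kAccept;
  for (size_t i = 0; i < length; ++i) {
    // Row offset (state) plus column (class) indexes the flat array.
    state = kTransitions[state + CharacterClass(bytes[i])];
  }
  return state == kAccept;
}
```

Because every entry in row 0 is 0, the loop needs no early exit: once a
bad byte is seen the state stays at kReject for the rest of the input,
and a truncated sequence ends in a state other than kAccept.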

The key difference from Björn's original (or his self-modified) DFA is
that the "bad" state is now mapped to 0 (the first row of the FSM)
instead of 12 (the second row). This saves ~50 bytes when gzipping, and
also speeds up determining if a string is properly encoded (see his
sample code at http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance).

Finally, I've replaced his ternary check with an array access, to make
the algorithm branchless. This places a requirement on the caller to
zero out the code point between successful decodings, which it could
always have done because it's already branching.
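
To make the branchless step concrete, here is a rough sketch of the
accumulation only, with the state machine left out. The mask values and
the names kPayloadMask, Accumulate, and AccumulateWithBranch are
assumptions derived from the class list above, not code from the CL or
from Björn's decoder.

```cpp
#include <cassert>
#include <cstdint>

// Payload-bit mask per character class (0-11). Hypothetical values
// worked out from the class list, not copied from the CL.
static const uint8_t kPayloadMask[12] = {
    0x7F,              // 0: ASCII, 7 payload bits
    0x3F, 0x3F, 0x3F,  // 1-3: continuation bytes, 6 payload bits
    0x00,              // 4: invalid bytes contribute nothing
    0x1F,              // 5: two-byte leads, 5 payload bits
    0x0F, 0x0F, 0x0F,  // 6-8: three-byte leads, 4 payload bits
    0x07, 0x07, 0x07,  // 9-11: four-byte leads, 3 payload bits
};

// Branchless form: always shift and merge. Stale bits in code_point
// survive, so the caller must pass 0 when starting a new code point.
inline uint32_t Accumulate(uint32_t code_point, uint8_t byte,
                           int char_class) {
  assert(char_class >= 0 && char_class < 12);
  return (code_point << 6) | (byte & kPayloadMask[char_class]);
}

// Ternary form for contrast: a leading byte discards the old value,
// so no reset is needed, at the cost of a branch per byte.
inline uint32_t AccumulateWithBranch(uint32_t code_point, uint8_t byte,
                                     int char_class, bool mid_sequence) {
  assert(char_class >= 0 && char_class < 12);
  return mid_sequence ? (code_point << 6) | (byte & 0x3Fu)
                      : (byte & kPayloadMask[char_class]);
}
```

For example, feeding 0xE2, 0x82, 0xAC (classes 7, 1, 3) through
Accumulate starting from 0 yields 0x20AC. Feeding a following 'A' without
resetting to 0 first would shift the old bits into the new value, which
is the requirement on the caller described above.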

R=marja@google.com

Bug: 
Change-Id: I574f208a84dc5d06caba17127b0d41f7ce1a3395
Reviewed-on: https://chromium-review.googlesource.com/805357
Commit-Queue: Justin Ridgewell <jridgewell@google.com>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50012}
2017-12-11 21:36:13 +00:00
..
compiler Revert "[turbofan] Implement on-stack returns (Intel)" 2017-12-11 16:54:57 +00:00
heap [heap] Correctly restore platform in IncrementalMarkingUsingTasks test. 2017-12-08 10:39:12 +00:00
interpreter Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
libplatform [cleanup] use unique_ptr for the DefaultPlatform 2017-11-14 09:57:18 +00:00
libsampler
parsing [parser] Remove use counter for U+2028 & U+2029 2017-12-11 20:32:39 +00:00
wasm [ia32][wasm] Enable more SIMD tests on IA32 2017-12-11 02:28:06 +00:00
assembler-helper-arm.cc Revert "Revert "[cctest] Clarify that tests for sync instructions are simulator specific"" 2017-11-02 13:11:45 +00:00
assembler-helper-arm.h Revert "Revert "[cctest] Clarify that tests for sync instructions are simulator specific"" 2017-11-02 13:11:45 +00:00
BUILD.gn [heap] Increase test coverage for embedder tracing 2017-12-07 14:11:51 +00:00
cctest_exe.isolate
cctest.cc [wasm] First step of refactoring trap handling to be per module. 2017-12-07 01:00:55 +00:00
cctest.gyp [heap] Increase test coverage for embedder tracing 2017-12-07 14:11:51 +00:00
cctest.h [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
cctest.isolate
cctest.status Revert "[turbofan] Implement on-stack returns (Intel)" 2017-12-11 16:54:57 +00:00
DEPS
expression-type-collector-macros.h [cleanup] Fix remaining (D)CHECK macro usages 2017-10-18 10:12:31 +00:00
gay-fixed.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
gay-fixed.h
gay-precision.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
gay-precision.h
gay-shortest.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
gay-shortest.h
log-eq-of-logging-and-traversal.js Reland "[logging] Use OFStream for log events" 2017-10-20 22:47:01 +00:00
OWNERS MIPS: Update OWNERS 2017-11-10 14:33:48 +00:00
print-extension.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
print-extension.h
profiler-extension.cc [cpu-profiler] Deprecate Isolate::GetCpuProfiler and CpuProfiler::CollectSample functions. 2017-11-21 00:56:56 +00:00
profiler-extension.h
scope-test-helper.h [reland] [parser] Skipping inner funcs: Use less memory for variables. 2017-10-25 08:49:37 +00:00
setup-isolate-for-tests.cc [heap] remove heap init from shipping binary. 2017-09-07 05:24:49 +00:00
setup-isolate-for-tests.h [heap] remove heap init from shipping binary. 2017-09-07 05:24:49 +00:00
test-access-checks.cc
test-accessor-assembler.cc Remove ComputeFlags, simply pass in Code::Kind instead of Code::Flags 2017-09-29 15:37:27 +00:00
test-accessors.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-allocation.cc [Memory] Clean up base OS memory abstractions. 2017-11-03 18:49:55 +00:00
test-api-accessors.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-api-interceptors.cc [api] Mark SetNamedPropertyHandler as soon to be deprecated 2017-12-04 11:06:50 +00:00
test-api.cc [heap] Provide the number of native and detached context via Heap API. 2017-12-11 18:14:31 +00:00
test-api.h Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-array-list.cc [iwyu] Remove stale TODOs about objects-inl.h inclusion. 2017-10-09 11:14:59 +00:00
test-assembler-arm64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-assembler-arm.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-assembler-ia32.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-assembler-mips64.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-assembler-mips.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-assembler-ppc.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-assembler-s390.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-assembler-x64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-atomicops.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-bignum-dtoa.cc [jumbo] add test namespaces for cctest 2017-09-21 08:46:16 +00:00
test-bignum.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-bit-vector.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-circular-queue.cc
test-code-layout.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-code-stub-assembler.cc [runtime] Cleanup Map fields and bit fields definitions. 2017-12-07 11:55:56 +00:00
test-code-stubs-arm64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs-arm.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs-ia32.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs-mips64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs-mips.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs-x64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-code-stubs.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-code-stubs.h [cctest] Avoid disallowed "using namespace" directive. 2017-09-01 08:28:36 +00:00
test-compiler.cc [compiler] Split compilation timer on caching decision 2017-11-01 17:10:45 +00:00
test-constantpool.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-conversions.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-cpu-profiler.cc [cpu-profiler] Clear code entries when no observers are present. 2017-12-06 22:58:05 +00:00
test-date.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-debug.cc Enable clang's -Wunreachable-code warning. 2017-12-04 13:09:25 +00:00
test-decls.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-deoptimization.cc Pass Isolate pointer to String::Utf8Value/Value constructors 2017-08-28 18:17:08 +00:00
test-dictionary.cc [runtime] Make GetHash and GetOrCreateHash member functions 2017-08-22 00:35:31 +00:00
test-disasm-arm64.cc [arm64] Enforce restriction on stlxr instructions 2017-11-29 13:19:28 +00:00
test-disasm-arm.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-disasm-ia32.cc [ia32][wasm] Enable more SIMD tests on IA32 2017-12-11 02:28:06 +00:00
test-disasm-mips64.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-disasm-mips.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-disasm-ppc.cc [cctest] Avoid disallowed "using namespace" directive. 2017-09-01 08:28:36 +00:00
test-disasm-s390.cc [cctest] Avoid disallowed "using namespace" directive. 2017-09-01 08:28:36 +00:00
test-disasm-x64.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-diy-fp.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-double.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-dtoa.cc [jumbo] add test namespaces for cctest 2017-09-21 08:46:16 +00:00
test-elements-kind.cc [factory] Simplify JSFunction creation 2017-11-08 13:52:13 +00:00
test-experimental-extra.js
test-extra.js Add isPromise V8 extras util 2017-04-06 13:16:35 +00:00
test-fast-dtoa.cc [jumbo] add test namespaces for cctest 2017-09-21 08:46:16 +00:00
test-feedback-vector.cc Add speculation mode to Call node 2017-12-08 14:51:10 +00:00
test-feedback-vector.h [objects] Make feedback vector a first-class object 2017-07-27 13:31:55 +00:00
test-field-type-tracking.cc [cleanup] Replace V8_UINT64_C macro by proper C++11 syntax 2017-12-01 13:13:37 +00:00
test-fixed-dtoa.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-flags.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-func-name-inference.cc [parsing] Make FuncNameInferrer handle extends clauses properly 2017-12-11 18:39:52 +00:00
test-fuzz-arm64.cc [cctest] Avoid disallowed "using namespace" directive. 2017-09-01 08:28:36 +00:00
test-global-handles.cc Global handles: More test coverage 2017-11-20 14:48:04 +00:00
test-global-object.cc Pass Isolate pointer to String::Utf8Value/Value constructors 2017-08-28 18:17:08 +00:00
test-hashcode.cc [runtime] Make GetHash and GetOrCreateHash member functions 2017-08-22 00:35:31 +00:00
test-hashmap.cc [cleanup] Fix remaining (D)CHECK macro usages 2017-10-18 10:12:31 +00:00
test-heap-profiler.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-identity-map.cc [iwyu] Remove stale TODOs about objects-inl.h inclusion. 2017-10-09 11:14:59 +00:00
test-inobject-slack-tracking.cc [runtime] Stop using Map::unused_property_fields() byte. 2017-11-21 14:07:04 +00:00
test-intl.cc [intl] Implement Intl.NumberFormat.prototype.formatToParts 2017-06-30 20:14:18 +00:00
test-javascript-arm64.cc [jumbo] arm64 cctest fixes 2017-09-30 17:17:23 +00:00
test-js-arm64-variables.cc [jumbo] arm64 cctest fixes 2017-09-30 17:17:23 +00:00
test-liveedit.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-lockers.cc [jumbo] add test namespaces for cctest 2017-09-21 08:46:16 +00:00
test-log-stack-tracer.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-log.cc [log] Support first function execution logging with --log-function-events 2017-11-30 16:38:59 +00:00
test-macro-assembler-arm.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-macro-assembler-mips64.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-macro-assembler-mips.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-macro-assembler-x64.cc [test] Switch cctest to be W^X compliant as well. 2017-12-07 13:04:23 +00:00
test-managed.cc Refactor Managed construction 2017-09-04 11:37:42 +00:00
test-mementos.cc [iwyu] Remove stale TODOs about objects-inl.h inclusion. 2017-10-09 11:14:59 +00:00
test-modules.cc [api] Expose a module's status and exception. 2017-06-28 15:54:57 +00:00
test-object.cc [bigint] Encapsulate internals in MutableBigInt 2017-11-17 23:06:52 +00:00
test-orderedhashtable.cc [jumbo] add test namespaces for cctest 2017-09-21 08:46:16 +00:00
test-parsing.cc [parser] Remove use counter for U+2028 & U+2029 2017-12-11 20:32:39 +00:00
test-platform.cc [Memory] Rewrite platform OS Commit / Uncommit in terms of permissions. 2017-11-21 16:48:55 +00:00
test-profile-generator.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-random-number-generator.cc [cleanup] Fix remaining (D)CHECK macro usages 2017-10-18 10:12:31 +00:00
test-regexp.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-representation.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-run-wasm-relocation-arm64.cc [wasm] Annotate some more {Code} mutation sites. 2017-11-07 11:51:50 +00:00
test-run-wasm-relocation-arm.cc [wasm] Annotate some more {Code} mutation sites. 2017-11-07 11:51:50 +00:00
test-run-wasm-relocation-ia32.cc [wasm] Annotate some more {Code} mutation sites. 2017-11-07 11:51:50 +00:00
test-run-wasm-relocation-x64.cc [wasm] Annotate some more {Code} mutation sites. 2017-11-07 11:51:50 +00:00
test-sampler-api.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-serialize.cc Reland "Add support to produce code cache after execute" 2017-12-01 14:02:47 +00:00
test-strings.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-strtod.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-symbols.cc [iwyu] Remove stale TODOs about objects-inl.h inclusion. 2017-10-09 11:14:59 +00:00
test-sync-primitives-arm64.cc Revert "Revert "[cctest] Clarify that tests for sync instructions are simulator specific"" 2017-11-02 13:11:45 +00:00
test-sync-primitives-arm.cc [simulator] De-dupe {CALL_GENERATED_CODE} macro definition. 2017-12-11 17:07:44 +00:00
test-thread-termination.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-threads.cc Use nullptr instead of NULL where possible 2017-10-13 17:21:49 +00:00
test-trace-event.cc [cleanup] Replace List with std::vector in cctests and d8. 2017-08-29 13:29:26 +00:00
test-traced-value.cc Avoid octal escape sequences 2017-12-01 15:08:14 +00:00
test-transitions.cc [iwyu] Remove stale TODOs about objects-inl.h inclusion. 2017-10-09 11:14:59 +00:00
test-transitions.h Refactor TransitionArray access 2017-07-28 19:41:21 +00:00
test-typedarrays.cc Start preparing test/cctest for jumbo compilation 2017-08-14 20:58:10 +00:00
test-types.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-unbound-queue.cc
test-unboxed-doubles.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-unscopables-hidden-prototype.cc
test-usecounters.cc Remove always-on flag --harmony-strict-legacy-accessor-builtins 2017-11-17 04:06:30 +00:00
test-utils-arm64.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-utils-arm64.h Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-utils.cc Normalize casing of hexadecimal digits 2017-12-02 01:24:40 +00:00
test-version.cc [build] Introduce an embedder version string 2017-10-05 07:17:45 +00:00
test-weakmaps.cc [heap] Increase test coverage for embedder tracing 2017-12-07 14:11:51 +00:00
test-weaksets.cc [factory] Simplify JSFunction creation 2017-11-08 13:52:13 +00:00
testcfg.py Reland "Reland "[test] Creating command before execution phase."" 2017-12-04 13:40:29 +00:00
trace-extension.cc
trace-extension.h
types-fuzz.h
unicode-helpers.h Implement DFA Unicode Decoder 2017-12-11 21:36:13 +00:00