Commit Graph

48 Commits

Author SHA1 Message Date
Justin Ridgewell
cedec225c9 Implement DFA Unicode Decoder
This is a separation of the DFA Unicode Decoder from
https://chromium-review.googlesource.com/c/v8/v8/+/789560

I attempted to make the DFA's table a bit more explicit in this CL. Still, the
linter prevents me from letting me present the array as a "table" in source
code. For a better representation, please refer to
https://docs.google.com/spreadsheets/d/1L9STtkmWs-A7HdK5ZmZ-wPZ_VBjQ3-Jj_xN9c6_hLKA

- - - - -

Now for a big copy-paste from 789560:

Essentially, reworks a standard FSM (imagine an
array of structs) and flattens it out into a single-dimension array.
Using Table 3-7 of the Unicode 10.0.0 standard (page 126 of
http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf), we can nicely
map all bytes into one of 12 character classes:

00. 0x00-0x7F
01. 0x80-0x8F (split from general continuation because this range is not
    valid after a 0xF0 leading byte)
02. 0x90-0x9F (split from general continuation because this range is not
    valid after a 0xE0 nor a 0xF4 leading byte)
03. 0xA0-0xBF (the rest of the continuation range)
04. 0xC0-0xC1, 0xF5-0xFF (the joined range of invalid bytes, notice this
    includes 255 which we use as a known bad byte during hex-to-int
        decoding)
05. 0xC2-0xDF (leading bytes which require any continuation byte
    afterwards)
06. 0xE0 (leading byte which requires a 0xA0-0xBF afterwards then any
    continuation byte after that)
07. 0xE1-0xEC, 0xEE-0xEF (leading bytes which requires any continuation
    afterwards then any continuation byte after that)
08. 0xED (leading byte which requires a 0x80-0x9F afterwards then any
    continuation byte after that)
09. 0xF1-F3 (leading bytes which requires any continuation byte
    afterwards then any continuation byte then any continuation byte)
10. 0xF0 (leading bytes which requires a 0x90-0xBF afterwards then any
    continuation byte then any continuation byte)
11. 0xF4 (leading bytes which requires a 0x80-0x8F afterwards then any
    continuation byte then any continuation byte)

Note that 0xF0 and 0xF1-0xF3 were swapped so that fewer bytes were
needed to represent the transition state ("9, 10, 10, 10" vs.
"10, 9, 9, 9").

Using these 12 classes as "transitions", we can map from one state to
the next. Each state is defined as some multiple of 12, so that we're
always starting at the 0th column of each row of the FSM. From each
state, we add the transition and get a index of the new row the FSM is
entering.

If at any point we encounter a bad byte, the state + bad-byte-transition
is guaranteed to map us into the first row of the FSM (which contains no
valid exiting transitions).

The key differences from Björn's original (or his self-modified) DFA is
the "bad" state is now mapped to 0 (or the first row of the FSM) instead
of 12 (the second row). This saves ~50 bytes when gzipping, and also
speeds up determining if a string is properly encoded (see his sample
code at http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance).

Finally, I've replace his ternary check with an array access, to make
the algorithm branchless. This places a requirement on the caller to 0
out the code point between successful decodings, which it could always
have done because it's already branching.

R=marja@google.com

Bug: 
Change-Id: I574f208a84dc5d06caba17127b0d41f7ce1a3395
Reviewed-on: https://chromium-review.googlesource.com/805357
Commit-Queue: Justin Ridgewell <jridgewell@google.com>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50012}
2017-12-11 21:36:13 +00:00
Mathias Bynens
822be9b238 Normalize casing of hexadecimal digits
This patch normalizes the casing of hexadecimal digits in escape
sequences of the form `\xNN` and integer literals of the form
`0xNNNN`.

Previously, the V8 code base used an inconsistent mixture of uppercase
and lowercase.

Google’s C++ style guide uses uppercase in its examples:
https://google.github.io/styleguide/cppguide.html#Non-ASCII_Characters

Moreover, uppercase letters more clearly stand out from the lowercase
`x` (or `u`) characters at the start, as well as lowercase letters
elsewhere in strings.

BUG=v8:7109
TBR=marja@chromium.org,titzer@chromium.org,mtrofin@chromium.org,mstarzinger@chromium.org,rossberg@chromium.org,yangguo@chromium.org,mlippautz@chromium.org
NOPRESUBMIT=true

Cq-Include-Trybots: master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.linux:linux_chromium_rel_ng
Change-Id: I790e21c25d96ad5d95c8229724eb45d2aa9e22d6
Reviewed-on: https://chromium-review.googlesource.com/804294
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Reviewed-by: Jakob Kummerow <jkummerow@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49810}
2017-12-02 01:24:40 +00:00
Mathias Bynens
96e181f95c [unicode] Clean up comments
BUG=v8:7109

Change-Id: I976eeb012e5de944468f01b0676902fc82cd9604
Reviewed-on: https://chromium-review.googlesource.com/802828
Reviewed-by: Marja Hölttä <marja@chromium.org>
Commit-Queue: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#49798}
2017-12-01 14:50:48 +00:00
Marja Hölttä
fcb89f5515 [unicode] Add tests for UTF-8 decoders + minor cleanups.
Verify that both UTF-8 decoders (incremental and non-incremental one) match the
expectations.

Also cleanup / harden the UTF-8 handling code, as suggested in
https://chromium-review.googlesource.com/c/v8/v8/+/671020/ .


BUG=chromium:765608

Change-Id: I6344d62ca15b75ac8e333421c94c4aa35ab8190d
Reviewed-on: https://chromium-review.googlesource.com/681217
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48229}
2017-09-29 13:18:52 +00:00
Marja Hölttä
f130bfd394 [unicode] Fix overlong / surrogate sequences detection some more.
Follow up to https://chromium-review.googlesource.com/671020

We still didn't return the correct amount of invalid characters, according to
the Encoding spec ( https://encoding.spec.whatwg.org/#utf-8-decoder ), when we
saw a byte sequence which was as start of an overlong / invalid sequence, but
there weren't enough continuation bytes.

A more rigorous test will follow in
https://chromium-review.googlesource.com/c/v8/v8/+/681217

BUG=chromium:765608

Change-Id: I535670edc14d3bae144e5a9ca373f12eec78a934
Reviewed-on: https://chromium-review.googlesource.com/681674
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Reviewed-by: Yang Guo <yangguo@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48165}
2017-09-26 12:08:24 +00:00
Marja Hölttä
6389b7e6b6 [unicode] Return (the correct) errors for overlong / surrogate sequences.
This fix is two-fold:

1) Incremental UTF-8 decoding: Unify incorrect UTF-8 handling between V8 and
Blink.

Incremental UTF-8 decoding used to allow some overlong sequences / invalid code
points which Blink treated as errors. This caused the decoder and the Blink
UTF-8 decoder to produce a different number of bytes, resulting in random
failures when scripts were streamed (especially, this was detected by the
skipping inner functions feature which adds CHECKs against expected function
positions).

2) Non-incremental UTF-8 decoding: return the correct amount of invalid characters.

According to the encoding spec ( https://encoding.spec.whatwg.org/#utf-8-decoder
), the first byte of an overlong sequence / invalid code point generates an
invalid character, and the rest of the bytes are not processed (i.e., pushed
back to the byte stream). When they're handled, they will look like lonely
continuation bytes, and will generate an invalid character each.

As a result, an overlong 4-byte sequence should generate 4 invalid characters
(not 1).

This is a potentially breaking change, since the (non-incremental) UTF-8
decoding is exposed via the API (String::NewFromUtf8). The behavioral difference
happens when the client is passing in invalid UTF-8 (containing overlong /
surrogate sequences).

However, afaict, this doesn't change the semantics of any JavaScript program:
according to the ECMAScript spec, the program is a sequence of Unicode code
points, and there's no way to invoke the UTF-8 decoding functionalities from
inside JavaScript. Though, this changes the behavior of d8 when decoding source
files which are invalid UTF-8.

This doesn't change anything related to URI decoding (it already throws
exceptions for overlong sequences / invalid code points).

BUG: chromium:765608, chromium:758236, v8:5516
Bug: 
Change-Id: Ib029f6a8e87186794b092e4e8af32d01cee3ada0
Reviewed-on: https://chromium-review.googlesource.com/671020
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Franziska Hinkelmann <franzih@chromium.org>
Reviewed-by: Camillo Bruni <cbruni@chromium.org>
Cr-Commit-Position: refs/heads/master@{#48105}
2017-09-21 10:44:40 +00:00
Marja Hölttä
2b6780dc17 [scanner] Don't use UnicodeCache for IsLineTerminator.
For such a simple predicate, calling a(n inline) function that checks against
the values is faster (*) than maintaining the cache.

(*) When scanning a file that contains only comments, we're basically calling
IsLineTerminator in a loop. Parsing such files is now 7-18% faster in local
experiments.

BUG=v8:6092

Change-Id: I6a8f2aba9669a76152292f4e6c7853638d15aae3
Reviewed-on: https://chromium-review.googlesource.com/645633
Commit-Queue: Marja Hölttä <marja@chromium.org>
Reviewed-by: Adam Klein <adamk@chromium.org>
Cr-Commit-Position: refs/heads/master@{#47810}
2017-09-05 07:04:06 +00:00
Jungshik Shin
1163aba720 Remove icu_case_mapping flag
icu-case-mapping was shipped a few months ago. By dropping
the flag, unibrow's case conversion code won't be included
by default because V8_INTL_SUPPORT is on by default.

BUG=v8:4477, v8:4476
TEST=test262/{built-ins,intl402}/Strings/*, webkit/fast/js/*,
     mjsunit/string-case, intl/general/case*

Cq-Include-Trybots: master.tryserver.v8:v8_linux_noi18n_rel_ng
Change-Id: I78be9cc64b4588bc5af79ecbbadf93af6e84a1df
Reviewed-on: https://chromium-review.googlesource.com/534541
Commit-Queue: Jungshik Shin <jshin@chromium.org>
Reviewed-by: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Daniel Ehrenberg <littledan@chromium.org>
Cr-Commit-Position: refs/heads/master@{#46304}
2017-06-29 03:47:27 +00:00
jshin
4aeb94a42d Use ICU for ID_START, ID_CONTINUE and WhiteSpace check
Use ICU to check ID_Start, ID_Continue and WhiteSpace even for BMP
when V8_INTL_SUPPORT is on (which is default).

Change LineTerminator::Is() to check 4 code points from
ES#sec-line-terminators instead of using tables and Lookup function.

Remove Lowercase::Is(). It's not used anywhere.

Update webkit/{ToNumber,parseFloat}.js to have the correct expectation
for U+180E and the corresponding expected files. This is a follow-up to
an earlier change ( https://codereview.chromium.org/2720953003 ).

CQ_INCLUDE_TRYBOTS=master.tryserver.v8:v8_win_dbg,v8_mac_dbg;master.tryserver.chromium.android:android_arm64_dbg_recipe
CQ_INCLUDE_TRYBOTS=master.tryserver.v8:v8_linux_noi18n_rel_ng

BUG=v8:5370,v8:5155
TEST=unittests --gtest_filter=CharP*
TEST=webkit: ToNumber, parseFloat
TEST=test262: built-ins/Number/S9.3*, built-ins/parse{Int,Float}/S15*
TEST=test262: language/white-space/mong*
TEST=test262: built-ins/String/prototype/trim/u180e
TEST=mjsunit: whitespaces

Review-Url: https://codereview.chromium.org/2331303002
Cr-Commit-Position: refs/heads/master@{#45957}
2017-06-14 20:32:49 +00:00
Andreas Haas
8e0daf78da [wasm] Also kBadChar is a valid utf8 character
The validation of utf8 strings in WebAssembly modules used the character
kBadChar = 0xFFFD to indicate a validation error. However, this
character can appear in a valid utf8 string. This CL fixes this problem
by duplicating some of the code in {Utf8::CalculateValue} and inlining
it directly into Utf8::Validate. Note that Utf8::Validate is used only
for WebAssembly.

Tests for this change are in the WebAssembly spec tests, which I will
update in a separate CL.

R=vogelheim@chromium.org

Change-Id: I8697b9299f3e98a8eafdf193bff8bdff90efd7dc
Reviewed-on: https://chromium-review.googlesource.com/509534
Reviewed-by: Daniel Vogelheim <vogelheim@chromium.org>
Commit-Queue: Andreas Haas <ahaas@chromium.org>
Cr-Commit-Position: refs/heads/master@{#45476}
2017-05-23 09:28:06 +00:00
Wiktor Garbacz
9a8efd8a4e [cleanup] Remove return after UNREACHABLE
Change-Id: I20ed35a7fb5104a9cc66bb54fa8966589c43d7f9
Reviewed-on: https://chromium-review.googlesource.com/507287
Reviewed-by: Andreas Haas <ahaas@chromium.org>
Reviewed-by: Benedikt Meurer <bmeurer@chromium.org>
Reviewed-by: Daniel Clifford <danno@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Jochen Eisinger <jochen@chromium.org>
Commit-Queue: Wiktor Garbacz <wiktorg@google.com>
Cr-Commit-Position: refs/heads/master@{#45458}
2017-05-22 13:10:01 +00:00
yangguo
a5dfa06213 [unibrow] remove mongolian vowel separator as white space.
Unibrow is currently at Unicode version 7.0.0, which does not
include mongolian vowel separator (\u180E) as white space. In
order to appease test262 at the time however we kept it as a
whitespace.

Test262 has since then been updated. And while this is not an
update of unibrow, we are removing \u180E as white space here.

R=jshin@chromium.org, littledan@chromium.org
BUG=v8:5155

Review-Url: https://codereview.chromium.org/2720953003
Cr-Commit-Position: refs/heads/master@{#43485}
2017-02-28 13:42:29 +00:00
jbroman
9d524bd33d Fix out-of-range access in unibrow::Utf8::CalculateValue.
This code should not access bytes out of the permitted range in order to check
the range of a possible UTF-8 value. Instead, the length check should occur
before such checks.

BUG=chromium:667260, chromium:662822

Review-Url: https://codereview.chromium.org/2520053003
Cr-Commit-Position: refs/heads/master@{#41165}
2016-11-22 09:27:59 +00:00
vogelheim
fd40ebb1e6 Return kBadChar for longest subpart of incomplete utf-8 character.
This brings the two utf-8 decoders (bulk + incremental) in line.
Technically, either behaviour was correct, since the utf-8 spec
demands incomplete utf-8 be handled, but does not specify how.
Unicode recommends that "the maximal subpart at that offset
should be replaced by a single U+FFFD," and with this change we
consistently do that. More details + spec references in the bug.

BUG=chromium:662822

Review-Url: https://codereview.chromium.org/2493143003
Cr-Commit-Position: refs/heads/master@{#41025}
2016-11-16 11:03:08 +00:00
ulan
8ddc260d3b [parser, serializer] Fix more -Wsign-compare warnings.
BUG=v8:5614

Review-Url: https://codereview.chromium.org/2481013010
Cr-Commit-Position: refs/heads/master@{#40927}
2016-11-11 13:54:26 +00:00
vogelheim
138127a608 Fix bad-char handling in utf-8 streaming streams. Also add test.
R=jochen@chromium.org
BUG=chromium:651333, v8:4947

Review-Url: https://codereview.chromium.org/2391273002
Cr-Commit-Position: refs/heads/master@{#40004}
2016-10-05 17:18:58 +00:00
vogelheim
642d6d314c Rework scanner-character-streams.
- Smaller, more consistent streams API (Advance, Back, pos, Seek)
- Remove implementations from the header, in favor of creation functions.

Observe:
- Performance:
  - All Utf16CharacterStream methods have an inlinable V8_LIKELY w/ a
    body of only a few instructions. I expect most calls to end up there.
  - There used to be performance problems w/ bookmarking, particularly
    with copying too much data on SetBookmark w/ UTF-8 streaming streams.
    All those copies are gone.
  - The old streaming streams implementation used to copy data even for
    2-byte input. It no longer does.
  - The only remaining 'slow' method is the Seek(.) slow case for utf-8
    streaming streams. I don't expect this to be called a lot; and even if,
    I expect it to be offset by the gains in the (vastly more frequent)
    calls to the other methods or the 'fast path'.
  - If it still bothers us, there are several ways to speed it up.
- API & code cleanliness:
  - I want to remove the 'old' API in a follow-up CL, which should mostly
    delete code, or replace it 1:1.
  - In a 2nd follow-up I want to delete much of the UTF-8 handling in Blink
    for streaming streams.
  - The "bookmark" is now always implemented (and mostly very fast), so we
    should be able to use it for more things.
- Testing & correctness:
  - The unit tests now cover all stream implementations,
    and are pretty good and triggering all the edge cases.
  - Vastly more DCHECKs of the invariants.

BUG=v8:4947

Review-Url: https://codereview.chromium.org/2314663002
Cr-Commit-Position: refs/heads/master@{#39464}
2016-09-16 08:29:52 +00:00
clemensh
f0523e3046 [wasm] Add UTF-8 validation
Names passed for imports and exports are checked during decoding,
leading to errors if they are no valid UTF-8. Function names are not
checked during decode, but rather lead to undefined being returned at
runtime if they are not UTF-8.

We need to do these checks on the Wasm side, since the factory
methods assume to get valid UTF-8 strings.

R=titzer@chromium.org, yangguo@chromium.org

Review-Url: https://codereview.chromium.org/1967023004
Cr-Commit-Position: refs/heads/master@{#36208}
2016-05-12 13:02:14 +00:00
mstarzinger
92e85aed10 [presubmit] Fix build/include linter violations.
R=bmeurer@chromium.org

Review URL: https://codereview.chromium.org/1318863004

Cr-Commit-Position: refs/heads/master@{#30554}
2015-09-03 07:56:14 +00:00
jochen
3d5b2f807b Update UTF-8 decoder to detect more special cases.
The blink version is stricter and for parsing it's important that both
decoders behave the same.

BUG=chromium:489944
R=vogelheim@chromium.org
LOG=n

Review URL: https://codereview.chromium.org/1148653007

Cr-Commit-Position: refs/heads/master@{#28601}
2015-05-22 18:47:16 +00:00
marja
0e3b5386ae Scanner / Unicode decoding: use size_t instead of unsigned.
size_t is the correct data type for this purpose. Our APIs (in particular
ExternalSourceStream::GetMoreData) are already using it, and there were some
static_casts to convert between them.

This CL doesn't intend to fix all of V8, just the minimal sense-making part
around scanner character streams.

BUG=

Review URL: https://codereview.chromium.org/864273005

Cr-Commit-Position: refs/heads/master@{#26449}
2015-02-05 07:54:34 +00:00
yangguo@chromium.org
8659e50723 Update unicode to 7.0.0.
And do not use code points with PATTERN_* property for identifier start.
Maintain that \u180E is a white space character.

BUG=v8:2892
LOG=Y
R=dpino@igalia.com, mathias@qiwi.be

Review URL: https://codereview.chromium.org/638643002

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@24473 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-10-08 14:55:03 +00:00
bmeurer@chromium.org
d07a2eb806 Rename ASSERT* to DCHECK*.
This way we don't clash with the ASSERT* macros
defined by GoogleTest, and we are one step closer
to being able to replace our homegrown base/ with
base/ from Chrome.

R=jochen@chromium.org, svenpanne@chromium.org

Review URL: https://codereview.chromium.org/430503007

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@22812 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-08-04 11:34:54 +00:00
mstarzinger@chromium.org
fec6e62dfb Check alpha-sorting of includes during presubmit.
R=rossberg@chromium.org

Review URL: https://codereview.chromium.org/333013002

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@21894 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-06-20 08:40:11 +00:00
jochen@chromium.org
56a486c322 Use full include paths everywhere
- this avoids using relative include paths which are forbidden by the style guide
- makes the code more readable since it's clear which header is meant
- allows for starting to use checkdeps

BUG=none
R=jkummerow@chromium.org, danno@chromium.org
LOG=n

Review URL: https://codereview.chromium.org/304153016

git-svn-id: https://v8.googlecode.com/svn/branches/bleeding_edge@21625 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-06-03 08:12:43 +00:00
bmeurer@chromium.org
d4b533d41b Bulk update of Google copyright headers in source files.
R=svenpanne@chromium.org

Review URL: https://codereview.chromium.org/259183002

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@21035 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-04-29 06:42:26 +00:00
yangguo@chromium.org
b618d2a42a Fix inconsistencies wrt whitespaces.
This relands r19196 with fixes.

BUG=v8:3109
LOG=Y
R=mstarzinger@chromium.org

Review URL: https://codereview.chromium.org/141323007

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@19222 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2014-02-10 12:43:10 +00:00
yangguo@chromium.org
02674ee414 Keep two empty lines between declarations for cpp files
R=yangguo@chromium.org

Review URL: https://codereview.chromium.org/18509003

Patch from Haitao Feng <haitao.feng@intel.com>.

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@15510 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2013-07-05 09:52:11 +00:00
yangguo@chromium.org
04ccb975f4 Remove InputBuffer
R=yangguo@chromium.org
BUG=

Review URL: https://chromiumcodereview.appspot.com/11727004
Patch from Dan Carney <dcarney@google.com>.

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@13298 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2013-01-03 09:18:01 +00:00
yangguo@chromium.org
eedcaf1866 Remove Utf8InputBuffer
R=yangguo@chromium.org
BUG=

Review URL: https://chromiumcodereview.appspot.com/11649018
Patch from Dan Carney <dcarney@google.com>.

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@13248 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2012-12-20 09:20:37 +00:00
erik.corry@gmail.com
03cfc4363b Fix input and output to handle UTF16 surrogate pairs.
Review URL: https://chromiumcodereview.appspot.com/9600009

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@11007 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2012-03-12 12:35:28 +00:00
mstarzinger@chromium.org
2a2ed5004f Update unicode tables to version 6.1.0.
R=erik.corry@gmail.com
BUG=v8:1965

Review URL: https://chromiumcodereview.appspot.com/9615005

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@10933 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2012-03-06 09:43:12 +00:00
erik.corry@gmail.com
70da367f6b More spelling changes.
Review URL: http://codereview.chromium.org/9231009

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@10407 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2012-01-16 12:38:59 +00:00
vitalyr@chromium.org
7976ca2cbc Merge isolates to bleeding_edge.
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@7271 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2011-03-18 20:35:07 +00:00
vitalyr@chromium.org
76e226f832 Revert r7268: it borked the history.
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@7269 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2011-03-18 19:41:05 +00:00
vitalyr@chromium.org
6ff7fdebd3 Merge isolates to bleeding_edge.
Review URL: http://codereview.chromium.org/6685088

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@7268 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2011-03-18 18:49:56 +00:00
lrn@chromium.org
b9bd4952a7 Changed uncast -1 in unsigned context to use constant kSentinel.
Review URL: http://codereview.chromium.org/5993006

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@6133 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2011-01-03 10:28:39 +00:00
lrn@chromium.org
66574f31de Unicode: Reduced size of tables.
Review URL: http://codereview.chromium.org/3043032

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@5161 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2010-07-30 12:59:57 +00:00
lrn@chromium.org
1d24f5f56b Updated unicode library.
Added Nl category to letters predicate (as requried for JS identifiers).
Changed/simplified representation of canonicalization ranges.
Truncated tables to code points in the BMP (all that is used by JS).
Reformatted tables to avoid excessively long lines.
Removed duplicate entries from multi-character mapping result tables.

Review URL: http://codereview.chromium.org/3030026

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@5155 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2010-07-30 07:10:22 +00:00
deanm@chromium.org
e9f42cde46 Small cleanup to Utf8::CalculateValue:
- Don't duplicate kMaxXByteChar constants.
  - Don't compare signed and unsigned integers.

Review URL: http://codereview.chromium.org/155414


git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@2434 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2009-07-13 11:17:51 +00:00
erik.corry@gmail.com
2d4dd93bdd Misc. portability fixes.
Review URL: http://codereview.chromium.org/42337

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@1538 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2009-03-18 15:20:26 +00:00
christian.plesner.hansen@gmail.com
144c8c790a Fixed problem where the two lower-case sigmas would uncanonicalize to
themselves and upper-case sigma, but upper-case sigma would
uncanonicalize to just lower-case final sigma.


git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@844 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-11-26 06:05:07 +00:00
christian.plesner.hansen@gmail.com
b57b4a15cd Merge regexp2000 back into bleeding_edge
Review URL: http://codereview.chromium.org/12427

git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@832 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-11-25 11:07:48 +00:00
christian.plesner.hansen@gmail.com
06fa6d1cde - Case-sensitive atomic regular expressions now use the same code as
String.indexOf to do matching.
- The --log option is no longer automatically enabled by the other log
  options.


git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@413 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-10-02 15:35:28 +00:00
christian.plesner.hansen@gmail.com
9bed566bdb Changed copyright header from google inc. to v8 project authors.
Added presubmit step to check copyright.



git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@242 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-09-09 20:08:45 +00:00
christian.plesner.hansen@gmail.com
3351499cb5 Fixed problem where asian characters were not categorized as letters
because they were defined using different syntax in the unicode
database.



git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@200 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-09-08 10:45:01 +00:00
mads.s.ager@gmail.com
dceb5f6a8f Improved test support.
Fixed issue with building samples and cctests on 64-bit machines.



git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@23 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-08-28 09:55:41 +00:00
christian.plesner.hansen
43d26ecc35 Initial export.
git-svn-id: http://v8.googlecode.com/svn/trunk@2 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
2008-07-03 15:10:15 +00:00