This is a separation of the DFA Unicode Decoder from
https://chromium-review.googlesource.com/c/v8/v8/+/789560
I attempted to make the DFA's table a bit more explicit in this CL. Still, the
linter prevents me from presenting the array as a "table" in source code. For
a better representation, please refer to
https://docs.google.com/spreadsheets/d/1L9STtkmWs-A7HdK5ZmZ-wPZ_VBjQ3-Jj_xN9c6_hLKA
- - - - -
Now for a big copy-paste from 789560:
Essentially, this reworks a standard FSM (imagine an array of structs) and
flattens it out into a single-dimension array.
Using Table 3-7 of the Unicode 10.0.0 standard (page 126 of
http://www.unicode.org/versions/Unicode10.0.0/ch03.pdf), we can nicely
map all bytes into one of 12 character classes:
00. 0x00-0x7F
01. 0x80-0x8F (split from general continuation because this range is not
valid after a 0xF0 leading byte)
02. 0x90-0x9F (split from general continuation because this range is not
valid after a 0xE0 nor a 0xF4 leading byte)
03. 0xA0-0xBF (the rest of the continuation range)
04. 0xC0-0xC1, 0xF5-0xFF (the joined range of invalid bytes, notice this
includes 255 which we use as a known bad byte during hex-to-int
decoding)
05. 0xC2-0xDF (leading bytes which require any continuation byte
afterwards)
06. 0xE0 (leading byte which requires a 0xA0-0xBF afterwards then any
continuation byte after that)
07. 0xE1-0xEC, 0xEE-0xEF (leading bytes which require any continuation
byte afterwards then any continuation byte after that)
08. 0xED (leading byte which requires a 0x80-0x9F afterwards then any
continuation byte after that)
09. 0xF1-0xF3 (leading bytes which require any continuation byte
afterwards then any continuation byte then any continuation byte)
10. 0xF0 (leading byte which requires a 0x90-0xBF afterwards then any
continuation byte then any continuation byte)
11. 0xF4 (leading byte which requires a 0x80-0x8F afterwards then any
continuation byte then any continuation byte)
Note that 0xF0 and 0xF1-0xF3 were swapped so that fewer bytes were
needed to represent the transition state ("9, 10, 10, 10" vs.
"10, 9, 9, 9").
Using these 12 classes as "transitions", we can map from one state to
the next. Each state is defined as some multiple of 12, so that we're
always starting at the 0th column of each row of the FSM. From each
state, we add the transition and get an index of the new row the FSM is
entering.
If at any point we encounter a bad byte, the state + bad-byte-transition
is guaranteed to map us into the first row of the FSM (which contains no
valid exiting transitions).
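To make the flattened layout concrete, a rough sketch of the state-advance
step follows; the accept-state value of 12 and the names are assumptions for
illustration, not copied from the CL:

  #include <cstdint>

  // Sketch of one step of the flattened FSM. Every state value is the index
  // of the first column of its row, so adding the character class of the
  // incoming byte selects the cell holding the next state. Row 0 is the bad
  // row: all of its cells point back into row 0.
  constexpr uint8_t kRejectState = 0;   // first row, no valid exits
  constexpr uint8_t kAcceptState = 12;  // assumed: second row, code point done

  uint8_t Advance(const uint8_t* transition_table,  // flat rows-of-12 array
                  uint8_t state,                     // current row start
                  uint8_t character_class) {         // 0-11, from the byte
    return transition_table[state + character_class];
  }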
The key difference from Björn's original (or his self-modified) DFA is
that the "bad" state is now mapped to 0 (or the first row of the FSM)
instead of 12 (the second row). This saves ~50 bytes when gzipping, and also
speeds up determining if a string is properly encoded (see his sample
code at http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#performance).
Finally, I've replaced his ternary check with an array access, to make
the algorithm branchless. This places a requirement on the caller to zero
out the code point between successful decodings, which it could always
have done because it is already branching at that point.
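As a rough sketch of what "branchless" means here (mask values and names are
illustrative; the exact form in the CL may differ), the per-byte accumulation
becomes an unconditional mask-and-shift, with the caller clearing the buffer
before each new code point:

  #include <cstdint>

  // Illustrative masks per character class: 0x7F for ASCII, 0x3F for the
  // continuation classes, 0x1F/0x0F/0x07 for 2-/3-/4-byte leading bytes,
  // and 0x00 for the invalid class (its value never matters, because the
  // state machine is already in the bad row by then).
  constexpr uint8_t kClassMask[12] = {0x7F, 0x3F, 0x3F, 0x3F, 0x00, 0x1F,
                                      0x0F, 0x0F, 0x0F, 0x07, 0x07, 0x07};

  // Instead of Björn's ternary on the state, every byte is folded in the
  // same way; the caller zeroes *buffer between successful decodes.
  void Accumulate(uint8_t byte, uint8_t character_class, uint32_t* buffer) {
    *buffer = (*buffer << 6) | (byte & kClassMask[character_class]);
  }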
R=marja@google.com
Bug:
Change-Id: I574f208a84dc5d06caba17127b0d41f7ce1a3395
Reviewed-on: https://chromium-review.googlesource.com/805357
Commit-Queue: Justin Ridgewell <jridgewell@google.com>
Reviewed-by: Marja Hölttä <marja@chromium.org>
Reviewed-by: Mathias Bynens <mathias@chromium.org>
Cr-Commit-Position: refs/heads/master@{#50012}
Names passed for imports and exports are checked during decoding,
leading to errors if they are not valid UTF-8. Function names are not
checked during decoding, but instead lead to undefined being returned at
runtime if they are not valid UTF-8.
We need to do these checks on the Wasm side, since the factory
methods assume they are given valid UTF-8 strings.
R=titzer@chromium.org, yangguo@chromium.org
Review-Url: https://codereview.chromium.org/1967023004
Cr-Commit-Position: refs/heads/master@{#36208}
size_t is the correct data type for this purpose. Our APIs (in particular
ExternalSourceStream::GetMoreData) are already using it, and there were some
static_casts to convert between them.
This CL doesn't intend to fix all of V8, just the minimal sense-making part
around scanner character streams.
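For context, a hypothetical embedder-side stream (not code from this CL)
showing that GetMoreData already reports chunk lengths as size_t, which is
the unit the scanner character streams now propagate directly:

  #include <cstring>
  #include "v8.h"

  // Hypothetical example stream, for illustration only: the size_t return
  // type of GetMoreData is what the scanner character streams now use,
  // instead of static_cast-ing to and from it.
  class OneChunkStream : public v8::ScriptCompiler::ExternalSourceStream {
   public:
    explicit OneChunkStream(const char* source) : source_(source) {}

    size_t GetMoreData(const uint8_t** src) override {
      if (done_) return 0;  // 0 signals end of stream.
      done_ = true;
      size_t length = std::strlen(source_);
      uint8_t* chunk = new uint8_t[length];  // V8 takes ownership of chunks.
      std::memcpy(chunk, source_, length);
      *src = chunk;
      return length;
    }

   private:
    const char* source_;
    bool done_ = false;
  };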
BUG=
Review URL: https://codereview.chromium.org/864273005
Cr-Commit-Position: refs/heads/master@{#26449}
This patch makes String::WriteUtf8 replace invalid code points (i.e. unmatched
surrogates) with the Unicode replacement character (U+FFFD) when REPLACE_INVALID_UTF8 is
set. This is done to avoid creating invalid UTF-8 output which can lead to
compatibility issues with software requiring valid UTF-8 inputs (e.g. the
WebSocket protocol requires valid UTF-8 and terminates connections when invalid
UTF-8 is encountered).
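A sketch of how an embedder opts in, written against the WriteUtf8 signature
of that era (newer V8 versions take an Isolate* as the first argument); the
wrapper function is illustrative only:

  #include "v8.h"

  // With REPLACE_INVALID_UTF8, unpaired surrogates in the string come out
  // as U+FFFD instead of producing ill-formed UTF-8 in `buffer`.
  int WriteReplacingInvalid(v8::Local<v8::String> str, char* buffer,
                            int capacity) {
    return str->WriteUtf8(buffer, capacity, nullptr,
                          v8::String::REPLACE_INVALID_UTF8);
  }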
R=dcarney@chromium.org
BUG=
Review URL: https://codereview.chromium.org/121173009
Patch from Felix Geisendörfer <haimuiba@gmail.com>.
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@18683 ce2b1a6d-e550-0410-aec6-3dcde31c8c00
Added Nl category to letters predicate (as required for JS identifiers).
Changed/simplified representation of canonicalization ranges.
Truncated tables to code points in the BMP (all that is used by JS).
Reformatted tables to avoid excessively long lines.
Removed duplicate entries from multi-character mapping result tables.
Review URL: http://codereview.chromium.org/3030026
git-svn-id: http://v8.googlecode.com/svn/branches/bleeding_edge@5155 ce2b1a6d-e550-0410-aec6-3dcde31c8c00