fe3eb3ed5c
* ICU-13530 copy C/C++ files UTrie2 -> UTrie3 X-SVN-Rev: 40754
* ICU-13530 UTrie3 new files copied from UTrie2: rename types/functions/macros X-SVN-Rev: 40755
* ICU-13530 debug-print building each UTrie2 X-SVN-Rev: 40756
* ICU-13530 remove two-byte-UTF-8 errorValue block; move highValue from end of data array into header; add errorValue to header X-SVN-Rev: 40762
* ICU-13530 UTrie3 U16_NEXT/PREV: errorValue for unpaired surrogates X-SVN-Rev: 40763
* ICU-13530 no more separate values for lead surrogate code units X-SVN-Rev: 40764
* ICU-13530 change from 11:5 trie bits to 10:6 for simpler UTF-8 code X-SVN-Rev: 40766
* ICU-13530 UTrie2 build UTrie3 as well, print sizes X-SVN-Rev: 40767
* ICU-13530 debug-print countSame, sumOverlaps, countInitial X-SVN-Rev: 40768
* ICU-13530 debug-print whether trie is for CanonIterData X-SVN-Rev: 40769
* ICU-13530 no index-shift for BMP data, no separate index-2 for 2-byte UTF-8; builder changes incomplete X-SVN-Rev: 40777
* ICU-13530 remove errorValue and highStart from UNewTrie3 X-SVN-Rev: 40778
* ICU-13530 rewrite UTrie3 builder code X-SVN-Rev: 40783
* ICU-13530 UTrie3 bug fixes X-SVN-Rev: 40788
* ICU-13530 fully re-inline _UTRIE3_U8_NEXT() X-SVN-Rev: 40790
* ICU-13530 find most common all-same data block for dataNullBlock and initialValue X-SVN-Rev: 40792
* ICU-13530 UTrie3 iterator functions take start and return the end of a range, rather than callback call for each range X-SVN-Rev: 40800
* ICU-13530 mask off unused data value bits before building a UTrie3 with values less than 32 bits wide X-SVN-Rev: 40803
* ICU-13530 split utrie3builder.h out of utrie3.h X-SVN-Rev: 40804
* ICU-13530 separate types UTrie3 vs. UTrie3Builder, implement builder as wrapper over C++ class Trie3Builder in .cpp X-SVN-Rev: 40809
* ICU-13530 function to make a UTrie3Builder from a UTrie3 X-SVN-Rev: 40810
* ICU-13530 debug-print some data; some cleanup X-SVN-Rev: 40865
* ICU-13530 BMP 10:6 but supplementary 10:6:4 X-SVN-Rev: 40984
* ICU-13530 move errorValue & highValue to the end of the data table, minimal padding to 4 bytes X-SVN-Rev: 41011
* ICU-13530 index-1 table gap of index-2 null blocks X-SVN-Rev: 41018
* ICU-13530 test with more than 128k compacted data X-SVN-Rev: 41034
* ICU-13530 supplementary bits 11:5:4 saves a little space X-SVN-Rev: 41039
* ICU-13530 supplementary bits 6:5:5:4 instead of gap: about same size but simpler X-SVN-Rev: 41050
* ICU-13530 remove unnecessary utrie3_clone(built trie) X-SVN-Rev: 41058
* ICU-13530 remove unnecessary UTrie3StringIterator X-SVN-Rev: 41059
* ICU-13530 back to UTRIE3_GET...() macros *returning* data values X-SVN-Rev: 41060
* ICU-13530 fast vs. small X-SVN-Rev: 41066
* ICU-13530 always load NFC data, add simple normalization performance test X-SVN-Rev: 41110
* ICU-13530 change normalization main trie to UTrie3 with special values for lead surrogates; forbid non-inert surrogate code *points* because unable to store values different from code *units*; runtime code work around that for code point lookup and iteration; adjust UTS 46 for normalization no longer mapping unpaired surrogates to U+FFFD X-SVN-Rev: 41122
* ICU-13530 simplenormperf bug fix and NFC base line X-SVN-Rev: 41126
* ICU-13530 move normalization getRange skipping lead surrogates to API getRangeSkipLead() X-SVN-Rev: 41182
* ICU-13530 switch CanonIterData and gennorm2 Norms to UTrie3 X-SVN-Rev: 41183
* ICU-13530 remove unused overwrite parameter from setRange() X-SVN-Rev: 41184
* ICU-13530 getRange skip lead -> fixed surrogates X-SVN-Rev: 41219
* ICU-13530 minor cleanup X-SVN-Rev: 41221
* ICU-13530 UTS 46 code map unpaired surrogates to U+FFFD before normalization X-SVN-Rev: 41224
* ICU-13530 minor internal-docs cleanup X-SVN-Rev: 41225
* ICU-13530 rename UTrie3 to UCPTrie, and other name changes X-SVN-Rev: 41226
* ICU-13530 add 8-bit data option; add type-any & valueBits-any for fromBinary(); macros consistently source type then data width X-SVN-Rev: 41234
* ICU-13530 scrub the API docs for the proposal X-SVN-Rev: 41319
* ICU-13530 tag internal definitions as such, or move them to an internal header X-SVN-Rev: 41320
* ICU-13530 Java API skeleton X-SVN-Rev: 41326
* ICU-13530 API feedback: ValueWidth, MutableCodePointTrie, base CodePointMap, ... X-SVN-Rev: 41382
* ICU-13530 add UCPTrie valueWidth field and padding, and combine data pointers into a union X-SVN-Rev: 41408
* ICU-13530 switch some macros to using dataAccess parameter: separate index vs. data lookups, no macro variant for each value width X-SVN-Rev: 41409
* ICU-13530 StringIterator is no longer a java.util.Iterator (bad fit) X-SVN-Rev: 41455
* ICU-13530 CodePointTrie.java code complete X-SVN-Rev: 41518
* ICU-13530 finish Java port incl test; keep C++ parallel
* ICU-13530 adjust API for feedback: rename HandleValue to FilterValue, change getRange+getRangeFixedSurr(bool allSurr) to enum RangeOption+getRange(enum option); change remaining C macros to use dataAccess for 16/32/8-bit value widths; fix/clarify some API docs
* ICU-13530 add javadoc
* ICU-13530 document UCPTrie binary data format
* ICU-13530 update .nrm formatVersion 3->4, document change in surrogate handling with new trie
* ICU-13530 re-hardcode NFC data
* move trie swapper code into new file; add new files to Windows project files; turn off trie debugging
* ICU-13530 minor cleanup
* ICU-13530 test more range starts; fix a C test leak
* ICU-13530 regenerate Java data from scratch
* ICU-13530 review feedback changes: API docs typos, more @internal, C++11 field initializers, fix potential leak in MutableCodePointTrie::fromUCPTrie()
* ICU-13530 rename interface FilterValue to ValueFilter
84 lines
1.8 KiB
Plaintext
# Copyright (C) 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html
# Copyright (C) 2010, International Business Machines
# Corporation and others. All Rights Reserved.
#
# file name: testnorm.txt
# encoding: US-ASCII
# tab size: 8 (not used)
# indentation:4
#
# created on: 2010feb15
# created by: Markus W. Scherer
#
# Normalization test data, for improving code coverage.

# Selection of Canonical_Combining_Class (ccc) values
0300..0314:230
0315:232
0316..0319:220
031A:232
031B:216
031C..0320:220
0321..0322:202
0323..0326:220
0327..0328:202
0329..0333:220
0334..0338:1
0339..033C:220
033D..0344:230
0345:240
0346:230
0347..0349:220
034A..034C:230
034D..034E:220
0350..0352:230
0353..0356:220
0357:230
0358:232
0359..035A:220
035B:230
035C:233
035D..035E:234
035F:233
0360..0361:234
0362:233
0363..036F:230
# ICU 63 normalization with UCPTrie requires inert surrogate code points.
# D802:2 # surrogates with non-zero combining classes
# D803:3
# D804:4
110B9:9
110BA:7

# Some interesting mappings
00C0=0041 0300
00C1=0041 0301
00C2=0041 0302
00C3=0041 0303
00C4=0041 0308
00C5=0041 030A
00C7=0043 0327
# ICU 63 normalization with UCPTrie requires inert surrogate code points.
# D800>D7FF # surrogates with mappings, and mappings to empty strings
# D801>
# DFFE>
# DFFF>FFFF
E000>
E001=61 338 # composition with trail<=33FF and composite>7FFF
E002=E001 308 # recursive mapping needs reordering
E003>62 307 327 337 # mapping needs reordering
E011=E010 F0011 # composition of BMP+supplementary, and F0011 is maybe & combines-fwd
E111>1101 # mapping ends in Jamo L
E112>1102 62 # mapping starts with Jamo L
FFF3>FFF4
FFF4>FFF5
FFF5>FFF7
FFF7>10037
10036>FFF6
10077>10037
1109A=11099 110BA
1109C=1109B 110BA
110AB=110A5 110BA
F0010=F0011 E012 # composition of supplementary+BMP