2009-11-13 19:25:21 +00:00
|
|
|
* Copyright (C) 2004-2009, International Business Machines
|
2006-03-03 20:59:01 +00:00
|
|
|
* Corporation and others. All Rights Reserved.
|
|
|
|
*
|
|
|
|
* file name: changes.txt
|
|
|
|
* encoding: US-ASCII
|
|
|
|
* tab size: 8 (not used)
|
|
|
|
* indentation:4
|
|
|
|
*
|
|
|
|
* created on: 2004may06
|
|
|
|
* created by: Markus W. Scherer
|
|
|
|
*
|
|
|
|
* change log for Unicode updates
|
|
|
|
|
|
|
|
---------------------------------------------------------------------------- ***
|
|
|
|
|
2009-11-13 19:25:21 +00:00
|
|
|
Unicode 5.2 update
|
|
|
|
|
|
|
|
*** related ICU Trac tickets
|
|
|
|
|
|
|
|
7084 Unicode 5.2
|
|
|
|
|
|
|
|
7167 verify collation bytes
|
|
|
|
7235 Java test NAME_ALIAS
|
|
|
|
7236 Java DerivedCoreProperties.txt test
|
|
|
|
7237 Java BidiTest.txt
|
|
|
|
7238 UTrie2 in core unidata
|
|
|
|
7239 test for tailoring gaps
|
|
|
|
7240 Java fix CollationMiscTest
|
|
|
|
7243 update layout engine for Unicode 5.2
|
|
|
|
|
|
|
|
*** Unicode version numbers
|
|
|
|
- makedata.mak
|
|
|
|
- uchar.h
|
|
|
|
- configure.in & configure
|
|
|
|
- update ucdVersion in gennames.c if an algorithmic range changes
|
|
|
|
|
|
|
|
*** data files & enums & parser code
|
|
|
|
|
|
|
|
* file preparation
|
|
|
|
|
|
|
|
python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
|
|
|
|
- includes finding files regardless of version numbers,
|
|
|
|
copying them, and performing the equivalent processing of the
|
|
|
|
ucdstrip and ucdmerge tools on the desired set of files
|
|
|
|
|
|
|
|
* notes on changes
|
|
|
|
- PropertyAliases.txt
|
|
|
|
moved from numeric to enumerated:
|
|
|
|
ccc ; Canonical_Combining_Class
|
|
|
|
new string properties:
|
|
|
|
NFKC_CF ; NFKC_Casefold
|
|
|
|
Name_Alias; Name_Alias
|
|
|
|
new binary properties:
|
|
|
|
Cased ; Cased
|
|
|
|
CI ; Case_Ignorable
|
|
|
|
CWCF ; Changes_When_Casefolded
|
|
|
|
CWCM ; Changes_When_Casemapped
|
|
|
|
CWKCF ; Changes_When_NFKC_Casefolded
|
|
|
|
CWL ; Changes_When_Lowercased
|
|
|
|
CWT ; Changes_When_Titlecased
|
|
|
|
CWU ; Changes_When_Uppercased
|
|
|
|
new CJK Unihan properties (not supported by ICU)
|
|
|
|
- PropertyValueAliases.txt
|
|
|
|
new block names
|
|
|
|
new scripts
|
|
|
|
one script code change:
|
|
|
|
sc ; Qaai ; Inherited
|
|
|
|
->
|
|
|
|
sc ; Zinh ; Inherited ; Qaai
|
|
|
|
new Line_Break (lb) value:
|
|
|
|
lb ; CP ; Close_Parenthesis
|
|
|
|
new Joining_Group (jg) values: Farsi_Yeh, Nya
|
|
|
|
other new values:
|
|
|
|
ccc; 214; ATA ; Attached_Above
|
|
|
|
- DerivedBidiClass.txt
|
|
|
|
new default-R range: U+1E800 - U+1EFFF
|
|
|
|
- UnicodeData.txt
|
|
|
|
all of the ISO comments are gone
|
|
|
|
new CJK block end:
|
|
|
|
9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
|
|
|
|
new CJK block:
|
|
|
|
2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
|
|
|
|
2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- run preparse.pl
|
|
|
|
+ cd \svn\icuproj\icu\trunk\source\tools\genpname
|
|
|
|
+ make sure that data.h is writable
|
|
|
|
+ perl preparse.pl \svn\icuproj\icu\trunk > out.txt
|
|
|
|
+ preparse.pl complains with errors like the following:
|
|
|
|
Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
|
|
|
|
This is because ICU 4.0 had scripts from ISO 15924 which are now
|
|
|
|
added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
|
|
|
|
and PropertyValueAliases.txt.
|
|
|
|
-> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
|
|
|
|
Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
|
|
|
|
+ preparse.pl complains with errors about block names missing from uchar.h; add them
|
|
|
|
|
|
|
|
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
|
|
|
|
- new block & script values
|
|
|
|
+ 26 new blocks
|
|
|
|
copy new blocks from Blocks.txt
|
|
|
|
MS VC++ 2008 regular expression:
|
|
|
|
find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
|
|
|
|
replace with " UBLOCK_\3 = 172, /*[\1]*/"
|
|
|
|
+ several new script values already added in ICU 4.0 for ISO 15924 coverage
|
|
|
|
(removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
|
|
|
|
+ 3 new script values added for ISO 15924 and Unicode 5.2 coverage
|
|
|
|
+ 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
|
|
|
|
(added to SyntheticPropertyValueAliases.txt)
|
|
|
|
- new Joining Group (JG) values: Farsi_Yeh, Nya
|
|
|
|
- new Line_Break (lb) value:
|
|
|
|
lb ; CP ; Close_Parenthesis
|
|
|
|
|
|
|
|
* hardcoded Unihan range end/limit
|
|
|
|
- Unihan range end moves from 9FC3 to 9FCB
|
|
|
|
search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
|
|
|
|
+ do change gennames.c
|
|
|
|
|
|
|
|
* Compare definitions of new binary properties with what we used to use
|
|
|
|
in algorithms, to see if the definitions changed.
|
|
|
|
- Verified that definitions for Cased and Case_Ignorable are unchanged.
|
|
|
|
The gencase tool now parses the newly public Case_Ignorable values
|
|
|
|
in case the definition changes in the future.
|
|
|
|
|
|
|
|
* uchar.c & uprops.h & uprops.c & genprops
|
|
|
|
- new numeric values that didn't exist in Unicode data before:
|
|
|
|
1/7, 1/9, 1/10, 3/10, 1/16, 3/16
|
|
|
|
the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
|
|
|
|
therefore redesign the encoding of numeric types and values for formatVersion 6;
|
|
|
|
design for simple numbers up to at least 144 ("one gross"),
|
|
|
|
large values up to at least 10^20,
|
|
|
|
and fractions with numerators -1..17 and denominators 1..16
|
|
|
|
to cover current and expected future values
|
|
|
|
(e.g., more Han numeric values, Meroitic twelfths)
|
|
|
|
|
|
|
|
* reimplement Hangul_Syllable_Type for new Jamo characters
|
|
|
|
- the old code assumed that all Jamo characters are in the 11xx block
|
|
|
|
- Unicode 5.2 fills holes there and adds new Jamo characters in
|
|
|
|
A960..A97F; Hangul Jamo Extended-A
|
|
|
|
and in
|
|
|
|
D7B0..D7FF; Hangul Jamo Extended-B
|
|
|
|
- Hangul_Syllable_Type can be trivially derived from a subset of
|
|
|
|
Grapheme_Cluster_Break values
|
|
|
|
|
|
|
|
* build Unicode data source code for hardcoding core data
|
|
|
|
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
|
|
|
|
|
|
|
|
ICU data make path is \svn\icuproj\icu\trunk\source\data\
|
|
|
|
ICU root path is \svn\icuproj\icu\trunk
|
|
|
|
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
|
|
|
|
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
|
|
|
|
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
|
|
|
|
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
|
|
|
|
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
|
|
|
|
Creating data file for Unicode Property Names
|
|
|
|
Creating data file for Unicode Character Properties
|
|
|
|
Creating data file for Unicode Case Mapping Properties
|
|
|
|
Creating data file for Unicode BiDi/Shaping Properties
|
|
|
|
Creating data file for Unicode Normalization
|
|
|
|
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
|
|
|
|
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
|
|
|
|
|
|
|
|
- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
|
|
|
|
and rebuild the common library
|
|
|
|
|
|
|
|
*** UCA
|
|
|
|
|
|
|
|
- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
|
|
|
|
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
|
|
|
|
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
|
|
|
|
[ Begin obsolete instructions:
|
|
|
|
Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
|
|
|
|
- generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
|
|
|
|
on Windows:
|
|
|
|
python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
|
|
|
|
python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
|
|
|
|
End obsolete instructions]
|
|
|
|
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
|
|
|
|
not just the *_STUB.txt files
|
|
|
|
- note on intltest: if collate/UCAConformanceTest fails, then
|
|
|
|
utility/MultithreadTest/TestCollators will fail as well;
|
|
|
|
fix the conformance test before looking into the multi-thread test
|
|
|
|
|
|
|
|
*** Implement Cased & Case_Ignorable properties
|
|
|
|
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
|
|
|
|
- Problem: These properties should be disjoint, but aren't
|
|
|
|
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
|
|
|
|
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
|
|
|
|
|
|
|
|
*** Implement Changes_When_Xyz properties
|
|
|
|
- without stored data
|
|
|
|
|
|
|
|
*** Implement Name_Alias property
|
|
|
|
- add it as another name field in unames.icu
|
|
|
|
- make it available via u_charName() and UCharNameChoice and
|
|
|
|
- consider it in u_charFromName()
|
|
|
|
|
|
|
|
*** Break iterators
|
|
|
|
|
|
|
|
* Update break iterator rules to new UAX versions and new property values
|
|
|
|
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
|
|
|
|
|
|
|
|
*** new BidiTest file
|
|
|
|
- review format and data
|
|
|
|
- copy BidiTest.txt to source/test/testdata
|
|
|
|
- write test code using this data
|
|
|
|
- fix ICU code where it fails the conformance test
|
|
|
|
|
|
|
|
*** Java
|
|
|
|
- generally, find and update code corresponding to C/C++
|
|
|
|
- UCharacter.UnicodeBlock constants:
|
|
|
|
a) add an _ID integer per new block, update COUNT
|
|
|
|
b) add a class instance per new block
|
|
|
|
Visual Studio regex:
|
|
|
|
find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
|
|
|
|
replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
|
|
|
|
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
|
|
|
|
|
|
|
|
- port test changes to Java
|
|
|
|
|
|
|
|
>> DONE with the above, TODO the following <<
|
|
|
|
|
|
|
|
*** LayoutEngine script information
|
|
|
|
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
|
|
|
|
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
|
|
|
|
ScriptRunData.cpp, which is no longer needed.)
|
|
|
|
|
|
|
|
The generated files have a current copyright date and "@draft" statement.
|
|
|
|
|
|
|
|
-> Eric Mader wrote in email on 20090930:
|
|
|
|
"I think the tool has been modified to update @draft to @stable for
|
|
|
|
older scripts and to add @draft for new scripts.
|
|
|
|
(I worked with an intern on this last year.)
|
|
|
|
You should check the output after you run it."
|
|
|
|
|
|
|
|
* copy the above files into <icu>/source/layout, replacing the old files.
|
|
|
|
|
|
|
|
Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
|
|
|
|
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
|
|
|
|
|
|
|
|
-> Eric Mader wrote in email on 20090930:
|
|
|
|
"This is just a matter of making sure that all the per-script tables have
|
|
|
|
entries for any new scripts that were added.
|
|
|
|
If any new Indic characters were added, then the class tables in
|
|
|
|
IndicClassTables.cpp should be updated to reflect this.
|
|
|
|
John Emmons should know how to do this if it's required."
|
|
|
|
|
|
|
|
* rebuild the layout and layoutex libraries.
|
|
|
|
|
|
|
|
*** Documentation
|
|
|
|
- Update User Guide
|
|
|
|
+ Jamo_Short_Name, sfc->scf, binary property value aliases
|
|
|
|
|
|
|
|
---------------------------------------------------------------------------- ***
|
|
|
|
|
2008-04-04 22:47:43 +00:00
|
|
|
Unicode 5.1 update
|
|
|
|
|
|
|
|
*** related ICU Trac tickets
|
|
|
|
|
|
|
|
5696 Update to Unicode 5.1
|
|
|
|
|
|
|
|
*** Unicode version numbers
|
|
|
|
- makedata.mak
|
|
|
|
- uchar.h
|
|
|
|
- configure.in & configure
|
|
|
|
- update ucdVersion in gennames.c if an algorithmic range changes
|
|
|
|
|
|
|
|
*** data files & enums & parser code
|
|
|
|
|
|
|
|
* file preparation
|
|
|
|
- ucdstrip:
|
|
|
|
DerivedCoreProperties.txt
|
|
|
|
DerivedNormalizationProps.txt
|
|
|
|
NormalizationTest.txt
|
|
|
|
PropList.txt
|
|
|
|
Scripts.txt
|
|
|
|
GraphemeBreakProperty.txt
|
|
|
|
SentenceBreakProperty.txt
|
|
|
|
WordBreakProperty.txt
|
|
|
|
- ucdstrip and ucdmerge:
|
|
|
|
EastAsianWidth.txt
|
|
|
|
LineBreak.txt
|
|
|
|
|
|
|
|
* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
|
|
|
|
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\Blocks.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
|
|
|
|
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
|
|
|
|
|
|
|
|
ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
|
|
|
|
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- run preparse.pl
|
|
|
|
+ cd \svn\icuproj\icu\uni51\source\tools\genpname
|
|
|
|
+ make sure that data.h is writable
|
|
|
|
+ perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
|
|
|
|
+ preparse.pl complains with errors like the following:
|
|
|
|
Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
|
|
|
|
This is because ICU 3.8 had scripts from ISO 15924 which are now
|
|
|
|
added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
|
|
|
|
and PropertyValueAliases.txt.
|
|
|
|
-> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
|
|
|
|
Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
|
|
|
|
+ PropertyValueAliases.txt now explicitly contains values for boolean properties:
|
|
|
|
N/Y, No/Yes, F/T, False/True
|
|
|
|
-> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
|
|
|
|
It will use further values from the file if present.
|
|
|
|
|
|
|
|
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
|
|
|
|
- new block & script values
|
|
|
|
+ 17 new blocks
|
|
|
|
+ 11 new script values already added in ICU 3.8 for ISO 15924 coverage
|
|
|
|
(removed from SyntheticPropertyValueAliases.txt)
|
|
|
|
+ 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
|
|
|
|
(added to SyntheticPropertyValueAliases.txt)
|
|
|
|
- uprops.icu (uprops.h) only provides 7 bits for script codes.
|
|
|
|
In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
|
|
|
|
There is none above 127 yet which is the script code for an
|
|
|
|
assigned Unicode character, so ICU 4.0 uprops.icu does not store any
|
|
|
|
script code values greater than 127.
|
|
|
|
However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
|
|
|
|
in a parallel bit field, and that overflows now.
|
|
|
|
Also, future values >=128 would be incompatible anyway.
|
|
|
|
uprops.h is modified to move around several of the bit fields
|
|
|
|
in the properties vector words, and now uses 8 bits for the script code.
|
|
|
|
Two other bit fields also grow to accommodate future growth:
|
|
|
|
Block (current count: 172) grows from 8 to 9 bits,
|
|
|
|
and Word_Break grows from 4 to 5 bits.
|
|
|
|
- renamed property Simple_Case_Folding (sfc->scf)
|
|
|
|
+ nothing to be done: handled as normal alias
|
|
|
|
- new property JSN Jamo_Short_Name
|
|
|
|
+ no new API: only contributes to the Name property
|
|
|
|
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
|
|
|
|
- new Joining Group (JG) value: Burushashki_Yeh_Barree
|
|
|
|
- new Sentence_Break (SB) values:
|
|
|
|
SB ; CR ; CR
|
|
|
|
SB ; EX ; Extend
|
|
|
|
SB ; LF ; LF
|
|
|
|
SB ; SC ; SContinue
|
|
|
|
- new Word_Break (WB) values:
|
|
|
|
WB ; CR ; CR
|
|
|
|
WB ; Extend ; Extend
|
|
|
|
WB ; LF ; LF
|
|
|
|
WB ; MB ; MidNumLet
|
|
|
|
|
|
|
|
* Further changes in the 2008-02-29 update:
|
|
|
|
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
|
|
|
|
because they should not normally be invisible.
|
|
|
|
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
|
|
|
|
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
|
|
|
|
- new Word_Break (WB) value: NL=Newline
|
|
|
|
|
|
|
|
* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
|
|
|
|
- Unihan range end moves from 9FBB to 9FC3
|
|
|
|
search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
|
|
|
|
+ do change gennames.c
|
|
|
|
|
|
|
|
* build Unicode data source code for hardcoding core data
|
|
|
|
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
|
|
|
|
|
|
|
|
ICU data make path is \svn\icuproj\icu\uni51\source\data\
|
|
|
|
ICU root path is \svn\icuproj\icu\uni51
|
|
|
|
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
|
|
|
|
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
|
|
|
|
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
|
|
|
|
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
|
|
|
|
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
|
|
|
|
Creating data file for Unicode Character Properties
|
|
|
|
Creating data file for Unicode Case Mapping Properties
|
|
|
|
Creating data file for Unicode BiDi/Shaping Properties
|
|
|
|
Creating data file for Unicode Normalization
|
|
|
|
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
|
|
|
|
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
|
|
|
|
|
|
|
|
- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
|
|
|
|
and rebuild the common library
|
|
|
|
|
|
|
|
*** Break iterators
|
|
|
|
|
|
|
|
* Update break iterator rules to new UAX versions and new property values
|
|
|
|
|
|
|
|
*** UCA
|
|
|
|
|
|
|
|
* update FractionalUCA.txt and UCARules.txt with new canonical closure
|
|
|
|
|
|
|
|
*** Test suites
|
|
|
|
- Test that APIs using Unicode property value aliases (like UnicodeSet)
|
|
|
|
support all of the boolean values N/Y, No/Yes, F/T, False/True
|
|
|
|
-> TestBinaryValues() tests in both cintltst and intltest
|
|
|
|
|
|
|
|
*** LayoutEngine script information
|
|
|
|
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
|
|
|
|
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
|
|
|
|
ScriptRunData.cpp, which is no longer needed.)
|
|
|
|
|
|
|
|
The generated files have a current copyright date and "@draft" statement.
|
|
|
|
|
|
|
|
* copy the above files into <icu>/source/layout, replacing the old files.
|
|
|
|
|
|
|
|
Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
|
|
|
|
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
|
|
|
|
|
|
|
|
* rebuild the layout and layoutex libraries.
|
|
|
|
|
|
|
|
*** Documentation
|
|
|
|
- Update User Guide
|
|
|
|
+ Jamo_Short_Name, sfc->scf, binary property value aliases
|
|
|
|
|
|
|
|
---------------------------------------------------------------------------- ***
|
|
|
|
|
2006-03-03 20:59:01 +00:00
|
|
|
Unicode 5.0 update
|
|
|
|
|
|
|
|
*** related Jitterbugs
|
|
|
|
|
|
|
|
5084 RFE: Update to Unicode 5.0
|
|
|
|
|
|
|
|
*** data files & enums & parser code
|
|
|
|
|
|
|
|
* file preparation
|
|
|
|
- ucdstrip:
|
|
|
|
DerivedCoreProperties.txt
|
|
|
|
DerivedNormalizationProps.txt
|
|
|
|
NormalizationTest.txt
|
2006-06-13 23:14:38 +00:00
|
|
|
PropList.txt
|
|
|
|
Scripts.txt
|
2006-03-03 20:59:01 +00:00
|
|
|
GraphemeBreakProperty.txt
|
|
|
|
SentenceBreakProperty.txt
|
|
|
|
WordBreakProperty.txt
|
|
|
|
- ucdstrip and ucdmerge:
|
|
|
|
EastAsianWidth.txt
|
|
|
|
LineBreak.txt
|
|
|
|
|
2008-04-04 22:47:43 +00:00
|
|
|
* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
|
2006-08-07 17:33:57 +00:00
|
|
|
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\Blocks.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
|
|
|
|
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
|
|
|
|
|
|
|
|
ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
|
|
|
|
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
|
|
|
|
|
2006-03-03 20:59:01 +00:00
|
|
|
* update FractionalUCA.txt and UCARules.txt with new canonical closure
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- run preparse.pl
|
2006-06-13 23:14:38 +00:00
|
|
|
+ make sure that data.h is writable
|
2006-03-03 20:59:01 +00:00
|
|
|
+ perl preparse.pl \cvs\oss\icu > out.txt
|
|
|
|
|
|
|
|
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
|
|
|
|
- new block & script values
|
|
|
|
+ script values already added in ICU 3.6 because all of ISO 15924 is now covered
|
|
|
|
|
|
|
|
* build Unicode data source code for hardcoding core data
|
|
|
|
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
|
|
|
|
|
|
|
|
ICU data make path is \cvs\oss\icu\source\data\
|
|
|
|
ICU root path is \cvs\oss\icu
|
|
|
|
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
|
|
|
|
[etc.]
|
|
|
|
Creating data file for Unicode Character Properties
|
|
|
|
Creating data file for Unicode Case Mapping Properties
|
|
|
|
Creating data file for Unicode BiDi/Shaping Properties
|
|
|
|
Creating data file for Unicode Normalization
|
|
|
|
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
|
|
|
|
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
|
|
|
|
|
|
|
|
- copy the .c source files to C:\cvs\oss\icu\source\common
|
|
|
|
and rebuild the common library
|
|
|
|
|
|
|
|
*** Unicode version numbers
|
|
|
|
- makedata.mak
|
|
|
|
- uchar.h
|
|
|
|
- configure.in
|
|
|
|
|
2006-08-23 01:30:16 +00:00
|
|
|
*** LayoutEngine script information
|
|
|
|
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
|
|
|
|
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
|
|
|
|
ScriptRunData.cpp, which is no longer needed.)
|
|
|
|
|
|
|
|
The generated files have a current copyright date and "@draft" statement.
|
|
|
|
|
|
|
|
* copy the above files into <icu>/source/layout, replacing the old files.
|
|
|
|
|
|
|
|
Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
|
|
|
|
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
|
|
|
|
|
|
|
|
* rebuild the layout and layoutex libraries.
|
|
|
|
|
2006-03-03 20:59:01 +00:00
|
|
|
---------------------------------------------------------------------------- ***
|
|
|
|
|
2005-01-10 18:02:54 +00:00
|
|
|
Unicode 4.1 update
|
|
|
|
|
|
|
|
*** related Jitterbugs
|
|
|
|
|
|
|
|
4332 RFE: Update to Unicode 4.1
|
|
|
|
4157 RBBI, TR29 4.1 updates
|
|
|
|
|
|
|
|
*** data files & enums & parser code
|
|
|
|
|
|
|
|
* file preparation
|
|
|
|
- ucdstrip:
|
|
|
|
DerivedCoreProperties.txt
|
|
|
|
DerivedNormalizationProps.txt
|
|
|
|
NormalizationTest.txt
|
|
|
|
GraphemeBreakProperty.txt
|
|
|
|
SentenceBreakProperty.txt
|
|
|
|
WordBreakProperty.txt
|
|
|
|
- ucdstrip and ucdmerge:
|
|
|
|
EastAsianWidth.txt
|
|
|
|
LineBreak.txt
|
|
|
|
|
|
|
|
* add new files to the repository
|
|
|
|
GraphemeBreakProperty.txt
|
|
|
|
SentenceBreakProperty.txt
|
|
|
|
WordBreakProperty.txt
|
|
|
|
|
|
|
|
* update FractionalUCA.txt and UCARules.txt with new canonical closure
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- handle new enumerated properties in sub read_uchar
|
|
|
|
- run preparse.pl
|
|
|
|
|
|
|
|
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
|
|
|
|
- new binary properties
|
|
|
|
+ Pattern_Syntax
|
|
|
|
+ Pattern_White_Space
|
|
|
|
- new enumerated properties
|
|
|
|
+ Grapheme_Cluster_Break
|
|
|
|
+ Sentence_Break
|
|
|
|
+ Word_Break
|
|
|
|
- new block & script & line break values
|
|
|
|
|
|
|
|
* gencase
|
|
|
|
- case-ignorable changes
|
|
|
|
see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
|
|
|
|
now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
|
|
|
|
|
|
|
|
*** Unicode version numbers
|
|
|
|
- makedata.mak
|
|
|
|
- uchar.h
|
|
|
|
- configure.in
|
|
|
|
|
|
|
|
*** tests
|
|
|
|
- verify that u_charMirror() round-trips
|
|
|
|
- test all new properties and some new values of old properties
|
|
|
|
|
|
|
|
*** other code
|
|
|
|
|
|
|
|
* hardcoded Unihan range end/limit
|
|
|
|
- Unihan range end moves from 9FA5 to 9FBB
|
|
|
|
search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
|
|
|
|
+ do not modify BOCU/BOCSU code because that would change the encoding
|
|
|
|
and break binary compatibility!
|
|
|
|
+ similarly, do not change the GB 18030 range data (ucnvmbcs.c),
|
|
|
|
NamePrepProfile.txt
|
|
|
|
+ ignore trietest.c: test data is arbitrary
|
|
|
|
+ ignore tstnorm.cpp: test optimization, not important
|
|
|
|
+ ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
|
|
|
|
+ do change line_th.txt and word_th.txt
|
|
|
|
by replacing hardcoded ranges with the new property values
|
|
|
|
+ do change gennames.c
|
|
|
|
|
|
|
|
source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
|
|
|
|
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
|
|
|
|
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
|
|
|
|
|
|
|
|
* case mappings
|
|
|
|
- compare new special casing context conditions with previous ones
|
|
|
|
see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- consider storing only the short name if it is the same as the long name
|
|
|
|
|
|
|
|
*** other reviews
|
|
|
|
- UAX #29 changes (grapheme/word/sentence breaks)
|
|
|
|
- UAX #14 changes (line breaks)
|
|
|
|
- Pattern_Syntax & Pattern_White_Space
|
|
|
|
|
|
|
|
---------------------------------------------------------------------------- ***
|
|
|
|
|
2004-05-06 02:47:53 +00:00
|
|
|
Unicode 4.0.1 update
|
|
|
|
|
|
|
|
*** related Jitterbugs
|
|
|
|
|
|
|
|
3170 RFE: Update to Unicode 4.0.1
|
|
|
|
3171 Add new Unicode 4.0.1 properties
|
|
|
|
3520 use Unicode 4.0.1 updates for break iteration
|
|
|
|
|
|
|
|
*** data files & enums & parser code
|
|
|
|
|
|
|
|
* file preparation
|
|
|
|
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
|
|
|
|
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
|
|
|
|
|
|
|
|
* file fixes
|
|
|
|
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
|
|
|
|
according to PRI #26
|
|
|
|
http://www.unicode.org/review/resolved-pri.html#pri26
|
|
|
|
- undone again because no corrigendum in sight;
|
|
|
|
instead modified tests to not check consistency on this for Unicode 4.0.1
|
|
|
|
|
|
|
|
* ucdterms.txt
|
|
|
|
- update from http://www.unicode.org/copyright.html
|
|
|
|
formatted for plain text
|
|
|
|
|
|
|
|
* uchar.h & uprops.h & uprops.c & genprops
|
|
|
|
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
|
|
|
|
- add U_LB_INSEPARABLE due to a spelling fix
|
|
|
|
+ put short name comment only on line with new constant
|
|
|
|
for genpname perl script parser
|
|
|
|
- new binary properties
|
|
|
|
+ STerm
|
|
|
|
+ Variation_Selector
|
|
|
|
|
|
|
|
* genpname
|
|
|
|
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
|
|
|
|
- perl script: correctly calculate the maximum number of fields per row
|
|
|
|
|
|
|
|
* uscript.h
|
|
|
|
- new script code Hrkt=Katakana_Or_Hiragana
|
|
|
|
|
|
|
|
* gennorm.c track changes in DerivedNormalizationProps.txt
|
|
|
|
- "FNC" -> "FC_NFKC"
|
|
|
|
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
|
|
|
|
|
|
|
|
* genprops/props2.c track changes in DerivedNumericValues.txt
|
|
|
|
- changed from 3 columns to 2, dropping the numeric type
|
|
|
|
+ assume that the type is always numeric for Han characters,
|
|
|
|
and that only those are added in addition to what UnicodeData.txt lists
|
|
|
|
|
|
|
|
*** Unicode version numbers
|
|
|
|
- makedata.mak
|
|
|
|
- uchar.h
|
|
|
|
- configure.in
|
|
|
|
|
|
|
|
*** tests
|
|
|
|
- update test of default bidi classes according to PRI #28
|
|
|
|
/tsutil/cucdtst/TestUnicodeData
|
|
|
|
http://www.unicode.org/review/resolved-pri.html#pri28
|
|
|
|
- bidi tests: change exemplar character for ES depending on Unicode version
|
|
|
|
- change hardcoded expected property values where they change
|
|
|
|
|
|
|
|
*** other code
|
|
|
|
|
|
|
|
* name matching
|
|
|
|
- read UCD.html
|
|
|
|
|
|
|
|
* scripts
|
|
|
|
- use new Hrkt=Katakana_Or_Hiragana
|
|
|
|
|
|
|
|
* ZWJ & ZWNJ
|
|
|
|
- are now part of combining character sequences
|
|
|
|
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
|