* Copyright (C) 2004-2006, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation: 4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
    - ucdstrip:
        DerivedCoreProperties.txt
        DerivedNormalizationProps.txt
        NormalizationTest.txt
        PropList.txt
        Scripts.txt
        GraphemeBreakProperty.txt
        SentenceBreakProperty.txt
        WordBreakProperty.txt
    - ucdstrip and ucdmerge:
        EastAsianWidth.txt
        LineBreak.txt

* my ucd2unidata.txt (needs to be updated each time with UCD and file version numbers)
    copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
    copy 5.0.0\ucd\Blocks.txt ..\unidata\
    copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
    copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
    copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
    copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
    copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
    copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
    copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
    copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

    ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
    ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
    ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
    ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
    ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
    ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
    ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
    ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
    ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
    ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

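For readers without the ucdstrip tool at hand, the stripping step used in the commands above can be approximated as follows. This is only a sketch, under the assumption that ucdstrip removes the "#" comment that follows a data line and leaves comment-only and blank lines alone. The function name strip_ucd_comments and the sample lines are illustrative and not part of ICU.

```python
import io


def strip_ucd_comments(lines):
    """Yield UCD lines with trailing '#' comments removed from data lines.

    Assumption (not confirmed by this change log): comment-only lines and
    blank lines pass through unchanged; only data lines lose their comment.
    """
    for line in lines:
        text = line.rstrip("\r\n")
        hash_pos = text.find("#")
        if hash_pos > 0 and text[:hash_pos].strip():
            # Data line followed by a comment: keep only the data part.
            text = text[:hash_pos].rstrip()
        yield text


# Illustrative input in the usual semicolon-delimited UCD layout.
sample = io.StringIO(
    "# PropList sample\n"
    "0009..000D ; White_Space # Cc [5] <control-0009>..<control-000D>\n"
    "0020 ; White_Space # Zs SPACE\n"
)
stripped = list(strip_ucd_comments(sample))
print("\n".join(stripped))
```

Reading from standard input and writing to standard output, as the redirections above do, would only need sys.stdin and print in place of the sample.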
* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
    - run preparse.pl
        + make sure that data.h is writable
        + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
    - new block & script values
        + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
    C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

    ICU data make path is \cvs\oss\icu\source\data\
    ICU root path is \cvs\oss\icu
    Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
    [etc.]
    Creating data file for Unicode Character Properties
    Creating data file for Unicode Case Mapping Properties
    Creating data file for Unicode BiDi/Shaping Properties
    Creating data file for Unicode Normalization
    Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
    Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

    - copy the .c source files to C:\cvs\oss\icu\source\common
        and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
    ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
    ScriptRunData.cpp, which is no longer needed.)

    The generated files have a current copyright date and "@draft" statement.

* Copy the above files into <icu>/source/layout, replacing the old files.

    Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
    and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* Rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
    - ucdstrip:
        DerivedCoreProperties.txt
        DerivedNormalizationProps.txt
        NormalizationTest.txt
        GraphemeBreakProperty.txt
        SentenceBreakProperty.txt
        WordBreakProperty.txt
    - ucdstrip and ucdmerge:
        EastAsianWidth.txt
        LineBreak.txt

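The merge step applied to EastAsianWidth.txt and LineBreak.txt can be pictured with the sketch below. It assumes that ucdmerge joins consecutive data lines into one range line when their code points are adjacent and their property fields are identical; that reading is an assumption and is not spelled out in this change log. The helper names are hypothetical, and the input is expected to be already stripped of trailing comments.

```python
def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into a (start, end) pair of code points."""
    start, _, end = field.strip().partition("..")
    return int(start, 16), int(end or start, 16)


def merge_adjacent(lines):
    """Merge consecutive data lines whose ranges touch and whose values match.

    Simplification of this sketch: comment and blank lines are dropped
    rather than copied through.
    """
    merged = []  # entries are [start, end, value]
    for line in lines:
        if line.lstrip().startswith("#"):
            continue
        fields = line.split(";", 1)
        if len(fields) != 2 or not fields[0].strip():
            continue
        start, end = parse_range(fields[0])
        value = fields[1].strip()
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    return [
        f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
        for s, e, v in merged
    ]


sample = ["0020;Na", "0021;Na", "0022..0023;Na", "0024;Na", "00A1;A", "00A2;Na"]
merged_lines = merge_adjacent(sample)
print(merged_lines)
```

With the sample above, the first four lines collapse into the single range 0020..0024, while 00A1 and 00A2 stay separate because their values differ.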
* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
    - handle new enumerated properties in sub read_uchar
    - run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
    - new binary properties
        + Pattern_Syntax
        + Pattern_White_Space
    - new enumerated properties
        + Grapheme_Cluster_Break
        + Sentence_Break
        + Word_Break
    - new block & script & line break values

* gencase
    - case-ignorable changes
        see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
        now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

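The case-ignorable condition quoted above (Word_Break=MidLetter, or general category Mn, Me, Cf, Lm or Sk) can be written as a small predicate. This is a sketch, not the gencase implementation. Python's unicodedata module reports general categories for its own bundled Unicode version rather than 4.1 and has no Word_Break data, so the MidLetter set is passed in by the caller. MID_LETTER_SAMPLE is a deliberately partial, hypothetical stand-in for data that would really come from WordBreakProperty.txt.

```python
import unicodedata

# General categories named in the D47a wording quoted in this change log.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Hypothetical, partial sample; real data would be read from WordBreakProperty.txt.
MID_LETTER_SAMPLE = {"\u00B7", "\u2027"}


def is_case_ignorable(ch, mid_letters):
    """Return True if ch is MidLetter or has one of the listed categories."""
    return ch in mid_letters or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES


for ch in ("\u0301", "\u00B7", "a"):
    print(f"U+{ord(ch):04X}", is_case_ignorable(ch, MID_LETTER_SAMPLE))
```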
*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

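The round-trip property behind the u_charMirror() check can be illustrated outside of ICU. The sketch below parses lines in the two-field BidiMirroring.txt layout and reports every code point c for which mirror(mirror(c)) is not c. It is not the ICU test itself. The helper names are hypothetical, and char_mirror only mimics the documented behaviour of returning the input unchanged when no mirror exists.

```python
def parse_bidi_mirroring(lines):
    """Build a {code point: mirrored code point} map from BidiMirroring-style lines."""
    mapping = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # comment-only or blank line
        source, target = (int(field.strip(), 16) for field in data.split(";"))
        mapping[source] = target
    return mapping


def char_mirror(mapping, c):
    """Return the mirror of c, or c itself if it has no mirror."""
    return mapping.get(c, c)


def find_round_trip_failures(mapping):
    """List code points whose mirror does not map back to them."""
    return [c for c in mapping if char_mirror(mapping, char_mirror(mapping, c)) != c]


sample = [
    "# sample in BidiMirroring.txt layout",
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",
    "003E; 003C # GREATER-THAN SIGN",
]
mirrors = parse_bidi_mirroring(sample)
print(find_round_trip_failures(mirrors))
```

An empty list means every mapped character round-trips. A one-sided entry, such as a left bracket with no reverse mapping, would show up in the result.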
*** other code

* hardcoded Unihan range end/limit
    - Unihan range end moves from 9FA5 to 9FBB
        search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
        + do not modify BOCU/BOCSU code because that would change the encoding
            and break binary compatibility!
        + similarly, do not change the GB 18030 range data (ucnvmbcs.c)
            or NamePrepProfile.txt
        + ignore trietest.c: test data is arbitrary
        + ignore tstnorm.cpp: test optimization, not important
        + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
        + do change line_th.txt and word_th.txt
            by replacing hardcoded ranges with the new property values
        + do change gennames.c

    source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
    source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

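The search described above (regex 9FA[56], case-insensitive) is easy to reproduce with any grep-like tool. As a sketch, the Python below scans text for the old Unihan end (9FA5) and limit (9FA6) and reports hits in the same file(line) style as the listing. The function name and the sample snippet are hypothetical; only the pattern comes from this change log.

```python
import re

# Pattern from the change log: old Unihan range end (9FA5) or limit (9FA6).
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_ranges(name, text):
    """Return 'name(line): text' strings for lines mentioning 9FA5 or 9FA6."""
    return [
        f"{name}({number}): {line.strip()}"
        for number, line in enumerate(text.splitlines(), start=1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Hypothetical snippet, not copied from any ICU source file.
sample = (
    "/* hypothetical snippet */\n"
    "0x4e00, 0x9fa5,\n"
    "limit = 0x9FA6;\n"
    "other = 0x9FBB;\n"
)
hits = find_hardcoded_ranges("example.c", sample)
print("\n".join(hits))
```

Each hit still needs the manual review listed above, since some occurrences must deliberately stay unchanged.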
* case mappings
    - compare new special casing context conditions with previous ones
        see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
    - consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
    - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
    - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
    - fix UnicodeData.txt general categories of Ethiopic digits Nd->No
        according to PRI #26
        http://www.unicode.org/review/resolved-pri.html#pri26
    - undone again because no corrigendum in sight;
        instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
    - update from http://www.unicode.org/copyright.html
        formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
    - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
    - add U_LB_INSEPARABLE due to a spelling fix
        + put short name comment only on line with new constant
            for genpname perl script parser
    - new binary properties
        + STerm
        + Variation_Selector

* genpname
    - fix genpname perl script so that it doesn't choke on more than 2 names per property value
    - perl script: correctly calculate the maximum number of fields per row

* uscript.h
    - new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c: track changes in DerivedNormalizationProps.txt
    - "FNC" -> "FC_NFKC"
    - single field "NFD_NO" -> two fields "NFD_QC; N" etc.

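The two format changes above can be absorbed by a parser that normalizes old-style lines to the new field layout. The sketch below is not gennorm.c. It assumes that the "etc." covers old single fields of the form <form>_NO and <form>_MAYBE, which it maps to <form>_QC with value N or M. The function name is hypothetical.

```python
def parse_norm_prop(line):
    """Parse one DerivedNormalizationProps-style line into (range, property, value).

    Old-style names are normalized to the new ones:
      "FNC"                 -> "FC_NFKC"
      single field "X_NO"    -> property "X_QC", value "N"
      single field "X_MAYBE" -> property "X_QC", value "M"  (assumed pattern)
    Returns None for comment-only or blank lines.
    """
    data = line.split("#", 1)[0]
    fields = [field.strip() for field in data.split(";")]
    if len(fields) < 2 or not fields[0]:
        return None
    code_range, name = fields[0], fields[1]
    if name == "FNC":
        name = "FC_NFKC"
    if len(fields) == 2:
        prefix, _, suffix = name.rpartition("_")
        if prefix and suffix in ("NO", "MAYBE"):
            return code_range, prefix + "_QC", suffix[0]
    value = fields[2] if len(fields) > 2 else None
    return code_range, name, value


old_style = parse_norm_prop("00C0..00C5 ; NFD_NO # old single-field form")
new_style = parse_norm_prop("00C0..00C5 ; NFD_QC; N # new two-field form")
print(old_style, new_style)
```

Both sample lines yield the same tuple, so code downstream of the parser only has to deal with the new names.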
* genprops/props2.c: track changes in DerivedNumericValues.txt
    - changed from 3 columns to 2, dropping the numeric type
        + assume that the type is always numeric for Han characters,
            and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
    /tsutil/cucdtst/TestUnicodeData
    http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
    - read UCD.html

* scripts
    - use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
    - are now part of combining character sequences
    - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ