Updated data file

X-SVN-Rev: 5563
This commit is contained in:
Syn Wee Quek 2001-08-23 00:53:33 +00:00
parent 22f0bf8b4b
commit 33daec891b
4 changed files with 372 additions and 10 deletions

View File

@ -0,0 +1,163 @@
# CompositionExclusions-3.txt
#
# Composition Exclusions
# This file lists the characters from the UAX #15 Composition Exclusion Table.
#
# For more information, see
# http://www.unicode.org/unicode/reports/tr15/#Primary Exclusion List Table
# (1) Script Specifics
# This list of characters cannot be derived from the UnicodeData file.
0958 # DEVANAGARI LETTER QA
0959 # DEVANAGARI LETTER KHHA
095A # DEVANAGARI LETTER GHHA
095B # DEVANAGARI LETTER ZA
095C # DEVANAGARI LETTER DDDHA
095D # DEVANAGARI LETTER RHA
095E # DEVANAGARI LETTER FA
095F # DEVANAGARI LETTER YYA
09DC # BENGALI LETTER RRA
09DD # BENGALI LETTER RHA
09DF # BENGALI LETTER YYA
0A33 # GURMUKHI LETTER LLA
0A36 # GURMUKHI LETTER SHA
0A59 # GURMUKHI LETTER KHHA
0A5A # GURMUKHI LETTER GHHA
0A5B # GURMUKHI LETTER ZA
0A5E # GURMUKHI LETTER FA
0B5C # ORIYA LETTER RRA
0B5D # ORIYA LETTER RHA
0F43 # TIBETAN LETTER GHA
0F4D # TIBETAN LETTER DDHA
0F52 # TIBETAN LETTER DHA
0F57 # TIBETAN LETTER BHA
0F5C # TIBETAN LETTER DZHA
0F69 # TIBETAN LETTER KSSA
0F76 # TIBETAN VOWEL SIGN VOCALIC R
0F78 # TIBETAN VOWEL SIGN VOCALIC L
0F93 # TIBETAN SUBJOINED LETTER GHA
0F9D # TIBETAN SUBJOINED LETTER DDHA
0FA2 # TIBETAN SUBJOINED LETTER DHA
0FA7 # TIBETAN SUBJOINED LETTER BHA
0FAC # TIBETAN SUBJOINED LETTER DZHA
0FB9 # TIBETAN SUBJOINED LETTER KSSA
FB1D # HEBREW LETTER YOD WITH HIRIQ
FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A # HEBREW LETTER SHIN WITH SHIN DOT
FB2B # HEBREW LETTER SHIN WITH SIN DOT
FB2C # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
FB2D # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
FB2E # HEBREW LETTER ALEF WITH PATAH
FB2F # HEBREW LETTER ALEF WITH QAMATS
FB30 # HEBREW LETTER ALEF WITH MAPIQ
FB31 # HEBREW LETTER BET WITH DAGESH
FB32 # HEBREW LETTER GIMEL WITH DAGESH
FB33 # HEBREW LETTER DALET WITH DAGESH
FB34 # HEBREW LETTER HE WITH MAPIQ
FB35 # HEBREW LETTER VAV WITH DAGESH
FB36 # HEBREW LETTER ZAYIN WITH DAGESH
FB38 # HEBREW LETTER TET WITH DAGESH
FB39 # HEBREW LETTER YOD WITH DAGESH
FB3A # HEBREW LETTER FINAL KAF WITH DAGESH
FB3B # HEBREW LETTER KAF WITH DAGESH
FB3C # HEBREW LETTER LAMED WITH DAGESH
FB3E # HEBREW LETTER MEM WITH DAGESH
FB40 # HEBREW LETTER NUN WITH DAGESH
FB41 # HEBREW LETTER SAMEKH WITH DAGESH
FB43 # HEBREW LETTER FINAL PE WITH DAGESH
FB44 # HEBREW LETTER PE WITH DAGESH
FB46 # HEBREW LETTER TSADI WITH DAGESH
FB47 # HEBREW LETTER QOF WITH DAGESH
FB48 # HEBREW LETTER RESH WITH DAGESH
FB49 # HEBREW LETTER SHIN WITH DAGESH
FB4A # HEBREW LETTER TAV WITH DAGESH
FB4B # HEBREW LETTER VAV WITH HOLAM
FB4C # HEBREW LETTER BET WITH RAFE
FB4D # HEBREW LETTER KAF WITH RAFE
FB4E # HEBREW LETTER PE WITH RAFE
# (2) Post Composition Version precomposed characters
# These characters cannot be derived from the UnicodeData file.
1D15E # MUSICAL SYMBOL HALF NOTE
1D15F # MUSICAL SYMBOL QUARTER NOTE
1D160 # MUSICAL SYMBOL EIGHTH NOTE
1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB # MUSICAL SYMBOL MINIMA
1D1BC # MUSICAL SYMBOL MINIMA BLACK
1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF # MUSICAL SYMBOL FUSA WHITE
1D1C0 # MUSICAL SYMBOL FUSA BLACK
# (3) Singleton Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition
# consists of a single character.
# These characters are simply quoted here for reference.
# 0340 COMBINING GRAVE TONE MARK
# 0341 COMBINING ACUTE TONE MARK
# 0343 COMBINING GREEK KORONIS
# 0374 GREEK NUMERAL SIGN
# 037E GREEK QUESTION MARK
# 0387 GREEK ANO TELEIA
# 1F71 GREEK SMALL LETTER ALPHA WITH OXIA
# 1F73 GREEK SMALL LETTER EPSILON WITH OXIA
# 1F75 GREEK SMALL LETTER ETA WITH OXIA
# 1F77 GREEK SMALL LETTER IOTA WITH OXIA
# 1F79 GREEK SMALL LETTER OMICRON WITH OXIA
# 1F7B GREEK SMALL LETTER UPSILON WITH OXIA
# 1F7D GREEK SMALL LETTER OMEGA WITH OXIA
# 1FBB GREEK CAPITAL LETTER ALPHA WITH OXIA
# 1FBE GREEK PROSGEGRAMMENI
# 1FC9 GREEK CAPITAL LETTER EPSILON WITH OXIA
# 1FCB GREEK CAPITAL LETTER ETA WITH OXIA
# 1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
# 1FDB GREEK CAPITAL LETTER IOTA WITH OXIA
# 1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
# 1FEB GREEK CAPITAL LETTER UPSILON WITH OXIA
# 1FEE GREEK DIALYTIKA AND OXIA
# 1FEF GREEK VARIA
# 1FF9 GREEK CAPITAL LETTER OMICRON WITH OXIA
# 1FFB GREEK CAPITAL LETTER OMEGA WITH OXIA
# 1FFD GREEK OXIA
# 2000 EN QUAD
# 2001 EM QUAD
# 2126 OHM SIGN
# 212A KELVIN SIGN
# 212B ANGSTROM SIGN
# 2329 LEFT-POINTING ANGLE BRACKET
# 232A RIGHT-POINTING ANGLE BRACKET
# F900 CJK COMPATIBILITY IDEOGRAPH-F900
#.. FA0D CJK COMPATIBILITY IDEOGRAPH-FA0D
# FA10 CJK COMPATIBILITY IDEOGRAPH-FA10
# FA12 CJK COMPATIBILITY IDEOGRAPH-FA12
# FA15 CJK COMPATIBILITY IDEOGRAPH-FA15
#.. FA1E CJK COMPATIBILITY IDEOGRAPH-FA1E
# FA20 CJK COMPATIBILITY IDEOGRAPH-FA20
# FA22 CJK COMPATIBILITY IDEOGRAPH-FA22
# FA25 CJK COMPATIBILITY IDEOGRAPH-FA25
# FA26 CJK COMPATIBILITY IDEOGRAPH-FA26
# FA2A CJK COMPATIBILITY IDEOGRAPH-FA2A
#.. FA2D CJK COMPATIBILITY IDEOGRAPH-FA2D
# 2F800 CJK COMPATIBILITY IDEOGRAPH-2F800
#.. 2FA1D CJK COMPATIBILITY IDEOGRAPH-2FA1D
# (4) Non-Starter Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition consists
# of a sequence of characters, the first of which has a non-zero
# combining class.
# These characters are simply quoted here for reference.
# 0344 COMBINING GREEK DIALYTIKA TONOS
# 0F73 TIBETAN VOWEL SIGN II
# 0F75 TIBETAN VOWEL SIGN UU
# 0F81 TIBETAN VOWEL SIGN REVERSED II

View File

@ -1,5 +1,7 @@
# CompositionExclusions-3.txt
#
# Composition Exclusions
# This file lists the characters from the UTR #15 Composition Exclusion Table.
# This file lists the characters from the UAX #15 Composition Exclusion Table.
#
# For more information, see
# http://www.unicode.org/unicode/reports/tr15/#Primary Exclusion List Table
@ -40,6 +42,7 @@
0FA7 # TIBETAN SUBJOINED LETTER BHA
0FAC # TIBETAN SUBJOINED LETTER DZHA
0FB9 # TIBETAN SUBJOINED LETTER KSSA
FB1D # HEBREW LETTER YOD WITH HIRIQ
FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A # HEBREW LETTER SHIN WITH SHIN DOT
FB2B # HEBREW LETTER SHIN WITH SIN DOT
@ -74,9 +77,22 @@ FB4C # HEBREW LETTER BET WITH RAFE
FB4D # HEBREW LETTER KAF WITH RAFE
FB4E # HEBREW LETTER PE WITH RAFE
# (2) Post Composition Version characters
# (2) Post Composition Version precomposed characters
# These characters cannot be derived from the UnicodeData file.
# (There are no characters in this category in this version of Unicode.)
1D15E # MUSICAL SYMBOL HALF NOTE
1D15F # MUSICAL SYMBOL QUARTER NOTE
1D160 # MUSICAL SYMBOL EIGHTH NOTE
1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB # MUSICAL SYMBOL MINIMA
1D1BC # MUSICAL SYMBOL MINIMA BLACK
1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF # MUSICAL SYMBOL FUSA WHITE
1D1C0 # MUSICAL SYMBOL FUSA BLACK
# (3) Singleton Decompositions
# These characters can be derived from the UnicodeData file
@ -129,12 +145,14 @@ FB4E # HEBREW LETTER PE WITH RAFE
# FA26 CJK COMPATIBILITY IDEOGRAPH-FA26
# FA2A CJK COMPATIBILITY IDEOGRAPH-FA2A
#.. FA2D CJK COMPATIBILITY IDEOGRAPH-FA2D
# 2F800 CJK COMPATIBILITY IDEOGRAPH-2F800
#.. 2FA1D CJK COMPATIBILITY IDEOGRAPH-2FA1D
# (4) Non-Starter Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition consists
# of a sequence of characters, the first of which has a canonical
# class of zero.
# of a sequence of characters, the first of which has a non-zero
# combining class.
# These characters are simply quoted here for reference.
# 0344 COMBINING GREEK DIALYTIKA TONOS

View File

@ -0,0 +1,163 @@
# CompositionExclusions-3.txt
#
# Composition Exclusions
# This file lists the characters from the UAX #15 Composition Exclusion Table.
#
# For more information, see
# http://www.unicode.org/unicode/reports/tr15/#Primary Exclusion List Table
# (1) Script Specifics
# This list of characters cannot be derived from the UnicodeData file.
0958 # DEVANAGARI LETTER QA
0959 # DEVANAGARI LETTER KHHA
095A # DEVANAGARI LETTER GHHA
095B # DEVANAGARI LETTER ZA
095C # DEVANAGARI LETTER DDDHA
095D # DEVANAGARI LETTER RHA
095E # DEVANAGARI LETTER FA
095F # DEVANAGARI LETTER YYA
09DC # BENGALI LETTER RRA
09DD # BENGALI LETTER RHA
09DF # BENGALI LETTER YYA
0A33 # GURMUKHI LETTER LLA
0A36 # GURMUKHI LETTER SHA
0A59 # GURMUKHI LETTER KHHA
0A5A # GURMUKHI LETTER GHHA
0A5B # GURMUKHI LETTER ZA
0A5E # GURMUKHI LETTER FA
0B5C # ORIYA LETTER RRA
0B5D # ORIYA LETTER RHA
0F43 # TIBETAN LETTER GHA
0F4D # TIBETAN LETTER DDHA
0F52 # TIBETAN LETTER DHA
0F57 # TIBETAN LETTER BHA
0F5C # TIBETAN LETTER DZHA
0F69 # TIBETAN LETTER KSSA
0F76 # TIBETAN VOWEL SIGN VOCALIC R
0F78 # TIBETAN VOWEL SIGN VOCALIC L
0F93 # TIBETAN SUBJOINED LETTER GHA
0F9D # TIBETAN SUBJOINED LETTER DDHA
0FA2 # TIBETAN SUBJOINED LETTER DHA
0FA7 # TIBETAN SUBJOINED LETTER BHA
0FAC # TIBETAN SUBJOINED LETTER DZHA
0FB9 # TIBETAN SUBJOINED LETTER KSSA
FB1D # HEBREW LETTER YOD WITH HIRIQ
FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A # HEBREW LETTER SHIN WITH SHIN DOT
FB2B # HEBREW LETTER SHIN WITH SIN DOT
FB2C # HEBREW LETTER SHIN WITH DAGESH AND SHIN DOT
FB2D # HEBREW LETTER SHIN WITH DAGESH AND SIN DOT
FB2E # HEBREW LETTER ALEF WITH PATAH
FB2F # HEBREW LETTER ALEF WITH QAMATS
FB30 # HEBREW LETTER ALEF WITH MAPIQ
FB31 # HEBREW LETTER BET WITH DAGESH
FB32 # HEBREW LETTER GIMEL WITH DAGESH
FB33 # HEBREW LETTER DALET WITH DAGESH
FB34 # HEBREW LETTER HE WITH MAPIQ
FB35 # HEBREW LETTER VAV WITH DAGESH
FB36 # HEBREW LETTER ZAYIN WITH DAGESH
FB38 # HEBREW LETTER TET WITH DAGESH
FB39 # HEBREW LETTER YOD WITH DAGESH
FB3A # HEBREW LETTER FINAL KAF WITH DAGESH
FB3B # HEBREW LETTER KAF WITH DAGESH
FB3C # HEBREW LETTER LAMED WITH DAGESH
FB3E # HEBREW LETTER MEM WITH DAGESH
FB40 # HEBREW LETTER NUN WITH DAGESH
FB41 # HEBREW LETTER SAMEKH WITH DAGESH
FB43 # HEBREW LETTER FINAL PE WITH DAGESH
FB44 # HEBREW LETTER PE WITH DAGESH
FB46 # HEBREW LETTER TSADI WITH DAGESH
FB47 # HEBREW LETTER QOF WITH DAGESH
FB48 # HEBREW LETTER RESH WITH DAGESH
FB49 # HEBREW LETTER SHIN WITH DAGESH
FB4A # HEBREW LETTER TAV WITH DAGESH
FB4B # HEBREW LETTER VAV WITH HOLAM
FB4C # HEBREW LETTER BET WITH RAFE
FB4D # HEBREW LETTER KAF WITH RAFE
FB4E # HEBREW LETTER PE WITH RAFE
# (2) Post Composition Version precomposed characters
# These characters cannot be derived from the UnicodeData file.
1D15E # MUSICAL SYMBOL HALF NOTE
1D15F # MUSICAL SYMBOL QUARTER NOTE
1D160 # MUSICAL SYMBOL EIGHTH NOTE
1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB # MUSICAL SYMBOL MINIMA
1D1BC # MUSICAL SYMBOL MINIMA BLACK
1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF # MUSICAL SYMBOL FUSA WHITE
1D1C0 # MUSICAL SYMBOL FUSA BLACK
# (3) Singleton Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition
# consists of a single character.
# These characters are simply quoted here for reference.
# 0340 COMBINING GRAVE TONE MARK
# 0341 COMBINING ACUTE TONE MARK
# 0343 COMBINING GREEK KORONIS
# 0374 GREEK NUMERAL SIGN
# 037E GREEK QUESTION MARK
# 0387 GREEK ANO TELEIA
# 1F71 GREEK SMALL LETTER ALPHA WITH OXIA
# 1F73 GREEK SMALL LETTER EPSILON WITH OXIA
# 1F75 GREEK SMALL LETTER ETA WITH OXIA
# 1F77 GREEK SMALL LETTER IOTA WITH OXIA
# 1F79 GREEK SMALL LETTER OMICRON WITH OXIA
# 1F7B GREEK SMALL LETTER UPSILON WITH OXIA
# 1F7D GREEK SMALL LETTER OMEGA WITH OXIA
# 1FBB GREEK CAPITAL LETTER ALPHA WITH OXIA
# 1FBE GREEK PROSGEGRAMMENI
# 1FC9 GREEK CAPITAL LETTER EPSILON WITH OXIA
# 1FCB GREEK CAPITAL LETTER ETA WITH OXIA
# 1FD3 GREEK SMALL LETTER IOTA WITH DIALYTIKA AND OXIA
# 1FDB GREEK CAPITAL LETTER IOTA WITH OXIA
# 1FE3 GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND OXIA
# 1FEB GREEK CAPITAL LETTER UPSILON WITH OXIA
# 1FEE GREEK DIALYTIKA AND OXIA
# 1FEF GREEK VARIA
# 1FF9 GREEK CAPITAL LETTER OMICRON WITH OXIA
# 1FFB GREEK CAPITAL LETTER OMEGA WITH OXIA
# 1FFD GREEK OXIA
# 2000 EN QUAD
# 2001 EM QUAD
# 2126 OHM SIGN
# 212A KELVIN SIGN
# 212B ANGSTROM SIGN
# 2329 LEFT-POINTING ANGLE BRACKET
# 232A RIGHT-POINTING ANGLE BRACKET
# F900 CJK COMPATIBILITY IDEOGRAPH-F900
#.. FA0D CJK COMPATIBILITY IDEOGRAPH-FA0D
# FA10 CJK COMPATIBILITY IDEOGRAPH-FA10
# FA12 CJK COMPATIBILITY IDEOGRAPH-FA12
# FA15 CJK COMPATIBILITY IDEOGRAPH-FA15
#.. FA1E CJK COMPATIBILITY IDEOGRAPH-FA1E
# FA20 CJK COMPATIBILITY IDEOGRAPH-FA20
# FA22 CJK COMPATIBILITY IDEOGRAPH-FA22
# FA25 CJK COMPATIBILITY IDEOGRAPH-FA25
# FA26 CJK COMPATIBILITY IDEOGRAPH-FA26
# FA2A CJK COMPATIBILITY IDEOGRAPH-FA2A
#.. FA2D CJK COMPATIBILITY IDEOGRAPH-FA2D
# 2F800 CJK COMPATIBILITY IDEOGRAPH-2F800
#.. 2FA1D CJK COMPATIBILITY IDEOGRAPH-2FA1D
# (4) Non-Starter Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition consists
# of a sequence of characters, the first of which has a non-zero
# combining class.
# These characters are simply quoted here for reference.
# 0344 COMBINING GREEK DIALYTIKA TONOS
# 0F73 TIBETAN VOWEL SIGN II
# 0F75 TIBETAN VOWEL SIGN UU
# 0F81 TIBETAN VOWEL SIGN REVERSED II

View File

@ -1,5 +1,7 @@
# CompositionExclusions-3.txt
#
# Composition Exclusions
# This file lists the characters from the UTR #15 Composition Exclusion Table.
# This file lists the characters from the UAX #15 Composition Exclusion Table.
#
# For more information, see
# http://www.unicode.org/unicode/reports/tr15/#Primary Exclusion List Table
@ -40,6 +42,7 @@
0FA7 # TIBETAN SUBJOINED LETTER BHA
0FAC # TIBETAN SUBJOINED LETTER DZHA
0FB9 # TIBETAN SUBJOINED LETTER KSSA
FB1D # HEBREW LETTER YOD WITH HIRIQ
FB1F # HEBREW LIGATURE YIDDISH YOD YOD PATAH
FB2A # HEBREW LETTER SHIN WITH SHIN DOT
FB2B # HEBREW LETTER SHIN WITH SIN DOT
@ -74,9 +77,22 @@ FB4C # HEBREW LETTER BET WITH RAFE
FB4D # HEBREW LETTER KAF WITH RAFE
FB4E # HEBREW LETTER PE WITH RAFE
# (2) Post Composition Version characters
# (2) Post Composition Version precomposed characters
# These characters cannot be derived from the UnicodeData file.
# (There are no characters in this category in this version of Unicode.)
1D15E # MUSICAL SYMBOL HALF NOTE
1D15F # MUSICAL SYMBOL QUARTER NOTE
1D160 # MUSICAL SYMBOL EIGHTH NOTE
1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
1D162 # MUSICAL SYMBOL THIRTY-SECOND NOTE
1D163 # MUSICAL SYMBOL SIXTY-FOURTH NOTE
1D164 # MUSICAL SYMBOL ONE HUNDRED TWENTY-EIGHTH NOTE
1D1BB # MUSICAL SYMBOL MINIMA
1D1BC # MUSICAL SYMBOL MINIMA BLACK
1D1BD # MUSICAL SYMBOL SEMIMINIMA WHITE
1D1BE # MUSICAL SYMBOL SEMIMINIMA BLACK
1D1BF # MUSICAL SYMBOL FUSA WHITE
1D1C0 # MUSICAL SYMBOL FUSA BLACK
# (3) Singleton Decompositions
# These characters can be derived from the UnicodeData file
@ -129,12 +145,14 @@ FB4E # HEBREW LETTER PE WITH RAFE
# FA26 CJK COMPATIBILITY IDEOGRAPH-FA26
# FA2A CJK COMPATIBILITY IDEOGRAPH-FA2A
#.. FA2D CJK COMPATIBILITY IDEOGRAPH-FA2D
# 2F800 CJK COMPATIBILITY IDEOGRAPH-2F800
#.. 2FA1D CJK COMPATIBILITY IDEOGRAPH-2FA1D
# (4) Non-Starter Decompositions
# These characters can be derived from the UnicodeData file
# by including all characters whose canonical decomposition consists
# of a sequence of characters, the first of which has a canonical
# class of zero.
# of a sequence of characters, the first of which has a non-zero
# combining class.
# These characters are simply quoted here for reference.
# 0344 COMBINING GREEK DIALYTIKA TONOS