From ca5515af1a2649bd50728d88d64edc6508e54262 Mon Sep 17 00:00:00 2001
From: Markus Scherer
Date: Fri, 7 Mar 2003 21:36:45 +0000
Subject: [PATCH] ICU-2427 replace ArabicShaping.txt with derived files to avoid tracking the formula for jt=T

X-SVN-Rev: 11260
---
 icu4c/source/data/unidata/ArabicShaping.txt | 235 -----------
 .../data/unidata/DerivedJoiningGroup.txt    | 375 ++++++++++++++++++
 .../data/unidata/DerivedJoiningType.txt     | 214 ++++++++++
 3 files changed, 589 insertions(+), 235 deletions(-)
 delete mode 100644 icu4c/source/data/unidata/ArabicShaping.txt
 create mode 100644 icu4c/source/data/unidata/DerivedJoiningGroup.txt
 create mode 100644 icu4c/source/data/unidata/DerivedJoiningType.txt

diff --git a/icu4c/source/data/unidata/ArabicShaping.txt b/icu4c/source/data/unidata/ArabicShaping.txt
deleted file mode 100644
index 50cc26005a..0000000000
--- a/icu4c/source/data/unidata/ArabicShaping.txt
+++ /dev/null
@@ -1,235 +0,0 @@
-# ArabicShaping-4.0.0.txt
-#
-# This file is a normative contributory data file in the
-# Unicode Character Database.
-#
-# This file defines the shaping classes for Arabic and Syriac
-# positional shaping, repeating in machine readable form the
-# information printed in Tables 8-6, 8-7, 8-8, 8-10, 8-11, and
-# 8-13 of The Unicode Standard, Version 3.0, plus additions
-# for Unicode 3.1 and Unicode 3.2.
-#
-# See sections 8.2 and 8.3 of The Unicode Standard, Version 3.0
-# for more information.
-#
-# Each line contains four fields, separated by a semicolon.
-#
-# The first field gives the code point, in 4-digit hexadecimal
-# form, of an Arabic or Syriac character.
-# The second field gives a short schematic name for that character,
-# abbreviated from the normative Unicode character name.
-# The third field defines the joining type: R right-joining,
-# D dual-joining, U non-joining
-# The fourth field defines the joining group.
-#
-#
-# Note: Characters of joining type T and most characters of
-# joining type U are not explicitly listed in this file.
-#
-# Characters of joining type T can derived by the following formula:
-# T = Mn + Cf - ZWNJ - ZWJ
-# where Mn and Cf are the general category values. In other words,
-# any non-spacing mark or any format control character, except
-# U+200C ZERO WIDTH NON-JOINER (joining type U) and U+200D ZERO WIDTH
-# JOINER (joining type C).
-#
-# For an explicit listing of characters of joining type T, see
-# the derived property file DerivedJoiningType.txt.
-#
-# There are currently no characters of type L defined in Unicode.
-#
-# Joining type U includes all characters which are neither joining
-# type T, nor explicitly marked in this file as types R, L, D, or C.
-#
-# #############################################################
-
-# Unicode; Schematic Name; Joining Type; Joining Group
-
-# Arabic characters
-
-0621; HAMZA; U;
-0622; MADDA ON ALEF; R; ALEF
-0623; HAMZA ON ALEF; R; ALEF
-0624; HAMZA ON WAW; R; WAW
-0625; HAMZA UNDER ALEF; R; ALEF
-0626; HAMZA ON YEH; D; YEH
-0627; ALEF; R; ALEF
-0628; BEH; D; BEH
-0629; TEH MARBUTA; R; TEH MARBUTA
-062A; TEH; D; BEH
-062B; THEH; D; BEH
-062C; JEEM; D; HAH
-062D; HAH; D; HAH
-062E; KHAH; D; HAH
-062F; DAL; R; DAL
-0630; THAL; R; DAL
-0631; REH; R; REH
-0632; ZAIN; R; REH
-0633; SEEN; D; SEEN
-0634; SHEEN; D; SEEN
-0635; SAD; D; SAD
-0636; DAD; D; SAD
-0637; TAH; D; TAH
-0638; ZAH; D; TAH
-0639; AIN; D; AIN
-063A; GHAIN; D; AIN
-0640; TATWEEL; C;
-0641; FEH; D; FEH
-0642; QAF; D; QAF
-0643; KAF; D; KAF
-0644; LAM; D; LAM
-0645; MEEM; D; MEEM
-0646; NOON; D; NOON
-0647; HEH; D; HEH
-0648; WAW; R; WAW
-0649; ALEF MAKSURA; D; YEH
-064A; YEH; D; YEH
-066E; DOTLESS BEH; D; BEH
-066F; DOTLESS QAF; D; QAF
-0671; HAMZAT WASL ON ALEF; R; ALEF
-0672; WAVY HAMZA ON ALEF; R; ALEF
-0673; WAVY HAMZA UNDER ALEF; R; ALEF
-0674; HIGH HAMZA; U;
-0675; HIGH HAMZA ALEF; R; ALEF
-0676; HIGH HAMZA WAW; R; WAW
-0677; HIGH HAMZA WAW WITH DAMMA; R; WAW
-0678; HIGH HAMZA YEH; D; YEH
-0679; TEH WITH SMALL TAH; D; BEH
-067A; TEH WITH 2 DOTS VERTICAL ABOVE; D; BEH
-067B; BEH WITH 2 DOTS VERTICAL BELOW; D; BEH
-067C; TEH WITH RING; D; BEH
-067D; TEH WITH 3 DOTS ABOVE DOWNWARD; D; BEH
-067E; TEH WITH 3 DOTS BELOW; D; BEH
-067F; TEH WITH 4 DOTS ABOVE; D; BEH
-0680; BEH WITH 4 DOTS BELOW; D; BEH
-0681; HAMZA ON HAH; D; HAH
-0682; HAH WITH 2 DOTS VERTICAL ABOVE; D; HAH
-0683; HAH WITH MIDDLE 2 DOTS; D; HAH
-0684; HAH WITH MIDDLE 2 DOTS VERTICAL; D; HAH
-0685; HAH WITH 3 DOTS ABOVE; D; HAH
-0686; HAH WITH MIDDLE 3 DOTS DOWNWARD; D; HAH
-0687; HAH WITH MIDDLE 4 DOTS; D; HAH
-0688; DAL WITH SMALL TAH; R; DAL
-0689; DAL WITH RING; R; DAL
-068A; DAL WITH DOT BELOW; R; DAL
-068B; DAL WITH DOT BELOW AND SMALL TAH; R; DAL
-068C; DAL WITH 2 DOTS ABOVE; R; DAL
-068D; DAL WITH 2 DOTS BELOW; R; DAL
-068E; DAL WITH 3 DOTS ABOVE; R; DAL
-068F; DAL WITH 3 DOTS ABOVE DOWNWARD; R; DAL
-0690; DAL WITH 4 DOTS ABOVE; R; DAL
-0691; REH WITH SMALL TAH; R; REH
-0692; REH WITH SMALL V; R; REH
-0693; REH WITH RING; R; REH
-0694; REH WITH DOT BELOW; R; REH
-0695; REH WITH SMALL V BELOW; R; REH
-0696; REH WITH DOT BELOW AND DOT ABOVE; R; REH
-0697; REH WITH 2 DOTS ABOVE; R; REH
-0698; REH WITH 3 DOTS ABOVE; R; REH
-0699; REH WITH 4 DOTS ABOVE; R; REH
-069A; SEEN WITH DOT BELOW AND DOT ABOVE; D; SEEN
-069B; SEEN WITH 3 DOTS BELOW; D; SEEN
-069C; SEEN WITH 3 DOTS BELOW AND 3 DOTS ABOVE; D; SEEN
-069D; SAD WITH 2 DOTS BELOW; D; SAD
-069E; SAD WITH 3 DOTS ABOVE; D; SAD
-069F; TAH WITH 3 DOTS ABOVE; D; TAH
-06A0; AIN WITH 3 DOTS ABOVE; D; AIN
-06A1; DOTLESS FEH; D; FEH
-06A2; FEH WITH DOT MOVED BELOW; D; FEH
-06A3; FEH WITH DOT BELOW; D; FEH
-06A4; FEH WITH 3 DOTS ABOVE; D; FEH
-06A5; FEH WITH 3 DOTS BELOW; D; FEH
-06A6; FEH WITH 4 DOTS ABOVE; D; FEH
-06A7; QAF WITH DOT ABOVE; D; QAF
-06A8; QAF WITH 3 DOTS ABOVE; D; QAF
-06A9; OPEN KAF; D; GAF
-06AA; SWASH KAF; D; SWASH KAF
-06AB; KAF WITH RING; D; GAF
-06AC; KAF WITH DOT ABOVE; D; KAF
-06AD; KAF WITH 3 DOTS ABOVE; D; KAF
-06AE; KAF WITH 3 DOTS BELOW; D; KAF
-06AF; GAF; D; GAF
-06B0; GAF WITH RING; D; GAF
-06B1; GAF WITH 2 DOTS ABOVE; D; GAF
-06B2; GAF WITH 2 DOTS BELOW; D; GAF
-06B3; GAF WITH 2 DOTS VERTICAL BELOW; D; GAF
-06B4; GAF WITH 3 DOTS ABOVE; D; GAF
-06B5; LAM WITH SMALL V; D; LAM
-06B6; LAM WITH DOT ABOVE; D; LAM
-06B7; LAM WITH 3 DOTS ABOVE; D; LAM
-06B8; LAM WITH 3 DOTS BELOW; D; LAM
-06B9; NOON WITH DOT BELOW; D; NOON
-06BA; DOTLESS NOON; D; NOON
-06BB; DOTLESS NOON WITH SMALL TAH; D; NOON
-06BC; NOON WITH RING; D; NOON
-06BD; NOON WITH 3 DOTS ABOVE; D; NOON
-06BE; KNOTTED HEH; D; KNOTTED HEH
-06BF; HAH WITH MIDDLE 3 DOTS DOWNWARD AND DOT ABOVE; D; HAH
-06C0; HAMZA ON HEH; R; TEH MARBUTA
-06C1; HEH GOAL; D; HEH GOAL
-06C2; HAMZA ON HEH GOAL; R; HAMZA ON HEH GOAL
-06C3; TEH MARBUTA GOAL; R; HAMZA ON HEH GOAL
-06C4; WAW WITH RING; R; WAW
-06C5; WAW WITH BAR; R; WAW
-06C6; WAW WITH SMALL V; R; WAW
-06C7; WAW WITH DAMMA; R; WAW
-06C8; WAW WITH ALEF ABOVE; R; WAW
-06C9; WAW WITH INVERTED SMALL V; R; WAW
-06CA; WAW WITH 2 DOTS ABOVE; R; WAW
-06CB; WAW WITH 3 DOTS ABOVE; R; WAW
-06CC; DOTLESS YEH; D; YEH
-06CD; YEH WITH TAIL; R; YEH WITH TAIL
-06CE; YEH WITH SMALL V; D; YEH
-06CF; WAW WITH DOT ABOVE; R; WAW
-06D0; YEH WITH 2 DOTS VERTICAL BELOW; D; YEH
-06D1; YEH WITH 3 DOTS BELOW; D; YEH
-06D2; YEH BARREE; R; YEH BARREE
-06D3; HAMZA ON YEH BARREE; R; YEH BARREE
-06D5; AE; R; TEH MARBUTA
-06EE; DAL WITH INVERTED V; R; DAL
-06EF; REH WITH INVERTED V; R; REH
-06FF; HEH WITH INVERTED V; D; KNOTTED HEH
-06FA; SEEN WITH DOT BELOW AND 3 DOTS ABOVE; D; SEEN
-06FB; DAD WITH DOT BELOW; D; SAD
-06FC; GHAIN WITH DOT BELOW; D; AIN
-
-# Syriac characters
-
-0710; ALAPH; R; ALAPH
-0712; BETH; D; BETH
-0713; GAMAL; D; GAMAL
-0714; GAMAL GARSHUNI; D; GAMAL
-0715; DALATH; R; DALATH RISH
-0716; DOTLESS DALATH RISH; R; DALATH RISH
-0717; HE; R; HE
-0718; WAW; R; SYRIAC WAW
-0719; ZAIN; R; ZAIN
-071A; HETH; D; HETH
-071B; TETH; D; TETH
-071C; TETH GARSHUNI; D; TETH
-071D; YUDH; D; YUDH
-071E; YUDH HE; R; YUDH HE
-071F; KAPH; D; KAPH
-0720; LAMADH; D; LAMADH
-0721; MIM; D; MIM
-0722; NUN; D; NUN
-0723; SEMKATH; D; SEMKATH
-0724; FINAL SEMKATH; D; FINAL SEMKATH
-0725; E; D; E
-0726; PE; D; PE
-0727; REVERSED PE; D; REVERSED PE
-0728; SADHE; R; SADHE
-0729; QAPH; D; QAPH
-072A; RISH; R; DALATH RISH
-072B; SHIN; D; SHIN
-072C; TAW; R; TAW
-072D; PERSIAN BHETH; D; BETH
-072E; PERSIAN GHAMAL; D; GAMAL
-072F; PERSIAN DHALATH; R; DALATH RISH
-074D; SOGDIAN ZHAIN; R; ZHAIN
-074E; SOGDIAN KHAPH; D; KHAPH
-074F; SOGDIAN FE; D; FE
-
-# Other
-
-200D; ZERO WIDTH JOINER; C;
diff --git a/icu4c/source/data/unidata/DerivedJoiningGroup.txt b/icu4c/source/data/unidata/DerivedJoiningGroup.txt
new file mode 100644
index 0000000000..0899e4522d
--- /dev/null
+++ b/icu4c/source/data/unidata/DerivedJoiningGroup.txt
@@ -0,0 +1,375 @@
+# DerivedJoiningGroup-4.0.0.txt
+# Date: 2003-02-20,17:13:55 GMT [MD]
+#
+# Unicode Character Database: Derived Property Data
+# Generated algorithmically from the Unicode Character Database
+# For documentation, see UCD.html
+# Note: Unassigned and Noncharacter codepoints are omitted,
+# except when listing Noncharacter or Cn.
+# ================================================
+
+
+# ================================================
+# Joining Group (listing ArabicShaping.txt, field 2)
+# ================================================
+
+0639..063A ; AIN # Lo [2] ARABIC LETTER AIN..ARABIC LETTER GHAIN
+06A0 ; AIN # Lo ARABIC LETTER AIN WITH THREE DOTS ABOVE
+06FC ; AIN # Lo ARABIC LETTER GHAIN WITH DOT BELOW
+
+# Total code points: 4
+
+# ================================================
+
+0710 ; ALAPH # Lo SYRIAC LETTER ALAPH
+
+# Total code points: 1
+
+# ================================================
+
+0622..0623 ; ALEF # Lo [2] ARABIC LETTER ALEF WITH MADDA ABOVE..ARABIC LETTER ALEF WITH HAMZA ABOVE
+0625 ; ALEF # Lo ARABIC LETTER ALEF WITH HAMZA BELOW
+0627 ; ALEF # Lo ARABIC LETTER ALEF
+0671..0673 ; ALEF # Lo [3] ARABIC LETTER ALEF WASLA..ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
+0675 ; ALEF # Lo ARABIC LETTER HIGH HAMZA ALEF
+
+# Total code points: 8
+
+# ================================================
+
+0628 ; BEH # Lo ARABIC LETTER BEH
+062A..062B ; BEH # Lo [2] ARABIC LETTER TEH..ARABIC LETTER THEH
+066E ; BEH # Lo ARABIC LETTER DOTLESS BEH
+0679..0680 ; BEH # Lo [8] ARABIC LETTER TTEH..ARABIC LETTER BEHEH
+
+# Total code points: 12
+
+# ================================================
+
+0712 ; BETH # Lo SYRIAC LETTER BETH
+072D ; BETH # Lo SYRIAC LETTER PERSIAN BHETH
+
+# Total code points: 2
+
+# ================================================
+
+062F..0630 ; DAL # Lo [2] ARABIC LETTER DAL..ARABIC LETTER THAL
+0688..0690 ; DAL # Lo [9] ARABIC LETTER DDAL..ARABIC LETTER DAL WITH FOUR DOTS ABOVE
+06EE ; DAL # Lo ARABIC LETTER DAL WITH INVERTED V
+
+# Total code points: 12
+
+# ================================================
+
+0715..0716 ; DALATH_RISH # Lo [2] SYRIAC LETTER DALATH..SYRIAC LETTER DOTLESS DALATH RISH
+072A ; DALATH_RISH # Lo SYRIAC LETTER RISH
+072F ; DALATH_RISH # Lo SYRIAC LETTER PERSIAN DHALATH
+
+# Total code points: 4
+
+# ================================================
+
+0725 ; E # Lo SYRIAC LETTER E
+
+# Total code points: 1
+
+# ================================================
+
+0641 ; FEH # Lo ARABIC LETTER FEH
+06A1..06A6 ; FEH # Lo [6] ARABIC LETTER DOTLESS FEH..ARABIC LETTER PEHEH
+
+# Total code points: 7
+
+# ================================================
+
+0724 ; FINAL_SEMKATH # Lo SYRIAC LETTER FINAL SEMKATH
+
+# Total code points: 1
+
+# ================================================
+
+06A9 ; GAF # Lo ARABIC LETTER KEHEH
+06AB ; GAF # Lo ARABIC LETTER KAF WITH RING
+06AF..06B4 ; GAF # Lo [6] ARABIC LETTER GAF..ARABIC LETTER GAF WITH THREE DOTS ABOVE
+
+# Total code points: 8
+
+# ================================================
+
+0713..0714 ; GAMAL # Lo [2] SYRIAC LETTER GAMAL..SYRIAC LETTER GAMAL GARSHUNI
+072E ; GAMAL # Lo SYRIAC LETTER PERSIAN GHAMAL
+
+# Total code points: 3
+
+# ================================================
+
+062C..062E ; HAH # Lo [3] ARABIC LETTER JEEM..ARABIC LETTER KHAH
+0681..0687 ; HAH # Lo [7] ARABIC LETTER HAH WITH HAMZA ABOVE..ARABIC LETTER TCHEHEH
+06BF ; HAH # Lo ARABIC LETTER TCHEH WITH DOT ABOVE
+
+# Total code points: 11
+
+# ================================================
+
+06C2..06C3 ; HAMZA_ON_HEH_GOAL # Lo [2] ARABIC LETTER HEH GOAL WITH HAMZA ABOVE..ARABIC LETTER TEH MARBUTA GOAL
+
+# Total code points: 2
+
+# ================================================
+
+0717 ; HE # Lo SYRIAC LETTER HE
+
+# Total code points: 1
+
+# ================================================
+
+0647 ; HEH # Lo ARABIC LETTER HEH
+
+# Total code points: 1
+
+# ================================================
+
+06C1 ; HEH_GOAL # Lo ARABIC LETTER HEH GOAL
+
+# Total code points: 1
+
+# ================================================
+
+071A ; HETH # Lo SYRIAC LETTER HETH
+
+# Total code points: 1
+
+# ================================================
+
+0643 ; KAF # Lo ARABIC LETTER KAF
+06AC..06AE ; KAF # Lo [3] ARABIC LETTER KAF WITH DOT ABOVE..ARABIC LETTER KAF WITH THREE DOTS BELOW
+
+# Total code points: 4
+
+# ================================================
+
+071F ; KAPH # Lo SYRIAC LETTER KAPH
+
+# Total code points: 1
+
+# ================================================
+
+06BE ; KNOTTED_HEH # Lo ARABIC LETTER HEH DOACHASHMEE
+06FF ; KNOTTED_HEH # Lo ARABIC LETTER HEH WITH INVERTED V
+
+# Total code points: 2
+
+# ================================================
+
+0644 ; LAM # Lo ARABIC LETTER LAM
+06B5..06B8 ; LAM # Lo [4] ARABIC LETTER LAM WITH SMALL V..ARABIC LETTER LAM WITH THREE DOTS BELOW
+
+# Total code points: 5
+
+# ================================================
+
+0720 ; LAMADH # Lo SYRIAC LETTER LAMADH
+
+# Total code points: 1
+
+# ================================================
+
+0645 ; MEEM # Lo ARABIC LETTER MEEM
+
+# Total code points: 1
+
+# ================================================
+
+0721 ; MIM # Lo SYRIAC LETTER MIM
+
+# Total code points: 1
+
+# ================================================
+
+0646 ; NOON # Lo ARABIC LETTER NOON
+06B9..06BD ; NOON # Lo [5] ARABIC LETTER NOON WITH DOT BELOW..ARABIC LETTER NOON WITH THREE DOTS ABOVE
+
+# Total code points: 6
+
+# ================================================
+
+0722 ; NUN # Lo SYRIAC LETTER NUN
+
+# Total code points: 1
+
+# ================================================
+
+0726 ; PE # Lo SYRIAC LETTER PE
+
+# Total code points: 1
+
+# ================================================
+
+0642 ; QAF # Lo ARABIC LETTER QAF
+066F ; QAF # Lo ARABIC LETTER DOTLESS QAF
+06A7..06A8 ; QAF # Lo [2] ARABIC LETTER QAF WITH DOT ABOVE..ARABIC LETTER QAF WITH THREE DOTS ABOVE
+
+# Total code points: 4
+
+# ================================================
+
+0729 ; QAPH # Lo SYRIAC LETTER QAPH
+
+# Total code points: 1
+
+# ================================================
+
+0631..0632 ; REH # Lo [2] ARABIC LETTER REH..ARABIC LETTER ZAIN
+0691..0699 ; REH # Lo [9] ARABIC LETTER RREH..ARABIC LETTER REH WITH FOUR DOTS ABOVE
+06EF ; REH # Lo ARABIC LETTER REH WITH INVERTED V
+
+# Total code points: 12
+
+# ================================================
+
+0727 ; REVERSED_PE # Lo SYRIAC LETTER REVERSED PE
+
+# Total code points: 1
+
+# ================================================
+
+0635..0636 ; SAD # Lo [2] ARABIC LETTER SAD..ARABIC LETTER DAD
+069D..069E ; SAD # Lo [2] ARABIC LETTER SAD WITH TWO DOTS BELOW..ARABIC LETTER SAD WITH THREE DOTS ABOVE
+06FB ; SAD # Lo ARABIC LETTER DAD WITH DOT BELOW
+
+# Total code points: 5
+
+# ================================================
+
+0728 ; SADHE # Lo SYRIAC LETTER SADHE
+
+# Total code points: 1
+
+# ================================================
+
+0633..0634 ; SEEN # Lo [2] ARABIC LETTER SEEN..ARABIC LETTER SHEEN
+069A..069C ; SEEN # Lo [3] ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE..ARABIC LETTER SEEN WITH THREE DOTS BELOW AND THREE DOTS ABOVE
+06FA ; SEEN # Lo ARABIC LETTER SHEEN WITH DOT BELOW
+
+# Total code points: 6
+
+# ================================================
+
+0723 ; SEMKATH # Lo SYRIAC LETTER SEMKATH
+
+# Total code points: 1
+
+# ================================================
+
+072B ; SHIN # Lo SYRIAC LETTER SHIN
+
+# Total code points: 1
+
+# ================================================
+
+06AA ; SWASH_KAF # Lo ARABIC LETTER SWASH KAF
+
+# Total code points: 1
+
+# ================================================
+
+0637..0638 ; TAH # Lo [2] ARABIC LETTER TAH..ARABIC LETTER ZAH
+069F ; TAH # Lo ARABIC LETTER TAH WITH THREE DOTS ABOVE
+
+# Total code points: 3
+
+# ================================================
+
+072C ; TAW # Lo SYRIAC LETTER TAW
+
+# Total code points: 1
+
+# ================================================
+
+0629 ; TEH_MARBUTA # Lo ARABIC LETTER TEH MARBUTA
+06C0 ; TEH_MARBUTA # Lo ARABIC LETTER HEH WITH YEH ABOVE
+06D5 ; TEH_MARBUTA # Lo ARABIC LETTER AE
+
+# Total code points: 3
+
+# ================================================
+
+071B..071C ; TETH # Lo [2] SYRIAC LETTER TETH..SYRIAC LETTER TETH GARSHUNI
+
+# Total code points: 2
+
+# ================================================
+
+0624 ; WAW # Lo ARABIC LETTER WAW WITH HAMZA ABOVE
+0648 ; WAW # Lo ARABIC LETTER WAW
+0676..0677 ; WAW # Lo [2] ARABIC LETTER HIGH HAMZA WAW..ARABIC LETTER U WITH HAMZA ABOVE
+06C4..06CB ; WAW # Lo [8] ARABIC LETTER WAW WITH RING..ARABIC LETTER VE
+06CF ; WAW # Lo ARABIC LETTER WAW WITH DOT ABOVE
+
+# Total code points: 13
+
+# ================================================
+
+0718 ; SYRIAC_WAW # Lo SYRIAC LETTER WAW
+
+# Total code points: 1
+
+# ================================================
+
+0626 ; YEH # Lo ARABIC LETTER YEH WITH HAMZA ABOVE
+0649..064A ; YEH # Lo [2] ARABIC LETTER ALEF MAKSURA..ARABIC LETTER YEH
+0678 ; YEH # Lo ARABIC LETTER HIGH HAMZA YEH
+06CC ; YEH # Lo ARABIC LETTER FARSI YEH
+06CE ; YEH # Lo ARABIC LETTER YEH WITH SMALL V
+06D0..06D1 ; YEH # Lo [2] ARABIC LETTER E..ARABIC LETTER YEH WITH THREE DOTS BELOW
+
+# Total code points: 8
+
+# ================================================
+
+06D2..06D3 ; YEH_BARREE # Lo [2] ARABIC LETTER YEH BARREE..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
+
+# Total code points: 2
+
+# ================================================
+
+06CD ; YEH_WITH_TAIL # Lo ARABIC LETTER YEH WITH TAIL
+
+# Total code points: 1
+
+# ================================================
+
+071D ; YUDH # Lo SYRIAC LETTER YUDH
+
+# Total code points: 1
+
+# ================================================
+
+071E ; YUDH_HE # Lo SYRIAC LETTER YUDH HE
+
+# Total code points: 1
+
+# ================================================
+
+0719 ; ZAIN # Lo SYRIAC LETTER ZAIN
+
+# Total code points: 1
+
+# ================================================
+
+074D ; ZHAIN # Lo SYRIAC LETTER SOGDIAN ZHAIN
+
+# Total code points: 1
+
+# ================================================
+
+074E ; KHAPH # Lo SYRIAC LETTER SOGDIAN KHAPH
+
+# Total code points: 1
+
+# ================================================
+
+074F ; FE # Lo SYRIAC LETTER SOGDIAN FE
+
+# Total code points: 1
+
diff --git a/icu4c/source/data/unidata/DerivedJoiningType.txt b/icu4c/source/data/unidata/DerivedJoiningType.txt
new file mode 100644
index 0000000000..43d5471cd0
--- /dev/null
+++ b/icu4c/source/data/unidata/DerivedJoiningType.txt
@@ -0,0 +1,214 @@
+# DerivedJoiningType-4.0.0.txt
+# Date: 2003-02-19,17:51:39 GMT [MD]
+#
+# Unicode Character Database: Derived Property Data
+# Generated algorithmically from the Unicode Character Database
+# For documentation, see UCD.html
+# Note: Unassigned and Noncharacter codepoints are omitted,
+# except when listing Noncharacter or Cn.
+# ================================================
+
+
+# ================================================
+# Joining Type (listing ArabicShaping.txt, field 1).
+# Type T is derived from Mn + Cf - ZWNJ - ZWJ
+# All other code points have the type U
+# ================================================
+
+0640 ; C # Lm ARABIC TATWEEL
+200D ; C # Cf ZERO WIDTH JOINER
+
+# Total code points: 2
+
+# ================================================
+
+0626 ; D # Lo ARABIC LETTER YEH WITH HAMZA ABOVE
+0628 ; D # Lo ARABIC LETTER BEH
+062A..062E ; D # Lo [5] ARABIC LETTER TEH..ARABIC LETTER KHAH
+0633..063A ; D # Lo [8] ARABIC LETTER SEEN..ARABIC LETTER GHAIN
+0641..0647 ; D # Lo [7] ARABIC LETTER FEH..ARABIC LETTER HEH
+0649..064A ; D # Lo [2] ARABIC LETTER ALEF MAKSURA..ARABIC LETTER YEH
+066E..066F ; D # Lo [2] ARABIC LETTER DOTLESS BEH..ARABIC LETTER DOTLESS QAF
+0678..0687 ; D # Lo [16] ARABIC LETTER HIGH HAMZA YEH..ARABIC LETTER TCHEHEH
+069A..06BF ; D # Lo [38] ARABIC LETTER SEEN WITH DOT BELOW AND DOT ABOVE..ARABIC LETTER TCHEH WITH DOT ABOVE
+06C1 ; D # Lo ARABIC LETTER HEH GOAL
+06CC ; D # Lo ARABIC LETTER FARSI YEH
+06CE ; D # Lo ARABIC LETTER YEH WITH SMALL V
+06D0..06D1 ; D # Lo [2] ARABIC LETTER E..ARABIC LETTER YEH WITH THREE DOTS BELOW
+06FA..06FC ; D # Lo [3] ARABIC LETTER SHEEN WITH DOT BELOW..ARABIC LETTER GHAIN WITH DOT BELOW
+06FF ; D # Lo ARABIC LETTER HEH WITH INVERTED V
+0712..0714 ; D # Lo [3] SYRIAC LETTER BETH..SYRIAC LETTER GAMAL GARSHUNI
+071A..071D ; D # Lo [4] SYRIAC LETTER HETH..SYRIAC LETTER YUDH
+071F..0727 ; D # Lo [9] SYRIAC LETTER KAPH..SYRIAC LETTER REVERSED PE
+0729 ; D # Lo SYRIAC LETTER QAPH
+072B ; D # Lo SYRIAC LETTER SHIN
+072D..072E ; D # Lo [2] SYRIAC LETTER PERSIAN BHETH..SYRIAC LETTER PERSIAN GHAMAL
+074E..074F ; D # Lo [2] SYRIAC LETTER SOGDIAN KHAPH..SYRIAC LETTER SOGDIAN FE
+
+# Total code points: 111
+
+# ================================================
+
+0622..0625 ; R # Lo [4] ARABIC LETTER ALEF WITH MADDA ABOVE..ARABIC LETTER ALEF WITH HAMZA BELOW
+0627 ; R # Lo ARABIC LETTER ALEF
+0629 ; R # Lo ARABIC LETTER TEH MARBUTA
+062F..0632 ; R # Lo [4] ARABIC LETTER DAL..ARABIC LETTER ZAIN
+0648 ; R # Lo ARABIC LETTER WAW
+0671..0673 ; R # Lo [3] ARABIC LETTER ALEF WASLA..ARABIC LETTER ALEF WITH WAVY HAMZA BELOW
+0675..0677 ; R # Lo [3] ARABIC LETTER HIGH HAMZA ALEF..ARABIC LETTER U WITH HAMZA ABOVE
+0688..0699 ; R # Lo [18] ARABIC LETTER DDAL..ARABIC LETTER REH WITH FOUR DOTS ABOVE
+06C0 ; R # Lo ARABIC LETTER HEH WITH YEH ABOVE
+06C2..06CB ; R # Lo [10] ARABIC LETTER HEH GOAL WITH HAMZA ABOVE..ARABIC LETTER VE
+06CD ; R # Lo ARABIC LETTER YEH WITH TAIL
+06CF ; R # Lo ARABIC LETTER WAW WITH DOT ABOVE
+06D2..06D3 ; R # Lo [2] ARABIC LETTER YEH BARREE..ARABIC LETTER YEH BARREE WITH HAMZA ABOVE
+06D5 ; R # Lo ARABIC LETTER AE
+06EE..06EF ; R # Lo [2] ARABIC LETTER DAL WITH INVERTED V..ARABIC LETTER REH WITH INVERTED V
+0710 ; R # Lo SYRIAC LETTER ALAPH
+0715..0719 ; R # Lo [5] SYRIAC LETTER DALATH..SYRIAC LETTER ZAIN
+071E ; R # Lo SYRIAC LETTER YUDH HE
+0728 ; R # Lo SYRIAC LETTER SADHE
+072A ; R # Lo SYRIAC LETTER RISH
+072C ; R # Lo SYRIAC LETTER TAW
+072F ; R # Lo SYRIAC LETTER PERSIAN DHALATH
+074D ; R # Lo SYRIAC LETTER SOGDIAN ZHAIN
+
+# Total code points: 65
+
+# ================================================
+
+
+# Total code points: 0
+
+# ================================================
+
+00AD ; T # Cf SOFT HYPHEN
+0300..0357 ; T # Mn [88] COMBINING GRAVE ACCENT..COMBINING RIGHT HALF RING ABOVE
+035D..036F ; T # Mn [19] COMBINING DOUBLE BREVE..COMBINING LATIN SMALL LETTER X
+0483..0486 ; T # Mn [4] COMBINING CYRILLIC TITLO..COMBINING CYRILLIC PSILI PNEUMATA
+0591..05A1 ; T # Mn [17] HEBREW ACCENT ETNAHTA..HEBREW ACCENT PAZER
+05A3..05B9 ; T # Mn [23] HEBREW ACCENT MUNAH..HEBREW POINT HOLAM
+05BB..05BD ; T # Mn [3] HEBREW POINT QUBUTS..HEBREW POINT METEG
+05BF ; T # Mn HEBREW POINT RAFE
+05C1..05C2 ; T # Mn [2] HEBREW POINT SHIN DOT..HEBREW POINT SIN DOT
+05C4 ; T # Mn HEBREW MARK UPPER DOT
+0600..0603 ; T # Cf [4] ARABIC NUMBER SIGN..ARABIC SIGN SAFHA
+0610..0615 ; T # Mn [6] ARABIC SIGN SALLALLAHOU ALAYHE WASSALLAM..ARABIC SMALL HIGH TAH
+064B..0658 ; T # Mn [14] ARABIC FATHATAN..ARABIC MARK NOON GHUNNA
+0670 ; T # Mn ARABIC LETTER SUPERSCRIPT ALEF
+06D6..06DC ; T # Mn [7] ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA..ARABIC SMALL HIGH SEEN
+06DD ; T # Cf ARABIC END OF AYAH
+06DF..06E4 ; T # Mn [6] ARABIC SMALL HIGH ROUNDED ZERO..ARABIC SMALL HIGH MADDA
+06E7..06E8 ; T # Mn [2] ARABIC SMALL HIGH YEH..ARABIC SMALL HIGH NOON
+06EA..06ED ; T # Mn [4] ARABIC EMPTY CENTRE LOW STOP..ARABIC SMALL LOW MEEM
+070F ; T # Cf SYRIAC ABBREVIATION MARK
+0711 ; T # Mn SYRIAC LETTER SUPERSCRIPT ALAPH
+0730..074A ; T # Mn [27] SYRIAC PTHAHA ABOVE..SYRIAC BARREKH
+07A6..07B0 ; T # Mn [11] THAANA ABAFILI..THAANA SUKUN
+0901..0902 ; T # Mn [2] DEVANAGARI SIGN CANDRABINDU..DEVANAGARI SIGN ANUSVARA
+093C ; T # Mn DEVANAGARI SIGN NUKTA
+0941..0948 ; T # Mn [8] DEVANAGARI VOWEL SIGN U..DEVANAGARI VOWEL SIGN AI
+094D ; T # Mn DEVANAGARI SIGN VIRAMA
+0951..0954 ; T # Mn [4] DEVANAGARI STRESS SIGN UDATTA..DEVANAGARI ACUTE ACCENT
+0962..0963 ; T # Mn [2] DEVANAGARI VOWEL SIGN VOCALIC L..DEVANAGARI VOWEL SIGN VOCALIC LL
+0981 ; T # Mn BENGALI SIGN CANDRABINDU
+09BC ; T # Mn BENGALI SIGN NUKTA
+09C1..09C4 ; T # Mn [4] BENGALI VOWEL SIGN U..BENGALI VOWEL SIGN VOCALIC RR
+09CD ; T # Mn BENGALI SIGN VIRAMA
+09E2..09E3 ; T # Mn [2] BENGALI VOWEL SIGN VOCALIC L..BENGALI VOWEL SIGN VOCALIC LL
+0A01..0A02 ; T # Mn [2] GURMUKHI SIGN ADAK BINDI..GURMUKHI SIGN BINDI
+0A3C ; T # Mn GURMUKHI SIGN NUKTA
+0A41..0A42 ; T # Mn [2] GURMUKHI VOWEL SIGN U..GURMUKHI VOWEL SIGN UU
+0A47..0A48 ; T # Mn [2] GURMUKHI VOWEL SIGN EE..GURMUKHI VOWEL SIGN AI
+0A4B..0A4D ; T # Mn [3] GURMUKHI VOWEL SIGN OO..GURMUKHI SIGN VIRAMA
+0A70..0A71 ; T # Mn [2] GURMUKHI TIPPI..GURMUKHI ADDAK
+0A81..0A82 ; T # Mn [2] GUJARATI SIGN CANDRABINDU..GUJARATI SIGN ANUSVARA
+0ABC ; T # Mn GUJARATI SIGN NUKTA
+0AC1..0AC5 ; T # Mn [5] GUJARATI VOWEL SIGN U..GUJARATI VOWEL SIGN CANDRA E
+0AC7..0AC8 ; T # Mn [2] GUJARATI VOWEL SIGN E..GUJARATI VOWEL SIGN AI
+0ACD ; T # Mn GUJARATI SIGN VIRAMA
+0AE2..0AE3 ; T # Mn [2] GUJARATI VOWEL SIGN VOCALIC L..GUJARATI VOWEL SIGN VOCALIC LL
+0B01 ; T # Mn ORIYA SIGN CANDRABINDU
+0B3C ; T # Mn ORIYA SIGN NUKTA
+0B3F ; T # Mn ORIYA VOWEL SIGN I
+0B41..0B43 ; T # Mn [3] ORIYA VOWEL SIGN U..ORIYA VOWEL SIGN VOCALIC R
+0B4D ; T # Mn ORIYA SIGN VIRAMA
+0B56 ; T # Mn ORIYA AI LENGTH MARK
+0B82 ; T # Mn TAMIL SIGN ANUSVARA
+0BC0 ; T # Mn TAMIL VOWEL SIGN II
+0BCD ; T # Mn TAMIL SIGN VIRAMA
+0C3E..0C40 ; T # Mn [3] TELUGU VOWEL SIGN AA..TELUGU VOWEL SIGN II
+0C46..0C48 ; T # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
+0C4A..0C4D ; T # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
+0C55..0C56 ; T # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
+0CBC ; T # Mn KANNADA SIGN NUKTA
+0CBF ; T # Mn KANNADA VOWEL SIGN I
+0CC6 ; T # Mn KANNADA VOWEL SIGN E
+0CCC..0CCD ; T # Mn [2] KANNADA VOWEL SIGN AU..KANNADA SIGN VIRAMA
+0D41..0D43 ; T # Mn [3] MALAYALAM VOWEL SIGN U..MALAYALAM VOWEL SIGN VOCALIC R
+0D4D ; T # Mn MALAYALAM SIGN VIRAMA
+0DCA ; T # Mn SINHALA SIGN AL-LAKUNA
+0DD2..0DD4 ; T # Mn [3] SINHALA VOWEL SIGN KETTI IS-PILLA..SINHALA VOWEL SIGN KETTI PAA-PILLA
+0DD6 ; T # Mn SINHALA VOWEL SIGN DIGA PAA-PILLA
+0E31 ; T # Mn THAI CHARACTER MAI HAN-AKAT
+0E34..0E3A ; T # Mn [7] THAI CHARACTER SARA I..THAI CHARACTER PHINTHU
+0E47..0E4E ; T # Mn [8] THAI CHARACTER MAITAIKHU..THAI CHARACTER YAMAKKAN
+0EB1 ; T # Mn LAO VOWEL SIGN MAI KAN
+0EB4..0EB9 ; T # Mn [6] LAO VOWEL SIGN I..LAO VOWEL SIGN UU
+0EBB..0EBC ; T # Mn [2] LAO VOWEL SIGN MAI KON..LAO SEMIVOWEL SIGN LO
+0EC8..0ECD ; T # Mn [6] LAO TONE MAI EK..LAO NIGGAHITA
+0F18..0F19 ; T # Mn [2] TIBETAN ASTROLOGICAL SIGN -KHYUD PA..TIBETAN ASTROLOGICAL SIGN SDONG TSHUGS
+0F35 ; T # Mn TIBETAN MARK NGAS BZUNG NYI ZLA
+0F37 ; T # Mn TIBETAN MARK NGAS BZUNG SGOR RTAGS
+0F39 ; T # Mn TIBETAN MARK TSA -PHRU
+0F71..0F7E ; T # Mn [14] TIBETAN VOWEL SIGN AA..TIBETAN SIGN RJES SU NGA RO
+0F80..0F84 ; T # Mn [5] TIBETAN VOWEL SIGN REVERSED I..TIBETAN MARK HALANTA
+0F86..0F87 ; T # Mn [2] TIBETAN SIGN LCI RTAGS..TIBETAN SIGN YANG RTAGS
+0F90..0F97 ; T # Mn [8] TIBETAN SUBJOINED LETTER KA..TIBETAN SUBJOINED LETTER JA
+0F99..0FBC ; T # Mn [36] TIBETAN SUBJOINED LETTER NYA..TIBETAN SUBJOINED LETTER FIXED-FORM RA
+0FC6 ; T # Mn TIBETAN SYMBOL PADMA GDAN
+102D..1030 ; T # Mn [4] MYANMAR VOWEL SIGN I..MYANMAR VOWEL SIGN UU
+1032 ; T # Mn MYANMAR VOWEL SIGN AI
+1036..1037 ; T # Mn [2] MYANMAR SIGN ANUSVARA..MYANMAR SIGN DOT BELOW
+1039 ; T # Mn MYANMAR SIGN VIRAMA
+1058..1059 ; T # Mn [2] MYANMAR VOWEL SIGN VOCALIC L..MYANMAR VOWEL SIGN VOCALIC LL
+1712..1714 ; T # Mn [3] TAGALOG VOWEL SIGN I..TAGALOG SIGN VIRAMA
+1732..1734 ; T # Mn [3] HANUNOO VOWEL SIGN I..HANUNOO SIGN PAMUDPOD
+1752..1753 ; T # Mn [2] BUHID VOWEL SIGN I..BUHID VOWEL SIGN U
+1772..1773 ; T # Mn [2] TAGBANWA VOWEL SIGN I..TAGBANWA VOWEL SIGN U
+17B4..17B5 ; T # Cf [2] KHMER VOWEL INHERENT AQ..KHMER VOWEL INHERENT AA
+17B7..17BD ; T # Mn [7] KHMER VOWEL SIGN I..KHMER VOWEL SIGN UA
+17C6 ; T # Mn KHMER SIGN NIKAHIT
+17C9..17D3 ; T # Mn [11] KHMER SIGN MUUSIKATOAN..KHMER SIGN BATHAMASAT
+17DD ; T # Mn KHMER SIGN ATTHACAN
+180B..180D ; T # Mn [3] MONGOLIAN FREE VARIATION SELECTOR ONE..MONGOLIAN FREE VARIATION SELECTOR THREE
+18A9 ; T # Mn MONGOLIAN LETTER ALI GALI DAGALGA
+1920..1922 ; T # Mn [3] LIMBU VOWEL SIGN A..LIMBU VOWEL SIGN U
+1927..1928 ; T # Mn [2] LIMBU VOWEL SIGN E..LIMBU VOWEL SIGN O
+1932 ; T # Mn LIMBU SMALL LETTER ANUSVARA
+1939..193B ; T # Mn [3] LIMBU SIGN MUKPHRENG..LIMBU SIGN SA-I
+200E..200F ; T # Cf [2] LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
+202A..202E ; T # Cf [5] LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
+2060..2063 ; T # Cf [4] WORD JOINER..INVISIBLE SEPARATOR
+206A..206F ; T # Cf [6] INHIBIT SYMMETRIC SWAPPING..NOMINAL DIGIT SHAPES
+20D0..20DC ; T # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
+20E1 ; T # Mn COMBINING LEFT RIGHT ARROW ABOVE
+20E5..20EA ; T # Mn [6] COMBINING REVERSE SOLIDUS OVERLAY..COMBINING LEFTWARDS ARROW OVERLAY
+302A..302F ; T # Mn [6] IDEOGRAPHIC LEVEL TONE MARK..HANGUL DOUBLE DOT TONE MARK
+3099..309A ; T # Mn [2] COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK..COMBINING KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK
+FB1E ; T # Mn HEBREW POINT JUDEO-SPANISH VARIKA
+FE00..FE0F ; T # Mn [16] VARIATION SELECTOR-1..VARIATION SELECTOR-16
+FE20..FE23 ; T # Mn [4] COMBINING LIGATURE LEFT HALF..COMBINING DOUBLE TILDE RIGHT HALF
+FEFF ; T # Cf ZERO WIDTH NO-BREAK SPACE
+FFF9..FFFB ; T # Cf [3] INTERLINEAR ANNOTATION ANCHOR..INTERLINEAR ANNOTATION TERMINATOR
+1D167..1D169 ; T # Mn [3] MUSICAL SYMBOL COMBINING TREMOLO-1..MUSICAL SYMBOL COMBINING TREMOLO-3
+1D173..1D17A ; T # Cf [8] MUSICAL SYMBOL BEGIN BEAM..MUSICAL SYMBOL END PHRASE
+1D17B..1D182 ; T # Mn [8] MUSICAL SYMBOL COMBINING ACCENT..MUSICAL SYMBOL COMBINING LOURE
+1D185..1D18B ; T # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
+1D1AA..1D1AD ; T # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
+E0001 ; T # Cf LANGUAGE TAG
+E0020..E007F ; T # Cf [96] TAG SPACE..CANCEL TAG
+E0100..E01EF ; T # Mn [240] VARIATION SELECTOR-17..VARIATION SELECTOR-256
+
+# Total code points: 927
+
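The header of the deleted ArabicShaping.txt says that characters of joining type T are not listed but derived by the formula T = Mn + Cf - ZWNJ - ZWJ, with U as the fallback for anything not listed explicitly. That derivation is the work DerivedJoiningType.txt now does ahead of time. The following minimal Python sketch (not ICU code) shows the lookup order under stated assumptions. The helper names `parse_shaping_line`, `is_transparent` and `derived_joining_type` and the sample lines are hypothetical. The standard `unicodedata` module reflects the Unicode version bundled with the interpreter rather than 4.0.0, and later Unicode versions have adjusted this derivation, so results may differ from the 4.0.0 files for some characters.

```python
import unicodedata

ZWNJ = 0x200C  # U+200C ZERO WIDTH NON-JOINER: joining type U
ZWJ = 0x200D   # U+200D ZERO WIDTH JOINER: joining type C (listed explicitly)

# A few lines in the ArabicShaping.txt format (hypothetical sample subset).
SAMPLE_LINES = [
    "# Unicode; Schematic Name; Joining Type; Joining Group",
    "0621; HAMZA; U; ",
    "0627; ALEF; R; ALEF",
    "0628; BEH; D; BEH",
    "0640; TATWEEL; C; ",
    "200D; ZERO WIDTH JOINER; C; ",
]


def parse_shaping_line(line):
    """Split 'code point; schematic name; joining type; joining group'.

    Returns None for comment and blank lines; the joining group may be empty.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    return int(fields[0], 16), fields[1], fields[2], fields[3]


def is_transparent(cp):
    """T = Mn + Cf - ZWNJ - ZWJ, using the interpreter's Unicode data."""
    if cp in (ZWNJ, ZWJ):
        return False
    return unicodedata.category(chr(cp)) in ("Mn", "Cf")


def derived_joining_type(cp, explicit):
    """Explicit entries win, then the T formula, otherwise U."""
    if cp in explicit:
        return explicit[cp]
    return "T" if is_transparent(cp) else "U"


# Build the table of explicitly listed joining types from the sample lines.
explicit = {}
for line in SAMPLE_LINES:
    entry = parse_shaping_line(line)
    if entry is not None:
        cp, _name, jtype, _group = entry
        explicit[cp] = jtype

for cp in (0x0300, 0x00AD, 0x0628, 0x200C, 0x200D, 0x0041):
    print(f"U+{cp:04X} -> {derived_joining_type(cp, explicit)}")
```

With any recent Python, this prints T for U+0300 (Mn) and U+00AD (Cf), D for U+0628 (listed explicitly), U for U+200C (excluded from the formula and not listed), C for U+200D, and U for U+0041.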
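In both derived files, each section ends with a "# Total code points: N" line that sums the single code points and START..END ranges listed above it. A small checker can recompute those totals. The sketch below is a minimal Python illustration. It assumes it is fed the lines of the file itself, not the diff with its leading "+", and the regex, `count_code_points` and the sample text are hypothetical rather than part of ICU or the UCD tooling.

```python
import re
from collections import Counter

# Data lines look like "0639..063A ; AIN # Lo [2] ARABIC LETTER AIN..."
# (a single code point or a START..END range, then the property value).
DATA_LINE = re.compile(r"([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")


def count_code_points(lines):
    """Return {property value: number of code points} for derived-file lines."""
    totals = Counter()
    for line in lines:
        m = DATA_LINE.match(line.strip())
        if m is None:  # comments, separators, blank lines
            continue
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        totals[m.group(3)] += end - start + 1
    return totals


SAMPLE = """\
0639..063A ; AIN # Lo [2] ARABIC LETTER AIN..ARABIC LETTER GHAIN
06A0 ; AIN # Lo ARABIC LETTER AIN WITH THREE DOTS ABOVE
06FC ; AIN # Lo ARABIC LETTER GHAIN WITH DOT BELOW

# Total code points: 4
"""

print(count_code_points(SAMPLE.splitlines()))
```

Comparing each recomputed value with the following "# Total code points" line catches a mistyped range bound. For the AIN section above, the count is 4, which matches the file.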