Update to Unicode 8.0.0.

Update __STDC_ISO_10646__ to 201505L for Unicode 8.0.0.
Update character encoding, ctype, and transliteration tables.
New scripts autogenerate transliteration tables.
Mike FABIAN 2015-12-10 00:30:51 -05:00 committed by Carlos O'Donell
parent 589ac52328
commit 23256f5ed8
17 changed files with 5986 additions and 1363 deletions

@ -1,3 +1,9 @@
2015-12-09 Mike FABIAN <mfabian@redhat.com>
[BZ 18568]
* include/stdc-predef.h (__STDC_ISO_10646__): Update to
201505L, for Unicode 8.
2015-12-09 Carlos O'Donell <carlos@redhat.com>
* locale/C-translit.h: Regenerate.

NEWS
@ -7,6 +7,12 @@ using `glibc' in the "product" field.
Version 2.23
* Unicode 8.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 8.0.0, using new
and/or improved generator scripts contributed by Mike FABIAN (Red Hat).
These updates cause user visible changes, such as the fixes for bugs
89, 16061, and 18568.
* sched_setaffinity, pthread_setaffinity_np no longer attempt to guess the
kernel-internal CPU set size. This means that requests that change the
CPU affinity which failed before (for example, an all-ones CPU mask) will

@ -49,14 +49,10 @@
# define __STDC_IEC_559_COMPLEX__ 1
#endif
/* wchar_t uses Unicode 7.0.0. Version 7.0 of the Unicode Standard is
synchronized with ISO/IEC 10646:2012, plus Amendments 1 (published
on April, 2013) and 2 (not yet published as of February, 2015).
Additionally, it includes the accelerated publication of U+20BD
RUBLE SIGN. Therefore Unicode 7.0.0 is between 10646:2012 and
10646:2014, and so we use the date ISO/IEC 10646:2012 Amd.1 was
published. */
#define __STDC_ISO_10646__ 201304L
/* wchar_t uses Unicode 8.0.0. Version 8.0 of the Unicode Standard is
synchronized with ISO/IEC 10646:2014, plus Amendment 1 (published
2015-05-15). */
#define __STDC_ISO_10646__ 201505L
/* We do not support C11 <threads.h>. */
#define __STDC_NO_THREADS__ 1
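
The value of `__STDC_ISO_10646__` has the `yyyymmL` form, so the old and new values can be read as a year and a month. A minimal Python sketch of that arithmetic follows; the helper name `split_iso_10646_date` is hypothetical and not part of glibc.

```python
def split_iso_10646_date(value):
    """Split a yyyymm integer (the macro value without its L suffix)
    into a (year, month) pair."""
    year, month = divmod(value, 100)
    return year, month


# 201505 corresponds to the 2015-05-15 date quoted in the new comment,
# 201304 to the April 2013 date quoted in the old one.
print(split_iso_10646_date(201505))
print(split_iso_10646_date(201304))
```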

@ -1,5 +1,21 @@
2015-12-09 Mike FABIAN <mfabian@redhat.com>
[BZ 18568]
* unicode-gen/Makefile (UNICODE_VERSION): Set to 8.0.0.
* unicode-gen/UnicodeData.txt: Update to Unicode 8.0.0 release.
* unicode-gen/DerivedCoreProperties.txt: Likewise.
* unicode-gen/EastAsianWidth.txt: Likewise.
* unicode-gen/gen_translit_combining.py (is_combining_remove):
Ignore AHOM or SIGNWRITING combining characters.
* charmaps/UTF-8: Regenerate.
* locales/i18n: Regenerate.
* locales/translit_circle: Regenerate.
* locales/translit_cjk_compat: Regenerate.
* locales/translit_combining: Regenerate.
* locales/translit_compat: Regenerate.
* locales/translit_font: Regenerate.
* locales/translit_fraction: Regenerate.
[BZ #89]
* locales/da_DK: Add more transliteration rules.
* locales/nb_NO: Likewise.

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -2,7 +2,7 @@ escape_char /
comment_char %
% Transliterations of encircled characters.
% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_circle.py on 2015-12-09 for Unicode 8.0.0.
LC_CTYPE

@ -2,7 +2,7 @@ escape_char /
comment_char %
% Transliterations of CJK compatibility characters.
% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_cjk_compat.py on 2015-12-09 for Unicode 8.0.0.
LC_CTYPE

@ -3,7 +3,7 @@ comment_char %
% Transliterations that remove all combining characters (accents,
% pronounciation marks, etc.).
% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_combining.py on 2015-12-09 for Unicode 8.0.0.
LC_CTYPE
@ -439,6 +439,8 @@ translit_start
<U06EC> ""
% ARABIC SMALL LOW MEEM
<U06ED> ""
% ARABIC TURNED DAMMA BELOW
<U08E3> ""
% ARABIC CURLY FATHA
<U08E4> ""
% ARABIC CURLY DAMMA

@ -2,7 +2,7 @@ escape_char /
comment_char %
% Transliterations of compatibility characters and ligatures.
% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_compat.py on 2015-12-09 for Unicode 8.0.0.
LC_CTYPE

@ -2,7 +2,7 @@ escape_char /
comment_char %
% Transliterations of font equivalents.
% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_font.py on 2015-12-09 for Unicode 8.0.0.
LC_CTYPE

@ -2,7 +2,7 @@ escape_char /
comment_char %
% Transliterations of fractions.
% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2015-12-09 for Unicode 7.0.0.
% Generated automatically from UnicodeData.txt by gen_translit_fraction.py on 2015-12-09 for Unicode 8.0.0.
% The replacements have been surrounded with spaces, because fractions are
% often preceded by a decimal number and followed by a unit or a math symbol.

File diff suppressed because it is too large

@ -1,12 +1,12 @@
# EastAsianWidth-7.0.0.txt
# Date: 2014-02-28, 23:15:00 GMT [KW, LI]
# EastAsianWidth-8.0.0.txt
# Date: 2015-02-10, 21:00:00 GMT [KW, LI]
#
# East_Asian_Width Property
#
# This file is an informative contributory data file in the
# Unicode Character Database.
#
# Copyright (c) 1991-2014 Unicode, Inc.
# Copyright (c) 1991-2015 Unicode, Inc.
# For terms of use, see http://www.unicode.org/terms_of_use.html
#
# The format is two fields separated by a semicolon.
@ -23,6 +23,7 @@
# CJK Unified Ideographs Extension B: U+20000..U+2A6DF
# CJK Unified Ideographs Extension C: U+2A700..U+2B73F
# CJK Unified Ideographs Extension D: U+2B740..U+2B81F
# CJK Unified Ideographs Extension E: U+2B820..U+2CEAF
# CJK Compatibility Ideographs Supplement: U+2F800..U+2FA1F
# and any other reserved code points on
# Planes 2 and 3: U+20000..U+2FFFD
@ -328,8 +329,8 @@
0840..0858;N # Lo [25] MANDAIC LETTER HALQA..MANDAIC LETTER AIN
0859..085B;N # Mn [3] MANDAIC AFFRICATION MARK..MANDAIC GEMINATION MARK
085E;N # Po MANDAIC PUNCTUATION
08A0..08B2;N # Lo [19] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER ZAIN WITH INVERTED V ABOVE
08E4..08FF;N # Mn [28] ARABIC CURLY FATHA..ARABIC MARK SIDEWAYS NOON GHUNNA
08A0..08B4;N # Lo [21] ARABIC LETTER BEH WITH SMALL V BELOW..ARABIC LETTER KAF WITH DOT BELOW
08E3..08FF;N # Mn [29] ARABIC TURNED DAMMA BELOW..ARABIC MARK SIDEWAYS NOON GHUNNA
0900..0902;N # Mn [3] DEVANAGARI SIGN INVERTED CANDRABINDU..DEVANAGARI SIGN ANUSVARA
0903;N # Mc DEVANAGARI SIGN VISARGA
0904..0939;N # Lo [54] DEVANAGARI LETTER SHORT A..DEVANAGARI LETTER HA
@ -421,6 +422,7 @@
0AE6..0AEF;N # Nd [10] GUJARATI DIGIT ZERO..GUJARATI DIGIT NINE
0AF0;N # Po GUJARATI ABBREVIATION SIGN
0AF1;N # Sc GUJARATI RUPEE SIGN
0AF9;N # Lo GUJARATI LETTER ZHA
0B01;N # Mn ORIYA SIGN CANDRABINDU
0B02..0B03;N # Mc [2] ORIYA SIGN ANUSVARA..ORIYA SIGN VISARGA
0B05..0B0C;N # Lo [8] ORIYA LETTER A..ORIYA LETTER VOCALIC L
@ -483,7 +485,7 @@
0C46..0C48;N # Mn [3] TELUGU VOWEL SIGN E..TELUGU VOWEL SIGN AI
0C4A..0C4D;N # Mn [4] TELUGU VOWEL SIGN O..TELUGU SIGN VIRAMA
0C55..0C56;N # Mn [2] TELUGU LENGTH MARK..TELUGU AI LENGTH MARK
0C58..0C59;N # Lo [2] TELUGU LETTER TSA..TELUGU LETTER DZA
0C58..0C5A;N # Lo [3] TELUGU LETTER TSA..TELUGU LETTER RRRA
0C60..0C61;N # Lo [2] TELUGU LETTER VOCALIC RR..TELUGU LETTER VOCALIC LL
0C62..0C63;N # Mn [2] TELUGU VOWEL SIGN VOCALIC L..TELUGU VOWEL SIGN VOCALIC LL
0C66..0C6F;N # Nd [10] TELUGU DIGIT ZERO..TELUGU DIGIT NINE
@ -524,7 +526,7 @@
0D4D;N # Mn MALAYALAM SIGN VIRAMA
0D4E;N # Lo MALAYALAM LETTER DOT REPH
0D57;N # Mc MALAYALAM AU LENGTH MARK
0D60..0D61;N # Lo [2] MALAYALAM LETTER VOCALIC RR..MALAYALAM LETTER VOCALIC LL
0D5F..0D61;N # Lo [3] MALAYALAM LETTER ARCHAIC II..MALAYALAM LETTER VOCALIC LL
0D62..0D63;N # Mn [2] MALAYALAM VOWEL SIGN VOCALIC L..MALAYALAM VOWEL SIGN VOCALIC LL
0D66..0D6F;N # Nd [10] MALAYALAM DIGIT ZERO..MALAYALAM DIGIT NINE
0D70..0D75;N # No [6] MALAYALAM NUMBER TEN..MALAYALAM FRACTION THREE QUARTERS
@ -680,7 +682,8 @@
1369..137C;N # No [20] ETHIOPIC DIGIT ONE..ETHIOPIC NUMBER TEN THOUSAND
1380..138F;N # Lo [16] ETHIOPIC SYLLABLE SEBATBEIT MWA..ETHIOPIC SYLLABLE PWE
1390..1399;N # So [10] ETHIOPIC TONAL MARK YIZET..ETHIOPIC TONAL MARK KURT
13A0..13F4;N # Lo [85] CHEROKEE LETTER A..CHEROKEE LETTER YV
13A0..13F5;N # Lu [86] CHEROKEE LETTER A..CHEROKEE LETTER MV
13F8..13FD;N # Ll [6] CHEROKEE SMALL LETTER YE..CHEROKEE SMALL LETTER MV
1400;N # Pd CANADIAN SYLLABICS HYPHEN
1401..166C;N # Lo [620] CANADIAN SYLLABICS E..CANADIAN SYLLABICS CARRIER TTSA
166D..166E;N # Po [2] CANADIAN SYLLABICS CHI SIGN..CANADIAN SYLLABICS FULL STOP
@ -748,9 +751,7 @@
1950..196D;N # Lo [30] TAI LE LETTER KA..TAI LE LETTER AI
1970..1974;N # Lo [5] TAI LE LETTER TONE-2..TAI LE LETTER TONE-6
1980..19AB;N # Lo [44] NEW TAI LUE LETTER HIGH QA..NEW TAI LUE LETTER LOW SUA
19B0..19C0;N # Mc [17] NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW TAI LUE VOWEL SIGN IY
19C1..19C7;N # Lo [7] NEW TAI LUE LETTER FINAL V..NEW TAI LUE LETTER FINAL B
19C8..19C9;N # Mc [2] NEW TAI LUE TONE MARK-1..NEW TAI LUE TONE MARK-2
19B0..19C9;N # Lo [26] NEW TAI LUE VOWEL SIGN VOWEL SHORTENER..NEW TAI LUE TONE MARK-2
19D0..19D9;N # Nd [10] NEW TAI LUE DIGIT ZERO..NEW TAI LUE DIGIT NINE
19DA;N # No NEW TAI LUE THAM DIGIT ONE
19DE..19DF;N # So [2] NEW TAI LUE SIGN LAE..NEW TAI LUE SIGN LAEV
@ -944,7 +945,7 @@
20A9;H # Sc WON SIGN
20AA..20AB;N # Sc [2] NEW SHEQEL SIGN..DONG SIGN
20AC;A # Sc EURO SIGN
20AD..20BD;N # Sc [17] KIP SIGN..RUBLE SIGN
20AD..20BE;N # Sc [18] KIP SIGN..LARI SIGN
20D0..20DC;N # Mn [13] COMBINING LEFT HARPOON ABOVE..COMBINING FOUR DOTS ABOVE
20DD..20E0;N # Me [4] COMBINING ENCLOSING CIRCLE..COMBINING ENCLOSING CIRCLE BACKSLASH
20E1;N # Mn COMBINING LEFT RIGHT ARROW ABOVE
@ -1004,6 +1005,7 @@
2183..2184;N # L& [2] ROMAN NUMERAL REVERSED ONE HUNDRED..LATIN SMALL LETTER REVERSED C
2185..2188;N # Nl [4] ROMAN NUMERAL SIX LATE FORM..ROMAN NUMERAL ONE HUNDRED THOUSAND
2189;A # No VULGAR FRACTION ZERO THIRDS
218A..218B;N # So [2] TURNED DIGIT TWO..TURNED DIGIT THREE
2190..2194;A # Sm [5] LEFTWARDS ARROW..LEFT RIGHT ARROW
2195..2199;A # So [5] UP DOWN ARROW..SOUTH WEST ARROW
219A..219B;N # Sm [2] LEFTWARDS ARROW WITH STROKE..RIGHTWARDS ARROW WITH STROKE
@ -1262,6 +1264,7 @@
2B98..2BB9;N # So [34] THREE-D TOP-LIGHTED LEFTWARDS EQUILATERAL ARROWHEAD..UP ARROWHEAD IN A RECTANGLE BOX
2BBD..2BC8;N # So [12] BALLOT BOX WITH LIGHT X..BLACK MEDIUM RIGHT-POINTING TRIANGLE CENTRED
2BCA..2BD1;N # So [8] TOP HALF BLACK CIRCLE..UNCERTAINTY SIGN
2BEC..2BEF;N # So [4] LEFTWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS..DOWNWARDS TWO-HEADED ARROW WITH TRIANGLE ARROWHEADS
2C00..2C2E;N # Lu [47] GLAGOLITIC CAPITAL LETTER AZU..GLAGOLITIC CAPITAL LETTER LATINATE MYSLITE
2C30..2C5E;N # Ll [47] GLAGOLITIC SMALL LETTER AZU..GLAGOLITIC SMALL LETTER LATINATE MYSLITE
2C60..2C7B;N # L& [28] LATIN CAPITAL LETTER L WITH DOUBLE BAR..LATIN LETTER SMALL CAPITAL TURNED E
@ -1407,8 +1410,8 @@
3400..4DB5;W # Lo [6582] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DB5
4DB6..4DBF;W # Cn [10] <reserved-4DB6>..<reserved-4DBF>
4DC0..4DFF;N # So [64] HEXAGRAM FOR THE CREATIVE HEAVEN..HEXAGRAM FOR BEFORE COMPLETION
4E00..9FCC;W # Lo [20941] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FCC
9FCD..9FFF;W # Cn [51] <reserved-9FCD>..<reserved-9FFF>
4E00..9FD5;W # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5
9FD6..9FFF;W # Cn [42] <reserved-9FD6>..<reserved-9FFF>
A000..A014;W # Lo [21] YI SYLLABLE IT..YI SYLLABLE E
A015;W # Lm YI SYLLABLE WU
A016..A48C;W # Lo [1143] YI SYLLABLE BIT..YI SYLLABLE YYR
@ -1432,7 +1435,7 @@ A67E;N # Po CYRILLIC KAVYKA
A67F;N # Lm CYRILLIC PAYEROK
A680..A69B;N # L& [28] CYRILLIC CAPITAL LETTER DWE..CYRILLIC SMALL LETTER CROSSED O
A69C..A69D;N # Lm [2] MODIFIER LETTER CYRILLIC HARD SIGN..MODIFIER LETTER CYRILLIC SOFT SIGN
A69F;N # Mn COMBINING CYRILLIC LETTER IOTIFIED E
A69E..A69F;N # Mn [2] COMBINING CYRILLIC LETTER EF..COMBINING CYRILLIC LETTER IOTIFIED E
A6A0..A6E5;N # Lo [70] BAMUM LETTER A..BAMUM LETTER KI
A6E6..A6EF;N # Nl [10] BAMUM LETTER MO..BAMUM LETTER KOGHOM
A6F0..A6F1;N # Mn [2] BAMUM COMBINING MARK KOQNDON..BAMUM COMBINING MARK TUKWENTIS
@ -1446,8 +1449,9 @@ A771..A787;N # L& [23] LATIN SMALL LETTER DUM..LATIN SMALL LETTER INSULAR
A788;N # Lm MODIFIER LETTER LOW CIRCUMFLEX ACCENT
A789..A78A;N # Sk [2] MODIFIER LETTER COLON..MODIFIER LETTER SHORT EQUALS SIGN
A78B..A78E;N # L& [4] LATIN CAPITAL LETTER SALTILLO..LATIN SMALL LETTER L WITH RETROFLEX HOOK AND BELT
A78F;N # Lo LATIN LETTER SINOLOGICAL DOT
A790..A7AD;N # L& [30] LATIN CAPITAL LETTER N WITH DESCENDER..LATIN CAPITAL LETTER L WITH BELT
A7B0..A7B1;N # Lu [2] LATIN CAPITAL LETTER TURNED K..LATIN CAPITAL LETTER TURNED T
A7B0..A7B7;N # L& [8] LATIN CAPITAL LETTER TURNED K..LATIN SMALL LETTER OMEGA
A7F7;N # Lo LATIN EPIGRAPHIC LETTER SIDEWAYS I
A7F8..A7F9;N # Lm [2] MODIFIER LETTER CAPITAL H WITH STROKE..MODIFIER LETTER SMALL LIGATURE OE
A7FA;N # Ll LATIN LETTER SMALL CAPITAL TURNED M
@ -1479,6 +1483,8 @@ A8E0..A8F1;N # Mn [18] COMBINING DEVANAGARI DIGIT ZERO..COMBINING DEVANAG
A8F2..A8F7;N # Lo [6] DEVANAGARI SIGN SPACING CANDRABINDU..DEVANAGARI SIGN CANDRABINDU AVAGRAHA
A8F8..A8FA;N # Po [3] DEVANAGARI SIGN PUSHPIKA..DEVANAGARI CARET
A8FB;N # Lo DEVANAGARI HEADSTROKE
A8FC;N # Po DEVANAGARI SIGN SIDDHAM
A8FD;N # Lo DEVANAGARI JAIN OM
A900..A909;N # Nd [10] KAYAH LI DIGIT ZERO..KAYAH LI DIGIT NINE
A90A..A925;N # Lo [28] KAYAH LI LETTER KA..KAYAH LI LETTER OO
A926..A92D;N # Mn [8] KAYAH LI VOWEL UE..KAYAH LI TONE CALYA PLOPHU
@ -1560,7 +1566,8 @@ AB28..AB2E;N # Lo [7] ETHIOPIC SYLLABLE BBA..ETHIOPIC SYLLABLE BBO
AB30..AB5A;N # Ll [43] LATIN SMALL LETTER BARRED ALPHA..LATIN SMALL LETTER Y WITH SHORT RIGHT LEG
AB5B;N # Sk MODIFIER BREVE WITH INVERTED BREVE
AB5C..AB5F;N # Lm [4] MODIFIER LETTER SMALL HENG..MODIFIER LETTER SMALL U WITH LEFT HOOK
AB64..AB65;N # Ll [2] LATIN SMALL LETTER INVERTED ALPHA..GREEK LETTER SMALL CAPITAL OMEGA
AB60..AB65;N # Ll [6] LATIN SMALL LETTER SAKHA YAT..GREEK LETTER SMALL CAPITAL OMEGA
AB70..ABBF;N # Ll [80] CHEROKEE SMALL LETTER A..CHEROKEE SMALL LETTER YA
ABC0..ABE2;N # Lo [35] MEETEI MAYEK LETTER KOK..MEETEI MAYEK LETTER I LONSUM
ABE3..ABE4;N # Mc [2] MEETEI MAYEK VOWEL SIGN ONAP..MEETEI MAYEK VOWEL SIGN INAP
ABE5;N # Mn MEETEI MAYEK VOWEL SIGN ANAP
@ -1609,7 +1616,7 @@ FE10..FE16;W # Po [7] PRESENTATION FORM FOR VERTICAL COMMA..PRESENTATION
FE17;W # Ps PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
FE18;W # Pe PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET
FE19;W # Po PRESENTATION FORM FOR VERTICAL HORIZONTAL ELLIPSIS
FE20..FE2D;N # Mn [14] COMBINING LIGATURE LEFT HALF..COMBINING CONJOINING MACRON BELOW
FE20..FE2F;N # Mn [16] COMBINING LIGATURE LEFT HALF..COMBINING CYRILLIC TITLO RIGHT HALF
FE30;W # Po PRESENTATION FORM FOR VERTICAL TWO DOT LEADER
FE31..FE32;W # Pd [2] PRESENTATION FORM FOR VERTICAL EM DASH..PRESENTATION FORM FOR VERTICAL EN DASH
FE33..FE34;W # Pc [2] PRESENTATION FORM FOR VERTICAL LOW LINE..PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
@ -1766,6 +1773,9 @@ FFFD;A # So REPLACEMENT CHARACTER
10879..1087F;N # No [7] PALMYRENE NUMBER ONE..PALMYRENE NUMBER TWENTY
10880..1089E;N # Lo [31] NABATAEAN LETTER FINAL ALEPH..NABATAEAN LETTER TAW
108A7..108AF;N # No [9] NABATAEAN NUMBER ONE..NABATAEAN NUMBER ONE HUNDRED
108E0..108F2;N # Lo [19] HATRAN LETTER ALEPH..HATRAN LETTER QOPH
108F4..108F5;N # Lo [2] HATRAN LETTER SHIN..HATRAN LETTER TAW
108FB..108FF;N # No [5] HATRAN NUMBER ONE..HATRAN NUMBER ONE HUNDRED
10900..10915;N # Lo [22] PHOENICIAN LETTER ALF..PHOENICIAN LETTER TAU
10916..1091B;N # No [6] PHOENICIAN NUMBER ONE..PHOENICIAN NUMBER THREE
1091F;N # Po PHOENICIAN WORD SEPARATOR
@ -1773,7 +1783,10 @@ FFFD;A # So REPLACEMENT CHARACTER
1093F;N # Po LYDIAN TRIANGULAR MARK
10980..1099F;N # Lo [32] MEROITIC HIEROGLYPHIC LETTER A..MEROITIC HIEROGLYPHIC SYMBOL VIDJ-2
109A0..109B7;N # Lo [24] MEROITIC CURSIVE LETTER A..MEROITIC CURSIVE LETTER DA
109BC..109BD;N # No [2] MEROITIC CURSIVE FRACTION ELEVEN TWELFTHS..MEROITIC CURSIVE FRACTION ONE HALF
109BE..109BF;N # Lo [2] MEROITIC CURSIVE LOGOGRAM RMT..MEROITIC CURSIVE LOGOGRAM IMN
109C0..109CF;N # No [16] MEROITIC CURSIVE NUMBER ONE..MEROITIC CURSIVE NUMBER SEVENTY
109D2..109FF;N # No [46] MEROITIC CURSIVE NUMBER ONE HUNDRED..MEROITIC CURSIVE FRACTION TEN TWELFTHS
10A00;N # Lo KHAROSHTHI LETTER A
10A01..10A03;N # Mn [3] KHAROSHTHI VOWEL SIGN I..KHAROSHTHI VOWEL SIGN VOCALIC R
10A05..10A06;N # Mn [2] KHAROSHTHI VOWEL SIGN E..KHAROSHTHI VOWEL SIGN O
@ -1806,6 +1819,9 @@ FFFD;A # So REPLACEMENT CHARACTER
10B99..10B9C;N # Po [4] PSALTER PAHLAVI SECTION MARK..PSALTER PAHLAVI FOUR DOTS WITH DOT
10BA9..10BAF;N # No [7] PSALTER PAHLAVI NUMBER ONE..PSALTER PAHLAVI NUMBER ONE HUNDRED
10C00..10C48;N # Lo [73] OLD TURKIC LETTER ORKHON A..OLD TURKIC LETTER ORKHON BASH
10C80..10CB2;N # Lu [51] OLD HUNGARIAN CAPITAL LETTER A..OLD HUNGARIAN CAPITAL LETTER US
10CC0..10CF2;N # Ll [51] OLD HUNGARIAN SMALL LETTER A..OLD HUNGARIAN SMALL LETTER US
10CFA..10CFF;N # No [6] OLD HUNGARIAN NUMBER ONE..OLD HUNGARIAN NUMBER ONE THOUSAND
10E60..10E7E;N # No [31] RUMI DIGIT ONE..RUMI FRACTION TWO THIRDS
11000;N # Mc BRAHMI SIGN CANDRABINDU
11001;N # Mn BRAHMI SIGN ANUSVARA
@ -1846,10 +1862,14 @@ FFFD;A # So REPLACEMENT CHARACTER
111B6..111BE;N # Mn [9] SHARADA VOWEL SIGN U..SHARADA VOWEL SIGN O
111BF..111C0;N # Mc [2] SHARADA VOWEL SIGN AU..SHARADA SIGN VIRAMA
111C1..111C4;N # Lo [4] SHARADA SIGN AVAGRAHA..SHARADA OM
111C5..111C8;N # Po [4] SHARADA DANDA..SHARADA SEPARATOR
111C5..111C9;N # Po [5] SHARADA DANDA..SHARADA SANDHI MARK
111CA..111CC;N # Mn [3] SHARADA SIGN NUKTA..SHARADA EXTRA SHORT VOWEL MARK
111CD;N # Po SHARADA SUTRA MARK
111D0..111D9;N # Nd [10] SHARADA DIGIT ZERO..SHARADA DIGIT NINE
111DA;N # Lo SHARADA EKAM
111DB;N # Po SHARADA SIGN SIDDHAM
111DC;N # Lo SHARADA HEADSTROKE
111DD..111DF;N # Po [3] SHARADA CONTINUATION SIGN..SHARADA SECTION MARK-2
111E1..111F4;N # No [20] SINHALA ARCHAIC DIGIT ONE..SINHALA ARCHAIC NUMBER ONE THOUSAND
11200..11211;N # Lo [18] KHOJKI LETTER A..KHOJKI LETTER JJA
11213..1122B;N # Lo [25] KHOJKI LETTER NYA..KHOJKI LETTER LLA
@ -1860,12 +1880,18 @@ FFFD;A # So REPLACEMENT CHARACTER
11235;N # Mc KHOJKI SIGN VIRAMA
11236..11237;N # Mn [2] KHOJKI SIGN NUKTA..KHOJKI SIGN SHADDA
11238..1123D;N # Po [6] KHOJKI DANDA..KHOJKI ABBREVIATION SIGN
11280..11286;N # Lo [7] MULTANI LETTER A..MULTANI LETTER GA
11288;N # Lo MULTANI LETTER GHA
1128A..1128D;N # Lo [4] MULTANI LETTER CA..MULTANI LETTER JJA
1128F..1129D;N # Lo [15] MULTANI LETTER NYA..MULTANI LETTER BA
1129F..112A8;N # Lo [10] MULTANI LETTER BHA..MULTANI LETTER RHA
112A9;N # Po MULTANI SECTION MARK
112B0..112DE;N # Lo [47] KHUDAWADI LETTER A..KHUDAWADI LETTER HA
112DF;N # Mn KHUDAWADI SIGN ANUSVARA
112E0..112E2;N # Mc [3] KHUDAWADI VOWEL SIGN AA..KHUDAWADI VOWEL SIGN II
112E3..112EA;N # Mn [8] KHUDAWADI VOWEL SIGN U..KHUDAWADI SIGN VIRAMA
112F0..112F9;N # Nd [10] KHUDAWADI DIGIT ZERO..KHUDAWADI DIGIT NINE
11301;N # Mn GRANTHA SIGN CANDRABINDU
11300..11301;N # Mn [2] GRANTHA SIGN COMBINING ANUSVARA ABOVE..GRANTHA SIGN CANDRABINDU
11302..11303;N # Mc [2] GRANTHA SIGN ANUSVARA..GRANTHA SIGN VISARGA
11305..1130C;N # Lo [8] GRANTHA LETTER A..GRANTHA LETTER VOCALIC L
1130F..11310;N # Lo [2] GRANTHA LETTER EE..GRANTHA LETTER AI
@ -1880,6 +1906,7 @@ FFFD;A # So REPLACEMENT CHARACTER
11341..11344;N # Mc [4] GRANTHA VOWEL SIGN U..GRANTHA VOWEL SIGN VOCALIC RR
11347..11348;N # Mc [2] GRANTHA VOWEL SIGN EE..GRANTHA VOWEL SIGN AI
1134B..1134D;N # Mc [3] GRANTHA VOWEL SIGN OO..GRANTHA SIGN VIRAMA
11350;N # Lo GRANTHA OM
11357;N # Mc GRANTHA AU LENGTH MARK
1135D..11361;N # Lo [5] GRANTHA SIGN PLUTA..GRANTHA LETTER VOCALIC LL
11362..11363;N # Mc [2] GRANTHA VOWEL SIGN VOCALIC L..GRANTHA VOWEL SIGN VOCALIC LL
@ -1905,7 +1932,9 @@ FFFD;A # So REPLACEMENT CHARACTER
115BC..115BD;N # Mn [2] SIDDHAM SIGN CANDRABINDU..SIDDHAM SIGN ANUSVARA
115BE;N # Mc SIDDHAM SIGN VISARGA
115BF..115C0;N # Mn [2] SIDDHAM SIGN VIRAMA..SIDDHAM SIGN NUKTA
115C1..115C9;N # Po [9] SIDDHAM SIGN SIDDHAM..SIDDHAM END OF TEXT MARK
115C1..115D7;N # Po [23] SIDDHAM SIGN SIDDHAM..SIDDHAM SECTION MARK WITH CIRCLES AND FOUR ENCLOSURES
115D8..115DB;N # Lo [4] SIDDHAM LETTER THREE-CIRCLE ALTERNATE I..SIDDHAM LETTER ALTERNATE U
115DC..115DD;N # Mn [2] SIDDHAM VOWEL SIGN ALTERNATE U..SIDDHAM VOWEL SIGN ALTERNATE UU
11600..1162F;N # Lo [48] MODI LETTER A..MODI LETTER LLA
11630..11632;N # Mc [3] MODI VOWEL SIGN AA..MODI VOWEL SIGN II
11633..1163A;N # Mn [8] MODI VOWEL SIGN U..MODI VOWEL SIGN AI
@ -1925,15 +1954,27 @@ FFFD;A # So REPLACEMENT CHARACTER
116B6;N # Mc TAKRI SIGN VIRAMA
116B7;N # Mn TAKRI SIGN NUKTA
116C0..116C9;N # Nd [10] TAKRI DIGIT ZERO..TAKRI DIGIT NINE
11700..11719;N # Lo [26] AHOM LETTER KA..AHOM LETTER JHA
1171D..1171F;N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
11720..11721;N # Mc [2] AHOM VOWEL SIGN A..AHOM VOWEL SIGN AA
11722..11725;N # Mn [4] AHOM VOWEL SIGN I..AHOM VOWEL SIGN UU
11726;N # Mc AHOM VOWEL SIGN E
11727..1172B;N # Mn [5] AHOM VOWEL SIGN AW..AHOM SIGN KILLER
11730..11739;N # Nd [10] AHOM DIGIT ZERO..AHOM DIGIT NINE
1173A..1173B;N # No [2] AHOM NUMBER TEN..AHOM NUMBER TWENTY
1173C..1173E;N # Po [3] AHOM SIGN SMALL SECTION..AHOM SIGN RULAI
1173F;N # So AHOM SYMBOL VI
118A0..118DF;N # L& [64] WARANG CITI CAPITAL LETTER NGAA..WARANG CITI SMALL LETTER VIYO
118E0..118E9;N # Nd [10] WARANG CITI DIGIT ZERO..WARANG CITI DIGIT NINE
118EA..118F2;N # No [9] WARANG CITI NUMBER TEN..WARANG CITI NUMBER NINETY
118FF;N # Lo WARANG CITI OM
11AC0..11AF8;N # Lo [57] PAU CIN HAU LETTER PA..PAU CIN HAU GLOTTAL STOP FINAL
12000..12398;N # Lo [921] CUNEIFORM SIGN A..CUNEIFORM SIGN UM TIMES ME
12000..12399;N # Lo [922] CUNEIFORM SIGN A..CUNEIFORM SIGN U U
12400..1246E;N # Nl [111] CUNEIFORM NUMERIC SIGN TWO ASH..CUNEIFORM NUMERIC SIGN NINE U VARIANT FORM
12470..12474;N # Po [5] CUNEIFORM PUNCTUATION SIGN OLD ASSYRIAN WORD DIVIDER..CUNEIFORM PUNCTUATION SIGN DIAGONAL QUADCOLON
12480..12543;N # Lo [196] CUNEIFORM SIGN AB TIMES NUN TENU..CUNEIFORM SIGN ZU5 TIMES THREE DISH TENU
13000..1342E;N # Lo [1071] EGYPTIAN HIEROGLYPH A001..EGYPTIAN HIEROGLYPH AA032
14400..14646;N # Lo [583] ANATOLIAN HIEROGLYPH A001..ANATOLIAN HIEROGLYPH A530
16800..16A38;N # Lo [569] BAMUM LETTER PHASE-A NGKUE MFON..BAMUM LETTER PHASE-F VUEQ
16A40..16A5E;N # Lo [31] MRO LETTER TA..MRO LETTER TEK
16A60..16A69;N # Nd [10] MRO DIGIT ZERO..MRO DIGIT NINE
@ -1979,7 +2020,7 @@ FFFD;A # So REPLACEMENT CHARACTER
1D185..1D18B;N # Mn [7] MUSICAL SYMBOL COMBINING DOIT..MUSICAL SYMBOL COMBINING TRIPLE TONGUE
1D18C..1D1A9;N # So [30] MUSICAL SYMBOL RINFORZANDO..MUSICAL SYMBOL DEGREE SLASH
1D1AA..1D1AD;N # Mn [4] MUSICAL SYMBOL COMBINING DOWN BOW..MUSICAL SYMBOL COMBINING SNAP PIZZICATO
1D1AE..1D1DD;N # So [48] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL PES SUBPUNCTIS
1D1AE..1D1E8;N # So [59] MUSICAL SYMBOL PEDAL MARK..MUSICAL SYMBOL KIEVAN FLAT SIGN
1D200..1D241;N # So [66] GREEK VOCAL NOTATION SYMBOL-1..GREEK INSTRUMENTAL NOTATION SYMBOL-54
1D242..1D244;N # Mn [3] COMBINING GREEK MUSICAL TRISEME..COMBINING GREEK MUSICAL PENTASEME
1D245;N # So GREEK MUSICAL LEIMMA
@ -2026,6 +2067,18 @@ FFFD;A # So REPLACEMENT CHARACTER
1D7C3;N # Sm MATHEMATICAL SANS-SERIF BOLD ITALIC PARTIAL DIFFERENTIAL
1D7C4..1D7CB;N # L& [8] MATHEMATICAL SANS-SERIF BOLD ITALIC EPSILON SYMBOL..MATHEMATICAL BOLD SMALL DIGAMMA
1D7CE..1D7FF;N # Nd [50] MATHEMATICAL BOLD DIGIT ZERO..MATHEMATICAL MONOSPACE DIGIT NINE
1D800..1D9FF;N # So [512] SIGNWRITING HAND-FIST INDEX..SIGNWRITING HEAD
1DA00..1DA36;N # Mn [55] SIGNWRITING HEAD RIM..SIGNWRITING AIR SUCKING IN
1DA37..1DA3A;N # So [4] SIGNWRITING AIR BLOW SMALL ROTATIONS..SIGNWRITING BREATH EXHALE
1DA3B..1DA6C;N # Mn [50] SIGNWRITING MOUTH CLOSED NEUTRAL..SIGNWRITING EXCITEMENT
1DA6D..1DA74;N # So [8] SIGNWRITING SHOULDER HIP SPINE..SIGNWRITING TORSO-FLOORPLANE TWISTING
1DA75;N # Mn SIGNWRITING UPPER BODY TILTING FROM HIP JOINTS
1DA76..1DA83;N # So [14] SIGNWRITING LIMB COMBINATION..SIGNWRITING LOCATION DEPTH
1DA84;N # Mn SIGNWRITING LOCATION HEAD NECK
1DA85..1DA86;N # So [2] SIGNWRITING LOCATION TORSO..SIGNWRITING LOCATION LIMBS DIGITS
1DA87..1DA8B;N # Po [5] SIGNWRITING COMMA..SIGNWRITING PARENTHESIS
1DA9B..1DA9F;N # Mn [5] SIGNWRITING FILL MODIFIER-2..SIGNWRITING FILL MODIFIER-6
1DAA1..1DAAF;N # Mn [15] SIGNWRITING ROTATION MODIFIER-2..SIGNWRITING ROTATION MODIFIER-16
1E800..1E8C4;N # Lo [197] MENDE KIKAKUI SYLLABLE M001 KI..MENDE KIKAKUI SYLLABLE M060 NYON
1E8C7..1E8CF;N # No [9] MENDE KIKAKUI DIGIT ONE..MENDE KIKAKUI DIGIT NINE
1E8D0..1E8D6;N # Mn [7] MENDE KIKAKUI COMBINING NUMBER TEENS..MENDE KIKAKUI COMBINING NUMBER MILLIONS
@ -2081,19 +2134,14 @@ FFFD;A # So REPLACEMENT CHARACTER
1F210..1F23A;W # So [43] SQUARED CJK UNIFIED IDEOGRAPH-624B..SQUARED CJK UNIFIED IDEOGRAPH-55B6
1F240..1F248;W # So [9] TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-672C..TORTOISE SHELL BRACKETED CJK UNIFIED IDEOGRAPH-6557
1F250..1F251;W # So [2] CIRCLED IDEOGRAPH ADVANTAGE..CIRCLED IDEOGRAPH ACCEPT
1F300..1F32C;N # So [45] CYCLONE..WIND BLOWING FACE
1F330..1F37D;N # So [78] CHESTNUT..FORK AND KNIFE WITH PLATE
1F380..1F3CE;N # So [79] RIBBON..RACING CAR
1F3D4..1F3F7;N # So [36] SNOW CAPPED MOUNTAIN..LABEL
1F400..1F4FE;N # So [255] RAT..PORTABLE STEREO
1F500..1F54A;N # So [75] TWISTED RIGHTWARDS ARROWS..DOVE OF PEACE
1F550..1F579;N # So [42] CLOCK FACE ONE OCLOCK..JOYSTICK
1F300..1F3FA;N # So [251] CYCLONE..AMPHORA
1F3FB..1F3FF;N # Sk [5] EMOJI MODIFIER FITZPATRICK TYPE-1-2..EMOJI MODIFIER FITZPATRICK TYPE-6
1F400..1F579;N # So [378] RAT..JOYSTICK
1F57B..1F5A3;N # So [41] LEFT HAND TELEPHONE RECEIVER..BLACK DOWN POINTING BACKHAND INDEX
1F5A5..1F5FF;N # So [91] DESKTOP COMPUTER..MOYAI
1F600..1F642;N # So [67] GRINNING FACE..SLIGHTLY SMILING FACE
1F645..1F64F;N # So [11] FACE WITH NO GOOD GESTURE..PERSON WITH FOLDED HANDS
1F600..1F64F;N # So [80] GRINNING FACE..PERSON WITH FOLDED HANDS
1F650..1F67F;N # So [48] NORTH WEST POINTING LEAF..REVERSE CHECKER BOARD
1F680..1F6CF;N # So [80] ROCKET..BED
1F680..1F6D0;N # So [81] ROCKET..PLACE OF WORSHIP
1F6E0..1F6EC;N # So [13] HAMMER AND WRENCH..AIRPLANE ARRIVING
1F6F0..1F6F3;N # So [4] SATELLITE..PASSENGER SHIP
1F700..1F773;N # So [116] ALCHEMICAL SYMBOL FOR QUINTESSENCE..ALCHEMICAL SYMBOL FOR HALF OUNCE
@ -2103,12 +2151,17 @@ FFFD;A # So REPLACEMENT CHARACTER
1F850..1F859;N # So [10] LEFTWARDS SANS-SERIF ARROW..UP DOWN SANS-SERIF ARROW
1F860..1F887;N # So [40] WIDE-HEADED LEFTWARDS LIGHT BARB ARROW..WIDE-HEADED SOUTH WEST VERY HEAVY BARB ARROW
1F890..1F8AD;N # So [30] LEFTWARDS TRIANGLE ARROWHEAD..WHITE ARROW SHAFT WIDTH TWO THIRDS
1F910..1F918;N # So [9] ZIPPER-MOUTH FACE..SIGN OF THE HORNS
1F980..1F984;N # So [5] CRAB..UNICORN FACE
1F9C0;N # So CHEESE WEDGE
20000..2A6D6;W # Lo [42711] CJK UNIFIED IDEOGRAPH-20000..CJK UNIFIED IDEOGRAPH-2A6D6
2A6D7..2A6FF;W # Cn [41] <reserved-2A6D7>..<reserved-2A6FF>
2A700..2B734;W # Lo [4149] CJK UNIFIED IDEOGRAPH-2A700..CJK UNIFIED IDEOGRAPH-2B734
2B735..2B73F;W # Cn [11] <reserved-2B735>..<reserved-2B73F>
2B740..2B81D;W # Lo [222] CJK UNIFIED IDEOGRAPH-2B740..CJK UNIFIED IDEOGRAPH-2B81D
2B81E..2F7FF;W # Cn [16354] <reserved-2B81E>..<reserved-2F7FF>
2B81E..2B81F;W # Cn [2] <reserved-2B81E>..<reserved-2B81F>
2B820..2CEA1;W # Lo [5762] CJK UNIFIED IDEOGRAPH-2B820..CJK UNIFIED IDEOGRAPH-2CEA1
2CEA2..2F7FF;W # Cn [10590] <reserved-2CEA2>..<reserved-2F7FF>
2F800..2FA1D;W # Lo [542] CJK COMPATIBILITY IDEOGRAPH-2F800..CJK COMPATIBILITY IDEOGRAPH-2FA1D
2FA1E..2FFFD;W # Cn [1504] <reserved-2FA1E>..<reserved-2FFFD>
30000..3FFFD;W # Cn [65534] <reserved-30000>..<reserved-3FFFD>
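
The data lines above follow the layout the file header describes: a code point or `first..last` range, a semicolon, the East_Asian_Width value, then a `#` comment. The sketch below shows one way such a line could be parsed. The function name `parse_east_asian_width_line` is an assumption made for illustration and is not taken from the glibc generator scripts.

```python
def parse_east_asian_width_line(line):
    """Parse one EastAsianWidth.txt line.

    Returns (first, last, width) with integer code points, or None for
    blank lines and pure comment lines.
    """
    # Everything after '#' is a comment (category, count, character names).
    data = line.split('#', 1)[0].strip()
    if not data:
        return None
    code_range, width = (field.strip() for field in data.split(';'))
    if '..' in code_range:
        first, last = code_range.split('..')
    else:
        # A single code point is treated as a range of length one.
        first = last = code_range
    return int(first, 16), int(last, 16), width


first, last, width = parse_east_asian_width_line(
    '4E00..9FD5;W # Lo [20950] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FD5')
# The bracketed count in the comment equals the size of the range.
print(width, last - first + 1)
```

Checking the computed range size against the bracketed count in the comment is a cheap sanity test when a new Unicode release changes range boundaries, as it does for the CJK block here.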

@ -35,7 +35,7 @@
# files for making modifications.
UNICODE_VERSION = 7.0.0
UNICODE_VERSION = 8.0.0
PYTHON3 = python3
WGET = wget

File diff suppressed because it is too large

@ -169,7 +169,9 @@ def is_combining_remove(code_point):
'PAHAWH HMONG',
'MIAO',
'DUPLOYAN',
'MENDE KIKAKUI'
'MENDE KIKAKUI',
'AHOM',
'SIGNWRITING'
):
if substring in name:
return False
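
The hunk above extends a tuple of name substrings. When a character name contains one of them, `is_combining_remove` returns `False`. The following self-contained sketch mirrors only that substring test. The names `EXCLUDED_SUBSTRINGS` and `name_is_excluded` are hypothetical, and the tuple lists only the entries visible in the hunk, so the real list is longer.

```python
# Only the substrings visible in the hunk; the tuple in the script
# starts earlier and contains more entries.
EXCLUDED_SUBSTRINGS = (
    'PAHAWH HMONG',
    'MIAO',
    'DUPLOYAN',
    'MENDE KIKAKUI',
    'AHOM',
    'SIGNWRITING',
)


def name_is_excluded(name):
    """Return True if the character name contains an excluded substring."""
    return any(substring in name for substring in EXCLUDED_SUBSTRINGS)


print(name_is_excluded('AHOM SIGN KILLER'))
print(name_is_excluded('COMBINING ACUTE ACCENT'))

# Python concatenates adjacent string literals, so a missing comma
# would silently merge two entries into one.
print(('MENDE KIKAKUI' 'AHOM') == 'MENDE KIKAKUIAHOM')
```

The last line illustrates why the comma added after `'MENDE KIKAKUI'` matters once further entries follow it in the tuple.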