scuffed-code/icu4c/source/data/translit/Cyrl_Latn.txt
2016-02-05 03:37:50 +00:00

# ***************************************************************************
# *
# * Copyright (C) 2004-2016, International Business Machines
# * Corporation; Unicode, Inc.; and others. All Rights Reserved.
# *
# ***************************************************************************
# File: Cyrl_Latn.txt
# Generated from CLDR
#
# TODO: add remaining characters
# Should add variants for Russian-English, Russian-German
# Those can use this as a base, and then remap cases
# like a $hat to ya or ja.
# :: [\u0000-\u007E ʹ ʺ [:Cyrillic:] [:Latin:] [:nonspacing mark:]] ;
### WARNING, \u0308 must be added to the generated filters, in both directions ###
# MINIMAL FILTER
:: [Ққ\u0308Ă-ăĔ-ĕĞ-ğĬ-ĭŎ-ŏŬ-ŭ\u0306Ѐ-џҐ-ҕҘ-ҙӁ-ӂӐ-ӟӢ-ӧӬ-ӵӸ-ӹḜ-ḝẮ-ặᾰᾸῐῘῠῨ] ;
:: NFD (NFC) ;
$modprime = ʹ;
$modprime2 = ʺ;
$grave = \u0300;
$acute = \u0301;
$hat = \u0302;
$breve = \u0306 ;
$dot = \u0307 ;
$caron = \u030C ;
$comma = \u0326 ;
$under = \u0331 ;
$descender = ˌ;
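# These rules come first: their Latin side is a base letter plus a combining mark,
# and the plain base-letter rules under "Normal order" would otherwise mask them.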
я ↔ a $hat ; # CYRILLIC SMALL LETTER YA
Я ↔ A $hat ; # CYRILLIC CAPITAL LETTER YA
ч ↔ c $caron ; # CYRILLIC SMALL LETTER CHE
Ч ↔ C $caron; # CYRILLIC CAPITAL LETTER CHE
# ҷ ↔ XXX ; # CYRILLIC SMALL LETTER CHE WITH DESCENDER
# Ҷ ↔ XXX ; # CYRILLIC CAPITAL LETTER CHE WITH DESCENDER
# ӌ ↔ XXX ; # CYRILLIC SMALL LETTER KHAKASSIAN CHE
# Ӌ ↔ XXX ; # CYRILLIC CAPITAL LETTER KHAKASSIAN CHE
# ҹ ↔ XXX ; # CYRILLIC SMALL LETTER CHE WITH VERTICAL STROKE
# Ҹ ↔ XXX ; # CYRILLIC CAPITAL LETTER CHE WITH VERTICAL STROKE
э ↔ e $acute; # CYRILLIC SMALL LETTER E
Э ↔ E $acute; # CYRILLIC CAPITAL LETTER E
є ↔ e $hat; # CYRILLIC SMALL LETTER UKRAINIAN IE
Є ↔ E $hat; # CYRILLIC CAPITAL LETTER UKRAINIAN IE
ш ↔ s $caron ; # CYRILLIC SMALL LETTER SHA
Ш ↔ S $caron ; # CYRILLIC CAPITAL LETTER SHA
щ ↔ s $hat ; # CYRILLIC SMALL LETTER SHCHA
Щ ↔ S $hat; # CYRILLIC CAPITAL LETTER SHCHA
ѕ ↔ z $hat ; # CYRILLIC SMALL LETTER DZE
Ѕ ↔ Z $hat; # CYRILLIC CAPITAL LETTER DZE
# ӡ ↔ XXX ; # CYRILLIC SMALL LETTER ABKHASIAN DZE
# Ӡ ↔ XXX ; # CYRILLIC CAPITAL LETTER ABKHASIAN DZE
ю ↔ u $hat ; # CYRILLIC SMALL LETTER YU
Ю ↔ U $hat ; # CYRILLIC CAPITAL LETTER YU
і ↔ i $acute; # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
І ↔ I $acute; # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
ј ↔ j $caron; # CYRILLIC SMALL LETTER JE
Ј ↔ J $caron; # CYRILLIC CAPITAL LETTER JE
љ ↔ l $hat ; # CYRILLIC SMALL LETTER LJE
Љ ↔ L $hat ; # CYRILLIC CAPITAL LETTER LJE
њ ↔ n $hat ; # CYRILLIC SMALL LETTER NJE
Њ ↔ N $hat ; # CYRILLIC CAPITAL LETTER NJE
ћ ↔ c $acute ; # CYRILLIC SMALL LETTER TSHE
Ћ ↔ C $acute ; # CYRILLIC CAPITAL LETTER TSHE
џ ↔ d $hat ; # CYRILLIC SMALL LETTER DZHE
Џ ↔ D $hat ; # CYRILLIC CAPITAL LETTER DZHE
# Normal order
а ↔ a ; # CYRILLIC SMALL LETTER A
А ↔ A ; # CYRILLIC CAPITAL LETTER A
ә ↔ ə ; # CYRILLIC SMALL LETTER SCHWA
Ә ↔ Ə ; # CYRILLIC CAPITAL LETTER SCHWA
ӕ ↔ æ ; # CYRILLIC SMALL LIGATURE A IE
Ӕ ↔ Æ ; # CYRILLIC CAPITAL LIGATURE A IE
б ↔ b ; # CYRILLIC SMALL LETTER BE
Б ↔ B ; # CYRILLIC CAPITAL LETTER BE
в ↔ v ; # CYRILLIC SMALL LETTER VE
В ↔ V ; # CYRILLIC CAPITAL LETTER VE
ґ ↔ g $grave ; # CYRILLIC SMALL LETTER GHE WITH UPTURN
Ґ ↔ G $grave ; # CYRILLIC CAPITAL LETTER GHE WITH UPTURN
ғ ↔ g $dot ; # CYRILLIC SMALL LETTER GHE WITH STROKE
Ғ ↔ G $dot; # CYRILLIC CAPITAL LETTER GHE WITH STROKE
ҕ ↔ g $breve; # CYRILLIC SMALL LETTER GHE WITH MIDDLE HOOK
Ҕ ↔ G $breve; # CYRILLIC CAPITAL LETTER GHE WITH MIDDLE HOOK
г ↔ g ; # CYRILLIC SMALL LETTER GHE
Г ↔ G ; # CYRILLIC CAPITAL LETTER GHE
д ↔ d; # CYRILLIC SMALL LETTER DE
Д ↔ D; # CYRILLIC CAPITAL LETTER DE
ђ ↔ đ ; # CYRILLIC SMALL LETTER DJE
Ђ ↔ Đ ; # CYRILLIC CAPITAL LETTER DJE
ҙ ↔ z $comma ; # CYRILLIC SMALL LETTER ZE WITH DESCENDER
Ҙ ↔ Z $comma ; # CYRILLIC CAPITAL LETTER ZE WITH DESCENDER
е ↔ e ; # CYRILLIC SMALL LETTER IE
Е ↔ E; # CYRILLIC CAPITAL LETTER IE
ж ↔ z $caron; # CYRILLIC SMALL LETTER ZHE
Ж ↔ Z $caron; # CYRILLIC CAPITAL LETTER ZHE
# җ ↔ XXX ; # CYRILLIC SMALL LETTER ZHE WITH DESCENDER
# Җ ↔ XXX ; # CYRILLIC CAPITAL LETTER ZHE WITH DESCENDER
з ↔ z ; # CYRILLIC SMALL LETTER ZE
З ↔ Z; # CYRILLIC CAPITAL LETTER ZE
и\u0306 ↔ j ; # CYRILLIC SMALL LETTER SHORT I (NFD: и + combining breve)
И\u0306 ↔ J ; # CYRILLIC CAPITAL LETTER SHORT I (NFD: И + combining breve)
и ↔ i ; # CYRILLIC SMALL LETTER I
И ↔ I ; # CYRILLIC CAPITAL LETTER I
қ ↔ k $descender ; # CYRILLIC SMALL LETTER KA WITH DESCENDER
Қ ↔ K $descender ; # CYRILLIC CAPITAL LETTER KA WITH DESCENDER
к ↔ k ; # CYRILLIC SMALL LETTER KA
К ↔ K; # CYRILLIC CAPITAL LETTER KA
# ӄ ↔ XXX ; # CYRILLIC SMALL LETTER KA WITH HOOK
# Ӄ ↔ XXX ; # CYRILLIC CAPITAL LETTER KA WITH HOOK
# ҡ ↔ XXX ; # CYRILLIC SMALL LETTER BASHKIR KA
# Ҡ ↔ XXX ; # CYRILLIC CAPITAL LETTER BASHKIR KA
# ҟ ↔ XXX ; # CYRILLIC SMALL LETTER KA WITH STROKE
# Ҟ ↔ XXX ; # CYRILLIC CAPITAL LETTER KA WITH STROKE
# ҝ ↔ XXX ; # CYRILLIC SMALL LETTER KA WITH VERTICAL STROKE
# Ҝ ↔ XXX ; # CYRILLIC CAPITAL LETTER KA WITH VERTICAL STROKE
л ↔ l ; # CYRILLIC SMALL LETTER EL
Л ↔ L; # CYRILLIC CAPITAL LETTER EL
м ↔ m ; # CYRILLIC SMALL LETTER EM
М ↔ M ; # CYRILLIC CAPITAL LETTER EM
н ↔ n ; # CYRILLIC SMALL LETTER EN
Н ↔ N; # CYRILLIC CAPITAL LETTER EN
# ң ↔ XXX ; # CYRILLIC SMALL LETTER EN WITH DESCENDER
# Ң ↔ XXX ; # CYRILLIC CAPITAL LETTER EN WITH DESCENDER
# ӈ ↔ XXX ; # CYRILLIC SMALL LETTER EN WITH HOOK
# Ӈ ↔ XXX ; # CYRILLIC CAPITAL LETTER EN WITH HOOK
# ҥ ↔ XXX ; # CYRILLIC SMALL LIGATURE EN GHE
# Ҥ ↔ XXX ; # CYRILLIC CAPITAL LIGATURE EN GHE
о ↔ o ; # CYRILLIC SMALL LETTER O
О ↔ O ; # CYRILLIC CAPITAL LETTER O
# ө ↔ XXX ; # CYRILLIC SMALL LETTER BARRED O
# Ө ↔ XXX ; # CYRILLIC CAPITAL LETTER BARRED O
п ↔ p ; # CYRILLIC SMALL LETTER PE
П ↔ P ; # CYRILLIC CAPITAL LETTER PE
# ҧ ↔ XXX ; # CYRILLIC SMALL LETTER PE WITH MIDDLE HOOK
# Ҧ ↔ XXX ; # CYRILLIC CAPITAL LETTER PE WITH MIDDLE HOOK
# ҁ ↔ XXX ; # CYRILLIC SMALL LETTER KOPPA
# Ҁ ↔ XXX ; # CYRILLIC CAPITAL LETTER KOPPA
р ↔ r ; # CYRILLIC SMALL LETTER ER
Р ↔ R ; # CYRILLIC CAPITAL LETTER ER
# ҏ ↔ XXX ; # CYRILLIC SMALL LETTER ER WITH TICK
# Ҏ ↔ XXX ; # CYRILLIC CAPITAL LETTER ER WITH TICK
с ↔ s ; # CYRILLIC SMALL LETTER ES
С ↔ S ; # CYRILLIC CAPITAL LETTER ES
# ҫ ↔ XXX ; # CYRILLIC SMALL LETTER ES WITH DESCENDER
# Ҫ ↔ XXX ; # CYRILLIC CAPITAL LETTER ES WITH DESCENDER
т ↔ t ; # CYRILLIC SMALL LETTER TE
Т ↔ T ; # CYRILLIC CAPITAL LETTER TE
# ҭ ↔ XXX ; # CYRILLIC SMALL LETTER TE WITH DESCENDER
# Ҭ ↔ XXX ; # CYRILLIC CAPITAL LETTER TE WITH DESCENDER
у ↔ u ; # CYRILLIC SMALL LETTER U
У ↔ U ; # CYRILLIC CAPITAL LETTER U
# ү ↔ XXX ; # CYRILLIC SMALL LETTER STRAIGHT U
# Ү ↔ XXX ; # CYRILLIC CAPITAL LETTER STRAIGHT U
# ұ ↔ XXX ; # CYRILLIC SMALL LETTER STRAIGHT U WITH STROKE
# Ұ ↔ XXX ; # CYRILLIC CAPITAL LETTER STRAIGHT U WITH STROKE
# ѹ ↔ XXX ; # CYRILLIC SMALL LETTER UK
# Ѹ ↔ XXX ; # CYRILLIC CAPITAL LETTER UK
ф ↔ f ; # CYRILLIC SMALL LETTER EF
Ф ↔ F ; # CYRILLIC CAPITAL LETTER EF
х ↔ h ; # CYRILLIC SMALL LETTER HA
Х ↔ H; # CYRILLIC CAPITAL LETTER HA
# ҳ ↔ XXX ; # CYRILLIC SMALL LETTER HA WITH DESCENDER
# Ҳ ↔ XXX ; # CYRILLIC CAPITAL LETTER HA WITH DESCENDER
# һ ↔ XXX ; # CYRILLIC SMALL LETTER SHHA
# Һ ↔ XXX ; # CYRILLIC CAPITAL LETTER SHHA
# ѡ ↔ XXX ; # CYRILLIC SMALL LETTER OMEGA
# Ѡ ↔ XXX ; # CYRILLIC CAPITAL LETTER OMEGA
# ѿ ↔ XXX ; # CYRILLIC SMALL LETTER OT
# Ѿ ↔ XXX ; # CYRILLIC CAPITAL LETTER OT
# ѽ ↔ XXX ; # CYRILLIC SMALL LETTER OMEGA WITH TITLO
# Ѽ ↔ XXX ; # CYRILLIC CAPITAL LETTER OMEGA WITH TITLO
# ѻ ↔ XXX ; # CYRILLIC SMALL LETTER ROUND OMEGA
# Ѻ ↔ XXX ; # CYRILLIC CAPITAL LETTER ROUND OMEGA
ц ↔ c ; # CYRILLIC SMALL LETTER TSE
Ц ↔ C; # CYRILLIC CAPITAL LETTER TSE
# ҵ ↔ XXX ; # CYRILLIC SMALL LIGATURE TE TSE
# Ҵ ↔ XXX ; # CYRILLIC CAPITAL LIGATURE TE TSE
# ҽ ↔ XXX ; # CYRILLIC SMALL LETTER ABKHASIAN CHE
# Ҽ ↔ XXX ; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE
# ҿ ↔ XXX ; # CYRILLIC SMALL LETTER ABKHASIAN CHE WITH DESCENDER
# Ҿ ↔ XXX ; # CYRILLIC CAPITAL LETTER ABKHASIAN CHE WITH DESCENDER
Ъ ↔ $modprime2 $under ; # CYRILLIC CAPITAL LETTER HARD SIGN
ъ ↔ $modprime2 ; # CYRILLIC SMALL LETTER HARD SIGN
Ь ↔ $modprime $under ; # CYRILLIC CAPITAL LETTER SOFT SIGN
ь ↔ $modprime ; # CYRILLIC SMALL LETTER SOFT SIGN
ы ↔ y ; # CYRILLIC SMALL LETTER YERU
Ы ↔ Y ; # CYRILLIC CAPITAL LETTER YERU
# ҍ ↔ XXX ; # CYRILLIC SMALL LETTER SEMISOFT SIGN
# Ҍ ↔ XXX ; # CYRILLIC CAPITAL LETTER SEMISOFT SIGN
# ѣ ↔ XXX ; # CYRILLIC SMALL LETTER YAT
# Ѣ ↔ XXX ; # CYRILLIC CAPITAL LETTER YAT
# ѥ ↔ XXX ; # CYRILLIC SMALL LETTER IOTIFIED E
# Ѥ ↔ XXX ; # CYRILLIC CAPITAL LETTER IOTIFIED E
# ѧ ↔ XXX ; # CYRILLIC SMALL LETTER LITTLE YUS
# Ѧ ↔ XXX ; # CYRILLIC CAPITAL LETTER LITTLE YUS
# ѫ ↔ XXX ; # CYRILLIC SMALL LETTER BIG YUS
# Ѫ ↔ XXX ; # CYRILLIC CAPITAL LETTER BIG YUS
# ѩ ↔ XXX ; # CYRILLIC SMALL LETTER IOTIFIED LITTLE YUS
# Ѩ ↔ XXX ; # CYRILLIC CAPITAL LETTER IOTIFIED LITTLE YUS
# ѭ ↔ XXX ; # CYRILLIC SMALL LETTER IOTIFIED BIG YUS
# Ѭ ↔ XXX ; # CYRILLIC CAPITAL LETTER IOTIFIED BIG YUS
# ѯ ↔ XXX ; # CYRILLIC SMALL LETTER KSI
# Ѯ ↔ XXX ; # CYRILLIC CAPITAL LETTER KSI
# ѱ ↔ XXX ; # CYRILLIC SMALL LETTER PSI
# Ѱ ↔ XXX ; # CYRILLIC CAPITAL LETTER PSI
# ѳ ↔ XXX ; # CYRILLIC SMALL LETTER FITA
# Ѳ ↔ XXX ; # CYRILLIC CAPITAL LETTER FITA
# ѵ ↔ XXX ; # CYRILLIC SMALL LETTER IZHITSA
# Ѵ ↔ XXX ; # CYRILLIC CAPITAL LETTER IZHITSA
# ҩ ↔ XXX ; # CYRILLIC SMALL LETTER ABKHASIAN HA
# Ҩ ↔ XXX ; # CYRILLIC CAPITAL LETTER ABKHASIAN HA
# Ӏ ↔ XXX ; # CYRILLIC LETTER PALOCHKA
### а\u0306 ↔ XXX ; # CYRILLIC SMALL LETTER A
### А\u0306 ↔ XXX ; # CYRILLIC CAPITAL LETTER A
### а\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER A
### А\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER A
### ә\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER SCHWA
### Ә\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER SCHWA
### г\u0301 ↔ XXX ; # CYRILLIC SMALL LETTER GHE
### Г\u0301 ↔ XXX ; # CYRILLIC CAPITAL LETTER GHE
### е\u0300 ↔ XXX ; # CYRILLIC SMALL LETTER IE
### Е\u0300 ↔ XXX ; # CYRILLIC CAPITAL LETTER IE
### е\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER IE
### Е\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER IE
### е\u0306 ↔ XXX ; # CYRILLIC SMALL LETTER IE
### Е\u0306 ↔ XXX ; # CYRILLIC CAPITAL LETTER IE
### ж\u0306 ↔ XXX ; # CYRILLIC SMALL LETTER ZHE
### Ж\u0306 ↔ XXX ; # CYRILLIC CAPITAL LETTER ZHE
### ж\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER ZHE
### Ж\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER ZHE
### з\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER ZE
### З\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER ZE
### и\u0300 ↔ XXX ; # CYRILLIC SMALL LETTER I
### И\u0300 ↔ XXX ; # CYRILLIC CAPITAL LETTER I
### и\u0304 ↔ XXX ; # CYRILLIC SMALL LETTER I
### И\u0304 ↔ XXX ; # CYRILLIC CAPITAL LETTER I
### и\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER I
### И\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER I
### і\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
### І\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I
### о\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER O
### О\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER O
### ө\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER BARRED O
### Ө\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER BARRED O
### к\u0301 ↔ XXX ; # CYRILLIC SMALL LETTER KA
### К\u0301 ↔ XXX ; # CYRILLIC CAPITAL LETTER KA
### у\u0304 ↔ XXX ; # CYRILLIC SMALL LETTER U
### У\u0304 ↔ XXX ; # CYRILLIC CAPITAL LETTER U
### у\u0306 ↔ XXX ; # CYRILLIC SMALL LETTER U
### У\u0306 ↔ XXX ; # CYRILLIC CAPITAL LETTER U
### у\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER U
### У\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER U
### у\u030B ↔ XXX ; # CYRILLIC SMALL LETTER U
### У\u030B ↔ XXX ; # CYRILLIC CAPITAL LETTER U
### ч\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER CHE
### Ч\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER CHE
### ы\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER YERU
### Ы\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER YERU
### э\u0308 ↔ XXX ; # CYRILLIC SMALL LETTER E
### Э\u0308 ↔ XXX ; # CYRILLIC CAPITAL LETTER E
### ѵ\u030F ↔ XXX ; # CYRILLIC SMALL LETTER IZHITSA
### Ѵ\u030F ↔ XXX ; # CYRILLIC CAPITAL LETTER IZHITSA
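# Completeness: Latin letters with no Cyrillic counterpart (q, w, x) are rewritten
# in the Latin-to-Cyrillic direction only; the leading | puts the cursor before the
# replacement so the rules above then convert it to Cyrillic.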
$ignore = [[:Mark:]''] * ;
| k ← q ;
| K ← Q ;
| u ← w ;
| U ← W ;
| KS ← X } $ignore [:UppercaseLetter:] ;
| KS ← [:UppercaseLetter:] $ignore { X ;
| Ks ← X ;
| ks ← x ;
:: NFC (NFD) ;
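# Worked example (a sketch traced by hand from the rules above, not captured output).
# Cyrillic-to-Latin decomposes the input with NFD, applies the rules, then recomposes with NFC:
#   щ  →  s + \u0302  →  ŝ
#   ж  →  z + \u030C  →  ž
#   й  →  и + \u0306 (after NFD)  →  j
#   Привет  →  Privet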
# Note: a global filter is more efficient, but it MUST include all source characters.
# :: ([\u0000-\u007E ʹ ʺ [:Cyrillic:] [:Latin:] [:nonspacing mark:]]);
# MINIMAL FILTER: Latin-Cyrillic
:: ( [ˌ\u0308A-Za-zÀ-ÏÑ-ÖÙ-Ýà-ïñ-öù-ýÿ-ĥĨ-İĴ-ķĹ-ľŃ-ňŌ-őŔ-ťŨ-žƏƠ-ơƯ-ưǍ-ǜǞ-ǣǦ-ǰǴ-ǵǸ-țȞ-ȟȦ-ȳəʹ-ʺ\u0300-\u0302\u0306-\u0307\u030C\u0326\u0331\u0340-\u0341\u0344ʹ΅-ΆΈ-ΊΌΎ-ΐά-ΰό-ώϓЀЃЌ-ЎЙйѐѓќ-ўӁ-ӂӐ-ӑӖ-ӗḀ-ẙẛẠ-ỹἂ-ἅἊ-Ἅἒ-ἕἚ-Ἕἢ-ἥἪ-Ἥἲ-ἵἺ-Ἵὂ-ὅὊ-Ὅὒ-ὕὛὝὢ-ὥὪ-Ὥὰ-ώᾂ-ᾅᾊ-ᾍᾒ-ᾕᾚ-ᾝᾢ-ᾥᾪ-ᾭᾰᾲᾴᾸᾺ-ΆῂῄῈ-Ή῍-῎ῐῒ-ΐῘῚ-Ί῝-῞ῠῢ-ΰῨῪ-Ύ῭-΅ῲῴῸ-ΏK-Å] ) ;