b997c53273
X-SVN-Rev: 16954
93 lines
4.3 KiB
Plaintext
# Special Casing Properties
#
# This file is a supplement to the UnicodeData file.
# It contains additional information about the casing of Unicode characters.
# (For compatibility, the UnicodeData.txt file only contains case mappings for
# characters where they are 1-1, and does not have locale-specific mappings.)
# For more information, see the discussion of Case Mappings in the Unicode Standard.
#
# All code points not listed in this file that do not have simple case mappings
# in UnicodeData.txt map to themselves.
# ================================================================================
# Format
# ================================================================================
# The entries in this file are in the following machine-readable format:
#
# <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment>
#
# <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more than
# one character, they are separated by spaces. Other than as used to separate elements,
# spaces are to be ignored.
#
# The <condition_list> is optional. Where present, it consists of one or more locales or contexts,
# separated by spaces. In these conditions:
# - A condition list overrides the normal behavior if all of the listed conditions are true.
# - The context is always the context of the characters in the original string,
#   NOT in the resulting string.
# - Case distinctions in the condition list are not significant.
# - Conditions preceded by "Not_" represent the negation of the condition.
#
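# As a rough illustration of the format described above, the following Python sketch
# splits one entry into its fields. The names Entry and parse_entry are hypothetical
# and are not part of this file or of any Unicode tooling. The sketch lowercases the
# condition list because case distinctions there are not significant, and it ignores
# any fields beyond the fifth so that future additions do not break it.

```python
from typing import NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """One parsed entry (hypothetical container, for illustration only)."""
    code: int
    lower: Tuple[int, ...]
    title: Tuple[int, ...]
    upper: Tuple[int, ...]
    conditions: Tuple[str, ...]
    comment: str


def _hex_list(field: str) -> Tuple[int, ...]:
    # A field holds zero or more hex code points separated by spaces.
    return tuple(int(h, 16) for h in field.split())


def parse_entry(line: str) -> Optional[Entry]:
    """Parse one line; return None for blank or comment-only lines."""
    data, _, comment = line.partition("#")
    fields = [f.strip() for f in data.split(";")]
    # Every field is terminated by ';', so the split leaves a trailing empty item.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        return None
    # The optional fifth field is the condition list; later fields are ignored.
    conditions = tuple(c.lower() for c in fields[4].split()) if len(fields) > 4 else ()
    return Entry(
        code=int(fields[0], 16),
        lower=_hex_list(fields[1]),
        title=_hex_list(fields[2]),
        upper=_hex_list(fields[3]),
        conditions=conditions,
        comment=comment.strip(),
    )
```

# With an illustrative line such as
# "00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S",
# parse_entry returns an entry whose upper field is the two code points 0x53 0x53
# and whose condition list is empty.
#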
# A locale is defined as:
#   <locale> := <ISO_639_code> ( "_" <ISO_3166_code> ( "_" <variant> )? )?
#   <ISO_3166_code> := 2-letter ISO country code,
#   <ISO_639_code> := 2-letter ISO language code
#
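# A minimal Python sketch of how a single condition token might be classified under
# the rules above: strip an optional "Not_" prefix, compare case-insensitively, and
# decide whether the remainder is a context name or a locale. The function name
# classify_condition, the KNOWN_CONTEXTS set, and the "unknown" result for
# unrecognized tokens are assumptions made for this illustration.

```python
import re

# Context names defined in this header, lowercased (case is not significant).
KNOWN_CONTEXTS = {
    "final_sigma", "after_soft_dotted", "more_above", "before_dot", "after_i",
}

# <ISO_639_code> ( "_" <ISO_3166_code> ( "_" <variant> )? )?
LOCALE_RE = re.compile(r"^[a-z]{2}(?:_[a-z]{2}(?:_\w+)?)?$")


def classify_condition(token: str):
    """Return (kind, name, negated) where kind is 'context', 'locale' or 'unknown'."""
    name = token.lower()
    negated = name.startswith("not_")
    if negated:
        name = name[len("not_"):]
    if name in KNOWN_CONTEXTS:
        return ("context", name, negated)
    if LOCALE_RE.match(name):
        return ("locale", name, negated)
    # Parsers must tolerate contexts added in the future.
    return ("unknown", name, negated)
```

# Reporting unrecognized tokens instead of raising an error is one way to stay
# compatible with contexts added later.
#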
# A context for a character C is one of the following. This overrides Table
# 3-13, "Context Specification for Casing", on p. 89 of The Unicode Standard,
# Version 4.0.
#
# Definitions
# - The property "cased" is defined in D47 on that same page (p. 89).
# - A character C is defined to be "case-ignorable" if it meets either of the
#   following criteria:
#   A. The general category of C is Nonspacing Mark (Mn), or Enclosing Mark
#      (Me), or Format Control (Cf), or Modifier Letter (Lm), or
#      Modifier Symbol (Sk).
#   B. C is a MidLetter as defined in UAX #29.
# - A "case-ignorable sequence" is a sequence of zero or more case-ignorable
#   characters.
#
# A description of each context is followed by the equivalent regular
# expression(s) describing the context before C and/or the context after C.
# The regular expression uses the syntax of UTS #18, with one addition:
# "!" means that the expression does not match. All regular expressions
# below are case-sensitive.
#
# Context: Final_Sigma
#   Description: C is preceded by a sequence consisting of a cased letter and
#   a case-ignorable sequence, and C is not followed by a sequence consisting
#   of a case-ignorable sequence and then a cased letter.
#   Before C: \p{cased} (\p{case-ignorable})*
#   After C: !( (\p{case-ignorable})* \p{cased} )
#
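# The Final_Sigma test can be sketched in Python as below. This is an approximation
# under stated assumptions: "cased" is approximated with str.isupper, str.islower
# and general category Lt, and MidLetter is approximated by a small hand-picked
# subset (MIDLETTER_SUBSET), since the standard library does not expose the UAX #29
# property. All function names are hypothetical.

```python
import unicodedata

# Assumption: a small illustrative subset of MidLetter, not the full UAX #29 set.
MIDLETTER_SUBSET = {"\u0027", "\u00B7", "\u2019", "\u2027"}


def is_cased(ch: str) -> bool:
    # Approximation of the "cased" property.
    return ch.isupper() or ch.islower() or unicodedata.category(ch) == "Lt"


def is_case_ignorable(ch: str) -> bool:
    return (unicodedata.category(ch) in {"Mn", "Me", "Cf", "Lm", "Sk"}
            or ch in MIDLETTER_SUBSET)


def _reaches_cased(chars) -> bool:
    # Walk away from C: succeed on the first cased character, skip
    # case-ignorable characters, and fail on anything else.
    for ch in chars:
        if is_cased(ch):
            return True
        if not is_case_ignorable(ch):
            return False
    return False


def is_final_sigma(text: str, i: int) -> bool:
    """True if the character at index i is in the Final_Sigma context."""
    before = _reaches_cased(reversed(text[:i]))   # \p{cased} (\p{case-ignorable})*
    after = _reaches_cased(text[i + 1:])          # (\p{case-ignorable})* \p{cased}
    return before and not after
```

# The context is evaluated on the original string, so the index i refers to the
# position of the sigma before any case conversion has been applied.
#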
# Context: After_Soft_Dotted
#   Description: The last preceding character with combining class of zero before C was
#   Soft_Dotted, and there is no intervening combining character class 230 (ABOVE).
#   Before C: [\p{Soft_Dotted}] ([^\p{cc=230} \p{cc=0}])*
#
# Context: More_Above
#   Description: C is followed by one or more characters of combining class
#   230 (ABOVE) in the combining character sequence.
#   After C: [^\p{cc=0}]* [\p{cc=230}]
#
# Context: Before_Dot
#   Description: C is followed by combining dot above (U+0307). Any sequence
#   of characters with a combining class that is neither 0 nor 230 may intervene
#   between the current character and the combining dot above.
#   After C: ([^\p{cc=230} \p{cc=0}])* [\u0307]
#
# Context: After_I
#   Description: The last preceding base character was an uppercase I, and
#   there is no intervening combining character class 230 (ABOVE).
#   Before C: [I] ([^\p{cc=230} \p{cc=0}])*
#
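# The combining-class contexts above share one scanning pattern, sketched here in
# Python with unicodedata.combining, which returns the canonical combining class.
# The function names are hypothetical. After_Soft_Dotted is left out because the
# standard library does not expose the Soft_Dotted property.

```python
import unicodedata


def _scan(chars, is_target) -> bool:
    # Walk away from C. Succeed on the first target character; stop with
    # failure at a character of combining class 0 or 230 that is not a target.
    for ch in chars:
        if is_target(ch):
            return True
        if unicodedata.combining(ch) in (0, 230):
            return False
    return False


def more_above(text: str, i: int) -> bool:
    # After C: [^\p{cc=0}]* [\p{cc=230}]
    return _scan(text[i + 1:], lambda ch: unicodedata.combining(ch) == 230)


def before_dot(text: str, i: int) -> bool:
    # After C: ([^\p{cc=230} \p{cc=0}])* [\u0307]
    return _scan(text[i + 1:], lambda ch: ch == "\u0307")


def after_i(text: str, i: int) -> bool:
    # Before C: [I] ([^\p{cc=230} \p{cc=0}])*
    return _scan(reversed(text[:i]), lambda ch: ch == "I")
```

# Checking the target before the blocking classes matters, because U+0307 itself
# has combining class 230 and "I" has combining class 0.
#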
# Parsers of this file must be prepared to deal with future additions to this format:
#  * Additional contexts
#  * Additional fields
# ================================================================================

# ================================================================================
# Unconditional mappings
# ================================================================================