# Confusables.txt
# Generated: %date%, MED
# This is a draft list of visually confusable characters, for use in conjunction with the
# recommendations in http://www.unicode.org/reports/tr36/
#
# To fold using this list, first perform NFKD (if not already performed),
# then map each source character to the target character(s), then perform NFKD again.
#
# The format is the standard Unicode semicolon-delimited hex.
# ; ;
#
# The characters may be visually distinguishable in many fonts, or at larger sizes.
# Some anomalies are also introduced by 'closure'. That is, there may be a sequence of
# characters where each is visually confusable with the next, but the start and end are
# visually distinguishable. When the set is closed, however, these will all map together.
#
# This is unlike normalization data. There may be no connection between characters other
# than visual confusability. This data should not be used except in assessing visual confusability.
#
# This list is not limited to Unicode Identifier characters (XID_Continue), although the primary
# application will be to such characters. It is also not limited to lowercase characters,
# although the recommendations are to lowercase for security.
#
# Note that some characters have unusual characteristics, and are not yet accounted for.
# For example, U+302E (〮) HANGUL SINGLE DOT TONE MARK and U+302F (〯) HANGUL DOUBLE DOT TONE MARK
# appear to the left of the previous character. So what looks like "a:b" can actually be "ab\u302F".
#
# WARNING: The data is not final; it is an early draft at this point, put together from different
# sources that need to be reviewed for accuracy and completeness of the mappings.
# There are still clear errors in the data; do not use this in any implementations.
# Ignore the internal_info field; it will be removed.
#
# Thanks especially to Eric van der Poel for collecting information about fonts using shared glyphs.
# =================================
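#
# The folding steps described in the header (NFKD, map, NFKD) can be sketched in Python as follows.
# This is only an illustration, not a reference implementation. It assumes that the first two
# semicolon-delimited fields of a data line are the source and target code points in hex and that
# any further field is ignored. The names parse_mappings and fold, and the sample lines, are
# hypothetical and are not taken from the data in this file.

```python
import unicodedata


def parse_mappings(lines):
    """Build a source -> target table from semicolon-delimited hex lines.

    Assumption: field 1 is the source, field 2 is the target, and each is a
    space-separated list of hex code points. Later fields are ignored.
    """
    table = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # blank line or comment-only line
        fields = [f.strip() for f in line.split(";")]
        if len(fields) < 2 or not fields[0] or not fields[1]:
            continue  # not a usable data line
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        table[source] = target
    return table


def fold(text, table):
    """Fold text as the header describes: NFKD, map each character, NFKD again."""
    decomposed = unicodedata.normalize("NFKD", text)
    # Only single-character sources are looked up here; unmapped characters pass through.
    mapped = "".join(table.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFKD", mapped)


# Hypothetical sample lines for illustration only.
sample = [
    "# comment-only line",
    "0430 ; 0061 ; x  # CYRILLIC SMALL LETTER A -> LATIN SMALL LETTER A",
    "0440 ; 0070 ; x  # CYRILLIC SMALL LETTER ER -> LATIN SMALL LETTER P",
]
table = parse_mappings(sample)
print(fold("\u0440\u0430ypal", table) == fold("paypal", table))
```

# Two strings would be treated as visually confusable under this sketch when their folded forms
# compare equal, as in the final line above. The folded form is meant only for such comparisons.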