Unicode 16.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 16.0.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Changes in CHARMAP and WIDTH:
Total added characters in newly generated CHARMAP: 5185
Total removed characters in newly generated WIDTH: 1
Total added characters in newly generated WIDTH: 170
The removed character from WIDTH is U+1171E AHOM CONSONANT SIGN MEDIAL RA.
It changed like this:
UnicodeData.txt 15.1.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mn;0;NSM;;;;;N;;;;;
UnicodeData.txt 16.0.0: 1171E;AHOM CONSONANT SIGN MEDIAL RA;Mc;0;L;;;;;N;;;;;
EastAsianWidth.txt 15.1.0: 1171D..1171F ; N # Mn [3] AHOM CONSONANT SIGN MEDIAL LA..AHOM CONSONANT SIGN MEDIAL LIGATING RA
EastAsianWidth.txt 16.0.0: 1171E ; N # Mc AHOM CONSONANT SIGN MEDIAL RA
I.e it changed from Mn (Mark Nonspacing) to Mc (Mark Spacing
combining). So it should now have width 1 instead of 0, therefore it
is OK that it was removed from WIDTH, characters not in WIDTH get
width 1 by default.
Nothing suspicious when browsing the list of the 170 added characters.
Changes in ctype:
alpha: Added 4452 characters in new ctype which were not in old ctype
combining: Added 51 characters in new ctype which were not in old ctype
combining_level3: Added 43 characters in new ctype which were not in old ctype
graph: Added 5185 characters in new ctype which were not in old ctype
lower: Added 25 characters in new ctype which were not in old ctype
print: Added 5185 characters in new ctype which were not in old ctype
punct: Missing 33 characters of old ctype in new ctype
punct: Added 766 characters in new ctype which were not in old ctype
tolower: Added 27 characters in new ctype which were not in old ctype
totitle: Added 27 characters in new ctype which were not in old ctype
toupper: Added 27 characters in new ctype which were not in old ctype
upper: Added 27 characters in new ctype which were not in old ctype
Nothing suspicous in the additions.
About the 33 characters removed from `punct`:
U+0363 - U+036F are identical in UnicodeData.txt. Difference in DerivedCoreProperties.txt:
DerivedCoreProperties.txt 15.1.0: not there.
DerivedCoreProperties.txt 16.0.0: 0363..036F ; Alphabetic # Mn [13] COMBINING LATIN SMALL LETTER A..COMBINING LATIN SMALL LETTER X
So that’s the reason why they are added to `alpha` and removed from `punct`.
Same for U+1DD3 - U+1DE6, they are identical in UnicodeData.txt but there is a difference in DerivedCoreProperties.txt:
DerivedCoreProperties.txt 15.1.0: 1DE7..1DF4 ; Alphabetic # Mn [14] COMBINING LATIN SMALL LETTER ALPHA..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
DerivedCoreProperties.txt 16.0.0: 1DD3..1DF4 ; Alphabetic # Mn [34] COMBINING LATIN SMALL LETTER FLATTENED OPEN A ABOVE..COMBINING LATIN SMALL LETTER U WITH DIAERESIS
So they became `Alphabetic` and were thus added to `alpha` and removed from `punct`.
Resolves: BZ #32168
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Resolves: BZ # 31239
The correct abbreviated month names were apparently given in the comment above `abmon`.
But the value of `abmon` was apparently just copied from the value of `mon` and this
mistake was hard to see because code point notation <Uxxxx> was used. After converting
to UTF-8 it was obvious that there was apparently a copy and paste mistake.
Resolves: BZ # 31221
glibc can recognise its code, but does not have its data.
This patch remedies that.
Signed-off-by: Janet Blackquill <uhhadd@gmail.com>