Compiled regular expression patterns use several shared static
UnicodeSets. This change simplifies the creation and use of these
sets.
- Pointer fields to the static sets are removed from the compiled patterns,
and the static variables are accessed directly. The deleted pointers
were a hold-over from earlier code that did not use shared statics.
- The UnicodeSet pattern literals are changed from hex constants to
u"string literals".
- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
to the number of UnicodeSets actually required. Doing this required
a change to regexcst.pl to export the required size. Changing and
rerunning this perl code resulted in massive but benign changes to
the generated file regexcst.h, the result of perl having changed its
order of enumeration of hashes since the file was last regenerated.
- UnicodeSets are frozen when possible, which should result in faster matching.
DateTimePatternGenerator needs to consider the hour cycle preferred by
the Locale. This means that we need to override the hour cycle when a
locale contains the "hc" keyword. This patch adds that functionality.
In addition, "DateTimePatternGenerator::adjustFieldTypes" should adjust
the hour field to properly follow the tr35 spec
(https://www.unicode.org/reports/tr35/tr35-dates.html#dfst-hour).
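As a rough illustration of the hour-cycle override described above, the following standalone sketch maps an "hc" value to the hour pattern letter that tr35 associates with it and rewrites the hour field of a pattern. Both helper names are hypothetical and are not part of the ICU API, and the sketch ignores details such as adding or removing the am/pm field.

```cpp
#include <string>

// Hypothetical helper (not ICU API): map an "hc" keyword value to the
// tr35 hour pattern letter. Returns '\0' for an unrecognized value.
char hourLetterForHourCycle(const std::string& hc) {
    if (hc == "h11") return 'K';  // hours 0-11
    if (hc == "h12") return 'h';  // hours 1-12
    if (hc == "h23") return 'H';  // hours 0-23
    if (hc == "h24") return 'k';  // hours 1-24
    return '\0';
}

// Hypothetical helper (not ICU API): replace every hour letter that sits
// outside a quoted literal with the requested one. A '\0' letter leaves
// the pattern unchanged.
std::string overrideHourLetter(const std::string& pattern, char hourLetter) {
    std::string out;
    out.reserve(pattern.size());
    bool inQuote = false;
    for (char c : pattern) {
        if (c == '\'') {
            inQuote = !inQuote;  // an escaped '' toggles twice, a net no-op
        } else if (!inQuote && hourLetter != '\0' &&
                   (c == 'h' || c == 'H' || c == 'k' || c == 'K')) {
            c = hourLetter;
        }
        out.push_back(c);
    }
    return out;
}
```

For example, `overrideHourLetter("h:mm a", hourLetterForHourCycle("h23"))` yields `"H:mm a"`; a real implementation would also have to decide what to do with the now redundant `a` field.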
- Merge the look-ahead result slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slots. This will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
rule matches, the engine is stopped immediately, but the state table was being constructed
as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of the fAccepting and fLookAhead fields of a state table row
when removing duplicate states. The look-ahead slot number was being misinterpreted as a state number.
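The first two items above can be pictured with a small standalone sketch. It assumes a simplified model in which each look-ahead rule is identified only by its accepting state, and it numbers slots from 0; the struct, the function name, and the starting number are all assumptions for illustration, not the actual rule builder code.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified view of a look-ahead rule.
struct LookAheadRule {
    int ruleId;
    int acceptingState;
};

// Assign one result slot per distinct accepting state, numbering the slots
// 0, 1, 2, ... in order of first appearance. Rules that share an accepting
// state share a slot. Returns the slot for each rule, in input order.
std::vector<int> assignLookAheadSlots(const std::vector<LookAheadRule>& rules) {
    std::map<int, int> slotForState;
    std::vector<int> slots;
    slots.reserve(rules.size());
    for (const LookAheadRule& r : rules) {
        auto it = slotForState.find(r.acceptingState);
        if (it == slotForState.end()) {
            int next = static_cast<int>(slotForState.size());
            it = slotForState.emplace(r.acceptingState, next).first;
        }
        slots.push_back(it->second);
    }
    return slots;
}
```

Because the slot numbers are dense, the runtime side could keep its results in a plain array indexed by slot number rather than in a map.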
See PR #915, see changes.txt
- Unicode 13 beta data as of 2019-nov-21
- uprops.icu format version 7.7 with more bits for Script/Script_Extensions
- more bits in spoof checker ScriptSet
- root line break rules adjusted for UAX 14 changes, from Andy
- line break tailorings not yet in sync with root
The purpose of the FIELD_NAME_STR() macro is to create a string literal
by using the # preprocessing operator and then skip the first 5 chars of
this string by using the +5 pointer arithmetic. This is all intentional,
but if the parentheses are misplaced the compiler might think that this
is a mistake, a failed string concatenation (-Wstring-plus-int).
This enables "classic" desktop builds of ICU4C for both ARM (32-bit)
and ARM64 (64-bit) on Windows.
All but the two samples "cal" and "date" in the "allinone" project now
have ARM and ARM64 project configurations, and build for Windows Desktop
ARM/ARM64.
Note: In order to build the ARM/ARM64 data DLL, you need to first build
x64/Release, as the ARM/ARM64 build uses the x64 bits in order to be able
to cross-compile for ARM/ARM64. This allows for completely building
ARM/ARM64 binaries using only x64 hardware.
The ARM/ARM64 builds require using a newer version of the Windows SDK
than 8.1, so they have a separate WindowsTargetPlatformVersion which
uses Windows 10 SDK version 10.0.16299.0 (aka RS3), which is the first
version of the Windows SDK to support building ARM64 desktop applications.
In addition, this greatly cleans up the ICU4C ".vcxproj" files in
order to remove redundant parts, fix inconsistencies, and make them more
readable. It introduces two new variables in the shared `*.props`
files, `IcuBinOutputDir` and `IcuLibOutputDir`, to further
reduce the number of duplicated lines in the individual ".vcxproj"
files themselves.