CharString, when asked, appends U_FILE_SEP_CHAR at the end of the string
it holds, if it won't find U_FILE_SEP_CHAR or U_FILE_ALT_SEP_CHAR there.
The problem starts if the dir variable uses
U_FILE_ALT_SEP_CHAR which is not equal to U_FILE_SEP_CHAR. Then the
resulting path could look like this
../data\
instead of this
../data/
This patch uses U_FILE_SEP_CHAR unless it detects that the dir variable
doesn't use it, and uses U_FILE_ALT_SEP_CHAR instead.
Fix a clang compiler warning and a potential undefined behavior arising
from casting an out-of-range double to an int. See the Jira ticket for a
more detailed description of the problem.
This PR is to fix the immediate problem. Longer term, the function
may be replaced entirely - see issue ICU-21147.
If udata_create won't find U_FILE_SEP_CHAR at the end of a dir variable,
then it appends it. The problem starts if the dir variable uses
U_FILE_ALT_SEP_CHAR which is not equal to U_FILE_SEP_CHAR. Then the
resulting path could look like this
../data\mappings/cns-11643-1992.ucm
instead of this
../data/mappings/cns-11643-1992.ucm
This patch uses U_FILE_SEP_CHAR unless it detects that the dir variable
doesn't use it, and uses U_FILE_ALT_SEP_CHAR instead.
- Produce new supplementalData.txt and units.txt with:
ant -f build-icu-data.xml -DoutDir=/tmp/new_dir \
-DcldrVersion=37 -DoutputTypes=UNITS,SUPPLEMENTAL_DATA
For identifying text that needs to be handled by a word dictionary for Break Iteration,
change from using a bit in the character category to sorting all dictionary categories
together, and recording the boundary between the non-dictionary and dictionary ranges.
This is internal to the implementaion. It does not affect behavior.
It does increase the number of character categories that can be handled using a
compact 8 bit Trie, from 127 to 255.
The result of pointer end + 1 will not be used if end is nullptr so it
doesn't really matter that the result of this operation is undefined,
but it's therefore also unnecessary to perform the operation at all.
Changing this removes this unnecessary operation and by doing so gives
the undefined behaviour sanitizer one thing less to worry about.
- Still allows "1234" or "cldrbug:1234" format ticket IDs
- However, docs recommend "ICU-1234" or "CLDR-1234" format
in the future.
- Other ticket IDs could be used, but won't be linkified.
- Check non-lenient rules before call lenint parsing
- Remove logKnownIssue 9503 from test code
- Adjust TestAllLocales test on ICU4C
- Add lenient checks on ICU4J
This eliminates the need for the fixed size scratch buffer inside of
locale_set_default_internal() and also eliminates the need for counting
bytes, something that ByteSink and CharString now will handle correctly,
when needed.
None of this should have any externally visible effect (apart from
removing the arbitrary size limit imposed by the fixed size scratch
buffer), it's all about cleaning up implementation internals.
Intel Control-flow Enforcement Technology (CET):
https://software.intel.com/en-us/articles/intel-sdm
contains shadow stack (SHSTK) and indirect branch tracking (IBT). When
CET is enabled, ELF object files must be marked with .note.gnu.property
section. GCC provides <cet.h> which can be included in assembly codes
to generate CET maker when compiling with -fcf-protection.
Two issues here:
- fix 2 build issue in i18n when compiling with clang++ -fsanitize=undefined
the following two symbols were not exported (and they should be):
typeinfo for icu::CollationCacheEntry
typeinfo for icu::numparse::impl::CodePointMatcher
- remove undefined behavior warning in NumberFormatTestTuple.. minor, but very annoying
when repeated many times during every test run. Tends to mask real errors.
> numberformattesttuple.cpp:319:5: runtime error: member access within null pointer of type 'NumberFormatTestTuple'
If you call the API getDefaultHourCycle on an empty DateTimePatternGenerator
instance (ie: no locale) then it calls UPRV_UNREACHABLE which calls abort().
We should return an error code instead of aborting.
since the move of the DLL to bin/ the library names in .pc files is
wrong. With ICU 65.1, icu-uc.pc contains
Libs: -L${libdir} -licuuc65 -licudt65
the version number should not appear. Indeed, the linker looks for the
libraries in $prefix/lib in the following order (see [1]):
libxxx.dll.a
xxx.dll.a
libxxx.a
cygxxx.dll
libxxx.dll
xxx.dll
As the is only the import library with no versioning (which is normal),
the is a link error when using ICU pc files.
[1] https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/win32.html
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.
The old implementation had fallen behind the Unicode UAX-29 specification for
graphem clusters, and could not be easily updated.
The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.
Also note that this change adds a new dependency on Break Iteration. Regex
patterns that previously would work with ICU builds that were configured with
no break iteration will now fail. But only if they include \X for matching
grapheme cluster boundaries.
This change builds on Vincent Torri's changes.
This installs the ICU DLL files in $prefix/bin instead of $prefix/lib.
Note: In order to disable this change in behavior you can edit
the "mh-mingw*" file(s). If you set the variable MINGW_MOVEDLLSTOBINDIR
to NO instead of YES, then it will retain the previous behavior of
installing the DLLs into the bin folder.
Adds `ICU_TIMEZONE_FILES_DIR_PREFIX_ENV_VAR`, similar to
`ICU_DATA_DIR_PREFIX_ENV_VAR`, that specifies an environment variable
to retrieve and prepend to the ICU time zone data file path.
In regular expressions, when testing for word boundaries with \b, the
boundaries were incorrect when in Unicode mode, meaning that an ICU word break
iterator is being used to find the boundaries, and the text being matched is
UTF-8 encoded.
The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
Compiled regular expression patterns make use of several shared common
UnicodeSets. This change simplifies the creation and use of these
static UnicodeSets.
- Pointer fields to the static sets are removed from the compiled patterns,
and the static variables are accessed directly. The deleted pointers
were a hold-over from earlier code that did not use shared statics.
- The UnicodeSet pattern literals are changed from hex constants to
u"string literals".
- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
to the number of UnicodeSets actually required. Doing this required
a change to regexcst.pl to export the required size. Changing and
rerunning this perl code resulted in massive but benign changes to
the generated file regexcst.h, the result of perl having changed its
order of enumeration of hashes since the file was last regenerated.
- UnicodeSets are frozen when possible. Should result in faster matching.
The definition of max_align_t is not guaranteed to be available unless
the appropriate header is included. Since use of <stddef.h> from C++ is
deprecated, that's <cstddef>, and max_align_t is thus defined under the
std namespace rather than in the global namespace.
DateTimePatternGenerator needs to consider the hour-cycle preferred by
Locale. This means that we need to to override the hour-cycle when a
locale contains "hc" keyword. This patch is adding such functionality.
In addition, "DateTimePatternGenerator::adjustFieldTypes" should adjust
hour field to properly follow tr35
spec(https://www.unicode.org/reports/tr35/tr35-dates.html#dfst-hour).
This would cause failures during cross compilation cases such as:
make[6]: Leaving directory '/spksrc/spk/bazarr/work-qoriq-6.1/icu/source/data'
make[5]: *** No rule to make target 'out', needed by 'out/icudt64b.dat'. Stop.