Commit Graph

7976 Commits

Author SHA1 Message Date
Frank Tang
0eed48038b ICU-20725 Fix stack overflow of u_unescapeAt
See #1207
2020-08-10 14:59:38 -07:00
Hugo van der Merwe
0b815fb8c3 ICU-21059 Load simple unit IDs from convertUnits.
PR: https://github.com/icu-units/icu/pull/41
Commit: 7877f0409019827b2d8d43b0843656322181972b
2020-08-05 10:57:19 +02:00
Hugo van der Merwe
6b595d1c01 ICU-21076 Delete unneeded MeasureUnit data & code 2020-08-05 01:48:32 +02:00
Frank Tang
863582c2a4 ICU-20465 Calendar/DateFormat listen to tz extension
See #1176
2020-08-04 13:33:03 -07:00
Frank Tang
8ca80c4b6d ICU-21158 Fix doc of UDISPCTX_NO_SUBSTITUTE
See #1200
2020-07-31 18:39:46 -07:00
Frank Tang
7ddc231195 ICU-20734 Improve fuzzer_driver
See #1204
2020-07-31 15:30:03 -07:00
Frank Tang
41d1d57af0 ICU-21122 Fix flaky TestAdoptCalendarLeak 2020-07-29 20:39:55 -07:00
Frank Tang
d7ec310436 ICU-20684 Fix uninitialized in isMatchAtCPBoundary
Downstream bug https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=15505
Fix Fuzzer-detected Use-of-uninitialized-value in isMatchAtCPBoundary

To show the bug with the new test case, configure and build with
CFLAGS="-fsanitize=memory" CXXFLAGS="-fsanitize=memory" ./runConfigureICU \
  --enable-debug --disable-release  Linux  --disable-layoutex

Test with
cintltst /tsutil/custrtst
2020-07-29 14:21:53 -07:00
Andy Heninger
895aff3bff ICU-21178 Add check for corrupt rbbitst.txt data.
In the test data from rbbitst.txt, two or more adjacent boundary markers with
no intervening test data were accepted, with no indication of a problem.

This situation occurred, as described in bug ICU-21178, with a bad import of
some test cases from CLDR. PR #1194 corrected the problem with the test data
in ICU4C. This PR adds code to flag this situation in the test data, and
also propagates the data fix to ICU4J's copy of rbbitst.txt.
2020-07-24 15:16:12 -07:00
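A minimal sketch of the kind of check described in ICU-21178, flagging two boundary markers with no test data between them. It assumes the marker is the bullet character U+2022 encoded as UTF-8 and treats every other byte as test data. The function name hasAdjacentBoundaryMarkers is hypothetical, and this is not the actual ICU test code; whitespace rules and the rule-status tags of the real rbbitst.txt format are ignored.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not ICU's code): returns true if two boundary markers
// appear back to back with no intervening test data.
// Simplification: every byte that is not part of a marker counts as data.
bool hasAdjacentBoundaryMarkers(const std::string& data,
                                const std::string& marker = "\xE2\x80\xA2") {  // U+2022, assumed marker
    bool prevWasMarker = false;
    for (std::size_t i = 0; i < data.size();) {
        if (data.compare(i, marker.size(), marker) == 0) {
            if (prevWasMarker) {
                return true;  // marker immediately follows a marker
            }
            prevWasMarker = true;
            i += marker.size();
        } else {
            prevWasMarker = false;  // some test data separates the markers
            ++i;
        }
    }
    return false;
}
```

A check like this lets a bad data import fail loudly instead of being silently accepted.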
Frank Tang
0d4b1c1cb9 ICU-21160 Fix the length returned by preflight
See #1178
2020-07-21 18:05:20 -07:00
Andy Heninger
003b431540 ICU-13590 RBBI, improve handling of concurrent look-ahead rules.
Change the mapping from rule number to boundary position to use a simple array
instead of a linear search lookup map.

Look-ahead rules have a preceding context, a boundary position, and following context.
In the implementation, when the preceding context matches, the potential boundary
position is saved. Then, if the following context proves to match, the saved boundary is
returned as an actual boundary.

Look-ahead rules are numbered, and the implementation maintains a map from
rule number to the tentative saved boundary position.

In an earlier improvement to the rule builder, the rule numbering was changed from the
original sparse numbering to a contiguous sequence, in anticipation of changing the
mapping from number to position to use a simple array.
2020-07-21 14:39:15 -07:00
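A minimal sketch of the data-structure change described in ICU-13590: with contiguous rule numbers, the tentative look-ahead boundaries can live in a plain array indexed by rule number, giving O(1) access instead of a linear search. The class LookAheadResults and its methods are hypothetical illustrations, not ICU's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch (not ICU's class): saved look-ahead boundary positions,
// indexed directly by a contiguous look-ahead rule number.
class LookAheadResults {
public:
    explicit LookAheadResults(int32_t numRules)
        : positions_(static_cast<std::size_t>(numRules) + 1, -1) {}

    // Preceding context matched: save the potential boundary for this rule.
    void setPosition(int32_t rule, int32_t position) {
        assert(rule >= 0 && static_cast<std::size_t>(rule) < positions_.size());
        positions_[static_cast<std::size_t>(rule)] = position;
    }

    // Following context matched: return the saved boundary (-1 if none saved).
    int32_t getPosition(int32_t rule) const {
        assert(rule >= 0 && static_cast<std::size_t>(rule) < positions_.size());
        return positions_[static_cast<std::size_t>(rule)];
    }

private:
    std::vector<int32_t> positions_;  // index = rule number, direct O(1) lookup
};
```

The array only stays small because the earlier builder change made the numbering contiguous; with sparse rule numbers it would waste space or need a map.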
Ramon
2de2585f1b ICU-13339 Do not parse decimal point for integers 2020-07-20 23:52:59 -05:00
Hugo van der Merwe
e734111ee5 ICU-21192 MeasureUnit Identifier spec compliance: s/p/pow/
Specification:
https://www.unicode.org/reports/tr35/tr35-general.html#Unit_Identifiers
2020-07-16 01:58:32 +02:00
Michael Block
f917c43cf1 ICU-21178 Adding the trailing space back into two RBBI test cases. 2020-07-07 16:05:05 -07:00
John Wilcock
6fe86f3934 ICU-21173 Add support for more currency variants. ICU4C equivalent of…
See #1184
2020-07-03 04:51:15 +02:00
John Wilcock
9219c6ae03 ICU-13733 Added test for mismatching currency format for strict-mode parsing
See #1169
2020-06-30 02:22:57 +02:00
Markus Scherer
ef12882fdb ICU-21144 LocaleMatcher setMaxDistance(), isMatch() 2020-06-23 13:56:49 -07:00
Hugo van der Merwe
6a1df9e16c ICU-21169 Add SingleUnitImpl::getSimpleUnitID().
Also:
- Use BytesTrie not UCharsTrie.
- Add a nullptr check for a uprv_malloc.
2020-06-18 09:27:03 +02:00
Andy Heninger
1eef362329 ICU-13565 Break Iteration, remove the dictionary bit from the implementation.
For identifying text that needs to be handled by a word dictionary for Break Iteration,
change from using a bit in the character category to sorting all dictionary categories
together, and recording the boundary between the non-dictionary and dictionary ranges.

This is internal to the implementation. It does not affect behavior.
It does increase the number of character categories that can be handled using a
compact 8-bit Trie, from 127 to 255.
2020-06-17 12:00:14 -07:00
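A minimal sketch of the idea in ICU-13565: if all dictionary categories are numbered after all non-dictionary categories, a single recorded boundary answers "does this category need a dictionary?" with one comparison, and no flag bit has to be reserved in the category value. The struct CategoryInfo and its field name are hypothetical, not ICU's actual code.

```cpp
#include <cstdint>

// Hypothetical sketch: categories are sorted so that every dictionary
// category number is >= dictCategoriesStart and every other one is below it.
struct CategoryInfo {
    uint8_t dictCategoriesStart;  // first category handled by a word dictionary

    // One comparison replaces testing a dictionary bit in the category value,
    // so all 8 bits of a compact trie value remain available for the category.
    constexpr bool isDictionaryCategory(uint8_t category) const {
        return category >= dictCategoriesStart;
    }
};
```

Because the check is a range test rather than a bit test, the categories themselves must be renumbered when the rules are built, which is the sorting step the commit describes.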
Frank Tang
e7bd5b1cef ICU-21109 minimum grouping digits in DecimalFormat
See #1152
2020-06-11 14:32:52 -07:00
Andy Heninger
f0ad454691 ICU-13565 RBBI, make all state table row data be unsigned. 2020-06-01 20:05:17 -07:00
Shane F. Carr
3ff6627ce6 ICU-21134 Copy additional data when toNumberFormatter is used
See #1156
2020-05-28 22:33:58 -05:00
Frank Tang
ec7e29f2b6 ICU-13786 Fix addLikelySubtags/minimizeSubtags
See #1140
2020-05-27 18:36:36 -07:00
Frank Tang
c5ebb80a73 ICU-13565 Reduce size of BreakIterator brk files
See #1100
2020-05-27 14:26:10 -07:00
Steven R. Loomis
4231ca5be0 ICU-21098 fix ticket URLs for logKnownIssue tickets.
- Still allows "1234" or "cldrbug:1234" format ticket IDs
- However, docs recommend "ICU-1234" or "CLDR-1234" format
  in the future.
- Other ticket IDs could be used, but won't be linkified.
2020-05-20 15:58:51 -07:00
Markus Scherer
eaee0b175e ICU-21029 LocaleMatcher: add option to turn off default locale 2020-05-20 15:16:28 -07:00
Peter Edberg
6fdd303532 ICU-21096 adjust logKnownIssues for ICU rbbitst 2020-05-06 17:29:49 -07:00
Peter Edberg
d39899350d ICU-21099 udat_toCalendarDateField should handle all UDateFormatFields and out of range 2020-04-28 09:58:50 -07:00
Robert Melo
440cef61a7 ICU-21071 Fix lenient parse rules
- Check non-lenient rules before calling lenient parsing
- Remove logKnownIssue 9503 from test code
- Adjust TestAllLocales test on ICU4C
- Add lenient checks on ICU4J
2020-04-24 15:46:48 -03:00
Shane F. Carr
a5c940dfd8 ICU-21087 Merge maint/maint-67 to master 2020-04-22 20:15:39 -05:00
Frank Tang
f0ada59042 ICU-20949 Fix compound unit in "ar", "ne" locales
Do not assume the "one" pattern always contains "{0}"
2020-04-22 10:39:01 -07:00
Elango Cheran
925376a904 ICU-21055 Remove test inputs causing noknownissues test run to hang 2020-04-21 12:49:30 -07:00
Hugo van der Merwe
e03fa70541 ICU-21060 Fix behaviour of -per-, -and-, and dimensionless units. 2020-04-18 00:57:02 -05:00
Frank Tang
a91a97c0c3 ICU-21069 Fix ucptrie_swap pointer logic
See #1102
2020-04-15 14:39:08 -07:00
Markus Scherer
b9d1ba87f5 ICU-20936 copy the new direction field 2020-04-14 15:12:43 -07:00
Elango Cheran
3fb3929f80 ICU-21040 Fix segfaults in no data tests 2020-04-10 13:56:10 -07:00
Hugo van der Merwe
cb544f47e0 ICU-21060 Fix heap-use-after-free bug. 2020-04-07 12:40:39 -05:00
Hugo van der Merwe
99f9802fec ICU-21060 Fix the foo-per-a-b -> foo-b-per-a bug. 2020-04-06 18:46:51 -05:00
Shane F. Carr
94c2c578a9 ICU-20979 Update TODOs in formatting code to point to open issues.
Also see: ICU-20920 ICU-21059 ICU-20429 ICU-21058
2020-04-03 01:57:33 -05:00
Shane F. Carr
3b0772fff9 ICU-21015 Fixing gcc compiler warnings 2020-04-03 01:56:07 -05:00
Peter Edberg
c5cabf1953 ICU-21022 Update logKnownIssue to refer to ticket for fixing in a future release 2020-04-01 15:30:37 -07:00
Jeff Genovy
822eb4e622 ICU-20979 Fixing minor MSVC warnings 2020-04-01 14:31:28 -07:00
Shane F. Carr
bda3a3e68c ICU-13724 Removing obsolete numberformat2test.cpp 2020-03-31 15:02:24 -05:00
Shane F. Carr
ac4540f8a4 ICU-20418 Number skeletons: implement star wildcard; user guide fixes
See #1060
2020-03-26 00:15:03 -05:00
Shane F. Carr
b03feb6338 ICU-20920 Changing "Sequence" to "Mixed" in ICU4C MeasureUnit 2020-03-25 16:13:39 -05:00
Peter Edberg
1084c1430a ICU-21022 Use logKnownIssue to avoid TestDateFormatRoundTrip exhaustive fail 2020-03-25 10:50:42 -07:00
Shane F. Carr
b186f2cff6 ICU-20912 Make C/J Currency consistent on lowercase/uppercase currency equality
- Adds additional tests for Currency equality behavior
2020-03-25 12:21:34 -05:00
Peter Edberg
cfef59f0b8 ICU-13790 Add data tests to verify certain chars present in pinyin, stroke 2020-03-25 08:33:49 -07:00
Campion
b525045209 ICU-10858 Fix missing fTimeZoneFormat assignment in SimpleDateFormat::operator= (#963) 2020-03-24 20:04:35 -07:00
Shane F. Carr
6edd38f35f ICU-20806 Removing obsolete number formatting methods.
See #1034
2020-03-24 15:21:32 -05:00
Shane F. Carr
fc6612cc56 ICU-20920 Add support for CLDR 37 unit identifiers in ICU4C
See #964
2020-03-24 14:15:19 -05:00
Shane Carr
1e24bcd721 ICU-20956 Fix monetary symbol getters in DecimalFormat
See #987
2020-03-23 20:12:14 -05:00
Frank Tang
f6622ab2f1 ICU-21016 Special handling of Spanish and Hebrew list format until CLDR gets the data
See #1043
2020-03-19 19:36:15 -07:00
Steven R. Loomis
cb8e278ee6 ICU-20797 fix UBS compilation error and UBS in test code
Two issues here:

- fix 2 build issues in i18n when compiling with clang++ -fsanitize=undefined;
the following two symbols were not exported (and they should be):
  typeinfo for icu::CollationCacheEntry
  typeinfo for icu::numparse::impl::CodePointMatcher

- remove undefined behavior warning in NumberFormatTestTuple. Minor, but very annoying
when repeated many times during every test run. Tends to mask real errors.

> numberformattesttuple.cpp:319:5: runtime error: member access within null pointer of type 'NumberFormatTestTuple'
2020-03-17 09:11:58 -07:00
Markus Scherer
524748c6bf ICU-20984 StringPiece & ByteSink overloads for char8_t* 2020-03-16 10:49:21 -07:00
Peter Edberg
69b3523593 ICU-20987 integrate CLDR release-37-alpha3 to master, adjust MeasureUnitTest.java tools 2020-03-13 12:01:39 -07:00
Shane F. Carr
2d83fc2278 ICU-20809 Remove FieldPositionIterator from listformatter.h 2020-03-11 21:13:45 -05:00
Peter Edberg
d6eabe4155 ICU-20879 fix typo in tests, calender → calendar 2020-03-10 23:06:21 -07:00
Markus Scherer
72cd937620 ICU-20936 add LocaleMatcher.Builder.setDirection(with-one-way vs. only-two-way) 2020-03-10 08:22:28 -07:00
Markus Scherer
d2ea4513dc ICU-20700 reimplement acceptLanguage() using the LocaleMatcher; replace older accept-language-string parsing by LocalePriorityList 2020-03-08 08:01:31 -07:00
Markus Scherer
3edff03393 ICU-20915 LocaleMatcher no match: always getSupportedIndex()=-1; remove defaultLocaleIndex field; constructor check if locales are equivalent to default, not just equal; simplify locale sorting; minor builder & test deflaking 2020-03-08 07:54:46 -07:00
Frank Tang
94c9ff2089 ICU-20991 Trace BreakIterator/BreakEngine creation
See #1014
2020-03-06 14:18:43 -08:00
Shane F. Carr
01523b4da6 ICU-20974 Fix exhaustive test failures 2020-03-06 01:34:51 -08:00
Jeff Genovy
7302079653 ICU-21000 Fix abort called by DateTimePatternGenerator::getDefaultHourCycle
If you call the API getDefaultHourCycle on an empty DateTimePatternGenerator
instance (i.e., no locale), then it calls UPRV_UNREACHABLE, which calls abort().
We should return an error code instead of aborting.
2020-03-05 18:19:04 -08:00
Jeff Genovy
ce7e060d50 ICU-21001 Fixing problems found by running valgrind.
This makes fixes in order to run the icu4c tests (intltest, cintltst,
iotest, and icuinfo) cleanly under valgrind with --leak-check=full.
2020-03-05 14:34:20 -08:00
Jeff Genovy
bd08ba2c5b ICU-21004 Fix buffer over-read in ucal_open
The issue shows under valgrind or as an Address Sanitizer failure.
2020-03-05 14:09:34 -08:00
Shane Carr
0b7f6b1864 ICU-20974 Correctly handle extreme values of double. 2020-03-05 13:40:59 -08:00
Frank Tang
be3ee4cc63 ICU-20967 add millisecond to DateIntervalFormat
See #978
2020-03-05 10:55:19 -08:00
Shane Carr
e572de5516 ICU-20961 Return correct currency plural pattern from DecimalFormat 2020-03-04 19:43:57 -08:00
Peter Edberg
63e480dedc ICU-20987 integrate CLDR release-37-alpha1 to master (using new tooling) 2020-02-27 16:27:50 -08:00
Shane Carr
bb1f00efb8 ICU-20919 Merge branch 'maint/maint-66' into maint-66-merge
Conflicts:
	icu4j/main/shared/data/icudata.jar
2020-02-21 18:21:05 -08:00
Laurent Stacul
3b58179396 ICU-20972 Fix invalid conversion from const char8_t* to const char* (C++20) 2020-02-20 13:09:18 -08:00
Markus Scherer
af9ef2650b ICU-20893 Unicode 13 data 2020feb19 2020-02-19 22:02:35 -08:00
Jeff Genovy
77fcded28b ICU-20969 Fix file permissions (-x) on ICU4C source files. 2020-02-19 17:00:06 -08:00
Andy Heninger
14bcaaf58e ICU-20876 Regex Grapheme Cluster matching with Break Iterators.
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.

The old implementation had fallen behind the Unicode UAX-29 specification for
grapheme clusters, and could not be easily updated.

The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.

Also note that this change adds a new dependency on Break Iteration.  Regex
patterns that previously would work with ICU builds that were configured with
no break iteration will now fail. But only if they include \X for matching
grapheme cluster boundaries.
2020-02-18 18:28:10 -08:00
Frank Tang
6ea0fc7713 ICU-20834 Implement UTS35 Locale ID Canonicalization
See #951
2020-02-11 22:44:39 -08:00
Mihai Nita
dd50e38f45 ICU-20738 Best-match pattern for 'sS' uses <appendItem> data 2020-02-10 07:59:52 -08:00
Shane Carr
9eca171a39 ICU-20954 Fix currency spacing in suffix. 2020-02-06 09:56:32 -08:00
Elango Cheran
1a9fb8ec33 ICU-13836 C++ port of adding exponent for better plurals for compact decimal format 2020-02-05 09:08:48 -08:00
Andy Heninger
d6b88d49e3 ICU-20939 Fix problem w regexp \b boundaries & UTF-8 text
In regular expressions, when testing for word boundaries with \b, the
boundaries were incorrect when in Unicode mode, meaning that an ICU word break
iterator is being used to find the boundaries, and the text being matched is
UTF-8 encoded.

The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
2020-02-03 16:51:17 -08:00
Frank Tang
b7d08bc04a ICU-20958 Prevent SEGV_MAPERR in append
See #971
2020-02-03 13:22:30 -08:00
Andy Heninger
54a60fe6f4 ICU-11548 Improve regex static UnicodeSets handling
Compiled regular expression patterns make use of several shared common
UnicodeSets. This change simplifies the creation and use of these
static UnicodeSets.

- Pointer fields to the static sets are removed from the compiled patterns,
  and the static variables are accessed directly. The deleted pointers
  were a hold-over from earlier code that did not use shared statics.

- The UnicodeSet pattern literals are changed from hex constants to
  u"string literals".

- The size of fRuleSets (from regexst.h) is changed from a hard-coded 10
  to the number of UnicodeSets actually required. Doing this required
  a change to regexcst.pl to export the required size. Changing and
  rerunning this perl code resulted in massive but benign changes to
  the generated file regexcst.h, the result of perl having changed its
  order of enumeration of hashes since the file was last regenerated.

- UnicodeSets are frozen when possible. Should result in faster matching.
2020-01-30 15:13:07 -08:00
Frank Tang
7a5139ad95 ICU-20934 Fix TZ test error
These tests now fail on trunk.
Per https://mm.icann.org/pipermail/tz-announce/2019-July/000056.html
     Brazil has canceled DST and will stay on standard time indefinitely.

Cherry-picked from: 11ad8d69fb
2020-01-20 14:58:55 +01:00
Shane Carr
8c717b514e ICU-20665 Removing number-dependence from ICU4C FormattedStringBuilder fields.
See #727
2020-01-17 11:22:02 +01:00
Frank Yung-Fong Tang
21df05234d ICU-20673 Allow built-in translit ID w/o data.
See #958
2020-01-16 21:28:01 -08:00
Shane Carr
fe98d870b2 ICU-20418 Adding concise number skeletons in ICU4C 2020-01-14 11:52:27 +01:00
Shane Carr
b24538eb05 ICU-20921 Adding find and compare to StringPiece 2020-01-14 11:52:27 +01:00
Caio Lima
09d409f5f4 ICU-20442 Adding support for hour-cycle on DateTimePatternGenerator
DateTimePatternGenerator needs to consider the hour-cycle preferred by the
locale. This means that we need to override the hour-cycle when a
locale contains the "hc" keyword. This patch adds that functionality.
In addition, "DateTimePatternGenerator::adjustFieldTypes" should adjust the
hour field to properly follow the TR35
spec (https://www.unicode.org/reports/tr35/tr35-dates.html#dfst-hour).
2020-01-09 16:45:56 +01:00
Frank Tang
11ad8d69fb ICU-20934 Fix TZ test error
These tests now fail on trunk.
Per https://mm.icann.org/pipermail/tz-announce/2019-July/000056.html
     Brazil has canceled DST and will stay on standard time indefinitely.
2020-01-03 20:52:11 -08:00
Frank Tang
4a8483be91 ICU-20900 Fix createCanonical
See #922
2020-01-03 15:00:04 -08:00
Markus Scherer
60b567d6ab ICU-20917 LocaleMatcher: prefer a more-default locale 2020-01-02 18:00:52 -08:00
Frank Tang
79fac50101 ICU-20310 omit "-true" in toLanguageTag
See #952
2019-12-30 15:39:59 -08:00
Markus Scherer
ad638c274e ICU-20916 LocaleMatcher distinguish between equivalent locales
- equivalent but originally unequal
- locale distance shifted left for additional fraction bits with micro distance
- Java more verbose matcher debug output
See #949
2019-12-20 09:36:57 -08:00
Shane Carr
46ec4fd523 ICU-12863 Add list style APIs to C and C++
See #894
2019-12-17 13:07:36 -08:00
Andy Heninger
faa2f9f9e1 ICU-20303 Break Iterator, improve handling of look-ahead rules.
- Merge the look-ahead results slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slots. This will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
  rule matches, the engine is stopped immediately, but the state table was being constructed
  as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of fAccepting and fLookAhead fields of a state table row
  when removing duplicate states. Look-ahead slot number was being misinterpreted as a state number.
2019-12-13 13:17:21 -08:00
Shane Carr
7917df1e80 ICU-20883 Move UFormattedDateInterval to end of argument list. 2019-12-12 13:48:28 -08:00
Frank Tang
923ec1ad30 ICU-20436 Add getDefaultHourCycle to DateTimePatternGenerator
See #901
2019-12-12 00:13:37 -08:00
Shane F. Carr
39eb0f4fbf ICU-20919 Merge maint/maint-66 (release-66-preview) to master 2019-12-11 15:25:36 -08:00
Caio Lima
7c147e4e85 ICU-20741 Changing SimpleDateTimeFormat::subFormat to only include 1 field at the same position when there is a data fallback 2019-12-10 21:53:47 -08:00
Andy Heninger
197e0239ab ICU-20893 Line break tailorings updated to Unicode 13. 2019-11-26 15:25:06 -08:00