Commit Graph

904 Commits

Author SHA1 Message Date
Peter Edberg
b066f65a50 ICU-21249 integrate CLDR release-38-alpha1 to ICU trunk 2020-09-04 15:05:22 -07:00
Markus Scherer
9971c663ff ICU-21257 remove #License fragment from license URLs 2020-09-04 10:02:17 -07:00
Shane F. Carr
fcc3bcb43e ICU-21249 Update numberpermutationtest.txt 2020-09-03 00:46:59 +02:00
Peter Edberg
e618a1cc2d ICU-21249 integrate CLDR release 38 alpha0 to ICU trunk 2020-09-02 10:23:14 -07:00
Shane F. Carr
196d5e1182 ICU-20923 Fix compact notation with percent. 2020-09-01 04:36:04 -05:00
Shane F. Carr
0101e2632c ICU-20164 Make NoUnit a zero-cost abstraction over MeasureUnit.
See #1230
2020-08-29 00:01:49 -05:00
Craig Cornelius
408cd128fc ICU-21242 rephrase documentation using term master
See #1255
2020-08-28 12:42:20 -07:00
Hugo van der Merwe
66b2458a26 ICU-21066 Copy includelist of CLDR testData with tools/cldr/build.xml 2020-08-28 01:12:12 +02:00
Markus Scherer
39da689d30 ICU-21184 rephrase docs/comments using the term grandfathered 2020-08-21 14:13:03 -07:00
Michael Block
f917c43cf1 ICU-21178 Adding the trailing space back into two RBBI test cases. 2020-07-07 16:05:05 -07:00
Peter Edberg
cfef59f0b8 ICU-13790 Add data tests to verify certain chars present in pinyin, stroke 2020-03-25 08:33:49 -07:00
Peter Edberg
69b3523593 ICU-20987 integrate CLDR release-37-alpha3 to master, adjust MeasureUnitTest.java tools 2020-03-13 12:01:39 -07:00
Peter Edberg
63e480dedc ICU-20987 integrate CLDR release-37-alpha1 to master (using new tooling) 2020-02-27 16:27:50 -08:00
Shane Carr
bb1f00efb8 ICU-20919 Merge branch 'maint/maint-66' into maint-66-merge
Conflicts:
	icu4j/main/shared/data/icudata.jar
2020-02-21 18:21:05 -08:00
Markus Scherer
af9ef2650b ICU-20893 Unicode 13 data 2020feb19 2020-02-19 22:02:35 -08:00
Andy Heninger
14bcaaf58e ICU-20876 Regex Grapheme Cluster matching with Break Iterators.
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.

The old implementation had fallen behind the Unicode UAX-29 specification for
grapheme clusters, and could not be easily updated.

The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.

Also note that this change adds a new dependency on Break Iteration.  Regex
patterns that previously would work with ICU builds that were configured with
no break iteration will now fail. But only if they include \X for matching
grapheme cluster boundaries.
2020-02-18 18:28:10 -08:00
Shane Carr
9eca171a39 ICU-20954 Fix currency spacing in suffix. 2020-02-06 09:56:32 -08:00
Andy Heninger
d6b88d49e3 ICU-20939 Fix problem with regexp \b boundaries & UTF-8 text
In regular expressions, when testing for word boundaries with \b, the
boundaries were incorrect when in Unicode mode, meaning that an ICU word break
iterator is being used to find the boundaries, and the text being matched is
UTF-8 encoded.

The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
2020-02-03 16:51:17 -08:00
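
The index mismatch described in the ICU-20939 message can be illustrated with a small sketch. This is not ICU code: the helper `utf8_index_to_utf16_index` and the sample string are hypothetical, and the example only shows that the same boundary has different numeric offsets in UTF-8 and UTF-16, so converting an index that is already in the right space moves the boundary.

```python
def utf8_index_to_utf16_index(text: str, utf8_index: int) -> int:
    """Convert a UTF-8 byte offset into `text` to a UTF-16 code unit offset.

    Hypothetical helper for illustration only. It assumes `utf8_index`
    falls on a character boundary; otherwise decode() raises an error.
    """
    prefix = text.encode("utf-8")[:utf8_index].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


text = "naïve café"

# The word boundary after "naïve", measured in UTF-8 bytes ("ï" takes 2 bytes).
utf8_boundary = len("naïve".encode("utf-8"))

# The same boundary measured in UTF-16 code units ("ï" takes 1 unit).
utf16_boundary = utf8_index_to_utf16_index(text, utf8_boundary)

# The two offsets differ, so applying the UTF-16 value to UTF-8 text
# (or the reverse) points at the wrong position.
print(utf8_boundary, utf16_boundary)
```

When the text being matched is UTF-8, the boundary offset has to stay in UTF-8 units throughout, which matches the commit's statement that removing the conversion fixed the problem.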
Markus Scherer
60b567d6ab ICU-20917 LocaleMatcher: prefer a more-default locale 2020-01-02 18:00:52 -08:00
Markus Scherer
ad638c274e ICU-20916 LocaleMatcher distinguish between equivalent locales
- equivalent but originally unequal
- locale distance shifted left for additional fraction bits with micro distance
- Java more verbose matcher debug output
See #949
2019-12-20 09:36:57 -08:00
Andy Heninger
faa2f9f9e1 ICU-20303 Break Iterator, improve handling of look-ahead rules.
- Merge the look-ahead results slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slot. Will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
  rule matches, the engine is stopped immediately, but the state table was being constructed
  as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of fAccepting and fLookAhead fields of a state table row
  when removing duplicate states. Look-ahead slot number was being mis-interpreted as a state number.
2019-12-13 13:17:21 -08:00
Shane F. Carr
39eb0f4fbf ICU-20919 Merge maint/maint-66 (release-66-preview) to master 2019-12-11 15:25:36 -08:00
Andy Heninger
197e0239ab ICU-20893 Line break tailorings updated to Unicode 13. 2019-11-26 15:25:06 -08:00
Shane Carr
017c8b762e ICU-20890 Change locale_dependencies.py into LOCALE_DEPS.json files
- Refactors Python to make I/O operations more abstract
- Adds stable sample data for Python test
2019-11-22 20:23:30 -08:00
Peter Edberg
04c8616f93 ICU-20857 integrate CLDR release-36-1-preview to maint-66 2019-11-22 19:01:36 -08:00
Markus Scherer
a7e378d587 ICU-20893 Unicode 13 beta
See PR #915, see changes.txt
- Unicode 13 beta data as of 2019-nov-21
- uprops.icu format version 7.7 with more bits for Script/Script_Extensions
- more bits in spoof checker ScriptSet
- root line break rules adjusted for UAX 14 changes, from Andy
- line break tailorings not yet in sync with root
2019-11-21 17:35:53 -08:00
Shane Carr
00946cef43 ICU-20709 Moving rounder call before number properties.
- Changes EXCEPT_ZERO notation to hide sign on numbers that round to zero.
- Adds additional tests for this behavior.
2019-11-05 14:43:34 -08:00
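
The ordering described in the ICU-20709 message (round first, then decide on the sign) can be sketched as follows. This is a minimal illustration and not the ICU implementation: the function `format_except_zero`, its `fraction_digits` parameter, and the choice of half-even rounding are assumptions made for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_except_zero(value: str, fraction_digits: int = 0) -> str:
    """Show an explicit sign unless the ROUNDED value is zero.

    Hypothetical sketch: rounding happens before the sign decision,
    so a small negative input that rounds to zero is shown unsigned.
    """
    quantum = Decimal(1).scaleb(-fraction_digits)  # e.g. 0.1 for one digit
    rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)
    if rounded == 0:
        # abs() also drops the sign of a negative zero such as Decimal("-0").
        return str(abs(rounded))
    sign = "+" if rounded > 0 else "-"
    return sign + str(abs(rounded))


for sample in ["-0.4", "0.4", "1.5", "-1.2"]:
    print(sample, "->", format_except_zero(sample))
```

If the sign were chosen before rounding, an input like -0.4 would keep its minus sign and display as a signed zero, which is the case the commit says it changed.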
Peter Edberg
e25796f6e5 ICU-20801 integrate CLDR release-36-alpha2, update MeasureUnits (#809) 2019-09-06 14:07:36 -07:00
Andy Heninger
327087150f ICU-20618 Regex nested lookaround expressions, clean up active match region handling. 2019-08-19 13:31:34 -07:00
Markus Scherer
41c24b6c00 ICU-9695 port LocaleMatcher to C++ 2019-08-19 10:41:35 -07:00
Shane Carr
d983221543 ICU-20764 Allow top-level include and exclude in data filter rules. 2019-08-13 15:12:32 -07:00
Shane Carr
513b0c20b0 ICU-13743 Adding number permutation test.
Adds a test suite in C++ and Java to test many permutations of options in NumberFormatter.
2019-08-12 23:34:51 -07:00
Shane Carr
b4d41b0561 ICU-20737 Removing Python dependency on distutils.
Deletes tstfiles.mk and merges the list into BUILDRULES.py
2019-08-12 15:12:48 -07:00
Shane Carr
afab3f992c ICU-13780 Removing DecimalFormat_ICU58 (finally). 2019-08-12 14:59:45 -07:00
Andy Heninger
fa240d49cc ICU-13637 Break Iterator Rule Updates for Indic Grapheme Clusters. 2019-06-27 17:17:26 -07:00
Andy Heninger
5c23416308 ICU-13637 Documentation for doing break iterator updates. 2019-06-21 10:31:40 -07:00
Shane Carr
8667d0a106 ICU-20639 Add "mol" 3-letter language code to C++ map. 2019-06-18 13:47:27 -07:00
Shane Carr
c8c3fbca28 ICU-20616 Allow bidi marks around the sign in exponent parsing. 2019-05-27 22:39:18 -07:00
Shane Carr
702fdb6c33 ICU-20593 Renaming Python buildtool to icutools.databuilder. 2019-05-07 13:42:06 -07:00
Andy Heninger
d685cacd9b ICU-20391 Fix regexp crash with nested look-behinds, from fuzz testing. 2019-04-17 22:17:47 -07:00
Shane F. Carr
14eb026570 ICU-20511 Merge release-64-2 to master 2019-04-17 14:15:59 -07:00
yumaoka
f508bc491e ICU-20554 Disabled current date sensitive Japanese era test cases for now. 2019-04-15 09:49:04 -04:00
Shane F. Carr
be25c277fd ICU-20511 Merge release-64-2-rc to master 2019-04-12 16:57:29 -07:00
Andy Heninger
bdb1806580 ICU-20544 Regex, Fix assertion failure in positive look-behind 2019-04-12 15:27:40 -07:00
Andy Heninger
7053363323 ICU-20544 Regex, fix min/max match length computation with negative look-behind patterns. 2019-04-10 22:38:25 -07:00
Steven R. Loomis
b76cb6517e ICU-20526 fix pkgdata where LD_SONAME has a trailing space
- added PKGDATA_TRAILING_SPACE to all of the pkgdataMakefile.in files.
- NOTE: Users who create their own pkgdata.inc / icupkg.inc files may need
   to recreate this PKGDATA_TRAILING_SPACE behavior.

- used the above variable, normally undefined, in mh-* files that need a trailing space

- Also, fixed use of system() in pkgdata.cpp per ICU-20538
This was causing pkgdata to return a zero status even on clang
failure, masking this issue.

(cherry picked from commit 83a0542b5b)
2019-04-05 10:53:59 -07:00
Markus Scherer
0565894534 ICU-20497 Unicode 12.1 2019-04-04 10:23:24 -07:00
Steven R. Loomis
83a0542b5b ICU-20526 fix pkgdata where LD_SONAME has a trailing space
- added PKGDATA_TRAILING_SPACE to all of the pkgdataMakefile.in files.
- NOTE: Users who create their own pkgdata.inc / icupkg.inc files may need
   to recreate this PKGDATA_TRAILING_SPACE behavior.

- used the above variable, normally undefined, in mh-* files that need a trailing space

- Also, fixed use of system() in pkgdata.cpp per ICU-20538
This was causing pkgdata to return a zero status even on clang
failure, masking this issue.
2019-04-03 16:43:42 -07:00
Markus Scherer
98589d9cc7 ICU-20203 Unicode 12 final data (only trivial changes) 2019-03-13 08:57:05 -07:00
Steven R. Loomis
3a28fb7216 ICU-20479 don’t leave junk in source directory on configure or make check
- see also ICU-20062
- add a `-B` option to the two python invocations on Windows
- set PYTHONDONTWRITEBYTECODE in configure.ac and icudefs.mk.in

Co-authored-by: Fredrik Roubert <roubert@google.com>
2019-03-08 14:28:27 -08:00
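
The two mechanisms named in the ICU-20479 message, the `-B` option and the `PYTHONDONTWRITEBYTECODE` environment variable, both stop the Python interpreter from writing `.pyc` files. The sketch below checks this through `sys.dont_write_bytecode` in a child interpreter. The helper `child_skips_bytecode` is hypothetical and is not part of the ICU build.

```python
import os
import subprocess
import sys


def child_skips_bytecode(extra_args=(), extra_env=None) -> bool:
    """Report whether a child interpreter has bytecode writing disabled.

    Hypothetical helper: starts a fresh interpreter with the given
    command-line options and environment additions, then reads back
    the value of sys.dont_write_bytecode.
    """
    env = dict(os.environ)
    # Start from a clean baseline so only our own settings are measured.
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


print("default:     ", child_skips_bytecode())
print("with -B:     ", child_skips_bytecode(extra_args=["-B"]))
print("with env var:", child_skips_bytecode(
    extra_env={"PYTHONDONTWRITEBYTECODE": "1"}))
```

Either setting keeps `__pycache__` directories out of the source tree when build scripts import modules from it.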