894 Commits

Author SHA1 Message Date
Peter Edberg
cfef59f0b8 ICU-13790 Add data tests to verify certain chars present in pinyin, stroke 2020-03-25 08:33:49 -07:00
Peter Edberg
69b3523593 ICU-20987 integrate CLDR release-37-alpha3 to master, adjust MeasureUnitTest.java tools 2020-03-13 12:01:39 -07:00
Peter Edberg
63e480dedc ICU-20987 integrate CLDR release-37-alpha1 to master (using new tooling) 2020-02-27 16:27:50 -08:00
Shane Carr
bb1f00efb8 ICU-20919 Merge branch 'maint/maint-66' into maint-66-merge
Conflicts:
	icu4j/main/shared/data/icudata.jar
2020-02-21 18:21:05 -08:00
Markus Scherer
af9ef2650b ICU-20893 Unicode 13 data 2020feb19 2020-02-19 22:02:35 -08:00
Andy Heninger
14bcaaf58e ICU-20876 Regex Grapheme Cluster matching with Break Iterators.
Change the implementation of grapheme cluster matching in regex to use an ICU
break iterator instead of a little one-off state machine.

The old implementation had fallen behind the Unicode UAX-29 specification for
grapheme clusters, and could not be easily updated.

The implementation follows the same general pattern that is used for finding
word boundaries with an ICU break iterator. In reviewing that code, a few
improvements to the handling of ICU error codes were also made.

Also note that this change adds a new dependency on Break Iteration. Regex
patterns that previously worked with ICU builds configured without break
iteration will now fail, but only if they include \X for matching grapheme
clusters.
2020-02-18 18:28:10 -08:00
Shane Carr
9eca171a39 ICU-20954 Fix currency spacing in suffix. 2020-02-06 09:56:32 -08:00
Andy Heninger
d6b88d49e3 ICU-20939 Fix problem with regexp \b boundaries & UTF-8 text
In regular expressions, when testing for word boundaries with \b, the
boundaries were incorrect in Unicode mode (where an ICU word break iterator
is used to find the boundaries) when the text being matched was UTF-8
encoded.

The bug stemmed from a misunderstanding of how string indexes work with UText
and break iterators, leading to the inclusion of code to convert from UTF-8 to
UTF-16 indexing, when what was wanted was the original UTF-8 index everywhere.
Removing the indexing conversion fixes the problem.
2020-02-03 16:51:17 -08:00
Markus Scherer
60b567d6ab ICU-20917 LocaleMatcher: prefer a more-default locale 2020-01-02 18:00:52 -08:00
Markus Scherer
ad638c274e ICU-20916 LocaleMatcher distinguish between equivalent locales
- equivalent but originally unequal
- locale distance shifted left for additional fraction bits with micro distance
- Java more verbose matcher debug output
See #949
2019-12-20 09:36:57 -08:00
Andy Heninger
faa2f9f9e1 ICU-20303 Break Iterator, improve handling of look-ahead rules.
- Merge the look-ahead results slots used when multiple rules share a common accepting state.
- Sequentially number the look-ahead result slots. Will eventually allow replacing the runtime map with an array.
- Inhibit chaining out of look-ahead rules. This could never actually happen; when a hard break
  rule matches, the engine is stopped immediately, but the state table was being constructed
  as if it could happen. Reduces table size for line break rules.
- Remove incorrect handling of fAccepting and fLookAhead fields of a state table row
  when removing duplicate states. Look-ahead slot number was being mis-interpreted as a state number.
2019-12-13 13:17:21 -08:00
Shane F. Carr
39eb0f4fbf ICU-20919 Merge maint/maint-66 (release-66-preview) to master 2019-12-11 15:25:36 -08:00
Andy Heninger
197e0239ab ICU-20893 Line break tailorings updated to Unicode 13. 2019-11-26 15:25:06 -08:00
Shane Carr
017c8b762e ICU-20890 Change locale_dependencies.py into LOCALE_DEPS.json files
- Refactors Python to make I/O operations more abstract
- Adds stable sample data for Python test
2019-11-22 20:23:30 -08:00
Peter Edberg
04c8616f93 ICU-20857 integrate CLDR release-36-1-preview to maint-66 2019-11-22 19:01:36 -08:00
Markus Scherer
a7e378d587 ICU-20893 Unicode 13 beta
See PR #915, see changes.txt
- Unicode 13 beta data as of 2019-nov-21
- uprops.icu format version 7.7 with more bits for Script/Script_Extensions
- more bits in spoof checker ScriptSet
- root line break rules adjusted for UAX 14 changes, from Andy
- line break tailorings not yet in sync with root
2019-11-21 17:35:53 -08:00
Shane Carr
00946cef43 ICU-20709 Moving rounder call before number properties.
- Changes EXCEPT_ZERO notation to hide sign on numbers that round to zero.
- Adds additional tests for this behavior.
2019-11-05 14:43:34 -08:00
Peter Edberg
e25796f6e5 ICU-20801 integrate CLDR release-36-alpha2, update MeasureUnits (#809) 2019-09-06 14:07:36 -07:00
Andy Heninger
327087150f ICU-20618 Regex nested lookaround expressions, clean up active match region handling. 2019-08-19 13:31:34 -07:00
Markus Scherer
41c24b6c00 ICU-9695 port LocaleMatcher to C++ 2019-08-19 10:41:35 -07:00
Shane Carr
d983221543 ICU-20764 Allow top-level include and exclude in data filter rules. 2019-08-13 15:12:32 -07:00
Shane Carr
513b0c20b0 ICU-13743 Adding number permutation test.
Adds a test suite in C++ and Java to test many permutations of options in NumberFormatter.
2019-08-12 23:34:51 -07:00
Shane Carr
b4d41b0561 ICU-20737 Removing Python dependency on distutils.
Deletes tstfiles.mk and merges the list into BUILDRULES.py
2019-08-12 15:12:48 -07:00
Shane Carr
afab3f992c ICU-13780 Removing DecimalFormat_ICU58 (finally). 2019-08-12 14:59:45 -07:00
Andy Heninger
fa240d49cc ICU-13637 Break Iterator Rule Updates for Indic Grapheme Clusters. 2019-06-27 17:17:26 -07:00
Andy Heninger
5c23416308 ICU-13637 Documentation for doing break iterator updates. 2019-06-21 10:31:40 -07:00
Shane Carr
8667d0a106 ICU-20639 Add "mol" 3-letter language code to C++ map. 2019-06-18 13:47:27 -07:00
Shane Carr
c8c3fbca28 ICU-20616 Allow bidi marks around the sign in exponent parsing. 2019-05-27 22:39:18 -07:00
Shane Carr
702fdb6c33 ICU-20593 Renaming Python buildtool to icutools.databuilder. 2019-05-07 13:42:06 -07:00
Andy Heninger
d685cacd9b ICU-20391 Fix regexp crash with nested look-behinds, from fuzz testing. 2019-04-17 22:17:47 -07:00
Shane F. Carr
14eb026570 ICU-20511 Merge release-64-2 to master 2019-04-17 14:15:59 -07:00
yumaoka
f508bc491e ICU-20554 Disabled current-date-sensitive Japanese era test cases for now. 2019-04-15 09:49:04 -04:00
Shane F. Carr
be25c277fd ICU-20511 Merge release-64-2-rc to master 2019-04-12 16:57:29 -07:00
Andy Heninger
bdb1806580 ICU-20544 Regex, Fix assertion failure in positive look-behind 2019-04-12 15:27:40 -07:00
Andy Heninger
7053363323 ICU-20544 Regex, fix min/max match length computation with negative look-behind patterns. 2019-04-10 22:38:25 -07:00
Steven R. Loomis
b76cb6517e ICU-20526 fix pkgdata where LD_SONAME has a trailing space
- added PKGDATA_TRAILING_SPACE to all of the pkgdataMakefile.in file.
- NOTE: Users who create their own pkgdata.inc / icupkg.inc files may need
   to recreate this PKGDATA_TRAILING_SPACE behavior.

- used the above variable, normally undefined, in mh-* files that need a trailing space

- Also, fixed use of system() in pkgdata.cpp per ICU-20538
This was causing pkgdata to return a zero status even on clang
failure, masking this issue.

(cherry picked from commit 83a0542b5b)
2019-04-05 10:53:59 -07:00
Markus Scherer
0565894534 ICU-20497 Unicode 12.1 2019-04-04 10:23:24 -07:00
Steven R. Loomis
83a0542b5b ICU-20526 fix pkgdata where LD_SONAME has a trailing space
- added PKGDATA_TRAILING_SPACE to all of the pkgdataMakefile.in file.
- NOTE: Users who create their own pkgdata.inc / icupkg.inc files may need
   to recreate this PKGDATA_TRAILING_SPACE behavior.

- used the above variable, normally undefined, in mh-* files that need a trailing space

- Also, fixed use of system() in pkgdata.cpp per ICU-20538
This was causing pkgdata to return a zero status even on clang
failure, masking this issue.
2019-04-03 16:43:42 -07:00
Markus Scherer
98589d9cc7 ICU-20203 Unicode 12 final data (only trivial changes) 2019-03-13 08:57:05 -07:00
Steven R. Loomis
3a28fb7216 ICU-20479 don’t leave junk in source directory on configure or make check
- see also ICU-20062
- add a `-B` option to the two python invocations on Windows
- set PYTHONDONTWRITEBYTECODE in configure.ac and icudefs.mk.in

Co-authored-by: Fredrik Roubert <roubert@google.com>
2019-03-08 14:28:27 -08:00
Shane Carr
60f4e1ba83 ICU-10923 Fixing dependency graph and filter logic for collation.
- Fixes filterrb.cpp to check for wildcard when at a leaf.
- Adds additional verbose logging to genrb.
- Fixes filtration to add deps to dep_targets instead of dep_files.
- Separates dep_files to common_dep_files and specific_dep_files.
2019-02-26 20:54:04 -06:00
Peter Edberg
30d2034597 ICU-20438 64rc BRS, integrate CLDR alpha2, update MeasureUnit APIs [& resolve conflicts] (#485) 2019-02-24 22:28:51 -08:00
Peter Edberg
2c1fcb0a96 ICU-20408 Integrate jpanyear support and related "ja" format changes [& resolve conflicts] (#465) 2019-02-21 11:52:33 -08:00
Shane F. Carr
7791a58a83 ICU-10923 Adding wildcard resource matching. 2019-02-20 12:20:38 -06:00
Shane F. Carr
8db0321f54 ICU-10923 Adding file replacement mechanism to buildtool. 2019-02-20 12:20:25 -06:00
Markus Scherer
ac4387a374 ICU-20203 Unicode 12 data 20190214 2019-02-15 11:37:34 -08:00
Andy Heninger
64f4dd64e2 ICU-12017 Improve line break around numbers. 2019-02-08 13:54:14 -08:00
Andy Heninger
1130b9c087 ICU-20385 Regex, fix pattern compile problem with look-behind patterns that cannot match. 2019-02-08 12:57:06 -08:00
Shane Carr
96556c2d4c ICU-10923 Fixing warning in testdata build file. 2019-02-06 18:59:31 -08:00
Shane Carr
1a453301ee ICU-10923 Adding unix-exec mode to buildtool and updating help page.
- Renames --format flag to --mode.
- Renames windirect to windows-exec.
2019-01-25 15:34:44 -08:00