d6b88d49e3
In regular expressions, word-boundary tests with \b found incorrect boundaries in Unicode mode (where an ICU word break iterator is used to find the boundaries) when the text being matched was UTF-8 encoded. The bug stemmed from a misunderstanding of how string indexes work with UText and break iterators: the code converted indexes from UTF-8 to UTF-16, when the original UTF-8 index was what was wanted everywhere. Removing the index conversion fixes the problem.
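The index mismatch described in the commit note above can be illustrated without ICU. The following is a minimal sketch in Python. The sample string and the helper `utf8_to_utf16_index` are hypothetical and are not taken from the ICU source. It only shows why a boundary offset converted from UTF-8 to UTF-16 no longer points at the same place in UTF-8 text once a non-ASCII character precedes it.

```python
def utf8_to_utf16_index(text: str, utf8_index: int) -> int:
    """Convert a UTF-8 byte offset into a UTF-16 code unit offset.

    Hypothetical helper for illustration only; assumes utf8_index falls
    on a character boundary.
    """
    prefix = text.encode("utf-8")[:utf8_index].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


text = "naïve café"
utf8_bytes = text.encode("utf-8")

# The word boundary after "naïve", expressed as a UTF-8 byte offset.
# "ï" occupies two bytes in UTF-8, so the offset is 6 rather than 5.
utf8_boundary = len("naïve".encode("utf-8"))

# The same boundary expressed in UTF-16 code units ("ï" is one unit).
utf16_boundary = utf8_to_utf16_index(text, utf8_boundary)

# Used against the UTF-8 bytes, the native offset lands on the space,
# while the converted offset lands one byte early, inside the word.
byte_at_native = utf8_bytes[utf8_boundary:utf8_boundary + 1]
byte_at_converted = utf8_bytes[utf16_boundary:utf16_boundary + 1]

print(utf8_boundary, utf16_boundary, byte_at_native, byte_at_converted)
```

For pure ASCII text the two offsets coincide, which is one plausible reason a mismatch of this kind can go unnoticed until non-ASCII input is matched.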
International Components for Unicode
This is the repository for the International Components for Unicode. The ICU project is under the stewardship of The Unicode Consortium.
Build Status (master branch)
Build | Status |
---|---|
TravisCI | |
Azure Pipelines | |
Azure Pipelines (Exhaustive Tests) | |
AppVeyor | |
Fuzzing | |
Subdirectories and Information
icu4c/: ICU for C/C++
icu4j/: ICU for Java
tools/: Tools
vendor/: Vendor dependencies
License
Please see ./icu4c/LICENSE. (ICU4C and ICU4J are under an identical license file.)
Copyright © 2016 and later Unicode, Inc. and others. All Rights Reserved. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. Terms of Use and License