ICU-7963 Break Iterator data files update for Unicode 6.0

X-SVN-Rev: 28646
Andy Heninger 2010-09-18 01:22:35 +00:00
parent 02a21226d9
commit efa8bfba9e
8 changed files with 27 additions and 20 deletions

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation and others.
# Copyright (C) 2002-2010, International Business Machines Corporation and others.
# All Rights Reserved.
#
# file: char.txt
#
# ICU Character Break Rules, also known as Grapheme Cluster Boundaries
# See Unicode Standard Annex #29.
# These rules are based on TR29 Revision 13, for Unicode Version 5.1
# These rules are based on TR29 Revision 16, for Unicode Version 6.0
#
#

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation and others.
# Copyright (C) 2002-2010, International Business Machines Corporation and others.
# All Rights Reserved.
#
# file: char_th.txt
#
# ICU Character Break Rules, also known as Grapheme Cluster Boundaries
# See Unicode Standard Annex #29.
# These rules are based on TR29 Revision 13, for Unicode Version 5.1
# These rules are based on TR29 Revision 16, for Unicode Version 6.0
#
#

@@ -1,14 +1,16 @@
# Copyright (c) 2002-2009 International Business Machines Corporation and
# Copyright (c) 2002-2010 International Business Machines Corporation and
# others. All Rights Reserved.
#
# file: line.txt
#
# Line Breaking Rules
# Implement default line breaking as defined by
# Unicode Standard Annex #14 Revision 24 for Unicode 5.2
# Unicode Standard Annex #14 Revision 24 for Unicode 6.0
# http://www.unicode.org/reports/tr14/
#
# TODO: Rule LB 8 remains as it was in Unicode 5.2
# This is only because of a limitation of ICU break engine implementation,
# not because the older behavior is desirable.
#
# Character Classes defined by TR 14.
@@ -214,6 +216,9 @@ $CM+ [$SP $ZW];
#
# LB 8 Break after zero width space
# TODO: ZW SP* <break>
# An engine change is required to write the reverse rule for this.
# For now, leave the Unicode 5.2 rule, ZW <break>
#
$LB8Breaks = [$LB4Breaks $ZW];
$LB8NonBreaks = [[$LB4NonBreaks] - [$ZW]];
@@ -452,8 +457,10 @@ $LF $CR;
[$SP $ZW] [$LB4NonBreaks-$CM];
[$SP $ZW] $CM+ $CAN_CM;
# LB 8 Break after zero width space
# LB 8 ZW SP* <break>
# TODO: to implement this, we need more than one look-ahead hard break in play at a time.
# Requires an engine enhancement.
# / $SP* $ZW
# LB 9,10 Combining marks.
# X $CM needs to behave like X, where X is not $SP or controls.
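
The two forms of LB 8 mentioned in the comments above can be contrasted with a small sketch. This is not ICU code and not the rule syntax used in line.txt: the function names are hypothetical, and each function looks at LB 8 in isolation, ignoring every other line breaking rule (including LB 7, which in the full algorithm forbids a break directly before a space).

```python
ZW = "\u200b"  # ZERO WIDTH SPACE (line break class ZW)
SP = " "       # SPACE (line break class SP)


def lb8_breaks_old(text):
    """Offsets where the older form 'ZW <break>' asserts a break.

    Illustrative only: LB 8 in isolation, end of text excluded.
    """
    return [i + 1 for i, ch in enumerate(text)
            if ch == ZW and i + 1 < len(text)]


def lb8_breaks_new(text):
    """Offsets where the newer form 'ZW SP* <break>' asserts a break.

    Illustrative only: LB 8 in isolation, end of text excluded.
    """
    breaks = []
    i = 0
    while i < len(text):
        if text[i] == ZW:
            j = i + 1
            # Skip any run of spaces that follows the zero width space.
            while j < len(text) and text[j] == SP:
                j += 1
            if j < len(text):
                breaks.append(j)
            i = j
        else:
            i += 1
    return breaks


sample = "a" + ZW + "  b"      # a, ZW, two spaces, b
print(lb8_breaks_old(sample))  # offset directly after ZW
print(lb8_breaks_new(sample))  # offset after the trailing spaces
```

The newer form has to look past a run of spaces of arbitrary length before it can place the break, which is the kind of look-ahead the TODO comments say the break engine cannot yet express in a reverse rule.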

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation and others.
# Copyright (C) 2002-2010, International Business Machines Corporation and others.
# All Rights Reserved.
#
# file: sent.txt
#
# ICU Sentence Break Rules
# See Unicode Standard Annex #29.
# These rules are based on UAX 29 Revision 13 for Unicode Version 5.1.0
# These rules are based on UAX 29 Revision 16 for Unicode Version 6.0
#

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation and others.
# Copyright (C) 2002-2010, International Business Machines Corporation and others.
# All Rights Reserved.
#
# file: sent_el.txt
#
# ICU Sentence Break Rules
# See Unicode Standard Annex #29.
# These rules are based on UAX 29 Revision 13 for Unicode Version 5.1.0
# These rules are based on UAX 29 Revision 16 for Unicode Version 6.0
#

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation
# Copyright (C) 2002-2010, International Business Machines Corporation
# and others. All Rights Reserved.
#
# file: word.txt
#
# ICU Word Break Rules
# See Unicode Standard Annex #29.
# These rules are based on UAX-29 Revision 13 for Unicode 5.1
# These rules are based on UAX-29 Revision 16 for Unicode 6.0
#
# Note: Updates to word.txt will usually need to be merged into
# word_POSIX.txt and word_ja.txt also.

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation
# Copyright (C) 2002-2010, International Business Machines Corporation
# and others. All Rights Reserved.
#
# file: word_POSIX.txt
#
# ICU Word Break Rules, POSIX locale.
# See Unicode Standard Annex #29.
# These rules are based on UAX-29 Revision 13 for Unicode 5.1
# These rules are based on UAX-29 Revision 16 for Unicode 6.0
#
# Note: Updates to word.txt will usually need to be merged into
# word_POSIX.txt and word_ja.txt also.

@@ -1,12 +1,12 @@
#
# Copyright (C) 2002-2009, International Business Machines Corporation
# Copyright (C) 2002-2010, International Business Machines Corporation
# and others. All Rights Reserved.
#
# file: word_ja.txt
#
# ICU Word Break Rules
# See Unicode Standard Annex #29.
# These rules are based on UAX-29 Revision 13 for Unicode 5.1
# These rules are based on UAX-29 Revision 16 for Unicode 6.0
#
# Note: Updates to word.txt will usually need to be merged into
# word_POSIX.txt and word_ja.txt also.