scuffed-code/icu4c/source/common/utrie2.cpp
Markus Scherer fe3eb3ed5c
ICU-13530 add UCPTrie/CodePointTrie, switch normalization to use it (#48)
* ICU-13530 copy C/C++ files UTrie2 -> UTrie3

X-SVN-Rev: 40754

* ICU-13530 UTrie3 new files copied from UTrie2: rename types/functions/macros

X-SVN-Rev: 40755

* ICU-13530 debug-print building each UTrie2

X-SVN-Rev: 40756

* ICU-13530 remove two-byte-UTF-8 errorValue block; move highValue from end of data array into header; add errorValue to header

X-SVN-Rev: 40762

* ICU-13530 UTrie3 U16_NEXT/PREV: errorValue for unpaired surrogates

X-SVN-Rev: 40763

* ICU-13530 no more separate values for lead surrogate code units

X-SVN-Rev: 40764

* ICU-13530 change from 11:5 trie bits to 10:6 for simpler UTF-8 code

X-SVN-Rev: 40766

* ICU-13530 UTrie2 build UTrie3 as well, print sizes

X-SVN-Rev: 40767

* ICU-13530 debug-print countSame, sumOverlaps, countInitial

X-SVN-Rev: 40768

* ICU-13530 debug-print whether trie is for CanonIterData

X-SVN-Rev: 40769

* ICU-13530 no index-shift for BMP data, no separate index-2 for 2-byte UTF-8; builder changes incomplete

X-SVN-Rev: 40777

* ICU-13530 remove errorValue and highStart from UNewTrie3

X-SVN-Rev: 40778

* ICU-13530 rewrite UTrie3 builder code

X-SVN-Rev: 40783

* ICU-13530 UTrie3 bug fixes

X-SVN-Rev: 40788

* ICU-13530 fully re-inline _UTRIE3_U8_NEXT()

X-SVN-Rev: 40790

* ICU-13530 find most common all-same data block for dataNullBlock and initialValue

X-SVN-Rev: 40792

* ICU-13530 UTrie3 iterator functions take start and return the end of a range, rather than callback call for each range

X-SVN-Rev: 40800

* ICU-13530 mask off unused data value bits before building a UTrie3 with values less than 32 bits wide

X-SVN-Rev: 40803

* ICU-13530 split utrie3builder.h out of utrie3.h

X-SVN-Rev: 40804

* ICU-13530 separate types UTrie3 vs. UTrie3Builder, implement builder as wrapper over C++ class Trie3Builder in .cpp

X-SVN-Rev: 40809

* ICU-13530 function to make a UTrie3Builder from a UTrie3

X-SVN-Rev: 40810

* ICU-13530 debug-print some data; some cleanup

X-SVN-Rev: 40865

* ICU-13530 BMP 10:6 but supplementary 10:6:4

X-SVN-Rev: 40984

* ICU-13530 move errorValue & highValue to the end of the data table, minimal padding to 4 bytes

X-SVN-Rev: 41011

* ICU-13530 index-1 table gap of index-2 null blocks

X-SVN-Rev: 41018

* ICU-13530 test with more than 128k compacted data

X-SVN-Rev: 41034

* ICU-13530 supplementary bits 11:5:4 saves a little space

X-SVN-Rev: 41039

* ICU-13530 supplementary bits 6:5:5:4 instead of gap: about same size but simpler

X-SVN-Rev: 41050

* ICU-13530 remove unnecessary utrie3_clone(built trie)

X-SVN-Rev: 41058

* ICU-13530 remove unnecessary UTrie3StringIterator

X-SVN-Rev: 41059

* ICU-13530 back to UTRIE3_GET...() macros *returning* data values

X-SVN-Rev: 41060

* ICU-13530 fast vs. small

X-SVN-Rev: 41066

* ICU-13530 always load NFC data, add simple normalization performance test

X-SVN-Rev: 41110

* ICU-13530 change normalization main trie to UTrie3 with special values for lead surrogates; forbid non-inert surrogate code *points* because unable to store values different from code *units*; runtime code work around that for code point lookup and iteration; adjust UTS 46 for normalization no longer mapping unpaired surrogates to U+FFFD

X-SVN-Rev: 41122

* ICU-13530 simplenormperf bug fix and NFC base line

X-SVN-Rev: 41126

* ICU-13530 move normalization getRange skipping lead surrogates to API getRangeSkipLead()

X-SVN-Rev: 41182

* ICU-13530 switch CanonIterData and gennorm2 Norms to UTrie3

X-SVN-Rev: 41183

* ICU-13530 remove unused overwrite parameter from setRange()

X-SVN-Rev: 41184

* ICU-13530 getRange skip lead -> fixed surrogates

X-SVN-Rev: 41219

* ICU-13530 minor cleanup

X-SVN-Rev: 41221

* ICU-13530 UTS 46 code map unpaired surrogates to U+FFFD before normalization

X-SVN-Rev: 41224

* ICU-13530 minor internal-docs cleanup

X-SVN-Rev: 41225

* ICU-13530 rename UTrie3 to UCPTrie, and other name changes

X-SVN-Rev: 41226

* ICU-13530 add 8-bit data option; add type-any & valueBits-any for fromBinary(); macros consistently source type then data width

X-SVN-Rev: 41234

* ICU-13530 scrub the API docs for the proposal

X-SVN-Rev: 41319

* ICU-13530 tag internal definitions as such, or move them to an internal header

X-SVN-Rev: 41320

* ICU-13530 Java API skeleton

X-SVN-Rev: 41326

* ICU-13530 API feedback: ValueWidth, MutableCodePointTrie, base CodePointMap, ...

X-SVN-Rev: 41382

* ICU-13530 add UCPTrie valueWidth field and padding, and combine data pointers into a union

X-SVN-Rev: 41408

* ICU-13530 switch some macros to using dataAccess parameter: separate index vs. data lookups, no macro variant for each value width

X-SVN-Rev: 41409

* ICU-13530 StringIterator is no longer a java.util.Iterator (bad fit)

X-SVN-Rev: 41455

* ICU-13530 CodePointTrie.java code complete

X-SVN-Rev: 41518

* ICU-13530 finish Java port incl test; keep C++ parallel

* ICU-13530 adjust API for feedback: rename HandleValue to FilterValue, change getRange+getRangeFixedSurr(bool allSurr) to enum RangeOption+getRange(enum option); change remaining C macros to use dataAccess for 16/32/8-bit value widths; fix/clarify some API docs

* ICU-13530 add javadoc

* ICU-13530 document UCPTrie binary data format

* ICU-13530 update .nrm formatVersion 3->4, document change in surrogate handling with new trie

* ICU-13530 re-hardcode NFC data

* move trie swapper code into new file; add new files to Windows project files; turn off trie debugging

* ICU-13530 minor cleanup

* ICU-13530 test more range starts; fix a C test leak

* ICU-13530 regenerate Java data from scratch

* ICU-13530 review feedback changes: API docs typos, more @internal, C++11 field initializers, fix potential leak in MutableCodePointTrie::fromUCPTrie()

* ICU-13530 rename interface FilterValue to ValueFilter
2018-09-27 14:27:38 -07:00


// © 2016 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
/*
******************************************************************************
*
* Copyright (C) 2001-2014, International Business Machines
* Corporation and others. All Rights Reserved.
*
******************************************************************************
* file name: utrie2.cpp
* encoding: UTF-8
* tab size: 8 (not used)
* indentation:4
*
* created on: 2008aug16 (starting from a copy of utrie.c)
* created by: Markus W. Scherer
*
* This is a common implementation of a Unicode trie.
* It is a kind of compressed, serializable table of 16- or 32-bit values associated with
* Unicode code points (0..0x10ffff).
* This is the second common version of a Unicode trie (hence the name UTrie2).
* See utrie2.h for a comparison.
*
* This file contains only the runtime and enumeration code, for read-only access.
* See utrie2_builder.cpp for the builder code.
*/
#include "unicode/utypes.h"
#ifdef UCPTRIE_DEBUG
#include "unicode/umutablecptrie.h"
#endif
#include "unicode/utf.h"
#include "unicode/utf8.h"
#include "unicode/utf16.h"
#include "cmemory.h"
#include "utrie2.h"
#include "utrie2_impl.h"
#include "uassert.h"
/* Public UTrie2 API implementation ----------------------------------------- */

static uint32_t
get32(const UNewTrie2 *trie, UChar32 c, UBool fromLSCP) {
    int32_t i2, block;

    if(c>=trie->highStart && (!U_IS_LEAD(c) || fromLSCP)) {
        return trie->data[trie->dataLength-UTRIE2_DATA_GRANULARITY];
    }

    if(U_IS_LEAD(c) && fromLSCP) {
        i2=(UTRIE2_LSCP_INDEX_2_OFFSET-(0xd800>>UTRIE2_SHIFT_2))+
            (c>>UTRIE2_SHIFT_2);
    } else {
        i2=trie->index1[c>>UTRIE2_SHIFT_1]+
            ((c>>UTRIE2_SHIFT_2)&UTRIE2_INDEX_2_MASK);
    }
    block=trie->index2[i2];
    return trie->data[block+(c&UTRIE2_DATA_MASK)];
}
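
/*
 * Illustration only, not part of ICU: a minimal sketch of the
 * index-1 -> index-2 -> data lookup that get32() performs on the builder
 * arrays. ToyTrie, toyGet() and makeToyTrie() are hypothetical names, and the
 * shift widths are assumed toy values, much smaller than the real
 * UTRIE2_SHIFT_1 and UTRIE2_SHIFT_2.
 */
```cpp
#include <cstdint>
#include <vector>

// Assumed toy parameters (not ICU's): 16 code points per data block,
// 16 index-2 entries per index-1 entry.
constexpr int kToyShift2 = 4;
constexpr int kToyShift1 = 8;
constexpr int32_t kToyDataMask = (1 << kToyShift2) - 1;
constexpr int32_t kToyIndex2Mask = (1 << (kToyShift1 - kToyShift2)) - 1;

struct ToyTrie {
    std::vector<int32_t> index1;  // c>>kToyShift1 -> start of an index-2 block
    std::vector<int32_t> index2;  // index-2 entry -> start of a data block
    std::vector<uint32_t> data;
};

// Same three-step shape as get32(): index-1, then index-2, then the data block.
inline uint32_t toyGet(const ToyTrie &t, int32_t c) {
    int32_t i2 = t.index1[c >> kToyShift1] + ((c >> kToyShift2) & kToyIndex2Mask);
    int32_t block = t.index2[i2];
    return t.data[block + (c & kToyDataMask)];
}

// Builds a trie for code points 0..0x1ff in which every code point maps to
// `initial` except `special`, which maps to `value`.
inline ToyTrie makeToyTrie(uint32_t initial, int32_t special, uint32_t value) {
    ToyTrie t;
    // Data block at offset 0 is the shared "null" block; the block at
    // offset 16 holds the one special value.
    t.data.assign(32, initial);
    t.data[16 + (special & kToyDataMask)] = value;
    // Index-2 block at offset 0 points only at the null data block; the
    // block at offset 16 covers the range that contains `special`.
    t.index2.assign(32, 0);
    t.index2[16 + ((special >> kToyShift2) & kToyIndex2Mask)] = 16;
    t.index1.assign(2, 0);
    t.index1[special >> kToyShift1] = 16;
    return t;
}
```
/*
 * In this sketch, all code points outside the block of `special` share one
 * null index-2 block and one null data block, which is where the
 * compression in such a structure comes from.
 */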

U_CAPI uint32_t U_EXPORT2
utrie2_get32(const UTrie2 *trie, UChar32 c) {
    if(trie->data16!=NULL) {
        return UTRIE2_GET16(trie, c);
    } else if(trie->data32!=NULL) {
        return UTRIE2_GET32(trie, c);
    } else if((uint32_t)c>0x10ffff) {
        return trie->errorValue;
    } else {
        return get32(trie->newTrie, c, TRUE);
    }
}

U_CAPI uint32_t U_EXPORT2
utrie2_get32FromLeadSurrogateCodeUnit(const UTrie2 *trie, UChar32 c) {
    if(!U_IS_LEAD(c)) {
        return trie->errorValue;
    }
    if(trie->data16!=NULL) {
        return UTRIE2_GET16_FROM_U16_SINGLE_LEAD(trie, c);
    } else if(trie->data32!=NULL) {
        return UTRIE2_GET32_FROM_U16_SINGLE_LEAD(trie, c);
    } else {
        return get32(trie->newTrie, c, FALSE);
    }
}

static inline int32_t
u8Index(const UTrie2 *trie, UChar32 c, int32_t i) {
    int32_t idx=
        _UTRIE2_INDEX_FROM_CP(
            trie,
            trie->data32==NULL ? trie->indexLength : 0,
            c);
    return (idx<<3)|i;
}

U_CAPI int32_t U_EXPORT2
utrie2_internalU8NextIndex(const UTrie2 *trie, UChar32 c,
                           const uint8_t *src, const uint8_t *limit) {
    int32_t i, length;
    i=0;
    /* support 64-bit pointers by avoiding cast of arbitrary difference */
    if((limit-src)<=7) {
        length=(int32_t)(limit-src);
    } else {
        length=7;
    }
    c=utf8_nextCharSafeBody(src, &i, length, c, -1);
    return u8Index(trie, c, i);
}

U_CAPI int32_t U_EXPORT2
utrie2_internalU8PrevIndex(const UTrie2 *trie, UChar32 c,
                           const uint8_t *start, const uint8_t *src) {
    int32_t i, length;
    /* support 64-bit pointers by avoiding cast of arbitrary difference */
    if((src-start)<=7) {
        i=length=(int32_t)(src-start);
    } else {
        i=length=7;
        start=src-7;
    }
    c=utf8_prevCharSafeBody(start, 0, &i, c, -1);
    i=length-i;  /* number of bytes read backward from src */
    return u8Index(trie, c, i);
}
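
/*
 * Illustration only, not part of ICU: a small sketch of the two tricks used
 * by the U8 index functions above. The helper names are hypothetical; only
 * the arithmetic mirrors the code. The pointer difference is clamped to 7
 * before it is narrowed to int32_t, and the byte count (0..7) is stored in
 * the low 3 bits of the returned value. The assumption here is that callers
 * split the result again with >>3 and &7; see the UTF-8 macros in utrie2.h
 * for the actual usage.
 */
```cpp
#include <cstdint>

// Clamp a pointer difference to at most 7 before narrowing it to int32_t,
// so that an arbitrarily large 64-bit difference is never cast directly.
inline int32_t clampedLength(const uint8_t *src, const uint8_t *limit) {
    return (limit - src) <= 7 ? static_cast<int32_t>(limit - src) : 7;
}

// Pack a data index and a byte count (0..7) into one int32_t.
constexpr int32_t packIndexAndLength(int32_t idx, int32_t i) {
    return (idx << 3) | i;
}

// Recover the two parts from the packed value.
constexpr int32_t unpackIndex(int32_t packed) { return packed >> 3; }
constexpr int32_t unpackLength(int32_t packed) { return packed & 7; }

static_assert(unpackIndex(packIndexAndLength(0x1234, 3)) == 0x1234,
              "the index survives the round trip");
static_assert(unpackLength(packIndexAndLength(0x1234, 3)) == 3,
              "the byte count survives the round trip");
```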

U_CAPI UTrie2 * U_EXPORT2
utrie2_openFromSerialized(UTrie2ValueBits valueBits,
                          const void *data, int32_t length, int32_t *pActualLength,
                          UErrorCode *pErrorCode) {
    const UTrie2Header *header;
    const uint16_t *p16;
    int32_t actualLength;

    UTrie2 tempTrie;
    UTrie2 *trie;

    if(U_FAILURE(*pErrorCode)) {
        return 0;
    }

    if( length<=0 || (U_POINTER_MASK_LSB(data, 3)!=0) ||
        valueBits<0 || UTRIE2_COUNT_VALUE_BITS<=valueBits
    ) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    /* enough data for a trie header? */
    if(length<(int32_t)sizeof(UTrie2Header)) {
        *pErrorCode=U_INVALID_FORMAT_ERROR;
        return 0;
    }

    /* check the signature */
    header=(const UTrie2Header *)data;
    if(header->signature!=UTRIE2_SIG) {
        *pErrorCode=U_INVALID_FORMAT_ERROR;
        return 0;
    }

    /* get the options */
    if(valueBits!=(UTrie2ValueBits)(header->options&UTRIE2_OPTIONS_VALUE_BITS_MASK)) {
        *pErrorCode=U_INVALID_FORMAT_ERROR;
        return 0;
    }

    /* get the length values and offsets */
    uprv_memset(&tempTrie, 0, sizeof(tempTrie));
    tempTrie.indexLength=header->indexLength;
    tempTrie.dataLength=header->shiftedDataLength<<UTRIE2_INDEX_SHIFT;
    tempTrie.index2NullOffset=header->index2NullOffset;
    tempTrie.dataNullOffset=header->dataNullOffset;

    tempTrie.highStart=header->shiftedHighStart<<UTRIE2_SHIFT_1;
    tempTrie.highValueIndex=tempTrie.dataLength-UTRIE2_DATA_GRANULARITY;
    if(valueBits==UTRIE2_16_VALUE_BITS) {
        tempTrie.highValueIndex+=tempTrie.indexLength;
    }

    /* calculate the actual length */
    actualLength=(int32_t)sizeof(UTrie2Header)+tempTrie.indexLength*2;
    if(valueBits==UTRIE2_16_VALUE_BITS) {
        actualLength+=tempTrie.dataLength*2;
    } else {
        actualLength+=tempTrie.dataLength*4;
    }
    if(length<actualLength) {
        *pErrorCode=U_INVALID_FORMAT_ERROR;  /* not enough bytes */
        return 0;
    }

    /* allocate the trie */
    trie=(UTrie2 *)uprv_malloc(sizeof(UTrie2));
    if(trie==NULL) {
        *pErrorCode=U_MEMORY_ALLOCATION_ERROR;
        return 0;
    }
    uprv_memcpy(trie, &tempTrie, sizeof(tempTrie));
    trie->memory=(uint32_t *)data;
    trie->length=actualLength;
    trie->isMemoryOwned=FALSE;
#ifdef UTRIE2_DEBUG
    trie->name="fromSerialized";
#endif

    /* set the pointers to its index and data arrays */
    p16=(const uint16_t *)(header+1);
    trie->index=p16;
    p16+=trie->indexLength;

    /* get the data */
    switch(valueBits) {
    case UTRIE2_16_VALUE_BITS:
        trie->data16=p16;
        trie->data32=NULL;
        trie->initialValue=trie->index[trie->dataNullOffset];
        trie->errorValue=trie->data16[UTRIE2_BAD_UTF8_DATA_OFFSET];
        break;
    case UTRIE2_32_VALUE_BITS:
        trie->data16=NULL;
        trie->data32=(const uint32_t *)p16;
        trie->initialValue=trie->data32[trie->dataNullOffset];
        trie->errorValue=trie->data32[UTRIE2_BAD_UTF8_DATA_OFFSET];
        break;
    default:
        /* valueBits was validated above; free the struct so this path cannot leak */
        uprv_free(trie);
        *pErrorCode=U_INVALID_FORMAT_ERROR;
        return 0;
    }

    if(pActualLength!=NULL) {
        *pActualLength=actualLength;
    }
    return trie;
}

U_CAPI UTrie2 * U_EXPORT2
utrie2_openDummy(UTrie2ValueBits valueBits,
                 uint32_t initialValue, uint32_t errorValue,
                 UErrorCode *pErrorCode) {
    UTrie2 *trie;
    UTrie2Header *header;
    uint32_t *p;
    uint16_t *dest16;
    int32_t indexLength, dataLength, length, i;
    int32_t dataMove;  /* >0 if the data is moved to the end of the index array */

    if(U_FAILURE(*pErrorCode)) {
        return 0;
    }

    if(valueBits<0 || UTRIE2_COUNT_VALUE_BITS<=valueBits) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    /* calculate the total length of the dummy trie data */
    indexLength=UTRIE2_INDEX_1_OFFSET;
    dataLength=UTRIE2_DATA_START_OFFSET+UTRIE2_DATA_GRANULARITY;
    length=(int32_t)sizeof(UTrie2Header)+indexLength*2;
    if(valueBits==UTRIE2_16_VALUE_BITS) {
        length+=dataLength*2;
    } else {
        length+=dataLength*4;
    }

    /* allocate the trie */
    trie=(UTrie2 *)uprv_malloc(sizeof(UTrie2));
    if(trie==NULL) {
        *pErrorCode=U_MEMORY_ALLOCATION_ERROR;
        return 0;
    }
    uprv_memset(trie, 0, sizeof(UTrie2));
    trie->memory=uprv_malloc(length);
    if(trie->memory==NULL) {
        uprv_free(trie);
        *pErrorCode=U_MEMORY_ALLOCATION_ERROR;
        return 0;
    }
    trie->length=length;
    trie->isMemoryOwned=TRUE;

    /* set the UTrie2 fields */
    if(valueBits==UTRIE2_16_VALUE_BITS) {
        dataMove=indexLength;
    } else {
        dataMove=0;
    }

    trie->indexLength=indexLength;
    trie->dataLength=dataLength;
    trie->index2NullOffset=UTRIE2_INDEX_2_OFFSET;
    trie->dataNullOffset=(uint16_t)dataMove;
    trie->initialValue=initialValue;
    trie->errorValue=errorValue;
    trie->highStart=0;
    trie->highValueIndex=dataMove+UTRIE2_DATA_START_OFFSET;
#ifdef UTRIE2_DEBUG
    trie->name="dummy";
#endif

    /* set the header fields */
    header=(UTrie2Header *)trie->memory;

    header->signature=UTRIE2_SIG; /* "Tri2" */
    header->options=(uint16_t)valueBits;

    header->indexLength=(uint16_t)indexLength;
    header->shiftedDataLength=(uint16_t)(dataLength>>UTRIE2_INDEX_SHIFT);
    header->index2NullOffset=(uint16_t)UTRIE2_INDEX_2_OFFSET;
    header->dataNullOffset=(uint16_t)dataMove;
    header->shiftedHighStart=0;

    /* fill the index and data arrays */
    dest16=(uint16_t *)(header+1);
    trie->index=dest16;

    /* write the index-2 array values shifted right by UTRIE2_INDEX_SHIFT */
    for(i=0; i<UTRIE2_INDEX_2_BMP_LENGTH; ++i) {
        *dest16++=(uint16_t)(dataMove>>UTRIE2_INDEX_SHIFT);  /* null data block */
    }

    /* write UTF-8 2-byte index-2 values, not right-shifted */
    for(i=0; i<(0xc2-0xc0); ++i) {                                  /* C0..C1 */
        *dest16++=(uint16_t)(dataMove+UTRIE2_BAD_UTF8_DATA_OFFSET);
    }
    for(; i<(0xe0-0xc0); ++i) {                                     /* C2..DF */
        *dest16++=(uint16_t)dataMove;
    }

    /* write the 16/32-bit data array */
    switch(valueBits) {
    case UTRIE2_16_VALUE_BITS:
        /* write 16-bit data values */
        trie->data16=dest16;
        trie->data32=NULL;
        for(i=0; i<0x80; ++i) {
            *dest16++=(uint16_t)initialValue;
        }
        for(; i<0xc0; ++i) {
            *dest16++=(uint16_t)errorValue;
        }
        /* highValue and reserved values */
        for(i=0; i<UTRIE2_DATA_GRANULARITY; ++i) {
            *dest16++=(uint16_t)initialValue;
        }
        break;
    case UTRIE2_32_VALUE_BITS:
        /* write 32-bit data values */
        p=(uint32_t *)dest16;
        trie->data16=NULL;
        trie->data32=p;
        for(i=0; i<0x80; ++i) {
            *p++=initialValue;
        }
        for(; i<0xc0; ++i) {
            *p++=errorValue;
        }
        /* highValue and reserved values */
        for(i=0; i<UTRIE2_DATA_GRANULARITY; ++i) {
            *p++=initialValue;
        }
        break;
    default:
        /* valueBits was validated above; free both allocations so this path cannot leak */
        uprv_free(trie->memory);
        uprv_free(trie);
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    return trie;
}

U_CAPI void U_EXPORT2
utrie2_close(UTrie2 *trie) {
    if(trie!=NULL) {
        if(trie->isMemoryOwned) {
            uprv_free(trie->memory);
        }
        if(trie->newTrie!=NULL) {
            uprv_free(trie->newTrie->data);
#ifdef UCPTRIE_DEBUG
            umutablecptrie_close(trie->newTrie->t3);
#endif
            uprv_free(trie->newTrie);
        }
        uprv_free(trie);
    }
}

U_CAPI UBool U_EXPORT2
utrie2_isFrozen(const UTrie2 *trie) {
    return (UBool)(trie->newTrie==NULL);
}

U_CAPI int32_t U_EXPORT2
utrie2_serialize(const UTrie2 *trie,
                 void *data, int32_t capacity,
                 UErrorCode *pErrorCode) {
    /* argument check */
    if(U_FAILURE(*pErrorCode)) {
        return 0;
    }

    if( trie==NULL || trie->memory==NULL || trie->newTrie!=NULL ||
        capacity<0 || (capacity>0 && (data==NULL || (U_POINTER_MASK_LSB(data, 3)!=0)))
    ) {
        *pErrorCode=U_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }

    if(capacity>=trie->length) {
        uprv_memcpy(data, trie->memory, trie->length);
    } else {
        *pErrorCode=U_BUFFER_OVERFLOW_ERROR;
    }
    return trie->length;
}
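
/*
 * Illustration only, not part of ICU: a sketch of the "preflight" pattern
 * that utrie2_serialize() follows. The function always returns the required
 * length, and a too-small buffer is reported as an overflow status rather
 * than as a partial copy. CopyStatus and copyOrPreflight() are hypothetical
 * names used only for this sketch.
 */
```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class CopyStatus { kOk, kBufferOverflow };

// Copies `length` bytes into `dest` if they fit into `capacity`.
// Always returns the required length so that a caller can first call with
// capacity 0, allocate that many bytes, and then call again.
inline int32_t copyOrPreflight(const void *src, int32_t length,
                               void *dest, int32_t capacity,
                               CopyStatus &status) {
    if (capacity >= length) {
        if (length > 0) {
            std::memcpy(dest, src, static_cast<std::size_t>(length));
        }
        status = CopyStatus::kOk;
    } else {
        status = CopyStatus::kBufferOverflow;
    }
    return length;
}
```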

/* enumeration -------------------------------------------------------------- */

#define MIN_VALUE(a, b) ((a)<(b) ? (a) : (b))

/* default UTrie2EnumValue() returns the input value itself */
static uint32_t U_CALLCONV
enumSameValue(const void * /*context*/, uint32_t value) {
    return value;
}

/**
 * Enumerate all ranges of code points with the same relevant values.
 * The values are transformed from the raw trie entries by the enumValue function.
 *
 * Currently requires start<limit and both start and limit must be multiples
 * of UTRIE2_DATA_BLOCK_LENGTH.
 *
 * Optimizations:
 * - Skip a whole block if we know that it is filled with a single value,
 *   and it is the same as we visited just before.
 * - Handle the null block specially because we know a priori that it is filled
 *   with a single value.
 */
static void
enumEitherTrie(const UTrie2 *trie,
               UChar32 start, UChar32 limit,
               UTrie2EnumValue *enumValue, UTrie2EnumRange *enumRange, const void *context) {
    const uint32_t *data32;
    const uint16_t *idx;

    uint32_t value, prevValue, initialValue;
    UChar32 c, prev, highStart;
    int32_t j, i2Block, prevI2Block, index2NullOffset, block, prevBlock, nullBlock;

    if(enumRange==NULL) {
        return;
    }
    if(enumValue==NULL) {
        enumValue=enumSameValue;
    }

    if(trie->newTrie==NULL) {
        /* frozen trie */
        idx=trie->index;
        U_ASSERT(idx!=NULL); /* the following code assumes trie->newTrie is not NULL when idx is NULL */
        data32=trie->data32;

        index2NullOffset=trie->index2NullOffset;
        nullBlock=trie->dataNullOffset;
    } else {
        /* unfrozen, mutable trie */
        idx=NULL;
        data32=trie->newTrie->data;
        U_ASSERT(data32!=NULL); /* the following code assumes idx is not NULL when data32 is NULL */

        index2NullOffset=trie->newTrie->index2NullOffset;
        nullBlock=trie->newTrie->dataNullOffset;
    }

    highStart=trie->highStart;

    /* get the enumeration value that corresponds to an initial-value trie data entry */
    initialValue=enumValue(context, trie->initialValue);

    /* set variables for previous range */
    prevI2Block=-1;
    prevBlock=-1;
    prev=start;
    prevValue=0;

    /* enumerate index-2 blocks */
    for(c=start; c<limit && c<highStart;) {
        /* Code point limit for iterating inside this i2Block. */
        UChar32 tempLimit=c+UTRIE2_CP_PER_INDEX_1_ENTRY;
        if(limit<tempLimit) {
            tempLimit=limit;
        }
        if(c<=0xffff) {
            if(!U_IS_SURROGATE(c)) {
                i2Block=c>>UTRIE2_SHIFT_2;
            } else if(U_IS_SURROGATE_LEAD(c)) {
                /*
                 * Enumerate values for lead surrogate code points, not code units:
                 * This special block has half the normal length.
                 */
                i2Block=UTRIE2_LSCP_INDEX_2_OFFSET;
                tempLimit=MIN_VALUE(0xdc00, limit);
            } else {
                /*
                 * Switch back to the normal part of the index-2 table.
                 * Enumerate the second half of the surrogates block.
                 */
                i2Block=0xd800>>UTRIE2_SHIFT_2;
                tempLimit=MIN_VALUE(0xe000, limit);
            }
        } else {
            /* supplementary code points */
            if(idx!=NULL) {
                i2Block=idx[(UTRIE2_INDEX_1_OFFSET-UTRIE2_OMITTED_BMP_INDEX_1_LENGTH)+
                              (c>>UTRIE2_SHIFT_1)];
            } else {
                i2Block=trie->newTrie->index1[c>>UTRIE2_SHIFT_1];
            }
            if(i2Block==prevI2Block && (c-prev)>=UTRIE2_CP_PER_INDEX_1_ENTRY) {
                /*
                 * The index-2 block is the same as the previous one, and filled with prevValue.
                 * Only possible for supplementary code points because the linear-BMP index-2
                 * table creates unique i2Block values.
                 */
                c+=UTRIE2_CP_PER_INDEX_1_ENTRY;
                continue;
            }
        }
        prevI2Block=i2Block;
        if(i2Block==index2NullOffset) {
            /* this is the null index-2 block */
            if(prevValue!=initialValue) {
                if(prev<c && !enumRange(context, prev, c-1, prevValue)) {
                    return;
                }
                prevBlock=nullBlock;
                prev=c;
                prevValue=initialValue;
            }
            c+=UTRIE2_CP_PER_INDEX_1_ENTRY;
        } else {
            /* enumerate data blocks for one index-2 block */
            int32_t i2, i2Limit;
            i2=(c>>UTRIE2_SHIFT_2)&UTRIE2_INDEX_2_MASK;
            if((c>>UTRIE2_SHIFT_1)==(tempLimit>>UTRIE2_SHIFT_1)) {
                i2Limit=(tempLimit>>UTRIE2_SHIFT_2)&UTRIE2_INDEX_2_MASK;
            } else {
                i2Limit=UTRIE2_INDEX_2_BLOCK_LENGTH;
            }
            for(; i2<i2Limit; ++i2) {
                if(idx!=NULL) {
                    block=(int32_t)idx[i2Block+i2]<<UTRIE2_INDEX_SHIFT;
                } else {
                    block=trie->newTrie->index2[i2Block+i2];
                }
                if(block==prevBlock && (c-prev)>=UTRIE2_DATA_BLOCK_LENGTH) {
                    /* the block is the same as the previous one, and filled with prevValue */
                    c+=UTRIE2_DATA_BLOCK_LENGTH;
                    continue;
                }
                prevBlock=block;
                if(block==nullBlock) {
                    /* this is the null data block */
                    if(prevValue!=initialValue) {
                        if(prev<c && !enumRange(context, prev, c-1, prevValue)) {
                            return;
                        }
                        prev=c;
                        prevValue=initialValue;
                    }
                    c+=UTRIE2_DATA_BLOCK_LENGTH;
                } else {
                    for(j=0; j<UTRIE2_DATA_BLOCK_LENGTH; ++j) {
                        value=enumValue(context, data32!=NULL ? data32[block+j] : idx[block+j]);
                        if(value!=prevValue) {
                            if(prev<c && !enumRange(context, prev, c-1, prevValue)) {
                                return;
                            }
                            prev=c;
                            prevValue=value;
                        }
                        ++c;
                    }
                }
            }
        }
    }

    if(c>limit) {
        c=limit;  /* could be higher if in the index2NullOffset */
    } else if(c<limit) {
        /* c==highStart<limit */
        uint32_t highValue;
        if(idx!=NULL) {
            highValue=
                data32!=NULL ?
                    data32[trie->highValueIndex] :
                    idx[trie->highValueIndex];
        } else {
            highValue=trie->newTrie->data[trie->newTrie->dataLength-UTRIE2_DATA_GRANULARITY];
        }
        value=enumValue(context, highValue);
        if(value!=prevValue) {
            if(prev<c && !enumRange(context, prev, c-1, prevValue)) {
                return;
            }
            prev=c;
            prevValue=value;
        }
        c=limit;
    }

    /* deliver last range */
    enumRange(context, prev, c-1, prevValue);
}
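
/*
 * Illustration only, not part of ICU: a sketch of the range-coalescing core
 * of enumEitherTrie() over a flat array, without the null-block and
 * same-block skipping optimizations. RangeFn and enumRuns() are hypothetical
 * names. As in the function above, the callback gets an inclusive
 * [start, end] range and can return false to stop early.
 */
```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical callback type: return false to stop the enumeration.
using RangeFn = std::function<bool(int32_t start, int32_t end, uint32_t value)>;

// Reports each maximal run of equal values exactly once.
inline void enumRuns(const std::vector<uint32_t> &values, const RangeFn &fn) {
    if (values.empty()) {
        return;
    }
    int32_t prev = 0;                  // start of the current run
    uint32_t prevValue = values[0];    // value of the current run
    const int32_t n = static_cast<int32_t>(values.size());
    for (int32_t c = 1; c < n; ++c) {
        if (values[c] != prevValue) {
            // The value changed: deliver the finished run, then start a new one.
            if (!fn(prev, c - 1, prevValue)) {
                return;
            }
            prev = c;
            prevValue = values[c];
        }
    }
    fn(prev, n - 1, prevValue);        // deliver the last range
}
```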

U_CAPI void U_EXPORT2
utrie2_enum(const UTrie2 *trie,
            UTrie2EnumValue *enumValue, UTrie2EnumRange *enumRange, const void *context) {
    enumEitherTrie(trie, 0, 0x110000, enumValue, enumRange, context);
}

U_CAPI void U_EXPORT2
utrie2_enumForLeadSurrogate(const UTrie2 *trie, UChar32 lead,
                            UTrie2EnumValue *enumValue, UTrie2EnumRange *enumRange,
                            const void *context) {
    if(!U16_IS_LEAD(lead)) {
        return;
    }
    lead=(lead-0xd7c0)<<10;   /* start code point */
    enumEitherTrie(trie, lead, lead+0x400, enumValue, enumRange, context);
}
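
/*
 * Illustration only, not part of ICU: a worked check of the start code point
 * arithmetic in utrie2_enumForLeadSurrogate(). The constant 0xd7c0 is
 * 0xd800-(0x10000>>10), so one subtraction and one shift give the first of
 * the 0x400 supplementary code points that share a lead surrogate.
 * firstSupplementaryFor() is a hypothetical helper name.
 */
```cpp
#include <cstdint>

// First supplementary code point encoded with the given lead surrogate.
constexpr int32_t firstSupplementaryFor(int32_t lead) {
    return (lead - 0xd7c0) << 10;
}

// Same result as the more explicit form 0x10000+((lead-0xd800)<<10).
static_assert(firstSupplementaryFor(0xd800) == 0x10000,
              "the first lead surrogate starts at U+10000");
static_assert(firstSupplementaryFor(0xd801) == 0x10000 + ((0xd801 - 0xd800) << 10),
              "matches the explicit formula");
static_assert(firstSupplementaryFor(0xdbff) + 0x400 == 0x110000,
              "the last lead surrogate ends at U+10FFFF");
```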

/* C++ convenience wrappers ------------------------------------------------- */

U_NAMESPACE_BEGIN

uint16_t BackwardUTrie2StringIterator::previous16() {
    codePointLimit=codePointStart;
    if(start>=codePointStart) {
        codePoint=U_SENTINEL;
        return static_cast<uint16_t>(trie->errorValue);
    }
    uint16_t result;
    UTRIE2_U16_PREV16(trie, start, codePointStart, codePoint, result);
    return result;
}

uint16_t ForwardUTrie2StringIterator::next16() {
    codePointStart=codePointLimit;
    if(codePointLimit==limit) {
        codePoint=U_SENTINEL;
        return static_cast<uint16_t>(trie->errorValue);
    }
    uint16_t result;
    UTRIE2_U16_NEXT16(trie, codePointLimit, limit, codePoint, result);
    return result;
}

U_NAMESPACE_END