/*
**********************************************************************
*   Copyright (C) 2001 IBM and others. All rights reserved.
**********************************************************************
*   Date        Name        Description
*  06/28/2001   synwee      Creation.
**********************************************************************
*/
#ifndef USEARCH_H
#define USEARCH_H

#include "unicode/utypes.h"
#include "unicode/ucol.h"
#include "unicode/ucoleitr.h"
#include "unicode/ubrk.h"

/**
 * C APIs for an engine that provides language-sensitive text searching based
 * on the comparison rules defined in a UCollator data struct (see ucol.h).
 * This ensures that language-specific conventions are handled; e.g., for the
 * German collator, the characters ß and SS will be matched if case is chosen
 * to be ignored.
 * See the "ICU Collation Design Document" for more information.
 *

 * The algorithm implemented is a modified form of the Boyer-Moore search.
 * For more information on the algorithm, see "Efficient Text Searching in
 * Java", published in Java Report in February 1999.

 * There are two match options for selection.
 * Let S' be the substring of a text string S between the offsets start and
 * end. A pattern string P matches a text string S at the offsets start and
 * end if:
 *
 * option 1. Some canonical equivalent of P matches some canonical equivalent 
 *           of S'
 * option 2. P matches S' and, if P starts or ends with a combining mark,
 *           there exists no non-ignorable combining mark before or after S'
 *           in S, respectively.
 * 
 * Option 2 is the default.

 * This search has APIs similar to those of other text iteration mechanisms,
 * such as the break iterators in ubrk.h. Using these APIs, it is easy to
 * scan through text looking for all occurrences of a given pattern. The
 * search iterator allows changing direction by calling a reset followed by
 * a next or previous. Although a direction change can occur without calling
 * reset first, it comes with a speed penalty. Generally, matches found in
 * the forward direction are the same as those found in the backward
 * direction, in reverse order.

 * usearch.h provides APIs to specify the starting position within the text
 * string to be searched, e.g. usearch_setOffset, usearch_preceding and
 * usearch_following. Because the starting position is used exactly as
 * specified, note that there are some dangerous positions at which the
 * search may return incorrect results:
 *