* posix/regexec.c (prune_impossible_nodes): Handle sifted_states[0]
being NULL also if there are no backreferences.
* posix/rxspencer/tests: Add testcases.
(check_arrival_add_next_nodes): Avoid using uninitialized variable.
* malloc/memusage.c (dest): Fix a bunch of warnings on 32-bit arches.
* sysdeps/i386/fpu/libm-test-ulps: Update for GCC 4.0.x.
2005-09-06 Paul Eggert <eggert@cs.ucla.edu>
Ulrich Drepper <drepper@redhat.com>
[BZ #1302]
Change bitset word type from unsigned int to unsigned long int,
as this has better performance on typical 64-bit hosts. Change
bitset type name to bitset_t.
* posix/regcomp.c (build_equiv_class, build_charclass):
(build_range_exp, build_collating_symbol):
Prefer bitset_t to re_bitset_ptr_t in prototypes, when the actual
argument is a bitset. This is merely a style issue, but it makes
it clearer that an entire array is expected.
(re_compile_fastmap_iter, init_dfa, init_word_char, optimize_subexps,
lower_subexp): Adjust for new bitset_t definition.
(lower_subexp, parse_bracket_exp, built_charclass_op): Likewise.
* posix/regex_internal.h (bitset_set, bitset_clear, bitset_contain,
bitset_not, bitset_merge, bitset_set_all, bitset_mask): Likewise.
* posix/regexec.c (check_dst_limits_calc_pos_1,
check_subexp_matching_top, build_trtable, group_nodes_into_DFAstates):
Likewise.
* posix/regcomp.c (utf8_sb_map): Don't assume initializer
== 0xffffffff.
* posix/regex_internal.h (BITSET_WORD_BITS): Renamed from UINT_BITS.
All uses changed.
(BITSET_WORDS): Renamed from BITSET_UINTS. All uses changed.
(bitset_word_t): New type, replacing 'unsigned int' for bitset uses.
All uses changed.
(BITSET_WORD_MAX): New macro.
(bitset_set, bitset_clear, bitset_contain, bitset_empty,
bitset_set_all, bitset_copy): Adjust for bitset_t change.
(bitset_empty, bitset_copy):
Prefer sizeof (bitset_t) to multiplying it out ourselves.
(bitset_not_merge): Remove; unused.
(bitset_contain): Return bool, not unsigned int with one bit on.
All callers changed.
* posix/regexec.c (build_trtable): Don't assume bitset_t has no
stricter alignment than re_node_set; do this by defining a new
internal type struct dests_alloc and using it to allocate memory.
(get_subexp): Likewise.
(check_arrival): Likewise.
(check_arrival_expand_ecl): Mark DFA parameter as const.
(check_arrival_expand_ecl_sub): Likewise.
(check_arrival_expand_ecl): Mark eclosure as const.
mbrtowc for very simple UTF-8 case.
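
A minimal sketch of what a bitset built on an unsigned long word can look like. The names bitset_word_t, BITSET_WORD_BITS, BITSET_WORDS, bitset_t, bitset_set, bitset_clear, bitset_contain and bitset_empty come from the entry above; the bodies and the SBC_MAX value of 256 are assumptions made for illustration, not the code that was committed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

/* Assumed value: one bit per possible single-byte character. */
#define SBC_MAX 256

typedef unsigned long int bitset_word_t;
#define BITSET_WORD_BITS (sizeof (bitset_word_t) * CHAR_BIT)
#define BITSET_WORDS (SBC_MAX / BITSET_WORD_BITS)
typedef bitset_word_t bitset_t[BITSET_WORDS];

static void
bitset_set (bitset_t set, unsigned int i)
{
  assert (i < SBC_MAX);
  /* Cast before shifting so the shift happens in the word type. */
  set[i / BITSET_WORD_BITS] |= (bitset_word_t) 1 << (i % BITSET_WORD_BITS);
}

static void
bitset_clear (bitset_t set, unsigned int i)
{
  assert (i < SBC_MAX);
  set[i / BITSET_WORD_BITS] &= ~((bitset_word_t) 1 << (i % BITSET_WORD_BITS));
}

/* Returns bool rather than a word with a single bit on. */
static bool
bitset_contain (const bitset_t set, unsigned int i)
{
  assert (i < SBC_MAX);
  return (set[i / BITSET_WORD_BITS] >> (i % BITSET_WORD_BITS)) & 1;
}

static void
bitset_empty (bitset_t set)
{
  /* sizeof (bitset_t) names the type, so it is the size of the whole
     array even though the parameter itself is a pointer.  */
  memset (set, 0, sizeof (bitset_t));
}
```

On a typical 64-bit host this gives four words per set instead of eight, which is the performance motivation stated in the entry.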
2005-09-01 Paul Eggert <eggert@cs.ucla.edu>
* posix/regex_internal.c (build_wcs_upper_buffer): Fix portability
bugs in int versus size_t comparisons.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regex_internal.c (re_acquire_state): Make DFA pointer arg
a pointer-to-const.
(re_acquire_state_context): Likewise.
* posix/regex_internal.h: Adjust prototypes.
2005-08-31 Jim Meyering <jim@meyering.net>
* posix/regcomp.c (search_duplicated_node): Make first pointer arg
a pointer-to-const.
* posix/regex_internal.c (create_ci_newstate, create_cd_newstate,
register_state): Likewise.
* posix/regexec.c (search_cur_bkref_entry, check_dst_limits):
(check_dst_limits_calc_pos_1, check_dst_limits_calc_pos):
(group_nodes_into_DFAstates): Likewise.
* posix/regexec.c (re_search_internal): Simplify update of
rm_so and rm_eo by replacing "if (A == B) A += C - B;"
with the equivalent of "if (A == B) A = C;".
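
The rm_so/rm_eo simplification above rests on a small identity: inside the branch A equals B, so A + (C - B) is C. A sketch with hypothetical function names, using plain int values rather than the real regmatch_t fields:

```c
#include <assert.h>

/* Old form: adjust A by the difference when it equals B. */
static int
update_old (int a, int b, int c)
{
  if (a == b)
    a += c - b;
  return a;
}

/* New form: a == b holds inside the branch, so a + (c - b) is just c.
   This form also has no intermediate c - b that could overflow.  */
static int
update_new (int a, int b, int c)
{
  if (a == b)
    a = c;
  return a;
}

/* Hypothetical checker: both forms agree for the given inputs. */
static void
check_equivalent (int a, int b, int c)
{
  assert (update_old (a, b, c) == update_new (a, b, c));
}
```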
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regcomp.c (re_compile_internal): Change third parameter type
to size_t.
(init_dfa): Likewise. Make sure that arithmetic on pat_len doesn't
overflow.
* posix/regex_internal.h (struct re_dfa_t): Change type of nodes_alloc
and nodes_len to size_t.
* posix/regex_internal.c (re_dfa_add_node): Use size_t as type for
new_nodes_alloc. Check for overflow.
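
A sketch of the kind of overflow check the entry above describes for size_t allocation counts. The helper grow_array and its doubling policy are hypothetical; the entry only says that re_dfa_add_node now uses size_t for new_nodes_alloc and checks for overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: double an array of elem_size-byte elements,
   refusing when the new element count or byte count would not fit
   in size_t.  On failure the old block and *nalloc are untouched.  */
static void *
grow_array (void *old, size_t *nalloc, size_t elem_size)
{
  assert (elem_size != 0);
  size_t new_nalloc = *nalloc ? *nalloc : 1;

  if (new_nalloc > SIZE_MAX / 2)
    return NULL;                /* doubling the count would wrap */
  new_nalloc *= 2;

  if (new_nalloc > SIZE_MAX / elem_size)
    return NULL;                /* count * elem_size would wrap */

  void *p = realloc (old, new_nalloc * elem_size);
  if (p != NULL)
    *nalloc = new_nalloc;
  return p;
}
```

Dividing SIZE_MAX by the factor, rather than multiplying and checking afterwards, keeps the test itself free of wraparound.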
2005-08-31 Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (re_compile_fastmap_iter, init_dfa, init_word_char):
(optimize_subexps, lower_subexp):
Don't assume 1<<31 has defined behavior on hosts with 32-bit int,
since the signed shift might overflow. Use 1u<<31 instead.
* posix/regex_internal.h (bitset_set, bitset_clear, bitset_contain):
Likewise.
* posix/regexec.c (check_dst_limits_calc_pos_1): Likewise.
(check_subexp_matching_top): Likewise.
* posix/regcomp.c (optimize_subexps, lower_subexp):
Use CHAR_BIT rather than 8, for clarity.
* posix/regexec.c (check_dst_limits_calc_pos_1):
(check_subexp_matching_top): Likewise.
* posix/regcomp.c (init_dfa): Make table_size unsigned, so that we
don't have to worry about portability issues when shifting it left.
Remove no-longer-needed test for table_size > 0.
* posix/regcomp.c (parse_sub_exp): Do not shift more bits than there
are in a word, as the resulting behavior is undefined.
* posix/regexec.c (check_dst_limits_calc_pos_1): Likewise;
in one case, a <= should have been a <, and in another case the
whole test was missing.
* posix/regex_internal.h (BYTE_BITS): Remove. All uses changed to
the standard name CHAR_BIT.
next_last_offset.
(struct re_dfa_t): Remove unused member states_alloc.
* posix/regcomp.c (init_dfa): Don't initialize unused members.
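
The shift fixes in the entry above can be illustrated with a short sketch. The macro and function names here are hypothetical; only the techniques (1u instead of 1, CHAR_BIT instead of 8, and a strict < bound on the shift count) are taken from the entry.

```c
#include <assert.h>
#include <limits.h>

/* Bits in an unsigned int, derived from CHAR_BIT rather than a literal 8. */
#define UINT_BITS_SKETCH (sizeof (unsigned int) * CHAR_BIT)

/* Returns a word with only bit I set, or 0 when I is out of range.
   The bound is a strict <: shifting by the full word width is
   undefined, so I == UINT_BITS_SKETCH must be rejected too.  */
static unsigned int
bit_mask (unsigned int i)
{
  if (!(i < UINT_BITS_SKETCH))
    return 0;
  /* 1u keeps the shift unsigned; 1 << 31 would overflow a 32-bit int. */
  return 1u << i;
}

/* Tests bit I of WORD; the caller must pass an in-range index. */
static int
bit_is_set (unsigned int word, unsigned int i)
{
  assert (i < UINT_BITS_SKETCH);
  return (word >> i) & 1u;
}
```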
2005-08-25 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (set_regs): Don't alloca with an unbounded size.
alloca modernization/simplification for regex.
* posix/regex.c: Remove portability cruft for alloca. This no longer
needs to be at the start of the file, and can be moved into
regex_internal.h and simplified.
* posix/regex_internal.h: Include <alloca.h>.
(__libc_use_alloca) [!defined _LIBC]: New macro.
* posix/regexec.c (build_trtable): Remove "#ifdef _LIBC",
since the code now works outside glibc.
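
A sketch of bounded alloca use with a heap fallback, in the spirit of the entry above. The macro use_alloca_sketch and its 4096-byte limit are hypothetical stand-ins; they are not the definition of __libc_use_alloca, which the entry does not show. The sketch assumes a host that provides <alloca.h>.

```c
#include <alloca.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold; not the value glibc uses. */
#define ALLOCA_LIMIT_SKETCH 4096
#define use_alloca_sketch(n) ((n) <= ALLOCA_LIMIT_SKETCH)

/* Sums the N bytes of BUF through a scratch copy.  Small copies live
   on the stack, large ones on the heap, so the stack use is bounded.
   Returns 0 on success, -1 if the heap allocation fails.  */
static int
sum_bytes (const unsigned char *buf, size_t n, unsigned long *out)
{
  assert (buf != NULL && out != NULL);
  int on_heap = !use_alloca_sketch (n);
  unsigned char *tmp = on_heap ? malloc (n) : alloca (n);
  if (tmp == NULL)
    return -1;

  memcpy (tmp, buf, n);
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += tmp[i];

  if (on_heap)
    free (tmp);                 /* stack memory is released on return */
  *out = sum;
  return 0;
}
```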
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* include/regex.h: Remove use of _RE_ARGS.
2005-08-25 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (find_recover_state): Change "err" to "*err".
2005-08-24 Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (regerror): Pointer args are 'restrict',
as per POSIX.
* posix/regex.h (regerror): Likewise.
* manual/pattern.texi (POSIX Regexp Compilation): Likewise.
Similarly for regcomp and regexec. Also, first 2 args of regexec
and 2nd arg of regerror are const.
* posix/regex.c: Do not include <sys/types.h>, as POSIX no longer
requires this. (The code never needed it.)
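
For reference, a minimal sketch of calling the three POSIX functions whose prototypes the entry above adjusts. The wrapper match_ere is hypothetical; regcomp, regexec, regerror and regfree are the standard <regex.h> interfaces, and only <regex.h> is needed to use them.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 on match, 0 on no match, -1 on error.  On error a
   message is written to ERRBUF (ERRBUF_SIZE bytes).  */
static int
match_ere (const char *pattern, const char *text,
           char *errbuf, size_t errbuf_size)
{
  assert (pattern != NULL && text != NULL && errbuf != NULL);
  regex_t re;
  int rc = regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB);
  if (rc != 0)
    {
      regerror (rc, &re, errbuf, errbuf_size);
      return -1;
    }

  rc = regexec (&re, text, 0, NULL, 0);
  int result;
  if (rc == 0)
    result = 1;
  else if (rc == REG_NOMATCH)
    result = 0;
  else
    {
      /* Fetch the message before the compiled pattern is freed. */
      regerror (rc, &re, errbuf, errbuf_size);
      result = -1;
    }
  regfree (&re);
  return result;
}
```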
2005-08-20 Paul Eggert <eggert@cs.ucla.edu>
* posix/regexec.c (sift_states_bkref): re_node_set_insert returns
int, not reg_errcode_t.
* posix/regex_internal.c (calc_state_hash): Put 'inline' before type,
since some broken compilers warn about it otherwise.
* posix/regcomp.c (create_initial_state): Remove duplicate decl.
2005-08-20 Paul Eggert <eggert@cs.ucla.edu>
* posix/regex.h (_RE_ARGS): Remove. No longer needed, since we assume
C89 or better. All uses removed.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regex.c: Prevent using C++ compilers.
2005-08-19 Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (duplicate_node): Return new index, not an error
code, and let the caller return REG_ESPACE if out of space. This
removes an uninitialized-variable warning with GCC 4.0.1, and also
avoids taking the address of a local variable. All callers
changed.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* include/time.h (__strptime_internal): Rename parameter to avoid
bogus compiler warning.
2005-08-19 Jim Meyering <jim@meyering.net>
* posix/regexec.c (proceed_next_node): Redo local variables to
avoid GCC shadowing warnings.
2005-09-06 Ulrich Drepper <drepper@redhat.com>
* posix/regex_internal.c (re_acquire_state): Minor code rearrangement.
(re_acquire_state_context): Likewise.
2005-08-19 Paul Eggert <eggert@cs.ucla.edu>
* posix/regex_internal.c (re_string_realloc_buffers):
(re_node_set_insert, re_node_set_insert_last, re_dfa_add_node):
Rename local variables to avoid GCC shadowing warnings.
2005-07-08 Eric Blake <ebb9@byu.net>
Paul Eggert <eggert@cs.ucla.edu>
* posix/regcomp.c (init_dfa): Store __btowc value in wint_t, not
wchar_t. Remove now-unnecessary cast.
(build_range_exp): Likewise.
Update.
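
Why the result belongs in a wint_t: the failure value WEOF has type wint_t and need not be representable in wchar_t. A sketch using the public btowc rather than the internal __btowc named in the entry above; the wrapper byte_to_wide is hypothetical.

```c
#include <assert.h>
#include <stdio.h>              /* EOF */
#include <wchar.h>

/* Converts one byte to a wide character in the current locale.
   Returns 1 and stores the result in *OUT, or returns 0 if the
   byte has no single-byte wide equivalent.  */
static int
byte_to_wide (int byte, wchar_t *out)
{
  assert (out != NULL);
  /* Keep the result in wint_t so WEOF stays distinguishable. */
  wint_t wi = btowc (byte);
  if (wi == WEOF)
    return 0;
  *out = (wchar_t) wi;
  return 1;
}
```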
2005-01-27 Paolo Bonzini <bonzini@gnu.org>
[BZ #558]
* posix/regcomp.c (calc_inveclosure): Return reg_errcode_t.
Initialize the node sets in dfa->inveclosures.
(analyze): Initialize inveclosures only if it is needed.
Check errors from calc_inveclosure.
* posix/regex_internal.c (re_dfa_add_node): Do not initialize
the inveclosure node set.
* posix/regexec.c (re_search_internal): If nmatch includes unused
subexpressions, reset them to { rm_so: -1, rm_eo: -1 } here.
* posix/regcomp.c (parse_bracket_exp) [!RE_ENABLE_I18N]:
Do build a SIMPLE_BRACKET token.
* posix/regexec.c (transit_state_mb): Do not examine nodes
where ACCEPT_MB is not set.
Update.
2004-12-13 Paolo Bonzini <bonzini@gnu.org>
Separate parsing and creation of the NFA. Avoided recursion on
the (very unbalanced) parse tree.
[BZ #611]
* posix/regcomp.c (struct subexp_optimize, analyze_tree, calc_epsdest,
re_dfa_add_tree_node, mark_opt_subexp_iter): Removed.
(optimize_subexps, duplicate_tree, calc_first, calc_next,
mark_opt_subexp): Rewritten.
(preorder, postorder, lower_subexps, lower_subexp, link_nfa_nodes,
create_token_tree, free_tree, free_token): New.
(analyze): Accept a regex_t *. Invoke the passes via the preorder and
postorder generic visitors. Do not initialize the fields in the
re_dfa_t that represent the transitions.
(free_dfa_content): Use free_token.
(re_compile_internal): Analyze before UTF-8 optimizations. Do not
include optimization of subexpressions.
(create_initial_state): Fetch the DFA node index from the first node's
bin_tree_t *.
(optimize_utf8): Abort on unexpected nodes, including OP_DUP_QUESTION.
Return on COMPLEX_BRACKET.
(duplicate_node_closure): Fix comment.
(duplicate_node): Do not initialize the fields in the
re_dfa_t that represent the transitions.
(calc_eclosure, calc_inveclosure): Do not handle OP_DELETED_SUBEXP.
(create_tree): Remove final argument. All callers adjusted. Rewritten
to use create_token_tree.
(parse_reg_exp, parse_branch, parse_expression, parse_bracket_exp,
build_charclass_op): Use create_tree or create_token_tree instead
of re_dfa_add_tree_node.
(parse_dup_op): Likewise. Also free the tree using free_tree for
"<re>{0}", and lower OP_DUP_QUESTION to OP_ALT: "a?" is equivalent
to "a|". Adjust invocation of mark_opt_subexp.
(parse_sub_exp): Create a single SUBEXP node.
* posix/regex_internal.c (re_dfa_add_node): Remove last parameter,
always perform as if it was 1. Do not initialize OPT_SUBEXP and
DUPLICATED, and initialize the DFA fields representing the transitions.
* posix/regex_internal.h (re_dfa_add_node): Adjust prototype.
(re_token_type_t): Move OP_DUP_PLUS and OP_DUP_QUESTION to the tokens
section. Add a tree-only code SUBEXP. Remove OP_DELETED_SUBEXP.
(bin_tree_t): Include a full re_token_t for TOKEN. Turn FIRST and
NEXT into pointers to trees. Remove ECLOSURE.
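
A sketch of a generic postorder visitor that walks a binary tree with parent links instead of recursion, which keeps stack use constant on a very unbalanced tree. The name postorder comes from the entry above; the struct layout, the callback signature and record_order are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a parse-tree node; field names are assumed. */
struct tree
{
  struct tree *parent, *left, *right;
  int visited_order;
};

/* Calls FN on every node below ROOT, children before parents.
   Stops early and returns FN's result if it is nonzero.  */
static int
postorder (struct tree *root, int (*fn) (void *, struct tree *), void *extra)
{
  assert (root != NULL);
  struct tree *node = root, *prev;
  for (;;)
    {
      /* Descend to a leaf, preferring the left child. */
      while (node->left || node->right)
        node = node->left ? node->left : node->right;

      do
        {
          int err = fn (extra, node);
          if (err != 0)
            return err;
          if (node->parent == NULL)
            return 0;
          prev = node;
          node = node->parent;
        }
      /* Keep climbing while the right subtree is done or absent. */
      while (node->right == prev || node->right == NULL);
      node = node->right;
    }
}

/* Example callback: numbers nodes in the order they are visited. */
static int
record_order (void *extra, struct tree *node)
{
  int *counter = extra;
  node->visited_order = (*counter)++;
  return 0;
}
```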
2004-12-28 Paolo Bonzini <bonzini@gnu.org>
[BZ #605]
* posix/regcomp.c (parse_bracket_exp): Do not modify DFA nodes
that were already created.
* posix/regex_internal.c (re_dfa_add_node): Set accept_mb field
in the token if needed.
(create_ci_newstate, create_cd_newstate): Set accept_mb field
from the tokens' field.
* posix/regex_internal.h (re_token_t): Add accept_mb field.
(ACCEPT_MB_NODE): Removed.
* posix/regexec.c (proceed_next_node, transit_states_mb,
build_sifted_states, check_arrival_add_next_nodes): Use
accept_mb instead of ACCEPT_MB_NODE.
2004-04-27 Paolo Bonzini <bonzini@gnu.org>
* posix/regex_internal.h (struct re_dfastate_t): Make
word_trtable a pointer to the 512-item transition table.
* posix/regexec.c (build_trtable): Fill in either state->trtable
or state->word_trtable. Return a boolean indicating success.
(transit_state): Expect state->trtable to be a 256-item
transition table. Reorganize code to have fewer tests in
the common case, and to save an indentation level.
2004-12-07 Paolo Bonzini <bonzini@gnu.org>
* posix/regexec.c (proceed_next_node): Simplify treatment of epsilon
nodes. Pass the pushed node to push_fail_stack.
(push_fail_stack): Accept a single node rather than an array
of two epsilon destinations.
(build_sifted_states): Only walk non-epsilon nodes.
(check_arrival): Don't pass epsilon nodes to
check_arrival_add_next_nodes.
(check_arrival_add_next_nodes) [DEBUG]: Abort if an epsilon node is
found.
(check_node_accept): Do expensive checks later.
(add_epsilon_src_nodes): Cache result of merging the inveclosures.
* posix/regex_internal.h (re_dfastate_t): Add non_eps_nodes and
inveclosure.
(re_string_elem_size_at, re_string_char_size_at, re_string_wchar_at,
re_string_context_at, re_string_peek_byte_case,
re_string_fetch_byte_case, re_node_set_compare, re_node_set_contains):
Declare as pure.
* posix/regex_internal.c (create_newstate_common): Remove.
(register_state): Move part of it here. Initialize non_eps_nodes.
(free_state): Free inveclosure and non_eps_nodes.
(create_cd_newstate, create_ci_newstate): Allocate the new
re_dfastate_t here.
2004-12-01 Paolo Bonzini <bonzini@gnu.org>
* posix/regcomp.c (free_dfa_content, init_dfa): Remove
references to re_dfa_t's subexps field.
(parse_sub_exp, parse_expression): Do not use it. Use
completed_bkref_map instead.
(create_initial_state, peek_token): Store a backreference \N
with opr.idx = N-1.
* posix/regexec.c (proceed_next_node, check_dst_limits, get_subexp):
Likewise.
(check_subexp_limits): Remove useless condition.
* posix/regex_internal.h (re_subexp_t): Remove.
(re_dfa_t): Remove subexps and subexps_alloc field, add
completed_bkref_map.
Update.
2004-11-18 Jakub Jelinek <jakub@redhat.com>
[BZ #544]
* posix/regex.h (RE_NO_SUB): New define.
* posix/regex_internal.h (OP_DELETED_SUBEXP): New.
(re_dfa_t): Add subexp_map.
* posix/regcomp.c (struct subexp_optimize): New type.
(optimize_subexps): New routine.
(re_compile_internal): Call it.
(re_compile_pattern): Set preg->no_sub to 1 if RE_NO_SUB.
(free_dfa_content): Free subexp_map.
(calc_inveclosure, calc_eclosure): Skip OP_DELETED_SUBEXP
nodes.
* posix/regexec.c (re_search_internal): If subexp_map
is not NULL, duplicate registers as needed.
* posix/Makefile: Add rules to build and run tst-regex2.
* posix/tst-regex2.c: New test.
* posix/rxspencer/tests: Fix last two tests (\0 -> \1).
Add some new tests for nested subexpressions.
2004-11-12 Ulrich Drepper <drepper@redhat.com>
* posix/Makefile (tests): Add bug-regex24.
* posix/bug-regex24.c: New file.
2004-11-12 Paolo Bonzini <bonzini@gnu.org>
* posix/regexec.c (check_dst_limits_calc_pos_1): Use the map to
cut recursive paths. Make exit condition more precise.
(match_ctx_add_entry): Initialize the map.
* posix/regex_internal.h (struct re_backref_cache_entry): Add a map of
reachable subexpression nodes from each backreference cache entry.
2004-11-09 Paolo Bonzini <bonzini@gnu.org>
* posix/regexec.c (transit_state): Remove the check for
out-of-bounds buffers.
(check_matching): Check here for out-of-bounds buffers.
(re_search_internal): Store into match_kind a set of bits
indicating which incantation of fastmap scanning must be
used. Use a switch statement instead of multiple ifs.
Exit the final "for (;;)" with goto free_return unless
the match succeeded, thus simplifying some conditionals.
* posix/regex_internal.c (re_string_reconstruct,
re_string_context_at): Add several branch predictions for
case-sensitive matching and no transition table being used.
2004-11-10 Ulrich Drepper <drepper@redhat.com>
* posix/tst-waitid.c: Don't use error to print error messages; they
won't end up in the .out file.
* nscd/nscd_getgr_r.c: Likewise. Make map externally visible.
* nscd/nscd_gethst_r.c: Likewise.
2004-11-08 Ulrich Drepper <drepper@redhat.com>
* posix/regcomp.c (utf8_sb_map): Define.
(free_dfa_content): Don't free dfa->sb_char if it's a pointer to
utf8_sb_map.
(init_dfa): Use utf8_sb_map instead of initializing memory when the
encoding is UTF-8.
* posix/regcomp.c (init_dfa): Get the codeset name outside glibc as
well. Check if it is spelled UTF8 as well as UTF-8, and check
case-insensitively. Set dfa->map_notascii manually when outside
glibc.
* posix/regex_internal.c (build_wcs_upper_buffer) [!_LIBC]: Enable
optimizations based on map_notascii.
* posix/regex_internal.h [HAVE_LANGINFO_H || HAVE_LANGINFO_CODESET
|| _LIBC]: Include langinfo.h.
* posix/regex_internal.h (struct re_backref_cache_entry): Add "more"
field.
* posix/regexec.c (check_dst_limits): Hoist computation of the source
and destination bkref_idx out of the loop. Pass it to
check_dst_limits_calc_pos.
(check_dst_limits_calc_pos_1): New function, containing the recursive
loop of check_dst_limits_calc_pos; uses the "more" field of
struct re_backref_cache to control the loop.
(check_dst_limits_calc_pos): Store into "boundaries" the position
relative to lim's start and end positions. Do not accept eclosures,
accept bkref_idx instead. Call check_dst_limits_calc_pos_1 to do the
work.
(sift_states_bkref): Use the "more" field of struct re_backref_cache
to control the loop. A big "if" was turned into a continue and the
function was reindented.
(get_subexp): Use the "more" field of struct re_backref_cache
to control the loop.
(match_ctx_add_entry): Initialize the bkref_ents' "more" field.
(search_cur_bkref_entry): Return -1 if out of bounds.
* posix/regexec.c (empty_set): Remove.
(sift_states_backward): Remove cur_src variable. Move inner loop
to build_sifted_states.
(build_sifted_states): Extract from sift_states_backward. Do not
use empty_set.
(update_cur_sifted_state): Do not use empty_set. Special case
dest_nodes->nelem == 0.
2004-11-03 Paolo Bonzini <bonzini@gnu.org>
* posix/regex_internal.h (struct re_backref_cache_entry): Remove flag
field.
(struct re_sift_context_t): Remove cur_bkref, cls_subexp_idx,
check_subexp fields. Move limits last.
* posix/regexec.c (match_ctx_clear_flag): Remove.
(sift_ctx_init): Remove check_subexp parameter. Do not set removed
fields. Callers adjusted.
(expand_bkref_cache): Remove last_str parameter. Callers adjusted.
(re_search_internal): Remove fast_translate variable.
(update_cur_sifted_state): Pass candidates as the final parameter
to sift_states_bkref.
(sift_states_bkref): Change last unused parameter to be "candidates",
do not fetch candidates into a local variable.
Remove dead test for "node == sctx->bkref", and the cur_bkref_idx
variable.
Remove loops that set/reset the flag field of backref cache entries.
(check_arrival_add_next_nodes): Use a signed int to hold the return
value of re_node_set_insert.
(group_nodes_into_DFAstates): Likewise.
(match_ctx_add_entry): Do not set the flag field of the new entry.
2004-03-10 Richard Henderson <rth@redhat.com>
* sysdeps/generic/errno.c: Disable versioning for rtld.
* sysdeps/generic/Makefile (elf/shared): Add unwind-pe.
* sysdeps/generic/unwind-pe.c: New file.
* sysdeps/generic/unwind-pe.h: Only prototypes for _LIBC without
_LIBC_DEFINITIONS.
* posix/regexec.c: Likewise.
2004-02-29 Paolo Bonzini <bonzini@gnu.org>
* posix/regexec.c (transit_state): Don't handle state == NULL.
Move state log and backreference management...
(merge_state_with_log): ... to this function.
(find_recover_state): New function.
(check_matching): Use find_recover_state to get a non-NULL
state when an invalid state is reached. Compute the amount
of initial characters to be skipped less conservatively when
multi-byte character sets are in use. Do not check
dfa->nbackref if the state log is NULL. Initialize err.
(acquire_init_state_context): Expect err to be initialized.
Fix spacing.
2004-03-05 Jakub Jelinek <jakub@redhat.com>
* sysdeps/sparc/sparc32/elf/start.S: Handle PIEs.
* sysdeps/sparc/sparc64/elf/start.S: Likewise.
2004-02-02 Paolo Bonzini <bonzini@gnu.org>
* posix/regexec.c (check_matching): Add P_MATCH_FIRST parameter.
(re_search_internal): Pass new parameter to check_matching.
(check_matching): Unless a parenthesized group is found at the
beginning of the regexp, advance P_MATCH_FIRST until we enter
a state different from the initial state.