2004-01-02  Jakub Jelinek  <jakub@redhat.com>

	* posix/regex_internal.c (re_node_set_insert): Remove unused variables.

	* posix/regex_internal.h (re_dfa_t): Add syntax field.
	* posix/regcomp.c (parse): Initialize dfa->syntax.
	* posix/regexec.c (acquire_init_state_context,
	prune_impossible_nodes, check_matching, check_halt_state_context,
	proceed_next_node, sift_states_iter_mb, sift_states_backward,
	update_cur_sifted_state, sift_states_bkref, transit_state,
	transit_state_sb, transit_state_mb, transit_state_bkref,
	get_subexp, get_subexp_sub, check_arrival, expand_bkref_cache,
	build_trtable): Remove preg argument, add dfa argument instead
	and remove dfa = preg->buffer initialization in the body.
	Adjust all callers.
	(check_node_accept_bytes, group_nodes_into_DFAstates,
	check_node_accept): Likewise.  Use dfa->syntax instead of
	preg->syntax.
	(check_arrival_add_next_nodes): Remove preg argument.

	* posix/regex_internal.h (re_match_context_t): Make input
	re_string_t instead of a pointer to it.
	* posix/regex_internal.c (re_string_construct_common): Don't clear
	pstr here...
	(re_string_construct): ... but only here.
	* posix/regexec.c (match_ctx_init): Remove input argument.  Don't
	initialize fields to zero.
	(re_search_internal): Move input into mctx.input.
	(acquire_init_state_context, check_matching,
	check_halt_state_context, proceed_next_node,
	clean_state_log_if_needed, sift_states_bkref, sift_states_iter_mb,
	transit_state, transit_state_sb, transit_state_mb,
	transit_state_bkref, get_subexp, check_arrival,
	check_arrival_add_next_nodes, check_node_accept, extend_buffers):
	Change mctx->input into &mctx->input and mctx->input->field into
	mctx->input.field.

2004-01-02  Jakub Jelinek  <jakub@redhat.com>
	    Paolo Bonzini  <bonzini@gnu.org>

	* posix/regex_internal.h (re_const_bitset_ptr_t): New type.
	(re_string_t): Add newline_anchor, word_char and word_ops_used fields.
	(re_dfa_t): Change word_char type to bitset.  Add word_ops_used field.
	(re_string_context_at, re_string_reconstruct): Remove last argument.
	* posix/regex_internal.c (re_string_allocate): Initialize
	pstr->word_char and pstr->word_ops_used.
	(re_string_context_at): Remove newline_anchor argument.
	Use input->newline_anchor instead, swap && conditions.
	Only use IS_WIDE_WORD_CHAR if input->word_ops_used != 0.
	Use input->word_char bitmap instead of IS_WORD_CHAR.
	(re_string_reconstruct): Likewise.
	Adjust re_string_context_at caller.
	* posix/regexec.c (acquire_init_state_context,
	check_halt_state_context, transit_state, transit_state_sb,
	transit_state_mb, transit_state_bkref, check_arrival,
	check_node_accept): Adjust re_string_context_at and
	re_string_reconstruct callers.
	(re_search_internal): Likewise.  Set input.newline_anchor.
	(build_trtable): Use dfa->word_char bitmap instead of IS_WORD_CHAR.
	* posix/regcomp.c (init_word_char): Change return type to void.
	Set dfa->word_ops_used.
	(free_dfa_content): Don't free dfa->word_char.
	(parse_expression): Remove error handling for init_word_char.
This commit is contained in:
Ulrich Drepper 2004-01-02 21:20:51 +00:00
parent 8503c987b6
commit 56b168be5d
5 changed files with 319 additions and 272 deletions

View File

@ -1,3 +1,67 @@
2004-01-02 Jakub Jelinek <jakub@redhat.com>
* posix/regex_internal.c (re_node_set_insert): Remove unused variables.
* posix/regex_internal.h (re_dfa_t): Add syntax field.
* posix/regcomp.c (parse): Initialize dfa->syntax.
* posix/regexec.c (acquire_init_state_context,
prune_impossible_nodes, check_matching, check_halt_state_context,
proceed_next_node, sift_states_iter_mb, sift_states_backward,
update_cur_sifted_state, sift_states_bkref, transit_state,
transit_state_sb, transit_state_mb, transit_state_bkref,
get_subexp, get_subexp_sub, check_arrival, expand_bkref_cache,
build_trtable): Remove preg argument, add dfa argument instead
and remove dfa = preg->buffer initialization in the body.
Adjust all callers.
(check_node_accept_bytes, group_nodes_into_DFAstates,
check_node_accept): Likewise. Use dfa->syntax instead of
preg->syntax.
(check_arrival_add_next_nodes): Remove preg argument.
* posix/regex_internal.h (re_match_context_t): Make input
re_string_t instead of a pointer to it.
* posix/regex_internal.c (re_string_construct_common): Don't clear
pstr here...
(re_string_construct): ... but only here.
* posix/regexec.c (match_ctx_init): Remove input argument. Don't
initialize fields to zero.
(re_search_internal): Move input into mctx.input.
(acquire_init_state_context, check_matching,
check_halt_state_context, proceed_next_node,
clean_state_log_if_needed, sift_states_bkref, sift_states_iter_mb,
transit_state, transit_state_sb, transit_state_mb,
transit_state_bkref, get_subexp, check_arrival,
check_arrival_add_next_nodes, check_node_accept, extend_buffers):
Change mctx->input into &mctx->input and mctx->input->field into
mctx->input.field.
2004-01-02 Jakub Jelinek <jakub@redhat.com>
Paolo Bonzini <bonzini@gnu.org>
* posix/regex_internal.h (re_const_bitset_ptr_t): New type.
(re_string_t): Add newline_anchor, word_char and word_ops_used fields.
(re_dfa_t): Change word_char type to bitset. Add word_ops_used field.
(re_string_context_at, re_string_reconstruct): Remove last argument.
* posix/regex_internal.c (re_string_allocate): Initialize
pstr->word_char and pstr->word_ops_used.
(re_string_context_at): Remove newline_anchor argument.
Use input->newline_anchor instead, swap && conditions.
Only use IS_WIDE_WORD_CHAR if input->word_ops_used != 0.
Use input->word_char bitmap instead of IS_WORD_CHAR.
(re_string_reconstruct): Likewise.
Adjust re_string_context_at caller.
* posix/regexec.c (acquire_init_state_context,
check_halt_state_context, transit_state, transit_state_sb,
transit_state_mb, transit_state_bkref, check_arrival,
check_node_accept): Adjust re_string_context_at and
re_string_reconstruct callers.
(re_search_internal): Likewise. Set input.newline_anchor.
(build_trtable): Use dfa->word_char bitmap instead of IS_WORD_CHAR.
* posix/regcomp.c (init_word_char): Change return type to void.
Set dfa->word_ops_used.
(free_dfa_content): Don't free dfa->word_char.
(parse_expression): Remove error handling for init_word_char.
2004-01-01 Paolo Bonzini <bonzini@gnu.org>
* posix/regex_internal.h (re_dfastate_t): Fix size of the CONTEXT

View File

@ -1,5 +1,5 @@
/* Extended regular expression matching and search library.
Copyright (C) 2002, 2003 Free Software Foundation, Inc.
Copyright (C) 2002, 2003, 2004 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Isamu Hasegawa <isamu@yamato.ibm.com>.
@ -24,7 +24,7 @@ static void re_compile_fastmap_iter (regex_t *bufp,
const re_dfastate_t *init_state,
char *fastmap);
static reg_errcode_t init_dfa (re_dfa_t *dfa, int pat_len);
static reg_errcode_t init_word_char (re_dfa_t *dfa);
static void init_word_char (re_dfa_t *dfa);
#ifdef RE_ENABLE_I18N
static void free_charset (re_charset_t *cset);
#endif /* RE_ENABLE_I18N */
@ -611,7 +611,6 @@ free_dfa_content (re_dfa_t *dfa)
re_free (entry->array);
}
re_free (dfa->state_table);
re_free (dfa->word_char);
#ifdef RE_ENABLE_I18N
re_free (dfa->sb_char);
#endif
@ -839,7 +838,6 @@ init_dfa (dfa, pat_len)
dfa->subexps_alloc = 1;
dfa->subexps = re_malloc (re_subexp_t, dfa->subexps_alloc);
/* dfa->word_char = NULL; */
dfa->mb_cur_max = MB_CUR_MAX;
#ifdef _LIBC
@ -879,19 +877,16 @@ init_dfa (dfa, pat_len)
"word". In this case "word" means that it is the word construction
character used by some operators like "\<", "\>", etc. */
static reg_errcode_t
static void
init_word_char (dfa)
re_dfa_t *dfa;
{
int i, j, ch;
dfa->word_char = (re_bitset_ptr_t) calloc (sizeof (bitset), 1);
if (BE (dfa->word_char == NULL, 0))
return REG_ESPACE;
dfa->word_ops_used = 1;
for (i = 0, ch = 0; i < BITSET_UINTS; ++i)
for (j = 0; j < UINT_BITS; ++j, ++ch)
if (isalnum (ch) || ch == '_')
dfa->word_char[i] |= 1 << j;
return REG_NOERROR;
}
/* Free the work area which are only used while compiling. */
@ -1960,6 +1955,7 @@ parse (regexp, preg, syntax, err)
re_dfa_t *dfa = (re_dfa_t *) preg->buffer;
bin_tree_t *tree, *eor, *root;
re_token_t current_token;
dfa->syntax = syntax;
fetch_token (&current_token, regexp, syntax | RE_CARET_ANCHORS_HERE);
tree = parse_reg_exp (regexp, preg, &current_token, syntax, 0, err);
if (BE (*err != REG_NOERROR && tree == NULL, 0))
@ -2191,12 +2187,8 @@ parse_expression (regexp, preg, token, syntax, nest, err)
case ANCHOR:
if ((token->opr.ctx_type
& (WORD_DELIM | INSIDE_WORD | WORD_FIRST | WORD_LAST))
&& dfa->word_char == NULL)
{
*err = init_word_char (dfa);
if (BE (*err != REG_NOERROR, 0))
return NULL;
}
&& dfa->word_ops_used == 0)
init_word_char (dfa);
if (token->opr.ctx_type == WORD_DELIM)
{
bin_tree_t *tree_first, *tree_last;

View File

@ -67,6 +67,8 @@ re_string_allocate (pstr, str, len, init_len, trans, icase, dfa)
if (BE (ret != REG_NOERROR, 0))
return ret;
pstr->word_char = dfa->word_char;
pstr->word_ops_used = dfa->word_ops_used;
pstr->mbs = pstr->mbs_allocated ? pstr->mbs : (unsigned char *) str;
pstr->valid_len = (pstr->mbs_allocated || dfa->mb_cur_max > 1) ? 0 : len;
pstr->valid_raw_len = pstr->valid_len;
@ -84,6 +86,7 @@ re_string_construct (pstr, str, len, trans, icase, dfa)
const re_dfa_t *dfa;
{
reg_errcode_t ret;
memset (pstr, '\0', sizeof (re_string_t));
re_string_construct_common (str, len, pstr, trans, icase, dfa);
if (len > 0)
@ -183,7 +186,6 @@ re_string_construct_common (str, len, pstr, trans, icase, dfa)
int icase;
const re_dfa_t *dfa;
{
memset (pstr, '\0', sizeof (re_string_t));
pstr->raw_mbs = (const unsigned char *) str;
pstr->len = len;
pstr->raw_len = len;
@ -572,9 +574,9 @@ re_string_translate_buffer (pstr)
convert to upper case in case of REG_ICASE, apply translation. */
static reg_errcode_t
re_string_reconstruct (pstr, idx, eflags, newline)
re_string_reconstruct (pstr, idx, eflags)
re_string_t *pstr;
int idx, eflags, newline;
int idx, eflags;
{
int offset = idx - pstr->raw_mbs_idx;
if (offset < 0)
@ -609,8 +611,7 @@ re_string_reconstruct (pstr, idx, eflags, newline)
)
{
/* Yes, move them to the front of the buffer. */
pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags,
newline);
pstr->tip_context = re_string_context_at (pstr, offset - 1, eflags);
#ifdef RE_ENABLE_I18N
if (pstr->mb_cur_max > 1)
memmove (pstr->wcs, pstr->wcs + offset,
@ -695,8 +696,11 @@ re_string_reconstruct (pstr, idx, eflags, newline)
memset (pstr->mbs, 255, pstr->valid_len);
}
pstr->valid_raw_len = pstr->valid_len;
pstr->tip_context = (IS_WIDE_WORD_CHAR (wc) ? CONTEXT_WORD
: ((newline && IS_WIDE_NEWLINE (wc))
pstr->tip_context = ((BE (pstr->word_ops_used != 0, 0)
&& IS_WIDE_WORD_CHAR (wc))
? CONTEXT_WORD
: ((IS_WIDE_NEWLINE (wc)
&& pstr->newline_anchor)
? CONTEXT_NEWLINE : 0));
}
else
@ -705,8 +709,9 @@ re_string_reconstruct (pstr, idx, eflags, newline)
int c = pstr->raw_mbs[pstr->raw_mbs_idx + offset - 1];
if (pstr->trans)
c = pstr->trans[c];
pstr->tip_context = (IS_WORD_CHAR (c) ? CONTEXT_WORD
: ((newline && IS_NEWLINE (c))
pstr->tip_context = (bitset_contain (pstr->word_char, c)
? CONTEXT_WORD
: ((IS_NEWLINE (c) && pstr->newline_anchor)
? CONTEXT_NEWLINE : 0));
}
}
@ -843,9 +848,9 @@ re_string_destruct (pstr)
/* Return the context at IDX in INPUT. */
static unsigned int
re_string_context_at (input, idx, eflags, newline_anchor)
re_string_context_at (input, idx, eflags)
const re_string_t *input;
int idx, eflags, newline_anchor;
int idx, eflags;
{
int c;
if (idx < 0 || idx == input->len)
@ -874,17 +879,18 @@ re_string_context_at (input, idx, eflags, newline_anchor)
return input->tip_context;
}
wc = input->wcs[wc_idx];
if (IS_WIDE_WORD_CHAR (wc))
if (BE (input->word_ops_used != 0, 0) && IS_WIDE_WORD_CHAR (wc))
return CONTEXT_WORD;
return (newline_anchor && IS_WIDE_NEWLINE (wc)) ? CONTEXT_NEWLINE : 0;
return (IS_WIDE_NEWLINE (wc) && input->newline_anchor
? CONTEXT_NEWLINE : 0);
}
else
#endif
{
c = re_string_byte_at (input, idx);
if (IS_WORD_CHAR (c))
if (bitset_contain (input->word_char, c))
return CONTEXT_WORD;
return (newline_anchor && IS_NEWLINE (c)) ? CONTEXT_NEWLINE : 0;
return IS_NEWLINE (c) && input->newline_anchor ? CONTEXT_NEWLINE : 0;
}
}
@ -1156,7 +1162,7 @@ re_node_set_insert (set, elem)
re_node_set *set;
int elem;
{
int idx, right, mid;
int idx;
/* In case the set is empty. */
if (set->alloc == 0)
{
@ -1206,7 +1212,7 @@ re_node_set_insert (set, elem)
}
/* Compare two node sets SET1 and SET2.
return 1 if SET1 and SET2 are equivalent, retrun 0 otherwise. */
return 1 if SET1 and SET2 are equivalent, return 0 otherwise. */
static int
re_node_set_compare (set1, set2)

View File

@ -121,6 +121,7 @@ extern const size_t __re_error_msgid_idx[] attribute_hidden;
#define BITSET_UINTS ((SBC_MAX + UINT_BITS - 1) / UINT_BITS)
typedef unsigned int bitset[BITSET_UINTS];
typedef unsigned int *re_bitset_ptr_t;
typedef const unsigned int *re_const_bitset_ptr_t;
#define bitset_set(set,i) (set[i / UINT_BITS] |= 1 << i % UINT_BITS)
#define bitset_clear(set,i) (set[i / UINT_BITS] &= ~(1 << i % UINT_BITS))
@ -337,12 +338,16 @@ struct re_string_t
unsigned int tip_context;
/* The translation passed as a part of an argument of re_compile_pattern. */
RE_TRANSLATE_TYPE trans;
/* Copy of re_dfa_t's word_char. */
re_const_bitset_ptr_t word_char;
/* 1 if REG_ICASE. */
unsigned char icase;
unsigned char is_utf8;
unsigned char map_notascii;
unsigned char mbs_allocated;
unsigned char offsets_needed;
unsigned char newline_anchor;
unsigned char word_ops_used;
int mb_cur_max;
};
typedef struct re_string_t re_string_t;
@ -363,14 +368,17 @@ typedef struct re_dfa_t re_dfa_t;
static reg_errcode_t re_string_allocate (re_string_t *pstr, const char *str,
int len, int init_len,
RE_TRANSLATE_TYPE trans, int icase,
const re_dfa_t *dfa) internal_function;
const re_dfa_t *dfa)
internal_function;
static reg_errcode_t re_string_construct (re_string_t *pstr, const char *str,
int len, RE_TRANSLATE_TYPE trans,
int icase, const re_dfa_t *dfa) internal_function;
int icase, const re_dfa_t *dfa)
internal_function;
static reg_errcode_t re_string_reconstruct (re_string_t *pstr, int idx,
int eflags, int newline) internal_function;
int eflags) internal_function;
static reg_errcode_t re_string_realloc_buffers (re_string_t *pstr,
int new_buf_len) internal_function;
int new_buf_len)
internal_function;
# ifdef RE_ENABLE_I18N
static void build_wcs_buffer (re_string_t *pstr) internal_function;
static int build_wcs_upper_buffer (re_string_t *pstr) internal_function;
@ -379,15 +387,19 @@ static void build_upper_buffer (re_string_t *pstr) internal_function;
static void re_string_translate_buffer (re_string_t *pstr) internal_function;
static void re_string_destruct (re_string_t *pstr) internal_function;
# ifdef RE_ENABLE_I18N
static int re_string_elem_size_at (const re_string_t *pstr, int idx) internal_function;
static inline int re_string_char_size_at (const re_string_t *pstr, int idx) internal_function;
static inline wint_t re_string_wchar_at (const re_string_t *pstr, int idx) internal_function;
static int re_string_elem_size_at (const re_string_t *pstr, int idx)
internal_function;
static inline int re_string_char_size_at (const re_string_t *pstr, int idx)
internal_function;
static inline wint_t re_string_wchar_at (const re_string_t *pstr, int idx)
internal_function;
# endif /* RE_ENABLE_I18N */
static unsigned int re_string_context_at (const re_string_t *input, int idx,
int eflags, int newline_anchor) internal_function;
int eflags) internal_function;
static unsigned char re_string_peek_byte_case (const re_string_t *pstr,
int idx) internal_function;
static unsigned char re_string_fetch_byte_case (re_string_t *pstr) internal_function;
static unsigned char re_string_fetch_byte_case (re_string_t *pstr)
internal_function;
#endif
#define re_string_peek_byte(pstr, offset) \
((pstr)->mbs[(pstr)->cur_idx + offset])
@ -471,7 +483,7 @@ struct re_dfastate_t
re_node_set nodes;
re_node_set *entrance_nodes;
struct re_dfastate_t **trtable;
unsigned int context : 10;
unsigned int context : 4;
unsigned int halt : 1;
/* If this state can accept `multi byte'.
Note that we refer to multibyte characters, and multi character
@ -542,13 +554,13 @@ struct re_backref_cache_entry
typedef struct
{
/* The string object corresponding to the input string. */
re_string_t input;
/* EFLAGS of the argument of regexec. */
int eflags;
/* Where the matching ends. */
int match_last;
int last_node;
/* The string object corresponding to the input string. */
re_string_t *input;
/* The state log used by the matcher. */
re_dfastate_t **state_log;
int state_log_top;
@ -594,7 +606,6 @@ struct re_fail_stack_t
struct re_dfa_t
{
re_bitset_ptr_t word_char;
re_subexp_t *subexps;
re_token_t *nodes;
int nodes_alloc;
@ -629,7 +640,10 @@ struct re_dfa_t
unsigned int has_mb_node : 1;
unsigned int is_utf8 : 1;
unsigned int map_notascii : 1;
unsigned int word_ops_used : 1;
int mb_cur_max;
bitset word_char;
reg_syntax_t syntax;
#ifdef DEBUG
char* re_str;
#endif

File diff suppressed because it is too large Load Diff