Copy regex-related files back from Gnulib, to fix a problem with
static checking of regex calls noted by Martin Sebor. This merges the
following changes:
* New macro __attribute_nonnull__ in misc/sys/cdefs.h, for use later
when copying other files back from Gnulib.
* Use __GNULIB_CDEFS instead of __GLIBC__ when deciding
whether to include bits/wordsize.h etc.
* Avoid duplicate entries in epsilon closure table.
* New regex.h macro _REGEX_NELTS to let regexec say that its pmatch
arg should contain nmatch elts. Use that for regexec, instead of
__attr_access (which is incorrect).
* New regex.h macro _Attr_access_ which is like __attr_access except
portable to non-glibc platforms.
* Add some DEBUG_ASSERTs to pacify gcc -fanalyzer and to catch
recently-fixed performance bugs if they recur.
* Add Gnulib-specific stuff to port the dynarray- and lock-using parts
of regex code to non-glibc platforms.
* Fix glibc bug 11053.
* Avoid some undefined behavior when popping an empty fail stack.
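To illustrate the _REGEX_NELTS item above: the point is that the
declaration itself says pmatch has nmatch elements. Below is a minimal
caller-side sketch of the same idea, assuming a C99 compiler with VLA
support. The names sketch_match_groups and sketch_group1_length are
hypothetical; only regcomp, regexec and regfree are the real POSIX API,
and the actual macro in regex.h may differ in detail.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, not part of glibc or Gnulib.  The array
   parameter is declared with its element count (C99 syntax), which
   is the kind of information a static checker can use.  */
int
sketch_match_groups (const char *pattern, const char *text,
                     size_t nmatch, regmatch_t pmatch[nmatch])
{
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED) != 0)
    return -1;
  /* regexec fills in at most nmatch entries of pmatch.  */
  int rc = regexec (&re, text, nmatch, pmatch, 0);
  regfree (&re);
  return rc;
}

/* Return the length of group 1 of PATTERN in TEXT, or -1 if the
   pattern does not compile or does not match.  */
int
sketch_group1_length (const char *pattern, const char *text)
{
  regmatch_t pmatch[2];
  if (sketch_match_groups (pattern, text, 2, pmatch) != 0)
    return -1;
  assert (pmatch[1].rm_so <= pmatch[1].rm_eo);
  return (int) (pmatch[1].rm_eo - pmatch[1].rm_so);
}
```

Passing an array shorter than nmatch to such a function is the kind of
mistake the annotated declaration is meant to expose.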
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
This simplifies the code, by removing stuff intended for porting
to Gnulib but no longer needed there.
* posix/regcomp.c [!_LIBC]: No need to put #ifdef _LIBC around
uses of libc_hidden_def, weak_alias.
* posix/regcomp.c, posix/regexec.c: Use __restrict rather than
_Restrict_ except for public-facing headers.
* posix/regex_internal.h (attribute_hidden) [!_LIBC]:
Remove; already defined elsewhere.
* posix/regex.c, posix/regex_internal.h:
Use __GNUC_PREREQ instead of rolling our own.
* posix/regex_internal.h (__GNUC_PREREQ): Remove duplicate defn.
[BZ #23744]
This refactoring was prompted by a problem when the regex code is
used as part of Gnulib and when the builder’s compiler does not grok
__builtin_expect. Problem reported for Gawk by Nelson H.F. Beebe in:
https://lists.gnu.org/r/bug-gnulib/2018-09/msg00137.html
Although this refactoring does not fix the problem directly,
we might as well have Gawk use the now-preferred glibc style for when
__builtin_expect is unavailable.
* posix/regex_internal.h (BE): Remove.
All uses replaced by __glibc_unlikely or __glibc_likely.
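A minimal sketch of the style referred to above, assuming the usual
fallback when __builtin_expect is unavailable. SKETCH_LIKELY,
SKETCH_UNLIKELY and sketch_length are hypothetical names; the real
__glibc_likely and __glibc_unlikely are defined in misc/sys/cdefs.h and
may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for __glibc_likely / __glibc_unlikely.
   With GCC-compatible compilers they pass a branch hint; otherwise
   they expand to the bare condition, so the code still compiles.  */
#if defined __GNUC__ && __GNUC__ >= 3
# define SKETCH_LIKELY(cond)   __builtin_expect (!!(cond), 1)
# define SKETCH_UNLIKELY(cond) __builtin_expect (!!(cond), 0)
#else
# define SKETCH_LIKELY(cond)   (cond)
# define SKETCH_UNLIKELY(cond) (cond)
#endif

/* Count the bytes of S before the terminating NUL.  The null-pointer
   check is marked as the unlikely path.  */
size_t
sketch_length (const char *s)
{
  if (SKETCH_UNLIKELY (s == NULL))
    return 0;
  size_t n = 0;
  while (SKETCH_LIKELY (s[n] != '\0'))
    n++;
  return n;
}
```

The hints affect only code layout, not results, so the fallback branch
of the #if behaves identically.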
Adjust the non-glibc code to agree with what Gawk needs for
rational range interpretation (RRI) for regular expression ranges.
In unibyte locales, Gawk wants ranges to use the underlying byte
rather than the character code point. This change does not affect
glibc proper.
* posix/regcomp.c (parse_byte) [!_LIBC && RE_ENABLE_I18N]:
In unibyte locales, use the byte value rather than
running it through btowc.
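The choice described above can be sketched as follows. This is only an
illustration under stated assumptions: sketch_range_endpoint and its
unibyte_locale flag are hypothetical and do not mirror the actual
parse_byte code; btowc is the standard C function.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: pick the value used as a range endpoint for
   byte B.  In a unibyte locale the byte itself is used; otherwise
   the byte is converted to a wide character with btowc, which may
   return WEOF for bytes that are not valid single-byte characters.  */
wint_t
sketch_range_endpoint (unsigned char b, int unibyte_locale)
{
  if (unibyte_locale)
    return (wint_t) b;
  return btowc (b);
}
```

With the unibyte branch, a range such as [a-z] compares raw byte values,
which is what RRI asks for in those locales.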
Problem and fix reported by Assaf Gordon in:
https://lists.gnu.org/r/bug-gnulib/2018-07/txtqLKNwBdefE.txt
* posix/regcomp.c (free_charset) [!_LIBC]: Free range_starts and
range_ends members too, as they are defined in 'struct
re_charset_t' even if not _LIBC. This affects only Gnulib.
This bug is very similar to bug 23036: The existing code assumed that
the length count included the length byte itself.
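A minimal sketch of this class of bug, assuming a hypothetical record
layout of one length byte followed by that many payload bytes. The
names below are illustrative only and do not reflect the real table
layout used by the regex code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a length byte LEN followed by LEN bytes of
   payload.  LEN does not count the length byte itself, so the next
   record starts at p + 1 + LEN, not at p + LEN.  */
const unsigned char *
sketch_next_record (const unsigned char *p)
{
  size_t len = p[0];
  return p + 1 + len;
}

/* Count the complete records stored in BUF[0..SIZE-1].  */
size_t
sketch_count_records (const unsigned char *buf, size_t size)
{
  const unsigned char *p = buf;
  const unsigned char *end = buf + size;
  size_t n = 0;
  while (p < end && (size_t) (end - p) >= 1 + (size_t) p[0])
    {
      p = sketch_next_record (p);
      n++;
    }
  return n;
}
```

Stepping by p + LEN instead would land on the last payload byte of each
record and misread it as the next length byte.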
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch syncs the regex implementation with gnulib (commit 0ee5212).
Only two changes in GLIBC regex testing are required:
1. posix/bug-regex28.c: as previously discussed [1] the change of
expected results on the pattern should be safe.
2. posix/PCRE.tests: the ERE (a)|\1 is malformed (in the sense that
the \1 doesn't mean anything) and, although current GLIBC accepts
it, it has undefined behavior. This patch removes the specific test.
This sync contains some patches from thread 'Regex: Make libc regex
more usable outside GLIBC.' [2] which have been pushed upstream in
gnulib. This patch also fixes some regex issues (BZ #23233,
BZ #21163, BZ #18986, BZ #13762). I did not add testcases for
#23233 and #13762 because I couldn't think of a simple way to
trigger the expected failure paths.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[BZ #23233]
[BZ #21163]
[BZ #18986]
[BZ #13762]
* posix/Makefile (tests): Add bug-regex37 and bug-regex38.
* posix/PCRE.tests: Remove invalid test.
* posix/bug-regex28.c: Fix expected values for used syntax.
* posix/bug-regex37.c: New file.
* posix/bug-regex38.c: Likewise.
* posix/regcomp.c: Sync with gnulib.
* posix/regex.c: Likewise.
* posix/regex.h: Likewise.
* posix/regex_internal.c: Likewise.
* posix/regex_internal.h: Likewise.
* posix/regexec.c: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2017-12/msg00807.html
[2] https://sourceware.org/ml/libc-alpha/2017-12/msg00237.html
This mostly automatically-generated patch converts 113 function
definitions in glibc from old-style K&R to prototype-style. Following
my other recent such patches, this one deals with the case of function
definitions in files that either contain assertions or where grep
suggested they might contain assertions - and thus where it isn't
possible to use a simple object code comparison as a sanity check on
the correctness of the patch, because line numbers are changed.
A few such automatically-generated changes needed to be supplemented
by manual changes for the result to compile. openat64 had a prototype
declaration with "..." but an old-style definition in
sysdeps/unix/sysv/linux/dl-openat64.c, and "..." needed adding to the
generated prototype in the definition (I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68024> for diagnosing
such cases in GCC; the old state was undefined behavior not requiring
a diagnostic, but one seems a good idea). In addition, as Florian has
noted, regparm attribute mismatches between declaration and definition
are only diagnosed for prototype definitions, and five functions
needed internal_function added to their definitions (in the case of
__pthread_mutex_cond_lock, via the macro definition of
__pthread_mutex_lock) to compile on i386.
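For illustration, here is a sketch of the openat64-style case described
above: a declaration with "..." whose definition must also be written
with "..." once it is converted to prototype style. The function
sketch_open_mode is hypothetical and only mimics the shape of such a
function; O_CREAT comes from <fcntl.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>

/* Old-style (K&R) definition, shown only in this comment because
   C23 removed the syntax:

     int
     sketch_open_mode (flags)
          int flags;
     { ... }

   It silently disagrees with the variadic declaration below.  */

/* Declaration with "...", as a public header would have it.  */
int sketch_open_mode (int flags, ...);

/* Prototype-style definition; it repeats the "..." so that it
   matches the declaration.  The optional mode argument is read
   only when O_CREAT is set.  */
int
sketch_open_mode (int flags, ...)
{
  int mode = 0;
  if (flags & O_CREAT)
    {
      va_list ap;
      va_start (ap, flags);
      mode = va_arg (ap, int);
      va_end (ap);
    }
  return mode;
}
```

With a prototype-style definition the compiler can diagnose a mismatch
with the declaration, which it cannot do for the old-style form.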
After this patch is in, remaining old-style definitions are probably
most readily fixed manually before we can turn on
-Wold-style-definition for all builds.
Tested for x86_64 and x86 (testsuite).
* crypt/md5-crypt.c (__md5_crypt_r): Convert to prototype-style
function definition.
* crypt/sha256-crypt.c (__sha256_crypt_r): Likewise.
* crypt/sha512-crypt.c (__sha512_crypt_r): Likewise.
* debug/backtracesyms.c (__backtrace_symbols): Likewise.
* elf/dl-minimal.c (_itoa): Likewise.
* hurd/hurdmalloc.c (malloc): Likewise.
(free): Likewise.
(realloc): Likewise.
* inet/inet6_option.c (inet6_option_space): Likewise.
(inet6_option_init): Likewise.
(inet6_option_append): Likewise.
(inet6_option_alloc): Likewise.
(inet6_option_next): Likewise.
(inet6_option_find): Likewise.
* io/ftw.c (FTW_NAME): Likewise.
(NFTW_NAME): Likewise.
(NFTW_NEW_NAME): Likewise.
(NFTW_OLD_NAME): Likewise.
* libio/iofwide.c (_IO_fwide): Likewise.
* libio/strops.c (_IO_str_init_static_internal): Likewise.
(_IO_str_init_static): Likewise.
(_IO_str_init_readonly): Likewise.
(_IO_str_overflow): Likewise.
(_IO_str_underflow): Likewise.
(_IO_str_count): Likewise.
(_IO_str_seekoff): Likewise.
(_IO_str_pbackfail): Likewise.
(_IO_str_finish): Likewise.
* libio/wstrops.c (_IO_wstr_init_static): Likewise.
(_IO_wstr_overflow): Likewise.
(_IO_wstr_underflow): Likewise.
(_IO_wstr_count): Likewise.
(_IO_wstr_seekoff): Likewise.
(_IO_wstr_pbackfail): Likewise.
(_IO_wstr_finish): Likewise.
* locale/programs/localedef.c (normalize_codeset): Likewise.
* locale/programs/locarchive.c (add_locale_to_archive): Likewise.
(add_locales_to_archive): Likewise.
(delete_locales_from_archive): Likewise.
* malloc/malloc.c (__libc_mallinfo): Likewise.
* math/gen-auto-libm-tests.c (init_fp_formats): Likewise.
* misc/tsearch.c (__tfind): Likewise.
* nptl/pthread_attr_destroy.c (__pthread_attr_destroy): Likewise.
* nptl/pthread_attr_getdetachstate.c
(__pthread_attr_getdetachstate): Likewise.
* nptl/pthread_attr_getguardsize.c (pthread_attr_getguardsize):
Likewise.
* nptl/pthread_attr_getinheritsched.c
(__pthread_attr_getinheritsched): Likewise.
* nptl/pthread_attr_getschedparam.c
(__pthread_attr_getschedparam): Likewise.
* nptl/pthread_attr_getschedpolicy.c
(__pthread_attr_getschedpolicy): Likewise.
* nptl/pthread_attr_getscope.c (__pthread_attr_getscope):
Likewise.
* nptl/pthread_attr_getstack.c (__pthread_attr_getstack):
Likewise.
* nptl/pthread_attr_getstackaddr.c (__pthread_attr_getstackaddr):
Likewise.
* nptl/pthread_attr_getstacksize.c (__pthread_attr_getstacksize):
Likewise.
* nptl/pthread_attr_init.c (__pthread_attr_init_2_1): Likewise.
(__pthread_attr_init_2_0): Likewise.
* nptl/pthread_attr_setdetachstate.c
(__pthread_attr_setdetachstate): Likewise.
* nptl/pthread_attr_setguardsize.c (pthread_attr_setguardsize):
Likewise.
* nptl/pthread_attr_setinheritsched.c
(__pthread_attr_setinheritsched): Likewise.
* nptl/pthread_attr_setschedparam.c
(__pthread_attr_setschedparam): Likewise.
* nptl/pthread_attr_setschedpolicy.c
(__pthread_attr_setschedpolicy): Likewise.
* nptl/pthread_attr_setscope.c (__pthread_attr_setscope):
Likewise.
* nptl/pthread_attr_setstack.c (__pthread_attr_setstack):
Likewise.
* nptl/pthread_attr_setstackaddr.c (__pthread_attr_setstackaddr):
Likewise.
* nptl/pthread_attr_setstacksize.c (__pthread_attr_setstacksize):
Likewise.
* nptl/pthread_condattr_setclock.c (pthread_condattr_setclock):
Likewise.
* nptl/pthread_create.c (__find_in_stack_list): Likewise.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Likewise.
* nptl/pthread_mutex_cond_lock.c (__pthread_mutex_lock): Define to
use internal_function.
* nptl/pthread_mutex_init.c (__pthread_mutex_init): Convert to
prototype-style function definition.
* nptl/pthread_mutex_lock.c (__pthread_mutex_lock): Likewise.
(__pthread_mutex_cond_lock_adjust): Likewise. Use
internal_function.
* nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock):
Convert to prototype-style function definition.
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
Likewise.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt):
Likewise.
(__pthread_mutex_unlock): Likewise.
* nptl_db/td_ta_clear_event.c (td_ta_clear_event): Likewise.
* nptl_db/td_ta_set_event.c (td_ta_set_event): Likewise.
* nptl_db/td_thr_clear_event.c (td_thr_clear_event): Likewise.
* nptl_db/td_thr_event_enable.c (td_thr_event_enable): Likewise.
* nptl_db/td_thr_set_event.c (td_thr_set_event): Likewise.
* nss/makedb.c (process_input): Likewise.
* posix/fnmatch.c (__strchrnul): Likewise.
(__wcschrnul): Likewise.
(fnmatch): Likewise.
* posix/fnmatch_loop.c (FCT): Likewise.
* posix/glob.c (globfree): Likewise.
(__glob_pattern_type): Likewise.
(__glob_pattern_p): Likewise.
* posix/regcomp.c (re_compile_pattern): Likewise.
(re_set_syntax): Likewise.
(re_compile_fastmap): Likewise.
(regcomp): Likewise.
(regerror): Likewise.
(regfree): Likewise.
* posix/regexec.c (regexec): Likewise.
(re_match): Likewise.
(re_search): Likewise.
(re_match_2): Likewise.
(re_search_2): Likewise.
(re_search_stub): Likewise. Use internal_function.
(re_copy_regs): Likewise.
(re_set_registers): Convert to prototype-style function
definition.
(prune_impossible_nodes): Likewise. Use internal_function.
* resolv/inet_net_pton.c (inet_net_pton): Convert to
prototype-style function definition.
(inet_net_pton_ipv4): Likewise.
* stdlib/strtod_l.c (____STRTOF_INTERNAL): Likewise.
* sysdeps/pthread/aio_cancel.c (aio_cancel): Likewise.
* sysdeps/pthread/aio_suspend.c (aio_suspend): Likewise.
* sysdeps/pthread/timer_delete.c (timer_delete): Likewise.
* sysdeps/unix/sysv/linux/dl-openat64.c (openat64): Likewise.
Make variadic.
* time/strptime_l.c (localtime_r): Convert to prototype-style
function definition.
* wcsmbs/mbsnrtowcs.c (__mbsnrtowcs): Likewise.
* wcsmbs/mbsrtowcs_l.c (__mbsrtowcs_l): Likewise.
* wcsmbs/wcsnrtombs.c (__wcsnrtombs): Likewise.
* wcsmbs/wcsrtombs.c (__wcsrtombs): Likewise.
regcomp brings in references to wcscoll, which isn't in all the
standards that contain regcomp. In turn, wcscoll brings in references
to wcscmp, also not in all those standards. This patch fixes this by
making those functions into weak aliases of __wcscoll and __wcscmp and
calling those names instead as needed.
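The pattern can be sketched as below, assuming GCC or Clang on an ELF
target (the alias attribute is not available everywhere). The names
sketch_internal_wcscmp, sketch_wcscmp and sketch_collate are
hypothetical; glibc itself uses its weak_alias macro and the reserved
__wcscmp / __wcscoll names.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical internal name, playing the role of __wcscmp.  Code
   inside the library calls this name, so it never depends on the
   public symbol.  */
int
sketch_internal_wcscmp (const wchar_t *s1, const wchar_t *s2)
{
  while (*s1 != L'\0' && *s1 == *s2)
    {
      s1++;
      s2++;
    }
  return (*s1 > *s2) - (*s1 < *s2);
}

/* Public name, defined as a weak alias of the internal one.  This
   assumes GCC-style attributes on an ELF target.  */
extern __typeof__ (sketch_internal_wcscmp) sketch_wcscmp
  __attribute__ ((weak, alias ("sketch_internal_wcscmp")));

/* An internal caller, playing the role of the regex code.  */
int
sketch_collate (const wchar_t *a, const wchar_t *b)
{
  return sketch_internal_wcscmp (a, b);
}
```

Because the public name is weak, a program written to a standard that
lacks it can define its own function of that name without clashing.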
Tested for x86_64 and x86 (testsuite, and that disassembly of
installed shared libraries is unchanged by the patch).
[BZ #18497]
* wcsmbs/wcscmp.c [!WCSCMP] (WCSCMP): Define as __wcscmp instead
of wcscmp.
(wcscmp): Define as weak alias of WCSCMP.
* wcsmbs/wcscoll.c (STRCOLL): Define as __wcscoll instead of
wcscoll.
(USE_HIDDEN_DEF): Define.
[!USE_IN_EXTENDED_LOCALE_MODEL] (wcscoll): Define as weak alias of
__wcscoll. Don't use libc_hidden_weak.
* wcsmbs/wcscoll_l.c (STRCMP): Define as __wcscmp instead of
wcscmp.
* sysdeps/i386/i686/multiarch/wcscmp-c.c
[SHARED] (libc_hidden_def): Define __GI___wcscmp instead of
__GI_wcscmp.
(weak_alias): Undefine and redefine.
* sysdeps/i386/i686/multiarch/wcscmp.S (wcscmp): Rename to
__wcscmp and define wcscmp as weak alias of __wcscmp.
* sysdeps/x86_64/wcscmp.S (wcscmp): Likewise.
* include/wchar.h (__wcscmp): Declare. Use libc_hidden_proto.
(__wcscoll): Likewise.
(wcscmp): Don't use libc_hidden_proto.
(wcscoll): Likewise.
* posix/regcomp.c (build_range_exp): Call __wcscoll instead of
wcscoll.
* posix/regexec.c (check_node_accept_bytes): Likewise.
* conform/Makefile (test-xfail-XPG3/regex.h/linknamespace): Remove
variable.
(test-xfail-XPG4/regex.h/linknamespace): Likewise.
(test-xfail-POSIX/regex.h/linknamespace): Likewise.
regcomp brings in references to various wctype functions that aren't
in all the standards that include regcomp. This patch fixes this in the
usual way by using the __* versions of these functions (which already
exist, but some didn't have libc_hidden_proto / libc_hidden_def
before).
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch). (Other wide character
function references from the regex code mean that this patch by itself
doesn't fix any XFAILed linknamespace test failures; further patches
will be needed for that.)
[BZ #18495]
* wctype/wcfuncs.c (__iswalnum): Use libc_hidden_def.
(__iswlower): Likewise.
* include/wctype.h (__iswalnum): Declare. Use libc_hidden_proto.
(__iswlower): Likewise.
* posix/regcomp.c (re_compile_fastmap_iter): Call __towlower
instead of towlower.
* posix/regex_internal.c (build_wcs_upper_buffer): Call __iswlower
instead of iswlower. Call __towupper instead of towupper.
* posix/regex_internal.h (IS_WIDE_WORD_CHAR): Call __iswalnum
instead of iswalnum.
We see some surprising warnings on tilegx with gcc 4.8.2:
In file included from regex.c:66:0:
regcomp.c: In function ‘parse_expression’:
regcomp.c:2849:15: error: ‘end_elem’ may be used uninitialized in this
function [-Werror=maybe-uninitialized]
else if (br_elem->type == COLL_SYM)
^
regcomp.c:3109:34: note: ‘end_elem’ was declared here
bracket_elem_t start_elem, end_elem;
^
regcomp.c:3109:22: error: ‘start_elem’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
bracket_elem_t start_elem, end_elem;
^
These warnings are not seen on x86, and in fact if I compile the
preprocessed tile sources with the x86 gcc 4.8.2, I don't see the
warnings. I do see equivalent warnings if I compile the
x86-preprocessed source code with tilegx gcc 4.8.2.
The fix here is to initialize the union type field appropriately in
a couple of places where we pass a union pointer to a subroutine that
"knows" what type the union is. Setting the type explicitly seems like
a more robust way to manage such a data structure in any case.
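A minimal sketch of the approach, with hypothetical types and names
(sketch_elem, sketch_parse_char, sketch_first_byte) that only loosely
resemble bracket_elem_t and its users:

```c
#include <assert.h>

enum sketch_elem_type { SKETCH_CHAR, SKETCH_COLL_SYM };

/* Hypothetical tagged structure: TYPE says which member of OPR
   is valid.  */
struct sketch_elem
{
  enum sketch_elem_type type;
  union
  {
    unsigned char ch;
    const char *name;
  } opr;
};

/* The callee "knows" it handles a SKETCH_CHAR element and fills in
   only the payload, leaving TYPE alone.  */
void
sketch_parse_char (struct sketch_elem *elem, unsigned char c)
{
  elem->opr.ch = c;
}

int
sketch_first_byte (unsigned char c)
{
  struct sketch_elem elem;
  /* Set the tag explicitly before the call, so that no later read
     of elem.type can see an uninitialized value.  */
  elem.type = SKETCH_CHAR;
  sketch_parse_char (&elem, c);
  return elem.type == SKETCH_CHAR ? elem.opr.ch : -1;
}
```

Without the explicit assignment, the read of elem.type in
sketch_first_byte is exactly the kind of access that
-Wmaybe-uninitialized complains about.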
* posix/regcomp.c (parse_dup_op): Handle duplicate_tree
failure in one more place.
To trigger the segfault, configure grep with --with-included-regex,
build it, and run these commands:
( ulimit -v 300000; echo a|src/grep -E a+++++++++++++++++++++ )
This is another bug in computing the fastmap. It was reported by a user
of sed because it usually does not happen with !_LIBC. However, it is
there in that case too.
The bug is that whenever we have a range at the beginning of the regex,
the regex must be tested on any possible multibyte character. The reason
_LIBC masks it is that in general there is a collation symbol for
each possible multibyte-character lead byte, so all the lead bytes are
in general already part of the fastmap.
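One way to add every possible lead byte to a fastmap can be sketched as
below. This is an illustration under stated assumptions, not
necessarily what the patch does: sketch_add_lead_bytes and the 256-entry
char array are hypothetical, and lead bytes are detected with the
standard mbrtowc function in the current locale.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical sketch: mark in FASTMAP every byte that can begin a
   longer multibyte character in the current locale.  Returns the
   number of entries newly set.  */
int
sketch_add_lead_bytes (char fastmap[256])
{
  int added = 0;
  for (int b = 1; b < 256; b++)
    {
      char c = (char) b;
      mbstate_t mbs;
      memset (&mbs, 0, sizeof mbs);
      /* (size_t) -2 means "valid so far but incomplete", that is,
         C is the first byte of a longer character.  */
      if (mbrtowc (NULL, &c, 1, &mbs) == (size_t) -2)
        {
          if (!fastmap[b])
            added++;
          fastmap[b] = 1;
        }
    }
  return added;
}
```

In the default "C" locale no byte starts a longer character, so nothing
is marked; the interesting case is a multibyte locale selected with
setlocale.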
The tests use Cyrillic characters as an example. With _LIBC, they pass
without the patch too, but you can make them fail by removing collation
symbols handling.