This patch fixes bugs in ldbl-128ibm frexpl for 32-bit systems shown
up by warnings:
../sysdeps/ieee754/ldbl-128ibm/s_frexpl.c:82:4: warning: left shift count >= width of type
../sysdeps/ieee754/ldbl-128ibm/s_frexpl.c:129:5: warning: left shift count >= width of type
This did in fact show up in test-ldouble.out (alongside all the other
problems there ... maybe we should again consider running the libm
tests at finer granularity from the makefiles) as already covered by
the testsuite after the previous patch that fixed these bugs for
64-bit systems. The fix is simply using 1LL instead of 1L when
shifting by 52.
Tested for powerpc32 (soft float).
[BZ #16619]
[BZ #16740]
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Use 1LL << 52
instead of 1L << 52.
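To illustrate why the literal type matters: where long is 32 bits
wide, 1L << 52 shifts by more than the width of the type, which is
undefined in C, while 1LL << 52 is well defined because long long is
at least 64 bits wide.  A minimal sketch follows; the helper name
bit52 is hypothetical and is not taken from s_frexpl.c.

```c
#include <stdint.h>

/* Hypothetical helper (not from s_frexpl.c): returns a value with only
   bit 52 set, i.e. the position of the implicit leading bit of an IEEE
   double mantissa.  */
static inline int64_t
bit52 (void)
{
  /* 1LL has type long long, which is at least 64 bits wide, so this
     shift is well defined on both 32-bit and 64-bit targets.  Writing
     1L << 52 instead would be undefined where long is 32 bits wide.  */
  return 1LL << 52;
}
```

Calling bit52 () yields 2^52 regardless of the width of long.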
On powerpc, atomic_exchange_and_add is implemented without any
barriers.  This patch adds the missing instruction and memory barriers
for acquire and release semantics.
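As an illustration of the intended semantics only, the two orderings
can be written with C11 atomics as below.  This is a sketch under the
assumption that <stdatomic.h> is available; it is not the powerpc
implementation, and the function names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical names.  Add VALUE to *MEM and return the old value,
   with acquire ordering: later memory accesses in this thread cannot
   be reordered before the operation.  */
static inline int
sketch_exchange_and_add_acq (atomic_int *mem, int value)
{
  return atomic_fetch_add_explicit (mem, value, memory_order_acquire);
}

/* Same operation with release ordering: earlier memory accesses in
   this thread cannot be reordered after the operation.  */
static inline int
sketch_exchange_and_add_rel (atomic_int *mem, int value)
{
  return atomic_fetch_add_explicit (mem, value, memory_order_release);
}
```

Both variants return the value held before the addition; they differ
only in the ordering guarantee given to surrounding memory accesses.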
Replace with IS_IN (ldconfig). No change in generated code.
* elf/Makefile (CFLAGS-ldconfig.c): Remove definition of
IS_IN_ldconfig.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Use IS_IN.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h: Likewise.
Dlopening libraries from a static program also dlopens libc.so,
which therefore needs its own initialization.  That is done in
posixland_init, which until now did not initialize RPCs.
ChangeLog:
2014-11-23 Samuel Thibault <samuel.thibault@ens-lyon.org>
* sysdeps/mach/hurd/i386/init-first.c (posixland_init): Call
__mach_init in dlopened libc.
GCC marked OABI obsolete in 4.7 and dropped it in 4.8. So the number
of people this is catching is shrinking every day. At this point,
it's not terribly useful, so just drop it.
This patch reformats the inline-asm in elf_machine_load_address so it is
easier to change only part of the inline-asm.  It does so by using string
concatenation instead of backslash line continuation.
Also document why this inline-asm works - it depends on the 32bit
relocation being resolved at link time.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* sysdeps/aarch64/dl-machine.h (elf_machine_load_address):
Refactor inline-asm. Also add comment.
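The difference between the two styles can be sketched in plain C.  The
instruction strings below are placeholders, not the contents of
elf_machine_load_address, and all names are hypothetical.  Both forms
produce the same string; the concatenated form lets one line be edited
or wrapped in a preprocessor conditional without touching the others.

```c
#include <string.h>

/* Backslash-newline continuation: a single literal spliced across
   source lines.  The second line must start in column 0, otherwise
   its indentation becomes part of the string.  */
static const char continued[] = "first insn\n\
second insn\n";

/* Adjacent string literals: the compiler concatenates them, so each
   instruction is its own token and can be indented freely.  */
static const char concatenated[] =
  "first insn\n"
  "second insn\n";

/* Returns nonzero if both spellings yield the same template.  */
static inline int
templates_match (void)
{
  return strcmp (continued, concatenated) == 0;
}
```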
Using the macros for ELF types is required for adding ILP32 support.
In the standard AArch64 configuration this makes no difference to
the types used.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* sysdeps/aarch64/bits/link.h (la_aarch64_gnu_pltenter): Use
ElfW macro instead of hardcoded Elf64 types.
(la_aarch64_gnu_pltenter): Likewise.
* sysdeps/aarch64/dl-machine.h
(elf_machine_runtime_setup): Use ElfW(Addr).
The latest version of the binutils ELF header defines a new set of
dynamic relocations for ILP32 and renames some to make the naming
more uniform.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* elf/elf.h (R_AARCH64_P32_ABS32, R_AARCH64_P32_COPY,
R_AARCH64_P32_GLOB_DAT, R_AARCH64_P32_JUMP_SLOT,
R_AARCH64_P32_RELATIVE, R_AARCH64_P32_TLS_DTPMOD,
R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_TPREL,
R_AARCH64_P32_TLSDESC, R_AARCH64_P32_IRELATIVE): Define.
(R_AARCH64_TLS_DTPMOD64): Rename to ...
(R_AARCH64_TLS_DTPMOD): This.
(R_AARCH64_TLS_DTPREL64): Rename to ...
(R_AARCH64_TLS_DTPREL): This.
(R_AARCH64_TLS_TPREL64): Rename to ...
(R_AARCH64_TLS_TPREL): This.
* sysdeps/aarch64/dl-machine.h (elf_machine_type_class): Update
R_AARCH64_TLS_DTPMOD64, R_AARCH64_TLS_DTPREL64, and
R_AARCH64_TLS_TPREL64.
(elf_machine_rela): Likewise.
ChangeLog:
* sysdeps/posix/ctermid.c (ctermid): Return a pointer to a
string literal if not passed a buffer.
* manual/job.texi (ctermid): Update reasoning, note deviation
from POSIX, suggest mtasurace when not passed a buffer, for
future non-preliminary safety notes.
This sets __HAVE_64B_ATOMICS if provided. It also sets
USE_ATOMIC_COMPILER_BUILTINS to true if the existing atomic ops use the
__atomic* builtins (aarch64, mips partially) or if this has been
tested (x86_64); otherwise, this is set to false so that C11 atomics will
be based on the existing atomic operations.
Remove libc-modules.h from the tree and auto-generate it from
soversions.i and the list of modules in the built-modules variable
defined in Makeconfig. Macros generated have increasing numbered
values, with built-modules having lower values starting from 1,
following which a separator value LIBS_BEGIN is added and then finally
the library names from soversions.i are appended to the list. This
allows us to conveniently differentiate between the versioned
libraries and other built modules, which is needed in errno.h and
netdb.h to decide whether to use an internal symbol or an external
one.
Verified that generated code remains unchanged on x86_64.
* Makeconfig (built-modules): List non-library modules to be
built.
(module-cppflags): Include libc-modules.h for
everything except shlib-versions.v.i.
(CPPFLAGS): Use it.
(before-compile): Add libc-modules.h.
($(common-objpfx)libc-modules.h,
$(common-objpfx)libc-modules.stmp): New targets.
(common-generated): Add libc-modules.h and libc-modules.stmp.
($(common-objpfx)Versions.v.i): Depend on libc-modules.h.
* include/libc-symbols.h: Don't include libc-modules.h.
* include/libc-modules.h: Remove file.
* scripts/gen-libc-modules.awk: New script to generate
libc-modules.h.
* sysdeps/unix/Makefile ($(common-objpfx)sysd-syscalls):
Depend on libc-modules.stmp.
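The layout described above might look roughly like the following.
The module names, the numeric values and the MODULE_ prefix on the
separator are assumptions made for illustration; the real header is
generated by scripts/gen-libc-modules.awk and will differ.

```c
/* Assumed shape of a generated libc-modules.h; values are made up.  */

/* Non-library built modules come first, numbered from 1.  */
#define MODULE_iconvprogs 1
#define MODULE_nscd       2
#define MODULE_ldconfig   3

/* Separator between the two groups.  */
#define MODULE_LIBS_BEGIN 4

/* Versioned libraries from soversions.i follow the separator.  */
#define MODULE_libc       5
#define MODULE_libpthread 6

/* Hypothetical helper: nonzero if MODULE names a versioned library,
   which is simply a comparison against the separator.  */
static inline int
sketch_is_versioned_library (int module)
{
  return module > MODULE_LIBS_BEGIN;
}
```

With this numbering a single comparison separates libraries from other
modules, which is the distinction errno.h and netdb.h need.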
The current scheme to identify which module a translation unit is
built in depends on defining multiple macros IS_IN_* and also defining
NOT_IN_libc if we're building a non-libc module. In addition, there
is an IN_LIB macro that does effectively the same thing, but for
different modules (notably the systemtap probes). This macro scheme
unifies both ideas to use just one macro IN_MODULE and assign it a
value depending on the module it is being built into. If the module
is not defined, it defaults to MODULE_libc.
Patches that follow will replace uses of IS_IN_* variables with the
IS_IN() macro. libc-symbols.h has been converted already to give an
example of how such a transition will look.
Verified that there are no relevant binary changes. One source change
that will crop up repeatedly is that of nscd_stat, since it uses the
build timestamp as a constant in its logic.
* Makeconfig (in-module): Get value of libof set for the
translation unit.
(CPPFLAGS): Use $(in-module).
* Makerules: Don't suffix routine names for nonlib.
* include/libc-modules.h: New file.
* include/libc-symbols.h: Include libc-modules.h
(IS_IN): New macro to replace IS_IN_* macros.
* elf/Makefile: Set libof-* for each routine.
* elf/rtld-Rules: Likewise.
* extra-modules.mk: Likewise.
* iconv/Makefile: Likewise.
* iconvdata/Makefile: Likewise.
* locale/Makefile: Likewise.
* malloc/Makefile: Likewise.
* nss/Makefile: Likewise.
* sysdeps/gnu/Makefile: Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile: Likewise.
* sysdeps/unix/sysv/linux/Makefile: Likewise.
* sysdeps/s390/s390-64/Makefile: Likewise.
* nscd/Makefile: Set libof-* for each routine. Set CFLAGS and
CPPFLAGS for nscd instead of nonlib.
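A minimal sketch of the scheme, assuming that IS_IN compares IN_MODULE
against a per-module constant by token pasting.  The module numbers
are hypothetical, and in a real build IN_MODULE would be supplied
through CPPFLAGS rather than defaulted in the source file.

```c
#include <string.h>

/* Hypothetical module numbers for illustration.  */
#define MODULE_libc 1
#define MODULE_rtld 2
#define MODULE_nscd 3

/* The build system is assumed to pass -DIN_MODULE=MODULE_xxx; fall
   back to libc when nothing is given, as the description says.  */
#ifndef IN_MODULE
# define IN_MODULE MODULE_libc
#endif

/* One macro replaces the family of IS_IN_* and NOT_IN_libc macros.  */
#define IS_IN(lib) (IN_MODULE == MODULE_##lib)

/* Usable in preprocessor conditionals as well as in expressions.  */
static inline const char *
sketch_module_name (void)
{
#if IS_IN (libc)
  return "libc";
#elif IS_IN (rtld)
  return "rtld";
#else
  return "other";
#endif
}
```

Compiled without -DIN_MODULE, IS_IN (libc) is true and every other
IS_IN test is false.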
libm uses symbols mpone and mptwo for internal purposes. This patch
moves them to the implementation namespace (__mpone and __mptwo).
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #17616]
* sysdeps/ieee754/dbl-64/mpa.c (mpone): Rename to __mpone.
(mptwo): Rename to __mptwo.
(__inv): Use __mptwo instead of mptwo.
* sysdeps/ieee754/dbl-64/mpa.h (mpone): Rename to __mpone.
(mptwo): Rename to __mptwo.
* sysdeps/ieee754/dbl-64/mpatan.c (__mpatan): Use __mpone instead
of mpone and __mptwo instead of mptwo.
* sysdeps/ieee754/dbl-64/mpatan2.c (__mpatan2): Use __mpone
instead of mpone.
* sysdeps/ieee754/dbl-64/mpexp.c (__mpexp): Likewise.
* sysdeps/ieee754/dbl-64/mplog.c (__mplog): Likewise.
* sysdeps/ieee754/dbl-64/sincos32.c (__c32): Use __mpone instead
of mpone and __mptwo instead of mptwo.
(__mpranred): Use __mpone instead of mpone.
* conform/Makefile (test-xfail-ISO/math.h/linknamespace): Remove
variable.
(test-xfail-ISO99/complex.h/linknamespace): Likewise.
(test-xfail-ISO99/math.h/linknamespace): Likewise.
(test-xfail-ISO99/tgmath.h/linknamespace): Likewise.
(test-xfail-ISO11/complex.h/linknamespace): Likewise.
(test-xfail-ISO11/math.h/linknamespace): Likewise.
(test-xfail-ISO11/tgmath.h/linknamespace): Likewise.
(test-xfail-XPG3/math.h/linknamespace): Likewise.
(test-xfail-XPG4/math.h/linknamespace): Likewise.
(test-xfail-POSIX/math.h/linknamespace): Likewise.
(test-xfail-UNIX98/math.h/linknamespace): Likewise.
(test-xfail-XOPEN2K/complex.h/linknamespace): Likewise.
(test-xfail-XOPEN2K/math.h/linknamespace): Likewise.
(test-xfail-XOPEN2K/tgmath.h/linknamespace): Likewise.
(test-xfail-POSIX2008/complex.h/linknamespace): Likewise.
(test-xfail-POSIX2008/math.h/linknamespace): Likewise.
(test-xfail-POSIX2008/tgmath.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/complex.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/math.h/linknamespace): Likewise.
(test-xfail-XOPEN2K8/tgmath.h/linknamespace): Likewise.
Commit 5c0508a318 broke the Alpha port, as the extra parentheses got
in the way of some token pasting that we were doing in a redefined raw
unpack macro.
Avoid this situation in the future by not attempting to redefine a
basic macro, but rather work from the outermost public interface.
The compiler does in fact see through the added indirection.
* sysdeps/alpha/soft-fp/local-soft-fp.h (_FP_UNPACK_RAW_2): Remove.
(_FP_PACK_RAW_2): Remove.
(AXP_DECL_RETURN_Q): Rename from FP_DECL_RETURN, use _FP_UNION_Q.
(AXP_RETURN_Q): Rename from FP_RETURN, use _FP_UNION_Q.
(AXP_UNPACK_RAW_Q, AXP_UNPACK_SEMIRAW_Q, AXP_UNPACK_Q): New.
(AXP_PACK_RAW_Q, AXP_PACK_SEMIRAW_Q, AXP_PACK_Q): New.
* sysdeps/alpha/soft-fp/ots_add.c (_OtsAddX): Update to match.
* sysdeps/alpha/soft-fp/ots_cmp.c (internal_equality): Likewise.
* sysdeps/alpha/soft-fp/ots_cmpe.c (internal_compare): Likewise.
* sysdeps/alpha/soft-fp/ots_cvtqux.c (_OtsCvtQUX): Likewise.
* sysdeps/alpha/soft-fp/ots_cvtqx.c (_OtsCvtQX): Likewise.
* sysdeps/alpha/soft-fp/ots_cvttx.c (_OtsConvertFloatTX): Likewise.
* sysdeps/alpha/soft-fp/ots_cvtxq.c (_OtsCvtXQ): Likewise.
* sysdeps/alpha/soft-fp/ots_cvtxt.c (_OtsConvertFloatXT): Likewise.
* sysdeps/alpha/soft-fp/ots_div.c (_OtsDivX): Likewise.
* sysdeps/alpha/soft-fp/ots_mul.c (_OtsMulX): Likewise.
* sysdeps/alpha/soft-fp/ots_nintxq.c (_OtsNintXQ): Likewise.
* sysdeps/alpha/soft-fp/ots_sub.c (_OtsSubX): Likewise.
This patch removes a conditional on __GNUC_PREREQ (4, 6) in x86_64
code.
Tested for x86_64 that installed shared libraries are unchanged by
this patch. Committed (I think this file reasonably comes under math
maintainership).
* sysdeps/x86_64/fpu/dla.h [__FMA4__ && __GNUC_PREREQ (4, 6)]
(DLA_FMS): Make definition conditional only on [__FMA4__].
[__FMA4__ && !__GNUC_PREREQ (4, 6)] (DLA_FMS): Remove conditional
definition.
This patch removes conditionals in ARM code on __GNUC_PREREQ(4,4),
which were already obsolete even before the move from 4.4 to 4.6 as
minimum GCC version for building glibc.
Tested for ARM that installed shared libraries are unchanged by this
patch.
* sysdeps/arm/sysdep.h [PROF && __GNUC_PREREQ(4,4)] (CALL_MCOUNT):
Make definition conditional only on [PROF].
[PROF && !__GNUC_PREREQ(4,4)] (CALL_MCOUNT): Remove conditional
definition.
[__GNUC_PREREQ(4,4)] (mcount): Make definition unconditional.
[!__GNUC_PREREQ(4,4)] (mcount): Remove conditional definition.
This patch fixes the build of C mempcpy and stpcpy by disabling the
redirection to __mempcpy and __stpcpy asm names if
NO_MEMPCPY_STPCPY_REDIRECT is defined, and defining that macro in the
relevant source files.
Tested for powerpc32 that the build is fixed.
* include/string.h [NO_MEMPCPY_STPCPY_REDIRECT] (mempcpy): Do not
redeclare with asm name.
[NO_MEMPCPY_STPCPY_REDIRECT] (stpcpy): Likewise.
* string/mempcpy.c (NO_MEMPCPY_STPCPY_REDIRECT): Define before
including <string.h>.
* string/stpcpy.c (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
[!NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
[!NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
[SHARED && !NOT_IN_libc] (NO_MEMPCPY_STPCPY_REDIRECT): Likewise.
__get_nprocs is called from malloc code, but calls fgets_unlocked,
which is not an ISO C or POSIX function. This patch fixes it to call
a new __fgets_unlocked name instead.
Note: there are various other uses of fgets_unlocked in glibc's
libraries, and I haven't yet investigated which others might also be
problematic (called directly or indirectly from standard functions)
and so need to change to use __fgets_unlocked.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17582]
* libio/iofgets.c [weak_alias && !_IO_MTSAFE_IO]
(__fgets_unlocked): Add alias of _IO_fgets. Use libc_hidden_def.
* libio/iofgets_u.c (fgets_unlocked): Rename to __fgets_unlocked
and define as weak alias of __fgets_unlocked. Use
libc_hidden_weak.
(__fgets_unlocked): Use libc_hidden_def.
* include/stdio.h (__fgets_unlocked): Declare. Use
libc_hidden_proto.
* sysdeps/unix/sysv/linux/getsysstats.c (phys_pages_info): Use
__fgets_unlocked instead of fgets_unlocked.
* sysdeps/unix/sysv/linux/alpha/getsysstats.c
(GET_NPROCS_CONF_PARSER): Likewise.
* sysdeps/unix/sysv/linux/sparc/getsysstats.c
(GET_NPROCS_CONF_PARSER): Likewise.
rawmemchr is not an ISO C function, but __rawmemchr is called from ISO
C functions, so rawmemchr should be a weak alias.  On most
architectures it is, but x86_64 defines the function as rawmemchr with
__rawmemchr as a strong alias. This patch makes x86_64 follow the
same arrangements as other architectures.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17572]
* sysdeps/x86_64/rawmemchr.S (rawmemchr): Rename to __rawmemchr
and define as weak alias of __rawmemchr.
(__rawmemchr): Do not define as strong alias of rawmemchr.
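The arrangement can be sketched in C as follows.  This assumes GCC or
Clang on an ELF target, since the alias attribute is not available
everywhere; the x86_64 change itself is in assembly, and the names
below are hypothetical stand-ins for __rawmemchr and rawmemchr.

```c
#include <stddef.h>

/* Strong definition under the internal name.  Like rawmemchr, it
   assumes that C does occur in the buffer.  */
void *
sketch_internal_rawmemchr (const void *s, int c)
{
  const unsigned char *p = s;
  while (*p != (unsigned char) c)
    ++p;
  return (void *) p;
}

/* Public name as a weak alias of the internal one, so a program that
   defines its own function with this name does not clash with it.  */
extern __typeof (sketch_internal_rawmemchr) sketch_rawmemchr
  __attribute__ ((weak, alias ("sketch_internal_rawmemchr")));
```

Internal callers use the strong name, so they keep working even when
an application overrides the weak public symbol.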
qsort_r is defined in the same file as qsort, but is not an ISO C
function, so should be a weak alias for __qsort_r. The uses in
getaddrinfo should also call __qsort_r, since getaddrinfo is a POSIX
function and qsort_r isn't. This patch implements this. Because nscd
uses the getaddrinfo sources outside libc, as do the tst-rfc3484
tests, a #define of __qsort_r to qsort_r is added there alongside the
similar defines for other libc-internal symbols used in getaddrinfo.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17571]
* stdlib/msort.c (qsort_r): Rename to __qsort_r and define as weak
alias of __qsort_r.
(qsort): Call __qsort_r instead of qsort_r.
* include/stdlib.h (qsort_r): Do not call libc_hidden_proto.
(__qsort_r): Declare. Call libc_hidden_proto.
* sysdeps/posix/getaddrinfo.c (getaddrinfo): Call __qsort_r
instead of qsort_r.
* nscd/gai.c (__qsort_r): Define to qsort_r.
* posix/tst-rfc3484.c (__qsort_r): Likewise.
* posix/tst-rfc3484-2.c (__qsort_r): Likewise.
* posix/tst-rfc3484-3.c (__qsort_r): Likewise.
__getcwd is called from dcigettext.o (brought in by various ISO C
functionality), but calls rewinddir, which is not an ISO C function.
This patch makes __getcwd call __rewinddir instead and makes rewinddir
a weak alias for __rewinddir.
Since getcwd.c is shared with gnulib (albeit not merged in either
direction for a long time, and omitted from gnulib's
config/srclist.txt list of shared files) I put in a #ifndef _LIBC
define of __rewinddir to rewinddir, although a future merged version
of getcwd could end up looking significantly different.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by this patch).
[BZ #17584]
* dirent/rewinddir.c (rewinddir): Rename to __rewinddir and define
as weak alias of __rewinddir. Don't use libc_hidden_def.
(__rewinddir): Use libc_hidden_def.
* sysdeps/mach/hurd/rewinddir.c: Rename to __rewinddir and define
as weak alias of __rewinddir. Don't use libc_hidden_def.
(__rewinddir): Use libc_hidden_def.
* sysdeps/posix/rewinddir.c: Rename to __rewinddir and define as
weak alias of __rewinddir. Don't use libc_hidden_def.
(__rewinddir): Use libc_hidden_def.
* include/dirent.h (rewinddir): Don't use libc_hidden_proto.
(__rewinddir): Use libc_hidden_proto.
* sysdeps/posix/getcwd.c [!_LIBC] (__rewinddir): Define to
rewinddir.
(__getcwd): Use __rewinddir instead of rewinddir.
The s390 ABI requires the stack pointer to be aligned to 8 bytes.
When a program is invoked as an argument to the dynamic linker,
_dl_start_user adjusts the stack to remove the dynamic linker
arguments so that the program sees only its name and arguments. This
may result in the stack being misaligned since each argument shift is
only a word and not a double-word.
This is now fixed by shifting argv and envp down instead of shifting
argc up and reclaiming the stack.  This requires _dl_argv to be
adjusted, and hence it is no longer relro.
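The misalignment is simple arithmetic, sketched below under the
assumption of 4-byte words.  The function names and the sample stack
address are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if SP satisfies the 8-byte alignment requirement.  */
static inline bool
sketch_is_aligned8 (uintptr_t sp)
{
  return (sp & 7) == 0;
}

/* Old approach: reclaim SKIPPED argument slots of WORD_SIZE bytes each
   by moving the stack pointer up.  */
static inline uintptr_t
sketch_sp_after_skip (uintptr_t sp, unsigned int skipped,
                      unsigned int word_size)
{
  return sp + (uintptr_t) skipped * word_size;
}
```

Starting from an aligned address such as 0x7ff0, skipping one 4-byte
slot gives 0x7ff4, which is no longer 8-byte aligned; skipping two
slots happens to stay aligned.  Moving argv and envp down leaves the
stack pointer where it was, so the alignment is never disturbed.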
Continuing the removal of unused __libc_* function names, this patch
removes the __libc_waitpid name.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch; __waitpid, which is exported from
shared libc, changes from weak to strong on some configurations, which
is of no significance).
* include/sys/wait.h (__libc_waitpid): Remove declaration.
* posix/waitpid.c (__libc_waitpid): Rename to __waitpid.
(__waitpid): Don't define as alias. Use libc_hidden_def not
libc_hidden_weak.
(waitpid): Define as alias of __waitpid.
* sysdeps/unix/bsd/waitpid.c (__libc_waitpid): Rename to
__waitpid.
(__waitpid): Don't define as alias. Use libc_hidden_def not
libc_hidden_weak.
(waitpid): Define as alias of __waitpid.
* sysdeps/unix/sysv/linux/i386/syscalls.list (waitpid): Remove
__libc_waitpid alias.
* sysdeps/unix/sysv/linux/m68k/syscalls.list (waitpid): Likewise.
* sysdeps/unix/sysv/linux/powerpc/syscalls.list (waitpid):
Likewise.
* sysdeps/unix/sysv/linux/sh/syscalls.list (waitpid): Likewise.
* sysdeps/unix/sysv/linux/sparc/syscalls.list (waitpid): Likewise.
* sysdeps/unix/sysv/linux/tile/waitpid.S (__libc_waitpid): Remove
alias.
* sysdeps/unix/sysv/linux/waitpid.c (__libc_waitpid): Rename to
__waitpid.
(__waitpid): Don't define as alias. Use libc_hidden_def not
libc_hidden_weak.
(waitpid): Define as alias of __waitpid.
For maximum paranoia we run ld.so through the normal set
of tests for all of the shared libraries. This includes
running ld.so through check-localplt, check-textrel, and
check-execstack. While none of these should trigger any
failures given the way ld.so is built, it might possibly
fail if a developer does something wrong. This paranoia
was triggered by a discussion over the use of __strcpy
vs. strcpy [1] and if the symbol could leak and use the
libc.so version.
The check-localplt test fails right away because localplt.data
needs updating for all arches. By default we add 6 new symbols:
__tls_get_addr, __libc_memalign, malloc, calloc, realloc and
free. Other machines like i386, power, and s390 require some
different symbol sets e.g. ___tls_get_addr vs. __tls_get_addr
for i386.
Verified for i386
Verified for x86_64
Verified for ppc32
Verified for ppc64
Verified for ppc64le
Verified for arm
Verified for aarch64
Verified for s390
Verified for s390x
Guessed for alpha
Guessed for ia64
Guessed for m68k
Guessed for microblaze
Guessed for sparc32
Guessed for sparc64
Defaults for sh
Defaults for mips
Defaults for hppa
Defaults for tile
Machine maintainers notified to double check the data
used in localplt.data.
[1] https://sourceware.org/ml/libc-alpha/2014-10/msg00548.html
Completing the removal of the obsolete INTDEF / INTUSE mechanism, this
patch removes the final use - that for _dl_starting_up - replacing it
by rtld_hidden_def / rtld_hidden_proto. Having removed the last use,
the mechanism itself is also removed.
Tested for x86_64 that installed stripped shared libraries are
unchanged by the patch. (This is not much of a test since this
variable is only defined and used in the !HAVE_INLINED_SYSCALLS case.)
[BZ #14132]
* include/libc-symbols.h (INTUSE): Remove macro.
(INTDEF): Likewise.
(INTVARDEF): Likewise.
(_INTVARDEF): Likewise.
(INTDEF2): Likewise.
(INTVARDEF2): Likewise.
* elf/rtld.c [!HAVE_INLINED_SYSCALLS] (_dl_starting_up): Use
rtld_hidden_def instead of INTVARDEF.
* sysdeps/generic/ldsodefs.h [IS_IN_rtld]
(_dl_starting_up_internal): Remove declaration.
(_dl_starting_up): Use rtld_hidden_proto.
* elf/dl-init.c [!HAVE_INLINED_SYSCALLS] (_dl_starting_up): Remove
declaration.
[!HAVE_INLINED_SYSCALLS] (_dl_starting_up_internal): Likewise.
(_dl_init) [!HAVE_INLINED_SYSCALLS]: Don't use INTUSE with
_dl_starting_up.
* elf/dl-writev.h (_dl_writev): Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.h [!HAVE_INLINED_SYSCALLS]
(DL_STARTING_UP_DEF): Use __GI__dl_starting_up instead of
_dl_starting_up_internal.
Here is an optimized implementation of __strchrnul. The
simplification that we don't have to track precisely why the loop
terminates (match or end-of-string) means we have to do less work in
both setup and the core inner loop. That means this should never be
slower than strchr.
As with strchr, the use of LD1 means we do not need different versions
for big-/little-endian.
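For reference, the semantics being implemented can be written in
portable C as below.  This is a byte-at-a-time sketch with a
hypothetical name, not the AArch64 assembly, and it makes no claim
about performance.

```c
#include <stddef.h>

/* Return a pointer to the first occurrence of C in S, or to the
   terminating NUL if C does not occur.  */
char *
sketch_strchrnul (const char *s, int c)
{
  const unsigned char ch = (unsigned char) c;

  /* One combined exit condition: stop at the first byte that is
     either the target character or the terminator.  Unlike strchr,
     the caller does not need to know which of the two ended the loop,
     so no extra check is made afterwards.  */
  while (*s != '\0' && (unsigned char) *s != ch)
    ++s;

  return (char *) s;
}
```

A strchr built on top of this would need one more comparison after the
loop to turn the no-match case into a null pointer.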