The mutually misaligned inputs on aarch64 are compared with a simple
byte copy, which is not very efficient. Enhance the comparison
similar to strcmp by loading a double-word at a time. The peak
performance improvement (i.e. 4k maxlen comparisons) due to this on
the strncmp microbenchmark is as follows:
falkor: 3.5x (up to 72% time reduction)
cortex-a73: 3.5x (up to 71% time reduction)
cortex-a53: 3.5x (up to 71% time reduction)
All mutually misaligned inputs from 16 bytes maxlen onwards show
upwards of 15% improvement and there is no measurable effect on the
performance of aligned/mutually aligned inputs.
* sysdeps/aarch64/strncmp.S (count): New macro.
(strncmp): Store misaligned length in SRC1 in COUNT.
(mutual_align): Adjust.
(misaligned8): Load dword at a time when it is safe.
C99 specifies that the EOF condition on a file is "sticky": once EOF
has been encountered, all subsequent reads should continue to return
EOF until the file is closed or something clears the "end-of-file
indicator" (e.g. fseek, clearerr). This is arguably a change from
C89, where the wording was ambiguous; the BSDs always had sticky EOF,
but the System V lineage would attempt to read from the underlying fd
again. GNU libc has followed System V for as long as we've been
using libio, but nowadays C99 conformance and BSD compatibility are
more important than System V compatibility.
You might wonder if changing the _underflow impls is sufficient to
apply the C99 semantics to all of the many stdio functions that
perform input. It should be enough to cover all paths to _IO_SYSREAD,
and the only other functions that call _IO_SYSREAD are the _seekoff
impls, which is OK because seeking clears EOF, and the _xsgetn impls,
which, as far as I can tell, are unused within glibc.
The test programs in this patch use a pseudoterminal to set up the
necessary conditions. To facilitate this I added a new test-support
function that sets up a pair of pty file descriptors for you; it's
almost the same as BSD openpty, the only differences are that it
allocates the optionally-returned tty pathname with malloc, and that
it crashes if anything goes wrong.
[BZ #1190]
[BZ #19476]
* libio/fileops.c (_IO_new_file_underflow): Return EOF immediately
if the _IO_EOF_SEEN bit is already set; update commentary.
* libio/oldfileops.c (_IO_old_file_underflow): Likewise.
* libio/wfileops.c (_IO_wfile_underflow): Likewise.
* support/support_openpty.c, support/tty.h: New files.
* support/Makefile (libsupport-routines): Add support_openpty.
* libio/tst-fgetc-after-eof.c, wcsmbs/test-fgetwc-after-eof.c:
New test cases.
* libio/Makefile (tests): Add tst-fgetc-after-eof.
* wcsmbs/Makefile (tests): Add tst-fgetwc-after-eof.
* sysdeps/mach/hurd/reboot.c: Include <hurd/paths.h>
(reboot): Lookup _SERVERS_STARTUP instead of calling proc_getmsgport to get a
port to the startup server.
Jeff Law noticed that native PowerPC builds were broken by my having
made math_ldbl_opt.h not include math.h. nldbl-compat.c formerly got
math.h via libioP.h and math_ldbl_opt.h, *without* __NO_LONG_DOUBLE_MATH;
after my change it got it via nldbl-compat.h *with* __NO_LONG_DOUBLE_MATH,
but __NO_LONG_DOUBLE_MATH mode is forbidden on hosts that define
__HAVE_DISTINCT_FLOAT128, so the build breaks. This is the quick fix.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.c: Include math.h
before nldbl-compat.h.
The sysdeps/ieee754/ldbl-opt version of math_ldbl_opt.h includes
math.h and math_private.h, despite not having any need for those
headers itself; the sysdeps/generic version doesn't. About 20 files
are relying on math_ldbl_opt.h to include math.h and/or math_private.h
for them, even though none of them necessarily used on a platform that
needs ldbl-opt support.
* sysdeps/ieee754/ldbl-opt/math_ldbl_opt.h: Don't include
math.h or math_private.h.
* sysdeps/alpha/fpu/s_isnan.c
* sysdeps/ieee754/ldbl-128ibm/s_ceill.c
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypot.c
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/e_hypotf.c:
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf.c
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypot.c
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_hypotf.c:
Include math_private.h.
* sysdeps/ieee754/ldbl-64-128/s_finitel.c
* sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c
* sysdeps/ieee754/ldbl-64-128/s_isinfl.c
* sysdeps/ieee754/ldbl-64-128/s_isnanl.c
* sysdeps/ieee754/ldbl-64-128/s_signbitl.c
* sysdeps/powerpc/power7/fpu/s_logb.c:
Include math.h and math_private.h.
On Alpha, the register $at is, by default, reserved for use by the
assembler, in the expansion of pseudo-instructions. It's also used
by the special calling convention for _mcount. We get warnings from
Alpha clone.S because the code to call _mcount isn't properly marked
up to tell the assembler not to use $at itself.
* sysdeps/unix/sysv/linux/alpha/clone.s (__clone): Wrap manual
uses of $at in .set noat / .set at.
Since __libc_longjmp is a private interface for cancellation implementation
in libpthread, there is no need to provide hidden __libc_longjmp in libc.
Tested with build-many-glibcs.py.
* include/setjmp.h (__libc_longjmp): Remove libc_hidden_proto.
* setjmp/longjmp.c (__libc_longjmp): Remove libc_hidden_def.
* sysdeps/s390/longjmp.c (__libc_longjmp): Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/longjmp.S (__libc_longjmp):
Likewise.
On sparc32 tst-makecontext fails, as backtrace called within a context
created by makecontext to yield infinite backtrace.
Fix that the same way than nios2 by adding a nop just before
__startcontext. This is needed as otherwise FDE lookup just repeatedly
finds __setcontext's FDE in an infinite loop, due to the convention of
using 'address - 1' for FDE lookup.
Changelog:
[BZ #22919]
* sysdeps/unix/sysv/linux/sparc/sparc32/setcontext.S (__startcontext):
Add nop before __startcontext, add explaining comments.
Some SPE opcodes clashes with some recent PowerISA opcodes and
until recently gas did not complain about it. However binutils
recently changed it and now VLE configured gas does not support to
assembler some instruction that might class with VLE (HTM for
instance). It also does not help that glibc build hardware lock
elision support as default (regardless of assembler support).
Although runtime will not actually enables TLE on SPE hardware
(since kernel will not advertise it), I see little advantage on
adding HTM support on SPE built glibc. SPE uses an incompatible
ABI which does not allow share the same build with default
powerpc and HTM code slows down SPE without any benefict.
This patch fixes it by only building HTM when SPE configuration
is not used.
Checked with a powerpc-linux-gnuspe build. I also did some sniff
tests on a e500 hardware without any issue.
[BZ #22926]
* sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION_IMPL): Define
empty for __SPE__.
* sysdeps/powerpc/sysdep.h (ABORT_TRANSACTION): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-lock.c (__lll_lock_elision):
Do not build hardware transactional code for __SPE__.
* sysdeps/unix/sysv/linux/powerpc/elision-trylock.c
(__lll_trylock_elision): Likewise.
* sysdeps/unix/sysv/linux/powerpc/elision-unlock.c
(__lll_unlock_elision): Likewise.
This patch refactors the ARCH_FORK macro and the required architecture
specific header to simplify the required architecture definitions
to provide the fork syscall semantic and proper document current
Linux clone ABI variant.
Instead of require the reimplementation of arch-fork.h header, this
patch changes the ARCH_FORK to an inline function with clone ABI
defined by kernel-features.h define. The generic kernel ABI meant
for newer ports is used as default and redefine if the architecture
requires.
Checked on x86_64-linux-gnu and i686-linux-gnu. Also with a build
for all the afected ABIs.
* sysdeps/nptl/fork.c (ARCH_FORK): Replace by auch_fork.
* sysdeps/unix/sysv/linux/alpha/arch-fork.h: Remove file.
* sysdeps/unix/sysv/linux/riscv/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/arm/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/i386/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/microblaze/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/mips/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/nios2/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/s390/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/sh/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/tile/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/arch-fork.h: Likewise.
* sysdeps/unix/sysv/linux/arch-fork.h (arch_fork): New function.
* sysdeps/unix/sysv/linux/aarch64/kernel-features.h: New file.
* sysdeps/unix/sysv/linux/riscv/kernel-features.h: Likewise.
* sysdeps/unix/sysv/linux/arm/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Define.
* sysdeps/unix/sysv/linux/createthread.c (ARCH_CLONE): Define to
__clone2 if __NR_clone2 is defined.
* sysdeps/unix/sysv/linux/hppa/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Likewise.
* sysdeps/unix/sysv/linux/i386/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Likewise.
* sysdeps/unix/sysv/linux/ia64/kernel-features.h
(__ASSUME_CLONE2): Likewise.
* sysdeps/unix/sysv/linux/microblaze/kernel-features.h
(__ASSUME_CLONE_BACKWARDS3): Likewise.
* sysdeps/unix/sysv/linux/kernel-features.h: Document possible clone
variants and the define architecture can use.
(__ASSUME_CLONE_DEFAULT): Define as default.
* sysdeps/unix/sysv/linux/mips/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Likewise.
* sysdeps/unix/sysv/linux/powerpc/kernel-features.h
(__ASSUME_CLONE_BACKWARDS): Likewise.
* sysdeps/unix/sysv/linux/s390/kernel-features.h
(__ASSUME_CLONE_BACKWARDS2): Likewise.
I goofed up when changing the loop8 name to loop16 and missed on out
the branch instance. Fixed and actually build tested this time.
* sysdeps/aarch64/memcmp.S (more16): Fix branch target loop16.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
time.
The 0 length strncmp is interesting for correctness but not for
performance.
* benchtests/bench-strncmp.c (test_main): Remove 0 length tests.
(do_test_limit): Likewise.
Don't reuse buffers for different strncmp implementations since the
earlier implementation will end up warming the cache for the later
one. Eventually there should be a more elegant way to do this.
* benchtests/bench-strncmp.c (do_test_limit): Reallocate buffers
for every implementation.
(do_test): Likewise.
* sysdeps/mach/hurd/bits/stat.h [__USE_ATFILE] (UTIME_NOW,
UTIME_OMIT): New macros.
* sysdeps/mach/hurd/futimens.c (__futimens): Try to use __file_utimens
before reverting to converting time spec to time value and calling
__file_utimes.
* sysdeps/mach/hurd/utime-helper.c: New file.
* sysdeps/mach/hurd/futimes.c: Include "utime-helper.c".
(__futimes): Try to use utime_ts_from_tval and __file_utimens before
reverting to utime_tvalue_from_tval and __file_utimes.
* sysdeps/mach/hurd/lutimes.c: Include "utime-helper.c".
(__lutimes): Just call hurd_futimens after lookup.
* sysdeps/mach/hurd/utimes.c: Likewise.
Building glibc for s390 with -Os (32-bit only, with GCC 7) fails with:
In file included from ../sysdeps/s390/multiarch/8bit-generic.c:370:0,
from ebcdic-at-de.c:28:
../iconv/loop.c: In function '__to_generic_vx':
../iconv/loop.c:264:22: error: 'ch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (((Character) >> 7) == (0xe0000 >> 7)) \
^~
In file included from ebcdic-at-de.c:28:0:
../sysdeps/s390/multiarch/8bit-generic.c:340:15: note: 'ch' was declared here
uint32_t ch; \
^
../iconv/loop.c:325:7: note: in expansion of macro 'BODY'
BODY
^~~~
It's fairly easy to see, looking at the (long) expansion of the BODY
macro, that this is a false positive and the relevant variable 'ch' is
always initialized before use, in one of two possible places. As
such, disabling the warning for -Os with the DIAG_* macros is the
natural approach to fix this build failure. However, because of the
location at which the warning is reported, the disabling needs to go
in iconv/loop.c, around the definition of UNICODE_TAG_HANDLER (not
inside the definition), as that macro definition is where the
uninitialized use is reported, whereas the code that needs to be
reasoned about to see that the warning is a false positive is in the
definition of BODY elsewhere.
Thus, the patch adds such disabling in iconv/loop.c, with a comment
pointing to the s390-specific code and a comment in the s390-specific
code pointing to the generic file to alert people to the possible need
to update one place when changing the other. It would be possible if
desired to use #ifdef __s390__ around the disabling, though in general
we try to avoid that sort of thing in generic files. (Or some
extremely specialized macros for "disable -Wmaybe-uninitialized in
this particular place" could be specified, defined to 0 in a lot of
different files that include iconv/loop.c and to 1 in that particular
s390 file.)
Tested that this fixed -Os compilation for s390-linux-gnu with
build-many-glibcs.py.
* iconv/loop.c (UNICODE_TAG_HANDLER): Disable
-Wmaybe-uninitialized for -Os.
* sysdeps/s390/multiarch/8bit-generic.c (BODY): Add comment about
this disabling.
This patch defines _DIRENT_MATCHES_DIRENT64 to either 0 or 1 and adjust its
usage from checking its definition to its value.
Checked on a build for major Linux abis.
* bits/dirent.h (__INO_T_MATCHES_INO64_T): Define regardless whether
__INO_T_MATCHES_INO64_T is defined.
* sysdeps/unix/sysv/linux/bits/dirent.h: Likewise.
* dirent/alphasort.c: Check _DIRENT_MATCHES_DIRENT64 value instead
of definition.
* dirent/alphasort64.c: Likewise.
* dirent/scandir.c: Likewise.
* dirent/scandir64-tail.c: Likewise.
* dirent/scandir64.c: Likewise.
* dirent/scandirat.c: Likewise.
* dirent/scandirat64.c: Likewise.
* dirent/versionsort.c: Likewise.
* dirent/versionsort64.c: Likewise.
* include/dirent.h: Likewise.
Now that send might be implemented calling sendto syscall on Linux,
I am seeing some issue in some kernel configurations where tst-cancel4
sendto do not block as expected.
The socket used to force the syscall blocking is used with default
system configuration for buffer sending size, which might not be
suffice to force blocking. This patch fixes it by explicit setting
buffer socket lower than the buffer size used. It also enables sendto
cancellation tests to work in both ways (since internally send is
implemented routing to sendto on Linux kernel).
The patch also removes unrequired make rules on some archictures
for send/recv. The generic nptl Makefile already set the compiler flags
required on some architectures for correct unwinding and libc object
are not strictly required to support unwind (since pthread_cancel
requires linking against libpthread).
Checked on aarch64-linux-gnu and x86_64-linux-gnu. I also did a
sniff test with tst-cancel{4,5} on a simulated mips64-linux-gnu.
* nptl/tst-cancel4-common.h (set_socket_buffer): New function.
* nptl/tst-cancel4-common.c (do_test): Call set_socket_buffer
for socketpair endpoint.
* nptl/tst-cancel4.c (tf_send): Call set_socket_buffer and use
WRITE_BUFFER_SIZE as buffer size for sending socket.
(tf_sendto): Use SOCK_STREAM instead of SOCK_DGRAM and fix an
issue on system where send is implemented with sendto syscall.
* sysdeps/unix/sysv/linux/mips/mips64/Makefile [$(subdir) = socket]
(CFLAGS-recv.c, CFLAGS-send.c): Remove rules.
[$(subdir) = nptl] (CFLAGS-recv.c, CFLAGS-send.c): Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/Makefile: Remove file.
This patch fixes the i386 sa_restorer field initialization for sigaction
syscall for kernel with vDSO. As described in bug report, i386 Linux
(and compat on x86_64) interprets SA_RESTORER clear with nonzero
sa_restorer as a request for stack switching if the SS segment is 'funny'.
This means that anything that tries to mix glibc's signal handling with
segmentation (for instance through modify_ldt syscall) is randomly broken
depending on what values lands in sa_restorer.
The testcase added is based on Linux test tools/testing/selftests/x86/ldt_gdt.c,
more specifically in do_multicpu_tests function. The main changes are:
- C11 atomics instead of plain access.
- Remove x86_64 support which simplifies the syscall handling and fallbacks.
- Replicate only the test required to trigger the issue.
Checked on i686-linux-gnu.
[BZ #21269]
* sysdeps/unix/sysv/linux/i386/Makefile (tests): Add tst-bz21269.
* sysdeps/unix/sysv/linux/i386/sigaction.c (SET_SA_RESTORER): Clear
sa_restorer for vDSO case.
* sysdeps/unix/sysv/linux/i386/tst-bz21269.c: New file.
so interfaces needing it can get it.
* stdlib/errno.h (error_t): Move definition to...
* bits/types/error_t.h: ... new header.
* stdlib/Makefile (headers): Add bits/types/error_t.h.
* sysdeps/mach/hurd/bits/errno.h (error_t): Move definition to...
* sysdeps/mach/hurd/bits/types/error_t.h: ... new header.
* sysdeps/mach/hurd/errnos.awk (error_t): Likewise.
* hurd/hurd.h: Include <bits/types/error_t.h>
* hurd/hurd/fd.h: Include <bits/types/error_t.h>
* hurd/hurd/id.h: Include <errno.h> and <bits/types/error_t.h>
* hurd/hurd/lookup.h: Include <errno.h> and <bits/types/error_t.h>
* hurd/hurd/resource.h: Include <bits/types/error_t.h>
* hurd/hurd/signal.h: Include <bits/types/error_t.h>
* hurd/hurd/sigpreempt.h: Include <bits/types/error_t.h>
* hurd/hurd.h: Include <bits/types/sigset_t.h>
* hurd/hurd/fd.h: Include <sys/select.h> and <bits/types/sigset_t.h>
(_hurd_fd_read, _hurd_fd_write): Use __loff_t instead of loff_t.
* hurd/hurd/signal.h: Include <bits/types/stack_t.h> and
<bits/types/sigset_t.h>.
[!defined __USE_GNU]: Do not #error out.
(struct hurd_sigstate): Use _NSIG instead of NSIG.
* hurd/hurd/sigpreempt.h (__need_size_t): Define.
Include <stddef.h> and <bits/types/sigset_t.h>
(struct hurd_signal_preemptor, hurd_catch_signal): Use __sighandler_t
instead of sighandler_t.
mig_support does not actually inline the stpncpy any more.
* mach/mach/mig_support.h [defined __USE_GNU]: Do not #error out.
* scripts/check-installed-headers.sh: Do not ignore Hurd and Mach
headers.
thus making <hurd/port.h> and <hurd/userlink.h> includable without
_GNU_SOURCE.
* hurd/hurd/port.h: Do not include <hurd/signal.h>.
* hurd/hurd/userlink.h [!defined __USE_EXTERN_INLINES ||
!defined _LIBC || !IS_IN (libc)]: Do not include <hurd/signal.h>.
Compiling the testsuite for powerpc (multi-arch configurations) with
-Os with GCC 7 fails with:
In file included from ifuncmod1.c:7:0,
from ifuncdep1.c:3:
../sysdeps/powerpc/ifunc-sel.h: In function 'ifunc_sel':
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 2 probably doesn't match constraints [-Werror]
__asm__ ("mflr 12\n\t"
^~~~~~~
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 3 probably doesn't match constraints [-Werror]
../sysdeps/powerpc/ifunc-sel.h:12:3: error: asm operand 4 probably doesn't match constraints [-Werror]
../sysdeps/powerpc/ifunc-sel.h:12:3: error: impossible constraint in 'asm'
The "i" constraints on function pointers require the function call to
be inlined so the compiler can see the constant function pointer
arguments passed to the asm. This patch marks the relevant functions
as always_inline accordingly.
Tested that this fixes the -Os testsuite build for
powerpc-linux-gnu-power4, powerpc64-linux-gnu, powerpc64le-linux-gnu
with build-many-glibcs.py.
* sysdeps/powerpc/ifunc-sel.h (ifunc_sel): Make always_inline.
(ifunc_one): Likewise.
Unlike other nscd caches, the netgroup cache contains two types of
records - those for "iterate through a netgroup" (i.e. setnetgrent())
and those for "is this user in this netgroup" (i.e. innetgr()),
i.e. full and partial records. The timeout code assumes these records
have the same key for the group name, so that the collection of records
that is "this netgroup" can be expired as a unit.
However, the keys are not the same, as the in-netgroup key is generated
by nscd rather than being passed to it from elsewhere, and is generated
without the trailing NUL. All other keys have the trailing NUL, and as
noted in the linked BZ, debug statements confirm that two keys for the
same netgroup are added to the cache with two different lengths.
The result of this is that as records in the cache expire, the purge
code only cleans out one of the two types of entries, resulting in
stale, possibly incorrect, and possibly inconsistent cache data.
The patch simply includes the existing NUL in the computation for the
key length ('key' points to the char after the NUL, and 'group' to the
first char of the group, so 'key-group' includes the first char to the
NUL, inclusive).
[BZ #22342]
* nscd/netgroupcache.c (addinnetgrX): Include trailing NUL in
key value.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Complement commit c579f48edb ("Remove cached PID/TID in clone") and
remove the `match_pid' parameter not used by `iterate_thread_list' any
longer. Update call sites accordingly.
* nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove
`match_pid' parameter.
(td_ta_thr_iter): Update accordingly.
libpthread_nonshared.a is unused after this, so remove it from the
build.
There is no ABI impact because pthread_atfork was implemented using
__register_atfork in libc even before this change.
pthread_atfork has to be a weak alias because pthread_* names are not
reserved in libc.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As discussed in bug 22902, the i386 fenv_private.h implementation has
problems for float128 for the case of 32-bit glibc built with libgcc
from GCC configured using --with-fpmath=sse.
The optimized floating-point state handling in fenv_private.h needs to
know which floating-point state - x87 or SSE - is used for each
floating-point type, so that only one state needs updating / testing
for libm code using that state internally. On 32-bit x86, the x87
rounding mode is always used for float128, but the x87 exception flags
are only used when libgcc is built using x87 floating-point
arithmetic; if libgcc is built for SSE arithmetic, the SSE exception
flags are used.
The choice of arithmetic with which libgcc is built is independent of
that with which glibc is built. Thus, since glibc cannot tell the
choice used in libgcc, the default implementations of
libc_feholdexcept_setroundf128 and libc_feupdateenv_testf128 (which
use the <fenv.h> functions, thus using both x87 and SSE state on
processors that have both) need to be used; this patch updates the
code accordingly.
Tested for 32-bit x86; HJ reports testing in the --with-fpmath=sse
case.
[BZ #22902]
* sysdeps/i386/fpu/fenv_private.h [!__x86_64__]
(libc_feholdexcept_setroundf128): New macro.
[!__x86_64__] (libc_feupdateenv_testf128): Likewise.