Commit a6a4395d fixed modf implementation by compiling s_modf.c and
s_modff.c with -fsignaling-nans. However these files are also included
from the pre-POWER5+ implementation, and thus these files should also
be compiled with -fsignaling-nans.
Changelog:
[BZ #20240]
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_modf-ppc32.c): New variable.
(CFLAGS-s_modff-ppc32.c): Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(CFLAGS-s_modf-ppc64.c): Likewise.
(CFLAGS-s_modff-ppc64.c): Likewise.
Fix the postal_fmt and country_name entries to continue on the following
line without indentation.
localedata/Changelog:
* locales/de_LI (postal_fmt): Fix indentation.
(country_name): Likewise.
In Linux/ARM environment, a robust mutex can't catch the timeout result
when it is already owned by other thread and requests to try lock with
a specific time value(pthread_mutex_timedlock). The futex already returns
the ETIMEDOUT result but there is no check the return value and it makes
a deadlock.
* nptl/lowlevelrobustlock.c: Implement ETIMEDOUT logic.
The Principality of Liechtenstein currently does not have a corresponding
locale. Given the links with Switzerland, the best is to base the locale
on the de_CH one (German is the official language) and only change the
country related categories: LC_ADDRESS. and LC_TELEPHONE.
localedata/Changelog:
* locales/de_LI: New locale.
* SUPPORTED: Add de_LI.
On s390, the current prelink undo code in elf_machine_lazy_rel()
has the requirement, that the plt stubs use the first got slots
after the 3 reserved ones.
In case of undoing prelink, the plt got slots are reset to the correct
addresses whithin the corresponding plt-stub. Therefore the address
is calculated by the address of the first plt-stub-address which
was written by prelink (see l->l_mach.plt) to got[1] and index of
current relocation multiplied with 32 (=size of one plt slot).
The index was calculated with ¤t-got-slot - &got[3].
This patch removes the requirement, that the plt-got-slots are
starting at got[3]. The index is now calculated with
¤t-reloc - &reloc[0]. The first struct Elf64_Rela is stored
at DT_JMPREL.
This patch is needed to prepare for partial relro support.
Ulrich Weigand suggested this approach to use DT_JMPREL - Thanks.
ChangeLog:
* sysdeps/s390/linkmap.h (struct link_map_machine):
Remove member gotplt and add member jmprel.
* sysdeps/s390/s390-32/dl-machine.h
(elf_machine_runtime_setup): Setup member jmprel with DT_JMPREL
instead of gotplt with &got[3].
(elf_machine_lazy_rel): Calculate address with reloc and jmprel.
* sysdeps/s390/s390-64/dl-machine.h: Likewise.
* libio/iofopncook.c (_IO_cookie_read, _IO_cookie_write,
_IO_cookie_seek, _IO_cookie_close, _IO_old_cookie_seek)
[!PTR_DEMANGLE]: Do not call PTR_DEMANGLE.
(set_callbacks) [!PTR_MANGLE]: Do not call PTR_MANGLE.
* libio/vtables.c (_IO_vtable_check)
[!PTR_DEMANGLE]: Do not call PTR_DEMANGLE.
* libio/libioP.h (IO_set_accept_foreign_vtables)
[!PTR_MANGLE]: Do not call PTR_MANGLE.
If C++ headers <cstdlib> or <cmath> are used, GCC 6 will include
/usr/include/stdlib.h or /usr/include/math.h from "#include_next"
(instead of stdlib/stdlib.h or math/math.h in the glibc source
directory), and this turns up as a make dependency. An implicit
rule will kick in and make will try to install stdlib/stdlib.h or
math/math.h as /usr/include/stdlib.h or /usr/include/math.h because
the target is out of date. We make a copy of <cstdlib> and <cmath>
in the glibc build directory so that stdlib/stdlib.h and math/math.h
will be used instead of /usr/include/stdlib.h and /usr/include/math.h.
[BZ #20314]
* Makeconfig (CXXFLAGS): Prepend -I$(common-objpfx).
* Makerules (before-compile): Add $(common-objpfx)cstdlib and
$(common-objpfx)cmath.
($(common-objpfx)cstdlib): New target.
($(common-objpfx)cmath): Likewise.
Right now tilegx is right on the verge of timeout when it runs,
so adding a bit of headroom seems like the right thing; we
see failures when running tests in parallel.
If the input values are unaligned and if there are null characters in the
memory before the starting address of the input values, strcasecmp
gives incorrect return code. Fixed it by adding mask the bits that
are not part of the string.
This patch adds early cancel test for open syscall through a FIFO
(thus makign subsequent call to open block until the other end is
also opened).
It also cleanup the sigpause tests by using sigpause along with
SIGINT instead of __xpg_sigpause and SIGCANCEL. Since the idea
is just to test the cancellation handling there is no need to expose
internal glibc implementation details to the test through pthreadP.h
inclusion.
Tested x86_64.
* nptl/tst-cancel4-common.c (do_test): Add temporary fifo creation.
* nptl/tst-cancel4-common.h (fifoname): New variable.
(fifofd): Likewise.
(cl_fifo): New function.
* nptl/tst-cancel4.c (tf_sigpause): Replace SIGCANCEL usage by
SIGINT.
(tf_open): Add early cancel test.
In a reference to PR ld/19908 make ld.so respect symbol export classes
aka visibility and treat STV_HIDDEN and STV_INTERNAL symbols as local,
preventing such symbols from preempting exported symbols.
According to the ELF gABI[1] neither STV_HIDDEN nor STV_INTERNAL symbols
are supposed to be present in linked binaries:
"A hidden symbol contained in a relocatable object must be either
removed or converted to STB_LOCAL binding by the link-editor when the
relocatable object is included in an executable file or shared object."
"An internal symbol contained in a relocatable object must be either
removed or converted to STB_LOCAL binding by the link-editor when the
relocatable object is included in an executable file or shared object."
however some GNU binutils versions produce such symbols in some cases.
PR ld/19908 is one and we also have this note in scripts/abilist.awk:
so clearly there is linked code out there which contains such symbols
which is prone to symbol table misinterpretation, and it'll be more
productive if we handle this gracefully, under the Robustness Principle:
"be liberal in what you accept, and conservative in what you produce",
especially as this is a simple (STV_HIDDEN|STV_INTERNAL) => STB_LOCAL
mapping.
References:
[1] "System V Application Binary Interface - DRAFT - 24 April 2001",
The Santa Cruz Operation, Inc., "Symbol Table",
<http://www.sco.com/developers/gabi/2001-04-24/ch4.symtab.html>
* sysdeps/generic/ldsodefs.h
(dl_symbol_visibility_binds_local_p): New inline function.
* elf/dl-addr.c (determine_info): Treat hidden and internal
symbols as local.
* elf/dl-lookup.c (do_lookup_x): Likewise.
* elf/dl-reloc.c (RESOLVE_MAP): Likewise.
nearbyint and nearbyintf should not trigger inexact exceptions, but
should still trigger an invalid exception for a sNaN input.
The SPARC specific implementations of these functions save the FSR at
the beginning of the function and restore it at the end to not trigger
an inexact exception. This however doesn't work for an sNaN input which
need to trigger an invalid exception. Fix that by adding a fcmp
instruction using the input value before saving FSR, so that an invalid
exception is triggered for a sNaN input.
This fixes the math/test-nearbyint-except test on SPARC.
Changelog:
* sparc/sparc32/sparcv9/fpu/s_nearbyint.S (__nearbyint): Trigger an
invalid exception for a sNaN input.
* sparc/sparc32/sparcv9/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint-vis3.S
(__nearbyint_vis3): Likewise
* sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf-vis3.S
(__nearbyintf_vis3): Likewise
* sparc/sparc64/fpu/s_nearbyint.S (__nearbyint): Likewise.
* sparc/sparc64/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sparc/sparc64/fpu/multiarch/s_nearbyint-vis3.S (__nearbyint_vis3):
Likewise.
* sparc/sparc64/fpu/multiarch/s_nearbyintf-vis3.S (__nearbyintf_vis3):
Likewise.
If assembler doesn't support AVX512DQ, _dl_runtime_resolve_avx is used
to save the first 8 vector registers, which only saves the lower 256
bits of vector register, for lazy binding. When it is called on AVX512
platform, the upper 256 bits of ZMM registers are clobbered. Parameters
passed in ZMM registers will be wrong when the function is called the
first time. This patch requires binutils 2.24, whose assembler can store
and load ZMM registers, to build x86-64 glibc. Since mathvec library
needs assembler support for AVX512DQ, we disable mathvec if assembler
doesn't support AVX512DQ.
[BZ #20139]
* config.h.in (HAVE_AVX512_ASM_SUPPORT): Renamed to ...
(HAVE_AVX512DQ_ASM_SUPPORT): This.
* sysdeps/x86_64/configure.ac: Require assembler from binutils
2.24 or above.
(HAVE_AVX512_ASM_SUPPORT): Removed.
(HAVE_AVX512DQ_ASM_SUPPORT): New.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/dl-trampoline.S: Make HAVE_AVX512_ASM_SUPPORT
check unconditional.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Likewise.
* sysdeps/x86_64/multiarch/memcpy.S: Likewise.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove.S: Likewise.
* sysdeps/x86_64/multiarch/memmove_chk.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
Likewise.
* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memset.S: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: Check
HAVE_AVX512DQ_ASM_SUPPORT instead of HAVE_AVX512_ASM_SUPPORT.
* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx51:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S:
Likewise.
current vector function declaration "#pragma omp declare simd notinbranch",
according to which vector sincos should have vector of pointers for second and
third parameters. It is fixed with implementation as wrapper to version
having second and third parameters as pointers.
[BZ #20024]
* sysdeps/x86/fpu/test-math-vector-sincos.h: New.
* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S: Fixed ABI
of this implementation of vector function.
* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S:
Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: Likewise.
* sysdeps/x86_64/fpu/svml_d_sincos2_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_d_sincos4_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_d_sincos4_core_avx.S: Likewise.
* sysdeps/x86_64/fpu/svml_d_sincos8_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: Likewise.
* sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Use another wrapper
for testing vector sincos with fixed ABI.
* sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx.c: New test.
* sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-double-libmvec-sincos-avx512.c: Likewise.
* sysdeps/x86_64/fpu/test-double-libmvec-sincos.c: Likewise.
* sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx.c: Likewise.
* sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-float-libmvec-sincosf-avx512.c: Likewise.
* sysdeps/x86_64/fpu/test-float-libmvec-sincosf.c: Likewise.
* sysdeps/x86_64/fpu/Makefile: Added new tests.
Commits d81f90cc and 89faa0340 replaced called to __isnan and __isinf
by the corresponding GCC builtins. In turns GCC emits calls to _Qp_cmp.
We should therefore add _Qp_cmp to localplt.data as otherwise the
elf/check-localplt test fails with:
Extra PLT reference: libc.so: _Qp_cmp
A similar change has already been done for SPARC32 in commit 6ef1cb95.
Changelog:
* sysdeps/unix/sysv/linux/sparc/sparc64/localplt.data: Add _Qp_cmp.
This implementation is based on the one already used at
sysdeps/x86_64/fpu/e_expf.S.
This implementation improves the performance by ~14% on average in synthetic
benchmarks at the cost of decreasing accuracy to 1 ULP.
The patched change fixes a regression for executables compiled with the
-p option and linked with gcrt1.o. The executables crash on startup.
This regression was introduced in 2.22 and was noticed in the gcc testsuite.
Although the Enhanced REP MOVSB/STOSB (ERMS) implementations of memmove,
memcpy, mempcpy and memset aren't used by the current processors, this
patch adds Prefer_ERMS check in memmove, memcpy, mempcpy and memset so
that they can be used in the future.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_ERMS): New.
(index_arch_Prefer_ERMS): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Return
__memcpy_erms for Prefer_ERMS.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
(__memmove_erms): Enabled for libc.a.
* ysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Return
__memmove_erms or Prefer_ERMS.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Return
__mempcpy_erms for Prefer_ERMS.
* sysdeps/x86_64/multiarch/memset.S (memset): Return
__memset_erms for Prefer_ERMS.
tst-cleanupx4 is linked with tst-cleanupx4.o and tst-cleanup4aux.o.
Since tst-cleanupx4.o is compiled from tst-cleanup4.c with -fexceptions,
tst-cleanup4aux.c should also be compiled with -fexceptions.
Tested on x86-64 and i686.
[BZ #18645]
* nptl/Makefile (extra-test-objs): Add tst-cleanupx4aux.o.
(test-extras): Add tst-cleanupx4aux.
(CFLAGS-tst-cleanupx4aux.c): New. Set to -fexceptions.
($(objpfx)tst-cleanupx4): Replace tst-cleanup4aux.o with
tst-cleanupx4aux.o.
* nptl/tst-cleanupx4aux.c: New file.
The EM_BPF number has been officially assigned, though it
has not yet been posted to the gabi webpage yet.
* elf/elf.h (EM_BPF): New.
(EM_NUM): Update.
(R_BPF_NONE, R_BPF_MAP_FD): New.
With shared libc, all locale categories are always loaded.
For static libc they aren't, but there exist a weak
_nl_current_LC_CATEGORY_used symbol for each category.
If the category is used, the locale/lc-CATEGORY.o is linked in
where _NL_CURRENT_DEFINE (LC_CATEGORY) defines and sets the
_nl_current_LC_CATEGORY_used symbol to one.
As reported by Marcin
"Bug 18960 - s390: _nl_locale_subfreeres uses larl opcode on misaligned symbol"
(https://sourceware.org/bugzilla/show_bug.cgi?id=18960)
In function _nl_locale_subfreeres (locale/setlocale.c) for each category
a check - &_nl_current_LC_CATEGORY_used != 0 - decides whether the category
is used or not.
There is also a second usage with the same mechanism in function __uselocale
(locale/uselocale.c).
On s390 a larl instruction with R_390_PC32DBL relocation is used to
get the address of _nl_current_LC_CATEGORY_used symbols. As larl loads the
address relative in halfwords and the code is always 2-byte aligned,
larl can only load even addresses.
At the end, the relocated address is always zero and never one.
Marcins patch (see bugzilla) uses the following declaration in locale/setlocale.c:
extern char _nl_current_##category##_used __attribute__((__aligned__(1)));
In function _nl_locale_subfreeres all categories are checked and therefore gcc
is now building an array of addresses in rodata section with an R_390_64
relocation for every address. This array is loaded with larl instruction and
each address is accessed by index.
This fixes only the usage in _nl_locale_subfreeres. Each user has to add the
alignment attribute.
This patch set the _nl_current_LC_CATEGORY_used symbols to two instead of one.
This way gcc can use larl instruction and the check against zero works on
every usage.
ChangeLog:
[BZ #19860]
* locale/localeinfo.h (_NL_CURRENT_DEFINE):
Set _nl_current_LC_CATEGORY_used to two instead of one.
For some reasons I have not investigated yet, tst-mode-switch-1 hangs on
a MIPS UTM-8 machine running an o32 userland and a 3.6.1 kernel.
This patch changes the test so that it runs under the test-skeleton
framework, causing the test to fail after a timeout instead of hanging
the whole testsuite. At the same time, also change the tst-mode-switch-2
and tst-mode-switch-3 tests.
Changelog:
* sysdeps/mips/tst-mode-switch-1.c (main): Converted to ...
(do_test): ... this.
(TEST_FUNCTION): New macro.
Include test-skeleton.c.
* sysdeps/mips/tst-mode-switch-2.c (main): Likewise.
* sysdeps/mips/tst-mode-switch-3.c (main): Likewise.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line trunc function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_trunc.S (__trunc): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_truncf.S (__truncf): Likewise.
* sysdeps/i386/fpu/s_truncl.S (__truncl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_truncl.S (__truncl): Likewise.
* math/libm-test.inc (trunc_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line floor function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_floor.S (__floor): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/i386/fpu/s_floorl.S (__floorl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_floorl.S (__floorl): Likewise.
* math/libm-test.inc (floor_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line ceil function implementations to
avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_ceil.S (__ceil): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/i386/fpu/s_ceill.S (__ceill): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_ceill.S (__ceill): Likewise.
* math/libm-test.inc (ceil_test_data): Do not allow spurious
"inexact" exceptions.
Commit 43c29487 tried to fix the vfork aliases in libpthread.so on MIPS
and SPARC, but failed to do it correctly, introducing an ABI change.
This patch does the remaining changes needed to align the MIPS and SPARC
vfork implementations with the other architectures. That way the the
alpha version of pt-vfork.S works correctly for MIPS and SPARC. The
changes for alpha were done in 82aab97c.
Changelog:
* sysdeps/unix/sysv/linux/mips/vfork.S (__vfork): Rename into
__libc_vfork.
(__vfork) [IS_IN (libc)]: Remove alias.
(__libc_vfork) [IS_IN (libc)]: Define as an alias.
* sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
atomic_compare_and_exchange_bool_rel and
catomic_compare_and_exchange_bool_rel are removed and replaced with the
new C11-like atomic_compare_exchange_weak_release. The concurrent code
in nscd/cache.c has not been reviewed yet, so this patch does not add
detailed comments.
* nscd/cache.c (cache_add): Use new C11-like atomic operation instead
of atomic_compare_and_exchange_bool_rel.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise.
* include/atomic.h (atomic_compare_and_exchange_bool_rel,
catomic_compare_and_exchange_bool_rel): Remove.
* sysdeps/aarch64/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/alpha/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/arm/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/mips/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/tile/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
The x86_64 and i386 versions of scalbl return sNaN for some cases of
sNaN input and are missing "invalid" exceptions for other cases. This
results from overly complicated code that either returns a NaN input,
or discards both inputs when one is NaN and loads a NaN from memory.
This patch fixes this by simplifying the code to add the arguments
when either one is NaN.
Tested for x86_64 and x86.
[BZ #20296]
* sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Add arguments
when either argument is a NaN.
* sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* math/libm-test.inc (scalb_test_data): Add sNaN tests.
This patch adds tests of sNaN inputs to more functions to
libm-test.inc. This covers the remaining real functions except for
scalb, where there's a bug to fix, and hypot pow fmin fmax, where
there are cases where a qNaN input does not result in a qNaN output
and so sNaN support according to TS 18661-1 is more of a new feature.
Tested for x86_64 and x86.
* math/libm-test.inc (snan_value_ld): New macro.
(isgreater_test_data): Add sNaN tests.
(isgreaterequal_test_data): Likewise.
(isless_test_data): Likewise.
(islessequal_test_data): Likewise.
(islessgreater_test_data): Likewise.
(isunordered_test_data): Likewise.
(nextafter_test_data): Likewise.
(nexttoward_test_data): Likewise.
(remainder_test_data): Likewise.
(remquo_test_data): Likewise.
(significand_test_data): Likewise.
* math/gen-libm-test.pl (%beautify): Add snan_value_ld.
getconf has the capability to do a runtime check for environment
support in cases where there is optional support for an environment
(_POSIX_V7_ILP32_OFF32 on x86_64 for example) and this is indicated by
not defining the _POSIX_V7_ILP32_OFF32 macro, which results in getconf
doing an additional execve of _POSIX_V7_ILP32_OFF32 in the
$GETCONF_DIR.
The default bits/environments.h however does not leave any environment
macros undefined, which means that no such additional execve is
needed. gcc trunk catches this as a build failure since it finds that
the code block inside switch(specs[i].num) is not reachable. Avoid
this error by not bothering about the additional exec (and looking in
specific environments) when all environments are defined.
Tested on aarch64.
* posix/getconf.c: Define ALL_ENVIRONMENTS_DEFINED if all
environment macros are defined.
(main): Avoid execve if ALL_ENVIRONMENTS_DEFINED is defined.
This commit puts all libio vtables in a dedicated, read-only ELF
section, so that they are consecutive in memory. Before any indirect
jump, the vtable pointer is checked against the section boundaries,
and the process is terminated if the vtable pointer does not fall into
the special ELF section.
To enable backwards compatibility, a special flag variable
(_IO_accept_foreign_vtables), protected by the pointer guard, avoids
process termination if libio stream object constructor functions have
been called earlier. Such constructor functions are called by the GCC
2.95 libstdc++ library, and this mechanism ensures compatibility with
old binaries. Existing callers inside glibc of these functions are
adjusted to call the original functions, not the wrappers which enable
vtable compatiblity.
The compatibility mechanism is used to enable passing FILE * objects
across a static dlopen boundary, too.
If the requested size is zero, realloc returns NULL, but the
deallocation is still successful, unless the pointer is also
NULL, when realloc behaves as malloc (0).
__attribute__ ((used)) means that the function has to be
emitted in assembly because it is referenced in ways the
compiler cannot detect (such as asm statements, or some
post-processing on the generated assembly).
The unused attribute needs to come first, otherwise it is
applied to the return type and not the function definition.
The i386 implementations of nearbyint functions, and x86_64
nearbyintl, contain code to mask the "inexact" exception. However,
the fnstenv instruction has the effect of masking all exceptions, so
this masking code has been redundant since fnstenv was added to those
implementations (by commit 846d9a4a3acdb4939ca7bf6aed48f9f6f26911be;
commit 71d1b0166b added the test
math/test-nearbyint-except-2.c that verifies these functions do work
when called with "inexact" traps enabled); this patch removes the
redundant code.
Tested for x86_64 and x86.
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Do not mask
"inexact" exceptions after fnstenv.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
This file was added to sysdeps/generic/bits in 2012. This appears to
have been an oversight, as the entire sysdeps/generic/bits directory was
moved to the top level in 2005. Accordingly the generic bits/hwcap.h
belongs there too.
* sysdeps/generic/bits/hwcap.h: Moved to ...
* bits/hwcap.h: Here.
This file was added to sysdeps/generic/bits in 2012. This appears to
have been an oversight, as the entire sysdeps/generic/bits directory was
moved to the top level in 2005. Accordingly the generic bits/hwcap.h
belongs there too.
* sysdeps/generic/bits/hwcap.h: Moved to ...
* bits/hwcap.h: Here.