This patch consolidates all the powerpc llrint{f} implementations on
the generic sysdeps/powerpc/fpu/s_llrint{f}.
The IFUNC support is also restricted to big-endian powerpc64, since for
powerpc64le the generic implementation already results in optimized code.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_llrint-power8, s_llrint-power6x, and
s_llrint-ppc64.
(CFLAGS-s_llrint-power8.c, CFLAGS-s_llrint-power6x.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power6x.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-power8.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_lrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrint.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrintf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_llrintf.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_lrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/Makefile: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_llrint-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power6x.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-power8.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_llrint-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrintf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_lrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_llrint.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_llrint.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The identifier linux is used as a predefined macro, so the path
actually used is 1/stat.h or 1/stat64.h. Using the quote-based version
triggers a file lookup for /usr/include/bits/linux/stat.h (or whatever
directory is used to store bits/statx.h), but since bits/ is pretty
much reserved by glibc, this appears to be acceptable.
This is related to GCC PR 80005: incorrect macro expansion of the
argument of __has_include.
Suggested by Zack Weinberg.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This is currently ineffective with GCC because of GCC PR 80005, but
it makes sense to anticipate a fix for this defect.
Suggested by Zack Weinberg.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch adds the new constant IPV6_ROUTER_ALERT_ISOLATE from Linux
5.1 to sysdeps/unix/sysv/linux/bits/in.h.
Tested for x86_64.
* sysdeps/unix/sysv/linux/bits/in.h (IPV6_ROUTER_ALERT_ISOLATE):
New macro.
Some recent change on GCC mainline resulted in the localplt test
failing for powerpc soft-float (not sure exactly when, as the failure
appeared when there were other build test failures as well;
<https://sourceware.org/ml/libc-testresults/2019-q2/msg00261.html>
shows it remaining when other failures went away). The problem is a
call to memset that GCC now generates in the libgcc long double code.
Since memset is documented as a function GCC may always implicitly
generate calls to, it seems reasonable to allow that local PLT
reference (just like those for libgcc functions that GCC implicitly
generates calls to and that are also exported from libc.so), which
this patch does.
Tested for powerpc soft-float with build-many-glibcs.py.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/localplt.data:
Allow memset in libc.so.
Avoid lazy binding of symbols that may follow a variant PCS with different
register usage convention from the base PCS.
Currently the lazy binding entry code does not preserve all the registers
required for AdvSIMD and SVE vector calls. Saving and restoring all
registers unconditionally may break existing binaries, even if they never
use vector calls, because of the larger stack requirement for lazy
resolution, which can be significant on an SVE system.
The solution is to mark all symbols in the symbol table that may follow
a variant PCS so the dynamic linker can handle them specially. In this
patch such symbols are always resolved at load time, not lazily.
As a result, LD_AUDIT is currently not supported for variant PCS symbols.
Supporting it would require changing the _dl_runtime_profile entry, e.g. to
unconditionally save/restore all registers (but pass down arg and retval
registers to pltentry/exit callbacks according to the base PCS).
This patch also removes a __builtin_expect from the modified code because
the branch prediction hint did not seem useful.
* sysdeps/aarch64/dl-dtprocnum.h: New file.
* sysdeps/aarch64/dl-machine.h (DT_AARCH64): Define.
(elf_machine_runtime_setup): Handle DT_AARCH64_VARIANT_PCS.
(elf_machine_lazy_rel): Check STO_AARCH64_VARIANT_PCS and bind such
symbols at load time.
* sysdeps/aarch64/linkmap.h (struct link_map_machine): Add variant_pcs.
STO_AARCH64_VARIANT_PCS is a non-visibility st_other flag for marking
symbols that reference functions that may follow a variant PCS with
different register usage convention from the base PCS.
DT_AARCH64_VARIANT_PCS is a dynamic tag that marks ELF modules that
have R_*_JUMP_SLOT relocations for symbols marked with
STO_AARCH64_VARIANT_PCS (i.e. have variant PCS calls via a PLT).
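The st_other check described above can be sketched as follows. This is
only an illustration, not the dynamic linker code: the helper name
must_bind_at_load_time is hypothetical, and the flag value 0x80 is an
assumption stated here for the sake of a self-contained example.

```c
#include <stdbool.h>

/* Assumed value of the non-visibility st_other flag; defined locally
   so the sketch does not depend on the installed <elf.h>.  */
#ifndef STO_AARCH64_VARIANT_PCS
# define STO_AARCH64_VARIANT_PCS 0x80
#endif

/* Hypothetical helper: return true if a symbol with the given st_other
   byte may follow a variant PCS and so should not be bound lazily.  */
static bool
must_bind_at_load_time (unsigned char st_other)
{
  return (st_other & STO_AARCH64_VARIANT_PCS) != 0;
}
```

The low bits of st_other carry the symbol visibility, so the test masks
only the variant PCS bit and ignores the rest.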
* elf/elf.h (STO_AARCH64_VARIANT_PCS): Define.
(DT_AARCH64_VARIANT_PCS): Define.
The powerpc finite optimizations do not show much gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 implementation uses ftdiv to optimize for some input
patterns, but at the cost of others. Comparing against the generic C
implementation built
for powerpc64-linux-gnu-power7 (--with-cpu=power7):
- Generic sysdeps/ieee754 implementation:
"isfinite": {
"": {
"duration": 5.0082e+09,
"iterations": 2.45299e+09,
"max": 43.824,
"min": 2.008,
"mean": 2.04167
},
"INF": {
"duration": 4.66554e+09,
"iterations": 2.28288e+09,
"max": 35.73,
"min": 2.008,
"mean": 2.04371
},
"NAN": {
"duration": 4.66274e+09,
"iterations": 2.28716e+09,
"max": 34.161,
"min": 2.009,
"mean": 2.03866
}
}
- power7 optimized one:
"isfinite": {
"": {
"duration": 4.99111e+09,
"iterations": 2.65566e+09,
"max": 25.015,
"min": 1.716,
"mean": 1.87942
},
"INF": {
"duration": 4.6783e+09,
"iterations": 2.0999e+09,
"max": 35.264,
"min": 1.868,
"mean": 2.22787
},
"NAN": {
"duration": 4.67915e+09,
"iterations": 2.08678e+09,
"max": 38.099,
"min": 1.869,
"mean": 2.24228
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kinds of FP values.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
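For reference, a generic integer-only finite check can be sketched as
below. This is a minimal sketch assuming IEEE 754 binary64 doubles; the
name isfinite_sketch is hypothetical and the code is not the exact
sysdeps/ieee754 source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: a double is finite unless all exponent bits
   are set (which encodes INF or NaN).  */
static int
isfinite_sketch (double x)
{
  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);   /* Move the FP bits to an integer.  */
  return (ix & UINT64_C (0x7ff0000000000000))
	 != UINT64_C (0x7ff0000000000000);
}
```

The only FP-specific step is the transfer from a floating-point to a
general register, which is where an instruction such as mfvsrd helps.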
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_finite*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_finite* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power7.S: Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_finitef.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finite.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_finitef.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
- math.h uses the compiler builtin with GCC 4.4 when built without
-fsignaling-nans, and the builtin is expanded inline on all
supported architectures. As an example, there is no internal finite
call in libm on the architectures I checked: x86, arm, aarch64,
and powerpc.
- The resulting binary difference on 32-bit architectures is minimal
for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated
implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_finite.c: Move to ...
* sysdeps/ieee754/dbl-64/s_finite.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The powerpc isinf optimizations only add complexity:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power7 implementation uses ftdiv to optimize for some input
patterns and branches to a separate path for INF and denormal inputs
that does:
return (ix & UINT64_C (0x7fffffffffffffff)) == UINT64_C (0x7ff0000000000000)
Although it shows slightly better latency than the generic algorithm,
the gain applies only to power7, and power8 has to override it.
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
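The comparison quoted above can be wrapped into a complete function as
follows. This is a minimal sketch assuming IEEE 754 binary64 doubles;
the name isinf_sketch is hypothetical, and it returns a plain boolean
rather than a signed result.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: clear the sign bit and compare against the
   bit pattern of +INF (all exponent bits set, zero mantissa).  */
static int
isinf_sketch (double x)
{
  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);   /* Move the FP bits to an integer.  */
  return (ix & UINT64_C (0x7fffffffffffffff))
	 == UINT64_C (0x7ff0000000000000);
}
```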
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isinf* and s_isinf*
objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-power7.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_call):
Remove s_isinf* and s_isinf* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isinff.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isinff.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
- math.h uses the compiler builtin with GCC 4.4 when built without
-fsignaling-nans, and the builtin is expanded inline on all
supported architectures. As an example, there is no internal isinf
call in libm on the architectures I checked: x86, arm, aarch64,
and powerpc.
- The resulting binary difference on 32-bit architectures is minimal
for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated
implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_isinf.c: Move to ...
* sysdeps/ieee754/dbl-64/s_isinf.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
The powerpc isnan optimizations are not really a gain:
- GCC will call libm iff -fsignaling-nans is used. This usage pattern
is usually not performance oriented and for such calls PLT overhead
should dominate execution time.
- The power5, power6, and power6x implementations are just
micro-optimizations to reduce the Load-Hit-Store hazards of
floating-point to general register transfers, and current GCC already
has support to minimize them by inserting either extra nops or group
dispatch instructions.
- The power7 implementation uses ftdiv to optimize for some input
patterns, but at the cost of others. Comparing against the generic C
implementation built
for powerpc-linux-gnu-power4 (which uses the hp-timing support on
benchtests):
- Generic sysdeps/ieee754 implementation:
"isnan": {
"": {
"duration": 4.98415e+09,
"iterations": 2.34516e+09,
"max": 45.925,
"min": 2.052,
"mean": 2.12529
},
"INF": {
"duration": 4.74057e+09,
"iterations": 1.69761e+09,
"max": 91.01,
"min": 2.052,
"mean": 2.79249
},
"NAN": {
"duration": 4.74071e+09,
"iterations": 1.68768e+09,
"max": 282.343,
"min": 2.052,
"mean": 2.809
}
}
- power7 optimized one:
$ ./testrun.sh benchtests/bench-isnan
"isnan": {
"": {
"duration": 4.96842e+09,
"iterations": 2.56297e+09,
"max": 50.048,
"min": 1.872,
"mean": 1.93854
},
"INF": {
"duration": 4.76648e+09,
"iterations": 1.54213e+09,
"max": 373.408,
"min": 2.661,
"mean": 3.09084
},
"NAN": {
"duration": 4.76845e+09,
"iterations": 1.54515e+09,
"max": 51.016,
"min": 2.736,
"mean": 3.08607
}
}
So it basically optimizes marginally for normal numbers while
increasing the latency for other kinds of FP values.
- The generic implementation requires reading the floating-point
status, disabling the invalid operation bit, and restoring the
floating-point status. Each operation is costly and requires
flushing the FP pipeline.
Using the same scenario as in the previous analysis:
"isnan": {
"": {
"duration": 5.08284e+09,
"iterations": 6.2898e+08,
"max": 41.844,
"min": 8.057,
"mean": 8.08108
},
"INF": {
"duration": 4.97904e+09,
"iterations": 6.16176e+08,
"max": 39.661,
"min": 8.057,
"mean": 8.08055
},
"NAN": {
"duration": 4.98695e+09,
"iterations": 5.95866e+08,
"max": 29.728,
"min": 8.345,
"mean": 8.36925
}
}
- The power8 implementation is just the generic implementation using
ISA 2.07 mfvsrd instruction (which GCC uses for generic implementation).
So generic implementation is the best option for powerpc64le.
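An integer-only NaN check avoids touching the floating-point status
altogether. The following is a minimal sketch assuming IEEE 754
binary64 doubles; the name isnan_sketch is hypothetical and the code is
not the exact sysdeps/ieee754 source.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: with the sign bit cleared, any bit pattern
   above that of +INF is a NaN (quiet or signaling).  No FP compare is
   executed, so no invalid exception can be raised.  */
static int
isnan_sketch (double x)
{
  uint64_t ix;
  memcpy (&ix, &x, sizeof ix);   /* Move the FP bits to an integer.  */
  return (ix & UINT64_C (0x7fffffffffffffff))
	 > UINT64_C (0x7ff0000000000000);
}
```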
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_isnan.c: Remove file.
* sysdeps/powerpc/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdeps_routines, libm-sysdep_routines): Remove s_isnan-* and
s_isnanf-* objects.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power5.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-power7.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power5.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf-power6.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power5/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdep_calls):
Remove s_isnan-* and s_isnanf-* objects.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power5.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power6x.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power5/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power6x/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/fpu/s_isnanf.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnan.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/fpu/s_isnanf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
- math.h uses the compiler builtin with GCC 4.4 when built without
-fsignaling-nans, and the builtin is expanded inline on all
supported architectures. As an example, there is no internal isnan
call in libm on the architectures I checked: x86, arm, aarch64,
and powerpc.
- The resulting binary difference on 32-bit architectures is minimal
for this non-hotspot symbol.
- It helps wordsize-64 architectures that use ldbl-opt.
- It simplifies the code by reducing duplicated
implementations.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/ieee754/dbl-64/wordsize-64/s_isnan.c: Move to ...
* sysdeps/ieee754/dbl-64/s_isnan.c: ... here and format code.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
* benchtests/Makefile (bench-math): Add isnan, isinf, and isfinite.
(CFLAGS-bench-isnan.c, CFLAGS-bench-isinf.c,
CFLAGS-bench-isfinite.c): New rule.
* benchtests/isnan-input: New file.
* benchtests/isinf-input: New file.
* benchtests/isfinite-input: New file.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
GCC always expands copysign{f} inline for all possible CPUs, so libm is
only called if the user explicitly disables the builtin (which is usually
not done for performance reasons). Providing an ifunc variant for
copysign is therefore just unnecessary complexity, since libm will only
be called from non-performance-critical code.
This patch removes both powerpc32 and powerpc64 ifunc variants and
consolidates the powerpc implementation on
sysdeps/powerpc/fpu/s_copysign{f}.c using compiler builtins.
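An implementation based on compiler builtins can be as small as the
sketch below. The function names are hypothetical; the sketch only
assumes the GCC builtins __builtin_copysign and __builtin_copysignf and
is not the exact content of the new files.

```c
/* Hypothetical sketch: let the compiler emit the best instruction
   sequence for the selected CPU instead of hand-written assembly.  */
static double
copysign_sketch (double x, double y)
{
  return __builtin_copysign (x, y);
}

static float
copysignf_sketch (float x, float y)
{
  return __builtin_copysignf (x, y);
}
```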
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_copysign.c: New file.
* sysdeps/powerpc/fpu/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(sysdep_routines, libm-sysdep_routines): Remove s_copysign-power6 and
s_copysign-ppc32.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysign.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_copysignf.c:
Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile (sysdeps_calls):
Remove s_copysign-power6 s_copysign-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-power6.S:
Remove file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysign.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_copysignf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_copysignf.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysign.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/fpu/s_copysignf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
This patch consolidates all the powerpc rint{f} implementations on
the generic sysdeps/powerpc/fpu/s_rint{f}.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode,
round_to_integer_float, round_mode): Add RINT handling.
(reset_fenv_mode): New symbol.
* sysdeps/powerpc/fpu/s_rint.c (__rint): Use generic implementation.
* sysdeps/powerpc/fpu/s_rintf.c (__rintf): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Now that there are no internal users of __sysctl left, it is possible
to add an unconditional deprecation warning to <sys/sysctl.h>.
To avoid a test failure due to this warning in check-install-headers,
skip the test for sys/sysctl.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch significantly improves performance of memmem using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.
Small needles up to size 2 use a dedicated linear search. Very long needles
use the Two-Way algorithm (to avoid increasing stack size or slowing down
the common case, inlining is disabled).
The performance gain is 6.6 times on English text on AArch64 using random
needles with average size 8.
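The core idea for needles of 2 to 256 bytes can be sketched as follows.
This is a simplified illustration, not the code in string/memmem.c: the
names hash_pair and horspool_pairs_sketch are hypothetical, the hash
function is just one plausible choice, and the filtering step for long
needles is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 256

/* Hash the pair of characters p[-1], p[0] (illustrative hash).  */
static size_t
hash_pair (const unsigned char *p)
{
  return ((size_t) p[0] - ((size_t) p[-1] << 3)) % TABLE_SIZE;
}

/* Hypothetical sketch: find NEEDLE in HAYSTACK, 2 <= ne_len <= 256.  */
static const void *
horspool_pairs_sketch (const void *haystack, size_t hs_len,
		       const void *needle, size_t ne_len)
{
  const unsigned char *hs = haystack, *ne = needle;
  if (ne_len < 2 || ne_len > 256 || hs_len < ne_len)
    return NULL;

  /* shift[h] is the rightmost needle index whose pair hashes to h,
     or 0 if there is none.  Indices are at most 255, so 8 bits do.  */
  uint8_t shift[TABLE_SIZE];
  size_t m1 = ne_len - 1;
  memset (shift, 0, sizeof shift);
  for (size_t i = 1; i < m1; i++)
    shift[hash_pair (ne + i)] = (uint8_t) i;
  /* Skip to apply after the last pair matched but the needle did not.  */
  size_t shift1 = m1 - shift[hash_pair (ne + m1)];
  shift[hash_pair (ne + m1)] = (uint8_t) m1;

  for (size_t pos = 0; pos <= hs_len - ne_len; )
    {
      /* Look only at the pair under the end of the current window.  */
      size_t idx = shift[hash_pair (hs + pos + m1)];
      if (idx == m1)
	{
	  if (memcmp (hs + pos, ne, ne_len) == 0)
	    return hs + pos;
	  pos += shift1;
	}
      else
	pos += m1 - idx;   /* Align the pair, or skip it when absent.  */
    }
  return NULL;
}
```

Hash collisions only make a skip shorter or trigger an extra memcmp, so
they affect speed but not correctness.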
Tested against GLIBC testsuite and randomized tests.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/memmem.c (__memmem): Rewrite to improve performance.
This patch significantly improves performance of strstr using a novel
modified Horspool algorithm. Needles up to size 256 use a bad-character
table indexed by hashed pairs of characters to quickly skip past mismatches.
Long needles use a self-adapting filtering step to avoid comparing the whole
needle repeatedly.
By limiting the needle length to 256, the shift table only requires 8 bits
per entry, lowering preprocessing overhead and minimizing cache effects.
This limit also implies worst-case performance is linear.
Small needles up to size 3 use a dedicated linear search. Very long needles
use the Two-Way algorithm.
The performance gain using the improved bench-strstr on Cortex-A72 is 5.8
times basic_strstr and 3.7 times twoway_strstr.
Tested against GLIBC testsuite, randomized tests and the GNULIB strstr test
(https://git.savannah.gnu.org/cgit/gnulib.git/tree/tests/test-strstr.c).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
* string/str-two-way.h (two_way_short_needle): Add inline to avoid
warning.
(two_way_long_needle): Block inlining.
* string/strstr.c (strstr2): Add new function.
(strstr3): Likewise.
(STRSTR): Completely rewrite strstr to improve performance.
Benchmark needles which exhibit worst-case performance. This shows that
basic_strstr is quadratic and thus unsuitable for large needles.
On the other hand the Two-way and new strstr implementations are linear with
increasing needle sizes. The slowest cases of the two implementations are
within a factor of 2 on several different microarchitectures. Two-way is
slowest on inputs which cause a branch mispredict on almost every character.
The new strstr is slowest on inputs which almost match and result in many
calls to memcmp. Thanks to Szabolcs for providing various hard needles.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
* benchtests/bench-strstr.c (test_hard_needle): New function.
GCC mainline has recently added warn_unused_result attributes to some
malloc-like built-in functions, where glibc previously had them in its
headers only for __USE_FORTIFY_LEVEL > 0. As a result, those
attributes are newly in effect when building the glibc testsuite, which
produces new warnings that break the build where tests deliberately
call such functions and ignore the result. This patch adds calls to
DIAG_* macros around those calls to disable the
warning.
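Outside the glibc tree, the same effect can be illustrated with the GCC
diagnostic pragmas directly. The sketch below is an assumption about the
general technique, not the patch itself: it does not use the internal
DIAG_* macros from <libc-diag.h>, and the function name
null_test_sketch is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical sketch: deliberately ignore the result of calloc
   without triggering -Wunused-result.  */
static int
null_test_sketch (void)
{
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-result"
  /* The result is intentionally discarded (and leaked); only the
     call itself is of interest to the test.  */
  calloc (0, 0);
#pragma GCC diagnostic pop
  return 0;
}
```

Keeping the push/pop pair tight around the call leaves the warning
active for the rest of the file.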
Tested with build-many-glibcs.py for aarch64-linux-gnu.
* malloc/tst-calloc.c: Include <libc-diag.h>.
(null_test): Ignore -Wunused-result around calls to calloc.
* malloc/tst-mallocfork.c: Include <libc-diag.h>.
(do_test): Ignore -Wunused-result around call to malloc.
No 32-bit system call wrapper is added because the interface
is problematic: it cannot deal with 64-bit inode numbers
and 64-bit directory hashes.
A future commit will deprecate the undocumented getdirentries
and getdirentries64 functions.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add support to use 'mffsl' instruction if compiled for POWER9 (or later).
Also, mask the result to avoid bleeding unrelated bits into the result of
_FPU_GET_RC().
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
fegetexcept() included code which exactly duplicates the code in
fenv_reg_to_exceptions(). Replace with a call to that function.
2019-06-05 Paul A. Clarke <pc@us.ibm.com>
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Replace code
with call to equivalent function.
Linux only supports the required ISA sysctls on StrongARM devices,
which are armv4 and no longer tested during glibc development
and probably bit-rotted by this point. (No reported test results,
and the last discussion of armv4 support was in the glibc 2.19
release notes.)
<asm/unistd.h> on arm defines the following macros:
#define __ARM_NR_breakpoint (__ARM_NR_BASE+1)
#define __ARM_NR_cacheflush (__ARM_NR_BASE+2)
#define __ARM_NR_usr26 (__ARM_NR_BASE+3)
#define __ARM_NR_usr32 (__ARM_NR_BASE+4)
#define __ARM_NR_set_tls (__ARM_NR_BASE+5)
#define __ARM_NR_get_tls (__ARM_NR_BASE+6)
These do not follow the regular __NR_* naming convention and
have so far been ignored by the syscall-names.list consistency
checks. This commit adds these names to the file, preparing
for the availability of these names in the regular __NR_*
namespace.
Since GCC commit 271500 (svn), also known as the following commit on the
git mirror:
commit 61edec870f9fdfb5df3fa4e40f28cbaede28a5b1
Author: amodra <amodra@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Wed May 22 04:34:26 2019 +0000
[RS6000] Don't pass -many to the assembler
glibc builds are failing when an assembly implementation does not
declare the correct '.machine' directive, or when no such directive is
declared at all. For example, when a POWER6 instruction is used, but
'.machine power6' is not declared, the assembler will fail with an error
similar to the following:
../sysdeps/powerpc/powerpc64/power8/strcmp.S: Assembler messages:
../sysdeps/powerpc/powerpc64/power8/strcmp.S:55: Error: unrecognized opcode: `cmpb'
This patch adds '.machine powerN' directives where none existed and
updates '.machine power7' directives in POWER8 files, because the
minimum binutils version required to build glibc (binutils 2.25) now
provides this machine version. It also adds '-many' to the assembler
command used to build tst-set_ppr.c.
Tested for powerpc, powerpc64, and powerpc64le, as well as with
build-many-glibcs.py for powerpc targets.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Commit 6e8ba7fd57 was meant to remove all get_clockfreq.c files. This
patch removes the remaining files for sparcv9 and x86_64.
Checked with builds for x86_64-linux-gnu and sparcv9-linux-gnu.
* sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/get_clockfreq.c:
Remove file.
* sysdeps/unix/sysv/linux/x86_64/get_clockfreq.c: Likewise.
This commit fixes some errors and converts all month names to lowercase.
The content is synchronized with CLDR-35.1 now but trailing dots are
removed from abmon values in order to maintain consistency with the
previous values and with many other locales which do the same.
[BZ #24369]
* localedata/locales/tt_RU (mon): Update from CLDR-35.1, fix errors.
(abmon): Likewise, but remove the trailing dots.
This patch adds the new F_SEAL_FUTURE_WRITE constant from Linux 5.1 to
bits/fcntl-linux.h.
Tested for x86_64.
* sysdeps/unix/sysv/linux/bits/fcntl-linux.h [__USE_GNU]
(F_SEAL_FUTURE_WRITE): New macro.
This test corrupts /var/cache/ldconfig/aux-cache and executes ldconfig
to check it will not segfault using the corrupted aux_cache. The test
uses the test-in-container framework. Verified no regressions on
x86_64.
Improve string benchtest timing. Many tests run for 0.01s which is way too
short to give accurate results. Other tests take over 40 seconds which is
way too long. Significantly increase the iterations of the short running
tests. Reduce number of alignment variations in the long running memcpy walk
tests so they take less than 5 seconds.
As a result most tests take at least 0.1s and all finish within 5 seconds.
* benchtests/bench-memcpy-random.c (do_one_test): Use medium iterations.
* benchtests/bench-memcpy-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-memmem.c (do_one_test): Use small iterations.
* benchtests/bench-memmove-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-memset-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-strcasestr.c (do_one_test): Use small iterations.
* benchtests/bench-string.h (INNER_LOOP_ITERS): Increase iterations.
(INNER_LOOP_ITERS_MEDIUM): New define.
(INNER_LOOP_ITERS_SMALL): New define.
* benchtests/bench-strpbrk.c (do_one_test): Use medium iterations.
* benchtests/bench-strsep.c (do_one_test): Use small iterations.
* benchtests/bench-strspn.c (do_one_test): Use medium iterations.
* benchtests/bench-strstr.c (do_one_test): Use small iterations.
* benchtests/bench-strtok.c (do_one_test): Use small iterations.
This patch adds the missing SEMTIMEDOP_IPC_ARGS definitions to the powerpc
and sparc ipc_priv.h.
Checked on powerpc64le-linux-gnu and with a build for sparc64-linux-gnu.
* sysdeps/unix/sysv/linux/powerpc/ipc_priv.h (SEMTIMEDOP_IPC_ARGS):
New define.
* sysdeps/unix/sysv/linux/sparc/sparc64/ipc_priv.h
(SEMTIMEDOP_IPC_ARGS): Likewise.
struct gconv_fcts for the C locale is statically allocated,
and __gconv_close_transform deallocates the steps object.
Therefore this commit introduces __wcsmbs_close_conv to avoid
freeing the statically allocated steps objects.
The codecvt vtable is not a real vtable because it also contains the
conversion state data. Furthermore, wide stream support was added to
GCC 3.0, after a C++ ABI bump, so there is no compatibility
requirement with libstdc++.
This change removes several unmangled function pointers which could
be used with a corrupted FILE object to redirect execution. (libio
vtable verification did not cover the codecvt vtable.)
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The original implementations of test_timespec_before_impl and
test_timespec_equal_or_after in 5198399651
were missing the backslash required for a newline.
Checked on x86_64-linux-gnu.
* support/timespec.c: Add backslash to correct newline in failure
message.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch consolidates the s390-32 semtimedop implementation by defining
an arch-specific SEMTIMEDOP_IPC_ARGS to rearrange the arguments as expected
by the s390 Linux kABI. The idea is to avoid having to change multiple
semtimedop implementations for the change that enables the Linux v5.1
wired-up sysvipc support.
Checked on s390-linux-gnu and s390x-linux-gnu, and by checking that the
resulting semtimedop objects did not change.
* sysdeps/unix/sysv/linux/ipc_priv.h (SEMTIMEDOP_IPC_ARGS): New
define.
* sysdeps/unix/sysv/linux/s390/ipc_priv.h: New file.
* sysdeps/unix/sysv/linux/s390/semtimedop.c: Remove file.
* sysdeps/unix/sysv/linux/semtimedop.c (semtimedop): Use
SEMTIMEDOP_IPC_ARGS for calls with __NR_ipc.
The __IPC_64 flag is meant to be used to enable the new sysv struct
format when the architecture supports it (ARCH_WANT_IPC_PARSE_VERSION
config flag on the Linux kernel).
This issue currently affects only alpha.
[BZ #24570]
* sysdeps/unix/sysv/linux/msgctl.c (__old_msgctl): Remove __IPC_64
usage.
This patch adds the new NT_ARM_PACA_KEYS and NT_ARM_PACG_KEYS from
Linux 5.1 to glibc's elf.h.
Tested for x86_64.
* elf/elf.h (NT_ARM_PACA_KEYS): New macro.
(NT_ARM_PACG_KEYS): Likewise.
Change the tcache->counts[] entries to uint16_t - this removes
the limit set by char and allows a larger tcache. Remove a few
redundant asserts.
bench-malloc-thread with 4 threads is ~15% faster on Cortex-A72.
Reviewed-by: DJ Delorie <dj@redhat.com>
* malloc/malloc.c (MAX_TCACHE_COUNT): Increase to UINT16_MAX.
(tcache_put): Remove redundant assert.
(tcache_get): Remove redundant asserts.
(__libc_malloc): Check tcache count is not zero.
* manual/tunables.texi (glibc.malloc.tcache_count): Update maximum.
Linux 5.1 adds missing syscalls to the syscall table for many Linux
kernel architectures. This patch updates the kernel-features.h
headers accordingly. __ASSUME_DIRECT_SYSVIPC_SYSCALLS is not updated
because of the differences between new and old syscalls described in
<https://sourceware.org/ml/libc-alpha/2019-05/msg00235.html>. The
statfs64 structure used by alpha matches what the new kernel syscalls
use.
Tested with build-many-glibcs.py.
* sysdeps/unix/sysv/linux/alpha/kernel-features.h
(__ASSUME_STATFS64): Only undefine if [__LINUX_KERNEL_VERSION <
0x050100].
* sysdeps/unix/sysv/linux/ia64/kernel-features.h (__ASSUME_STATX):
Likewise.
* sysdeps/unix/sysv/linux/sh/kernel-features.h
(__ASSUME_STATX): Likewise.
Provide an explicit diagnostic if the length is positive, and
do not just crash with a null pointer dereference. Null pointers
are only valid if the length is zero, so this can only happen with
a faulty test.
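A minimal sketch of the check described above, assuming a hypothetical helper named sketch_check_blob (this is not the actual test-support code): a null pointer is accepted only together with a zero length, and any other combination produces a diagnostic instead of a crash.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not the libsupport implementation.
   Returns 1 if (PTR, LENGTH) is a valid blob description.  Prints a
   diagnostic and returns 0 if a null pointer comes with a positive
   length.  */
static int
sketch_check_blob (const char *what, const void *ptr, size_t length)
{
  if (ptr == NULL && length > 0)
    {
      fprintf (stderr, "error: %s is NULL but its length is %zu\n",
               what, length);
      return 0;
    }
  return 1;
}
```

A caller would run this check on both operands before dereferencing either of them.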
dlerror.c (__dlerror_main_freeres) will try to free resources which only
have been initialized when init () has been called. That function is
called when resources are needed using __libc_once (once, init) where
once is a __libc_once_define (static, once) in the dlerror.c file.
Trying to free those resources if init () hasn't been called will
produce errors under valgrind memcheck. So guard the freeing of those
resources using __libc_once_get (once) and make sure we have a valid
key. Also add a similar guard to __dlerror ().
* dlfcn/dlerror.c (__dlerror_main_freeres): Guard using
__libc_once_get (once) and static_buf == NULL.
(__dlerror): Check we have a valid key, set result to static_buf
otherwise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
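The guard described in this entry can be sketched as follows. All names are hypothetical stand-ins (a plain flag plays the role of __libc_once_get (once)), so this only illustrates the pattern and is not the dlerror.c code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the state in dlerror.c.  */
static bool sketch_initialized;   /* Plays the role of __libc_once_get (once).  */
static char *sketch_buffer;       /* Resource that only exists after init.  */

static void
sketch_init (void)
{
  if (!sketch_initialized)
    {
      sketch_buffer = malloc (64);
      sketch_initialized = true;
    }
}

/* Returns true if resources were released, false if there was nothing
   to release because initialization never ran.  */
static bool
sketch_freeres (void)
{
  if (!sketch_initialized)
    return false;
  free (sketch_buffer);
  sketch_buffer = NULL;
  sketch_initialized = false;
  return true;
}
```

Calling sketch_freeres before sketch_init is then a harmless no-op, which is the property that keeps valgrind memcheck quiet.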
When computing the length of the converted part of the stdio buffer, use
the number of consumed wide characters, not the (negative) distance to the
end of the wide buffer.
The function uses the internal service_user type, so it is not
really usable from the outside of glibc. Rename the function
to __nss_database_lookup2 for internal use, and change
__nss_database_lookup to always indicate failure to the caller.
__nss_next already was a compatibility symbol. The new
implementation always fails and no longer calls __nss_next2.
unscd, the alternative nscd implementation, does not use
__nss_database_lookup, so it is not affected by this change.
Commit ba7b4d294b ("Complete the
removal of __gconv_translit_find") added a declaration of the
GLIBC_PRIVATE function, __gconv_transliterate, to the installed
header <gconv.h>. It should have been added to the internal
<gconv_int.h> header.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The tgkill function is sometimes used in crash handlers.
<bits/signal_ext.h> follows the same approach as <bits/unistd_ext.h>
(which was added for the gettid system call wrapper).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Use a new libsupport support_bindir_prefix instead of a hardcoded
/usr/bin to create the pldd path on container directory.
Checked on x86_64-linux-gnu with default and non-default --prefix and
--bindir paths, as well with --enable-hardcoded-path-in-tests.
[BZ #24544]
* elf/tst-pldd.c (do_test): Use support_bindir_prefix instead of
pre-defined value.
Reviewed-by: DJ Delorie <dj@redhat.com>
This allows setting the path using --bindir. Checked on x86_64-linux-gnu
with a non-default --bindir and checked resulting installed binaries
(pldd for instance).
* config.make.in (bindir): New variable.
Reviewed-by: DJ Delorie <dj@redhat.com>
This patch removes the arch-specific x86 assembly implementation for
low level locking and consolidates both the 64-bit and 32-bit variants
in a single implementation.
Unlike other architectures, the x86 lll_trylock, lll_lock, and
lll_unlock implement a single-thread optimization to avoid atomic
operations, using cmpxchgl instead. This patch implements the
optimization in a generic way using the new single-thread.h
definitions, while keeping the previous semantics.
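The single-thread optimization can be illustrated with a small C11 sketch. The function names and the single_thread parameter are assumptions made for illustration (the parameter stands in for a SINGLE_THREAD_P-style check); this is not the lowlevellock.h code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Returns 0 if the lock was acquired, nonzero if it was already held.
   SINGLE_THREAD is a hypothetical flag standing in for the
   single-thread check.  */
static int
sketch_trylock (atomic_int *lock, bool single_thread)
{
  if (single_thread)
    {
      /* No other thread can race with us, so a relaxed load and store
         are enough and no atomic read-modify-write is needed.  */
      if (atomic_load_explicit (lock, memory_order_relaxed) != 0)
        return 1;
      atomic_store_explicit (lock, 1, memory_order_relaxed);
      return 0;
    }
  /* Multi-threaded case: acquire with an atomic compare-and-exchange.  */
  int expected = 0;
  return !atomic_compare_exchange_strong_explicit (lock, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static void
sketch_unlock (atomic_int *lock)
{
  atomic_store_explicit (lock, 0, memory_order_release);
}
```

Both paths leave the lock word in the same state, so a process can switch from the single-threaded path to the atomic one once a second thread exists.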
The lll_cond_trylock, lll_cond_lock, and lll_timedlock just use
atomic operations plus calls to lll_lock_wait*.
For __lll_lock_wait_private and __lll_lock_wait the generic implementation
is used, since there is no indication that an assembly implementation is
required performance-wise.
Checked on x86_64-linux-gnu and i686-linux-gnu.
* sysdeps/nptl/lowlevellock.h (__lll_trylock): New macro.
(lll_trylock): Call __lll_trylock.
* sysdeps/unix/sysv/linux/i386/libc-lowlevellock.S: Remove file.
* sysdeps/unix/sysv/linux/i386/lll_timedlock_wait.c: Likewise.
* sysdeps/unix/sysv/linux/i386/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/i386/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/libc-lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lll_timedlock_wait.c: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h: Likewise.
* sysdeps/unix/sysv/linux/x86/lowlevellock.h: New file.
* sysdeps/unix/sysv/linux/x86_64/cancellation.S: Include
lowlevellock-futex.h.
Since hppa is no longer an outlier regarding the LLL_LOCK_INITIALIZER value,
we can now assume it is 0 for all architectures.
Checked on a build for all major ABIs.
* nptl/nptl-init.c (__pthread_initialize_minimal_internal): Remove
initialization for LLL_LOCK_INITIALIZER different than 0.
* nptl/old_pthread_cond_broadcast.c (__pthread_cond_broadcast_2_0):
Assume LLL_LOCK_INITIALIZER being 0.
* nptl/old_pthread_cond_signal.c (__pthread_cond_signal_2_0): Likewise.
* nptl/old_pthread_cond_timedwait.c (__pthread_cond_timedwait_2_0):
Likewise.
* nptl/old_pthread_cond_wait.c (__pthread_cond_wait_2_0): Likewise.
* sysdeps/nptl/libc-lockP.h (__libc_lock_define_initialized): Likewise.
This patch moves the single-thread syscall optimization definitions from
sysdep-cancel.h to the new header file single-thread.h and also moves the
cancellation definitions from pthreadP.h to sysdep-cancel.h.
The idea is just to simplify the inclusion of both sysdep-cancel.h and
single-thread.h (without the requirement of including all pthreadP.h
definitions).
No semantic changes expected, checked on a build for all major ABIs.
* nptl/pthreadP.h (CANCEL_ASYNC, CANCEL_RESET, LIBC_CANCEL_ASYNC,
LIBC_CANCEL_RESET, __libc_enable_asynccancel,
__libc_disable_asynccancel, __librt_enable_asynccancel,
__librt_disable_asynccancel): Move to ...
* sysdeps/unix/sysv/linux/sysdep-cancel.h: ... here.
(SINGLE_THREAD_P, RTLD_SINGLE_THREAD_P): Move to ...
* sysdeps/unix/sysv/linux/single-thread.h: ... here.
* sysdeps/generic/single-thread.h: New file.
* sysdeps/unix/sysdep.h: Include single-thread.h.
* sysdeps/unix/sysv/linux/futex-internal.h: Include sysdep-cancel.h.
* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
Unicode 12.1.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 12.1.0, using
the generator scripts contributed by Mike FABIAN (Red Hat).
Some info about the number of characters added or changed:
Total added characters in newly generated CHARMAP: 1
added: <U32FF> /xe3/x8b/xbf SQUARE ERA NAME REIWA
Total added characters in newly generated WIDTH: 1
added: <U32FF> 2 : eaw=W category=So bidi=L name=SQUARE ERA NAME REIWA
graph: Added 1 characters in new ctype which were not in old ctype
graph: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
print: Added 1 characters in new ctype which were not in old ctype
print: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
punct: Added 1 characters in new ctype which were not in old ctype
punct: Added: ㋿ U+32FF SQUARE ERA NAME REIWA
The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.
[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.
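A sketch of the bounds check described in this entry. The limit and the names are illustrative assumptions (an 8-bit counter with a matching maximum), not the values or identifiers used in malloc.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative limit: the largest value an 8-bit per-bin counter can
   hold.  The real limit depends on the type of the counts[] entries.  */
#define SKETCH_MAX_COUNT UINT8_MAX

/* Current setting; the initial value is arbitrary for this sketch.  */
static size_t sketch_tcache_count = 7;

/* Returns 1 if VALUE was accepted, 0 if it was ignored because a
   per-bin counter could not represent it.  */
static int
sketch_set_tcache_count (size_t value)
{
  if (value > SKETCH_MAX_COUNT)
    return 0;
  sketch_tcache_count = value;
  return 1;
}
```

Rejecting the value when the tunable is set keeps the hot path free of overflow checks while a bin is being filled.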
This patch prints timespec members as intmax_t instead of long int.
It avoids the -Werror=format= build issue on x32:
timespec.c: In function 'test_timespec_before_impl':
timespec.c:32:23: error: format '%ld' expects argument of type 'long int',
but argument 4 has type '__time_t' {aka 'const long long int'} [-Werror=format=]
Checked on x86_64-linux-gnu-x32, x86_64-linux-gnu, and i686-linux-gnu.
* support/timespec.c (test_timespec_before_impl,
test_timespec_equal_or_after_impl): Print timespec members as intmax_t
instead of long int.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
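The portable formatting approach can be sketched as below. The helper name is hypothetical; the point is only the cast to intmax_t combined with the %jd conversion, which works whether time_t is long int or long long int.

```c
#include <inttypes.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: formats TS as "<sec>s <nsec>ns" into BUF and
   returns the snprintf result.  Casting to intmax_t and using %jd
   avoids assuming a particular width for tv_sec or tv_nsec.  */
static int
sketch_format_timespec (char *buf, size_t size, struct timespec ts)
{
  return snprintf (buf, size, "%jds %jdns",
                   (intmax_t) ts.tv_sec, (intmax_t) ts.tv_nsec);
}
```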
Checked on x86_64-linux-gnu and i686-linux-gnu.
* nptl/tst-rwlock6.c: Use libsupport. This also happens to fix a
small bug where only tv.tv_usec was checked which could cause an
erroneous pass if pthread_rwlock_timedrdlock incorrectly took more
than a second.
* nptl/tst-rwlock7.c, nptl/tst-rwlock9.c, nptl/tst-rwlock14.c: Use
libsupport.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
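The bug mentioned for tst-rwlock6 comes from comparing only the sub-second field. A minimal sketch of a full comparison, using a hypothetical helper name rather than the libsupport macros:

```c
#include <stdbool.h>
#include <time.h>

/* Returns true if A is strictly earlier than B.  Seconds are compared
   first; the nanosecond field only decides when the seconds are equal,
   so a difference of more than one second can never be missed.  */
static bool
sketch_timespec_before (struct timespec a, struct timespec b)
{
  if (a.tv_sec != b.tv_sec)
    return a.tv_sec < b.tv_sec;
  return a.tv_nsec < b.tv_nsec;
}
```

With a = 1.9s and b = 3.1s, a check on the sub-second field alone would wrongly claim that b is earlier, while the comparison above orders them correctly.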
Checked on x86_64-linux-gnu and i686-linux-gnu.
* nptl/tst-sem5.c (do_test): Use xclock_gettime, timespec_add and
TEST_TIMESPEC_NOW_OR_AFTER from libsupport.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It adds useful functions for tests that use struct timespec.
Checked on x86_64-linux-gnu and i686-linux-gnu.
* support/timespec.h: New file. Provide timespec helper functions
along with macros in the style of those in check.h.
* support/timespec.c: New file. Implement check functions declared
in support/timespec.h.
* support/timespec-add.c: New file from gnulib containing
timespec_add implementation that handles overflow.
* support/timespec-sub.c: New file from gnulib containing
timespec_sub implementation that handles overflow.
* support/README: Mention timespec.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Don't run nptl/tst-eintr1 by normal make check because it can spuriously
break testing on various linux kernels. (Currently this affects the
aarch64 glibc buildbot machine which regularly fails and loses test
results.)
[BZ #24537]
* nptl/Makefile: Move tst-eintr1 to xtests.
This patch consolidates all the powerpc trunc{f} implementations on
the generic sysdeps/powerpc/fpu/s_trunc{f}. The generic implementation
uses either the compiler builtins for ISA 2.03+ (which generate the
friz instruction) or a generic implementation which uses FP-only
operations.
The IFUNC organization for powerpc64 is also changed to be enabled only
for powerpc64 and not for powerpc64le (since its minimum ISA of 2.08 does
not require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/trunc_to_integer.h (set_fenv_mode): Add
TRUNC handling.
(round_mode): Add definition for TRUNC.
* sysdeps/powerpc/fpu/s_trunc.c: New file.
* sysdeps/powerpc/fpu/s_truncf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_trunc.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_truncf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
(CFLAGS-s_trunc-power5+.c, CFLAGS-s_truncf-power5+.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_trunc.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_truncf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_trunc-power5+, s_trunc-ppc64,
s_truncf-power5+, and s_truncf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_trunc.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_truncf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patch consolidates all the powerpc round{f} implementations on
the generic sysdeps/powerpc/fpu/s_round{f}. The generic implementation
uses either the compiler builtins for ISA 2.03+ (which generate the
frin instruction) or a generic implementation which uses FP-only
operations.
The IFUNC organization for powerpc64 is also changed to be enabled only
for powerpc64 and not for powerpc64le (since its minimum ISA of 2.08 does
not require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode): Add
ROUND handling.
(round_mode): Add definition for ROUND.
(round_to_integer_float): Likewise.
* sysdeps/powerpc/fpu/s_round.c: New file.
* sysdeps/powerpc/fpu/s_roundf.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round-power5+.c: New
file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_round.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_roundf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
(CFLAGS-s_round-power5+.c, CFLAGS-s_roundf-power5+.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_round.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_roundf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_round-power5+, s_round-ppc64,
s_roundf-power5+, and s_roundf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_round.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_roundf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
This patch consolidates all the powerpc floor{f} implementations on
the generic sysdeps/powerpc/fpu/s_floor{f}. The generic implementation
uses either the compiler builtins for ISA 2.03+ (which generate the
frim instruction) or a generic implementation which uses FP-only
operations.
The IFUNC organization for powerpc64 is also changed to be enabled only
for powerpc64 and not for powerpc64le (since its minimum ISA of 2.08 does
not require the fallback generic implementation).
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/round_to_integer.h (set_fenv_mode):
Add FLOOR option.
(round_mode): Add definition for FLOOR.
* sysdeps/powerpc/fpu/s_floor.c: New file.
* sysdeps/powerpc/fpu/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floor.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_floorf.S: Remove file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile
(libm-sysdep_routines): Add s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
(CFLAGS-s_floor-power5+.c, CFLAGS-s_floorf-power5+.c): New rule.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor-ppc64.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floor.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_floorf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_floor-power5+, s_floor-ppc64,
s_floorf-power5+, and s_floorf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor-ppc64.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf-ppc64.S:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floor.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_floorf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
* support/xclock_gettime.c (xclock_gettime): New file. Provide
clock_gettime wrapper for use in tests that fails the test rather
than returning failure.
* support/xtime.h: New file to declare xclock_gettime.
* support/Makefile: Add xclock_gettime.c.
* support/README: Mention xtime.h.
This synchronization method has a lower overhead and makes
it more likely that the signal arrives during one of the critical
functions.
Also test for fork deadlocks explicitly.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This patch updates syscall-names.list for Linux 5.1 (which has many
new syscalls, mainly but not entirely ones for 64-bit time).
Tested with build-many-glibcs.py (before the revert of the move to
Linux 5.1 there; verified there were no tst-syscall-list failures).
* sysdeps/unix/sysv/linux/syscall-names.list: Update kernel
version to 5.1.
(clock_adjtime64): New syscall.
(clock_getres_time64): Likewise.
(clock_gettime64): Likewise.
(clock_nanosleep_time64): Likewise.
(clock_settime64): Likewise.
(futex_time64): Likewise.
(io_pgetevents_time64): Likewise.
(io_uring_enter): Likewise.
(io_uring_register): Likewise.
(io_uring_setup): Likewise.
(mq_timedreceive_time64): Likewise.
(mq_timedsend_time64): Likewise.
(pidfd_send_signal): Likewise.
(ppoll_time64): Likewise.
(pselect6_time64): Likewise.
(recvmmsg_time64): Likewise.
(rt_sigtimedwait_time64): Likewise.
(sched_rr_get_interval_time64): Likewise.
(semtimedop_time64): Likewise.
(timer_gettime64): Likewise.
(timer_settime64): Likewise.
(timerfd_gettime64): Likewise.
(timerfd_settime64): Likewise.
(utimensat_time64): Likewise.
The performance improvement is about 20%-30% for
larger cases and about 1%-5% for smaller cases.
Used SIMD load/store instead of GPR for large
overlapping forward moves.
Reused existing memcpy implementation for smaller
or overlapping backward moves.
Fixed the existing memcpy implementation to allow it
to deal with the overlapping case.
Simplified loop tails in the memcpy implementation -
use branchless overlapping sequence of fixed length
load/stores instead of branching depending on the
size.
A cleanup/optimization converting str's to stp's.
Added __memmove_thunderx2 to the list of the
available implementations.
The elf/tst-pldd (added by 1a4c27355e to fix BZ#18035) test does
not expect the hardcoded paths that are output by pldd when the test
is built with --enable-hardcoded-path-in-tests. Instead of showing
the ABI installed library names for loader and libc (such as
ld-linux-x86-64.so.2 and libc.so.6 for x86_64), pldd shows the default
built ld.so and libc.so.
It makes the tests fail with an invalid expected loader/libc name.
This patch fixes the elf/tst-pldd test by adding the canonical ld.so and
libc.so names in the expected list of possible outputs when parsing
the result output from pldd. The test now handles both default
build and --enable-hardcoded-path-in-tests option.
Checked on x86_64-linux-gnu (built with and without
--enable-hardcoded-path-in-tests) and i686-linux-gnu.
* elf/tst-pldd.c (in_str_list): New function.
(do_test): Add default names for ld and libc as one option.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The twalk function is very difficult to use in a multi-threaded
program because there is no way to pass external state to the
iterator function.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Keep these functions compatible with Gnulib while adding
__time64_t support. The basic idea is to move private API
declarations from include/time.h to time/mktime-internal.h, since
the former file cannot easily be shared with Gnulib whereas the
latter can.
Also, do some other minor cleanup while in the neighborhood.
* include/time.h: Include stdbool.h, time/mktime-internal.h.
(__mktime_internal): Move this prototype to time/mktime-internal.h,
since Gnulib needs it.
(__localtime64_r, __gmtime64_r) [__TIMESIZE == 64]:
Move these macros to time/mktime-internal.h, since Gnulib needs them.
(__mktime64, __timegm64) [__TIMESIZE != 64]: New prototypes.
(in_time_t_range): New static function.
* posix/bits/types.h (__time64_t) [__TIMESIZE == 64 && !defined __LIBC]:
Do not define as a macro in this case, so that portable code is
less tempted to use __time64_t.
* time/mktime-internal.h: Rewrite so that it does both glibc
and Gnulib work. Include time.h if not _LIBC.
(mktime_offset_t) [!_LIBC]: Define for Gnulib.
(__time64_t, __gmtime64_r, __localtime64_r, __mktime64, __timegm64)
[!_LIBC || __TIMESIZE == 64]: New macros, mostly moved here
from include/time.h.
(__gmtime_r, __localtime_r, __mktime_internal) [!_LIBC]:
New macros, taken from Gnulib.
(__mktime_internal): New prototype, moved here from include/time.h.
* time/mktime.c (mktime_min, mktime_max, convert_time)
(ranged_convert, __mktime_internal, __mktime64):
* time/timegm.c (__timegm64):
Use __time64_t, not time_t.
* time/mktime.c: Stop worrying about whether time_t is floating-point.
(__mktime64) [! (_LIBC && __TIMESIZE != 64)]:
Rename from mktime.
(mktime) [_LIBC && __TIMESIZE != 64]: New function.
* time/timegm.c [!_LIBC]: Include libc-config.h, not config.h,
for libc_hidden_def.
Include errno.h.
(__timegm64) [! (_LIBC && __TIMESIZE != 64)]:
Rename from timegm.
(timegm) [_LIBC && __TIMESIZE != 64]: New function.
First cut at publicizing __time64_t
Complementing commit 4a06ceea33 ("sysdeps/ieee754/soft-fp: ignore
maybe-uninitialized with -O [BZ #19444]") and commit 27c5e756a2
("sysdeps/ieee754: prevent maybe-uninitialized errors with -O [BZ
#19444]") also fix compilation errors observed at -O1 in `__ddivl' and
`__fdivl' with GCC 9 and RISC-V targets:
In file included from ../soft-fp/soft-fp.h:318,
from ../sysdeps/ieee754/soft-fp/s_fdivl.c:27:
../sysdeps/ieee754/soft-fp/s_fdivl.c: In function '__fdivl':
../soft-fp/op-2.h:108:9: error: 'R_f1' may be used uninitialized in this function [-Werror=maybe-uninitialized]
108 | : (X##_f1 << (2*_FP_W_TYPE_SIZE - (N)))) \
| ^
../sysdeps/ieee754/soft-fp/s_fdivl.c:37:14: note: 'R_f1' was declared here
37 | FP_DECL_Q (R);
| ^
../soft-fp/op-common.h:39:3: note: in expansion of macro '_FP_FRAC_DECL_2'
39 | _FP_FRAC_DECL_##wc (X)
| ^~~~~~~~~~~~~~
../soft-fp/quad.h:226:24: note: in expansion of macro '_FP_DECL'
226 | # define FP_DECL_Q(X) _FP_DECL (2, X)
| ^~~~~~~~
../sysdeps/ieee754/soft-fp/s_fdivl.c:37:3: note: in expansion of macro 'FP_DECL_Q'
37 | FP_DECL_Q (R);
| ^~~~~~~~~
../soft-fp/op-2.h:109:8: error: 'R_f0' may be used uninitialized in this function [-Werror=maybe-uninitialized]
109 | | X##_f0) != 0)); \
| ^
../sysdeps/ieee754/soft-fp/s_fdivl.c:37:14: note: 'R_f0' was declared here
37 | FP_DECL_Q (R);
| ^
../soft-fp/op-common.h:39:3: note: in expansion of macro '_FP_FRAC_DECL_2'
39 | _FP_FRAC_DECL_##wc (X)
| ^~~~~~~~~~~~~~
../soft-fp/quad.h:226:24: note: in expansion of macro '_FP_DECL'
226 | # define FP_DECL_Q(X) _FP_DECL (2, X)
| ^~~~~~~~
../sysdeps/ieee754/soft-fp/s_fdivl.c:37:3: note: in expansion of macro 'FP_DECL_Q'
37 | FP_DECL_Q (R);
| ^~~~~~~~~
In file included from ../soft-fp/soft-fp.h:318,
from ../sysdeps/ieee754/soft-fp/s_ddivl.c:31:
../sysdeps/ieee754/soft-fp/s_ddivl.c: In function '__ddivl':
../soft-fp/op-2.h:98:25: error: 'R_f1' may be used uninitialized in this function [-Werror=maybe-uninitialized]
98 | X##_f0 = (X##_f1 << (_FP_W_TYPE_SIZE - (N)) | X##_f0 >> (N) \
| ^~
../sysdeps/ieee754/soft-fp/s_ddivl.c:41:14: note: 'R_f1' was declared here
41 | FP_DECL_Q (R);
| ^
../soft-fp/op-2.h:37:36: note: in definition of macro '_FP_FRAC_DECL_2'
37 | _FP_W_TYPE X##_f0 _FP_ZERO_INIT, X##_f1 _FP_ZERO_INIT
| ^
../soft-fp/quad.h:226:24: note: in expansion of macro '_FP_DECL'
226 | # define FP_DECL_Q(X) _FP_DECL (2, X)
| ^~~~~~~~
../sysdeps/ieee754/soft-fp/s_ddivl.c:41:3: note: in expansion of macro 'FP_DECL_Q'
41 | FP_DECL_Q (R);
| ^~~~~~~~~
../soft-fp/op-2.h:101:17: error: 'R_f0' may be used uninitialized in this function [-Werror=maybe-uninitialized]
101 | : (X##_f0 << (_FP_W_TYPE_SIZE - (N))) != 0)); \
| ^~
../sysdeps/ieee754/soft-fp/s_ddivl.c:41:14: note: 'R_f0' was declared here
41 | FP_DECL_Q (R);
| ^
../soft-fp/op-2.h:37:14: note: in definition of macro '_FP_FRAC_DECL_2'
37 | _FP_W_TYPE X##_f0 _FP_ZERO_INIT, X##_f1 _FP_ZERO_INIT
| ^
../soft-fp/quad.h:226:24: note: in expansion of macro '_FP_DECL'
226 | # define FP_DECL_Q(X) _FP_DECL (2, X)
| ^~~~~~~~
../sysdeps/ieee754/soft-fp/s_ddivl.c:41:3: note: in expansion of macro 'FP_DECL_Q'
41 | FP_DECL_Q (R);
| ^~~~~~~~~
cc1: all warnings being treated as errors
make[2]: *** [.../sysd-rules:587: .../math/s_fdivl.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1: all warnings being treated as errors
make[2]: *** [.../sysd-rules:587: .../math/s_ddivl.o] Error 1
This comes from cases in _FP_DIV that return a result described as
FP_CLS_ZERO or FP_CLS_INF and do not initialize the fractional part,
which is then operated on unconditionally in FP_TRUNC_COOKED before
being ignored by _FP_PACK_CANONICAL.
Clearly at this optimization level GCC cannot guarantee to be able to
determine that the fractional part is ultimately unused, so ignore the
error as with the earlier commits referred to, letting compilation proceed.
[BZ #19444]
* sysdeps/ieee754/soft-fp/s_ddivl.c (__ddivl): Ignore errors
from `-Wmaybe-uninitialized'.
* sysdeps/ieee754/soft-fp/s_fdivl.c (__fdivl): Likewise.
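For illustration only, the general shape of ignoring `-Wmaybe-uninitialized' around a region can be sketched with plain GCC pragmas. This is a simplified false-positive pattern, not the soft-fp code itself: the enum, the function name and the shift count are hypothetical, and the raw pragmas are an assumption standing in for whatever wrappers the real change uses.

```c
/* Hypothetical classification, loosely modelled on FP_CLS_* values.  */
enum sketch_class { SKETCH_NORMAL, SKETCH_ZERO, SKETCH_INF };

/* FRAC is only assigned for normal values and only read for normal
   values, but the compiler may be unable to prove that the two tests
   always agree and may warn about the read.  */
static unsigned long
pack_fraction (enum sketch_class cls, unsigned long input)
{
  unsigned long frac;
  unsigned long result = 0;

  if (cls == SKETCH_NORMAL)
    frac = input;

  /* Suppress the warning for this region only, then restore the
     previous diagnostic state.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
  if (cls == SKETCH_NORMAL)
    result = frac >> 3;
#pragma GCC diagnostic pop

  return result;
}
```

Keeping the ignored region as small as possible means other uninitialized-variable bugs in the same function are still reported.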
This patch consolidates all the powerpc ceil{f} implementations on
the generic sysdeps/powerpc/fpu/s_ceil{f}. The generic implementation
uses either the compiler builtins for ISA 2.03+ (which generate the frip
instruction) or a generic implementation which uses FP-only operations.
It adds a generic implementation (round_to_integer.h) which is shared
with other rounding-to-integer routines. The resulting code should be
similar in terms of performance to the previous assembly one.
The IFUNC organization for powerpc64 is also changed to be enabled only
for powerpc64 and not for powerpc64le (since its minimum ISA of 2.08 does
not require the fallback generic implementation).
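As a rough illustration of what an FP-only rounding routine can look like, the sketch below uses the well-known trick of adding and subtracting 2^52. It is an assumption made for illustration, not the contents of round_to_integer.h, and the function name is hypothetical. It assumes the default round-to-nearest mode, so it behaves like rint rather than ceil; a ceil built this way would additionally have to switch the rounding mode.

```c
/* Round X to an integral value using only floating-point additions.
   Assumes the default round-to-nearest (ties-to-even) rounding mode.  */
static double
round_to_integer_sketch (double x)
{
  /* 2^52: at or above this magnitude every double is already integral.  */
  const double two52 = 0x1p52;
  double ax = __builtin_fabs (x);

  /* NaN fails the comparison and is returned unchanged, as are large
     values and infinities.  */
  if (!(ax < two52))
    return x;

  /* The sum has no fraction bits left, so the addition itself performs
     the rounding.  volatile keeps the compiler from folding the two
     operations back into AX.  */
  volatile double t = ax + two52;
  double r = t - two52;

  /* Restore the sign, which also preserves -0.0.  */
  return __builtin_copysign (r, x);
}
```

Working on the absolute value keeps the result symmetric around zero, which is only valid for round-to-nearest; directed rounding modes need the sign handled differently.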
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/fenv_libc.h (__fesetround_inline_nocheck): New
function.
* sysdeps/powerpc/fpu/round_to_integer.h: New file.
* sysdeps/powerpc/fpu/s_ceil.c: Likewise.
* sysdeps/powerpc/fpu/s_ceilf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_ceil-power5+.c, CFLAGS-s_ceilf-power5+.c): New rule.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.S:
Remove file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.S:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-power5+.c:
New file.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf-ppc32.c:
Likewise.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceil.S: Remove file.
* sysdeps/powerpc/powerpc32/power5+/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/Makefile: New file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-power5+.c:
Likewise.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceil.c: ... here.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-power5+.c: New
file.
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf-ppc64.c:
Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Move to ...
* sysdeps/powerpc/powerpc64/be/fpu/multiarch/s_ceilf.c: ... here.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(libm-sysdep_routines): Remove s_ceil-power5+, s_ceil-ppc64,
s_ceilf-power5+, and s_ceilf-ppc64.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-power5+.S: Remove
file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-power5+.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceil.S: Likewise.
* sysdeps/powerpc/powerpc64/power5+/fpu/s_ceilf.S: Likewise.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
Except for the following functions, the NPTL implementations assume that
the sem_t argument (or other arguments) is not NULL, so they would benefit
from having the nonnull attribute.
- sem_close(): can cope with a NULL sem_t and returns -1 with error EINVAL;
- sem_destroy(): does nothing at all.
* sysdeps/pthread/semaphore.h (sem_init): Add __nonnull attribute.
(sem_destroy, sem_open, sem_close, sem_unlink): Likewise.
(sem_wait, sem_timedwait, sem_trywait, sem_post): Likewise.
(sem_getvalue): Likewise.
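A minimal sketch of what a nonnull annotation looks like at the GCC level follows. The type and function are hypothetical and unrelated to the real sem_t interfaces; semaphore.h uses glibc's own __nonnull macro rather than the raw attribute shown here.

```c
/* Hypothetical counter type, not the real sem_t.  */
struct counter
{
  int value;
};

/* nonnull (1) states that argument 1 must never be NULL, so passing a
   literal NULL can be diagnosed at compile time under -Wnonnull.  */
__attribute__ ((nonnull (1)))
static int
counter_post (struct counter *c)
{
  /* No NULL check: the attribute lets the compiler assume C is valid.  */
  return ++c->value;
}
```

Because the compiler may optimize on that assumption, the attribute only fits functions that really do not handle NULL.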
The audit module itself can be linked with BIND_NOW; it does not
affect its functionality.
This should complete the leftovers from commit
2d6ab5df3b ("Document and fix
--enable-bind-now [BZ #21015]").
Previously, the -Wl,-rpath-link options came after the libraries
injected using LDLIBS-* variables on the link editor command line for
main programs. As a result, it could happen that installed libraries
that reference glibc libraries used the installed glibc from the system
directories, instead of the glibc from the build tree. This can lead to
link failures if the wrong version of libpthread.so.0 is used, for
instance, due to differences in the internal GLIBC_PRIVATE interfaces,
as seen with memusagestat and -lgd after commit
f9b645b4b0 ("memusagestat: use local glibc
when linking [BZ #18465]").
The isolation is necessarily imperfect because these installed
libraries are linked against the installed glibc in the system
directories. However, in most cases, the built glibc will be newer
than the installed glibc, and this link is permitted because of the
ABI backwards compatibility glibc provides.
This change is needed to add linker flags which come very early in the
command line (before LDFLAGS) and are not applied to test programs
(only to installed programs).
While working on enabling the D front-end (GDC) in GCC we noticed that
druntime was segfaulting if it is linked dynamically. This was tracked
to DL_RO_DYN_SECTION.
The DL_RO_DYN_SECTION lines seem to be copied from the MIPS file (which
is the only user of it), but the comment doesn't apply to RISC-V. There
is no such requirement in the RISC-V ABI.
[BZ #24484]
* sysdeps/riscv/ldsodefs.h: Remove DL_RO_DYN_SECTION as it is not
required by RISC-V ABI.
Benchmarks should reflect distribution build policies, so it makes
sense to honor the BIND_NOW configuration for them.
This commit keeps using $(+link-tests), so that the benchmarks are
linked according to the --enable-hardcoded-path-in-tests configure
option.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Commit 2d6ab5df3b ("Document and fix
--enable-bind-now [BZ #21015]") extended BIND_NOW to all installed
shared objects. This change also covers installed programs.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reduce the total time taken by benchtests. The malloc thread test takes 4
minutes to run which is significantly more than most other tests. Reduce
this to a more reasonable 40 seconds. The math tests take 10 seconds each,
even though all they do is loop on the same input. Anything more than 1
second runtime is way overkill, so set the limit to 1 second.
* benchtests/Makefile (BENCH_DURATION): Set to 1 second.
* benchtests/bench-malloc-thread.c (BENCH_DURATION): Set to 10 seconds.
The memusagestat is the only binary that has its own link line which
causes it to be linked against the existing installed C library. It
has been this way since it was originally committed in 1999, but I
don't see any reason as to why. Since we want all the programs we
build locally to be linked against the new copy of glibc, change the
build to be like all other programs.
Since 9182aa6799 (Fix vDSO l_name for GDB's, BZ#387) the initial link_map
for the executable itself and the loader will have both l_name and
l_libname->name holding the same value due to:
  elf/dl-object.c
    new->l_name = *realname ? realname : (char *) newname->name + libname_len - 1;
since newname->name points to new->l_libname->name.
This leads pldd into an infinite loop at:
  elf/pldd-xx.c
    again:
      while (1)
        {
          ssize_t n = pread64 (memfd, tmpbuf.data, tmpbuf.length, name_offset);
    [...]
      /* Try the l_libname element. */
      struct E(libname_list) ln;
      if (pread64 (memfd, &ln, sizeof (ln), m.l_libname) == sizeof (ln))
        {
          name_offset = ln.name;
          goto again;
        }
because the value at ln.name (l_libname->name) will be the same as the one
previously read. The straightforward fix is to just avoid the check and
read the new list entry.
I checked also against binaries issues with old loaders with fix for BZ#387,
and pldd could dump the shared objects.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
[BZ #18035]
* elf/Makefile (tests-container): Add tst-pldd.
* elf/pldd-xx.c: Use _Static_assert instead of pldd_assert.
(E(find_maps)): Avoid use of alloca, use default read file operations
instead of explicit LFS names, and fix infinite loop.
* elf/pldd.c: Explicitly set _FILE_OFFSET_BITS, cleanup headers.
(get_process_info): Use _Static_assert instead of assert, use default
directory operations instead of explicit LFS names, and free some
leaked pointers.
* elf/tst-pldd.c: New file.
Remove do_set_mallopt_check prototype since it is unused.
* malloc/arena.c (do_set_mallopt_check): Removed.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
As discussed previously on libc-alpha [1], this patch follows up on the
idea and both adds __attribute_alloc_size__ to the malloc functions
(malloc, calloc, realloc, reallocarray, valloc, pvalloc, and memalign) and
limits the maximum requested allocation size to PTRDIFF_MAX (taking into
consideration internal padding and alignment).
This aligns glibc with the size gcc expects, as defined by the default
value of the -Walloc-size-larger-than warning, which warns about
allocations larger than PTRDIFF_MAX. It also aligns with gcc's
expectations regarding libc and object sizes, as described in PR#67999 [2]
and in the ISO C11 issues [3] previously discussed on libc-alpha.
From the RFC thread [4] and previous discussion, it seems that the
consensus is to limit the requested size only for the malloc functions,
not for the system allocation ones (mmap, sbrk, etc.).
The implementation changes checked_request2size to check for both overflow
and maximum object size up to PTRDIFF_MAX. No additional checks are done
on sysmalloc, so it can still issue mmap with values larger than
PTRDIFF_MAX depending on the requested size.
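The kind of check described above can be sketched as follows. This is a simplified illustration, not the actual checked_request2size: the function names and the overhead constant are hypothetical, and the real allocator's padding and alignment rules are more involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-allocation overhead; the real value differs.  */
#define SKETCH_OVERHEAD ((size_t) 16)

/* Store the padded size in *OUT and return true, or return false when
   padding the request overflows or the result exceeds PTRDIFF_MAX.  */
static bool
checked_size_sketch (size_t request, size_t *out)
{
  size_t padded;
  if (__builtin_add_overflow (request, SKETCH_OVERHEAD, &padded))
    return false;
  if (padded > (size_t) PTRDIFF_MAX)
    return false;
  *out = padded;
  return true;
}

/* calloc-style variant: reject NMEMB * SIZE when the product overflows,
   then apply the same limit to the total.  */
static bool
checked_array_size_sketch (size_t nmemb, size_t size, size_t *out)
{
  size_t bytes;
  if (__builtin_mul_overflow (nmemb, size, &bytes))
    return false;
  return checked_size_sketch (bytes, out);
}
```

Both builtins report overflow through their return value, which avoids hand-written comparisons against SIZE_MAX.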
The __attribute_alloc_size__ is for functions that return a pointer only,
which means it cannot be applied to posix_memalign (see remarks in GCC
PR#87683 [5]). The runtime checks that limit the maximum requested
allocation size do apply to posix_memalign.
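For reference, this is roughly how the underlying GCC attribute is spelled on functions that return the allocated pointer. The wrappers below are hypothetical and only illustrate the attribute; glibc applies it through its own __attribute_alloc_size__ macro.

```c
#include <stdlib.h>

/* alloc_size (1): the allocation size is given by argument 1.  */
__attribute__ ((malloc, alloc_size (1)))
static void *
buffer_alloc (size_t size)
{
  return malloc (size);
}

/* alloc_size (1, 2): the size is the product of arguments 1 and 2.  */
__attribute__ ((malloc, alloc_size (1, 2)))
static void *
array_alloc (size_t nmemb, size_t size)
{
  return calloc (nmemb, size);
}
```

A function such as posix_memalign hands the pointer back through an output parameter, so there is no returned pointer for the attribute to describe.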
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://sourceware.org/ml/libc-alpha/2018-11/msg00223.html
[2] https://gcc.gnu.org/bugzilla//show_bug.cgi?id=67999
[3] https://sourceware.org/ml/libc-alpha/2011-12/msg00066.html
[4] https://sourceware.org/ml/libc-alpha/2018-11/msg00224.html
[5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87683
[BZ #23741]
* malloc/hooks.c (malloc_check, realloc_check): Use
__builtin_add_overflow on overflow check and adapt to
checked_request2size change.
* malloc/malloc.c (__libc_malloc, __libc_realloc, _mid_memalign,
__libc_pvalloc, __libc_calloc, _int_memalign): Limit maximum
allocation size to PTRDIFF_MAX.
(REQUEST_OUT_OF_RANGE): Remove macro.
(checked_request2size): Change to inline function and limit maximum
requested size to PTRDIFF_MAX.
(__libc_malloc, __libc_realloc, _int_malloc, _int_memalign): Limit
maximum allocation size to PTRDIFF_MAX.
(_mid_memalign): Use _int_memalign call for overflow check.
(__libc_pvalloc): Use __builtin_add_overflow on overflow check.
(__libc_calloc): Use __builtin_mul_overflow for overflow check and
limit maximum requested size to PTRDIFF_MAX.
* malloc/malloc.h (malloc, calloc, realloc, reallocarray, memalign,
valloc, pvalloc): Add __attribute_alloc_size__.
* stdlib/stdlib.h (malloc, realloc, reallocarray, valloc): Likewise.
* malloc/tst-malloc-too-large.c (do_test): Add check for allocation
larger than PTRDIFF_MAX.
* malloc/tst-memalign.c (do_test): Disable -Walloc-size-larger-than=
around tests of malloc with negative sizes.
* malloc/tst-posix_memalign.c (do_test): Likewise.
* malloc/tst-pvalloc.c (do_test): Likewise.
* malloc/tst-valloc.c (do_test): Likewise.
* malloc/tst-reallocarray.c (do_test): Replace call to reallocarray
with resulting size allocation larger than PTRDIFF_MAX with
reallocarray_nowarn.
(reallocarray_nowarn): New function.
* NEWS: Mention the malloc function semantic change.
This patch just refactors the assembly implementation to use compiler
builtins instead.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fma.S: Remove file.
* sysdeps/powerpc/fpu/s_fmaf.S: Likewise.
* sysdeps/powerpc/fpu/s_fma.c: New file.
* sysdeps/powerpc/fpu/s_fmaf.c: Likewise.
Since be2e25bbd7 the generic ieee754 implementation uses a
compiler builtin which generates fabs{f} for all supported targets.
Checked on powerpc-linux-gnu (built without --with-cpu, with
--with-cpu=power4 and with --with-cpu=power5+ and --disable-multi-arch),
powerpc64-linux-gnu (built without --with-cpu and with --with-cpu=power5+
and --disable-multi-arch).
* sysdeps/powerpc/fpu/s_fabs.S: Remove file.
* sysdeps/powerpc/fpu/s_fabsf.S: Likewise.
Similar to powerpc, mips also issues rt_sigreturn in setcontext in
case the saved v0 value is not the one set by setcontext or
makecontext. As for powerpc, its intention is not really supported
since setcontext is not async-signal-safe.
Checked the context tests on mips64-linux-gnu and mips-linux-gnu.
* sysdeps/unix/sysv/linux/mips/getcontext.S (__getcontext): Remove
the magic flag store.
* sysdeps/unix/sysv/linux/mips/makecontext.S (__makecontext):
Likewise.
* sysdeps/unix/sysv/linux/mips/swapcontext.S (__swapcontext):
Likewise.
* sysdeps/unix/sysv/linux/mips/setcontext.S (__setcontext):
Remove rt_sigreturn call.
As described in a recent glibc thread [1], the rt_sigreturn syscall
in setcontext and swapcontext is not used by default and its
intention is not really supported since neither setcontext nor
swapcontext is async-signal-safe.
Checked on powerpc64-linux-gnu and powerpc-linux-gnu.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S:
Remove rt_sigreturn call.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/setcontext.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/swapcontext.S: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2019-02/msg00367.html
Its API is similar to support_capture_subprocess, but it instead creates
a new process based on the input path and arguments. Under the hood it
uses posix_spawn to create the new process.
It also allows the use of other support_capture_* functions to check
for expected results and free the resources.
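As a rough idea of how a program can be spawned with its standard output captured, the sketch below uses only POSIX calls. It is not the support_capture_subprogram implementation: the function name is hypothetical, only stdout is captured, and error handling is kept minimal.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run PATH with ARGV, copy up to LEN - 1 bytes of its stdout into OUT
   (null-terminated), and return the wait status, or -1 on failure.  */
static int
capture_stdout_sketch (const char *path, char *const argv[],
                       char *out, size_t len)
{
  int fds[2];
  if (len == 0 || pipe (fds) != 0)
    return -1;

  posix_spawn_file_actions_t fa;
  if (posix_spawn_file_actions_init (&fa) != 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }
  /* In the child: make the pipe's write end stdout, then drop both of
     the original pipe descriptors.  */
  posix_spawn_file_actions_adddup2 (&fa, fds[1], STDOUT_FILENO);
  posix_spawn_file_actions_addclose (&fa, fds[0]);
  posix_spawn_file_actions_addclose (&fa, fds[1]);

  pid_t pid;
  int err = posix_spawn (&pid, path, &fa, NULL, argv, environ);
  posix_spawn_file_actions_destroy (&fa);
  /* The parent must close its write end or read never sees EOF.  */
  close (fds[1]);
  if (err != 0)
    {
      close (fds[0]);
      return -1;
    }

  size_t used = 0;
  ssize_t n;
  while (used < len - 1
         && (n = read (fds[0], out + used, len - 1 - used)) > 0)
    used += (size_t) n;
  out[used] = '\0';
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) != pid)
    return -1;
  return status;
}
```

The file actions replace the manual dup2 and close calls that a fork-based helper would run in the child.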
Checked on x86_64-linux-gnu.
* support/Makefile (libsupport-routines): Add support_subprocess,
xposix_spawn, xposix_spawn_file_actions_addclose, and
xposix_spawn_file_actions_adddup2.
(tst-support_capture_subprocess-ARGS): New rule.
* support/capture_subprocess.h (support_capture_subprogram): New
prototype.
* support/support_capture_subprocess.c (support_capture_subprocess):
Refactor to use support_subprocess and support_capture_poll.
(support_capture_subprogram): New function.
* support/tst-support_capture_subprocess.c (write_mode_to_str,
str_to_write_mode, test_common, parse_int, handle_restart,
do_subprocess, do_subprogram, do_multiple_tests): New functions.
(do_test): Add support_capture_subprogram tests.
* support/subprocess.h: New file.
* support/support_subprocess.c: Likewise.
* support/xposix_spawn.c: Likewise.
* support/xposix_spawn_file_actions_addclose.c: Likewise.
* support/xposix_spawn_file_actions_adddup2.c: Likewise.
* support/xspawn.h: Likewise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This test would fail unnecessarily if the user running it had more than
64 groups since getgroups returns EINVAL if the size provided is less
than the number of supplementary group IDs. Instead dynamically
determine the number of supplementary groups the user has.
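One way to size the array dynamically is to ask getgroups for the count first, as in the sketch below. The helper name is hypothetical and the test's actual code may differ.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return a malloc'ed array with the caller's supplementary group IDs
   and store their number in *COUNT, or return NULL on failure.  */
static gid_t *
get_all_groups (int *count)
{
  /* With a size of 0, getgroups only reports how many groups exist.  */
  int n = getgroups (0, NULL);
  if (n < 0)
    return NULL;

  /* Allocate at least one element so a zero count is not malloc (0).  */
  gid_t *groups = malloc ((size_t) (n > 0 ? n : 1) * sizeof (gid_t));
  if (groups == NULL)
    return NULL;

  n = getgroups (n, groups);
  if (n < 0)
    {
      free (groups);
      return NULL;
    }
  *count = n;
  return groups;
}
```

The group list can in principle change between the two calls; a more careful version would retry when the second call fails with EINVAL.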
The purpose of the bp[0] == '.' check is unclear. Only the root domain
starts with '.'. The empty string is accepted as a domain name in many
places, denoting the root, but using it implicitly is confusing.
alloc_buffer_next is useful for peeking at the remaining part of the
buffer and updating it, with subsequent allocation (once the length
is known) using alloc_buffer_alloc_bytes. This is not as robust
as the other interfaces, but it allows using alloc_buffer with
string-writing interfaces such as snprintf and ns_name_ntop.
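The peek-then-allocate pattern can be sketched with a miniature buffer type. Everything below is hypothetical and much simpler than the real struct alloc_buffer; it only shows why peeking at the tail pairs well with snprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature buffer; not glibc's struct alloc_buffer.  */
struct mini_buffer
{
  char *current;
  char *end;
};

/* Peek: expose the unused tail without consuming it.  */
static char *
mini_buffer_next (struct mini_buffer *buf, size_t *available)
{
  *available = (size_t) (buf->end - buf->current);
  return buf->current;
}

/* Commit: consume LEN bytes, or return NULL if they do not fit.  */
static char *
mini_buffer_alloc_bytes (struct mini_buffer *buf, size_t len)
{
  if (len > (size_t) (buf->end - buf->current))
    return NULL;
  char *result = buf->current;
  buf->current += len;
  return result;
}

/* Format VALUE into the tail, then allocate exactly what was written.
   Returns the stored string, or NULL if the output was truncated.  */
static char *
mini_buffer_print_int (struct mini_buffer *buf, int value)
{
  size_t available;
  char *tail = mini_buffer_next (buf, &available);
  int len = snprintf (tail, available, "%d", value);
  if (len < 0 || (size_t) len >= available)
    return NULL;
  /* Reserve the text plus its terminating null byte.  */
  return mini_buffer_alloc_bytes (buf, (size_t) len + 1);
}
```

The length is only known after snprintf returns, which is why the allocation has to follow the write rather than precede it.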
If an error occurs during the tracing operation, particularly during a
call to lock_and_info() which calls _dl_addr, we may end up calling back
into the malloc-subsystem and relock the loader lock and deadlock. For
all intents and purposes the call to _dl_addr can call any of the malloc
family API functions and so we should disable all tracing before calling
such loader functions. This is similar to the strategy that the new
malloc tracer takes when calling the real malloc, namely that all
tracing ceases at the boundary to the real function and any faults at
that point are the purview of the library (though the new tracer does
this on a per-thread basis in an MT-safe fashion). Since the new tracer
and the hook deprecation are not yet complete we must fix these issues
where we can.
Tested on x86_64 with no regressions.
Co-authored-by: Kwok Cheung Yeung <kcy@codesourcery.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
Replace slow byte-oriented tests in several string benchmarks with the
generic implementations from the string/ directory so the comparisons
are more realistic and useful.
* benchtests/bench-stpcpy.c (SIMPLE_STPCPY): Remove function.
(generic_stpcpy): New function.
* benchtests/bench-stpncpy.c (SIMPLE_STPNCPY): Remove function.
(generic_stpncpy): New function.
* benchtests/bench-strcat.c (SIMPLE_STRCAT): Remove function.
(generic_strcat): New function.
* benchtests/bench-strcpy.c (SIMPLE_STRCPY): Remove function.
(generic_strcpy): New function.
* benchtests/bench-strncat.c (SIMPLE_STRNCAT): Remove function.
(STUPID_STRNCAT): Remove function.
(generic_strncat): New function.
* benchtests/bench-strncpy.c (SIMPLE_STRNCPY): Remove function.
(STUPID_STRNCPY): Remove function.
(generic_strncpy): New function.
* benchtests/bench-strnlen.c (SIMPLE_STRNLEN): Remove function.
(generic_strnlen): New function.
(memchr_strnlen): New function.
* benchtests/bench-strlen.c (generic_strlen): Define for WIDE.
(memchr_strlen): Likewise.
Improve bench-strstr by using an extract from the manual as the input
to make the test more realistic. Use the same input for both found and
fail cases rather than using a memset of '0' for most of the string,
which measures performance of strchr rather than strstr. Add result
checking to catch potential errors. Remove the repeated tests at slightly
different alignments and add more large needle and haystack testcases.
Replace stupid_strstr with an efficient basic implementation. Add the
Two-way implementation to simplify comparisons with much faster generic
implementations.
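As an illustration of what an efficient but basic reference implementation might look like, the sketch below skips ahead with strchr and verifies candidates with strncmp. It is an assumption made for illustration, not the basic_strstr added to the benchmark.

```c
#include <string.h>

/* Find NEEDLE in HAYSTACK by jumping to each occurrence of its first
   character and comparing the rest.  Worst case is still quadratic.  */
static char *
basic_strstr_sketch (const char *haystack, const char *needle)
{
  size_t needle_len = strlen (needle);
  if (needle_len == 0)
    return (char *) haystack;

  for (const char *p = strchr (haystack, needle[0]);
       p != NULL;
       p = strchr (p + 1, needle[0]))
    /* strncmp stops at the haystack's null byte, so it never reads
       past the end of the string.  */
    if (strncmp (p, needle, needle_len) == 0)
      return (char *) p;

  return NULL;
}
```

A baseline like this gives the optimized routines a meaningful comparison point, unlike a byte-at-a-time loop.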
* benchtests/bench-strstr.c (input): Add realistic input text.
(stupid_strstr): Remove function.
(basic_strstr): Add function.
(twoway_strstr): Add function.
(do_one_test): Add result checking.
(do_test): Use new input text. Remove accidental early matches.
(test_main): Improve range of tests, reduce unaligned cases.
Improve bench-memmem by replacing simple_memmem with a more efficient
implementation. Add the Two-way implementation to enable direct comparison
with the optimized memmem.
* benchtests/bench-memmem.c (simple_memmem): Remove function.
(basic_memmem): Add function.
(twoway_memmem): Add function.
This functionality was deprecated in glibc 2.25.
This commit only includes the core changes to remove the
functionality. It does not remove the RES_USE_INET6 handling in the
individual NSS service modules and the res_use_inet6 function.
These changes will happen in future commits.
Here is the updated patch for improving the long unaligned
code path (the one using "ext" instruction).
1. Always taken conditional branch at the beginning is
removed.
2. Epilogue code is placed after the end of the loop to
reduce the number of branches.
3. The redundant "mov" instructions inside the loop are
gone due to the changed order of the registers in the "ext"
instructions inside the loop; the prologue has an additional
"ext" instruction.
4. Updating count in the prologue was hoisted out as
it is the same update for each prologue.
5. Invariant code of the loop epilogue was hoisted out.
6. As the ext chunk is currently exactly 16
instructions long, a "nop" was added at the beginning
of the code sequence so that the loop entry for all the
chunks is aligned.
* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: Cleanup branching
and remove redundant code.
This allows an architecture to set explicit loop unrolling.
Checked on aarch64-linux-gnu.
* wcsmbs/wcsrchr.c (WCSRCHR): Use loop_unroll.h to parametrize
the loop unroll.
This allows an architecture to set explicit loop unrolling.
Checked on aarch64-linux-gnu.
* wcsmbs/wcschr.c (WCSCHR): Use loop_unroll.h to parametrize
the loop unroll.
This allows an architecture to use the old generic implementation
and also set explicit loop unrolling.
Checked on aarch64-linux-gnu.
* include/loop_unroll.h: New file.
* wcsmbs/wcscpy (__wcscpy): Add option to use loop unrolling
besides generic implementation.
snprintf will only truncate the output if the data it is given
is corrupted, but a truncated buffer will not match the
"pristine" data's buffer, which is all we need. So just
disable the warning via the DIAG macros.