C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).
This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation. If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).
The test inputs are copied from those for log1p. Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).
The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)
Tested for x86_64 and x86, and with build-many-glibcs.py.
This patch ensures that $libc_cv_cc_submachine, which is set from
"--with-cpu", overrides $CFLAGS for configure time tests.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The e68b1151f7 commit changed the
__fesetround_inline_nocheck implementation to use mffscrni
(through __fe_mffscrn) instead of mtfsfi. For generic powerpc
ceil/floor/trunc, the function is supposed to disable the
floating-point inexact exception enable bit; however, mffscrni
does not change any exception enable bits.
This patch fixes it by reverting the optimization for
__fesetround_inline_nocheck.
Checked on powerpc-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
This patch is based on __strcmp_power10.
Improvements from __strncmp_power9:
1. Uses new POWER10 instructions
   - This code uses lxvp to decrease contention on load
     by loading 32 bytes per instruction.
2. Performance implication
   - This version has around 38% better performance on average.
   - A minor performance regression is seen for a few small sizes
     and specific combinations of alignments.
Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
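As a rough illustration of the idea (not the actual glibc headers), an
on-disk record can pin its time fields to 32 bits so that its layout is
the same whatever the width of time_t. The struct names and the record
layout below are hypothetical and only serve the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width time fields: the layout cannot change with _TIME_BITS.
   Hypothetical name, used only in this sketch.  */
struct sketch_timeval
{
  int32_t tv_sec;
  int32_t tv_usec;
};

/* Hypothetical record, loosely modelled on a login accounting entry.  */
struct sketch_record
{
  int32_t pid;
  char line[32];
  struct sketch_timeval tv;
};

/* The file format is only stable if these hold on every build.  */
static_assert (sizeof (struct sketch_timeval) == 8, "time field is 8 bytes");
static_assert (offsetof (struct sketch_record, tv) == 36, "offset of tv");
static_assert (sizeof (struct sketch_record) == 44, "record is 44 bytes");
```

Because the checks are compile-time assertions, a build that changes
the record size fails instead of silently writing incompatible files.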
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The ifunc variants now use the powerpc implementation, which in turn
uses the compiler builtin. Without the proper -mcpu switch the builtin
does not generate the expected optimization.
Checked on powerpc-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The following three changes have been added to provide initial Power11 support.
1. Add the directories to hold Power11 files.
2. Add support to select Power11 libraries based on AT_PLATFORM.
3. Let submachine=power11 be set automatically.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This patch adds a new feature for powerpc. In order to get faster
access to the HWCAP3/HWCAP4 masks, similar to HWCAP/HWCAP2 (i.e. for
implementing __builtin_cpu_supports() in GCC) without the overhead of
reading them from the auxiliary vector, we now reserve space for them
in the TCB.
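A minimal sketch of the trade-off, assuming a simplified auxiliary
vector and a made-up thread control block: looking a value up means
scanning the vector, while a cached copy is a single load. All names
are hypothetical and the tag values are illustrative only; this is not
the glibc TCB layout.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative tag values for this sketch only.  */
enum { SKETCH_AT_NULL = 0, SKETCH_AT_HWCAP3 = 29, SKETCH_AT_HWCAP4 = 30 };

struct sketch_auxv_entry
{
  unsigned long type;
  unsigned long value;
};

/* Hypothetical TCB fragment holding cached copies of the masks.  */
struct sketch_tcb
{
  unsigned long hwcap3;
  unsigned long hwcap4;
};

/* Linear scan of the vector; returns 0 if the tag is absent.  */
static unsigned long
sketch_auxv_lookup (const struct sketch_auxv_entry *auxv, unsigned long type)
{
  assert (auxv != NULL);
  for (; auxv->type != SKETCH_AT_NULL; auxv++)
    if (auxv->type == type)
      return auxv->value;
  return 0;
}

/* Done once at startup; later readers just load the cached fields.  */
static void
sketch_tcb_init (struct sketch_tcb *tcb, const struct sketch_auxv_entry *auxv)
{
  tcb->hwcap3 = sketch_auxv_lookup (auxv, SKETCH_AT_HWCAP3);
  tcb->hwcap4 = sketch_auxv_lookup (auxv, SKETCH_AT_HWCAP4);
}
```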
This is an ABI change for GLIBC 2.39.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation. The geomean
on bench-strcasestr shows:
          __strcasestr_power8  __strcasestr_ppc
power10                  1159              1120
power9                   1640              1469
power8                   1787              1904
The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadratic behavior, which only adds overhead for inputs
that trigger quadratic behavior, and it is really a hack.
Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The optimization is not faster than the generic algorithm: with
bench-strstr on a POWER10 machine using gcc 13.1.1, the geometric
mean is 482.47, while the default __strstr_ppc (which uses the
generic implementation) gets 340.97.
Also, there is no need to redirect the internal str*/mem* calls
to the optimized versions: internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
Complete the internal renaming from "C2X" and related names in glibc by
renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23.
Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
WG14 decided to use the name C23 as the informal name of the next
revision of the C standard (notwithstanding the publication date in
2024). Update references to C2X in glibc to use the C23 name.
This is intended to update everything *except* where it involves
renaming files (the changes involving renaming tests are intended to
be done separately). In the case of the _ISOC2X_SOURCE feature test
macro - the only user-visible interface involved - support for that
macro is kept for backwards compatibility, while adding
_ISOC23_SOURCE.
Tested for x86_64.
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set
floating-point exception flags without raising a trap (unlike
feraiseexcept, which is supposed to raise a trap if feenableexcept was
called with the appropriate argument).
This is a side-effect of how we implement the GNU extension
feenableexcept, where feenableexcept/fesetenv/fesetmode/feupdateenv
might issue prctl (PR_SET_FPEXC, PR_FP_EXC_PRECISE) depending on the
argument. With PR_FP_EXC_PRECISE, setting a floating-point exception
flag triggers a trap.
To make both functions follow C23, fesetexcept and
fesetexceptflag now fail if the argument may trigger a trap.
The math tests now check for a value different from 0, instead
of bailing out as unsupported for EXCEPTION_SET_FORCES_TRAP.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch reserves space for HWCAP3/HWCAP4 in the TCB of powerpc.
These hardware capabilities bits will be used by future Power
architectures.
Versioned symbol '__parse_hwcap_3_4_and_convert_at_platform' advertises
the availability of the new HWCAP3/HWCAP4 data in the TCB.
This is an ABI change for GLIBC 2.39.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The current implementation of strcmp for power10 has
performance regressions for multiple small sizes
and alignment combinations.
Most of these performance issues are fixed by this
patch. The compare loop is unrolled and page crossings
in the unrolled loop are handled.
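The page-crossing concern can be sketched in portable C: before
issuing a wide load, check whether it would run past the end of the
current page and fall back to a slower path if so. The helper name is
hypothetical and the page size is a parameter, since it differs
between configurations; the real code is assembly and may test this
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if reading load_size bytes starting at addr would touch
   the next page.  page_size must be a power of two.  */
static bool
sketch_load_crosses_page (uintptr_t addr, size_t load_size, size_t page_size)
{
  assert (page_size != 0 && (page_size & (page_size - 1)) == 0);
  size_t offset_in_page = (size_t) (addr & (page_size - 1));
  return offset_in_page + load_size > page_size;
}
```

An unrolled loop that consumes several wide loads per iteration would
pass the total number of bytes read per iteration as load_size.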
Thanks to Paul E. Murphy for helping in fixing the
performance issues.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Optimized memchr for POWER10 based on existing rawmemchr and strlen.
Reordering instructions and loop unrolling helped in getting better performance.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
This patch is based on __strcmp_power9 and __strlen_power10.
Improvements from __strcmp_power9:
1. Uses new POWER10 instructions
   - This code uses lxvp to decrease contention on load
     by loading 32 bytes per instruction.
2. Performance implication
   - This version has around 30% better performance on average.
   - A performance regression is seen for specific combinations
     of sizes and alignments. Some of them are observed without
     the changes as well, while the rest may be induced by the patch.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
_dl_non_dynamic_init does not parse LD_PROFILE, so profiling is not
enabled for dlopen'ed objects. Since dlopen is deprecated for
static objects, it is better to remove the support.
This also allows trimming profile support out of libc.a.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
On powerpc, SET_RESTORE_ROUND uses inline assembly to optimize the
prologue get/save/set rounding mode operations for POWER9 and
later by using 'mffscrn' where possible. This was introduced by
commit f1c56cdff0.
From GCC version 14 onwards, the builtin __builtin_set_fpscr_rn
returns the FPSCR fields in a double. This feature is
available on Power9 when the __SET_FPSCR_RN_RETURNS_FPSCR__ macro
is defined.
GCC commit ef3bbc69d15707e4db6e2f198c621effb636cc26 adds
this feature.
This change uses __builtin_set_fpscr_rn instead of mffscrn
or mffscrni in __fe_mffscrn(rn).
Suggested-by: Carl Love <cel@us.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
04bf7d2d8a ("chk: Add and fix hidden builtin definitions for *_chk")
added an #undef for longjmp and siglongjmp to compensate for the
definition in include/setjmp.h, but missed doing so for the powerpc
version too.
Fixes: 04bf7d2d8a ("chk: Add and fix hidden builtin definitions for
*_chk")
This patch enables the option to influence hwcaps used by PowerPC.
The environment variable GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,-zzz....
can be used to enable CPU/ARCH feature yyy and disable CPU/ARCH features
xxx and zzz, where the feature name is case-sensitive and has to match
the ones mentioned in sysdeps/powerpc/dl-procinfo.c.
Note that the hwcap tunables are only used in the IFUNC selection.
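To make the syntax concrete, here is a small sketch of how such a
comma-separated list could be split into enable/disable requests. It
is not the parser used by glibc; the struct and function names are
hypothetical, and matching the names against a feature table is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One parsed request: feature name plus whether to enable it.  */
struct sketch_hwcap_toggle
{
  char name[32];
  bool enable;
};

/* Split "-xxx,yyy,-zzz" into at most MAX toggles.  A leading '-'
   means disable.  Empty or over-long names are skipped.  Returns the
   number of entries written.  */
static size_t
sketch_parse_hwcaps (const char *s, struct sketch_hwcap_toggle *out,
                     size_t max)
{
  assert (s != NULL && out != NULL);
  size_t n = 0;
  while (*s != '\0' && n < max)
    {
      size_t len = strcspn (s, ",");
      const char *tok = s;
      size_t toklen = len;
      bool enable = true;
      if (toklen > 0 && tok[0] == '-')
        {
          enable = false;
          tok++;
          toklen--;
        }
      if (toklen > 0 && toklen < sizeof out[n].name)
        {
          memcpy (out[n].name, tok, toklen);
          out[n].name[toklen] = '\0';
          out[n].enable = enable;
          n++;
        }
      s += len;
      if (*s == ',')
        s++;
    }
  return n;
}
```

Names are copied verbatim, so a later lookup with strcmp keeps the
case-sensitive matching described above.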
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The compiler might not see that the internal definition is an alias
due to the libc_ifunc macro, which redefines __strchrnul. With
gcc 6 it fails with:
In file included from <command-line>:0:0:
./../include/libc-symbols.h:472:33: error: ‘__EI___strchrnul’ aliased to
undefined symbol ‘__GI___strchrnul’
extern thread __typeof (name) __EI_##name \
^
./../include/libc-symbols.h:468:3: note: in expansion of macro
‘__hidden_ver2’
__hidden_ver2 (, local, internal, name)
^~~~~~~~~~~~~
./../include/libc-symbols.h:476:29: note: in expansion of macro
‘__hidden_ver1’
# define hidden_def(name) __hidden_ver1(__GI_##name, name, name);
^~~~~~~~~~~~~
./../include/libc-symbols.h:557:32: note: in expansion of macro
‘hidden_def’
# define libc_hidden_def(name) hidden_def (name)
^~~~~~~~~~
../sysdeps/powerpc/powerpc64/multiarch/strchrnul.c:38:1: note: in
expansion of macro ‘libc_hidden_def’
libc_hidden_def (__strchrnul)
^~~~~~~~~~~~~~~
Use libc_ifunc_hidden, as stpcpy does. Checked on powerpc64 with
gcc 6 and gcc 13.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions. autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm). It appears to
be current in Gentoo as well.
All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations
(sparc_fixup_plt/sparc64_fixup_plt). For BINDNOW, _dl_audit_symbind
will be responsible for setting up the final relocation value, while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail call elf_machine_fixup_plt (which will call
sparc64_fixup_plt).
This patch fixes it by issuing the SPARC-specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:
elf/tst-audit24a
elf/tst-audit24b
elf/tst-audit24c
elf/tst-audit24d
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
The fread return value needs to be checked when fortification
is enabled, hence use the xfread helper.
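For readers unfamiliar with the pattern, a checked wrapper around
fread might look like the sketch below. It is only an illustration
with a hypothetical name and error convention, not the xfread helper
itself, which may report failures differently.

```c
#include <stdio.h>

/* Hypothetical checked wrapper: returns 0 if all NMEMB items were
   read, -1 (after printing a diagnostic) on a short read or error.  */
static int
sketch_checked_fread (void *ptr, size_t size, size_t nmemb, FILE *stream)
{
  size_t got = fread (ptr, size, nmemb, stream);
  if (got != nmemb)
    {
      fprintf (stderr, "short read: wanted %zu items, got %zu\n",
               nmemb, got);
      return -1;
    }
  return 0;
}
```

The point is that the result of fread is always consumed, which also
avoids unused-result warnings in builds where fread is declared with
that attribute.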
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
All fixes are in comments, so the binaries should be identical
before/after this commit, but I can't verify this.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
This patch redirects the error functions to the appropriate
long double variants, which enables the compiler to optimize
for the ieeelongdouble ABI.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
And make it always supported. The configure option was added in glibc 2.25
and some features require it (such as hwcap mask, huge pages support, and
lock elision tuning). It also simplifies the build permutations.
Changes from v1:
* Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs
more discussion.
* Cleanup more code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The default and power7 implementations just add word-aligned
access when inputs have the same alignment. The unaligned case
is still done by byte operations.
This is already covered by the generic implementation, which also adds
the unaligned input optimization.
Checked on powerpc64-linux-gnu built without multi-arch for powerpc64,
power7, power8, and power9 (build for le).
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
The default, power4, and power7 implementations just add word-aligned
access when inputs have the same alignment. The unaligned case
is still done by byte operations.
This is already covered by the generic implementation, which also adds
the unaligned input optimization.
Checked on powerpc-linux-gnu built without multi-arch for powerpc,
power4, and power7.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Although the static linker can optimize it to a local call, it follows
the internal scheme to provide hidden proto and definitions.
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
Although the static linker can optimize it to a local call, it follows
the internal scheme to provide hidden proto and definitions.
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
While ppc has the more important string functions in assembly,
there are still a few generic routines used.
Use the Power 6 CMPB insn for testing of zeros.
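For reference, the effect of a byte-compare instruction of this kind
can be modelled in portable C: each result byte is 0xff where the
corresponding input bytes are equal and 0x00 otherwise, so comparing
against zero marks the zero bytes of a word. The function below is a
hypothetical software model for illustration, not the code added by
this change.

```c
#include <stdint.h>

/* Software model of a per-byte equality compare on 64-bit words.  */
static uint64_t
sketch_cmpb (uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;
  uint64_t result = 0;
  for (int i = 0; i < 8; i++)
    {
      uint64_t byte_mask = (uint64_t) 0xff << (8 * i);
      /* Bytes are equal exactly when their XOR is zero.  */
      if ((diff & byte_mask) == 0)
        result |= byte_mask;
    }
  return result;
}
```

A nonzero sketch_cmpb (word, 0) then means the word contains at least
one zero byte.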
Checked on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It also cleans up the multiple inclusion by leaving the ifunc
implementation to undef the weak_alias and libc_hidden_def.
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The new algorithm reads the first aligned address and masks off the
unwanted bytes (this strategy is similar to arch-specific
implementations used on powerpc, sparc, and sh).
The loop now reads word-aligned addresses and checks them using the
has_eq macro.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
and powerpc64-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The new algorithm reads the first aligned address and masks off the
unwanted bytes (this strategy is similar to arch-specific
implementations used on powerpc, sparc, and sh).
The loop now reads word-aligned addresses and checks them using the
has_zero_eq function.
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
and powerpc-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).
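The strategy can be sketched as follows, assuming 64-bit words, a
GCC-compatible compiler (for __BYTE_ORDER__ and the count-zero
builtins), and input that is already available as an array of aligned
words. All helper names are hypothetical; they mirror the idea, not
the actual glibc helpers.

```c
#include <stddef.h>
#include <stdint.h>

static uint64_t
sketch_repeat_byte (unsigned char c)
{
  return 0x0101010101010101ULL * c;
}

/* Set the top bit of every byte of X that is zero (exact per byte).  */
static uint64_t
sketch_find_zero_all (uint64_t x)
{
  const uint64_t m = sketch_repeat_byte (0x7f);
  return ~(((x & m) + m) | x | m);
}

/* Mark bytes that are zero or equal to C.  */
static uint64_t
sketch_find_zero_eq_all (uint64_t x, unsigned char c)
{
  return sketch_find_zero_all (x)
         | sketch_find_zero_all (x ^ sketch_repeat_byte (c));
}

/* Clear marks for the first SKIP bytes in memory order (SKIP < 8).  */
static uint64_t
sketch_drop_leading (uint64_t mask, unsigned int skip)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return mask & (~0ULL << (skip * 8));
#else
  return mask & (~0ULL >> (skip * 8));
#endif
}

/* Memory-order index of the first marked byte; MASK must be nonzero.  */
static unsigned int
sketch_first_marked (uint64_t mask)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return (unsigned int) __builtin_ctzll (mask) / 8;
#else
  return (unsigned int) __builtin_clzll (mask) / 8;
#endif
}

/* Byte index of the first zero or C at or after START.  The caller
   must guarantee that such a byte exists within WORDS.  */
static size_t
sketch_find_zero_or_eq (const uint64_t *words, size_t start, unsigned char c)
{
  size_t w = start / 8;
  uint64_t mask = sketch_drop_leading (sketch_find_zero_eq_all (words[w], c),
                                       (unsigned int) (start % 8));
  while (mask == 0)
    mask = sketch_find_zero_eq_all (words[++w], c);
  return w * 8 + sketch_first_marked (mask);
}
```

Only the first word needs the mask; every later word is checked whole,
which is what keeps the loop body small.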
Co-authored-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
It moves OP_T_THRES out of memcopy.h to its own header and adjusts
each architecture that redefines it.
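As background, a threshold of this kind typically decides when a copy
is long enough to justify aligning the destination and moving whole
words instead of single bytes. The sketch below illustrates that role
under assumptions: the macro name, its value, and the function are
hypothetical and do not reproduce the generic glibc code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative threshold; name and value are assumptions.  */
#define SKETCH_OP_T_THRES 16

static void *
sketch_copy_fwd (void *dstp, const void *srcp, size_t len)
{
  unsigned char *dst = dstp;
  const unsigned char *src = srcp;

  if (len >= SKETCH_OP_T_THRES)
    {
      /* Byte-copy until the destination is word aligned.  */
      size_t head = (size_t) (-(uintptr_t) dst % sizeof (unsigned long));
      len -= head;
      while (head-- > 0)
        *dst++ = *src++;

      /* Copy whole words; fixed-size memcpy keeps the accesses well
         defined even when the source is not aligned.  */
      while (len >= sizeof (unsigned long))
        {
          unsigned long word;
          memcpy (&word, src, sizeof word);
          memcpy (dst, &word, sizeof word);
          dst += sizeof word;
          src += sizeof word;
          len -= sizeof word;
        }
    }

  /* Short copies and the tail go byte by byte.  */
  while (len-- > 0)
    *dst++ = *src++;
  return dstp;
}
```

An architecture with cheap unaligned access or a different word size
could reasonably want another threshold, which is why ports override
it.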
Checked with a build and check with run-built-tests=no for all major
Linux ABIs.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
This patch cleans up the power4 strncmp optimization for powerpc64 which
is unlikely to be used anywhere.
Tested on ppc64le with and without --disable-multi-arch flag.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>