Commit Graph

15954 Commits

Author SHA1 Message Date
Joe Ramsay
a8e3ab3074 aarch64: Add vector implementations of log2 routines
A table is also added, which is shared between AdvSIMD and SVE log2.
2023-10-23 15:00:45 +01:00
Joe Ramsay
b39e9db5e3 aarch64: Add vector implementations of exp2 routines
Some routines reuse table from v_exp_data.c
2023-10-23 15:00:45 +01:00
Joe Ramsay
f554334c05 aarch64: Add vector implementations of tan routines
This includes some utility headers for evaluating polynomials using
various schemes.
2023-10-23 15:00:44 +01:00
Stefan Liebler
f5677d9ceb tst-spawn-cgroup.c: Fix argument order of UNSUPPORTED message.
The arguments for "expected" and "got" are mismatched.  Furthermore
this patch is dumping both values as hex.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-10-20 08:46:09 +02:00
Stefan Liebler
97a58d885b s390: Fix undefined behaviour in feenableexcept, fedisableexcept [BZ #30960]
If feenableexcept or fedisableexcept gets excepts=FE_INVALID=0x80
as input, we have a signed left shift: 0x80 << 24 which is not
representable as int and thus is undefined behaviour according to
C standard.

This patch casts excepts as unsigned int before shifting, which is
defined.

For me, the observed undefined behaviour is that the shift is done
with "unsigned"-instructions, which is exactly what we want.
Furthermore, I don't get any exception-flags.

After the fix, the code is using the same instruction sequence as
before.
2023-10-19 14:28:22 +02:00
Florian Weimer
dd32e1db38 Revert "elf: Always call destructors in reverse constructor order (bug 30785)"
This reverts commit 6985865bc3.

Reason for revert:

The commit changes the order of ELF destructor calls too much relative
to what applications expect or can handle.  In particular, during
process exit and _dl_fini, after the revert commit, we no longer call
the destructors of the main program first; that only happens after
some dlopen'ed objects have been destructed.  This robs applications
of an opportunity to influence destructor order by calling dlclose
explicitly from the main program's ELF destructors.  A couple of
different approaches involving reverse constructor order were tried,
and none of them worked really well.  It seems we need to keep the
dependency sorting in _dl_fini.

There is also an ambiguity regarding nested dlopen calls from ELF
constructors: Should those destructors run before or after the object
that called dlopen?  Commit 6985865bc3 used reverse order
of the start of ELF constructor calls for destructors, but arguably
using completion of constructors is more correct.  However, that alone
is not sufficient to address application compatibility issues (it
does not change _dl_fini ordering at all).
2023-10-18 11:30:38 +02:00
Bruno Victal
3333eb55b7 Add LE DSCP code point from RFC-8622.
Signed-off-by: Bruno Victal <mirai@makinata.eu>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-10-17 19:00:27 +02:00
Joseph Myers
ff5d2abd18 Add HWCAP2_MOPS from Linux 6.5 to AArch64 bits/hwcap.h
Linux 6.5 adds a new AArch64 HWCAP2 value, HWCAP2_MOPS.  Add it to
glibc's bits/hwcap.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2023-10-17 13:13:27 +00:00
Joseph Myers
5ef608f364 Add SCM_SECURITY, SCM_PIDFD to bits/socket.h
Linux 6.5 adds a constant SCM_PIDFD (recall that the non-uapi
linux/socket.h, where this constant is added, is in fact a header
providing many constants that are part of the kernel/userspace
interface).  This shows up that SCM_SECURITY, from the same set of
definitions and added in Linux 2.6.17, is also missing from glibc,
although glibc has the first two constants from this set, SCM_RIGHTS
and SCM_CREDENTIALS; add both missing constants to glibc.

Tested for x86_64.
2023-10-16 13:19:26 +00:00
Joseph Myers
2399ab0d20 Add AT_HANDLE_FID from Linux 6.5 to bits/fcntl-linux.h
Linux 6.5 adds a constant AT_HANDLE_FID; add it to glibc.  Because
this is a flag for the function name_to_handle_at declared in
bits/fcntl-linux.h, put the flag there rather than alongside other
AT_* flags in (OS-independent) fcntl.h.

Tested for x86_64.
2023-10-16 13:18:51 +00:00
Andreas Schwab
5aa1ddfcb3 Avoid maybe-uninitialized warning in __kernel_rem_pio2
With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized
warning:

../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2':
../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized]
  364 |           y[0] = fq[0]; y[1] = fq[1]; y[2] = fw;
      |                  ~~^~~

This is similar to the warning that is suppressed in the other branch of
the switch.  Help the compiler knowing that the variable is always
initialized, which also makes the suppression obsolete.
2023-10-16 09:59:32 +02:00
Noah Goldstein
a3c50bf46a x86: Prepare strrchr-evex and strrchr-evex512 for AVX10
This commit refactors `strrchr-evex` and `strrchr-evex512` to use a
common implementation: `strrchr-evex-base.S`.

The motivation is `strrchr-evex` needed to be refactored to not use
64-bit masked registers in preperation for AVX10.

Once vec-width masked register combining was removed, the EVEX and
EVEX512 implementations can easily be implemented in the same file
without any major overhead.

The net result is performance improvements (measured on TGL) for both
`strrchr-evex` and `strrchr-evex512`. Although, note there are some
regressions in the test suite and it may be many of the cases that
make the total-geomean of improvement/regression across bench-strrchr
are cold. The point of the performance measurement is to show there
are no major regressions, but the primary motivation is preperation
for AVX10.

Benchmarks where taken on TGL:
https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html

EVEX geometric_mean(N=5) of all benchmarks New / Original   : 0.74
EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87

Full check passes on x86.
2023-10-06 00:18:55 -05:00
Joe Ramsay
5a4b6f8e4b aarch64: Optimise vecmath logs
* Transpose table layout for improved memory access
* Use half-vector special comparisons for AdvSIMD
* Improve register use near special-case branches
  - Due to the presence of a function call, return value would get
    mov-d out of x0 in order to facilitate PCS. By moving the final
    computation after the branch this can be avoided

Also change SVE routines to use overloaded intrinsics for readability.
2023-10-05 16:54:16 +01:00
Joe Ramsay
480a0dfe1a aarch64: Cosmetic change in SVE exp routines
Use overloaded intrinsics for readability. Codegen does not
change, however while we're bringing the routines up-to-date with
recent improvements to other routines in AOR it is worth copying
this change over as well.
2023-10-05 16:54:00 +01:00
Joe Ramsay
9180160e08 aarch64: Optimize SVE cos & cosf
Saves a mov by ensuring return value does not need to be moved out of
the way before special-case branch. Also change to use overloaded
intrinsics.
2023-10-05 16:53:38 +01:00
Joe Ramsay
8014d1e832 aarch64: Improve vecmath sin routines
* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch

Also use overloaded intrinsics for SVE.
2023-10-05 16:53:06 +01:00
Volker Weißmann
7bb8045ec0 Fix FORTIFY_SOURCE false positive
When -D_FORTIFY_SOURCE=2 was given during compilation,
sprintf and similar functions will check if their
first argument is in read-only memory and exit with
*** %n in writable segment detected ***
otherwise. To check if the memory is read-only, glibc
reads frpm the file "/proc/self/maps". If opening this
file fails due to too many open files (EMFILE), glibc
will now ignore this error.

Fixes [BZ #30932]

Signed-off-by: Volker Weißmann <volker.weissmann@gmx.de>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-10-04 08:07:43 -03:00
Siddhesh Poyarekar
0d5f9ea97f Propagate GLIBC_TUNABLES in setxid binaries
GLIBC_TUNABLES scrubbing happens earlier than envvar scrubbing and some
tunables are required to propagate past setxid boundary, like their
env_alias.  Rely on tunable scrubbing to clean out GLIBC_TUNABLES like
before, restoring behaviour in glibc 2.37 and earlier.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-10-02 15:35:05 -04:00
Kir Kolyshkin
9e4e896f0f Linux: add ST_NOSYMFOLLOW
Linux v5.10 added a mount option MS_NOSYMFOLLOW, which was added to
glibc in commit 0ca21427d9.

Add the corresponding statfs/statvfs flag bit, ST_NOSYMFOLLOW.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-10-02 10:54:27 -03:00
Joe Simmons-Talbott
08e9a60a1a mips: dl-machine-reject-phdr: Get rid of alloca.
Read directly into the mips_abiflags struct rather than reading the
entire segment and using alloca when the passed buffer is not big enough.

Checked with build-many-glibcs.py on mips-linux-gnu

Tested-by: Ying Huang <ying.huang@oss.cipunited.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-10-02 12:55:27 +00:00
Noah Goldstein
d90b43a4ed x86: Add support for AVX10 preset and vec size in cpu-features
This commit add support for the new AVX10 cpu features:
https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf

We add checks for:
    - `AVX10`: Check if AVX10 is present.
    - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support.

`make check` passes and cpuid output was checked against GNR/DMR on an
emulator.
2023-09-29 14:18:42 -05:00
Samuel Thibault
29d4591b07 hurd: Drop REG_GSFS and REG_ESDS from x86_64's ucontext
These are useless on x86_64, and __NGREG was actually wrong with them.
2023-09-28 00:10:13 +02:00
Manjunath Matti
4eac1825ed fegetenv_and_set_rn now uses the builtins provided by GCC.
On powerpc, SET_RESTORE_ROUND uses inline assembly to optimize the
prologue get/save/set rounding mode operations for POWER9 and
later by using 'mffscrn' where possible, this was introduced by
commit f1c56cdff0.

GCC version 14 onwards supports builtins as __builtin_set_fpscr_rn
which now returns the FPSCR fields in a double. This feature is
available on Power9 when the __SET_FPSCR_RN_RETURNS_FPSCR__ macro
is defined.
GCC commit ef3bbc69d15707e4db6e2f198c621effb636cc26 adds
this feature.

Changes are done to use __builtin_set_fpscr_rn instead of mffscrn
or mffscrni in __fe_mffscrn(rn).

Suggested-by: Carl Love <cel@us.ibm.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-27 13:55:36 -03:00
Adhemerval Zanella
551101e824 io: Do not implement fstat with fstatat
AT_EMPTY_PATH is a requirement to implement fstat over fstatat,
however it does not prevent the kernel to read the path argument.
It is not an issue, but on x86-64 with SMAP-capable CPUs the kernel is
forced to perform expensive user memory access.  After that regular
lookup is performed which adds even more overhead.

Instead, issue the fstat syscall directly on LFS fstat implementation
(32 bit architectures will still continue to use statx, which is
required to have 64 bit time_t support).  it should be even a
small performance gain on non x86_64, since there is no need
to handle the path argument.

Checked on x86_64-linux-gnu.
2023-09-27 09:30:24 -03:00
Wilco Dijkstra
6b695e5c62 AArch64: Remove -0.0 check from vector sin
Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf,
improving performance.  Passes regress.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-09-26 13:40:07 +01:00
Florian Weimer
f563971b5b elf: Add dummy declaration of _dl_audit_objclose for !SHARED
This allows us to avoid some #ifdef SHARED conditionals.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-09-26 11:40:12 +02:00
Romain Geissler
ec6b95c330 Fix leak in getaddrinfo introduced by the fix for CVE-2023-4806 [BZ #30843]
This patch fixes a very recently added leak in getaddrinfo.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-25 01:21:51 +01:00
caiyinyu
672b91ba10 Revert "LoongArch: Add glibc.cpu.hwcap support."
This reverts commit a53451559d.
2023-09-21 09:10:11 +08:00
Joseph Myers
457bb77255 Update kernel version to 6.5 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py
and tst-pidfd-consts.py to 6.5.  (There are no new constants covered
by these tests in 6.5 that need any other header changes;
tst-mount-consts.py was updated separately along with a header
constant addition.)

Tested with build-many-glibcs.py.
2023-09-20 13:36:46 +00:00
caiyinyu
a53451559d LoongArch: Add glibc.cpu.hwcap support.
Key Points:
1. On lasx & lsx platforms, We must use _dl_runtime_{profile, resolve}_{lsx, lasx}
   to save vector registers.
2. Via "tunables", users can choose str/mem_{lasx,lsx,unaligned} functions with
   `export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,...`.
   Note: glibc.cpu.hwcaps doesn't affect _dl_runtime_{profile, resolve}_{lsx, lasx}
   selection.

Usage Notes:
1. Only valid inputs: LASX, LSX, UAL. Case-sensitive, comma-separated, no spaces.
2. Example: `export GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL` turns on LASX & UAL.
   Unmentioned features turn off. With default ifunc: lasx > lsx > unaligned >
   aligned > generic, effect is: lasx > unaligned > aligned > generic; lsx off.
3. Incorrect GLIBC_TUNABLES settings will show error messages.
   For example: On lsx platforms, you cannot enable lasx features. If you do
   that, you will get error messages.
4. Valid input examples:
   - GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX: lasx > aligned > generic.
   - GLIBC_TUNABLES=glibc.cpu.hwcaps=LSX,UAL: lsx > unaligned > aligned > generic.
   - GLIBC_TUNABLES=glibc.cpu.hwcaps=LASX,UAL,LASX,UAL,LSX,LASX,UAL: Repetitions
     allowed but not recommended. Results in: lasx > lsx > unaligned > aligned >
     generic.
2023-09-19 09:11:49 +08:00
Siddhesh Poyarekar
973fe93a56 getaddrinfo: Fix use after free in getcanonname (CVE-2023-4806)
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.

The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query.  For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and second
for a v4 lookup.  In this case, if the first call reallocates tmpbuf
enough number of times, resulting in a malloc, th->h_name (that
res->at->name refers to) ends up on a heap allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name.  This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.

Fix this by copying h_name over and freeing it at the end.  This
resolves BZ #30843, which is assigned CVE-2023-4806.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-15 14:38:28 -04:00
dengjianbo
780adf7aea LoongArch: Change to put magic number to .rodata section
Change to put magic number to .rodata section in memmove-lsx, and use
pcalau12i and %pc_lo12 with vld to get the data.
2023-09-15 09:07:47 +08:00
dengjianbo
24279aecf3 LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
According to glibc strrchr microbenchmark test results, this implementation
could reduce the runtime time as following:

Name                Percent of rutime reduced
strrchr-lasx        10%-50%
strrchr-lsx         0%-50%
strrchr-aligned     5%-50%

Generic strrchr is implemented by function strlen + memrchr, the lasx version
will compare with generic strrchr implemented by strlen-lasx + memrchr-lasx,
the lsx version will compare with generic strrchr implemented by strlen-lsx +
memrchr-lsx, the aligned version will compare with generic strrchr implemented
by strlen-aligned + memrchr-generic.
2023-09-15 09:07:47 +08:00
dengjianbo
06251002d4 LoongArch: Add ifunc support for strcpy, stpcpy{aligned, unaligned, lsx, lasx}
According to glibc strcpy and stpcpy microbenchmark test results(changed
to use generic_strcpy and generic_stpcpy instead of strlen + memcpy),
comparing with the generic version, this implementation could reduce the
runtime as following:

Name              Percent of rutime reduced
strcpy-aligned    8%-45%
strcpy-unaligned  8%-48%, comparing with the aligned version, unaligned
                  version takes less instructions to copy the tail of data
		  which length is less than 8. it also has better performance
		  in case src and dest cannot be both aligned with 8bytes
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%
stpcpy-aligned    6%-43%
stpcpy-unaligned  8%-48%
stpcpy-lsx        10%-80%
stpcpy-lasx       10%-87%
2023-09-15 09:07:47 +08:00
caiyinyu
c6c73e136a LoongArch: Replace deprecated $v0 with $a0 to eliminate 'as' Warnings. 2023-09-15 09:07:47 +08:00
caiyinyu
f5242db159 LoongArch: Add lasx/lsx support for _dl_runtime_profile. 2023-09-15 09:07:42 +08:00
Joseph Myers
803f4073cc Add MOVE_MOUNT_BENEATH from Linux 6.5 to sys/mount.h
This patch adds the MOVE_MOUNT_BENEATH constant from Linux 6.5 to
glibc's sys/mount.h and updates tst-mount-consts.py to reflect these
constants being up to date with that Linux kernel version.

Tested with build-many-glibcs.py.
2023-09-14 14:58:15 +00:00
Joseph Myers
72511f539c Update syscall lists for Linux 6.5
Linux 6.5 has one new syscall, cachestat, and also enables the
cacheflush syscall for hppa.  Update syscall-names.list and regenerate
the arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2023-09-12 14:08:53 +00:00
Sergei Trofimovich
073edbdfab
ia64: Work around miscompilation and fix build on ia64's gcc-10 and later
Needed since gcc-10 enabled -fno-common by default.

[In use in Gentoo since gcc-10, no problems observed.
Also discussed with and reviewed by Jessica Clarke from
Debian. Andreas]

Bug: https://bugs.gentoo.org/723268
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-09-11 19:19:46 +02:00
Joe Simmons-Talbott
5f798d38e9 stdio: Remove __libc_message alloca usage
Use a fixed size array instead.  The maximum number of arguments
is set by macro tricks.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-11 16:16:49 +00:00
Samuel Thibault
a43003ebf6 htl: avoid exposing the vm_region symbol 2023-09-09 10:07:39 +02:00
Florian Weimer
6985865bc3 elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse destructor order.  These destructors
can call dlopen.  Previously, new objects do not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activty calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-09-08 12:34:27 +02:00
Aurelien Jarno
434bf72a94 io: Fix record locking contants for powerpc64 with __USE_FILE_OFFSET64
Commit 5f828ff824 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended-up also
changing the value when using __USE_FILE_OFFSET64 causing an API change.

Fix that by also checking that define, restoring the pre
4d0fe291ae commit values:

Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7

With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14

At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.

Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.

Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2023-09-07 21:56:31 +02:00
Joe Simmons-Talbott
955a47a4bf getaddrinfo: Get rid of alloca
Use a scratch_buffer rather than alloca to avoid potential stack
overflow.
2023-09-06 13:33:02 +00:00
Christoph Müllner
3d6fcf1bd7 riscv: Add support for XTheadBb in string-fz[a,i].h
XTheadBb has similar instructions like Zbb, which allow optimized
string processing:
* th.ff0: find-first zero is a CLZ instruction.
* th.tstnbz: Similar like orc.b, but with a bit-inverted result.

The instructions are documented here:
  https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb

These instructions can be found in the T-Head C906 and the C910.

Tested with the string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-06 09:27:43 -03:00
Siddhesh Poyarekar
3bf7bab88b getcanonname: Fix a typo
This code is generally unused in practice since there don't seem to be
any NSS modules that only implement _nss_MOD_gethostbyname2_r and not
_nss_MOD_gethostbyname3_r.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-05 17:04:05 -04:00
Adhemerval Zanella Netto
e7190fc73d linux: Add pidfd_getpid
This interface allows to obtain the associated process ID from the
process file descriptor.  It is done by parsing the procps fdinfo
information.  Its prototype is:

   pid_t pidfd_getpid (int fd)

It returns the associated pid or -1 in case of an error and sets the
errno accordingly.  The possible errno values are those from open, read,
and close (used on procps parsing), along with:

   - EBADF if the FD is negative, does not have a PID associated, or if
     the fdinfo fields contain a value larger than pid_t.

   - EREMOTE if the PID is in a separate namespace.

   - ESRCH if the process is already terminated.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
Adhemerval Zanella Netto
0d6f9f6265 posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
Returning a pidfd allows a process to keep a race-free handle for a
child process, otherwise, the caller will need to either use pidfd_open
(which still might be subject to TOCTOU) or keep the old racy interface
base on pid_t.

To correct use pifd_spawn, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added on Linux
5.4).  If kernel does not support the waitid, pidfd return ENOSYS.
It avoids the need to racy workarounds, such as reading the procfs
fdinfo to get the pid to use along with other wait interfaces.

These interfaces are similar to the posix_spawn and posix_spawnp, with
the only difference being it returns a process file descriptor (int)
instead of a process ID (pid_t).  Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict])

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the return
argument lifetime.  Although on Linux pid_t and int are interchangeable,
POSIX only states that pid_t should be a signed integer.

Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to void rehash posix_spawn API or add a new one. It
also means that both interfaces support the same attribute and file
actions, and a new flag or file action on posix_spawn is also added
automatically for pidfd_spawn.

Also, using posix_spawn plumbing allows the reusing of most of the
current testing with some changes:

  - waitid is used instead of waitpid since it is a more generic
    interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that
    the caller can check for session id directly.  The test now spawns
itself and writes the session id as a file instead.

  - tst-spawn3.c need to know where pidfd_spawn is used so it keeps an
    extra file description unused.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
Adhemerval Zanella Netto
ce2bfb8569 linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26371)
These functions allow to posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, allowing the child process to
be created in a different cgroup version 2.  These are GNU
extensions that are available only for Linux, and also only
for the architectures that implement clone3 wrapper
(HAVE_CLONE3_WRAPPER).

To create a process on a different cgroupv2, one can use the:

  posix_spawnattr_t attr;
  posix_spawnattr_init (&attr);
  posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  posix_spawnattr_setcgroup_np (&attr, cgroup);
  posix_spawn (...)

Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP control
whether the cgroup file descriptor will be used or not with
clone3.

There is no fallback if either clone3 does not support the flag
or if the architecture does not provide the clone3 wrapper, in
this case posix_spawn returns EOPNOTSUPP.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:48 -03:00
Adhemerval Zanella Netto
ad77b1bcca linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc
Not all architectures added clone3 syscall.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
e7d1c58664 mips: Add the clone3 wrapper
It follows the internal signature:

extern int clone3 (struct clone_args *__cl_args, size_t __size,
                   int (*__func) (void *__arg), void *__arg);

Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
b56f7fe79e arm: Add the clone3 wrapper
It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
		    int (*__func) (void *__arg), void *__arg);

Checked on arm-linux-gnueabihf.
2023-09-05 10:15:48 -03:00
Samuel Thibault
8076906109 htl: Fix stack information for main thread
We can easily directly ask the kernel with vm_region rather than
assuming a one-page stack.
2023-09-03 21:11:29 +02:00
Szabolcs Nagy
d2123d6827 elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-01 08:21:37 +01:00
H.J. Lu
1493622f4f x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of
CPUDID leaf 2 was needed to retrieve the complete cache information. The
newer Intel manual has been changed to that it should always return 1
and be ignored.  If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead.  If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all.  This addresses BZ #30643.
2023-08-29 12:57:41 -07:00
dengjianbo
693918b6dd LoongArch: Change loongarch to LoongArch in comments 2023-08-29 10:35:38 +08:00
dengjianbo
ea7698a616 LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
According to glibc memcmp microbenchmark test results(Add generic
memcmp), this implementation have performance improvement
except the length is less than 3, details as below:

Name             Percent of time reduced
memcmp-lasx      16%-74%
memcmp-lsx       20%-50%
memcmp-aligned   5%-20%
2023-08-29 10:35:38 +08:00
dengjianbo
1b1e9b7c10 LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
According to glibc memset microbenchmark test results, for LSX and LASX
versions, A few cases with length less than 8 experience performace
degradation, overall, the LASX version could reduce the runtime about
15% - 75%, LSX version could reduce the runtime about 15%-50%.

The unaligned version uses unaligned memmory access to set data which
length is less than 64 and make address aligned with 8. For this part,
the performace is better than aligned version. Comparing with the generic
version, the performance is close when the length is larger than 128. When
the length is 8-128, the unaligned version could reduce the runtime about
30%-70%, the aligned version could reduce the runtime about 20%-50%.
2023-08-29 10:35:38 +08:00
dengjianbo
55e84dc6ed LoongArch: Add ifunc support for memrchr{lsx, lasx}
According to glibc memrchr microbenchmark, this implementation could reduce
the runtime as following:

Name            Percent of rutime reduced
memrchr-lasx    20%-83%
memrchr-lsx     20%-64%
2023-08-29 10:35:38 +08:00
dengjianbo
60bcb9acbf LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
According to glibc memchr microbenchmark, this implementation could reduce
the runtime as following:

Name               Percent of runtime reduced
memchr-lasx        37%-83%
memchr-lsx         30%-66%
memchr-aligned     0%-15%
2023-08-29 10:35:38 +08:00
dengjianbo
f8664fe215 LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
According to glibc rawmemchr microbenchmark, A few cases tested with
char '\0' experience performance degradation due to the lasx and lsx
versions don't handle the '\0' separately. Overall, rawmemchr-lasx
implementation could reduce the runtime about 40%-80%, rawmemchr-lsx
implementation could reduce the runtime about 40%-66%, rawmemchr-aligned
implementation could reduce the runtime about 20%-40%.
2023-08-29 10:35:38 +08:00
Xi Ruoyao
3efa26749e LoongArch: Micro-optimize LD_PCREL
We are requiring Binutils >= 2.41, so explicit relocation syntax is
always supported by the assembler.  Use it to reduce one instruction.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
aac842d0ed LoongArch: Remove support code for old linker in start.S
We are requiring Binutils >= 2.41, so la.pcrel always works here.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
e757412c3e LoongArch: Simplify the autoconf check for static PIE
We are strictly requiring GAS >= 2.41 now, so we don't need to check
assembler capability anymore.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Kir Kolyshkin
42c960a4f1 Add F_SEAL_EXEC from Linux 6.3 to bits/fcntl-linux.h.
This patch adds the new F_SEAL_EXEC constant from Linux 6.3 (see Linux
commit 6fd7353829c ("mm/memfd: add F_SEAL_EXEC") to bits/fcntl-linux.h.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-28 14:51:39 -03:00
Adhemerval Zanella
87ced255bd m68k: Use M68K_SCALE_AVAILABLE on __mpn_lshift and __mpn_rshift
This patch adds a new macro, M68K_SCALE_AVAILABLE, similar to gmp
scale_available_p (mpn/m68k/m68k-defs.m4) that expand to 1 if a
scale factor can be used in addressing modes.  This is used
instead of __mc68020__ for some optimization decisions.

Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
2023-08-25 10:07:24 -03:00
Adhemerval Zanella
b85880633f m68k: Fix build with -mcpu=68040 or higher (BZ 30740)
GCC currently does not define __mc68020__ for -mcpu=68040 or higher,
which memcpy/memmove assumptions.  Since this memory copy optimization
seems only intended for m68020, disable for other m680X0 variants.

Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
2023-08-25 10:07:24 -03:00
dengjianbo
ddbb74f5c2 LoongArch: Add ifunc support for strncmp{aligned, lsx}
Based on the glibc microbenchmark, only a few short inputs with this
strncmp-aligned and strncmp-lsx implementation experience performance
degradation, overall, strncmp-aligned could reduce the runtime 0%-10%
for aligned comparision, 10%-25% for unaligend comparision, strncmp-lsx
could reduce the runtime about 0%-60%.
2023-08-24 17:19:47 +08:00
dengjianbo
82d9426e4a LoongArch: Add ifunc support for strcmp{aligned, lsx}
Based on the glibc microbenchmark, strcmp-aligned implementation could
reduce the runtime 0%-10% for aligned comparison, 10%-20% for unaligned
comparison, strcmp-lsx implemenation could reduce the runtime 0%-50%.
2023-08-24 17:19:47 +08:00
dengjianbo
e74d959862 LoongArch: Add ifunc support for strnlen{aligned, lsx, lasx}
Based on the glibc microbenchmark, strnlen-aligned implementation could
reduce the runtime more than 10%, strnlen-lsx implementation could reduce
the runtime about 50%-78%, strnlen-lasx implementation could reduce the
runtime about 50%-88%.
2023-08-24 17:19:47 +08:00
Guy-Fleury Iteriteka
1dc0bc8f07 htl: move pthread_attr_setdetachstate into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-11-gfleury@disroot.org>
2023-08-24 01:57:22 +02:00
Guy-Fleury Iteriteka
92a6c26470 htl: move pthread_attr_getdetachstate into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-10-gfleury@disroot.org>
2023-08-24 01:57:17 +02:00
Guy-Fleury Iteriteka
c2c9feebdc htl: move pthread_attr_setschedpolicy into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-9-gfleury@disroot.org>
2023-08-24 01:57:16 +02:00
Guy-Fleury Iteriteka
0f3a39072b htl: move pthread_attr_getschedpolicy into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-8-gfleury@disroot.org>
2023-08-24 01:57:14 +02:00
Guy-Fleury Iteriteka
fb2d92a5b3 htl: move pthread_attr_setinheritsched into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-7-gfleury@disroot.org>
2023-08-24 01:57:13 +02:00
Guy-Fleury Iteriteka
62cf5d2bb3 htl: move pthread_attr_getinheritsched into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-6-gfleury@disroot.org>
2023-08-24 01:57:11 +02:00
Guy-Fleury Iteriteka
79de1a0ca2 htl: move pthread_attr_getschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-5-gfleury@disroot.org>
2023-08-24 01:57:10 +02:00
Guy-Fleury Iteriteka
3caa6362d0 htl: move pthread_setschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-4-gfleury@disroot.org>
2023-08-24 01:57:08 +02:00
Guy-Fleury Iteriteka
a1a942fb5f htl: move pthread_getschedparam into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-3-gfleury@disroot.org>
2023-08-24 01:57:04 +02:00
Guy-Fleury Iteriteka
9dfa256216 htl: move pthread_equal into libc
Signed-off-by: Guy-Fleury Iteriteka <gfleury@disroot.org>
Message-Id: <20230716084414.107245-2-gfleury@disroot.org>
2023-08-24 01:56:57 +02:00
Florian Weimer
65a5112ede Linux: Avoid conflicting types in ld.so --list-diagnostics
The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value.  Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-23 08:12:48 +02:00
H.J. Lu
a8ecb126d4 x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:

        Before       After     Improvement
max     63.349       58.347       8%
min     4.448        5.651        -30%
mean    12.0674      10.336       14%

The minimum code path is

 if (hx < 0x3FDA827A)                          /* x < 0.41422  */
    {
      if (__glibc_unlikely (ax >= 0x3ff00000))           /* x <= -1.0 */
        {
	   ...
        }
      if (__glibc_unlikely (ax < 0x3e200000))           /* |x| < 2**-29 */
        {
          math_force_eval (two54 + x);          /* raise inexact */
          if (ax < 0x3c900000)                  /* |x| < 2**-54 */
            {
	      ...
            }
          else
            return x - x * x * 0.5;

FMA and non-FMA code sequences look similar.  Non-FMA version is slightly
faster.  Since log1p is called by asinh and atanh, it improves asinh
performance by:

        Before       After     Improvement
max     75.645       63.135       16%
min     10.074       10.071       0%
mean    15.9483      14.9089      6%

and improves atanh performance by:

        Before       After     Improvement
max     91.768       75.081       18%
min     15.548       13.883       10%
mean    18.3713      16.8011      8%
2023-08-21 10:44:26 -07:00
Andreas Schwab
ce99601fa8 Remove references to the defunct db2 subdir
The db2 subdir has been removed more than 20 years ago.
2023-08-21 18:20:53 +02:00
Stefan Liebler
f5f96b784b s390x: Fix static PIE condition for toolchain bootstrapping.
The static PIE configure check uses link tests.  When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so.  As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.

See also commit 368b7c614b
S390: Use compile-only instead of also link-tests in configure.
2023-08-18 10:57:59 +02:00
Andreas Schwab
464fd8249e m68k: fix __mpn_lshift and __mpn_rshift for non-68020
From revision 03f3d275d0d6 in the gmp repository.
2023-08-17 21:56:14 +02:00
Sam James
369f373057
sysdeps: tst-bz21269: fix -Wreturn-type
Thanks to Andreas Schwab for reporting.

Fixes: 652b9fdb77
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-17 09:30:57 +01:00
dengjianbo
8944ba483f Loongarch: Add ifunc support for memcpy{aligned, unaligned, lsx, lasx} and memmove{aligned, unaligned, lsx, lasx}
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx       reduces the runtime about 8%-76%
memcpy-lsx        reduces the runtime about 8%-72%
memcpy-unaligned  reduces the runtime of unaligned data copying up to 40%
memcpy-aligned    reduece the runtime of unaligned data copying up to 25%
memmove-lasx      reduces the runtime about 20%-73%
memmove-lsx       reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned   reduces the runtime of unaligned data moving up to 25%
2023-08-17 10:12:18 +08:00
dengjianbo
ba67bc8e0a Loongarch: Add ifunc support for strchr{aligned, lsx, lasx} and strchrnul{aligned, lsx, lasx}
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx       reduces the runtime about 50%-83%
strchr-lsx        reduces the runtime about 30%-67%
strchr-aligned    reduces the runtime about 10%-20%
strchrnul-lasx    reduces the runtime about 50%-83%
strchrnul-lsx     reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
2023-08-17 10:12:18 +08:00
Sam James
652b9fdb77 sysdeps: tst-bz21269: handle ENOSYS & skip appropriately
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.

Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-16 21:01:39 +01:00
Sam James
e0b712dd91 sysdeps: tst-bz21269: fix test parameter
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.

Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
2023-08-16 21:01:37 +01:00
Samuel Thibault
81dcf8b3d1 hurd: Fix strictness of <mach/thread_state.h>
Fixes: db25bc52026f ("hurd: Add prototype for and thus fix _hurdsig_abort_rpcs call")
2023-08-16 00:12:52 +02:00
H.J. Lu
1b214630ce x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:

        Before       After     Improvement
max     70.204       68.054       3%
min     20.709       16.2         22%
mean    22.1221      16.7367      24%

NB: Add

extern long double __expm1l (long double);
extern long double __expm1f128 (long double);

for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
2023-08-14 08:14:19 -07:00
dengjianbo
135407f431 Loongarch: Add ifunc support and add different versions of strlen
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess
2023-08-14 09:47:09 +08:00
dengjianbo
cb7954c4c2 LoongArch: Add minuimum binutils required version
LoongArch glibc can add some LASX/LSX vector instructions codes,
change the required minimum binutils version to 2.41 which could
support vector instructions. HAVE_LOONGARCH_VEC_ASM is removed
accordingly.
2023-08-14 09:47:09 +08:00
dengjianbo
57b2c14272 LoongArch: Redefine macro LEAF/ENTRY.
The following usage of macro LEAF/ENTRY are all feasible:
1. LEAF(fcn) -- the align value of fcn is .align 3(default value)
2. LEAF(fcn, 6) -- the align value of fcn is .align 6
2023-08-14 09:47:09 +08:00
Noah Goldstein
084fb31bc2 x86: Fix incorrect scope of setting shared_per_thread [BZ# 30745]
The:

```
    if (shared_per_thread > 0 && threads > 0)
      shared_per_thread /= threads;
```

Code was accidentally moved to inside the else scope.  This doesn't
match how it was previously (before af992e7abd).

This patch fixes that by putting the division after the `else` block.
2023-08-11 15:33:08 -05:00
H.J. Lu
f6b10ed8e9 x86_64: Add log2 with FMA
On Skylake, it improves log2 bench performance by:

        Before       After     Improvement
max     208.779      63.827       69%
min     9.977        6.55         34%
mean    10.366       6.8191       34%
2023-08-11 07:49:45 -07:00
Florian Weimer
039ff51ac7 nscd: Do not rebuild getaddrinfo (bug 30709)
The nscd daemon caches hosts data from NSS modules verbatim, without
filtering protocol families or sorting them (otherwise separate caches
would be needed for certain ai_flags combinations).  The cache
implementation is complete separate from the getaddrinfo code.  This
means that rebuilding getaddrinfo is not needed.  The only function
actually used is __bump_nl_timestamp from check_pf.c, and this change
moves it into nscd/connections.c.

Tested on x86_64-linux-gnu with -fexceptions, built with
build-many-glibcs.py.  I also backported this patch into a distribution
that still supports nscd and verified manually that caching still works.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-08-11 10:10:16 +02:00
H.J. Lu
881546979d x86_64: Sort fpu/multiarch/Makefile
Sort Makefile variables using scripts/sort-makefile-lines.py.

No code generation changes observed in libm.  No regressions on x86_64.
2023-08-10 11:23:25 -07:00
Adhemerval Zanella
c73c96a4a1 i686: Fix build with --disable-multiarch
Since i686 provides the fortified wrappers for memcpy, mempcpy,
memmove, and memset on the same string implementation, the static
build tries to optimized it by not tying the fortified wrappers
to string routine (to avoid pulling the fortify function if
they are not required).

Checked on i686-linux-gnu building with different option:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-08-10 10:29:29 -03:00