Commit Graph

41115 Commits

Author SHA1 Message Date
Stefan Liebler
ed23449dac elf: Remove _DL_HWCAP_PLATFORM
Remove the definitions of _DL_HWCAP_PLATFORM as those are not used
anymore after removal in elf/dl-cache.c:search_cache().
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
374c8b4483 elf: Remove platform strings in dl-procinfo.c
Remove the platform strings in dl-procinfo.c, where the
implementation of _dl_string_platform() was also removed.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
8faada8302 elf: Remove _dl_string_platform
Except on powerpc, where the returned integer is stored in the tcb,
and in the diagnostics output, there are no users anymore.

Thus this patch removes the diagnostics output and
_dl_string_platform for all other platforms.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
c5aa5fd40a elf: Remove loading legacy hwcaps/platform entries in dynamic loader
The legacy hwcaps mechanism was removed with glibc 2.37:
See this commit series:
- d178c67535
x86_64: Remove platform directory library loading test
- 6099908fb8
elf: Remove legacy hwcaps support from the dynamic loader
- b78ff5a25d
elf: Remove legacy hwcaps support from ldconfig
- 4a7094119c
elf: Remove hwcap parameter from add_to_cache signature
- cfbf883db3
elf: Remove hwcap and bits_hwcap fields from struct cache_entry
- 78d9a1620b
Add NEWS entry for legacy hwcaps removal
- ab40f20364
elf: Remove _dl_string_hwcap
- e76369ed63
elf: Simplify output of hwcap subdirectories in ld.so help

According to Florian Weimer, this was an oversight and should also
have been removed.

As ldconfig does not generate ld.so.cache entries with hwcap/platform
bits in the hwcap-field anymore, this patch now skips those entries.
Thus currently only named-hwcap-entries and the default entries are
allowed.
For named-hwcap entries bit 62 is set and also the isa-level bits can
be set.
For the default entries the hwcap-field is 0.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
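The filtering rule described in the commit above (default entries have a hwcap field of 0, named-hwcap entries have bit 62 set, everything else is skipped) can be sketched as follows. This is a minimal illustration, not the dynamic loader's code: the macro and function names are hypothetical, and the isa-level bits are not decoded here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name, for illustration only: bit 62 of the hwcap field
   marks a named-hwcap entry, as stated in the commit message.  */
#define SKETCH_CACHE_HWCAP_EXTENSION (UINT64_C (1) << 62)

/* Return true if a cache entry with this hwcap field would still be
   considered: either a default entry (field is 0) or a named-hwcap
   entry (bit 62 set, possibly together with other bits such as the
   isa-level bits).  Legacy hwcap/platform entries are rejected.  */
bool
sketch_cache_entry_is_usable (uint64_t hwcap)
{
  if (hwcap == 0)
    return true;
  return (hwcap & SKETCH_CACHE_HWCAP_EXTENSION) != 0;
}
```

Under these assumptions, an entry whose hwcap field holds only legacy bits (for example 0x4) is skipped, while 0 and any value with bit 62 set are accepted.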
Stefan Liebler
f14b6dfc87 x86: Remove HWCAP_START and HWCAP_COUNT
Neither define is used anymore.  Those were only used for
_dl_string_hwcap(), which itself was removed with commit
ab40f20364
"elf: Remove _dl_string_hwcap"

Just clean up.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
YunQiang Su
eaf4fc516a math: Update mips32/mips64 ulps for log2p1 2024-06-17 21:45:53 +02:00
Andreas K. Hüttel
98ffc1bfeb Convert to autoconf 2.72 (vanilla release, no distribution patches)
As discussed at the patch review meeting

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-17 21:15:28 +02:00
Joseph Myers
7ec903e028 Implement C23 exp2m1, exp10m1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).

As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future.  Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.

exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode.  Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.

The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.

The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 16:31:49 +00:00
Joseph Myers
55eb99e9a9 Implement C23 log10p1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).

This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range).  The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:48:13 +00:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Florian Weimer
ca38eff280 support: Include <limits.h> for NAME_MAX use in temp_file.c 2024-06-17 15:14:05 +02:00
Florian Weimer
cb65d66104 support: Include <stdlib.h> for atoi use in support_wait_for_thread_exit 2024-06-17 15:14:05 +02:00
Jan Kurik
6739bbb4df Extend tst-getconf.sh test with NPROCESSORS_CONF and NPROCESSORS_ONLN
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-06-17 14:18:16 +02:00
Mike FABIAN
3ea79f5085 Define ISO 639-3 "ltg" (Latgalian) and add ltg_LV locale
Resolves: BZ # 31411

References:
https://iso639-3.sil.org/code/ltg
https://en.wikipedia.org/wiki/Latgalian_language
https://github.com/unicode-org/cldr/blob/main/common/main/ltg.xml
2024-06-17 10:53:16 +02:00
Paul Eggert
6059938728 INSTALL: regenerate 2024-06-15 10:32:34 -07:00
Paul Eggert
7c1ec1b7d0 Minor code improvement to timespec_subtract example
This saves a few instructions.
BORROW cannot be -1, since NSEC_DIFF is at most 999999999.
Idea taken from Gnulib, here:
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=fe33f943054b93af8b965ce6564b8713b0979a21
2024-06-15 08:53:50 -07:00
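The borrow logic mentioned in the commit above can be sketched like this. It is a simplified illustration, not the manual's example: the function name is hypothetical, overflow of the seconds field is deliberately not checked, and both inputs are assumed to have tv_nsec in the range 0 to 999999999.

```c
#include <stdbool.h>
#include <time.h>

/* Store X - Y in *R.  NSEC_DIFF lies in (-1000000000, 1000000000), so
   BORROW is either 0 or 1 and a single correction step suffices.
   Simplification: overflow in the tv_sec subtraction is not checked.  */
void
sketch_timespec_subtract (struct timespec *r,
                          struct timespec x, struct timespec y)
{
  long int nsec_diff = x.tv_nsec - y.tv_nsec;
  bool borrow = nsec_diff < 0;
  r->tv_nsec = nsec_diff + 1000000000L * borrow;
  r->tv_sec = x.tv_sec - y.tv_sec - borrow;
}
```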
Paul Eggert
ee768a30fe Modernize and fix doc’s “Date and Time” (BZ 31876)
POSIX.1-2024 (now official) specifies tm_gmtoff and tm_zone.
This is a good time to update the manual’s “Date and Time”
chapter so I went through it, fixed some outdated
stuff that had been in there for decades, and improved it to match
POSIX.1-2024 better and to clarify some implementation-defined
behavior.  Glibc already conforms to POSIX.1-2024 in these matters, so
this is merely a documentation change.

* manual/examples/strftim.c: Use snprintf instead of now-deprecated
  function asctime.  Check for localtime failure.  Simplify by using
  puts instead of fputs.  Prefer ‘buf, sizeof buf’ to less-obvious
  ‘buffer, SIZE’.

* manual/examples/timespec_subtract.c: Modernize to use struct
  timespec not struct timeval, and rename from timeval_subtract.c.
  All uses changed.  Check for overflow.  Do not check for negative
  return value, which ought to be OK since negative time_t is OK.
  Use GNU indenting style.

* manual/time.texi:

  Document CLOCKS_PER_SEC, TIME_UTC, timespec_get, timespec_getres,
  strftime_l.

  Document the storage lifetime of tm_zone and of tzname.

  Caution against use of tzname, timezone and daylight, saying that
  these variables have unspecified values when TZ is geographic.
  This is what glibc actually does (contrary to what the manual said
  before this patch), and POSIX is planned to say the same thing
  <https://austingroupbugs.net/view.php?id=1816>.
  Also say that directly accessing the variables is not thread-safe.

  Say that localtime_r and ctime_r don’t necessarily set time zone
  state.  Similarly, in the tzset documentation, say that it is called
  by ctime, localtime, mktime, strftime, not that it is called by all
  time conversion functions that depend on the time zone.

  Say that tm_isdst is useful mostly just for mktime, and that
  other uses should prefer tm_gmtoff and tm_zone instead.

  Do not say that strftime ignores tm_gmtoff and tm_zone, because
  it doesn’t do that.

  Document what gmtime does to tm_gmtoff and tm_zone.

  Say that asctime, asctime_r, ctime, and ctime_r are now deprecated
  and/or obsolescent, and that behavior is undefined if the year is <
  1000 or > 9999.  Document strftime before these now-obsolescent
  functions, so that readers see the useful function first.

  Coin the terms “geographical format” and “proleptic format” for the
  two main formats of TZ settings, to simplify exposition.  Use this
  wording consistently.

  Update top-level proleptic syntax to match POSIX.1-2024, which glibc
  already implements.  Document the angle-bracket quoted forms of time
  zone abbreviations in proleptic TZ.  Say that time zone abbreviations
  can contain only ASCII alphanumerics, ‘+’, and ‘-’.

  Document what happens if the proleptic form specifies a DST
  abbreviation and offset but omits the rules.  POSIX says this is
  implementation-defined so we need to document it.  Although this
  documentation mentions ‘posixrules’ tersely, we need to rethink
  ‘posixrules’ since I think it stops working after 2038.

  Clarify wording about TZ settings beginning with ‘;’.

  Say that timegm is in ISO C (as of C23).

  Say that POSIX.1-2024 removed gettimeofday.

  Say that tm_gmtoff and tm_zone are extensions to ISO C, which is
  clearer than saying they are invisible in a strict ISO C environment,
  and gives us more wiggle room if we want to make them visible in
  strict ISO C, something that ISO C allows.

  Drop mention of old standards like POSIX.1c and POSIX.2-1992 in the
  text when the history is so old that it’s no longer useful in a
  general-purpose manual.

  Define Coordinated Universal Time (UTC), time zone, time zone ruleset,
  and POSIX Epoch, and use these phrases more consistently.

  Improve TZ examples to show more variety, and to reflect current
  practice and timestamps.  Remove obsolete example about Argentina.
  Add an example for Ireland.

  Don’t rely on GCC extensions when explaining ctime_r.

  Do not say that difftime produces the mathematically correct result,
  since it might be inexact.

  For clock_t don’t say “as in the example above” when there is no
  such example, and don’t say that casting to double works “properly
  and consistently no matter what”, as it suffers from rounding and
  overflow.

  Don’t say broken-down time is not useful for calculations; it’s
  merely painful.

  Say that UTC is not defined before 1960.

  Rename Time Zone Functions to Time Zone State.  All uses changed.

  Update Internet RFC 822 → 5322, 1305 → 5905.  Drop specific years of
  ISO 8601 as they don’t matter.

  Minor style changes: @code{"..."} → @t{"..."} to avoid overquoting in
  info files, @code → @env for environment variables, Daylight Saving
  Time → daylight saving time, white space → whitespace, prime meridian
  → Prime Meridian.
2024-06-15 08:53:50 -07:00
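Two points from the commit above, preferring strftime over the now-obsolescent asctime family and checking for localtime failure, can be combined in a short sketch. The helper name and the format string are choices made here for illustration; they are not taken from manual/examples/strftim.c.

```c
#include <stddef.h>
#include <time.h>

/* Format T as local time into BUF (of SIZE bytes).  Returns the number
   of bytes written, excluding the terminating null byte, or 0 if
   localtime fails or the result does not fit.  */
size_t
sketch_format_time (char *buf, size_t size, time_t t)
{
  struct tm *tm = localtime (&t);
  if (tm == NULL)
    return 0;
  return strftime (buf, size, "%Y-%m-%d %H:%M:%S", tm);
}
```

A caller would typically pass `buf, sizeof buf` for the first two arguments and treat a return value of 0 as an error.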
Andreas K. Hüttel
41d6461484 manual: minor language fix (bz 31340)
Resolves: https://sourceware.org/bugzilla/show_bug.cgi?id=31340
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-15 15:42:29 +02:00
Noah Goldstein
5b54a33435 x86: Fix value for x86_memset_non_temporal_threshold when it is undesirable
When we don't want to use non-temporal stores for memset, we set
`x86_memset_non_temporal_threshold` to SIZE_MAX.

The current code, however, was using `maximum_non_temporal_threshold`
as the upper bound, which is `SIZE_MAX >> 4`, so we ended up with a
value of `0`.

The fix is to just use `SIZE_MAX` as the upper bound when setting the
tunable.
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-06-14 17:25:05 -05:00
H.J. Lu
0b7f7842f8 elf: Change module-names to modules-names in comments
module-names should be modules-names.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-06-14 13:29:21 -07:00
Andreas K. Hüttel
3953b5b88f i686: Regenerate ulps
Linux pinacolada 6.6.32-gentoo #1 SMP PREEMPT Sun Jun  9 14:18:17 CEST 2024 x86_64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz GenuineIntel GNU/Linux
32bit build for multilib environment

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-14 21:24:24 +02:00
Xi Ruoyao
97aa7b7346 LoongArch: Ensure sp 16-byte aligned for tlsdesc
"ADDI sp, sp, 24" and "ADDI sp, sp, SZFCSREG" (SZFCSREG = 4) are
misaligning the stack: the ABI mandates a 16-byte alignment.  Fix it
by changing the first one to "ADDI sp, sp, 32", and reuse the spare 4th
slot for saving fcsr.

Reported-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2024-06-14 10:14:54 +08:00
Florian Weimer
868ab8923a resolv: Track single-request fallback via _res._flags (bug 31476)
This avoids changing _res.options, which interferes with change
detection as part of automatic reloading of /etc/resolv.conf.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-06-13 18:56:30 +02:00
H.J. Lu
29807a271e x86: Properly set x86 minimum ISA level [BZ #31883]
Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL
defined as

(__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4)

Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
is defined.  There are no changes in config.h nor in config.make on x86-64.
On i386, -march=x86-64-v2 with GCC generates

 #define MINIMUM_X86_ISA_LEVEL 2

in config.h and

have-x86-isa-level = 2

in config.make.  This fixes BZ #31883.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-12 14:27:54 -07:00
DJ Delorie
8859607eaa tunables: sort tunables list (BZ 30027)
Sort tunables list at the time it's generated.  Note: adding new
tunables will cause other tunable IDs to change, but that was
the case before anyway.  POSIX does not guarantee the order of "foo
in bar" AWK operators, so the order was indeterminate before anyway.
Even depending on the order to be the same across multiple calls,
such as in this script, is undefined, so sorting the list resolves
that also.

Note that sorting is not dependent on the user's locale.
2024-06-12 14:45:18 -04:00
Adhemerval Zanella
7edd3814b0 linux: Remove __stack_prot
The __stack_prot is used by Linux to make the stack executable if
a module requires it.  It is also marked as RELRO, which requires
changing the segment permission to RW to update it.

Also, there is no need to keep track of the flags: either the stack
will have the default permission of the ABI or should be changed to
PROT_READ | PROT_WRITE | PROT_EXEC.  The only additional flag,
PROT_GROWSDOWN or PROT_GROWSUP, is Linux only and can be deduced
from _STACK_GROWS_DOWN/_STACK_GROWS_UP.

Also, the check_consistency function was already removed some time
ago.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-12 15:25:54 -03:00
Philip Kaludercic
e7ac92e6ca <stdio.h>: Acknowledge that getdelim/getline are in POSIX
These comments were written in 2003 (added in 2c008571c3), predating
the addition of getdelim(3)/getline(3) in POSIX.1-2008.

Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-11 22:17:12 +01:00
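Since getline has been part of POSIX since POSIX.1-2008, it can be used portably by requesting that standard level. The following is a small sketch; the helper name is hypothetical and chosen only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Return the length in bytes of the longest line in STREAM, counting
   the trailing newline if present, or 0 for an empty stream.  getline
   grows the buffer as needed, so no fixed line length is assumed.  */
size_t
sketch_longest_line (FILE *stream)
{
  char *line = NULL;
  size_t cap = 0;
  size_t longest = 0;
  ssize_t len;

  while ((len = getline (&line, &cap, stream)) != -1)
    if ((size_t) len > longest)
      longest = (size_t) len;

  /* The buffer allocated by getline must be freed by the caller.  */
  free (line);
  return longest;
}
```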
Mike FABIAN
10733d6a72 localedata: Lowercase day and abday in cs_CZ
Resolves: BZ # 25119

Also to sync with CLDR
2024-06-11 10:33:54 +02:00
H.J. Lu
09bc68b0ac x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ #31867]
On i386, set the default minimum ISA level to 0, not 1 (baseline which
includes SSE2).  There are no changes in config.h nor in config.make on
x86-64.  This fixes BZ #31867.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Tested-by: Ian Jordan <immoloism@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-11 00:10:08 -07:00
Joe Damato
bef2a827a5 x86: Enable non-temporal memset tunable for AMD
In commit 46b5e98ef6 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.

See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different strategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-10 16:18:18 -05:00
Samuel Thibault
5968125f55 hurd: Fix getxattr/listxattr returning ERANGE
The manpage says that when the passed size is zero, they should set the
expected size and return 0. ERANGE shall be returned only when the non-zero
passed size is not large enough.
2024-06-10 22:01:40 +02:00
Samuel Thibault
ed06248019 hurd: Fix setxattr return value on replacing
When XATTR_REPLACE is set we shall succeed when the value already
exists, and fail with ENODATA otherwise, instead of the converse.
2024-06-10 22:00:20 +02:00
Samuel Thibault
ba5a23422a hurd: Fix getxattr("gnu.translator") returning ENODATA
When no translator is set, __file_get_translator would return EINVAL
which is a confusing value. Better check for a passive translation
before getting the value.
2024-06-10 21:57:53 +02:00
Samuel Thibault
74f9ee3b91 hurd: Fix lsetxattr return value
The manpage says that lsetxattr returns 0 on success, like setxattr.
2024-06-10 21:56:13 +02:00
David Paleino
eb37015879 localedata: add new locales scn_IT
Signed-off-by: David Paleino <dapal@debian.org>
2024-06-07 15:45:18 +02:00
Avinal Kumar
54c1efdac5 support: Fix typo in xgetsockname error message
The error message in xgetsockname was incorrectly referring to a
different function.  This commit fixes that.

Suggested-by: Arjun Shankar <arjun@redhat.com>
Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-06-05 09:58:55 -04:00
Mohamed Akram
2f7246015c getconf: Add NPROCESSORS_{CONF,ONLN} [BZ #31661]
These are required by the upcoming POSIX standard and are available on
other platforms.

Link: https://austingroupbugs.net/view.php?id=339
Signed-off-by: Mohamed Akram <mohd.akram@outlook.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-06-05 14:57:54 +02:00
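From C, the same two values have long been available through sysconf. The sketch below assumes that the new getconf names correspond to the existing `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` constants; the function names are hypothetical, and falling back to 1 on error is a choice made for this example.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if the query fails.  */
long int
sketch_online_processors (void)
{
  long int n = sysconf (_SC_NPROCESSORS_ONLN);
  return n > 0 ? n : 1;
}

/* Number of processors configured, or 1 if the query fails.  */
long int
sketch_configured_processors (void)
{
  long int n = sysconf (_SC_NPROCESSORS_CONF);
  return n > 0 ? n : 1;
}
```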
Joe Damato
92c270d32c Linux: Add epoll ioctls
As of Linux kernel 6.9, some ioctls and a parameters structure have been
introduced which allow user programs to control whether a particular
epoll context will busy poll.

Update the headers to include these for the convenience of user apps.

The ioctls were added in Linux kernel 6.9 commit 18e2bf0edf4dd
("eventpoll: Add epoll ioctl for epoll_params") [1] to
include/uapi/linux/eventpoll.h.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?h=v6.9&id=18e2bf0edf4dd

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-04 12:09:15 -05:00
Paul Eggert
400bdb5c85 Improve doc for time_t range (BZ 31808) 2024-06-04 09:04:04 -07:00
Paul Eggert
cafef3eb21 difftime can throw exceptions
difftime can signal an inexact conversion when converting to double,
so it should not be marked as pure or nothrow (BZ 31808).

Although we could do something more complicated, in which difftime is
plain on modern platforms but const and nothrow on obsolescent
platforms with 32-bit time_t, it hardly seems worth the trouble.
difftime is used so rarely that it's not worth taking pains to
optimize calls to it on obsolescent platforms.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-06-04 09:04:04 -07:00
sayan paul
127fc56152 malloc: New test to check malloc alternate path using memory obstruction
The test aims to ensure that malloc uses the alternate path to
allocate memory when sbrk() or brk() fails.  To achieve this,
the test first creates an obstruction at the current program break,
tests that obstruction with a failing sbrk(), then checks if malloc
still returns a valid pointer, thus inferring that malloc() used
mmap() instead of brk() or sbrk() to allocate the memory.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Reviewed-by: Zack Weinberg <zack@owlfolio.org>
2024-06-04 18:00:29 +02:00
Szabolcs Nagy
2a9943b4a0 math: Fix exp10 undefined left shift
Left shift of ki is undefined when ki<0.  Copy the logic from exp,
which uses unsigned arithmetic, to fix it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-04 15:33:26 +01:00
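The underlying C rule is that left-shifting a negative signed integer is undefined behavior, while the same shift on an unsigned value is well defined (bits shifted out are discarded). A minimal sketch of the unsigned approach follows; the function name is hypothetical and this is not the code in exp10.

```c
#include <stdint.h>

/* Shift KI left by SHIFT bits (SHIFT must be less than 64).  The
   conversion to uint64_t happens before the shift, so the operation is
   defined even when KI is negative; the result holds the low 64 bits
   of the two's complement pattern.  */
uint64_t
sketch_shift_bits (int64_t ki, unsigned int shift)
{
  return (uint64_t) ki << shift;
}
```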
Florian Weimer
d0106b6ae2 libio: Test for fdopen memory leak without SEEK_END support (bug 31840)
The bug report used /dev/mem, but /proc/self/mem works as well
(if available).
2024-06-04 16:09:33 +02:00
Andreas Schwab
b2c3ee3724 Remove memory leak in fdopen (bug 31840)
Deallocate the memory for the FILE structure when seeking to the end fails
in append mode.

Fixes: ea33158c96 ("Fix offset caching for streams and use it for ftell (BZ #16680)")
2024-06-04 14:42:06 +02:00
Joseph Myers
1d441791cb Add new AArch64 HWCAP2 definitions from Linux 6.9 to bits/hwcap.h
Linux 6.9 adds 15 new HWCAP2_* values for AArch64; add them to
bits/hwcap.h in glibc.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2024-06-04 12:25:05 +00:00
Joseph Myers
9063b32b3c Add more NT_ARM_* constants from Linux kernel to elf.h
Linux 6.9 adds the ELF note type NT_ARM_FPMR.  Add this to glibc's
elf.h, along with the previously missed NT_ARM_SSVE, NT_ARM_ZA and
NT_ARM_ZT (added in older kernel versions).

Tested for x86_64.
2024-06-04 12:24:37 +00:00
Florian Weimer
992daa0b4b stdlib: Describe __cxa_finalize usage in function comment
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
2024-06-03 19:04:58 +02:00
Florian Weimer
afe42e935b elf: Avoid some free (NULL) calls in _dl_update_slotinfo
This has been confirmed to work around some interposed mallocs.  Here
is a discussion of the impact on the test ust/libc-wrapper/test_libc-wrapper
in lttng-tools:

  New TLS usage in libgcc_s.so.1, compatibility impact
  <https://inbox.sourceware.org/libc-alpha/8734v1ieke.fsf@oldenburg.str.redhat.com/>

Reportedly, this patch also papers over a similar issue when tcmalloc
2.9.1 is not compiled with -ftls-model=initial-exec.  Of course the
goal really should be to compile mallocs with the initial-exec TLS
model, but this commit appears to be a useful interim workaround.

Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-06-03 19:02:19 +02:00
Noah Goldstein
46b5e98ef6 x86: Add seperate non-temporal tunable for memset
The tuning for non-temporal stores for memset vs memcpy is not always
the same. This includes both the exact value and whether non-temporal
stores are profitable at all for a given arch.

This patch adds `x86_memset_non_temporal_threshold`. Currently we
disable non-temporal stores for non-Intel vendors as the only
benchmarks showing its benefit have been on Intel hardware.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-30 12:36:09 -05:00
Noah Goldstein
5bf0ab8057 x86: Improve large memset perf with non-temporal stores [RHEL-29312]
Previously we used `rep stosb` for all medium/large memsets. This is
notably worse than non-temporal stores for large (above a
few MBs) memsets.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
for data using different strategies for large memset on ICX and SKX.

Using non-temporal stores can be up to 3x faster on ICX and 2x faster
on SKX. Historically, these numbers would not have been so good
because of the zero-over-zero writeback optimization that `rep stosb`
is able to do. But, the zero-over-zero writeback optimization has been
removed as a potential side-channel attack, so there is no longer any
good reason to only rely on `rep stosb` for large memsets. On the flip
side, non-temporal writes can avoid data in their RFO requests, saving
memory bandwidth.

All of the other changes to the file are to re-organize the
code-blocks to maintain "good" alignment given the new code added in
the `L(stosb_local)` case.

The results from running the GLIBC memset benchmarks on TGL-client for
N=20 runs:

Geometric Mean across the suite New / Old EVEX256: 0.979
Geometric Mean across the suite New / Old EVEX512: 0.979
Geometric Mean across the suite New / Old AVX2   : 0.986
Geometric Mean across the suite New / Old SSE2   : 0.979

Most of the cases are essentially unchanged, this is mostly to show
that adding the non-temporal case didn't add any regressions to the
other cases.

The results on the memset-large benchmark suite on TGL-client for N=20
runs:

Geometric Mean across the suite New / Old EVEX256: 0.926
Geometric Mean across the suite New / Old EVEX512: 0.925
Geometric Mean across the suite New / Old AVX2   : 0.928
Geometric Mean across the suite New / Old SSE2   : 0.924

So roughly a 7.5% speedup. This is lower than what we see on servers
(likely because clients typically have faster single-core bandwidth so
saving bandwidth on RFOs is less impactful), but still advantageous.

Full test-suite passes on x86_64 w/ and w/o multiarch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-30 12:36:09 -05:00