Convert to scratch_buffers to avoid potential stack overflow.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch adds a new macro, M68K_SCALE_AVAILABLE, similar to gmp
scale_available_p (mpn/m68k/m68k-defs.m4) that expand to 1 if a
scale factor can be used in addressing modes. This is used
instead of __mc68020__ for some optimization decisions.
Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
GCC currently does not define __mc68020__ for -mcpu=68040 or higher,
which memcpy/memmove assumptions. Since this memory copy optimization
seems only intended for m68020, disable for other m680X0 variants.
Checked on a build for m68k-linux-gnu target mc68020 and mc68040.
Parts of elf/tst-rtld-list-diagnostics.py have been copied from
scripts/tst-ld-trace.py.
The abnf module is entirely optional and used to verify the
ABNF grammar as included in the manual.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Based on the glibc microbenchmark, only a few short inputs with this
strncmp-aligned and strncmp-lsx implementation experience performance
degradation, overall, strncmp-aligned could reduce the runtime 0%-10%
for aligned comparision, 10%-25% for unaligend comparision, strncmp-lsx
could reduce the runtime about 0%-60%.
Based on the glibc microbenchmark, strcmp-aligned implementation could
reduce the runtime 0%-10% for aligned comparison, 10%-20% for unaligned
comparison, strcmp-lsx implemenation could reduce the runtime 0%-50%.
Based on the glibc microbenchmark, strnlen-aligned implementation could
reduce the runtime more than 10%, strnlen-lsx implementation could reduce
the runtime about 50%-78%, strnlen-lasx implementation could reduce the
runtime about 50%-88%.
The path auxv[*].a_val could either be an integer or a string,
depending on the a_type value. Use a separate field, a_val_string, to
simplify mechanical parsing of the --list-diagnostics output.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On Skylake, it changes log1p bench performance by:
Before After Improvement
max 63.349 58.347 8%
min 4.448 5.651 -30%
mean 12.0674 10.336 14%
The minimum code path is
if (hx < 0x3FDA827A) /* x < 0.41422 */
{
if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */
{
...
}
if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */
{
math_force_eval (two54 + x); /* raise inexact */
if (ax < 0x3c900000) /* |x| < 2**-54 */
{
...
}
else
return x - x * x * 0.5;
FMA and non-FMA code sequences look similar. Non-FMA version is slightly
faster. Since log1p is called by asinh and atanh, it improves asinh
performance by:
Before After Improvement
max 75.645 63.135 16%
min 10.074 10.071 0%
mean 15.9483 14.9089 6%
and improves atanh performance by:
Before After Improvement
max 91.768 75.081 18%
min 15.548 13.883 10%
mean 18.3713 16.8011 8%
When building with fortify enabled, GCC < 12 issues a warning on the
fortify strncat wrapper might overflow the destination buffer (the
failure is tied to -Werror).
Checked on ppc64 and x86_64.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The static PIE configure check uses link tests. When bootstrapping
a cross-toolchain, the link tests fail due to missing crt-files /
libc.so. As we explicitely want to test an issue in binutils (ld),
we now also explicitely check for known linker versions.
See also commit 368b7c614b
S390: Use compile-only instead of also link-tests in configure.
These implementations improve the time to copy data in the glibc
microbenchmark as below:
memcpy-lasx reduces the runtime about 8%-76%
memcpy-lsx reduces the runtime about 8%-72%
memcpy-unaligned reduces the runtime of unaligned data copying up to 40%
memcpy-aligned reduece the runtime of unaligned data copying up to 25%
memmove-lasx reduces the runtime about 20%-73%
memmove-lsx reduces the runtime about 50%
memmove-unaligned reduces the runtime of unaligned data moving up to 40%
memmove-aligned reduces the runtime of unaligned data moving up to 25%
These implementations improve the time to run strchr{nul}
microbenchmark in glibc as below:
strchr-lasx reduces the runtime about 50%-83%
strchr-lsx reduces the runtime about 30%-67%
strchr-aligned reduces the runtime about 10%-20%
strchrnul-lasx reduces the runtime about 50%-83%
strchrnul-lsx reduces the runtime about 36%-65%
strchrnul-aligned reduces the runtime about 6%-10%
SYS_modify_ldt requires CONFIG_MODIFY_LDT_SYSCALL to be set in the kernel, which
some distributions may disable for hardening. Check if that's the case (unset)
and mark the test as UNSUPPORTED if so.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
All callers pass 1 or 0x11 anyway (same meaning according to man page),
but still.
Reviewed-by: DJ Delorie <dj@redhat.com>
Signed-off-by: Sam James <sam@gentoo.org>
On i686 f_type is an i32 so the test fails when that has the top bit set.
Explicitly cast to u32.
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli@nabijaczleweli.xyz>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Commit 78ceef25d6 ("configure: Remove
--enable-all-warnings option") removed it due to a missing +.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
On the test workload (mpv --cache=yes with VP9 video decoding), the
bin scanning has a very poor success rate (less than 2%). The tcache
scanning has about 50% success rate, so keep that.
Update comments in malloc/tst-memalign-2 to indicate the purpose
of the tests. Even with the scanning removed, the additional
merging opportunities since commit 542b110585
("malloc: Enable merging of remainders in memalign (bug 30723)")
are sufficient to pass the existing large bins test.
Remove leftover variables from _int_free from refactoring in the
same commit.
Reviewed-by: DJ Delorie <dj@redhat.com>
On Skylake, it improves expm1 bench performance by:
Before After Improvement
max 70.204 68.054 3%
min 20.709 16.2 22%
mean 22.1221 16.7367 24%
NB: Add
extern long double __expm1l (long double);
extern long double __expm1f128 (long double);
for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
strlen-lasx is implemeted by LASX simd instructions(256bit)
strlen-lsx is implemeted by LSX simd instructions(128bit)
strlen-align is implemented by LA basic instructions and never use unaligned memory acess
LoongArch glibc can add some LASX/LSX vector instructions codes,
change the required minimum binutils version to 2.41 which could
support vector instructions. HAVE_LOONGARCH_VEC_ASM is removed
accordingly.
The following usage of macro LEAF/ENTRY are all feasible:
1. LEAF(fcn) -- the align value of fcn is .align 3(default value)
2. LEAF(fcn, 6) -- the align value of fcn is .align 6
The:
```
if (shared_per_thread > 0 && threads > 0)
shared_per_thread /= threads;
```
Code was accidentally moved to inside the else scope. This doesn't
match how it was previously (before af992e7abd).
This patch fixes that by putting the division after the `else` block.
Previously, calling _int_free from _int_memalign could put remainders
into the tcache or into fastbins, where they are invisible to the
low-level allocator. This results in missed merge opportunities
because once these freed chunks become available to the low-level
allocator, further memalign allocations (even of the same size are)
likely obstructing merges.
Furthermore, during forwards merging in _int_memalign, do not
completely give up when the remainder is too small to serve as a
chunk on its own. We can still give it back if it can be merged
with the following unused chunk. This makes it more likely that
memalign calls in a loop achieve a compact memory layout,
independently of initial heap layout.
Drop some useless (unsigned long) casts along the way, and tweak
the style to more closely match GNU on changed lines.
Reviewed-by: DJ Delorie <dj@redhat.com>
The nscd daemon caches hosts data from NSS modules verbatim, without
filtering protocol families or sorting them (otherwise separate caches
would be needed for certain ai_flags combinations). The cache
implementation is complete separate from the getaddrinfo code. This
means that rebuilding getaddrinfo is not needed. The only function
actually used is __bump_nl_timestamp from check_pf.c, and this change
moves it into nscd/connections.c.
Tested on x86_64-linux-gnu with -fexceptions, built with
build-many-glibcs.py. I also backported this patch into a distribution
that still supports nscd and verified manually that caching still works.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Since i686 provides the fortified wrappers for memcpy, mempcpy,
memmove, and memset on the same string implementation, the static
build tries to optimized it by not tying the fortified wrappers
to string routine (to avoid pulling the fortify function if
they are not required).
Checked on i686-linux-gnu building with different option:
default and --disable-multi-arch plus default, --disable-default-pie,
--enable-fortify-source={2,3}, and --enable-fortify-source={2,3}
with --disable-default-pie.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>