Commit Graph

40361 Commits

Author SHA1 Message Date
dengjianbo
780adf7aea LoongArch: Change to put magic number to .rodata section
Put the magic number in the .rodata section in memmove-lsx, and use
pcalau12i and %pc_lo12 with vld to load the data.
2023-09-15 09:07:47 +08:00
dengjianbo
24279aecf3 LoongArch: Add ifunc support for strrchr{aligned, lsx, lasx}
According to glibc strrchr microbenchmark test results, this implementation
could reduce the runtime as follows:

Name                Percent of runtime reduced
strrchr-lasx        10%-50%
strrchr-lsx         0%-50%
strrchr-aligned     5%-50%

Generic strrchr is implemented as strlen + memrchr.  The lasx version is
compared with generic strrchr built on strlen-lasx + memrchr-lasx, the lsx
version with generic strrchr built on strlen-lsx + memrchr-lsx, and the
aligned version with generic strrchr built on strlen-aligned +
memrchr-generic.
2023-09-15 09:07:47 +08:00
dengjianbo
06251002d4 LoongArch: Add ifunc support for strcpy, stpcpy{aligned, unaligned, lsx, lasx}
According to glibc strcpy and stpcpy microbenchmark test results (changed
to use generic_strcpy and generic_stpcpy instead of strlen + memcpy),
this implementation could reduce the runtime compared with the generic
version as follows:

Name              Percent of runtime reduced
strcpy-aligned    8%-45%
strcpy-unaligned  8%-48%; compared with the aligned version, the unaligned
                  version takes fewer instructions to copy a data tail
                  shorter than 8 bytes. It also performs better when src
                  and dest cannot both be aligned to 8 bytes
strcpy-lsx        20%-80%
strcpy-lasx       15%-86%
stpcpy-aligned    6%-43%
stpcpy-unaligned  8%-48%
stpcpy-lsx        10%-80%
stpcpy-lasx       10%-87%
2023-09-15 09:07:47 +08:00
caiyinyu
c6c73e136a LoongArch: Replace deprecated $v0 with $a0 to eliminate 'as' Warnings. 2023-09-15 09:07:47 +08:00
caiyinyu
f5242db159 LoongArch: Add lasx/lsx support for _dl_runtime_profile. 2023-09-15 09:07:42 +08:00
Joseph Myers
803f4073cc Add MOVE_MOUNT_BENEATH from Linux 6.5 to sys/mount.h
This patch adds the MOVE_MOUNT_BENEATH constant from Linux 6.5 to
glibc's sys/mount.h and updates tst-mount-consts.py to reflect these
constants being up to date with that Linux kernel version.

Tested with build-many-glibcs.py.
2023-09-14 14:58:15 +00:00
Florian Weimer
bd77dd7e73 CVE-2023-4527: Stack read overflow with large TCP responses in no-aaaa mode
Without passing alt_dns_packet_buffer, __res_context_search can only
store 2048 bytes (what fits into dns_packet_buffer).  However,
the function returns the total packet size, and the subsequent
DNS parsing code in _nss_dns_gethostbyname4_r reads beyond the end
of the stack-allocated buffer.

Fixes commit f282cdbe7f ("resolv: Implement no-aaaa
stub resolver option") and bug 30842.
2023-09-13 14:10:56 +02:00
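The length mismatch described in bd77dd7e73 can be sketched in isolation. This is a minimal illustration, not the resolver code and not necessarily how the commit fixes the bug: `fake_query` and `usable_length` are hypothetical names, and only the 2048-byte capacity comes from the commit message. The query stores at most the buffer capacity but reports the full response size, so a caller must not parse more than it actually holds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PACKET_BUFFER_SIZE = 2048 };  /* capacity named in the commit message */

/* Hypothetical stand-in for a resolver query: copies at most CAPACITY
   bytes of RESPONSE into BUF but returns the total response size.  */
static size_t
fake_query (const unsigned char *response, size_t response_len,
            unsigned char *buf, size_t capacity)
{
  assert (response != NULL && buf != NULL);
  size_t stored = response_len < capacity ? response_len : capacity;
  memcpy (buf, response, stored);
  return response_len;
}

/* Number of bytes the caller may safely parse from its own buffer.  */
static size_t
usable_length (size_t reported_len, size_t capacity)
{
  return reported_len < capacity ? reported_len : capacity;
}
```

A parser that trusts the value returned by `fake_query` for a 3000-byte response would read 952 bytes past a 2048-byte stack buffer; bounding the length with `usable_length`, or supplying a larger alternate buffer, avoids that read.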
John David Anglin
c8fa383f4c resolv: Fix some unaligned accesses in resolver [BZ #30750]
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-09-13 11:04:41 +00:00
Joseph Myers
72511f539c Update syscall lists for Linux 6.5
Linux 6.5 has one new syscall, cachestat, and also enables the
cacheflush syscall for hppa.  Update syscall-names.list and regenerate
the arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2023-09-12 14:08:53 +00:00
Sergei Trofimovich
073edbdfab ia64: Work around miscompilation and fix build on ia64's gcc-10 and later
Needed since gcc-10 enabled -fno-common by default.

[In use in Gentoo since gcc-10, no problems observed.
Also discussed with and reviewed by Jessica Clarke from
Debian. Andreas]

Bug: https://bugs.gentoo.org/723268
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-09-11 19:19:46 +02:00
Joe Simmons-Talbott
5f798d38e9 stdio: Remove __libc_message alloca usage
Use a fixed size array instead.  The maximum number of arguments
is set by macro tricks.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-11 16:16:49 +00:00
Samuel Thibault
a43003ebf6 htl: avoid exposing the vm_region symbol 2023-09-09 10:07:39 +02:00
Adam Jackson
8cb69e0543 libio: Fix oversized __io_vtables
IO_VTABLES_LEN is the size of the struct array in bytes, not the number
of __IO_jump_t's in the array. Drops just under 384kb from .rodata on
LP64 machines.

Fixes: 3020f72618 ("libio: Remove the usage of __libc_IO_vtables")
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
2023-09-08 23:00:04 +02:00
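The bytes-versus-elements mix-up fixed by 8cb69e0543 can be sketched with a toy table. The names `struct jump_table`, `TABLE_COUNT`, and `TABLES_LEN_BYTES` are hypothetical stand-ins, not the libio definitions; the point is only that using a byte length as an array bound multiplies the storage by the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a jump-table entry; the real struct differs.  */
struct jump_table { void (*slots[4]) (void); };

enum { TABLE_COUNT = 3 };

/* Size of the whole array in bytes, not a number of elements.  */
#define TABLES_LEN_BYTES (TABLE_COUNT * sizeof (struct jump_table))

/* Wrong: the byte length is used as an element count.  */
struct jump_table oversized[TABLES_LEN_BYTES];

/* Right: exactly TABLE_COUNT elements.  */
struct jump_table correct[TABLE_COUNT];

static_assert (sizeof correct == TABLES_LEN_BYTES,
               "the correct array occupies exactly the byte length");

/* Storage wasted by the oversized declaration.  */
static size_t
wasted_bytes (void)
{
  return sizeof oversized - sizeof correct;
}
```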
Joseph Myers
deeaa5e90f Use Linux 6.5 in build-many-glibcs.py
This patch makes build-many-glibcs.py use Linux 6.5.

Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).
2023-09-08 20:04:42 +00:00
Florian Weimer
53df2ce688 elf: Remove unused l_text_end field from struct link_map
It is a left-over from commit 52a01100ad
("elf: Remove ad-hoc restrictions on dlopen callers [BZ #22787]").

When backporting commit 6985865bc3
("elf: Always call destructors in reverse constructor order
(bug 30785)"), we can move the l_init_called_next field to this
place, so that the internal GLIBC_PRIVATE ABI does not change.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2023-09-08 18:39:20 +02:00
Florian Weimer
6985865bc3 elf: Always call destructors in reverse constructor order (bug 30785)
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors.  Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles.  (The force_first handling in
_dl_sort_maps is not effective for dlclose.)  After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.

A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.

In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated.  (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.

After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse constructor order.  These destructors
can call dlopen.  Previously, new objects did not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them.  Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)

After _dl_init_called_list traversal, two more loops follow.  The
processing order changes to the original link map order in the
namespace.  Previously, dependency order was used.  The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.

The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list.  The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.

tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications.  The new approach for checking la_activity should
make it clearer that la_activity calls come in pairs around namespace
updates.

The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.

There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.

Fixes commit 1df71d32fe ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-09-08 12:34:27 +02:00
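The ordering idea in 6985865bc3 can be illustrated with a much-simplified list. This is a sketch under stated assumptions, not the dynamic loader code: `struct object`, `record_init`, and `run_destructors` are hypothetical, and only the names `_dl_init_called_list` and `l_init_called_next` in the commit messages inspired the field names. Each object is pushed onto the front of a list right before its constructors would run, so walking the list from its head visits objects in reverse constructor order without any re-sorting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for a link map.  */
struct object
{
  const char *name;
  struct object *init_called_next;  /* next older entry in the list */
};

/* Head of the list; the most recently initialized object comes first.  */
static struct object *init_called_list;

/* Record OBJ right before its constructors would run.  */
static void
record_init (struct object *obj)
{
  assert (obj != NULL);
  obj->init_called_next = init_called_list;
  init_called_list = obj;
}

/* Walk the list from the head, which visits objects in reverse
   constructor order.  Stores up to MAX names in OUT, returns the count.  */
static size_t
run_destructors (const char **out, size_t max)
{
  size_t n = 0;
  for (struct object *o = init_called_list; o != NULL && n < max;
       o = o->init_called_next)
    out[n++] = o->name;
  return n;
}
```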
Aurelien Jarno
434bf72a94 io: Fix record locking constants for powerpc64 with __USE_FILE_OFFSET64
Commit 5f828ff824 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended up also
changing the value when using __USE_FILE_OFFSET64, causing an API change.

Fix that by also checking that define, restoring the pre
4d0fe291ae commit values:

Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7

With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14

At the same time, it has been noticed that there was no test for io lock
with __USE_FILE_OFFSET64, so just add one.

Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.

Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2023-09-07 21:56:31 +02:00
Florian Weimer
d99609a3eb manual: Fix ld.so diagnostics menu/section structure
And shorten the section/node names a bit, so that the menu
entries become easier to read.

Texinfo 6.5 fails to process the previous structure:

./dynlink.texi:56: warning: node `Dynamic Linker Introspection' is
  next for `Dynamic Linker Diagnostics' in sectioning but not in menu
./dynlink.texi:56: warning: node up `Dynamic Linker Diagnostics'
  in menu `Dynamic Linker Invocation' and
  in sectioning `Dynamic Linker' differ
./dynlink.texi:1: node `Dynamic Linker' lacks menu item for
  `Dynamic Linker Diagnostics' despite being its Up target
./dynlink.texi:226: warning: node prev `Dynamic Linker Introspection' in menu `Dynamic Linker Invocation'
  and in sectioning `Dynamic Linker Diagnostics' differ

Texinfo 7.0.2 does not report an error.

This fixes commit f21962ddfc
("manual: Document ld.so --list-diagnostics output").

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-09-06 18:37:21 +02:00
Joe Simmons-Talbott
955a47a4bf getaddrinfo: Get rid of alloca
Use a scratch_buffer rather than alloca to avoid potential stack
overflow.
2023-09-06 13:33:02 +00:00
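The alloca replacement pattern used in 955a47a4bf can be sketched with a portable analogue. glibc's scratch_buffer is an internal interface, so this example does not use it; `struct tmpbuf` and its functions are hypothetical names that only mimic the idea: small requests use storage embedded in the object (typically on the stack), and larger ones fall back to malloc, so an attacker-influenced size cannot overflow the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical analogue of a scratch buffer.  */
struct tmpbuf
{
  void *data;              /* current storage */
  size_t length;           /* usable bytes at DATA */
  char inline_space[256];  /* used until a larger size is requested */
};

static void
tmpbuf_init (struct tmpbuf *b)
{
  b->data = b->inline_space;
  b->length = sizeof b->inline_space;
}

/* Release heap storage, if any, and return to the inline storage.  */
static void
tmpbuf_free (struct tmpbuf *b)
{
  if (b->data != b->inline_space)
    free (b->data);
  tmpbuf_init (b);
}

/* Ensure at least SIZE bytes are available; old contents are discarded.
   Returns false on allocation failure, leaving the inline storage usable.  */
static bool
tmpbuf_reserve (struct tmpbuf *b, size_t size)
{
  assert (b != NULL);
  if (size <= b->length)
    return true;
  tmpbuf_free (b);
  void *p = malloc (size);
  if (p == NULL)
    return false;
  b->data = p;
  b->length = size;
  return true;
}
```

Unlike alloca, a failed allocation here is reported to the caller, who can return an error instead of crashing.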
Christoph Müllner
3d6fcf1bd7 riscv: Add support for XTheadBb in string-fz[a,i].h
XTheadBb has instructions similar to those of Zbb, which allow optimized
string processing:
* th.ff0: find-first zero is a CLZ instruction.
* th.tstnbz: Similar to orc.b, but with a bit-inverted result.

The instructions are documented here:
  https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb

These instructions can be found in the T-Head C906 and the C910.

Tested with the string tests.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-06 09:27:43 -03:00
Siddhesh Poyarekar
3bf7bab88b getcanonname: Fix a typo
This code is generally unused in practice since there don't seem to be
any NSS modules that only implement _nss_MOD_gethostbyname2_r and not
_nss_MOD_gethostbyname3_r.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-09-05 17:04:05 -04:00
Adhemerval Zanella Netto
e7190fc73d linux: Add pidfd_getpid
This interface obtains the process ID associated with a process file
descriptor.  It is done by parsing the procfs fdinfo information.  Its
prototype is:

   pid_t pidfd_getpid (int fd)

It returns the associated pid, or -1 in case of an error, setting errno
accordingly.  The possible errno values are those from open, read,
and close (used for procfs parsing), along with:

   - EBADF if the FD is negative, does not have a PID associated, or if
     the fdinfo fields contain a value larger than pid_t.

   - EREMOTE if the PID is in a separate namespace.

   - ESRCH if the process is already terminated.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
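The fdinfo parsing that e7190fc73d describes can be sketched on an in-memory string. This is an illustration only, not the glibc implementation: `parse_fdinfo_pid` is a hypothetical helper, and the assumption that the kernel reports the process ID on a line starting with `Pid:` is not stated in the commit message. The sketch does no file I/O and does not map special values to errno.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find the "Pid:" line in fdinfo-style TEXT and
   store its numeric value in *PID.  Returns false if there is no such
   line or the value is not a number.  */
static bool
parse_fdinfo_pid (const char *text, long *pid)
{
  assert (text != NULL && pid != NULL);
  const char *line = text;
  while (line != NULL && *line != '\0')
    {
      if (strncmp (line, "Pid:", 4) == 0)
        {
          char *end;
          long value = strtol (line + 4, &end, 10);
          /* Reject an empty field or trailing garbage on the line.  */
          if (end == line + 4 || (*end != '\n' && *end != '\0'))
            return false;
          *pid = value;
          return true;
        }
      /* Advance to the start of the next line, if any.  */
      const char *nl = strchr (line, '\n');
      line = nl != NULL ? nl + 1 : NULL;
    }
  return false;
}
```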
Adhemerval Zanella Netto
0d6f9f6265 posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
Returning a pidfd allows a process to keep a race-free handle for a
child process.  Otherwise, the caller needs to either use pidfd_open
(which still might be subject to TOCTOU) or keep the old racy interface
based on pid_t.

To correctly use pidfd_spawn, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux
5.4).  If the kernel does not support that waitid mode, pidfd_spawn
returns ENOSYS.  This avoids the need for racy workarounds, such as
reading the procfs fdinfo to get the pid to use along with other wait
interfaces.

These interfaces are similar to the posix_spawn and posix_spawnp, with
the only difference being it returns a process file descriptor (int)
instead of a process ID (pid_t).  Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict])

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the return
argument lifetime.  Although on Linux pid_t and int are interchangeable,
POSIX only states that pid_t should be a signed integer.

Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a new
one.  It also means that both interfaces support the same attributes and
file actions, and a new flag or file action on posix_spawn is also added
automatically for pidfd_spawn.

Also, using posix_spawn plumbing allows the reusing of most of the
current testing with some changes:

  - waitid is used instead of waitpid since it is a more generic
    interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that
    the caller can check for session id directly.  The test now spawns
    itself and writes the session id as a file instead.

  - tst-spawn3.c needs to know where pidfd_spawn is used so it keeps an
    extra file descriptor unused.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:59 -03:00
Adhemerval Zanella Netto
ce2bfb8569 linux: Add posix_spawnattr_{get, set}cgroup_np (BZ 26371)
These functions allow posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, allowing the child process to
be created in a different version 2 cgroup.  These are GNU
extensions that are available only for Linux, and also only
for the architectures that implement the clone3 wrapper
(HAVE_CLONE3_WRAPPER).

To create a process in a different cgroupv2, one can use:

  posix_spawnattr_t attr;
  posix_spawnattr_init (&attr);
  posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
  posix_spawnattr_setcgroup_np (&attr, cgroup);
  posix_spawn (...)

Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
whether the cgroup file descriptor will be used or not with
clone3.

There is no fallback if either clone3 does not support the flag
or the architecture does not provide the clone3 wrapper; in
this case posix_spawn returns EOPNOTSUPP.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 13:08:48 -03:00
Adhemerval Zanella Netto
ad77b1bcca linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and sparc
Not all architectures added clone3 syscall.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
e7d1c58664 mips: Add the clone3 wrapper
It follows the internal signature:

extern int clone3 (struct clone_args *__cl_args, size_t __size,
                   int (*__func) (void *__arg), void *__arg);

Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
2023-09-05 10:15:48 -03:00
Adhemerval Zanella Netto
b56f7fe79e arm: Add the clone3 wrapper
It follows the internal signature:

  extern int clone3 (struct clone_args *__cl_args, size_t __size,
                     int (*__func) (void *__arg), void *__arg);

Checked on arm-linux-gnueabihf.
2023-09-05 10:15:48 -03:00
Samuel Thibault
4be913652c hurd: Avoid including thread_state.h in installed header
thread_state.h is not actually installed. It was only needed for
struct machine_thread_all_state, which we can just declare, actually.
2023-09-05 11:58:26 +02:00
Samuel Thibault
6333a6014f __call_tls_dtors: Use call_function_static_weak 2023-09-04 20:03:37 +02:00
Bruno Haible
2897b231a6 intl: Treat C.UTF-8 locale like C locale (BZ# 16621)
The wiki page https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
says that "Setting LC_ALL=C.UTF-8 will ignore LANGUAGE just like it
does with LC_ALL=C." This patch implements it.

* intl/dcigettext.c (guess_category_value): Treat C.<encoding> locale
like the C locale.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-09-04 15:31:36 +02:00
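The locale check described in 2897b231a6 amounts to a name comparison. The following is a minimal sketch, not the code in intl/dcigettext.c; `is_c_like_locale` is a hypothetical helper name. It treats "C" and any "C.<encoding>" name as the C locale, which is the case where LANGUAGE would be ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true for "C" and for "C.<encoding>" names such
   as "C.UTF-8"; false for every other locale name.  */
static bool
is_c_like_locale (const char *name)
{
  assert (name != NULL);
  return strcmp (name, "C") == 0 || strncmp (name, "C.", 2) == 0;
}
```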
Samuel Thibault
8076906109 htl: Fix stack information for main thread
We can easily directly ask the kernel with vm_region rather than
assuming a one-page stack.
2023-09-03 21:11:29 +02:00
Samuel Thibault
89ade8d8cb htl: thread_local destructors support 2023-09-03 15:23:56 +02:00
Szabolcs Nagy
d2123d6827 elf: Fix slow tls access after dlopen [BZ #19924]
In short: __tls_get_addr checks the global generation counter and if
the current dtv is older then _dl_update_slotinfo updates dtv up to the
generation of the accessed module. So if the global generation is newer
than generation of the module then __tls_get_addr keeps hitting the
slow dtv update path. The dtv update path includes a number of checks
to see if any update is needed and this already causes measurable tls
access slow down after dlopen.

It may be possible to detect up-to-date dtv faster.  But if there are
many modules loaded (> TLS_SLOTINFO_SURPLUS) then this requires at
least walking the slotinfo list.

This patch tries to update the dtv to the global generation instead, so
after a dlopen the tls access slow path is only hit once.  The modules
with larger generation than the accessed one were not necessarily
synchronized before, so additional synchronization is needed.

This patch uses acquire/release synchronization when accessing the
generation counter.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this. Tested by Adhemerval Zanella on aarch64,
powerpc, sparc, x86 who reported that it fixes the performance issue
of bug 19924.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-09-01 08:21:37 +01:00
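The effect described in d2123d6827 can be shown with a toy model. Everything here is hypothetical (`toy_dtv`, `toy_dlopen`, `access_old`, `access_new`) and it ignores the real dtv layout and the slotinfo list; it only counts how often the slow path is taken when the dtv catches up to the accessed module's generation versus the global generation. The counter uses C11 acquire/release operations, in the spirit of the synchronization the commit mentions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not glibc internals.  */
static _Atomic size_t global_generation;

struct toy_dtv
{
  size_t generation;      /* generation this thread's dtv has caught up to */
  size_t slow_path_hits;  /* how often the update path was taken */
};

/* Publish a new generation, as a dlopen would.  */
static void
toy_dlopen (void)
{
  atomic_fetch_add_explicit (&global_generation, 1, memory_order_release);
}

/* Old behavior: on mismatch, catch up only to the accessed module's
   generation, so the mismatch with the global counter can persist.  */
static void
access_old (struct toy_dtv *dtv, size_t module_generation)
{
  assert (dtv != NULL);
  size_t gen = atomic_load_explicit (&global_generation,
                                     memory_order_acquire);
  if (dtv->generation != gen)
    {
      dtv->slow_path_hits++;
      if (dtv->generation < module_generation)
        dtv->generation = module_generation;
    }
}

/* New behavior: on mismatch, catch up to the global generation.  */
static void
access_new (struct toy_dtv *dtv)
{
  assert (dtv != NULL);
  size_t gen = atomic_load_explicit (&global_generation,
                                     memory_order_acquire);
  if (dtv->generation != gen)
    {
      dtv->slow_path_hits++;
      dtv->generation = gen;
    }
}
```

With three loads published and a module from the first load accessed repeatedly, the old variant takes the slow path on every access, while the new variant takes it once.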
H.J. Lu
1493622f4f x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643]
The old Intel software developer manual specified that the low byte of
EAX of CPUID leaf 2 returned 1, which indicated the number of rounds of
CPUID leaf 2 needed to retrieve the complete cache information. The
newer Intel manual has been changed to say that it should always return 1
and be ignored.  If the lower byte isn't 1, CPUID leaf 2 can't be used.
In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead.  If
CPUID leaf 4 doesn't contain the cache information, cache information
isn't available at all.  This addresses BZ #30643.
2023-08-29 12:57:41 -07:00
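The decision described in 1493622f4f can be written as a small pure function. This is a sketch of the rule as stated in the commit message, not the code in glibc's x86 cache setup; the enum and `pick_cache_info_source` are hypothetical names, and the caller is assumed to have already executed CPUID and checked whether leaf 4 reports any cache.

```c
#include <stdbool.h>
#include <stdint.h>

enum cache_info_source
{
  CACHE_INFO_LEAF2,  /* use CPUID leaf 2 descriptors */
  CACHE_INFO_LEAF4,  /* fall back to CPUID leaf 4 */
  CACHE_INFO_NONE    /* no cache information available */
};

/* Hypothetical decision helper.  LEAF2_EAX is the EAX value returned by
   CPUID leaf 2; LEAF4_HAS_INFO says whether leaf 4 reports cache data.  */
static enum cache_info_source
pick_cache_info_source (uint32_t leaf2_eax, bool leaf4_has_info)
{
  /* Leaf 2 is usable only when the low byte of EAX is exactly 1.  */
  if ((leaf2_eax & 0xff) == 1)
    return CACHE_INFO_LEAF2;
  return leaf4_has_info ? CACHE_INFO_LEAF4 : CACHE_INFO_NONE;
}
```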
lijianglin
e1d3312015 add GB18030-2022 charmap and test the entire GB18030 charmap [BZ #30243]
Support GB18030-2022 by adding and changing some transcoding
relationships.  Details are as follows:
Add 25 transcoding relationships:
  UE81E 0x82359037
  UE826 0x82359038
  UE82B 0x82359039
  UE82C 0x82359130
  UE832 0x82359131
  UE843 0x82359132
  UE854 0x82359133
  UE864 0x82359134
  UE78D 0x84318236
  UE78F 0x84318237
  UE78E 0x84318238
  UE790 0x84318239
  UE791 0x84318330
  UE792 0x84318331
  UE793 0x84318332
  UE794 0x84318333
  UE795 0x84318334
  UE796 0x84318335
  UE816 0xfe51
  UE817 0xfe52
  UE818 0xfe53
  UE831 0xfe6c
  UE83B 0xfe76
  UE855 0xfe91
Change 6 transcoding relationships:
  U20087 0x95329031
  U20089 0x95329033
  U200CC 0x95329730
  U215D7 0x9536b937
  U2298F 0x9630ba35
  U241FE 0x9635b630
Test the entire GB18030 charmap, not only the Unicode BMP part.

Co-authored-by: yangyanchao <yangyanchao6@huawei.com>
Co-authored-by: liqingqing <liqingqing3@huawei.com>
Co-authored-by: Bruno Haible <bruno@clisp.org>
Reviewed-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Mike FABIAN <mfabian@redhat.com>
2023-08-29 19:02:30 +02:00
Joseph Myers
d3c34a2dd9 Use GMP 6.3.0, MPFR 4.2.1 in build-many-glibcs.py
This patch makes build-many-glibcs.py use the new GMP 6.3.0 and MPFR
4.2.1 releases.

Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).
2023-08-29 14:11:35 +00:00
Colin Leroy-Mira
dfe8c44588 localedata: Translit common emojis to smileys [BZ #30649]
Add common emojis to the translit-able characters (mostly
faces and hearts), and translit them to old-fashioned
smileys.

Signed-off-by: Colin Leroy-Mira <colin@colino.net>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-08-29 09:31:23 +02:00
Florian Weimer
c00b984fcd nscd: Skip unusable entries in first pass in prune_cache (bug 30800)
Previously, if an entry was marked unusable for any reason, but had
not timed out yet, the assert would trigger.

One way to get into such a state is if a data change is detected during
re-validation of an entry.  This causes the entry to be marked as not
usable.  If nscd exits soon after that, then the clock jumps
backwards, and nscd is restarted, the cache re-validation run after
startup triggers the removed assert.

The change is more complicated than just the removal of the assert
because entries marked as not usable should be garbage-collected in
the second pass.  To make this happen, it is necessary to update some
book-keeping data.

Reviewed-by: DJ Delorie <dj@redhat.com>
2023-08-29 08:28:38 +02:00
dengjianbo
693918b6dd LoongArch: Change loongarch to LoongArch in comments 2023-08-29 10:35:38 +08:00
dengjianbo
ea7698a616 LoongArch: Add ifunc support for memcmp{aligned, lsx, lasx}
According to glibc memcmp microbenchmark test results (with a generic
memcmp added), this implementation improves performance except when
the length is less than 3.  Details are below:

Name             Percent of time reduced
memcmp-lasx      16%-74%
memcmp-lsx       20%-50%
memcmp-aligned   5%-20%
2023-08-29 10:35:38 +08:00
dengjianbo
1b1e9b7c10 LoongArch: Add ifunc support for memset{aligned, unaligned, lsx, lasx}
According to glibc memset microbenchmark test results, for the LSX and
LASX versions, a few cases with length less than 8 experience performance
degradation.  Overall, the LASX version could reduce the runtime by about
15%-75%, and the LSX version by about 15%-50%.

The unaligned version uses unaligned memory accesses to set data whose
length is less than 64 and to make the address aligned to 8. For this
part, the performance is better than the aligned version. Compared with
the generic version, the performance is close when the length is larger
than 128. When the length is 8-128, the unaligned version could reduce
the runtime by about 30%-70%, and the aligned version by about 20%-50%.
2023-08-29 10:35:38 +08:00
dengjianbo
55e84dc6ed LoongArch: Add ifunc support for memrchr{lsx, lasx}
According to glibc memrchr microbenchmark, this implementation could reduce
the runtime as follows:

Name            Percent of runtime reduced
memrchr-lasx    20%-83%
memrchr-lsx     20%-64%
2023-08-29 10:35:38 +08:00
dengjianbo
60bcb9acbf LoongArch: Add ifunc support for memchr{aligned, lsx, lasx}
According to glibc memchr microbenchmark, this implementation could reduce
the runtime as follows:

Name               Percent of runtime reduced
memchr-lasx        37%-83%
memchr-lsx         30%-66%
memchr-aligned     0%-15%
2023-08-29 10:35:38 +08:00
dengjianbo
f8664fe215 LoongArch: Add ifunc support for rawmemchr{aligned, lsx, lasx}
According to glibc rawmemchr microbenchmark, a few cases tested with
char '\0' experience performance degradation because the lasx and lsx
versions don't handle '\0' separately. Overall, the rawmemchr-lasx
implementation could reduce the runtime by about 40%-80%, the
rawmemchr-lsx implementation by about 40%-66%, and the rawmemchr-aligned
implementation by about 20%-40%.
2023-08-29 10:35:38 +08:00
Xi Ruoyao
3efa26749e LoongArch: Micro-optimize LD_PCREL
We are requiring Binutils >= 2.41, so explicit relocation syntax is
always supported by the assembler.  Use it to reduce one instruction.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
aac842d0ed LoongArch: Remove support code for old linker in start.S
We are requiring Binutils >= 2.41, so la.pcrel always works here.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Xi Ruoyao
e757412c3e LoongArch: Simplify the autoconf check for static PIE
We are strictly requiring GAS >= 2.41 now, so we don't need to check
assembler capability anymore.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2023-08-29 10:35:38 +08:00
Kir Kolyshkin
42c960a4f1 Add F_SEAL_EXEC from Linux 6.3 to bits/fcntl-linux.h.
This patch adds the new F_SEAL_EXEC constant from Linux 6.3 (see Linux
commit 6fd7353829c ("mm/memfd: add F_SEAL_EXEC")) to bits/fcntl-linux.h.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-28 14:51:39 -03:00
Joe Simmons-Talbott
46924663bd argp-parse: Get rid of alloca
Even though the alloca usage is relatively small and fixed size the code
can be written without using alloca.  Convert to local variables.

Checked on x86_64-linux-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-28 16:49:27 +00:00
Joe Simmons-Talbott
4d8b093933 gencat: Get rid of alloca.
Convert to scratch_buffers to avoid potential stack overflow.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-08-28 16:42:53 +00:00