Commit Graph

40790 Commits

Author SHA1 Message Date
Adhemerval Zanella
2a969b53c0 elf: Do not duplicate the GLIBC_TUNABLES string
The tunable parsing duplicates the tunable environment variable so it
null-terminates each one since it simplifies the later parsing. It has
the drawback of adding another point of failure (__minimal_malloc
failing), and the memory copy requires tuning the compiler to avoid mem
operations calls.

The parsing now tracks the tunable start and its size. The
dl-tunable-parse.h adds helper functions to help parsing, like a strcmp
that also checks for size and an iterator for suboptions that are
comma-separated (used on hwcap parsing by x86, powerpc, and s390x).

Since the environment variable is allocated on the stack by the kernel,
it is safe to keep the references to the suboptions for later parsing
of string tunables (as done by set_hwcaps by multiple architectures).

Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and
aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-19 13:25:45 -03:00
Joseph Myers
5275fc784c Do not build sparc32 libgcc functions into static libc
Since GCC commit f31a019d1161ec78846473da743aedf49cca8c27 "Emit
funcall external declarations only if actually used.", the glibc
testsuite has failed to build for 32-bit SPARC with GCC mainline.

/scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: /scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/32/libgcc.a(_divsi3.o): in function `.div':
/scratch/jmyers/glibc-bot/src/gcc/libgcc/config/sparc/lb1spc.S:138: multiple definition of `.div'; /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/libc.a(sdiv.o):/scratch/jmyers/glibc-bot/src/glibc/gnulib/../sysdeps/sparc/sparc32/sparcv9/sdiv.S:13: first defined here
/scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: disabling relaxation; it will not work with multiple definitions
collect2: error: ld returned 1 exit status
make[3]: *** [../Rules:298: /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/nptl/tst-cancel24-static] Error 1

https://sourceware.org/pipermail/libc-testresults/2023q4/012154.html

I'm not sure of the exact sequence of undefined references that cause
first the glibc object file defining .div and then the libgcc object
file defining both .div and .udiv to be pulled in (which must have
been perturbed by that GCC change in a way that introduced the build
failure), but I think the failure illustrates that it's inherently
fragile for glibc to define symbols in separate object files that
libgcc defines in the same object file - and indeed for glibc to
redefine libgcc symbols at all, since the division into object files
shouldn't really be part of the interface between libgcc and libc.

These symbols appear to be in libc only for compatibility, maybe one
of the cases where they were accidentally exported from shared libc in
glibc 2.0 before the introduction of symbol versioning and so programs
started expecting shared libc to provide them.  Thus, there is no need
to have them in static libc.  Add this set of libgcc functions to
shared-only-routines so they are no longer provided in static libc.
(No change is made regarding .mul - dotmul source file - since unlike
the other symbols in this grouping, it doesn't actually appear to be a
libgcc symbol, at least in current GCC.)

Tested with build-many-glibcs.py for sparcv9-linux-gnu with GCC
mainline.
2023-12-19 16:00:11 +00:00
H.J. Lu
4d8a01d2b0 x86/cet: Check CPU_FEATURE_ACTIVE in permissive mode
Verify that CPU_FEATURE_ACTIVE works properly in permissive mode.
2023-12-19 06:58:05 -08:00
H.J. Lu
28bd6f832d x86/cet: Check legacy shadow stack code in .init_array section
Verify that legacy shadow stack code in .init_array section in application
and shared library, which are marked as shadow stack enabled, will trigger
segfault.
2023-12-19 06:57:47 -08:00
H.J. Lu
9424ce80c2 x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK
Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow
stack properly.
2023-12-19 06:57:39 -08:00
H.J. Lu
71c0cc3357 x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabled
Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is
disabled.
2023-12-19 06:57:32 -08:00
H.J. Lu
f418fe6f97 x86/cet: Check legacy shadow stack applications
Add tests to verify that legacy shadow stack applications run properly
when shadow stack is enabled in Linux kernel.
2023-12-19 06:57:27 -08:00
Mike FABIAN
1e70252508 localedata: id_ID: change first weekday to Sunday
Resolves: BZ # 30412

See: https://sourceware.org/bugzilla/show_bug.cgi?id=30412#c7

CLDR also has ID in the list of territories which have Sunday as the
first day of the week.
2023-12-19 11:23:19 +01:00
Stefan Liebler
664f565f9c s390: Set psw addr field in getcontext and friends.
So far if the ucontext structure was obtained by getcontext and co,
the return address was stored in general purpose register 14 as
it is defined as return address in the ABI.

In contrast, the context passed to a signal handler contains the address
in psw.addr field.

If somebody e.g. wants to dump the address of the context, the origin
needs to be known.

Now this patch adjusts getcontext and friends and stores the return address
also in psw.addr field.

Note that setcontext isn't adjusted and it is not supported to pass a
ucontext structure from signal-handler to setcontext.  We are not able to
restore all registers and branching to psw.addr without clobbering one
register.
2023-12-19 11:00:19 +01:00
Matthew Sterrett
e957308723 x86: Unifies 'strlen-evex' and 'strlen-evex512' implementations.
This commit uses a common implementation 'strlen-evex-base.S' for both
'strlen-evex' and 'strlen-evex512'

The motivation is to reduce the number of implementations to maintain.
This incidentally gives a small performance improvement.

All tests pass on x86.

Benchmarks were taken on SKX.
https://www.intel.com/content/www/us/en/products/sku/123613/intel-core-i97900x-xseries-processor-13-75m-cache-up-to-4-30-ghz/specifications.html

Geometric mean for strlen-evex512 over all benchmarks (N=10) was (new/old) 0.939
Geometric mean for wcslen-evex512 over all benchmarks (N=10) was (new/old) 0.965

Code Size Changes:
    strlen-evex512.S    :  +24 bytes
    wcslen-evex512.S    :  +54 bytes

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-12-18 12:38:01 -06:00
H.J. Lu
442983319b x86/cet: Don't assume that SHSTK implies IBT
Since shadow stack (SHSTK) is enabled in the Linux kernel without
enabling indirect branch tracking (IBT), don't assume that SHSTK
implies IBT.  Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active
and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.
2023-12-18 07:04:18 -08:00
RushingAlien
12ab77e893 id_ID: Update Time Locales
Hello! I am Indonesian, was born and raised in Indonesia and still do live in
Indonesia.

This patch brings a few changes to the time locales of id_ID, which
includes :
\- Defining am_pm and time_fmpt_ampm
\- Changing time_fmt and d_t_fmt to use the 24-hour format
\- Changing first_weekday to Monday
This is a squashed version of what is previously a 5 patch set

Here are reasons and details of the changes :

Change 1 part 1

id_ID: Define `am_pm` string

Current formatting does not define am_pm string, leading to AM and PM
not being specified in 12 H time format. This change defines the string
by changing it from an empty string to "AM";"PM".

output of `date +%r`:
before commit: 01:23
after commit: 01:23 PM

Change 1 part 2

id_ID: Define time_fmt_ampm, change from an empty string

Currently, time_fmpt_ampm is set to an empty string, causing some
programs to not be able to display time in the 12-hour format, for
example, glib: https://gitlab.gnome.org/GNOME/glib/-/issues/2967.
This commit changes it from an empty string to "%I:%M:%S %p"

Change 2 part 1

id_ID: Use 24-hour format for time_fmt

Indonesian standard and formal time format uses the 24-hour format inst-
ead of the 12-hour format. This commit aims to change the id_ID locale's
time_fmt to match that accordingly.

Change 2 part 2

id_ID: Use 24-hour format for d_t_fmt.

Indonesian standard and formal time format uses the 24-hour format inst-
ead of the 12-hour format. This commit aims to change the id_ID locale's
d_t_fmt to match that accordingly.

Change 3

id_ID: Change first_weekday to monday

Indonesian calendar starts of the week with Monday, let's comply

Message-ID: <20230821035530.9075-1-rushing27alien@gmail.com>
Resolves: BZ # 30412
Reviewed-by: Mike Fabian <mfabian@redhat.com>
2023-12-18 09:57:33 +01:00
Flavio Cruz
ad26c25137 Update code to handle the new ABI for sending inlined port rights.
For i686, this change is no op but for x86_64 it forces all inlined port
rights to be 8 bytes long.
2023-12-17 23:48:37 +01:00
H.J. Lu
0b850186fd x86/cet: Check user_shstk in /proc/cpuinfo
Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as
user_shstk, instead of shstk.
2023-12-17 10:42:06 -08:00
H.J. Lu
49b4de21dc Add a test for setjmp/longjmp within user context
Verify that setjmp/longjmp works correctly within a user context.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-12-16 08:39:08 -08:00
H.J. Lu
08bc191fd1 Add a test for longjmp from user context
Verify that longjmp works correctly after setcontext is called to switch
to a user context.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-12-16 08:38:48 -08:00
Manjunath Matti
93a739d4a1 powerpc: Add space for HWCAP3/HWCAP4 in the TCB for future Power.
This patch reserves space for HWCAP3/HWCAP4 in the TCB of powerpc.
These hardware capabilities bits will be used by future Power
architectures.

Versioned symbol '__parse_hwcap_3_4_and_convert_at_platform' advertises
the availability of the new HWCAP3/HWCAP4 data in the TCB.

This is an ABI change for GLIBC 2.39.

Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2023-12-15 20:20:14 -06:00
Amrita H S
90bcc8721e powerpc: Fix performance issues of strcmp power10
Current implementation of strcmp for power10 has
performance regression for multiple small sizes
and alignment combination.

Most of these performance issues are fixed by this
patch. The compare loop is unrolled and page crosses
of unrolled loop is handled.

Thanks to Paul E. Murphy for helping in fixing the
performance issues.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2023-12-15 16:42:40 -06:00
Mike FABIAN
73d92c4b73 localedata: Convert el_GR and el_CY locales to UTF-8 2023-12-15 21:08:44 +01:00
Mike FABIAN
14a94f2e35 localedata: el_GR: Greece now uses the 24h format for time
Resolves: BZ # 23012
2023-12-15 21:08:44 +01:00
MAHESH BODAPATI
b9182c793c powerpc : Add optimized memchr for POWER10
Optimized memchr for POWER10 based on existing rawmemchr and strlen.
Reordering instructions and loop unrolling helped in getting better performance.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2023-12-14 14:40:14 -06:00
Bruno Haible
d0aefec499 intl: Treat C.UTF-8 locale like C locale, part 2 (BZ# 16621)
The previous commit was incomplete: gettext() still returns a translation
if the file /usr/share/locale/C/LC_MESSAGES/<domain>.mo exists. This patch
prohibits the translation also in this case.

* gettext-runtime/intl/dcigettext.c (DCIGETTEXT): Treat C.<encoding> locale
like the C locale.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-12-12 10:08:07 +01:00
Ludwig Rydberg
fc039ce850 resolv: Fix a few unaligned accesses to fields in HEADER
After refactoring the alloca usage in 40c0add7d4 ("resolve: Remove
__res_context_query alloca usage") a few unaligned accesses to HEADER
fields surfaced. These unaligned accesses led to problems when running
the resolv test suite on sparc32-linux (leon) as many tests failed due to
SIGBUS crashes.

The issue(s) occured during T_QUERY_A_AND_AAAA queries as the second query
now can start on an unaligned address (previously it was explicitly aligned).

With this patch the unaligned accesses are now fixed by using the
UHEADER instead to ensure the fields are accessed with byte
loads/stores.

The patch has been verfied by running the resolv test suite on sparc32
and x86_64.

Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-12-12 09:42:55 +01:00
H.J. Lu
4753e92868 x86: Check PT_GNU_PROPERTY early
The PT_GNU_PROPERTY segment is scanned before PT_NOTE.  For binaries
with the PT_GNU_PROPERTY segment, we can check it to avoid scan of
the PT_NOTE segment.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2023-12-11 11:05:46 -08:00
H.J. Lu
7e03e0de7e sysdeps/x86/Makefile: Split and sort tests
Put each test on a separate line and sort tests.
2023-12-11 08:49:57 -08:00
Florian Weimer
b3bee76c5f elf: Initialize GLRO(dl_lazy) before relocating libc in dynamic startup
GLRO(dl_lazy) is used to set the parameters for the early
_dl_relocate_object call, so the consider_profiling setting has to
be applied before the call.

Fixes commit 78ca44da01 ("elf: Relocate
libc.so early during startup and dlmopen (bug 31083)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-12-08 14:33:03 +01:00
Siddhesh Poyarekar
60c57b8467 Move CVE information into advisories directory
One of the requirements to becoming a CVE Numbering Authority (CNA) is
to publish advisories.  Do this by maintaining a file for each CVE fixed
in the advisories directory in the source tree.  Links to the advisories
can then be shared as:

https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=advisories/GLIBC-SA-YYYY-NNNN

The file format at the moment is rudimentary and derives from the git
commit format, i.e. a subject line and a potentially multi-paragraph
description and then tags to describe some meta information.  This is a
loose format at the moment and could change as we evolve this.

Also add a script process-fixed-cves.sh that processes these advisories
and generates a list to add to NEWS at release time.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-07 12:31:23 -05:00
Amrita H S
3367d8e180 powerpc: Optimized strcmp for power10
This patch is based on __strcmp_power9 and __strlen_power10.

Improvements from __strcmp_power9:

    1. Uses new POWER10 instructions
       - This code uses lxvp to decrease contention on load
         by loading 32 bytes per instruction.

    2. Performance implication
       - This version has around 30% better performance on average.
       - Performance regression is seen for a specific combination
         of sizes and alignments. Some of them is observed without
         changes also, while rest may be induced by the patch.

Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2023-12-07 11:10:40 -06:00
Adhemerval Zanella
546a1ba664 elf: Fix wrong break removal from 8ee878592c
Reported-by: Alexander Monakov <amonakov@ispras.ru>
2023-12-07 11:17:35 -03:00
Mike FABIAN
958478889c localedata: Convert day names in nn_NO locale to UTF-8 2023-12-07 08:28:25 +01:00
Mike FABIAN
ff25f355af localedata: Remove trailing whitespace in weekday names in nn_NO locale
Resolves: BZ # 25868
2023-12-07 08:28:25 +01:00
Adhemerval Zanella
4369019520 elf: Refactor process_envvars
It splits between process_envvars_secure and process_envvars_default,
with the former used to process arguments for __libc_enable_secure.
It does not have any semantic change, just simplify the code so there
is no need to handle __libc_enable_secure on each len switch.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-05 13:21:36 -03:00
Adhemerval Zanella
61d848b554 elf: Ignore LD_BIND_NOW and LD_BIND_NOT for setuid binaries
To avoid any environment variable to change setuid binaries
semantics.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-05 13:21:36 -03:00
Adhemerval Zanella
876a12e513 elf: Ignore loader debug env vars for setuid
Loader already ignores LD_DEBUG, LD_DEBUG_OUTPUT, and
LD_TRACE_LOADED_OBJECTS. Both LD_WARN and LD_VERBOSE are similar to
LD_DEBUG, in the sense they enable additional checks and debug
information, so it makes sense to disable them.

Also add both LD_VERBOSE and LD_WARN on filtered environment variables
for setuid binaries.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-05 13:21:36 -03:00
Siddhesh Poyarekar
f85722f9cd Adapt the security policy for the security page
Call the document a "Security Policy" to disambiguate it from the
security *process* documented in the security page.  Also, point to the
security page for bug reporting and CVE assignment.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-12-05 09:15:10 -05:00
Andreas Schwab
3f79842788 aarch64: correct CFI in rawmemchr (bug 31113)
The .cfi_return_column directive changes the return column for the whole
FDE range.  But the actual intent is to tell the unwinder that the value
in x30 (lr) now resides in x15 after the move, and that is expressed by
the .cfi_register directive.
2023-12-05 12:49:37 +01:00
Joe Ramsay
63d0a35d5f math: Add new exp10 implementation
New implementation is based on the existing exp/exp2, with different
reduction constants and polynomial. Worst-case error in round-to-
nearest is 0.513 ULP.

The exp/exp2 shared table is reused for exp10 - .rodata size of
e_exp_data increases by 64 bytes.

As for exp/exp2, targets with single-instruction rounding/conversion
intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the
necessary code to their math_private.h.

Improvements on Neoverse V1 compared to current GLIBC master:
exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]
exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]

Tested on:
    aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
    x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction)

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-12-04 15:52:11 +00:00
Szabolcs Nagy
8e755f5bc8 aarch64: fix tested ifunc variants
Don't test a64fx string functions when BTI is enabled since they are
not BTI compatible.
2023-12-04 14:41:26 +00:00
Florian Weimer
b9390ba936 stdlib: Fix array bounds protection in insertion sort phase of qsort
The previous check did not do anything because tmp_ptr already
points before run_ptr due to the way it is initialized.

Fixes commit e4d8117b82
("stdlib: Avoid another self-comparison in qsort").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-12-04 06:35:56 +01:00
Samuel Thibault
d776a59723 Revert "Update code to handle the new ABI for sending inlined port rights."
This reverts commit 7e23b3c2c0.
2023-12-03 02:06:29 +01:00
Samuel Thibault
3e85650423 Revert "hurd: Fix build"
This reverts commit 7096914dd8.
2023-12-03 02:06:19 +01:00
Samuel Thibault
7096914dd8 hurd: Fix build
7e23b3c2c0 ("Update code to handle the new ABI for sending inlined
port rights.") was missing a cast.

Fixes 7e23b3c2c0
2023-12-03 01:52:58 +01:00
Flavio Cruz
7e23b3c2c0 Update code to handle the new ABI for sending inlined port rights.
For i686, this change is no op but for x86_64 it forces all inlined port
rights to be 8 bytes long.
Message-ID: <20231124213041.952886-2-flaviocruz@gmail.com>
2023-12-03 01:03:41 +01:00
Samuel Thibault
2fb85a3787 hurd: [!__USE_MISC] Do not #undef BSD macros in ioctls
When e.g. including termios.h first and then sys/ioctl.h, without e.g.
_BSD_SOURCE, the latter would #undef e.g. ECHO, without defining it.
2023-12-02 21:26:50 +01:00
Adhemerval Zanella
4e16d89866 linux: Make fdopendir fail with O_PATH (BZ 30373)
It is not strictly required by the POSIX, since O_PATH is a Linux
extension, but it is QoI to fail early instead of at readdir.  Also
the check is free, since fdopendir already checks if the file
descriptor is opened for read.

Checked on x86_64-linux-gnu.
2023-11-30 13:37:04 -03:00
Stefan Liebler
807849965b Avoid padding in _init and _fini. [BZ #31042]
The linker just concatenates the .init and .fini sections which
results in the complete _init and _fini functions. If needed the
linker adds padding bytes due to an alignment. GNU ld is adding
NOPs, which is fine.  But e.g. mold is adding traps which results
in broken _init and _fini functions.

Thus this patch removes the alignment in .init and .fini sections
in crtn.S files.

We keep the 4 byte function alignment in crti.S files. As the
assembler now also outputs the start of _init and _fini functions
as multiples of 4 byte, it perhaps has to fill it. Although GNU as
is using NOPs here, to be sure, we just keep the alignment with
0x07 (=NOPs) at the end of crti.S.

In order to avoid an obvious NOP slide in _fini, this patch also
uses an lg instead of lgr instruction. Then the emitted instructions
needs a multiple of 4 bytes.
2023-11-30 13:31:23 +01:00
Joe Ramsay
7b12776584 aarch64: Improve special-case handling in AdvSIMD double-precision libmvec routines
Avoids emitting many saves/restores of vector registers, reduces the
amount of code generated around the scalar fallback.
2023-11-29 15:03:36 +00:00
Adhemerval Zanella
bc6d79f4ae malloc: Improve MAP_HUGETLB with glibc.malloc.hugetlb=2
Even for explicit large page support, allocation might use mmap without
the hugepage bit set if the requested size is smaller than
mmap_threshold.  For this case where mmap is issued, MAP_HUGETLB is set
iff the allocation size is larger than the used large page.

To force such allocations to use large pages, also tune the mmap_threhold
(if it is not explicit set by a tunable).  This forces allocation to
follow the sbrk path, which will fall back to mmap (which will try large
pages before galling back to default mmap).

Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
2023-11-29 09:30:04 -03:00
Adhemerval Zanella
a4c3f5f46e elf: Add a way to check if tunable is set (BZ 27069)
The patch adds two new macros, TUNABLE_GET_DEFAULT and TUNABLE_IS_INITIALIZED,
here the former get the default value with a signature similar to
TUNABLE_GET, while the later returns whether the tunable was set by
the environment variable.

Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
2023-11-29 09:30:00 -03:00
Noah Goldstein
9469261cf1 x86: Only align destination to 1x VEC_SIZE in memset 4x loop
Current code aligns to 2x VEC_SIZE. Aligning to 2x has no affect on
performance other than potentially resulting in an additional
iteration of the loop.
1x maintains aligned stores (the only reason to align in this case)
and doesn't incur any unnecessary loop iterations.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2023-11-28 12:06:19 -06:00