Commit Graph

34555 Commits

Author SHA1 Message Date
H.J. Lu
a486152569 string: Add a testcase for wcsncmp with SIZE_MAX [BZ #28755]
Verify that wcsncmp (L("abc"), L("abd"), SIZE_MAX) == 0.  The new test
fails without

commit ddf0992cf5
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Sun Jan 9 16:02:21 2022 -0600

    x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]

and

commit 7e08db3359
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Sun Jan 9 16:02:28 2022 -0600

    x86: Fix __wcsncmp_evex in strcmp-evex.S [BZ# 28755]

This is for BZ #28755.

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>

(cherry picked from commit aa5a720056)
2022-02-17 11:43:34 -08:00
H.J. Lu
28689d6255 x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064]
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

added wcsnlen-sse4.1 to the wcslen ifunc implementation list.  Since the
random value in the the RSI register is larger than the wide-character
string length in the existing wcslen test, it didn't trigger the wcslen
test failure.  Add a test to force 0 into the RSI register before calling
wcslen.

(cherry picked from commit a6e7c3745d)
2022-02-01 12:51:16 -08:00
Noah Goldstein
352cb39fa0 x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064]
The following commit

commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did
not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit
fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc
implementation list and adding wcslen-sse4.1 to the ifunc
implementation list.

Testing:
test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as
well as all other tests in wcsmbs and string.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 0679442def)
2022-02-01 12:51:09 -08:00
H.J. Lu
5d94c86a47 x86: Black list more Intel CPUs for TSX [BZ #27398]
Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in:

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

This fixes BZ #27398.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 1e000d3d33)
2022-02-01 07:38:16 -08:00
H.J. Lu
7aac95739a x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033]
From

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

* Intel TSX will be disabled by default.
* The processor will force abort all Restricted Transactional Memory (RTM)
  transactions by default.
* A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated,
  which is set to indicate to updated software that the loaded microcode is
  forcing RTM abort.
* On processors that enumerate support for RTM, the CPUID enumeration bits
  for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to
  be set by default after microcode update.
* Workloads that were benefited from Intel TSX might experience a change
  in performance.
* System software may use a new bit in Model-Specific Register (MSR) 0x10F
  TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock
  Elision (HLE) and RTM bits to indicate to software that Intel TSX is
  disabled.

1. Add RTM_ALWAYS_ABORT to CPUID features.
2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set.  This skips the
string/tst-memchr-rtm etc. testcases on the affected processors, which
always fail after a microcde update.
3. Check RTM feature, instead of usability, against /proc/cpuinfo.

This fixes BZ #28033.

(cherry picked from commit ea8e465a6b)
2022-02-01 07:38:09 -08:00
H.J. Lu
a0ed5893fc NEWS: Add a bug fix entry for BZ #27974 2022-01-27 16:27:20 -08:00
Noah Goldstein
ac1d6f25d6 String: Add overflow tests for strnlen, memchr, and strncat [BZ #27974]
This commit adds tests for a bug in the wide char variant of the
functions where the implementation may assume that maxlen for wcsnlen
or n for wmemchr/strncat will not overflow when multiplied by
sizeof(wchar_t).

These tests show the following implementations failing on x86_64:

wcsnlen-sse4_1
wcsnlen-avx2

wmemchr-sse2
wmemchr-avx2

strncat would fail as well if it where on a system that prefered
either of the wcsnlen implementations that failed as it relies on
wcsnlen.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit da5a6fba0f)
2022-01-27 16:26:26 -08:00
Noah Goldstein
f9fe740ec4 x86: Optimize strlen-evex.S
No bug. This commit optimizes strlen-evex.S. The
optimizations are mostly small things but they add up to roughly
10-30% performance improvement for strlen. The results for strnlen are
bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and
test-wcsnlen are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit 4ba6558684)
2022-01-27 16:26:21 -08:00
Noah Goldstein
dd92fa3029 x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on maxlen * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit a775a7a3eb)
2022-01-27 16:26:15 -08:00
Noah Goldstein
2599967449 x86-64: Add wcslen optimize for sse4.1
No bug. This comment adds the ifunc / build infrastructure
necessary for wcslen to prefer the sse4.1 implementation
in strlen-vec.S. test-wcslen.c is passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6f573a27b6)
2022-01-27 16:26:10 -08:00
H.J. Lu
ccf4e0edde x86-64: Move strlen.S to multiarch/strlen-vec.S
Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1
version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S
and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants.
This also removes the unused symbols, __GI___strlen_sse2 and
__GI___wcsnlen_sse4_1.

(cherry picked from commit a0db678071)
2022-01-27 16:26:04 -08:00
Alice Xu
3cee1ddad2 x86-64: Fix an unknown vector operation in memchr-evex.S
An unknown vector operation occurred in commit 2a76821c30. Fixed it
by using "ymm{k1}{z}" but not "ymm {k1} {z}".

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6ea916adfa)
2022-01-27 16:25:58 -08:00
Noah Goldstein
cba2fbe17f x86: Optimize memchr-evex.S
No bug. This commit optimizes memchr-evex.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
saving some ALU in the alignment process, and most importantly
increasing ILP in the 4x loop. test-memchr, test-rawmemchr, and
test-wmemchr are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 2a76821c30)
2022-01-27 16:25:53 -08:00
Noah Goldstein
0a3b2efccc x86: Optimize strlen-avx2.S
No bug. This commit optimizes strlen-avx2.S. The optimizations are
mostly small things but they add up to roughly 10-30% performance
improvement for strlen. The results for strnlen are bit more
ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen
are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit aaa23c3507)
2022-01-27 16:25:48 -08:00
Noah Goldstein
c51e9501c2 x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on n * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 645a158978)
2022-01-27 16:25:42 -08:00
Noah Goldstein
5c38877df5 x86: Optimize memchr-avx2.S
No bug. This commit optimizes memchr-avx2.S. The optimizations include
replacing some branches with cmovcc, avoiding some branches entirely
in the less_4x_vec case, making the page cross logic less strict,
asaving a few instructions the in loop return loop. test-memchr,
test-rawmemchr, and test-wmemchr are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit acfd088a19)
2022-01-27 16:25:37 -08:00
H.J. Lu
8fa2afcec3 test-strnlen.c: Check that strnlen won't go beyond the maximum length
Place strings ending at page boundary without the null byte.  If an
implementation goes beyond EXP_LEN, it will trigger the segfault.

(cherry picked from commit cb882b21b6)
2022-01-27 16:25:31 -08:00
H.J. Lu
0717959eb3 test-strnlen.c: Initialize wchar_t string with wmemset [BZ #27655]
Use wmemset to initialize wchar_t string.

(cherry picked from commit 86859b7e58)
2022-01-27 16:25:26 -08:00
H.J. Lu
3fc179e2b2 x86-64: Require BMI2 for __strlen_evex and __strnlen_evex
Since __strlen_evex and __strnlen_evex added by

commit 1fd8c163a8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 5 06:24:52 2021 -0800

    x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

use sarx:

c4 e2 6a f7 c0       	sarx   %edx,%eax,%eax

require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.

(cherry picked from commit 55bf411b45)
2022-01-27 16:25:20 -08:00
H.J. Lu
c3535cb6cd NEWS: Add a bug fix entry for BZ #27457 2022-01-27 12:25:41 -08:00
Sunil K Pandey
aff5eec9ce x86-64: Fix ifdef indentation in strlen-evex.S
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.

(cherry picked from commit 595c22ecd8)
2022-01-27 12:07:11 -08:00
H.J. Lu
7056607e6a x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.

(cherry picked from commit e4fda46310)
2022-01-27 12:07:11 -08:00
H.J. Lu
a54fae8df4 x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.

(cherry picked from commit 4e2d8f3527)
2022-01-27 12:07:11 -08:00
H.J. Lu
674137c9b0 x86: Add string/memory function tests in RTM region
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort.   When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation.  Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.

(cherry picked from commit 4bd660be40)
2022-01-27 12:07:11 -08:00
H.J. Lu
5be8a84721 x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.

(cherry picked from commit 7ebba91361)
2022-01-27 12:07:11 -08:00
H.J. Lu
757d90ff37 x86-64: Add memcmp family functions with 256-bit EVEX
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.

(cherry picked from commit 91264fe357)
2022-01-27 12:07:11 -08:00
H.J. Lu
9650d04fb6 x86-64: Add memset family functions with 256-bit EVEX
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.

(cherry picked from commit 1b968b6b9b)
2022-01-27 12:07:11 -08:00
H.J. Lu
2f4a98ab17 x86-64: Add memmove family functions with 256-bit EVEX
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.

(cherry picked from commit 63ad43566f)
2022-01-27 12:07:11 -08:00
H.J. Lu
c90c70b8f9 x86-64: Add strcpy family functions with 256-bit EVEX
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.

(cherry picked from commit 525bc2a32c)
2022-01-27 12:07:11 -08:00
H.J. Lu
c671b231cf x86-64: Add ifunc-avx2.h functions with 256-bit EVEX
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.

For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.

(cherry picked from commit 1fd8c163a8)
2022-01-27 12:07:11 -08:00
H.J. Lu
be07b3e059 x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp.  Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.

(cherry picked from commit 1da50d4bda)
2022-01-27 12:07:11 -08:00
H.J. Lu
2f3fb944b3 NEWS: Add a bug fix entry for BZ #28755 2022-01-27 07:28:12 -08:00
Noah Goldstein
be6fb78a1f x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755]
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_avx2. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.

test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit ddf0992cf5)
2022-01-27 07:28:12 -08:00
H.J. Lu
646ec4ebe5 NEWS: Add a bug fix entry for BZ #24794 2022-01-27 07:28:12 -08:00
Tulio Magno Quites Machado Filho
edceb7f520 test-container: Install with $(all-subdirs) [BZ #24794]
Whenever a sub-make is created, it inherits the variable subdirs from its
parent.  This is also true when make check is called with a restricted
list of subdirs.  In this scenario, make install is executed "partially"
and testroot.pristine ends up with an incomplete installation.

	[BZ #24794]
	* Makefile (testroot.pristine/install.stamp): Pass
	subdirs='$(all-subdirs)' to make install.

Reviewed-by: DJ Delorie <dj@redhat.com>
(cherry picked from commit 35e038c1d2)
2022-01-27 07:28:12 -08:00
Siddhesh Poyarekar
2c9083f93d Fix SXID_ERASE behavior in setuid programs (BZ #27471)
When parse_tunables tries to erase a tunable marked as SXID_ERASE for
setuid programs, it ends up setting the envvar string iterator
incorrectly, because of which it may parse the next tunable
incorrectly.  Given that currently the implementation allows malformed
and unrecognized tunables pass through, it may even allow SXID_ERASE
tunables to go through.

This change revamps the SXID_ERASE implementation so that:

- Only valid tunables are written back to the tunestr string, because
  of which children of SXID programs will only inherit a clean list of
  identified tunables that are not SXID_ERASE.

- Unrecognized tunables get scrubbed off from the environment and
  subsequently from the child environment.

- This has the side-effect that a tunable that is not identified by
  the setxid binary, will not be passed on to a non-setxid child even
  if the child could have identified that tunable.  This may break
  applications that expect this behaviour but expecting such tunables
  to cross the SXID boundary is wrong.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

(cherry picked from commit 2ed18c5b53)
2021-04-14 11:08:02 +05:30
Siddhesh Poyarekar
dcfcb1b208 Enhance setuid-tunables test
Instead of passing GLIBC_TUNABLES via the environment, pass the
environment variable from parent to child.  This allows us to test
multiple variables to ensure better coverage.

The test list currently only includes the case that's already being
tested.  More tests will be added later.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

(cherry picked from commit 061fe3f8ad)
2021-04-14 11:07:47 +05:30
Siddhesh Poyarekar
1927d02d19 tst-env-setuid: Use support_capture_subprogram_self_sgid
Use the support_capture_subprogram_self_sgid to spawn an sgid child.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

(cherry picked from commit ca33528106)
2021-04-14 11:07:47 +05:30
Siddhesh Poyarekar
2c23a4d7e0 support: Add capability to fork an sgid child
Add a new function support_capture_subprogram_self_sgid that spawns an
sgid child of the running program with its own image and returns the
exit code of the child process.  This functionality is used by at
least three tests in the testsuite at the moment, so it makes sense to
consolidate.

There is also a new function support_subprogram_wait which should
provide simple system() like functionality that does not set up file
actions.  This is useful in cases where only the return code of the
spawned subprocess is interesting.

This patch also ports tst-secure-getenv to this new function.  A
subsequent patch will port other tests.  This also brings an important
change to tst-secure-getenv behaviour.  Now instead of succeeding, the
test fails as UNSUPPORTED if it is unable to spawn a setgid child,
which is how it should have been in the first place.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

(cherry picked from commit 716a3bdc41)
2021-04-14 11:07:45 +05:30
Siddhesh Poyarekar
23d0e97fe5 support: Typo and formatting fixes
- Add a newline to the end of error messages in transfer().
- Fixed the name of support_subprocess_init().

(cherry picked from commit 95c68080a3)
2021-04-14 11:06:05 +05:30
Siddhesh Poyarekar
ad25ea0c47 support: Pass environ to child process
Pass environ to posix_spawn so that the child process can inherit
environment of the test.

(cherry picked from commit e958490f8c)
2021-04-14 11:06:05 +05:30
DJ Delorie
e9c0d7d7ff nscd: Fix double free in netgroupcache [BZ #27462]
In commit 745664bd79 a use-after-free
was fixed, but this led to an occasional double-free.  This patch
tracks the "live" allocation better.

Tested manually by a third party.

Related: RHBZ 1927877

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit dca565886b)
2021-03-08 15:42:15 +05:30
Florian Weimer
a9acd88a5d gconv: Fix assertion failure in ISO-2022-JP-3 module (bug 27256)
The conversion loop to the internal encoding does not follow
the interface contract that __GCONV_FULL_OUTPUT is only returned
after the internal wchar_t buffer has been filled completely.  This
is enforced by the first of the two asserts in iconv/skeleton.c:

	      /* We must run out of output buffer space in this
		 rerun.  */
	      assert (outbuf == outerr);
	      assert (nstatus == __GCONV_FULL_OUTPUT);

This commit solves this issue by queuing a second wide character
which cannot be written immediately in the state variable, like
other converters already do (e.g., BIG5-HKSCS or TSCII).

Reported-by: Tavis Ormandy <taviso@gmail.com>
(cherry picked from commit 7d88c6142c)
2021-01-27 15:22:18 +01:00
H.J. Lu
1864775abc x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in unrelocated executable also leads to
segfault.  Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.

On x86, ifuncmain6pie failed with:

[hjl@gnu-cfl-2 build-i686-linux]$ ./elf/ifuncmain6pie --direct
./elf/ifuncmain6pie: IFUNC symbol 'foo' referenced in '/export/build/gnu/tools-build/glibc-32bit/build-i686-linux/elf/ifuncmod6.so' is defined in the executable and creates an unsatisfiable circular dependency.
[hjl@gnu-cfl-2 build-i686-linux]$ readelf -rW elf/ifuncmod6.so | grep foo
00003ff4  00000706 R_386_GLOB_DAT         0000400c   foo_ptr
00003ff8  00000406 R_386_GLOB_DAT         00000000   foo
0000400c  00000401 R_386_32               00000000   foo
[hjl@gnu-cfl-2 build-i686-linux]$

Remove non-JUMP_SLOT relocations against foo in ifuncmod6.so, which
trigger the circular IFUNC dependency, and build ifuncmain6pie with
-Wl,-z,lazy.

(cherry picked from commits 6ea5b57afa
 and 7137d682eb)
2021-01-13 14:30:42 -08:00
H.J. Lu
420ade1f64 x86: Set header.feature_1 in TCB for always-on CET [BZ #27177]
Update dl_cet_check() to set header.feature_1 in TCB when both IBT and
SHSTK are always on.

(cherry picked from commit 2ef23b5205)
2021-01-13 09:24:35 -08:00
H.J. Lu
8493ba72b1 x86-64: Avoid rep movsb with short distance [BZ #27130]
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow.  This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:

	cmpl	$63, %ecx
	// Don't use "rep movsb" if ECX <= 63
	jbe	L(Don't use rep movsb")
	Use "rep movsb"

Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.

(cherry picked from commit 3ec5d83d2a)
2021-01-12 07:01:34 -08:00
Szabolcs Nagy
a3c78954ee aarch64: Fix DT_AARCH64_VARIANT_PCS handling [BZ #26798]
The variant PCS support was ineffective because in the common case
linkmap->l_mach.plt == 0 but then the symbol table flags were ignored
and normal lazy binding was used instead of resolving the relocs early.
(This was a misunderstanding about how GOT[1] is setup by the linker.)

In practice this mainly affects SVE calls when the vector length is
more than 128 bits, then the top bits of the argument registers get
clobbered during lazy binding.

Fixes bug 26798.

(cherry picked from commit 558251bd87)
2020-11-04 12:26:02 +00:00
Wilco Dijkstra
28ff0f650c AArch64: Use __memcpy_simd on Neoverse N2/V1
Add CPU detection of Neoverse N2 and Neoverse V1, and select __memcpy_simd as
the memcpy/memmove ifunc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit e11ed9d2b4)
2020-10-14 14:31:24 +01:00
Wilco Dijkstra
be3eaffd5a [AArch64] Improve integer memcpy
Further optimize integer memcpy.  Small cases now include copies up
to 32 bytes.  64-128 byte copies are split into two cases to improve
performance of 64-96 byte copies.  Comments have been rewritten.

(cherry picked from commit 7000651327)
2020-10-12 18:29:42 +01:00
Krzysztof Koch
c969e84e0c aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.

Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.

Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html

Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S

Tested against GLIBC testsuite and randomized tests.

(cherry picked from commit b9f145df85)
2020-10-12 18:29:42 +01:00