Commit Graph

37593 Commits

Author SHA1 Message Date
Florian Weimer
832f50be6c elf: Call free from base namespace on error in dl-libc.c [BZ #27646]
In dlerror_run, free corresponds to the local malloc in the
namespace, but GLRO (dl_catch_error) uses the malloc from the base
namespace.  elf/tst-dlmopen-gethostbyname triggers this mismatch,
but it does not crash, presumably because of a fastbin deallocation.

Fixes commit c2059edce2 ("elf: Use _dl_catch_error from base namespace
in dl-libc.c [BZ #27646]") and commit b2964eb1d9 ("dlfcn: Failures
after dlmopen should not terminate process [BZ #24772]").
2021-07-06 14:30:33 +02:00
Khem Raj
c8935581de linux: Check for null value msghdr struct before use
This avoids crashes in libc when cmsg is null, and when the msg
structure is referenced while it is null.

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-05 15:11:13 -03:00
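For the msghdr null-check fix above, here is a caller-side sketch of the same defensive pattern. It is not the glibc patch. The helper name count_cmsgs is hypothetical, and only the standard CMSG_FIRSTHDR/CMSG_NXTHDR macros from <sys/socket.h> are assumed. A NULL msghdr, or one without a control buffer, is treated as "no control messages" instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: count the control messages attached to MSG.
   A NULL msghdr or a missing control buffer yields 0 rather than
   a null-pointer dereference.  */
static size_t
count_cmsgs (struct msghdr *msg)
{
  if (msg == NULL || msg->msg_control == NULL)
    return 0;

  size_t n = 0;
  /* CMSG_FIRSTHDR returns NULL if msg_controllen cannot hold a header;
     CMSG_NXTHDR returns NULL once the next header would not fit.  */
  for (struct cmsghdr *c = CMSG_FIRSTHDR (msg); c != NULL;
       c = CMSG_NXTHDR (msg, c))
    ++n;
  return n;
}
```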
Siddhesh Poyarekar
91fb0f17a5 hooks.c: Remove incorrect comment
The comment about different values of glibc.malloc.check is no longer
valid.
2021-07-04 18:15:18 +05:30
Tulio Magno Quites Machado Filho
e766ce3088 mtrace: Add attribute nocommon to mallwatch
Avoid compilation errors with GCC versions that do not default to
-fno-common, e.g. GCC <= 9.

Fixes commit 00d28960c5 ("mtrace: Deprecate mallwatch and tr_break").

Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Suggested-by: Florian Weimer <fweimer@redhat.com>
2021-07-02 18:14:01 -03:00
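The nocommon fix above concerns tentative definitions (a file-scope variable with no initializer). Under -fcommon, which older GCC versions use by default, such a variable becomes a common symbol; the GCC nocommon attribute forces an ordinary zero-initialized definition instead. The variable name below is hypothetical and only illustrates the attribute, not glibc's mallwatch declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tentative definition.  Without nocommon, compiling with
   -fcommon (the GCC <= 9 default) would emit it as a common symbol;
   with nocommon it is always a regular definition in .bss, regardless
   of -fcommon/-fno-common.  */
void *example_watch __attribute__ ((nocommon));
```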
Siddhesh Poyarekar
c501803035 Move glibc.malloc.check implementation into its own file
Separate the malloc check implementation from the malloc hooks.  They
still use the hooks but are now maintained in a separate file.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-03 00:48:12 +05:30
Siddhesh Poyarekar
00d28960c5 mtrace: Deprecate mallwatch and tr_break
The variable and function pair appears to provide a way for users to
set conditional breakpoints in mtrace when a specific address is
returned by the allocator.  This can be achieved by using conditional
breakpoints in gdb so it is redundant.  There is no documentation of
this interface in the manual either, so it appears to have been a hack
that got added to debug malloc.  Deprecate these symbols and do not
call tr_break anymore.

Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-03 00:47:34 +05:30
Siddhesh Poyarekar
7df5c7bcce Drop source dependencies on hooks.c and arena.c
Dependencies on hooks.c and arena.c get auto-computed when generating
malloc.o{,s}.d so there is no need to add them manually.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
2021-07-03 00:46:46 +05:30
JeffyChen
dfec225ee1 malloc: Initiate tcache shutdown even without allocations [BZ #28028]
After commit 1e26d35193 ("malloc: Fix tcache leak after thread
destruction [BZ #22111]"), tcache_shutting_down is still not set early
enough.  When we detach a thread with no tcache allocated,
tcache_shutting_down would still be false.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-07-02 17:39:24 +02:00
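The ordering problem described in the tcache commit can be sketched with hypothetical thread-local state loosely modeled on the commit message (these are not glibc's actual variables). The point is that the shutdown flag is set before the early return taken by threads that never allocated a tcache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-thread state standing in for the malloc internals.  */
static _Thread_local bool tcache_shutting_down_sketch;
static _Thread_local void *tcache_sketch;

static void
thread_shutdown_sketch (void)
{
  /* Set the flag first so it is true even for a thread that never
     allocated a tcache.  Setting it only after the NULL check below
     would leave it false for such threads.  */
  tcache_shutting_down_sketch = true;

  if (tcache_sketch == NULL)
    return;

  free (tcache_sketch);
  tcache_sketch = NULL;
}
```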
Siddhesh Poyarekar
784fff6ea5 Add mcheck tests to malloc
Like malloc-check, add generic rules to run all tests in malloc by
linking with libmcheck.a so as to provide coverage for mcheck().
Currently the following 12 tests fail:

FAIL: malloc/tst-malloc-backtrace-mcheck
FAIL: malloc/tst-malloc-fork-deadlock-mcheck
FAIL: malloc/tst-malloc-stats-cancellation-mcheck
FAIL: malloc/tst-malloc-tcache-leak-mcheck
FAIL: malloc/tst-malloc-thread-exit-mcheck
FAIL: malloc/tst-malloc-thread-fail-mcheck
FAIL: malloc/tst-malloc-usable-static-mcheck
FAIL: malloc/tst-malloc-usable-static-tunables-mcheck
FAIL: malloc/tst-malloc-usable-tunables-mcheck
FAIL: malloc/tst-malloc_info-mcheck
FAIL: malloc/tst-memalign-mcheck
FAIL: malloc/tst-posix_memalign-mcheck

and they have been added to tests-exclude-mcheck for now to keep the
status quo.  At least the last two can be attributed to bugs in
mcheck() but I haven't fixed them here since they should be fixed by
removing malloc hooks.  Others need to be triaged to check if they're
due to mcheck bugs or due to actual bugs.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-07-02 17:03:42 +05:30
Siddhesh Poyarekar
7f784fabcb iconvconfig: Use the public feof_unlocked
The build of iconvconfig failed with CFLAGS=-Os since __feof_unlocked is
not a public symbol.  Replace with feof_unlocked (defined to
__feof_unlocked when IS_IN (libc)) to fix this.

Reported-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-07-02 16:53:25 +05:30
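Outside libc, feof_unlocked is the public spelling (exposed by <stdio.h> under _DEFAULT_SOURCE or _GNU_SOURCE). A minimal sketch of typical use, with a hypothetical helper name: lock the stream once with flockfile, read with getc_unlocked, and use feof_unlocked to distinguish end-of-file from a read error.

```c
#define _DEFAULT_SOURCE  /* expose the *_unlocked stdio functions */
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count newlines in FP, taking the stream lock
   once instead of once per character.  Returns -1 on a read error.  */
static long
count_lines_unlocked (FILE *fp)
{
  long lines = 0;
  flockfile (fp);
  for (;;)
    {
      int c = getc_unlocked (fp);
      if (c == EOF)
        break;
      if (c == '\n')
        ++lines;
    }
  /* EOF can mean end-of-file or an error; tell them apart.  */
  if (!feof_unlocked (fp))
    lines = -1;
  funlockfile (fp);
  return lines;
}
```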
Florian Weimer
dbb949f53d resolv: Move libanl into libc (if libpthread is in libc)
The symbols gai_cancel, gai_error, gai_suspend, getaddrinfo_a,
__gai_suspend_time64 were moved using scripts/move-symbol-to-libc.py.

For Hurd (which remains !PTHREAD_IN_LIBC), a few #define redirects
had to be added because several pthread functions are not available
under __.  (Linux uses __ prefixes for most hidden aliases, and has
to in some cases to avoid linknamespace issues.)
2021-07-02 11:45:00 +02:00
Pedro Franco de Carvalho
813c6ec808 powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.

On POWER9, averaging improvements across the benchmark
inputs (length/source alignment/destination alignment), for an
experiment that ran the benchmark five times, bench-strcpy showed an
improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%.

On POWER10, bench-strcpy showed an improvement of 13.16%, and
bench-stpcpy showed an improvement of 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this results in a few extra cycles for the null string, in
   combination with the second change, this resulted in improvements
   for other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased the number of unrolled iterations in the main loop to 6.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
2021-07-01 17:58:53 -03:00
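The commit notes that stpcpy shares the strcpy implementation and only differs in what it returns. A portable byte-loop sketch of that relationship (not the POWER9/10 vector code; all names are hypothetical): one copy routine returns a pointer to the copied terminator, strcpy returns the destination, and stpcpy returns the terminator pointer.

```c
#include <assert.h>
#include <string.h>

/* Shared copy loop: copies SRC including its NUL and returns a
   pointer to the NUL written in DST.  */
static char *
copy_core (char *dst, const char *src)
{
  while ((*dst = *src) != '\0')
    {
      ++dst;
      ++src;
    }
  return dst;
}

/* strcpy semantics: return the start of the destination.  */
static char *
strcpy_ref (char *dst, const char *src)
{
  copy_core (dst, src);
  return dst;
}

/* stpcpy semantics: return a pointer to the terminating NUL.  */
static char *
stpcpy_ref (char *dst, const char *src)
{
  return copy_core (dst, src);
}
```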
H.J. Lu
8241409e29 soft-fp: Add __extendhfxf2 and __truncxfhf2
1. Add __extendhfxf2 to return an IEEE half converted to IEEE extended.
2. Add __truncxfhf2 to truncate IEEE extended into IEEE half.

These are needed by x86 _Float16 support in GCC, as described in the
Intel AVX512-FP16 architecture specification:

https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html
2021-07-01 11:02:58 -07:00
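Widening IEEE half to a larger format, which is what __extendhfxf2 does for x87 extended, is always exact. The sketch below swaps in a simpler target, double, and decodes a binary16 bit pattern by hand. It illustrates the format conversion only and is not the soft-fp implementation; the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical decoder: IEEE 754 binary16 bits -> double.  Every half
   value is exactly representable in double, so nothing is rounded.  */
static double
half_bits_to_double (uint16_t h)
{
  int sign = h >> 15;
  int exp = (h >> 10) & 0x1f;
  int frac = h & 0x3ff;
  double v;

  if (exp == 0)
    v = frac * 0x1p-24;                 /* zero or subnormal: frac * 2^-24 */
  else if (exp == 0x1f)
    v = frac == 0 ? INFINITY : NAN;     /* infinity or NaN */
  else                                  /* normal: (1024 + frac) * 2^(exp - 25) */
    v = (double) (1024 + frac) * (double) (1 << (exp - 1)) * 0x1p-24;

  return sign ? -v : v;
}
```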
H.J. Lu
ea8e465a6b x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033]
From

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

* Intel TSX will be disabled by default.
* The processor will force abort all Restricted Transactional Memory (RTM)
  transactions by default.
* A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated,
  which is set to indicate to updated software that the loaded microcode is
  forcing RTM abort.
* On processors that enumerate support for RTM, the CPUID enumeration bits
  for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to
  be set by default after microcode update.
* Workloads that were benefited from Intel TSX might experience a change
  in performance.
* System software may use a new bit in Model-Specific Register (MSR) 0x10F
  TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock
  Elision (HLE) and RTM bits to indicate to software that Intel TSX is
  disabled.

1. Add RTM_ALWAYS_ABORT to CPUID features.
2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set.  This skips the
string/tst-memchr-rtm etc. testcases on the affected processors, which
always fail after a microcode update.
3. Check RTM feature, instead of usability, against /proc/cpuinfo.

This fixes BZ #28033.
2021-07-01 10:47:35 -07:00
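The usability rule in point 2 above can be sketched with the CPUID bits quoted from Intel's article: RTM is CPUID.07H.0H:EBX[11] and RTM_ALWAYS_ABORT is CPUID.07H.0H:EDX[11]. This assumes GCC or Clang's <cpuid.h> and its __get_cpuid_count; it is an illustration, not glibc's cpu_features code, and the function name is hypothetical.

```c
#include <assert.h>

#if defined (__x86_64__) || defined (__i386__)
# include <cpuid.h>
#endif

/* Returns 1 if RTM is advertised and not force-aborted by microcode,
   0 otherwise, or -1 on non-x86 targets.  */
static int
rtm_usable_sketch (void)
{
#if defined (__x86_64__) || defined (__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx))
    return 0;                                 /* leaf 7 not supported */
  int rtm = (ebx >> 11) & 1;                  /* CPUID.07H.0H:EBX[11] */
  int rtm_always_abort = (edx >> 11) & 1;     /* CPUID.07H.0H:EDX[11] */
  return rtm && !rtm_always_abort;
#else
  return -1;
#endif
}
```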
Joseph Myers
b1b4f7209e Update syscall lists for Linux 5.13
Linux 5.13 has three new syscalls (landlock_create_ruleset,
landlock_add_rule, landlock_restrict_self).  Update syscall-names.list
and regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2021-07-01 17:37:36 +00:00
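Once the syscall lists include the Landlock calls, the SYS_landlock_* names become available through <sys/syscall.h> when the kernel headers define them. A hedged sketch that queries the Landlock ABI version via the raw syscall; LANDLOCK_CREATE_RULESET_VERSION comes from <linux/landlock.h> and is defined locally here only in case older headers lack it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from <linux/landlock.h> (Linux 5.13); fallback for old headers.  */
#ifndef LANDLOCK_CREATE_RULESET_VERSION
# define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

/* Return the Landlock ABI version, or -1 with errno set if the syscall
   is unknown to the headers or unsupported by the running kernel.  */
static long
landlock_abi_version (void)
{
#ifdef SYS_landlock_create_ruleset
  return syscall (SYS_landlock_create_ruleset, NULL, 0,
                  LANDLOCK_CREATE_RULESET_VERSION);
#else
  errno = ENOSYS;
  return -1;
#endif
}
```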
Stefan Liebler
7c45df18e1 s390: Fix MEMCHR_Z900_G5 ifunc-variant if n>=0x80000000 [BZ #28024]
On s390 (31bit), the pointer to the first byte after s always wraps
around when n >= 0x80000000, which can cause the search to stop before
the end of s.

Thus this patch simply uses NULL as the byte after s in this case, and
the srst instruction stops searching with "not found" when wrapping
around from the top address to zero.

This is observable with testcase string/test-memchr
starting with commit "String: Add overflow tests for strnlen, memchr,
and strncat [BZ #27974]"
https://sourceware.org/git/?p=glibc.git;a=commit;h=da5a6fba0febbfc90896ce1b2eb75c6d8a88a72d
2021-07-01 16:46:59 +02:00
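The s390 fix itself is in assembly, but the underlying pitfall, forming the end pointer s + n for a very large n, also shows up in C implementations. A portable sketch (hypothetical name, not the patch) that avoids computing s + n at all by counting n down, so a huge n cannot wrap the end address below s:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* memchr-style search that never forms S + N, so N == SIZE_MAX cannot
   wrap the end address and cut the search short.  */
static void *
memchr_nowrap (const void *s, int c, size_t n)
{
  const unsigned char *p = s;
  unsigned char uc = (unsigned char) c;
  for (; n != 0; --n, ++p)
    if (*p == uc)
      return (void *) p;
  return NULL;
}
```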
Stefan Liebler
ba436665b1 Fix extra PLT reference in libc.so due to __glob64_time64 if build with gcc 7.5 on 32bit.
Starting with recent commit 84f7ce8447
"posix: Add glob64 with 64-bit time_t support", elf/check-localplt
fails due to extra PLT reference __glob64_time64 in __glob64_time64
itself.

This is observable with gcc 7.5 on x86_64 with -m32 or s390x with
-m31.  By contrast, when building with gcc 10, gcc generates a call to
__glob64_time64.localalias.

This patch adds a hidden version of __glob64_time64 in the
same way as for __globfree64_time64.
2021-07-01 16:46:59 +02:00
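A "hidden version" of a symbol is, roughly, a second name for the same function with hidden visibility, so internal callers bind to it locally and need no PLT entry. glibc wraps this in macros such as libc_hidden_proto and libc_hidden_def; the sketch below shows the underlying GCC attributes directly, with hypothetical names.

```c
#include <assert.h>

/* Exported function: calls to it from inside a shared library may go
   through the PLT because the symbol can be interposed.  */
int
example_add_one (int x)
{
  return x + 1;
}

/* Hidden alias for internal callers: it cannot be interposed, so the
   call binds directly to the local definition.  */
extern __typeof (example_add_one) example_add_one_internal
  __attribute__ ((alias ("example_add_one"), visibility ("hidden")));

static int
internal_caller (int x)
{
  return example_add_one_internal (x);
}
```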
Wilco Dijkstra
6a34c928c2 AArch64: Add hp-timing.h
Add hp-timing.h using the cntvct_el0 counter. Return timing in nanoseconds
so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
in the dynamic linker since it adds unnecessary overhead and some ancient
kernels may not handle emulating cntvct correctly. Currently cntvct_el0 is
only used for timing in the benchtests.

Reviewed-by: Szabolcs Nagy  <szabolcs.nagy@arm.com>
2021-07-01 15:42:05 +01:00
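Converting cntvct_el0 ticks to nanoseconds requires the counter frequency, which AArch64 exposes in cntfrq_el0. A minimal sketch (not glibc's hp-timing.h; the function name is hypothetical) that reads both registers with inline assembly on AArch64 and falls back to clock_gettime elsewhere. The conversion is split into whole seconds and remainder to avoid overflowing ticks * 10^9.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds.  */
static uint64_t
timestamp_ns (void)
{
#ifdef __aarch64__
  uint64_t ticks, freq;
  __asm__ volatile ("mrs %0, cntvct_el0" : "=r" (ticks));
  __asm__ volatile ("mrs %0, cntfrq_el0" : "=r" (freq));
  /* Split to avoid overflow: whole seconds, then the remainder.  */
  return (ticks / freq) * 1000000000u
         + (ticks % freq) * 1000000000u / freq;
#else
  struct timespec ts;
  clock_gettime (CLOCK_MONOTONIC, &ts);
  return (uint64_t) ts.tv_sec * 1000000000u + (uint64_t) ts.tv_nsec;
#endif
}
```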
Wilco Dijkstra
252cad02d4 AArch64: Improve strnlen performance
Optimize strnlen by avoiding UMINV which is slow on most cores. On Neoverse N1
large strings are 1.8x faster than the current version, and bench-strnlen is
50% faster overall. This version is MTE compatible.

Reviewed-by: Szabolcs Nagy  <szabolcs.nagy@arm.com>
2021-07-01 15:32:36 +01:00
Florian Weimer
eb68d7d23c Linux: Avoid calling malloc indirectly from __get_nprocs
malloc initialization depends on __get_nprocs, so using
scratch buffers in __get_nprocs may result in infinite recursion.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-06-30 17:41:47 +02:00
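Because malloc initialization calls __get_nprocs, any helper it uses must not allocate. A sketch of that constraint, assuming /sys/devices/system/cpu/online (format like "0-3,5") as the data source: the file is read into a fixed-size stack buffer and parsed with strtoul, which does not allocate. Function names are hypothetical; this is not glibc's getsysstats.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Count CPUs in a list such as "0-3,5\n".  No allocation.  */
static int
count_cpu_list (const char *s)
{
  int count = 0;
  while (*s >= '0' && *s <= '9')
    {
      char *end;
      unsigned long lo = strtoul (s, &end, 10);
      unsigned long hi = lo;
      if (*end == '-')
        hi = strtoul (end + 1, &end, 10);
      if (hi >= lo)
        count += (int) (hi - lo + 1);
      s = end;
      if (*s == ',')
        ++s;
    }
  return count;
}

/* Read the online-CPU list into a stack buffer; never calls malloc.
   Returns 0 if the file cannot be read.  */
static int
online_cpus_no_malloc (void)
{
  char buf[4096];
  int fd = open ("/sys/devices/system/cpu/online", O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    return 0;
  ssize_t n = read (fd, buf, sizeof buf - 1);
  close (fd);
  if (n <= 0)
    return 0;
  buf[n] = '\0';
  return count_cpu_list (buf);
}
```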
Joseph Myers
38f58041ba Use Linux 5.13 in build-many-glibcs.py
This patch makes build-many-glibcs.py use Linux 5.13.

Tested with build-many-glibcs.py (host-libraries, compilers and glibcs
builds).
2021-06-30 13:29:08 +00:00
Florian Weimer
734c60ebb6 login: Move libutil into libc
The symbols forkpty, login, login_tty, logout, logwtmp, openpty
were moved using scripts/move-symbol-to-libc.py.

This is a single commit because most of the symbols are tied together
via forkpty, for example.

Several changes to use hidden prototypes are needed.  This commit
also updates pseudoterminal terminology on modified lines.

For s390 (31-bit), this commit follows the existing style for the
compat symbol version creation.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-30 08:43:37 +02:00
Florian Weimer
98164ba55d login: Rework hidden prototypes for __setutent, __utmpname, __endutent
Replace attribute_hidden with a regular combination of
libc_hidden_proto and libc_hidden_def.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-30 07:28:27 +02:00
Florian Weimer
8d1f854d60 login: Hidden prototypes for _getpt, __ptsname_r, grantpt, unlockpt
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-30 07:28:12 +02:00
Florian Weimer
3640654575 nptl_db: Re-use the ELF-to-abilist converter for ABI checking
The previous approach used readelf -DWs, which does not produce
a stable output format (older binutils versions do not include
symbol version information).  This commit re-uses scripts/abilist.awk
with a tweak to include GLIBC_PRIVATE symbols.  This awk script
is based on objdump -T output, which appears to be stable over time.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-06-29 22:17:08 +02:00
Andreas Roeseler
9dc7dc5708 Add RFC 8335 Definitions from Linux 5.13
RFC 8335 defines the network utility PROBE, which builds off of the
capabilities of Ping to query more detailed interface information from
networking nodes.

The definitions included in this patchset have been accepted into the
linux net-next branch and will be included in Linux 5.13. This
patchset adds the same definitions to glibc for use in the
iputils package.

The relevant commits for the Linux definitions can be found here:
e542d29ca8
750f4fc2a1

These changes have been tested by running the glibc tests on x86_64

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-29 15:38:27 -03:00
Florian Weimer
5e1ce61e3e nss: Fix NSS_DECLARE_MODULE_FUNCTIONS handling of _nss_*_endnetgrent
The old version had an additional underscore, making the declaration
ineffective.
2021-06-29 12:06:40 +02:00
Stefan Liebler
259a17cc98 s390x: Update math: redirect roundeven function
After recent commit 447954a206 ("math: redirect roundeven function"),
building on s390x fails with:
Error: symbol `__roundevenl' is already defined

Similar to the aarch64/riscv fix in commit 3213ed770c ("Update math:
redirect roundeven function"), this patch redirects target-specific
functions for s390x.
2021-06-29 09:07:14 +02:00
Adhemerval Zanella
c32c868ab8 posix: Add _Fork [BZ #4737]
Austin Group issue 62 [1] dropped the async-signal-safe requirement
for fork and provided an async-signal-safe _Fork replacement that
does not run the atfork handlers.  It will be included in the next
POSIX standard.

It allows closing a long-standing issue to make fork AS-safe (BZ#4737).
As indicated on the bug, besides the internal lock for the atfork
handlers itself, there is no guarantee that the handlers themselves
will not introduce more AS-safe issues.

The idea is to synchronize fork with the required internal locks to
allow children in multithreaded processes to use most standard
functions (even though POSIX states only AS-safe functions should be
used).  In signal handlers, _Fork should be used instead, and only
AS-safe functions should be used.

For testing, the new tst-_Fork only checks basic usage.  I also added
a new tst-mallocfork3, which uses the same strategy as tst-mallocfork2
to check for deadlock, but with threads instead of subprocesses
(and it does deadlock if _Fork is replaced with fork).

[1] https://austingroupbugs.net/view.php?id=62
2021-06-28 15:55:56 -03:00
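A minimal usage sketch of _Fork, assuming glibc 2.34 or later for the declaration in <unistd.h> and falling back to fork on older C libraries purely so the example still builds. The HAVE__FORK macro and the helper name are hypothetical. The child restricts itself to the async-signal-safe _exit, as the commit recommends.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Use _Fork when the C library provides it (assumed: glibc >= 2.34).  */
#if defined __GLIBC__ && defined __GLIBC_PREREQ
# if __GLIBC_PREREQ (2, 34)
#  define HAVE__FORK 1
# endif
#endif

/* Spawn a child that only calls _exit, reap it, and return its exit
   status, or -1 on error.  */
static int
spawn_and_wait (int code)
{
#ifdef HAVE__FORK
  pid_t pid = _Fork ();   /* does not run the atfork handlers */
#else
  pid_t pid = fork ();
#endif
  if (pid < 0)
    return -1;
  if (pid == 0)
    _exit (code);         /* child: async-signal-safe calls only */

  int status;
  if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status))
    return -1;
  return WEXITSTATUS (status);
}
```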
Florian Weimer
dd45734e32 nptl: Add glibc.pthread.stack_cache_size tunable
The valgrind/helgrind test suite needs a way to make stack deallocation
more prompt, and this feature seems to be generally useful.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-06-28 16:41:58 +02:00
Florian Weimer
fef400a2f9 nptl: Export libthread_db-used symbols under GLIBC_PRIVATE
This allows distributions to strip debugging information from
libc.so.6 without impacting the debugging experience.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 15:05:42 +02:00
Florian Weimer
b369cc4e9c nptl: Rename nptl_version to __nptl_version
This prepares it for exporting as a dynamic symbol.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 14:34:48 +02:00
Florian Weimer
d22705e7de nptl_db: Clean up main/rtld variable handling
Most symbols are now in libc.so.6.  The "main" (exempted from
coverage checks) status is therefore not necessary.  Use
DB_MAIN_VARIABLE for the remaining separate symbol,
__nptl_initial_report_events.  DB_RTLD_VARIABLE is now unused, so
remove it.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 14:34:32 +02:00
Szabolcs Nagy
3101b96787 arm: align stack in clone [BZ 28020]
The arm PCS requires an 8-byte aligned stack at function entry.
Previously, an unaligned stack could crash the clone child.

Fixes bug 28020.
2021-06-28 11:35:44 +01:00
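The arm fix aligns the stack inside clone itself. From the caller's side, the same requirement amounts to rounding the top of a downward-growing stack down to the required alignment before passing it to clone. A sketch with hypothetical helper names; it uses 16-byte alignment, which also satisfies arm's 8-byte requirement.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Round the top of a downward-growing stack down to ALIGN bytes
   (ALIGN must be a power of two).  */
static void *
stack_top_aligned (void *base, size_t size, uintptr_t align)
{
  uintptr_t top = (uintptr_t) base + size;
  return (void *) (top & ~(align - 1));
}

static int
child_fn (void *arg)
{
  return *(int *) arg;   /* becomes the child's exit status */
}

/* Run child_fn on a malloc'd, aligned stack; return its exit status.  */
static int
run_on_own_stack (int code)
{
  size_t size = 64 * 1024;
  char *stack = malloc (size);
  if (stack == NULL)
    return -1;
  pid_t pid = clone (child_fn, stack_top_aligned (stack, size, 16),
                     SIGCHLD, &code);
  int status;
  int result = -1;
  if (pid > 0 && waitpid (pid, &status, 0) == pid && WIFEXITED (status))
    result = WEXITSTATUS (status);
  free (stack);
  return result;
}
```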
Florian Weimer
30639e79d3 Linux: Cleanups after librt move
librt.so is no longer installed for PTHREAD_IN_LIBC, and tests
are not linked against it.  $(librt) is introduced globally for
shared tests that need to be linked for both PTHREAD_IN_LIBC
and !PTHREAD_IN_LIBC.

GLIBC_PRIVATE symbols that were needed during the transition are
removed again.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 09:51:01 +02:00
Florian Weimer
477910b83e Linux: Move timer_settime, __timer_settime64 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

The way the int/timer_t ABI transition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

The shared librt is now empty, so this commit adds a placeholder
symbol at the base version, GLIBC_2.2, and potentially at the
GLIBC_2.3.3 version as well (the leftover from the int/timer_t ABI
transition).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:01 +02:00
Florian Weimer
a1d6ed027b Linux: Move timer_gettime, __timer_gettime64 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

The way the int/timer_t ABI transition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:01 +02:00
Florian Weimer
df6d227e69 Linux: Move timer_getoverrun from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

The way the int/timer_t ABI transition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:00 +02:00
Florian Weimer
273a2a2ae8 Linux: Move timer_create, timer_delete from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

timer_create and timer_delete are tied together via the int/timer_t
compatibility code.  The way the int/timer_t ABI transition is implemented
is changed with this commit: the implementation is now consolidated
in one file with a TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:00 +02:00
Florian Weimer
d7d0efec47 Linux: Define TIMER_T_WAS_INT_COMPAT in kernel-posix-timers.h
This is almost equivalent to __WORDSIZE == 64
&& OTHER_SHLIB_COMPAT (librt, GLIBC_2_1, GLIBC_2_3_3), except
that this expression is true for mips64/n64 targets as well,
even though those did not undergo the timer_t transition.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 09:51:00 +02:00
Florian Weimer
8208be389b Install shared objects under their ABI names
Previously, the installed objects were named like libc-2.33.so,
and the ABI soname libc.so.6 was just a symbolic link.

The Makefile targets to install these symbolic links are no longer
needed after this, so they are removed with this commit.  The more
general $(make-link) command (which invokes scripts/rellns-sh) is
retained because other symbolic links are still needed.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 08:33:57 +02:00
Florian Weimer
6bf789d69e elf: Generalize name-based DSO recognition in ldconfig
This introduces <dl-is_dso.h> and the _dl_is_dso function.  A
test ensures that the official names of libc.so, ld.so, and their
versioned names are recognized.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 08:33:57 +02:00
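The commit does not spell out the rules _dl_is_dso applies, so the following is only a hypothetical illustration of name-based DSO recognition, not the function the commit adds: accept names starting with "lib", "ld-" or "ld.so" that contain ".so" followed by the end of the string or a '.' (so libc.so.6 and ld-linux-x86-64.so.2 match, libfoo.sox does not).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical name check for shared objects.  */
static bool
looks_like_dso (const char *name)
{
  if (strncmp (name, "lib", 3) != 0 && strncmp (name, "ld-", 3) != 0
      && strncmp (name, "ld.so", 5) != 0)
    return false;
  /* Require ".so" followed by end of string or a version suffix.  */
  for (const char *p = strstr (name, ".so"); p != NULL;
       p = strstr (p + 1, ".so"))
    if (p[3] == '\0' || p[3] == '.')
      return true;
  return false;
}
```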
Florian Weimer
b89d5de250 Makerules: Remove lib-version, $(subdir-version)
Also clarify that the "versioned" term refers to the soname, not the glibc
version (which also ends up in the installed file name).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 08:33:57 +02:00
Florian Weimer
86f0179bc0 nptl_db: Install libthread_db under a regular implementation name
Currently, the name is always libthread_db-1.0.so.  Unlike the other
libraries, it does not change with the glibc version.

GDB hard-codes libthread_db.so.1 (the soname), so this change does not
affect loading libthread_db.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 08:33:57 +02:00
Siddhesh Poyarekar
9429049c17 iconvconfig: Fix multiple issues
It was noticed on big-endian systems that msgfmt would fail with the
following error:

msgfmt: gconv_builtin.c:70: __gconv_get_builtin_trans: Assertion `cnt < sizeof (map) / sizeof (map[0])' failed.
Aborted (core dumped)

This was only seen on installed systems because it was caused by a
corrupted gconv-modules.cache.  iconvconfig had the following issues
(it was specifically freeing fulldir that caused this issue, but other
cleanups are also needed) that this patch fixes.

- Add prefix only if dir starts with '/'
- Use asprintf instead of mempcpy so that the directory string is
  null-terminated
- Make a copy of the directory reference in new_module so that fulldir
  can be freed within the same scope in handle_dir.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-06-28 09:15:55 +05:30
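The first two iconvconfig fixes can be sketched together: prepend the prefix only when the directory is absolute, and build the result with asprintf so it is always null-terminated, unlike a mempcpy into a buffer. The function name and paths are hypothetical; this is not iconvconfig's code.

```c
#define _GNU_SOURCE  /* for asprintf */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, null-terminated path: PREFIX + DIR if DIR starts
   with '/', otherwise DIR alone.  Returns NULL on allocation failure.  */
static char *
prefixed_dir (const char *prefix, const char *dir)
{
  char *result;
  const char *p = dir[0] == '/' ? prefix : "";
  if (asprintf (&result, "%s%s", p, dir) < 0)
    return NULL;
  return result;
}
```

Because the caller owns an independent copy, it can free its own inputs (as handle_dir does with fulldir) without invalidating the result.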
Andreas Schwab
5adda61f62 wordexp: handle overflow in positional parameter number (bug 28011)
Use strtoul instead of atoi so that overflow can be detected.
2021-06-27 19:35:42 +02:00
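The wordexp change swaps atoi, whose behavior on overflow is undefined, for strtoul, which reports overflow through errno == ERANGE. A sketch of that parsing step with a hypothetical helper name; it also rejects a leading sign, which strtoul would otherwise accept.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse a positional-parameter number such as the "12" in "${12}".
   Returns false on non-digits, trailing junk, or overflow.  */
static bool
parse_param_number (const char *s, unsigned long *out)
{
  if (!isdigit ((unsigned char) s[0]))
    return false;
  char *end;
  errno = 0;
  unsigned long v = strtoul (s, &end, 10);
  if (errno == ERANGE || *end != '\0')
    return false;
  *out = v;
  return true;
}
```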
H.J. Lu
3213ed770c Update math: redirect roundeven function
Redirect target specific roundeven functions for aarch64, ldbl-128ibm
and riscv.
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
eb9066203f Use GCC builtins for roundeven functions if desired.
This patch uses the corresponding GCC builtin for roundevenf,
roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined
to one in math-use-builtins.h.

These builtin functions are supported since GCC 10.

The code of the generic implementation is not changed.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
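roundeven rounds to the nearest integer with ties going to the even neighbor (2.5 becomes 2, 3.5 becomes 4). As an illustration of those semantics only, here is a libm-free sketch using the classic 2^52 trick, swapped in for the builtin and SSE4.1 versions. It assumes the default round-to-nearest-even mode and plain double evaluation (FLT_EVAL_METHOD == 0, e.g. SSE2 on x86_64, not x87); the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Round to nearest, ties to even.  Adding and subtracting 2^52 makes
   the FPU discard the fraction using the current rounding mode.  */
static double
roundeven_sketch (double x)
{
  int neg = signbit (x);
  double ax = neg ? -x : x;
  if (!(ax < 0x1p52))      /* already integral, infinite, or NaN */
    return x;
  double r = (ax + 0x1p52) - 0x1p52;
  return neg ? -r : r;     /* restores the sign, including -0.0 */
}
```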
Shen-Ta Hsieh
1683249d17 x86_64: roundeven with sse4.1 support
This patch adds support for the sse4.1 hardware floating point
roundeven.

Here are some benchmark results on my systems:

=AMD Ryzen 9 3900X 12-Core Processor=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  3.75587e+09 |  3.75114e+09 |
| iterations |  3.93053e+08 |  4.35402e+08 |
| max        | 52.592       | 58.71        |
| min        |  7.98        |  7.22        |
| mean       |  9.55563     |  8.61535     |

* benchmark result after this commit
|            |     roundeven |   roundevenf |
|------------|---------------|--------------|
| duration   |   3.73815e+09 |  3.73738e+09 |
| iterations |   5.82692e+08 |  5.91498e+08 |
| max        |  56.468       | 51.642       |
| min        |   6.27        |  6.156       |
| mean       |   6.41532     |  6.3185      |

=Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.18208e+09 |  2.18258e+09 |
| iterations |  2.39932e+08 |  2.46924e+08 |
| max        | 96.378       | 98.035       |
| min        |  6.776       |  5.94        |
| mean       |  9.09456     |  8.83907     |

* benchmark result after this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.17415e+09 |  2.17005e+09 |
| iterations |  3.56193e+08 |  4.09824e+08 |
| max        | 51.693       | 97.192       |
| min        |  5.926       |  5.093       |
| mean       |  6.10385     |  5.29507     |

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
447954a206 math: redirect roundeven function
This patch redirects the roundeven function for further changes.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00