H.J. Lu
df87f54923
Check DATA_CACHE_SIZE_HALF
2010-04-14 22:18:27 -07:00
H.J. Lu
dd37cd1a12
Optimize x86-64 SSE4 memcmp for unaligned data.
2010-04-14 17:53:44 -07:00
H.J. Lu
404a6e3201
x86-64 SSE4 optimized memcmp
...
This is a 64-bit SSE4 optimized memcmp. It improves memcmp by up to 3X
on Intel Core i7.
2010-04-14 00:12:53 -07:00
Ulrich Drepper
bbbdd77809
Update x86-64 cpu multiarch selection header.
2010-04-13 19:17:10 -07:00
Ulrich Drepper
22f4f44b67
Fix concurrent handling of __cpu_features.
2010-04-04 00:25:46 -07:00
H.J. Lu
7d9335ecd7
Don't define __strpbrk_sse42 in static library
2010-03-24 12:16:24 -07:00
H.J. Lu
5a7af22fbb
Unroll the loop in x86-64 SSE4.2 strlen.
2010-01-13 07:51:48 -08:00
H.J. Lu
3af48cbdfa
Optimize 32bit memset/memcpy with SSE2/SSSE3.
2010-01-12 11:22:03 -08:00
H.J. Lu
2510d01ddb
Define bit_SSE2 and index_SSE2.
2009-12-13 15:23:02 -08:00
H.J. Lu
51ddd2c01e
Define bit_XXX and index_XXX.
...
This patch defines bit_XXX and index_XXX and uses them to check processor
features in assembly code. This can prevent typos in processor feature
checks.
2009-12-13 09:47:02 -08:00
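The bit_XXX/index_XXX scheme described in commit 51ddd2c01e can be illustrated with a minimal C sketch. This is not the glibc source: the struct layout, the register enum, and the `HAS_FEATURE` macro are hypothetical names chosen for illustration. Only the `bit_`/`index_` naming comes from the commit message, and the bit positions are the CPUID leaf 1 feature bits for SSE2, SSSE3, and SSE4.2.

```c
#include <stdint.h>

/* Hypothetical layout: four registers stored per CPUID index. */
enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX, REG_COUNT };
enum { COMMON_CPUID_INDEX_1, COMMON_CPUID_INDEX_MAX };

struct cpu_features_sketch {
    uint32_t cpuid[COMMON_CPUID_INDEX_MAX][REG_COUNT];
};

/* Each feature gets a bit_ mask and an index_ naming the register
   that holds it, so the two can never be mismatched by hand. */
#define bit_SSE2      (1u << 26)   /* CPUID.1:EDX bit 26 */
#define index_SSE2    REG_EDX
#define bit_SSSE3     (1u << 9)    /* CPUID.1:ECX bit 9 */
#define index_SSSE3   REG_ECX
#define bit_SSE4_2    (1u << 20)   /* CPUID.1:ECX bit 20 */
#define index_SSE4_2  REG_ECX

/* Hypothetical test macro: pastes the feature name onto both prefixes. */
#define HAS_FEATURE(cf, name) \
    (((cf)->cpuid[COMMON_CPUID_INDEX_1][index_##name] & bit_##name) != 0)
```

Because a caller writes only the feature name, a misspelled name fails at compile time instead of silently testing the wrong bit or register.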
Ulrich Drepper
823bc6da65
Fix whitespaces.
2009-10-22 22:50:00 -07:00
H.J. Lu
001659f4d5
Implement SSE4.2 optimized strchr and strrchr.
2009-10-22 22:47:12 -07:00
Roland McGrath
b0f3a2e43f
Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions.
2009-10-06 20:01:23 -07:00
Roland McGrath
9d6982d5d2
Clean up x86 multiarch HAS_FOO macros.
2009-10-06 19:59:03 -07:00
Jakub Jelinek
22bb992d51
Fix strstr/strcasestr/fma/fmaf on x86_64.
2009-09-02 19:43:04 -07:00
H.J. Lu
5a4eb7282e
Remove ENABLE_SSSE3_ON_ATOM.
...
It turns out that SSSE3 isn't slow on Atom. The problem is bsf. This patch
removes ENABLE_SSSE3_ON_ATOM.
2009-08-28 14:54:46 -07:00
Ulrich Drepper
8e436522e1
Move SSE4.2 functions together.
2009-08-08 09:38:32 -07:00
Ulrich Drepper
0fda545d5f
Add SSSE3-optimized implementation of str{,n}cmp for x86-64.
2009-08-07 22:51:02 -07:00
Ulrich Drepper
57b378ac89
Avoid warning through fake initialization.
2009-08-07 16:19:54 -07:00
H.J. Lu
02cea47161
Add x86 32-bit SSE4.2 string functions.
...
This patch adds 32bit SSE4.2 string functions. It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long. Tested
on 32bit Core i7 and Core 2.
2009-08-04 12:13:43 -07:00
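The point about -16L in commit 02cea47161 can be shown in C. This is a sketch of the idea only; the commit itself concerns constants in assembly sources, and `align_down_16` is a hypothetical helper name. The value -16 has every bit set except the low four, whatever the width of `long`, whereas the literal 0xfffffffffffffff0 only fits in a 64-bit type.

```c
#include <stdint.h>

/* Round an address down to the previous 16-byte boundary.
   Converting -16L to an unsigned type yields all ones except the low
   four bits at that type's width, so the same source works whether
   pointers are 32 or 64 bits wide. */
static inline uintptr_t align_down_16(uintptr_t addr)
{
    return addr & (uintptr_t)(-16L);
}
```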
H.J. Lu
6f6f1215f6
Support multiarch for i686.
...
This patch adds multiarch support when configured for i686. I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
2009-07-31 11:53:35 -07:00
Ulrich Drepper
78c4ef475d
Add support for x86-64 fma instruction.
...
Use it to implement fma and fmaf, if possible.
2009-07-29 15:26:06 -07:00
Ulrich Drepper
9a1d2d4555
Prepare use of IFUNC functions outside libc.so.
...
We use a callback function into libc.so to get access to the data
structure with the information and have special versions of the test
macros which automatically use this function.
2009-07-29 15:22:28 -07:00
Ulrich Drepper
e83c1a8a72
Refine testing for xmm/ymm register use in x86-64 ld.so.
...
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed us to remove some of the specializations
for some of the string functions as they are only used in other
code paths.
2009-07-27 13:40:27 -07:00
Ulrich Drepper
16d2ea4c82
Make sure no code in ld.so uses xmm/ymm registers on x86-64.
...
This patch introduces a test to make sure no function modifies the
xmm/ymm registers, with the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
2009-07-26 16:10:00 -07:00
H.J. Lu
7956a3d27c
Add SSE2 support to str{,n}cmp for x86-64.
2009-07-26 13:32:28 -07:00
H.J. Lu
4e5b5821bf
Some optimizations for x86-64 strcmp.
2009-07-25 19:15:14 -07:00
Ulrich Drepper
29e92fa5cd
Optimize x86-64 SSE4.2 strcmp.
...
The file contained some code which was never used. Don't compile it
in.
2009-07-25 12:02:47 -07:00
Ulrich Drepper
d28797e426
Perform test for Atom x86-64 in central place and handle it.
...
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
2009-07-23 13:15:17 -07:00
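The centralized approach in commit d28797e426 (clear the SSSE3 bit once at startup instead of testing for Atom in every function) might look roughly like the C sketch below. The struct, the function names, and the use of family 6 / model 0x1c to identify Atom are assumptions for illustration, not taken from the commit. The family/model decoding follows the documented CPUID leaf 1 EAX layout. Note that the later commit 5a4eb7282e removed this handling after bsf, not SSSE3, turned out to be the slow part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_SSSE3 (1u << 9)   /* CPUID.1:ECX bit 9 */

/* Hypothetical record filled in once by startup code. */
struct cpu_info_sketch {
    unsigned family;
    unsigned model;
    uint32_t ecx;             /* feature bits from CPUID leaf 1 */
};

/* Decode display family and model from CPUID leaf 1 EAX. */
static void decode_family_model(uint32_t eax, struct cpu_info_sketch *ci)
{
    assert(ci != NULL);
    unsigned base_family = (eax >> 8) & 0x0f;
    unsigned base_model  = (eax >> 4) & 0x0f;
    unsigned ext_model   = (eax >> 16) & 0x0f;
    unsigned ext_family  = (eax >> 20) & 0xff;

    ci->family = base_family == 0x0f ? base_family + ext_family : base_family;
    ci->model  = (base_family == 6 || base_family == 0x0f)
                     ? (ext_model << 4) | base_model
                     : base_model;
}

/* Clear feature bits that should not be used on this machine, so that
   every later feature test sees the adjusted value.
   Assumption: family 6, model 0x1c identifies the affected Atom parts. */
static void mask_slow_features(struct cpu_info_sketch *ci)
{
    assert(ci != NULL);
    if (ci->family == 6 && ci->model == 0x1c)
        ci->ecx &= ~BIT_SSSE3;
}
```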
Ulrich Drepper
ae612b04cc
Minor cleanups in x86-64 strstr.
2009-07-21 07:52:12 -07:00
Ulrich Drepper
a8f895ebe1
Better check for optimization in new x86-64 strstr/strcasestr.
2009-07-20 21:18:28 -07:00
H.J. Lu
2b7a8664fa
SSE4.2 strstr/strcasestr for x86-64.
...
This patch implements SSE4.2 strstr/strcasestr, using the Knuth-Morris-Pratt
string searching algorithm.
2009-07-20 21:06:50 -07:00
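For readers unfamiliar with the Knuth-Morris-Pratt algorithm named in commit 2b7a8664fa, here is a plain scalar C sketch. It is not the SSE4.2 implementation from the patch and uses no vector instructions; `kmp_build_table` and `kmp_strstr` are hypothetical names. It shows only the failure-table idea: after a mismatch, the search resumes from the longest prefix of the needle that is also a suffix of what has already matched, so the haystack is never re-scanned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* fail[i] = length of the longest proper prefix of needle[0..i]
   that is also a suffix of needle[0..i]. */
static void kmp_build_table(const char *needle, size_t n, size_t *fail)
{
    size_t k = 0;
    fail[0] = 0;
    for (size_t i = 1; i < n; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }
}

/* Return a pointer to the first occurrence of needle in haystack,
   or NULL if there is none. */
const char *kmp_strstr(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0)
        return haystack;

    size_t *fail = malloc(n * sizeof *fail);
    if (fail == NULL)
        return NULL;   /* sketch only: allocation failure reported as "not found" */
    kmp_build_table(needle, n, fail);

    const char *result = NULL;
    size_t k = 0;      /* number of needle characters currently matched */
    for (size_t i = 0; haystack[i] != '\0'; i++) {
        while (k > 0 && haystack[i] != needle[k])
            k = fail[k - 1];
        if (haystack[i] == needle[k])
            k++;
        if (k == n) {
            result = haystack + i + 1 - n;
            break;
        }
    }
    free(fail);
    return result;
}
```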
Ulrich Drepper
cea4329592
Minor cleanups in recently added files.
2009-07-03 03:23:01 -07:00
Ulrich Drepper
d6485c981b
Align functions to 16-byte boundary.
...
Some of the new multi-arch string functions for x86-64 were
not aligned to 16-byte boundaries, possibly creating unnecessary
cache line misses and delays.
2009-07-03 03:01:57 -07:00
H.J. Lu
06e51c8f3d
Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
2009-07-03 02:48:56 -07:00
Ulrich Drepper
af263b8154
Whitespace fixes in last patch.
2009-07-02 03:43:05 -07:00
H.J. Lu
ab6a873fe0
SSSE3 strcpy/stpcpy for x86-64
...
This patch adds SSSE3 strcpy/stpcpy. I got up to a 4X speedup on Core 2
and Core i7. I disabled it on Atom since the SSSE3 version is slower for
shorter (<64 byte) data.
2009-07-02 03:39:03 -07:00
Ulrich Drepper
b38a2e2e64
Fix little checkin problem in last patch.
2009-06-30 04:41:38 -07:00
H.J. Lu
0181291385
Determine and store processor family and model on x86-64.
2009-06-30 04:39:09 -07:00
Ulrich Drepper
059215ae21
Clean up whitespaces in last patch.
2009-06-22 20:39:37 -07:00
H.J. Lu
772f4e6a1b
Add SSE4.2 support for strcmp and strncmp on x86-64.
2009-06-22 20:38:41 -07:00
Ulrich Drepper
b77c932329
Add SSE4.2 optimized rawmemchr implementation for x86-64.
2009-06-05 16:54:50 -07:00
Ulrich Drepper
6f9eea15bf
Forgot some more cleanups for the SSE4.2 strlen on x86-64.
2009-06-05 11:51:59 -07:00
Ulrich Drepper
f85a9e72e2
Add missing cleanups from SSE4.2 x86-64 strlen.
2009-06-05 11:39:45 -07:00
Ulrich Drepper
3ab2d57a4d
Optimize x86-64 strlen for SSE4.2.
...
The SSE4.2 implementation is used in the DSO only. The patch also adds
some infrastructure to be used in similar code later on.
2009-06-05 11:32:00 -07:00
Ulrich Drepper
8ea2372936
Fix up sched_cpucount in x86-64.
...
Now that static executables can handle IFUNC functions, don't exclude
optimization for sched_cpucount for !SHARED.
2009-05-31 23:46:42 -07:00
Ulrich Drepper
963cb6fcb4
Simplify CPUID value handling.
...
So far Intel and AMD use exactly the same bits meaning the same
things in CPUID index 1. Simplify the code. Should an architecture
come along which doesn't use the same semantics then it must use a
different index value than COMMON_CPUID_INDEX_1.
2009-05-31 17:52:05 -07:00
Ulrich Drepper
425ce2edb9
* config.h.in (USE_MULTIARCH): Define.
...
* configure.in: Handle --enable-multi-arch.
* elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC.
(_dl_fixup_profile): Likewise.
* elf/do-lookup.c (dl_lookup_x): Likewise.
* sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC.
* elf/elf.h (STT_GNU_IFUNC): Define.
* include/libc-symbols.h (libc_ifunc): Define.
* sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the
framework in init-arch.h to get CPUID values.
* sysdeps/x86_64/multiarch/Makefile: New file.
* sysdeps/x86_64/multiarch/init-arch.c: New file.
* sysdeps/x86_64/multiarch/init-arch.h: New file.
* sysdeps/x86_64/multiarch/sched_cpucount.c: New file.
* config.make.in (experimental-malloc): Define.
* configure.in: Handle --enable-experimental-malloc.
* malloc/Makefile: Handle experimental-malloc flag.
* malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features.
* malloc/arena.c: Likewise.
* malloc/hooks.c: Likewise.
* malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.
2009-03-13 23:53:18 +00:00
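The multiarch framework introduced in commit 425ce2edb9 selects an implementation of a function such as sched_cpucount based on CPU features. The C sketch below is only a portable approximation of that idea using a function pointer resolved on first call. Real STT_GNU_IFUNC symbols are resolved by the dynamic linker at relocation time, and none of these names (`resolve_cpucount`, `cpucount_dispatch`, `cpu_has_popcnt_sketch`) come from glibc. The "popcnt" variant is a table-driven stand-in, not an actual POPCNT instruction.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cpucount_fn)(size_t setsize, const unsigned long *set);

/* Generic variant: clear the lowest set bit until the word is zero. */
static int cpucount_generic(size_t setsize, const unsigned long *set)
{
    int count = 0;
    for (size_t i = 0; i < setsize / sizeof(unsigned long); i++)
        for (unsigned long w = set[i]; w != 0; w &= w - 1)
            count++;
    return count;
}

/* Stand-in for a POPCNT-based variant: counts four bits per step. */
static int cpucount_popcnt_standin(size_t setsize, const unsigned long *set)
{
    static const unsigned char nibble_bits[16] =
        { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
    int count = 0;
    for (size_t i = 0; i < setsize / sizeof(unsigned long); i++)
        for (unsigned long w = set[i]; w != 0; w >>= 4)
            count += nibble_bits[w & 0xf];
    return count;
}

/* Hypothetical probe; real code would consult the stored CPUID bits. */
static int cpu_has_popcnt_sketch(void)
{
    return 0;
}

/* The "resolver": pick one implementation from the feature information. */
static cpucount_fn resolve_cpucount(int has_popcnt)
{
    return has_popcnt ? cpucount_popcnt_standin : cpucount_generic;
}

/* Public entry point: resolve once, then always call the chosen variant.
   Unlike a real IFUNC, this lazy resolution is not thread-safe. */
int cpucount_dispatch(size_t setsize, const unsigned long *set)
{
    static cpucount_fn impl;
    assert(set != NULL);
    if (impl == NULL)
        impl = resolve_cpucount(cpu_has_popcnt_sketch());
    return impl(setsize, set);
}
```

With a real IFUNC the indirection is paid once when the symbol is bound, so callers jump straight to the selected variant instead of checking a pointer on every call.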