Commit Graph

324 Commits

Author SHA1 Message Date
Ulrich Drepper
fe36dd025e Fix tolower operation in strcasestr. 2010-07-30 00:09:07 -07:00
Ulrich Drepper
880113d91e Avoid compiling unneeded file in ld.so. 2010-07-27 21:12:59 -07:00
Ulrich Drepper
24fb0f88ed Add optimized x86-64 implementation of strnlen.
While at it, beef up the test suite for strnlen and add performance
tests for it, too.
2010-07-26 08:37:08 -07:00
Ulrich Drepper
8e96b93aa7 Speed up x86-64 strcasestr a bit more.
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
2010-07-24 08:34:44 -07:00
Andreas Schwab
f6a31e0eb6 Add strcasestr-nonascii to i386 build 2010-07-21 07:26:18 -07:00
Ulrich Drepper
d02dc4ba08 Fix non-ASCII case of SSE4.2 strcasestr. 2010-07-16 16:00:22 -07:00
Ulrich Drepper
cc9f2e47a0 Speed up SSE4.2 strcasestr by avoiding indirect function call. 2010-07-16 15:37:38 -07:00
H.J. Lu
6fb8cbcb58 Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7.  It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7.  It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
2010-06-30 08:26:11 -07:00
H.J. Lu
3c88fe1e3a Incorrect x86 CPU family and model check. 2010-05-27 11:14:18 -07:00
Ulrich Drepper
94a27fabeb Whitespace fix. 2010-04-14 22:29:51 -07:00
H.J. Lu
a11ec63713 Add x86-32 FMA support 2010-04-14 22:27:59 -07:00
H.J. Lu
df87f54923 Check DATA_CACHE_SIZE_HALF 2010-04-14 22:18:27 -07:00
H.J. Lu
dd37cd1a12 Optimize x86-64 SSE4 memcmp for unaligned data. 2010-04-14 17:53:44 -07:00
H.J. Lu
404a6e3201 x86-64 SSE4 optimized memcmp
This is 64bit SSE4 optimized memcmp. It improves memcmp by up to 3X
on Intel Core i7.
2010-04-14 00:12:53 -07:00
Ulrich Drepper
bbbdd77809 Update x86-64 cpu multiarch selection header. 2010-04-13 19:17:10 -07:00
Ulrich Drepper
22f4f44b67 Fix concurrent handling of __cpu_features. 2010-04-04 00:25:46 -07:00
H.J. Lu
7d9335ecd7 Don't define __strpbrk_sse42 in static library 2010-03-24 12:16:24 -07:00
Richard Guenther
e39acb1f16 Fix R_X86_64_PC32 overflow detection 2010-03-04 19:33:41 -08:00
Ulrich Drepper
4a1297d761 We can use the 64-bit register versions of the double functions. 2010-02-24 20:00:30 -08:00
Andreas Schwab
7eb22e757e Avoid PLT call to fegetenv on s390 2010-02-09 22:34:17 -08:00
Ulrich Drepper
f69190e74a Prevent silent errors should x86-64 strncmp be needed outside libc. 2010-01-14 08:09:32 -08:00
H.J. Lu
5a7af22fbb Unroll the loop x86-64 SSE4.2 strlen. 2010-01-13 07:51:48 -08:00
H.J. Lu
3af48cbdfa Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
H.J. Lu
2510d01ddb Define bit_SSE2 and index_SSE2. 2009-12-13 15:23:02 -08:00
H.J. Lu
51ddd2c01e Define bit_XXX and index_XXX.
This patch defines bit_XXX and index_XXX and uses them to check processor
features in assembly code.  This can prevent typos in processor feature
checks.
2009-12-13 09:47:02 -08:00
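
A minimal C sketch of the bit_XXX/index_XXX idea described above. The commit applies it in assembly code; the register-slot layout and the CPU_HAS helper below are hypothetical and only illustrate why pairing a bit mask with a register index under one name makes typos harder. The bit positions are the ones cpuid leaf 1 reports for these features.

```c
#include <stdint.h>

/* Hypothetical layout: one slot per register returned by cpuid leaf 1.
   The real glibc structure may be organized differently. */
enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX, REG_COUNT };

/* Each feature gets a bit mask and the index of the register holding it. */
#define bit_SSE2      (1u << 26)
#define index_SSE2    REG_EDX
#define bit_SSSE3     (1u << 9)
#define index_SSSE3   REG_ECX
#define bit_SSE4_2    (1u << 20)
#define index_SSE4_2  REG_ECX

/* Hypothetical helper: the feature name selects both the register and the
   bit, so the two cannot be mismatched by hand. */
#define CPU_HAS(regs, name) (((regs)[index_##name] & bit_##name) != 0)
```

A call such as `CPU_HAS(regs, SSE2)` then fails to compile if the feature name is misspelled, instead of silently testing the wrong bit.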
Ulrich Drepper
823bc6da65 Fix whitespaces. 2009-10-22 22:50:00 -07:00
H.J. Lu
001659f4d5 Implement SSE4.2 optimized strchr and strrchr. 2009-10-22 22:47:12 -07:00
Roland McGrath
b0f3a2e43f Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions. 2009-10-06 20:01:23 -07:00
Roland McGrath
9d6982d5d2 Clean up x86 multiarch HAS_FOO macros. 2009-10-06 19:59:03 -07:00
Roland McGrath
7967983fd4 configure tweaks, support $libc_add_on_config_subdirs 2009-09-15 14:14:42 -07:00
Jakub Jelinek
22bb992d51 Fix strstr/strcasestr/fma/fmaf on x86_64. 2009-09-02 19:43:04 -07:00
Jakub Jelinek
240441038f Fix x86_64 bits/mathinline.h for -m32 compilation. 2009-09-01 15:30:12 -07:00
Andreas Schwab
c2735e958a Fix parse error in bits/mathinline.h with --std=c99 2009-08-31 17:26:14 +02:00
H.J. Lu
5a4eb7282e Remove ENABLE_SSSE3_ON_ATOM.
It turns out that SSSE3 isn't slow on Atom. The problem is bsf. This patch
removes ENABLE_SSSE3_ON_ATOM.
2009-08-28 14:54:46 -07:00
Ulrich Drepper
65b14bcee2 Optimize out duplicated scalbln code for x86-64. 2009-08-25 16:46:34 -07:00
Ulrich Drepper
7423a3456a Optimized signbit{,f} for x86-64. 2009-08-25 14:54:12 -07:00
Ulrich Drepper
84088310ce Handle AVX saving on x86-64 in interrupted symbol lookups.
If a signal arrived during a symbol lookup and the signal handler also
required a symbol lookup, the end of the lookup in the signal handler reset
the flag whether restoring AVX/SSE registers is needed.  Resetting means
in this case that the tail part of the outer lookup code will try to
restore the registers and this can fail miserably.  We now restore to the
previous value which makes nesting calls possible.
2009-08-25 10:42:30 -07:00
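
The nesting problem described above can be sketched in plain C. This is an illustration only: `lookup_flag` is a hypothetical stand-in for the real flag, whose name and exact meaning in ld.so are not given here, and the nested call simulates a signal handler that performs its own lookup.

```c
/* Hypothetical flag; stands in for the "is register restoring needed" state. */
static int lookup_flag;

/* Old behaviour: the flag is unconditionally reset at the end of a lookup.
   Returns the flag value seen by the tail of the outermost lookup. */
static int lookup_reset(int nest)
{
    lookup_flag = 1;
    if (nest)
        lookup_reset(0);        /* simulated lookup inside a signal handler */
    int seen = lookup_flag;     /* what the tail of this lookup observes */
    lookup_flag = 0;
    return seen;
}

/* New behaviour: the previous value is saved on entry and restored on exit,
   so an inner lookup does not disturb the outer one. */
static int lookup_restore(int nest)
{
    int prev = lookup_flag;
    lookup_flag = 1;
    if (nest)
        lookup_restore(0);
    int seen = lookup_flag;
    lookup_flag = prev;
    return seen;
}
```

With the reset variant the outer lookup sees a value the inner lookup left behind; with the save-and-restore variant it sees the value it set itself.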
Ulrich Drepper
cf00cc00bc Add ceil implementation for 64-bit machines.
On 64-bit machines we should not split doubles into two 32-bit
integers and handle the words separately.  We have wide registers.
integer and handle the words separately.  We have wide registers.
This patch implements a 64-bit ceil version.  Ideally all other
functions will be converted over time.
2009-08-24 18:05:48 -07:00
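
A sketch of what a 64-bit ceil can look like when the whole double is handled as one 64-bit integer. It assumes IEEE 754 binary64 doubles, is not the code from this commit, and ignores floating-point exception flags; the name `ceil64` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

static double ceil64(double x)
{
    int64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* whole double in one register */

    /* Unbiased exponent taken from bits 62..52. */
    int exp = (int)((bits >> 52) & 0x7ff) - 0x3ff;

    if (exp > 51)                            /* already integral, inf or NaN */
        return x;

    if (exp < 0) {                           /* |x| < 1 */
        if (bits < 0)
            bits = INT64_MIN;                /* negative input gives -0.0 */
        else if (bits != 0)
            bits = INT64_C(0x3ff0000000000000);   /* positive input gives 1.0 */
    } else {
        /* Mask covering the mantissa bits that hold the fractional part. */
        int64_t frac_mask = INT64_C(0x000fffffffffffff) >> exp;
        if ((bits & frac_mask) == 0)
            return x;                        /* no fractional part */
        if (bits > 0)                        /* positive: round up by one unit */
            bits += INT64_C(0x0010000000000000) >> exp;
        bits &= ~frac_mask;                  /* drop the fractional bits */
    }

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

The point of the design is that one shift, one mask and one add operate on the full bit pattern, where a 32-bit split needs separate handling of the high and low words.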
Ulrich Drepper
9a1ea1525e Optimize float construction/extraction on x86-64. 2009-08-24 14:52:49 -07:00
Ulrich Drepper
ef72d5f1b9 Optimize x86-64 signbit{,f} a bit. 2009-08-24 10:20:58 -07:00
H.J. Lu
4e1e2f4247 Support mixed SSE/AVX audit and check AVX only once.
This patch fixes mixed SSE/AVX audit and checks AVX only once in
_dl_runtime_profile. When an AVX or SSE register value in pltenter is
modified, we have to make sure that the SSE part value is the same in both
lr_xmm and lr_vector fields so that pltexit will get the correct value
from either lr_xmm or lr_vector fields. AVX-enabled pltenter should
update both lr_xmm and lr_vector fields to support stacked AVX/SSE
pltenter functions.
2009-08-08 10:54:42 -07:00
Ulrich Drepper
8e436522e1 Move SSE4.2 functions together. 2009-08-08 09:38:32 -07:00
Ulrich Drepper
0fda545d5f Add SSSE3-optimized implementation of str{,n}cmp for x86-64. 2009-08-07 22:51:02 -07:00
Ulrich Drepper
57b378ac89 Avoid warning through fake initialization. 2009-08-07 16:19:54 -07:00
Ulrich Drepper
3aa2588d4a Fix whitespaces in last checkin. 2009-08-07 09:47:12 -07:00
H.J. Lu
a546baa9cd Properly count number of logical processors on Intel CPUs.
The meaning of the 25-14 bits in EAX returned from cpuid with EAX = 4
has been changed from "the maximum number of threads sharing the cache"
to "the maximum number of addressable IDs for logical processors sharing
the cache" if cpuid takes EAX = 11.  We need to use results from both
EAX = 4 and EAX = 11 to get the number of threads sharing the cache.

The 25-14 bits in EAX on Core i7 is 15 although the number of logical
processors is 8.  Here is a white paper on this:

http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/

This patch correctly counts number of logical processors on Intel CPUs
with EAX = 11 support on cpuid.  Tested on Dunnington, Core i7 and
Nehalem EX/EP.

It also fixes the Pentium D workaround since EBX may not have the right
value returned from cpuid with EAX = 1.
2009-08-07 09:39:36 -07:00
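
For illustration, extracting the 25-14 bit field discussed above from an EAX value can be sketched as follows. The function name is hypothetical, and the sketch only decodes a value that has already been read; it does not execute cpuid or combine the result with EAX = 11 as the patch does.

```c
#include <stdint.h>

/* Return bits 25-14 of the EAX value obtained from cpuid with EAX = 4.
   This is the raw 12-bit field; how it maps to a thread count depends on
   the processor, as the commit message explains. */
static unsigned cpuid4_sharing_field(uint32_t eax)
{
    return (eax >> 14) & 0xfffu;
}
```

On the Core i7 case mentioned in the message this field reads 15 while the machine has 8 logical processors, which is why the field alone is not enough.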
H.J. Lu
02cea47161 Add x86 32-bit SSE4.2 string functions.
This patch adds 32bit SSE4.2 string functions.  It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long.  Tested
on 32bit Core i7 and Core 2.
2009-08-04 12:13:43 -07:00
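
A small sketch of why -16L is the portable way to write the 16-byte alignment mask. The helper name is hypothetical. Converting -16L to an unsigned type sets every bit except the low four, whatever the width of long, whereas the literal 0xfffffffffffffff0L only fits a 64-bit long.

```c
#include <stdint.h>

/* Round an address down to the previous 16-byte boundary.
   (uintptr_t)-16L is all ones except the low four bits on both
   32-bit and 64-bit targets. */
static uintptr_t align_down_16(uintptr_t addr)
{
    return addr & (uintptr_t)-16L;
}
```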
H.J. Lu
6f6f1215f6 Support multiarch for i686.
This patch adds multiarch support when configured for i686.  I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
2009-07-31 11:53:35 -07:00
Ulrich Drepper
98b1e6c866 ____longjmp_chk is now OS-specific.
We use sigaltstack internally which on some systems is a syscall
and should be used as such.  Move the x86-64 version to the Linux
specific directory and create in its place a file which always
causes compile errors.
2009-07-30 21:42:27 -07:00
Ulrich Drepper
8e80581787 Change code a bit to correct CFI. 2009-07-30 21:29:27 -07:00