Marek Polacek
622c86f480
Remove __ELF__ conditionals
2012-02-07 00:41:11 +01:00
Joseph Myers
3b1004624e
Fix makefile/configure problems with sse2avx changes.
2012-01-30 19:55:15 +00:00
Ulrich Drepper
96bc5b45a6
Optimize x86-64 math inline header a bit
2012-01-28 21:20:06 -05:00
Ulrich Drepper
56f6f6a240
Use -msse2avx option for x86-64 libm functions
2012-01-28 14:48:46 -05:00
Ulrich Drepper
73139a7628
Simplify use of AVX instructions in internal math macros
2012-01-28 11:19:06 -05:00
Ulrich Drepper
08cf777f9e
Really fix AVX tests
...
There is no problem with strcmp, it doesn't use the YMM registers.
The math routines might, since gcc may generate such code.
Introduce bit_YMM_USBALE and use it in the math routines.
2012-01-26 09:45:54 -05:00
Ulrich Drepper
afc5ed09cb
Reset bit_AVX in __cpu_features if OS support is missing
2012-01-26 07:45:14 -05:00
Ulrich Drepper
a0da5fe1e4
More fallout from supporting only ELF
2012-01-08 00:45:01 -05:00
Ulrich Drepper
a784e50247
Remove pre-ISO C support
...
No more __const.
2012-01-07 23:57:22 -05:00
Ulrich Drepper
0269750ca6
Remove non-ELF support
2012-01-07 20:30:26 -05:00
Ulrich Drepper
305502f69d
Fix up a comment
2012-01-07 11:49:33 -05:00
Ulrich Drepper
e40bca1ef9
Yet more ia64 removal fallout
2012-01-07 11:44:02 -05:00
Marek Polacek
530a32499a
Fix typos in comments
2011-12-23 13:59:40 -05:00
Ulrich Drepper
67371b5666
Prevent warnings due to long long constants
2011-12-23 13:52:59 -05:00
Liubov Dmitrieva
15db4de19d
Fix overrun in destination buffer
2011-12-23 12:02:15 -05:00
Ulrich Drepper
21eaf3a5f9
Use __REDIRECT_NTH for __feraiseexcept_renamed
2011-12-22 08:05:21 -05:00
Rafael Ávila de Espíndola
d2daaa1eb6
Define x86_64 feraiseexcept inline only under __USE_EXTERN_INLINES.
2011-12-21 13:27:09 -08:00
Ulrich Drepper
370a7d88f7
WP fixes
2011-12-17 14:41:05 -05:00
Ulrich Drepper
1d3e4b618a
Optimized wcschr and wcscpy for x86-64 and x86-32
2011-12-17 14:39:23 -05:00
Ulrich Drepper
aff2453df7
Fix more warnings
2011-12-03 21:49:35 -05:00
Ulrich Drepper
34372fc6d3
Fix test of non-ASCII locales in x86-64 strcasecmp et al.
2011-11-01 16:46:23 -04:00
Ulrich Drepper
52e4b9eb62
More cleanups of x86-64 strstr
2011-10-28 19:01:48 -04:00
Ulrich Drepper
fd52bc6dc4
Clean up x86-64 strcasestr
...
Actually describe in the C code what is going on.
2011-10-28 18:18:04 -04:00
Ulrich Drepper
a5b81e1fb7
Remove code with too little effect
...
Some of the AVX-specific code is not giving enough speed-up to
justify the extra code.
2011-10-28 16:55:01 -04:00
Ulrich Drepper
e0016b11d6
Add AVX optimized versions for some x86-64 math functions
2011-10-25 21:34:55 -04:00
Ulrich Drepper
618280a192
Optimize x86-64 SSE4.2+ strcmp a bit more
2011-10-25 14:50:31 -04:00
Ulrich Drepper
31ea014d8b
Use VEX encoding in inline math functions on x86-64 when possible
2011-10-25 08:17:57 -04:00
Ulrich Drepper
31d3cc00b0
Cleanup FMA4 patch
...
Move the FMA4 code into its own section. Avoid some of the duplication
of data resulting from the double use of source files.
2011-10-25 00:56:33 -04:00
Ulrich Drepper
202c9deb15
Better DLA_FMS
...
It's better to use __builtin_fma if it works. Use it for gcc 4.6 and
higher. Move the x86-64 dla.h to the correct place.
2011-10-24 22:11:21 -04:00
Ulrich Drepper
a0cf1edd4c
Use inline asm for DLA_FMS because of broken old compilers
2011-10-24 21:17:10 -04:00
Ulrich Drepper
af968f62f2
Optimize accurate 64-bit routines for FMA4 on x86-64
2011-10-24 20:19:17 -04:00
Ulrich Drepper
09229f3e1b
Fix WS
2011-10-23 14:57:28 -04:00
Liubov Dmitrieva
ce7dd29f28
Optimized strnlen and wcscmp for x86-64
2011-10-23 14:56:04 -04:00
Ulrich Drepper
f17424ed53
Fix WS
2011-10-23 13:35:24 -04:00
Liubov Dmitrieva
95584d3b33
Fix signedness in wcscmp comparison
2011-10-23 13:34:15 -04:00
Ulrich Drepper
774a2669af
Clean up FMA use
...
The macro's name should reflect that subtraction is being done. And
use __builtin_fma, it seems to work after all.
2011-10-23 13:31:01 -04:00
Ulrich Drepper
c8b3296bbe
Clean up last dla.h change
2011-10-23 12:50:28 -04:00
Ulrich Drepper
fb24de5932
Fix typo in last change
2011-10-22 20:09:58 -04:00
Ulrich Drepper
0d355eb7c7
Update ULPs for x86-64
2011-10-22 20:06:23 -04:00
Ulrich Drepper
c196fed8f0
Fix compilation problems in x86-64 init-arch
2011-10-21 20:47:20 -04:00
Ulrich Drepper
1a97a8c78f
Don't use NULL in last s_fma{,f} change
2011-10-21 07:39:28 -04:00
Ulrich Drepper
ed72b6545f
Check for FMA4 support and generate appropriate fma functions
2011-10-20 22:43:15 -04:00
Ulrich Drepper
8d4f46c613
Move fma routines to right place
2011-10-20 21:55:41 -04:00
Ulrich Drepper
855d156018
Optimize x86-64 rawmemchr and add test
2011-10-19 22:22:29 -04:00
Ulrich Drepper
d9a4d2ab27
Add optimized str{,n}casecmp for AVX on x86-64
2011-10-19 12:42:38 -04:00
Andreas Schwab
8f3b1ffefa
Fix PLT use for feraiseexcept on x86_64
2011-10-19 13:03:31 +02:00
Ulrich Drepper
d9a8d0abcc
Use new internal libc_fe* interfaces in more functions
2011-10-18 15:11:31 -04:00
Ulrich Drepper
4855e3ddf5
Provide combined internal feholdexcept/fesetround interface
2011-10-18 09:59:04 -04:00
Ulrich Drepper
23ce562780
Pretty print last change to x86-64 mathinline.h
2011-10-18 09:38:47 -04:00
Ulrich Drepper
581d30e386
Add optimized nearbyint{,f} for x86-64
2011-10-18 09:13:23 -04:00
Ulrich Drepper
d38f1dba00
Start optimizing the use of the fenv interfaces in libm itself
2011-10-18 09:00:46 -04:00
Andreas Schwab
83c7615c2d
Fix last change
2011-10-18 14:11:29 +02:00
Andreas Schwab
caa6c9d845
Fix linkage conflict with feraiseexcept
2011-10-18 11:46:51 +02:00
Ulrich Drepper
228a984d54
Relax asm requirements for recently added x86-64 math interfaces
2011-10-17 20:30:52 -04:00
Ulrich Drepper
c8553a6a6f
Make x86-64 math_private.h more robust
2011-10-17 16:00:39 -04:00
Ulrich Drepper
ed22dcf691
Provide internal optimizations on x86-64 with SSE4.1
...
Provide macros so that the internal users can, if possible, directly use
the new instructions.
Also fix up the mathinline.h header when compiling with SSE4.1 enabled.
2011-10-17 11:23:40 -04:00
Ulrich Drepper
b171c13768
Fix last x86-64 mathinline change
...
Use correct function names.
2011-10-17 10:37:00 -04:00
Ulrich Drepper
ad0f5cad15
Use rounds{s,d} for x86 rint, ceil, floor
2011-10-16 20:58:17 -04:00
Ulrich Drepper
2d1f3a4db6
Fix WS
2011-10-15 11:11:12 -04:00
Liubov Dmitrieva
be13f7bff6
Optimized memcmp and wmemcmp for x86-64 and x86-32
2011-10-15 11:10:08 -04:00
Andreas Schwab
6b1f68c91f
Fix lost feraiseexcept symbol
2011-10-14 11:21:23 +02:00
Andreas Schwab
714fad23c6
Fix PLT use in feupdateenv on x86_64
2011-10-13 15:26:45 +02:00
Andreas Schwab
81dcc7fb74
Check for zero size in memrchr for x86_64
2011-10-13 13:34:41 +02:00
Ulrich Drepper
0ac5ae2335
Optimize libm
...
libm is now somewhat integrated with gcc's -ffinite-math-only option
and lots of the wrapper functions have been optimized.
2011-10-12 11:27:51 -04:00
Ulrich Drepper
7edb55ce06
Optimize use of isnan, isinf, finite
2011-10-08 10:18:26 -04:00
Ulrich Drepper
66fb11b1da
Fix whitespace
2011-10-07 11:50:21 -04:00
Liubov Dmitrieva
093ecf9299
Improve 64 bit memchr, memrchr, rawmemchr with SSE2
2011-10-07 11:49:10 -04:00
Andreas Schwab
3a62d00d40
Don't call ifunc functions in trace mode
2011-10-05 14:35:40 +02:00
Andreas Schwab
bf972c9dfc
Fix parse error in bits/mathinline.h with --std=c99
2011-09-26 14:01:30 +02:00
Ulrich Drepper
4c1a1f71c0
Add fmax and fmin inlines for x86-64
2011-09-15 13:11:08 -04:00
Ulrich Drepper
ee4d03150a
Use correct section to allow merging
2011-09-14 13:43:24 -04:00
Ulrich Drepper
cd20565401
Optimized lrint and llrint for x86-64
2011-09-14 12:58:43 -04:00
Andreas Schwab
e529793b50
Avoid macro clash between <sys/select.h> and <linux/posix_types.h>
2011-09-13 15:16:38 +02:00
Ulrich Drepper
83cd142045
Remove --with-tls option, TLS support is required
2011-09-11 15:02:01 -04:00
Ulrich Drepper
d063d16433
Remove support for !USE___THREAD
2011-09-10 16:50:28 -04:00
Petr Baudis
1248c1c415
Fix jn precision
2011-09-09 22:16:10 -04:00
H.J. Lu
08a300c956
Simplify AVX check
2011-09-07 21:38:23 -04:00
Ulrich Drepper
ceaa0c5dc3
Move Atom-optimized code out of the way and together
2011-09-06 21:53:03 -04:00
Ulrich Drepper
8e1294e83f
Remove now-wrong comment
2011-09-06 17:20:33 -04:00
Ulrich Drepper
6d18b67f4d
Fix whitespaces
2011-09-05 21:42:12 -04:00
Liubov Dmitrieva
a5f524e479
Add Atom-optimized strchr and strrchr for x86-64
2011-09-05 21:34:03 -04:00
Ulrich Drepper
49d42c37ba
Add optimized x86-64 wcscmp
2011-09-05 14:08:23 -04:00
Ulrich Drepper
0276a718c0
Fix minor CFI problem in regular x86-64 trampoline
2011-08-20 08:58:44 -04:00
Ulrich Drepper
c88f17668b
Fix CFI info in x86-64 trampolines for non-AVX code
2011-08-20 08:56:30 -04:00
Ulrich Drepper
8e999d2962
Minor optimization of popcount in l10nflist
2011-08-11 14:07:04 -04:00
Andreas Schwab
8c1a459f9a
Fix inline strncat/strncmp on x86
2011-08-04 14:59:25 -04:00
Ulrich Drepper
bba33c289b
One more typo in AVX test
2011-07-23 15:18:13 -04:00
Ulrich Drepper
2ee5518515
Merge branch 'master' of ssh://sourceware.org/git/glibc
...
Conflicts:
ChangeLog
2011-07-23 00:04:15 -04:00
Ulrich Drepper
1aae088a8a
One more change to XSAVE patch
2011-07-22 23:33:22 -04:00
Andreas Schwab
1d002f2539
Fix AVX check
2011-07-22 14:33:47 -04:00
Ulrich Drepper
21137f89c5
Fix overflow bug in optimized strncat for x86-64
2011-07-21 12:32:36 -04:00
Ulrich Drepper
5644ef5461
Fix check for AVX enablement
...
The AVX bit is set if the CPU supports AVX. But this doesn't mean the
kernel does. Add checks according to Intel's documentation.
2011-07-20 21:21:03 -04:00
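A minimal sketch of the check described above, written as a pure function over raw register values so the logic can be read without executing CPUID. The name `avx_usable` and its parameters are hypothetical, not glibc's; the caller is assumed to have run CPUID leaf 1 and, only when OSXSAVE is set, XGETBV with ECX = 0. Bit positions follow Intel's documented sequence: CPUID.1:ECX bit 27 is OSXSAVE, bit 28 is AVX, and XCR0 bits 1 and 2 show that the OS saves XMM and YMM state.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX feature bits. */
#define CPUID1_ECX_OSXSAVE (UINT32_C(1) << 27)
#define CPUID1_ECX_AVX     (UINT32_C(1) << 28)

/* XCR0 bit 1 = XMM state, bit 2 = upper YMM state; both must be enabled. */
#define XCR0_SSE_AVX_STATE ((UINT64_C(1) << 1) | (UINT64_C(1) << 2))

/* Hypothetical helper: AVX is usable only if the CPU reports it AND the
   kernel has enabled XSAVE and saves both XMM and YMM state on context
   switches.  xcr0 is ignored when OSXSAVE is clear, since XGETBV would
   fault in that case. */
static bool avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
    if (!(cpuid1_ecx & CPUID1_ECX_AVX))
        return false;                   /* CPU lacks AVX */
    if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
        return false;                   /* OS has not enabled XSAVE */
    return (xcr0 & XCR0_SSE_AVX_STATE) == XCR0_SSE_AVX_STATE;
}
```

Splitting the CPU bit from the OS bits is the point of the fix: the CPUID AVX bit alone says nothing about whether the kernel will preserve the upper halves of the YMM registers.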
Ulrich Drepper
6986b98a18
Force La_x86_64_ymm to be 16-byte aligned
2011-07-20 14:20:00 -04:00
Ulrich Drepper
8002999481
Fix whitespaces
2011-07-19 17:27:09 -04:00
Liubov Dmitrieva
99710781cc
Improve 64 bit strcat functions with SSE2/SSSE3
2011-07-19 17:11:54 -04:00
Ulrich Drepper
ecaddd6699
Rebuild configure scripts
2011-07-06 21:29:02 -04:00
H.J. Lu
8912479f9e
Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64
2011-06-24 15:14:22 -04:00
H.J. Lu
0b1cbaaef5
Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32
2011-06-24 14:15:32 -04:00
David S. Miller
42675c6ff0
Add an elf_ifunc_invoke interface so that architectures can implement
the ifunc resolver calls however they wish.
2011-06-20 19:56:40 -07:00
H.J. Lu
3d29045b5e
Assume Intel Core i3/i5/i7 processor if AVX is available
2011-06-03 07:01:25 -04:00
H.J. Lu
8db736347c
Fix typo in x86-64 powl
2011-05-18 19:50:48 -04:00
Mike Frysinger
4c559bcdf3
Fix static linking with checking x86/x86-64 memcpy.
2011-04-17 22:20:47 -04:00
Ulrich Drepper
e6c6149412
Fix memory leak in TLS of loaded objects.
2011-04-10 22:43:01 -04:00
Ulrich Drepper
dedc7c7b05
Fix typo in cache information table for x86-{32,64}.
2011-04-03 09:32:31 -04:00
H.J. Lu
0354e35501
Work around old buggy program which cannot cope with memcpy semantics.
2011-04-01 19:38:21 -04:00
Ulrich Drepper
bb2420590c
Last change caused infinite loops because of missing loop increment.
2011-03-22 01:52:43 -04:00
H.J. Lu
c97a1282a4
Handle page boundaries in x86 SSE4.2 strncmp.
2011-03-21 05:35:38 -04:00
Ulrich Drepper
2a11560107
Implement x86 cpuid handling of leaf4 for cache information.
2011-03-20 08:14:30 -04:00
Harsha Jagasia
7e4ba49cd3
Enable SSE2 memset for AMD's upcoming Orochi processor.
...
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
This patch also fixes the following bug:
For misaligned blocks larger than 144 bytes, memset branches into
the integer code path depending on the value of misalignment even if
the startup code chooses the SSE2 code path upfront, when multiarch
is enabled.
2011-03-04 23:30:08 -05:00
Ulrich Drepper
baa6c69a57
Work around empty line at end of file generated by autoconf.
2011-02-17 01:26:07 -05:00
Ulrich Drepper
e943389325
Remove use of ranlib.
2011-02-15 14:52:29 -05:00
Roland McGrath
a0bf67cca2
Fix some warning nits.
2011-02-04 10:53:51 -08:00
Ulrich Drepper
f257bbd77d
Clean up some bits/select.h headers.
2011-01-09 16:49:17 -05:00
Ryan S. Arnold
30950a5fd2
Make PowerPC64 default to nonexecutable stack
2010-12-19 22:49:01 -05:00
H.J. Lu
13b695749a
Support Intel processor model 6 and model 0x2.
2010-11-12 03:48:52 -05:00
H.J. Lu
8ca52c6e3b
Fix one exit path in x86-64 SSE4.2 str{,n}casecmp.
2010-11-10 03:05:37 -05:00
Ulrich Drepper
69da074d7a
Fix warnings in __bswap_16.
2010-11-10 02:38:35 -05:00
H.J. Lu
ff02d5280b
Use IFUNC on x86-64 memset
2010-11-08 03:41:34 -05:00
Ulrich Drepper
c0dde15b5d
32bit memset-sse2.S fails with uneven cache size
...
32bit memset-sse2.S assumes the cache size is a multiple of 128 bytes. If
it isn't, memset-sse2.S will fail. For example, a processor can
have 24576 KB L3 cache and 20 cores. That is 2516582 bytes per core. Half
of it is 1258291, which isn't helpful for vector instructions. This
patch rounds cache sizes to multiples of 256 bytes and adds "raw" cache
sizes.
2010-11-05 07:57:46 -04:00
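A sketch of the rounding step, assuming it rounds down (the message only says "rounds"); `round_cache_size` is a hypothetical name, not the glibc function. Because 256 is a power of two, clearing the low eight bits is enough:

```c
/* Hypothetical helper: round a per-thread cache share down to a multiple
   of 256 bytes so that loops moving 128 bytes per iteration always see
   whole blocks.  256 = 2^8, so masking off the low 8 bits rounds down. */
static long round_cache_size(long bytes)
{
    return bytes & ~255L;
}
```

Applied to the half-share figure quoted above, 1258291 bytes becomes 1258240 (4915 x 256).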
Richard Li
dbf3a06904
Fix x86-64 strchr propagation of search byte into all bytes of SSE register
2010-10-25 14:13:17 -04:00
Ulrich Drepper
18edac4857
Provide FP_FAST_FMA{,F,L} definitions for x86/x86-64.
2010-10-19 12:56:42 -04:00
Jakub Jelinek
5e908464b9
Implement accurate fma.
2010-10-13 22:27:03 -04:00
Jakub Jelinek
9ff8d36f27
Correct implementation of fmaf.
2010-10-11 09:27:05 -04:00
Ulrich Drepper
45db99c7d0
Fix handling of tail bytes of buffer in SSE2/SSSE3 x86-64 version of strn{,case}cmp
2010-10-03 22:10:30 -04:00
Ulrich Drepper
015a4c6193
Re-enable all strncasecmp versions.
2010-09-20 20:18:00 -07:00
Ulrich Drepper
8ffcee4a04
Fix limit detection in x86-64 SSE2 strncasecmp.
2010-09-20 14:02:23 -07:00
Ulrich Drepper
0959ffc97b
Update x86-64 mpn routines from GMP 5.0.1.
2010-09-02 23:36:25 -07:00
Ulrich Drepper
01d2601561
Fix typo in last commit.
2010-08-26 22:35:42 -07:00
Ulrich Drepper
9ea3de11f1
Move slow Atom code to separate section.
2010-08-26 22:17:03 -07:00
Ulrich Drepper
107b2fa56c
Shorten x86-64 strlen a bit.
2010-08-26 22:12:16 -07:00
H.J. Lu
623aac7f84
Unroll x86-64 strlen
2010-08-26 22:09:34 -07:00
H.J. Lu
b416a90085
Missing comma in last commit.
2010-08-26 13:18:46 -07:00
Roland McGrath
8b2b771538
Clean up warnings in new x86_64/multiarch code.
2010-08-25 12:13:08 -07:00
H.J. Lu
e73015f2d6
Unroll 32bit SSE strlen and handle slow bsf
2010-08-25 10:07:37 -07:00
Ulrich Drepper
1cdfe7242f
Add missing copyright year update and pretty printing.
2010-08-24 11:42:19 -07:00
Richard Henderson
73f27d5e72
Clean up SSE variable shifts
2010-08-24 11:35:01 -07:00
Ulrich Drepper
9da4bb316f
Fix two typos in x86-64 SSE4.2 strncasecmp implementation.
2010-08-19 09:20:44 -07:00
Ulrich Drepper
1feccb6caf
Fix fourth parameter of SSE4.2 strcmp for x86-64.
2010-08-15 20:46:09 -07:00
Ulrich Drepper
28c90b2cf5
Use correct register for fourth parameter of x86-64 strncasecmp_l.
2010-08-15 17:42:12 -07:00
Ulrich Drepper
25244f174f
Undo incorrect change.
2010-08-15 10:34:33 -07:00
Ulrich Drepper
e9f82e0d1d
Add optimized strncasecmp versions for x86-64.
2010-08-14 22:04:01 -07:00
Ulrich Drepper
ca6bb004eb
Fix x86-64 build without multiarch.
2010-08-14 14:56:32 -07:00
Andi Kleen
d22e4cc939
x86: Add support for frame pointer less mcount
2010-08-07 21:24:05 -07:00
Ulrich Drepper
73507d3ae0
Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64.
2010-07-31 21:41:09 -07:00
Ulrich Drepper
66f6765a47
Pretty print x86-64 SSE4.2 strcmp.
2010-07-30 12:54:37 -07:00
Ulrich Drepper
42e08a5438
Implement optimized strcasecmp for x86-64.
2010-07-30 00:14:04 -07:00
Ulrich Drepper
fe36dd025e
Fix tolower operation in strcasestr.
2010-07-30 00:09:07 -07:00
Ulrich Drepper
880113d91e
Avoid compiling unneeded file in ld.so.
2010-07-27 21:12:59 -07:00
Ulrich Drepper
24fb0f88ed
Add optimized x86-64 implementation of strnlen.
...
While at it, beef up the test suite for strnlen and add performance
tests for it, too.
2010-07-26 08:37:08 -07:00
Ulrich Drepper
8e96b93aa7
Speed up x86-64 strcasestr a bit more.
...
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
2010-07-24 08:34:44 -07:00
Andreas Schwab
f6a31e0eb6
Add strcasestr-nonascii to i386 build
2010-07-21 07:26:18 -07:00
Ulrich Drepper
d02dc4ba08
Fix non-ASCII case of SSE4.2 strcasestr.
2010-07-16 16:00:22 -07:00
Ulrich Drepper
cc9f2e47a0
Speed up SSE4.2 strcasestr by avoiding indirect function call.
2010-07-16 15:37:38 -07:00
H.J. Lu
6fb8cbcb58
Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7
...
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
2010-06-30 08:26:11 -07:00
H.J. Lu
3c88fe1e3a
Incorrect x86 CPU family and model check.
2010-05-27 11:14:18 -07:00
Ulrich Drepper
94a27fabeb
Whitespace fix.
2010-04-14 22:29:51 -07:00
H.J. Lu
a11ec63713
Add x86-32 FMA support
2010-04-14 22:27:59 -07:00
H.J. Lu
df87f54923
Check DATA_CACHE_SIZE_HALF
2010-04-14 22:18:27 -07:00
H.J. Lu
dd37cd1a12
Optimize x86-64 SSE4 memcmp for unaligned data.
2010-04-14 17:53:44 -07:00
H.J. Lu
404a6e3201
x86-64 SSE4 optimized memcmp
...
This is a 64bit SSE4-optimized memcmp. It improves memcmp by up to 3X
on Intel Core i7.
2010-04-14 00:12:53 -07:00
Ulrich Drepper
bbbdd77809
Update x86-64 cpu multiarch selection header.
2010-04-13 19:17:10 -07:00
Ulrich Drepper
22f4f44b67
Fix concurrent handling of __cpu_features.
2010-04-04 00:25:46 -07:00
H.J. Lu
7d9335ecd7
Don't define __strpbrk_sse42 in static library
2010-03-24 12:16:24 -07:00
Richard Guenther
e39acb1f16
Fix R_X86_64_PC32 overflow detection
2010-03-04 19:33:41 -08:00
Ulrich Drepper
4a1297d761
We can use the 64-bit register versions of the double functions.
2010-02-24 20:00:30 -08:00
Andreas Schwab
7eb22e757e
Avoid PLT call to fegetenv on s390
2010-02-09 22:34:17 -08:00
Ulrich Drepper
f69190e74a
Prevent silent errors should x86-64 strncmp be needed outside libc.
2010-01-14 08:09:32 -08:00
H.J. Lu
5a7af22fbb
Unroll the loop in x86-64 SSE4.2 strlen.
2010-01-13 07:51:48 -08:00
H.J. Lu
3af48cbdfa
Optimize 32bit memset/memcpy with SSE2/SSSE3.
2010-01-12 11:22:03 -08:00
H.J. Lu
2510d01ddb
Define bit_SSE2 and index_SSE2.
2009-12-13 15:23:02 -08:00
H.J. Lu
51ddd2c01e
Define bit_XXX and index_XXX.
...
This patch defines bit_XXX and index_XXX and uses them to check processor
features in assembly code. This can prevent typos in processor feature
check.
2009-12-13 09:47:02 -08:00
Ulrich Drepper
823bc6da65
Fix whitespaces.
2009-10-22 22:50:00 -07:00
H.J. Lu
001659f4d5
Implement SSE4.2 optimized strchr and strrchr.
2009-10-22 22:47:12 -07:00
Roland McGrath
b0f3a2e43f
Clean up unnecessary libc_hidden_builtin_def fiddling in x86 multiarch definitions.
2009-10-06 20:01:23 -07:00
Roland McGrath
9d6982d5d2
Clean up x86 multiarch HAS_FOO macros.
2009-10-06 19:59:03 -07:00
Roland McGrath
7967983fd4
configure tweaks, support $libc_add_on_config_subdirs
2009-09-15 14:14:42 -07:00
Jakub Jelinek
22bb992d51
Fix strstr/strcasestr/fma/fmaf on x86_64.
2009-09-02 19:43:04 -07:00
Jakub Jelinek
240441038f
Fix x86_64 bits/mathinline.h for -m32 compilation.
2009-09-01 15:30:12 -07:00
Andreas Schwab
c2735e958a
Fix parse error in bits/mathinline.h with --std=c99
2009-08-31 17:26:14 +02:00
H.J. Lu
5a4eb7282e
Remove ENABLE_SSSE3_ON_ATOM.
...
It turns out that SSSE3 isn't slow on Atom. The problem is bsf. This patch
removes ENABLE_SSSE3_ON_ATOM.
2009-08-28 14:54:46 -07:00
Ulrich Drepper
65b14bcee2
Optimize out duplicated scalbln code for x86-64.
2009-08-25 16:46:34 -07:00
Ulrich Drepper
7423a3456a
Optimized signbit{,f} for x86-64.
2009-08-25 14:54:12 -07:00
Ulrich Drepper
84088310ce
Handle AVX saving on x86-64 in interrupted symbol lookups.
...
If a signal arrived during a symbol lookup and the signal handler also
required a symbol lookup, the end of the lookup in the signal handler reset
the flag indicating whether restoring AVX/SSE registers is needed. Resetting
means in this case that the tail part of the outer lookup code will try to
restore the registers and this can fail miserably. We now restore the
previous value, which makes nested calls possible.
2009-08-25 10:42:30 -07:00
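The fix is the usual save-and-restore pattern for state that a reentrant path can clobber. A minimal sketch; `restore_regs_needed` and `lookup` are hypothetical stand-ins for the ld.so internals, and the nested call models a signal handler that performs its own lookup before the outer one finishes:

```c
#include <stdbool.h>

/* Hypothetical flag: must the AVX/SSE registers be restored at the end
   of the current lookup? */
static bool restore_regs_needed;

/* Simulated lookup.  When `nested` is true, a second lookup runs to
   completion in the middle of this one, as a signal handler would.
   Returns the flag value the outer lookup observes at its tail. */
static bool lookup(bool nested)
{
    bool saved = restore_regs_needed;   /* remember the caller's state */
    restore_regs_needed = true;

    if (nested)
        lookup(false);                  /* inner lookup starts and ends here */

    bool seen_at_tail = restore_regs_needed;
    restore_regs_needed = saved;        /* restore, rather than reset */
    return seen_at_tail;
}
```

Because the inner call puts back the value it found instead of clearing it, the outer lookup still sees its own setting when it reaches its tail.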
Ulrich Drepper
cf00cc00bc
Add ceil implementation for 64-bit machines.
...
On 64-bit machines we should not split doubles into two 32-bit
integers and handle the words separately. We have wide registers.
This patch implements a 64-bit ceil version. Ideally all other
functions will be converted over time.
2009-08-24 18:05:48 -07:00
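A sketch of the idea in portable C, treating the double as one 64-bit integer instead of a high/low word pair. This is an illustration under stated assumptions, not the committed glibc code: `ceil64` is a hypothetical name, it uses `memcpy` where glibc uses its internal word-extraction macros, it assumes IEEE 754 binary64, and it does not try to raise the inexact exception.

```c
#include <stdint.h>
#include <string.h>

static double ceil64(double x)
{
    uint64_t i;
    memcpy(&i, &x, sizeof i);                    /* whole double in one word */
    int j0 = (int)((i >> 52) & 0x7ff) - 0x3ff;   /* unbiased exponent */

    if (j0 < 0) {                                /* |x| < 1 */
        if ((i & UINT64_C(0x7fffffffffffffff)) == 0)
            return x;                            /* +0 or -0 */
        return (i >> 63) ? -0.0 : 1.0;           /* (-1,0) -> -0, (0,1) -> 1 */
    }
    if (j0 >= 52)
        return x;                                /* integral, Inf or NaN */

    uint64_t frac_mask = UINT64_C(0x000fffffffffffff) >> j0;
    if ((i & frac_mask) == 0)
        return x;                                /* already integral */
    if ((i >> 63) == 0)                          /* positive: round up by one */
        i += UINT64_C(0x0010000000000000) >> j0; /* carry may bump exponent */
    i &= ~frac_mask;                             /* drop the fraction bits */
    memcpy(&x, &i, sizeof x);
    return x;
}
```

Every test and mask operates on the full 64-bit word, so there is no need to track a carry from the low 32-bit word into the high one.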
Ulrich Drepper
9a1ea1525e
Optimize float construction/extraction on x86-64.
2009-08-24 14:52:49 -07:00
Ulrich Drepper
ef72d5f1b9
Optimize x86-64 signbit{,f} a bit.
2009-08-24 10:20:58 -07:00
H.J. Lu
4e1e2f4247
Support mixed SSE/AVX audit and check AVX only once.
...
This patch fixes mixed SSE/AVX audit and checks AVX only once in
_dl_runtime_profile. When an AVX or SSE register value in pltenter is
modified, we have to make sure that the SSE part value is the same in both
lr_xmm and lr_vector fields so that pltexit will get the correct value
from either lr_xmm or lr_vector fields. AVX-enabled pltenter should
update both lr_xmm and lr_vector fields to support stacked AVX/SSE
pltenter functions.
2009-08-08 10:54:42 -07:00
Ulrich Drepper
8e436522e1
Move SSE4.2 functions together.
2009-08-08 09:38:32 -07:00
Ulrich Drepper
0fda545d5f
Add SSSE3-optimized implementation of str{,n}cmp for x86-64.
2009-08-07 22:51:02 -07:00
Ulrich Drepper
57b378ac89
Avoid warning through fake initialization.
2009-08-07 16:19:54 -07:00
Ulrich Drepper
3aa2588d4a
Fix whitespaces in last checkin.
2009-08-07 09:47:12 -07:00
H.J. Lu
a546baa9cd
Properly count number of logical processors on Intel CPUs.
...
The meaning of the 25-14 bits in EAX returned from cpuid with EAX = 4
has been changed from "the maximum number of threads sharing the cache"
to "the maximum number of addressable IDs for logical processors sharing
the cache" if cpuid takes EAX = 11. We need to use results from both
EAX = 4 and EAX = 11 to get the number of threads sharing the cache.
On Core i7, bits 25-14 of EAX hold 15 although the number of logical
processors is 8. Here is a white paper on this:
http://software.intel.com/en-us/articles/intel-64-architecture-processor-topology-enumeration/
This patch correctly counts the number of logical processors on Intel CPUs
with EAX = 11 support in cpuid. Tested on Dunnington, Core i7 and
Nehalem EX/EP.
It also fixes the Pentium D workaround since EBX may not have the right
value returned from cpuid with EAX = 1.
2009-08-07 09:39:36 -07:00
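A sketch of the field extraction mentioned above. The helper name is hypothetical; the layout follows Intel's CPUID documentation, where EAX[25:14] of leaf 4 stores the count minus one. As the message explains, the decoded value is an upper bound on addressable IDs, not the actual thread count, which is why leaf 11 must also be consulted.

```c
#include <stdint.h>

/* CPUID leaf 4, EAX bits 25..14: maximum number of addressable IDs for
   logical processors sharing this cache, encoded as (value - 1). */
static unsigned cache_sharing_ids(uint32_t leaf4_eax)
{
    return ((leaf4_eax >> 14) & 0xfff) + 1;
}
```

For the Core i7 case above, a raw field value of 15 decodes to 16 addressable IDs, twice the 8 logical processors actually present.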
H.J. Lu
02cea47161
Add x86 32-bit SSE4.2 string functions.
...
This patch adds 32bit SSE4.2 string functions. It uses -16L instead of
0xfffffffffffffff0L, which works for both 32bit and 64bit long. Tested
on 32bit Core i7 and Core 2.
2009-08-04 12:13:43 -07:00
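The reason -16L is width-independent: in two's complement, -16 has every bit set except the low four, whatever the size of `long`, while the literal 0xfffffffffffffff0L needs 64 bits. The same mask applied to addresses in C; `align_down_16` is a hypothetical helper for illustration:

```c
#include <stdint.h>

/* Round an address down to a 16-byte boundary.  Converting -16 to an
   unsigned type yields all ones except the low four bits at any width,
   so the same expression works for 32-bit and 64-bit targets. */
static uintptr_t align_down_16(uintptr_t addr)
{
    return addr & (uintptr_t)-16;
}
```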
H.J. Lu
6f6f1215f6
Support multiarch for i686.
...
This patch adds multiarch support when configured for i686. I modified
some x86-64 functions to support 32bit. I will contribute 32bit SSE string
and memory functions later.
2009-07-31 11:53:35 -07:00
Ulrich Drepper
98b1e6c866
____longjmp_chk is now OS-specific.
...
We use sigaltstack internally which on some systems is a syscall
and should be used as such. Move the x86-64 version to the Linux
specific directory and create in its place a file which always
causes compile errors.
2009-07-30 21:42:27 -07:00
Ulrich Drepper
8e80581787
Change code a bit to correct CFI.
2009-07-30 21:29:27 -07:00
Ulrich Drepper
07df809969
Optimize ____longjmp_chk for x86-64 a bit.
2009-07-30 20:09:30 -07:00
Ulrich Drepper
5ead9ce5c7
Fix x86-64 ____longjmp_chk to handle signal stacks.
...
The simple test previously used might trigger if the longjmp jumps
from the signal stack to the normal stack. We now explicitly test
for this case.
2009-07-30 17:31:48 -07:00
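A simplified model of the check, written as a pure function. The names are hypothetical, and the real ____longjmp_chk is assembly that may differ in details such as boundary handling. In a real program the last three arguments would come from `sigaltstack(NULL, &old)`: `old.ss_flags & SS_ONSTACK`, `old.ss_sp` and `old.ss_size`. Stacks grow downward on x86-64, so a target below the current stack pointer normally means jumping into a frame that has already been popped:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the fortified longjmp stack check. */
static bool longjmp_target_ok(uintptr_t cur_sp, uintptr_t target_sp,
                              bool on_altstack,
                              uintptr_t alt_base, uintptr_t alt_size)
{
    if (target_sp >= cur_sp)
        return true;            /* unwinding toward older frames: fine */
    if (!on_altstack)
        return false;           /* would resume in a dead frame */
    /* On the signal stack, a lower target is legitimate only when it lies
       outside the signal stack, i.e. we jump back to the normal stack. */
    return target_sp < alt_base || target_sp >= alt_base + alt_size;
}
```

The first test alone is the "simple test" the message refers to; the second branch is what keeps a jump from the signal stack back to a lower-addressed normal stack from being flagged.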
Ulrich Drepper
78c4ef475d
Add support for x86-64 fma instruction.
...
Use it to implement fma and fmaf, if possible.
2009-07-29 15:26:06 -07:00
Ulrich Drepper
9a1d2d4555
Prepare use of IFUNC functions outside libc.so.
...
We use a callback function into libc.so to get access to the data
structure with the information and have special versions of the test
macros which automatically use this function.
2009-07-29 15:22:28 -07:00
Ulrich Drepper
649bf13320
Improve CFI in x86-64 ld.so trampoline code.
2009-07-29 08:50:03 -07:00
H.J. Lu
09e0389eb1
Properly restore AVX registers on x86-64.
...
tst-audit4 and tst-audit5 fail under AVX emulator due to je instead of
jne. This patch fixes them.
2009-07-29 08:40:54 -07:00
Ulrich Drepper
b48a267b8f
Preserve SSE registers in runtime relocations on x86-64.
...
SSE registers are used for passing parameters and must be preserved
in runtime relocations. Inside ld.so this is enforced through the
tests in tst-xmmymm.sh. But the malloc routines used after startup
come from libc.so and can be arbitrarily complex. It's overkill
to save the SSE registers all the time because of that. These calls
are rare. Instead we save them on demand. The new infrastructure
put in place in this patch makes this possible and efficient.
2009-07-29 08:33:03 -07:00
Ulrich Drepper
e83c1a8a72
Refine testing for xmm/ymm register use in x86-64 ld.so.
...
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed us to remove some of the specializations
for some string functions, as they are only used in other
code paths.
2009-07-27 13:40:27 -07:00
Ulrich Drepper
009a69f0bc
No need for special strcmp for rtld.
2009-07-27 06:55:04 -07:00
Ulrich Drepper
16d2ea4c82
Make sure no code in ld.so uses xmm/ymm registers on x86-64.
...
This patch introduces a test to make sure no function modifies the
xmm/ymm registers, with the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
2009-07-26 16:10:00 -07:00
H.J. Lu
7956a3d27c
Add SSE2 support to str{,n}cmp for x86-64.
2009-07-26 13:32:28 -07:00
H.J. Lu
4e5b5821bf
Some optimizations for x86-64 strcmp.
2009-07-25 19:15:14 -07:00
Ulrich Drepper
29e92fa5cd
Optimize x86-64 SSE4.2 strcmp.
...
The file contained some code which was never used. Don't compile it
in.
2009-07-25 12:02:47 -07:00
Ulrich Drepper
b2509a1e38
Avoid cpuid instructions in cache info discovery.
...
When multiarch is enabled we have this information stored. Use it.
2009-07-23 14:03:53 -07:00
Ulrich Drepper
3e9099b4f6
Add more cache descriptors for L3 caches on x86 and x86-64.
...
The most recent AP 485 describes a few more cache descriptors for
L3 caches with 24-way associativity.
2009-07-23 13:42:46 -07:00
Ulrich Drepper
d28797e426
Perform test for Atom on x86-64 in a central place and handle it.
...
There will be more than one function which, in multiarch mode, wants
to use SSSE3. We should not test in each of them for Atoms with
slow SSSE3. Instead, disable the SSSE3 bit in the startup code for
such machines.
2009-07-23 13:15:17 -07:00
Ulrich Drepper
ae612b04cc
Minor cleanups in x86-64 strstr.
2009-07-21 07:52:12 -07:00
Ulrich Drepper
a8f895ebe1
Better check for optimization in new x86-64 strstr/strcasestr.
2009-07-20 21:18:28 -07:00
H.J. Lu
2b7a8664fa
SSE4.2 strstr/strcasestr for x86-64.
...
This patch implements SSE4.2 strstr/strcasestr, using the Knuth-Morris-Pratt
string searching algorithm.
2009-07-20 21:06:50 -07:00
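For reference, the Knuth-Morris-Pratt idea in scalar C. This is the textbook algorithm only, not the SSE4.2 code (which compares 16-byte blocks with string instructions); `kmp_strstr` is a hypothetical name, and it allocates the failure table with `malloc` for simplicity.

```c
#include <stdlib.h>
#include <string.h>

/* Textbook KMP search with strstr's contract: returns a pointer to the
   first occurrence of needle in haystack, or NULL.  Also returns NULL
   if the failure table cannot be allocated. */
static const char *kmp_strstr(const char *haystack, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0)
        return haystack;

    /* fail[i] = length of the longest proper prefix of needle[0..i]
       that is also a suffix of it. */
    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL)
        return NULL;
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < m; i++) {
        while (k > 0 && needle[i] != needle[k])
            k = fail[k - 1];
        if (needle[i] == needle[k])
            k++;
        fail[i] = k;
    }

    /* Single forward pass over the haystack; on a mismatch, fall back in
       the needle via the table instead of re-reading haystack bytes. */
    const char *result = NULL;
    size_t k = 0;
    for (const char *p = haystack; *p != '\0'; p++) {
        while (k > 0 && *p != needle[k])
            k = fail[k - 1];
        if (*p == needle[k])
            k++;
        if (k == m) {
            result = p - (m - 1);
            break;
        }
    }
    free(fail);
    return result;
}
```

The appeal for strstr is the worst case: the haystack is scanned once, so the search is linear in haystack plus needle length rather than their product.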
Ulrich Drepper
c8027cced1
Optimize restoring of ymm registers on x86-64.
...
The patch mainly reduces the code size but also avoids some jumps.
2009-07-16 07:15:15 -07:00
Ulrich Drepper
24a12a5a5f
Fix up whitespaces in new memcmp for x86-64.
2009-07-16 07:02:27 -07:00
H.J. Lu
e26c9b8415
memcmp implementation for x86-64 using SSE2.
2009-07-16 07:00:34 -07:00
Ulrich Drepper
ca419225a3
Fix thinko in AVX audit patch.
...
Don't use AVX instructions too often.
2009-07-15 17:59:14 -07:00
Ulrich Drepper
47fc9b710b
Fix typo in last change.
2009-07-15 17:51:11 -07:00
Ulrich Drepper
d7bd7a8ae8
Secure AVX changes for auditing code.
...
The original AVX patch used a function pointer to handle the difference
between machines with and without AVX support. This is insecure. A
well-placed memory exploit could lead to redirection of the execution.
Using a variable and several tests is a bit slower but cannot be
exploited in this way.
2009-07-15 17:41:36 -07:00
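A minimal sketch of the design choice; all names are hypothetical. A writable function pointer gives anyone who can overwrite it an arbitrary jump target. A flag tested before two direct calls can, at worst, be flipped to select the other legitimate path.

```c
#include <stdbool.h>

/* Hypothetical register-save routines for the two kinds of machine. */
static const char *save_regs_avx(void) { return "avx"; }
static const char *save_regs_sse(void) { return "sse"; }

/* Set once at startup from the CPU feature check.  Corrupting it can only
   choose between the two direct calls below; it cannot redirect control
   flow to an attacker-chosen address the way a stored pointer can. */
static bool have_avx;

static const char *save_regs(void)
{
    if (have_avx)
        return save_regs_avx();
    return save_regs_sse();
}
```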
H.J. Lu
b0ecde3a63
Add AVX support to ld.so auditing for x86-64.
2009-07-10 12:04:14 -07:00
Ulrich Drepper
cea4329592
Minor cleanups in recently added files.
2009-07-03 03:23:01 -07:00
Ulrich Drepper
d6485c981b
Align functions to 16-byte boundary.
...
Some of the new multi-arch string functions for x86-64 were
not aligned to 16-byte boundaries, possibly creating unnecessary
cache line misses and delays.
2009-07-03 03:01:57 -07:00
H.J. Lu
06e51c8f3d
Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
2009-07-03 02:48:56 -07:00
H.J. Lu
167d5ed5de
Fix handling of xmm6 in ld.so audit hooks on x86-64.
2009-07-02 04:33:12 -07:00
Ulrich Drepper
af263b8154
Whitespace fixes in last patch.
2009-07-02 03:43:05 -07:00
H.J. Lu
ab6a873fe0
SSSE3 strcpy/stpcpy for x86-64
...
This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2
and Core i7. I disabled it on Atom since the SSSE3 version is slower for
shorter (<64 byte) data.
2009-07-02 03:39:03 -07:00
Ulrich Drepper
e6bd12ddf7
Regenerated.
2009-06-30 05:33:52 -07:00
Ulrich Drepper
b38a2e2e64
Fix little checkin problem in last patch.
2009-06-30 04:41:38 -07:00
H.J. Lu
0181291385
Determine and store processor family and model on x86-64.
2009-06-30 04:39:09 -07:00
Ulrich Drepper
059215ae21
Clean up whitespaces in last patch.
2009-06-22 20:39:37 -07:00
H.J. Lu
772f4e6a1b
Add SSE4.2 support for strcmp and strncmp on x86-64.
2009-06-22 20:38:41 -07:00
Jakub Jelinek
fab8238de6
Fix x86-64 memchr for large lengths.
2009-06-16 10:23:31 -07:00
Ulrich Drepper
eb0b6cb6e1
Fix warnings when using <sys/select.h>.
...
gcc 4.4 is more picky. And the x86-64 version of <bits/select.h>
contained a now unnecessary asm optimization. Remove it.
2009-06-14 16:09:42 -07:00
Ulrich Drepper
b77c932329
Add SSE4.2 optimized rawmemchr implementation for x86-64.
2009-06-05 16:54:50 -07:00
Ulrich Drepper
6f9eea15bf
Forgot some more cleanups for the SSE4.2 strlen on x86-64.
2009-06-05 11:51:59 -07:00
Ulrich Drepper
f85a9e72e2
Add missing cleanups from SSE4.2 x86-64 strlen.
2009-06-05 11:39:45 -07:00
Ulrich Drepper
3ab2d57a4d
Optimize x86-64 strlen for SSE4.2.
...
The SSE4.2 implementation is used in the DSO only. The patch also adds
some infrastructure to be used in similar code later on.
2009-06-05 11:32:00 -07:00
Ulrich Drepper
2f3f7b9da2
More small optimizations for x86-64 strlen.
2009-06-04 16:45:35 -07:00
Ulrich Drepper
747785f2b3
Tiny strlen for x86-64 optimization.
...
I didn't remove an instruction from a previous version in the final
version.
2009-06-04 10:54:29 -07:00
Ulrich Drepper
fd96f06208
Small optimization of STT_GNU_IFUNC handling.
...
The test to call the indirect function now includes a subtest to
check whether the symbol is defined. By the time that point is reached,
this is almost always the case. The test for STT_GNU_IFUNC, on the
other hand, is rarely true. Moving it to the front means we don't have
to perform the second test unless really necessary.
2009-06-01 11:49:05 -07:00
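The reordering relies on `&&` short-circuiting: put the rarely true condition first and the usually true one is almost never evaluated. A sketch using the `<elf.h>` types shipped with glibc (STT_GNU_IFUNC is a GNU extension); `must_call_ifunc` is a hypothetical name, and glibc's real relocation code checks additional conditions.

```c
#include <elf.h>
#include <stdbool.h>

/* Must the symbol's value be obtained by calling its IFUNC resolver?
   The type test is rarely true, so it goes first; in the common case the
   && stops there and the definedness test is skipped. */
static bool must_call_ifunc(const Elf64_Sym *sym)
{
    return ELF64_ST_TYPE(sym->st_info) == STT_GNU_IFUNC
           && sym->st_shndx != SHN_UNDEF;
}
```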
Ulrich Drepper
b7629ee33f
Better error message for invalid relocation in static binary.
2009-06-01 11:39:24 -07:00
Ulrich Drepper
8ea2372936
Fix up sched_cpucount in x86-64.
...
Now that static executables can handle IFUNC functions don't exclude
optimization for sched_cpucount for !SHARED.
2009-05-31 23:46:42 -07:00
Ulrich Drepper
7441470835
Finish IFUNC support for x86 and x86-64.
...
Add support for the IRELATIVE relocation and IFUNC in static executables.
2009-05-31 23:45:33 -07:00
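From the user side, an IFUNC is declared with GCC's `ifunc` attribute. The resolver runs once at relocation time (in a static executable, via an IRELATIVE relocation processed by the startup code) and returns the implementation to bind. A minimal sketch; the function names are hypothetical, the "SSE4.2" variant is a placeholder, and `__builtin_cpu_init()` is called explicitly because resolvers can run before constructors:

```c
#include <stddef.h>

typedef size_t strlen_fn(const char *);

/* Two hypothetical implementations with identical behavior. */
static size_t my_strlen_generic(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

static size_t my_strlen_sse42(const char *s)
{
    /* Placeholder: a real version would use SSE4.2 string instructions. */
    return my_strlen_generic(s);
}

/* Resolver: called once, before my_strlen is first used. */
static strlen_fn *resolve_my_strlen(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();               /* may run before constructors */
    if (__builtin_cpu_supports("sse4.2"))
        return my_strlen_sse42;
#endif
    return my_strlen_generic;
}

size_t my_strlen(const char *s) __attribute__((ifunc("resolve_my_strlen")));
```

Callers just call `my_strlen`; after relocation the call goes straight to the selected implementation with no per-call feature test.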
Ulrich Drepper
963cb6fcb4
Simplify CPUID value handling.
...
So far Intel and AMD use exactly the same bits meaning the same
things in CPUID index 1. Simplify the code. Should an architecture
come along which doesn't use the same semantics then it must use a
different index value than COMMON_CPUID_INDEX_1.
2009-05-31 17:52:05 -07:00
Ulrich Drepper
1de0c16183
Compact cache info data structure for x86/x86-64.
...
This saves about 1.5kB in the DSO.
2009-05-29 11:53:36 -07:00
H.J. Lu
e7535de78f
Add missing .text directives.
...
The ____longjmp_chk functions on x86 and x86-64 were placed in .rodata.str1.1.
2009-05-21 18:38:11 -07:00
Ulrich Drepper
b50f8e42ba
Check for valid stack frame in longjmp.
...
If longjmp restores the stack frame to an address which is beyond
the stack frame at the time of the longjmp call it would install
an uninitialized stack frame. If compiled with _FORTIFY_SOURCE
defined, longjmp will now bail out in this situation.
2009-05-15 19:37:13 -07:00
Ulrich Drepper
deb84c43b1
* version.h (VERSION): Bump to 2.10.1.
...
* nss/getXXbyYY_r.c: If NO_COMPAT_NEEDED is defined don't define any
compatibility functions.
* nss/getXXent_r.c: Likewise.
* gshadow/getsgent_r.c: Define NO_COMPAT_NEEDED.
* gshadow/getsgnam_r.c: Likewise.
* gshadow/Version: Remove duplicate entries.
* sysdeps/x86_64/cacheinfo.c (intel_02_cache_info): Add missing entries
for recent processors.
* sysdeps/unix/sysv/linux/i386/sysconf.c (intel_02_cache_info):
Likewise.
2009-05-10 18:38:52 +00:00