glibc/string
Leonhard Holz 0742aef6e5 strcoll: improve performance by removing the cache (#15884)
This is a patch that should fix bug 15884, which complains about the
performance of strcoll(). It turned out that the runtime of strcoll() is
actually bound by strlen, which is needed to calculate the size of a cache
that was installed to improve comparison performance.

The idea for this patch was that the cache is only useful in rare cases
(strings of the same length and same first-level-chars) and that it would be
better to avoid memory allocation altogether. To prove this I wrote a
performance test, bench-strcoll.c, with test data in benchtests-strcoll.tar.gz.
Modifications to benchtests/Makefile and localedata/Makefile are also
necessary to make it work.
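
As an illustration of what such a measurement can look like, here is a
minimal sketch that times a qsort() run using strcoll() as the comparison.
It is not the contents of bench-strcoll.c; the helper name
time_strcoll_sort and its parameters are hypothetical and chosen only for
this example.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* qsort callback: compare two `const char *` elements with strcoll.  */
static int
compare_words (const void *p1, const void *p2)
{
  const char *s1 = *(const char *const *) p1;
  const char *s2 = *(const char *const *) p2;
  return strcoll (s1, s2);
}

/* Hypothetical helper (not part of glibc): sort WORDS in place under
   the collation rules of LOCALE_NAME and return the CPU time in
   seconds spent in qsort, or -1.0 if the locale cannot be set.  */
double
time_strcoll_sort (const char *locale_name, const char **words, size_t n)
{
  assert (words != NULL || n == 0);

  if (setlocale (LC_COLLATE, locale_name) == NULL)
    return -1.0;

  clock_t start = clock ();
  qsort (words, n, sizeof words[0], compare_words);
  clock_t end = clock ();

  return (double) (end - start) / CLOCKS_PER_SEC;
}
```

Calling the helper with a locale name such as "C" and an array of words
sorts the array and returns the elapsed CPU time. A real benchmark would
use much larger inputs and repeat the run to reduce noise.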

After removing the cache, the strcoll function showed the predicted behavior
(getting slightly faster) in all but the test case for Hindi word sorting.
This was due to the Hindi text having many more equal words than the others.
For equal strings the performance was worse, since all comparison levels were
run through, and from the second level on the cache improved the comparison
performance of the original version.

Therefore I added a bytewise test via strcmp iff the first-level comparison
found that both strings match, because in this case it is very likely that
equal strings are being compared. This solved the problem with the Hindi test
case and improved the performance of the others.
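
The control flow described above can be sketched roughly as follows. This
is not the code in strcoll_l.c: first_level_cmp and lower_levels_cmp are
hypothetical stand-ins (a case-insensitive compare and a case tie-breaker)
for the real weight-based comparison levels, and sketch_coll is a made-up
name. Only the placement of the strcmp shortcut reflects the description.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for the first-level comparison:
   compares the strings while ignoring case.  */
static int
first_level_cmp (const char *a, const char *b)
{
  for (;; a++, b++)
    {
      int ca = tolower ((unsigned char) *a);
      int cb = tolower ((unsigned char) *b);
      if (ca != cb)
        return ca < cb ? -1 : 1;
      if (ca == '\0')
        return 0;
    }
}

/* Hypothetical stand-in for the remaining levels.  Only called when
   the first level matched, so both strings have the same length and
   differ at most in case; lowercase sorts before uppercase.  */
static int
lower_levels_cmp (const char *a, const char *b)
{
  for (; *a != '\0'; a++, b++)
    if (*a != *b)
      return islower ((unsigned char) *a) ? -1 : 1;
  return 0;
}

int
sketch_coll (const char *a, const char *b)
{
  assert (a != NULL && b != NULL);

  int result = first_level_cmp (a, b);
  if (result != 0)
    return result;

  /* The first level matched, so equal strings are likely: try a
     cheap bytewise check before running the remaining levels.  */
  if (strcmp (a, b) == 0)
    return 0;

  return lower_levels_cmp (a, b);
}
```

With this ordering, identical strings return right after the strcmp call
and never reach the later levels, which is the case the Hindi test data
exercised most often.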

Performance comparison:

glibc files     -33.77%
vi_VN.UTF-8     -34.12%
en_US.UTF-8     -42.42%
ar_SA.UTF-8     -27.49%
zh_CN.UTF-8     +07.90%
cs_CZ.UTF-8     -29.67%
en_GB.UTF-8     -28.50%
da_DK.UTF-8     -36.57%
pl_PL.UTF-8     -39.31%
fr_FR.UTF-8     -28.57%
pt_PT.UTF-8     -22.82%
el_GR.UTF-8     -26.77%
ru_RU.UTF-8     -35.81%
iw_IL.UTF-8     -35.34%
es_ES.UTF-8     -34.46%
hi_IN.UTF-8     -00.38%
sv_SE.UTF-8     -36.99%
hu_HU.UTF-8     -16.35%
tr_TR.UTF-8     -27.80%
is_IS.UTF-8     -33.24%
it_IT.UTF-8     -24.39%
sr_RS.UTF-8     -37.55%
ja_JP.UTF-8     +02.84%
2014-10-17 15:47:23 +05:30
..
bits Update feature guard for strdup/strndup in <bits/string2.h> 2014-06-16 10:21:31 +02:00
_strerror.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-addsep.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-append.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-count.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-create.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-ctsep.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-delete.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-extract.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-insert.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-next.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-replace.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz-stringify.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
argz.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
basename.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
bcopy.c PowerPC: optimized memmove for POWER7/PPC64 2014-07-07 15:41:21 -05:00
bug-envz1.c * string/Makefile (tests): Add bug-envz1. 2006-06-04 16:36:04 +00:00
bug-strcoll1.c Update. 2001-04-26 20:45:18 +00:00
bug-strncat1.c * malloc/memusagestat.c (main): Use return instead of exit to 2000-12-31 10:52:32 +00:00
bug-strpbrk1.c * malloc/memusagestat.c (main): Use return instead of exit to 2000-12-31 10:52:32 +00:00
bug-strspn1.c * malloc/memusagestat.c (main): Use return instead of exit to 2000-12-31 10:52:32 +00:00
bug-strtok1.c [BZ #2126] 2006-01-10 00:25:07 +00:00
byteswap.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
bzero.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
Depend Update. 2001-03-19 21:40:15 +00:00
endian.h Combine __USE_BSD and __USE_SVID into __USE_MISC. 2014-02-12 23:41:01 +00:00
envz.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
envz.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
ffs.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
ffsll.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
inl-tester.c Update. 1997-09-11 12:09:10 +00:00
Makefile Remove redundant C locale settings. 2014-06-07 19:58:36 +00:00
memccpy.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
memchr.c string/memchr.c: Merge from gnulib 2014-07-04 09:23:21 +01:00
memcmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memcpy.c Fix -Wundef warning on PAGE_COPY_THRESHOLD 2014-07-03 01:49:43 +05:30
memfrob.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
memmem.c Use glibc_likely instead __builtin_expect. 2014-02-10 15:07:12 +01:00
memmove.c Fix -Wundef warning on PAGE_COPY_THRESHOLD 2014-07-03 01:49:43 +05:30
memory.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
mempcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memrchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
memset.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
noinl-tester.c Update. 1997-09-16 00:42:43 +00:00
rawmemchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
stpcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
stpncpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
str-two-way.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
stratcliff.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcasecmp_l.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcasecmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcasestr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcat.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strchr.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strchrnul.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strcmp.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strcoll_l.c strcoll: improve performance by removing the cache (#15884) 2014-10-17 15:47:23 +05:30
strcoll.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strcpy.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strcspn.c PowerPC: optimized strcspn for PPC64/POWER7 2014-03-20 11:24:52 -05:00
strdup.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strerror_l.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strerror.c Use glibc_likely instead __builtin_expect. 2014-02-10 15:07:12 +01:00
strfry.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
string-inlines.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
string.h Complete _BSD_SOURCE / _SVID_source followup cleanup. 2014-02-21 21:45:26 +00:00
strings.h Combine __USE_BSD and __USE_SVID into __USE_MISC. 2014-02-12 23:41:01 +00:00
strlen.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strncase_l.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strncase.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strncat.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strncmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strncpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strndup.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strnlen.c Use glibc_likely instead __builtin_expect. 2014-02-10 15:07:12 +01:00
strpbrk.c PowerPC: optimized strpbrk for POWER7 2014-03-20 19:46:13 -05:00
strrchr.c PowerPC: strrchr optimization for POWER7/PPC64 2014-03-03 08:06:41 -06:00
strsep.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strsignal.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strspn.c PowerPC: strspn optimization for PPC64/POWER7 2014-03-11 08:54:33 -05:00
strstr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strtok_r.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strtok.c string: Cosmetic cleanup of string functions 2014-04-07 09:44:02 +01:00
strverscmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
strxfrm_l.c Move findidx nested functions to top-level. 2014-09-11 16:02:17 -07:00
strxfrm.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
swab.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-bcopy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-bzero.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-ffs.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memccpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memcmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memmem.c Use glibc_likely instead __builtin_expect. 2014-02-10 15:07:12 +01:00
test-memmove.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-mempcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memrchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-memset.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-rawmemchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-stpcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-stpncpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strcasecmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strcasestr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strcat.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strchrnul.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strcmp.c Fix v9/64-bit strcmp when string ends in multiple zero bytes. 2014-05-01 12:15:06 -07:00
test-strcpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strcspn.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-string.h Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strlen.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strncasecmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strncat.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strncmp.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strncpy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strnlen.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strpbrk.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strrchr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strspn.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
test-strstr.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
testcopy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tester.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tst-bswap.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tst-endian.c Add #include <stdint.h> for uint[32|64]_t usage (except installed headers). 2013-05-16 11:32:54 -05:00
tst-inlcall.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tst-strcoll-overflow.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tst-strfry.c * stdlib/random_r.c (__initstate_r): Don't use non-existing state. 2005-04-12 15:29:07 +00:00
tst-strlen.c Add optimized x86-64 implementation of strnlen. 2010-07-26 08:37:08 -07:00
tst-strtok_r.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
tst-strtok.c Update. 2001-02-22 13:46:25 +00:00
tst-strxfrm2.c * string/strxfrm_l.c (STRXFRM): Fix trailing \1 optimization 2006-11-10 15:20:59 +00:00
tst-strxfrm.c 2002-08-29 Roland McGrath <roland@redhat.com> 2002-08-29 09:26:30 +00:00
tst-svc2.c [BZ #9893] 2009-03-14 23:57:33 +00:00
tst-svc.c * malloc/memusagestat.c (main): Use return instead of exit to 2000-12-31 10:52:32 +00:00
tst-svc.expect * string/strverscmp.c (__strverscmp): Fix last cleanups. 2009-04-07 06:51:59 +00:00
tst-svc.input * string/strverscmp.c (__strverscmp): Fix last cleanups. 2009-04-07 06:51:59 +00:00
Versions Sort Versions files 2013-02-17 16:34:04 +01:00
wordcopy.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
xpg-strerror.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00