35302 Commits

Author SHA1 Message Date
Stefan Liebler
bfdb731438 S390: Fix handling of needles crossing a page in strstr z15 ifunc-variant. [BZ #25226]
If the specified needle crosses a page-boundary, the s390-z15 ifunc variant of
strstr truncates the needle which results in invalid results.

This is fixed by loading the needle beyond the page boundary to v18 instead of v16.
The bug is sometimes observable in test-strstr.c in check1 and check2 as the
haystack and needle are stored on the stack. Thus the needle can be on a page boundary.

check2 is now extended to test haystack / needles located on stack, at end of page
and on two pages.

This bug was introduced with commit 6f47401bd5
("S390: Add arch13 strstr ifunc variant.") and is already released in glibc 2.30.
2019-11-27 12:35:40 +01:00
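The "needle on two pages" case described above can be reproduced in user code roughly as follows. This is a minimal sketch, not the code added to test-strstr.c: the helper names place_across_pages and check_crossing_needle are hypothetical, and the mapping is deliberately leaked for brevity.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map two adjacent pages and copy STR (including its terminating NUL) so
   that its first HEAD bytes end the first page and the rest starts the
   second page.  Returns NULL if the mapping fails.  Hypothetical helper.  */
static char *
place_across_pages (const char *str, size_t head)
{
  size_t page = (size_t) sysconf (_SC_PAGESIZE);
  size_t len = strlen (str) + 1;
  assert (head <= len && head <= page && len - head <= page);

  char *base = mmap (NULL, 2 * page, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (base == MAP_FAILED)
    return NULL;

  char *dst = base + page - head;
  memcpy (dst, str, len);
  return dst;
}

/* Returns 1 if strstr finds a page-crossing needle at the expected
   position in the haystack, 0 otherwise.  */
static int
check_crossing_needle (void)
{
  const char *hay = "xxxxabcdefghijklmnopqrstuvwxyz0123456789yyyy";
  char *needle = place_across_pages ("abcdefghijklmnopqrstuvwxyz0123456789", 5);
  if (needle == NULL)
    return 0;
  return strstr (hay, needle) == hay + 4;
}
```

A needle longer than one vector register that straddles the boundary is the interesting case; the 36-byte needle here is only an illustrative choice.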
Adhemerval Zanella
acfe409119 nptl: Fix __PTHREAD_MUTEX_INITIALIZER for !__PTHREAD_MUTEX_HAVE_PREV
The commit "nptl: Add struct_mutex.h" added a wrong initializer for
architectures that use the generic struct_mutex.h.

Checked on sparcv9-linux-gnu (where I noted the issue with the
nptl/tst-initializers1*).
2019-11-26 17:00:19 -03:00
Sandra Loosemore
c72e5cd87d Compile elf/rtld.c with -fno-tree-loop-distribute-patterns.
In GCC 10, the default at -O2 is now -ftree-loop-distribute-patterns.
This optimization causes GCC to "helpfully" convert the hand-written
loop in _dl_start into a call to memset, which is not available that
early in program startup.  Similar problems in other places in GLIBC
have been addressed by explicitly building with
-fno-tree-loop-distribute-patterns, but this one may have been
overlooked previously because it only affects targets where
HAVE_BUILTIN_MEMSET is not defined.

This patch fixes a bug observed on nios2-linux-gnu target that caused
all programs to segv on startup.
2019-11-26 19:18:23 +01:00
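The commit above applies the flag to the whole of elf/rtld.c through the build system. As a hedged illustration of the same idea at function granularity, GCC also accepts the option through its optimize attribute. The macro name NO_LOOP_TO_LIBCALL and the function zero_words are hypothetical, and the attribute is GCC-specific, so it is compiled out elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* GCC only: keep the loop below from being turned into a memset call.  */
#if defined __GNUC__ && !defined __clang__
# define NO_LOOP_TO_LIBCALL \
  __attribute__ ((__optimize__ ("-fno-tree-loop-distribute-patterns")))
#else
# define NO_LOOP_TO_LIBCALL
#endif

/* Hand-written zeroing loop of the kind used before memset is usable.  */
static NO_LOOP_TO_LIBCALL void
zero_words (unsigned long *start, unsigned long *end)
{
  assert (start <= end);
  while (start < end)
    *start++ = 0;
}
```

Without the option, GCC 10 at -O2 may recognize this loop as a memset pattern, which is the behavior the commit message describes.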
Adhemerval Zanella
cc0e0b097c hppa: Remove unrequired nptl headers
Now that both pthread_mutex_t and pthread_rwlock_t static initializer
are parametrized in their own headers HPPA pthread.h is identical to
generic nptl one.

Checked on hppa-linux-gnu.

Change-Id: I236cfceb5656cfcce42c9e367a4f6803e2abd88b
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
7ddac7f265 nptl: Add default pthread-offsets.h
This patch adds a default pthread-offsets.h based on default
thread definitions from struct_mutex.h and struct_rwlock.h.
The idea is to simplify new ports inclusion.

Checked with a build on affected abis.

Change-Id: I7785a9581e651feb80d1413b9e03b5ac0452668a
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
94a62cc55a nptl: Add default pthreadtypes-arch.h
This patch adds a default pthreadtypes-arch.h, the idea is to simplify
new ports inclusion and an override is required only if the architecture
adds some arch-specific extensions or requirement.

The default values on the new generic header are based on current
architecture define value and they are not optimal compared to current
code requirements as below.

  - On 64 bits __SIZEOF_PTHREAD_BARRIER_T is defined as 32 while
    sizeof (struct pthread_barrier) is 20 bytes.

  - On 32 bits __SIZEOF_PTHREAD_ATTR_T is defined as 36 while
    sizeof (struct pthread_attr) is 32.

The default values are not changed so the generic header could be
used by some architectures.

Checked with a build on affected abis.

Change-Id: Ie0cd586258a2650f715c1af0c9fe4e7063b0409a
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
7df8af43ad nptl: Add struct_rwlock.h
This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports.  Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).

Also the static initialization macro for pthread_rwlock_t is set to use
an arch-defined one (__PTHREAD_RWLOCK_INITIALIZER), which simplifies its
implementation.

The default pthread_rwlock_t layout differs from current ports with:

  1. Internal layout is the same for 32 bits and 64 bits.

  2. Internal flag is an unsigned short so it should not require
     additional padding to align for word boundary (if it is the case
     for the ABI).

Checked with a build on affected abis.

Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
1c3f9acf1f nptl: Add struct_mutex.h
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da) is
not really the best option for newer ports.  It requires defining some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicitly define them (adding more boilerplate
code).

This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports.  Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition.  Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) require specific
headers.

For 32 bit, the generic definitions mimic the other 32-bit ports
of using a union to define the fields used on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes.  Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.

Also the static initialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER, which the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.

Checked with a build on affected abis.

Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
0377a7fde6 nptl: Remove rwlock elision definitions
The new rwlock implementation added by cc25c8b4c1 (2.25) removed
support for lock-elision.  This patch removes the remaining unused
arch-specific definitions.

Checked with a build against all affected ABIs.

Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
48dbce60cf nptl: Add tests for internal pthread_rwlock_t offsets
This patch adds new build tests to check the offsets of internal fields in the
internal pthread_rwlock_t definition.  Although the '__data.__flags'
field layout should be preserved due to static initializers, the patch
also adds tests for the futexes that may be used in a shared memory
(although using different libc version in such scenario is not really
supported).

Checked with a build against all affected ABIs.

Change-Id: Iccc103d557de13d17e4a3f59a0cad2f4a640c148
2019-11-26 13:53:36 +00:00
Adhemerval Zanella
71d260c107 nptl: Cleanup mutex internal offset tests
The offsets of pthread_mutex_t __data.__nusers, __data.__spins,
__data.elision, __data.list are not required to be constant over
the releases.  Only the __data.__kind is used for static
initializers.

This patch also adds an additional size check for __data.__kind.

Checked with a build against affected ABIs.

Change-Id: I7a4e48cc91b4c4ada57e9a5d1b151fb702bfaa9f
2019-11-26 13:53:36 +00:00
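The reasoning above (only the field used by static initializers must keep its position and size) can be expressed as a build-time check. The sketch below uses a hypothetical struct demo_mutex rather than the real pthread_mutex_t, and the offset 16 assumes 4-byte int; it is not the test code from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout standing in for an internal mutex structure.  */
struct demo_mutex
{
  int lock;
  unsigned int count;
  int owner;
  unsigned int nusers;
  int kind;               /* The only field a static initializer sets.  */
};

/* Assumed ABI constant for this demo (valid when int is 4 bytes).  */
#define DEMO_KIND_OFFSET 16

/* Build fails if the initializer-visible field moves or changes size.  */
static_assert (offsetof (struct demo_mutex, kind) == DEMO_KIND_OFFSET,
               "kind offset is part of the ABI");
static_assert (sizeof (((struct demo_mutex *) 0)->kind) == sizeof (int),
               "kind size is part of the ABI");

/* Static initializer that depends on the checked layout.  */
#define DEMO_MUTEX_INITIALIZER(k) { 0, 0, 0, 0, (k) }

/* Returns the kind stored by the initializer, for demonstration.  */
static int
demo_initialized_kind (void)
{
  struct demo_mutex m = DEMO_MUTEX_INITIALIZER (2);
  return m.kind;
}
```

The other fields are free to move between releases because no already-compiled initializer refers to them.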
Egor Kobylkin
7fc8c286e3 locale: Greek -> ASCII transliteration table [BZ #12031]
2019-11-26 12:23:09 +01:00
Rafał Lużyński
c372d2e863 ru_UA locale: use copy "ru_RU" in LC_TIME (bug 25044)
Replacing incorrect abbreviated weekday names "Пнд", "Вто", "Срд"...
with correct ones "Пн", "Вт", "Ср"... makes the LC_TIME sections in
those two locales almost identical.  The only remaining difference
was that ab_alt_mon elements in ru_UA were lowercase while in ru_RU
they had the first letter uppercase, the latter was pointed as
a better choice by a native speaker.  This commit unifies LC_TIME
between ru_RU and ru_UA.
2019-11-26 11:54:29 +01:00
Tim Rühsen
c1de872c8c sysdeps/posix/getaddrinfo: Return early on invalid address family
Check address family before expensive function call (__check_pf).
2019-11-26 10:19:33 +01:00
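From the caller's side, an unsupported address family in the hints is reported as EAI_FAMILY. The sketch below shows that observable behavior only; the helper name lookup_with_family is hypothetical, and the numeric loopback address is used so that no name resolution is needed.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve the numeric loopback address with the given address family in
   the hints and return the getaddrinfo result code.  */
static int
lookup_with_family (int family)
{
  struct addrinfo hints;
  struct addrinfo *res = NULL;

  memset (&hints, 0, sizeof hints);
  hints.ai_family = family;
  hints.ai_flags = AI_NUMERICHOST;

  int rc = getaddrinfo ("127.0.0.1", NULL, &hints, &res);
  if (rc == 0)
    {
      assert (res != NULL);   /* Success always yields a non-empty list.  */
      freeaddrinfo (res);
    }
  return rc;
}
```

Calling lookup_with_family with a value that is not AF_UNSPEC, AF_INET, or AF_INET6 (for example 12345) should return EAI_FAMILY.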
Tim Rühsen
cceb038ac0 sysdeps/posix: Simplify if expression in getaddrinfo
Small code cleanup for better readability.
2019-11-26 09:58:50 +01:00
Joseph Myers
17832eefee Use Linux 5.4 in build-many-glibcs.py.
This patch makes build-many-glibcs.py use Linux 5.4.

Tested with build-many-glibcs.py (compilers and glibcs builds).
2019-11-26 01:46:32 +00:00
Adhemerval Zanella
d9202f1883 arm: Fix armv7 selection after 'Split BE/LE abilist'
It adds the missing Implies for armv7, armv6, armv6t2 after the
commit 1673ba87fe.  Without the Implies a build with the
compiler targeting the aforementioned architecture does not select
the arch-specific optimization including the ifunc selectors.

I checked with a build against armv5, armv6, armv6t2, armv7, and
armv7-neon for both LE and BE.  For armv6 and armv7 I also checked
that both sysdeps selection and the resulting implementation built
are the expected ones.
2019-11-25 13:34:44 -03:00
Gabriel F. T. Gomes
b370c5f014 ldbl-128ibm-compat: Add wide character scanning functions
Similarly to what was done for regular character scanning functions,
this patch uses the new mode mask, SCANF_LDBL_USES_FLOAT128, in the
'mode' argument of the wide characters scanning function,
__vfwscanf_internal (which is also extended to support scanning
floating-point values with IEEE binary128, by redirecting calls to
__wcstold_internal to __wcstof128_internal).

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:13:20 -03:00
Gabriel F. T. Gomes
a5b15bdec8 ldbl-128ibm-compat: Add regular character scanning functions
The 'mode' argument to __vfscanf_internal allows the selection of the
long double format for all long double arguments requested by the format
string.  Currently, there are two possibilities: long double with the
same format as double or long double as something else.  The 'something
else' format varies between architectures, and on powerpc64le, it means
IBM Extended Precision format.

In preparation for the third option of long double format on
powerpc64le, this patch uses the new mode mask,
SCANF_LDBL_USES_FLOAT128, which tells __vfscanf_internal to call
__strtof128_internal, instead of __strtold_internal, and save the output
into a _Float128 variable.

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:13:01 -03:00
Gabriel F. T. Gomes
c2f959ed5f ldbl-128ibm-compat: Test positional arguments
The format string can request positional parameters, instead of relying
on the order in which they appear as arguments.  Since this has an
effect on how the type of each argument is determined, this patch
extends the test cases to use positional parameters with mixed double
and long double types, to verify that the IEEE long double
implementations of *printf work correctly in this scenario.

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:12:54 -03:00
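A small illustration of positional parameters with mixed double and long double types, in the spirit of the scenario above. The function name format_swapped is hypothetical and this is not the test case from the commit; positional conversions (%n$) are a POSIX extension to ISO C.

```c
#include <assert.h>
#include <stdio.h>

/* Format LD first and D second, although D is passed first.  The type of
   each argument is determined by the conversion that refers to it.  */
static int
format_swapped (char *buf, size_t size, double d, long double ld)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%2$.2Lf %1$.1f", d, ld);
}
```

With d = 1.5 and ld = 2.25L the buffer receives "2.25 1.5".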
Gabriel F. T. Gomes
5bbbd5ae05 ldbl-128ibm-compat: Test double values
A single format string can take double and long double parameters at the
same time.  Internally, these parameters are routed to the same
function, which correctly reads them and calls the underlying functions
responsible for the actual conversion to string.  This patch adds a new
case to test this scenario.

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:12:37 -03:00
Gabriel F. T. Gomes
329037cead ldbl-128ibm-compat: Add wide character, fortified printing functions
Similarly to what was done for the regular character, fortified printing
functions, this patch combines the mode masks PRINTF_LDBL_USES_FLOAT128
and PRINTF_FORTIFY to provide wide character versions of fortified
printf functions.  It also adds two flavors of test cases: one that
explicitly calls the fortified functions, and another that reuses the
non-fortified test, but defining _FORTIFY_SOURCE as 2.  The first
guarantees that the implementations are actually being tested
(independently of what's in bits/wchar2.h), whereas the second
guarantees that the redirections call the correct function in the IBM
and IEEE long double cases.

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:12:27 -03:00
Gabriel F. T. Gomes
5aa64dbc29 ldbl-128ibm-compat: Add regular character, fortified printing functions
Since the introduction of internal functions with explicit flags for the
printf family of functions, the 'mode' parameter can be used to select
which format long double parameters have (with the mode flags:
PRINTF_LDBL_IS_DBL and PRINTF_LDBL_USES_FLOAT128), as well as to select
whether to check for overflows (mode flag: PRINTF_FORTIFY).

This patch combines PRINTF_LDBL_USES_FLOAT128 and PRINTF_FORTIFY to
provide the IEEE binary128 version of printf-like function for platforms
where long double can take this format, in addition to the double format
and to some non-ieee format (currently, this means powerpc64le).

There are two flavors of test cases provided with this patch: one that
explicitly calls the fortified functions, for instance __asprintf_chk,
and another that reuses the non-fortified test, but defining
_FORTIFY_SOURCE as 2.  The first guarantees that the implementations are
actually being tested (in bits/stdio2.h, vprintf gets redirected to
__vfprintf_chk, which would leave __vprintf_chk untested), whereas the
second guarantees that the redirections call the correct function in
the IBM and IEEE long double cases.

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:11:49 -03:00
Gabriel F. T. Gomes
1771a5cf0e ldbl-128ibm-compat: Add wide character printing functions
Similarly to what was done for regular character printing functions,
this patch uses the new mode mask, PRINTF_LDBL_USES_FLOAT128, in the
'mode' argument of the wide characters printing function,
__vfwprintf_internal (which is also extended to support printing
floating-point values with IEEE binary128, by saving floating-point
values into variables of type __float128 and adjusting the parameters to
__printf_fp and __printf_fphex as if it was a call from a wide-character
version of strfromf128 (even though such version does not exist)).

Tested for powerpc64le.

Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:11:38 -03:00
Gabriel F. T. Gomes
421a1d34bf ldbl-128ibm-compat: Add regular character printing functions
The 'mode' argument to __vfprintf_internal allows the selection of the
long double format for all long double arguments requested by the format
string.  Currently, there are two possibilities: long double with the
same format as double or long double as something else.  The 'something
else' format varies between architectures, and on powerpc64le, it means
IBM Extended Precision format.

In preparation for the third option of long double format on
powerpc64le, this patch uses the new mode mask,
PRINTF_LDBL_USES_FLOAT128, which tells __vfprintf_internal to save the
floating-point values into variables of type __float128 and adjusts the
parameters to __printf_fp and __printf_fphex as if it was a call from
strfromf128.

Many files from the stdio-common, wcsmbs, argp, misc, and libio
directories will have IEEE binary128 counterparts.  Setting the correct
compiler options to these files (original and counterparts) would
produce a large amount of repetitive Makefile rules.  To avoid this
repetition, this patch adds a Makefile routine that iterates over the
files adding or removing the appropriate flags.

Tested for powerpc64le.

Reviewed-By: Florian Weimer <fweimer@redhat.com>
Reviewed-By: Joseph Myers <joseph@codesourcery.com>
Reviewed-By: Paul E. Murphy <murphyp@linux.ibm.com>
2019-11-22 18:10:52 -03:00
Gabriel F. T. Gomes
93486ba583 Use DEPRECATED_SCANF macro for remaining C99-compliant scanf functions
When the commit

commit 03992356e6
Author: Zack Weinberg <zackw@panix.com>
Date:   Sat Feb 10 11:58:35 2018 -0500

    Use C99-compliant scanf under _GNU_SOURCE with modern compilers.

added the DEPRECATED_SCANF macro to select when redirections of *scanf
functions to their ISO C99 compliant versions should happen, it
accidentally missed doing it for vfwscanf, vwscanf, and vswscanf.

Tested for powerpc64le and with build-many-glibcs (i686-linux-gnu and
nios2-linux-gnu are failing with current master, and with this patch,
but I didn't see a regression).

Change-Id: I706b344a3fb50be017cdab9251d9da18a3ba8c60
2019-11-22 15:29:21 -03:00
Adhemerval Zanella
8781c1301d misc: Set generic pselect as ENOSYS
The generic pselect implementation has the very specific race condition
that motivated the creation of the pselect syscall (no atomicity in
signal mask set/reset).  Using it as generic implementation is
counterproductive.  Also, currently only microblaze uses it as fallback
when used on kernels prior to 3.15.

This patch moves the generic implementation to a microblaze specific
one, sets the generic internal one to ENOSYS, and cleans up the Linux
generic implementation.

The microblaze implementation mimics the previous Linux generic one:
it uses pselect6 directly if __ASSUME_PSELECT is defined, and otherwise
tries pselect6 first and then the fallback.

Checked on x86_64-linux-gnu and microblaze-linux-gnu.
2019-11-22 14:40:57 -03:00
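The race mentioned above comes from emulating pselect with three separate calls. The following sketch shows that emulation pattern for illustration; emulated_pselect is a hypothetical name and this is not the glibc source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Non-atomic emulation of pselect: set mask, select, restore mask.  */
static int
emulated_pselect (int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds,
                  const struct timespec *timeout, const sigset_t *sigmask)
{
  struct timeval tv;
  struct timeval *tvp = NULL;
  sigset_t saved;
  int ret;

  if (timeout != NULL)
    {
      assert (timeout->tv_nsec >= 0 && timeout->tv_nsec < 1000000000L);
      tv.tv_sec = timeout->tv_sec;
      tv.tv_usec = timeout->tv_nsec / 1000;
      tvp = &tv;
    }

  if (sigmask != NULL)
    sigprocmask (SIG_SETMASK, sigmask, &saved);

  /* Race window: a signal unblocked by the call above can be delivered
     here, before select starts waiting, so select may then block even
     though the event it was meant to wake up for already happened.  */
  ret = select (nfds, rfds, wfds, efds, tvp);

  if (sigmask != NULL)
    sigprocmask (SIG_SETMASK, &saved, NULL);
  return ret;
}
```

The pselect syscall closes this window by switching the mask and starting the wait as one atomic step in the kernel.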
Paul A. Clarke
102b5b0caf Remove duplicate inline implementation of issignalingf
Very recent commit 854e91bf6b enabled
inline of issignalingf() in general (__issignalingf in include/math.h).
There is another implementation for an inline use of issignalingf
(issignalingf_inline in sysdeps/ieee754/flt-32/math_config.h)
which could instead make use of the new enablement.

Replace the use of issignalingf_inline with __issignaling.  Using
issignaling (instead of __issignalingf) will allow future enhancements
to the type-generic implementation, issignaling, to be automatically
adopted.

The implementations are slightly different, and compile to slightly
different code, but I measured no significant performance difference.

The second implementation was brought to my attention by:
Suggested-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Joseph Myers <joseph@codesourcery.com>
2019-11-22 11:37:40 -06:00
Emilio Cobos Álvarez
bfa864e164 Don't use a custom wrapper macro around __has_include (bug 25189).
This causes issues when using clang with -frewrite-includes to e.g.,
submit the translation unit to a distributed compiler.

In my case, I was building Firefox using sccache.

See [1] for a reduced test-case since I initially thought this was a
clang bug, and [2] for more context.

Apparently doing this is invalid C++ per [cpp.cond], which mentions [3]:

> The #ifdef and #ifndef directives, and the defined conditional
> inclusion operator, shall treat __has_include and __has_cpp_attribute
> as if they were the names of defined macros.  The identifiers
> __has_include and __has_cpp_attribute shall not appear in any context
> not mentioned in this subclause.

[1]: https://bugs.llvm.org/show_bug.cgi?id=43982
[2]: https://bugs.llvm.org/show_bug.cgi?id=37990
[3]: http://eel.is/c++draft/cpp.cond#7.sentence-2

Change-Id: Id4b8ee19176a9e4624b533087ba870c418f27e60
2019-11-21 17:54:16 +01:00
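A sketch of using __has_include directly, guarded by an #ifdef as the quoted wording permits, instead of hiding it behind a wrapper macro. The macro HAVE_STDINT_H and the function have_stdint are hypothetical names chosen for this example.

```c
#include <assert.h>

/* Test for the operator with #ifdef, then use it directly in a nested
   #if.  The nesting keeps compilers without __has_include from having
   to parse the inner condition.  */
#ifdef __has_include
# if __has_include (<stdint.h>)
#  include <stdint.h>
#  define HAVE_STDINT_H 1
static_assert (sizeof (uint32_t) == 4, "uint32_t is 32 bits wide");
# endif
#endif

#ifndef HAVE_STDINT_H
# define HAVE_STDINT_H 0
#endif

/* Reports whether the header was detected at preprocessing time.  */
static int
have_stdint (void)
{
  return HAVE_STDINT_H;
}
```

Because the identifier appears only in #ifdef and #if, tools that rewrite includes see it in the contexts the standard text describes.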
Paul A. Clarke
854e91bf6b Enable inlining issignalingf within glibc
issignalingf is a very small function used in some areas where
better performance (and smaller code) might be helpful.

Create inline implementation for issignalingf.

Reviewed-by: Joseph Myers <joseph@codesourcery.com>
2019-11-21 09:39:48 -06:00
Florian Weimer
fcb04b9aed Introduce DL_LOOKUP_FOR_RELOCATE flag for _dl_lookup_symbol_x
This will allow changes in dependency processing during non-lazy
binding, for more precise processing of NODELETE objects: During
initial relocation in dlopen, the fate of NODELETE objects is still
unclear, so objects which are depended upon by NODELETE objects
cannot immediately be marked as NODELETE.

Change-Id: Ic7b94a3f7c4719a00ca8e6018088567824da0658
2019-11-21 13:31:29 +01:00
Marcin Kościelnicki
d5dfad4326 rtld: Check __libc_enable_secure before honoring LD_PREFER_MAP_32BIT_EXEC (CVE-2019-19126) [BZ #25204]
The problem was introduced in glibc 2.23, in commit
b9eb92ab05
("Add Prefer_MAP_32BIT_EXEC to map executable pages with MAP_32BIT").
2019-11-21 12:56:44 +01:00
Florian Weimer
2a764c6ee8 Enhance _dl_catch_exception to allow disabling exception handling
In some cases, it is necessary to introduce noexcept regions
where raised dynamic loader exceptions (e.g., from lazy binding)
are fatal, despite being nested in a code region with an active
exception handler.  This change enhances _dl_catch_exception
to provide such a capability.  The existing function is reused,
so that it is not necessary to introduce yet another function with
a similar purpose.

Change-Id: Iec1bf642ff95a349fdde8040e9baf851ac7b8904
2019-11-16 15:57:01 +01:00
Florian Weimer
84df7a4637 hurd: Suppress GCC 10 -Warray-bounds warning in init-first.c [BZ #25097]
The trampoline code should really be rewritten in assembler because
this is all very undefined at the C level.

Change-Id: Ided58244ca0ee48892519faac5ac222a4e02dec4
2019-11-16 15:40:59 +01:00
Florian Weimer
9e3e27c4e3 linux: Add comment on affinity set sizes to tst-skeleton-affinity.c
Change-Id: Ic6ec48f75f3a0576d3121befd04531382c92afb4
2019-11-15 13:05:09 +01:00
Florian Weimer
e21a786771 Avoid zero-length array at the end of struct link_map [BZ #25097]
l_audit ends up as an internal array with _rtld_global, and GCC 10
warns about this.

This commit does not change the layout of _rtld_global, so it is
suitable for backporting.  Future changes could allocate more of the
audit state dynamically and remove it from always-allocated data
structures, to optimize the common case of inactive auditing.

Change-Id: Ic911100730f9124d4ea977ead8e13cee64b84d45
2019-11-15 13:03:59 +01:00
Florian Weimer
e1d559f337 Introduce link_map_audit_state accessor function
To improve GCC 10 compatibility, it is necessary to remove the l_audit
zero-length array from the end of struct link_map.  In preparation of
that, this commit introduces an accessor function for the audit state,
so that it is possible to change the representation of the audit state
without adjusting the code that accesses it.

Tested on x86_64-linux-gnu.  Built on i686-gnu.

Change-Id: Id815673c29950fc011ae5301d7cde12624f658df
2019-11-15 13:03:40 +01:00
Florian Weimer
c7bf5ceab6 Properly initialize audit cookie for the dynamic loader [BZ #25157]
The l_audit array is indexed by audit module, not audit function.

Change-Id: I180eb3573dc1c57433750f5d8cb18271460ba5f2
2019-11-15 13:03:32 +01:00
Florian Weimer
c9bf28d625 nios2: Work around backend bug triggered by csu/libc-tls.c (GCC PR 92499)
Change-Id: If5df5b05d15f0418af821a9ac8cc0fad53437b10
2019-11-14 12:39:49 +01:00
Florian Weimer
70c6e15654 Redefine _IO_iconv_t to store a single gconv step pointer [BZ #25097]
libio can only deal with gconv conversions which consist of a single
step.  Not using __gconv_info simplifies the data structures somewhat.

This eliminates a new GCC 10 warning about subscripting an inner
zero-length array.

Tested on x86_64-linux-gnu with mainline GCC.  Built with
build-many-glibcs.py, also with mainline GCC.  Due to GCC PR 92039,
there are failures left on 32-bit architectures with float128 support.

Change-Id: I8b4c489b619a53154712ff32e1b6f13bb92d4203
2019-11-13 18:18:51 +01:00
Krzysztof Koch
15740788d7 Add new script for plotting string benchmark JSON output
Add a script for visualizing the JSON output generated by existing
glibc string microbenchmarks.

Overview:
plot_strings.py is capable of plotting benchmark results in the
following formats, which are controlled with the -p or --plot argument:
1. absolute timings (-p time): plot the timings as they are in the
input benchmark results file.
2. relative timings (-p rel): plot relative timing difference with
respect to a chosen ifunc (controlled with -b argument).
3. performance relative to max (-p max): for each varied parameter
value, plot 1/timing as the percentage of the maximum value out of
the plotted ifuncs.
4. throughput (-p thru): plot varied parameter value over timing

For all types of graphs, there is an option to explicitly specify
the subset of ifuncs to plot using the --ifuncs parameter.

For plot types 1. and 4. one can hide/expose exact benchmark figures
using the --values flag.

When plotting relative timing differences between ifuncs, the first
ifunc listed in the input JSON file is the baseline, unless the
baseline implementation is explicitly chosen with the --baseline
parameter. For the ease of reading, the script marks the statistically
insignificant range on the graphs. The default is +-5% but this
value can be controlled with the --threshold parameter.

To accommodate for the heterogeneity in benchmark results files,
one can control e.g. the x-axis scale, the resolution (dpi) of the
generated figures or the key to access the varied parameter value
in the JSON file. The corresponding options are --logarithmic,
--resolution or --key. The --key parameter ensures that plot_strings.py
works with all files which pass JSON schema validation. The schema
can be chosen with the --schema parameter.

If a window manager is available, one can enable interactive
figure display using the --display flag.

Finally, one can use the --grid flag to enable grid lines in the
generated figures.

Implementation:
plot_strings.py traverses the JSON tree until a 'results' array
is found and generates a separate figure for each such array.
The figure is then saved to a file in one of the available formats
(controlled with the --extension parameter).

As the tree is traversed, the recursive function tracks the metadata
about the test being run, so that each figure has a unique and
meaningful title and filename.

While plot_strings.py works with existing benchmarks, provisions
have been made to allow adding more structure and metadata to these
benchmarks. Currently, many benchmarks produce multiple timing values
for the same value of the varied parameter (typically 'length').
Multiple data points for the same parameter usually mean that some other
parameter was varied as well, for example, if memmove's src and dst
buffers overlap or not (see bench-memmove-walk.c and
bench-memmove-walk.out).

Unfortunately, this information is not exposed in the benchmark output
file, so plot_strings.py has to resort to computing the geometric mean
of these multiple values. In the process, useful information about the
benchmark configuration is lost. Also, averaging the timings for
different alignments can hide useful characteristics of the benchmarked
ifuncs.

Testing:
plot_strings.py has been tested on all existing string microbenchmarks
which produce results in JSON format. The script was tested on both
Windows 10 and Ubuntu 16.04.2 LTS. It runs on both python 2 and 3
(2.7.12 and 3.5.12 tested).

Useful commands:
1. Plot timings for all ifuncs in bench-strlen.out:
$ ./plot_strings.py bench-strlen.out

2. Display help:
$ ./plot_strings.py -h

3. Plot throughput for __memset_avx512_unaligned_erms and
__memset_avx512_unaligned. Save the generated figure in pdf format to
'results/'. Use logarithmic x-axis scale, show grid lines and expose
the performance numbers:
$ ./plot_strings.py bench.out -o results/ -lgv -e pdf -p thru \
-i __memset_avx512_unaligned_erms __memset_avx512_unaligned

4. Plot relative timings for all ifuncs in bench.out with __generic_memset
as baseline. Display percentage difference threshold of +-10%:
$ ./plot_strings.py bench.out -p rel  -b __generic_memset -t 10

Discussion:
1. I would like to propose relaxing the benchout_strings.schema.json
to allow specifying either a 'results' array with 'timings' (as before)
or a 'variants' array. See below example:

{
 "timing_type": "hp_timing",
 "functions": {
  "memcpy": {
   "bench-variant": "default",
   "ifuncs": ["generic_memcpy", "__memcpy_thunderx"],
   "variants": [
    {
     "name": "powers of 2",
     "variants": [
      {
       "name": "both aligned",
       "results": [
        {
         "length": 1,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
        },
        {
         "length": 2,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
        },
...
        {
         "length": 65536,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
        }]
      },
      {
       "name": "dst misaligned",
       "results": [
        {
         "length": 1,
         "align1": 0,
         "align2": 0,
         "timings": [x, y]
        },
        {
         "length": 2,
         "align1": 0,
         "align2": 1,
         "timings": [x, y]
        },
...

'variants' array consists of objects such that each object has a 'name'
attribute to describe the configuration of a particular test in the
benchmark. This can be a description, for example, of how the parameter
was varied or what was the buffer alignment tested. The 'name' attribute
is then followed by another 'variants' array or a 'results' array.

The nesting of variants allows arbitrary grouping of benchmark timings,
while allowing description of these groups. Using recursion, it is
possible to procedurally create titles and filenames for the figures being
generated.
2019-11-13 14:18:52 +00:00
Florian Weimer
02132c0f4c support: Fix support_set_small_thread_stack_size to build on Hurd
PTHREAD_STACK_MIN comes from <limits.h>, so include it explicitly.
However, it is not defined on Hurd, so compensate for that as well.

Built on x86_64-linux-gnu, i686-linux-gnu, i686-gnu.

Change-Id: Ifacc888ef86731c2639721b0932ae59583bd6b3e
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com>
2019-11-13 14:10:11 +01:00
Florian Weimer
d4625a19fe login: Use pread64 in utmp implementation
This reduces the possible error scenarios considerably because
no longer can file seek fail, leaving the file descriptor in an
inconsistent state and out of sync with the cache.

As a result, it is possible to avoid setting file_offset to -1
to make an error persistent.  Instead, subsequent calls will retry
the operation and report any errors returned by the kernel.

This change also avoids reading the file from the start if pututline
is called multiple times, to work around lock acquisition failures
due to timeouts.

Change-Id: If21ea0c162c38830a89331ea93cddec14c0974de
2019-11-12 20:13:35 +01:00
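The benefit of positioned reads described above can be seen in a small record reader. This is a sketch with hypothetical helper names, not the utmp code; it uses pread, of which pread64 is the large-file variant.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Read record INDEX of SIZE bytes from FD.  No separate seek is needed,
   so there is no seek failure to handle and the file offset of the
   descriptor is left unchanged.  */
static ssize_t
read_record (int fd, void *buf, size_t size, off_t index)
{
  assert (index >= 0);
  return pread (fd, buf, size, index * (off_t) size);
}

/* Write two 4-byte records to a temporary file and read back the second.
   Returns 1 on success, 0 on mismatch, -1 if no temporary file exists.  */
static int
second_record_roundtrip (void)
{
  FILE *fp = tmpfile ();
  if (fp == NULL)
    return -1;

  int fd = fileno (fp);
  char out[4] = { 0 };
  int ok = write (fd, "AAAABBBB", 8) == 8
           && read_record (fd, out, sizeof out, 1) == 4
           && memcmp (out, "BBBB", 4) == 0
           && lseek (fd, 0, SEEK_CUR) == 8;  /* Offset untouched by pread.  */
  fclose (fp);
  return ok;
}
```

Since each read names its own position, a failed call can simply be retried later without any cached offset becoming stale.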
Florian Weimer
ca136bb0a3 Clarify purpose of assert in _dl_lookup_symbol_x
Only one of the currently defined flags is incompatible with versioned
symbol lookups, so it makes sense to check for that flag and not its
complement.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
Change-Id: I3384349cef90cfd91862ebc34a4053f0c0a99404
2019-11-12 20:11:04 +01:00
Krzysztof Koch
b9f145df85 aarch64: Increase small and medium cases for __memcpy_generic
Increase the upper bound on medium cases from 96 to 128 bytes.
Now, up to 128 bytes are copied unrolled.

Increase the upper bound on small cases from 16 to 32 bytes so that
copies of 17-32 bytes are not impacted by the larger medium case.

Benchmarking:
The attached figures show relative timing difference with respect
to 'memcpy_generic', which is the existing implementation.
'memcpy_med_128' denotes the version of memcpy_generic with
only the medium case enlarged. The 'memcpy_med_128_small_32' numbers
are for the version of memcpy_generic submitted in this patch, which
has both medium and small cases enlarged. The figures were generated
using the script from:
https://www.sourceware.org/ml/libc-alpha/2019-10/msg00563.html

Depending on the platform, the performance improvement in the
bench-memcpy-random.c benchmark ranges from 6% to 20% between
the original and final version of memcpy.S

Tested against GLIBC testsuite and randomized tests.
2019-11-12 17:08:18 +00:00
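The new size classes can be summarized in C as below. The real implementation is AArch64 assembly; this sketch only restates the thresholds from the message, and it assumes that both upper bounds are inclusive. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bounds after this change (previously 16 and 96).  */
#define SMALL_MAX 32
#define MEDIUM_MAX 128
static_assert (SMALL_MAX < MEDIUM_MAX, "size classes must be ordered");

enum copy_class { COPY_SMALL, COPY_MEDIUM, COPY_LARGE };

/* Pick the code path for a copy of N bytes.  */
static enum copy_class
classify_copy (size_t n)
{
  if (n <= SMALL_MAX)
    return COPY_SMALL;
  if (n <= MEDIUM_MAX)
    return COPY_MEDIUM;
  return COPY_LARGE;
}
```

Under these assumptions a 17-byte to 32-byte copy stays on the small path, which is the stated reason for raising the small bound together with the medium one.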
Florian Weimer
76a7c103eb login: Introduce matches_last_entry to utmp processing
This simplifies internal_getut_nolock and fixes a regression,
introduced in commit be6b16d975
("login: Acquire write lock early in pututline [BZ #24882]")
in pututxline because __utmp_equal can only compare process-related
utmp entries.

Fixes: be6b16d975
Change-Id: Ib8a85002f7f87ee41590846d16d7e52bdb82f5a5
2019-11-12 17:16:18 +01:00
Florian Weimer
cba932a5a9 slotinfo in struct dtv_slotinfo_list should be flexible array [BZ #25097]
GCC 10 will warn about subscripting inner length zero arrays.  Use a GCC
extension in csu/libc-tls.c to allocate space for the static_slotinfo
variable.  Adjust nptl_db so that the type description machinery does
not attempt to determine the size of the flexible array member slotinfo.

Change-Id: I51be146a7857186a4ede0bb40b332509487bdde8
2019-11-12 13:54:30 +01:00
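For readers unfamiliar with the distinction, the sketch below shows a C99 flexible array member with heap allocation. The commit itself uses a GCC extension to reserve static storage, which is not shown here; struct slot_list and slot_list_alloc are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The trailing array is declared as slots[] (flexible array member)
   rather than slots[0] (zero-length array, a GNU extension).  */
struct slot_list
{
  size_t len;
  struct slot_list *next;
  int slots[];
};

/* Allocate a list with room for LEN zero-initialized slots.  Returns
   NULL if the allocation fails.  */
static struct slot_list *
slot_list_alloc (size_t len)
{
  assert (len <= (SIZE_MAX - sizeof (struct slot_list)) / sizeof (int));
  struct slot_list *l = calloc (1, sizeof *l + len * sizeof l->slots[0]);
  if (l != NULL)
    l->len = len;
  return l;
}
```

Note that sizeof (struct slot_list) does not include the array, which is why tools that compute type sizes need to treat such a member specially.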
Adhemerval Zanella
42b926d303 Fix clock_nanosleep when interrupted by a signal
This patch fixes the time64 support (added by 2e44b10b42) where it
misses the remaining argument updated if __NR_clock_nanosleep
returns EINTR.

Checked on i686-linux-gnu on 4.15 kernel (no time64 support) and
on 5.3 kernel (with time64 support).

Reviewed-by: Alistair Francis <alistair23@gmail.com>
2019-11-11 16:47:20 -03:00
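The remaining-time argument matters to callers that restart an interrupted sleep, as in the sketch below. The function name sleep_fully is hypothetical. Note that clock_nanosleep returns the error number directly instead of setting errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for the relative interval REQ, restarting with the remaining
   time whenever a signal interrupts the sleep.  Returns 0 or an error
   number.  */
static int
sleep_fully (struct timespec req)
{
  struct timespec rem;
  int rc;

  assert (req.tv_nsec >= 0 && req.tv_nsec < 1000000000L);
  while ((rc = clock_nanosleep (CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
    req = rem;   /* Only meaningful if the library fills in REM.  */
  return rc;
}
```

If the remaining time were not updated on EINTR, a loop like this would restart from an indeterminate interval.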
Arjun Shankar
f0f0d79ac3 libio/tst-fopenloc: Use xsetlocale, xfopen, and xfclose
2019-11-11 17:40:46 +01:00
Arjun Shankar
cce35a50c1 support: Add xsetlocale function
2019-11-11 17:40:46 +01:00