Added support for TX lock elision of pthread mutexes on s390 and
s390x. This may improve lock scaling of existing programs on TX
capable systems. The lock elision code is only built with
--enable-lock-elision=yes and then requires a GCC version supporting
the TX builtins. With lock elision default mutexes are elided via
__builtin_tbegin, if the cpu supports transactions. By default lock
elision is not enabled and the elision code is not built.
lll_unlock() will be called again if it goes to "wake_all" in
pthread_cond_broadcast(). This may make another thread which is
waiting for lock in pthread_cond_timedwait() unlock. So there are
more than one threads get the lock, it will break the shared data.
It's introduced by commit 8313cb997d2d("FUTEX_*_REQUEUE_PI support for
non-x86 code")
This patch removes the arch specific powerpc implementation and instead
uses the linux default one. Although the current powerpc implementation
already constains the required memory barriers for correct
initialization, the default implementation shows a better performance on
newer chips.
[BZ #15215] This unifies various pthread_once architecture-specific
implementations which were using the same algorithm with slightly different
implementations. It also adds missing memory barriers that are required for
correctness.
This patch moves the __PTHREAD_SPINS definition to arch specific header
since pthread_mutex_t layout is also arch specific. This leads to no
need to defining __PTHREAD_MUTEX_HAVE_ELISION and thus removing of the
undefined compiler warning.
At this point, we can only abort the process because we have already
switched credentials on other threads. Returning an error would still
leave the process in an inconsistent state.
The new xtest needs root privileges to run.
When profiling programs with lock problems with perf record -g dwarf,
libunwind can currently not backtrace through the futex and unlock
functions in pthread. This is because they use out of line sections,
and those are not correctly described in dwarf2 (I believe needs
dwarf3 or 4).
This patch first removes the out of line sections. They only save a
single jump, but cause a lot of pain. Then it converts the now inline
lock code to use the now standard gas .cfi_* commands.
With these changes libunwind/perf can backtrace through the futex
functions now.
Longer term it would be likely better to just use C futex() functions
on x86 like all the other architectures. This would clean the code up
even more.
ChangeLog:
2014-03-17 Will Newton <will.newton@linaro.org>
* nptl/sysdeps/pthread/pthread.h: Check
__PTHREAD_MUTEX_HAVE_ELISION is defined before testing
its value.
We got rid of LinuxThreads in 2005, but we didn't remove
__LT_SPINLOCK_INIT back then. Do it now.
* nptl/sysdeps/pthread/bits/libc-lockP.h [defined NOT_IN_libc
&& !defined IS_IN_libpthread && __LT_SPINNOCK_INIT != 0]:
Remove.
This is one of a several NPTL patches to build glibc on hppa.
The pthread_attr_[sg]etstack functions are defined by POSIX as
taking a stackaddr that is the lowest addressable byte of the
storage used for the stack. However, the internal iattr variable
of the same name in NPTL is actually the final stack address
as usable in the stack pointer for the machine. Therefore the
NPTL implementation must add and subtract stacksize for
_STACK_GROWS_DOWN architectures. HPPA is a _STACK_GROWS_UP
architecture and doesn't need to add or subtract anything,
the stack address *is* the lowest addressable byte of the
storage.
Tested on hppa-linux-gnu, with no regressions.
Can't impact any other targets because of the conditionals.
If nobody objects I'll check this in at the end of the week.
I can't see there being any objections to this patch except
that it introduces more code to maintain for an old architecture
(perhaps we'll get another _S_G_U target in the future?).
This patch systematically renames miscellaneous tests so their outputs
use a *.out name (unless the test is just running some glibc program
with its conventional output file name, rather than a special program
at all, as in catgets tests generating *.cat). In the case of the
iconv test test-iconvconfig, output is redirected where it wasn't
before.
In various places the "generated" variable is updated to reflect the
revised test names; in iconvdata/Makefile a typo (mmtrace-tst-loading)
is also fixed. resolv/Makefile sets both "generate" (which appears
unused) and "generated". Bitrot in the settings of these variables
could no doubt be fixed so that "make clean" after build and testing
leaves results the same as after configure (and indeed the
tests-special / xtests-special variables could be used to simplify
things, by removing those files automatically rather than listing them
manually in these variables), and "make distclean" leaves an empty
build directory, but right now it appears various files don't get
deleted. I think they are liable to continue to bitrot in the absence
of routine testing that these targets actually work, given that
building in the source directory isn't supported and that was the main
use of such makefile targets.
Tested x86_64.
* elf/Makefile (tests-special): Rename tests to end with .out.
($(objpfx)noload-mem): Likewise.
($(objpfx)tst-leaks1-mem): Likewise.
($(objpfx)tst-leaks1-static-mem.out): Likewise.
* iconv/Makefile (xtests-special): Change test-iconvconfig to
$(objpfx)test-iconvconfig.out.
(test-iconvconfig): Change to $(objpfx)test-iconvconfig.out. Use
set -e inside subshell and redirect output to file.
* iconvdata/Makefile (generated): Rename tests to end with .out.
Correct type.
(tests-special): Rename tests to end with .out.
($(objpfx)mtrace-tst-loading): Likewise.
* intl/Makefile (generated): Likewise.
(tests-special): Likewise.
($(objpfx)mtrace-tst-gettext): Likewise.
* misc/Makefile (generated): Likewise.
(tests-special): Likewise.
($(objpfx)tst-error1-mem): Likewise.
* nptl/Makefile (tests-special): Likewise.
($(objpfx)tst-stack3-mem): Likewise.
(generated): Likewise.
* posix/Makefile (generated): Likewise.
(tests-special): Likewise.
(xtests-special): Likewise.
($(objpfx)tst-fnmatch-mem): Likewise.
($(objpfx)bug-regex2-mem): Likewise.
($(objpfx)bug-regex14-mem): Likewise.
($(objpfx)bug-regex21-mem): Likewise.
($(objpfx)bug-regex31-mem): Likewise.
($(objpfx)tst-vfork3-mem): Likewise.
($(objpfx)tst-rxspencer-no-utf8-mem): Likewise.
($(objpfx)tst-pcre-mem): Likewise.
($(objpfx)tst-boost-mem): Likewise.
($(objpfx)bug-ga2-mem): Likewise.
($(objpfx)bug-glob2-mem): Likewise.
* resolv/Makefile (generate): Likewise.
(tests-special): Likewise.
(xtests-special): Likewise.
(generated): Likewise.
($(objpfx)mtrace-tst-leaks): Likewise.
($(objpfx)mtrace-tst-leaks2): Likewise.
localedata:
* Makefile (generated): Rename tests to end with .out.
(tests-special): Likewise.
($(objpfx)mtrace-tst-leaks): Likewise.
This patch is a revised and updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00196.html>.
In order to generate overall summaries of the results of all tests in
the glibc testsuite, we need to identify and concatenate the files
with the results of individual tests.
Tomas Dohnalek's patch used $(common-objpfx)*/*.test-result for this.
However, the normal glibc approach is explicit enumeration of the
expected set of files with a given property, rather than all files
matching some pattern like that. Furthermore, we would like to be
able to mark tests as UNRESOLVED if the file with their results is for
some reason missing, and in future we would like to be able to mark
tests as UNSUPPORTED if they are disabled for a particular
configuration (rather than simply having them missing from the list of
tests as at present). Such handling of tests that were not run or did
not record results requires an explicit enumeration of tests.
For the tests following the default makefile rules, $(tests) (and
$(xtests)) provides such an enumeration. Others, however, are added
directly as dependencies of the "tests" and "xtests" makefile
targets. This patch changes the makefiles to put them in variables
tests-special and xtests-special, with appropriate dependencies on the
tests listed there then being added centrally.
Those variables are used in Rules and so need to be set before Rules
is included in a subdirectory makefile, which is often earlier in the
makefile than the dependencies were present before. We previously
discussed the question of where to include Rules; see the question at
<https://sourceware.org/ml/libc-alpha/2012-11/msg00798.html>, and a
discussion in
<https://sourceware.org/ml/libc-alpha/2013-01/msg00337.html> of why
Rules is included early rather than late in subdirectory makefiles.
It was necessary to avoid an indirection through the check-abi target
and get the check-abi-* targets for individual libraries into the
tests-special variable. The intl/ test $(objpfx)tst-gettext.out,
previously built only because of dependencies from other tests, was
also added to tests-special for the same reason.
The entries in tests-special are the full makefile targets, complete
with $(objpfx) and .out. If a future change causes tests to be named
consistently with a .out suffix, this can be changed to include just
the path relative to $(objpfx), without .out.
Tested x86_64, including that the same set of files is generated in
the build directory by a build and testsuite run both before and after
the patch (except for changes to the
elf/tst-null-argv.debug.out.<number> file name), and a build with
run-built-tests=no to verify there aren't any more obvious instances
of the issue Marcus Shawcroft reported with a previous version in
<https://sourceware.org/ml/libc-alpha/2014-01/msg00462.html>.
* Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(tests): Depend on $(tests-special).
* Makerules (check-abi-list): New variable.
(check-abi): Depend on $(check-abi-list).
[$(subdir) = elf] (tests-special): Add
$(objpfx)check-abi-libc.out.
[$(build-shared) = yes && subdir] (tests-special): Add
$(check-abi-list).
[$(build-shared) = yes && subdir] (tests): Do not depend on
check-abi.
* Rules (tests): Depend on $(tests-special).
(xtests): Depend on $(xtests-special).
* catgets/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* conform/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* elf/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* grp/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* iconv/Makefile (xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* iconvdata/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* intl/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable. Also add
$(objpfx)tst-gettext.out.
* io/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* libio/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* malloc/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* misc/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* nptl/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* nptl_db/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* posix/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* resolv/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* stdio-common/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(do-tst-unbputc): Remove target.
(do-tst-printf): Likewise.
* stdlib/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* string/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* sysdeps/x86/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
localedata:
* Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
In <https://sourceware.org/ml/libc-alpha/2014-01/msg00196.html> I
noted it was necessary to add includes of Makeconfig early in various
subdirectory makefiles for the tests-special variable settings added
by that patch to be conditional on configuration information. No-one
commented on the general question there of whether Makeconfig should
always be included immediately after the definition of subdir.
This patch implements that early inclusion of Makeconfig in each
directory (which is a lot easier than consistent placement of includes
of Rules). Includes are added if needed, or moved up if already
present. Subdirectory "all:" targets are removed, since Makeconfig
provides one.
There is potential for further cleanups I haven't done. Rules and
Makerules have code such as
ifneq "$(findstring env,$(origin headers))" ""
headers :=
endif
to override to empty any value of various variables that came from the
environment. I think there is a case for Makeconfig setting all the
subdirectory variables (other than subdir) to empty to ensure no
outside value is going to take effect if a subdirectory fails to
define a variable. (A list of such variables, possibly out of date
and incomplete, is in manual/maint.texi.) Rules and Makerules would
give errors if Makeconfig hadn't already been included, instead of
including it themselves. The special code to override values coming
from the environment would then be obsolete and could be removed.
Tested x86_64, including that installed binaries are identical before
and after the patch.
* argp/Makefile: Include Makeconfig immediately after defining
subdir.
* assert/Makefile: Likewise.
* benchtests/Makefile: Likewise.
* catgets/Makefile: Likewise.
* conform/Makefile: Likewise.
* crypt/Makefile: Likewise.
* csu/Makefile: Likewise.
(all): Remove target.
* ctype/Makefile: Include Makeconfig immediately after defining
subdir.
* debug/Makefile: Likewise.
* dirent/Makefile: Likewise.
* dlfcn/Makefile: Likewise.
* gmon/Makefile: Likewise.
* gnulib/Makefile: Likewise.
* grp/Makefile: Likewise.
* gshadow/Makefile: Likewise.
* hesiod/Makefile: Likewise.
* hurd/Makefile: Likewise.
(all): Remove target.
* iconvdata/Makefile: Include Makeconfig immediately after
defining subdir.
* inet/Makefile: Likewise.
* intl/Makefile: Likewise.
* io/Makefile: Likewise.
* libio/Makefile: Likewise.
(all): Remove target.
* locale/Makefile: Include Makeconfig immediately after defining
subdir.
* login/Makefile: Likewise.
* mach/Makefile: Likewise.
(all): Remove target.
* malloc/Makefile: Include Makeconfig immediately after defining
subdir.
(all): Remove target.
* manual/Makefile: Include Makeconfig immediately after defining
subdir.
* math/Makefile: Likewise.
* misc/Makefile: Likewise.
* nis/Makefile: Likewise.
* nss/Makefile: Likewise.
* po/Makefile: Likewise.
(all): Remove target.
* posix/Makefile: Include Makeconfig immediately after defining
subdir.
* pwd/Makefile: Likewise.
* resolv/Makefile: Likewise.
* resource/Makefile: Likewise.
* rt/Makefile: Likewise.
* setjmp/Makefile: Likewise.
* shadow/Makefile: Likewise.
* signal/Makefile: Likewise.
* socket/Makefile: Likewise.
* soft-fp/Makefile: Likewise.
* stdio-common/Makefile: Likewise.
* stdlib/Makefile: Likewise.
* streams/Makefile: Likewise.
* string/Makefile: Likewise.
* sunrpc/Makefile: Likewise.
(all): Remove target.
* sysvipc/Makefile: Include Makeconfig immediately after defining
subdir.
* termios/Makefile: Likewise.
* time/Makefile: Likewise.
* timezone/Makefile: Likewise.
(all): Remove target.
* wcsmbs/Makefile: Include Makeconfig immediately after defining
subdir.
* wctype/Makefile: Likewise.
libidn/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
localedata/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
(all): Remove target.
nptl/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
nptl_db/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
This patch splits makefile rules that generate a file then run cmp to
check the contents of that file into separate rules to generate and
compare the file. This simplifies making those tests generate PASS /
FAIL results, by removing the need to insert && between commands in
the test so that a $(evaluate-test) call is reached. It also avoids
the oddity of the .out file being an intermediate file rather than the
final result generated, as noted for some of these tests in
<https://sourceware.org/ml/libc-alpha/2012-10/msg00894.html>.
In many cases, the rule to run the program was no longer needed
because the default rules for running test programs on the host to
generate a .out file sufficed. (I'm not asserting the commands run
after this patch are *exactly* the same as before, simply that the
rules did nothing special that appeared deliberate or relevant to
anything about what the tests were testing. In cases where the rules
redirected stderr as well as stdout, I left the existing rule's
redirection in place to avoid changing what gets compared with the
expected results.)
It's clear there is a lot in common between the various -cmp.out rules
and it might be possible in future to refactor them into more generic
support for the case of comparing test output against a baseline.
(Some baselines are *.exp, some *.expect, some directly embedded in
the makefiles, and nptl/tst-cleanupx0.expect appears unused.)
Tested x86_64.
* elf/Makefile ($(objpfx)order.out): Remove rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)order-cmp.out.
($(objpfx)order-cmp.out): New rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)tst-array1-cmp.out, $(objpfx)tst-array1-static-cmp.out,
$(objpfx)tst-array2-cmp.out, $(objpfx)tst-array3-cmp.out,
$(objpfx)tst-array4-cmp.out, $(objpfx)tst-array5-cmp.out and
$(objpfx)tst-array5-static-cmp.out.
($(objpfx)tst-array1.out): Remove rule.
($(objpfx)tst-array1-cmp.out): New rule.
($(objpfx)tst-array1-static.out): Remove rule.
($(objpfx)tst-array1-static-cmp.out): New rule.
($(objpfx)tst-array2.out): Remove rule.
($(objpfx)tst-array2-cmp.out): New rule.
($(objpfx)tst-array3.out): Remove rule.
($(objpfx)tst-array3-cmp.out): New rule.
($(objpfx)tst-array4.out): Remove rule.
($(objpfx)tst-array4-cmp.out): New rule.
($(objpfx)tst-array5.out): Remove rule.
($(objpfx)tst-array5-cmp.out): New rule.
($(objpfx)tst-array5-static.out): Remove rule.
($(objpfx)tst-array5-static-cmp.out): New rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)order2-cmp.out.
($(objpfx)order2.out): Remove rule.
($(objpfx)order2-cmp.out): New rule.
($(objpfx)tst-initorder.out): Remove rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)tst-initorder-cmp.out.
($(objpfx)tst-initorder-cmp.out): New rule.
($(objpfx)tst-initorder2.out): Remove rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)tst-initorder2-cmp.out.
($(objpfx)tst-initorder2-cmp.out): New rule.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)tst-unused-dep-cmp.out.
($(objpfx)tst-unused-dep-cmp.out): Do not run cmp.
($(objpfx)tst-unused-dep-cmp.out): New rule.
* stdio-common/Makefile [$(run-built-tests) = yes] (tests): Depend
on $(objpfx)tst-setvbuf1-cmp.out.
($(objpfx)tst-setvbuf1.out): Do not run cmp.
($(objpfx)tst-setvbuf1-cmp.out): New rule.
* string/Makefile [$(run-built-tests) = yes] (tests): Depend
$(objpfx)tst-svc-cmp.out instead of $(objpfx)tst-svc.out.
($(objpfx)tst-svc.out): Remove rule.
($(objpfx)tst-svc-cmp.out): New rule.
nptl:
* Makefile ($(objpfx)tst-cleanup0.out): Do not run cmp.
[$(run-built-tests) = yes] (tests): Depend on
$(objpfx)tst-cleanup0-cmp.out.
($(objpfx)tst-cleanup0-cmp.out): New rule.
Support for /proc/self/task/$tid/comm as added in Linux 2.6.33,
therefore since the test tst-setgetname relies on this functionality
to operate we must skip the test in kernels < 2.6.33. We wrap the
checks with __ASSUME_PROC_PID_TASK_COMM such that in the future when
we move arch_minimum_kernel to 2.6.33 we can remove this code.
TLS in a dlopened object works fine when accessed from a signal
handler. The default kernel scheduling parameters prevents the
testcase to finish within the 4 seconds.
Tested the bigger timeout on s390 and s390x.
Since asynchronous cancellation was removed from system by
commit c4dd57c300
Author: Ondřej Bílka <neleai@seznam.cz>
Date: Tue Jan 14 16:07:50 2014 +0100
Do not enable asynchronous cancellation in system. Fixes bug 14782.
We needlessly enabled thread cancellation before it was necessary.
As
only call that needs to be guarded is waitpid which is cancellation
point we could remove cancellation altogether.
we shouldn't check asynchronous cancellation on system.
[BZ #14782]
* tst-cancel-wrappers.sh: Remove system.
This commit adds a testcase for pthread_setname_np
and pthread_getname_np. The testcase itself has
four tests to validate that these functions work
as expected. The test is only enabled for Linux
since it requires access to an alternate method
for validating the functions work.
This updates glibc for the changes in the ELFv2 relating to the
stack frame layout. These are described in more detail here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
Specifically, the "compiler and linker doublewords" were removed,
which has the effect that the save slot for the TOC register is
now at offset 24 rather than 40 to the stack pointer.
In addition, a function may now no longer necessarily assume that
its caller has set up a 64-byte register save area its use.
To address the first change, the patch goes through all assembler
files and replaces immediate offsets in instructions accessing the
ABI-defined stack slots by symbolic offsets. Those already were
defined in ucontext_i.sym and used in some of the context routines,
but that doesn't really seem like the right place for those defines.
The patch instead defines those symbolic offsets in sysdeps.h,
in two variants for the old and new ABI, and uses them systematically
in all assembler files, not just the context routines.
The second change only affected a few assembler files that used
the save area to temporarily store some registers. In those
cases where this happens within a leaf function, this patch
changes the code to store those registers to the "red zone"
below the stack pointer. Otherwise, the functions already allocate
a stack frame, and the patch changes them to add extra space in
these frames as temporary space for the ELFv2 ABI.
The TCB header on Intel contains a field __private_ss that is used
to efficiently implement the -fsplit-stack GCC feature.
In order to prepare for a possible future implementation of that
feature on powerpc64, we'd like to reserve a similar field in
the TCB header as well. (It would be good if this went in with
or before the ELFv2 patches to ensure that this field will be
available always in the ELFv2 environment.)
The field needs to be added at the front of tcbhead_t structure
to avoid changing the ABI; see the recent discussion when adding
the EBB fields.
Autoconf has been deprecating configure.in for quite a long time.
Rename all our configure.in and preconfigure.in files to .ac.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
http://sourceware.org/ml/libc-alpha/2013-08/msg00096.html
This adds the basic configury bits for powerpc64le and powerpcle.
* configure.in: Map powerpc64le and powerpcle to base_machine/machine.
* configure: Regenerate.
* nptl/shlib-versions: Powerpc*le starts at 2.18.
* shlib-versions: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00090.html
This patch fixes symbol versioning in setjmp/longjmp. The existing
code uses raw versions, which results in wrong symbol versioning when
you want to build glibc with a base version of 2.19 for LE.
Note that the merging the 64-bit and 32-bit versions in novmx-lonjmp.c
and pt-longjmp.c doesn't result in GLIBC_2.0 versions for 64-bit, due
to the base in shlib_versions.
* sysdeps/powerpc/longjmp.c: Use proper symbol versioning macros.
* sysdeps/powerpc/novmx-longjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/mcount.c: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* nptl/sysdeps/unix/sysv/linux/powerpc/pt-longjmp.c: Likewise.
Fixes BZ #15988.
The check had a typo - it checked for PTHREAD_MUTEX_ROBUST_NP instead
of PTHREAD_MUTEX_ROBUST_NORMAL_NP. It has now been replaced by the
already existing convenience macro USE_REQUEUE_PI.
Resolves#15921
The test case nptl/tst-cleanup2 fails on s390x and power6 due to
instruction sheduling in gcc. This was reported in gcc:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58034
but it was concluded that gcc is allowed to assume that the first
argument to sprintf is a character array - NULL not being a valid
character array.
PTHREAD_MUTEX_NORMAL requires deadlock for nesting, DEFAULT
does not. Since glibc uses the same value (0) disable elision
for any call to pthread_mutexattr_settype() with a 0 value.
This implies that a program can disable elision by doing
pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_NORMAL)
Based on a original proposal by Rich Felker.
Add elision paths to the basic mutex locks.
The normal path has a check for RTM and upgrades the lock
to RTM when available. Trylocks cannot automatically upgrade,
so they check for elision every time.
We use a 4 byte value in the mutex to store the lock
elision adaptation state. This is separate from the adaptive
spin state and uses a separate field.
Condition variables currently do not support elision.
Recursive mutexes and condition variables may be supported at some point,
but are not in the current implementation. Also "trylock" will
not automatically enable elision unless some other lock call
has been already called on the lock.
This version does not use IFUNC, so it means every lock has one
additional check for elision. Benchmarking showed the overhead
to be negligible.
tst-mutex5 and 8 test some behaviour not required by POSIX,
that elision changes. This changes these tests to not check
this when elision is enabled at configure time.
Add Enable/disable flags used internally
Extend the mutex initializers to have the fields needed for
elision. The layout stays the same, and this is not visible
to programs.
These changes are not exposed outside pthread
Lock elision using TSX is a technique to optimize lock scaling
It allows to run locks in parallel using hardware support for
a transactional execution mode in 4th generation Intel Core CPUs.
See http://www.intel.com/software/tsx for more Information.
This patch implements a simple adaptive lock elision algorithm based
on RTM. It enables elision for the pthread mutexes and rwlocks.
The algorithm keeps track whether a mutex successfully elides or not,
and stops eliding for some time when it is not.
When the CPU supports RTM the elision path is automatically tried,
otherwise any elision is disabled.
The adaptation algorithm and its tuning is currently preliminary.
The code adds some checks to the lock fast paths. Micro-benchmarks
show little to no difference without RTM.
This patch implements the low level "lll_" code for lock elision.
Followon patches hook this into the pthread implementation
Changes with the RTM mutexes:
-----------------------------
Lock elision in pthreads is generally compatible with existing programs.
There are some obscure exceptions, which are expected to be uncommon.
See the manual for more details.
- A broken program that unlocks a free lock will crash.
There are ways around this with some tradeoffs (more code in hot paths)
I'm still undecided on what approach to take here; have to wait for testing reports.
- pthread_mutex_destroy of a lock mutex will not return EBUSY but 0.
- There's also a similar situation with trylock outside the mutex,
"knowing" that the mutex must be held due to some other condition.
In this case an assert failure cannot be recovered. This situation is
usually an existing bug in the program.
- Same applies to the rwlocks. Some of the return values changes
(for example there is no EDEADLK for an elided lock, unless it aborts.
However when elided it will also never deadlock of course)
- Timing changes, so broken programs that make assumptions about specific timing
may expose already existing latent problems. Note that these broken programs will
break in other situations too (loaded system, new faster hardware, compiler
optimizations etc.)
- Programs with non recursive mutexes that take them recursively in a thread and
which would always deadlock without elision may not always see a deadlock.
The deadlock will only happen on an early or delayed abort (which typically
happens at some point)
This only happens for mutexes not explicitely set to PTHREAD_MUTEX_NORMAL
or PTHREAD_MUTEX_ADAPTIVE_NP. PTHREAD_MUTEX_NORMAL mutexes do not elide.
The elision default can be set at configure time.
This patch implements the basic infrastructure for elision.
Static applications that call pthread_exit on the main
thread segfault. This is because after a thread terminates
__libc_start_main decrements __nptl_nthreads which is only
defined in pthread_create. Therefore the right solution is
to add a requirement to pthread_create from pthread_exit.
~~~
nptl/
2013-06-24 Vladimir Nikulichev <v.nikulichev@gmail.com>
[BZ #12310]
* pthread_exit.c: Add reference to pthread_create.
This patch introduces two new convenience functions to set the default
thread attributes used for creating threads. This allows a programmer
to set the default thread attributes just once in a process and then
run pthread_create without additional attributes.
Resolves BZ #15618.
pthread_attr_getaffinity_np may write beyond bounds of the input
cpuset buffer if the size of the input buffer is smaller than the
buffer present in the input pthread attributes. Fix is to copy to the
extent of the minimum of the source and the destination.
It is very very possible that the futex syscall returns an
error and that the caller of lll_futex_wake may want to
look at that error and propagate the failure.
This patch allows a caller to see the syscall error.
There are no users of the syscall error at present, but
future cleanups are now be able to check for the error.
--
nplt/
2013-06-10 Carlos O'Donell <carlos@redhat.com>
* sysdeps/unix/sysv/linux/i386/lowlevellock.h
(lll_futex_wake): Return syscall error.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
(lll_futex_wake): Return syscall error.
The sem_post.c file uses atomic functions without including
atomic.h. Add `#include <atomic.h>' to the file to prevent
any compile time warnings when other headers change and
atomic.h isn't implicitly included.
---
nptl/
2013-04-07 Carlos O'Donell <carlos@redhat.com>
* sysdeps/unix/sysv/linux/sem_post.c: Include atomic.h.
Fixes BZ #15337.
Static builds fail with the following warning:
/home/tools/glibc/glibc/nptl/../nptl/sysdeps/unix/sysv/linux/x86_64/cancellation.S:80:
undefined reference to `__GI___pthread_unwind'
When the source is configured with --disable-hidden-plt. This is
because the preprocessor conditional in cancellation.S only checks if
the build is for SHARED, whereas hidden_def is defined appropriately
only for a SHARED build that will have symbol versioning *and* hidden
defs are enabled. The last case is false here.
This reverts the change that allows the POSIX Thread default stack size
to be changed by the environment variable
GLIBC_PTHREAD_DEFAULT_STACKSIZE. It has been requested that more
discussion happen before this change goes into 2.18.
This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S
(__pthread_cond_timedwait): If possible use FUTEX_WAIT_BITSET to
directly use absolute timeout.
Since the FUTEX_WAIT operation takes a relative timeout, the
pthread_cond_timedwait and other timed function implementations have
to get a relative timeout from the absolute timeout parameter it gets
before it makes the futex syscall. This value is then converted back
into an absolute timeout within the kernel. This is a waste and has
hence been improved upon by a FUTEX_WAIT_BITSET operation (OR'd with
FUTEX_CLOCK_REALTIME to make the kernel use the realtime clock instead
of the default monotonic clock). This was implemented only in the x86
and sh assembly code and not in the C code. This patch implements
support for FUTEX_WAIT_BITSET whenever available (since linux-2.6.29)
for s390 and powerpc.
nptl/
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h (BUSY_WAIT_NOP):
Define when we have v9 instructions available.
* sysdeps/unix/sysv/linux/sparc/sparc64/cpu_relax.S: New file.
* sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/cpu_relax.S: New
file.
* sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/Makefile: New
file.
* sysdeps/unix/sysv/linux/sparc/sparc64/Makefile: Add cpu_relax
to libpthread-routines.
The macro pthread_cleanup_push_defer_np in pthread.h has a misaligned
line continuation marker. This marker was previously aligned, but
recent changes have moved it out of alignment. This change realigns
the marker. This also reduces the diff against the hppa version of
pthread.h where the marker is aligned.
[BZ #14652]
When a thread waiting in pthread_cond_wait with a PI mutex is
cancelled after it has returned successfully from the futex syscall
but just before async cancellation is disabled, it enters its
cancellation handler with the mutex held and simply calling a
mutex_lock again will result in a deadlock. Hence, it is necessary to
see if the thread owns the lock and try to lock it only if it doesn't.
[BZ #14568]
* sysdeps/sparc/tls.h (DB_THREAD_SELF_INCLUDE): Delete.
(DB_THREAD_SELF): Use constants for the register offsets. Correct
the case of a 64-bit debugger with a 32-bit inferior.
[BZ #14417]
A futex call with FUTEX_WAIT_REQUEUE_PI returns with the mutex locked
on success. If such a successful thread is pipped to the cond_lock by
another spuriously woken waiter, it could be sent back to wait on the
futex with the mutex lock held, thus causing a deadlock. So it is
necessary that the thread relinquishes the mutex before going back to
sleep.
[BZ #14477]
Add an additional entry in the exception table to jump to
__condvar_w_cleanup2 instead of __condvar_w_cleanup for PI mutexes
when %ebx contains the address of the futex instead of the condition
variable.
Ref gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10
Release barriers are needed to ensure that any memory written by
init_routine is seen by other threads before *once_control changes.
In the case of clear_once_control we need to flush any partially
written state.
In some cases, the compiler would optimize out the call to
allocate_and_test and thus result in a false positive for the test
case. Another problem was the fact that the compiler could in some
cases generate additional shifting of the stack pointer, resulting in
alloca moving the stack pointer beyond what is allowed by the
rlimit. Hence, accessing the stackaddr returned by pthread_getattr_np
is safer than relying on the alloca'd result.
Another problem is when RLIMIT may be very large, which may result in
violation of other resource limits. Hence we cap the max stack size to
8M for this test.
When rlimit is small enough to be used as the stacksize to be returned
in pthread_getattr_np, cases where a stack is made executable due to a
DSO load get stack size that is larger than what the kernel
allows. This is because in such a case the stack size does not account
for the pages that have auxv and program arguments.
Additionally, the stacksize for the process derived from this should
be truncated to align to page size to avoid going beyond rlimit.
When a stack is marked executable due to loading a DSO that requires
an executable stack, the logic tends to leave out a portion of stack
after the first frame, thus causing a difference in the value returned
by pthread_getattr_np before and after the stack is marked
executable. It ought to be possible to fix this by marking the rest of
the stack as executable too, but in the interest of marking as less of
the stack as executable as possible, the path this fix takes is to
make pthread_getattr_np also look at the first frame as the underflow
end of the stack and compute size and stack top accordingly.
The above happens only for the main process stack. NPTL thread stacks
are not affected by this change.