Commit Graph

243 Commits

Author SHA1 Message Date
Adhemerval Zanella
f338c7c5f5 math: Use log10p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic log10p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      68.5251        32.2627        52.92%
x86_64v2                    68.8912        32.7887        52.41%
x86_64v3                    59.3427        27.0521        54.41%
i686                        162.026        103.383        36.19%
aarch64                     26.8513        14.5695        45.74%
power10                     12.7426         8.4929        33.35%
powerpc                     16.6768        9.29135        44.29%

reciprocal-throughput        master        patched   improvement
x86_64                      26.0969        12.4023        52.48%
x86_64v2                    25.0045        11.0748        55.71%
x86_64v3                    20.5610        10.2995        49.91%
i686                        89.8842        78.5211        12.64%
aarch64                     17.1200         9.4832        44.61%
power10                      6.7814         6.4258         5.24%
powerpc                      15.769         7.6825        51.28%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:40 -03:00
Adhemerval Zanella
8ae9e51376 math: Use log1pf from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows slight better performance to the generic log1pf.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (M1,
gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      71.8142        38.9668        45.74%
x86_64v2                    71.9094        39.1321        45.58%
x86_64v3                    60.1000        32.4016        46.09%
i686                        147.105        104.258        29.13%
aarch64                     26.4439        14.0050        47.04%
power10                     19.4874         9.4146        51.69%
powerpc                     17.6145        8.00736        54.54%

reciprocal-throughput        master        patched   improvement
x86_64                      19.7604        12.7254        35.60%
x86_64v2                    19.0039        11.9455        37.14%
x86_64v3                    16.8559        11.9317        29.21%
i686                        82.3426        73.9718        10.17%
aarch64                     14.4665         7.9614        44.97%
power10                     11.9974         8.4117        29.89%
powerpc                     7.15222         6.0914        14.83%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella
c369580814 math: Use log2p1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic log2p1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      70.1462        47.0090        32.98%
x86_64v2                    70.2513        47.6160        32.22%
x86_64v3                    60.4840        39.9443        33.96%
i686                        164.068        122.909        25.09%
aarch64                     25.9169        16.9207        34.71%
power10                     18.1261        9.8592         45.61%
powerpc                     17.2683        9.38665        45.64%

reciprocal-throughput        master        patched   improvement
x86_64                      26.2240        16.4082        37.43%
x86_64v2                    25.0911        15.7480        37.24%
x86_64v3                    20.9371        11.7264        43.99%
i686                        90.4209        95.3073        -5.40%
aarch64                     16.8537        8.9561         46.86%
power10                     12.9401        6.5555         49.34%
powerpc                     9.01763        7.54745        16.30%

The performance decrease for i686 is mostly due the use of x87 fpu,
when building with '-msse2 -mfpmath=sse:

                             master        patched   improvement
latency                     164.068        102.982        37.23%
reciprocal-throughput       89.1968        82.5117         7.49%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:39 -03:00
Adhemerval Zanella
bbd578b38d math: Use expm1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic expm1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      96.7402        36.4026        62.37%
x86_64v2                    97.5391        33.4625        65.69%
x86_64v3                    82.1778        30.8668        62.44%
i686                         120.58        94.8302        21.35%
aarch64                     32.3558        12.8881        60.17%
power10                     23.5087        9.8574         58.07%
powerpc                     23.4776        9.06325        61.40%

reciprocal-throughput        master        patched   improvement
x86_64                      27.8224        15.9255        42.76%
x86_64v2                    27.8364        9.6438         65.36%
x86_64v3                    20.3227        9.6146         52.69%
i686                        63.5629        59.4718         6.44%
aarch64                     17.4838        7.1082         59.34%
power10                     12.4644        8.7829         29.54%
powerpc                     14.2152        5.94765        58.16%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella
5c22fd25c1 math: Use exp2m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp2m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  The
only change is to handle FLT_MAX_EXP for FE_DOWNWARD or FE_TOWARDZERO.

The benchmark inputs are based on exp2f ones.

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      40.6042        48.7104       -19.96%
x86_64v2                    40.7506        35.9032        11.90%
x86_64v3                    35.2301        31.7956        9.75%
i686                        102.094        94.6657        7.28%
aarch64                     18.2704        15.1387        17.14%
power10                     11.9444         8.2402        31.01%

reciprocal-throughput        master        patched   improvement
x86_64                      20.8683        16.1428        22.64%
x86_64v2                    19.5076        10.4474        46.44%
x86_64v3                    19.2106        10.4014        45.86%
i686                        56.4054        59.3004        -5.13%
aarch64                     12.0781         7.3953        38.77%
power10                      6.5306         5.9388         9.06%

The generic implementation calls __ieee754_exp2f and x86_64 provides
an optimized ifunc version (built with -mfma -mavx2, not correctly
rounded).  This explains the performance difference for x86_64.

Same for i686, where the ABI provides an optimized __ieee754_exp2f
version built with '-msse2 -mfpmath=sse'.  When built wth same
flags, the new algorithm shows a better performance:

                            master        patched    improvement
latency                    102.094        91.2823         10.59%
reciprocal-throughput      56.4054        52.7984          6.39%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:35 -03:00
Adhemerval Zanella
5fa89852fa math: Use exp10m1f from CORE-MATH
The CORE-MATH implementation is correctly rounded (for any rounding mode)
and shows better performance compared to the generic exp10m1f.

The code was adapted to glibc style and to use the definition of
math_config.h (to handle errno, overflow, and underflow).  I mostly
fixed some small issues in corner cases (sNaN handling, -INFINITY,
a specific overflow check).

Benchtest on x64_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (Neoverse-N1,
gcc 13.3.1), and powerpc (POWER10, gcc 13.2.1):

Latency                      master        patched   improvement
x86_64                      45.4690        49.5845        -9.05%
x86_64v2                    46.1604        36.2665        21.43%
x86_64v3                    37.8442        31.0359        17.99%
i686                        121.367        93.0079        23.37%
aarch64                     21.1126        15.0165        28.87%
power10                     12.7426        8.4929         33.35%

reciprocal-throughput        master        patched   improvement
x86_64                      19.6005        17.4005        11.22%
x86_64v2                    19.6008        11.1977        42.87%
x86_64v3                    17.5427        10.2898        41.34%
i686                        59.4215        60.9675        -2.60%
aarch64                     13.9814        7.9173         43.37%
power10                      6.7814        6.4258          5.24%

The generic implementation calls __ieee754_exp10f which has an
optimized version, although it is not correctly rounded, which is
the main culprit of the the latency difference for x86_64 and
throughp for i686.

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-01 11:27:26 -03:00
Paul Zimmermann
392b3f0971 replace tgammaf by the CORE-MATH implementation
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes since there are
less than 2^32 values to check against for example GNU MPFR.
This patch also adds some bench values for tgammaf.

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code it gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the dealing with the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitted of the patch)

Changes in v9:
    - address comments from review by Adhemerval Zanella
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-10-11 11:12:32 +02:00
John David Anglin
3fc1d3bc33 hppa: Update libm-test-ulps 2024-09-09 09:57:42 -04:00
Adhemerval Zanella
745c3cc10f elf: Make dl-fptr and dl-symaddr hppa specific
With ia64 removal, the function descriptor supports is only used
by HPPA and new architectures do not seem leaning towards this
design.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-08-19 14:54:07 -03:00
John David Anglin
431c1be28e hppa: Update libm-test-ulps 2024-07-24 16:43:01 -04:00
John David Anglin
9dddb26954 Update hppa libm-test-ulps 2024-06-23 13:51:25 -04:00
John David Anglin
da61ba3f89 Update hppa libm-test-ulps 2024-06-20 19:44:04 -04:00
Andreas K. Hüttel
98ffc1bfeb
Convert to autoconf 2.72 (vanilla release, no distribution patches)
As discussed at the patch review meeting

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-17 21:15:28 +02:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Florian Weimer
4d4da5aab9 login: Check default sizes of structs utmp, utmpx, lastlog
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Joseph Myers
42cc619dfb Refer to C23 in place of C2X in glibc
WG14 decided to use the name C23 as the informal name of the next
revision of the C standard (notwithstanding the publication date in
2024).  Update references to C2X in glibc to use the C23 name.

This is intended to update everything *except* where it involves
renaming files (the changes involving renaming tests are intended to
be done separately).  In the case of the _ISOC2X_SOURCE feature test
macro - the only user-visible interface involved - support for that
macro is kept for backwards compatibility, while adding
_ISOC23_SOURCE.

Tested for x86_64.
2024-02-01 11:02:01 +00:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Bruno Haible
d082930272 hppa: Fix undefined behaviour in feclearexcept (BZ 30983)
The expression

  (excepts & FE_ALL_EXCEPT) << 27

produces a signed integer overflow when 'excepts' is specified as
FE_INVALID (= 0x10), because
  - excepts is of type 'int',
  - FE_ALL_EXCEPT is of type 'int',
  - thus (excepts & FE_ALL_EXCEPT) is (int) 0x10,
  - 'int' is 32 bits wide.

The patched code produces the same instruction sequence as
previosuly.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-12-19 15:12:38 -03:00
Adhemerval Zanella
55f41ef8de elf: Remove LD_PROFILE for static binaries
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects.  Since dlopen is deprecated for
static objects, it is better to remove the support.

It also allows to trim down libc.a of profile support.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-11-21 16:15:42 -03:00
Siddhesh Poyarekar
c6cb8783b5 configure: Use autoconf 2.71
Bump autoconf requirement to 2.71 to allow regenerating configure on
more recent distributions.  autoconf 2.71 has been in Fedora since F36
and is the current version in Debian stable (bookworm).  It appears to
be current in Gentoo as well.

All sysdeps configure and preconfigure scripts have also been
regenerated; all changes are trivial transformations that do not affect
functionality.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-07-17 10:08:10 -04:00
Adhemerval Zanella
dddc88587a sparc: Fix la_symbind for bind-now (BZ 23734)
The sparc ABI has multiple cases on how to handle JMP_SLOT relocations,
(sparc_fixup_plt/sparc64_fixup_plt).  For BINDNOW, _dl_audit_symbind
will be responsible to setup the final relocation value; while for
lazy binding _dl_fixup/_dl_profile_fixup will call the audit callback
and tail cail elf_machine_fixup_plt (which will call
sparc64_fixup_plt).

This patch fixes by issuing the SPARC specific routine on bindnow and
forwarding the audit value to elf_machine_fixup_plt for lazy resolution.
It fixes the la_symbind for bind-now tests on sparc64 and sparcv9:

  elf/tst-audit24a
  elf/tst-audit24b
  elf/tst-audit24c
  elf/tst-audit24d

Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2023-07-12 15:29:08 -03:00
John David Anglin
5000549746 Revert "hppa: Drop 16-byte pthread lock alignment"
This change reverts commits c4468cd399
and ab991a3d1b.
2023-07-06 15:47:50 +00:00
Paul Pluzhnikov
2cbeda847b Fix a few more typos I missed in previous round -- BZ 25337 2023-06-02 23:46:32 +00:00
Paul Pluzhnikov
65cc53fe7c Fix misspellings in sysdeps/ -- BZ 25337 2023-05-30 23:02:29 +00:00
Andreas Schwab
ea08d8dcea Remove last remnants of have-protected 2023-05-22 13:31:04 +02:00
Sam James
c8bd171caf hppa: Fix 'concurrency' typo in comment
Signed-off-by: Sam James <sam@gentoo.org>
2023-05-05 10:12:39 +01:00
John David Anglin
c4468cd399 hppa: Update struct __pthread_rwlock_arch_t comment.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-04-05 18:54:47 +00:00
John David Anglin
ab991a3d1b hppa: Drop 16-byte pthread lock alignment
Linux threads were removed about 12 years ago and the current
nptl implementation only requires 4-byte alignment for pthread
locks.

The 16-byte alignment causes various issues. For example in
building ignition-msgs, we have:

/usr/include/google/protobuf/map.h:124:37: error: static assertion failed
  124 |   static_assert(alignof(value_type) <= 8, "");
      |                 ~~~~~~~~~~~~~~~~~~~~^~~~

This is caused by the 16-byte pthread lock alignment.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2023-03-26 21:16:22 +00:00
Richard Henderson
c62b1c29c2 hppa: Add string-fza.h, string-fzc.h, and string-fzi.h
Use UXOR,SBZ to test for a zero byte within a word.  While we can
get semi-decent code out of asm-goto, we would do slightly better
with a compiler builtin.

For index_zero et al, sequential testing of bytes is less expensive than
any tricks that involve a count-leading-zeros insn that we don't have.

Checked on hppa-linux-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-02-06 16:19:35 -03:00
Richard Henderson
be836d9153 hppa: Add memcopy.h
GCC's combine pass cannot merge (x >> c | y << (32 - c)) into a
double-word shift unless (1) the subtract is in the same basic block
and (2) the result of the subtract is used exactly once.  Neither
condition is true for any use of MERGE.

By forcing the use of a double-word shift, we not only reduce
contention on SAR, but also allow the setting of SAR to be hoisted
outside of a loop.

Checked on hppa-linux-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-02-06 16:19:35 -03:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
Florian Weimer
1f34a23288 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249)
This makes it more likely that the compiler can compute the strlen
argument in _startup_fatal at compile time, which is required to
avoid a dependency on strlen this early during process startup.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2022-11-03 17:28:03 +01:00
John David Anglin
b7bd94068e hppa: Fix initialization of dp register [BZ 29635]
After upgrading glibc to Debian 2.35-1, gdb faulted on
startup and dropped core in a function call in the main
application.  This was caused by not initializing the
global dp register for the main application early enough.

Restore the code to initialize dp in _dl_start_user.
It was removed when code was added to initialize dp in
elf_machine_runtime_setup.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2022-10-01 19:49:25 +00:00
Wilco Dijkstra
22f4ab2d20 Use atomic_exchange_release/acquire
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire
since these map to the standard C11 atomic builtins.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-26 16:58:08 +01:00
Adhemerval Zanella
6242602273 hppa: Remove _dl_skip_args usage (BZ# 29165)
Different than other architectures, hppa creates an unrelated stack
frame where ld.so argc/argv adjustments done by ad43cac44a
is not done on the argc/argv saved/restore by _dl_start_user.

Instead load _dl_argc and _dl_argv directlty instead of adjust them
using _dl_skip_args value.

Checked on hppa-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-05-30 16:32:35 -03:00
Fangrui Song
098a657fe4 elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC
PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage
variables and hidden visibility variables in a shared object (ld.so)
need dynamic relocations (usually R_*_RELATIVE). PI (position
independent) in the macro name is a misnomer: a code sequence using GOT
is typically position-independent as well, but using dynamic relocations
does not meet the requirement.

Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new
ports will define PI_STATIC_AND_HIDDEN. Current ports defining
PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure
default.

No functional change.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-04-26 09:26:22 -07:00
Fangrui Song
3ee318c923 Remove -z combreloc and HAVE_Z_COMBRELOC
-z combreloc has been the default regadless of the architecture since
binutils commit f4d733664aabd7bd78c82895e030ec9779a92809 (2002). The
configure check added in commit fdde83499a (2001) has long been
unneeded.

We can therefore treat HAVE_Z_COMBRELOC as always 1 and delete dead code
paths in dl-machine.h files (many were copied from commit a711b01d34
and ee0cb67ec2).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-04-04 17:19:07 -07:00
John David Anglin
d2224ffbdd hppa: Fix warnings from _dl_lookup_address
This change fixes two warnings from _dl_lookup_address.

The first warning comes from dropping the volatile keyword from
desc in the call to _dl_read_access_allowed.  We now have a full
atomic barrier between loading desc[0] and the access check, so
desc no longer needs to be declared as volatile.

The second warning comes from the implicit declaration of
_dl_fix_reloc_arg.  This is fixed by including dl-runtime.h and
declaring _dl_fix_reloc_arg in dl-runtime.h.
2022-02-22 18:51:35 +00:00
John David Anglin
17c57d70bd hppa: Fix typo 2022-02-14 17:41:59 +00:00
John David Anglin
2e20cd63c9 Fix elf/tst-audit2 on hppa
The test elf/tst-audit2 fails on hppa with a segmentation fault in the
long branch stub used to call malloc from calloc.  This occurs because
the test is not a PIC executable and calloc is called from the dynamic
linker before the dp register is initialized in _dl_start_user.

The fix is to move the dp register initialization into
elf_machine_runtime_setup.  Since the address of $global$ can't be
loaded directly, we continue to use the DT_PLTGOT value from the
the main_map to initialize dp.
2022-02-14 15:14:49 +00:00
Adhemerval Zanella
9e94f57484 hppa: Fix bind-now audit (BZ #28857)
On hppa, a function pointer returned by la_symbind is actually a function
descriptor has the plabel bit set (bit 30).  This must be cleared to get
the actual address of the descriptor.  If the descriptor has been bound,
the first word of the descriptor is the physical address of theA function,
otherwise, the first word of the descriptor points to a trampoline in the
PLT.

This patch also adds a workaround on tests because on hppa (and it seems
to be the only ABI I have see it), some shared library adds a dynamic PLT
relocation to am empty symbol name:

$ readelf -r elf/tst-audit25mod1.so
[...]
Relocation section '.rela.plt' at offset 0x464 contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00002008  00000081 R_PARISC_IPLT                508
[...]

It breaks some assumptions on the test, where a symbol with an empty
name ("") is passed on la_symbind.

Checked on x86_64-linux-gnu and hppa-linux-gnu.
2022-02-09 08:47:42 -03:00
Adhemerval Zanella
32612615c5 elf: Issue la_symbind for bind-now (BZ #23734)
The audit symbind callback is not called for binaries built with
-Wl,-z,now or when LD_BIND_NOW=1 is used, nor the PLT tracking callbacks
(plt_enter and plt_exit) since this would change the expected
program semantics (where no PLT is expected) and would have performance
implications (such as for BZ#15533).

LAV_CURRENT is also bumped to indicate the audit ABI change (where
la_symbind flags are set by the loader to indicate no possible PLT
trace).

To handle powerpc64 ELFv1 function descriptor, _dl_audit_symbind
requires to know whether bind-now is used so the symbol value is
updated to function text segment instead of the OPD (for lazy binding
this is done by PPC64_LOAD_FUNCPTR on _dl_runtime_resolve).

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
powerpc64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-02-01 14:49:46 -03:00
Florian Weimer
6b0978c14a Restore ENTRY_POINT definition on hppa, ia64 (bug 28749)
ENTRY_POINT is still needed for elf/rtld.c.  Fixes commit 4fb4e7e821
("csu: Always use __executable_start in gmon-start.c").
2022-01-07 14:47:31 +01:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Adhemerval Zanella
83b8d5027d malloc: Remove memusage.h
And use machine-sp.h instead.  The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
8c0664e2b8 elf: Add _dl_audit_pltexit
It consolidates the code required to call la_pltexit audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
691d9ae9e6 Remove ununsed tcb-offset
Some architectures do not use the auto-generated tcb-offsets.h.
2021-12-17 17:47:29 -03:00
Siddhesh Poyarekar
23645707f1 Replace --enable-static-pie with --disable-default-pie
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain supports it.

Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.

Only the following architectures now have PIE disabled by default
because they do not work at the moment.  hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects.  The microblaze compiler is currently
failing with an ICE.  GNU hurd tries to enable static-pie, which does
not work and hence fails.  All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.

build-many-glibcs runs clean for all targets.  I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-12-13 08:08:59 +05:30
Florian Weimer
627f5ede70 Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN
TLS_INIT_TCB_ALIGN is not actually used.  TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
has not the same alignment as THREAD_SELF.  Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.

In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.

hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32.  However, it defines THREAD_SELF as:

/* Return the thread descriptor for the current thread.  */
# define THREAD_SELF \
  ({ struct pthread *__self;			\
	__self = __get_cr27();			\
	__self - 1;				\
   })

So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.

Similarly, on ia64, the definitions were different.  We have:

# define TLS_PRE_TCB_SIZE \
  (sizeof (struct pthread)						\
   + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t)		\
      ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1)	\
	 & ~(__alignof__ (struct pthread) - 1))				\
      : 0))
# define THREAD_SELF \
  ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE))

And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).

On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that.  The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.

This fixes commit 23c77f6018
("nptl: Increase default TCB alignment to 32").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-09 23:47:49 +01:00
Florian Weimer
ce2248ab91 nptl: Introduce <tcb-access.h> for THREAD_* accessors
These are common between most architectures.  Only the x86 targets
are outliers.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00