Commit Graph

9 Commits

Author SHA1 Message Date
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Adhemerval Zanella
4e00196912 aarch64: fix memset with --disable-multi-arch
* sysdeps/aarch64/memset.S (MEMSET): Define.
2017-12-20 12:05:32 +00:00
Siddhesh Poyarekar
5a67c4fa01 aarch64: Optimized memset for falkor
The generic memset reads dczid_el0 on every memset.  This has a
significant impact on falkor for a range of sizes because reading
dczid_el0 is slow.

The DZP bit in the dczid_el0 register does not change dynamically, so
it is safe to read once during program startup.  With this patch
dczid_el0 is read once during startup and zva_size is cached.  This is
used to invoke the falkor-specific memset; the generic memset routine
remains unchanged.

The gains due to this are significant for falkor, with run time
reductions as high as 48%.  Here's a sample from the falkor tests:

Function: memset
Variant: walk
                      simple_memset	__memset_falkor	__memset_generic
=====================================================================
length=256, char=0:   139.96 (-698.28%)	   9.07 ( 48.26%)  17.53
length=257, char=0:   140.50 (-699.03%)	   9.53 ( 45.80%)  17.58
length=258, char=0:   140.96 (-703.95%)	   9.58 ( 45.36%)  17.53
length=259, char=0:   141.56 (-705.16%)	   9.53 ( 45.79%)  17.58
length=260, char=0:   142.15 (-710.76%)	   9.57 ( 45.39%)  17.53
length=261, char=0:   142.50 (-710.39%)	   9.53 ( 45.78%)  17.58
length=262, char=0:   142.97 (-715.09%)	   9.57 ( 45.42%)  17.54
length=263, char=0:   143.51 (-716.18%)	   9.53 ( 45.80%)  17.58
length=264, char=0:   143.93 (-720.55%)	   9.58 ( 45.39%)  17.54
length=265, char=0:   144.56 (-722.07%)	   9.53 ( 45.80%)  17.59
length=266, char=0:   144.98 (-726.42%)	   9.58 ( 45.42%)  17.54
length=267, char=0:   145.53 (-727.53%)	   9.53 ( 45.80%)  17.59
length=268, char=0:   146.25 (-731.81%)	   9.53 ( 45.79%)  17.58
length=269, char=0:   146.52 (-735.39%)	   9.53 ( 45.66%)  17.54
length=270, char=0:   146.97 (-735.81%)	   9.53 ( 45.80%)  17.58
length=271, char=0:   147.54 (-741.08%)	   9.58 ( 45.38%)  17.54
length=512, char=0:   268.26 (-1307.85%)  12.06 ( 36.71%)  19.05
length=513, char=0:   268.73 (-1273.89%)  13.56 ( 30.68%)  19.56
length=514, char=0:   269.31 (-1276.89%)  13.56 ( 30.68%)  19.56
length=515, char=0:   269.73 (-1279.05%)  13.56 ( 30.68%)  19.56
length=516, char=0:   270.34 (-1282.24%)  13.56 ( 30.67%)  19.56
length=517, char=0:   270.83 (-1284.71%)  13.56 ( 30.66%)  19.56
length=518, char=0:   271.20 (-1286.54%)  13.56 ( 30.67%)  19.56
length=519, char=0:   271.67 (-1288.67%)  13.65 ( 30.24%)  19.56
length=520, char=0:   272.14 (-1291.04%)  13.65 ( 30.22%)  19.56
length=521, char=0:   272.66 (-1293.69%)  13.65 ( 30.23%)  19.56
length=522, char=0:   273.14 (-1296.13%)  13.65 ( 30.20%)  19.56
length=523, char=0:   273.64 (-1298.75%)  13.65 ( 30.23%)  19.56
length=524, char=0:   274.34 (-1302.16%)  13.66 ( 30.20%)  19.57
length=525, char=0:   274.64 (-1297.78%)  13.56 ( 30.99%)  19.65
length=526, char=0:   275.20 (-1300.04%)  13.56 ( 31.01%)  19.66
length=527, char=0:   275.66 (-1302.86%)  13.56 ( 30.99%)  19.65
length=1024, char=0:  524.46 (-2169.75%)  20.12 ( 12.92%)  23.11
length=1025, char=0:  525.14 (-2124.63%)  21.62 (  8.40%)  23.61
length=1026, char=0:  525.59 (-2125.36%)  21.88 (  7.37%)  23.62
length=1027, char=0:  525.98 (-2127.14%)  21.62 (  8.46%)  23.62
length=1028, char=0:  526.68 (-2131.10%)  21.62 (  8.42%)  23.61
length=1029, char=0:  527.10 (-2131.70%)  21.79 (  7.73%)  23.62
length=1030, char=0:  527.54 (-2118.51%)  21.62 (  9.10%)  23.78
length=1031, char=0:  527.98 (-2136.37%)  21.62 (  8.43%)  23.61
length=1032, char=0:  528.70 (-2139.38%)  21.62 (  8.43%)  23.61
length=1033, char=0:  529.25 (-2124.37%)  21.62 (  9.11%)  23.79
length=1034, char=0:  529.48 (-2142.95%)  21.62 (  8.43%)  23.61
length=1035, char=0:  530.11 (-2145.13%)  21.62 (  8.44%)  23.61
length=1036, char=0:  530.76 (-2147.10%)  21.79 (  7.73%)  23.62
length=1037, char=0:  531.03 (-2149.45%)  21.62 (  8.42%)  23.61
length=1038, char=0:  531.64 (-2151.87%)  21.62 (  8.42%)  23.61
length=1039, char=0:  531.99 (-2151.63%)  21.80 (  7.75%)  23.63

	* sysdeps/aarch64/memset-reg.h: New file.
	* sysdeps/aarch64/memset.S: Use it.
	(__memset): Rename to MEMSET macro.
	[ZVA_MACRO]: Use zva_macro.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
	Add memset_generic and memset_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add memset ifuncs.
	* sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New
	local variable zva_size.
	* sysdeps/aarch64/multiarch/memset.c: New file.
	* sysdeps/aarch64/multiarch/memset_generic.S: New file.
	* sysdeps/aarch64/multiarch/memset_falkor.S: New file.
	* sysdeps/aarch64/multiarch/rtld-memset.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(DCZID_DZP_MASK): New macro.
	(DCZID_BS_MASK): Likewise.
	(init_cpu_features): Read and set zva_size.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h
	(struct cpu_features): New member zva_size.
2017-11-20 18:25:04 +05:30
Joseph Myers
bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Steve Ellcey
389d1f1b23 Partial ILP32 support for aarch64.
* sysdeps/aarch64/crti.S: Add include of sysdep.h.
	(call_weak_fn): Use PTR_REG to get correct reg name in ILP32.
	* sysdeps/aarch64/dl-irel.h: Add include of sysdep.h.
	(elf_irela): Use AARCH64_R macro to get correct relocation in ILP32.
	* sysdeps/aarch64/dl-machine.h: Add include of sysdep.h.
	(elf_machine_load_address, RTLD_START, RTLD_START_1, RTLD_START,
	elf_machine_type_class, ELF_MACHINE_JMP_SLOT, elf_machine_rela,
	elf_machine_lazy_rel): Add ifdef's for ILP32 support.
	* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return,
	_dl_tlsdesc_return_lazy, _dl_tlsdesc_dynamic,
	_dl_tlsdesc_resolve_hold): Extend pointers in ILP32, use PTR_REG
	to get correct reg name for ILP32.
	* sysdeps/aarch64/dl-trampoline.S (ip01): New Macro.
	(RELA_SIZE): New Macro.
	(_dl_runtime_resolve, _dl_runtime_profile): Use new macros and PTR_REG
	to support ILP32.
	* sysdeps/aarch64/jmpbuf-unwind.h (_JMPBUF_CFA_UNWINDS_ADJ): Add
	cast for ILP32 mode.
	* sysdeps/aarch64/memcmp.S (memcmp): Extend arg pointers for ILP32 mode.
	* sysdeps/aarch64/memcpy.S (memmove, memcpy): Ditto.
	* sysdeps/aarch64/memset.S (__memset): Ditto.
	* sysdeps/aarch64/strchr.S (strchr): Ditto.
	* sysdeps/aarch64/strchrnul.S (__strchrnul): Ditto.
	* sysdeps/aarch64/strcmp.S (strcmp): Ditto.
	* sysdeps/aarch64/strcpy.S (strcpy): Ditto.
	* sysdeps/aarch64/strlen.S (__strlen): Ditto.
	* sysdeps/aarch64/strncmp.S (strncmp): Ditto.
	* sysdeps/aarch64/strnlen.S (strnlen): Ditto.
	* sysdeps/aarch64/strrchr.S (strrchr): Ditto.
	* sysdeps/unix/sysv/linux/aarch64/clone.S: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/setcontext.S (__setcontext): Ditto.
	* sysdeps/unix/sysv/linux/aarch64/swapcontext.S (__swapcontext): Ditto.
	* sysdeps/aarch64/__longjmp.S (__longjmp): Extend pointers in ILP32,
	change PTR_MANGLE call to use register numbers instead of names.
	* sysdeps/unix/sysv/linux/aarch64/getcontext.S (__getcontext): Ditto.
	* sysdeps/aarch64/setjmp.S (__sigsetjmp): Extend arg pointers for
	ILP32 mode, change PTR_MANGLE calls to use register numbers.
	* sysdeps/aarch64/start.S (_start): Ditto.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h
	(__PTHREAD_RWLOCK_INT_FLAGS_SHARED): New define.
	(__SIZEOF_PTHREAD_ATTR_T, __SIZEOF_PTHREAD_MUTEX_T,
	__SIZEOF_PTHREAD_MUTEXATTR_T, __SIZEOF_PTHREAD_COND_T,
	__SIZEOF_PTHREAD_COND_COMPAT_T, __SIZEOF_PTHREAD_CONDATTR_T,
	__SIZEOF_PTHREAD_RWLOCK_T, __SIZEOF_PTHREAD_RWLOCKATTR_T,
	__SIZEOF_PTHREAD_BARRIER_T, __SIZEOF_PTHREAD_BARRIERATTR_T):
	Make defined values dependent on __ILP32__.
	* sysdeps/aarch64/nptl/bits/semaphore.h (__SIZEOF_SEM_T): Change define.
	(sem_t): Change __align type.
	* sysdeps/aarch64/sysdep.h (AARCH64_R, PTR_REG, PTR_LOG_SIZE, DELOUSE,
	PTR_SIZE): New Macros.
	(LDST_PCREL, LDST_GLOBAL) Update to use PTR_REG.
	* sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h (O_LARGEFILE):
	Set when in ILP32 mode.
	(F_GETLK64, F_SETLK64, F_SETLKW64): Only set in LP64 mode.
	* sysdeps/unix/sysv/linux/aarch64/dl-cache.h (DL_CACHE_DEFAULT_ID):
	Set elf flags for ILP32.
	(add_system_dir): Set ILP32 library directories.
	* sysdeps/unix/sysv/linux/aarch64/init-first.c
	(_libc_vdso_platform_setup): Set minimum kernel version for ILP32.
	* sysdeps/unix/sysv/linux/aarch64/ldconfig.h
	(SYSDEP_KNOWN_INTERPRETER_NAMES): Add ILP32 names.
	* sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (GET_PC, SET_PC):
	New Macros.
	* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Handle ILP32 pointers.
2016-11-28 09:01:23 -08:00
Wilco Dijkstra
a8c5a2a952 This is an optimized memset for AArch64. Memset is split into 4 main cases:
small sets of up to 16 bytes, medium of 16..96 bytes which are fully unrolled.
Large memsets of more than 96 bytes align the destination and use an unrolled
loop processing 64 bytes per iteration.  Memsets of zero of more than 256 use
the dc zva instruction, and there are faster versions for the common ZVA sizes
64 or 128.  STP of Q registers is used to reduce codesize without loss of
performance.

The speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53.

	* sysdeps/aarch64/memset.S (__memset):
	Rewrite of optimized memset.
2016-05-12 16:44:53 +01:00
Joseph Myers
f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
Joseph Myers
b168057aaa Update copyright dates with scripts/update-copyrights. 2015-01-02 16:29:47 +00:00
Marcus Shawcroft
75eff3fe90 Relocate AArch64 from ports to libc.
This patch moves the AArch64 port to the main sysdeps hierarchy.  The
move is essentially:

  git mv ports/sysdeps/aarch64 sysdeps/aarch64
  git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64

The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move.  The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.
2014-02-11 11:36:00 +00:00