glibc/sysdeps/x86_64/nptl/pthread_mutex_backoff.h

40 lines
1.2 KiB
C
Raw Normal View History

nptl: Add backoff mechanism to spinlock loop When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-06 01:50:10 +00:00
/* Pthread mutex backoff configuration.
Copyright (C) 2022-2023 Free Software Foundation, Inc.
nptl: Add backoff mechanism to spinlock loop When mutiple threads waiting for lock at the same time, once lock owner releases the lock, waiters will see lock available and all try to lock, which may cause an expensive CAS storm. Binary exponential backoff with random jitter is introduced. As try-lock attempt increases, there is more likely that a larger number threads compete for adaptive mutex lock, so increase wait time in exponential. A random jitter is also added to avoid synchronous try-lock from other threads. v2: Remove read-check before try-lock for performance. v3: 1. Restore read-check since it works well in some platform. 2. Make backoff arch dependent, and enable it for x86_64. 3. Limit max backoff to reduce latency in large critical section. v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h v5: Commit log updated for regression in large critical section. Result of pthread-mutex-locks bench Test Platform: Xeon 8280L (2 socket, 112 CPUs in total) First Row: thread number First Col: critical section length Values: backoff vs upstream, time based, low is better non-critical-length: 1 1 2 4 8 16 32 64 112 140 0 0.99 0.58 0.52 0.49 0.43 0.44 0.46 0.52 0.54 1 0.98 0.43 0.56 0.50 0.44 0.45 0.50 0.56 0.57 2 0.99 0.41 0.57 0.51 0.45 0.47 0.48 0.60 0.61 4 0.99 0.45 0.59 0.53 0.48 0.49 0.52 0.64 0.65 8 1.00 0.66 0.71 0.63 0.56 0.59 0.66 0.72 0.71 16 0.97 0.78 0.91 0.73 0.67 0.70 0.79 0.80 0.80 32 0.95 1.17 0.98 0.87 0.82 0.86 0.89 0.90 0.90 64 0.96 0.95 1.01 1.01 0.98 1.00 1.03 0.99 0.99 128 0.99 1.01 1.01 1.17 1.08 1.12 1.02 0.97 1.02 non-critical-length: 32 1 2 4 8 16 32 64 112 140 0 1.03 0.97 0.75 0.65 0.58 0.58 0.56 0.70 0.70 1 0.94 0.95 0.76 0.65 0.58 0.58 0.61 0.71 0.72 2 0.97 0.96 0.77 0.66 0.58 0.59 0.62 0.74 0.74 4 0.99 0.96 0.78 0.66 0.60 0.61 0.66 0.76 0.77 8 0.99 0.99 0.84 0.70 0.64 0.66 0.71 0.80 0.80 16 0.98 0.97 0.95 0.76 0.70 0.73 0.81 0.85 0.84 32 1.04 1.12 1.04 0.89 0.82 0.86 0.93 0.91 0.91 64 0.99 1.15 1.07 1.00 0.99 1.01 1.05 0.99 0.99 128 1.00 1.21 1.20 1.22 1.25 1.31 1.12 1.10 0.99 non-critical-length: 128 1 2 4 8 16 32 64 112 140 0 1.02 1.00 0.99 0.67 0.61 0.61 0.61 0.74 0.73 1 0.95 0.99 1.00 0.68 0.61 0.60 0.60 0.74 0.74 2 1.00 1.04 1.00 0.68 0.59 0.61 0.65 0.76 0.76 4 1.00 0.96 0.98 0.70 0.63 0.63 0.67 0.78 0.77 8 1.01 1.02 0.89 0.73 0.65 0.67 0.71 0.81 0.80 16 0.99 0.96 0.96 0.79 0.71 0.73 0.80 0.84 0.84 32 0.99 0.95 1.05 0.89 0.84 0.85 0.94 0.92 0.91 64 1.00 0.99 1.16 1.04 1.00 1.02 1.06 0.99 0.99 128 1.00 1.06 0.98 1.14 1.39 1.26 1.08 1.02 0.98 There is regression in large critical section. But adaptive mutex is aimed for "quick" locks. Small critical section is more common when users choose to use adaptive pthread_mutex. Signed-off-by: Wangyang Guo <wangyang.guo@intel.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-06 01:50:10 +00:00
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
#ifndef _PTHREAD_MUTEX_BACKOFF_H
#define _PTHREAD_MUTEX_BACKOFF_H 1
#include <fast-jitter.h>
static inline unsigned int
get_jitter (void)
{
return get_fast_jitter ();
}
#define MAX_BACKOFF 16
static inline int
get_next_backoff (int backoff)
{
/* Binary expontial backoff. Limiting max backoff
can reduce latency in large critical section. */
return (backoff < MAX_BACKOFF) ? backoff << 1 : backoff;
}
#endif