[wasm] publish TurboFan results in batches

With mprotect-based write protection of the WebAssembly code space,
we switch page protection flags each time (at least) one compilation
thread needs write access. Two such switches happen when TurboFan
compilation results are available in {ExecuteCompilationUnits}: One
switch happens when calling {NativeModule::AddCompiledCode} and one more
when calling {NativeModule::PublishCode} via
{SchedulePublishCompilationResults} and {PublishCompilationResults}.
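
To illustrate what one such switch involves, here is a minimal POSIX sketch. `ScopedWritable`, `g_switches`, and `CountSwitchesForWrites` are hypothetical names for illustration only, not V8's implementation. The sketch uses a read-only (not executable) page, and it leaves out the allocation mutex and the handling of concurrent writers that the real code needs.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical counter of protection switches (one per mprotect call).
static int g_switches = 0;

// Hypothetical RAII helper: makes the region writable on construction and
// read-only again on destruction. Each scope costs two mprotect syscalls.
class ScopedWritable {
 public:
  ScopedWritable(void* base, size_t size) : base_(base), size_(size) {
    SetProtection(PROT_READ | PROT_WRITE);
  }
  ~ScopedWritable() { SetProtection(PROT_READ); }

 private:
  void SetProtection(int prot) {
    int rc = mprotect(base_, size_, prot);
    assert(rc == 0);
    (void)rc;
    ++g_switches;
  }

  void* base_;
  size_t size_;
};

// Performs `num_writes` separate writes to a protected page, opening one
// writable scope per write, and returns the number of protection switches.
int CountSwitchesForWrites(int num_writes) {
  size_t size = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS,
                    -1, 0);
  assert(base != MAP_FAILED);
  g_switches = 0;
  for (int i = 0; i < num_writes; ++i) {
    ScopedWritable writable(base, size);
    memset(base, i, 16);  // Stands in for copying or patching code.
  }
  munmap(base, size);
  return g_switches;
}
```

In this sketch, every separate write scope costs two switches, which is why adding and publishing each result on its own multiplies the number of mprotect calls.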

So far, each TurboFan result was published eagerly, i.e., as soon as it
became available. This has the benefit that faster code is available
immediately, and had no large cost or downside without write protection.
However, with write protection, switching permissions is expensive (an
mprotect syscall) and needs to lock the
{WasmCodeAllocator::allocation_mutex_} (which causes lock contention and
under Linux many futex syscalls). Thus, immediately publishing each
TurboFan result when using write protection can cause up to 10x slower
compilation compared with not using write protection. In terms of
syscalls we measured (non-scientifically) with
{sudo perf stat -e 'syscalls:sys_enter*' d8 ...} on the Unity benchmark:
- mprotect: 10k vs. 44k syscalls (baseline vs. write protection)
- futex: 31k vs. 112k syscalls (baseline vs. write protection)
- sys time: 1.6s vs. 10s (baseline vs. write protection)
All of those are clearly too high.

The fix here is simply to batch together multiple TurboFan functions into
one publishing step when using write protection. The batching logic
already exists for Liftoff, so we can just disable eager publishing for
TurboFan when using write protection. Additionally, we publish once when
all Liftoff results are available (even if the batch is not complete),
such that time-to-execute is not regressed.
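
The effect of the new publishing condition can be sketched as a small standalone simulation. `Tier`, `CountPublishSteps`, and the fixed `batch_size` threshold are hypothetical simplifications (the real code asks the queue via `ShouldPublish`); the three booleans mirror the ones introduced by this change.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ExecutionTier.
enum class Tier { kLiftoff, kTurbofan };

// Counts how many publishing steps one thread performs for a sequence of
// compilation units. `batch_size` is an assumed fixed threshold.
int CountPublishSteps(const std::vector<Tier>& units, size_t batch_size,
                      bool write_protect) {
  int publish_steps = 0;
  size_t pending = 0;  // Results compiled but not yet published.
  Tier current_tier = units.empty() ? Tier::kLiftoff : units.front();
  for (Tier tier : units) {
    // Publish once a batch has filled up.
    bool batch_full = pending >= batch_size;
    // Publish when switching from Liftoff to TurboFan, so that baseline
    // code becomes available without waiting for a full batch.
    bool liftoff_finished =
        tier != current_tier && tier == Tier::kTurbofan;
    // Without write protection, publish before every TurboFan unit.
    bool publish_turbofan_unit = !write_protect && tier == Tier::kTurbofan;
    if ((batch_full || liftoff_finished || publish_turbofan_unit) &&
        pending > 0) {
      ++publish_steps;
      pending = 0;
    }
    current_tier = tier;
    ++pending;  // "Compile" the unit and keep its result for later.
  }
  if (pending > 0) ++publish_steps;  // Publish the remainder at the end.
  return publish_steps;
}
```

For 3 Liftoff units followed by 8 TurboFan units and a batch size of 4, this sketch performs 3 publishing steps with write protection and 9 without it.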

R=clemensb@chromium.org
CC=jkummerow@chromium.org

Bug: v8:11663, chromium:932033
Change-Id: Ibf6f28ecf4733b40322e62761e66046dec60a125
Cq-Include-Trybots: luci.v8.try:v8_linux64_fyi_rel_ng
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2922114
Commit-Queue: Daniel Lehmann <dlehmann@google.com>
Reviewed-by: Clemens Backes <clemensb@chromium.org>
Cr-Commit-Position: refs/heads/master@{#74829}
Daniel Lehmann 2021-05-27 14:31:25 +00:00 committed by V8 LUCI CQ
parent ab4986b8e1
commit 990c9386e2

@@ -1343,14 +1343,25 @@ CompilationExecutionResult ExecuteCompilationUnits(
         return yield ? kYield : kNoMoreUnits;
       }
-      // Before executing a TurboFan unit, ensure to publish all previous
-      // units. If we compiled Liftoff before, we need to publish them anyway
-      // to ensure fast completion of baseline compilation, if we compiled
-      // TurboFan before, we publish to reduce peak memory consumption.
-      // Also publish after finishing a certain amount of units, to avoid
-      // contention when all threads publish at the end.
-      if (unit->tier() == ExecutionTier::kTurbofan ||
-          queue->ShouldPublish(static_cast<int>(results_to_publish.size()))) {
+      // Publish after finishing a certain amount of units, to avoid contention
+      // when all threads publish at the end.
+      bool batch_full =
+          queue->ShouldPublish(static_cast<int>(results_to_publish.size()));
+      // Also publish each time the compilation tier changes from Liftoff to
+      // TurboFan, such that we immediately publish the baseline compilation
+      // results to start execution, and do not wait for a batch to fill up.
+      bool liftoff_finished = unit->tier() != current_tier &&
+                              unit->tier() == ExecutionTier::kTurbofan;
+      // Without mprotect-based write protection, publish even more often,
+      // namely every TurboFan unit individually (no batching) to reduce
+      // peak memory consumption. However, with write protection, this results
+      // in a high number of page protection switches (once for each function),
+      // incurring syscalls and lock contention, so don't do it then.
+      bool publish_turbofan_unit = !FLAG_wasm_write_protect_code_memory &&
+                                   unit->tier() == ExecutionTier::kTurbofan;
+      if (batch_full || liftoff_finished || publish_turbofan_unit) {
         std::vector<std::unique_ptr<WasmCode>> unpublished_code =
             compile_scope.native_module()->AddCompiledCode(
                 VectorOf(std::move(results_to_publish)));