From e895b7af73728b0f1431549dbd52db37e8b1b577 Mon Sep 17 00:00:00 2001 From: Seth Brenith Date: Mon, 25 Jul 2022 08:39:46 -0700 Subject: [PATCH] Background merging of deserialized scripts Recently, https://crrev.com/c/v8/v8/+/3681880 added new API functions with which an embedder could request that V8 merge newly deserialized script data into an existing Script from the Isolate's compilation cache. This change implements those new functions. This functionality is still disabled by default due to the flag merge_background_deserialized_script_with_compilation_cache. The goal of this new functionality is to reduce memory usage when multiple frames load the same script with a long delay between (long enough for the script to have been evicted from Blink's in-memory cache and for the top-level SharedFunctionInfo to be flushed). In that case, there are two Script objects for the same script: one which was found in the Isolate compilation cache (the "old" script), and one which was recently deserialized (the "new" script). The new script's object graph is essentially standalone: it may point to internalized strings and readonly objects such as the empty feedback metadata, but otherwise it is unconnected to the rest of the heap. The merging logic takes any useful data from the new script's object graph and attaches it into the old script's object graph, so that the new Script object and any other duplicated objects can be discarded. More specifically: 1. If the new Script has a SharedFunctionInfo for a particular function literal, and the old Script does not, then the old Script is updated to refer to the new SharedFunctionInfo. 2. If the new Script has a compiled SharedFunctionInfo for a particular function literal, and the old Script has an uncompiled SharedFunctionInfo, then the old SharedFunctionInfo is updated to point to the function_data and feedback_metadata from the new SharedFunctionInfo. 3. If any used object from the new object graph points to a SharedFunctionInfo, where the old object graph contains a matching SharedFunctionInfo for the same function literal, then that pointer is updated to point to the old SharedFunctionInfo. The document at [0] includes diagrams showing an example merge on a very small script. Steps 1 and 2 above are pretty simple, but step 3 requires walking a possibly large set of objects, so this new API lets the embedder run step 3 from a background thread. Steps 1 and 2 are performed later, on the main thread. The next important question is: in what ways can the old script's object graph be modified during the background execution of step 3, or during the time after step 3 but before steps 1 and 2? A. SharedFunctionInfos can go from compiled to uncompiled due to flushing. This is okay; the worst outcome is that the function would need to be compiled again later. Such a risk is already present, since V8 doesn't keep IsCompiledScopes for every compiled function in a background-deserialized script. B. SharedFunctionInfos can go from uncompiled to compiled due to lazy compilation. This is also okay; the merge completion logic on the main thread will just keep this lazily compiled data rather than inserting compiled data from the newly deserialized object graph. C. SharedFunctionInfos can be cleared from the Script's weak array if they are no longer referenced. This is mostly okay, because any SharedFunctionInfo that is needed by the background merge is strongly referenced and therefore can't be cleared. The only problem arises if the top-level SharedFunctionInfo gets cleared, so the merge task must deliberately keep a reference to that one. D. SharedFunctionInfos can be created if they are needed due to lazy compilation of a parent function. This change is somewhat troublesome because it invalidates the background thread's work and requires a re-traversal on the main thread to update any pointers that should point to this lazily compiled SharedFunctionInfo. At a high level, this change implements three previously unimplemented functions in BackgroundDeserializeTask (in compiler.cc) and updates one: - BackgroundDeserializeTask::SourceTextAvailable, run on the main thread, checks whether there is a matching Script in the Isolate compilation cache which doesn't already have a top-level SharedFunctionInfo. If so, it saves that Script in a persistent handle. - BackgroundDeserializeTask::ShouldMergeWithExistingScript checks whether the persistent handle from the first step exists (a fast operation which can be called from any thread). - BackgroundDeserializeTask::MergeWithExistingScript, run on a background thread, performs step 3 of the merge described above and generates lists of persistent data describing how the main thread can complete the merge. - BackgroundDeserializeTask::Finish is updated to perform the merge steps 1 and 2 listed above, as well as a possible re-traversal of the graph if required due to newly created SharedFunctionInfos in the old Script. The merge logic has nothing to do with deserialization, and indeed I hope to reuse it for background compilation tasks as well, so it is all contained within a new class BackgroundMergeTask (in compiler.h,cc). It uses a second class, ForwardPointersVisitor (in compiler.cc) to perform the object visitation that updates pointers to SharedFunctionInfos. [0] https://docs.google.com/document/d/1UksB5Vm7TT1-f3S9W1dK_rP9jKn_ly0WVm_UDPpWuBw/edit Bug: v8:12808 Change-Id: Id405869e9d5b106ca7afd9c4b08cb5813e6852c6 Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/3739232 Reviewed-by: Leszek Swirski Commit-Queue: Seth Brenith Cr-Commit-Position: refs/heads/main@{#81941} --- BUILD.bazel | 1 + BUILD.gn | 1 + src/codegen/background-merge-task.h | 87 +++++ src/codegen/compiler.cc | 386 ++++++++++++++++++++- src/codegen/compiler.h | 2 + src/objects/compilation-cache-table.cc | 13 +- src/objects/fixed-array.h | 3 + src/objects/objects.cc | 7 + src/snapshot/code-serializer.cc | 62 +++- src/snapshot/code-serializer.h | 14 +- test/unittests/api/deserialize-unittest.cc | 375 ++++++++++++++++++++ 11 files changed, 919 insertions(+), 32 deletions(-) create mode 100644 src/codegen/background-merge-task.h diff --git a/BUILD.bazel b/BUILD.bazel index 1b380d83e3..74e57fb87e 100644 --- a/BUILD.bazel +++ b/BUILD.bazel @@ -1161,6 +1161,7 @@ filegroup( "src/codegen/assembler.cc", "src/codegen/assembler.h", "src/codegen/atomic-memory-order.h", + "src/codegen/background-merge-task.h", "src/codegen/bailout-reason.cc", "src/codegen/bailout-reason.h", "src/codegen/callable.h", diff --git a/BUILD.gn b/BUILD.gn index fb694a0a2b..3abe28ba06 100644 --- a/BUILD.gn +++ b/BUILD.gn @@ -2791,6 +2791,7 @@ v8_header_set("v8_internal_headers") { "src/codegen/assembler-inl.h", "src/codegen/assembler.h", "src/codegen/atomic-memory-order.h", + "src/codegen/background-merge-task.h", "src/codegen/bailout-reason.h", "src/codegen/callable.h", "src/codegen/code-comments.h", diff --git a/src/codegen/background-merge-task.h b/src/codegen/background-merge-task.h new file mode 100644 index 0000000000..dde645edb0 --- /dev/null +++ b/src/codegen/background-merge-task.h @@ -0,0 +1,87 @@ +// Copyright 2022 the V8 project authors. All rights reserved. +// Use of this source code is governed by a BSD-style license that can be +// found in the LICENSE file. + +#ifndef V8_CODEGEN_BACKGROUND_MERGE_TASK_H_ +#define V8_CODEGEN_BACKGROUND_MERGE_TASK_H_ + +#include + +#include "src/handles/maybe-handles.h" + +namespace v8 { +namespace internal { + +class FeedbackMetadata; +class PersistentHandles; +class Script; +class SharedFunctionInfo; +class String; + +struct ScriptDetails; + +// Contains data transferred between threads for background merging between a +// newly compiled or deserialized script and an existing script from the Isolate +// compilation cache. +class V8_EXPORT_PRIVATE BackgroundMergeTask { + public: + // Step 1: on the main thread, check whether the Isolate compilation cache + // contains the script. + void SetUpOnMainThread(Isolate* isolate, Handle source_text, + const ScriptDetails& script_details, + LanguageMode language_mode); + + // Step 2: on the background thread, update pointers in the new Script's + // object graph to point to corresponding objects from the cached Script where + // appropriate. May only be called if HasCachedScript returned true. + void BeginMergeInBackground(LocalIsolate* isolate, Handle