[cpu-profiler] Ensure sampled thread has Isolate lock under Windows

While the sampler checked whether the sampled thread held the Isolate
lock (if locks are being used) under Linux, the check was not done
under Windows (or Fuchsia). This meant that in a multi-threaded
application under Windows, thread locking was not checked, making the
profiler prone to seg faults and the like: the profiler uses
isolate->js_entry_sp to determine the stack to walk, but
isolate->js_entry_sp is the stack pointer of the thread that currently
holds the Isolate lock. So, if the sampled thread does not hold the
lock, the sampler would be iterating over the wrong stack, one that
might actually be actively changing on another thread. The fix is to
move the lock check into CpuSampler and Ticker (--prof) so all OSes do
the correct check.
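
For context, the setup that triggers the bug looks roughly like the
following hypothetical embedder sketch (not part of this change; the
helper names and script are made up), where several threads share one
Isolate via v8::Locker while the CPU profiler is running:

  // Hypothetical embedder sketch: threads take turns holding the Isolate
  // lock. isolate->js_entry_sp only describes the stack of the thread that
  // currently holds the lock, so sampling a thread that has lost the lock
  // walks some other thread's (possibly changing) stack.
  #include <v8.h>

  void WorkerLoop(v8::Isolate* isolate, v8::Global<v8::Context>* context) {
    v8::Locker locker(isolate);  // Only one thread holds the Isolate lock.
    v8::Isolate::Scope isolate_scope(isolate);
    v8::HandleScope handle_scope(isolate);
    v8::Local<v8::Context> ctx = context->Get(isolate);
    v8::Context::Scope context_scope(ctx);
    v8::Local<v8::String> src = v8::String::NewFromUtf8Literal(
        isolate, "for (let i = 0; i < 1e6; i++) {}");
    v8::Script::Compile(ctx, src).ToLocalChecked()->Run(ctx).ToLocalChecked();
  }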

The basic concept is that on all operating systems a CpuProfiler, and
so its corresponding CpuSampler, is tied to a thread. This is not based
on first principles or anything; it's simply the way it works in V8,
though it is a useful conceit as it makes visualization and
interpretation of profile data much easier.

To collect a sample on a thread associated with a profiler, the thread
must be stopped for obvious reasons -- walking the stack of a running
thread is a formula for disaster. The mechanism for stopping a thread
is OS-specific and is done in sampler.cc. There are currently three
basic approaches: one for Linux/Unix variants, one for Windows, and one
for Fuchsia. The approaches vary as to which thread actually collects
the sample -- under Linux the sample is collected on the (interrupted)
sampled thread itself, whereas under Fuchsia/Windows it's collected on
a separate thread.
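
For illustration, the separate-thread approach on Windows boils down to
suspending the target thread, reading its register state, and resuming
it. A minimal sketch using the Win32 API (illustrative only, not V8's
actual sampler code; x64 assumed) might look like:

  #include <windows.h>

  // Suspend a thread, capture pc/sp/fp from its register state, resume it.
  bool ReadThreadRegisters(DWORD thread_id, void** pc, void** sp, void** fp) {
    HANDLE thread = OpenThread(
        THREAD_GET_CONTEXT | THREAD_SUSPEND_RESUME | THREAD_QUERY_INFORMATION,
        /*bInheritHandle=*/FALSE, thread_id);
    if (thread == nullptr) return false;
    bool ok = false;
    if (SuspendThread(thread) != static_cast<DWORD>(-1)) {
      CONTEXT context = {};
      context.ContextFlags = CONTEXT_FULL;
      if (GetThreadContext(thread, &context)) {
        *pc = reinterpret_cast<void*>(context.Rip);
        *sp = reinterpret_cast<void*>(context.Rsp);
        *fp = reinterpret_cast<void*>(context.Rbp);
        ok = true;  // Caller may now attempt a stack walk from sp/fp.
      }
      ResumeThread(thread);  // Always let the sampled thread run again.
    }
    CloseHandle(thread);
    return ok;
  }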

However, in a multi-threaded environment (where Locker is used), it's
not sufficient for the sampled thread to be stopped. Because the stack
walk involves looking in the Isolate heap, no other thread can be
messing with the heap while the sample is collected. The only ways to
ensure this would be to either stop all threads whenever collecting a
sample, or to ensure that the thread being sampled holds the Isolate
lock, which prevents other threads from messing with the heap. While
there might be something to be said for the "stop all threads"
approach, the current approach in V8 is to stop only the sampled
thread, so in a multi-threaded environment the profiler must check
whether the thread being sampled holds the Isolate lock.

Since this check must be done independent of which thread the sample
is being collected on (since that varies from OS to OS), the approach
is to save the thread id of the thread to be profiled/sampled when the
CpuSampler is instantiated (on all OSes it is instantiated on the
sampled thread) and then check that thread id against the Isolate lock
holder's thread id before collecting a sample. If it matches, we know
sampler.cc has stopped the sampled thread, one way or another, and we
know that no other thread can mess with the heap (since the stopped
thread holds the Isolate lock), so it's safe to walk the stack and
collect data from the heap and the sample can be taken. If it doesn't
match, we can't safely collect the sample, so we don't.
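
Condensed, the check added to both CpuSampler and Ticker (full versions
in the diff below) amounts to:

  // Consolidated sketch of the new check; ProfilerStats bookkeeping and
  // the rest of the sampler are omitted.
  void SampleStack(const v8::RegisterState& state) override {
    Isolate* isolate = reinterpret_cast<Isolate*>(this->isolate());
    // threadId_ was captured with ThreadId::Current() in the constructor,
    // i.e. on the thread being profiled/sampled.
    if (v8::Locker::IsActive() &&
        !isolate->thread_manager()->IsLockedByThread(threadId_)) {
      return;  // Sampled thread doesn't hold the Isolate lock; drop sample.
    }
    // ... safe to walk the stack and fill in the TickSample ...
  }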

Bug: v8:10850
Change-Id: Iba6cabcd3e11a19c261c004103e37e806934dc6f
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/2411343
Reviewed-by: Peter Marshall <petermarshall@chromium.org>
Commit-Queue: Peter Marshall <petermarshall@chromium.org>
Cr-Commit-Position: refs/heads/master@{#69952}
commit 76217f5708 (parent 46e06ad8fd)
Author: Alex Kodat, 2020-09-16 08:00:30 -04:00, committed by Commit Bot
10 changed files with 131 additions and 6 deletions

@@ -73,6 +73,9 @@ class ThreadManager {
bool IsLockedByCurrentThread() const {
return mutex_owner_.load(std::memory_order_relaxed) == ThreadId::Current();
}
bool IsLockedByThread(ThreadId id) const {
return mutex_owner_.load(std::memory_order_relaxed) == id;
}
ThreadId CurrentId();

@@ -239,7 +239,6 @@ void SamplerManager::DoSample(const v8::RegisterState& state) {
Isolate* isolate = sampler->isolate();
// We require a fully initialized and entered isolate.
if (isolate == nullptr || !isolate->IsInUse()) continue;
if (v8::Locker::IsActive() && !Locker::IsLocked(isolate)) continue;
sampler->SampleStack(state);
}
}

@@ -18,6 +18,7 @@
#include "src/diagnostics/perf-jit.h"
#include "src/execution/isolate.h"
#include "src/execution/runtime-profiler.h"
#include "src/execution/v8threads.h"
#include "src/execution/vm-state-inl.h"
#include "src/handles/global-handles.h"
#include "src/heap/combined-heap.h"
@@ -881,7 +882,8 @@ class Ticker : public sampler::Sampler {
Ticker(Isolate* isolate, int interval_microseconds)
: sampler::Sampler(reinterpret_cast<v8::Isolate*>(isolate)),
sampling_thread_(
std::make_unique<SamplingThread>(this, interval_microseconds)) {}
std::make_unique<SamplingThread>(this, interval_microseconds)),
threadId_(ThreadId::Current()) {}
~Ticker() override {
if (IsActive()) Stop();
@@ -903,6 +905,9 @@ class Ticker : public sampler::Sampler {
void SampleStack(const v8::RegisterState& state) override {
if (!profiler_) return;
Isolate* isolate = reinterpret_cast<Isolate*>(this->isolate());
if (v8::Locker::IsActive() &&
!isolate->thread_manager()->IsLockedByThread(threadId_))
return;
TickSample sample;
sample.Init(isolate, state, TickSample::kIncludeCEntryFrame, true);
profiler_->Insert(&sample);
@@ -911,6 +916,7 @@ class Ticker : public sampler::Sampler {
private:
Profiler* profiler_ = nullptr;
std::unique_ptr<SamplingThread> sampling_thread_;
ThreadId threadId_;
};
//

@@ -11,6 +11,7 @@
#include "src/base/template-utils.h"
#include "src/debug/debug.h"
#include "src/execution/frames-inl.h"
#include "src/execution/v8threads.h"
#include "src/execution/vm-state-inl.h"
#include "src/libsampler/sampler.h"
#include "src/logging/counters.h"
@@ -29,9 +30,17 @@ class CpuSampler : public sampler::Sampler {
public:
CpuSampler(Isolate* isolate, SamplingEventsProcessor* processor)
: sampler::Sampler(reinterpret_cast<v8::Isolate*>(isolate)),
processor_(processor) {}
processor_(processor),
threadId_(ThreadId::Current()) {}
void SampleStack(const v8::RegisterState& regs) override {
Isolate* isolate = reinterpret_cast<Isolate*>(this->isolate());
if (v8::Locker::IsActive() &&
!isolate->thread_manager()->IsLockedByThread(threadId_)) {
ProfilerStats::Instance()->AddReason(
ProfilerStats::Reason::kIsolateNotLocked);
return;
}
TickSample* sample = processor_->StartTickSample();
if (sample == nullptr) {
ProfilerStats::Instance()->AddReason(
@@ -40,7 +49,6 @@ class CpuSampler : public sampler::Sampler {
}
// Every bailout up until here resulted in a dropped sample. From now on,
// the sample is created in the buffer.
Isolate* isolate = reinterpret_cast<Isolate*>(this->isolate());
sample->Init(isolate, regs, TickSample::kIncludeCEntryFrame,
/* update_stats */ true,
/* use_simulator_reg_state */ true, processor_->period());
@@ -53,6 +61,7 @@ class CpuSampler : public sampler::Sampler {
private:
SamplingEventsProcessor* processor_;
ThreadId threadId_;
};
ProfilingScope::ProfilingScope(Isolate* isolate, ProfilerListener* listener)

@@ -34,6 +34,8 @@ const char* ProfilerStats::ReasonToString(Reason reason) {
switch (reason) {
case kTickBufferFull:
return "kTickBufferFull";
case kIsolateNotLocked:
return "kIsolateNotLocked";
case kSimulatorFillRegistersFailed:
return "kSimulatorFillRegistersFailed";
case kNoFrameRegion:

@@ -16,6 +16,7 @@ class ProfilerStats {
enum Reason {
// Reasons we fail to record a TickSample.
kTickBufferFull,
kIsolateNotLocked,
// These all generate a TickSample.
kSimulatorFillRegistersFailed,
kNoFrameRegion,

@@ -130,6 +130,16 @@ i::ReadOnlyHeap* CcTest::read_only_heap() {
return i_isolate()->read_only_heap();
}
void CcTest::AddGlobalFunction(v8::Local<v8::Context> env, const char* name,
v8::FunctionCallback callback) {
v8::Local<v8::FunctionTemplate> func_template =
v8::FunctionTemplate::New(isolate_, callback);
v8::Local<v8::Function> func =
func_template->GetFunction(env).ToLocalChecked();
func->SetName(v8_str(name));
env->Global()->Set(env, v8_str(name), func).FromJust();
}
void CcTest::CollectGarbage(i::AllocationSpace space, i::Isolate* isolate) {
i::Isolate* iso = isolate ? isolate : i_isolate();
iso->heap()->CollectGarbage(space, i::GarbageCollectionReason::kTesting);

@@ -137,6 +137,8 @@ class CcTest {
static i::Heap* heap();
static i::ReadOnlyHeap* read_only_heap();
static void AddGlobalFunction(v8::Local<v8::Context> env, const char* name,
v8::FunctionCallback callback);
static void CollectGarbage(i::AllocationSpace space,
i::Isolate* isolate = nullptr);
static void CollectAllGarbage(i::Isolate* isolate = nullptr);
@@ -318,7 +320,7 @@ class LocalContext {
v8::Context* operator*() { return operator->(); }
bool IsReady() { return !context_.IsEmpty(); }
v8::Local<v8::Context> local() {
v8::Local<v8::Context> local() const {
return v8::Local<v8::Context>::New(isolate_, context_);
}

@@ -3168,6 +3168,99 @@ TEST(MultipleIsolates) {
thread2.Join();
}
// Varying called function frame sizes increases the chance of something going
// wrong if sampling an unlocked frame. We also prevent optimization to prevent
// inlining so each function call has its own frame.
const char* varying_frame_size_script = R"(
%NeverOptimizeFunction(maybeYield);
%NeverOptimizeFunction(bar);
%NeverOptimizeFunction(foo);
function maybeYield(n) {
YieldIsolate(Math.random() > yieldLimit);
}
function bar(a, b, c, d) {
maybeYield(Math.random());
return a.length + b.length + c.length + d.length;
}
function foo(timeLimit, yieldProbability) {
yieldLimit = 1 - yieldProbability;
const startTime = Date.now();
for (let i = 0; i < 1e6; i++) {
maybeYield(1);
bar("Hickory", "Dickory", "Doc", "Mouse");
YieldIsolate(Math.random() > 0.999);
if ((Date.now() - startTime) > timeLimit) break;
}
}
)";
class UnlockingThread : public v8::base::Thread {
public:
explicit UnlockingThread(v8::Local<v8::Context> env)
: Thread(Options("UnlockingThread")), env_(CcTest::isolate(), env) {}
void Run() override {
v8::Isolate* isolate = CcTest::isolate();
v8::Locker locker(isolate);
v8::Isolate::Scope isolate_scope(isolate);
v8::HandleScope scope(isolate);
v8::Local<v8::Context> env = v8::Local<v8::Context>::New(isolate, env_);
Profile(env);
}
static void Profile(v8::Local<v8::Context> env) {
v8::Isolate* isolate = CcTest::isolate();
v8::Context::Scope context_scope(env);
v8::CpuProfiler* profiler = v8::CpuProfiler::New(isolate);
profiler->SetSamplingInterval(200);
v8::Local<v8::String> profile_name = v8_str("1");
profiler->StartProfiling(profile_name);
int32_t time_limit = 200;
double yield_probability = 0.001;
v8::Local<v8::Value> args[] = {v8::Integer::New(isolate, time_limit),
v8::Number::New(isolate, yield_probability)};
v8::Local<v8::Function> function = GetFunction(env, "foo");
function->Call(env, env->Global(), arraysize(args), args).ToLocalChecked();
profiler->StopProfiling(profile_name);
profiler->Dispose();
}
private:
v8::Persistent<v8::Context> env_;
};
// Checking for crashes with multiple thread/single Isolate profiling.
TEST(MultipleThreadsSingleIsolate) {
i::FLAG_allow_natives_syntax = true;
v8::Isolate* isolate = CcTest::isolate();
v8::Locker locker(isolate);
v8::HandleScope scope(isolate);
v8::Local<v8::Context> env = CcTest::NewContext({PROFILER_EXTENSION_ID});
v8::Context::Scope context_scope(env);
CcTest::AddGlobalFunction(
env, "YieldIsolate", [](const v8::FunctionCallbackInfo<v8::Value>& info) {
v8::Isolate* isolate = info.GetIsolate();
if (!info[0]->IsTrue()) return;
v8::Unlocker unlocker(isolate);
v8::base::OS::Sleep(v8::base::TimeDelta::FromMilliseconds(1));
});
CompileRun(varying_frame_size_script);
UnlockingThread thread1(env);
UnlockingThread thread2(env);
CHECK(thread1.Start());
CHECK(thread2.Start());
// For good measure, profile on our own thread
UnlockingThread::Profile(env);
{
v8::Unlocker unlocker(isolate);
thread1.Join();
thread2.Join();
}
}
// Tests that StopProfiling doesn't wait for the next sample tick in order to
// stop, but rather exits early before a given wait threshold.
TEST(FastStopProfiling) {

@@ -331,7 +331,7 @@ class ReadStringVisitor : public TqObjectVisitor {
Isolate::FromRoot(GetIsolateRoot(heap_addresses_.any_heap_pointer)),
resource_data));
#else
uintptr_t data_address = reinterpret_cast<uintptr_t>(resource_data);
uintptr_t data_address = static_cast<uintptr_t>(resource_data);
#endif // V8_COMPRESS_POINTERS
if (done_) return;
ReadStringCharacters<TChar>(object, data_address);