skia2/bench/GrResourceCacheBench.cpp

/*
* Copyright 2013 Google Inc.
*
* Use of this source code is governed by a BSD-style license that can be
* found in the LICENSE file.
*/
#include "Benchmark.h"
#if SK_SUPPORT_GPU
#include "GrGpuResource.h"
#include "GrGpuResourcePriv.h"
#include "GrContext.h"
#include "GrGpu.h"
#include "GrResourceCache.h"
#include "SkCanvas.h"
enum {
    CACHE_SIZE_COUNT = 4096,
};

class BenchResource : public GrGpuResource {
public:
    // History note (commit of 2016-04-22, "Refactor to separate backend object
    // lifecycle and GpuResource budget decision"; BUG=594928;
    // review: https://codereview.chromium.org/1862043002;
    // gold trybot: https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1862043002):
    //
    // GrGpuResource was refactored to contain two different pieces of state:
    //   a) the instance is budgeted or not budgeted;
    //   b) the instance references wrapped backend objects or not.
    //
    // The "object lifecycle" was also attached to backend object handles
    // (ids), which made the code a bit unclear. Backend objects would be
    // associated with GrGpuResource::LifeCycle, even though
    // GrGpuResource::LifeCycle refers to the GpuResource, and individual
    // backend objects in one GpuResource might be governed by different
    // lifecycles.
    //
    //   - Budgeted / not budgeted is marked with SkBudgeted::kYes /
    //     SkBudgeted::kNo. This was previously
    //     GrGpuResource::kCached_LifeCycle, GrGpuResource::kUncached_LifeCycle.
    //   - "References wrapped object" is marked with a boolean. This was
    //     previously GrGpuResource::kBorrowed_LifeCycle,
    //     GrGpuResource::kAdopted_LifeCycle for GrGpuResource.
    //   - Backend object ownership status is associated with
    //     GrBackendObjectOwnership for the backend object handles.
    //
    // The resource type leaf constructors, such as GrGLTexture or
    // GrGLTextureRenderTarget, take a "budgeted" parameter, which is passed to
    // GrGpuResource::registerWithCache(). The intermediary constructors, such
    // as the GrGLTexture constructors used by GrGLTextureRenderTarget, do not
    // take a "budgeted" parameter and do not call registerWithCache(). This
    // removes the need for tagging GrGpuResource-derived subclass constructors
    // with a "Derived" parameter.
    //
    // Instances that wrap backend objects are registered with a new function,
    // GrGpuResource::registerWithCacheWrapped(). Resources referencing wrapped
    // objects are always inserted into the cache with budget decision kNo.
    //
    // The "budgeted" parameter was removed from classes such as
    // StencilAttachment, as they are always cached and never wrap any external
    // backend objects. The concept "external" was removed from member function
    // names: the API refers to the objects as "wrapped", so all related
    // functions use that term consistently. No change in functionality.
    BenchResource(GrGpu* gpu)
        : INHERITED(gpu) {
        this->registerWithCache(SkBudgeted::kYes);
    }

    // History note (commit of 2015-05-19, "Make GrResourceCache perf less
    // sensitive to key length change";
    // review: https://codereview.chromium.org/1132723003):
    //
    // The memcmp in GrResourceKey is called when SkTDynamicHash jumps the
    // slots to find the hash by an index. Comparing the hash first avoids most
    // of the memcmps. This is important because small changes in key data
    // length can cause big performance regressions. The theory is that a key
    // length change causes different hash values, and these hash values might
    // trigger memcmps that originally weren't there.
    //
    // The commit adds a few specialized benches to grresourcecache_add to test
    // different key lengths. They run only on release builds, because on debug
    // the SkTDynamicHash validation takes too long, and adding many such
    // delays to development test runs would be unproductive. On release the
    // tests are quite fast.
    //
    // Effect of the patch on the added tests, amd64 (before -> after, ratio):
    //   grresourcecache_find_10   738us  -> 768us   1.04x
    //   grresourcecache_find_2    472us  -> 476us   1.01x
    //   grresourcecache_find_25   841us  -> 845us   1x
    //   grresourcecache_find_4    565us  -> 531us   0.94x
    //   grresourcecache_find_54   1.18ms -> 1.1ms   0.93x
    //   grresourcecache_find_5    834us  -> 749us   0.9x
    //   grresourcecache_find_3    620us  -> 542us   0.87x
    //   grresourcecache_add_25    2.74ms -> 2.24ms  0.82x
    //   grresourcecache_add_56    3.23ms -> 2.56ms  0.79x
    //   grresourcecache_add_54    3.34ms -> 2.62ms  0.78x
    //   grresourcecache_add_5     2.68ms -> 2.1ms   0.78x
    //   grresourcecache_add_10    2.7ms  -> 2.11ms  0.78x
    //   grresourcecache_add_2     1.85ms -> 1.41ms  0.76x
    //   grresourcecache_add       1.84ms -> 1.4ms   0.76x
    //   grresourcecache_add_4     1.99ms -> 1.49ms  0.75x
    //   grresourcecache_add_3     2.11ms -> 1.55ms  0.73x
    //   grresourcecache_add_55    39ms   -> 13.9ms  0.36x
    //   grresourcecache_find_55   23.2ms -> 6.21ms  0.27x
    //
    // On arm64 the results are similar. On arm_v7_neon, the results lack the
    // discontinuity at 55:
    //   grresourcecache_add       4.06ms -> 4.26ms  1.05x
    //   grresourcecache_add_2     4.05ms -> 4.23ms  1.05x
    //   grresourcecache_find      1.28ms -> 1.3ms   1.02x
    //   grresourcecache_find_56   3.35ms -> 3.32ms  0.99x
    //   grresourcecache_find_2    1.31ms -> 1.29ms  0.99x
    //   grresourcecache_find_54   3.28ms -> 3.24ms  0.99x
    //   grresourcecache_add_5     6.38ms -> 6.26ms  0.98x
    //   grresourcecache_add_55    8.44ms -> 8.24ms  0.98x
    //   grresourcecache_add_25    7.03ms -> 6.86ms  0.98x
    //   grresourcecache_find_25   2.7ms  -> 2.59ms  0.96x
    //   grresourcecache_find_4    1.45ms -> 1.38ms  0.95x
    //   grresourcecache_find_10   2.52ms -> 2.39ms  0.95x
    //   grresourcecache_find_55   3.54ms -> 3.33ms  0.94x
    //   grresourcecache_find_5    2.5ms  -> 2.32ms  0.93x
    //   grresourcecache_find_3    1.57ms -> 1.43ms  0.91x
    //
    // The extremely slow case, 55, is postulated to be due to the index jump
    // collisions running the memcmp. This is not visible on arm_v7_neon,
    // probably because the hash function produces different results on 32-bit
    // architectures.
    //
    // The change is needed for extending the path cache key in the Gr
    // NV_path_rendering codepath, which in turn is needed in order to add
    // dashed paths to the path cache.
    static void ComputeKey(int i, int keyData32Count, GrUniqueKey* key) {
        static GrUniqueKey::Domain kDomain = GrUniqueKey::GenerateDomain();
        GrUniqueKey::Builder builder(key, kDomain, keyData32Count);
        for (int j = 0; j < keyData32Count; ++j) {
            builder[j] = i + j;
        }
    }

private:
    size_t onGpuMemorySize() const override { return 100; }
    typedef GrGpuResource INHERITED;
};

static void populate_cache(GrGpu* gpu, int resourceCount, int keyData32Count) {
    for (int i = 0; i < resourceCount; ++i) {
        GrUniqueKey key;
        BenchResource::ComputeKey(i, keyData32Count, &key);
        GrGpuResource* resource = new BenchResource(gpu);
        resource->resourcePriv().setUniqueKey(key);
        resource->unref();
    }
}

class GrResourceCacheBenchAdd : public Benchmark {
public:
    GrResourceCacheBenchAdd(int keyData32Count)
        : fFullName("grresourcecache_add")
        , fKeyData32Count(keyData32Count) {
        if (keyData32Count > 1) {
            fFullName.appendf("_%d", fKeyData32Count);
        }
    }

    bool isSuitableFor(Backend backend) override {
        return backend == kNonRendering_Backend;
    }

protected:
    const char* onGetName() override {
        return fFullName.c_str();
    }

    void onDraw(int loops, SkCanvas* canvas) override {
        sk_sp<GrContext> context(GrContext::CreateMockContext());
        if (nullptr == context) {
            return;
        }
        // Set the cache budget to be very large so no purging occurs.
        context->setResourceCacheLimits(CACHE_SIZE_COUNT, 1 << 30);

        GrResourceCache* cache = context->getResourceCache();

        // Make sure the cache is empty.
        cache->purgeAllUnlocked();
        SkASSERT(0 == cache->getResourceCount() && 0 == cache->getResourceBytes());

        GrGpu* gpu = context->getGpu();

        for (int i = 0; i < loops; ++i) {
            populate_cache(gpu, CACHE_SIZE_COUNT, fKeyData32Count);
            SkASSERT(CACHE_SIZE_COUNT == cache->getResourceCount());
        }
    }

private:
    SkString fFullName;
    int fKeyData32Count;
    typedef Benchmark INHERITED;
};

class GrResourceCacheBenchFind : public Benchmark {
public:
    GrResourceCacheBenchFind(int keyData32Count)
        : fFullName("grresourcecache_find")
        , fKeyData32Count(keyData32Count) {
        if (keyData32Count > 1) {
            fFullName.appendf("_%d", fKeyData32Count);
        }
    }

    bool isSuitableFor(Backend backend) override {
        return backend == kNonRendering_Backend;
    }

protected:
    const char* onGetName() override {
return fFullName.c_str();
}
void onDelayedSetup() override {
fContext.reset(GrContext::CreateMockContext());
if (!fContext) {
return;
}
// Set the cache budget to be very large so no purging occurs.
fContext->setResourceCacheLimits(CACHE_SIZE_COUNT, 1 << 30);
GrResourceCache* cache = fContext->getResourceCache();
// Make sure the cache is empty.
cache->purgeAllUnlocked();
SkASSERT(0 == cache->getResourceCount() && 0 == cache->getResourceBytes());
GrGpu* gpu = fContext->getGpu();
populate_cache(gpu, CACHE_SIZE_COUNT, fKeyData32Count);
}
void onDraw(int loops, SkCanvas* canvas) override {
if (!fContext) {
return;
}
GrResourceCache* cache = fContext->getResourceCache();
SkASSERT(CACHE_SIZE_COUNT == cache->getResourceCount());
for (int i = 0; i < loops; ++i) {
for (int k = 0; k < CACHE_SIZE_COUNT; ++k) {
GrUniqueKey key;
BenchResource::ComputeKey(k, fKeyData32Count, &key);
sk_sp<GrGpuResource> resource(cache->findAndRefUniqueResource(key));
SkASSERT(resource);
}
}
}
private:
sk_sp<GrContext> fContext;
SkString fFullName;
int fKeyData32Count;
typedef Benchmark INHERITED;
};
DEF_BENCH( return new GrResourceCacheBenchAdd(1); )
#ifdef SK_RELEASE
// Only on release because on debug the SkTDynamicHash validation is too slow.
DEF_BENCH( return new GrResourceCacheBenchAdd(2); )
DEF_BENCH( return new GrResourceCacheBenchAdd(3); )
DEF_BENCH( return new GrResourceCacheBenchAdd(4); )
DEF_BENCH( return new GrResourceCacheBenchAdd(5); )
DEF_BENCH( return new GrResourceCacheBenchAdd(10); )
DEF_BENCH( return new GrResourceCacheBenchAdd(25); )
DEF_BENCH( return new GrResourceCacheBenchAdd(54); )
DEF_BENCH( return new GrResourceCacheBenchAdd(55); )
DEF_BENCH( return new GrResourceCacheBenchAdd(56); )
#endif
DEF_BENCH( return new GrResourceCacheBenchFind(1); )
#ifdef SK_RELEASE
DEF_BENCH( return new GrResourceCacheBenchFind(2); )
DEF_BENCH( return new GrResourceCacheBenchFind(3); )
DEF_BENCH( return new GrResourceCacheBenchFind(4); )
DEF_BENCH( return new GrResourceCacheBenchFind(5); )
DEF_BENCH( return new GrResourceCacheBenchFind(10); )
DEF_BENCH( return new GrResourceCacheBenchFind(25); )
DEF_BENCH( return new GrResourceCacheBenchFind(54); )
DEF_BENCH( return new GrResourceCacheBenchFind(55); )
DEF_BENCH( return new GrResourceCacheBenchFind(56); )
#endif
#endif