Cluster Telemetry
=================
### Contents ###
* [Overview](#overview)
* [Framework Usage](#framework_usage)
* [System Architecture](#system_architecture)
    + [System Diagram](#system_diagram)
    + [Detailed explanation of steps](#detailed_explanation)
* [Code](#code)
* [Contact Us](#contact_us)
Overview
--------
Cluster Telemetry (CT) allows you to run [telemetry](https://www.chromium.org/developers/telemetry) benchmarks, Lua scripts, and other tasks, with multiple repository patches applied, against Alexa's [top 1 million](http://s3.amazonaws.com/alexa-static/top-1m.csv.zip) web pages.
Developers can use the framework to measure the performance of their patch against the top subset of the internet on both Desktop and Android.
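
As a rough illustration of what "the top subset" means in practice, the sketch below picks the N best-ranked entries from text in the `rank,domain` CSV layout used by the Alexa top 1 million file. The function name `top_sites`, the sample rows, and the `http://` prefix are assumptions made for illustration; this is not a description of how CT builds its page sets.

```python
import csv
import io


def top_sites(csv_text, n):
    """Return URLs for the n best-ranked domains in `rank,domain` CSV text."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        rank, domain = row
        rows.append((int(rank), domain.strip()))
    rows.sort()  # order by rank in case the input is not already sorted
    # The URL scheme is an assumption; the CSV only lists bare domains.
    return ["http://" + domain for _, domain in rows[:n]]


# Illustrative rows only; the real file has one million entries.
SAMPLE = "1,example.com\n2,example.org\n3,example.net\n"

if __name__ == "__main__":
    print(top_sites(SAMPLE, 2))
```

Calling `top_sites` with `n=10000` on the full file would give a "top 10k" list of the kind referred to elsewhere on this page.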
SKP files are a binary format for the draw commands Chromium sends to Skia for rasterization. The project began with the goal of collecting a large repository of 10k SKP files. After incremental changes in approach, the repository has grown to ~910k SKP files, and the framework now supports running all telemetry benchmarks. The top-level feature request for this project was [skia:1268](https://bug.skia.org/1268).
A web application running on Google Compute Engine automates capturing new archives and running telemetry benchmarks at the click of a button. Results are emailed to the requester, and the web application keeps a complete history of runs with links to results. You can run telemetry benchmarks at http://ct.skia.org.
The framework can also run Lua scripts on the SKP repository to scrape web pages. Running a Lua scraping script on ~910k SKP files takes only a few minutes.
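
One way to see how a run over ~910k files can finish in minutes is to split the work across many workers. The sketch below divides a file count into contiguous, near-equal shards, one per worker. It is only an assumption about how such work could be partitioned: the function name `shard_ranges` and the worker count of 100 are hypothetical and are not taken from the CT code.

```python
def shard_ranges(total_items, num_workers):
    """Split range(total_items) into num_workers contiguous [start, end) ranges.

    Shard sizes differ by at most one item.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(total_items, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers each take one additional item.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 910,000 approximates the repository size; 100 workers is hypothetical.
    shards = shard_ranges(910_000, 100)
    print(shards[0], shards[-1])
```

With these hypothetical numbers, each worker would handle 9,100 SKP files.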
Most users will use these two features:
* Chromium Perf. Documentation [here](https://www.chromium.org/developers/cluster-telemetry). Webpage [here](https://ct.skia.org/chromium_perf/).
* Run Lua Scripts. Documentation about Lua bindings is [here](https://skia.org/user/special/lua). Webpage [here](https://ct.skia.org/lua_script/).
Framework Usage
---------------
The Chromium Perf tryserver in CT has been used to gather perf data over the top 10k web pages for the following Chromium projects:
* Slimming paint
* Performance data for layer squashing and compositing overlap map
* SkPaint in Graphics Context
* Culling
* New paint dictionary
blink-dev threads discussing how to make Chrome faster using the results gathered from CT:
* [Main thread attribution for top million sites](https://groups.google.com/a/chromium.org/d/msg/blink-dev/-R47hzmkdig/xILVgczlKgQJ)
* [Layout time for top 10k sites](https://groups.google.com/a/chromium.org/d/msg/blink-dev/fkRYGcIQN1g/_uYcAt6G8XsJ)
* [Perf profile for top million sites](https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/8qd5SmLF5n0)
Documents detailing data generated by the framework:
* [Loading measurement: alexa top 1,000](https://docs.google.com/a/chromium.org/document/d/1ca_Q7xePmCRqaYnHe7vkpCmKNFNLdDXvzgtUPt9iG8w/edit)
* [Loading measurement: alexa top million](https://docs.google.com/a/google.com/document/d/1hDDUUNE5OUV8eCjtOj7Ow6EZ2DSBCTjQirnA3Rp5pOg/edit)
* [Loading measurement: alexa top million netsim](https://docs.google.com/a/google.com/document/d/1cpLSSYpqi4SprkJcVxbS7af6avKM0qc-imxvkexmCZs/edit)
* [Perf profile - alexa top million sites](https://docs.google.com/a/google.com/document/d/1di__87watociuZj_dm22Cn72UM2xsZBXixbl8TCFQmw/edit)
The framework has also been used to run multiple Lua scripts to scrape the SKP repositories for the following:
chars-vs-glyphs, bitmap transform types, gradient color counter, 3 color gradient checks, etc.
These results have helped the Skia team determine which parts of the library to optimize and focus on.
All runs are recorded [here](https://ct.skia.org/history/).
System Architecture
-------------------
### System Diagram
![CT System Diagram](ct-system-diagram.svg)
### Detailed explanation of steps
1. User submits a Lua script task, a Performance task, or an Admin task (rebuild chrome, recreate pagesets, recreate webpage archives, capture SKPs) using the GCE web application [here](http://ct.skia.org).
2. Each task is exposed by the web application in JSON. The CT master polls the web application and picks up new tasks.
3. The master pushes new tasks to all the workers using the master scripts [here](https://skia.googlesource.com/buildbot/+/master/ct/go/master_scripts/). The master scripts then check to see when the workers are done with the requested task.
4. The workers execute the task using the worker scripts [here](https://skia.googlesource.com/buildbot/+/master/ct/go/worker_scripts/). All generated artifacts (CSV files, logs, SKP files, archives, etc.) are then stored locally and copied to Google Storage. Worker logs are available [here](https://uberchromegw.corp.google.com/i/skia-ct-master/all_logs).
5. The master scripts periodically check the workers to see when they are done with the requested task. Once the workers are done, the generated artifacts are read from Google Storage and consolidated (if required).
6. The master scripts then email the results of the task to the user who requested it. The master scripts also update the status of the task to completed in the web application.
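
The steps above can be sketched as a small in-memory simulation. This is a simplified illustration, not the actual master or worker scripts (which are written in Go): the JSON fields (`id`, `type`, `requester`), the function names, and the use of plain callables as workers are all assumptions, and Google Storage and email are left out.

```python
import json


def pick_new_tasks(tasks_json, seen_ids):
    """Parse the task list and return tasks not handled yet (step 2)."""
    new_tasks = []
    for task in json.loads(tasks_json):
        if task["id"] not in seen_ids:
            seen_ids.add(task["id"])
            new_tasks.append(task)
    return new_tasks


def run_task(task, workers):
    """Fan a task out to every worker and consolidate the results (steps 3-5).

    Each worker is a plain callable here; real workers are separate machines
    that copy their artifacts to Google Storage.
    """
    outputs = [worker(task) for worker in workers]
    consolidated = [row for rows in outputs for row in rows]
    return {"task_id": task["id"], "requester": task["requester"], "rows": consolidated}


def make_worker(name):
    """Build a fake worker that returns one result row per task."""
    def worker(task):
        return ["%s ran %s" % (name, task["type"])]
    return worker


# Hypothetical payload; the real JSON schema is not described in this document.
TASKS_JSON = json.dumps([
    {"id": 1, "type": "lua_script", "requester": "user@example.com"},
    {"id": 2, "type": "chromium_perf", "requester": "user@example.com"},
])

if __name__ == "__main__":
    seen = set()
    workers = [make_worker("worker1"), make_worker("worker2")]
    for task in pick_new_tasks(TASKS_JSON, seen):
        result = run_task(task, workers)
        # Step 6 would email result["requester"] and mark the task completed.
        print(result["task_id"], result["rows"])
    # Polling the same payload again yields no new tasks.
    print(pick_new_tasks(TASKS_JSON, seen))
```

Tracking seen task IDs keeps a repeated poll from dispatching the same task twice, which is one simple way to make polling safe to repeat.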
Code
----
Cluster Telemetry is primarily written in Go, with a few Python scripts. The framework lives in [master/ct](https://skia.googlesource.com/buildbot/+/master/ct).
Contact Us
----------
If you have questions, please contact rmistry@ directly.