How to benchmark and tune Google Cloud

Last edited on April 1, 2021

    Engineers on cloud performance teams can spend their entire workday tuning and optimizing cloud configurations. We caught up with Steve Dietz, a Google software engineer focused on Google Cloud performance, to get advice on how to performance tune, provision, and benchmark Google Cloud Platform (GCP).

    The conversation led to some revealing insights into how to configure GCP optimally for your workloads. We’ve gathered some of those insights below. To catch the whole conversation with Steve, head over here.

    #1: Use a Performance Benchmarking Tool on Your Cloud

    Performance benchmarking tools like the open-source PerfKit Benchmarker (which is maintained by the Google Cloud team) allow anyone to measure the end-to-end time to provision resources in the cloud. PerfKit reports on standard peak performance metrics, including latency, throughput, time-to-complete, and input/output operations per second (IOPS).
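
    To make those metrics concrete, here is a minimal Python sketch of how latency percentiles, throughput, and time-to-complete can be derived from repeated timings of a single operation. It is not PerfKit code; the function names `measure` and `summarize` are assumptions made for illustration. When the timed operation is a single read or write, the throughput figure corresponds to IOPS.

```python
import statistics
import time
from typing import Callable, Dict, List


def summarize(latencies: List[float], elapsed: float) -> Dict[str, float]:
    """Turn raw per-operation latencies (seconds) into summary metrics."""
    # quantiles(n=100) returns 99 cut points; index 98 is the 99th percentile.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "ops": len(latencies),
        "time_to_complete_s": elapsed,
        "throughput_ops_per_s": len(latencies) / elapsed,
        "latency_p50_ms": statistics.median(latencies) * 1000.0,
        "latency_p99_ms": cuts[98] * 1000.0,
    }


def measure(operation: Callable[[], None], iterations: int = 1000) -> Dict[str, float]:
    """Run `operation` repeatedly and report latency and throughput."""
    if iterations < 2:
        raise ValueError("need at least 2 iterations to compute percentiles")
    latencies: List[float] = []
    start = time.perf_counter()
    for _ in range(iterations):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return summarize(latencies, elapsed)


if __name__ == "__main__":
    # Hypothetical stand-in workload; replace with a real request or I/O call.
    print(measure(lambda: sum(range(10_000)), iterations=200))
```

    Reporting a high percentile alongside the median matters because a small number of slow operations can dominate the experience of a workload even when the typical latency looks healthy.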

    A benchmarking tool should help you understand what’s happening in an environment, including the latency between components in different regions. To that end, PerfKit offers a publicly available dashboard showing cross-region network latency results between all Google Cloud regions. Below are the results of Google’s own all-region to all-region round-trip latency tests, using n1-standard-2 machine types and internal IP addresses. Anyone can reproduce the results by running a snippet of code available on the PerfKit site.

    [Screenshot: all-region to all-region round-trip latency results]
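
    For a rough sense of what a round-trip measurement involves, the sketch below times TCP connections from one machine to a set of targets and reports the median per target. This is a stand-in rather than the method PerfKit uses, and the helper names, the default port 22, and the target addresses in the usage note are all assumptions. Run from one VM, it yields one row of a region-to-region matrix.

```python
import socket
import statistics
import time
from typing import Dict, List


def tcp_rtt_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> List[float]:
    """Return TCP connect times to host:port, in milliseconds."""
    rtts: List[float] = []
    for _ in range(samples):
        t0 = time.perf_counter()
        # Completing the TCP handshake takes roughly one network round trip.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        rtts.append((time.perf_counter() - t0) * 1000.0)
    return rtts


def latency_row(targets: Dict[str, str], port: int = 22, samples: int = 5) -> Dict[str, float]:
    """Median connect time from this machine to each named target."""
    return {
        name: statistics.median(tcp_rtt_ms(address, port, samples))
        for name, address in targets.items()
    }
```

    A call such as `latency_row({"us-east1": "10.0.0.2", "europe-west1": "10.0.0.3"})` would return a dictionary of medians keyed by region name. The region labels and internal IP addresses here are placeholders, and the targets must accept TCP connections on the chosen port.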

    In addition to tools like PerfKit, there are a number of resources to help GCP users get the best performance out of their product. The blog post “Performance art: Making cloud network performance benchmarking faster and easier” and a follow-up report on measuring networking latency in the cloud can help you get started with Google Cloud benchmarking and data collection.

    #2: Read Benchmarking Research

    Cockroach Labs set out to better understand customer needs by conducting original research. This process first involved gauging how well CockroachDB performed while running in cloud environments from different providers. When the team discovered a significant difference in performance between AWS and GCP, it published its inaugural cloud report in 2018 to help customers make informed decisions when choosing a cloud provider. The 2021 version of the Cockroach Labs Cloud Report goes even further, using a series of microbenchmarks and typical customer workloads — such as CPU, network, storage, and a derivative of TPC-C — to compare the performance of AWS, Azure and GCP.

    The Cloud Report benchmarks cloud providers against transactional (OLTP) workloads. As the researchers noted in the report and in the reproduction steps, all of the benchmarks were selected with transactional workloads in mind. A machine learning-focused workload may be better served by using a different set of benchmarks to compare cloud performance.

    #3: Evaluate Workloads Before Configuring GCP

    One of the most common questions when setting up a cloud deployment is: should I use the provider’s default configurations?

    When Cockroach Labs set out to benchmark AWS, Azure and GCP, it needed to have enough constant factors between the three providers to ensure accurate results. The team accomplished this by using each provider’s defaults, so that misconfigurations or configuration bias wouldn’t affect the testing outcomes.

    For users, default configurations may be ideal for some workloads. Before altering the default machine configurations, consider the types of machines on offer (family, series, machine type, etc.), for example N2 with Intel versus N2D with AMD, and evaluate whether one may be better suited for your workload. One of the discoveries in the 2021 Cloud Report was that machines running Intel processors performed exceptionally well on single-core tests, but machines running Amazon’s Graviton2 and AMD processors performed better on the multi-core tests.
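
    The distinction between single-core and multi-core results can be illustrated with a small Python sketch that runs the same CPU-bound task once on one core and then in parallel across all available cores. This is a toy workload, not the benchmark the Cloud Report used, and the function names are assumptions made for illustration.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor
from typing import Optional


def burn(n: int) -> int:
    """CPU-bound toy workload: sum of squares of the integers below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_single_core(n: int, tasks: int) -> float:
    """Seconds to run `tasks` copies of the workload one after another."""
    t0 = time.perf_counter()
    for _ in range(tasks):
        burn(n)
    return time.perf_counter() - t0


def time_multi_core(n: int, tasks: int, workers: Optional[int] = None) -> float:
    """Seconds to run `tasks` copies of the workload across worker processes."""
    t0 = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(burn, [n] * tasks))
    return time.perf_counter() - t0


if __name__ == "__main__":
    n = 2_000_000
    cores = os.cpu_count() or 1
    single = time_single_core(n, tasks=1)
    multi = time_multi_core(n, tasks=cores, workers=cores)
    print(f"single-core: {1 / single:.2f} tasks/s")
    print(f"multi-core ({cores} workers): {cores / multi:.2f} tasks/s")
```

    Running a script like this on two candidate machine types gives a quick first impression of whether a workload that is mostly serial or mostly parallel would favor one over the other, before committing to a fuller benchmark.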

    Learn More About Provisioning and Benchmarking

    Optimizing and benchmarking your cloud infrastructure involves a lot of nuance and fine-tuning. Our suggestions above offer a starting place. For more advice on benchmarking and provisioning GCP, listen to the full conversation between GCP and Cockroach Labs.