Using tunable controls for low latency in CockroachDB

Using tunable controls for low latency in CockroachDB
```

Geographically distributed databases like CockroachDB offer a number of benefits including reliability, cost-effective deployments, and more. Critics often counter that distributed databases increase latency. What if a database could offer all of the benefits of distribution, but also provide low latency?

```

With this challenge in mind, we set out to minimize latency in CockroachDB, all the while providing exceptional reliability for mission-critical workloads. We built “follow-the-workload” to be a key feature to improve performance and provide additional control to database administrators (DBAs). This feature will make use of geo-partitioning to allow DBAs to specify zone configurations in order to create data access patterns for improved latency.

The blog post is the first of a two-part series exploring the use of “follow-the-workload” and replication zone configurations to beat the latency-survivability tradeoff

Latency-Survivability Tradeoff

Before diving into how “follow the workload” can be used to improve performance, let’s first give a quick overview of how CockroachDB distributes data. As an advanced distributed database, unless instructed otherwise (via Configuring Replication Zones), CockroachDB will automatically shard data and spread it evenly across all nodes in order to increase data survivability and availability as described here.

To read an introduction on how CockroachDB breaks data up into pieces and automatically distributes these pieces please consult Distributed SQL (NewSQL) Made Easy: How CockroachDB Automates Operations.

We designed CockroachDB to automatically trade an increase in latency for strong consistency and improved survivability. By breaking data up and spreading it across multiple nodes we increase survivability. However, as a result of this very design, we increase the distance between data nodes, thereby creating additional latency as the copies of data communicate with each other from afar.

We recognize that, given this tradeoff, we need to provide the most performant system possible. As such, we set two main goals as we continued to evolve CockroachDB:

  • Increase user control of the latency-survivability tradeoff
  • Optimize performance based on DBA choices in the latency-survivability tradeoff

Low Latency with Improved User Controls

As discussed above, latency increases in parallel with the distance between data and requests. While CockroachDB can automatically make this tradeoff for customers, many users express a greater desire to influence this decision based on their business needs.

CockroachDB provides users the controls to set data location and thereby reduce latency. For example, table/database level replication zone configurations can be used to specify exactly in which Localities the data in a given table or database should or should not be placed.

Localities are key-value labels associated with each running CockroachDB process via a command-line flag. Users must turn on Localities through the locality flag to take advantage of replication zones configurations and “follow-the-workload.”

Replication zone configuration constraints work by hooking into the rebalancing decision-making process, as explained in Distributed SQL (NewSQL) SQL Made Easy: How CockroachDB Automates Operations. CockroachDB prioritizes these constraints before all other factors when ranking possible nodes for data distribution. Only after resolving these constraints will CockroachDB address other considerations, such as diversity or balance.

Replication zone configurations also allow users to constrain a table’s location for legal and compliance reasons. For example, a table that contains European financial transactions, can be restricted to European datacenters. Replication zone configurations only work on entire tables or databases (e.g., non-split tables). We plan to address this limitation through geo-partitioning, an upcoming enterprise feature set for release in 2.0.

Future Work

Geo-Partitioning for Low Latency

Geo-partitioning will allow DBAs to specify replication zone configurations on subsets of a table. This can be used to specify exactly in which localities individual rows of a table should be placed based on the value of some column(s) in the row.

We believe that this feature will be useful when table/database level replication zone configurations need more specificity for customer business purposes (e.g., onerous table-splitting). “Follow-the-workload” will automatically take advantage of the propensity of geo-partitioning to create similar data access patterns (e.g., by location) to improve latency.

CockroachDB is built to be a high-performance distributed database. We hope that by using features like replication zone configurations we will continue to “make data easy” for our users.

Keep Reading

Geo-Partitioning, Data Residency, and Super Regions, OH MY!

As the world seems to get smaller, our applications are getting bigger and we are much more likely to have a global …

Read more