Geographically distributed databases like CockroachDB offer a number of benefits, including reliability, cost-effective deployments, and more. Critics often counter that distributed databases increase latency. What if a database could offer all of the benefits of distribution, but also provide low latency?
With this challenge in mind, we set out to minimize latency in CockroachDB while still providing exceptional reliability for mission-critical workloads. We built “follow-the-workload” as a key feature to improve performance and give database administrators (DBAs) additional control. This feature will make use of geo-partitioning, which lets DBAs specify zone configurations that create data access patterns for improved latency.
This blog post is the first of a two-part series exploring the use of “follow-the-workload” and replication zone configurations to beat the latency-survivability tradeoff.
Before diving into how “follow-the-workload” can be used to improve performance, let’s first give a quick overview of how CockroachDB distributes data. Unless instructed otherwise (via Configuring Replication Zones), CockroachDB automatically shards data and spreads it evenly across all nodes to increase data survivability and availability.
For an introduction to how CockroachDB breaks data up into pieces and automatically distributes those pieces, see Scalable SQL Made Easy: How CockroachDB Automates Operations.
We designed CockroachDB to accept some additional latency in exchange for strong consistency and improved survivability. Breaking data up and spreading copies of it across multiple nodes increases survivability. However, this same design increases the distance between the nodes that hold the data, which adds latency because the copies must communicate with each other from afar.
We recognize that, given this tradeoff, we need to provide the most performant system possible. As such, we set two main goals as we continued to evolve CockroachDB:
- Increase user control of the latency-survivability tradeoff
- Optimize performance based on DBA choices in the latency-survivability tradeoff
Improved User Controls
As discussed above, latency increases with the distance between the data and the requests that access it. While CockroachDB can automatically make this tradeoff for customers, many users want more influence over this decision based on their business needs.
CockroachDB gives users controls to set data location and thereby reduce latency. For example, table- or database-level replication zone configurations can be used to specify exactly in which localities the data in a given table or database should or should not be placed.
Localities are key-value labels associated with each running CockroachDB process via a command-line flag. Users must set localities through the locality flag to take advantage of replication zone configurations and “follow-the-workload.”
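To make the idea of key-value labels concrete, here is a minimal Python sketch that turns a locality value such as `region=us-east,datacenter=us-east-1` into an ordered list of key-value pairs. The label names and values are assumptions chosen for illustration, and `parse_locality` is a hypothetical helper, not CockroachDB's own parser.

```python
from typing import List, Tuple


def parse_locality(flag_value: str) -> List[Tuple[str, str]]:
    """Split a comma-separated locality string into ordered (key, value) pairs."""
    tiers = []
    for part in flag_value.split(","):
        key, sep, value = part.strip().partition("=")
        # Every tier must look like key=value; reject anything else.
        if not sep or not key or not value:
            raise ValueError(f"malformed locality tier: {part!r}")
        tiers.append((key, value))
    return tiers


# Illustrative labels only; real deployments choose their own keys and values.
node_locality = parse_locality("region=us-east,datacenter=us-east-1")
print(node_locality)
```

The pairs are kept in order so that a broader label (here `region`) comes before a narrower one (here `datacenter`).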
Replication zone configuration constraints work by hooking into the rebalancing decision-making process, as explained in Scalable SQL Made Easy: How CockroachDB Automates Operations. CockroachDB prioritizes these constraints before all other factors when ranking possible nodes for data distribution. Only after resolving these constraints will CockroachDB address other considerations, such as diversity or balance.
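The ordering described above can be sketched as a sort with a compound key. This is a simplified model under stated assumptions, not CockroachDB's actual allocator: the `Candidate` class, the `diversity` measure, and the use of range counts as a stand-in for balance are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    node_id: int
    locality: Dict[str, str]  # e.g. {"region": "eu-west"}
    range_count: int          # hypothetical balance signal: ranges already held


def satisfies(candidate: Candidate, required: Dict[str, str]) -> bool:
    """True if the node carries every required locality label."""
    return all(candidate.locality.get(k) == v for k, v in required.items())


def diversity(candidate: Candidate, existing: List[Dict[str, str]]) -> int:
    """Count existing replicas whose locality differs from the candidate's."""
    return sum(1 for loc in existing if loc != candidate.locality)


def rank(candidates, required, existing):
    """Order candidates: constraints first, then diversity, then balance."""
    return sorted(
        candidates,
        key=lambda c: (
            not satisfies(c, required),  # False sorts first, so matches lead
            -diversity(c, existing),     # more diverse placements next
            c.range_count,               # emptier nodes break remaining ties
        ),
    )


candidates = [
    Candidate(1, {"region": "us-east"}, 10),
    Candidate(2, {"region": "eu-west"}, 50),
    Candidate(3, {"region": "eu-west"}, 20),
]
ranked = rank(candidates, {"region": "eu-west"}, [{"region": "eu-west"}])
print([c.node_id for c in ranked])  # [3, 2, 1]
```

Node 1 would add the most diversity and holds the fewest ranges, yet it ranks last because it fails the constraint, which is the prioritization the paragraph describes.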
Replication zone configurations also allow users to constrain a table’s location for legal and compliance reasons. For example, a table that contains European financial transactions can be restricted to European datacenters. Replication zone configurations currently work only on entire tables or databases (i.e., tables that have not been split). We plan to address this limitation through geo-partitioning, an upcoming enterprise feature slated for release in 2.0.
Geo-partitioning will allow DBAs to specify replication zone configurations on subsets of a table. This can be used to specify exactly in which localities individual rows of a table should be placed based on the value of some column(s) in the row.
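As a rough illustration of row-level placement, the following Python sketch maps a row to a locality constraint based on the value of one column. It does not show CockroachDB syntax; the `country` column, the partition names, and the region labels are all hypothetical.

```python
# Hypothetical partitions: each maps a set of column values to a locality constraint.
PARTITIONS = [
    {"name": "europe", "countries": {"DE", "FR", "GB"}, "locality": {"region": "eu-west"}},
    {"name": "north_america", "countries": {"US", "CA"}, "locality": {"region": "us-east"}},
]

# An empty constraint means the row may be placed anywhere.
DEFAULT_LOCALITY = {}


def locality_for_row(row, column="country"):
    """Return the locality constraint of the partition that owns this row."""
    value = row.get(column)
    for partition in PARTITIONS:
        if value in partition["countries"]:
            return partition["locality"]
    return DEFAULT_LOCALITY


print(locality_for_row({"id": 1, "country": "FR"}))  # {'region': 'eu-west'}
print(locality_for_row({"id": 2, "country": "JP"}))  # {}
```

Rows that match no partition fall back to the unconstrained default, which mirrors how data is spread across all nodes when no zone configuration applies.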
We believe that this feature will be useful when table- or database-level replication zone configurations are not specific enough for a customer’s business needs and the alternative would be onerous table-splitting. “Follow-the-workload” will automatically take advantage of the tendency of geo-partitioning to create similar data access patterns (e.g., by location) to improve latency.
CockroachDB is built to be a high-performance distributed database. We hope that by using features like replication zone configurations we will continue to “make data easy” for our users.