Improve the Performance of your Application with the Right Implementation Topology
When you deploy a distributed database, it’s important to think through how the architecture and topology will affect your data and ultimately, your application. Distributed systems require you to think through both logical and physical details. With Distributed SQL, nailing the topology is paramount.
In this webinar, we walk through how to choose a topology pattern that matches your business needs and optimizes reliable, global access to your data. The webinar includes a live demo in which you can see the impact of an implementation topology on latency.
3 concepts you should be familiar with to deploy Distributed SQL
CockroachDB stores all user data (tables, indexes, etc.) and almost all system data in a giant sorted map of key-value pairs. This keyspace is divided into "ranges", contiguous chunks of the keyspace, so that every key can always be found in a single range.
From a SQL perspective, a table and its secondary indexes initially map to a single range, where each key-value pair in the range represents a single row in the table (also called the primary index because the table is sorted by the primary key) or a single row in a secondary index. As soon as that range reaches 64 MiB in size, it splits into two ranges. This process continues for these new ranges as the table and its indexes continue growing.
Raft is a consensus protocol––an algorithm which makes sure that your data is safely stored on multiple machines, and that those machines agree on the current state even if some of them are temporarily disconnected.
Raft organizes all nodes that contain a replica of a range into a group--unsurprisingly called a Raft group. Each replica in a Raft group is either a "leader" or a "follower". The leader, which is elected by Raft and long-lived, coordinates all writes to the Raft group. It heartbeats followers periodically and keeps their logs replicated. In the absence of heartbeats, followers become candidates after randomized election timeouts and proceed to hold new leader elections.
Once a node receives a BatchRequest for a range it contains, it converts those KV operations into Raft commands. Those commands are proposed to the Raft group leader––which is what makes it ideal for the leaseholder and the Raft leader to be one in the same––and written to the Raft log.
For a great overview of Raft, we recommend The Secret Lives of Data.
For each range, one of the replicas is the "leader" for write requests. Via the Raft consensus protocol, this replica ensures that a majority of replicas (the leader and enough followers) agree, based on their Raft logs, before committing the write. The Raft leader is almost always the same replica as the leaseholder. This has a tremendous impact on latencies, which we demo in the webinar.
Implementation Topology Pattern Options
The implementation topology patter that you choose will impact performance as well as your control over where data physically lives, which has an impact on latency and compliance with data privacy and storage regulations.
Single Region Topology Patterns
When your clients are in a single geographic region, choosing a topology is straightforward.
- Development: While developing an application against CockroachDB, it's sufficient to deploy a single-node cluster close to your test application, whether that's on a single VM or on your laptop.
- Basic Production: When you're ready to run CockroachDB in production in a single region, it's important to deploy at least 3 CockroachDB nodes to take advantage of CockroachDB's automatic replication, distribution, rebalancing, and resiliency capabilities.
Multi Region Topology Patterns
When your clients are in multiple geographic regions, it is important to deploy your cluster across regions properly and then carefully choose the right topology for each of your tables. Not doing so can result in unexpected latency and resiliency. There are a number of different multi-region topologies ot choose from depending on your table requirements. The options are below. Details for each option are available in our docs stable and are discussed in the webinar.
- Geo-Partitioned Replicas Topology
- Geo-Partitioned Leaseholders Topology
- Duplicate Indexes Topology
- Follower Reads Topology
- Follow-the-Workload Topology
If you’ve watched the webinar and you’d like to ask more questions please join our CockroachDB Community Slack channel to chat with CockroachDB users and engineers.