Geo-Partitioning: What Global Data Actually Looks Like

As we’ve written about previously, geographically distributed databases like CockroachDB offer a number of benefits including reliability, security, and cost-effective deployments. You shouldn’t have to sacrifice these upsides to realize impressive throughput and low latencies.

Distribution, by definition, creates some latency as data must travel from one node to another. We built CockroachDB to make it easy to minimize the latency created in geo-distributed clusters. We can minimize latency by minimizing the distance between where SQL queries are issued and where the data to satisfy those queries resides. While this makes intuitive sense, the simplicity of this basic concept can mask the complexities involved in setting up a global deployment. Luckily, CockroachDB makes this easy with geo-partitioning.

This blog post presents a detailed walkthrough of a new enterprise feature, geo-partitioning, which improves performance by reducing latency.

You can try geo-partitioning before it’s available in our 2.0 release (coming in April) by downloading the latest 2.0 beta and signing up for a free 30-day enterprise trial.

Geo-partitioning defined

Geo-partitioning grants developers row-level replication control. By default, CockroachDB lets you control replication at the table level: you choose which nodes house which tables. With geo-partitioning, you control which nodes house data at row-level granularity. This lets you keep each customer's data close to that customer, which reduces the distance it must travel, thereby reducing latency and improving user experience. To geo-partition a table:

  1. Define location-based partitions while creating a table.
  2. Create location-specific zone configurations.
  3. Apply the zone configurations to the corresponding partitions.

US Asia-Pacific role-play

Imagine yourself as the proud developer of the multinational (and fictional) Roachmart online storefront with many users in both the United States and Asia-Pacific.

As a customer-oriented developer, you want to provide the best experience for your users by minimizing latencies. But you don't want this low latency to come at the cost of increased organizational complexity, or to require you to run multiple databases. With this goal in mind, you decide to deploy a single, global CockroachDB cluster that uses geo-partitioning to keep the data for your United States users in the United States and the data for your Asia-Pacific users in Asia.

First, create a users table that is partitioned by zones.

CREATE TABLE users (zone STRING, id SERIAL, name STRING, …,
    PRIMARY KEY (zone, id))  -- partition columns must prefix the primary key
PARTITION BY LIST (zone) (
    PARTITION asia VALUES IN ('JP', 'TW', …),
    PARTITION united_states VALUES IN ('US')
);

Tables that store per-user data, such as orders or posts, and that should follow the same partitioning rules can easily be “interleaved” into the users table.

CREATE TABLE orders (…) INTERLEAVE IN PARENT users (usr_zone, usr_id);
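A fuller, hypothetical version of the orders table makes the interleaving requirement explicit: the child table's primary key must begin with the parent's primary key columns (the extra columns here are illustrative):

```sql
CREATE TABLE orders (
    usr_zone STRING,   -- matches users.zone
    usr_id   INT,      -- matches users.id
    id       SERIAL,
    total    DECIMAL,
    PRIMARY KEY (usr_zone, usr_id, id)
) INTERLEAVE IN PARENT users (usr_zone, usr_id);
```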

Now you might wonder, what exactly are zones?

Regions and Availability Zones

Amazon EC2 and Google Compute Engine have parallel concepts called regions and zones. Each region can be imagined as a completely independent and separate data center site. Each region in turn includes multiple zones (called ‘availability zones’ by Amazon), which connect to each other with high-speed networking but are otherwise kept as isolated as possible. A single event occasionally causes unavailability in two different zones, but it is far less likely for a single event to cause unavailability in two different regions.

Replicas and Leaseholders

For fault tolerance, CockroachDB keeps multiple copies, called replicas, of each range of data distributed on different nodes. In this post, we assume Roachmart uses CockroachDB to keep 3 replicas, but you can configure this as needed.
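For example, the replication factor can be changed cluster-wide through the default replication zone. This is a sketch using the 2.0-era zone set syntax:

```shell
# Raise the default replication factor from 3 to 5.
$ echo 'num_replicas: 5' | ./cockroach zone set .default --insecure -f -
```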

CockroachDB balances replicas across nodes automatically to adapt to traffic patterns and machine failures. Writes require contacting a majority of replicas, but with the “leaseholder” optimization, reads need to consult only the one replica that holds the lease. We cover this in more detail in previous posts such as Consensus, Made Thrive.
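To see where a table's replicas and leaseholders currently live, you can ask CockroachDB directly; SHOW EXPERIMENTAL_RANGES is the 2.0-era statement for this:

```sql
-- Lists each range of the table along with its replicas and
-- the node currently holding the lease.
SHOW EXPERIMENTAL_RANGES FROM TABLE users;
```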

Local reads and writes

Back to your wildly successful international Roachmart application. A Korean user refreshes a page in the application, triggering an API call that Amazon or Google's load balancers route to the nearest API server, which sits in Taiwan. This API server then issues its SQL requests to a CockroachDB node in the same availability zone (asia-east1-b).

If this SQL query is a read, then CockroachDB can respond using the nearby leaseholder replica in asia-northeast1-a.

Image 1: Local Reads: only one local replica needed


If the query is a write, then CockroachDB can respond without any cross-Pacific network hops as long as a majority (in this case, 2 of 3) of the replicas (here, asia-northeast1-a and asia-east1-a) are in Asia-Pacific. The leaseholder (asia-northeast1-a) is one of these two. The remaining minority of replicas (in this case, just the one replica in us-west1-a) will update asynchronously without any compromise to consistency.

Image 2: Local Writes: two local replicas needed


Resilience to region unavailability

As discussed above, we must ensure that two replicas of each range in the Asia-Pacific partition live in Asia-Pacific data centers so that reads and writes stay local. You can enforce this with replication zone configurations, as outlined below:

$ ./cockroach start --locality \
    continent=asia,region=asia-east1,zone=asia-east1-a

$ echo "constraints: {'+continent=asia': 2}" |
    ./cockroach zone set --insecure roachmart.users.asia -f -
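The united_states partition would get a mirror-image configuration. This is a sketch; the continent tier value (north_america) is an assumption about how the US nodes' locality flags were set:

```shell
$ echo "constraints: {'+continent=north_america': 2}" |
    ./cockroach zone set --insecure roachmart.users.united_states -f -
```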

CockroachDB configures each node at startup with hierarchical information about locality. For Roachmart, the gateway node's continent is asia, its region is asia-east1, and its zone is asia-east1-a (public clouds don't expose rack information, but a private data center would likely include this in the locality flag), while the leaseholder replica is located in region asia-northeast1 and zone asia-northeast1-a. These localities are used as targets of replication zone configurations, and the system also uses them to keep replica diversity as high as possible.

Roachmart requires that two replicas remain in Asia to ensure low-latency reads and writes. CockroachDB will automatically attempt to keep the third (and final) replica in a different region (in this case, us-west) to ensure maximum diversity. This replica (located in us-west1-a) will be moved into Asia only in the event of a region-specific failure, such as unavailable or overloaded nodes.

CockroachDB will keep the two replicas within Asia in regions as geographically diverse as possible to increase survivability. For example, if there are nodes in two regions, CockroachDB will keep one of these two replicas in each. If there is only one region (either by design or due to temporary region unavailability), CockroachDB will automatically spread replicas across its zones to maximize survivability.

Consider the unlikely event of unavailability of the entire asia-northeast1 region, which means we've lost one replica of the range and are left with two. The replica in asia-east1-a becomes the leaseholder. Reads are still served without crossing the Pacific, but writes now temporarily require the other remaining replica (us-west1-a) to participate in consensus, adding a cross-Pacific hop.

Image 3: Data center failure: temporarily under replicated


Eventually, CockroachDB will replace the missing replica. Because this consumes network bandwidth and other resources, it waits five minutes (by default) before doing so. If the outage ends before the five minutes elapse, the original replica is reused. Otherwise, a replacement replica is created and, as always, placed to ensure maximum diversity. In this case, a different availability zone within the surviving region is the best we can do (asia-east1-b, since asia-east1-a already holds a replica).

Image 4: Data center failure: fully replicated in extended region outage


One region and multiple cloud provider deployments

You may wonder, what happens if my geographic area has only one region?

At the time of writing, both Amazon and Google each offer a single Australian region (though as we’ve seen, many other continents have more than one). Australian users can run CockroachDB across different cloud providers because it is not only cloud-agnostic but also supports multi-cloud deployments. You can assign some nodes to the Amazon Australia region and some to the Google Australia region. You could also stay within one cloud provider but expand to a nearby region that’s not in Australia, for example, the previously mentioned Asia-Pacific region.
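A multi-cloud Australian deployment might express the provider as its own locality tier. This is a sketch: the cloud tier name is an assumption, while ap-southeast-2 and australia-southeast1 are the providers' actual Sydney regions:

```shell
# One node in AWS Sydney...
$ ./cockroach start --locality \
    cloud=aws,region=ap-southeast-2,zone=ap-southeast-2a

# ...and one in GCP Sydney, joined into the same cluster.
$ ./cockroach start --locality \
    cloud=gcp,region=australia-southeast1,zone=australia-southeast1-a
```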

Final Thoughts

Ultimately, the distance inherent in global deployments means developers must always make a tradeoff between availability and latency. In typical applications, read queries are far more common than writes. Some applications can therefore keep reads Asia-Pacific-local (with only one replica located in Asia-Pacific) while accepting one cross-Pacific hop on writes (since the majority of replicas are not in Asia-Pacific).
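Such a read-heavy configuration could be sketched by pinning only a single replica to Asia-Pacific. Note that this trades away write locality, and whether the leaseholder lands on that replica is left to the system:

```shell
$ echo "constraints: {'+continent=asia': 1}" |
    ./cockroach zone set --insecure roachmart.users.asia -f -
```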

No matter the unique details of your application, CockroachDB offers you the tools to make the best tradeoff for your application.

You can try geo-partitioning before it’s available in GA by downloading the latest 2.0 beta and signing up for a free 30-day enterprise trial.

Illustration by Lea Heinrich
