3 tips for startups who chose CockroachDB over Postgres

It’s a bit of a race, isn’t it? You have to get your MVP out the door quickly and you need to use the right technology to get you across the finish line.

We’ve talked about building with the right tech stack from the start to ensure you can scale, and preventing yourself from building up too much tech debt. Now we are going to assume you made the right choice and are using CockroachDB (yes, we are biased).

Our customer success team has onboarded over 100+ startups to CockroachDB so that they can build scalable applications, grow their customer base, and ultimately create a profitable business. During an initial meeting with our team, our customers see a slide that looks something like this:

You may already be familiar with this “path to success” or “road to production” and perhaps you are already a distributed database expert. However, we see a lot of our customers that are just becoming familiar with CockroachDB get stuck in certain areas.

When you are still in the development phase, we encourage you to address the following three topics before moving into production…

CockroachDB & PostgreSQL: the same, but different

We see a lot of customers migrate from Postgres to CockroachDB for a variety of reasons. Since CockroachDB is PostgreSQL-wire compatible, it can make the transition easy. We also see a lot of customers that already have in-house PostgreSQL expertise, but are building an application that can benefit from horizontal scale, has strict high availability requirements, or needs to be deployed in a hybrid or multi-cloud topology – so they turn to CockroachDB instead.

While CockroachDB looks and feels a lot like PostgreSQL, there are a couple of important distinctions that qualify it as a SQL distributed database. And, if you take advantage of these features early on, it will help you down the road.

Correctness in CockroachDB

One crucial distinction is that CockroachDB delivers ACID compliant guarantees and serializable isolation even when there’s a high volume of competing distributed transactions. Meaning, regardless of how much or where you scale, and how much data you are collecting, your data will always be correct.

This feature proactively helps avoid data corruption issues. For example, to ensure that a write transaction does not interfere with a read transaction that starts after it, CockroachDB uses a timestamp cache which remembers when data was last read by ongoing transactions. This ensures that clients always observe serializable consistency with regards to other concurrent transactions.

This feature also has an impact on code design. While your data will always be correct, a large amount of queries/transactions can cause performance slowdowns if you’re not following best practices to avoid transaction contention. When possible, we recommend running queries/transactions in small batches rather than massive statements. Especially if those statements can cause transaction contention because they’re hitting the same table and/or the same rows.

All this is to say, be mindful of serialization and how CockroachDB will automatically prioritize correctness over performance. Even if your design is not optimal for CockroachDB at first, you may be able to run it successfully initially without any problems. But as you begin to scale your workloads, it will be important to take serialization into consideration to ensure your performance is optimal while you scale.

Data Locality in CockroachDB vs Postgres

Another distinction: Postgres is unable to tie data to a specific instance in a specific location, whereas CockroachDB has the ability to distribute and store data within defined geographic regions. Customers use this feature to help with compliance in countries where data must be domiciled, and to reduce latency for distributed users.

Locality is a prerequisite for multi-region deployments. Locality can be used to influence the location of data replicas in various ways using replication zones. When there is high latency between nodes (e.g., cross-availability zone deployments), CockroachDB uses locality by default to move range leases closer to the current workload, reducing network round trips and improving read performance, also known as “follow-the-workload”.

Given the nature of how locality works, it helps with resiliency as well which is always a bonus, and brings us to the next question…

How to maintain high availability

Based on the type of application you are building, you should have an associated high availability goal. For example, payment applications require guaranteed uptime so customers can always access their sensitive banking information.

It’s important to define your goals early on so you can architect your application to adhere to those goals. Perhaps you’re trying to stand up prod ASAP and it’s acceptable if your app has some downtime periodically for maintenance such as an upgrade to a single node. It’s okay to start here if that’s a tradeoff you are comfortable with.

If high-availability is a priority, we recommend a minimum configuration of 3 nodes per cluster. You can always increase the amount of nodes over time, but starting with a minimum of three nodes spread across three availability zones (AZ) is recommended.

In order to maintain high availability, there are a few rules you should adhere to:

If you’re upgrading to a new version, follow our documentation on rolling upgrades to avoid interrupting the clusters' overall health and operations.
Make sure you are decommissioning nodes properly so data is completely moved out of the node before it is shut down.
Set up the right connection pools to maintain network connectivity and allow you to reconnect during a node outage (which can happen unexpectedly during a network disruption or as part of a rolling upgrade). Connection pooling also has an impact on performance. Too few connections will result in high latency as each operation will wait for a pool to open up, whereas too many connections will also result in high latency as each connection thread is being run in parallel by the system. For guidance on sizing, validating, and using connection pools with CockroachDB, see use Connection Pools.

What about backup and restore? Because CockroachDB is designed with high fault tolerance, backups are primarily needed for disaster recovery through `RESTORE` whereas isolated issues (i.e. a node outage) do not require any intervention. However, as an operational best practice, we recommend taking regular backups of your data. And, if you are prioritizing high availability, we recommend doing a full backup every day and incremental backups once an hour to minimize downtime (this is handled for you with CockroachDB Dedicated).

It may be beneficial for you to test out the `RESTORE` functionality and simulate a full cluster or database failure. This will help you assess your operational readiness to survive various types of scenarios.

Use SQL best practices for distributed systems

For users who are new to distributed systems, it’s crucial to understand your data and how it interacts with the system your data lives on. Lucky for you, our Docs Team has spent a lot of time gathering a comprehensive overview on SQL performance best practices. In general, it’s great to be proactive about schema design and follow these best practices (for designing or migrating) so you don’t waste time diagnosing performance issues down the road.

For example, let’s say you’re dealing with a high volume of transactional data that needs to be constantly updated. You want to design your data so you can read it efficiently which means intelligently designing primary keys upfront so you can get multiple access patterns out of the same primary key.

In this scenario, you will also have to design indexes for specific columns, which creates a copy of the columns and then sorts their values (without sorting the values in the table itself). After a column is indexed, you can easily filter its values using the index instead of scanning each row one-by-one. On large tables, this greatly reduces the number of rows scanned which will enable you to execute queries exponentially faster.

If you can, we recommend that you adopt (or migrate to) Universally Unique Identifiers (UUIDs) vs. sequential IDs. UUIDs offer superior performance for distributed workloads by helping avoid duplication or consistency issues. This is because each node can generate completely unique UUIDs autonomously without having to check against the other nodes. It’s not a requirement to do a massive schema overhaul, it will depend on your workloads. (You can read more about UUIDs and their benefits here).

While there are several steps you should take before moving into production, we hope this list is a good starting point and will help you succeed where others have struggled. If you still have questions, get in touch via our community Slack channel.