How to change your database schema with no downtime


Just the thought of updating the database schema is enough to give some developers and architects a headache.

Designing a good schema is hard enough. Updating it once it’s in production? That has to be a nightmare.

Right?

Well, historically it certainly has been! But it doesn’t have to be. Let’s take a look at the options for dealing with database schema, and learn how live schema changes solve challenges with both traditional relational databases and NoSQL document-store alternatives.

Schema changes with traditional RDBMS

Traditional relational databases such as MySQL typically require locking tables during schema changes, effectively taking those tables offline for users. Some relational databases can handle some live schema changes – Postgres, for example, offers more live update functionality than some others – but this functionality is still limited.
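
To make that concrete, here’s a rough sketch of how two common changes behave in Postgres (the tables and columns here are hypothetical, and exact locking behavior varies by version): adding a nullable column is essentially a metadata-only change, while widening a column’s type typically rewrites the whole table under an exclusive lock.

-- Usually cheap: adding a nullable column without a default is effectively
-- a metadata-only change.
ALTER TABLE person ADD COLUMN nickname VARCHAR(255);

-- Usually expensive: widening an integer column to a bigint rewrites the
-- whole table under an ACCESS EXCLUSIVE lock, blocking reads and writes
-- while it runs.
ALTER TABLE orders ALTER COLUMN id TYPE BIGINT;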

In reality, when you’re working with one of these legacy databases, updating the schema without downtime usually requires creating a complete replica of your data so that you can operate in parallel. This allows you to update the schema in the replica and then gradually migrate your application over to it.

However, maintaining multiple instances of a database is expensive — even for enterprises. Doubling your database costs just to prevent downtime during schema changes is a tough pill to swallow.

Schema updates on legacy RDBMS get even more complicated if you’ve sharded your database to scale it horizontally. In that case, you’ve created an ad-hoc distributed database, so building a replica of it is far from straightforward. The complexity of a distributed relational database makes it difficult to apply schema changes while keeping data consistent.

The whole process can be difficult enough that many developers have been tempted to abandon relational databases entirely, moving to “schemaless” NoSQL alternatives.

NoSQL and “schemaless” databases

NoSQL databases such as MongoDB offer a “schemaless” database engine. Schemaless databases solve the schema problem by changing how data is stored. Instead of tables and rows, data is stored as JSON-like documents.

This approach increases flexibility at the expense of data consistency, because documents in a table or collection aren’t required to contain the same data fields. The lack of an enforced schema in the database can also negatively impact performance in some use cases.

And while not having to worry about database schema might sound great, in most cases, it just moves the work of enforcing data rules into your application.

Most applications require some or all of the data they use to be structured. If your database isn’t enforcing that structure, then you need to enforce it in your application, to ensure only valid data is sent to the database.

So, while NoSQL databases have other advantages, the term “schemaless” can be a bit misleading. Your database might be schemaless, but often that just means your team will need to build schema-like guardrails into your application. And of course, when you want to change those guardrails, it’ll be on your team to figure out how to do those updates without impacting the user experience.

For some applications, it makes sense to sacrifice the schema-enforced data consistency of a traditional relational database for the flexibility of NoSQL. Sometimes you need to store some unstructured data! Or, you may simply have an application where eventual data consistency is good enough.

But many applications – particularly those with heavy transactional workloads – rely on data consistency. Orders, logistics, money transfers – these are workloads that many applications have in some form or another where eventual consistency often isn’t good enough.

Thankfully, if you want to be able to change your schema without downtime and you need strong consistency, there’s no longer a need to choose between the compromises of legacy relational databases or NoSQL databases.

The best of both worlds: live schema changes without downtime

CockroachDB is a next-gen distributed SQL database that offers the best of both worlds: a fast relational database with easy, no-downtime live schema updates. It has the consistency and referential integrity of a traditional SQL database, along with the scalability, flexibility, and ease-of-use of a NoSQL database.

It even supports unstructured data in columns with the JSONB data type, so you can store unstructured data like a NoSQL database but get the data consistency and familiarity of a traditional SQL database.
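
As a quick sketch of what that looks like (the table and fields here are purely illustrative), you can mix structured columns with a JSONB column whose contents vary from row to row:

CREATE TABLE user_profile (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email STRING NOT NULL,
  preferences JSONB
);

-- Each row's preferences document can contain different fields.
INSERT INTO user_profile (email, preferences) VALUES
  ('ada@example.com',  '{"theme": "dark", "newsletter": true}'),
  ('alan@example.com', '{"labels": ["chess", "ciphers"]}');

-- JSONB fields can still be queried with plain SQL.
SELECT email FROM user_profile WHERE preferences->>'theme' = 'dark';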

CockroachDB solves all of the traditional database pain points associated with schema changes – you won’t have to deal with any downtime or worry about replicating your entire database to stay online. CockroachDB schema changes are executed without any interruption to your applications or impact on your users.

CockroachDB handles schema changes in progressive states, keeping the cluster free from table locks by ensuring applications only see the old schema version until the new one is fully deployed.

Schema changes run as scheduled background jobs, implemented as a state machine. The best part is that while the schema changes happen, your database remains active and can service application requests.
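
For example (a sketch against the person table used later in this post; the exact columns returned by SHOW JOBS can vary by version), you can kick off a change and then watch the job’s progress while the table continues to serve traffic:

-- The statement returns quickly; the work runs as a background job.
ALTER TABLE person ADD COLUMN nickname STRING;

-- Inspect the job's progress while reads and writes continue as normal.
WITH x AS (SHOW JOBS)
SELECT job_id, description, status, fraction_completed
FROM x
WHERE job_type = 'SCHEMA CHANGE';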

We have thorough documentation about how to execute online schema changes with CockroachDB, as well as best practices. But let’s go hands-on with CockroachDB to see what online schema changes really look like in action:

How to perform online schema changes

By implementing online schema changes, CockroachDB makes the process of performing schema changes less impactful to your database and the applications that consume it. Simply issue the change command and CockroachDB will perform it in the background, with no table locking or read/write impact.

As exotic as they sound, CockroachDB makes online schema changes a reassuringly uneventful affair. They’re simply operations like primary key, column, index, and locality changes, as well as table and materialized view creation: nothing out of the ordinary, but potentially disruptive if the table had to be locked in order to complete them.

The following examples demonstrate some common schema changes that CockroachDB will perform in background jobs, leaving users free to continue using your database unaffected:

-- Update primary key to comprise the country and id.
ALTER TABLE person ALTER PRIMARY KEY USING COLUMNS (country, id);

-- Update locality.
ALTER TABLE person SET LOCALITY REGIONAL BY ROW;

-- Add a column.
ALTER TABLE person ADD COLUMN country STRING NOT NULL AS
  (regexp_extract("region"::STRING, '^(\w+)-')) STORED;
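
Index and materialized view creation, also mentioned above, are handled the same way. A sketch, using the same person table:

-- Add a secondary index.
CREATE INDEX ON person (full_name);

-- Create a materialized view.
CREATE MATERIALIZED VIEW people_per_country AS
  SELECT country, count(*) AS population
  FROM person
  GROUP BY country;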

Online schema changes vs stop-the-world schema changes

The benefits of CockroachDB’s online schema changes really become apparent when compared with a database like Postgres, which does not natively provide online schema changes.

In the examples that follow, I’ll be testing CockroachDB v22.2.6 against Postgres 15.2. I’d like to start by acknowledging that Postgres is an incredible database and this test is designed merely to highlight the importance of online schema changes.

In the following steps, I’ll create a table with 10,000,000 rows (pretty small in the grand scheme of things, but large enough to demonstrate the point) in both CockroachDB and Postgres. I’ll then run an application to monitor request durations against both, whilst updating the primary key.

First, I’ll create a CSV file with 10,000,000 rows. For this, I’ll use the Crystal programming language. It’s fast, fun, and looks like Ruby (which, for me, is rarely a bad thing):

require "csv"
require "faker" # Imported from https://github.com/askn/faker
require "uuid"

result = CSV.build do |csv|
  csv.row ["id", "country", "full_name", "date_of_birth"]
  1.upto(10000000) do
    csv.row [
      UUID.random,
      Faker::Address.country_code,
      Faker::Name.name,
      Faker::Date.birthday.to_s("%Y-%m-%d"),
    ]  
  end
end

File.write("people.csv", result)

Next, I’ll create the databases and import the data. I’ll run both databases in podman containers:

podman machine init --cpus 4 --memory 6144
podman machine start

podman run -d \
	--name postgres \
	-p 5432:5432 \
	-e POSTGRES_PASSWORD=password \
		postgres:15.2-alpine \
			-c 'max_connections=10000'

podman run -d \
	--name cockroachdb \
	-p 26257:26257 \
	-p 8080:8080 \
		cockroachdb/cockroach:v22.2.6 \
			start-single-node \
			--max-sql-memory=50% \
			--insecure

Next I’ll create a file called create_table.sql with the following content and apply it to both databases:

CREATE TABLE person (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  country VARCHAR(255) NOT NULL,
  full_name VARCHAR(255) NOT NULL,
  date_of_birth DATE NULL
);

PGPASSWORD=password psql -h localhost -U postgres -f create_table.sql

cockroach sql --insecure < create_table.sql

The final step in readying these databases for the schema change test is to import the CSV file into both of them. I’ll import the data into Postgres first:

psql -U postgres -h localhost

\COPY person(id, country, full_name, date_of_birth)
FROM 'people.csv'
DELIMITER ','
CSV HEADER;

And then CockroachDB, serving the CSV file over a local HTTP server so that the database container can fetch it:

python3 -m http.server 3000

cockroach sql --insecure

IMPORT INTO person(id, country, full_name, date_of_birth)
    CSV DATA (
      'http://host.containers.internal:3000/people.csv'
    )
    WITH skip='1';
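
With both imports complete, a quick sanity check (the same statement works in both databases) should report 10,000,000 rows in each:

SELECT count(*) FROM person;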

Finally, I’ll create an application to collect query duration statistics. I’ll write this in Go, if for nothing other than the sheer joy of it. Here’s the central function of the application: a loop that runs once every second, executing a simple SELECT statement and measuring its elapsed execution time:

// driver and connStr are set elsewhere in the application and point the
// monitor at either Postgres or CockroachDB.
db, err := sql.Open(driver, connStr)
if err != nil {
	log.Fatalf("opening db connection: %v", err)
}
defer db.Close()

// Run a simple SELECT once a second and track the slowest query seen so far.
var maxLatencyMS int64
for range time.Tick(time.Second) {
	start := time.Now()
	row := db.QueryRow("SELECT id FROM person LIMIT 1")

	var id string
	if err := row.Scan(&id); err != nil {
		log.Printf("error: %v\n", err)
	}
	elapsed := time.Since(start).Milliseconds()

	// Log only when a new maximum latency is observed.
	if maxLatencyMS < elapsed {
		maxLatencyMS = elapsed
		log.Printf("max latency: %dms", maxLatencyMS)
	}
}

With the application running against both databases, let’s alter the primary keys:

-- Statement for CockroachDB
ALTER TABLE person ALTER PRIMARY KEY USING COLUMNS (country, id);

-- Statement for Postgres
ALTER TABLE person DROP CONSTRAINT person_pkey;
ALTER TABLE person ADD PRIMARY KEY (country, id);

Whilst Postgres locked the table to perform the schema change, its maximum elapsed query time climbed from the low milliseconds to 6,381ms. That represents a 6.3-second period during which the table was completely unavailable.

CockroachDB, on the other hand, didn’t need to lock the table: its query times remained in the low milliseconds, peaking at just 17ms, and it stayed available to service requests throughout. This difference (~375x, or 37,435%) is significant, and the importance of online schema changes will only increase as the table grows.

Why it works: MVCC and time-travel queries

Live schema changes are made possible by CockroachDB’s use of multi-version concurrency control (MVCC), whereby data changes, including additions, modifications, and deletions, create concurrent versions in the table, allowing you to travel back in time and view data as it was before you performed an operation.

I’ll run through some example queries to demonstrate CockroachDB’s use of MVCC and how this supports what we call time-travel queries.

First, I’ll create a table:

CREATE TABLE person (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  country STRING NOT NULL,
  full_name STRING NOT NULL,
  date_of_birth DATE NULL
);

Into this table, I’ll insert some data, taking a snapshot of the current system time before doing so:

SELECT NOW();
               now
---------------------------------
  2023-03-28 14:19:39.301439+00

INSERT INTO person (country, full_name, date_of_birth)
VALUES ('uk', 'Alan Turing', '1912-06-23');

I’ll take another snapshot of the system time before inserting a second row, which will allow me to hop between timestamps and view the data as it was at each point:

SELECT NOW();
              now
--------------------------------
  2023-03-28 14:21:15.07211+00

INSERT INTO person (country, full_name, date_of_birth)
VALUES ('us', 'Dennis Ritchie', '1941-09-09');

Let’s run three queries: one to see the current state of the table, one to see the table before we inserted our second row, and another to see the table before we inserted our first row:

SELECT * FROM person;
                   id                  | country |   full_name    | date_of_birth
---------------------------------------+---------+----------------+----------------
  26f9a413-cb7f-4a92-9e8e-4b02c3a4c03f | us      | Dennis Ritchie | 1941-09-09
  7125f3db-3cf3-4d2a-b001-04447aeb5ca6 | uk      | Alan Turing    | 1912-06-23



SELECT * FROM person AS OF SYSTEM TIME '2023-03-28 14:21:15';
                   id                  | country |  full_name  | date_of_birth
---------------------------------------+---------+-------------+----------------
  7125f3db-3cf3-4d2a-b001-04447aeb5ca6 | uk      | Alan Turing | 1912-06-23



SELECT * FROM person AS OF SYSTEM TIME '2023-03-28 14:19:39';
  id | country | full_name | date_of_birth
-----+---------+-----------+----------------

This is all very cool but what does MVCC have to do with online schema changes? Well, without MVCC, online schema changes wouldn’t be possible, so it’s important to get a feel for how they work.

When you request a schema change, CockroachDB will create a new version (a copy) of your table in a background job. This background job will perform the schema migration without locking your table, meaning users can operate against the original version of the schema until the migration is complete, at which point they’ll switch over to the new version.

If, during the process of migration, you have a change of heart, the migration job can be paused, resumed, and cancelled with the following statements:

SHOW JOBS; -- Find the job_id

PAUSE JOB job_id;
RESUME JOB job_id;
CANCEL JOB job_id;

As you’d expect, this behavior doesn’t come for free. The additional MVCC version will double the amount of data stored for the table, so be mindful of this when performing a schema migration while your cluster is close to capacity. CockroachDB’s garbage collector will delete the old version of the data when it next runs, which is every 4 hours by default but can be changed as follows:

ALTER TABLE person CONFIGURE ZONE USING gc.ttlseconds = 43200; -- 12 hours
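
To confirm the new TTL has been applied (the exact output format varies by version):

SHOW ZONE CONFIGURATION FROM TABLE person;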

Conclusion

Schema changes are among the scarier tasks I perform as a software engineer. Questions run through my mind like “if I take the database out of action for maintenance, can I get it back online again after?”, “what if my change takes so long that the database never unlocks?”, and “what if, once I’ve started the changes, I can’t cancel or revert them?”

With CockroachDB’s online schema changes, none of these questions need to be a concern; the database stays up and free from locks, and, if you change your mind, you can cancel them before they finish. If you change your mind after they’ve finished, you can revert the operation, just as safely as you made it.

To learn more about online schema changes, visit our documentation page: https://www.cockroachlabs.com/docs/stable/online-schema-changes.

Learn to design database schema the easy way with our Schema Design course. It’ll walk you through everything you need to know about great schema design and making schema changes, and it’s completely free!

About the authors

Rob Reid

In my time as a developer, I’ve written backend, frontend, and messaging solutions for the police, travel, finance, commodities, sports betting, telecoms, retail, and aerospace industries.

Charlie Custer

Charlie is a former teacher, tech journalist, and filmmaker who’s now combined those three professions into writing and making videos about databases and application development (and occasionally messing with NLP and Python to create weird things in his spare time).
