How online schema changes are possible in CockroachDB

I periodically need to make changes to tables. Adding columns is very simple with the ALTER TABLE command… But my tables have up to 40 million rows now, and they are growing fast. So those alter table commands take several hours…

Since I’m using Amazon RDS, I can’t have replica servers to play with offline and then promote to primary in prod. So my question is if there’s a way to do this with minimal downtime? I don’t mind an operation taking hours or even days if users can still use the db…”

— Stack Overflow, serverfault

Every enterprise aspires to get as close as possible to the ideal of zero downtime, striving to minimize the risk of disruptions to their systems and ensure operations when unexpected outages do strike. But there is one particular type of downtime that no SRE team or vendor SLA or five nines of availability can cure: scheduled downtime for database updates and schema changes.

Schema changes in particular are unfortunately frequent and necessary, especially when introducing new features to existing applications. Teams typically schedule downtime to implement and test a schema change during off-peak hours to minimize impact on end users, but this still creates a significant line item cost in your database management budget (and your team doesn’t love working nights and weekends to do it). And, scheduled or not, downtime is still downtime.

The good news: even though schema changes are inevitable, downtime doesn’t have to be.

We believe that making a simple ALTER in the database should not require downtime, no matter how minimal. None. Zip. Zero. From the beginning, CockroachDB’s schema change engine was designed to make it simple to update a table schema with no disruption to the application.

This is why online schema changes are a built-in feature of CockroachDB, requiring no additional tools, resources, or ad hoc sequencing of operations…with no effect on data consistency or application read/write latency. Simply issue the change command and CockroachDB will perform it in the background, with no table locking — and no downtime.

Read on to learn about CockroachDB’s online schema change capability, and how we manage changes to schema elements like columns and indexes without forcing a downtime.

The benefits of online schema changes

Benefits of CockroachDB’s built-in online schema change capabilities include:

Changes to your table schema happen while the database is running.
Your application’s queries can run normally, with no effect on read/write latency. The schema is cached for performance.
Your data is kept in a safe, consistent state throughout the entire schema change process.
Zero planned downtime

Schema change challenges

Schema changes in any relational database involve updating the schema and adding or deleting the table data associated with the change. When it comes to distributed SQL, however, two fundamental attributes of distributed architecture present some additional considerations:

High performance. In distributed SQL, schema must be cached across database nodes in order to optimize database performance. Maintaining consistency in distributed caches is challenging.

Large tables. Database tables can be large. Very large. Backfilling (adding data to an index) or purging data in a table with millions of rows takes time. Doing this correctly without disabling access to the database is challenging: Performing any kind of blocking operation against a database during a schema change can leave the database, your application, and its users, blocked until the operation completes.

CockroachDB’s schema change engine design built upon work originated by the F1 team at Google. Our solution solved for high performance challenges by maintaining a consistent distributed schema cache. And it solved for super-sized database tables with the concurrent use of multiple schema versions, allowing the roll out of a new schema while the older version is still in use. It backfills (or deletes) the underlying table data without holding locks — and, thus, without any downtime.

Safe schema roll out

A schema element has associated data that can be deleted, written, or read through use of the SQL data manipulation language (DML) commands (INSERT, UPDATE, DELETE, SELECT). CockroachDB makes online schema changes possible by granting delete, write, and read capabilities to these DML commands when rolling out a new index, one after the other sequentially rather than all at once.

These are the stages of making a schema change by adding a new index in CockroachDB:

Grant delete capability.
Grant write capability.
Backfill the index data.
Grant read capability.

A new capability is granted along with all prior granted capabilities in a new version of the schema. To ensure correctness, though, a new capability can only be granted once the entire cluster is using a schema that contains all prior granted capabilities. The process therefore pauses before each stage, allowing prior granted capabilities to propagate to the entire cluster before the next one can be granted.

Dropping an index runs the same process, only in reverse:

Revoke read capability.
Revoke write capability.
Purge the index data.
Revoke delete capability.

Similarly, a capability is revoked only after the previous revocation has propagated across the entire cluster.

Delete capability: Avoid spurious index entries

CockroachDB’s delete capability is granted by placing the index in the DELETE_ONLY state. Armed with this capability, the SQL DML commands limit themselves.

DELETE is fully functional, deleting the row along with the underlying index data.
UPDATE deletes the old index entry, but is prevented from writing a new one.
INSERT and SELECT ignore the index.

A node granted the write capability for an index at the next stage can trust that the entire cluster is using the delete capability for this particular index. Thus, when it receives an INSERT command and inserts an index entry for a row, another node with only the delete capability on seeing a DELETE command for the same row will correctly delete the index entry for the row. The index will avoid getting poisoned with a dangling index entry.

When a schema change drops an index, the associated index data is purged only after the successful full withdrawal of the write capability across the cluster. The entire cluster has only the delete capability, ensuring a safe purge.

Write capability: Avoid missing index entries

Write capability during a schema change is granted by placing the index in the WRITEANDDELETE_ONLY state, granting both delete and write capabilities.

INSERT, UPDATE, and DELETE commands function normally, adding or deleting index entries as needed.
SELECT, on the other hand, requires the read capability while ignoring the index completely.

The index backfill runs only once the entire cluster is capable of writing. An INSERT command received on any node while a backfill is in progress creates a new row with a legitimate index entry, and doesn’t depend on the separate backfill process to create an index entry for the row. There will be no missing index entries after the backfill is complete.

Read capability

This final capability is granted by making the index active, and turns on the full use of the index by all commands.

Fast schema roll out

Between each stage of the schema change process, the entire cluster is allowed to converge to the newest version of a table schema.

A simple schema caching mechanism might use a 5 minute TTL. That, however, forces the schema change process to wait for many minutes before trusting that the latest version is the sole active version and any new capability has been fully granted or revoked. So in CockroachDB we developed leases on schema versions to speed up cluster convergence to the latest schema version (which in turn accelerates the schema change process).

Before using a table schema for a SQL DML command, the node running the command obtains a read lease for the schema valid for a significant period of time (on the order of minutes). When an updated table schema version becomes active, it is broadcast across the cluster, alerting nodes to the new version and causing them to actively release their leases on the old version. If some nodes are unhealthy and do not actively release their leases, the roll out will wait for the leases to expire and will simply get delayed.

The schema change strategy is kept simple by having the leases follow two rules:

New leases are granted only on the latest schema version.
Valid leases are only on the two latest schema versions.

For a thorough look at table leases, check out this RFC on our Github repo.

Reliable schema roll out

The schema change is guided to completion by the node executing the schema change SQL command.

Since schema change operations are potentially long running, they need to be restartable when the node performing them dies. Every node runs a schema changer goroutine capable of running any incomplete schema change. Before executing a schema change, a schema changer goroutine first acquires an exclusive write lease on the table, giving it license to be the only one guiding the schema change.

If you change your mind while a schema change is in progress, you can cancel the schema change operations before they finish and the database remains completely unaltered. If you change your mind after they’ve finished, you can revert the schema change just as safely as you made it.

Bonus benefits of online schema changes

Operating a distributed database at scale requires constant work and diligence from everyone involved. CockroachDB customers report that being able to perform online schema changes is extremely popular with their teams, since planned downtime work that was once reserved for low-traffic times (e.g., late nights, weekends) can now be handled as part of the normal workday. As a senior staff engineer for a logistics and delivery company running 10,000 QPS data workloads explained,

“The team can do a schema change during business hours, even during our peak hours. They love that, after moving to CockroachDB, we can deploy changes to the database on a Friday — even though conventional engineering wisdom is that you never, ever deploy on Fridays.”

CockroachDB was built to make online schema changes easy to use, and to ensure that the schema change roll out process is safe, fast, and reliable. Change is the only constant in technology (and that goes double for schema changes), but now at least you have one less kind of change to worry about.