Just the thought of updating the database schema is enough to give some developers and architects a headache. Designing a good schema is hard enough. Updating it once it’s in production? That has to be a nightmare. Right?
Well, historically it certainly has been! But it doesn’t have to be. Let’s take a look at the options for dealing with database schema, and learn how live schema updates solve the challenges developers encounter with both traditional relational databases and NoSQL document-store alternatives.
Traditional relational databases such as MySQL require developers to lock tables during schema changes, effectively taking the database offline. Some relational databases can handle some live schema changes – Postgres, for example, offers more live update functionality than some others – but this functionality is still pretty limited.
In reality, when you’re working with one of these legacy databases, updating the schema without downtime usually requires creating a complete replica of your data so that you can operate in parallel. This allows you to update the schema in the replica and then gradually migrate your application over to it.
However, maintaining multiple instances of a database is expensive — even for enterprises. Doubling your database costs just to prevent downtime during schema changes is a tough pill to swallow.
Schema updates on legacy RDBMS get even more complicated if you’ve sharded your database to scale it horizontally. In that case, you’ve created an ad-hoc distributed database, so building a replica of it is no longer straightforward. The complexity of a distributed relational database makes it difficult to apply schema changes while keeping data consistent.
The whole process can be difficult enough that many developers have been tempted to abandon relational databases entirely, moving to “schemaless” NoSQL alternatives.
NoSQL databases such as MongoDB offer a “schemaless” database engine. Schemaless databases solve the schema problem by changing how data is stored. Instead of tables and rows, data is stored as JSON-like documents.
This approach increases flexibility at the expense of data consistency, because documents in a table or collection aren’t required to contain the same data fields. The lack of an enforced schema in the database can also negatively impact performance in some use cases.
And while not having to worry about database schema might sound great, in most cases, it just kicks the can down the road. Because most applications require some or all of the data they use to be structured, you still end up having a data schema. It’s just defined and enforced in your application rather than in your database. With NoSQL, the engineering challenge of finding a way to make schema updates seamless is passed from the database technology to your application development team.
For some applications, it makes sense to sacrifice the schema-enforced data consistency of a traditional relational database for the flexibility of NoSQL. Sometimes you need to store some unstructured data! But many applications – particularly those with heavy transactional workloads – rely on data consistency.
Thankfully, there’s now a third option – one that combines some of the best elements of both relational and NoSQL databases.
CockroachDB is a next-gen distributed SQL database that offers the best of both worlds: a fast relational database with easy, no-downtime live schema updates. It offers a database that has the consistency and referential integrity of a traditional SQL database, and the scalability, flexibility, and ease-of-use of a NoSQL database.
It even supports unstructured data in columns with the JSONB data type, so you can store unstructured data like a NoSQL database while still getting the data consistency and familiarity of a traditional SQL database.
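For example, here’s a quick sketch of what that might look like with the pg driver (our own illustration rather than part of the sample app, assuming an already-connected client): a table that mixes structured columns with a JSONB column for free-form data.

// Create a table with a structured primary key plus a JSONB column for free-form data.
await client.query(
  `CREATE TABLE IF NOT EXISTS public.profile (
     id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
     attrs JSONB
   );`
);

// Store an arbitrary JSON document; its shape can change without a schema migration.
await client.query(
  "INSERT INTO public.profile (attrs) VALUES ($1);",
  [JSON.stringify({ nickname: "roach", signupSource: "blog" })]
);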
CockroachDB solves all of the traditional database pain points associated with schema changes – you won’t have to deal with any downtime or worry about replicating your entire database to stay online. CockroachDB schema changes are executed without any interruption to your applications or impact on your users.
CockroachDB handles schema changes in progressive states, keeping the cluster free from table locks by ensuring applications only see the old schema version until the new one is fully deployed.
Schema changes run as a scheduled background job that moves through a series of states, like a state machine. The best part is that while the schema changes happen, your database remains active and can service application requests.
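If you’re curious, you can even watch those background jobs from your app using CockroachDB’s SHOW JOBS statement. Here’s a rough sketch (assuming a connected pg client, and filtering on the job_type column that SHOW JOBS returns):

// List background jobs and log the status of any schema changes.
const jobs = await client.query("SHOW JOBS;");
jobs.rows
  .filter((row) => row.job_type === "SCHEMA CHANGE")
  .forEach((row) => console.log(`[INFO] ${row.description}: ${row.status}`));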
We have thorough documentation about how to execute online schema changes with CockroachDB, as well as best practices. But let’s go hands-on with a sample database and Node.js app to look at just how easy online schema changes are!
Let’s walk through a live schema change on a running CockroachDB instance. If you don’t have CockroachDB installed on your machine, you can set up a free CockroachDB Dedicated cluster in just a few minutes.
We’ll use the Node.js application we developed in a previous article. Here, we’ll focus only on the table structure that the Node.js app receives from the cluster.
In the next section, we assume that you have a CockroachDB instance running and can connect to it using a Node.js application.
First, we need to decide which database driver to use with our Node.js app. For this tutorial, we’ll use PostgreSQL’s pg driver – we’ll install it via npm, then start our Express server.
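The setup might look roughly like this (a minimal sketch rather than the exact app from the previous article; the environment variable and port below are placeholders):

// npm install express pg
const express = require("express");
const { Client } = require("pg");

const app = express();

// The connection string comes from your CockroachDB cluster's connection details.
const client = new Client({ connectionString: process.env.DATABASE_URL });

async function main() {
  await client.connect();
  app.listen(3000, () => console.log("[INFO] Express server listening on port 3000"));
}

main().catch((err) => console.error("[ERROR] startup failed:", err));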
We start with a single-column table. We can create it directly in the CockroachDB SQL shell, or we can run the same statement from our app using the Node.js pg driver (a sketch of the driver version follows the SQL below). In this tutorial, we’ll be using CockroachDB’s SQL syntax.
CREATE TABLE IF NOT EXISTS public.person (id INT PRIMARY KEY);
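And here is the same statement issued from the app through the pg driver (assuming the connected client from the setup sketch above):

// Create the single-column person table if it doesn't already exist.
await client.query("CREATE TABLE IF NOT EXISTS public.person (id INT PRIMARY KEY);");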
Our Node.js app will query this table, and we’ll read the column information from the response. Using PostgreSQL’s pg driver, we can easily read the response fields. Let’s write some JavaScript code that will select all the records from the person table we just created, and read the fields (i.e. columns) it returns as a response:
// Select every row from the person table and inspect the field metadata the driver returns.
let response = await client.query("SELECT * FROM public.person;");
console.log(`[INFO] query reply: \n` + JSON.stringify(response.fields));
// Send the full response back to the caller (this runs inside an Express route handler).
res.send(response);
If we run this query right now, we’ll get:
"fields":[{"name":"id","tableID":68,"columnID":1,"dataTypeID":20,"dataTypeSize":8,"dataTypeModifier":-1,"format":"text"}]
This response shows that we have a single field, id.
In a real-world scenario, your app might use tens (if not hundreds) of fields for each record in the response.
Now, let’s alter the table by running an ALTER TABLE query in the CockroachDB SQL shell:
ALTER TABLE public.person ADD COLUMN IF NOT EXISTS name STRING;
Running this query adds a second column to our table. If we monitor our database as the query runs (or check the monitoring data afterwards), we can see that it executes without any database downtime. An application actively reading from and writing to the database while this query executed would continue to do so without errors or data loss, as long as its code works with both the “before” and “after” versions of the schema.
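For instance, a write that only touches the original id column keeps working before, during, and after the change (a hypothetical sketch, again assuming the connected client from earlier):

// This insert references only the original column, so it succeeds whether or not
// the new, nullable name column has been added yet.
await client.query("INSERT INTO public.person (id) VALUES ($1);", [1]);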
Now let’s query the table again from our app. This time, we should see two fields in the response: id and name.
"fields":[{"name":"id","tableID":68,"columnID":1,"dataTypeID":20,"dataTypeSize":8,"dataTypeModifier":-1,"format":"text"},{"name":"name","tableID":68,"columnID":2,"dataTypeID":25,"dataTypeSize":-1,"dataTypeModifier":-1,"format":"text"}]
It works! We changed our database schema quickly and easily, without even a millisecond of downtime.
CockroachDB’s online schema changes will work even if we want to change a table’s primary key! This makes it easier to transition our applications from single-region to multi-region, and is also useful in many other contexts. You can read more about how this was accomplished if you’re interested in the technical details, but let’s try it out and see how it works!
We can do all of this right from within CockroachDB’s SQL shell.
First, we’ll need to create a new column in our person table that’ll serve as our new primary key. Following best practice for a distributed database, let’s create a UUID column and set it up to generate a random unique ID for each row that’s added to the table:
ALTER TABLE person ADD COLUMN uuid UUID NOT NULL DEFAULT gen_random_uuid();
Now, to change the table’s primary key, we can run the same sort of command we ran earlier when adding the name column:
ALTER TABLE person ALTER PRIMARY KEY USING COLUMNS (uuid);
When we run that command, we’ll get a notification in the SQL shell:
NOTICE: primary key changes are finalized asynchronously; further schema changes on this table may be restricted until the job completes
This is due to the complexity of changing the primary key across multiple nodes in a distributed database. From our perspective as application developers, however, it isn’t a problem. On a large production database we might need to wait a little while before making additional schema changes, but there’s still no downtime and thus no impact on our application.
We can confirm that our live schema change worked by running the following command in the SQL shell:
SHOW CONSTRAINTS FROM person;
This command returns a table listing each of the constraints applied to person, and we can see that the primary key is now our uuid column. The schema change worked!
Of course, whenever we change the database schema, we’ll probably also need to update our app to reflect the new schema. The same would be true if we were using a NoSQL database, too, since the “schemaless” database setup means we need to enforce the data schema within our application.
By using CockroachDB’s live schema updates, we’ve essentially gotten the best of both worlds – we’ve avoided the database downtime and replication headaches that often come with schema changes to traditional relational databases without having to sacrifice the data consistency they offer.
Note that while the schema update happened almost instantly with our demo database, this command might take more time in production applications because CockroachDB has to apply the schema migration to all nodes, and all nodes must agree to the schema change. But don’t worry: CockroachDB allows for online schema changes with zero downtime even on massive production databases.
In this article, we discussed how to update a CockroachDB cluster schema with zero downtime. We also took a look at some of the other database options out there, including traditional relational databases and NoSQL databases, and discussed some of their pros and cons.
By the end of this article, you saw a demonstration of how schema-change SQL queries happen in real-time and how the changes reflect in your queries. Before you actually execute any schema changes in production, though, you should review the CockroachDB documentation on schema changes for some important tips and best practices.
Learn to design database schema the easy way with our Schema Design course. It’ll walk you through everything you need to know about great schema design and making schema changes, and it’s completely free!