Alter Column Types Without Taking Tables Offline

Alter Column Types Without Taking Tables Offline

There are many reasons you might want to alter the schema of your database but in many databases, this process typically requires downtime. In CockroachDB, we have supported online schema changes since our first stable release,  and in v20.1, we added the ability to alter primary keys while in production without downtime. 

The elegant design of alter primary keys eliminates the reliance on locks so that you can continue to use tables even while they undergo schema changes. For more details, read our docs page on online schema changes.

During my internship at Cockroach Labs, I tackled the problem of enabling alter column types without taking tables offline. The need to change the type of a column (alter column type) is a fairly common use case. Database schema design is difficult and requirements evolve which lead to columns potentially having to change types. Requiring downtime or having to do a roundabout, manual process whenever you want to change a column can be very annoying, especially since you’ll have to keep an eye on the process to know when each step finishes so you can proceed on with the next. 

This blog post covers how we implemented alter column type and challenges we overcame in the process. 

Early Versions of Alter Column Type

Previously, we only supported ALTER COLUMN TYPE when it would not require rewriting data on disk. For example, increasing the precision of a column from INT2 to INT4 does not require any data to be rewritten. Whereas when converting INT4 to INT2, we do need a rewrite since we have to "truncate" data. Altering the column types of data that required an on-disk rewrite without any downtime was a really interesting challenge to tackle.

Without the alter column type feature, if you wanted to change the type of your column, the workaround would be to create a new column with the new data type you wanted as a computed column of the original column.  Basically, a new column is created and once backfilled, you drop the old column and rename the new column to the name of the old column. However this is fairly tedious and the process of creating the new column can be slow depending on how much data it contains.

For example, if we had table t with a column id of type INT and we wanted to change it to STRING, we would have to execute the following statements:

  • ALTER TABLE t ADD COLUMN c_string INT;
  • ALTER TABLE t ADD COLUMN c_string STRING AS (c::string) STORED;
    • This creates a new column and populates the values by converting the values in column c to string.
    • This step may take a long time depending on how many values you have in the column.
  • ALTER TABLE t DROP COLUMN c;
  • ALTER TABLE t RENAME COLUMN c_string to c;

This early version of ALTER COLUMN TYPE was  manual and tedious, and required the developer to closely monitor the process to know when each step finishes before proceeding to the next. 

Implementation of Alter Column Type in CockroachDB

The implementation of the alter column type online schema change closely follows the manual process described in the previous section.

Here are the steps for converting column c to type t. This example is the simplest case, where the column is not part of any indexes or has any constraints.

  1. Create a new non-public column c' that has the computed expression CAST(c AS t) or c::t for short.
  2. Wait until column c is backfilled. Once backfilled, we perform the “column swap” which involves steps 3–8. These steps must happen atomically.
  3. Swap the names of c and c'.
  4. Make c a non-public column and c' a public column
    1. This is so only one of the two columns is visible to the user until the old column is dropped.
  5. Update column c’ to use computed expression c::t’ where t’ is the type of column c’.
    1. This seems strange: why do we add a computed expression to the original column that references the new column? This will be covered in this challenges section.
  6. Remove the computed expression from c.
  7. Replace the c’ in the tables column list with c.
  8. Update the new column's LogicalColumnID to the old column's ID.
      • The LogicalColumnID represents the ordering of the columns in the table.
  9. Enqueue drop for c’. The old column is now dropped asynchronously.

To get a better understanding of these steps, we can follow the stages for altering a column type from INT to STRING.

In this example, we start with table t with one column id of type INT. To change column id to type STRING, we execute ALTER TABLE t ALTER COLUMN id TYPE STRING.

Initial layout for table t:

column_name type computed expression public logical column id
id INT true 1

We create a new column id’ with computed expression id::STRING. Note that it is not public and cannot yet be seen by users. 

column_name type computed expression public logical column id
id INT true 1
id’ STRING id::STRING false 0

Swap the names of the two columns.

column_name type computed expression public logical column id
id’ INT true 1
id STRING id::STRING false 0

The original column, now named id’, becomes non-public and the newly created column, now named id, becomes public.

column_name type computed expression public logical column id
id’ INT false 1
id STRING id::STRING true 0

Update id’ to use computed expression id::INT and remove the computed expression from id.

column_name type computed expression public logical column id
id’ INT id::INT false 1
id STRING true 0

Swap the positions of the two columns and update the LogicalColumnIDs.

column_name type computed expression public logical column id
id STRING true 1
id’ INT id::INT false 0

Lastly, the old column is dropped asynchronously and we arrive at the desired state where column id has type STRING.

column_name type computed expression public logical column id
id STRING true 1

Challenges for Alter Column Type: Two-Version Invariant Object Descriptors

In CockroachDB, we have a concept known as the two-version invariant for object descriptors.  Descriptors are what contain the information about a particular object such as a table or database. This implies that after any schema change, there are two valid versions of a descriptor. A particular node can be using the version before the schema change or after the schema change.

This case was particularly tricky to handle when it came to the implementation of alter column type. Reading from either Table Descriptor is valid and not a problem, but the challenges appeared when it came to writes.

Writing with the original version of the Table Descriptor is the more straightforward of the two cases here. Any inserts into the original column will be reflected in the new column due the fact that the new column has a computed expression referencing the old one. There are still cases where the conversion is invalid because the computed expression cannot convert the old data type into the new one. One simple case is when converting STRING to INT and the STRING value is not castable to INT, e.g. "hello". 

When an insert happens using the new version of the Table Descriptor, we still have to ensure that any writes are visible to nodes reading from the previous version of the Table Descriptor. This is the reason for adding the computed expression that references the new column to the original column during the column swap. Any inserts into the new column will also be reflected in the original column due to the computed expression. In this case, if the inserted value into the new table cannot be cast back to the original type, the insert will be rejected until the schema change is fully completed. One caveat is that if the user performs an alter column type and provides an expression to use for the conversion, we cannot generally invert the provided expression in order to update the old column. In this case, all inserts into the new column are disallowed until the schema change is finished.

The two-version descriptor concept is fairly confusing, so let’s illustrate it with an example. Again we’ll start with a table t with one column id of type INT. Suppose we just finished the column swap and node 1 has the version of the Table Descriptor before the swap:

column_name type computed expression public logical column id
id INT true 1
id’ STRING id::STRING false 0

Whereas node 2 has the pre-swap version of the Table Descriptor.

column_name type computed expression public logical column id
id STRING true 1
id’ INT id::INT false 0

Inserts into the id column for node 1 will insert into the original column of type INT. In this case, all inserts will succeed since we can always cast INT to STRING.

Inserts into the id column for node 2 will insert into the new column of type STRING. Some inserts may fail since we cannot always cast STRING to INT. Inserting “hello” into the column will fail in this case. This insert will fail until the node moves into the next stage of the schema change, where id’ is dropped. Note that because of the two-version invariant, once any node is on the Table Descriptor version where id’ is dropped, the Table Descriptor version with the column id as type INT is no longer valid, thus inserts into the new column no longer have to worry about reflecting in the old column.

Current Limitations and Future Work

Currently, alter column type only works on columns that are not indexed and do not have constraints. This is because our schema changer does not currently support more complex schema changes which update multiple columns and indexes at the same time. When the target column is indexed, we would also have to update the index with the new column. The proposed way to do this is to simultaneously create a new index along with the new column and perform an index swap similarly to the column swap. We plan to support this in the future.

If simplifying schema changes is your cup of tea, we've got good news: Cockroach Labs is hiring!