No Time to Live: James Bond Explains Row-Level TTL

No Time to Live: James Bond Explains Row-Level TTL

Do you know what time it is? Time to die.

-Nomi

That’s it. That’s row-level TTL in a James Bond nutshell. 

You probably don’t need to read further than that one quote from Nomi (possibly the next 007?). Because you get it now. But row-level TTL has been one of our most requested features dating back to 2017. It’s rare for issues on our github to get this many votes. So I hope you’ll forgive me for indulging in this metaphor a bit longer.

Github issue for row-level TTL

Developers have been asking for this feature with such gusto for so long because they’re tired of writing custom app code for deletion logic. And now they don’t have to.

The past isn’t dead, until you say it is

Row-level TTL is a mechanism in which rows from a table are considered “expired” and will be automatically deleted once those rows have been stored longer than a specified expiration time. As of CockroachDB 22.1 row-level TTL is in preview mode in CockroachDB

The TL;DR for row-level TTL is that it will make your life easier by automating the process for getting rid of expired data. For example, you can set ‘shopping cart’ items to expire after a week, and with row-level TTL the database will clean them up for you automatically rather than “requiring” you to write a script to delete them. Or, you might need to comply with certain regulations that say something to the effect of ‘data can’t be held for people for more than x years’. Now you can use a TTL rule to make sure the data gets deleted on time.  

History isn’t kind to men who play God, but sometimes you should

Without row-level TTL, the DIY process for expiring data typically looks something like this:

  1. You create an index on the column containing the timestamp which you’ll use to expire the rows;
  2.  You periodically run (via cron, perhaps) a script which will DELETE FROM table_t1 WHERE now() - timestamp_column > interval '30 days' LIMIT 1000;.  

That LIMIT is there to prevent the script from becoming too onerous. You just keep running this until it fails to delete rows, then you stop and run it again next time you have to. Or, perhaps, you just keep running this — depends on your data volumes.  That “1000" is empirically determined; it could be “10000” or something else. 

It’s not an insurmountably complex process. But certainly a process we’re thrilled to automate because of the time it saves developers and the compliance problems it helps you avoid. One sneaky bit of complexity is the need to balance the timeliness of the deletions vs. the potentially negative performance impact of those deletions on foreground traffic from your application. These “row expiration” tasks aren’t critical at all, so it’s better if they can be done at a lower priority, to minimize impact to the usual flow of transactions.

Use cases for row-level TTL include:

  • Delete inactive data events to manage data size and performance: For example, you may want to delete order records from an online store after 90 days.
  • Delete data no longer needed for compliance: For example, a banking application may need to keep some subset of data for a period of time due to financial regulations. Row-level TTL can be used to remove data older than that period on a rolling, continuous basis.
  • Outbox pattern: When events are written to an outbox table and published to an external system like Kafka, those events must be deleted to prevent unbounded growth in the size of the outbox table.

How row-level TTL works

At a high level, row-level TTL works by:

  • Issuing a selection query at a historical timestamp, yielding a set of rows that are eligible for deletion (also known as “expired”).
  • Issuing batched DELETE statements for the expired rows.
  • As part of the above process, deciding how many rows to SELECT and DELETE at once in each of the above queries.
  • Running the SQL queries described above in parallel as background jobs.
  • To minimize the performance impact on foreground application queries, the background deletion queries are rate limited; they are also submitted at a lower priority level using the admission control system.

The process above is conceptually similar to the process described by Batch delete on an indexed column. This difference is that row-level TTL is built into CockroachDB, so it saves you from having to write code to manage the process from your application and/or external job processing framework, including tuning the rate and performance of your background queries so they don’t affect foreground application query performance.

When are rows deleted?

Once rows are expired (that is, are older than the specified TTL interval), they are eligible to be deleted. However, eligible rows may not be deleted right away. Instead, they are scheduled for deletion using a background job that is run at the interval defined by the ttl_job_cron storage parameter.

What else is in CockroachDB 22.1?

Row-level TTL is one of my favorite 22.1 features because now we have an answer for maintaining a rolling window of data. Other database solutions are able to do this via table partitioning (by date/timestamp range) but, when they do it that way, it’s operationally cumbersome because they have to script out the creation of the new partitions and the retiring of the old ones.

Essentially, this is one of those features that makes developer lives easier in a very obvious way. It isn’t the only feature of this nature in 22.1. Unfortunately, I can’t let the air out of the release just yet. If you’re keen to learn more about CockroachDB 22.1 check out our release homepage.

Keep Reading

Raft Is So Fetch: The Raft Consensus Algorithm Explained Through Mean Girls

Raise your hand if you’ve ever been personally victimized by the Raft Consensus Algorithm.

via GIPHY

Understanding Raft …

Read More
How to Build a Complete Web App with Python and CockroachDB

In this article, we’re building a full-stack web app that simulates a game leaderboard. The idea is to make it as simple …

Read More
Application Architecture: A Quick Guide for Startups

When you’ve got a great idea for a startup, application architecture is probably one of the last things on your mind. …

Read More
x
Developer Resources