Distributed UDFs: How we're building database functions from the ground up

Efficiency matters. When you’re working with large amounts of data, it matters a lot.

Every trip between your application and the database incurs real costs, both in terms of time and money. So how can you minimize those trips?

For modern, distributed applications, there’s no easy answer.

Most legacy relational databases support database functions, which can reduce the number of trips data must take by performing actions (for example, complex calculations) and returning the results directly within the database. Database users can execute built-in functions or write their own user-defined functions (UDFs) to be executed by the database, eliminating the need for a trip back to the application.

But in the modern era, large, mission-critical workloads typically require the high availability, scalability, and performance that distributed databases offer. And while some distributed relational databases do support UDFs, the functions themselves aren’t actually distributed. Rather, they simply mirror the legacy database approach: functions are all executed on a single central node, creating a bottleneck that eliminates some of the advantages of using a distributed database in the first place.

As a result, UDFs have become a bit controversial in modern development circles.

We think there’s a better way.

Introducing distributed database functions

UDFs are among the most frequently requested features for CockroachDB, and we wanted to support them. But we didn’t want to simply bolt legacy functionality onto a distributed database in the name of being able to claim compatibility. In a distributed database, your database functions should enjoy all of the same advantages your data enjoys: high availability, smooth elastic scalability, localizability, etc.

So in our latest release, 22.2, we’re introducing a new feature family called Distributed Functions. The first feature in this family is UDFs, available now in preview. It’s the first iteration of what’s to come over the next year as we realize our vision of truly distributed UDFs. Instead of grafting legacy features onto a distributed database, we’re building database functions from the ground up to take full advantage of the distributed database architecture.

CockroachDB’s distributed functions will allow modern distributed database users to take advantage of the efficiencies database functions can offer — reducing trips between the application and database layers — without losing the advantages distributed databases offer. UDFs lay the groundwork for us to build other implementations of functions like triggers and stored procedures — distributed, from the ground up.

They’ll also make migrations easier for users who are coming from legacy databases — if you’ve got database functions in your legacy relational database, you’ll be able to implement those same functions in CockroachDB, too. Specifically, we’re implementing Postgres-compatible UDFs to make it super easy to migrate from Postgres and other Postgres-compatible databases.

Try Serverless

Spin up a serverless cluster in seconds.

What can you do with it today?

Today’s announcement is just the first iteration in our ultimate vision for true distributed UDFs, triggers, and stored procedures. For now, the functionality is in preview, as a window of what’s to come. Let’s take a quick look at how to create and call a UDF in CockroachDB.

Elements of a UDF in CockroachDB

Essentially, UDFs in CockroachDB have six elements:

The name used to refer to and call the function
The arguments, which will serve as the function’s input value or values
The return type, which defines the datatype of the function’s output
The volatility, which specifies whether the function has side effects and whether it can mutate any data in the database (outside of whatever’s happening in the function itself)
The language, which specifies the programming language the function body will be written in (currently, only SQL is supported)
The function body (i.e., the actual code that takes the arguments and defines how they are processed to generate the value or values the function will return)

A simple CockroachDB UDF example

To illustrate how they work, let’s create a simple UDF in CockroachDB that’ll accept two integers as arguments, and return their sum.

To do this, we would connect to the database using the CockroachDB SQL client (or any other SQL client) and run the following SQL statement:

CREATE FUNCTION add(a INT, b INT) RETURNS INT IMMUTABLE LEAKPROOF 
LANGUAGE SQL AS 'SELECT a + b';

Let’s break this statement and how each part of it corresponds to the six elements of a CockroachDB function listed above:

CREATE FUNCTION add() specifies that we’re creating a function, and the name of the function will be add.
a INT, b INT specifies that the function accepts two arguments, a and b, both of the INT datatype.
RETURNS INT specifies that the output this function returns will also be the INT datatype.
IMMUTABLE LEAKPROOF specifies the volatility of the function. In this case, the function will not mutate any data or impact other tables in the database.
LANGUAGE SQL specifies that the function body will be written in the programming language SQL.
AS 'SELECT a + b'; specifies that the function body will execute the SQL code SELECT a + b (And of course the semicolon at the end signals the end of the full CREATE FUNCTION statement.)

Once we’ve run that statement and the function has been created, we can invoke it in the same way we’d invoke a built-in function:

SELECT add(3,5);

Running the above statement would return 8.

Once we’ve created functions, we can also edit them using ALTER FUNCTION or remove them using DROP FUNCTION.

Real-world UDFs in CockroachDB

The beauty of UDFs is that they can be designed to do a wide variety of things, depending on what you want. Here’s an example of a real-world UDF written by CockroachDB user Saqib Ali. It takes timestamp data and converts it into a more human-friendly, x hours ago format. (This function is available on Github here if you’d like to contribute to it).

CREATE OR REPLACE FUNCTION humanize_time_span(ts timestamp) RETURNS varchar AS
'
SELECT
 CASE
   WHEN (now() - ts::timestamp)::INT < 60 THEN '' Just Posted''
   WHEN ((now() - ts::timestamp)::INT) / 60 < 60 THEN ((now() - ts::TIMESTAMPTZ)::INT / 60)::INT || '' Minutes ago''
   WHEN (now() - ts::TIMESTAMPTZ)::INT / (60 * 60) < 24 THEN ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60))::INT || '' Hours ago''
   ELSE ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60 * 24))::INT || '' Days ago''
 END
'
LANGUAGE SQL;

To get started with UDFs in CockroachDB, please check out the documentation.

Try Serverless

Spin up a serverless cluster in seconds.