Written by
on December 19, 2022
When people think about improving application performance, they’re often thinking of cleaning up the code in the application layer. However, the database layer can also be optimized to improve application performance, and so can the ways in which these two layers – application and database – interact with each other.
That is where user-defined functions (UDFs) in SQL come in. If you’re looking to speed up your application, UDFs might be a way for you to make that happen.
In the context of a SQL database, a UDF is a function that is written by the user and executed by the database software.
SQL UDFs accept inputs, perform actions using those inputs, and return the output. The exact syntax for doing this can vary based on your specific database software, but SQL UDFs typically have at least four elements:
Here’s a basic example:
CREATE FUNCTION add(a INT, b INT) RETURNS INT AS 'SELECT a + b';
In this example:
CREATE FUNCTION add()
creates a function and names it add
.a INT, b INT
defines the arguments (i.e., the two inputs the function takes, which are integers, in this case).RETURNS INT
specifies that the function should return an integer.AS 'SELECT a + b'
defines the actual code the function runs (in this case, adding the two inputs together).We could invoke this function in SQL as follows:
SELECT add(3,5)
This would return the output 8
.
(Needless to say, this is a rather complicated way to do something as simple as add two integers, but it’s just an example. The beauty of UDFs is the user-defined part: your SQL UDFs will do whatever you want, based on the code you write in the function body.)
If you’re familiar with the concept of functions in other programming languages, this all probably looks familiar. SQL UDFs work just like you’d expect functions to in any language.
What’s interesting about UDFs in SQL isn’t what they can do, it’s where they can do it. The fact that they’re executed in the database is what makes them useful. Let’s talk about why.
The architecture for almost any application will include an application layer, where all of the business logic of the application lives, and a database layer, where data relevant to the application is permanently stored.
Data moves between these two layers regularly, but that movement is not free. Sending data back and forth between these layers constantly will increase the latency of your applications and, depending on how your application is deployed, it could increase your costs, too.
To demonstrate why, let’s imagine we want to add two values that are already in our database together. Needless to say, this is a simplified example — creating a SQL UDF isn’t the most efficient way to simply add two integers — but the table below should illustrate the efficiency that SQL UDFs can bring to an application:
Using a SQL UDF
Using an in-application function
SQL UDF add
invoked in database using two values already existing in the database for its two arguments. (In this case a = 1
and b = 2
.)
Function add
invoked in application.
SQL UDF executes a + b
and returns 3
.
Application queries database to get values a = 1
and b = 2
.
Application function executes and returns 3
.
As we can see, using an in-application function requires an extra step. It is less efficient.
This is not to say that all functions should be executed in the database! Determining the most efficient approach for any particular process will be case-dependent. In some cases, like the simple one illustrated above, using UDFs allows developers to make their applications more efficient by cutting down on the number of trips data must take between the application and database layers.
The process for creating UDFs, and even the language you’re writing the functions in, varies based on the specific database technology you’re using. A full tutorial for every database technology is beyond the scope of this article, but in general every SQL UDF will include at least the four elements we discussed earlier:
Let’s take a look at how our a + b
example UDF could be created in a few specific types of databases.
CREATE FUNCTION add(a INT, b INT) RETURNS INT AS 'SELECT a+b' LANGUAGE SQL;
DELIMITER $$
CREATE FUNCTION add
(
a INT, b INT
)
RETURNS INT
DETERMINISTIC
BEGIN
DECLARE added_together INT;
SET added_together = a + b;
RETURN added_together;
END$$
DELIMITER ;
CREATE FUNCTION add(a INT, b INT) RETURNS INT LANGUAGE SQL AS 'SELECT a + b';
You may notice that the CockroachDB UDF is identical to the Postgres one. That makes sense, since CockroachDB is Postgres wire compatible, and the syntax is very similar.
There is a difference, though: CockroachDB is a distributed database, and is building UDF functionality that will be just as distributed as the rest of your data. This makes CockroachDB a good choice for developers who like the efficiency offered by UDFs but who don’t want to sacrifice the high availability and scalability advantages inherent in a distributed database.
Learn more about UDFs and distributed functions in CockroachDB in our documentation.
Up to this point, we’ve been using a very simple a+b
function to demonstrate how SQL UDFs work. In practice, however, UDFs are capable of much, much more.
For example, here’s a real-world UDF written by CockroachDB user Saqib Ali. It takes timestamp data and converts it into a more human-friendly, x hours ago format. (This function is available on Github here if you’d like to contribute to it).
CREATE OR REPLACE FUNCTION humanize_time_span(ts timestamp) RETURNS varchar AS
'
SELECT
CASE
WHEN (now() - ts::timestamp)::INT < 60 THEN '' Just Posted''
WHEN ((now() - ts::timestamp)::INT) / 60 < 60 THEN ((now() - ts::TIMESTAMPTZ)::INT / 60)::INT || '' Minutes ago''
WHEN (now() - ts::TIMESTAMPTZ)::INT / (60 * 60) < 24 THEN ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60))::INT || '' Hours ago''
ELSE ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60 * 24))::INT || '' Days ago''
END
'
LANGUAGE SQL;
Working with UDFs can also get complicated when you’re working with actual data from your database. If you’re not careful, the functions you write can have unintended side effects and/or mutate the data in your tables in ways you did not intend.
For that reason, some databases have additional language you can add to a function that provides information about the function’s volatility — the extent to which it can impact your other data — and determine how the database handles it.
In CockroachDB, for example, we might rewrite our a + b
example function like so:
CREATE FUNCTION add(a INT, b INT) RETURNS INT IMMUTABLE LEAKPROOF LANGUAGE SQL AS 'SELECT a + b';
Note that we have added IMMUTABLE LEAKPROOF
to the statement. IMMUTABLE
means that the function cannot mutate data, and LEAKPROOF
means the function has no side effects (nothing depends on its arguments besides the return value). These tags are important for telling CockroachDB’s cost-based optimizer how it can expect a function to behave.
Efficiency matters. When you’re working with large amounts of data, it matters a lot.
Every trip between your …
Read MoreUpserts are useful for anyone who works with a database to know, but the term “upsert” might not even appear in your …
Read MoreIn an ideal world, all workloads are perfect and never face performance issues. In the real world, that’s hard to …
Read More