What are user-defined functions (UDFs) in SQL, and why should you care?

What are user-defined functions (UDFs) in SQL, and why should you care?

When people think about improving application performance, they’re often thinking of cleaning up the code in the application layer. However, the database layer can also be optimized to improve application performance, and so can the ways in which these two layers – application and database – interact with each other.

That is where user-defined functions (UDFs) in SQL come in. If you’re looking to speed up your application, UDFs might be a way for you to make that happen.

What are user-defined functions in SQL?

In the context of a SQL database, a UDF is a function that is written by the user and executed by the database software.

SQL UDFs accept inputs, perform actions using those inputs, and return the output. The exact syntax for doing this can vary based on your specific database software, but SQL UDFs typically have at least four elements:

  • A name, which is used to invoke the function
  • Arguments, the inputs a function takes
  • An output or return statement
  • The function body itself (i.e., the code that the database software will execute using the provided inputs)

Here’s a basic example:

CREATE FUNCTION add(a INT, b INT) RETURNS INT AS 'SELECT a + b';

In this example:

  • CREATE FUNCTION add() creates a function and names it add.
  • a INT, b INT defines the arguments (i.e., the two inputs the function takes, which are integers, in this case).
  • RETURNS INT specifies that the function should return an integer.
  • AS 'SELECT a + b' defines the actual code the function runs (in this case, adding the two inputs together).

We could invoke this function in SQL as follows:

SELECT add(3,5)

This would return the output 8.

(Needless to say, this is a rather complicated way to do something as simple as add two integers, but it’s just an example. The beauty of UDFs is the user-defined part: your SQL UDFs will do whatever you want, based on the code you write in the function body.)

If you’re familiar with the concept of functions in other programming languages, this all probably looks familiar. SQL UDFs work just like you’d expect functions to in any language.

What’s interesting about UDFs in SQL isn’t what they can do, it’s where they can do it. The fact that they’re executed in the database is what makes them useful. Let’s talk about why.

Why use SQL UDFs?

The architecture for almost any application will include an application layer, where all of the business logic of the application lives, and a database layer, where data relevant to the application is permanently stored.

Data moves between these two layers regularly, but that movement is not free. Sending data back and forth between these layers constantly will increase the latency of your applications and, depending on how your application is deployed, it could increase your costs, too.

To demonstrate why, let’s imagine we want to add two values that are already in our database together. Needless to say, this is a simplified example — creating a SQL UDF isn’t the most efficient way to simply add two integers — but the table below should illustrate the efficiency that SQL UDFs can bring to an application:

Using a SQL UDF Using an in-application function SQL UDF add invoked in database using two values already existing in the database for its two arguments. (In this case a = 1 and b = 2.) Function add invoked in application. SQL UDF executes a + b and returns 3. Application queries database to get values a = 1 and b = 2.

Application function executes and returns 3.

As we can see, using an in-application function requires an extra step. It is less efficient.

This is not to say that all functions should be executed in the database! Determining the most efficient approach for any particular process will be case-dependent. In some cases, like the simple one illustrated above, using UDFs allows developers to make their applications more efficient by cutting down on the number of trips data must take between the application and database layers.

How to create UDFs in SQL

The process for creating UDFs, and even the language you’re writing the functions in, varies based on the specific database technology you’re using. A full tutorial for every database technology is beyond the scope of this article, but in general every SQL UDF will include at least the four elements we discussed earlier:

  1. A name, which is used to invoke the function
  2. Arguments, the inputs a function takes
  3. An output or return statement
  4. The function body itself (i.e., the code that the database software will execute using the provided inputs)

Let’s take a look at how our a + b example UDF could be created in a few specific types of databases.

Create a UDF in PostgreSQL

CREATE FUNCTION add(a INT, b INT) RETURNS INT AS 'SELECT a+b' LANGUAGE SQL;

Create a UDF in MySQL

DELIMITER $$
CREATE FUNCTION add
(
a INT, b INT
)
RETURNS INT
DETERMINISTIC
BEGIN
   DECLARE added_together INT;
   SET added_together = a + b;
   RETURN added_together;
END$$
DELIMITER ;

Create a UDF in CockroachDB

CREATE FUNCTION add(a INT, b INT) RETURNS INT LANGUAGE SQL AS 'SELECT a + b';

You may notice that the CockroachDB UDF is identical to the Postgres one. That makes sense, since CockroachDB is Postgres wire compatible, and the syntax is very similar.

There is a difference, though: CockroachDB is a distributed database, and is building UDF functionality that will be just as distributed as the rest of your data. This makes CockroachDB a good choice for developers who like the efficiency offered by UDFs but who don’t want to sacrifice the high availability and scalability advantages inherent in a distributed database.

Learn more about UDFs and distributed functions in CockroachDB in our documentation.

SQL UDFs in the real world

Up to this point, we’ve been using a very simple a+b function to demonstrate how SQL UDFs work. In practice, however, UDFs are capable of much, much more.

For example, here’s a real-world UDF written by CockroachDB user Saqib Ali. It takes timestamp data and converts it into a more human-friendly, x hours ago format. (This function is available on Github here if you’d like to contribute to it).

CREATE OR REPLACE FUNCTION humanize_time_span(ts timestamp) RETURNS varchar AS
'
SELECT
 CASE
   WHEN (now() - ts::timestamp)::INT < 60 THEN '' Just Posted''
   WHEN ((now() - ts::timestamp)::INT) / 60 < 60 THEN ((now() - ts::TIMESTAMPTZ)::INT / 60)::INT || '' Minutes ago''
   WHEN (now() - ts::TIMESTAMPTZ)::INT / (60 * 60) < 24 THEN ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60))::INT || '' Hours ago''
   ELSE ((now() - ts::TIMESTAMPTZ)::INT / (60 * 60 * 24))::INT || '' Days ago''
 END
'
LANGUAGE SQL;

UDF side effects

Working with UDFs can also get complicated when you’re working with actual data from your database. If you’re not careful, the functions you write can have unintended side effects and/or mutate the data in your tables in ways you did not intend.

For that reason, some databases have additional language you can add to a function that provides information about the function’s volatility — the extent to which it can impact your other data — and determine how the database handles it.

In CockroachDB, for example, we might rewrite our a + b example function like so:

CREATE FUNCTION add(a INT, b INT) RETURNS INT IMMUTABLE LEAKPROOF LANGUAGE SQL AS 'SELECT a + b';

Note that we have added IMMUTABLE LEAKPROOF to the statement. IMMUTABLE means that the function cannot mutate data, and LEAKPROOF means the function has no side effects (nothing depends on its arguments besides the return value). These tags are important for telling CockroachDB’s cost-based optimizer how it can expect a function to behave.

Learn more about how to use UDFs in CockroachDB.

About the author

Charlie Custer

Charlie is a former teacher, tech journalist, and filmmaker who’s now combined those three professions into writing and making videos about databases and application development (and occasionally messing with NLP and Python to create weird things in his spare time).

github link linkedin link

Keep Reading

Distributed UDFs: How we're building database functions from the ground up

Efficiency matters. When you’re working with large amounts of data, it matters a lot.

Every trip between your …

Read More
Upsert in SQL: What is an upsert, and when should you use one?

Upserts are useful for anyone who works with a database to know, but the term “upsert” might not even appear in your …

Read More
Query performance optimizations and troubleshooting with Intelligent Insights

In an ideal world, all workloads are perfect and never face performance issues. In the real world, that’s hard to …

Read More