Foreign keys put the “relational” in “relational database” – they help define the relationship between tables. They allow developers to maintain referential integrity across their database. Foreign keys also help end-users by preventing errors and improving the performance of any operation that’s pulling data from tables linked by indexed foreign keys.
On a technical level, a foreign key is a constraint that links a column in one table (table_1.column_a) to a column in a different table (table_2.column_b) and ensures that a value can be added to column_a only if the same value already exists in column_b.
For example, a table of customer orders might have a user column with a foreign key attribute that links it to the user_id column in a users table. This way, each row in the orders table can be associated with a specific user from the users table, and no orders can enter the system without a valid user being associated with them.
How do foreign keys actually work in practice? Let’s get practical, and learn more about foreign keys by looking at how they function in the context of a simple SQL database.
Let’s dive in!
Below, we’ve set up a sample database we’ll work with. It represents the sales database of a fictional online bookshop. We can see there are three tables:
users contains data about users registered on the site
orders contains data about specific orders placed through the site
books contains data about the books that are available for sale
Needless to say, the database of a real-life bookshop would be far larger and more complex than this! But this sample database will make it easier to illustrate how foreign keys work, and the principles that apply here will apply in exactly the same way in larger and more complex databases.
(If you’d like to try working with this database hands-on, skip to the end of the article for instructions that’ll help you get it set up on a free CockroachDB dedicated cluster in less than five minutes).
To understand how foreign keys work, it will help to first understand primary keys.
A primary key is a column in a table that is used as a unique identifier for each row. It functions a bit like the row’s address, and is used for the table’s primary index. (Technically, a primary key can be made up of multiple columns, but for our purposes here let’s think of it as a single column).
You can think of the primary key as being like a row’s ID number. Just as your government ID identifies you uniquely even if other people share your name or live at your address, a table’s primary key column identifies each row uniquely even if some of the other values in the row are shared with other rows.
A table’s primary key column thus must be unique, and it cannot be empty or NULL.
Consider the orders table in our sample database. In this table, the primary key is order_no, a unique number that identifies each order individually.
In fact, in this table order_no is the only column that could be used as the primary key. user_id can contain duplicate data, since the same customer can place more than one order, and product_sku can also contain duplicate data, since two different customers might order the same product.
Here’s how our database looks with the primary key columns for each table highlighted:
Foreign keys link data in one table to the data in another table. A foreign key column in a table points to a column with unique values in another table (often the primary key column) to create a way of cross-referencing the two tables. If a column is assigned a foreign key, each row of that column must contain a value that exists in the ‘foreign’ column it references.
For a tangible example, let’s look at the orders table in our database again. The user_id column here corresponds with the user_id column in the users table, and the product_sku column corresponds with the product_sku column in the books table.
When we’re setting up this table, it would make sense to add foreign key rules to both orders.user_id and orders.product_sku:
orders.user_id should reference users.user_id
orders.product_sku should reference books.product_sku
Using these foreign keys saves us from having to store the same data repeatedly. We don’t have to store the user’s name in the orders table, because we can use orders.user_id to look up that user’s unique row in the users table and get their name and other information about them.
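As a rough sketch of that lookup, the snippet below follows orders.user_id into users with a JOIN. It uses Python’s built-in sqlite3 module and an in-memory SQLite database as a stand-in for CockroachDB, so details may differ on a real cluster. The name column, the user name 'Ada', and all of the order rows are hypothetical sample data, not the article’s actual database.

```python
import sqlite3

# In-memory SQLite database standing in for the bookshop database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL              -- hypothetical column
    );
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        user_id     INTEGER REFERENCES users(user_id),
        product_sku INTEGER                -- books table omitted to keep the sketch short
    );
    -- Hypothetical sample rows
    INSERT INTO users VALUES (10, 'Ada'), (11, 'Mohamed');
    INSERT INTO orders VALUES (1, 11, 101), (2, 11, 102), (3, 10, 101);
""")


def orders_with_names(connection):
    """Return (order_no, customer name) pairs by following orders.user_id into users."""
    return connection.execute("""
        SELECT o.order_no, u.name
        FROM orders AS o
        JOIN users AS u ON u.user_id = o.user_id
        ORDER BY o.order_no
    """).fetchall()


rows = orders_with_names(conn)
for order_no, name in rows:
    print(order_no, name)
```

The name is stored once, in users, and every order reaches it through the shared user_id value.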
But the real purpose of foreign keys is that they add a restriction: entries to the table with a foreign key must have a value that corresponds with the ‘foreign’ table column.
This restriction is called a foreign key constraint. Let’s take a look at foreign key constraints in more detail.
Foreign key constraints are the rules created when we add foreign keys to a table. Foreign key constraints in table A link to a column with unique values in table B and say that a value in A’s column is only valid if it also exists in B’s column.
(Note: foreign keys can be composite keys, so a single foreign key can span two or more columns in one table that together reference a matching set of columns in another table. In this article, for the sake of simplicity we’ll focus on linking a single column in one table to a single column in another.)
For example, imagine we’ve set up our orders table with the foreign keys we laid out earlier: orders.user_id references users.user_id and orders.product_sku references books.product_sku. These rules mean that:
Any value entered into orders.user_id must already exist in users.user_id. In other words, orders must be placed by a registered user: the orders table won’t accept a new row or a row update if the value in orders.user_id doesn’t already exist in users.user_id.
Any value entered into orders.product_sku must already exist in books.product_sku. In other words, users can only order products that exist in the database: the orders table won’t accept a new row or a row update if the value in orders.product_sku doesn’t already exist in books.product_sku.
Here’s an illustration of how a foreign key constraint works visually:
Although this is a simplified example, we can see how foreign key constraints help establish clear relationships between tables across a database, and promote consistency by making it impossible to (for example) add a row in an orders table with a user who doesn’t exist in the users table.
Note that foreign keys are not mandatory, and a table may have no foreign keys. Conversely, every column in a table may have a foreign key constraint. Where you use foreign keys depends on the specifics of the data you’re storing in your database, how different data points relate to each other, and how you’d like your data to be validated as rows are added, updated, or removed.
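To see a foreign key constraint accept and reject rows, here is a minimal sketch using Python’s built-in sqlite3 module. SQLite is only a stand-in for CockroachDB here (the error type and the PRAGMA are SQLite-specific), and the helper try_insert plus all of the IDs and SKUs are hypothetical values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE books (product_sku INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        user_id     INTEGER REFERENCES users(user_id),
        product_sku INTEGER REFERENCES books(product_sku)
    );
    -- Hypothetical sample rows
    INSERT INTO users VALUES (10), (11);
    INSERT INTO books VALUES (101), (102);
""")


def try_insert(connection, order_no, user_id, product_sku):
    """Try to insert an order; return True if accepted, False if a constraint rejects it."""
    try:
        with connection:  # commits on success, rolls back if the insert raises
            connection.execute(
                "INSERT INTO orders (order_no, user_id, product_sku) VALUES (?, ?, ?)",
                (order_no, user_id, product_sku),
            )
        return True
    except sqlite3.IntegrityError:
        return False


valid_order = try_insert(conn, 500, 11, 101)   # user 11 and SKU 101 both exist
unknown_user = try_insert(conn, 501, 99, 101)  # user 99 is not in users
unknown_book = try_insert(conn, 502, 11, 999)  # SKU 999 is not in books
print(valid_order, unknown_user, unknown_book)
```

Only the first insert succeeds. The other two are refused because one of their values has no match in the column it references.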
Now that we understand what primary and foreign keys are and how they work, let’s take a quick look at how we can assign these values when we’re creating a table in our database.
Note: we’ll be using CockroachDB SQL syntax. Different flavors of SQL may approach these tasks slightly differently, but we’ll stick with Cockroach since it offers a free cloud database that’s excellent for any project!
To create the orders table from our database with foreign keys:
CREATE TABLE orders (
    order_no INT PRIMARY KEY,
    user_id INT REFERENCES users(user_id),
    product_sku INT REFERENCES books(product_sku)
);
In the code above, we’re setting the order_no column as the primary key, and then setting rules in the form of foreign key constraints for the other two columns:
user_id has users.user_id as its foreign key (i.e. any values in orders.user_id must already exist in users.user_id).
product_sku has books.product_sku as its foreign key (i.e. any values in orders.product_sku must already exist in books.product_sku).
You can check to see what foreign keys and other constraints already exist in a CockroachDB database like this:
SHOW CONSTRAINTS FROM orders;
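SHOW CONSTRAINTS is CockroachDB syntax. For readers experimenting locally without a cluster, a roughly comparable check can be sketched with Python’s built-in sqlite3 module, where SQLite’s PRAGMA foreign_key_list plays a similar role. This is a stand-in under that assumption, not CockroachDB behavior, and the helper name foreign_keys is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE books (product_sku INTEGER PRIMARY KEY);
    CREATE TABLE orders (
        order_no    INTEGER PRIMARY KEY,
        user_id     INTEGER REFERENCES users(user_id),
        product_sku INTEGER REFERENCES books(product_sku)
    );
""")


def foreign_keys(connection, table):
    """Return a set of (local column, referenced table, referenced column) for a table."""
    # Each PRAGMA foreign_key_list row is:
    # (id, seq, table, from, to, on_update, on_delete, match)
    # The table name is interpolated directly, so only pass trusted names.
    rows = connection.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    return {(row[3], row[2], row[4]) for row in rows}


fks = foreign_keys(conn, "orders")
for local_column, ref_table, ref_column in sorted(fks):
    print(f"orders.{local_column} -> {ref_table}.{ref_column}")
```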
Foreign keys give us the power to define relationships between two or more tables. This is great, but it does mean that we need to think carefully about what happens when a value that’s linked across tables is changed or deleted.
For example, let’s say that Mohamed, one of our bookshop’s customers, has requested we delete his account and all data associated with it. We could run the following query to remove him from the users table:
DELETE FROM users WHERE user_id = 11;
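However, with the way our tables are currently set up, that query won’t clean up after itself. Two orders associated with this user exist in the orders table, and they won’t be removed automatically. In fact, because those orders still reference Mohamed’s row, the default foreign key rule (NO ACTION) makes the database reject the DELETE with an error, so we’d have to remember to delete his orders manually first.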
Thankfully, there’s a much easier way to handle this! When we’re adding foreign keys, we can also set rules for how our database should behave if a value that’s linked across tables is changed.
For example, with a CockroachDB database, we could create a table like this:
CREATE TABLE orders (
    order_no INT PRIMARY KEY,
    user_id INT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
    product_sku INT NOT NULL REFERENCES books(product_sku) ON DELETE CASCADE ON UPDATE CASCADE
);
In the code above, ON DELETE CASCADE and ON UPDATE CASCADE specify that when a row is deleted or a value is updated (respectively) in one table, the same operation should be performed on the linked value or row in other tables.
So, if we built our table like that and then ran the query DELETE FROM users WHERE user_id = 11;, the data associated with Mohamed (user 11) would be deleted from the users table and orders associated with his account would also be deleted from the orders table.
Similarly, we could run this query…
UPDATE users
SET user_id = 100
WHERE user_id = 11;
…and Mohamed’s user_id value in users would be updated from 11 to 100, and the user_id value associated with his orders in the orders table would be updated, too.
Depending on the circumstances, we might prefer that our database do something different. For example, we could also use ON DELETE SET NULL to set the foreign key column of every referencing row to NULL if the row it’s referencing is deleted (this only works if that column allows NULL values, so not with the NOT NULL columns above). We can also specify that we want the database to take NO ACTION, although this isn’t strictly necessary, as this is the default rule the database will follow if we don’t specify another action for it to take.
The full range of actions we can use with foreign keys is detailed in the CockroachDB docs.
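The cascade behavior can be tried out with the sketch below. It uses Python’s built-in sqlite3 module as a stand-in for CockroachDB (an assumption: SQLite supports the same ON DELETE and ON UPDATE clauses, but its error types and the PRAGMA are its own). The helpers make_db and order_user_ids and the sample rows are hypothetical. For contrast, it first shows what happens with no action specified, where deleting a user who still has orders is rejected.

```python
import sqlite3


def make_db(fk_actions):
    """Build a small users/orders schema, appending fk_actions to the foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(f"""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            order_no INTEGER PRIMARY KEY,
            user_id  INTEGER NOT NULL REFERENCES users(user_id) {fk_actions}
        );
        -- Hypothetical sample rows: user 11 has two orders, user 10 has one
        INSERT INTO users VALUES (10), (11);
        INSERT INTO orders VALUES (1, 11), (2, 11), (3, 10);
    """)
    return conn


def order_user_ids(conn):
    """Return the user_id of every order, sorted by order number."""
    return [row[0] for row in conn.execute("SELECT user_id FROM orders ORDER BY order_no")]


# No action specified: the delete is refused while orders still reference user 11.
plain = make_db("")
try:
    plain.execute("DELETE FROM users WHERE user_id = 11")
    default_delete_allowed = True
except sqlite3.IntegrityError:
    default_delete_allowed = False

# With cascading rules, changes to users flow through to orders.
cascading = make_db("ON DELETE CASCADE ON UPDATE CASCADE")
cascading.execute("UPDATE users SET user_id = 100 WHERE user_id = 11")
after_update = order_user_ids(cascading)

cascading.execute("DELETE FROM users WHERE user_id = 100")
after_delete = order_user_ids(cascading)

print(default_delete_allowed, after_update, after_delete)
```

After the update, both of the user’s orders carry the new ID. After the delete, only the order belonging to user 10 remains.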
Refresh your memory of our imaginary database, and the foreign key constraints we added to the orders table earlier in this article (orders.user_id references users.user_id and orders.product_sku references books.product_sku):
Now, see if you can answer the following questions:
What would be the result of running the following SQL command?
INSERT INTO orders (order_no, user_id, product_sku)
VALUES (97, 14, 456);
Answer: It would result in an error and the row would not be inserted, because the user_id value 14 does not exist in users.user_id.
Does the books table have any foreign key constraints?
Answer: No, and in this case it doesn’t need any. Although the orders table references books.product_sku, none of the columns in the books table correspond with values in other tables in a way that would make adding a foreign key constraint beneficial.
What would be the result of running the following SQL command?
INSERT INTO orders (order_no, user_id, product_sku)
VALUES (412, 10, 101);
Answer: It would result in a new row with those values being added to the orders table, because it meets all of the constraints that the primary and foreign keys impose on the table’s columns:
412 is a unique value that doesn’t already exist in order_no, and is thus valid (the primary key constraint)
10 is a user ID that corresponds with an existing user in users.user_id, and is thus valid (a foreign key constraint)
101 is a product SKU that corresponds with an existing book in books.product_sku, and is thus valid (a foreign key constraint)
Want to build a little real-world experience with foreign keys and try working with this database for yourself in the cloud? Don’t worry, it’ll only take a few minutes!
Step 1: Sign up for a free Cockroach Cloud account, and follow the instructions there to create a cluster.
Step 2: Follow the instructions here to install CockroachDB locally. If you’ve already installed it, you can skip this step.
Step 3: Download this .sql file, which contains the SQL queries needed to create the database pictured above. Run the following command in your terminal and the data and tables will be automatically added to your cluster’s defaultdb database.
(Note that you need to replace some of the details in the command below with details specific to your Cockroach Cloud account, and be sure you’ve specified the correct directories for your foreign_keys.sql and cc-ca.crt.)
cat foreign_keys.sql | cockroach sql --url 'postgres://username:password@free-tier.gcp-us-central1.cockroachlabs.cloud:26257/defaultdb?sslmode=verify-full&sslrootcert=cc-ca.crt&options=--cluster=cluster-name'
Step 4: Run the following command in your terminal to start up the CockroachDB SQL shell and connect to your CockroachDB dedicated cluster:
cockroach sql --url 'postgres://username:password@free-tier.gcp-us-central1.cockroachlabs.cloud:26257/defaultdb?sslmode=verify-full&sslrootcert=cc-ca.crt&options=--cluster=cluster-name'
You’re in! Feel free to poke around. The CockroachDB docs on foreign keys will be a useful reference. Some questions to get you started:
[...] users table?
[...] orders if a user’s account was deleted? (Hint: SHOW CONSTRAINTS FROM orders;)