In the context of a database and how it’s organized, the term database schema describes the structure of the data and how the elements within the database relate to each other. This information is often presented visually with diagrams that illustrate how different tables and elements connect, giving you a logical view of the entire database. However, the schema itself exists in the database as the coded rules that define the data’s structure and relationships.
In this context, the database schema is typically planned before a new database is created. It reflects the design of your database, and it’s always best to design before you start building! An architect or developer considers what is required based on the data model or application, and then makes decisions about the tables, data types, foreign key constraints, etc. that will be included. The resulting design is called the database schema.
Here is an example of a very simple database schema with three tables: users, orders, and products. In the diagram, primary key columns are italicized, foreign key relationships are illustrated with lines between table columns, and data types for each column are noted.
Note that the schema is not the diagram itself. The schema is the collection of rules and relationships for this database’s data that the diagram depicts.
To sum up, a database schema:
In this article, we’ll focus primarily on the above definition of database schema, as architects and developers must think carefully about their schema design prior to creating and implementing a database. But first, to avoid confusion, we need to understand a few other ways the term database schema gets used.
In SQL databases, the term database schema can also have another, more specific meaning. The term “schema” is sometimes used to represent a named group of database objects, often associated with a particular user or group of users. This type of schema is more precisely called a user-defined schema, so to avoid confusion, we’ll use that term for the remainder of this article.
A single database may have multiple user-defined schemas, and different user-defined schemas can be used to give different users or user groups access to different parts of the database.
For example, let’s say we have a logistics team and we want to give them access to shipments tables while keeping them from accessing the PII (personally identifiable information) in our customers table. We could do this by creating a user-defined schema called logistics that contains only the database objects (shipments) that we want the logistics team to be able to access.
To review, a user-defined schema is:
The term database schema is also sometimes used to refer to other things, such as a diagram that provides a visual representation of a database (e.g., an entity-relationship diagram), or the physical structure of a database (e.g., which data is stored on which servers).
For the purposes of this article, though, we’re going to focus on our first definition of database schema – the big-picture database design that we need to think about before creating any database.
In a SQL database, the schema is essentially a collection of rules that the database will enforce for you. This is beneficial because it allows you to ensure that all of the data in your database is valid and that relationships are enforced without having to add application code to enforce those rules.
For example, consider the database above, with its users, orders, and products tables. In that database, the user_id and product_ordered columns in the orders table are linked to users.user_id and products.product_id, respectively, by foreign key relationships.
This means that if our application attempted to enter an order with a user_id that didn’t exist in our users table, or a product_id that didn’t exist in our products table, the database would reject it. That’s a good thing: we don’t want invalid data getting into our database, where it will cause problems later.
And because those foreign key relationships are part of the schema we defined when we set up the database, we don’t have to write any application code to make that happen. A SQL database will enforce its schema automatically and without exception, so we can be certain that data will only enter the database if it’s following the rules we’ve defined.
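Here is a minimal sketch of this behavior. It uses Python’s built-in sqlite3 module as a stand-in for a SQL database. The users, orders, and products tables and the user_id, product_ordered, and product_id columns follow the example above. The name and order_id columns, the sample rows, and the try_insert_order helper are hypothetical. One SQLite-specific caveat applies: SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued on the connection, so the sketch turns that on first. The rejection comes from the schema, not from validation code in the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off by default; enable it for this connection.
conn.execute("PRAGMA foreign_keys = ON")

# The constraints live in the schema, not in application code.
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users (user_id),
    product_ordered INTEGER NOT NULL REFERENCES products (product_id)
);
""")

conn.execute("INSERT INTO users (user_id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO products (product_id, name) VALUES (10, 'Widget')")


def try_insert_order(user_id, product_id):
    """Attempt to insert an order and report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (user_id, product_ordered) VALUES (?, ?)",
            (user_id, product_id),
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert_order(1, 10))   # existing user and product
print(try_insert_order(99, 10))  # user 99 does not exist
print(try_insert_order(1, 999))  # product 999 does not exist
```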
Historically, one of the limitations of using a database management system with a defined schema was that, because schemas enforce structure, it was difficult to store unstructured data. Another limitation was that changing a database’s schema after the database was in production required downtime. That meant that developers either had to put a lot of thought into the construction of their schema up front or risk paying a heavy price for making changes further down the line.
Traditional SQL databases, in part due to the restrictive nature of their schema and the way they processed transactions, were challenging to scale horizontally. Enforcing rules and routing reads, writes, joins, etc. becomes more complicated in distributed systems where the data is spread across multiple nodes.
“Schemaless” NoSQL database solutions emerged to address some of these problems. But they’re not a good solution for applications with heavy transactional workloads, as many of them struggle with consistency at scale. Additionally, developers setting up a NoSQL database for their application will have to ensure that any necessary constraints on the data are in place at the application level, since the database itself doesn’t have a schema that enforces these constraints.
Thankfully, modern distributed SQL databases make it possible for developers to “have their cake and eat it too.” For example, CockroachDB supports online schema changes, which means developers can quickly spin up a database for their projects and modify the schema over time without having to take their database offline. CockroachDB also supports the storage of unstructured data via the JSONB data type. And, being distributed by nature, it can be easily scaled horizontally without all of the complex, manual work that was once required to scale out traditional RDBMS.
Depending on the database you’re using, it can be quite important to think carefully about your schema before you create your database. While CockroachDB supports online schema changes, making it easy to adjust and adapt your schema over time with zero production downtime, schema changes can be a bit more painful with legacy databases.
So how can you get it right the first time? Every use case is different, and your specific needs are going to define how your database will be designed.
The first step in designing any database schema is to build a thorough understanding of all of the data you’ll be storing. Creating a “data dictionary” that lists and defines every column of data you intend to store is a way to approach this task that’ll also leave you with a helpful final document you can share to help others understand your data in the future.
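One lightweight way to keep a data dictionary is as structured data that can be rendered into a shareable document and checked for gaps. The sketch below uses a hypothetical Python format. The ColumnEntry fields, the sample entries, and the TODO placeholder for missing descriptions are illustrative assumptions, not a standard. A spreadsheet serves the same purpose; the point is that every column gets a type and a description.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class ColumnEntry:
    """One row of a data dictionary: a column and what it means."""
    table: str
    column: str
    data_type: str
    description: str


# Hypothetical entries loosely based on the users/orders example in this article.
DATA_DICTIONARY = [
    ColumnEntry("users", "user_id", "INTEGER", "Unique identifier for a user"),
    ColumnEntry("users", "email", "TEXT", "Contact email address for the user"),
    ColumnEntry("orders", "order_id", "INTEGER", "Unique identifier for an order"),
    ColumnEntry("orders", "user_id", "INTEGER", "The user who placed the order"),
    ColumnEntry("orders", "placed_at", "TIMESTAMPTZ", ""),
]


def undocumented(entries):
    """List columns that still lack a description."""
    return [f"{e.table}.{e.column}" for e in entries if not e.description.strip()]


def render(entries):
    """Render the dictionary as plain text, grouped by table name."""
    lines = []
    # sorted() is stable, so columns keep their original order within each table.
    ordered = sorted(entries, key=lambda e: e.table)
    for table, columns in groupby(ordered, key=lambda e: e.table):
        lines.append(f"Table: {table}")
        for c in columns:
            lines.append(f"  {c.column} ({c.data_type}): {c.description or 'TODO'}")
    return "\n".join(lines)


print(render(DATA_DICTIONARY))
print("Missing descriptions:", undocumented(DATA_DICTIONARY))
```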
Once you’ve got that down, here are some important things to consider for the construction of your database schema:
Figure out exactly what data you’ll want to store, and then begin to separate it into tables and columns. The goal here is to finish with a structure that makes sense, with thematically-organized tables and minimal or no repetition across tables.
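To show what minimal repetition looks like in practice, here is a small Python sketch with hypothetical records. User details that repeat on every order are moved into their own users table, and each order keeps only a user_id reference. This mirrors the split between the users and orders tables in the example schema. The field names and the split_users_and_orders helper are illustrative assumptions.

```python
# Hypothetical flat records in which user details repeat on every order.
flat_orders = [
    {"order_id": 1, "user_id": 1, "user_email": "ada@example.com", "product_ordered": 10},
    {"order_id": 2, "user_id": 1, "user_email": "ada@example.com", "product_ordered": 11},
    {"order_id": 3, "user_id": 2, "user_email": "lin@example.com", "product_ordered": 10},
]


def split_users_and_orders(rows):
    """Move repeated user details into a separate users table keyed by user_id.

    Orders keep only the user_id reference, so each email is stored once.
    Assumes each user_id maps to a single email: a conflicting value would
    silently overwrite the earlier one here, so real data should be checked first.
    """
    users = {}
    orders = []
    for row in rows:
        users[row["user_id"]] = {"user_id": row["user_id"], "email": row["user_email"]}
        orders.append({
            "order_id": row["order_id"],
            "user_id": row["user_id"],
            "product_ordered": row["product_ordered"],
        })
    return list(users.values()), orders


users, orders = split_users_and_orders(flat_orders)
print(users)
print(orders)
```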
Give your tables and columns descriptive names, and avoid redundancies such as calling a table orders_table: it should simply be called orders. You also need to avoid using any reserved words in the names for your tables, columns, etc.
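A quick way to see why reserved words cause trouble is to try one. This sketch uses SQLite through Python’s sqlite3 module. Reserved word lists differ between databases, so check your own database’s documentation. The can_create helper is hypothetical. SQLite accepts an unquoted table named orders but rejects order, because ORDER is an SQL keyword (as in ORDER BY).

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def can_create(table_name):
    """Return True if SQLite accepts table_name unquoted in CREATE TABLE.

    The name is interpolated directly for this demo only; never build SQL
    from untrusted input this way.
    """
    try:
        conn.execute(f"CREATE TABLE {table_name} (id INTEGER PRIMARY KEY)")
        return True
    except sqlite3.OperationalError:
        return False


print(can_create("orders"))  # True: descriptive and not reserved
print(can_create("order"))   # False: ORDER is an SQL keyword (as in ORDER BY)
```

Quoting the identifier (CREATE TABLE "order" ...) gets past the error. After that, though, every query that touches the table has to quote it too, which is exactly the friction good naming avoids.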
Plan the constraints you’ll add to each table, including the primary key and any foreign keys, and the format (single-column or multi-column) and data type you’d like for each constraint.
This is an area where you’ll want to take your chosen database into account, because performance can vary. For example, while using an auto-incrementing column as a primary key can work well for a traditional single-node database, if you’re using a distributed database such as CockroachDB this approach can create a “hotspot” where a single node ends up handling most of the workload.
To spread the workload out and take advantage of the distributed architecture, it’s best to use an auto-generated UUID.
This is just one example. Regardless of the database system you’re using, it’s a good idea to understand the best practices for primary and foreign key constraints before you design your schema.
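The hotspot effect can be illustrated with a toy model in Python. The model assumes, hypothetically, that a table’s rows are split into four contiguous key ranges of equal size, each served by a different node. CockroachDB’s real range splitting and rebalancing are more involved. With auto-incrementing keys, every new key is larger than all existing ones, so all new writes land in the last range. With random UUIDs (compared here as 128-bit integers), new keys scatter across all ranges. In CockroachDB itself, a column declared as id UUID PRIMARY KEY DEFAULT gen_random_uuid() generates random UUID keys automatically.

```python
import bisect
import uuid
from collections import Counter

NUM_RANGES = 4  # hypothetical: one contiguous key range per node


def split_points(existing_keys, num_ranges=NUM_RANGES):
    """Split the sorted existing keys into equal-sized contiguous ranges."""
    ordered = sorted(existing_keys)
    return [ordered[len(ordered) * i // num_ranges] for i in range(1, num_ranges)]


def writes_per_range(splits, new_keys, num_ranges=NUM_RANGES):
    """Count how many new keys fall into each range."""
    counts = Counter(bisect.bisect_right(splits, key) for key in new_keys)
    return [counts.get(r, 0) for r in range(num_ranges)]


# Auto-incrementing IDs: existing rows 1..10000; the next 1000 inserts continue the sequence.
seq_splits = split_points(range(1, 10_001))
seq_load = writes_per_range(seq_splits, range(10_001, 11_001))

# Random UUIDs: existing and new keys are both spread over the 128-bit key space.
uuid_splits = split_points(uuid.uuid4().int for _ in range(10_000))
uuid_load = writes_per_range(uuid_splits, (uuid.uuid4().int for _ in range(1_000)))

print("sequential:", seq_load)  # every new write lands in the last range
print("uuid:      ", uuid_load)  # writes spread roughly evenly across ranges
```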
Consider the appropriate data types for your columns, keeping in mind that columns with a foreign key relationship must share the same data type as the parent column. Here, too, it is wise to see if there are best practices that are specific to your database technology. For example, CockroachDB supports time-related data types including TIMESTAMP and TIMESTAMPTZ, and recommends using TIMESTAMPTZ as a best practice. Other systems may support and recommend other formats.
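Python’s datetime offers a useful analogy for why a timezone-aware type is usually the safer default. This is an analogy only, not CockroachDB’s implementation. An aware datetime, like a TIMESTAMPTZ value, identifies an absolute instant. A naive datetime, like a TIMESTAMP value, records only a wall-clock reading. The fixed UTC offsets below are hypothetical, chosen so the sketch needs no time zone database.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets (UTC-5 and UTC+1), so no tz database is needed.
utc_minus_5 = timezone(timedelta(hours=-5))
utc_plus_1 = timezone(timedelta(hours=1))

# Aware values: the same instant, recorded in two different zones.
placed_a = datetime(2024, 1, 15, 9, 0, tzinfo=utc_minus_5)
placed_b = datetime(2024, 1, 15, 15, 0, tzinfo=utc_plus_1)

# Aware datetimes compare as absolute instants and normalize to one UTC value.
print(placed_a == placed_b)               # True
print(placed_a.astimezone(timezone.utc))  # 2024-01-15 14:00:00+00:00

# Naive values: the same instant stored as local wall-clock readings with
# the zone discarded. They now appear to be six hours apart.
naive_a = placed_a.replace(tzinfo=None)
naive_b = placed_b.replace(tzinfo=None)
print(naive_b - naive_a)                  # 6:00:00
```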
It’s never too early to start thinking about performance! It’s a good idea to create secondary indexes for any columns you expect to use for sorting or filtering data. That said, don’t go crazy here, as having too many indexes can slow write performance and eat up the available node memory.
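You can check the effect of a secondary index on filtering with the database’s query planner. This sketch uses SQLite’s EXPLAIN QUERY PLAN through Python’s sqlite3 module. CockroachDB has its own EXPLAIN statement with different output. The orders table, its columns, the sample data, and the index name idx_orders_user_id are hypothetical. Before the index exists, filtering on user_id means scanning the whole table. Afterward, the planner can search the index instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)"
)
# Hypothetical sample data: 1000 orders spread across 100 users.
conn.executemany(
    "INSERT INTO orders (user_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE user_id = ?"


def plan(sql, params):
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last field of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
plan_after = plan(QUERY, (42,))

print(plan_before)  # a full scan of orders
print(plan_after)   # a search using idx_orders_user_id
```

The write-side cost shows up here too. Once idx_orders_user_id exists, every insert into orders must also update the index. That is why it pays to index only the columns you actually filter or sort on.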
Database security is a consideration that goes far beyond just schema, but is also relevant to schema design if your database supports selective encryption. If your database supports it, encrypting tables with PII while leaving non-sensitive tables in plaintext will enable you to get the best performance from your database without compromising security.
Geolocating data close to the users who access it is a best practice for performance, and locating data in the same country where a user is located is also a legal requirement in some countries. If performance or regulatory compliance is likely to be important for your application, and if you have a database such as CockroachDB that supports geo-partitioning by table and even by row, it may be worth considering whether specific tables should be associated with specific locations as part of your schema design.
Arguably as important as the schema design itself is documenting your schema design so that other people can understand it. At a minimum, you’ll probably want to create an ER diagram that illustrates your database structure visually, and a data dictionary that provides additional information and context for each table and column in the database.
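Much of the raw material for an ER diagram and a data dictionary can be pulled from the database itself. This sketch uses SQLite’s PRAGMA table_info and PRAGMA foreign_key_list through Python’s sqlite3 module, applied to a hypothetical version of the example schema. PostgreSQL and CockroachDB expose similar metadata through information_schema views. People still have to write the descriptions in a data dictionary. Generating the column and relationship lists, though, keeps them from drifting out of date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical version of the users/orders/products example schema.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    user_id         INTEGER REFERENCES users (user_id),
    product_ordered INTEGER REFERENCES products (product_id)
);
""")


def describe_schema(conn):
    """Collect columns and foreign key edges for every table in the database."""
    doc = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        edges = [f"{table}.{r[3]} -> {r[2]}.{r[4]}"
                 for r in conn.execute(f"PRAGMA foreign_key_list({table})")]
        doc[table] = {"columns": columns, "references": edges}
    return doc


for table, info in describe_schema(conn).items():
    print(table, info["columns"], info["references"])
```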
Learn to design database schema the easy way with our new Schema Design course. It’ll walk you through everything you need to know about great schema design and making schema changes, and it’s completely free!