In simple terms, a relational (SQL) database stores structured data that conforms to a predefined schema. A document-model (or NoSQL) database stores documents that may contain unstructured data without a schema. At least, that’s the way that these two database storage models have traditionally been discussed.
If you want to dive deeper, there have been countless posts on the subject that do a good job explaining some of the differences, but I quite like this debate on Stack Overflow.
This post, in contrast, intends to extend these comparisons to the modern era, where the relational database has been improved and distributed so that — like the NoSQL databases — it too can deliver scale-out access to data anywhere on the planet.
To understand how that’s possible, let’s start with the CAP Theorem.
Although somewhat academic, the CAP Theorem has traditionally been the vehicle used to describe the difference between these two types of database. Distributed SQL databases like CockroachDB, are CP (consistent and partition tolerant) and document stores, like MongoDB, Dynamo, and others, are considered AP (available and partition tolerant) databases. Basically, one guarantees consistency at scale and the other availability. But it is not as simple as this.
A CP database guarantees consistency; however, it delivers a spectrum of availability. An AP database will guarantee availability, but will deliver a spectrum of consistency. This is an important point to consider in regards to whatever workload the database will sit behind.
For CockroachDB, we guarantee consistent transactions and do not relax the database isolation level for writes (we can relax it for reads, however). For availability, there are numerous configurations that will allow you to get darn close to 100% available. So it is “almost” the best of both worlds.
With document stores, on the other hand, you get 100% availability, but only grades of consistency, and never 100% guaranteed. For more on this important distinction check out this post from our founder Ben Darnell, who speaks to why the choice to be CP is much better for transactional, operational workloads.
Many will question the performance of broadly distributed transactions with serializable isolation — and they would be right to ask… it is a difficult problem to solve! At Cockroach Labs, we have been fighting the speed of light for years and have made several advances that allow you to tune this performance.
Ultimately, we deliver near wire speed for transactions at scale. For example, Aurora Payments has a CockroachDB cluster running across three regions with data distributed across all three while seeing sub-50ms guaranteed consistent writes for their critical financial data. It’s absolutely crucial that they be able to deliver an always-on experience while guaranteeing correctness because their customers’ business depends on their payment software.
“In my mind, the only viable solutions for online transaction processing are the relational databases… CockroachDB gives us the same consistency as Postgres, but also gives us the scalability and redundancy that we wanted. The consistency strength of CockroachDB is outstanding, as it guarantees serializable transactions, which is the highest isolation level guaranteed by the SQL standard.” - Derek Maxwell, CTO, Aurora Payments
Because the distributed SQL database can deliver on availability, we should move beyond the old “document vs. relational” debate and apply a new set of comparisons to make better informed decisions on where to use which. We can look at these two types of databases as they scale across a few key requirements:
Consistent transactions, as noted above, protect you against odd issues that can happen to your data when conflicts occur. Obtaining a high level of consistency requires you to implement an isolation level in the database that will regulate this. Many databases allow you to choose an isolation level, but often, developers do not know the tradeoffs. With CockroachDB we eliminate the choice and standardize on serializable isolation and have optimized performance around this constraint. With document stores you typically get eventually consistent transactions which become increasingly difficult at scale.
A lot can go wrong with your data, and relational databases have always had features that do the work of making sure your data is correct. Referential integrity is a first-order concept in a relational database that ensures relationships are never abandoned and that data is correct. String data types are enforced so that what you expect from a query is what you get. In a document store, these concepts are relaxed, but in a relational database they are always maintained, even at broad scale.
Document stores were first created to optimize access to data at scale and they still do this incredibly well. With a relational database, the constraints around correctness of data and transactions can lead to slightly lower read performance by comparison. With CockroachDB this is the case, but we also implement features like follower reads that relax these controls at the table level for the database. This is especially important when data scales across geographies. You can also have your data follow your workload.
Again, document stores were designed as AP databases according to the CAP Theorem; they are incredibly resilient and deliver a top-notch, high-availability experience. Issues remain, though, when you change schemas or need to perform maintenance. But they excel at being resilient. This does not mean CockroachDB is not great at this. On the contrary, there are many ways in which you can bend the CAP Theorem and get incredibly good availability out of the database.
The relational model is elegant and rooted in algebra. The document model is rooted in ease and developer agility. Both approaches are perfectly acceptable, depending on what you want to accomplish. That said, once you have many documents you need to manage, the document model can get quite complex, often requiring you to manually maintain relationships and data correctness. The relational model is easy to modify and will guarantee data is correct as it grows.
Last week I found a video that captures the essence of the decision to use a relational database or a document store. I feel it captures and summarizes the choice between a system of record (relational) and what document stores are good for, which is a system of access. It really comes down to what you want to accomplish with your workload.
A lot has changed over the last few years and software engineering has made incredible advances in terms of how we deal with globally distributed systems. A lot of this is captured in the transactional model of CockroachDB, and today we are bending light and challenging the limits of both physics and the CAP Theorem.
• If you want to do more comparisons of SQL and NoSQL databases you can use this tool to see the strengths and weaknesses of each.
• If you’ve always been more comfortable with NoSQL and ORMs and learning SQL seems daunting then you should check out this free course. It’s focused exclusively on the SQL skills that are useful for application developers.
As organizations transition to the cloud, they eventually find that the legacy relational databases that are behind some …Read More