*Note: this post originally ran in 2020, at the very beginning of our managed service/multi-tenant engineering journey. If you want an update on what deployment models CockroachDB offers and what our current capabilities are, check out our latest release blog.
There’s this really fun game I like to play. You get a bunch of SREs in a room and you see how quickly you can rile them up. Here are some things to say next time you’re in a room of SREs:
If you say one of these things, you’ll have all the other SREs in the room screaming and sweating and breaking their laptops over their knees in no time. For extra fun, pair with a friend and each take different sides!
Another topic that works really well is Kubernetes:
See this thread on Hacker News for more fun.
In late 2018, Cockroach Labs began building a managed service offering called CockroachDB Dedicated, and for many months the SRE team evaluated different automation and orchestration technologies on top of which to build it. Eventually, we turned our attention to Kubernetes. We didn’t break any laptops over our knees, but it was quite the journey nonetheless. Each SRE started with one perspective, loaded up their cocoon with Kubernetes docs and tech talks and bug reports, spent a month in the darkness of the cocoon thinking deep thoughts, and finally emerged as a beautiful Kubernetes butterfly with that strange Kubernetes boat logo tattooed on their wings.
In the end, we chose Kubernetes as our automation technology. This blog post describes the journey to Kubernetes in detail. Or you can watch this video to see me explain our decision in person:
I started my career as an SRE at Google. At Google, we used Borg, a container orchestration system that is internal to Google and was the predecessor to Kubernetes. Users of Borg declare that they want to run such and such binary with such and such compute requirements (10 instances, 2 vCPU and 4 GB of RAM per instance), and Borg decides what physical machines the applications should run on and takes care of all low-level details such as creating containers, keeping the binary running, etc.
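For comparison, the analogous declaration in Kubernetes might look roughly like the following Deployment manifest: 10 instances, each requesting 2 vCPUs and 4 GB of RAM, with Kubernetes deciding which machines run the pods. The names and image here are illustrative, not a real workload:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 10               # "10 instances"
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        image: example.com/example-app:1.0   # illustrative image
        resources:
          requests:
            cpu: "2"         # 2 vCPUs per instance
            memory: 4Gi      # 4 GB of RAM per instance
```

The operator declares the desired state; the scheduler decides placement and the kubelet keeps the containers running, just as Borg does.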
I was used to this model when I joined Cockroach Labs. Since Kubernetes was widely used in industry, why wouldn’t we use it at Cockroach Labs? An experienced teammate pointed out that even if Borg and Kubernetes are similar, the fact that Borg is a good fit for Google doesn’t imply that Kubernetes is a good fit for Cockroach Labs. The key point my teammate was making is that Google and Cockroach Labs are not the same!
I began to think more about the differences between Google and Cockroach Labs. Here are some things that are true of Google:
Here are some things that are true of Cockroach Labs:
Not the same at all!
In the Google case, a major benefit of container orchestration is that it decouples machine planning from the concerns of the software engineering teams. Google can plan compute in aggregate; software engineering teams then use Borg to schedule their heterogeneous workloads onto machines. Borg binpacks the workloads so as to use compute efficiently; in general, multiple workloads run on one machine. If software engineers had to work at the level of individual machines, Google would not use compute nearly as efficiently. This is a very big problem when you are at Google’s scale.
On the other hand, does binpacking do Cockroach Labs any good at all? The public clouds offer elastic compute in many different shapes; why not just ask the public clouds for VMs, perhaps via a tool like Terraform? Note that at the time we were researching Kubernetes, Cockroach Labs ran a separate instance of CockroachDB per customer, in dedicated GCP projects or AWS accounts, since CockroachDB didn’t yet support multi-tenancy (now it does!). The database required at least 2 vCPUs of compute per node to run well, and cloud providers offer machines of this shape on demand. In that world, binpacking does not reduce costs in any meaningful way.
Google has written an excellent whitepaper on the history of Borg, Omega, and Kubernetes. This part is very relevant to this discussion:
“Over time it became clear that the benefits of containerization go beyond merely enabling higher levels of utilization. Containerization transforms the data center from being machine-oriented to being application-oriented.”
Having used (and loved) Borg, I felt this to be true deep in my bones. Operators describe an application’s requirements declaratively via a domain-specific language called GCL (incidentally, the language is a fever dream of surprising semantics), and Borg handles all the low-level details of requesting compute, loading binaries onto machines, and actually running the workload. To update to a new version of a binary, an operator simply submits a new GCL file to Borg with the new requirements. In summary, Borg provides a wide variety of powerful automation primitives, which make software engineering and SRE teams efficient; they are able to focus on improving their services rather than doing repetitive and uninteresting service operation tasks. Wouldn’t Cockroach Labs benefit from access to solid out-of-the-box automation primitives also?
The goal of CockroachDB Dedicated is to provide CockroachDB to customers without them having to run the database themselves. Distributed databases are complex; not all companies want to develop the expertise in house needed to run them reliably.
CockroachDB Dedicated allows customers to request the following operations to be run on their cluster via a convenient web UI:
Let’s consider #4 in detail. Without Kubernetes, you could either build the automation yourself or shop around for some other automation tool.
Why is building it yourself a bad idea? Let’s consider the requirements. There is no more efficient way to break a piece of software than updating it to a new version. This argues that the update must proceed gradually, updating one CockroachDB node at a time. If the database starts to fail (e.g. if health checks fail), the update must halt without operator involvement. It should also be possible to roll back quickly if there are unforeseen problems. At the same time, if the update often halts unnecessarily due to flakiness in the automation tech, then oncall load increases. The above requirements are tricky to get right, and we haven’t even considered all the low-level details involved in building this yourself, such as SSHing into machines to stage binaries and sending HTTP requests to do health checks.
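To make the requirements concrete, here is a minimal sketch of the core control loop you would have to build and harden yourself: update one node at a time and halt automatically when a health check fails. The node structure and `check_health` callback are hypothetical stand-ins, not a real CockroachDB API, and real automation would also need the rollback path, retries, and the SSH/HTTP plumbing:

```python
def rolling_update(nodes, new_version, check_health):
    """Update nodes one at a time; halt without operator involvement
    on the first failed health check. (Rolling back would mean restoring
    the previous version on the already-updated nodes.)"""
    updated = []
    for node in nodes:
        node["version"] = new_version
        if not check_health(node):
            return {"status": "halted",
                    "failed_node": node["name"],
                    "updated": updated}
        updated.append(node["name"])
    return {"status": "done", "updated": updated}

nodes = [{"name": f"crdb-{i}", "version": "19.1.0"} for i in range(3)]

# Healthy cluster: every node is updated, one at a time.
result = rolling_update(nodes, "19.1.1", check_health=lambda n: True)
print(result["status"], result["updated"])  # done ['crdb-0', 'crdb-1', 'crdb-2']

# A failing health check halts the update before it spreads further.
nodes = [{"name": f"crdb-{i}", "version": "19.1.0"} for i in range(3)]
result = rolling_update(nodes, "19.1.1",
                        check_health=lambda n: n["name"] != "crdb-1")
print(result["status"], result["failed_node"])  # halted crdb-1
```

Even this toy version hides the hard parts: distinguishing transient flakiness from real failure is exactly where the oncall load lives.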
Here’s how to do #4 with Kubernetes:
$ kubectl rolling-update crdb --image=cockroachdb:19.1.1
That’s it! Note that the speed of the rolling update is configurable, the automation is robust to transient failures, and it supports configurable health checks.
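Those knobs live in the workload spec. As a hedged sketch, assuming a Deployment manages the CockroachDB pods (CockroachDB is typically deployed as a StatefulSet, and the exact field values here are illustrative), the update rate and health checks might be configured like this:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # update one pod at a time
      maxSurge: 0
  template:
    spec:
      containers:
      - name: cockroachdb
        image: cockroachdb/cockroach:v19.1.1
        readinessProbe:        # configurable health check
          httpGet:
            path: /health?ready=1
            port: 8080
          periodSeconds: 5
          failureThreshold: 3
```

If a newly updated pod never becomes ready, the rollout stalls rather than proceeding to the next node, which is exactly the halt-on-failure behavior we wanted.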
Similar arguments apply to #1, #2, and #3. In fact, the case for Kubernetes is even stronger in these cases, since safely changing the footprint of a stateful application that is serving customer traffic is not something automation technologies other than Kubernetes do particularly well.
Previously I asked the following question:
“Wouldn’t Cockroach Labs benefit from access to solid out-of-the-box automation primitives also?”
As we built prototypes, we felt the answer was definitely yes. On the other hand, the fact that access to automation primitives is a benefit of using Kubernetes doesn’t imply there aren’t also downsides. Remember the screaming masses on Hacker News!
One of the main complaints about Kubernetes is that it is complicated. Let’s add some color to the claim that Kubernetes is complicated. In what senses is it complicated?
1 and 2 are control plane only. That is, if etcd or the controllers break, we cannot add a node to a CockroachDB cluster running on Kubernetes, but the cluster will keep serving SQL queries all the same. Also, there are managed Kubernetes offerings such as GKE that reduce how much Cockroach Labs SRE needs to know about, for example, etcd.
3 and 4 are data plane. These are the scary ones. If the networking stack breaks, the cluster will NOT keep serving SQL queries. The SRE team feels we must understand the networking stack deeply if we are to depend on Kubernetes in production. This is no easy task, especially when already holding onto lots of cognitive load about CockroachDB itself.
Let’s summarize the situation. Google has used container orchestration for a long time and is very successful with the technology. There are two main benefits:

1. Efficient use of compute, via binpacking of heterogeneous workloads onto machines.
2. Powerful out-of-the-box automation primitives.
#1 doesn’t benefit Cockroach Labs at all. #2 benefits Cockroach Labs a lot. On the other hand, Kubernetes is complicated, increasing cognitive load. Specifically, the networking stack is the main piece of complexity the SRE team worries about. What to do?
We felt, at the time, that the benefits of using Kubernetes outweighed the complexity costs. By using Kubernetes, we hoped to create a highly orchestrated managed service, which would free us up to do interesting work on the reliability and efficiency of CockroachDB itself. There are also some benefits to Kubernetes that we haven’t covered in detail here, such as the fact that it provides a cross-cloud abstraction (CockroachDB Dedicated is already available for GCP and AWS), and the fact that many of our existing enterprise customers (not Managed CockroachDB customers) use it to deploy CockroachDB to their private datacenters.
Why is Kubernetes networking so complicated? One reason is the requirement that each pod has its own IP address, which is needed for efficient binpacking (without requiring application level changes from Kubernetes users). Funnily enough, this requirement didn’t do CockroachDB Dedicated any good at the time we adopted it, because we always ran a single CockroachDB instance on each VM. This makes me wonder how many users choose Kubernetes for binpacking vs. for automation primitives vs. for both. In this brave new world of cheap public cloud elastic compute and managed Kubernetes offerings, do many users choose Kubernetes only for the automation primitives? If so, could the networking stack be made even simpler for those users?
This blog post reflects my work as an intern this summer with the SRE team at Cockroach Labs.