Over the past year, Kubernetes (also known as K8s) has become a dominant topic of conversation in the infrastructure world. Given its pedigree of working at Google scale, it makes sense that people want to bring that kind of power to their DevOps stories; container orchestration turns many tedious and complex tasks into something as simple as a declarative config file.
The rise of orchestration is predicated on a few things, though. First, organizations have moved toward breaking up monolithic applications into microservices. However, the resulting environments have hundreds (or thousands) of these services that need to be managed. Second, infrastructure has become cheap and disposable: if a machine fails, it’s dramatically cheaper to replace it than to triage the problems.
So, to solve the first issue, orchestration relies on the boon of the second; it manages services by simply letting new machines, running the exact same containers, take the place of failed ones, which keeps a service running without any manual interference.
However, the software most amenable to orchestration is software that can easily spin up new, interchangeable instances without requiring coordination across zones.
The above description of an orchestration-native service should sound like the opposite of a database, though.
In short: managing state in Kubernetes is difficult because the system’s dynamism is too chaotic for most databases to handle––especially SQL databases that offer strong consistency.
So, what’s a team to do? Well, you have a lot of options.
Instead of running your entire stack inside K8s, one approach is to continue to run the database outside Kubernetes. The main challenge with this, though, is that you must continue running an entire stack of infrastructure management tools for a single service. This means that even though Kubernetes has a high-quality, automated version of each of the following, you'll wind up duplicating effort:
That’s 5 technologies you’re on the hook for maintaining, each of which is duplicative of a service already integrated into Kubernetes.
Rather than deal with the database at all, you can farm out the work to a database-as-a-service (DBaaS) provider. However, this still means that you’re running a single service outside of Kubernetes. While this is less of a burden, it is still an additional layer of complexity that could instead be rolled into your team’s existing infrastructure.
For teams that are hosting Kubernetes themselves, it’s also strange to choose a DBaaS provider. These teams have put themselves in a situation where they could easily avoid vendor lock-in and maintain complete control of their stack.
DBaaS offerings also have their own shortcomings, though. The databases that underpin them are either built on dated technology that doesn’t scale horizontally, or require forgoing consistency entirely by relying on a NoSQL database.
Kubernetes does have two integrated solutions that make it possible to run your database in Kubernetes: StatefulSets and DaemonSets.
StatefulSets are by far the most common way to run a database in Kubernetes, and the feature has been fully supported since the Kubernetes 1.9 release. With a StatefulSet, each of your pods is guaranteed the same network identity and disk across restarts, even if it's rescheduled to a different physical machine.
DaemonSets let you specify that a group of nodes should always run a specific pod. In this way, you can set aside a set of machines and then run your database on them––and only your database, if you choose. This still leverages many of Kubernetes’ benefits like declarative infrastructure, but it forgoes the flexibility of a feature like StatefulSets that can dynamically schedule pods.
StatefulSets were designed specifically to solve the problem of running stateful, replicated services inside Kubernetes. As we discussed at the beginning of this post, databases have more requirements than stateless services, and StatefulSets go a long way toward meeting them.
The primary feature that enables StatefulSets to run a replicated database within Kubernetes is providing each pod a unique ID that persists, even as the pod is rescheduled to other machines. The persistence of this ID then lets you attach a particular volume to the pod, retaining its state even as Kubernetes shifts it around your datacenter.
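To make the stable identity concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dictionary and lists the pod and volume claim names Kubernetes would derive from it. The database name, image, mount path, and storage size are hypothetical placeholders, not values from our own configuration; the naming patterns (pod `<name>-<ordinal>`, claim `<template>-<pod>`) follow standard StatefulSet behavior.

```python
import json


def statefulset_manifest(name, service, replicas, image, storage):
    """Build a minimal apps/v1 StatefulSet as a plain dict (illustrative only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            # Headless Service that gives each pod a stable DNS name.
            "serviceName": service,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,  # hypothetical image name
                        "volumeMounts": [
                            {"name": "datadir", "mountPath": "/data"},
                        ],
                    }],
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "datadir"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def pod_identities(manifest):
    """Return (pod name, [claim names]) pairs that stay fixed across rescheduling."""
    name = manifest["metadata"]["name"]
    templates = manifest["spec"]["volumeClaimTemplates"]
    claims = [t["metadata"]["name"] for t in templates]
    identities = []
    for ordinal in range(manifest["spec"]["replicas"]):
        pod = f"{name}-{ordinal}"
        identities.append((pod, [f"{claim}-{pod}" for claim in claims]))
    return identities


if __name__ == "__main__":
    manifest = statefulset_manifest(
        "exampledb", "exampledb", 3, "example/db:1.0", "10Gi")
    print(json.dumps(manifest, indent=2))  # kubectl also accepts JSON manifests
    for pod, claims in pod_identities(manifest):
        print(pod, claims)
```

Because the claim name is derived from the pod's ordinal, a rescheduled `exampledb-1` reattaches to `datadir-exampledb-1` rather than receiving a fresh, empty disk.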
However, because you’ll be detaching and attaching the same disk to multiple machines, you need to use a remote persistent disk, something like EBS in AWS parlance. These disks are located, as you might guess, remotely from any of the machines and are typically large block devices used for persistent storage. One of the benefits of using these disks is that the provider handles some degree of replication for you, making them more immune to typical disk failures, though this benefits databases without built-in replication more than databases that already replicate their own data.
Because Kubernetes itself runs on the machines that are running your databases, it will consume some resources and will slightly impact performance. In our testing, we found an approximately 5% dip in throughput on a simple key-value workload.
Because StatefulSets still allow your database pods to be rescheduled onto other nodes, it’s possible that the stateful service will still have to contend with others for the machine’s physical resources. However, you can take steps to alleviate this issue by managing the resources that the database container requests.
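As a rough illustration of managing those requests, the sketch below attaches `resources.requests` and `resources.limits` to a container spec and approximates the quality-of-service class Kubernetes would assign. The QoS classes (Guaranteed, Burstable, BestEffort) are a Kubernetes concept we haven't otherwise covered in this post, and this function is a simplification: it compares quantities as strings, so it would treat `"1"` and `"1000m"` as different. The container name and sizes are hypothetical.

```python
def qos_class(containers):
    """Approximate the QoS class for a pod from its container resource settings."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        requests = resources.get("requests", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # When only a limit is set, Kubernetes defaults the request to it.
            request = requests.get(resource, limit)
            if limit is not None or request is not None:
                any_set = True
            # Guaranteed requires a limit, with the request equal to it.
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Hypothetical database container with requests equal to limits.
database = {
    "name": "exampledb",
    "resources": {
        "requests": {"cpu": "4", "memory": "16Gi"},
        "limits": {"cpu": "4", "memory": "16Gi"},
    },
}

if __name__ == "__main__":
    print(qos_class([database]))
```

Setting requests equal to limits reserves the full amount at scheduling time, which is one way to keep neighboring pods from squeezing the database.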
DaemonSets let you specify that all nodes that match specific criteria run a particular pod. This means you can designate a specific set of nodes to run your database, and Kubernetes ensures that the service stays available on these nodes without being subject to rescheduling (and optionally without running anything else on those nodes), which is perfect for stateful services.
DaemonSets can also use a machine’s local disk more reliably because you don’t have to be concerned with your database pods getting rescheduled and losing their disks. However, local disks are unlikely to have any kind of replication or redundancy and are therefore more susceptible to failure, although this is less of a concern for services like CockroachDB which already replicate data across machines.
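Here is a minimal sketch of what that might look like, again as a Python dictionary rather than a definitive manifest. It assumes you have labeled your dedicated machines with a hypothetical `role=database` label and tainted them with `dedicated=database:NoSchedule` so that other pods stay off them; the image name and the host path for the local disk are likewise placeholders.

```python
import json


def daemonset_manifest(name, image, node_labels, data_path):
    """Build a minimal apps/v1 DaemonSet pinned to labeled nodes (illustrative only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Only nodes carrying these labels run the pod.
                    "nodeSelector": node_labels,
                    # Lets the pod land on nodes tainted to repel everything else
                    # (assumed taint: dedicated=database:NoSchedule).
                    "tolerations": [{
                        "key": "dedicated",
                        "operator": "Equal",
                        "value": "database",
                        "effect": "NoSchedule",
                    }],
                    "containers": [{
                        "name": name,
                        "image": image,  # hypothetical image name
                        "volumeMounts": [
                            {"name": "datadir", "mountPath": "/data"},
                        ],
                    }],
                    # Local disk on the node itself, not a remote volume.
                    "volumes": [{
                        "name": "datadir",
                        "hostPath": {"path": data_path},
                    }],
                },
            },
        },
    }


def matching_nodes(nodes, node_selector):
    """Return the names of nodes whose labels satisfy every selector entry."""
    return [
        name for name, labels in nodes.items()
        if all(labels.get(k) == v for k, v in node_selector.items())
    ]


if __name__ == "__main__":
    manifest = daemonset_manifest(
        "exampledb", "example/db:1.0", {"role": "database"}, "/mnt/data")
    print(json.dumps(manifest, indent=2))
```

The `matching_nodes` helper is only there to show the selection rule: every key in the selector must be present on the node with the same value.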
While some K8s processes still run on these machines, DaemonSets can limit the amount of contention between your database and other applications by simply cordoning off entire Kubernetes nodes.
Pods in a Kubernetes StatefulSet behave like all other Kubernetes pods, which means they can be rescheduled as needed. Because other types of pods can also be rescheduled onto the same machines, you’ll also need to set appropriate limits to ensure your database pods always have adequate resources allocated to them.
StatefulSets’ reliance on remote network devices also means there is a potential performance implication, though in our testing, this hasn’t been the case.
DaemonSets, on the other hand, are dramatically different. They represent a more natural abstraction for cordoning your database off onto dedicated nodes and let you easily use local disks; for StatefulSets, local disk support is still in beta.
The biggest tradeoff for DaemonSets is that you're limiting Kubernetes' ability to help your cluster recover from failures. For example, if you were running CockroachDB and a node were to fail, Kubernetes couldn't create new pods to replace the pods on the failed node, because it's already running a CockroachDB pod on all the matching nodes. This matches the behavior of running CockroachDB directly on a set of physical machines that are only manually replaced by human operators.
In our next blog post, we continue talking about stateful applications on Kubernetes, with details about how you can (and should) orchestrate CockroachDB in Kubernetes leveraging StatefulSets.
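A toy model makes the difference in recovery behavior easier to see. This is not how the Kubernetes scheduler is implemented; it assumes one database pod per node, ignores resource capacity and volume attachment, and uses hypothetical node names and labels.

```python
def daemonset_pods(nodes, selector):
    """A DaemonSet runs one pod on each healthy node that matches the selector."""
    return [
        name for name, node in nodes.items()
        if node["healthy"]
        and all(node["labels"].get(k) == v for k, v in selector.items())
    ]


def statefulset_pods(nodes, replicas):
    """A StatefulSet keeps up to `replicas` pods on any healthy nodes (toy placement)."""
    healthy = [name for name, node in nodes.items() if node["healthy"]]
    return healthy[:replicas]


# Three dedicated database nodes plus one general-purpose node.
nodes = {
    "db-node-1": {"labels": {"role": "database"}, "healthy": True},
    "db-node-2": {"labels": {"role": "database"}, "healthy": True},
    "db-node-3": {"labels": {"role": "database"}, "healthy": True},
    "spare-node": {"labels": {}, "healthy": True},
}

if __name__ == "__main__":
    nodes["db-node-2"]["healthy"] = False  # simulate a machine failure
    # DaemonSet: no other matching node exists, so only two pods remain.
    print(daemonset_pods(nodes, {"role": "database"}))
    # StatefulSet: the lost replica can be placed on the spare node.
    print(statefulset_pods(nodes, 3))
```

In the DaemonSet case, getting back to three pods means a human has to repair the failed machine or label a replacement.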
If you're eager to get something started, though, you should check out our Kubernetes tutorial.
And if building and automating distributed systems puts a spring in your step, we're hiring! Check out our open positions here.
Illustration by Zoë van Dijk