3 ways to master stateful apps in Kubernetes

Kubernetes adoption has accelerated rapidly, leading the way to a new, cloud-native approach to building and delivering the software that businesses need to make users happy and employees successful. Slow, heavy lifting has been replaced with interchangeable, self-contained software objects that are set up through a simple configuration file and scaled through automated replication. If an object fails, it is replaced. To deliver new software, objects are replaced while the application keeps running.

But not all applications are built the same. While Kubernetes container orchestration is a natural fit for stateless apps, stateful app management—and its inherent dependencies—presents a special challenge to the orchestration paradigm. This post looks at the solution paths available for organizations using Kubernetes to orchestrate stateful apps, and walks through some of the factors that might help a dev team choose between the three.

The challenge of state in Kubernetes

The magic of orchestration—being able to quickly swap and update containers on the fly—is a big obstacle to stateful applications and databases. The whole promise of containerization, being able to move and manage quickly and freely, breaks down when an application database is chained to local storage.

Why? Because Kubernetes deployments win through replication. This is how DevOps teams can build, deploy, scale, and roll back with less effort and more confidence. But replication doesn’t work for stateful apps:

Database replicas aren’t interchangeable like other containers; each one has its own unique state.
Databases also require tight coordination with other nodes running the same application, for example to keep versions consistent.

Solving for stateful: three paths to success

The road to stateful Kubernetes has three big intersections, with some other minor navigational options. We’ll review each of the three paths:

Running outside Kubernetes,
Using cloud services, or
Running in native Kubernetes

Maximum choice, maximum effort: running your database outside of Kubernetes

The most straightforward approach is to simply spin up a new VM and run the database outside of the Kubernetes environment. The cost of that comfort is the additional operations workload you incur. Even though Kubernetes has a high-quality, automated version of each of the following, you’ll wind up duplicating effort across:

process monitoring
configuration management
in-datacenter load balancing
service discovery
monitoring and logging

The result is maximum choice, but also a full stack of management tools that you’ll have to run outside of Kubernetes.

Less control, less effort: running your database via cloud services

You can also leverage cloud services to run your database outside of Kubernetes. This eliminates the need to spin up, scale, and manage the database yourself, and it removes the redundant infrastructure stack by relying on external services.

The downside is that you’re limited to the DBaaS offered by your cloud services provider, which makes even less sense for those running things in-house or on-premises. And since you don’t have direct access to the infrastructure running the database, fine-tuning performance and managing compliance can be difficult.
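Whether the database runs on a VM or as a managed service, workloads inside the cluster still need a stable way to address it. One possible approach, sketched below, is a Service of type ExternalName, which gives an external DNS name a cluster-local alias. The service name "orders-db" and the hostname "orders-db.example.com" are hypothetical placeholders, and this is only a minimal sketch, not a recommendation for any particular provider.

```python
import json


def external_db_service(name: str, external_host: str,
                        namespace: str = "default") -> dict:
    """Build a Service manifest of type ExternalName.

    In-cluster clients that resolve `name` get a DNS alias (CNAME)
    pointing at `external_host`, which lives outside the cluster.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


if __name__ == "__main__":
    # Hypothetical names: replace with your own service name and endpoint.
    manifest = external_db_service("orders-db", "orders-db.example.com")
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped into `kubectl apply -f -`. If the database later moves into the cluster, only this Service would need to change, not the applications that use it.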

Native control, minimum effort: running your database inside Kubernetes

Kubernetes does have two integrated, native controllers for running the database inside Kubernetes, just as the Deployment controller works with stateless apps. These maximize integration and automation, retain more control over the workload, and eliminate the time, cost, and complexity of maintaining a separate stack of the “around the database” services listed earlier.

The StatefulSet controller

The first controller for stateful apps is the StatefulSet controller. Like a Deployment, a StatefulSet manages pods that are based on an identical container spec. Unlike a Deployment, those pods are not interchangeable. Each pod is assigned a persistent, unique ID (by way of an easy-to-build Headless Service), so application and database maintain their connection regardless of which node the pods are scheduled on.
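To make the idea of a persistent, unique ID concrete, here is a small sketch of the naming pattern Kubernetes uses for StatefulSet pods: each pod is named after the StatefulSet plus an ordinal, and the Headless Service gives each pod its own DNS entry. The names "db" and "default" are hypothetical, and the sketch assumes the default cluster domain of cluster.local.

```python
from typing import List, Tuple


def stable_identities(statefulset: str, headless_service: str, replicas: int,
                      namespace: str = "default",
                      cluster_domain: str = "cluster.local") -> List[Tuple[str, str]]:
    """Return (pod name, DNS name) pairs for each replica of a StatefulSet.

    Pod names follow <statefulset>-<ordinal>; the per-pod DNS name follows
    <pod>.<service>.<namespace>.svc.<cluster domain>.
    """
    identities = []
    for ordinal in range(replicas):
        pod = f"{statefulset}-{ordinal}"
        dns = f"{pod}.{headless_service}.{namespace}.svc.{cluster_domain}"
        identities.append((pod, dns))
    return identities


if __name__ == "__main__":
    # Hypothetical StatefulSet "db" behind a Headless Service also named "db".
    for pod, dns in stable_identities("db", "db", replicas=3):
        print(pod, dns)
```

The point of the sketch is that the identity depends only on the StatefulSet name and the ordinal. A pod that is rescheduled to another node comes back as, say, db-1 with the same DNS name, so peers and clients can keep addressing it the same way.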

It also means that as an application is scaled up and down, connections are maintained and data persists. This makes StatefulSets ideal for applications that need stable, persistent storage and ordered, automated scaling and updates. Examples include distributed coordination services like ZooKeeper as well as workloads such as MySQL clusters, Redis, Kafka, MongoDB, and others.
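As a rough illustration of how stable storage fits in, the sketch below builds a Headless Service and a StatefulSet manifest as plain Python dictionaries and prints them as JSON. The image "example.com/db:1.0", the port 5432, the mount path, and the name "db" are all hypothetical placeholders. A real database would need more configuration (probes, resource limits, cluster bootstrap) than this minimal example shows.

```python
import json


def headless_service(name: str, port: int) -> dict:
    """A Service with clusterIP 'None', which gives each pod its own DNS entry."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "clusterIP": "None",
            "selector": {"app": name},
            "ports": [{"name": "db", "port": port}],
        },
    }


def statefulset(name: str, image: str, replicas: int, port: int, storage: str) -> dict:
    """A StatefulSet whose pods each get their own PersistentVolumeClaim."""
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # must match the Headless Service name
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port, "name": "db"}],
                    # Hypothetical data directory inside the container.
                    "volumeMounts": [{"name": "data", "mountPath": "/var/lib/data"}],
                }]},
            },
            # One claim is created per pod from this template.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }


def claim_names(sts: dict) -> list:
    """Claim names follow <template name>-<statefulset name>-<ordinal>."""
    spec = sts["spec"]
    return [
        f'{tpl["metadata"]["name"]}-{sts["metadata"]["name"]}-{i}'
        for tpl in spec["volumeClaimTemplates"]
        for i in range(spec["replicas"])
    ]


if __name__ == "__main__":
    sts = statefulset("db", "example.com/db:1.0", replicas=3, port=5432, storage="10Gi")
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": [headless_service("db", 5432), sts]}
    print(json.dumps(bundle, indent=2))
    print(claim_names(sts))
```

Each pod keeps its own claim across restarts and rescheduling, so pod db-1 is always reattached to the claim data-db-1. That pairing of a stable name with stable storage is what separates a StatefulSet from a Deployment.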

To learn more about how StatefulSet supports local storage by way of LocalPersistentVolume, read here.

The DaemonSet controller

The second native stateful controller is the DaemonSet controller.

Where StatefulSets use unique IDs to keep application and database connected across nodes, DaemonSets ensure that all (or some) nodes run a copy of a pod. When a node is added, the required database pod is added to it. When the node is removed, the pod is removed by the garbage collector.

As you might guess from the name, the DaemonSet controller is especially useful for running background processes (or daemons), such as performance monitoring or log collection and analysis.

By dedicating nodes to database support, DaemonSets can avoid the potential performance issues that StatefulSets face from resource contention and competition.
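The "one pod per matching node" behavior can be sketched as follows. The first function builds a DaemonSet manifest restricted to nodes carrying a given label, and the second is a simplified model of which nodes would receive a pod. The label role=database, the node names, and the image are hypothetical, and the model deliberately ignores taints, tolerations, and affinity rules that the real scheduler also considers.

```python
import json


def daemonset(name: str, image: str, node_selector: dict) -> dict:
    """Build a DaemonSet manifest limited to nodes matching `node_selector`."""
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name},
        "spec": {
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    # Only nodes carrying these labels run a copy of the pod.
                    "nodeSelector": node_selector,
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


def target_nodes(node_labels: dict, node_selector: dict) -> list:
    """Simplified model: nodes whose labels include every selector pair."""
    return sorted(
        node for node, labels in node_labels.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


if __name__ == "__main__":
    selector = {"role": "database"}  # hypothetical node label
    nodes = {
        "node-a": {"role": "database"},
        "node-b": {"role": "web"},
        "node-c": {"role": "database"},
    }
    print(json.dumps(daemonset("db-agent", "example.com/db:1.0", selector), indent=2))
    print(target_nodes(nodes, selector))

    # Adding a labeled node adds one more target; removing it removes the pod.
    nodes["node-d"] = {"role": "database"}
    print(target_nodes(nodes, selector))
```

Note that the pod count here follows the number of matching nodes rather than a replicas field, which is the main design difference from a StatefulSet.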

Choosing the Right Controller: DaemonSets vs StatefulSets

The nature of the workload, along with adherence to other Kubernetes best practices, should drive the choice between stateful Kubernetes controllers. Transactional database applications like PostgreSQL are ideal for the more nimble StatefulSet controller, while scheduled background processes are typically a better fit for DaemonSets.

A guide to Managing State in Kubernetes

If you want to carry this momentum into deploying a stateful application, you can use this step-by-step guide, which demonstrates a couple of different ways to manage state in Kubernetes.

CockroachDB’s architecture mirrors Kubernetes’ architecture, which makes CockroachDB an excellent fit for the third path mentioned above, “Native control, minimum effort: running your database inside Kubernetes”.