When we started development on CockroachCloud, we weren’t sure whether Kubernetes would be the right choice for the underlying orchestration system.
We wanted to harness Kubernetes’s powerful orchestration capabilities, but building a system to run geo-distributed Cockroach clusters on Kubernetes presented unique challenges:
- The clusters must run across multiple regions, which complicates networking and service discovery.
- The clusters must store data, which requires StatefulSets and persistent volumes, something that is notoriously tricky with Kubernetes.
- The system must programmatically create Kubernetes clusters on AWS and GKE, so it must navigate each provider's distinct APIs for node pools and firewalls.
In this candid technical conversation, Josh Imhoff, the Technical Lead of the Site Reliability team, will share his team’s experience of overcoming these challenges to build CockroachCloud.