Distributed SQL: A cloud-native, scalable Postgres alternative

Last edited on July 23, 2020

0 minute read

    Gartner predicts that by 2022, 75% of all databases will be deployed in the cloud. The findings shouldn’t come as a surprise: on-premise data centers are slowly being phased out as companies realize the many benefits to running a database in the cloud.

    It’s a paradigm shift our customers at Cockroach Labs are experiencing first-hand, and, without Distributed SQL, it’s a hard one to meet. In a recent webinar, Our Vice President of Product Marketing Jim Walker sat down with Chris Casano, one of Cockroach Lab’s Solutions Engineers, to discuss the ins and outs of Distributed SQL, and how it delivers capabilities that traditional databases simply can’t.

    Legacy Databases Fail to Meet the Promises of the CloudCopy Icon

    The cloud has had a fundamental impact on the way that we think about systems and the way that we design applications. There is a lot of value in the approach, but it’s introduced challenges as well. Minimizing costs by breaking down applications into microservices works fine at first, until you hit a certain scale. Transactional inconsistency and speed then become issues as dozens of microservices request information from--and write to--the database. Organizations without distributed capabilities often wind up paying a high cost for this down the road, when they’re inevitably forced to manually shard Postgres instances.

    When faced with this compromise, some companies turned to NoSQL stores to try to meet these requirements. These alternatives can typically meet the scale requirements, but they fall short as a transactional database because they were not designed to guarantee transactional consistency. Recently, some NoSQL solutions have offered transaction guarantees that are full of caveats and fail to deliver the isolation levels necessary for mission-critical workloads like financial ledgers, inventory controls, and identity management.

    A conversation about Distributed SQLCopy Icon

    This webinar addressed some of these concerns with some hands-on keyboard demos and a healthy conversation between our presenters, Jim and Chris. They spoke about and demonstrated how CockroachDB can survive multiple node failures, and scaling seamlessly to meet changing workloads. The replay of the webinar is here and a transcription of the question and answer session follows.

    [Watch the full Distributed SQL webinar]

    Q: Do you provision Cockroach on a virtual machine or a physical host, do you use a stateful container? How do you deploy a node Cockroach?

    A: CockroachDB needs no specialized hardware. It's hardware agnostic. You can run it on a VM, you can run it in Kubernetes, you can run it on bare metal.

    Q: If I have four nodes, how do you typically deal with four instances of an application or maybe five different applications all hitting the same database? How do they choose which node is it to just do load balancing and that sort of thing?

    A: You could just use a single network load balancer and that will distribute the traffic round robin across all the different nodes. CockroachDB has heuristics built into the database. So as nodes have more requests that are happening for the same piece of data, it will eventually move those ranges into those nodes. That way you don't have to incur all that network traffic.

    Q: Is this going to run like Kubernetes?

    A: Yes. Some of the core design principles that were used for Kubernetes in that designed to be shared nothing, so that each atomic unit is the same across all your instances. But what's really important is that each instant, CockroachDB doesn’t have different types of nodes. Every node is the same and contains the security constructs and how you integrate with everything else.

    Q: Does Cockroach labs have an operator?

    A: CockroachDB doesn’t need an operator to deploy or do rolling upgrades. But CockroachDB does have an operator for Kubernetes.

    Q: How do you optimize or how do you choose a primary key for naming your shards?

    A: It comes back to how you want to survive things and what kind of latency you want to have on that data. Those two questions which are critical in the design of the database itself that you've got to ask upfront. But if you wanted to change it, you can, and you don't have to have downtime. So if you have a primary key that you're sharding on, you actually want to change the way things are sharded so that, you can spin up a node, do an alter table command, and the database is going to start moving that to the new location.

    Q: How does this work with Daemon Sets, and how do we do rolling upgrades?

    A: The same way you would do a rolling upgrade in Kubernetes is the same way you're going to run here. You're just going to take nodes down or put pods down or up. You’re going to use stateful sets typically, but you can use demon sets to deploy CockroachDB as well.

    distributed SQL