Roaches on open water! CockroachDB on DigitalOcean

If you’re a fan of DigitalOcean and its powerful but simple platform to deploy cloud-based infrastructure, you’ll appreciate CockroachDB: it’s similarly simple to deploy and provides your stack a lot of power and flexibility. And while CockroachDB can be deployed anywhere, it’s a natural fit within DigitalOcean’s no-fuss framework: both are for developers who like easy-to-reason-about technology that lets them get work done quickly.

To show you how well DigitalOcean and CockroachDB work together, this post walks through going from nothing to a distributed 3-node cluster in under 20 minutes. So you don’t have to budge from the command line, we’re going to show you how to stand everything up through DigitalOcean’s command-line tool, doctl. (If you’d rather not bother with doctl, we’ve got the standard instructions here.)

After you’ve deployed everything, we’ll also check out some of CockroachDB’s core features (automatic data distribution and survivability) so you can get a better sense of how it makes your life easier.

Because this is a quick demo, we’ll show you how to use an insecure cluster communicating over external IP addresses, neither of which we’d necessarily recommend for production. For a production-ready deployment guide, see Deploy CockroachDB on DigitalOcean.

Install CockroachDB

Create A Startup Script

Because all of the servers we’re going to create need to install CockroachDB, we’re going to create a startup script that downloads the latest CockroachDB binary, and then makes it accessible from the command line.

Create a file to store the startup script:
$ sudo nano doctl.sh
Enter the following contents into the file:

#!/bin/bash
wget https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz
tar -xf cockroach-latest.linux-amd64.tgz --strip=1 cockroach-latest.linux-amd64/cockroach
sudo mv cockroach /usr/local/bin

Write the file (^O) and exit nano (^X).
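If you’d rather not open an editor at all, the same file can be written with a heredoc. This is a minimal sketch rather than the official procedure: the `SCRIPT` variable and the `set -e` line are our additions (so a failed download stops the script), and `bash -n` only checks the syntax without downloading or installing anything.

```shell
# Write the startup script without opening an editor.
# Assumption: SCRIPT defaults to doctl.sh in the current directory.
SCRIPT="${SCRIPT:-doctl.sh}"

cat > "$SCRIPT" <<'EOF'
#!/bin/bash
# Added for this sketch: stop at the first failing command.
set -e
wget https://binaries.cockroachdb.com/cockroach-latest.linux-amd64.tgz
tar -xf cockroach-latest.linux-amd64.tgz --strip=1 cockroach-latest.linux-amd64/cockroach
sudo mv cockroach /usr/local/bin
EOF

# Parse the script for syntax errors; nothing is downloaded or installed here.
bash -n "$SCRIPT" && echo "syntax ok: $SCRIPT"
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the heredoc, so the file on disk matches the text exactly.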

Create Your Droplets

With our startup script to install CockroachDB prepared, we can create our Droplets (which is just what DigitalOcean calls your virtual machines).

Find your SSH key’s FingerPrint, which you’ll need when creating Droplets:
$ doctl compute ssh-key list
Create your Droplets with the following options:
- --size: We recommend at least 2GB
- --image: We’re using Ubuntu 16.10 but you should be able to use any contemporary Linux distro
- --user-data-file: Use the startup script we created, doctl.sh
- --ssh-keys: Use the SSH key’s FingerPrint you found in the last step

# Create the first node
$ doctl compute droplet create node1 \
--size 2gb \
--image ubuntu-16-10-x64 \
--region nyc1 \
--user-data-file doctl.sh \
--ssh-keys <SSH key FingerPrint>

# Create the second and third nodes (only name differs)
$ doctl compute droplet create node2 --size 2gb --image ubuntu-16-10-x64 --region nyc1 --user-data-file doctl.sh --ssh-keys <SSH key FingerPrint>
$ doctl compute droplet create node3 --size 2gb --image ubuntu-16-10-x64 --region nyc1 --user-data-file doctl.sh --ssh-keys <SSH key FingerPrint>

It can take a few minutes for your Droplets to finish being created. You can check on them using doctl compute droplet list.
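Since the three create commands differ only in the Droplet name, they can also be generated in a loop. The sketch below just prints the commands so you can review them first; `create_cmd` and `SSH_KEY_FINGERPRINT` are names we made up for illustration, and the fingerprint default is a placeholder you’d replace with your own value.

```shell
# Print one `doctl compute droplet create` command per node.
# Assumption: SSH_KEY_FINGERPRINT is a placeholder; set it to the value shown
# by `doctl compute ssh-key list`.
SSH_KEY_FINGERPRINT="${SSH_KEY_FINGERPRINT:-<SSH key FingerPrint>}"

create_cmd() {
  # $1 is the Droplet name; every other option is the same for all nodes.
  printf 'doctl compute droplet create %s --size 2gb --image ubuntu-16-10-x64 --region nyc1 --user-data-file doctl.sh --ssh-keys %s\n' \
    "$1" "$SSH_KEY_FINGERPRINT"
}

for name in node1 node2 node3; do
  create_cmd "$name"
done
```

Once a real fingerprint is set and the printed commands look right, piping the loop’s output to `sh` would run them.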

Get Your Droplets’ Details

Once the Droplets have been created, we’ll need to get some information about them:

$ doctl compute droplet list

Copy down the following values from each Droplet:

ID
Name
Public IPv4
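Rather than copying values by hand, you can pull them out of the list output. The sketch below assumes doctl’s `--format` and `--no-header` output options and rows of the form "ID Name PublicIPv4"; `public_ip_for` is a helper name we invented, and the sample rows are made up so the snippet runs without a DigitalOcean account.

```shell
# Look up one Droplet's public IP from `doctl compute droplet list` output.
# Expects rows of "ID Name PublicIPv4" on stdin.
public_ip_for() {
  awk -v name="$1" '$2 == name { print $3 }'
}

# Hypothetical sample rows for illustration only; real IDs and IPs will differ.
sample='1001  node1  203.0.113.10
1002  node2  203.0.113.11
1003  node3  203.0.113.12'

printf '%s\n' "$sample" | public_ip_for node2

# Against a real account (assumes doctl is authenticated):
# doctl compute droplet list --format ID,Name,PublicIPv4 --no-header | public_ip_for node1
```

Printing `$1` instead of `$3` in the awk program would return the Droplet ID, which is the value `doctl compute ssh` expects in the steps that follow.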

Start Your Cluster

At this point, we can start our cluster, which will contain all of our databases.

SSH into node1:
$ doctl compute ssh <node1 ID>
Enter yes to proceed connecting to the Droplet.
Start your cluster:
$ cockroach start --insecure --background --advertise-host=<node1 IP address>
You’ll receive a message to stdout with your new cluster’s details, including its ID.
Terminate the SSH connection to your first node (CTRL+D).

Join Additional Nodes to the Cluster

With the cluster up and running, you can now add nodes to the cluster. This is almost as simple as starting the cluster itself (just one more flag) and demonstrates how easy it is to horizontally scale with CockroachDB.

SSH to the second node:
$ doctl compute ssh <node2 ID>
Enter yes to proceed connecting to the Droplet.
Start a new node that joins the existing cluster using node1’s IP address on port 26257 (CockroachDB’s default port):
$ cockroach start --insecure --background \
--advertise-host=<node2 IP address> \
--join=<node1 IP address>:26257
Close the second node’s SSH connection and complete the same steps for your third Droplet, using its IP address instead.
Make sure all 3 nodes are connected:
$ cockroach node status --insecure
The response to stdout should list all 3 nodes.
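The only difference between starting the first node and joining the others is the `--join` flag. As a rough sketch, the helper below (our own `start_cmd`, not part of CockroachDB) builds the right command for either case using only the flags shown above; the IP addresses are stand-ins from the documentation range, not real Droplets.

```shell
# Build the `cockroach start` command for a node.
# $1: this node's IP address. $2 (optional): IP of a node already in the cluster.
start_cmd() {
  cmd="cockroach start --insecure --background --advertise-host=$1"
  if [ -n "${2:-}" ]; then
    # 26257 is CockroachDB's default port.
    cmd="$cmd --join=$2:26257"
  fi
  printf '%s\n' "$cmd"
}

# Documentation-range IPs used as stand-ins for real Droplet addresses.
start_cmd 203.0.113.10                 # first node: no --join
start_cmd 203.0.113.11 203.0.113.10    # additional node: joins node1
```

Like the earlier sketches, this only prints the commands; you would run the printed line on the matching Droplet.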

Features in Action

At this point, your cluster’s up and running, but we’re going to take you through some demos of CockroachDB’s features that work really well on a cloud provider like DigitalOcean.

We’ve already seen how easy it is to scale a deployment by simply adding servers, but we’ll also cover:

Data Distribution
Survivability

Data Distribution

In a traditional RDBMS, you achieve scale by sharding your deployment, which splits a table into contiguous sets of rows and then stores those rows on separate servers. However, managing how data is distributed across a sharded deployment is an enormous burden for your application, organization, and overworked DBA.

Because we want to make operating a database as simple as possible, CockroachDB handles that kind of distribution for you without any kind of complicated configuration or additional settings. When you write data to any node in CockroachDB, it’s simply available to the rest of the nodes.

Let’s demonstrate how that works by generating CockroachDB’s example data on one node and then viewing it from another node.

From your first node, generate the example data (a database called startrek with two tables, episodes and quotes):
$ doctl compute ssh <node1 ID>
$ cockroach gen example-data | cockroach sql --insecure
Find out how much data was written to your node:
$ cockroach sql --insecure --execute="SELECT COUNT(*) FROM startrek.episodes; \
SELECT COUNT(*) FROM startrek.quotes;"
You’ll see that episodes contains 79 rows and quotes contains 200.
Terminate the connection with the first node (CTRL + D), and then connect to the second node:
$ doctl compute ssh <node2 ID>
Run the same SQL query to see how much data is stored in the two example tables:
$ cockroach sql --insecure --execute="SELECT COUNT(*) FROM startrek.episodes; \
SELECT COUNT(*) FROM startrek.quotes;"
Again, 79 rows in episodes and 200 in quotes.

Even though you generated the example data on another node, it’s been distributed and is accessible from all of your other servers. Best of all, you didn’t have to configure the sharding patterns or do much of anything for it to work.

Survivability

Now that you have a cluster up and running with data distributed between all of your nodes, what happens when one of the nodes dies?

The TL;DR: Not a whole lot! By default, CockroachDB replicates your data three ways, which means that even if one node goes down, your cluster still has two other copies of everything that node stored. This allows your database to make forward progress and your application to remain blissfully unaware.
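Why exactly one failure? CockroachDB replicates data with the Raft consensus protocol, which needs a majority of replicas to agree before a write is committed. The arithmetic below is a back-of-the-envelope sketch of that rule; the `quorum` and `tolerated` helpers are our own names, not CockroachDB commands.

```shell
# Majority (quorum) size for n replicas, and how many replicas can be lost
# while a majority is still reachable.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerated() { echo $(( ($1 - 1) / 2 )); }

for n in 3 5; do
  echo "replicas=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated "$n")"
done
```

With the default of three replicas, two form a majority, so losing a single node leaves the cluster able to keep serving reads and writes.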

To demonstrate this, we’ll remove a node from the cluster and show that all of the cluster’s data is still available. We’ll then rejoin the node to the cluster and see that it receives all updates that happened while it was offline.

Assuming you’re still connected to node2, quit running CockroachDB:
$ cockroach quit --insecure
Now close this session (CTRL+D) and move to node3:
$ doctl compute ssh <node3 ID>
Delete all of the quotes where the episode is greater than 50:
$ cockroach sql --insecure --execute="DELETE FROM startrek.quotes WHERE episode > 50; \
SELECT COUNT(*) FROM startrek.quotes;"
You’ll see there are now 131 rows of data.
Close the connection with node3, and then move back to node2:
$ doctl compute ssh <node2 ID>
Restart the node:
$ cockroach start --insecure --background \
--advertise-host=<node2 IP address> \
--join=<node1 IP address>:26257
Count the number of rows available on the cluster:
$ cockroach sql --insecure --execute="SELECT COUNT(*) FROM startrek.quotes;"

131! So, despite being offline when the update happened, the node is updated as soon as it rejoins the cluster.

If you’d like, you can now remove the example data:

$ cockroach sql --insecure --execute="DROP TABLE startrek.quotes; DROP TABLE startrek.episodes; DROP DATABASE startrek;"

What’s next?

Now that you’ve seen how easy it is to get CockroachDB up and running on DigitalOcean (and demoed the core features), you can kick it up a notch and try your hand at deploying CockroachDB on DigitalOcean with SSL encryption.