G’day! I’m Oliver, a Member of Technical Staff here at Cockroach Labs. After spending the better part of 5 years in the United States, I decided to come back home to Sydney, Australia. With my homecoming, I’m happy to announce that Cockroach Labs is hiring people to work with us from Sydney, Australia!
If you’re curious about me, my journey, and why I’m at Cockroach, read on. Of course, if the news of an opening sounds good, you can jump straight to the job openings we have in Sydney.
Born and raised in Sydney, I graduated with a degree in Computer Science from the University of New South Wales. During my university years, I did an internship at Google in Sydney, Australia, where I worked on tooling for detecting network hardware failures on the NetSoft team. I also worked at Facebook in Menlo Park, California, where I built tooling for diagnosing traffic infrastructure issues behind internet.org.
After graduating, I went to work at Dropbox in San Francisco, where I was on the filesystem team. The filesystem team was responsible for the abstraction that handled the metadata associated with syncing your files. This data was stored on thousands of MySQL shards managed in-house, altogether containing trillions of rows of data.
One of my main projects involved moving the filesystem abstraction into its own service. This enabled desirable traits, such as centralised rate limiting, and ensured that only controlled, well-behaved SQL queries reached the database. It was a long-running, multi-month project that involved consolidating calls from multiple services into one. Of course, a fair amount of effort went into making sure the new service served traffic at reasonable latency, and with no downtime, compared to speaking with the database directly from our monolithic server. The service today serves over a million requests per second with high availability.
For the service to perform well, its APIs had to serve traffic with reasonable latencies, which meant having performant queries behind each API endpoint we offered. While designing our APIs, we looked at various callsites and found unoptimised queries which could devolve into large table scans - some of which had caused sizeable downtime. Tracking things down during an outage was terrible - scrambling in a mad panic to find the source and then block these queries from executing was quite stressful.
To ensure our queries were performant, we turned to a few techniques:
Another challenge we faced was data that didn’t follow invariants we assumed to be true. This is especially painful when data is denormalised incorrectly, as the application then serves incorrect data. In the denormalisation of size case mentioned above, we had users whose Dropbox size was recorded as higher than it really was, meaning they couldn’t upload any files as they seemingly exceeded their quota - even when their Dropbox was completely empty! Broken invariants like these led to a broken user experience, culminating in support tickets or customers simply walking away from the product (most people just leave a broken product without filing a ticket!). Unfortunately, there were lots of broken invariants hiding in our data, which we couldn’t find without performing expensive SQL scans and joins across the database!
To detect broken invariants, we built a verification system that regularly scanned every row in our database to check that our data was consistent. The system read these rows in batches, and from replica databases, to minimise availability impact. We found that we had millions of inconsistencies in our data, and resolving them to bring the number of broken invariants down to zero took a lot of work. However, the reward of building such a system and reaching zero inconsistencies was significant, as going forward we were able to make changes and add new features with a greater deal of confidence.
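As a rough illustration only (this is not the actual Dropbox verifier), a batched invariant scan might look something like the sketch below. Everything in it is an assumption made for the example: the `users` and `files` tables, the `total_size` column, the helper names, and the use of an in-memory SQLite database as a stand-in for a MySQL replica. The invariant checked is the size one described earlier: a user's denormalised size should equal the sum of their file sizes.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, stand-in for a replica)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, total_size INTEGER NOT NULL);
        CREATE TABLE files (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            size INTEGER NOT NULL
        );
        -- User 2 has a recorded size of 500 but no files: a broken invariant.
        INSERT INTO users VALUES (1, 300), (2, 500), (3, 0), (4, 70);
        INSERT INTO files (user_id, size) VALUES (1, 100), (1, 200), (4, 70);
        """
    )
    return conn


def find_size_mismatches(conn, batch_size=1000):
    """Yield (user_id, stored_size, actual_size) for every user whose
    denormalised total_size disagrees with the sum of their file sizes.

    Rows are read in primary-key order, batch_size at a time, so no single
    query has to scan the whole table.
    """
    last_id = 0
    while True:
        batch = conn.execute(
            "SELECT id, total_size FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not batch:
            return
        for user_id, stored_size in batch:
            (actual_size,) = conn.execute(
                "SELECT COALESCE(SUM(size), 0) FROM files WHERE user_id = ?",
                (user_id,),
            ).fetchone()
            if stored_size != actual_size:
                yield user_id, stored_size, actual_size
        # Resume the next batch after the last primary key seen.
        last_id = batch[-1][0]


if __name__ == "__main__":
    demo = build_demo_db()
    for mismatch in find_size_mismatches(demo, batch_size=2):
        print(mismatch)
```

Paging by primary key (`WHERE id > ? ... LIMIT ?`) rather than by `OFFSET` keeps each batch cheap regardless of how far into the table the scan has progressed, which matters when the goal is to avoid hurting availability.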
Another effort I was part of involved the migration of billions of rows from our global single-shard MySQL database to a different database called Edgestore, an in-house graph-based database built on top of sharded MySQL with its own API. This was necessary to support the rising user base and to avoid the dreaded single point of failure. However, it was not straightforward - but more on that later!
As you can see, I worked on a few projects - mostly on top of large-scale database systems. Did that have any bearing on why I joined Cockroach? Well…
My experiences at Dropbox were mostly spent working with in-house database systems that continued to work at scale with high availability. Having been involved in the projects above, there were lots of technical and operational challenges involved:
In the case of Edgestore, the migration efforts were a multi-year, full organisation effort. Work was needed to create a database system from scratch and keep it up and running at scale, compounded by needing every relation to be migrated. Maintaining our in-house database systems was also an operational burden, requiring dedicated engineers to perfect these in-house systems as they matured.
Though hard and in some cases arduous, I found the database work I was doing enjoyable as it was the kind of large-scale infrastructure problems I was interested in solving. I felt fortunate to have worked with many talented engineers at Dropbox who managed to pull off such complex technical projects to keep us operational at scale.
Almost every company needs a database to house their data. Successful companies need databases that grow with their success. But not every company can afford to spend the amount of resources that companies such as Dropbox, Facebook or Google do on databases so they can survive massive growth.
When I saw that CockroachDB had the power to abstract these problems away while still using the PostgreSQL syntax developers know and love, I was immediately sold. For me, it was the database that grew with you. If I could contribute to a product that took these pain points away, I would be helping others focus more on their mission of shipping their own product instead of worrying and reasoning about complex database-related issues at scale. In that sense, I felt I would be a part of every product that would be shipped and powered by CockroachDB.
I applied straight away. Walking into the interviews, I already thought the product sold itself. I was even more impressed when I talked to everyone working at Cockroach. I was excited by the upcoming projects, the vision of the company and the people I talked to. Not long after that, I signed!
Working on CockroachDB has been an incredible experience. It’s been just over a year and I feel like I’ve been involved in so much - some highlights include dealing with the mindblow-yness of time, adding spatial features and indexing, and simplifying our multi-region user experience in the upcoming release.
While there are many aspects that I enjoy at Cockroach, here are a few big ones (in no particular order) that keep me going:
When the virus-that-must-not-be-named came along, my wife and I decided that it was time to move home. The US was a great adventure and scratched our traveller itch, but coming back home to the familiar suburbia of Sydney (and the wonderful fresh sea breezes) was always our long term plan. Of course, coming back was complicated, as I still wanted to be involved with Cockroach Labs but there was no Aussie presence yet.
Fortunately, the team at Cockroach Labs was interested in the talent that could be tapped in Australia and I was given the go-ahead to come back and spin up a new office. We are already growing rapidly, and with a Series E round under our belt we are in a great position to keep growing aggressively.
I’m happy to be home, but I’d be even happier if you decided to join us down here in Sydney! We are currently looking for Site Reliability Engineers and Software Engineers to join our team and help build out the start of a new shiny office in the land down under.
You can see the current positions on our careers page - there will be more to come in the future. If you’re interested and are already in Australia, don’t hesitate to apply or reach out to me directly.