This post was originally published on November 3, 2015, the year Cockroach Labs was founded. Eight years and 85,670 GitHub commits later, it’s still absolutely true: Go was the “write stuff” for CockroachDB.
The first question many developers ask us is what our experience has been writing a distributed database in Go, a garbage-collected language. JVM garbage collection is notoriously expensive, so wouldn’t we be risking CockroachDB’s performance by building it in Go?
The fact is, when you’re building a high-performance distributed system you’ve only got a handful of languages to choose from, with C++, Java, and Go topping the list. Java’s known performance issues made it unappealing. And, though many of us spent our careers developing in C++, the effort required to build our own libraries further complicated the already daunting task of writing a distributed database. Next on the list was Go.
Despite being a brand new language for nearly every developer on the project — including the founders — Go's support for libraries, interfaces, and tooling positioned it as the right choice for CockroachDB.
Perhaps the strongest indicator that Go is a good fit: lack of previous exposure to the language has not been a barrier for contributors. We now have 67 contributors to the project*, and CockroachDB has gone from an empty GitHub project to 125,000 lines of non-generated Go code, with a smattering of C++ and .proto files. The choice of language undeniably affects how manageable code complexity is, and that matters especially in an open source context.
*As of October 2023, CockroachDB has over 600 GitHub contributors, and its codebase is 89.6% Go, with the remainder in Java, C++, and Python, plus a smattering of TypeScript, Starlark, Yacc, and a few other languages.
It’s difficult to quantify the effect on productivity that Go brings over C++ or even Java. Go was designed to scale to large code bases with an emphasis on simplicity and orthogonality of features. The enforced code style, the simple imports and automated import management, the wide variety of linters, the straightforward (and minimal) set of programmatic idioms…all of these attributes of Go are important for clean, understandable code.
Compared to Java, we appreciate the tight focus on implementation instead of OOP and abstraction: interfaces can be added when needed, not as an initial, often unnecessary, step. Compared to C++, we appreciate the automatic memory management and how there’s rarely more than one way to get something done, for example with static and one-time initializers. We’ve made good use of channels for synchronization, although we’ll note there is an art to using them effectively.
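To make those three points concrete, here is a minimal sketch in Go. It is not taken from the CockroachDB codebase: the names `Store`, `memStore`, `DefaultStore`, and `LookupAll` are hypothetical and exist only for illustration. It shows an interface introduced after the concrete type, a one-time initializer via `sync.Once`, and a channel used to collect results from goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical interface. In Go it can be added after the
// concrete type exists, because types satisfy interfaces implicitly.
type Store interface {
	Get(key string) (string, bool)
}

// memStore is a concrete type written first; it never names Store.
type memStore struct {
	data map[string]string
}

func (m *memStore) Get(key string) (string, bool) {
	v, ok := m.data[key]
	return v, ok
}

var (
	defaultOnce  sync.Once
	defaultStore *memStore
)

// DefaultStore builds the shared store exactly once, even if many
// goroutines call it at the same time.
func DefaultStore() Store {
	defaultOnce.Do(func() {
		defaultStore = &memStore{data: map[string]string{"lang": "go"}}
	})
	return defaultStore
}

// LookupAll starts one goroutine per key and uses a channel to
// synchronize: the caller receives exactly one result per key.
func LookupAll(s Store, keys []string) map[string]string {
	type result struct {
		key, val string
		ok       bool
	}
	// Buffered to len(keys) so no sender can block forever.
	results := make(chan result, len(keys))
	for _, k := range keys {
		go func(k string) {
			v, ok := s.Get(k)
			results <- result{key: k, val: v, ok: ok}
		}(k)
	}
	found := make(map[string]string)
	for range keys {
		if r := <-results; r.ok {
			found[r.key] = r.val
		}
	}
	return found
}

func main() {
	fmt.Println(LookupAll(DefaultStore(), []string{"lang", "missing"}))
}
```

Sizing the channel buffer to the number of senders is one simple way to avoid a goroutine blocking forever if the receiver returns early, which is part of the "art" mentioned above.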
What remains to be seen, of course, is how all of this Go code will perform. We are still building out core functionality in CockroachDB, so much of the performance profiling is yet to come**. However, in our past experience, we ported a large system from Java to Go, which greatly decreased its memory footprint and garbage collection overhead.
** Eight years along, with a lot of additional core functionality and a great deal of performance profiling under the bridge, we are still very happy with Go.
Want to learn more about how Go garbage collection works in CockroachDB? Garbage collection in Go can cause an application to pause. Fortunately, Go also makes available a lot of manual tweaks to control what actually ends up on top of the garbage heap. In this hour-long deep (deep) dive video, Cockroach Labs CTO and co-founder Ben Darnell discusses how CockroachDB optimized memory usage to mitigate issues related to garbage collection and improved the use of channels to avoid deadlocks. Watch: How CockroachDB Wrote a Massive & Complex Go Application
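As a rough illustration of the kind of manual tweak Go allows, the sketch below reuses buffers through `sync.Pool` so that a hot path creates less garbage for the collector to trace. This is an assumption about one common technique, not a description of what CockroachDB does or what the talk covers; `bufPool` and `EncodeKey` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers so repeated calls do not
// allocate a fresh buffer each time.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// EncodeKey is a hypothetical helper that formats a key using a
// pooled buffer instead of allocating a new one per call.
func EncodeKey(table string, id int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a pooled buffer may still hold old contents
	defer bufPool.Put(buf) // hand the buffer back for reuse
	fmt.Fprintf(buf, "/%s/%d", table, id)
	return buf.String() // String copies the bytes, so this is safe after Put
}

func main() {
	fmt.Println(EncodeKey("users", 42)) // prints "/users/42"
}
```

Whether pooling actually helps depends on the workload, so a change like this is best checked against a memory profile before and after.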