The first question many developers ask us is what our experience has been writing a distributed database in Go, a garbage-collected language. JVM garbage collection is notoriously expensive, so wouldn’t we be risking CockroachDB’s performance by building it in Go?
The fact is, when you’re building a high-performance distributed system, you’ve only got a handful of languages to choose from, with C++, Java, and Go topping the list. Java’s known performance issues made it unappealing, and while many of us spent our careers developing in C++, the effort required to build our own libraries would have further complicated the already daunting task of writing a distributed database.
Despite Go being a brand new language for nearly every developer on the project, including the founders, its support for libraries, interfaces, and tooling positioned it as the right choice for CockroachDB.
Perhaps the most telling sign that Go is a good fit is that a lack of previous exposure to the language has not been a barrier for contributors: anyone with Java or C++ experience picks Go up quickly. We now have 67 contributors to the project, and CockroachDB has gone from an empty GitHub project to 125,000 lines of non-generated Go code, with a smattering of C++ and .proto files. The choice of language undeniably affects how well code complexity can be managed, which is especially important in an open source context.
It’s difficult to quantify the effect on productivity that Go brings over C++ or even Java. Go was designed to scale to large code bases with an emphasis on simplicity and orthogonality of features. The enforced code style, the simple imports and automated import management, the wide variety of linters, the straightforward (and minimal) set of programmatic idioms…all of these attributes of Go are important for clean, understandable code.
Compared with Java, we appreciate the tight focus on implementation instead of OOP and abstraction: interfaces can be added when needed, not as an initial, often unnecessary, step. Compared with C++, we appreciate the automatic memory management and the fact that there’s rarely more than one way to get something done, for example with static and one-time initializers. We’ve made good use of channels for synchronization, although we’ll note there is an art to using them effectively.
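To make the points about one-time initializers and channels concrete, here is a minimal sketch. It is not taken from the CockroachDB code base. The names `config`, `getConfig`, `sumSquares`, and `initCount` are hypothetical and exist only for illustration. The sketch uses `sync.Once` from the standard library for one-time initialization and an unbuffered channel to collect results from several goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// config is a hypothetical piece of shared state that must be set up exactly once.
type config struct {
	name string
}

var (
	cfg       *config
	cfgOnce   sync.Once
	initCount int // counts how many times the initializer ran (illustration only)
)

// getConfig lazily initializes cfg. sync.Once guarantees the initializer
// runs exactly once, even when many goroutines call getConfig concurrently.
func getConfig() *config {
	cfgOnce.Do(func() {
		initCount++
		cfg = &config{name: "example"}
	})
	return cfg
}

// sumSquares starts one goroutine per input and collects the results over
// a channel. Receiving len(nums) values is the only synchronization needed.
func sumSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(n int) {
			_ = getConfig() // every worker touches the shared config
			results <- n * n
		}(n)
	}
	total := 0
	for range nums {
		total += <-results
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
	fmt.Println(initCount)                     // prints 1
}
```

In this sketch the channel both carries the results and signals completion, so no separate lock or wait group is needed. Real code often has to think harder about buffering, cancellation, and error propagation.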
What remains to be seen, of course, is how all of this Go code will perform. We are still building out core functionality in CockroachDB, so much of the performance profiling is yet to come. However, in our past experience, we ported a large system from Java to Go, which greatly decreased its memory footprint and garbage collection overhead.
As we approach beta and focus more heavily on performance, we’ll share our results in a follow-up post.
Editor's Note - April 23, 2021: This article was written in 2015 when CockroachDB was pre-beta. The product has …Read More