I’m a database nerd. Or, to be more precise, a DBMS nerd. What I love most about them is that while they’re everywhere, and modern society could not function without them, they’re incredibly difficult to build well. Part of this difficulty stems from the fact that databases are complex, and their construction borrows from nearly all fields of Computer Science.
My love of databases, however, wasn’t always this strong. Back in university, I avoided the Databases course at all costs, inferring (quite incorrectly) that there wasn’t much new to learn in the domain, and that all of its hardest problems had already been solved. This avoidance came back to bite me quite quickly when, upon graduating, I was recruited by the local DBMS development team at IBM and decided to join them.
When I first joined IBM, I worked on the Autonomic Computing team, whose mission was to make life easier for database administrators by finding ways to automate tasks that were difficult or error-prone when performed manually (e.g. physical database design, statistics collection, memory management). Specifically, I was asked to lead a team whose goal was to automate the tuning of the DBMS’s various memory configuration parameters in a way that would optimize performance.
At the time we started working on automating memory tuning, all of the academic approaches available suffered from constraints which prevented them from being implemented in an industrial product. For instance, there were several approaches available for tuning buffer caches, but many of these required dividing queries into classes and specifying response time goals. This query division and goal specification could often be as difficult as the underlying memory tuning. Furthermore, while there were papers written about tuning individual memory consumers (like caches, working memory, locking memory, etc.), there was no work exploring how to unify these approaches into a mechanism that automates tuning across a group of memory consumers.
As part of our investigation, we also uncovered several complications which made memory tuning difficult on a complex DBMS. For example, one of the largest consumers of query memory in a DBMS is the memory required for sorting. Every time a query contains an ORDER BY clause, and there is no corresponding index to leverage, the system must sort the data. This sorting is most efficiently performed in memory. If, however, the sort is too large to be contained in memory, most systems allow the sorted rows to “spill” to disk by writing them to a temporary table. The temporary table is then cached by the system (in a write-back cache) before possibly being written to disk (in cases where the cache is not large enough to hold the entire temporary table). This interplay between sort memory and caching memory makes tuning difficult. Ideally, the system would optimize the amount of sort memory to prevent spills (which are costly, even if they never spill all the way to disk), but adding caching memory can give the illusion of tuning progress, since it merely accelerates the act of spilling.
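The spill mechanism described above is essentially an external sort. Here’s a minimal sketch (not Db2’s actual implementation — the row-count budget and temporary files are simplifications for illustration): rows are sorted in memory until a budget is exhausted, each sorted run is spilled to a temporary file, and the runs are merged at the end.

```python
import heapq
import tempfile

def external_sort(rows, memory_budget):
    """Sort integer rows, spilling sorted runs to temp files whenever the
    in-memory budget (expressed as a row count, for simplicity) is hit."""
    runs = []          # spilled, sorted runs on disk
    buffer = []        # in-memory sort buffer
    for row in rows:
        buffer.append(row)
        if len(buffer) >= memory_budget:       # budget exhausted: spill
            buffer.sort()
            f = tempfile.TemporaryFile(mode="w+")
            f.writelines(f"{r}\n" for r in buffer)
            f.seek(0)
            runs.append(f)
            buffer = []
    buffer.sort()
    if not runs:                               # everything fit in memory
        return buffer
    # merge the spilled runs together with the final in-memory run
    streams = [(int(line) for line in f) for f in runs]
    return list(heapq.merge(*streams, iter(buffer)))
```

Even in this toy version you can see the tension: a larger `memory_budget` avoids spills entirely, while more caching for the spilled runs only makes the spills cheaper.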
A second complication was that most academic approaches assumed that individual tuning decisions were free. In actuality, this is far from the case. When decreasing the size of a write-back cache, for instance, any dirty pages in the section of memory being freed must first be written to disk. As a result, frequent cache size reductions can increase the burden on the I/O subsystem. Additionally, since contiguous blocks of memory must be freed to be of use to other memory consumers, the writing of pages to disk must wait on any pages currently in use by concurrent transactions. In practice, we found that it was necessary to model the cost of these memory decreases, and to employ a Control Theory approach to reduce frequent oscillations in consumer sizes.
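I can’t reproduce the actual controller here, but the damping idea can be sketched as a proportional resize step that ignores tiny adjustments and suppresses shrinks whose modeled cost outweighs the expected benefit. The gain, deadband, and cost model below are illustrative assumptions, not the product’s real parameters.

```python
def resize_step(current_size, target_size, shrink_cost_per_page,
                benefit_per_page, gain=0.5, deadband=0.05):
    """Compute the next cache size. Moves only a fraction (`gain`) of the
    way toward the target each step, and suppresses shrinks whose modeled
    cost (flushing dirty pages) exceeds the expected benefit."""
    delta = target_size - current_size
    # deadband: ignore tiny adjustments to avoid oscillation
    if abs(delta) < deadband * current_size:
        return current_size
    step = gain * delta                     # proportional control
    if step < 0:                            # shrinking means flushing dirty pages
        cost = -step * shrink_cost_per_page
        benefit = -step * benefit_per_page
        if cost >= benefit:                 # not worth the I/O burden
            return current_size
    return current_size + step
```

Moving only partway toward the target each interval is what keeps two consumers from repeatedly stealing the same memory back and forth from each other.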
These challenges combined to make the project very demanding, especially since I was so inexperienced. Not only did I need to wrap my head around the complex way in which memory management was performed in the DBMS, but I had to do so while at the same time learning about Control Theory, commercial software development, and leading a team for the first time. Thankfully, I had brilliant partners in IBM Research, an amazing manager and mentor, and a very forgiving team. While the work was exhausting (on more than one occasion I fell asleep with my laptop beside me), it was rewarding to produce a novel solution to a prickly problem (resulting in several academic research papers), and grow so much professionally at the same time. Our work ended up being enabled by default in the product, and is still in use at thousands of customers today - something which fills me with tremendous pride.
Once we shipped the first wave of Autonomic Computing features, our team was split up to help work on other, more pressing projects. As part of this split, I ended up working on a team that was determining the feasibility of bringing some technology which had been very successful on the mainframe - Db2 Data Sharing - to commodity hardware. When Db2 Data Sharing was introduced in 1994 (along with the mainframe’s Parallel Sysplex), it allowed databases, which were typically limited to a single machine, to scale to multiple machines. This made it one of the first examples of a distributed relational database, and was a huge relief for customers struggling to scale their databases to meet workload demands. When Oracle followed suit with their Real Application Clusters (RAC) in 2001, there was pressure at IBM to bring a competing technology to the non-mainframe market.
The problem with doing so, however, was that mainframes had some technological advantages over UNIX-based servers. For example, mainframes had both high-speed interconnects and hardware-based clock synchronization, both of which made distributing a database over multiple servers much more feasible. Luckily for us, when we started this work in 2006, Infiniband connections were gaining widespread adoption, and combined with Remote Direct Memory Access (RDMA), allowed for extremely low latency interactions between nodes of the cluster. RDMA, combined with an internal implementation of a Lamport clock, allowed us to deliver technology similar to what existed on the mainframe to our existing non-mainframe customers.
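For readers unfamiliar with Lamport clocks: they give a cluster a consistent logical ordering of events without hardware clock synchronization. A minimal sketch of the classic algorithm (our internal implementation was, of course, considerably more involved):

```python
class LamportClock:
    """Minimal logical clock per Lamport: ticks on local events, and on
    receiving a message adopts max(local, remote) + 1, so every message
    delivery is ordered after its send."""
    def __init__(self):
        self.time = 0

    def tick(self):                  # local event, or stamping an outgoing message
        self.time += 1
        return self.time

    def receive(self, remote_time):  # message arrival from another node
        self.time = max(self.time, remote_time) + 1
        return self.time
```

Two nodes exchanging stamped messages will always agree that a send happened before the corresponding receive, which is the property a distributed database needs to order dependent operations across machines.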
This new role was my first introduction to core DBMS development, and is where I learned about things like ARIES for the first time (fortunately enough, from the paper’s original author). The first year was a tough slog, as there was so much to learn, but it was also exhilarating to be learning something completely new, especially after being deep in the bowels of memory management for so long. Once the team proved out the prototype (to which I had very lightly contributed on account of all the learning), we were green-lit to build it into our product as what would eventually be called Db2 pureScale.
At that point we needed to scale the team out dramatically, and since I’d been learning about the technical details for several months, I was asked to lead the transaction management team. This was very different from my experience in Autonomic Computing (where everyone on the team was new to the area), as I was now joining a team composed of transaction management experts. While this could have been intimidating, it actually freed me from having to make daily decisions around the technology, and instead allowed me to focus on leading (which the team needed). It was here where I grew my skills in software development project management, while at the same time picking up enough of the technical details to be able to make a significant contribution, especially at the tail end of the project.
In 2009, when we shipped Db2 pureScale, I went off in a completely different direction, but was reunited with two old friends. My old manager (and his manager) from the Autonomic Computing days were starting up a team to investigate the possibility of incorporating recent research on Column Stores into our product.
Since the early days of relational databases, the prevailing wisdom was that rows needed to be stored contiguously on disk to maximize performance. This made sense for transactional workloads, where rows are inserted and updated one at a time (and therefore, having them in a single place on disk is most efficient), and queries are often accelerated by indexes. For analytical databases however, rows are typically modified as part of batch jobs, and it’s common for queries on very wide tables to select only a handful of columns and, in the absence of indexes, scan large portions of the table(s). As a result, it can be more efficient to store column values for consecutive rows together, as queries can read only the column values requested by the query, and save the I/O required for all unread columns. Column storage also lends itself well to vector processing, where column values are stored consecutively in memory and processed in batches as opposed to individually. As an added bonus, column storage also helps with data compression, as a given page of data (or even an entire file on disk) will contain a single column whose values are often drawn from a limited set of possibilities (e.g. State, area code, gender, or age).
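The row-versus-column trade-off above can be shown in a few lines. This is a deliberately simplified sketch (Python lists standing in for on-disk pages, with a made-up four-column table): a query touching one column materializes every row in the row layout, but reads only one list in the column layout.

```python
# Row layout: each row is a tuple containing all of its columns.
rows = [(i, f"name{i}", i % 50, i * 10) for i in range(1000)]

# Column layout: one list per column, so values for consecutive rows
# sit together - exactly what a vectorized scan wants to read.
columns = {
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "age":   [r[2] for r in rows],
    "sales": [r[3] for r in rows],
}

# A query over one of four columns touches every row tuple in the row
# layout, but only a single column list in the column layout.
total_row_store = sum(r[3] for r in rows)        # must materialize whole rows
total_col_store = sum(columns["sales"])          # reads only one column
assert total_row_store == total_col_store        # same answer, less I/O

# Compression bonus: a column's values are often drawn from a small set.
distinct_ages = len(set(columns["age"]))         # 50 values for 1000 rows
```

With only 50 distinct values in the `age` column, a dictionary or run-length encoding would shrink it dramatically - the compression benefit mentioned above.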
My initial introduction to column stores came when I was asked to join a team, along with some people from IBM Research, to see if we could build a rapid prototype for integration into Db2. As we were partnering with the hardware division on this prototype, I was asked to take on the work of adding SIMD instructions to our new vectorized processing engine. This was like nothing I’d ever done before, and getting right down to the machine instruction level was both exciting and daunting.
When we successfully completed the prototype and the work to build it into the product began, I lobbied to lead the team which would build the column store’s insert/update/delete capabilities. The initial prototype we built was insert only (well, to be fair, bulk load only), and ensuring that we could support fine-grained data modification was no easy task, as rows were split into multiple files on disk, and we were required to match the insert performance of the existing row store, at least for large batches of inserts and updates.
The difficulty in the task was what drew me to it. That, along with the fact that it was uncharted space, which would allow me to both lead the team and design the technological solution. Doing both at the same time wasn’t always easy, and the pull of project management would often distract me from what I really wanted to do, which was code. In the end though, getting to something that would delight our customers was my ultimate goal, and if that meant more technical troubleshooting than actual coding, I was happy to do that. I also happened to be very lucky that the team assembled around me was very strong, and well balanced. They were a pleasure to work with, and some of them continue to be close personal friends today. In 2013 we shipped our solution (named BLU Acceleration), and published a research paper about its novel design.
After we shipped BLU Acceleration, the researchers responsible for its inception decided to turn their attention to Hybrid Transaction/Analytical Processing - HTAP. Database workloads are broadly divided into two classes: transactional and analytical. The goal of HTAP is to build a system which is optimized for both transactional and analytical processing, using a single data copy. This is no trivial feat, since as mentioned earlier, transaction processing benefits from storing rows together on disk, while analytical processing is often faster using column organized storage. I started collaborating with IBM Research on the HTAP effort, as the sole developer. The work was interesting, and we made good progress, but ultimately, the company was not interested in pursuing HTAP at the time (at least, not in the deeply integrated manner which we were pursuing).
In any endeavor, it’s helpful to be in the right place at the right time. While it’s not clear to me if deeply integrated HTAP is the right “place” (it certainly has its advantages, but it’s very difficult to get right), we didn’t have the timing right. At the time, the biggest influence on the database space was the emerging shift to the cloud, and HTAP alone did not directly address the challenges brought about by this shift. As a result, while the work we were doing was interesting, and likely would have borne fruit, it was not the organization’s most pressing challenge at the time, which is ultimately why it was shut down.
When the HTAP work was shut down, I put together a proposal to build a cloud-native analytical database from the ground up. As with transactional databases, analytical databases were traditionally designed to run on a single machine. In the 80s and 90s, however, database designers realized that long-running analytical queries could be processed much faster if they were split over many machines and run in parallel. This “splitting” involved partitioning the database and having each machine own a portion of the data. The fact that machines wouldn’t be sharing data at all (each machine would own a distinct portion of the database) led to this approach being called the shared nothing architecture. For the next several decades, shared nothing would dominate the analytical database landscape. Then came the shift to the cloud, or more specifically, the shift to cloud-native storage.
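The “splitting” described above can be sketched in a few lines: hash-partition the rows so each node owns a disjoint subset, have every node aggregate only its own partition, and combine the partial results at a coordinator. The table, key, and node counts here are invented for illustration.

```python
from collections import defaultdict

def partition(rows, key, num_nodes):
    """Hash-partition rows across nodes: each node owns a disjoint
    subset of the data - the essence of shared nothing."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[hash(row[key]) % num_nodes].append(row)
    return nodes

# Each node computes a partial aggregate over only its own partition;
# a coordinator then combines the partials into the final answer.
rows = [{"region": i % 7, "amount": i} for i in range(100)]
nodes = partition(rows, "region", 4)
partials = [sum(r["amount"] for r in part) for part in nodes.values()]
assert sum(partials) == sum(r["amount"] for r in rows)
```

Because each node scans only its own slice, adding nodes shortens long-running queries nearly linearly - which is exactly why shared nothing dominated analytics for decades.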
In the cloud, the most cost-effective way to store large data sets (and analytical databases typically contain very large data sets) is object storage (AWS S3, Azure Blob Storage, Google Cloud Storage). This not only makes storage much less expensive, but also places data on storage which is visible to all nodes of the cluster, opening up a whole new range of possibilities. For example, with data on shared storage, partition ownership can be reassigned to nodes almost instantaneously, without having to physically move data on disk (as is the case in traditional shared nothing deployments). Instantaneous partition reassignment not only allows clusters to expand and contract on a per-query basis, but also dramatically simplifies high availability, since partitions owned by failed nodes can easily be reassigned to surviving nodes.
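The key insight is that with data on shared storage, “ownership” is just metadata: a mapping from partition to node. Reassignment after a failure is a map update, with no bytes moving. A toy sketch (the partition and node counts are invented):

```python
# With data on shared object storage, ownership is a partition -> node map.
ownership = {p: p % 3 for p in range(12)}   # 12 partitions over nodes 0-2

def reassign_on_failure(ownership, failed_node, surviving_nodes):
    """Hand each partition owned by the failed node to a survivor,
    round-robin. Near-instant, because the data itself never moves."""
    new_map = dict(ownership)
    orphans = [p for p, n in ownership.items() if n == failed_node]
    for i, p in enumerate(orphans):
        new_map[p] = surviving_nodes[i % len(surviving_nodes)]
    return new_map

after = reassign_on_failure(ownership, failed_node=2, surviving_nodes=[0, 1])
assert 2 not in after.values()              # node 2's partitions rehomed
assert after.keys() == ownership.keys()     # no partitions lost
```

In a traditional shared nothing deployment, the same failover would require physically copying node 2’s data to the survivors before they could serve it.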
Recognizing that a shared nothing architecture on top of cloud storage was the wave of the future, I proposed that we build such a system and first focus on IoT workloads. To accomplish this, the organization asked me to assemble a team outside of the database organization, so that we would be free to innovate without restrictions. This was refreshing, but also challenging, as the team needed to build up all of the infrastructure we’d taken for granted over the years. Fortunately, it also allowed us to innovate, and we leaned heavily on newer technologies like Docker, Jenkins and Kubernetes, which weren’t being used in the database organization at the time.
The work was also invigorating, as we were unencumbered by the difficulty of retrofitting an existing system, and could instead focus on innovating. Unfortunately however, the team dynamics necessitated that I spend most of my energies on maintaining harmony, and less on technological decisions. This was only possible because I had such a strong technical team, which I trusted implicitly. While the road to success wasn’t direct, we eventually shipped our IoT-optimized cloud-native database (named Db2 Event Store) in 2017, and recently published a research paper on its initial design and evolution.
In recognition of my work on Db2 Event Store, I was promoted. In my new role I no longer led focused technical teams but rather worked more closely with the executive team on our division’s strategy. This included more time working with customers, providing architectural oversight to several large and successful products, and planning for our future. The work was rewarding, and I really enjoyed working with my fellow executives (especially my boss), but ultimately it took me away from the deep technical work that I enjoyed so much. It was in part because I missed the deep technical work that I was receptive to new opportunities when people came knocking. This culminated in me leaving IBM and joining Cockroach Labs earlier this year.
One of the more enjoyable aspects of being a leader is the impact you can have on the careers of others. I’ve been fortunate enough to mentor many people over the years and am frequently asked for my opinion on how to succeed in our industry, and the database domain specifically. Here are a few things I mention to people who ask:
The database space is thrilling, with its enormous breadth and depth of technical challenges. As a result, it’s possible to be incredibly broad or incredibly deep. Some people have a natural tendency to get deeper and deeper in one area (say, query optimization, developer interfaces, or transaction management), and to be fair, going deep is required for the industry to evolve and solve increasingly complex problems. When building a career as a DBMS developer, however, I’ve found that there are benefits to being broad. Specifically, breadth allows you to identify problems that exist across the system, and to address them more holistically. I generally advise people to learn as much as they can, about as many areas of the database world as possible, early in their careers. Taking this approach will allow you to identify problems and propose solutions outside of the space in which you’re currently focused, and to drive more value for the organization as a whole.
It’s often the case that people just starting out in our industry will chase the offer with the highest payday. I understand the temptation. Early in your career, there are pressing financial considerations (saving for a house, getting married, starting to plan for retirement), and that bigger paycheque can seem like the right way to rank offers. The problem, however, is that in many cases maximizing your return early can work against you later in your career.
When you’re just starting out, you should instead be trying to maximize learning opportunities, even if that means lower compensation. Becoming a voracious learner early will ensure that when you’re in your maximum earning years (your 40s and 50s), you’re at your most desirable and impactful. To that end, if you’re just starting your career, find the opportunity which will allow you to make the most impact at the company where you work, learn from the experts around you, and become a generalist by building your skills in a variety of areas.
Anyone who’s had a successful career (or an unsuccessful one, for that matter) would be remiss not to mention the extent to which luck has played a part. Being good helps a lot, but a significant helping of luck can make the difference between being mediocre and being exceptional. So much of life is about being in the right place at the right time.
To a large extent, luck is not something you can change. You have no influence on the family into which you’re born, the language(s) you grow up speaking, or the geo-political situation you find yourself in as a child. That being said, as you get older, you have the ability to influence your luck, and you should take advantage of that ability. When I finished my undergraduate degree and was looking for a place for graduate studies, I had several offers to consider. One of them was by far the most lucrative, but in the end I decided to go to the University of Waterloo because I felt that doing so would position me best down the road. Two years later, when I was recruited from there to join IBM, it was partly luck (I graduated right around the time that IBM was recruiting - it was the right time), but also, the database group at IBM was only recruiting from the University of Waterloo at that time (I was in the right place). If I hadn’t gone to the University of Waterloo, it’s likely I never would have ended up at IBM.
From there the luck kept snowballing. At IBM I had the tremendous opportunity to study under some of the founders of the database industry - members of the original team that built System R, as well as their successors. The depth of knowledge they possessed about database systems is something that still humbles me today. I was unequivocally lucky to have been recruited into such a tremendous learning environment, but the luck I created for myself by going to Waterloo started everything.
When deciding where to work, which team to be a part of, or how to spend your energies outside of work, it’s always helpful to remember that we have the ability to influence our own luck. While you can’t always control the “right time”, you may have a better sense for the “right place”.
When switching jobs, especially after so many years, there’s rarely a single reason why. After having many conversations about the move with friends, family and colleagues, here’s the distilled list of reasons why I’m so excited to be at Cockroach Labs:
If you share my passion for solving tough technological problems, making an impact on a rapidly growing segment of the market, and working on a strong welcoming team, we’d love to hear from you.