From intern to full-time engineer at Cockroach Labs

Throughout the year, we offer internships at Cockroach Labs to give students opportunities to gain industry experience and work on challenging problems within distributed systems. Bilal Akhtar is a Member of Technical Staff at Cockroach Labs, working on the Core Storage team and toying with KV storage engines. Outside of work, you’ll see him reading non-fiction, or giving urban photography a shot.


I interned at Cockroach Labs twice while in school. I’ve now come back as a full-time engineer working on the Core Storage team. We’ve published blog posts about the work interns have done previously, as well as how we set our internship apart by offering equity. In this post, I’ll expand on what stands out about internships at Cockroach Labs.

When finding and applying for internships, there are a lot of factors to consider: scope of impact, quality of the learning experience, company size, and location, just to name a few.

I come from the University of Waterloo, a school known for its co-op program that embeds up to 6 internships as a degree requirement. Since it’s seemingly impossible to avoid discussing internships on campus, it’s easy to fall for pressure and aim almost exclusively for tech giants like Facebook or Google. There’s valuable experience to be gained working at companies like those, but I also found it important to do at least one if not more internships at smaller companies.

At smaller, close-knit, yet fast-growing companies like Cockroach Labs, I got to work with things that were more critical to the product’s success than, say, something that’s purely exploratory or a side-project. I also got more choice over what project or team I wanted to work on - just because there was never a shortage of projects in the pipeline.

First internship: SQL Execution team

When I first interned at Cockroach Labs in 2017, the company was about a third of the size that it is now. It was the pre-1.0 days when new functionality was being added left and right, and my work was no exception. I got to work on adding a couple of new statements to the product to allow for query/session monitoring and cancellation.

Here’s an example of one of those statements, `SHOW QUERIES`, in action:

SELECT node_id, user_name, query FROM [SHOW QUERIES]
----
node_id user_name query
1 root SELECT node_id, user_name, query FROM [SHOW CLUSTER QUERIES]

Finding help and mentorship inside the company back then was not as structured or straightforward as it is today. Even so, I was able to get a lot of help without even asking - the team was small but very collaborative. It was my first time working on something this performance-sensitive with a lot of moving parts, and this form of learning is hard to find elsewhere, especially as a third-year student.

And since the company was so small, I got a lot of visibility into what other parts of the company were doing, and how my work impacted them. This sort of insight is not a necessity, but it really puts everything in perspective. It’s also very satisfying to hear customers talk about (and praise!) a feature that you built.

Second internship: SQL Optimizer team

I decided to return to Cockroach Labs a year later. The company was changing quickly, the user base had grown a lot, and the product had reached some impressive performance and feature milestones in the 2.0 release. I figured the only way to accurately gauge the company at that point would be to work there again, and I’m really glad I did.

Onboarding processes had become much more comprehensive. Interns were now assigned “Roachmates” who were expected to be a point-of-contact for technical questions throughout the internship.

I opted to join the SQL Optimizer team. A cost-based optimizer had recently been added to CockroachDB. There was a lot of interesting work to be done in the optimizer, and work there was more math- and statistics-driven than elsewhere, which I found to be an interesting change.

Work I did that term included adding null counts as an additional statistical signal to the optimizer. Many queries are implicitly null-eliminating, and so the optimizer can make better predictions if it knows the proportion of values that are null for a given column/table.
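To illustrate why a null count helps, here is a minimal sketch of a textbook-style row estimate for an equality predicate such as `col = 5`. It is not CockroachDB's optimizer code: the function name, the uniform-distribution assumption, and the sample numbers are all hypothetical, chosen only to show the effect.

```python
def estimate_eq_rows(row_count, distinct_count, null_count=0):
    """Estimate rows matching `col = <constant>`, assuming values are
    spread uniformly over the distinct non-NULL values (an assumption
    made for this sketch only).

    An equality predicate never matches NULL, so NULL rows are removed
    before dividing the remaining rows among the distinct values.
    """
    non_null_rows = row_count - null_count
    if non_null_rows <= 0 or distinct_count <= 0:
        return 0.0
    return non_null_rows / distinct_count


# Hypothetical column: 1,000 rows, 900 of them NULL, 10 distinct non-NULL values.
estimate_ignoring_nulls = estimate_eq_rows(1000, 10)
estimate_with_nulls = estimate_eq_rows(1000, 10, null_count=900)
print(estimate_ignoring_nulls, estimate_with_nulls)
```

With these made-up numbers, ignoring nulls overestimates the matching rows by a factor of ten, which is the kind of error that could push an optimizer toward a worse plan.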

I also implemented the optimizer pieces of a new type of join called the zigzag join. It’s somewhat similar to a merge join, except it is able to take advantage of multiple indexes to efficiently find the intersection of multiple WHERE predicates. The execution pieces for this join had already been implemented by another intern; I just had to investigate cases where this join was efficient, and plan it there.
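The core idea can be sketched in a few lines. This is a simplified illustration, not CockroachDB's implementation: it assumes each index, once fixed at its predicate value, yields primary keys in sorted order with no duplicates, and it uses in-memory lists plus `bisect_left` as a stand-in for an index seek. The function name and sample keys are hypothetical.

```python
from bisect import bisect_left


def zigzag_intersect(left_keys, right_keys):
    """Intersect two sorted, duplicate-free key lists by seeking back and
    forth between them instead of scanning either one in full."""
    sides = (left_keys, right_keys)
    pos = [0, 0]   # current position on each side
    side = 0       # side that supplies the current candidate key
    matches = []
    while pos[0] < len(left_keys) and pos[1] < len(right_keys):
        candidate = sides[side][pos[side]]
        other = 1 - side
        # "Seek" the other side to the first key >= candidate.
        pos[other] = bisect_left(sides[other], candidate, pos[other])
        if pos[other] == len(sides[other]):
            break  # the other side is exhausted, so no more matches
        found = sides[other][pos[other]]
        if found == candidate:
            matches.append(candidate)
            pos[0] += 1
            pos[1] += 1
        else:
            # found > candidate: it becomes the next candidate, and the
            # next seek happens on the side we just came from.
            side = other
    return matches


# Hypothetical primary keys of rows with b = 5 and rows with c = 6.0.
keys_b = [1, 4, 7, 9, 12]
keys_c = [2, 4, 5, 9, 13]
print(zigzag_intersect(keys_b, keys_c))
```

Each mismatch lets one side skip ahead past every key smaller than the other side's current key, which is why the approach pays off when both predicates are selective and their overlap is small.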

Seeing a significant percentage improvement in the runtime of some benchmarks after a couple weeks of work is really satisfying, and is something that happens relatively often at Cockroach Labs, even with intern projects.

Here’s an example query, and a zigzag join plan associated with it:

EXPLAIN SELECT a,b,c FROM zigzag WHERE b = 5 AND c = 6.0

----
zigzag-join  ·          ·
 │           type       inner
 │           pred       (@2 = 5) AND (@3 = 6.0)
 ├── scan    ·          ·
 │           table      zigzag@b_idx
 │           fixedvals  1 column
 └── scan    ·          ·
             table      zigzag@c_idx
             fixedvals  1 column

In every internship, there are roadblocks, slow-moving parts, and technical concepts that are simply hard to understand. What I really appreciated in both of my internships was that the whole company understood the value of quality work, even if it took longer. I didn’t feel pressured to rush something through just to get it over the finish line within my internship; the project would just get re-scoped if that was more appropriate.

Returning for full time

Cockroach Labs wasn’t the only company I had interned at. But not a lot of workplaces give impactful, specialized work to new grads and interns, build a technically impressive product that you believe in, and live up to their values around work/life balance and quality of work. All these factors played a major role when I decided to return full time.

Upon returning, I joined the Core Storage team. Core as a whole works on the transactional key-value store below the SQL layer of the database, and anything connected to it, such as RocksDB. Core layers and concepts like distributed transactions are what set CockroachDB apart from other SQL databases, and work there is more distributed-systems-heavy than on other teams.

Given my internships were all in SQL land, this was quite a transition and a new challenge. I always wanted to learn more about the core layers, so what better way to solidify my understanding than by working on it? I was also encouraged by peers and past managers to make this jump. And given how transparent the company is internally, I knew what I was getting myself into. I’m a month in at the time of writing this blog post, and there have been no unpleasant surprises - only good ones.

If distributed SQL databases are your cup of tea, we're hiring!
