Persistence-as-a-service at Netflix

Large enterprises with hundreds of developers building thousands of applications and services have a suite of database options to choose from. They also typically have a full team dedicated to maintaining these offerings. 

The saying “use the best tool for the job” is why they offer so many options, since different workloads have different requirements. Over the past few years, the “best tools” have changed because of the increasing demand on infrastructure generated by business-critical, high-volume workloads.

A few members of Netflix’s team recently spoke about the evolution of databases at the company. They have an entire team — called the Data Platform team — that is in charge of maintaining and building a fully automated, self-service deployment infrastructure for their developers. 

Netflix’s infrastructure is built to scale and support experimentation so their developers can bring successful ideas to market quickly. They also put all their apps and services into the cloud and have the capability to move large amounts of data in near real-time. So a project that might take other studios weeks to months to complete only takes Netflix a few days. The Data Platform team is crucial in helping deliver this competitive advantage. 

Freedom to choose your own tech stack

Netflix developers are welcome to build apps and services using whichever tech stack they like. To make it easier for them, the Data Platform team takes care of the heavy lifting by supporting “all things infrastructure.” This means they work on feature development, tooling, automation, and operations related to all the data stores in their portfolio. 

There’s no need for developers to search for the right technology, and no need to operate it themselves. This saves them a tremendous amount of time so they can focus on solving their business use cases. 

The Data Platform team’s goal is to support the majority of critical use cases for Netflix developers who need a database and for those who want to use persistence-as-a-service. For example, they offer CockroachDB as a persistent, distributed database. This means the team builds, releases, and operates CockroachDB in the correct way so their developers don’t have to think about it.

How to choose the right database 

Developers want to know when their application should use CockroachDB or Cassandra or PostgreSQL or MySQL or . . . ? What will be the difference between these solutions? How will they impact the application? 

As mentioned before, Netflix developers have the freedom to choose the tech stack they want. However, if they are looking for recommendations or support, that is available, too. 

For example, if they want to spin up a single-region trial database, MySQL and PostgreSQL can be good choices. But when developers want to scale, distribute their data, and have a multi-region application, they will run into problems. On the other hand, although Cassandra is a good choice for distributing the data globally, it has limitations around supporting transactions. 

CockroachDB is a great solution when developers need a SQL database that delivers distributed, consistent transactions at scale and provides the option to expand to multiple regions. The Data Platform team reminds developers that CockroachDB is “definitely not a traditional SQL database” and that you should first understand the nuances around transactions, cross-region, and distributed systems. 
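
The rules of thumb in this section can be sketched as a small decision helper. This is only an illustration of the guidance described above, not a tool the Data Platform team is known to use. The `Workload` fields and the `suggest_datastores` function are hypothetical names invented for this example, and a real decision would weigh many more factors.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Workload:
    """Hypothetical summary of an application's persistence needs."""
    needs_transactions: bool  # requires consistent, multi-row transactions
    multi_region: bool        # data must live in more than one region
    needs_scale_out: bool     # expected to outgrow a single node


def suggest_datastores(w: Workload) -> List[str]:
    """Return candidate data stores based on the article's rules of thumb."""
    distributed = w.multi_region or w.needs_scale_out
    if not distributed:
        # Single-region trial databases: MySQL and PostgreSQL can be good choices.
        return ["PostgreSQL", "MySQL"]
    if w.needs_transactions:
        # Distributed, consistent SQL transactions at scale.
        return ["CockroachDB"]
    # Global distribution where limited transaction support is acceptable.
    return ["Cassandra"]


if __name__ == "__main__":
    trial = Workload(needs_transactions=True, multi_region=False, needs_scale_out=False)
    print(suggest_datastores(trial))
```

Encoding the choice as data in, candidates out keeps the recommendation advisory, which matches the freedom developers have to pick a different stack.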

“At Netflix, we provide the freedom for everyone to choose their technology. However, to reduce costs and operate efficiently, we want to build a platform that will support a majority of the use cases. As long as CockroachDB fulfills their business requirement, it will be my suggestion.”
— Shengwei Wang, Senior Software Engineer

Use cases running on CockroachDB 

According to the Data Platform team, today there are 100+ production CockroachDB clusters and 150+ test clusters. Most of the clusters are deployed in a single region with three availability zones; however, several are starting to explore multi-region topologies.

Here are a few examples of CockroachDB use cases that are in production at Netflix today: 

  • Cloud drive service: Cloud file system for media applications (YouTube, article)
  • Data mesh: Data movement and real-time processing platform (Netflix blog)
  • Device management platform: Cloud-based automation framework that handles device management at scale (Netflix blog)
  • Maestro: Workflow orchestrator that can schedule and manage workflows at a massive scale (Netflix blog)

While this is only a small selection, there are several applications and services generating vast amounts of data and metadata that benefit from the scale, resilience, and distribution that CockroachDB provides.  

Learn more about CockroachDB at Netflix

The Data Platform team is relatively small compared to the massive volume of clusters they are managing, so they rely heavily on automation. They expect challenges to arise, and when one does, they have a mitigation plan in place to document the issue and prevent it from happening again.

Netflix’s Data Platform team provides the flexibility developers need to build breakthrough applications and services that deliver joy to their customers. Their model is what many organizations aspire to achieve so that they can gain a competitive edge. 

Since Netflix adopted CockroachDB in 2020, they have made a lot of progress on leveraging the power of distributed transactions at scale. To learn more about their usage, check out this video and keep an eye out for an upcoming webinar in which we’ll welcome Senior Netflix Engineer, Vlad Sydorenko, to discuss how Netflix uses changefeeds.

About the author

Cassie McAllister

Cassie is a Senior Product Marketing Manager at Cockroach Labs. Her focus is on vertical marketing and telling customer stories. She's been in the database world for the past 5 years and previously worked in communications for cybersecurity companies. In her free time, you can find her at the beach, sipping wine, or skiing down a mountain.
