In 1968, millions of people were watching the conclusion of an exciting American football game between the Oakland Raiders and the New York Jets when their televisions suddenly cut to the television film Heidi.
A lot of angry complaints and an investigation later, the true cause of the “Heidi Bowl” became clear: the network had slotted three hours for the game, but it went long. NBC executives decided to continue showing the game until its conclusion, but were unable to communicate this decision to their technicians because the network’s phone lines were jammed with viewers wondering what would happen if the game wasn’t over by the time the film was scheduled to start.
More than 50 years later, nearly everything about the way visual media like live sports are delivered to consumers has changed. But modern streaming infrastructure can be susceptible to the same kind of problems that led to the Heidi Bowl.
Consider, for example, Hulu’s coverage of the 2018 Super Bowl, which cut out for some viewers in the final moments of the game. And Hulu is far from the only victim. For example, the livestream of a long-anticipated Game of Thrones episode on HBO Now (the predecessor to HBO Max) went down almost immediately, earning HBO the ire of frustrated fantasy fans everywhere.
Serving streaming video – whether it’s live sports or Netflix-style on-demand content – can be tremendously challenging. Let’s take a closer look at some of those challenges and how they can be addressed.
For reference, here’s a basic architecture of the sort that might be used by a major video streaming service:
Looking at the above architecture, it’s clear that a lot relies on the metadata store at the center of the diagram. Among other things, that metadata is used to determine whether a viewer has access to a particular piece of content, so bugs or mistakes can mean subscribers don’t get access to the content they’re paying for.
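To make that concrete, here is a minimal sketch of what an entitlement check against that metadata might look like. Everything here (`ContentMeta`, `user_has_access`, the tier names) is illustrative, not the API of any real streaming service:

```python
# Hypothetical entitlement check: does this viewer's tier and region
# qualify them to watch this title, according to its metadata?
from dataclasses import dataclass, field

@dataclass
class ContentMeta:
    content_id: str
    required_tier: str                       # e.g. "basic" or "premium"
    licensed_regions: set = field(default_factory=set)

TIER_RANK = {"basic": 0, "premium": 1}

def user_has_access(meta: ContentMeta, user_tier: str, user_region: str) -> bool:
    """A viewer sees the title only if both their tier and region qualify."""
    return (TIER_RANK[user_tier] >= TIER_RANK[meta.required_tier]
            and user_region in meta.licensed_regions)

show = ContentMeta("s01e01", "premium", {"US", "CA"})
print(user_has_access(show, "premium", "US"))   # True
print(user_has_access(show, "basic", "US"))     # False
```

If the metadata backing a check like this is wrong or stale, paying subscribers get locked out – which is exactly why the store feeding it matters so much.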
For a media streaming company, having a piece of content such as a new show go viral presents a massive business opportunity. But having hot content also presents an engineering challenge as your services are bombarded with requests for the same content.
For example, caching systems are likely required to ensure that the metadata store itself isn’t overrun. But that introduces the potential for bugs related to outdated caches serving users the wrong content or no content at all.
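A common pattern here is cache-aside with a TTL: reads check the cache first and only fall through to the metadata store on a miss, while the TTL bounds how long a stale entry can survive. The sketch below is a deliberately simplified, single-process version of that idea (all names are made up):

```python
# A minimal cache-aside layer in front of the metadata store. The TTL
# bounds staleness; invalidate() lets us evict entries proactively.
import time

class MetadataCache:
    def __init__(self, fetch_from_db, ttl_seconds=30):
        self._fetch = fetch_from_db          # fallback to the source of truth
        self._ttl = ttl_seconds
        self._entries = {}                   # content_id -> (expires_at, value)

    def get(self, content_id):
        hit = self._entries.get(content_id)
        if hit and hit[0] > time.monotonic():
            return hit[1]                    # fresh cache hit
        value = self._fetch(content_id)      # miss or expired: hit the database
        self._entries[content_id] = (time.monotonic() + self._ttl, value)
        return value

    def invalidate(self, content_id):
        self._entries.pop(content_id, None)

db_calls = []
def fake_db(content_id):
    db_calls.append(content_id)              # stand-in for a real metadata query
    return {"id": content_id, "title": "Pilot"}

cache = MetadataCache(fake_db)
cache.get("s01e01")
cache.get("s01e01")          # second read is served from cache
print(len(db_calls))         # 1 -- the store was only hit once
```

The trade-off is visible right in the TTL: a longer TTL shields the database better under a viral spike, but widens the window in which viewers can be served outdated metadata.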
One of the unseen challenges here is that what looks like a single piece of content on the front end can be quite a bit more complex in the back end. A single episode of television, for example, may have different versions and thus different metadata for different geographical regions. Users may need to be served slightly different versions of the content and its metadata depending on their subscription tier and other account settings. And users in different locations may need to be served different ads (based on their location or based on the user themselves) that must be interspersed throughout the content and timed perfectly to avoid interrupting it.
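One way to picture this complexity: a single logical title fans out into many (region, tier) variants, each with its own asset and ad schedule. The lookup below is a toy illustration with invented data, but it shows how "play episode one" is really a multi-key resolution problem:

```python
# Sketch: the same logical title maps to different playable variants
# depending on region and subscription tier. Data is invented.
VARIANTS = {
    # (content_id, region, tier) -> asset + ads to splice in
    ("s01e01", "US", "ad_free"):  {"asset": "s01e01-us.mp4", "ads": []},
    ("s01e01", "US", "ad_based"): {"asset": "s01e01-us.mp4", "ads": ["us-ad-1"]},
    ("s01e01", "DE", "ad_based"): {"asset": "s01e01-de.mp4", "ads": ["de-ad-1"]},
}

def resolve_variant(content_id, region, tier):
    variant = VARIANTS.get((content_id, region, tier))
    if variant is None:
        raise LookupError(f"no licensed variant of {content_id} for {region}/{tier}")
    return variant

print(resolve_variant("s01e01", "DE", "ad_based")["asset"])   # s01e01-de.mp4
```

Every one of those variant rows is more metadata that has to stay correct in every cache layer, for every user, under load.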
So when you’ve got a piece of hot content, you need to be aware of both:
1. How your system handles the total volume, i.e. large numbers of people requesting the same content at the same time, and…
2. How your system handles the complexities of serving the right content and metadata versions to each user, and whether any of those systems could be compromised when the system comes under heavy load.
Building a layered caching system can help solve problem #1, but then each layer of cache needs to be kept in close sync with your source-of-truth database, or you’re likely to encounter issues related to problem #2. This can be particularly challenging when dealing with popular live-streamed events like breaking news, live sports, or high-profile television and movie premieres, where even a brief moment of misalignment between the cache layers can lead to viewers missing some of the content they’re paying for.
Even when things go right, dealing with problem #2 can also involve a lot of manual work. For example, it may be necessary to build message queuing systems to communicate changes from the source-of-truth metadata store to the various caching layers.
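As a rough sketch of that manual work, the snippet below pushes invalidation events from the source of truth onto a bus that every cache layer consumes. Here Python’s `queue.Queue` stands in for a real message broker, and the class and function names are invented:

```python
# Sketch: propagating metadata changes from the source of truth to
# multiple cache layers via a shared queue, so the layers don't drift
# apart independently. queue.Queue stands in for a real broker.
import queue

invalidation_bus = queue.Queue()

class CacheLayer:
    def __init__(self, name):
        self.name = name
        self.entries = {"s01e01": {"title": "Pilot (old cut)"}}  # stale copy

    def apply(self, event):
        # Drop the stale entry; the next read repopulates from the database.
        self.entries.pop(event["content_id"], None)

def publish_change(content_id):
    invalidation_bus.put({"content_id": content_id})

def drain(layers):
    while not invalidation_bus.empty():
        event = invalidation_bus.get()
        for layer in layers:
            layer.apply(event)

layers = [CacheLayer("edge"), CacheLayer("regional")]
publish_change("s01e01")     # the metadata was just updated at the source
drain(layers)
print(all("s01e01" not in l.entries for l in layers))   # True
```

In production this plumbing means standing up and operating a broker, handling delivery failures, and reasoning about ordering – which is exactly the work a built-in change feed can absorb.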
Is there an easier way?
There’s no database on earth that can make operating a global-scale media streaming company easy. However, choosing the right database for your source-of-truth metadata store can make dealing with some of these challenges a bit easier.
CockroachDB is a next-generation distributed SQL database that combines the easy scalability of databases like Cassandra with the ironclad ACID guarantees of a relational database. It can be scaled infinitely (horizontally and geographically) without requiring any modification of your application, because it can always be treated as a single logical database. That means that even if hot content catches you by surprise, the database can be scaled up to meet that demand instantly, with no need to make any adjustments to your application logic.
CockroachDB also includes some specific features that address the pains of live-streaming hot content.
For example, CockroachDB’s change data capture (CDC) feature allows you to keep cache layers informed of changes to the source-of-truth metadata database without having to manually build message queuing systems or integrate additional tools into your stack.
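To illustrate the consuming side, here is a toy handler that turns changefeed-style messages into cache evictions. The sample message mimics the general shape of CockroachDB’s JSON changefeed envelope, where the updated row appears under an `"after"` field – check the changefeed documentation for the exact format your deployment emits, since this is a simplified stand-in:

```python
# Sketch of a consumer that turns changefeed messages into cache
# invalidations. The message below imitates the shape of a JSON
# changefeed payload (updated row under "after"); it is illustrative,
# not a captured production message.
import json

cache = {"s01e01": {"title": "Pilot (stale)"}}

def handle_changefeed_message(raw: str):
    event = json.loads(raw)
    row = event.get("after")
    if row:                                  # insert/update: evict the stale copy
        cache.pop(row["content_id"], None)

handle_changefeed_message('{"after": {"content_id": "s01e01", "title": "Pilot"}}')
print("s01e01" in cache)    # False -- the stale entry is gone
```

The point is that the database emits the change events for you; your code only has to decide what a change means for each cache layer.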
Another CockroachDB feature, row-level data homing, makes optimizing performance for users across the globe easier and may help you avoid some of the issues related to serving users the correct localized version of the content and ads they’re viewing.
But you don’t have to take our word for it! Spin up a free serverless cluster in seconds and start kicking the tires for yourself.
The details in this post are based on The Netflix Tech Blog post titled “Towards a Reliable Device Management Platform”.