How Frank McSherry’s company achieves sub-10ms latency at scale

Last edited on January 10, 2024

    How do you build an application that’s scalable, correct, and fast all at the same time? The folks at Materialize know.

    Parker Timmerman, a member of technical staff at Materialize, spoke at Cockroach Labs’s RoachFest 2023 customer conference about his team’s experience building and transitioning their high-speed data warehouse to the cloud.

    What is Materialize, and what can it do?

    Materialize is, in Timmerman’s words, “an operational data warehouse.” The company was co-founded by Chief Scientist Frank McSherry, and its product is built on open-source frameworks Timely Dataflow and Differential Dataflow, which he created.

    The basic idea of Materialize is that while traditional data warehouses can store tons of data, making it easy to mine for past-performance insights, they’re generally not fast enough to be useful for serving real-time results. Building bespoke solutions to serve this data is costly and complex enough to be out of reach for most businesses. Materialize can stream data from various sources (your production database, Kafka, etc.) in real time with sub-second latency, allowing businesses to leverage the data near-instantly.

    This means that the two core requirements for Materialize’s system are speed and correctness. “Doing both of these things is kind of difficult,” Timmerman says.

    “Trying to go fast and be correct isn’t the easiest thing, but spoiler alert: CockroachDB helps us balance this tension.”

    Check out the full recording of Materialize’s RoachFest talk here.

    Transitioning to the cloud

    Materialize didn’t start out using CockroachDB, though. The product started as a single binary running on a single node. This helped the team focus on the core value of the product (the incremental computation that lets it serve sub-10ms responses to complex queries), but running on one node capped both maximum database size and fault tolerance.
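
    To make "incremental computation" concrete, here is a minimal sketch of an incrementally maintained aggregate. It is a toy illustration, not Materialize's implementation or the Differential Dataflow API: the class name `IncrementalSum`, its methods, and the sample keys are all hypothetical. The idea it shows is that each change to the input updates the stored result directly, so a read is a lookup rather than a rescan of the data.

```python
from collections import defaultdict


class IncrementalSum:
    """Toy incrementally maintained view, roughly:
    SELECT key, SUM(value) FROM input GROUP BY key.
    Hypothetical illustration only; not Materialize's API.
    """

    def __init__(self):
        self.totals = defaultdict(int)  # key -> running sum
        self.counts = defaultdict(int)  # key -> number of live rows

    def apply(self, key, value, diff):
        """Apply one change: diff=+1 inserts a row, diff=-1 retracts it."""
        self.totals[key] += diff * value
        self.counts[key] += diff
        if self.counts[key] == 0:
            # No rows remain for this key, so remove it from the view.
            del self.totals[key]
            del self.counts[key]

    def read(self, key):
        """Serve the current result without rescanning the input."""
        return self.totals.get(key)


view = IncrementalSum()
view.apply("orders:alice", 30, +1)
view.apply("orders:alice", 12, +1)
view.apply("orders:bob", 7, +1)
view.apply("orders:alice", 30, -1)  # retract the first row
```

    The work per change is constant regardless of how much data has already been processed, which is the property that makes low-latency reads possible in this toy model.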

    Materialize knew that those limitations wouldn’t work for customers in production, so in 2022 they moved to a cloud-based architecture that could offer high availability and scalability. Critically, they needed to accomplish this without sacrificing those core requirements of speed and correctness.

    To accomplish this, Materialize focused on adopting managed services to reduce their operational burden – Amazon EKS for Kubernetes, Amazon S3 for blob storage, etc. But they needed a metadata layer, a source of truth that could maintain the records of all user objects (tables, views, etc.) as well as all of the blobs in S3. This meant choosing a database.

    Given they were already invested in the Amazon ecosystem, RDS/Aurora and DynamoDB seemed like worthwhile options to consider, but they also wanted to look at FoundationDB and CockroachDB.

    All four of these database options met their needs for consistency and data correctness. But Materialize also needs speed: sub-10ms reads, to be specific. In their testing, that requirement eliminated DynamoDB (“We were seeing 10-20ms reads,” Timmerman says).

    With speed and correctness checked off for the other three databases, they then looked at another important factor: scalability. Materialize is connection-hungry: “just a single user might have tens or hundreds of connections to our metadata layer,” Timmerman says. “That eliminated RDS and Aurora; during the evaluation phase we saw that these products have a hard cap of 5,000 connections, and that just didn’t work for us. We’d very quickly run into those scaling limits.”

    That left FoundationDB and CockroachDB, but FoundationDB doesn’t offer a managed solution, and Materialize’s goal with the cloud transformation was to reduce its own operational burden. CockroachDB’s managed solution, CockroachDB Dedicated, allowed Materialize to pass that burden along to the Cockroach Labs team and devote its own resources to building the core product.

    “CockroachDB was the only product that satisfied all of [our] requirements,” Timmerman says, “so it’s what we went with.”
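
    The evaluation described above is a process of elimination, and it can be summarized in a short sketch. The requirement names and the `eliminate` helper are hypothetical labels chosen for this illustration; the reasons each candidate was dropped are the ones reported in the talk, not independent benchmarks.

```python
# Requirements in the order the article walks through them.
REQUIREMENTS = [
    "consistency",
    "sub_10ms_reads",
    "connection_scaling",
    "managed_offering",
]

# The requirement each candidate failed, as reported in the talk.
# None means the candidate passed every requirement.
FAILED = {
    "DynamoDB": "sub_10ms_reads",        # 10-20ms reads seen in testing
    "RDS/Aurora": "connection_scaling",  # hard cap of 5,000 connections
    "FoundationDB": "managed_offering",  # no managed solution available
    "CockroachDB": None,
}


def eliminate(candidates, requirements, failed):
    """Apply requirements one at a time, recording who drops out at each step."""
    remaining = list(candidates)
    log = []
    for req in requirements:
        dropped = [c for c in remaining if failed[c] == req]
        remaining = [c for c in remaining if failed[c] != req]
        log.append((req, dropped))
    return remaining, log


remaining, log = eliminate(FAILED.keys(), REQUIREMENTS, FAILED)
for req, dropped in log:
    print(f"{req}: dropped {dropped or 'nobody'}")
print("selected:", remaining)
```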

    What it’s like to collaborate with CockroachDB

    Timmerman says that starting from the evaluation phase, working with Cockroach Labs has been a positive experience. “They’re very apt and ready to receive feedback,” he says. “They added a number of features that we requested: Amazon CloudWatch metrics, SSO, as well as a bunch of others.”

    The Cockroach Labs team also took care to stay in sync with Materialize throughout the process. There were monthly leadership sync meetings, weekly team sync meetings, and a joint Slack channel for direct communication at any time.

    The point of all of this, Timmerman says, “is making sure we’re a happy customer. Which, thumbs up, we are!”

    Implementation and technical details

    How does Materialize actually use CockroachDB, and how is it deployed? For all of the details, check out the full recording of Timmerman’s talk: