Baidu: Supporting 50M Inserts a Day with CockroachDB in Production
Baidu runs CockroachDB to automate operations of two production applications that access 2TB of data with 50M inserts a day.
Baidu is a $90B+ internet company serving hundreds of millions of users with web products ranging from search to shopping to cloud storage. Their DBA team needs to support huge volumes of data while meeting the needs of internal application developers. With CockroachDB, Baidu gets a distributed database that scales horizontally while providing the SQL interface application developers are familiar with. CockroachDB now stores information gathered from across the web to drive interactive customer experiences.
Baidu’s adoption of CockroachDB began when several of its DBAs started testing it and contributing to it on GitHub. They evaluated CockroachDB against real application workloads, and that testing convinced them that CockroachDB had the right architecture to support their needs.
Baidu’s DBA team needs to support nearly a billion users accessing applications in production, which requires infrastructure that performs reliably at scale. They currently rely on MySQL to do the job, using multiple replicated shards and middleware to support critical applications. The team, however, wanted to try a different approach for a new application that needed to store increasing amounts of data while supporting continuous inserts with highly concurrent, real-time access. This application also needed secondary indexes to speed up queries, as well as support for basic real-time analytics to extract insights from existing data. Their existing MySQL deployment would require application developers to transform and modify data at the application layer, while NoSQL databases that sacrificed secondary indexes, aggregations, and transactions would similarly introduce complexity for application developers. For applications that needed scalable SQL, Baidu’s DBA team had to stick with a relational database. It was time to invest in a different database.
Baidu’s DBA team carried out a deep investigation into CockroachDB, contributing to our open source project and providing valuable feature feedback. They found that CockroachDB could handle their new application use case more elegantly than MySQL could, with no middleware for operators to maintain and less complexity for application developers. Development teams could keep using SQL, while DBAs could provide a faster Recovery Time Objective (the target time in which to recover from a disaster) and keep up with the growth needs of their development teams without having to modify or transform data for application-level use. They could add capacity by simply spinning up another server, installing CockroachDB, re-configuring a load balancer, and pointing the new node at an existing cluster. The database would then automatically route query traffic, rebalance data, and replicate it, all transparently to developers and operators. Better yet, all of the hardware Baidu provisioned to run CockroachDB was utilized to serve live application traffic.
Baidu’s DBA team is now running CockroachDB in production to support two new applications that would previously have used MySQL. These applications access 2TB of data with 50M inserts a day, taking advantage of SQL features like secondary indexes and distributed SQL queries. Baidu’s deployment of CockroachDB is simple: ten nodes installed on bare metal servers, with a load balancer sitting above them to distribute traffic. With CockroachDB, Baidu’s DBA team can automate many of their manual processes, including setting up replication, managing rebalancing, and surviving failures. Baidu’s DBA team continues to contribute to CockroachDB, helping to build new features and improve the product’s usability. They have also partnered with Cockroach Labs to popularize CockroachDB globally with our first China-based meetup. We are excited to collaborate with Baidu to improve and spread the word about CockroachDB.
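For a sense of scale, the 50M inserts a day and ten-node figures above can be turned into a rough average write rate. This is only a back-of-the-envelope sketch: it assumes inserts are spread evenly over the day and evenly across nodes by the load balancer, and it ignores the extra writes that replication generates. Real traffic almost certainly has peaks well above the average, and the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: int) -> float:
    """Average events per second, assuming a perfectly even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


inserts_per_day = 50_000_000  # figure quoted in the article
node_count = 10               # figure quoted in the article

cluster_rate = average_rate_per_second(inserts_per_day)
# Assumption: the load balancer spreads client inserts evenly across nodes.
# Replication traffic between nodes is not counted here.
per_node_rate = cluster_rate / node_count

print(f"{cluster_rate:.0f} inserts/s cluster-wide")  # about 579
print(f"{per_node_rate:.0f} inserts/s per node")     # about 58
```

The averages are modest per node, which suggests that the concurrency, real-time access, and steady data growth described earlier are the harder constraints, not raw insert throughput alone.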