Since January 28, 2020, the EU has issued $192 million (€158.5 million) in fines for GDPR (General Data Protection Regulation) violations (DLA Piper, Engadget). Although companies have had years to become fully GDPR compliant, compliance is not easy. After the GDPR took effect in 2018, we spoke with Cockroach Labs co-founder and CEO Spencer Kimball, who detailed the nuances of the law and what companies should do to comply within the context of their databases. This is a recap of that conversation from 2018.
Editor's Note: Portions of the conversation below have been edited for accuracy and to better reflect the current state of the GDPR.
Sean Loiselle: What does GDPR mean for the database itself? What are the implications or what are the technologies that GDPR affects?
Spencer Kimball: Well, it's important to keep in mind that GDPR isn't so much about the technologies that are used as it is about penalizing companies that fail to assume greater responsibility for protecting their customers' data. It's concerned with incentivizing better data privacy and governance outcomes, not mandating any particular piece of technology or technological standards. This is really the only feasible approach that also has any hope of not stifling innovation, because the technological landscape is evolving so rapidly.
The guiding principle behind GDPR is privacy by design. This means that security best practices must be implemented from the ground up, instead of tacked on as afterthoughts. In practice, this means that the infrastructure which supports a service must also have been designed with similar forethought for privacy concerns. GDPR judges outcomes, which are the product of end-to-end design, and a service composed of various layered technologies is only as secure as its weakest link.
Databases naturally assume a heavy share of the burden, as they broker access to and maintain the permanent storage of the data. For example, any database which is worth its salt must encrypt data in flight from the database to the application server(s). However, a further step can be taken to eliminate other threat vectors: encrypting the data when it is stored on physical media, that is, at rest.
Article 32 of the GDPR, which addresses “security of processing”, never explicitly mandates encryption in flight or at rest, because it maintains a less specific, but more encompassing directive that “the controller and processor shall implement appropriate technical … measures... to ensure a level of security appropriate to the risk.” The type and amount of data, as well as the severity of the harm if it were to be inadvertently exposed, must be taken into consideration, as well as the likelihood of an array of risks leading to security breaches. In the end, each company must look to what the industry as a whole is doing and make sure that, at the very least, it is not inviting risks which are out of step with industry peers. Actors involved in GDPR compliance must ultimately rely on good judgement, and also expect that to evolve over time.
CockroachDB encrypts data in flight and at rest with sophisticated key management mechanisms. While GDPR doesn't explicitly mandate that either of these encryption features be employed, you'd be on thin ice without both if, for example, you're planning to store significant customer financial data.
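As a rough illustration of the "in flight" half of this, an application can refuse to talk to its database over an unverified or outdated TLS connection. The sketch below uses only Python's standard `ssl` module. The function name `make_db_tls_context`, the optional `ca_file` parameter, and the choice of TLS 1.2 as a floor are assumptions made for the example, not requirements stated by GDPR or by any particular database.

```python
import ssl


def make_db_tls_context(ca_file=None):
    """Build a client-side TLS context for a database connection (hypothetical helper).

    ca_file: optional path to the CA certificate that signed the database's
    server certificate. If omitted, the system's default trust store is used.
    """
    # create_default_context() turns on certificate verification and
    # hostname checking, so the client will not accept an impostor server.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Assumption for this sketch: reject anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_db_tls_context()
```

Many database drivers accept a context like this, or equivalent connection settings, when opening a connection. Encryption at rest is a separate concern that is typically configured on the database or storage side rather than in application code.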
Spencer Kimball: That's just a taste of what you must rely on from the database to comply with GDPR. There's plenty more about availability, access, integrity, testing, etc. But the most vexing aspects of GDPR go beyond what any database can hope to control because the compliance must cover the entire array of systems which a company uses to process customer data. GDPR aims not just to protect personal data, but to make it visible to its ultimate owners as well as make it disappear at their behest (the “right to be forgotten”). To comply with these requirements, a company first has to figure out where all the data for a customer lives. It turns out that's a truly Herculean task for companies which have been processing data for decades. It could encompass hundreds of systems, thousands to hundreds of thousands of different data stores, including even random CSV files! That's the big data governance challenge. You've got data exhaust spread out over decades. The mind boggles.
Sean Loiselle: Data exhaust. I've not heard that term before.
Spencer Kimball: Think about exporting a SQL report into an Excel spreadsheet which then gets stored on some local hard drive somewhere. Over the years that gets copied into network storage and backed up in who knows how many places. That's just one simple data exhaust journey from a core system to the periphery. And core systems back up and migrate, upgrade, and splinter. This kind of data is everywhere and that's a very big challenge to overcome. Unfortunately, CockroachDB isn't going to solve those kinds of problems.
But that segues into an area where new database technologies like CockroachDB have a really bright future: data sovereignty and domiciling. The GDPR imposes restrictions on the transfer of personal data outside of the European Union, except to countries which have been deemed to have “data protection adequacy”. Interestingly, the United States has not been given the European Commission's adequacy imprimatur.
Sean Loiselle: Is that true – the United States is not a trusted non-EU jurisdiction?
Spencer Kimball: Correct. The United States is seemingly unconvinced that data privacy is a fundamental right. For the European Commission to adopt an adequacy decision, a country's regulations must be at least as good as the EU's. If they're not, personal data may still be transferred, but only through very specific legally permitted transfer mechanisms. One such mechanism for transfers to the United States, Privacy Shield, was struck down in 2020, and that decision cast a great deal of doubt over the validity of standard contractual clauses, which most companies rely upon for those transfers today. This is a big problem for anyone paying attention: not only do businesses need an adequate transfer mechanism, but the ground is also shifting underneath us by the day around which mechanisms are legally sufficient! Even if you have a way to transfer the data outside of the EU, you still need to tell your customers that you will be transferring their data and what transfer mechanism you will use. One of the things that GDPR goes above and beyond to make clear is that transparency to end users is very important and shouldn’t be buried five clicks deep on your website. End users need to be affirmatively told where their data is being sent.
Sean Loiselle: Can you just provide the same kind of notifications you see on EU websites about using cookies, where someone has to accept it to continue using the site?
Spencer Kimball: Exactly, that's what many companies will need to do. And that's not necessarily such a tall order... Add an interstitial to your website which pops up, lets the user know that their personal data will be transferred for processing and storage to the United States, for example, and asks for consent where it's needed. That's really where GDPR requirements end, and good business sense kicks in. Companies have to start thinking, "Okay, but what do our customers want?" This is a point that Kindred brings up. They're keeping financial data for their customers. Those users, whether they're in the EU or whether they're in Australia, naturally would prefer that personal data be stored within their respective legal jurisdictions.
Let's consider the business decision which must be made by a fictional company which has traditionally stored its data in an Oracle RDBMS instance in a US-east datacenter. With GDPR, this company now must interrupt its European users to explain the company’s plans to transfer and store their personal data in the United States. If they have a regional competitor with a local service, there is every reason to believe they will lose customers who find this plan disagreeable, or even alarming. Make no mistake, these requirements are fundamentally warnings from the European Commission to consumers.
Things become even more problematic for SaaS businesses, where personal data is processed and stored on behalf of other companies' customers. In these circumstances, adverse reactions can have a more dramatic impact on the bottom line because the level of concern is magnified. This is because it's shared both by the end consumer, who must still consent to or be informed about the transfers, and also by the SaaS customer who must assess the cost to their business of using a non-local provider for the same service.
How does our fictional company solve this emerging problem? If they chose to keep their existing data architecture, they'd need to create two versions of their service: one for the United States, and a new siloed service for the European Union. The better solution is to use CockroachDB's geo-partitioning feature, which provides the ability to domicile data in close proximity to customers. Physically the database would have nodes in both the US and the EU, but to applications it would present logically as a monolithic database. While CockroachDB has had the ability since version 1.0 to replicate databases and tables with geographic constraints, geo-partitioning significantly extends this by allowing row-level control.
Data domiciling is where we provide a differentiating capability for GDPR compliance. It's not compliance per se, but it unlocks the ability to create architectures where data can be processed and stored local to the customer, avoiding the costs associated with cross-border data transfers. It's also worth mentioning that this can result in significantly reduced customer-perceived latencies!
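To make the idea of row-level domiciling more concrete, here is a minimal Python sketch. It is a toy in-memory model, not CockroachDB's implementation or API: the `Node` and `GeoPartitionedTable` classes, the `COUNTRY_TO_REGION` mapping, and the region names are all hypothetical and exist only for illustration. It shows the two properties described above: each row is stored only on nodes in the region its customer belongs to, while the application reads and writes through a single logical table.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a row's country column to its home region.
COUNTRY_TO_REGION = {"DE": "eu", "FR": "eu", "US": "us"}


@dataclass
class Node:
    """One database node, located in a single region."""
    name: str
    region: str
    rows: dict = field(default_factory=dict)  # primary key -> row


class GeoPartitionedTable:
    """Toy model: one logical table whose rows are pinned to regions."""

    def __init__(self, nodes):
        self.nodes_by_region = {}
        for node in nodes:
            self.nodes_by_region.setdefault(node.region, []).append(node)

    def insert(self, key, row):
        country = row["country"]
        if country not in COUNTRY_TO_REGION:
            # Refuse to guess where personal data should live.
            raise ValueError(f"no home region configured for {country!r}")
        region = COUNTRY_TO_REGION[country]
        # Replicate the row, but only onto nodes inside its home region.
        for node in self.nodes_by_region[region]:
            node.rows[key] = row

    def get(self, key):
        # The application asks one logical table; it never names a region.
        for nodes in self.nodes_by_region.values():
            for node in nodes:
                if key in node.rows:
                    return node.rows[key]
        return None

    def regions_holding(self, key):
        """Return the set of regions that physically store the given row."""
        return {
            region
            for region, nodes in self.nodes_by_region.items()
            if any(key in node.rows for node in nodes)
        }


table = GeoPartitionedTable([Node("n1", "us"), Node("n2", "eu"), Node("n3", "eu")])
table.insert(1, {"name": "Anna", "country": "DE"})
table.insert(2, {"name": "Bob", "country": "US"})
```

In this sketch `table.get(1)` returns Anna's row without the caller knowing where it lives, while `table.regions_holding(1)` shows that the row exists only on the EU nodes. A real database would additionally have to handle replication, transactions, and failures, which is exactly the work a hand-built middleware layer would otherwise take on.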
Sean Loiselle: Could you do something like this with other technologies on the market? For example, say you're using Cassandra, you're a small SaaS company, how would you approach data domiciling if you wanted to?
Spencer Kimball: Cassandra does have some of these same capabilities if you squint really hard. In point of fact, with talented engineers, you could string together multiple instances of MySQL or Oracle and build a middleware layer which abstracts access from the application to hide the complexity of what's happening beneath the hood. But then you'd essentially have built another database, and it's yours to maintain. Ouch.
What CockroachDB is bringing to this is a polished solution which otherwise looks just like a traditional monolithic RDBMS to application developers. The goal here is to make life easy for developers. Their energy is best spent iterating on the business use case, not struggling to write middleware, or dealing with the lack of transactionality or consistency.
As GDPR compliance continues to take effect, it is important to consider which tools and technologies will allow you to work smarter, not harder, to adhere to the new rules. CockroachDB is one such tool which allows companies to comply simply by nature of the technology.
Illustration by Christina Chung