GDPR compliance is not easy, but CockroachDB can help

In the wake of GDPR going into effect last month, we have seen a flurry of emails as companies scramble to comply with the new rules. Successful GDPR compliance doesn’t stop there, though. Companies will be adjusting for the next few months as they work to become fully GDPR compliant. We spoke with Cockroach Labs co-founder and CEO Spencer Kimball, who detailed the nuances of the law and what companies should do to comply within the context of their databases.

Sean Loiselle: What does GDPR mean for the database itself? What are the implications or what are the technologies that GDPR affects?

Spencer Kimball: Well, it's important to keep in mind that GDPR isn't so much about the technologies that are used as it is about penalizing companies that fail to assume greater responsibility for protecting their customers’ data. It’s concerned with incentivizing better data privacy and governance outcomes, not mandating any particular piece of technology or technological standards. This is really the only feasible approach that also has any hope of not stifling innovation, because the technological landscape is evolving so rapidly.

The guiding principle behind GDPR is privacy by design. This means that security best practices must be implemented from the ground up, instead of tacked on as afterthoughts. In practice, this means that the infrastructure which supports a service must also have been designed with similar forethought for privacy concerns. GDPR judges outcomes, which are the product of end-to-end design, and a service composed of various layered technologies is only as secure as its weakest link.

Databases naturally assume a heavy share of the burden, as they broker access to and maintain the permanent storage of the data. For example, any database which is worth its salt must encrypt data in flight from the database to the application server(s). However, a further step can be taken to eliminate other threat vectors, such as encrypting the data when it is stored to physical media, or “at rest”.

Article 32 of the GDPR, which addresses “security of processing”, never explicitly mandates encryption in flight or at rest, because it maintains a less specific, but more encompassing directive that “the controller and processor shall implement appropriate technical … measures … to ensure a level of security appropriate to the risk.” The type and amount of data, as well as the severity of the harm if it were to be inadvertently exposed, must be taken into consideration, as well as the likelihood of an array of risks leading to security breaches. In the end, each company must look to what the industry as a whole is doing and make sure that, at the very least, it is not inviting risks which are out of step with industry peers. Actors involved in GDPR compliance must ultimately rely on good judgement, and also expect that to evolve over time.

CockroachDB encrypts data in flight by default, and has since the beginning. Version 2.1 is adding encryption at rest with sophisticated key management mechanisms. While GDPR doesn’t explicitly mandate that either of these encryption features be employed, you’d be on thin ice without both if, for example, you’re planning to store significant customer financial data.

Spencer Kimball: That’s just a taste of what you must rely on from the database to comply with GDPR. There’s plenty more about availability, access, integrity, testing, etc. But the most vexing aspects of GDPR go beyond what any database can hope to control because the compliance must cover the entire array of systems which a company uses to process customer data. GDPR aims not just to protect personal data, but to make it visible to its ultimate owners as well as make it disappear at their behest (“the right to be forgotten”). To comply with these requirements, a company first has to figure out where all the data for a customer lives. It turns out that’s a truly Herculean task for companies which have been processing data for decades. It could encompass hundreds of systems, thousands to hundreds of thousands of different data stores, including even random CSV files! That's the big data governance challenge. You've got data exhaust spread out over decades. The mind boggles.

Sean Loiselle: Data exhaust. I've not heard that term before.

Spencer Kimball: Think about exporting a SQL report into an Excel spreadsheet which then gets stored on some local hard drive somewhere. Over the years that gets copied into network storage and backed up in who knows how many places. That's just one simple data exhaust journey from a core system to the periphery. And core systems back up and migrate, upgrade, and splinter. This kind of data is everywhere and that's a very big challenge to overcome. Unfortunately, CockroachDB isn't going to solve those kinds of problems.

But that segues into an area where new database technologies like CockroachDB have a really bright future: data sovereignty and domiciling. The GDPR imposes restrictions on the transfer of personal data outside of the European Union, except to countries which have been deemed to have data protection “adequacy”. Interestingly, the United States has not been given the European Commission’s adequacy imprimatur.

Sean Loiselle: Is that true – the United States is not a trusted non-EU jurisdiction?

Spencer Kimball: It is. The United States is seemingly unconvinced that data privacy is a fundamental right. For the European Commission to adopt an adequacy decision, a country’s regulations must be at least as good as the EU’s. If they’re not, personal data may still be transferred, but only with the user’s explicit consent (or else only in specific cases and with significant restrictions). One of the things that GDPR goes above and beyond to make clear is that this consent cannot be buried in end user license agreements (EULAs). Consent must be explicit, front and center.

Sean Loiselle: Can you just provide the same kind of notifications you see on EU websites about using cookies, where someone has to accept it to continue using the site?

Spencer Kimball: Exactly, that's what you’d need to do. And that’s not necessarily such a tall order... Add an interstitial to your website which pops up and asks the user to agree that their personal data will be transferred for processing and storage to the United States, for example. That's really where GDPR requirements end, and good business sense kicks in. Companies have to start thinking, "Okay, but what do our customers want?" This is a point that Kindred brings up. They're keeping financial data for their customers. Those users, whether they're in the EU or whether they're in Australia, naturally would prefer that personal data be stored within their respective legal jurisdictions.

Let’s consider the business decision which must be made by a fictional company which has traditionally stored its data in an Oracle RDBMS instance in an East Coast datacenter. With GDPR, this company now must interrupt its European users for consent to transfer and store their personal data in the United States. If they have a regional competitor with a local service, there is every reason to believe they will lose customers who find the consent disagreeable, or even alarming. Make no mistake, these consents are fundamentally warnings from the European Commission to consumers.

Things become even more problematic for SaaS businesses, where personal data is processed and stored on behalf of other companies’ customers. In these circumstances, adverse reactions can have a more dramatic impact on the bottom line because the level of concern is magnified. This is because it’s shared both by the end consumer, who must still be served a consent request, and also by the SaaS customer who must assess the cost to their business of using a non-local provider for the same service.

How does our fictional company solve this emerging problem? If they chose to keep their existing data architecture, they’d need to create two versions of their service: one for the United States, and a new siloed service for the European Union. The better solution is to use CockroachDB’s geo-partitioning feature, which provides the ability to domicile data in close proximity to customers. Physically the database would have nodes in both the US and the EU, but to applications it would present logically as a monolithic database. While CockroachDB has had the ability since version 1.0 to replicate databases and tables with geographic constraints, geo-partitioning significantly extends this by allowing row-level control.

Data domiciling is where we provide a differentiating capability for GDPR compliance. It's not compliance per se, but it unlocks the ability to create architectures where data can be processed and stored local to the customer, avoiding the costs associated with consent for cross border data transfers. It’s also worth mentioning that this can result in significantly reduced customer-perceived latencies!
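
To make the idea of row-level control concrete, here is a minimal conceptual sketch in Python. It is not CockroachDB's API or configuration syntax; it only models the decision being described, where the value of a column in each row determines which region holds that row. The partition names, country codes, locality strings, and the `UserRow`, `partition_for`, and `locality_for` names are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical partition layout. The names, country lists, and locality
# strings are illustrative assumptions, not real CockroachDB settings.
PARTITIONS = {
    "europe": {"countries": {"DE", "FR", "SE"}, "locality": "region=eu-west"},
    "north_america": {"countries": {"US", "CA"}, "locality": "region=us-east"},
}

# Assumption for this sketch: rows that match no partition fall back here.
DEFAULT_PARTITION = "north_america"


@dataclass(frozen=True)
class UserRow:
    user_id: int
    country: str  # the column whose value decides where the row is stored


def partition_for(row: UserRow) -> str:
    """Return the name of the partition that a single row belongs to."""
    for name, spec in PARTITIONS.items():
        if row.country in spec["countries"]:
            return name
    return DEFAULT_PARTITION


def locality_for(row: UserRow) -> str:
    """Return the locality constraint under which the row would be stored."""
    return PARTITIONS[partition_for(row)]["locality"]
```

In this model, `locality_for(UserRow(1, "DE"))` resolves to the EU locality while a row with country `"US"` resolves to the US one, even though both rows belong to the same logical table. A real deployment would also have to decide how to treat rows that match no partition; the fallback used here is only a placeholder.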

Sean Loiselle: Could you do something like this with other technologies on the market? For example, say you're using Cassandra, you're a small SaaS company, how would you approach data domiciling if you wanted to?

Spencer Kimball: Cassandra does have some of these same capabilities if you squint really hard. In point of fact, with talented engineers, you could string together multiple instances of MySQL or Oracle and build a middleware layer which abstracts access from the application to hide the complexity of what’s happening beneath the hood. But then you’d essentially have built another database, and it’s yours to maintain. Ouch.

What CockroachDB is bringing to this is a polished solution which otherwise looks just like a traditional monolithic RDBMS to application developers. The goal here is to make life easy for developers. Their energy is best spent iterating on the business use case, not struggling to write middleware, or dealing with the lack of transactionality or consistency.
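
As a rough illustration of the hand-built middleware approach mentioned above, the following Python sketch routes requests to one independent database per region. It is a toy under stated assumptions: in-memory SQLite databases stand in for separate MySQL or Oracle instances, and the `RegionRouter` class, its methods, and the `users` table are hypothetical names invented for this example.

```python
import sqlite3


class RegionRouter:
    """Toy middleware that hides several regional databases behind one object."""

    def __init__(self, regions):
        # In-memory SQLite databases stand in for separate regional instances.
        self.dbs = {region: sqlite3.connect(":memory:") for region in regions}
        for db in self.dbs.values():
            db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

    def insert_user(self, region, user_id, name):
        # The caller (or the middleware) must know which region owns the user.
        db = self.dbs[region]
        db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name))
        db.commit()

    def find_user(self, user_id):
        # No single instance knows where a user lives, so a lookup by id
        # has to fan out to each region in turn.
        for region, db in self.dbs.items():
            row = db.execute(
                "SELECT name FROM users WHERE id = ?", (user_id,)
            ).fetchone()
            if row is not None:
                return region, row[0]
        return None


router = RegionRouter(["eu", "us"])
router.insert_user("eu", 1, "Anna")
router.insert_user("us", 2, "Bob")

# The primary key is only enforced per instance: the same id can be
# inserted into another region without any error.
router.insert_user("us", 1, "Duplicate")
```

Even at this size the sketch shows the maintenance burden: lookups fan out across regions, and nothing enforces uniqueness or transactions across instances unless the middleware implements that itself.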

Sean Loiselle: What doesn’t work well, or maybe a better question would be: what’s next?

Spencer Kimball: While geo-partitioning is a foundational building block for data domiciling, it currently lacks ways to define sophisticated access control policies. With those, it could implement policy enforcement and appropriate auditing mechanisms. There’s plenty of work to do in this area as geo-partitioning is adopted more widely.

Sean Loiselle: Gotcha. That's all the time we have with you. Thank you.

As GDPR compliance continues to take effect, it is important to consider which tools and technologies will allow you to work smarter, not harder, to adhere to the new rules. CockroachDB is one such tool which allows companies to comply simply by nature of the technology. To take the next step forward, keep an eye out for our upcoming PDF, Scale Your App with GDPR in Mind, for more details on how your company can scale with CockroachDB while fulfilling the rules for GDPR compliance.

Illustration by Christina Chung