A modern approach to test data management and data masking

Around 2010, a customer of mine implemented data governance software and policies for the data in their production database. They put in the effort to ensure that their production data and systems were protected from the prying eyes of hackers and the treacherous fingers of unscrupulous software programmers.

However, they left their development and test database environments relatively alone with little to no data controls in place. Worse yet, they extracted production data to be used in development. Then a breach occurred in their development environment (you know, the one with production data). The breach resulted in millions of records containing credit card information and customer information being stolen. The result was a multi-million dollar fine.  

Part of the issue was that the customer didn’t have a simple, cost-effective, and efficient way to handle test data management and data masking. They didn’t have a way to keep that data up to date. They didn’t have a way to mask and refresh datasets quickly. They did the bare minimum and, as a result, paid a hefty sum. In their defense, test data management has traditionally been fraught with problems.

In this article we’ll discuss how to integrate CockroachDB into a DevOps environment using Delphix to automate Continuous Data and Continuous Compliance, and to make DevOps test data management more resilient, scalable, and compliant. As demand for CockroachDB has skyrocketed, so has the need for integrations that blend seamlessly into DevOps environments. We’ll also discuss the challenges and benefits of test data management architectures as they relate to CockroachDB.

Traditional Challenges with Test Data Management (TDM)

Understanding test data challenges is key to choosing and implementing the right tool. TDM presents many hurdles, but they fall into the following categories:

  • Environment Provisioning is a Slow, Manual, and High-Touch Process
  • Software Development Teams Lack High-Quality Data
  • Data Masking Is Increasingly Important, But Adds Friction To Release Cycles
  • Test Data Requirements And Storage Costs Are Continually Rising

To address these challenges, IT organizations need to adopt the tools and processes to efficiently make the right test data available to project teams. A comprehensive approach should seek to improve TDM in each of the following areas:

  1. Data Distribution: Reducing the time to provision test data
  2. Data Quality: Fulfilling requirements for high-fidelity test data
  3. Data Security: Minimizing security risks without compromising agility
  4. Infrastructure Costs: Lowering the costs of storing and archiving test data

For a deeper discussion of the fundamentals of test data management, refer to “What is test data management?”

The Distributed SQL Challenge

Software companies dedicated to TDM have been enabling enterprises to build, deploy, and maintain secure DevOps environments with “real world” data without actually exposing real production data. The rise of Distributed SQL databases threw TDM companies a bit of a curveball. Distributed SQL vendors all fundamentally try to address the same challenges: scale, latency, data resilience, and ANSI SQL language support.

That said, the similarities between the various Distributed SQL offerings end there. Without diving too deep into the architecture of CockroachDB, suffice it to say that CockroachDB handles multi-region, multi-cloud, and hybrid cloud deployments in a rather unique manner, while keeping the database simple to use and scale. (See: Reads and Writes in CockroachDB.)

How to conduct Test Data Management for a Distributed Database

Now, the big question is: How does one conduct TDM for a distributed database? You need the right tools for this job. If you’re using a distributed database like CockroachDB, Delphix is the tool for TDM. 

Delphix acts as the intermediary between CockroachDB and your dev environments. It provides clean, usable, real-world data while reducing your overall technical debt and creating a repeatable process that simplifies operations.

There are three main areas where you’d deploy Delphix with CockroachDB, and each addresses a very specific need.

Test Data Management use cases for Delphix

Delphix is typically used to create non-production environments from production sources. This section covers the main use cases for Delphix with CockroachDB:

  1. Data Masking of Sensitive Data 

  2. Application Development 

  3. Legacy application migration 

Masking of sensitive data 

Data regulations are on the rise. Countries and states all over the world are issuing privacy regulations that apply broadly to businesses. Ransomware concerns grow daily, and there is always a risk of leaking data to bad actors.

The Delphix Continuous Compliance platform can integrate with CockroachDB to profile and identify sensitive data. Built-in or custom rules can then be applied to obfuscate it. Data masking secures your data by replacing values with realistic yet fictitious data. Algorithm frameworks are provided to help businesses mask everything from names and social security numbers to images and text fields.

This not only reduces the risk of exposing sensitive data to bad actors but also provides quality data to developers and testers.
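To make “realistic yet fictitious” concrete, here is a minimal Python sketch of deterministic masking. It is not Delphix’s algorithm framework. The column names, the pool of replacement names, and the masking key are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical pool of fictitious replacement values (illustrative only).
FAKE_FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan", "Casey"]

# Assumption: a secret masking key kept outside the non-production environment.
SECRET_KEY = b"example-masking-key"


def _digest(value: str) -> bytes:
    """Keyed hash of the original value; the same input always gives the same bytes."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(value: str) -> str:
    """Map a real name to a fictitious one from the pool, deterministically."""
    d = _digest(value)
    return FAKE_FIRST_NAMES[d[0] % len(FAKE_FIRST_NAMES)]


def mask_ssn(value: str) -> str:
    """Replace every digit but keep separators, so the NNN-NN-NNNN layout survives."""
    d = _digest(value)
    out = []
    i = 0
    for ch in value:
        if ch.isdigit():
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical mapping from sensitive column name to masking function.
MASKERS = {"first_name": mask_name, "ssn": mask_ssn}


def mask_row(row: dict) -> dict:
    """Mask the sensitive columns of one row; other columns pass through unchanged."""
    return {
        col: (MASKERS[col](val) if col in MASKERS else val)
        for col, val in row.items()
    }


if __name__ == "__main__":
    print(mask_row({"id": 7, "first_name": "Margaret", "ssn": "123-45-6789"}))
```

Because the replacement is derived from a keyed hash of the original value, the same input always maps to the same masked output. In this sketch, that keeps values consistent across tables and refreshes, while the original is never stored in the masked copy.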

The process is straightforward. A backup is taken of CockroachDB and written to cloud storage, in this case AWS S3.
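The backup step could be sketched as follows. This is a small Python helper that only builds a CockroachDB `BACKUP ... INTO` statement for an S3 destination; it does not run it. The database name, bucket, and path are hypothetical, and the use of `AUTH=implicit` (credentials taken from the environment) is an assumption about how your cluster authenticates to AWS. Check the statement against the CockroachDB documentation for your version before using it.

```python
import re
from urllib.parse import quote, urlencode

# Only allow simple unquoted identifiers for the database name.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_backup_statement(database: str, bucket: str, path: str,
                           as_of: str = "-10s") -> str:
    """Return a BACKUP statement that writes `database` to s3://bucket/path."""
    if not _IDENT.match(database):
        raise ValueError(f"unexpected database name: {database!r}")
    if "'" in bucket or "'" in as_of:
        raise ValueError("single quotes are not allowed in bucket or as_of")

    # Assumption: the cluster uses implicit AWS credentials (for example an IAM role).
    query = urlencode({"AUTH": "implicit"})
    uri = f"s3://{bucket}/{quote(path.strip('/'))}?{query}"

    # AS OF SYSTEM TIME with a small negative offset reads slightly in the past.
    return f"BACKUP DATABASE {database} INTO '{uri}' AS OF SYSTEM TIME '{as_of}';"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_backup_statement("bank", "tdm-backups", "prod/nightly"))
```

The resulting statement can be executed with any SQL client connected to the cluster. Keeping the destination in one helper makes it easier to point each refresh at the same bucket layout.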

From there, a virtualized environment is created. The data is then processed by the Continuous Compliance engine, which masks the sensitive fields so that your non-production environments adhere to regulatory requirements. More importantly, your transactional non-prod environment is protected.

Application development 

Whether an application is new or already exists, developers and testers need data to work with. The greatest source of that data is existing production systems.

Together, the virtualization and masking capabilities in Delphix form an integrated TDM solution that uses compliant production data as part of your DevOps process.

Migrate legacy applications to CockroachDB

In today’s world, speed of software delivery is critical. Every enterprise wants to boost productivity and quality of work to stand out among its competitors.

Legacy applications are often incompatible with new systems and cannot take advantage of modern platforms. Continued use of legacy applications leads to expensive support and maintenance, yet migrating them is often difficult and time consuming. Migration is an iterative process that must be rehearsed multiple times until you get it right, and you often have to redo the whole migration to get the desired results. This costs a great deal of time and effort. The software also needs to be tested iteratively on the modern platform. Another challenge is that sensitive data can be exposed in intermediate platforms until the final migration is done.

  1. Delphix integration with CockroachDB provides an empty database into which the data from legacy applications can be loaded with any existing tools. Once the data is available in Delphix, you can use the built-in test data management features of the Delphix DevOps platform to rewind, refresh, bookmark, and version data.
  2. Delphix Continuous Data can be integrated with the Delphix Continuous Compliance platform to obfuscate sensitive data in lower environments with minimal impact on application development.

Summary

In today’s economy it is important to drive speed to market while protecting your enterprise’s most vital asset: data. CockroachDB customers can now take advantage of Delphix Continuous Data to create secure DevOps environments and Delphix Continuous Compliance to mask data. Together, these let customers modernize test data for CI/CD pipelines and cloud adoption while ensuring zero trust for test environments.

About the authors

Jeff Carlson

Jeff has been in the tech sector for the last 20+ years, primarily focused on system support for various platforms (Unix, Windows, Linux), open source (Hadoop, Spark), and database systems (Netezza, Oracle, SQL Server, and CockroachDB).

Ajay Thotangare

Ajay is a Senior Director at Delphix leading the Customer Engineering Team.
