Data Privacy and the Role of Data Masking

Issue 3, 2012

Test data privacy is critical to all enterprises, and data masking is the best way to help secure your test data for better testing results and fewer budget overruns in your projects. Learn more about Data Masking for Adabas and hear Forrester Principal Analyst Noel Yuhanna discuss best practices in data privacy in this free webinar.
 

Introduction  
As the volume of personal data grows across industries and the number of data attacks on enterprises continues to increase, organizations large and small are seeking best practices for protecting their data. They are concerned not only about their production data but also about their nonproduction data, because internal threats account for more than 60% of data attacks. Data masking is the method most recommended by industry experts for protecting nonproduction environments.
 

What Is Data Masking?
Forrester defines Data Masking as “the process of concealing private data in nonproduction environments such that application developers, testers, privileged users, and outsourcing vendors do not get exposed to such data.”

Private data includes:

  • Names
  • Addresses
  • Social security numbers
  • Credit card numbers
  • Financial data
  • Healthcare information
  • Employee or consumer information
  • Any other type of confidential information applicable to the organization

To conceal private data, you can write scripts that change numbers, names, addresses, and similar fields, or use a vendor-provided tool that automates the process. The goal is a masking solution that scales while keeping the masking logic concealed, so that it cannot be used to decode or recreate the original data.
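As a rough illustration of the script-based approach, the sketch below replaces every digit in a value with a digit derived from a keyed hash, leaving punctuation and length untouched. It is not how any particular vendor tool works; the function name `mask_digits` and the key are hypothetical. Because the output depends on a secret key, knowing the script alone is not enough to recreate the original values.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key would be kept out of test systems.
SECRET_KEY = b"example-key-not-for-real-use"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace each digit with a keyed pseudo-random digit; keep all other characters."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0  # how many digest bytes have been consumed so far
    for ch in value:
        if ch.isdigit():
            # Map one digest byte to a digit (slightly biased, fine for a sketch).
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            masked.append(ch)
    return "".join(masked)


record = {"name": "Jane Doe", "ssn": "123-45-6789", "card": "4111 1111 1111 1111"}
masked_record = {
    **record,
    "ssn": mask_digits(record["ssn"]),
    "card": mask_digits(record["card"]),
}
print(masked_record)
```

The same input always yields the same masked value, which matters when the field is used to link records. Names and addresses would need a different treatment, such as substitution from a list of replacement values.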


Why Is Data Masking Important?
Data masking is important for your organization because it not only protects private information but also gives you the freedom to conduct robust testing with realistic data and to outsource testing and development. Nonproduction data is handled by many people, from application developers, testers, and administrators to outsourcing vendors. With data masking, you protect information from both accidental and intentional security threats.

Data masking enables you to:

  • Protect your customers and your company
  • Comply with data protection legislation and standards, such as HIPAA, PCI DSS, GLBA, and the EU Data Protection Directive
  • Rapidly provision nonproduction environments
  • Improve test quality by provisioning realistic test data

Data Masking for Adabas

Data Masking for Adabas is a fast, simple-to-use tool that enables you to de-identify (or mask) data to protect sensitive information while the referential integrity—how the data interrelates with other data and systems—remains intact. This allows testing and development teams to carry out project work against masked ‘live’ data efficiently while ensuring regulatory compliance. This section provides an overview of how it works.

Before masking your production data, you must first locate where your sensitive information is stored. This is usually one of the most difficult aspects of data masking, because it requires working with various people to find every place that sensitive information resides. You may need to mask only a few key tables and a number of fields. Save time and expense by masking only sensitive information, not the entire database.

When you are ready to begin masking your Adabas data, create a copy of your production data with Adabas utilities. The copy of the production data for masking can reside on either distributed systems (LUW) or the mainframe. The masking changes are then applied to this copy as an update, as shown in the process overview in Figure 1.

To establish the Data Masking environment, you will need to install the Adabas SQL Gateway, the Mapping Tool (GT-Mapper), and the Data Masking Engine in an LUW (Linux, UNIX, Windows) environment.

Define Masking Rules
Define your masking rules and run-time options through the Mapping Tool (GT-Mapper). To map the data, metadata is read from the Data Dictionary of the copied database using the Adabas SQL Gateway and deposited into a metadata repository. When you save the definitions made in the Mapping Tool, the software creates flat files for the rules (CSV) and the run-time options (txt), along with the start script for the actual masking step.

Data Masking for Adabas has a simple, intuitive interface that lets you map masking files using either a graphical user interface or an Excel spreadsheet. This makes it simple to quickly provision high-quality, meaningful, and compliant test data for use in nonproduction environments. Data Masking for Adabas provides a rich set of pre-defined rules and the ability to create custom rules. Here is a quick look at some of the 50+ pre-defined rules:

  • Replacement
  • Hashing
  • Value Exchange
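The article does not spell out how these rules behave inside the product, so the following is only one common reading of each rule type, written as a minimal Python sketch. The function names, the `key` parameter, and the sample values are all assumptions made for illustration.

```python
import hashlib
import random


def replacement(value, substitutes, key="demo-key"):
    """Replacement: swap the value for an entry from a list of substitutes.

    The entry is chosen from a hash of the original, so the same input
    always maps to the same substitute.
    """
    digest = hashlib.sha256((key + value).encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(substitutes)
    return substitutes[index]


def hashing(value, key="demo-key", length=12):
    """Hashing: replace the value with a truncated keyed hash of itself."""
    return hashlib.sha256((key + value).encode("utf-8")).hexdigest()[:length]


def value_exchange(values, seed=0):
    """Value exchange: shuffle a column's values among its rows."""
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


last_names = ["Smith", "Jones", "Garcia", "Miller"]
print(replacement("Yuhanna", ["Alpha", "Bravo", "Charlie"]))
print(hashing("jane.doe@example.com"))
print(value_exchange(last_names))
```

Value exchange keeps the real distribution of values in a column but breaks the link between a value and its row, while replacement and hashing remove the original values altogether.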

You may also elect not to derive your test data from production data. Data Masking for Adabas can create it for you by generating new data for:

  • 5- or 9-digit ZIP codes
  • Credit cards
  • Social security numbers
  • Phone numbers
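To give a feel for what generating such data involves, here is a small sketch, independent of the product, that produces values in the right format. Card numbers are given a valid Luhn check digit so that they pass basic validation in the application under test. All function names are hypothetical, and the SSN generator follows the format only, not any issuing rules.

```python
import random


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn check."""
    total = 0
    # Walk right to left; the rightmost digit of the partial number is doubled.
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def luhn_valid(number: str) -> bool:
    """Check a complete number (including its check digit) with the Luhn algorithm."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def fake_zip(rng, plus4=False):
    base = f"{rng.randrange(100000):05d}"
    return f"{base}-{rng.randrange(10000):04d}" if plus4 else base


def fake_card(rng, prefix="4", length=16):
    body_len = length - len(prefix) - 1
    body = prefix + "".join(str(rng.randrange(10)) for _ in range(body_len))
    return body + luhn_check_digit(body)


def fake_ssn(rng):
    # Format only (NNN-NN-NNNN); no attempt to follow issuing rules.
    return f"{rng.randrange(1000):03d}-{rng.randrange(100):02d}-{rng.randrange(10000):04d}"


def fake_phone(rng):
    # 555-01xx line numbers are reserved for fictional use in North America.
    return f"({rng.randrange(200, 1000)}) 555-01{rng.randrange(100):02d}"


rng = random.Random(2012)  # fixed seed so test data sets are repeatable
print(fake_zip(rng), fake_zip(rng, plus4=True))
print(fake_card(rng), fake_ssn(rng), fake_phone(rng))
```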

You can also provide your own “seed tables” of values to use with the Data Masking replacement function. For example, if you have a set of last names from North America and are now working with a South American company, you can provide seed tables with values that better fit the needs of your testing or development.

One of the best features of Data Masking for Adabas is the ability to maintain referential integrity of the masked data. You can cross-reference other tables and maintain associated links, for example an Employees file linked to a Vehicles file via Social Security number.
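The Employees and Vehicles example can be sketched in a few lines. The idea shown here, which is an assumption about the general technique rather than a description of the product's internals, is that the linking field is masked with the same deterministic function in every file, so rows that matched before masking still match afterwards. The key, the function `mask_ssn`, and the sample rows are hypothetical.

```python
import hashlib
import hmac

KEY = b"hypothetical-masking-key"


def mask_ssn(ssn: str) -> str:
    """Map an SSN to a masked SSN; the same input always gives the same output."""
    digest = hmac.new(KEY, ssn.encode("utf-8"), hashlib.sha256).digest()
    digits = "".join(str(b % 10) for b in digest[:9])
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


employees = [
    {"ssn": "123-45-6789", "name": "A. Example"},
    {"ssn": "987-65-4321", "name": "B. Sample"},
]
vehicles = [
    {"ssn": "123-45-6789", "plate": "ABC123"},
    {"ssn": "987-65-4321", "plate": "XYZ789"},
    {"ssn": "123-45-6789", "plate": "DEF456"},
]

# Apply the same masking function to the linking field in both files.
masked_employees = [{**row, "ssn": mask_ssn(row["ssn"])} for row in employees]
masked_vehicles = [{**row, "ssn": mask_ssn(row["ssn"])} for row in vehicles]


def plates_by_owner(emps, vehs):
    """Join the two files on SSN and list each employee's plates."""
    return {
        e["name"]: sorted(v["plate"] for v in vehs if v["ssn"] == e["ssn"])
        for e in emps
    }


# The join gives the same answer before and after masking.
print(plates_by_owner(masked_employees, masked_vehicles))
```

A production tool would also have to deal with the rare case of two different originals mapping to the same masked value, which this sketch ignores.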


Mask the Data
The Mapping Tool, containing rules and run-time options, and the metadata repository feed the Masking Engine as shown in Figure 2. The Masking Engine is essentially an SQL-based application that uses the Mapping Tool rules to write to the copy of the production database.  The masking process itself runs as a batch job.

After you run the Masking Engine, depending on the option selected, an audit file is generated containing all actions along with the original and new values. Log files are also written, containing information about the masking run and any errors. To simulate masking before actually making changes, set “DBUPDATES=N” and view the audit files to see the resulting old and new values without changing the copied production database.
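The simulate-then-apply pattern can be illustrated with a short sketch. This is not the Masking Engine or its file formats: the `db_updates` parameter only mirrors the idea behind “DBUPDATES=N”, and the audit layout (row, column, old value, new value) is invented for the example.

```python
import csv
import io


def run_masking(rows, column, mask, db_updates=True):
    """Mask one column and return (audit_text, resulting_rows).

    With db_updates=False the audit still records old and new values,
    but the returned rows are left unchanged (a dry run).
    """
    audit = io.StringIO()
    writer = csv.writer(audit)
    writer.writerow(["row", "column", "old_value", "new_value"])
    result = [dict(r) for r in rows]  # work on a copy, never on the input
    for i, row in enumerate(result):
        old = row[column]
        new = mask(old)
        writer.writerow([i, column, old, new])
        if db_updates:
            row[column] = new
    return audit.getvalue(), result


rows = [{"name": "Jane Doe"}, {"name": "John Roe"}]

# Dry run: inspect the audit output, data stays as it was.
audit_text, unchanged = run_masking(rows, "name", lambda v: "X" * len(v), db_updates=False)
print(audit_text)

# Real run: same audit output, and the rows are updated.
_, masked = run_masking(rows, "name", lambda v: "X" * len(v))
print(masked)
```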

If you have a typical enterprise environment, you likely use many types of RDBMSs. Fortunately, Software AG’s Data Masking solution is also available for most RDBMS types and for flat files, in both mainframe and LUW environments.


Results
There are a few very important requirements that Data Masking must meet to secure data while keeping it functional for use in test and development. Of utmost importance, the masked data must look like production data and have referential integrity; in other words, the relationships within and between the data must be maintained. And while production data can be used as input for the masking process, it must not be possible to reverse engineer your masked data.

By providing multiple built-in routines and algorithms to mask data, Data Masking for Adabas can produce test data quickly with minimal time and expense. Because Data Masking also masks consistently across different databases automatically, you save the time otherwise spent ensuring consistency and integrity.

Using copies of live production databases can result in embarrassing yet preventable data leaks. These leaks not only incur penalties from the regulators but can damage credibility with customers and trading partners. Why risk using live data when masking your data is so simple?

Listen to Forrester Principal Analyst Noel Yuhanna discuss the challenges and best practices for securing your data in his webinar “Data Privacy and the Role of Data Masking.”

 


 1 Why Test Data Privacy Has Become Critical To All Enterprises, Noel Yuhanna, Principal Analyst, November 9, 2011