Imanis Data and Data Masking


Imanis Data offers companies highly scalable data management software for NoSQL databases, Hadoop, and modern enterprise data warehouses. When such assets, or subsets of them, are deployed to smaller secondary sites for, say, testing, development, or research, their sensitive personally identifiable information (PII) must remain confidential. How? Enter masking.

 

Masking is the process of anonymizing or faking PII values. Examples of such data are social security numbers (SSN), taxpayer identification numbers (TIN), credit card numbers, full names, and addresses. Masking replaces each of these values with a randomly generated fake value.

 

This article briefly describes how the Imanis Data masking algorithm does the job.

 

Feature Overview

 

Key properties of the masking algorithm are these:

  • It is consistent: A given data value is always masked to the same value. For example, a given SSN always produces the same fake SSN. This matters for data integrity: records that shared a value before masking still share one afterward, so the statistical properties of the source data are preserved in the masked output.
  • It is stateless: No additional state needs to be stored to achieve the consistency property described above. Because no mapping between original and masked values is kept, there is no stored data that would itself require extra protection or encryption to keep the originals private.
  • It is one-way: The original value of data is not easily determinable from its masked value.
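
The three properties above can be illustrated with a short Python sketch. This is not the Imanis Data algorithm, which this article does not disclose; it is a minimal example that assumes a keyed hash (HMAC-SHA256) as the masking primitive. The function name mask_digits and the key are hypothetical. The only thing held is a secret key; no table of original-to-masked values is stored.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def mask_digits(value: str, key: bytes) -> str:
    """Replace each digit of `value` with a pseudorandom digit.

    Illustrative sketch only (assumption: HMAC-SHA256 as the primitive).
    Non-digit characters such as hyphens are kept in place.
    """
    # Hash only the digits, so "123-45-6789" and "123456789" mask alike.
    only_digits = "".join(ch for ch in value if ch in DIGITS)
    digest = hmac.new(key, only_digits.encode("ascii"), hashlib.sha256).digest()

    # Treat the 256-bit digest as one big integer and peel off decimal digits.
    stream = int.from_bytes(digest, "big")
    out = []
    for ch in value:
        if ch in DIGITS:
            stream, digit = divmod(stream, 10)
            out.append(str(digit))
        else:
            out.append(ch)
    return "".join(out)


key = b"hypothetical-secret-key"       # assumption: managed outside the data
first = mask_digits("123-45-6789", key)
second = mask_digits("123-45-6789", key)
print(first == second)                  # consistent: same input, same mask
```

The same input always yields the same output because the HMAC is deterministic (consistent), nothing is written anywhere between calls (stateless), and recovering the input from the output would require inverting the keyed hash (one-way). A short digest like this supports values of up to a few dozen digits, which covers SSNs and card numbers.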

 

Supported Masks

 

Our data masking currently supports the following masks:

 

  • Social Security Number
  • Taxpayer Identification Number
  • Employer Identification Number
  • Credit Card Number
  • Full Name

 

Use Case

 

A primary use case for our solution is test/development management, that is, the enabling of self-service access to production data in various sandbox environments. Since data sets often contain confidential data and since various people will process the data, confidentiality becomes mandatory.

 

Consider this instance: You need to test data from a Hive warehouse, but a customers table contains PII data, specifically credit card numbers, that must be masked before deployment to a test/dev cluster.

 

To illustrate, the following table compares the original unmasked credit card numbers, taken from the customers table, with their counterparts after the masking process has run. Notice the effect of the consistency property: 5156916119601001 is masked to the same random value, 5599281322705890, both times it appears in the table.
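
The consistency effect on a table column can be sketched in Python. This is an illustration under stated assumptions, not the product's implementation: the keyed hash, the function name mask_card_number, the customer_id column, and the second card number are all hypothetical, and the masked values it produces will differ from those shown in the article.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key


def mask_card_number(card_number: str, key: bytes = SECRET_KEY) -> str:
    """Map a card number to a fake number with the same count of digits."""
    digest = hmac.new(key, card_number.encode("ascii"), hashlib.sha256).digest()
    width = len(card_number)
    # Reduce the digest to `width` decimal digits, zero-padded on the left.
    return str(int.from_bytes(digest, "big") % 10**width).zfill(width)


# Hypothetical rows from a customers table; the first and third rows
# carry the same card number, as in the article's example.
rows = [
    {"customer_id": 1, "card_number": "5156916119601001"},
    {"customer_id": 2, "card_number": "4000123412341234"},  # made-up value
    {"customer_id": 3, "card_number": "5156916119601001"},
]

# Mask only the card_number column; other columns pass through unchanged.
masked_rows = [
    {**row, "card_number": mask_card_number(row["card_number"])}
    for row in rows
]

for row in masked_rows:
    print(row["customer_id"], row["card_number"])
```

Rows 1 and 3 print the same masked number, so a query that groups or joins on the card number column returns the same shape of result on the masked copy as on the original.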

 

 

A graphical user interface on the Imanis Data Management Console guides you through the five steps to creating a mirroring workflow and configuring the masking of the customers table data. We show the first and last steps. Figure 1 illustrates the first step.

 

Figure 1: Specify that the Hive customers table contains PII data and must be masked.

 

In steps 2 through 5, you specify the data to be masked. In step 2, you scroll to Specify Masking; in step 3, you click the Apply button. In step 4, you select the mask for the table column, and, finally, in step 5, you review and approve the applied mask. Figure 2 illustrates this final step, and the screen shown in Figure 3 verifies that the customers table is configured for masking.

 

Figure 2: (step 5) Review the applied masks and click OK.
Figure 3: The customers table is now configured for masking.

 

Summary

We’ve given you a brief overview of data masking. To learn more about the Imanis software solution, read our solution brief or white paper or contact us with any questions you have.
