Imanis Data offers companies highly scalable data management software for NoSQL databases, Hadoop, and modern enterprise data warehouses. When such assets, or subsets of them, are deployed to smaller secondary sites for, say, testing, development, or research, their sensitive personally identifiable information (PII) must remain confidential. How? Enter masking.
Masking is the process of anonymizing or faking PII values. Examples of PII are Social Security numbers (SSNs), taxpayer identification numbers (TINs), credit card numbers, full names, and addresses. Masking replaces these values with randomly generated fake ones.
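To make the idea concrete, here is a minimal Python sketch of replacing a PII value with a random fake value of the same shape. It is an illustration only, not the Imanis Data implementation; the function name `mask_value` and the rule of keeping separators are assumptions made for this example.

```python
import secrets
import string


def mask_value(value: str) -> str:
    """Return a random fake value with the same layout as `value`.

    Hypothetical helper for illustration: digits become random digits,
    letters become random letters of the same case, and every other
    character (dashes, spaces) is kept so the format is unchanged.
    """
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(secrets.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(secrets.choice(pool))
        else:
            masked.append(ch)
    return "".join(masked)


# Example: mask a made-up SSN; the result is random on every call.
fake_ssn = mask_value("123-45-6789")
print(fake_ssn)
```

Because this sketch draws fresh random characters on every call, the same input yields a different output each time. That is fine for one-off anonymization but does not by itself give the consistency property discussed later.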
This article briefly describes how the Imanis Data masking algorithm does the job.
Key properties of the masking algorithm are these:
Our data masking currently supports the following masks:
A primary use case for our solution is test/development management, that is, enabling self-service access to production data in various sandbox environments. Because data sets often contain confidential data, and because many different people will process that data, protecting its confidentiality becomes mandatory.
Consider this instance: you need to test data from a Hive warehouse, but a customers table contains PII, specifically credit card numbers, that must be masked before deployment to a test/dev cluster.
To illustrate, the following table compares the original unmasked credit card numbers, taken from the customers table, with their counterparts after the masking process has run. Notice the effect of the consistency property: 5156916119601001 is masked to the same random value, 5599281322705890, both times it appears in the table.
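The consistency property can be sketched in a few lines of Python. This is one common way to get repeatable masking (a keyed hash, HMAC-SHA256, reduced to the same number of digits), offered here as an assumption for illustration; the article does not state how the Imanis Data algorithm achieves consistency. The function name `mask_card_number` and the key value are hypothetical.

```python
import hashlib
import hmac


def mask_card_number(card_number: str, secret_key: bytes) -> str:
    """Map a card number to a fake number of the same length.

    Illustrative only: the same input and key always produce the same
    output, which gives the consistency property. The result is not
    guaranteed to pass a Luhn check or to be collision-free.
    """
    digest = hmac.new(secret_key, card_number.encode("utf-8"), hashlib.sha256).digest()
    width = len(card_number)
    # Reduce the digest to an integer with at most `width` digits,
    # then left-pad with zeros so the length matches the original.
    value = int.from_bytes(digest, "big") % (10 ** width)
    return str(value).zfill(width)


# Hypothetical key; in practice it would be kept secret and managed securely.
key = b"example-masking-key"

# The repeated card number from the example appears twice in this column.
column = ["5156916119601001", "4000123412341234", "5156916119601001"]
masked_column = [mask_card_number(number, key) for number in column]

# Both occurrences of the repeated number receive the same masked value.
print(masked_column[0] == masked_column[2])
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute the mapping from a list of candidate card numbers. Keeping the mapping stable also means joins between masked tables that share the column still line up.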
A graphical user interface on the Imanis Data Management Console guides you through the five steps for creating a mirroring workflow and configuring the masking of the customers table data. We show only the first and last steps here. Figure 1 illustrates the first step.
In steps 2 through 5, you specify the data to be masked. At step 2, you scroll to Specify Masking; at step 3, you click the Apply button. At step 4, you select the mask for the table column, and, finally, at step 5, you review and approve the applied mask. Figure 2 illustrates this final step, and the screen shown in Figure 3 verifies that the customers table is configured for masking.