Data Masking and Encryption Are Different [video]
It is a common mistake to use the terms data masking and data encryption interchangeably. While field-level encryption is considered one of many possible “data masking” functions, we define data masking and encryption below as technically distinct processes.
Note first, however, that data masking is an IT industry term of art that usually refers generically to any software function applied to obfuscate, anonymize, or otherwise de-identify sensitive or personally identifiable information (PII) in electronic data sources. These functions can apply to data at rest (static data masking) or to data in transit (dynamic data masking).
For the purposes of this article, and for consistency with the definitions some still use, we’ll use data masking here to mean the subset of functionality synonymous with data redaction, or obfuscation via character replacement. In this context, masking characters are chosen to meet the requirements of a system designed to test, or to keep working with, the masked results. Masking ensures that vital parts of the PII string, like the first 5 digits of a social security number, are obscured. Under this definition, the string-masked data is not recoverable.
Data encryption converts data into scrambled, often unreadable ciphertext using mathematical algorithms and a key. Restoring the original message requires the corresponding decryption algorithm and the right key (in symmetric encryption, the same key used to encrypt). Unlike masking, encryption is therefore designed to be reversible by anyone authorized to hold that key.
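As a minimal sketch of redaction through character replacement, the Python function below overwrites the first 5 digits of an SSN with a masking character while leaving separators in place, so the result still fits the format a test system expects. The function name, the `X` default, and the dash-separated input format are illustrative assumptions, not part of any IRI product.

```python
def redact_ssn(ssn: str, mask_char: str = "X", digits_to_mask: int = 5) -> str:
    """Irreversibly replace the first `digits_to_mask` digits with `mask_char`.

    Non-digit characters (e.g. dashes) are kept, so the masked value keeps
    the original layout. The replaced digits are discarded, not stored.
    """
    out = []
    masked = 0
    for ch in ssn:
        if ch.isdigit() and masked < digits_to_mask:
            out.append(mask_char)
            masked += 1
        else:
            out.append(ch)
    return "".join(out)


print(redact_ssn("123-45-6789"))  # XXX-XX-6789
```

Because the replaced digits are never kept anywhere, no key or algorithm can restore them; that is the property that separates this kind of masking from encryption.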
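To contrast reversibility with masking, here is a toy, standard-library-only sketch: it derives a keystream from a key and a random nonce with HMAC-SHA256 in counter mode and XORs it with the data. This is an illustration of keyed, reversible transformation under stated assumptions, not a vetted cipher; it has no integrity protection, and real systems should use an established algorithm such as AES through a maintained library (for example, the third-party Python `cryptography` package). The names `toy_encrypt` and `toy_decrypt` are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, concatenated.
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = nonce + counter.to_bytes(8, "big")
        out.extend(hmac.new(key, msg, hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random nonce per message, so equal plaintexts give different ciphertexts.
    nonce = secrets.token_bytes(NONCE_LEN)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))


def toy_decrypt(key: bytes, token: bytes) -> bytes:
    # XOR with the same keystream undoes the encryption, but only with the same key.
    nonce, ciphertext = token[:NONCE_LEN], token[NONCE_LEN:]
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))


key = secrets.token_bytes(32)
token = toy_encrypt(key, b"123-45-6789")
print(toy_decrypt(key, token))                       # b'123-45-6789'
print(toy_decrypt(secrets.token_bytes(32), token))   # wrong key: gibberish bytes
```

Holding the right key restores the exact original, while any other key yields noise; the masked SSN produced by redaction has no such path back.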
When would you choose to use data masking vs data encryption?
Data masking, as more narrowly defined above, is often performed when creating test data, for medical research, and to prevent unauthorized recipients from seeing or re-identifying the original content.
Application developers and those prototyping or benchmarking DB/DW operations commonly request production data for testing. Because that data can be sensitive and may pass through multiple hands, it is at great risk of theft or misuse. It is therefore necessary to irreversibly redact (cover or strip) the PII elements in the data set, such as names, addresses, and SSNs.
Common industry terms such as anonymization and de-identification can also refer to processes like these, which help remove identifying information from the data set. They prevent future identification of the original data, even by the people conducting the research or testing. For example, one cannot discern or re-identify a social security number presented with its first 5 digits covered by X’s.
For information on string masking (redaction) through character replacement, see: www.iri.com/solutions/data-masking/static-data-masking/redact
Data encryption is often used to protect data transferred between computers or across networks so that it can later be restored. Such data, whether in transit or at rest, can be exposed in a breach. Converting it into unreadable gibberish (or even into format-preserving ciphertext, which keeps the original layout but is still hard to crack) makes it unusable to anyone without the key. The only way to regain access to the data is to unlock it with a key or password available only to authorized users.
For more information on column (field) encryption, see: www.iri.com/solutions/data-masking/static-data-masking/encryption
It is therefore easy to think of data masking and data encryption as the same thing, since both are data-centric means of protecting sensitive data. However, their underlying procedures and purposes differ.
Data masking software from IRI protects PII with a wide array of protection functions, including encryption, redaction with masking characters, hashing, pseudonymization, randomization, tokenization, random noise, and more. FieldShield for databases and structured files, CellShield for Excel, and DarkShield for unstructured text and documents are the three static data masking products in the IRI Data Protector suite.
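Applying different protection functions to different fields according to business rules can be sketched in a few lines. The example below is a hypothetical illustration, not FieldShield's configuration or API: it pseudonymizes a name with a keyed HMAC-SHA256 hash (consistent across records, so joins still work), redacts an SSN down to its last 4 characters, passes location fields through unchanged, and drops any field without a rule. The field names, the `SECRET` value, and the 16-character token length are all assumptions.

```python
import hashlib
import hmac

# Hypothetical secret; in practice this would come from a managed key store.
SECRET = b"replace-with-a-managed-secret"


def redact(value: str) -> str:
    # Irreversible: keep only the last 4 characters, mask the rest with X.
    return "X" * max(len(value) - 4, 0) + value[-4:]


def pseudonymize(value: str) -> str:
    # Keyed hash: the same input always maps to the same token, but the token
    # cannot be reversed; only a key holder could confirm a guessed input.
    return hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def keep(value: str) -> str:
    return value


# Hypothetical per-field rule set for a customer record.
RULES = {"name": pseudonymize, "ssn": redact, "city": keep, "state": keep, "zip": keep}


def mask_record(record: dict) -> dict:
    # Fields with no rule are dropped rather than passed through, as a safe default.
    return {f: RULES[f](v) for f, v in record.items() if f in RULES}


rec = {"name": "Jane Doe", "ssn": "123-45-6789", "city": "Melbourne",
       "state": "FL", "zip": "32901", "notes": "VIP"}
print(mask_record(rec))
```

Dropping unmapped fields is a design choice for the sketch: a new, unreviewed column then cannot leak PII simply because no one wrote a rule for it.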
These three ‘shield’ products are also included in the IRI Voracity data management platform for data discovery, integration, migration, governance, and analytics. Dynamic data masking is also available via the FieldShield SDK.
This section of the IRI web site and this section of the IRI blog site contain more information on data masking and encryption. For example, see the article Which Data Masking Function Should I Use?
5 COMMENTS
These two terms are indeed different. Data encryption is used to encrypt data so that only people who have the secret key can access it, whereas data masking creates structurally similar but inauthentic versions of data. Thank you for this briefing on methods.
Nice article. Really helpful. Keep it up …
Thanks for sharing the information; you cleared up my confusion regarding data masking and data encryption. Your information was vital.
Can the tool provide encryption for certain fields while obfuscating other fields?
I am looking for a solution that can support zip/city/state while the PII data is obfuscated.
-frank
Yes, the idea of FieldShield is to ‘shield’ the fields (or DB columns) on a per-field basis based on your data/business rules. Please see http://www.iri.com/products/fieldshield/technical-details for links to the function choices.