What is Data Pseudonymization?

  • by Jeff Simpson

This article deals with pseudonymization as one method of de-identifying or anonymizing sensitive data.

The unauthorized use or misuse of personally identifiable information (PII), such as a name, social security number, date of birth, mother’s maiden name, or place of birth, can result in identity theft and other crimes related to impersonation, not to mention embarrassment, inconvenience, and expense. Organizations that collect or process PII without protecting it face serious legal and financial ramifications, which is why more of them are focusing on data risk mitigation.

The “Guide to Protecting the Confidentiality of Personally Identifiable Information” by Erika McCallister goes into great detail about protecting PII, and the processes of pseudonymization and de-identification.

According to Wikipedia and a few other online sources, pseudonymization is the process of “removing the association between data and the subject of that data, and adding an association between the data and an alternative identifier.” It is the process of “depersonalizing” the data so that any identifying fields within a record are replaced by one or more artificial identifiers.

In other words, personal data is removed from a database record or file and replaced with a pseudonym (a fake name) to protect the real one. The replacement values are chosen to look realistic. Using a fake name can help protect sensitive PII from misuse because it removes the individual’s association with the remaining data in the record or otherwise on hand.

Thus, the United States Department of Health and Human Services’ Health Information Knowledgebase maintains that “using pseudo-identifiers can assist in compliance with HIPAA regulations regarding suppression of patient identification information.” Article 4(3b) of the European General Data Protection Regulation (GDPR) considers pseudonymization similarly compliant so long as “the data can no longer be attributed to a specific data subject without the use of additional information, [and] as long as such additional information is kept separately and subject to technical and organizational measures to ensure non-attribution to an identified or identifiable individual.”

Pseudonymization Methods

IRI FieldShield and IRI DarkShield data masking software products offer two primary ways to pseudonymize PII:

1) Unrecoverable.  This method uses a single-column source or ‘set’ file containing first names, cities, or other values that are listed and available for random selection in place of the original value. Because there is no association between the original and fake values, there is no way to reverse this process, even if you want to reveal the original.

2) Recoverable.  This method, which would not be considered compliant with the GDPR, involves a tabular relationship between the source data and its pseudonym. In practice, a two-column set file containing both real and fake data constitutes a look-up table that can be used both to display pseudonyms and to restore the original values later through a reverse lookup.
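
To make the difference between the two methods concrete, here is a minimal Python sketch. It is not FieldShield or DarkShield syntax; the function names and the sample replacement names are hypothetical, and an in-memory list and dictionary stand in for the one-column and two-column set files described above.

```python
import random

# Hypothetical single-column "set" of replacement first names (illustrative values only).
FAKE_NAMES = ["Alex", "Blake", "Casey", "Drew", "Emery"]


def pseudonymize_unrecoverable(values, fake_names, rng=None):
    """Replace each value with a random pick from the set.

    No mapping between real and fake values is kept, so the result
    cannot be reversed.
    """
    rng = rng or random.Random()
    return [rng.choice(fake_names) for _ in values]


def build_lookup(values, fake_names):
    """Build a two-column (real -> fake) table with one fake value per distinct real value."""
    distinct = list(dict.fromkeys(values))  # de-duplicate, keeping first-seen order
    if len(distinct) > len(fake_names):
        raise ValueError("not enough replacement values for a one-to-one mapping")
    return dict(zip(distinct, fake_names))


def pseudonymize_recoverable(values, lookup):
    """Replace each real value with its pseudonym from the lookup table."""
    return [lookup[v] for v in values]


def restore(pseudonyms, lookup):
    """Reverse lookup: recover the real values from their pseudonyms."""
    reverse = {fake: real for real, fake in lookup.items()}
    return [reverse[p] for p in pseudonyms]
```

In this sketch the lookup table is the only thing that makes restoration possible, so it would need to be stored and protected separately from the pseudonymized data.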

IRI recognized the value of this de-identification or pseudonymization method long ago when using set files in its test data generation tool, RowGen.  Test data quality is improved, without breaching privacy, when real-looking names replace actual names.

In practice, pseudonymization jobs can be complicated by the introduction of new values in the source; new substitute values need to exist to cover them, and they must be assigned in such a way that re-identification is still prevented. One such remedy, documented here, is to use hashed name values stored in a .set file.
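
One way to picture a hash-based remedy is sketched below in Python. This is an assumption about how such a scheme could work, not the documented IRI procedure: a keyed hash (HMAC-SHA256) of the source value selects an entry from the replacement set, so a value that appears for the first time still receives a substitute without anyone adding a row to a lookup table. The function name, the key, and the sample names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical replacement set (illustrative values only).
FAKE_NAMES = ["Alex", "Blake", "Casey", "Drew", "Emery"]


def hashed_pseudonym(value, fake_names, secret_key):
    """Pick a replacement by hashing the source value with a secret key.

    The same input always yields the same pseudonym, and new inputs are
    covered automatically. Different inputs may share a pseudonym.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(fake_names)
    return fake_names[index]
```

Because no real-to-fake table is stored, the original cannot be looked up directly. Anyone holding the key could still test guessed names against the output, so the key would need the same protection as a lookup table.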

For Extra Security

Security can still be compromised when someone can guess the individual’s real identity, perhaps because there are too many other identifying elements in the record. In these cases, it makes sense to apply other protections to the remaining fields. Another consideration is reversibility: the extent to which the real data can be recovered, or the ease with which that can be accomplished. It may therefore make sense to pseudonymize or mask the data outright, with no means of restoring the original values.



© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact