
PII Lookup File Searches

  • by Pat Rushlaw

IRI provides multiple data discovery features for personally identifiable information (PII) and other sensitive or need-to-be-found data held in enterprise sources.

Beyond the data class, pattern, and fuzzy-match searches described elsewhere in this blog, this article covers searching for values held in a lookup or ‘set’ file (e.g., a list of names). The feature is supported in the IRI FieldShield data masking product for databases and files, the IRI CellShield Enterprise Edition (EE) data masking product for Microsoft Excel spreadsheets, and the IRI Voracity platform for data lifecycle management.

Specifically, the new search capability is built into the wizards for database profiling, flat-file profiling, and dark data discovery in the IRI Workbench GUI (built on Eclipse), which supports FieldShield and all other IRI software. The same string-search feature was also added to CellShield EE for masking data in Excel spreadsheets.

Value Searches in DBs & Files

To use this lookup feature in the database or flat-file profiling wizards, find the Column (DB profiler) or Field (file profiler wizard) Selection page, and select the check box for Expression Search. On the next page, select Values File from the Search Type list. Then browse to find and select the set file with the values to be searched. Complete the rest of the fields on the page, and click Finish.
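For readers who want to see the idea behind a values-file search outside the wizard, here is a minimal Python sketch. It is not IRI's implementation. It assumes a set file with one value per line and a comma-delimited source with a header row, and the function names `load_set_file` and `count_matches_by_column` are hypothetical.

```python
import csv
import io
from collections import Counter


def load_set_file(text):
    """Return the set of non-empty lookup values (assumed: one value per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def count_matches_by_column(csv_text, lookup_values):
    """Count, per column, how many cells exactly match a lookup value."""
    counts = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, cell in row.items():
            # Skip missing or overflow fields, which are not plain strings.
            if isinstance(cell, str) and cell.strip() in lookup_values:
                counts[column] += 1
    return dict(counts)


# Hypothetical sample data for illustration only.
names = load_set_file("Alice\nBob\nCarol\n")
data = "id,first_name,city\n1,Alice,Paris\n2,Dave,Bob\n3,Carol,Rome\n"
print(count_matches_by_column(data, names))
```

A per-column count like this hints at which columns are likely to hold the sensitive values, which is the kind of result a profiling search is meant to surface.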


Value Searches in Text, Documents and Other Unstructured Data Sources

The Dark Data Discovery wizard can also find values in lookup files because they are text files, and that wizard can use multiple search methods to find data in any text file, Microsoft Office or PDF document, MongoDB or Cassandra, or popular image file formats (even images embedded in documents). The wizard extracts and buckets both the values it finds and the metadata for the files in which those values are found into a delimited flat file, or into an Excel Interchange File (EIF) for use in CellShield EE. Note, however, that this wizard is usually used with the IRI DarkShield tool for finding and masking pre-classified PII hidden in unstructured data sources.
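The general pattern of finding lookup values in text and recording them with file metadata can be sketched in Python as follows. This is an illustrative assumption, not the wizard's actual logic or output layout: it handles plain text files only, uses whole-word matching, and the names `find_values_in_text` and `scan_files`, the pipe delimiter, and the report columns are all hypothetical.

```python
import csv
import io
import os
import re
import tempfile


def find_values_in_text(text, lookup_values):
    """Yield (value, offset) for each whole-word occurrence of a lookup value."""
    if not lookup_values:
        return
    # Longest values first so longer entries win over their own prefixes.
    ordered = sorted(lookup_values, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(v) for v in ordered) + r")\b")
    for match in pattern.finditer(text):
        yield match.group(0), match.start()


def scan_files(paths, lookup_values):
    """Return a delimited report: one row per match plus basic file metadata."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(["value", "offset", "file", "size_bytes"])
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        size = os.path.getsize(path)
        for value, offset in find_values_in_text(text, lookup_values):
            writer.writerow([value, offset, os.path.basename(path), size])
    return out.getvalue()


# Hypothetical sample file for illustration only.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "notes.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("Call Alice or Bobby, then Bob.")
    report = scan_files([sample], {"Alice", "Bob"})
print(report)
```

Whole-word matching keeps "Bob" from matching inside "Bobby". A real scanner would also need format-specific text extraction for documents and images, which this sketch leaves out.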


Value Searches in Excel

Alternatively, CellShield EE’s new Set File [based] Remediation feature can find values in any Excel 2010 or 2013 spreadsheet that exist in a set file, allowing you to mask those values via encryption, redaction, or pseudonymization. You upload the set file, choose the preferred protection function, and click “Remediate.” A popup lets you know when the operation is done.

Pseudonymization, by way of example, is a good way to de-identify names or other proper nouns while preserving realism in the target. Pseudonyms can be reversible, or not.

    1. In your worksheet, click the Import Set File icon in the CellShield ribbon to open the Set File Search utility.
    2. Browse to the set file with the names you are looking for, and try to load it. You will get an error if there’s a problem with the file; it must be a list of ASCII values delimited by a space or carriage return. Click OK to continue.
    3. For the Remediation Type here, we’re choosing Pseudonymization, though we could also choose redaction (full/partial cell), or encryption (AES 128, FPE AES 256, etc.), instead.
    4. Click on Find Matches, and the Menu gives a count of the matches found, highlighted in red. Click OK to continue.
    5. Tick Recoverable to save a restore set, or Non-Recoverable to prevent data restoration.
    6. For Recoverable, a Restore file is automatically created in the “CX-Pseudo_Restore” folder on the local drive.
    7. The original Set File is scrambled to randomly create pseudonym (substitute) values, which will also get saved into a recovery file for optional restoration later.
    8. Click Remediate to pseudonymize the names in the sheet. You can see the scrambled names are now in place.
    9. You can restore the original names by using the recovery file. Click the restore tab in the set file module. Navigate to that file in the CX-Pseudo_Restore folder, and click the Restore button.
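The steps above can be approximated in a short Python sketch of recoverable pseudonymization. This is not how CellShield EE works internally. It assumes the set file is whitespace-delimited ASCII, that pseudonyms come from a random permutation of the set values themselves, and that the recovery data is a JSON string. All function names here are hypothetical.

```python
import json
import random


def parse_set_file(text):
    """Split a set file on whitespace and reject non-ASCII content."""
    if not text.isascii():
        raise ValueError("set file must contain only ASCII values")
    return list(dict.fromkeys(text.split()))  # de-duplicate, keep order


def build_pseudonym_map(values, seed=None):
    """Scramble the set values so that each one maps to a different one."""
    if len(values) < 2:
        raise ValueError("need at least two distinct values to scramble")
    shuffled = values[:]
    random.Random(seed).shuffle(shuffled)
    # A single cycle over the shuffled list guarantees no value maps to itself.
    return {v: shuffled[(i + 1) % len(shuffled)] for i, v in enumerate(shuffled)}


def pseudonymize(cells, mapping):
    """Replace every cell found in the mapping; leave other cells alone."""
    return [mapping.get(cell, cell) for cell in cells]


def restore(cells, recovery_json):
    """Reverse the substitution using the saved recovery data."""
    mapping = json.loads(recovery_json)
    reverse = {pseudonym: original for original, pseudonym in mapping.items()}
    return [reverse.get(cell, cell) for cell in cells]


# Hypothetical sample data for illustration only.
set_values = parse_set_file("Alice Bob\nCarol")
mapping = build_pseudonym_map(set_values, seed=7)
recovery_json = json.dumps(mapping)  # stands in for the saved restore file

sheet = ["Alice", "Total", "Carol", "Bob", "Alice"]
masked = pseudonymize(sheet, mapping)
print(masked)
print(restore(masked, recovery_json) == sheet)
```

Because the mapping is a permutation of the set, it can be inverted exactly. Discarding the recovery data corresponds to the Non-Recoverable choice in step 5.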

This feature is also offered in Bulk Remediate mode. Using your .eif file, you can protect every set file item (lookup value) found across all the discovered sheets at once, applying the same set file remediation function you choose.


© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact