Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG

Masking Data in Pentaho

  • by Claudia Irvine

This article is second in a 3-part series on using IRI products to expand functionality and improve performance in Pentaho systems. We first demonstrate how to improve sorting performance, and then introduce ways to mask production data, and create test data, in the Pentaho Data Integration (PDI) environment.

Pentaho Data Integration (PDI) can do typical data transformations; however, it does not appear to have a native ability to mask or encrypt data flowing through Kettle. The need to protect data at risk, even in data integration scenarios, is expanding amid a plague of data breaches.

IRI FieldShield software provides many different data masking options for Pentaho users. You can call FieldShield job scripts from the Shell step in Pentaho to protect data at rest (in database and other structured sources).

As you can see by exploring your data source in Pentaho, sample Social Security Numbers are fully displayed.

Pentaho-fs-1

Using FieldShield, you can mask/encrypt/encode (and others) this data in your needed format. There are some predefined masks that make redacting all or part of an SSN extremely easy.

pentaho-fs-2

 

pentaho-fs-3

 

The script created in the IRI Workbench – the Eclipse GUI for FieldShield and other IRI software — gets run by Pentaho through the Shell step which executes a batch file. This batch file is not automatically created by the FieldShield wizard; however, this is easily created in the Workbench with four lines of code.

pentaho-fs-4

The only step that is needed to mask your data is a Shell step. Below is a simple job; however, the Shell step can be added to any of your jobs. The Shell step simply references the batch file to run during execution.

pentaho-fs-5

pentaho-fs-6

 

After executing the Pentaho job, the FieldShield job has now masked the data via the protection method selected in the FieldShield wizard.

pentaho-fs-7

The IRI Workbench has multiple options for data masking at the field level, along with other data life-cycle management activities, including: bulk database extraction and load acceleration, data transformations, data quality improvement, data and database migration, data replication, test data generation, data federation, and reporting. Any of these operations could be run alongside Pentaho to improve its efficiency.

Click here if you missed the previous blog on improving sort performance in Pentaho, or see the third article on creating test data for Pentaho.

Using CoSort to Speed Pentaho Sort Jobs
Creating Test Data for Pentaho
data life cycle data masking data protection database DB encode encrypt field level IRI FieldShield Kettle mask PDI pentaho Pentaho Data Integration shell test data

Related articles

DarkShield PII Discovery & Masking…
Masking Flat Files in the…
Directory Data Class Search Wizard
Masking PII in a Relational…
IRI Data Class Map
Schema Data Class Search
Training NER Models in IRI…
Masking NoSQL DB PII in…
Masking RDB Data in the…
IRI DarkShield-NoSQL RPC API
Find & Mask File PII…

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact