
Creating Test Data for Pentaho

  • by Claudia Irvine

This article is the third in a 3-part series on using IRI products to expand functionality and improve performance in Pentaho systems. The series first demonstrates how to improve sorting performance, and then introduces ways to mask production data and create test data in the Pentaho Data Integration (PDI) environment.

Abstract: IRI RowGen generates safe, realistic test data for multiple database and file targets, according to business rules. By calling RowGen jobs from Pentaho, you can supply data with the structure and relationships needed for immediate ETL and BI testing without exposing personally identifiable information.

While Pentaho Data Integration (PDI) has a number of database tools, it does not have the native capability to create safe, intelligent test data. This becomes important when you want to prototype ETL operations, share new views or reports with co-workers, and develop new applications without relying on production data.

IRI RowGen software populates tables and flat files with benign test data for use in Pentaho and other applications. You can use the Shell step in Pentaho to call pre-defined RowGen job (or batch job) scripts.

We’ll begin the example with empty tables to be populated; that is, their definitions already exist. RowGen will rely on that DDL information to generate structurally and referentially correct test data. The Pentaho view of this starting point is shown below:

pentaho.1

The next step is to build the test data using RowGen job scripts automatically created in the IRI Workbench GUI, built on Eclipse™. The GUI’s New DB Test Data job wizard for RowGen will connect to the same tables, parse their DDL, and produce a data generation batch operation that will run in Pentaho’s Shell step:

pentaho.2

While you can certainly add the Shell step to a larger Pentaho project, I’m only showing the steps needed to run the test data generation job. Create the job with a Start step and use the Shell step to reference the RowGen batch file created above:

pentaho-3

pentaho.4
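For readers who want to test the generated batch file outside of Pentaho first, the following is a minimal sketch of what a shell-style step does in general terms: launch an external command, wait for it, and treat a non-zero exit code as failure. It is not Pentaho's implementation and does not use any RowGen-specific options. The function name `run_batch_job` and its parameters are assumptions made for illustration only.

```python
import subprocess
import sys


def run_batch_job(command, workdir=None, timeout=3600):
    """Run an external command and return (exit_code, stdout, stderr).

    `command` is a list such as [path_to_batch_file]. The path is whatever
    batch script your own environment produced; nothing here is specific
    to RowGen or Pentaho (hypothetical helper for illustration).
    """
    result = subprocess.run(
        command,
        cwd=workdir,          # directory the script should run in, if any
        capture_output=True,  # collect stdout/stderr for logging
        text=True,            # decode output as text rather than bytes
        timeout=timeout,      # give up if the job hangs
    )
    return result.returncode, result.stdout, result.stderr


def job_succeeded(exit_code):
    """Assumed convention: exit code 0 means the batch job succeeded."""
    return exit_code == 0
```

To try it, pass the path of your own batch file as the only element of `command` and check `job_succeeded` on the returned exit code before moving on to any downstream step.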

After the Pentaho/RowGen job is executed, you will see your tables populated with the test data. Explore the data source again in Pentaho:

pentaho.5
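Beyond browsing the tables visually, a quick scripted check can confirm that the tables are no longer empty and that child rows point at existing parent rows. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module as a stand-in, because the article does not name the database, and every table or column name you pass in is your own. The helper names `row_counts` and `orphan_count` are hypothetical.

```python
import sqlite3


def quote_ident(name):
    """Quote a table or column name so it can be embedded safely in SQL."""
    return '"' + name.replace('"', '""') + '"'


def row_counts(conn, tables):
    """Return {table_name: number_of_rows} for each table listed."""
    counts = {}
    for table in tables:
        cur = conn.execute(f"SELECT COUNT(*) FROM {quote_ident(table)}")
        counts[table] = cur.fetchone()[0]
    return counts


def orphan_count(conn, child, fk_col, parent, pk_col):
    """Count child rows whose foreign key matches no row in the parent table."""
    fk = quote_ident(fk_col)
    pk = quote_ident(pk_col)
    sql = (
        f"SELECT COUNT(*) FROM {quote_ident(child)} AS c "
        f"LEFT JOIN {quote_ident(parent)} AS p ON c.{fk} = p.{pk} "
        f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]
```

A zero result from `orphan_count` for each parent/child pair, together with non-zero row counts, is one simple indication that the generated data is referentially consistent. With another database, the same queries should work through that database's own Python driver, though identifier quoting rules may differ.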

For questions about the use of RowGen or its callability from third-party applications, email rowgen@iri.com. Also see our previous article on masking production data in Pentaho.


© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact