Using Sonra Flexter to Process Complex XML Files in…

by Uli Bethke

Sonra recently demonstrated the processing of complex XML data in the IRI Voracity data management platform with the help of Sonra’s Flexter Data Liberator software. Flexter and Voracity are a match made in heaven. Flexter is great at converting complex files like XML into structured formats. IRI Voracity, through its CoSort or Hadoop engines, excels at data transformation and data masking on large, structured datasets.

In this blog post, I will show you how my POC combined these technologies to break complex XML data into constituent CSV files, and then mask personally identifiable information (PII) in them, such as email addresses.

Create Target Schema and CSVs from XML

As a first step, we use Flexter to transform the XML data into CSV files. Flexter can work with arbitrarily complex XML schemas. For this example, I selected a particularly complex XML schema, the NDC standard from IATA. NDC stands for New Distribution Capability, and is a widely used standard in the aviation industry.

The schema contains hundreds of elements and is made up of X interconnected and embedded XSD files. It covers most business processes (Shopping, Order Management, Airline Profiles, etc.) in the aviation industry for the purpose of data exchange. You can download the NDC schema from the NDC schema page.

Below are some of the XSD files that are part of the NDC standard:
[Image: XSD files]

Each of the above XSD files references the core schema edist_commontype.xsd, which contains the bulk of the schema. One of the elements in this schema is EmailContact. This is the field we will protect using some of the data masking functions available to users of the IRI FieldShield product or the IRI Voracity platform.

<xsd:element name="EmailContact" type="EmailType">
    <xsd:annotation>
        <xsd:documentation source="description" xml:lang="en">Email address details, including application (I.e. home, business, etc.).</xsd:documentation>
    </xsd:annotation>
</xsd:element>
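
In a schema with hundreds of elements, it can help to scan the XSD for declarations whose names hint at PII. The following is a minimal sketch using Python's standard xml.etree.ElementTree on the EmailContact declaration above. The find_elements helper and its keyword list are hypothetical, and the snippet is wrapped in a schema root only so that it parses on its own.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# The EmailContact declaration, wrapped in a schema root (an assumption
# made here so the fragment is a complete, parseable document).
SNIPPET = f"""
<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="EmailContact" type="EmailType">
    <xsd:annotation>
      <xsd:documentation source="description" xml:lang="en">Email address details, including application (I.e. home, business, etc.).</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
"""


def find_elements(xsd_text, keywords=("email",)):
    """Return (name, type, documentation) for element declarations
    whose name contains one of the keywords (case-insensitive)."""
    root = ET.fromstring(xsd_text)
    hits = []
    for el in root.iter(f"{{{XSD_NS}}}element"):
        name = el.get("name", "")
        if any(k in name.lower() for k in keywords):
            doc = el.find(f"{{{XSD_NS}}}annotation/{{{XSD_NS}}}documentation")
            hits.append((name, el.get("type"), doc.text if doc is not None else None))
    return hits


hits = find_elements(SNIPPET)
for name, type_name, doc in hits:
    print(name, type_name, doc)
```

A name-based scan like this only surfaces candidates; whether a given element actually carries PII still needs a human decision.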

We convert this schema and our XML files with Flexter by running the flexter_in_out.sh script. We pass in the name of the XSD and the folder where we have stored our XML files. This creates a relational target schema and the corresponding CSV files for each table.

./flexter_in_out.sh in/xml/ in/OrderCreateRQ.xsd

The output is the target schema:
[Image: Target Schema]

… and the CSV files:
[Image: CSV files]

An extract from OrderCreateRQ.csv:
[Image: Extract from OrderCreateRQ.csv]
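
To make the idea of a relational target schema more concrete, here is a minimal sketch of shredding nested XML into one table per element type, with generated keys linking child rows to their parents. This is not how Flexter works internally. The sample XML, the shred and to_csv functions, and the PK / FK_parent column names are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, far simpler than real NDC messages.
SAMPLE = """
<Order id="A1">
  <Passenger><Name>Ann</Name><Email>ann@example.com</Email></Passenger>
  <Passenger><Name>Bob</Name><Email>bob@example.com</Email></Passenger>
</Order>
"""


def shred(root):
    """Build one table (list of row dicts) per element tag.
    Leaf children become columns; nested children become rows in
    their own table, pointing back to the parent via FK_parent."""
    tables = defaultdict(list)
    counters = defaultdict(int)

    def visit(el, parent_key):
        counters[el.tag] += 1
        pk = counters[el.tag]
        row = {"PK": pk, "FK_parent": parent_key}
        row.update(el.attrib)
        for child in el:
            if len(child) == 0:  # leaf element: store its text as a column
                row[child.tag] = (child.text or "").strip()
            else:  # nested element: recurse into its own table
                visit(child, pk)
        tables[el.tag].append(row)

    visit(root, None)
    return tables


def to_csv(rows):
    """Serialize row dicts to CSV text, using the union of all keys as header."""
    fields = list(dict.fromkeys(k for r in rows for k in r))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


tables = shred(ET.fromstring(SAMPLE))
for name, rows in tables.items():
    print(f"--- {name}.csv ---")
    print(to_csv(rows))
```

The sketch ignores namespaces, mixed content, and schema information, all of which a real conversion of the NDC standard has to deal with.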

Masking and Encrypting the Email Addresses

We have seen Flexter in action. So far, so good. Let’s hand the results over to Voracity. We feed the CSV output from Flexter into IRI’s csv2ddf utility, which runs in the IRI Workbench GUI for Voracity (built on Eclipse™) or on the command line. Either way, it parses the CSV files and creates a data definition file (DDF) for each of our CSV files.

[Image: DDF]
Below is a screenshot of the generated DDF files:
[Image: DDF generated]

Each DDF file contains a description of the fields for each input file. The DDF for OriginDestination1.csv, for example, is shown here:
[Image: OriginDestination1]
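
The idea of deriving field descriptions from a CSV file can be sketched in a few lines: read the header row, then record each field's name, position, and a rough type guess. This is only an illustration and not the csv2ddf utility. The describe_fields function and the sample column names are hypothetical, and the output is plain Python data, not real DDF syntax.

```python
import csv
import io

# Hypothetical sample; the real OriginDestination1.csv columns may differ.
SAMPLE_CSV = "PK_OriginDestination,DepartureCode,ArrivalCode\n1,DUB,LHR\n2,LHR,JFK\n"


def is_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False


def describe_fields(csv_text, delimiter=","):
    """Return one description dict per column: name, 1-based position,
    separator, and a naive type guess based on the data rows."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    fields = []
    for position, name in enumerate(header, start=1):
        values = [r[position - 1] for r in data if len(r) >= position]
        numeric = bool(values) and all(is_number(v) for v in values)
        fields.append({
            "name": name,
            "position": position,
            "separator": delimiter,
            "type": "numeric" if numeric else "text",
        })
    return fields


fields = describe_fields(SAMPLE_CSV)
for f in fields:
    print(f)
```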
An IRI job script, written in the SortCL language, references that DDF as it specifies a sort on one field of the input file and the masking of four fields on output:
[Image: IRI job script]
[Image: Output File Results]

Shown above are the output file results of the sorting and data masking job.
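
For readers without the screenshots, the logic of such a job (sort on one field, redact others on output) can be approximated in plain Python. This is a rough sketch and not SortCL. The column names, the mask_email rule (keep the first character of the local part and the domain), and the sort_and_mask function are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a Flexter-generated CSV file.
SAMPLE = "Name,Email\nBob,bob@example.com\nAnn,ann@example.org\n"


def mask_email(value, keep=1):
    """Redact the local part of an address, keeping `keep` leading characters.
    Values without an '@' are redacted entirely."""
    local, sep, domain = value.partition("@")
    if not sep:
        return "*" * len(value)
    return local[:keep] + "*" * (len(local) - keep) + "@" + domain


def sort_and_mask(csv_text, sort_key, mask_fields):
    """Sort rows by one field, then mask the listed fields in each row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: r[sort_key])
    for row in rows:
        for field in mask_fields:
            row[field] = mask_email(row[field])
    return rows


result = sort_and_mask(SAMPLE, sort_key="Name", mask_fields=["Email"])
for row in result:
    print(row["Name"], row["Email"])
```

Redaction like this is one-way: the original address cannot be recovered from the masked value, which is why reversible use cases call for encryption instead.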

In addition to sorting and redaction, and a host of other data transformations, you can also use CoSort’s SortCL or FieldShield executables to encrypt and decrypt data in CSV files and other sources. Encryption is particularly useful for obfuscating data that needs to be restored at some later point.

[Image: specpassenger]

Integration with the IRI Workbench GUI

Of course, we can also import our Flexter-generated CSV files into IRI Workbench, the graphical IDE for all IRI software, built on Eclipse. To do that, we select File > Import in the IRI Workbench menu. In the Import window, we select File System and then the folder with the data we generated in the earlier steps.
[Image: Import select]

Select script, output, and metadata folders, and also select the folder that contains your project.
[Image: Import file system]

Now that we have the files in the Workbench, we can apply additional transformations to the data (e.g., additional protection rules) from the comfort of a free and familiar graphical user interface. See this page for more information on the job design options available to Workbench users.
[Image: New field rule wizard]

What’s next?

The above tutorial shows how a powerful data management platform such as IRI Voracity can work well in tandem with Flexter Data Liberator.

Obviously, we could make the whole process more seamless by integrating Flexter directly into the IRI Workbench GUI for Voracity, e.g., via an Eclipse plugin. Both IRI Voracity and Flexter also have advanced REST APIs that could integrate the two tools even further. This post should have given you an overview of what is possible at this juncture.
