Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG

SortCL-Compatible Excel Data Processing Examples

  • by Devon Kozenieski

This article is a continuation of the previous article, which introduced IRI support for XLS and XLSX file formats in the SortCL program used for moving and manipulating spreadsheet data in both the IRI Voracity data management (ETL, data quality, etc.) platform, and its included fit-for-purpose products: IRI CoSort, NextForm, FieldShield and RowGen.

These job samples help demonstrate the breadth of what these metadata-compatible applications can now do with data directly to/from Excel sheets in either XLS or XLSX formats.

Sample Job Scripts

Note: To view XLS and XLSX files from within IRI Workbench, simply double click on the file in Project Explorer.

Example #1: Synthesizing Test Data

This script demonstrates the use of an IRI RowGen job script for generating a small amount of random test data to an XLSX file, and an XLS file with a header.

/INFILE=simple.in
    /INCOLLECT=5
    /PROCESS=RANDOM
    /FIELD=(code,POSITION=1,SEPARATOR="\t",SIZE=5,TYPE=ALPHA_DIGIT)
    /FIELD=(value,POSITION=2,SEPARATOR="\t",SIZE=8,TYPE=WHOLE_NUMBER)
    /INCLUDE WHERE value >10

/REPORT

/OUTFILE="'Sheet1'!A1:B5;randomnums.xlsx"
    /PROCESS=XLSX
    /FIELD=(code,POSITION=1,SEPARATOR="\t",TYPE=ALPHA_DIGIT,SDEF="=A")       
    #format value field with a prepended string
    /FIELD=(valuef=format_strings("H2N3%s",value),POSITION=2,SIZE=12,SEPARATOR="\t",TYPE=ASCII,SDEF="=B")
/OUTFILE="'Sheet1'!BB1:BC6,HEADER;randomnums.xls"
    /PROCESS=XLS
    /FIELD=(code,POSITION=1, SEPARATOR=",",SDEF="Code=BB")
    /FIELD=(value,POSITION=2,SEPARATOR=",",SDEF="Value=BC")

XLSX Output:

XLS Output:

Example #2: Synthesizing Random Data with encrypted fields

This hybrid RowGen-FieldSheld job demonstrates random selection of names and email domains from set files and the subsequent masking of those names with format-preserving AES256 encryption with an encrypted passphrase. An email address is generated using the first letter of the first name plus the last name. The output is to a specific range of an XLS sheet.

/INFILES=random_file_placeholder
    /PROCESS=RANDOM
    /INCOLLECT=100
    /FIELD=(FIRSTNAME, TYPE=ALPHA_DIGIT, POSITION=1, SEPARATOR="\t", SET="C:/IRI/cosort100/sets/names/names_first.set" SELECT=ANY)
    /FIELD=(LASTNAME, TYPE=ALPHA_DIGIT, POSITION=2, SEPARATOR="\t", SET="C:/IRI/cosort100/sets/names/names_last.set" SELECT=ANY)
    /FIELD=(EMAIL, TYPE=ALPHA_DIGIT, POSITION=3, SEPARATOR="\t", SET="C:/IRI/cosort100/sets/free_email_domains.set" SELECT=ANY)
    /INREC
    /FIELD=(FIRSTNAME, TYPE=ALPHA_DIGIT, POSITION=1, SEPARATOR="\t")
    /FIELD=(LASTNAME, TYPE=ALPHA_DIGIT, POSITION=2, SEPARATOR="\t")
    /FIELD=(EMAIL, TYPE=ALPHA_DIGIT, POSITION=3, SEPARATOR="\t")
    /FIELD=(FIRSTINITIAL=sub_string(FIRSTNAME,1,1),TYPE=ALPHA_DIGIT, POSITION=4, SEPARATOR="\t")

/REPORT

/OUTFILE="'Sheet1'!B1:D100;PERINFO.xls"
    /PROCESS=XLS
    /FIELD=(FIRSTNAMEENC=enc_fp_aes256_alphanum(FIRSTNAME, "epass:mfvndoTjj8PnGCVCB9pU0Q=="), TYPE=ALPHA_DIGIT, POSITION=1, SEPARATOR="\t", SDEF="=B")
    /FIELD=(LASTNAMEENC=enc_fp_aes256_alphanum(LASTNAME, "epass:mfvndoTjj8PnGCVCB9pU0Q=="), TYPE=ALPHA_DIGIT, POSITION=2, SEPARATOR="\t", SDEF="=C")
    /FIELD=(EMAIL_FORMATTED= format_strings("%s%s@%s",FIRSTINITIAL,LASTNAME,EMAIL), TYPE=ALPHA_DIGIT, POSITION=3, SEPARATOR="\t", SDEF="=D")

Set files:

Output:

Example #3: Move flat, fixed data to a specific section of an Excel sheet

This NextForm data migration script reads a flat file with fixed position field data and maps up to 44 rows to an XLS sheet between the BA and BD column range. The header row of column names is derived from the first half of the SDEF attribute in the /OUTFILE field statement. It also demonstrates that the SDEF argument is not case sensitive.

# Three character columns are not supported by XLS- the maximum column value is IV
# Header row value specified as the first part of the SDEF. After the equals sign is the column letter.
/INFILE=chiefs
    /PROCESS=RECORD
    /FIELD=(name,POSITION=1,SIZE=27)
    /FIELD=(year,POSITION=28,SIZE=12)
    /FIELD=(party,POSITION=40,SIZE=5)
    /FIELD=(state,POSITION=45,SIZE=2)

/REPORT

/OUTFILE=”'Sheet1'!BA1:BD44,HEADER;chiefsAllXLSeasy.xls”
/PROCESS=XLS
/FIELD=(name,POSITION=1,SEPARATOR=”\t”,SDEF="NAME=BA")
    /FIELD=(year,POSITION=2, SEPARATOR=”\t”,SDef="YEAR=BB")
    /FIELD=(party,POSITION=3,SEPARATOR=”\t”,sdef="PARTY=BC")
    /FIELD=(state,POSITION=4,SEPARATOR=”\t”,sDEF="STATE=BD")

Input File:

Output Sheet:

Example #4:  Sort data from specific range of XLS sheet, select only one field to actually be output

# CoSort SortCL Example Job
# Extracts & Sorts Selected Sheet Columns, Outputs to Pipe
/INFILE=”'Sheet1'!B1:E43;chiefsALLXLS.xls”
    /PROCESS=XLS
    # select only two input columns from a dataset that spans more than those columns.
    /FIELD=(year,POSITION=1,SEPARATOR=",",SDEF="=C")
    /FIELD=(party,POSITION=2,SEPARATOR=",",SDEF="=D")

/SORT
    /KEY=(year)

# sort by year field, but only output party field, with a maximum size of 5 characters.
/OUTFILE=stdout
    /PROCESS=RECORD
    /FIELD=(party,POSITION=1,SIZE=5)

Result:

Example #5: Generates Sorted Test Data and Populates Excel in a “vertical” or inverted fashion.

/INFILES=random_file_placeholder
    /PROCESS=RANDOM
    /INCOLLECT=10
    /FIELD=(FIELD1, TYPE=ALPHA_DIGIT, POSITION=1, SIZE=6, SEPARATOR=",")
    /FIELD=(FIELD2, TYPE=ASCII, POSITION=2, SIZE=7, SEPARATOR=",")
    /FIELD=(FIELD3, TYPE=UPPERCASE, POSITION=3, SIZE=5, SEPARATOR=",")

/SORT
    /KEY=(FIELD1)

/OUTFILE=”'Sheet1'!A1:K3,HEADER,V;rgen.xls”
    /PROCESS=XLS
    /FIELD=(FIELD1, TYPE=ALPHA_DIGIT, POSITION=1, SIZE=6, SEPARATOR=",",SDEF="FIELD1=1")
    /FIELD=(FIELD2, TYPE=ASCII, POSITION=2, SIZE=7, SEPARATOR=",",SDEF="FIELD2=2")
    /FIELD=(FIELD3, TYPE=UPPERCASE, POSITION=3, SIZE=5, SEPARATOR=",",SDEF="FIELD3=3")

Result:

Example #6:  Takes PII from a text tab separated file, writes an encrypted/pseudonymized version to one .xlsx file, and an unmodified version to another .xlsx file. The epass, or encrypted password, obscures the clear text encryption passphrase in the script.

/INFILE=personal_info
    /FIELD=(credit_card,POSITION=1,SEPARATOR='\t')
    /FIELD=(driv_lic,POSITION=2,SEPARATOR='\t')
    /FIELD=(name,POSITION=3,SEPARATOR='\t')

/REPORT

/OUTFILE=”'Sheet1'!MMM1:MMO10;personal_info_encrypted.xlsx”
    /PROCESS=XLSX
    /FIELD=(credit_card1=enc_fp_aes256_alphanum(credit_card,"epass:Gg87jCTR15Jro4AGE44ENw=="),POSITION=1,SEPARATOR='\t',SDEF="=MMM")
    /FIELD=(driv_lic1=enc_fp_aes256_alphanum(driv_lic,"epass:Gg87jCTR15Jro4AGE44ENw=="),POSITION=2,SEPARATOR='\t',SDEF="=MMN")
    /FIELD=(name,POSITION=3,SEPARATOR='\t',SDEF="=MMO",SET=C:\IRI\cosort100\sets\names\names_first_last.set)

/OUTFILE=”'Sheet1'!M1:O10;personal_info_unencrypted.xlsx”
    /PROCESS=XLSX
    /FIELD=(credit_card,POSITION=1,SEPARATOR='\t',SDEF="=M")
    /FIELD=(driv_lic,POSITION=2,SEPARATOR='\t',SDEF="=N")
    /FIELD=(name,POSITION=3,SEPARATOR='\t',SDEF="=O")

Input data (tab-separated text file):

9654-4338-8732-8128    W389-324-33-473-Q    Jessica Steffani
2312-7218-4829-0111    H583-832-87-178-P    Cody Blagg
8940-8391-9147-8291    E372-273-92-893-G    Jacob Blagg
6438-8932-2284-6262    L556-731-91-842-J    Just Rushlo
8291-7381-8291-7489    G803-389-53-934-J    Maria Sheldon
7828-8391-7737-0822    K991-892-02-578-O    Keenan Ross
7834-5445-7823-7843    F894-895-10-215-N    Francesca Leonie
8383-9745-1230-4820    M352-811-49-765-N    Nadia Elyse
3129-3648-3589-0848    S891-915-48-653-E    Gordon Cade
0583-7290-7492-8375    Z538-482-61-543-M    Hanna Fay

Output:

Encrypted/Pseudonymized:

Unencrypted:

See this article to compare this FieldShield-specific approach with other IRI tools for masking data in Excel.

Example #7: This IRI RowGen example demonstrates the synthesis of 5,000 rows of test data with a header, targeting both XLS and XLSX files based on random value lookups into static inline sets and set files.

/INFILES=random_file_placeholder
    /PROCESS=RANDOM
    /INCOLLECT=5000
    /FIELD=(FIELD1, TYPE=ASCII, POSITION=1, SEPARATOR="\t",SET={Dolphin,Fish,Sloth})
    /FIELD=(FIELD2, TYPE=ASCII, POSITION=2, SEPARATOR="\t",SET=”moreAnimals.txt”)
    /FIELD=(FIELD3, TYPE=ASCII, POSITION=3, SEPARATOR="\t",SET={Red,Orange,Yellow,Blue})

/REPORT

/OUTFILE=”'Sheet1'!A1:C5001,H;rgen.xlsx”
    /PROCESS=XLSX
    /FIELD=(FIELD1, TYPE=ASCII, POSITION=1, FRAME='"',SEPARATOR="\t",SDEF="Animal=A")
    /FIELD=(FIELD2, TYPE=ASCII, POSITION=2, FRAME='"',SEPARATOR="\t",SDEF="Another Animal=B")
    /FIELD=(FIELD3, TYPE=ASCII, POSITION=3, FRAME='"',SEPARATOR="\t",SDEF="Color=C")

/OUTFILE=”'Sheet1'!A1:C5001,H;rgen.xls”
    /PROCESS=XLS
    /FIELD=(FIELD1, TYPE=ASCII, POSITION=1, FRAME='"',SEPARATOR="\t",SDEF="Animal=A")
    /FIELD=(FIELD2, TYPE=ASCII, POSITION=2, FRAME='"',SEPARATOR="\t",SDEF="Another Animal=B")
    /FIELD=(FIELD3, TYPE=ASCII, POSITION=3, FRAME='"',SEPARATOR="\t",SDEF="Color=C")

Example #8: Unicode Example with no file line arguments and no SDEFs

This IRI CoSort job script demonstrates sorting Unicode data in a flat file and outputting it into an XLSX sheet with no arguments in the file line and no SDEFs:

/INFILE=chiefs10utf8.txt
    /PROCESS=DELIMITED
    /FIELD=(president,POSITION=1,SEPARATOR="|",type=UTF8)
    /FIELD=(term,POSITION=2,SEPARATOR="|",type=UTF8)
    /FIELD=(party,POSITION=3,SEPARATOR="|",type=UTF8)
    /FIELD=(state,POSITION=4,SEPARATOR="|",type=UTF8)

/SORT
    /KEY=state

/OUTFILE=chiefs10XLSeasyunicode.xlsx
    /PROCESS=XLSX
    /FIELD=(president,POSITION=1,SEPARATOR="|",type=UTF8)
    /FIELD=(term,POSITION=2,SEPARATOR="|",type=UTF8)
    /FIELD=(party,POSITION=3,SEPARATOR="|",type=UTF8)
    /FIELD=(state,POSITION=4,SEPARATOR="|",type=UTF8)

Input:

Output:

Example #9: Sorted US Presidents in Arabic

This script sorts the Arabic translation of U.S. president names and outputs to a specific column in an XLSX spreadsheet, with an English header of “NAME”.

/INFILE=chiefs_arabic
/PROCESS=RECORD
    /FIELD=(name,POSITION=1,SIZE=100,TYPE=UTF8)

/SORT
    /KEY=name

/OUTFILE=”'Sheet1'!BA1:BD45,HEADER;chiefsAllXLSeasy.xlsx”
    /PROCESS=XLSX
    /FIELD=(name,POSITION=1,SIZE=100,SEPARATOR="\t",SDEF="NAME=BA",TYPE=UTF8)

Input data:

Output sample:

Example #10: Shift data between different sheet names and ranges in XLSX and XLS, filter one field

/INFILE="'data'!WZX1092:XAD2092;dfdfdf.xlsx"
    /PROCESS=XLSX
    /FIELD=(SSN, POSITION=1, SEPARATOR='\t',SDEF="=WZX")
    /FIELD=(FIRST_NAME, POSITION=2, SEPARATOR='\t',SDEF="=WZY")
    /FIELD=(LAST_NAME, POSITION=3, SEPARATOR='\t',SDEF="=WZZ")
    /FIELD=(EMAIL, POSITION=4, SEPARATOR='\t',SDEF="=XAA")
    /FIELD=(GENDER, POSITION=5, SEPARATOR='\t',SDEF="=XAB")
    /FIELD=(IP_ADDRESS, POSITION=6, SEPARATOR='\t',SDEF="=XAC")
    /FIELD=(URL, POSITION=7, SEPARATOR='\t',SDEF="=XAD")
/INCLUDE WHERE GENDER eq "Male"

/REPORT

/OUTFILE="'Verylongsheetname'!EX1092:FD2092;outr.xls"
    /PROCESS=XLS
    /FIELD=(SSN, POSITION=1, SEPARATOR='\t',SDEF="=EX")
    # /FIELD=(FIRST_NAME, POSITION=2, SEPARATOR='\t',SDEF="=EY")
    # don’t include commented field in output
    /FIELD=(LAST_NAME, POSITION=3, SEPARATOR='\t',SDEF="=EZ")
    /FIELD=(EMAIL, POSITION=4, SEPARATOR='\t',SDEF="=FA")
    /FIELD=(GENDER, POSITION=5, SEPARATOR='\t',SDEF="=FB")
    /FIELD=(IP_ADDRESS, POSITION=6, SEPARATOR='\t',SDEF="=FC")
    /FIELD=(URL, POSITION=7, SEPARATOR='\t',SDEF="=FD")

Resulting Output:

Example # 11: This script demonstrates writing additional data to the same XLSX file as the data was read from, with an added header in the appended report. NOTE: This functionality is only available with the XLSX format and not the XLS format.

# write a report to the same file as data was read from
/INFILE=test.xlsx
    /PROCESS=XLSX
    /FIELD=(FIELD1, POSITION=1, SEPARATOR='\t',SDEF="=A")
    /FIELD=(FIELD2, POSITION=2, SEPARATOR='\t',SDEF="=B")
    /FIELD=(FIELD3, POSITION=3, SEPARATOR='\t',SDEF="=C")
    /FIELD=(FIELD4, POSITION=4, SEPARATOR='\t',SDEF="=D")
    /FIELD=(FIELD5, POSITION=5, SEPARATOR='\t',SDEF="=E")
/INCLUDE WHERE FIELD3 eq "December"

/REPORT

/OUTFILE=”G1:K351,HEADER;test.xlsx”
    /APPEND
    /PROCESS=XLSX
    /FIELD=(FIELD1, POSITION=1, SEPARATOR='\t',SDEF="Day=G")
    /FIELD=(FIELD2, POSITION=2, SEPARATOR='\t',SDEF="SSN=H")
    /FIELD=(FIELD3, POSITION=3, SEPARATOR='\t',SDEF="Month (December Only)=I")
    /FIELD=(FIELD4, POSITION=4, SEPARATOR='\t',SDEF="Integer=J")
    /FIELD=(FIELD5, POSITION=5, SEPARATOR='\t',SDEF="Decimal=K")

Input:

Output:

These examples demonstrate just an inkling of what is now possible against XLS and XLSX sources and targets using SortCL. In the next article, we will examine the xls2ddf utility, an executable that automatically builds SortCL data definition file (DDF) metadata files (field layouts) from existing spreadsheet headers.

Processing Data in, and for, Excel Spreadsheets
XLS2DDF: IRI’s Spreadsheet Metadata Conversion Utility
ETL Excel IRI CoSort IRI FieldShield IRI NextForm IRI RowGen IRI SortCL Microsoft Excel test data xls xlsx

Related articles

DarkShield PII Discovery & Masking…
Masking Flat Files in the…
Directory Data Class Search Wizard
Masking PII in a Relational…
IRI Data Class Map
Schema Data Class Search
Training NER Models in IRI…
Masking NoSQL DB PII in…
Masking RDB Data in the…
IRI DarkShield-NoSQL RPC API
Find & Mask File PII…

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact