Big Data Transformations with CoSort (Structured Data)

  • by David Friedland

In 1992, Digital Equipment Corporation (DEC, long since acquired) asked IRI to develop a 4GL interface to CoSort in the syntax of the VAX VMS sort/merge utility. The result was the now widely adopted Sort Control Language (SortCL) program, which is used to define data layouts and manipulations that go well beyond sort/merge.

SortCL now handles everything from data transformation and reporting to data migration and protection, and is the core of multiple spin-off products and a metadata infrastructure modeled and managed in the IRI Workbench GUI, built on Eclipse™.

In 1999, Database Trends Magazine studied the data transformation functions then in SortCL and labeled CoSort “The ETL Engine” in an edition dedicated to data warehousing. Indeed, since the mid-1990s, hundreds of DW architects and thousands of EDW, ODS and database users around the world have deployed SortCL scripts directly, or within applications they use, to transform massive amounts of sequential data with built-in functions they can run alone or in combination, such as:

  • Sort/Merge
  • Match/Join
  • Select/Filter
  • Aggregate
  • Find/Replace
  • PCRE
  • Lookup
  • Pivot
  • Rank
  • Scrub/Cleanse
  • Remap/Reformat
  • Substring
  • Convert
  • Validate

In addition to the price-performance advantages made possible with the underlying CoSort engine and its

  • linearly scaling, multi-threaded, co-routine sorting algorithm
  • sophisticated memory management and good neighbor I/O
  • same-script/same-pass marriage of sorting to joins and aggregations
  • thread-safe APIs, and custom input, compare, output, and field functions,
SortCL also delivers on the promise of open systems by being:
  • cross-platform, by running on every flavor of Unix and Windows with the same scripts
  • self-documenting, via a language familiar to both mainframe and SQL users
  • easily invoked, and widely interconnected to third-party applications
  • interchangeable, through scripts you can easily convert to and from.
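
The "same-pass" pairing of sorting with aggregation mentioned in the list above can be pictured with a small analogy. The sketch below is plain Python, not SortCL syntax, and does not reflect how the CoSort engine is implemented; the sample data, the field names (`region`, `amount`), and the function name `sort_and_aggregate` are all hypothetical. It only illustrates the idea that once rows are ordered on a key, group totals can be produced while streaming through the sorted rows rather than in a separate job.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data standing in for a flat-file extract.
SAMPLE = """region,amount
west,10
east,5
west,7
east,1
north,3
"""


def sort_and_aggregate(text):
    """Sort rows by region, then total each region while walking the sorted rows once."""
    rows = csv.DictReader(io.StringIO(text))
    ordered = sorted(rows, key=itemgetter("region"))
    totals = []
    # groupby relies on the rows already being ordered on the same key.
    for region, group in groupby(ordered, key=itemgetter("region")):
        totals.append((region, sum(int(r["amount"]) for r in group)))
    return totals


if __name__ == "__main__":
    for region, total in sort_and_aggregate(SAMPLE):
        print(region, total)
```

Because the aggregation consumes the sorted stream directly, no intermediate sorted file has to be written and re-read, which is the general benefit the bullet describes.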

IRI’s sweet spot in the market remains the integration and staging of huge flat files, including bulk database extracts (e.g. from IRI FACT operations), mainframe datasets, web and IoT device logs, spreadsheet and application exports, PoS server and telco switch (CDR) feeds, COBOL and shell programs, and so on. With CoSort (SortCL) running in IRI Voracity workflows that include FACT extraction (E) and table creation and bulk load (L) steps, end-to-end ETL jobs are built and run quickly in Eclipse or on the command line.

In Voracity, most SortCL jobs can run either in the default CoSort engine, or seamlessly in Hadoop MapReduce, Spark, Spark Stream, Storm, or Tez. Either option provides an extremely high-speed, simple, and low-cost approach without changing code.

More advanced users can write custom detail and summary reports and protect data at the field level in the same SortCL job script and I/O pass as their transforms. Data in HDFS, unstructured sources, or otherwise non-sequential/non-relational formats can pass through drivers, or through memory via custom input procedures that structure and feed that data to CoSort (or Hadoop) for fast transformations and hand-offs to DB loads, data marts, visualization tools, etc.
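
To make the idea of a custom input procedure more concrete, here is a minimal sketch in Python. It is not the CoSort API; the log layout (`<host> <status> <bytes>`), the `Record` type, and the function names `structured_input` and `largest_first` are assumptions made up for illustration. It shows the general pattern only: a small routine turns loosely formatted lines into fixed-layout records in memory and hands them straight to a sort, with no intermediate file.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Record(NamedTuple):
    host: str
    status: int
    size: int


# Hypothetical layout: "<host> <3-digit status> <bytes>"; anything else is skipped.
LINE = re.compile(r"^(?P<host>\S+)\s+(?P<status>\d{3})\s+(?P<size>\d+)\s*$")


def structured_input(lines: Iterable[str]) -> Iterator[Record]:
    """Yield fixed-layout records from raw lines, skipping lines that do not match."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield Record(m["host"], int(m["status"]), int(m["size"]))


def largest_first(lines: Iterable[str]) -> List[Record]:
    """Feed the structured records directly into a sort on size, descending."""
    return sorted(structured_input(lines), key=lambda r: r.size, reverse=True)


if __name__ == "__main__":
    raw = ["a.example 200 512", "not a log line", "b.example 404 2048"]
    for rec in largest_first(raw):
        print(rec.host, rec.status, rec.size)
```

The generator keeps the structuring step and the sort decoupled, so the same parser could feed a filter, a join, or a load step instead.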

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact