SortCL Functionality

 


What Can SortCL Do?


The Sort Control Language (SortCL) program in the IRI CoSort product or IRI Voracity platform accepts multiple inputs, including:

  • sequential (delimited or fixed-position), COBOL index, and semi-structured (flat JSON/XML) files
  • pipes
  • relational (and some NoSQL) database tables (collections) via ODBC
  • URLs for static and streaming sources, including S3/GCP/AzureBlob, HTTP/S, FTP/S, HDFS, MongoDB, Kafka, and MQTT
  • user procedures

in multiple formats. It processes them in many ways and produces one or more targets in multiple formats, as well as customized reports, all at once. See the table below and the diagram of SortCL in the context of CoSort, or the data integration, migration, governance, and analytic portions of the diagram showing the broader context of Voracity.

Specifically, SortCL can, in one job script and I/O pass, rapidly perform and combine data transformation, conversion, protection, reporting, and related processes:

Function: Actions

  • At the byte, field, and record level, plus duplicate removal and saving
  • Conditional (include/omit) selection with if-then-else, else-if logic
  • Multiple keys, directions, sequences
  • Two or more pre-sorted files
  • Two or more un/sorted sources on many conditions for ETL, file compares, and change data capture (delta reporting) ops
  • Parallel roll-up and drill-down sum, min, max, average, and count values; accumulate (running); rank; lead and lag (sliding value windows)
  • Check: Verify source data is pre-sorted prior to sort or join operations
  • Resize, reposition, and realign fields
  • Change data types (e.g., EBCDIC<>ASCII, Packed<>Numeric)
  • Convert between file formats (e.g., Text<>XML<>VS<>RS<>ISAM<>Vision<>LDIF<>CSV<>JSON)
  • De-normalize and normalize dimensional layouts
  • De-duplicate, validate, homogenize, filter, find/replace, and re-structure
  • Integrate and segment data; enhance row and column detail; create new data forms and layouts through conversions, calculations and expressions, and composites (templates)
  • Via remapping and replication of columns and tables
  • Math and trig functions across detail and summary rows, plus internal and external stats functions
  • Bit-level manipulations and Perl-compatible regular expression logic for pattern matching, etc.
  • Check that character and field attributes match their specifications (i.e., "iscompares", gap analysis)
  • Sequence: For custom indexing, reporting, and database load operations, plus UUID/GUID value insertion
  • Discrete field substitutions, pseudonymization, etc., using "set" file field dimensions
  • For slowly changing dimension (SCD) reporting and data quality
  • Get discrete (lookup) values and virtualize results in reports and replicas
  • Mask (Protect): Encrypt and mask data at the field level and audit data security measures; also anonymization, de-identification, filtering, and pseudonymization
  • Mask (Format): Numeric, date, and string layout masking to create or replace new value formats
  • Lookup: Discrete or random draws from set files for use in ETL lookup transforms, pseudonymization, and test data generation
  • Create randomly-generated or set-selected (safe) test data files (see RowGen)
  • Custom-formatted, segmented detail, and summary targets
  • Copy, manipulate, and move data from one or more sources to one or more targets
  • Complex field-level user functions (e.g., 3rd-party DQ libraries)
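
To make the idea of combining several of these functions in one job more concrete, here is a minimal sketch in plain Python. It is not SortCL syntax and does not reflect how SortCL is implemented; the sample rows, the `single_pass` function, and the field layout (region, customer, amount) are all hypothetical. It only illustrates selection, a multi-key sort, a running total, and a roll-up summary being produced together as a detail target and a summary target.

```python
from collections import defaultdict

# Hypothetical input rows: (region, customer, amount)
ROWS = [
    ("west", "acme", 120.0),
    ("east", "zenith", 75.5),
    ("west", "birch", 30.0),
    ("east", "acme", 0.0),
    ("west", "acme", 50.0),
]

def single_pass(rows):
    """Filter, sort, and build a detail target and a summary target together."""
    # Selection: omit rows with a zero amount (an include/omit condition).
    kept = [r for r in rows if r[2] > 0]
    # Sort on two keys: region ascending, then amount descending.
    detail = sorted(kept, key=lambda r: (r[0], -r[2]))
    # Roll-up per region (sum and count) plus a running total on detail rows.
    summary = defaultdict(lambda: {"sum": 0.0, "count": 0})
    running = []
    total = 0.0
    for region, customer, amount in detail:
        summary[region]["sum"] += amount
        summary[region]["count"] += 1
        total += amount
        running.append((region, customer, amount, total))
    return running, dict(summary)

detail_target, summary_target = single_pass(ROWS)
```

In this sketch the detail target carries the accumulated (running) amount as a fourth column, while the summary target holds one roll-up entry per region.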

Beyond data staging, manipulation, and migration, use SortCL to report on changed data (inserts, updates, deletes), slowly changing dimensions, and trend line intersection.
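
As a rough illustration of what a changed-data (delta) report contains, the following Python sketch compares two keyed snapshots and classifies each record as an insert, update, or delete. It is a generic example, not SortCL syntax; the `diff_snapshots` function and the sample records are assumptions made for illustration only.

```python
def diff_snapshots(old, new):
    """Classify keyed records as inserts, updates, or deletes."""
    # Keys present only in the new snapshot are inserts.
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    # Keys present only in the old snapshot are deletes.
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    # Keys in both snapshots whose values differ are updates (old, new).
    updates = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return inserts, updates, deletes

# Hypothetical snapshots keyed by customer ID: (name, city)
OLD = {101: ("Ann", "Boston"), 102: ("Raj", "Austin"), 103: ("Mei", "Denver")}
NEW = {101: ("Ann", "Boston"), 102: ("Raj", "Dallas"), 104: ("Omar", "Miami")}

inserts, updates, deletes = diff_snapshots(OLD, NEW)
```

Here record 104 is reported as an insert, record 103 as a delete, and record 102 as an update carrying both its old and new values; unchanged record 101 appears in none of the three results.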

Additional SortCL features support: metadata and master data management, clickstream analytics (data webhousing), real-time and near-real-time processing, customer data integration and segmentation, data wrangling (data preparation for BI and analytics), and multiple data governance objectives.
