Challenges
You need to remove or save duplicate records, and include or exclude records based on business rules for a variety of reasons, including:
- database updates, including change data capture and slowly changing dimension reports
- business intelligence, including customer data integration and segmentation
- database subsetting, for masking production data
- data migration, replication, and federation
- bulk data reduction, data forensics, and data quality initiatives
SQL select statements are great for "normal" database table sizes but, in volume (and depending on key relationships), can take a long time. They are also database-specific, and can hamper bulk unload and query operations when done for data subsetting.
Filtering beyond one database can be difficult, particularly in the context of heterogeneous data integration. Complex ETL work or custom code may be required. Thus, both specification and execution may be inefficient.
Solutions
The SortCL program in IRI CoSort (and IRI Voracity) offers several horizontal (record/row-level) and vertical (field/column-level) selection capabilities for any specified set of structured data sources. CoSort/SortCL spin-off products -- including IRI FieldShield (for data masking), IRI NextForm (for data migration and replication), and IRI RowGen (for test data generation) -- all have the same row and column selection (filtering) capabilities.
Manage the size, number, and flow of records before, during, and after data transformation. Eliminate, reduce, or hand-off of records for fast, SQL-oriented data filtering during input and output. Bulk data reduction increases efficiency for SortCL transformations, reports, loads, and other downstream processes.
Use condition logic (e.g., if-then-else expressions) to include, omit, and reformat records during transformation, reporting, and/or protection. Filter records through de-duplication and byte, header, and record-specific commands. For example, specify which value ranges are valid and output only records outside those values.
Validate character forms and perform other data integrity checks to reject or isolate bad records before loading them into a database.
Base specifications on your business logic, and save them in text scripts managed in Eclipse for easy modification and re-use.