Challenges
Complex computations, including multi-table joins, pseudonymization (substitution), and analytic output, can be expensive and slow. Where a simple lookup can replace a runtime computation, the performance gain can be significant, because retrieving a precomputed value from memory is usually faster than computing that value again.
Lookups typically occur in ETL tools and databases. Lookup transformations there must be specially configured, and extra steps may be required for them to perform well at high data volumes.
Meanwhile, analogous functionality has not been available in the file system, where more speed, more resources, and the ability to run other operations at the same time may be available.
Solutions
Multi-dimensional lookup table functionality is now available in external file environments. The IRI CoSort (data transformation and reporting) and IRI FieldShield (data masking) software products, and the IRI Voracity data management platform that includes them, all support lookup transforms in the pre-action and target layout definition phases of their jobs.
Substitutions are made from column values in a database, or from values in a delimited external (set) file. This approach bypasses database tuning and integrity issues to deliver:
- discrete solutions (a value query method)
- a pseudonymization method for data security
- an alternative to joins across many tables
By running lookups in the file system through set files and an explicit job script, you can spend less time preparing for, and getting results from, your data substitution sources.
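The set-file idea can be illustrated outside of any IRI product with a short Python sketch. It is not SortCL syntax and does not reflect how CoSort or FieldShield implement lookups; the two-column tab-delimited layout, the sample names, and the helper names `load_set_file` and `pseudonymize` are all assumptions made for illustration.

```python
import csv
import io

def load_set_file(handle, delimiter="\t"):
    """Read a two-column delimited set file into an in-memory lookup table."""
    table = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) >= 2:              # skip blank or malformed lines
            table[row[0]] = row[1]     # original value -> replacement value
    return table

def pseudonymize(value, table, default="UNKNOWN"):
    """Substitute a value through the lookup table, with a fallback default."""
    return table.get(value, default)

# Hypothetical set-file contents; a real job would open a file on disk instead.
SET_FILE = "Alice Smith\tPerson_0001\nBob Jones\tPerson_0002\n"
names = load_set_file(io.StringIO(SET_FILE))
```

Once the table is loaded, each substitution is a single dictionary access, for example `pseudonymize("Alice Smith", names)`, and values missing from the set file fall back to the default.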
In addition, you can:
- combine lookups with other processes in the same I/O pass
- process resulting values in the same job (SortCL script)
- immediately encrypt or otherwise protect the lookup results
- simultaneously format the lookup results in custom reports
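As a rough illustration of combining these steps in one pass, the Python sketch below looks up a value, derives a second value from it, protects the result, and formats a report line, all inside a single loop over the input. It is not a SortCL script. The lookup table, the record layout, and the function names are hypothetical, and the truncated SHA-256 digest is only a stand-in for whatever encryption or masking function a real job would apply.

```python
import hashlib

# Hypothetical lookup table: customer ID -> email address.
EMAIL_BY_ID = {"C001": "alice@example.com"}

def protect(value):
    """Stand-in for field protection: a truncated SHA-256 digest (not encryption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def build_report(records, lookup, default="unknown@example.invalid"):
    """Look up, process, protect, and format each record in a single pass."""
    lines = []
    for cust_id, amount in records:
        email = lookup.get(cust_id, default)   # lookup
        domain = email.split("@")[-1]          # process the looked-up value
        token = protect(email)                 # protect the lookup result
        # Fixed-width report line: ID, protected email, domain, amount.
        lines.append(f"{cust_id:<6} {token} {domain:<16} {amount:>8.2f}")
    return lines

RECORDS = [("C001", 120.5), ("C002", 75.0)]
report = build_report(RECORDS, EMAIL_BY_ID)
```

Because every step runs inside the same loop, the input is read only once and the unprotected email never reaches the report.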