Challenges
DBAs and data architects trying to populate test databases often rely on production data that is unsafe to expose or inadequately protected. Testing with production databases is not a good strategy for several reasons:
- The database contains personally identifiable information (PII) subject to data privacy laws
- Access to the database itself may be restricted because of the PII or other information within
- The database may not exist yet
- Current values in the database do not reflect the future scope of data that must be tested
- Manual subsetting and masking are time-consuming and may leave the data unrealistic or still non-compliant
DBAs and application developers require test data that has the characteristic content, format, and relationships of future production data. This is the only way to verify that each application step and query will still work.
Developers require data that is secure (i.e., does not violate privacy rules) and that they can obtain without depending on other developers or phases of development. Development may also need to be outsourced, and to occur in different locations concurrently.
Solutions
IRI RowGen software removes the requirement of using production data in testing by generating production-class test data instead. RowGen uses production metadata, not production data, to parse, generate, and populate target tables with structurally and referentially correct test data. You can automate this process in a fit-for-purpose wizard, and leverage external data in it.
Before you run the wizard, consider the larger issues involved in scoping out your test data requirements; see our blog series on test data management. Then, learn why the RowGen test data product -- or the IRI Voracity data management platform that includes it -- is the best technology solution for achieving these goals:
IRI RowGen software produces accurate, safe test data that reflects production database table formats, sizes, value ranges, and constraints - without production data. RowGen uses your DDL information to automatically build and load huge, structurally and referentially correct test tables fast. For details, see:
Blog > Test Data > RowGen Automates Database Test Data Generation
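To make "structurally and referentially correct" concrete, here is a minimal sketch in Python with SQLite. It is not RowGen syntax or RowGen's method; the DDL, table names, and value ranges are hypothetical and chosen only for illustration. It builds tables from DDL, then fills the parent table before the child table so that every foreign key points at a real row and every value stays inside its declared range.

```python
import random
import sqlite3

# Hypothetical DDL for illustration only; not taken from any real schema.
DDL = """
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    salary  INTEGER NOT NULL CHECK (salary BETWEEN 30000 AND 150000)
);
"""

def build_test_db(n_depts=5, n_emps=100, seed=42):
    """Create an in-memory database from the DDL and fill it with synthetic rows."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the FK
    conn.executescript(DDL)
    # Parent rows first, so child rows always have a valid key to reference.
    conn.executemany(
        "INSERT INTO department VALUES (?, ?)",
        [(i, f"Dept-{i:03d}") for i in range(1, n_depts + 1)],
    )
    dept_ids = [row[0] for row in conn.execute("SELECT dept_id FROM department")]
    # Child rows draw dept_id from existing parents and salary from the CHECK range.
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [
            (i, rng.choice(dept_ids), rng.randint(30000, 150000))
            for i in range(1, n_emps + 1)
        ],
    )
    conn.commit()
    return conn
```

Because the constraints are enforced by the database itself, a generator that violated a key or range rule would fail on insert rather than silently producing invalid test rows.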
Valid test tables must contain the full range of values, data types, row layouts, and primary-foreign key relationships that DB applications rely on. For information on how RowGen preserves data realism, see:
By generating test data for each phase of development, you can build the phases simultaneously and independently of one another. For example:
- Step 1 - Read a personnel table and join health insurance claim data to make a status file
- Step 2 - Read the status file and generate a list of doctors
- Step 3 - Make a web-ready billing summary sorted by patient
RowGen can generate the input table for step 1, the status file for step 2, and the web report in step 3 -- all without needing real data or data from the other steps.
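The idea of decoupling the steps can be sketched in plain Python. This is an illustration of the principle, not RowGen output: the field names, doctor list, and function names below are hypothetical. The point is that the status file for step 2 is synthesized directly in the same layout that step 1 would produce, so the step 2 developer does not have to wait for step 1.

```python
import random

# Hypothetical reference values and field names, chosen only for illustration.
DOCTORS = ["Dr. Adams", "Dr. Baker", "Dr. Chen", "Dr. Diaz"]

def gen_personnel(n):
    """Synthetic personnel table: one input of step 1."""
    return [{"emp_id": i, "name": f"Employee-{i:04d}"} for i in range(1, n + 1)]

def gen_claims(n, n_emps, rng):
    """Synthetic claim rows whose emp_id always matches a personnel row."""
    return [
        {
            "emp_id": rng.randint(1, n_emps),
            "doctor": rng.choice(DOCTORS),
            "amount": round(rng.uniform(20, 900), 2),
        }
        for _ in range(n)
    ]

def gen_status(n, rng):
    """Synthetic status file for step 2, produced without running step 1."""
    return [
        {
            "emp_id": emp_id,
            "name": f"Employee-{emp_id:04d}",
            "doctor": rng.choice(DOCTORS),
            "amount": round(rng.uniform(20, 900), 2),
        }
        for emp_id in (rng.randint(1, 9999) for _ in range(n))
    ]

def step1_join(personnel, claims):
    """Step 1: join claims to personnel to make the status file."""
    names = {p["emp_id"]: p["name"] for p in personnel}
    return [
        {"emp_id": c["emp_id"], "name": names[c["emp_id"]],
         "doctor": c["doctor"], "amount": c["amount"]}
        for c in claims
    ]

def step2_list_doctors(status_rows):
    """Step 2: read the status file and produce a sorted, de-duplicated doctor list."""
    return sorted({row["doctor"] for row in status_rows})
```

A step 2 developer could call `step2_list_doctors(gen_status(1000, random.Random(1)))` on day one, while step 1 is still being written against `gen_personnel` and `gen_claims`.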
RowGen lets you synthesize any type, size, layout, and amount of safe test data, so you can provide a more secure development environment.
Alternatively, IRI FieldShield allows you to use real data that is protected with field-level functions (like encryption) on a need-to-see basis. For details, see Solutions > Data Masking.
Both tools co-exist in the IRI Workbench GUI, which also features automatic DB subsetting with masking. Either way, with safe data, there is no need to certify or bond your outsourced application developers.
RowGen shows you what the real data can look like, and what the transformation and reporting application can look like. That's because both RowGen and the SortCL program in IRI CoSort and Voracity use the same metadata to define data manipulations and table layouts.
This means that RowGen can also perform the same data transformations and reporting functions you normally would, in the same I/O pass and job script that generates the test data. It also means that the same layouts (and even transformations) created for data synthesis jobs are immediately ready for data integration, masking, migration, reporting, and other data processing jobs if and when real data becomes available.
If you only use RowGen, you can easily upgrade to CoSort or Voracity to transform and report on the real data (when it becomes available). Using the same job script RowGen used to define and transform test data, you can transform real data in the same format.
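The shared-metadata idea can be illustrated with a small Python sketch. It does not use RowGen or SortCL job script syntax; the `LAYOUT` structure, field names, and functions are hypothetical stand-ins. One layout definition drives both the synthesis of test rows and the parsing of real records, so the same transformation runs unchanged on either.

```python
import random

# Hypothetical layout: (field name, type, value range). It stands in for the
# shared metadata that describes a record, whether synthetic or real.
LAYOUT = [("patient_id", int, (1, 50)), ("amount", float, (10.0, 500.0))]

def synthesize(layout, n, seed=0):
    """Generate n synthetic rows that conform to the layout."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {}
        for name, typ, (lo, hi) in layout:
            if typ is int:
                row[name] = rng.randint(lo, hi)
            else:
                row[name] = round(rng.uniform(lo, hi), 2)
        rows.append(row)
    return rows

def parse(layout, lines):
    """Parse real comma-delimited records using the same layout."""
    rows = []
    for line in lines:
        values = line.strip().split(",")
        rows.append({name: typ(v) for (name, typ, _), v in zip(layout, values)})
    return rows

def billing_summary(rows):
    """Transformation shared by test and real data: total amount per patient, sorted."""
    totals = {}
    for r in rows:
        totals[r["patient_id"]] = totals.get(r["patient_id"], 0.0) + r["amount"]
    return sorted(totals.items())
```

`billing_summary(synthesize(LAYOUT, 1000))` exercises the report on test data today, and `billing_summary(parse(LAYOUT, real_lines))` reuses it once real records in the same format are available.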