How to Create a Custom Data Class Validator
This is the second of a two-part blog series detailing data class validation in IRI Workbench. The first article, here, provided an overview of the validation scripts and how to use them in a data discovery or classification job. This article shows how to create a custom validation script for a special data class or group.
In this article, we will create and format a credit card validation script for use in a custom data class. Note that IRI Workbench already provides a credit card data class and validation script out of the box.
As a prerequisite, you should be familiar with ES5 JavaScript and Java 8. To follow along, you will need an IDE or text editor that supports JavaScript (ES5) and Java 8.
I will be using Visual Studio Code, an open-source IDE from Microsoft, for this section of the tutorial. Although I won’t go into detail on how to set up Visual Studio Code, you can find more information about the setup process here and here.
How does IRI Workbench interpret and use the code?
Before we get into the tutorial, here is a brief overview of how our platform interprets a validation script. When a JavaScript file is uploaded into a custom data class, IRI Workbench attempts to run the code through the Java Scripting API (javax.script) and its ScriptEngine class.
The ScriptEngine then binds the JavaScript file to a Java validation interface that contains a method called validate. The validation interface (written in Java) is shown in the image below:
The engine looks for a top-level JavaScript function with the same name and makes it callable from Java. For your convenience, the image below shows sample Java code that calls the validate method defined in the validation script.
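If you want to exercise a script before uploading it, you can approximate this lookup in plain JavaScript. The sketch below is a hypothetical local harness, not IRI Workbench's actual loader (which runs in Java), and the helper names loadValidator and runValidator are illustrative. It evaluates the script source, retrieves the top-level function named validate, calls it with a string argument, and rejects any result that is not a boolean. It is meant for local testing only, not for inclusion in the validator script itself.

```javascript
// Hypothetical local harness (NOT IRI Workbench's loader): it mimics the
// host by locating a top-level function named "validate" in a script's
// source text and calling it with a single string argument.

function loadValidator(source) {
  // Evaluate the script body inside a fresh function scope, then return
  // the function it declared under the name "validate" (or null).
  var fn = new Function(
    source + "\nreturn typeof validate === 'function' ? validate : null;"
  )();
  if (fn === null) {
    throw new Error("The script does not define a function named validate");
  }
  return fn;
}

function runValidator(source, value) {
  var fn = loadValidator(source);
  // The host always passes a string, so coerce the sample value the same way.
  var result = fn(String(value));
  if (typeof result !== "boolean") {
    throw new Error("validate must return true or false, got " + typeof result);
  }
  return result;
}

// Usage: a trivial script that accepts any non-empty value.
var sampleScript = "function validate(value) { return value.length > 0; }";
var sampleResult = runValidator(sampleScript, "1234"); // true
```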
Limitations of the Java Scripting API
The Java Scripting API uses the Nashorn engine to interpret JavaScript code, which brings a few notable limitations to keep in mind when writing your script:
- The engine implements only the ECMAScript 5.1 specification. ES6 syntax, such as let/const declarations, arrow functions, and template literals, is not supported.
- Nashorn does not provide a console object, so running a script with console.log("Hello World") throws an error. Use Nashorn's print function instead; for example, print("Hello", "World") prints its arguments to standard output.
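In practice, these two limitations mostly affect syntax and debugging output. The sketch below contrasts ES6 code that Nashorn rejects with an ES5 equivalent, and adds a small logging shim that uses console.log where it exists (for example, when testing locally in Node.js) and falls back to Nashorn's print otherwise. The names log and toDigits are illustrative, not part of any IRI API.

```javascript
// Nashorn-friendly style: ES5 syntax only, and no reliance on console.

// Logging shim: Nashorn has print() but no console object; Node.js has
// console but no global print(). Use console.log when it exists, else print.
var log = (typeof console !== "undefined" && typeof console.log === "function")
  ? function (message) { console.log(message); }
  : function (message) { print(message); };

// ES6 version (rejected by Nashorn's default ES5.1 mode):
//   const toDigits = (value) => value.split("").map((d) => parseInt(d, 10));
//   console.log(`parsed ${toDigits("4111").length} digits`);

// ES5 equivalent: var instead of const, function expressions instead of
// arrow functions, and string concatenation instead of template literals.
var toDigits = function (value) {
  return value.split("").map(function (d) {
    return parseInt(d, 10);
  });
};

log("parsed " + toDigits("4111").length + " digits");
```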
Step 1: Create the File
To get started, open your preferred text editor and create a new JavaScript file. In the image below, you can see that I created a JavaScript file named validator-creditcard.js.
Step 2: Define the Validate Method
The JavaScript file must define a function named validate in order to work properly. This function takes a single argument and returns either true or false.
This is the most important function in the script, since it is the one IRI Workbench invokes. All validation logic should therefore run from this function, either directly or through helper functions that it calls.
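A minimal skeleton for validator-creditcard.js might look like the following. The hasContent helper and its placeholder rule (accept any non-empty string) are assumptions for illustration only; the point is the shape of the file: a top-level validate function that takes one string and returns a boolean, with optional helper functions declared alongside it.

```javascript
// validator-creditcard.js: minimal skeleton (placeholder logic only).
// IRI Workbench calls validate() with one string and expects a boolean back.

// Helper functions can sit alongside validate(); this name is hypothetical.
function hasContent(value) {
  return value.length > 0;
}

// Required entry point: a top-level function named "validate".
function validate(value) {
  // Placeholder rule: replace with the real checks from Step 3.
  return hasContent(value);
}
```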
Step 3: Write the Logic
The logic will vary depending on the data you are working with. For credit cards, the only validation logic needed is a simple checksum using the Luhn algorithm.
I won’t go into detail on how to implement this algorithm, but a good example can be found here. In the image below, you can see that I implemented the validation logic using a helper function.
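Since the original implementation appears only as an image, here is one possible ES5 implementation of the Luhn checksum, written as a sketch rather than the author's exact code; isLuhnValid is a hypothetical helper name. Working from the rightmost digit, it doubles every second digit, subtracts 9 from any doubled value above 9, and accepts the number when the total is a multiple of 10. Non-digit characters simply fail the check, so validate itself still does no pattern matching.

```javascript
// validator-creditcard.js: one possible Luhn implementation (ES5 only, so it
// runs under Nashorn). isLuhnValid is a hypothetical helper name.

// Returns true when the digit string passes the Luhn checksum.
function isLuhnValid(number) {
  if (number.length === 0) {
    return false;
  }
  var sum = 0;
  var doubleIt = false; // the rightmost (check) digit is never doubled
  // Walk the digits from right to left.
  for (var i = number.length - 1; i >= 0; i--) {
    var digit = number.charCodeAt(i) - 48; // "0" has char code 48
    if (digit < 0 || digit > 9) {
      return false; // any non-digit character fails validation
    }
    if (doubleIt) {
      digit *= 2;
      if (digit > 9) {
        digit -= 9; // same as adding the two digits of the product
      }
    }
    sum += digit;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Entry point invoked by IRI Workbench: string in, boolean out.
function validate(value) {
  return isLuhnValid(value);
}
```

For example, validate("4111111111111111"), a widely used test number, returns true, while validate("4111111111111112") returns false because the changed check digit breaks the checksum.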
A few things to note:
- The input argument will always be a String.
- The return value must be either true or false.
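The second rule is easy to break by accident. In the sketch below (all function names are hypothetical), remainderOnly returns a Luhn-style remainder directly, so a valid sum yields 0, which JavaScript treats as falsy; isMultipleOfTen makes the comparison explicit and returns a real boolean. How IRI Workbench handles a non-boolean return value is not covered here, so returning true or false explicitly is the safe choice. The expectBoolean helper is meant for local testing only.

```javascript
// Pitfall: a "result" that is not a real boolean. Returning the remainder
// gives 0 for a VALID checksum, and 0 is falsy in JavaScript.
function remainderOnly(sum) {
  return sum % 10; // a number: 0 means valid, but reads as false
}

// Correct: compare explicitly so the function returns true or false.
function isMultipleOfTen(sum) {
  return sum % 10 === 0;
}

// Hypothetical helper for local testing: fail loudly on non-boolean results.
function expectBoolean(result) {
  if (typeof result !== "boolean") {
    throw new Error("expected true or false, got " + typeof result + ": " + result);
  }
  return result;
}
```

With these definitions, expectBoolean(isMultipleOfTen(30)) returns true, while expectBoolean(remainderOnly(30)) throws because the value is the number 0.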
You may be wondering why the function is devoid of any pattern matching. That’s because IRI Workbench has a separate field for uploading patterns (more on this in the next section). It runs your provided pattern first and then runs the validation script.
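The ordering can be modeled in a few lines. This is a hypothetical sketch, not Workbench code: matchesDataClass, countingValidator, and the /^\d{13,19}$/ pattern are illustrative, and the sketch assumes, as the ordering suggests, that values failing the pattern are never passed to the validation script. The pattern screens the format, so the script only has to handle checks a regular expression cannot express, such as a checksum.

```javascript
// Hypothetical model of the matcher order: the pattern (Details field) runs
// first, and the validator script is consulted only for matching values.
function matchesDataClass(value, pattern, validator) {
  // Stage 1: regular expression check. Avoid the "g" flag here, because a
  // global RegExp keeps lastIndex state between test() calls.
  if (!pattern.test(value)) {
    return false; // the validator is never called
  }
  // Stage 2: the validation script makes the final decision.
  return validator(value) === true;
}

// Usage with a stand-in validator that counts how often it is called.
var validatorCalls = 0;
function countingValidator(value) {
  validatorCalls++;
  return value.charAt(value.length - 1) !== "0"; // arbitrary stand-in rule
}

var digitsOnly = /^\d{13,19}$/;
matchesDataClass("4111-1111", digitsOnly, countingValidator); // false, 0 calls
matchesDataClass("4111111111111111", digitsOnly, countingValidator); // true, 1 call
```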
Adding a Validation Script to a Custom Data Class
This section uses some elements of Data Classification, an integrated data cataloging paradigm for defining the search methods used to find PII independently of the data source. While this section provides a short introduction to Data Classification, you may find it useful to read this article, which explores the topic in depth.
Now that the validation script is finished, let’s create a new data class and add the script to IRI Workbench. To get started, open the IRI Preferences window: open the IRI menu and select IRI Preferences. Then, in the preferences window, expand the IRI node and select Data Classes and Groups.
Select Add to open the window shown below.
Fill in the relevant fields and select Add in the Matchers section.
In the Data Class Matcher window, I added a regular expression pattern to the Details field. This pattern checks that a value has the format of a credit card number.
In the Validator Script field, I entered the file path of the validation script created in the previous section. Select OK and then Apply And Close to save the new data class in IRI Workbench.
Doing so creates a new data class that can be used in any future data classification or data discovery job. If you have any questions about how to classify data for IRI Workbench-supported software like FieldShield, DarkShield or Voracity, email info@iri.com.
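The exact pattern used here appears only in the screenshot. As an illustrative assumption, a common digits-only pattern for the major card networks is shown below and tested in JavaScript; the same syntax is also valid in Java regular expressions. In the Details field you would enter the pattern text itself with single backslashes, ^(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|6(?:011|5\d{2})\d{12})$, assuming the field accepts a plain regular expression. The doubled backslashes in the code are only JavaScript string escaping.

```javascript
// An illustrative credit card pattern (an assumption, not the one used in
// this article). It accepts digits only, with no spaces or dashes:
//   Visa:             4 followed by 12 or 15 digits (13 or 16 total)
//   Mastercard:       51-55 followed by 14 digits (16 total)
//   American Express: 34 or 37 followed by 13 digits (15 total)
//   Discover:         6011 or 65xx followed by 12 digits (16 total)
// Not covered: Mastercard's 2221-2720 range and other card networks.
var CREDIT_CARD_PATTERN =
  "^(?:4\\d{12}(?:\\d{3})?|5[1-5]\\d{14}|3[47]\\d{13}|6(?:011|5\\d{2})\\d{12})$";

var creditCardRegex = new RegExp(CREDIT_CARD_PATTERN);

creditCardRegex.test("4111111111111111");    // true  (Visa, 16 digits)
creditCardRegex.test("378282246310005");     // true  (Amex, 15 digits)
creditCardRegex.test("4111 1111 1111 1111"); // false (contains spaces)
```

Because the pattern only checks the format, a number such as 4111111111111112 still matches it; rejecting that value is the job of the Luhn checksum in the validation script.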