Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG

Bloor Research on IRI Data Classification

  • by Daniel Howard, Bloor Research

IRI Voracity is a data management platform that we have previously talked about at some length. It offers capabilities for data integration, data governance, data masking, test data management, data quality, data cleansing, data migration, and so on (and on, and on). But it’s important to remember that many – if not all – of these activities rely on being able to classify your data, most prominently to locate the sensitive data that you will need to protect via your data governance, masking, and security processes.

Data classification also provides more general visibility into your data, which is generically useful for understanding your data but especially so if you need to identify particularly important data assets, as may be the case during (or rather, prior to) data masking or migration. With this in mind, it is both expected and appropriate that IRI Voracity provides a robust data classification system, designed to help you profile your data and find (then subsequently anonymize) sensitive data.

Specifically, Voracity allows you to define a selection of data classes for your system, with a default set (including commonly used but generic data classes like name, email, credit card number, and so on) provided automatically for each Voracity project that you can then customize and extend. A number of built-in privacy regulations (GDPR, CCPA and what have you) can also be used to automatically generate the data classes applicable to them. Data classes can be grouped together, and data class groups can be categorized further by assigning them specific sensitivity levels and/or associating them with one or more of the aforementioned regulations. Data classes can also be given a priority level, which determines which class takes precedence if a given data asset seems to fit into multiple categories at once.

Data class and rule library in the IRI Workbench GUI for Voracity

You can then search your system for data that falls into each data class. Search methods are attached to each data class, and a variety are available. This includes (but is not limited to) pattern matching, both strict and fuzzy data matching (using a preset list of associated values), column name matching, path matching, cell matching (for spreadsheets), and named-entity recognition based on Apache OpenNLP, PyTorch, or TensorFlow. The latter is particularly notable for leveraging semi-supervised machine learning to improve its efficacy. Also note that some of these methods are only applicable to structured/semi-structured data (specifically, the ones which only make sense where there is a structure, like column name matching – you can’t match based on column name if there are no columns) and some are only available in IRI DarkShield, not IRI FieldShield (such as named-entity recognition, cell matching and fuzzy matching), which otherwise share both the data classes themselves and much of the data classification functionality that is available.

Data classes can also have multiple search methods attached to them in order to make the matching process more accurate and produce fewer false positives. There are also considerations made for the sake of performance – tables that have already been scanned will be skipped during repeated discovery phases, for instance, and you can choose to exclude specific tables or data classes from the discovery process entirely.

In addition to these search methods, data classes can be paired with any of the anonymization methods available in Voracity, whether that means masking, redaction, encryption, or whatever, which will then be applied to any data that is subsequently sorted into that data class. This allows the product to essentially automate both the discovery and protection of your sensitive data through its voluminous collection of data masking (and other data security) rules. Notably, deterministic masking rules maintain the consistency, and thus the referential integrity, of your data.

Above the fold in an IRI DarkShield discovery report

Finally, this classification information is automatically aggregated and organized into an HTML report. This report is highly visual, prominently displaying various bubble and pie charts that highlight your top sources of sensitive data, your most common data classifications, and so forth. Various aspects of this report can be customized before it’s generated, such as the scale (either linear or logarithmic) it uses to create its leading bubble chart. You can also export your results to a third-party product, such as a dedicated visualization or analytics tool (Splunk, for instance), via an API.

Masking Floating PII in Database Columns
Navigating DPDPA Compliance

Related articles

Joining Flat-File & RDB Data:…
Data Scrambling: Format-Preserving Scramble (FPS)
Masking PHI in X12 EDI…
Oracle TLS Support
How to Mask PII in…
The Basic FieldShield Masking Job…
Masking PHI in HL7 EDI…
SQL Server SSL Connection
Navigating DPDPA Compliance
Masking Floating PII in Database…
IRI Voracity and Data Fabric

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact