The Best Way to Find and Mask Unstructured Data
With so much personally identifiable information (PII) and other sensitive data hidden in semi-structured and unstructured (or so-called "dark data") files, you need a way to find, report on, and mask that PII efficiently, wherever it is. IRI DarkShield does all this so you can adhere to your business rules and privacy laws pertaining to this data. Consider some of the advantages of using DarkShield:
Just as you can use IRI FieldShield or IRI Voracity to find, classify and mask PII at the column, table, or schema level -- or IRI CellShield EE in Excel sheet cells -- you can use DarkShield or Voracity on PII in one or more files, folders, NoSQL DB collections, network-mounted drives, and buckets in Amazon, Azure and GCP.
Whether you search, mask, or do both at once, you have the option to secure data surgically, or en masse to save job specification and execution time.
Rather than requiring separate learning, specification or execution steps for each file type you have, DarkShield handles proprietary formatting differences automatically. This means you can search, classify, extract, mask, and report on all DarkShield-supported file types at once -- from log files to emails and from PDFs to images -- all in the same design spec.
DarkShield also supports multiple search methods (see below), and many of the same data masking functions available to FieldShield users.
You can also combine all of these features in the same DarkShield job. That is, you can locate the PII as you've defined it in data classes, wherever it is on your LAN, using multiple search techniques, and at the same time or later, automatically apply the masking functions you've assigned.
Save time by choosing from many pre-defined RegEx patterns for your searches, including credit cards, phone numbers, email addresses and national ID formats. Or, define your own pattern for any custom data format you need to classify, find, and mask.
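To illustrate what pattern-based searching involves, here is a minimal Python sketch. It does not use DarkShield's API or its pre-defined patterns; the pattern names, the regular expressions, and the `find_matches` helper are all assumptions made for illustration. A Luhn checksum is added for card numbers because a digit pattern alone tends to produce false positives.

```python
import re

# Illustrative patterns only (assumptions, not DarkShield's built-in definitions).
PATTERNS = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "US_PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(digits) >= 13 and total % 10 == 0


def find_matches(text: str):
    """Return (class name, matched text, start, end) for every pattern hit."""
    results = []
    for name, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            if name == "CREDIT_CARD" and not luhn_valid(m.group()):
                continue  # digit run that is not a plausible card number
            results.append((name, m.group(), m.start(), m.end()))
    return results
```

Recording the start and end offsets with each hit is what allows a later step to mask the value in place rather than searching for it again.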
Find PII values that exactly match -- or even roughly match (using fuzzy search algorithms) -- values in a lookup file that IRI includes (like common American first and last names), or that you provide (like employees, products, formulas, or places).
For sensitive information that does not conform to a pattern, has too many members for a literal pattern definition, or would be missed by a poorly trained NER model, set lookups are an especially good search option.
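The difference between exact and fuzzy lookup matching can be sketched in a few lines of Python. This is only an illustration using the standard library's `difflib`; the set contents, the `lookup_matches` helper, and the 0.8 similarity cutoff are assumptions, and DarkShield's own fuzzy search algorithms may work differently.

```python
import difflib
import re

# Hypothetical lookup set; in practice this would be loaded from a set file.
LAST_NAMES = ["garcia", "johnson", "smith", "williams"]


def lookup_matches(text, lookup, cutoff=0.8):
    """Return (token as found, lookup value, match type) for each hit."""
    known = set(lookup)
    hits = []
    for m in re.finditer(r"[A-Za-z]+", text):
        token = m.group().lower()
        if token in known:
            hits.append((m.group(), token, "exact"))
            continue
        # Fuzzy fallback: closest lookup value at or above the cutoff ratio.
        close = difflib.get_close_matches(token, lookup, n=1, cutoff=cutoff)
        if close:
            hits.append((m.group(), close[0], "fuzzy"))
    return hits
```

Raising the cutoff reduces false positives at the cost of missing more misspellings, so the threshold is worth tuning against a sample of real documents.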
Natural Language Processing (NLP) and Machine Learning (ML) technology in DarkShield support Named Entity Recognition (NER) searches so you can find names, addresses, and other sensitive information in the context of your documents' sentence grammar. This is especially valuable for finding, and then redacting or pseudonymizing, the names of people. Names do not always match patterns (and pattern matches are even less likely to be names), nor do they always appear in a lookup set.
Direct search/mask support for structured, semi-structured and unstructured columns in any JDBC-connected RDB and NoSQL collection, index or cluster -- including MongoDB, Cassandra, and Elasticsearch, plus CosmosDB, DynamoDB, Google Bigtable, Solr, Redis, Couchbase and OpenSearch. This addresses testing, breach nullification, and data privacy law compliance objectives.
CSV, Excel, JSON, XML, and DB column filter specifications can be used to bypass row-by-row scanning for pattern and other data class search matchers, saving time in finding and masking high volumes of data.
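The idea behind a column filter is that when you already know which field holds sensitive values, a masking rule can be applied to that field directly, with no need to run every matcher over every value. The Python sketch below shows that idea for CSV data only; `mask_columns`, `redact`, and the column names are hypothetical and do not represent DarkShield's filter syntax.

```python
import csv
import io


def redact(value: str) -> str:
    """Replace every character with an asterisk, preserving length."""
    return "*" * len(value)


def mask_columns(csv_text: str, column_rules: dict) -> str:
    """Apply each masking function to its named column; leave other columns as-is."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for column, mask in column_rules.items():
            if column in row:  # skip rules for columns this file lacks
                row[column] = mask(row[column])
        writer.writerow(row)
    return out.getvalue()
```

Because no pattern, lookup, or NER search runs against the untouched columns, the cost per row depends only on the number of filtered columns.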
Finds and masks sensitive data in .BMP, .DCM (DICOM), .GIF, .JPG/2, .PNG, and .TIF/F file formats, either in a targeted way or in area bounding boxes.
Leverages IRI RowGen test data synthesis (random value generation or set file selection) to insert realistic values into one or more files based on a template.
DarkShield also uniquely includes user-friendly NLP model builders and trainers that make use of your documents for machine learning. This improves the relevance, and thus accuracy, of your NER searches (e.g., for people's names).
Save time and trouble as you define and catalog PII and other sensitive information in data classes or class groups using simple graphical wizards. More specifically, you match a chosen masking function to each data class (or group), so that mask will automatically be applied to that data in the remediation phase.
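Conceptually, a data class pairs a way of recognizing a kind of value with the masking function to apply to it. The following Python sketch models that pairing under stated assumptions: the `DataClass` structure, the two example classes, and the `redact` and `hash_token` functions are invented for illustration and are not how DarkShield stores or applies its data classes.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Callable, Pattern


@dataclass
class DataClass:
    name: str                    # label used in reports
    pattern: Pattern[str]        # how values of this class are recognized
    mask: Callable[[str], str]   # masking function assigned to the class


def redact(value: str) -> str:
    """Replace every character with an asterisk, preserving length."""
    return "*" * len(value)


def hash_token(value: str) -> str:
    """Replace the value with a short, repeatable SHA-256 token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


# Hypothetical data classes, each bound to one masking function.
CLASSES = [
    DataClass("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), hash_token),
    DataClass("US_SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), redact),
]


def remediate(text: str) -> str:
    """Apply each class's assigned mask to every value that class matches."""
    for dc in CLASSES:
        text = dc.pattern.sub(lambda m, dc=dc: dc.mask(m.group()), text)
    return text
```

Binding the mask to the class rather than to a file or column means the same rule follows the data wherever that class of value is found.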
The same data classes you define for DarkShield jobs can also be used in DB, flat-file and Excel PII search and mask operations in IRI FieldShield and CellShield EE. They will be supported in IRI Voracity data management jobs, too.
All these activities -- from data class definition, saving, re-use, and application, plus data masking, cleansing and integration -- run in the same pane of glass. All IRI data 'shield' and data management tools share a free graphical IDE built on Eclipse®, IRI Workbench. All of these products and Workbench are also included in IRI Voracity total data management platform licenses.
Saves both time and passes through your data by combining these processes in the same operation.
Allows you to just find, report on, and/or use the data you're searching for, without necessarily masking it. This saves on time and the storage space needed for (potentially multiple) masked copies of the data.
Supports external and automated (scheduled) runs of either combined or separated data searching and masking jobs. Also makes manual or graphical modification of these jobs possible, as your job specifications are saved in a single, self-documenting, and easily modifiable XML script.
Gives you a range of options to use on each class of data based on your business needs and the ability of the file format to support it. You can also use other IRI shield tools to mask or reveal the data if it comes from or moves into a more structured environment.
For example, you can export a MongoDB column containing floating PII values out to a JSON file via Voracity, FieldShield, or NextForm so that DarkShield can mask that PII (and you can re-import it). Conversely, you could use DarkShield to find and encrypt names in unstructured documents, then extract and structure those masked names in a delimited file. You can then import that file into an Excel spreadsheet, where an authorized CellShield user can later decrypt the names.
Exports and structures your search results and selected file-related metadata into a delimited file that you can use for auditing, analytic, and data delivery purposes. This feature also enables compliance with the GDPR data portability requirement by allowing you to provide the information you have on a requester in any format required.
If you use Voracity, all of this, including the format and disposition of the report, can be built into a visual, automated workflow.
DarkShield's dynamic, exportable HTML report incorporates the TXT results into visualizations of the number and types of files containing PII, and which of the file types containing sensitive data were also masked.
By importing DarkShield search and remediation log data into a SIEM tool like Splunk Enterprise Security, you can customize dashboard displays and adaptive framework or playbook events to make DarkShield audit and activity information part of the larger SOC management environment.
Creates DDF metadata for the .TXT results file for use in IRI Voracity ETL or wrangling jobs, CoSort or BIRT reporting, NextForm data migration/replication operations, or FieldShield structured data masking jobs.
Creates an Excel-compatible, CellShield EE-ready import file containing the results of .XLS and .XLSX file PII searches for spreadsheet recording, and localized sheet-level or bulk remediation operations, respectively.