Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG
native mongodb masking target field layout

Native MongoDB Masking (3rd Method)

  • by Chaitali Mitra

Note: This article covers the third available IRI customer method for statically masking or encrypting PII in structured MongoDB collections through the IRI FieldShield product or IRI Voracity platform (both powered by IRI CoSort v10 and its support of the native MongoDB driver). A fourth method is now available for finding and masking PII floating in unstructured MongoDB collections through IRI DarkShield.

 

This article explains how new, native MongoDB collection support in IRI’s data manipulation products enables the rapid performance and consolidation of multiple data-centric functions, including:

  • data profiling & cleansing
  • data masking & re-ID risk scoring
  • data integration (ETL) & CDC
  • data migration & replication
  • custom report & test data generation

The API-level support for MongoDB data is offered in the core SortCL program, which is the default processing engine of the IRI Voracity data management platform, as well as its subset products: CoSort, NextForm, FieldShield, and RowGen. This connection method is the third, and fastest, way that IRI customers can acquire and manipulate data in MongoDB collections. The first is via flat files as described in 2014 here. The second is with ODBC and JDBC drivers described in 2016 here.

About MongoDB

MongoDB is a “NoSQL”, document-oriented database which works on the concept of collections and JSON-like documents with schemas. A collection, like a relational database table, contains one or more documents.

A document is a set of key-value pairs. Documents have dynamic schema, which means that documents in the same collection do not need to have the same set of fields or structure. Common fields in a collection’s documents may hold different types of data.

The following chart compares the terminology of an RDBMS and MongoDB:

OracleMongoDB
DatabaseDatabase
TableCollection
RowDocument
ColumnField
Primary KeyObject ID provided by MongoDB

Demonstration

My example below does a simple sort, mask, and movement of a MongoDB collection called “chiefs” into another collection, and also out to a standalone JSON file. JSON files are another new IRI data source supported in the release of CoSort v10.

Database Connection

The first step is to establish connectivity with your MongoDB instance in IRI Workbench. Built on Eclipse, IRI Workbench is the common job design and deployment IDE for all the IRI software listed above. It’s also where you can connect to other DBs, HDFS, and additional data sources in local or remote file systems.

From the IRI Menu (swirl icon) in the Workbench toolbar, select IRI Preferences > IRI > URL Connection Registry > Add. Specify an existing MongoDB from the schema and the server hostname. My DB’s name is cmitra, created on the ‘mongodbu’ host:

native mongodb masking url connection details

For convenience, I installed the free MonjaDB plug-in for Eclipse to interact with the database within my Workbench perspective. My collection contains US Presidents by name, party, state, and their term dates. MongoDB assigned a unique ID for each document:

native mongodb masking chiefs file

Metadata Definition

Before I can process MongoDB data in this mode, I must create a data definition format (DDF) file that lays out my collection documents in /FIELD statements for use in SortCL-compatible jobs like FieldShield. As with other data sources supported in Workbench, a wizard does this for me automatically.

Create (or select an existing) Project Folder in Workbench to store the job assets like the source metadata, SortCL script/diagram, saved masking rules, data classes, etc. With that folder active (highlighted) and knowledge of my MongoDB collection’s location, I can now run the Discover Metadata wizard from the IRI menu in the top toolbar.

native mongodb masking data source identification

In this Data Source Identification step, I can select an ODBC connection, file-browse, or URL. In this case, I am reaching MongoDB via URL, so I choose that and MongoDB for my format. After browsing for the database registry entry I previously added, I enter the collection name. The wizard prepends this to the standard MongoDB connection string.

The Data Viewer and Field Editor page connects to the collection and previews its column contents and columns as SortCL field names:

native mongodb masking data viewer field editor

When I click Finish, my DDF file is created and displays as follows:

/FIELD=(PRESIDENT, TYPE=UTF8, POSITION=1, SEPARATOR="|", MDEF="president")
/FIELD=(PARTY, TYPE=UTF8, POSITION=2, SEPARATOR="|", MDEF="party")
/FIELD=(STATE, TYPE=UTF8, POSITION=3, SEPARATOR="|", MDEF="state")
/FIELD=(START, TYPE=NUMERIC, POSITION=4, SEPARATOR="|", MDEF="term.start")
/FIELD=(END, TYPE=NUMERIC, POSITION=5, SEPARATOR="|", MDEF="term.end")

This DDF metadata can be used in any job (or jobs) affecting this collection, including the one in this example as shown below. The _id column is unneeded for my purposes was omitted.

Note the use of dot notation in the “MDEF” attribute, which is a required path definition to discrete, flat elements SortCL jobs process. Multi-value and sub document array elements get generated in sequence starting with 0 in their order of appearance.

Run the Job Design Wizard

Workbench includes many different fit-for-purpose wizards for generating the metadata for running discrete tasks (.*cl scripts) or batch-driven jobs (.flow and .bat/.sh files) on the command line, through Workbench, your programs, etc.

I chose a basic sort transform wizard because within it I can add multiple targets, mask output fields, etc. This wizard runs from the CoSort menu group (stopwatch) icon from the toolbar. I select New Sort Job and specified chiefsout.scl as the name of my job:

native mongodb masking job specification file

In the next step, I identify the name of my source MongoDB collection and where it is, using the same locator URL used during the metadata definition for “chiefs”:

native mongodb masking data sources

The next step is to specify the generation, or use of existing, metadata for the collection in the data definition (DDF) format required in all SortCL-compatible programs. In this case, I will use the DDF file I created in Step 2 above. I select Add Existing Metadata to put these layouts directly into my job script as input specifications, and they then get reflected in the dialog below:

native mongodb masking data sources infile chiefs

On the next page, I specify a sort key field to define the order of data in my target(s). In this case, I will go from the default chronological order of presidents and sort them by name:

native mongodb masking key fields president

After clicking Next, the Data Targets page displays. As with my source options, there are several different kinds of targets I can create and format through this wizard and also later in another job design mode (e.g., script editor).

This and other Workbench wizards allow me to keep adding targets, and granularly defining the contents of each one. Target-specific attributes include: format (e.g., CSV, MongoDB, MFVL), formatting (e.g., headers and footers, rows to skip, etc.), filter conditions, field-level layouts, and applied functions/rules (e.g., new data types, masks, expression logic, aggregation).

In the Data Targets page, I know that I want two targets: a standalone JSON file and a remote MongoDB collection. Thus, I click Add Data Target and begin specifying the first one. Unless I edit the target options or target field layout, my output will only contain the sorted version of the input data as is. That’s what my JSON target file will contain by simply my naming it..

When I again click Add Data Target I define the second as a MongoDB collection via URL. My new chiefsmask collection will automatically build and populate when I run the job. It is within that target that I want one of the fields masked for privacy, the President’s names in this case. I click on Target Field Layout to open a top-to-bottom mapping page for my source and targets:

native mongodb masking target field layout

This page also shows my masking rule already defined. I had defined that by right-clicking on PRESIDENT in the target and clicking Create Rule (instead of browsing to an existing one). That opened a list of field-level protection wizards, shown below. Note that more static data masking functions are available.

I selected Masking (character redaction). On that page, I took the full-field default to mask out the entire President’s name in the target collection:

native mongodb masking new field and rule wizard selection

After finishing these selections, I am returned to the mapping dialog. I click OK and return to the Data Targets page. At the end of my changes, I will wind up with the details below:

native mongodb masking data targets

I click OK to end the wizard and build the job.

Review the Job

With the chiefsout.scl job now built in my active project and open for me in the editing window automatically, I can see and modify it from there, or through available in-context dialog pages. Learn about script editing options here.

I can also right-click inside the script or its file name in the project window, and from an IRI menu of options, build a flow diagram to visualize the task within a project. From inside the flow diagram, I double-click on the brown task block to open a transform mapping diagram showing the source-to-target field movement I’ve defined, this time left-to-right:

native mongodb masking flow

The chiefsout.scl script and chiefsout.flow file are both open, team-shareable metadata assets that share a common data model.

Run the Job

There are also multiple ways to execute the job I built, both from and outside Workbench.

Shown below is the source collection and my same job represented in script and outline form. After running the job, I opened a MonjaDB view of the sorted and masked collection target, and my sorted JSON file target to its right:

native mongodb masking workbench

As mentioned, many other permutations of, and faster data management jobs using, MongoDB data are now possible in this environment. IRI Voracity users can now choose from three connection methods to MongoDB and dozens of functional capabilities.

If you have any questions or need help, contact your IRI representative here.

Production Analytic Platform #1/4: Process on Par with Information
Production Analytic Platform #3/4: Processing Real World Data
CoSort data manipulation data masking data protection data transformation databases iri mongo mongo database masking mongo database protection mongo masking MongoDB mongodb masking SortCL VLDB Voracity

Related articles

DarkShield PII Discovery & Masking…
Masking Flat Files in the…
Directory Data Class Search Wizard
Masking PII in a Relational…
IRI Data Class Map
Schema Data Class Search
Training NER Models in IRI…
Masking NoSQL DB PII in…
Masking RDB Data in the…
IRI DarkShield-NoSQL RPC API
Find & Mask File PII…
2 COMMENTS
  • Direct Data Masking for MongoDB - IRI
    August 2, 2018 at 10:23 am
    Reply

    […] and a newer method through MongoDB’s native driver supported as of CoSort v10 in 2018 is here. IRI Chakra Max version 3 will provide for dynamic data masking and database activity monitoring, […]

  • Masking Data in MongoDB - IRI
    August 2, 2018 at 10:19 am
    Reply

    […] Introduction: This example demonstrates an older, less direct method of using IRI FieldShield to protect data found within MongoDB tables. As you will read, the MongoDB Export Utility would be used to extract data and create a CSV file that FieldShield would then process and apply data protections prior to loading the newly secured data back into MongoDB. IRI now offers direct drivers that move data between MongoDB collections and IRI software engines like FieldShield or Voracity. Our how-to-article on more direct data masking of MongoDB through ODBC (2016)  is here, and through MongoDB’s native driver supported as of CoSort v10 in 2018 is here. […]

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact