Skip to content
IRI Logo
Solutions Products
  • Solutions
  • Products
  • Blog
  • BI
  • Big Data
  • DQ
  • ETL
  • IRI
    • IRI Business
    • IRI Workbench
  • Mask
  • MDM
    • Master Data Management
    • Metadata Management
  • Migrate
    • Data Migration
    • Sort Migration
  • Test Data
  • Transform
  • VLDB
  • VLOG
transform mapping diagram mongodb

Detecting Incremental Database Changes (Oracle to MongoDB ETL)

  • by Claudia Irvine

Detecting additions and updates to database tables for data replication, ETL, PII masking, and other incremental data movement and manipulation activities can be automated in IRI Voracity workflows designed and run in IRI Workbench (WB). This article explains how to regularly check for changes in Oracle source tables to decide when to move data into a MongoDB target. 1 Note however that in 2022, IRI also added the “Ripcurrent” module for automatic (real-time), incremental data replication and classified PII masking for selected relational databases. 

In the example this article details, changed data is loaded into different databases or files using a task-scheduled batch file or shell script. This can be done using a timestamp and specific fields in the source table. Error checking is included and can also be reacted upon.

This example will be created and run on a Windows machine; however, it can be easily modified to work on a Linux or Unix like platform.

Creating the batch file is easy using a Voracity Flow diagram in WB. In this example, the source table contains columns named CREATION_DATE and UPDATE_DATE that are important in this job.

The image below shows the steps that are contained in the batch file. To summarize:

  • the job is run in a specific directory
  • an environment variable is set using the timestamp of the last job run
  • the current timestamp is recorded
  • current changes are captured
  • the error level is checked and acted upon if successful or not
  • the current timestamp overwrites the last run timestamp
  • the changed data is converted to CSV
  • a stall occurs to wait for the last file to exist
  • the CSV file is imported into MongoDB
  • the error level is checked, the current file is truncated
  • the changes file is deleted

workflow diagram mongodb
Note that it is also possible to add data cleansing and data masking functions to the target fields within the same job. Each task block in the workflow is explained below. For the how-to on building Voracity workflows from the palette, see this article.

Change Directory

This block changes the current working directory to the one specified.

Set LASTTIME

This command line block sets an environment variable called LASTTIME. The value set to the variable is the contents of the file LastTime.txt. The timestamp in this file is the timestamp that was recorded during the last run of this job. If this is the first run, this file will have to be made manually with an arbitrary timestamp dated before this job is run.

Timestamp.scl

This transformation block uses the CoSort SortCL program in Voracity to query the source database for the current time. That timestamp is saved to a file called LastTimeTemp.txt. The reason it is stored in a temp file is so that both the current and last time stamps can be preserved until error checking occurs.

It is important that the timestamp comes from the database, and not the local machine. This avoids problems where the database and the execution environment are not synchronized.

Changes.scl

This transformation block does a few things. Displayed below is the Transform Mapping Diagram for this block. Input is the source table and output is the file current.txt.

transform mapping diagram mongodb

In the input Section Options, a query is submitted to the source table for all records that have a CREATION_DATE or UPDATE_DATE  greater than the environment variable LASTTIME.

While the output appears to have two targets , the data is actually being appended to the same file using two different conditions. In the first output section, there is an Include statement that finds all records that have a CREATION_DATE greater than LASTTIME. There is also an additional output field called CDC_TYPE. The string “CREATE” is recorded in that new field.

In the second output section, an Include statement finds all records that have a UPDATE_DATE greater than LASTTIME and where CREATION_DATE is not equal to UPDATE_DATE. This ensures that newly created files are not included in this pass.  The string “UPDATE” is recorded in CDC_TYPE.

Error CoSort

This decision block checks the variable ERRORLEVEL to make sure it returned 0 (or success) after running the CoSort job above. If it did not, the job continues to the EXIT block where the job is terminated. If it returns true, the job continues to the next block.

Rename LastTimeTemp

This command block copies the contents of LastTimeTemp.txt to LastTime.txt. This records the previously captured current timestamp in the file to be used for the next job run.

Convert.scl

This transform block takes current.txt and converts it to changes.csv. The conversion is from the default delimited file type to CSV. Using the CSV process type in CoSort prepends a header row to the output file using the field names. This is the task block where I could apply other manipulations (like data masking) to the data if I so choose.

Wait Files

This wait block stalls the batch file for 3 seconds and then checks for the existence of the changes.csv file before proceeding.

MongoImport

This command block executes the mongoimport command using the parameters specified in the properties view as shown below.

mongo properties block

The parameters indicate that the MongoDB database called fnx is to be loaded with the contents of the file changes.csv which is of the type csv and contains a headerline which defines the fields.

Note that Voracity supports other methods of moving and manipulating MongoDB data. See this example of using Progress ODBC drivers for data masking using built-in “FieldShield” functions. Voracity can also process BSON data directly via API through /PROCESS=MongoDB support in CoSort v10, now too.

Error Loading

This decision block checks the variable ERRORLEVEL to make sure it returned 0 (or success) after importing into MongoDB. If it did not, the job continues to the Delete-Changes and EXIT blocks where the job is terminated. If it returns true, the job continues to the next block.

Truncate Current

This command block truncates the file current.txt. This is to clear out the records that were loaded into MongoDB. If the import failed and the block above exited the job, then these changed records are appended to on the next pass. Then, as the job repeated, they would be loaded into MongoDB with the next group of changed records.

Delete Changes

This command block deletes changes.csv so that the next pass is started with a newly created file for the pass.

Batch File

The batch file and transformation scripts are created when the Flow diagram is exported. A copy of the batch file is below. Each block adds executable lines to the batch file.

batch file mongo

Task Scheduler

Using the Windows Task Scheduler, this batch file can be executed repeatedly to capture the changes in the source database.

Conclusion

With a little planning, and the use of command blocks, changes to a database table can be detected automatically using a batch file, and then scheduled to run at selected intervals.

Contact voracity@iri.com or your IRI representative for more information or help with your use case

  1. This approach differs from third-party log-based change data capture solutions, which typically have performance bottlenecks, and are limited to specific databases, and do not enable simultaneous data transformation, PII data masking, cleansing, and reporting.
Using the Job Script Editor in IRI Workbench
Which Data Masking Function Should I Use?
data migration data replication ETL IRI Voracity MongoDB mongodb database masking Oracle oracle data replication VLDB

Related articles

Schema Data Class Search
Connecting MariaDB and MySQL to…
Running IRI Software in a…
The IRI Platform
IRI Data Migration and Modernization
Using Ripcurrent for DB Structure…
Real-Time, Incremental Data Masking
Real-time Database Data Replication
Getting Started with IRI Ripcurrent
Mapping DB Data Types
Voracity Software Support for Cloud…
1 COMMENT
  • Incremental Data Replication in IRI Workbench - IRI
    April 12, 2018 at 8:25 am
    Reply

    […] this example of replication between Oracle and MongoDB. For help replicating data on a one-off or recurring […]

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Categories

  • Big Data 66
  • Business Intelligence (BI) 77
  • Data Masking/Protection 163
  • Data Quality (DQ) 41
  • Data Transformation 94
  • ETL 122
  • IRI 229
    • IRI Business 86
    • IRI Workbench 162
  • MDM 37
    • Master Data Management 12
    • Metadata Management 25
  • Migration 65
    • Data Migration 60
    • Sort Migration 6
  • Test Data 102
  • VLDB 78
  • VLOG 40

Tracking

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact