
Linear Regression – A Predictive Tool in IRI Voracity…

  • by Dustin Ellsworth

Linear regression is a staple data analysis technique in finance, economics, research, and many other disciplines, where it helps reveal correlations in data. Users of the IRI Voracity platform can now process big data from any number of sources and, at the same time, present customized trend lines that help business users make predictions.

Linear regression builds on analytic and data preparation capabilities of constituent Voracity technologies that we have covered previously, such as:

  • prediction using standard deviation in IRI CoSort
  • unstructured data packaging for Splunk in IRI Workbench
  • protection of PII for Pentaho in IRI FieldShield

This article (and embedded video) demonstrates how to perform regression analysis with the “quick_stats” field function in the CoSort SortCL program while simultaneously generating a .pdf report with confidence intervals on x-y value pairs. The report is produced with BIRT in the IRI Workbench GUI (built on Eclipse) and a Boost C++ library function. A planned update or alternative to this article will feature another graphical approach to predictive analytics in IRI Workbench, using the Voracity data provider for KNIME with connected decision tree learning and analytics nodes.

The auto-generated report provides a base set of statistics about the x and y values, computed over the entire data set as it is ingested. Big source data integration and calculation thus run simultaneously with the visualization of the results, all in the same “pane of glass” that controls both data processing and presentation. With Voracity’s task scheduler, you can run such reports at regular intervals to produce a uniform set of general statistics about the x and y data for analysis over time.

The sample report below contains a graph of the data points, the linear regression line and its equation, the R-squared value, the average x and y values, the standard deviations of x and y, and the corresponding confidence limits:

[Figure: Example Report]
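For reference, the report's quantities correspond to the textbook ordinary least-squares definitions below. The article does not spell out the formulas quick_stats uses, so treat this as an assumption. In particular, the confidence limits are written here as Student's t intervals for the means of x and y at confidence level 1 − α, the kind of quantity a Boost.Math Student's t distribution can supply; the actual report may define its limits differently.

```latex
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad
S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad
S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})

\hat{y} = a + b\,x, \qquad
b = \frac{S_{xy}}{S_{xx}}, \qquad
a = \bar{y} - b\,\bar{x}

R^2 = \frac{S_{xy}^2}{S_{xx}\,S_{yy}}, \qquad
s_x = \sqrt{\frac{S_{xx}}{n-1}}, \qquad
s_y = \sqrt{\frac{S_{yy}}{n-1}}

\bar{x} \pm t_{1-\alpha/2,\,n-1}\,\frac{s_x}{\sqrt{n}}, \qquad
\bar{y} \pm t_{1-\alpha/2,\,n-1}\,\frac{s_y}{\sqrt{n}}
```

For a single predictor, this R² equals the square of the Pearson correlation between x and y.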

Use Cases 

  • You have (x,y) numeric data values that you want to analyze to determine possible correlation for predictive purposes.
  • You want a uniform analysis report of (x,y) data produced at a given interval for the purpose of determining change over time.
  • Example Linear Regression Use Cases:
    1. Regression Example, part 1: descriptive analysis (Duke.edu)
    2. Use Case1 – Linear Regression (Alpine)
    3. Linear Regression Analysis (NCBI)
    4. Statistical Sampling and Regression (Columbia.edu)

Demonstration Script (stats-test.scl) 

Demonstration Video

Summary of quick_stats()

The function quick_stats(SOURCE X, SOURCE Y, FILENAME, X-AXIS LABEL, Y-AXIS LABEL) takes five arguments:

  • Argument 1 – SOURCE X value (integer or decimal)
  • Argument 2 – SOURCE Y value (integer or decimal)
  • Argument 3 – FILENAME for the resulting report; the date and time of creation are appended to this name (e.g., “FILENAME_2016-01-04_11.45.30”)
  • Argument 4 – X-AXIS LABEL you want on the graph (e.g., “X_AXIS_LABEL”)
  • Argument 5 – Y-AXIS LABEL you want on the graph (e.g., “Y_AXIS_LABEL”)
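Argument 3 gets the creation date and time appended to it. The pattern in the example above can be reproduced with a strftime-style format string. The helper below is hypothetical and only illustrates that naming scheme; it is not part of CoSort or SortCL, and because the example name shows no file extension, none is added here.

```python
from datetime import datetime


def report_filename(base, when=None):
    """Hypothetical helper: append a timestamp in the pattern shown in the
    article, e.g. FILENAME_2016-01-04_11.45.30 (no extension added)."""
    when = when or datetime.now()
    # %Y-%m-%d for the date, %H.%M.%S for the time, joined by underscores
    return f"{base}_{when:%Y-%m-%d_%H.%M.%S}"
```

For example, `report_filename("FILENAME", datetime(2016, 1, 4, 11, 45, 30))` returns `"FILENAME_2016-01-04_11.45.30"`.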

The linear regression line and all statistics in the report are calculated from the entire data set. Due to report size limitations, however, at most 1,000 points are displayed on the graph. So if 1,000,000 x-y value pairs are passed into the function, every one of them is used in the calculations, while a random 1,000 are plotted.
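The paragraph above separates two jobs: statistics computed over every pair, and a random subset of at most 1,000 pairs for the graph. The article does not say how quick_stats does this internally. One common way to get both in a single pass is to keep running sums of deviations (Welford's method) and to choose the plotted points by reservoir sampling. The Python sketch below is a hypothetical illustration of that approach; the class and method names are invented here and are not part of CoSort, SortCL, or BIRT, and the use of sample standard deviations (n − 1 denominator) is an assumption.

```python
import math
import random


class StreamingRegression:
    """Hypothetical one-pass summary of (x, y) pairs plus a uniform random
    sample of at most `max_points` pairs for plotting (reservoir sampling)."""

    def __init__(self, max_points=1000, seed=None):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.sxx = 0.0  # running sum of squared deviations of x
        self.syy = 0.0  # running sum of squared deviations of y
        self.sxy = 0.0  # running sum of cross-deviations of x and y
        self.max_points = max_points
        self.sample = []
        self._rng = random.Random(seed)

    def add(self, x, y):
        """Update the statistics with one pair; every pair is counted."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the previous mean of x
        self.mean_x += dx / self.n
        dy = y - self.mean_y  # deviation from the previous mean of y
        self.mean_y += dy / self.n
        # Welford-style updates: previous deviation times updated deviation
        self.sxx += dx * (x - self.mean_x)
        self.syy += dy * (y - self.mean_y)
        self.sxy += dx * (y - self.mean_y)
        # Algorithm R: keep the first max_points pairs, then replace a random
        # slot with probability max_points / n so the sample stays uniform
        if len(self.sample) < self.max_points:
            self.sample.append((x, y))
        else:
            j = self._rng.randrange(self.n)
            if j < self.max_points:
                self.sample[j] = (x, y)

    def summary(self):
        """Return regression statistics computed from all pairs seen."""
        if self.n < 2 or self.sxx == 0:
            raise ValueError("need at least two distinct x values")
        slope = self.sxy / self.sxx
        intercept = self.mean_y - slope * self.mean_x
        # R-squared is undefined when every y value is identical
        r_squared = self.sxy ** 2 / (self.sxx * self.syy) if self.syy > 0 else float("nan")
        return {
            "slope": slope,
            "intercept": intercept,
            "r_squared": r_squared,
            "mean_x": self.mean_x,
            "mean_y": self.mean_y,
            "sd_x": math.sqrt(self.sxx / (self.n - 1)),
            "sd_y": math.sqrt(self.syy / (self.n - 1)),
        }
```

Because each pair updates the sums in constant time and the sample never holds more than `max_points` entries, memory use stays fixed however many pairs are streamed through, which matches the behavior described for a million-pair input.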

Required Preparation

Because this function is new, there are a few steps you must take to make it work:

  1. In IRI Workbench, create a new IRI Project named “Repo” from the New toolbar.
  2. Download the latest BIRT Runtime Release Build from the BIRT download page.
  3. Unzip the BIRT Runtime Release Build zip file into a temporary location.
  4. Copy the ReportEngine folder from the temporary location into the root of the drive where IRI software is installed. For a default CoSort installation, this creates C:\ReportEngine.
  5. Download quick_stats.zip from the IRI Website.
  6. Unzip quick_stats.zip into your CoSort install directory (C:\IRI\CoSort95 by default).
  7. Copy all the files from the examples\quick_stats_example_files folder unzipped from quick_stats.zip into the Repo project that you created in Step 1.
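If the report does not generate, it can help to confirm that the folders from steps 4 and 6 landed where the defaults put them. The Python sketch below is a hypothetical helper, not an IRI tool. It assumes the default C:\ReportEngine and C:\IRI\CoSort95 locations, and it assumes that step 6 leaves the example files under examples\quick_stats_example_files inside the CoSort directory. It does not check the Repo project from steps 1 and 7, whose location depends on your Workbench workspace.

```python
from pathlib import Path

# Default locations named in the preparation steps; adjust for your install.
DEFAULT_REPORT_ENGINE = Path(r"C:\ReportEngine")
DEFAULT_COSORT_DIR = Path(r"C:\IRI\CoSort95")


def check_quick_stats_setup(report_engine=DEFAULT_REPORT_ENGINE,
                            cosort_dir=DEFAULT_COSORT_DIR):
    """Hypothetical helper: return a list of problems found; an empty list
    means the expected folders exist."""
    problems = []
    # Step 4: the BIRT runtime's ReportEngine folder copied to the drive root
    if not Path(report_engine).is_dir():
        problems.append(f"ReportEngine folder not found: {report_engine}")
    # Step 6 (assumed layout): example files unzipped under the CoSort directory
    examples = Path(cosort_dir) / "examples" / "quick_stats_example_files"
    if not examples.is_dir():
        problems.append(f"quick_stats example files not found: {examples}")
    return problems


if __name__ == "__main__":
    for problem in check_quick_stats_setup():
        print(problem)
```

Pass different paths as arguments if IRI software is installed on another drive or in a non-default directory.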

BIRT Report Designer

To customize the report, you must have the BIRT Report Designer plugin installed in the IRI Workbench (Eclipse) GUI. Refer to this page to install it with the update manager; use version 4.7 or later for compatibility with SimRel versions of Workbench.

If you have any suggestions for a specific type of in-process data analysis you might find useful, please leave a comment below, or email voracity@iri.com.


© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved