Predictive Analytics: Finding the Future in Big Data

by Dale Robson

As the old English proverb says, “necessity is the mother of invention.”  When it comes to modern business intelligence goals, these words of wisdom ring truer than ever.

The availability of big data to business users already facile with reporting tools now necessitates analytic solutions that define trends and predict future events. Accurate models about consumer behavior, for example, can lead to better product development and marketing decisions that help companies compete and win. Predictive analytics (PA) is thus the key to unlocking more decisions from data.

“Predictive analytics is the practice of extracting information from existing data sets in order to determine patterns and predict future outcomes and trends. Predictive models and analysis are typically used to forecast future probabilities with an acceptable level of reliability. Applied to business, predictive models are used to analyze current data and historical facts in order to better understand customers, products, and partners, and to identify potential risks and opportunities for a company.” – Webopedia

Closely related to traditional business intelligence (BI) and analytics, PA is a growing trend that benefits from technological innovation and is seeing a rapid increase in use cases. The insurance and financial industries, for example, rely heavily on forecasts, and many firms are now making investments in predictive analytics a top priority. Using PA to properly assess risks based on actuarial data and proven hypotheses can mean the difference between new product ROIs and catastrophic liability. Weather models forecasting everything from hurricanes to sea-ice melt allow scientists to measure the effects of climate change and illustrate future scenarios. Crime prevention, genomics, human and knowledge performance indicators, natural resource exploration, project management, and other disciplines have stakes in PA.

Driving Growth

There are many ways PA can help your business grow, too. PA can provide valuable insight into consumer buying habits and patterns, empowering a company to forecast if and when prospective customers need incentives, and to time promotions or relevant ads so that they resonate with return customers. For example, a computer tablet manufacturer needs to understand all of the reasons why its products are purchased. Durability statistics and competitive pricing data can be used to predict the right time to offer coupons for replacement models.

As online shopping has become prevalent, retailers must collect and store the massive amounts of data being generated, and comprehend it to gain meaningful insight and actionable objectives.

Going Mobile

Mobile devices only compound the amount of data companies must deal with, and force them to pursue multi-platform strategies for inbound data collection and outbound promotional campaigns simultaneously.

In e-commerce and web marketing, predictive models can be used with geolocation technology to gauge experiences and expectations. Social network purveyors are already applying analytics to location data to predict future events and forecast popular trends. Software and hardware vendors are integrating location-based analytics with their business intelligence suites to address this need.

“Decision making and the techniques and technologies to support and automate it will be the next competitive battleground for organizations. Those who are using business rules, data mining, analytics and optimization today are the shock troops of this next wave of business innovation.” – Tom Davenport, Competing on Analytics

Big Data

To process various sources of “big data” for PA, some companies have begun to leverage the open source Hadoop paradigm for distributed computing:

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.” – Apache

Google invented some of the technology underlying Hadoop (its MapReduce and Google File System designs inspired the project) to index collected information, examine user behavior, and improve the performance of its search and display algorithms. Yahoo has also used Hadoop on a wide scale.

Although Hadoop can process large amounts of data in different structures, its users must be fairly advanced and learn several different technologies. As with early databases that required custom code to process data, very few “out-of-the-box” applications run seamlessly or easily with Hadoop. For this reason, IRI’s big data preparation (packaging, protection, and provisioning) platform, CoSort, does not require Hadoop, nor does it carry Hadoop’s learning curve or maintenance costs.

PA Tools & Preparation Alternatives

A compelling open source option for modeling and analyzing prepared data is the R language, for which Revolution Analytics offers a commercially supported distribution. Like SAS and SPSS, R applies statistical analysis to model and mine data. According to the community website inside-R [edit: now defunct], “R is the leading language and environment for statistical computing and graphics.” The R Project involves “an international ecosystem of academics, statisticians, data miners, and others committed to the advancement of statistical computing.”

There are actually many PA tools available, some more capable, scalable, and affordable than others. Advanced reporting solutions from IRI are worthy of consideration because analytics can be combined in the same product, place, and I/O pass with data transformation, conversion, and protection.

The SortCL program in the IRI CoSort product or IRI Voracity platform, for example, allows users to quickly sort, join, and aggregate values from multiple table and file sources simultaneously, allowing DBAs to virtualize or materialize variables they want to relate. SortCL can also filter, cleanse, enrich, mask, and reformat disparate data into moving aggregate and statistical reports that display trends from production (while de-identifying PII).
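To make that kind of preparation concrete, here is a minimal Python sketch of the same general steps: joining two sources, masking a PII field, aggregating by month, and computing a moving average. It is only an illustration of the concepts, not SortCL syntax or IRI's implementation. The sample records, field names, and the hash-based masking choice are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

# Hypothetical sample data standing in for a table source and a file source.
customers = [
    {"cust_id": 1, "email": "ann@example.com", "region": "East"},
    {"cust_id": 2, "email": "bob@example.com", "region": "West"},
]
orders = [
    {"cust_id": 1, "month": "2014-01", "amount": 120.0},
    {"cust_id": 2, "month": "2014-01", "amount": 80.0},
    {"cust_id": 1, "month": "2014-02", "amount": 200.0},
    {"cust_id": 2, "month": "2014-03", "amount": 100.0},
    {"cust_id": 1, "month": "2014-03", "amount": 60.0},
]


def mask(value):
    """De-identify a PII value with a truncated one-way hash (an assumed masking choice)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def join_orders(customers, orders):
    """Sort orders by month, inner-join them to customers, and mask the email field."""
    by_id = {c["cust_id"]: c for c in customers}
    joined = []
    for order in sorted(orders, key=lambda o: (o["month"], o["cust_id"])):
        customer = by_id.get(order["cust_id"])
        if customer is None:
            continue  # inner join: skip orders without a matching customer
        joined.append({
            "month": order["month"],
            "region": customer["region"],
            "email_masked": mask(customer["email"]),
            "amount": order["amount"],
        })
    return joined


def monthly_totals(rows):
    """Aggregate order amounts per month, returned in month order."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["month"]] += row["amount"]
    return sorted(totals.items())


def moving_average(series, window):
    """Trailing moving average over (label, value) pairs; early points use a shorter window."""
    result = []
    for i, (label, _) in enumerate(series):
        values = [v for _, v in series[max(0, i - window + 1): i + 1]]
        result.append((label, sum(values) / len(values)))
    return result


if __name__ == "__main__":
    prepared = join_orders(customers, orders)
    for month, average in moving_average(monthly_totals(prepared), window=2):
        print(month, average)
```

The masked, aggregated output is the kind of compact result that could then be handed to a reporting or PA tool, rather than asking that tool to do the joining and summarizing itself.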

CoSort/SortCL “data franchising” or preparation activities have made advanced BI and PA display tools perform better since 2003, and without the need for Hadoop. SortCL’s fast mash-up and data blending activities hand digestible CSV, XML and table results off to those tools, removing what would otherwise be big data transformation and synchronization burdens for them or their users. For those with Hadoop, many of the same jobs can also run seamlessly in MapReduce 2, Spark, Spark Stream, Storm or Tez through Voracity’s VGrid gateway.

For more real-time preparation-to-dashboard results, SortCL integrates with BIRT in the IRI Workbench GUI, built on Eclipse. Consider this PA example with linear regression that leverages SortCL and BIRT. Additional visualization options through IRI Workbench jobs include a Splunk add-on, and a tie-up to the DW Digest cloud dashboard. R is also ready for Eclipse via WalWare’s free “StatET” plug-in, and is thus easy to use in that same GUI for correlation analysis on CoSort/Voracity-prepared data, too.
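For readers unfamiliar with linear regression, the underlying technique can be sketched in a few lines of Python. This is a generic ordinary least squares fit on made-up monthly sales figures, not the SortCL and BIRT example itself. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Project the fitted trend line to a new x value."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical prepared data: month number versus units sold.
    months = [1, 2, 3, 4, 5]
    sales = [100.0, 120.0, 140.0, 160.0, 180.0]
    slope, intercept = fit_line(months, sales)
    print("forecast for month 6:", predict(slope, intercept, 6))
```

In practice the inputs would be the sorted, aggregated values produced by the preparation step, and the fitted line would be drawn over the actuals in a chart or dashboard.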

Regardless of the tool used to facilitate predictive analytics, one fact remains: PA affects all of us, every day. It may be largely unseen, but it’s a determining factor in millions of decisions. PA is harnessing the power of big data, and taking businesses and other organizations one step closer to becoming the fortune tellers they yearn to be.

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved