
CLF and ELF Web Log Processing

  • by Chaitali Mitra

This article is the second in a three-part series on CLF and ELF web log data. The first article explained the CLF and ELF web log formats; this one introduces IRI solutions for manipulating and using web log data. The final article discusses web log data masking.

IRI provides a number of tools for handling CLF and ELF web log data more efficiently, especially when those files contain hundreds of thousands, or millions, of transactions:

  1. IRI NextForm can filter, reformat, re-map, replicate, federate (virtualize), and report from these logs
  2. IRI FieldShield can mask, encrypt, and otherwise de-identify personally identifiable information (PII), such as IP addresses
  3. IRI RowGen can create safe, realistic test data in either CLF or ELF targets, plus custom log formats you define
  4. The SortCL program in IRI CoSort can do all of the above, plus perform and combine fast sort, join, and aggregate transformations

All of these tools share common 4GL metadata and an Eclipse GUI (IRI Workbench). The job scripts that power these applications rely on data layouts expressed in a simple, self-documenting Data Definition File (DDF) format. The layouts can be referenced in multiple jobs or pasted into individual scripts.


For CLF File Users

IRI software supports the following formats with ready-made metadata repositories:

Format                 Description                                    Layout File
Common (Access) Log    contains basic information from the log        CLF_Access.ddf
Referral Log           contains corresponding referral information    CLF_Referral.ddf
Agent Log              contains corresponding agent information       CLF_Agent.ddf

 

Each DDF metadata repository template contains the /FIELD specifications that IRI software job scripts require.
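
To make the idea of a field layout concrete, here is a minimal Python sketch (not IRI software, and not the contents of CLF_Access.ddf) that splits one Common Log Format line into named fields. The field names other than remotehost are illustrative assumptions, and the sample line is a made-up example in the standard CLF shape:

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
# Field names here are illustrative; they are not taken from CLF_Access.ddf.
CLF_PATTERN = re.compile(
    r'(?P<remotehost>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    # A "-" marks missing data; represent it as None.
    return {
        name: (None if value == "-" else value)
        for name, value in match.groupdict().items()
    }


# Hypothetical sample line, for illustration only.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
record = parse_clf_line(sample)
print(record["remotehost"], record["status"], record["bytes"])
```

All values are kept as strings; a real layout would also assign data types, positions, and sizes, which is the role the /FIELD specifications play in a DDF.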


For ELF File Users 

ELF files have a header containing lines of comments, followed by a line naming the data fields. IRI programs skip the header when processing the source data in the log if /PROCESS=ELF is specified in the input section of the job script. To generate a header record in an ELF target that uses the file’s field names and positions, specify /PROCESS=ELF in the output section of the job script.

Note that you can automatically generate the data definitions from ELF log files for use in IRI software jobs. The “ELF2DDF” (Extended Log Format-to-Data Definition File) utility is a command-line translation program that converts W3C web data descriptions to DDFs.

ELF2DDF works by scanning web log headers to produce a descriptive file name and field layout specifications. ELF2DDF is also available in the GUI: select it from the drop-down menu in the IRI Workbench metadata conversion wizard.
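
The header-scanning idea can be sketched in a few lines of Python. This is not ELF2DDF and does not produce a DDF; it only illustrates, under simplifying assumptions, how the "#Fields:" directive of a W3C Extended Log Format file names the columns while the other "#" lines are skipped. The function name read_elf and the sample lines are hypothetical:

```python
def read_elf(lines):
    """Split W3C Extended Log Format lines into field names and records."""
    fields = []
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; only "#Fields:" is needed to name the columns.
            if line.lower().startswith("#fields:"):
                fields = line.split(":", 1)[1].split()
            continue
        # Data line: pair each whitespace-separated value with its field name.
        records.append(dict(zip(fields, line.split())))
    return fields, records


# Hypothetical sample log, for illustration only.
sample_log = [
    "#Version: 1.0",
    "#Date: 12-Jan-1996 00:00:00",
    "#Fields: time cs-method cs-uri",
    "00:34:23 GET /foo/bar.html",
    "12:21:16 GET /foo/bar.html",
]
fields, records = read_elf(sample_log)
print(fields)
print(records[0])
```

Splitting on whitespace is a simplification: quoted string values that contain spaces (a user agent, for example) would need a proper tokenizer.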


Web Log Data Integration and Masking (Combined) Example

The web log file below contains the visitor’s IP Address, User, Date, Time, Port, User Request, Method, Status, Bytes transferred, and User Agent.

A “-” in a field indicates missing data.

[Image: sample web log file]

The table below contains customer information from another source, including: IP Address, User ID, Phone Number, and Name:

[Image: customer table]

The job script below, written for the IRI CoSort package’s SortCL program, brings the two input sources together. In the same job script and I/O pass, the web log and customer table are sorted, joined, masked, and reformatted to produce an output report:

/INFILE=LOG
    /PROCESS=RECORD
    /ALIAS=LOGFILE
    /SPECIFICATION=metadata/logfile.ddf

/INFILE="QA.CUST;DSN=OracleTwisterQA"
    /PROCESS=ODBC
    /ALIAS=CUST
    /SPECIFICATION=metadata/cust.ddf

/JOIN INNER NOT_SORTED LOG NOT_SORTED CUST WHERE LOGFILE.remotehost == CUST.remotehost

/OUTFILE=weblognew.out
    /HEADREC="Client-IP ENC-IP USERNAME CUSTOMER NAME \n\n"
    /FIELD=(LOGFILE.REMOTEHOST, TYPE=IP_ADDRESS, POSITION=1, SIZE=13, FRAME='\"')
    /FIELD=(MASK_CUST.REMOTEHOST=replace_chars(CUST.REMOTEHOST), TYPE=IP_ADDRESS, POSITION=16, SIZE=13, FRAME='\"')
    /FIELD=(CUST.USERID, TYPE=ASCII, POSITION=32, SIZE=16, FRAME='\"')
    /FIELD=(CUST.CUSTOMER, TYPE=ASCII, POSITION=43, SIZE=15, FRAME='\"')

The sources were joined on the visitor’s IP Address (remote host), and that key field was also the one masked with the replace_chars() function. The output below shows the integrated and protected result of the consolidated operation:

[Image: output report]

Results can also be sent to stdout (instead of a saved file or table); such ad hoc views are typical of data federation or virtualization projects.
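
For readers who want to see the join-and-mask logic outside of SortCL, the following Python sketch mirrors the same steps on in-memory rows: an inner join on remotehost, then character masking of the joined key. It is an illustration under assumptions, not a reimplementation of the job above. The helper mask_chars is a hypothetical stand-in for the masking step (it is not the replace_chars() function), the sample rows are invented, and the lookup assumes one customer per IP address:

```python
def mask_chars(value, mask_char="*"):
    """Replace every letter or digit, keeping separators such as '.'."""
    return "".join(mask_char if ch.isalnum() else ch for ch in value)


def join_and_mask(log_rows, customers):
    """Inner-join log rows to customers on remotehost and mask the IP."""
    # Assumes at most one customer per remotehost.
    by_host = {cust["remotehost"]: cust for cust in customers}
    for row in log_rows:
        cust = by_host.get(row["remotehost"])
        if cust is None:
            continue  # inner join: drop log rows with no matching customer
        yield {
            "client_ip": row["remotehost"],
            "masked_ip": mask_chars(cust["remotehost"]),
            "userid": cust["userid"],
            "customer": cust["customer"],
        }


# Hypothetical sample data, using documentation-range IP addresses.
log_rows = [
    {"remotehost": "192.0.2.10", "status": "200"},
    {"remotehost": "192.0.2.99", "status": "404"},
]
customers = [
    {"remotehost": "192.0.2.10", "userid": "jdoe", "customer": "Jane Doe"},
]
results = list(join_and_mask(log_rows, customers))
for result in results:
    # Writing to stdout corresponds to the ad hoc view described above.
    print(result["client_ip"], result["masked_ip"], result["userid"], result["customer"])
```

A hash lookup is used here because the inputs are small and unsorted; it does not reflect how SortCL performs its join.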

[Screenshot: a join in IRI Workbench]
The data sources and target results for the above job script, shown with the transform mapping diagram that represents it in IRI Workbench, the free Eclipse job design IDE for Voracity.

See the next article on How to Mask Data in Web Logs for information on IRI solutions for protecting clickstream data at the field level.


© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved | Contact