
PII Blurring in IRI FieldShield

  • by Craig Schein

Adding ‘random noise’ to data through blurring or perturbation is a common data anonymization requirement for researchers and marketers of protected health information (PHI) seeking to comply with the HIPAA Expert Determination Method security rule. This capability is available in the IRI FieldShield and IRI DarkShield data masking products, as well as the larger IRI Voracity data management platform (which includes them).

Data blurring, or random noise, can be used along with other data generalization techniques to improve re-ID risk scores by anonymizing key or quasi-identifiers while still preserving their utility. Simpler (‘lesser’) masking functions can also be applied to ages and dates, such as a math expression (e.g., add 10 to each age) or a date calculation. With either approach, you can also preserve consistent date intervals, such as admit-discharge, in HIPAA and related data privacy scenarios; ask us how if this is a requirement.

More specifically, IRI’s blur_age and blur_date functions, as well as its newer blur_int and blur_int_pct functions, are related data generalization library functions designed to read input values and return similar but not identical values. They randomly blur a number or date within a certain range of the original value via a “blur factor”, while keeping the new value between a specified minimum and maximum.

 

How the Functions Work

Age blurring uses a function that takes either four or six arguments. The four-argument form requires an original age, a minimum age, a maximum age, and a blur factor. The blur factor determines the range within which the function can change the original value. A blur factor of 10, for example, randomly picks a value between -10 and 10 years and adds it to the original value. The random blur never picks zero unless the blur factor itself is set to zero.

The six-argument form takes two more inputs: a mode age and a second blur factor. The mode age is intended to be the mode of the data, or another value that can be interpreted as the middle. Any age found within a certain distance of the mode is blurred by the second blur factor. That distance extends to the midpoint between the minimum and the mode on one side, and to the midpoint between the mode and the maximum on the other. This allows the function to adapt if the data distribution is skewed from the center.

The blur library contains the function blur_age. Its forms are:

blur_age(AGE, MIN_AGE, MAX_AGE, BLUR_FACTOR)
blur_age(AGE, MIN_AGE, MAX_AGE, BLUR_FACTOR, MODE_AGE, MODE_BLUR_FACTOR)
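To make the two forms more concrete, here is a minimal Python sketch of the behavior described above. It is not IRI's implementation: the name blur_age_sketch and the rng parameter are hypothetical, and two details are assumptions because the article does not spell them out, namely that ages inside the mode zone use the second blur factor in place of the first, and that results falling outside the limits are clamped to the minimum or maximum.

```python
import random


def blur_age_sketch(age, min_age, max_age, blur, mode_age=None, mode_blur=None, rng=random):
    """Hypothetical illustration of the described blur_age behavior (not IRI code)."""
    factor = blur
    if mode_age is not None and mode_blur is not None:
        # The mode zone runs from the min/mode midpoint to the mode/max midpoint.
        low_edge = (min_age + mode_age) / 2
        high_edge = (mode_age + max_age) / 2
        if low_edge <= age <= high_edge:
            # Assumption: the second blur factor replaces the first inside the zone.
            factor = mode_blur
    offset = 0
    if factor != 0:
        # Pick a random whole-year offset in [-factor, factor], never zero.
        offset = rng.choice([d for d in range(-factor, factor + 1) if d != 0])
    # Assumption: values that overshoot the limits are clamped to them.
    return max(min_age, min(max_age, age + offset))


# Four-argument form: a 40-year-old blurred by up to 10 years within 18..90.
print(blur_age_sketch(40, 18, 90, 10))
# Six-argument form: ages near the mode of 50 are blurred by at most 2 years.
print(blur_age_sketch(50, 18, 90, 10, 50, 2))
```

With a mode of 50 and limits of 18 and 90, the mode zone in this sketch spans ages 34 through 70, so a 20-year-old would still be blurred by the first factor.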



Date blurring follows a format similar to age blurring but uses a date as its input and output. The function supports ISO, American, and European date formats; their respective forms are yyyy-mm-dd, mm/dd/yyyy, and dd.mm.yyyy. The random blur factors for this function are in days, and the month and year will change if the new date falls after the last day of the month or before the first.

The blur_date function is contained within the dates library and exposed as one of four Blurring Functions in the Data Masking rule dialog in IRI Workbench (see below). Its forms are:

blur_date (DATE, MIN_DATE, MAX_DATE, BLUR_FACTOR)
blur_date (DATE, MIN_DATE, MAX_DATE, BLUR_FACTOR, MODE_DATE, MODE_BLUR_FACTOR)
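The day-based arithmetic can be sketched in Python as follows. This is an illustration only, not IRI's code: blur_date_sketch and its rng parameter are hypothetical names, the non-zero offset rule is carried over from the age description as an assumption, and clamping to the date window is likewise assumed. The 1982-01-01 minimum in the usage line is a made-up value for the example.

```python
import random
from datetime import date, timedelta


def blur_date_sketch(value, min_date, max_date, blur_days, rng=random):
    """Hypothetical illustration of day-based date blurring (not IRI code)."""
    offset = 0
    if blur_days != 0:
        # Assumption: as with ages, the random offset is never zero.
        offset = rng.choice([d for d in range(-blur_days, blur_days + 1) if d != 0])
    # timedelta arithmetic rolls the month and year over automatically.
    shifted = value + timedelta(days=offset)
    # Assumption: results outside the window are clamped to its edges.
    return max(min_date, min(max_date, shifted))


# Blur 1982-01-12 by up to 30 days, staying inside calendar year 1982.
blurred = blur_date_sketch(date(1982, 1, 12), date(1982, 1, 1), date(1982, 12, 31), 30)
print(blurred.isoformat())
```

Working with date objects rather than strings keeps the month and year rollover out of the blurring logic; formatting back to ISO, American, or European notation would be a separate step.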

The blur_int function for standard integers with a relatively limited range between low (min) and high (max) values takes the form:

blur_int(FIELD, MIN, MAX, BLUR, MODE, BLUR2)

while the blur_int_pct function form only contemplates a percentage of blurring, which is better for values that may span a very large range:

blur_int_pct(FIELD, PERCENT_AMOUNT)

Examples

Function Calls

/FIELD=(EXPRESSION_AGE=blur_age(AGE, 18, 90, 10), TYPE=NUMERIC, PRECISION=0, POSITION=1, SEPARATOR="|")
/FIELD=(EXPRESSION_DATE=blur_date(DATE, "1982-12-31", 30), TYPE=ISO_DATE ...

Input            Output
61|1982-01-12    62|1982-02-01
81|1982-12-16    72|1982-11-17
58|1982-11-29    66|1982-11-24
23|1982-03-05    19|1982-02-13
24|1982-11-18    21|1982-12-09
42|1982-05-26    43|1982-06-14
28|1982-09-21    26|1982-09-10
63|1982-06-13    60|1982-07-07
83|1982-03-08    87|1982-03-02
30|1982-05-27    33|1982-06-23
37|1982-04-20    30|1982-03-21
90|1982-10-15    81|1982-10-23
63|1982-06-21    66|1982-07-03
72|1982-04-19    76|1982-05-03

Graphical specification of blur function parameters into data masking jobs is available in IRI Workbench. The Blur Functions dialog described below is launched from the list of available protection rules that can apply in data classification or FieldShield job design:

New Field Rule Wizard

Additionally, the functions can be saved as rules so that they can be reused.

Choosing Blur Functions opens this dialog:

Random noise anonymization
Blur Functions – Another IRI data masking rule

For numbers that may fall into a large range, the function blur_int_pct blurs an integer by a percentage of its value. This handles use cases where blurring a number by up to 200, for example, is insignificant if the original value is 20,000 and inappropriate if the original value is 20. A 10 percent blur would change 20 by up to plus or minus 2, and 20,000 by up to plus or minus 2,000.

For every field value passed to the function, the percentage is applied to the value, and the result becomes the blurring factor. Assuming a specified blur percentage of 25, a value of 20 would be blurred by a random amount between -5 and +5, and a value of 2,000 by a random amount between -500 and +500.
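The percentage arithmetic can be illustrated with a short Python sketch. The function name blur_int_pct_sketch and the rng parameter are hypothetical, and the rounding (integer truncation) and the possibility of a zero offset are assumptions, since the article does not state either detail for this function.

```python
import random


def blur_int_pct_sketch(value, percent, rng=random):
    """Hypothetical illustration of percentage-based blurring (not IRI code)."""
    # The percentage of the value itself becomes the blur factor.
    # Assumption: the factor is truncated to a whole number.
    blur = abs(value) * percent // 100
    # Assumption: any offset in [-blur, +blur] may be drawn, including zero.
    return value + rng.randint(-blur, blur)


# With a 25 percent blur, 20 lands in 15..25 and 2,000 lands in 1,500..2,500.
print(blur_int_pct_sketch(20, 25))
print(blur_int_pct_sketch(2000, 25))
```

Because the blur factor scales with each value, small and large numbers are perturbed proportionally rather than by one fixed amount.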

Behaviors for Bad Data

In case the value of the first argument is less than the minimum or greater than the maximum, the new value will default to the minimum or maximum value. This applies to all the blurring functions.

Blur factors that are larger than the actual distance between the minimum and maximum are ignored. Instead, the function creates a value between the minimum and maximum. The mode blur factor behaves similarly.
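The two range-related rules above can be sketched in Python like this. It is a hedged illustration, not IRI's code: blur_with_guards and its rng parameter are hypothetical names, and drawing the replacement value uniformly between the limits is an assumption, as is clamping in the normal case.

```python
import random


def blur_with_guards(value, low, high, blur, rng=random):
    """Hypothetical illustration of the described bad-data handling (not IRI code)."""
    # Out-of-range input defaults to the nearest limit.
    if value < low:
        return low
    if value > high:
        return high
    # An oversized blur factor is ignored; any value in the range may result.
    # Assumption: that value is drawn uniformly.
    if blur > high - low:
        return rng.randint(low, high)
    if blur == 0:
        return value
    # Normal case: non-zero offset, with the result clamped (an assumption).
    offset = rng.choice([d for d in range(-blur, blur + 1) if d != 0])
    return max(low, min(high, value + offset))


print(blur_with_guards(5, 18, 90, 10))    # below the minimum, so 18
print(blur_with_guards(120, 18, 90, 10))  # above the maximum, so 90
print(blur_with_guards(40, 18, 90, 500))  # oversized factor: some value in 18..90
```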

The blur_age function will return a bad value error if the first argument read in is not an integer. This function will also accept negative values as long as they are integers.

If blur_date encounters a value that is not an ISO_DATE, AMERICAN_DATE, or EUROPEAN_DATE, it returns “not-a-date-time”. This is the default value of the Boost library variables IRI uses, and it is returned when the variable is not set correctly or not set at all.
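As a rough illustration of this fallback, the Python sketch below tries the three supported layouts in turn and returns the same marker string when none matches. The helper name parse_supported_date is hypothetical, and the sketch uses Python's standard datetime parsing rather than the Boost library mentioned above.

```python
from datetime import date, datetime

# The three layouts named in the article: ISO, American, and European.
SUPPORTED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_supported_date(text):
    """Hypothetical helper: return a date, or the marker string for bad input."""
    for fmt in SUPPORTED_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            # Not this layout; try the next one.
            continue
    return "not-a-date-time"


print(parse_supported_date("1982-12-31"))  # ISO
print(parse_supported_date("12/31/1982"))  # American
print(parse_supported_date("31.12.1982"))  # European
print(parse_supported_date("Dec 31 82"))   # unsupported layout
```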
