
Masking PANs in Credit Card Images

by Adam Lewis

Abstract: Previously, the DarkShield API for Files could perform black-box redaction of primary account numbers (PANs, also known as CCNs) on credit cards only within defined areas of the image (see note 1). The accuracy of credit card number detection, and thus of masking, has now been improved by new OCR-A font support in DarkShield. This addresses use cases where PAN locations vary among images, and cases where masked or synthetic PANs are needed for testing.

DarkShield and Optical Character Recognition

IRI DarkShield is a solution built to provide data protection in semi-structured and unstructured data files. By searching for and identifying sensitive information in these types of files, DarkShield is able to provide precise masking control over files using various masking operations. 

It is important to note that unstructured data does not have to mean text in text files. Text in images, including images embedded in documents, is also a viable target for DarkShield. To scan for and detect sensitive data in images, the DarkShield API uses a technology called optical character recognition (OCR).

How OCR Works

OCR works by processing a scanned image and analyzing the pattern of black and white pixels to identify characters. Normally, a preprocessing step called thresholding is applied first to convert the image to pure black and white: pixels darker than a chosen threshold become black, and all other pixels become white.
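Thresholding can be sketched in plain Python, assuming a grayscale image stored as a list of pixel rows with values from 0 (black) to 255 (white). The threshold here is chosen with Otsu's method, one common automatic technique; the article does not say which thresholding method DarkShield uses. OpenCV offers the same idea through `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
from typing import List

Image = List[List[int]]  # grayscale rows: 0 = black ... 255 = white


def otsu_threshold(img: Image) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form the dark class (ink); pixels > t form the light class.
    """
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue  # no dark pixels yet
        if weight_light == 0:
            break  # every pixel is already dark
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img: Image, t: int) -> List[List[int]]:
    """Map pixels <= t to 1 (ink) and all other pixels to 0 (background)."""
    return [[1 if p <= t else 0 for p in row] for row in img]
```

For example, on the 2x3 image `[[20, 30, 200], [25, 210, 220]]` the chosen threshold is 30, and `binarize` returns `[[1, 1, 0], [1, 0, 0]]`.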

Next, each character is segmented into its own individual image. Then, during the recognition phase, individual characters and words are identified based on the score assigned to each candidate match.
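One simple way to segment characters, assuming a binarized image (1 = ink, 0 = background) and a font with clear gaps between characters, as a monospaced font like OCR-A usually has, is a vertical projection: columns that contain no ink separate one character from the next. This sketch is illustrative only and is not DarkShield's segmentation code.

```python
from typing import List, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background


def segment_columns(binary: Binary) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, for each run of inked columns."""
    width = len(binary[0]) if binary else 0
    inked = [any(row[x] for row in binary) for x in range(width)]
    spans: List[Tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x  # a character begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a blank column ends the character
            start = None
    if start is not None:
        spans.append((start, width))  # the last character touches the right edge
    return spans


def crop(binary: Binary, span: Tuple[int, int]) -> Binary:
    """Cut one character's columns out of the line image."""
    start, end = span
    return [row[start:end] for row in binary]
```

For the two-row image `[[1, 1, 0, 0, 1], [1, 0, 0, 1, 1]]`, `segment_columns` returns `[(0, 2), (3, 5)]`: two characters separated by the blank third column. Touching or overlapping characters would need a more robust method, such as connected-component analysis.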

OCR and Credit Card Font

Normally, OCR relies on machine learning to recognize characters from groups of contours and shapes detected in an image. Machine learning is a powerful tool that provides an accurate and flexible method for character recognition.

That said, machine learning is also limited by the models it is trained with. A broad but easy-to-understand example: a model trained to detect cars cannot be used to detect motorcycles.

Similarly, the Tesseract OCR models trained to detect characters in images can struggle to recognize the characters on a credit card, because those characters may be printed in a special font known as OCR-A. To handle special fonts like this, a model must be trained on a large data set of characters in that font.

A common alternative for recognizing characters in an unusual font is template matching. Template matching is useful in certain situations, such as recognizing PANs, and the DarkShield API offers it as an alternative way to detect the characters on credit cards.

What is Template Matching?

Template matching is a technique used to find matches or close matches in an image using a template image as the reference. In the context of OCR, template matching is used to help recognize optically processed characters in images. 

This simple but effective technique can be used to match handwriting or characters in a particular font.
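The core search can be sketched in a few lines of Python, assuming binary images stored as lists of pixel rows. This version scores each position with the sum of squared differences (SSD), where a lower score means a closer match; SSD is only one of several possible scoring methods, and OpenCV exposes it as `cv2.matchTemplate` with `cv2.TM_SQDIFF`. The function names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[int]]  # binary pixels: 1 = ink, 0 = background


def ssd(image: Grid, template: Grid, top: int, left: int) -> int:
    """Sum of squared differences between the template and the image patch at (top, left)."""
    return sum(
        (image[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def find_template(image: Grid, template: Grid) -> Tuple[int, int, int]:
    """Slide the template one pixel at a time, left to right and top to bottom,
    and return (top, left, score) for the lowest-SSD (best-matching) position."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best = (0, 0, ssd(image, template, 0, 0))
    for top in range(ih - th + 1):
        for left in range(iw - tw + 1):
            score = ssd(image, template, top, left)
            if score < best[2]:
                best = (top, left, score)
    return best
```

For example, searching `[[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0]]` for the template `[[1, 1], [1, 0]]` returns `(1, 1, 0)`: a perfect match at row 1, column 1.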

Template matching requires a template image containing characters in the target font. Using the template image as the reference, the OCR process scans the image to be parsed with a sliding-window technique.

Reference image for OCR-A font used in template matching.

In this sliding process, the template is moved across the image from left to right and top to bottom, one pixel at a time, and at each location a score is calculated for how well the template matches that region. The index of the best match is recorded as the recognized character.

Sliding across an image and matching based on the template image
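Recording the index of the best match as the recognized character can be sketched as follows, assuming each segmented glyph has already been scaled to the size of the per-character templates. The pixel-agreement score is a hypothetical choice, and the tiny 3x3 bitmaps for "0" and "1" are made-up toys, not real OCR-A glyphs.

```python
from typing import List

Grid = List[List[int]]  # binary pixels: 1 = ink, 0 = background


def similarity(glyph: Grid, template: Grid) -> float:
    """Fraction of pixel positions where the glyph and the template agree."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        raise ValueError("scale the glyph to the template size first")
    cells = [(r, c) for r in range(len(template)) for c in range(len(template[0]))]
    agree = sum(1 for r, c in cells if glyph[r][c] == template[r][c])
    return agree / len(cells)


def recognize(glyph: Grid, templates: List[Grid], labels: str) -> str:
    """Score the glyph against every template; the best index gives the character."""
    scores = [similarity(glyph, t) for t in templates]
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best_index]


# Toy 3x3 bitmaps, not real OCR-A glyphs.
ZERO = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
ONE = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
noisy_one = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
print(recognize(noisy_one, [ZERO, ONE], "01"))  # prints 1
```

The noisy glyph agrees with the "1" template on 8 of 9 pixels and with the "0" template on only 3, so index 1 wins.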

To learn more about OCR template matching, follow this link.

Configuring the DarkShield-Files API Call

To begin, calls must be made to the DarkShield-Files API to create a search context, a mask context, a file search context, and a file mask context. These contexts tell the API how to search for PII, which masking operations to apply to the PII it finds, and which special configurations to use for particular file types.

Search and mask context

In the search context shown above, a credit card matcher named “CcnMatcher” is defined; it uses a regular expression pattern matcher to identify PANs. A mask context defines rules, which specify the masking function to apply, and rule matchers, which pair those rules with search matchers.

In the setup above, a format-preserving encryption (FPE) rule called “FpeRule” is created, along with a rule matcher called “FpeRuleMatcher” that pairs “CcnMatcher” with “FpeRule”.
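The relationship among the search matcher, the mask rule, and the rule matcher can be modeled in a few lines of Python. This is a sketch under loud assumptions: the names CcnMatcher, FpeRule, and FpeRuleMatcher come from the article, but the dictionaries, the regex, and the key are hypothetical and do not reflect the DarkShield API's actual context format. The `fpe` function is a toy alternating Feistel network keyed with HMAC-SHA256, loosely following the structure of NIST's FF1 without being FF1; it is not DarkShield's implementation and is not suitable for real data. It does show what "format-preserving" means: digits are replaced by other digits, and separators stay in place.

```python
import hashlib
import hmac
import re
from typing import Callable, Dict, Tuple

# Hypothetical model of the roles described above; NOT the DarkShield JSON schema.
# The FPE below is a toy (not NIST FF1/FF3-1, not DarkShield's FPE); never use it on real data.
DIGITS = "0123456789"
KEY = b"demo-key-only"  # illustrative key


def _round_value(round_index: int, half: str) -> int:
    """Keyed pseudorandom round function: HMAC-SHA256 of the round number and one half."""
    mac = hmac.new(KEY, f"{round_index}:{half}".encode(), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def _feistel(digits: str, decrypt: bool, rounds: int = 10) -> str:
    """Alternating Feistel network on a decimal string; output length equals input length."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in (reversed(range(rounds)) if decrypt else range(rounds)):
        m = u if i % 2 == 0 else v  # width of the half rewritten in this round
        if decrypt:
            c = (int(b) - _round_value(i, a)) % 10 ** m
            a, b = str(c).zfill(m), a
        else:
            c = (int(a) + _round_value(i, b)) % 10 ** m
            a, b = b, str(c).zfill(m)
    return a + b


def fpe(text: str, decrypt: bool = False) -> str:
    """Encrypt (or decrypt) only the digits in text; separators keep their positions."""
    digits = "".join(ch for ch in text if ch in DIGITS)
    if len(digits) < 2:
        raise ValueError("need at least two digits")
    out = iter(_feistel(digits, decrypt))
    return "".join(next(out) if ch in DIGITS else ch for ch in text)


# Search matcher: an illustrative 13- to 16-digit PAN pattern with optional space/hyphen groups.
SEARCH_MATCHERS = {"CcnMatcher": re.compile(r"\b(?:\d[ -]?){12,15}\d\b")}
# Mask rule: the masking function to apply.
MASK_RULES: Dict[str, Callable[[str], str]] = {"FpeRule": fpe}
# Rule matcher: pairs a search matcher with a mask rule.
RULE_MATCHERS: Dict[str, Tuple[str, str]] = {"FpeRuleMatcher": ("CcnMatcher", "FpeRule")}


def search_and_mask(text: str) -> str:
    """Apply each rule matcher: find text with its matcher, rewrite it with its rule."""
    for matcher_name, rule_name in RULE_MATCHERS.values():
        rule = MASK_RULES[rule_name]
        text = SEARCH_MATCHERS[matcher_name].sub(lambda m: rule(m.group(0)), text)
    return text
```

Running `search_and_mask("Card: 4111-1111-1111-1111 exp 12/27")` returns the same text with a different 16-digit, hyphen-grouped number in place of the PAN, and calling `fpe(..., decrypt=True)` on that number restores the original.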

File search context and file mask context

File search contexts and file mask contexts allow file-type-specific configurations to be passed as part of the context. In the setup above, a configuration for image files specifies that template matching will be used in conjunction with OCR.
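How a per-file-type configuration might be selected can be sketched as a lookup keyed on the file extension. The option names (`ocr`, `template_matching`) and the extension list are hypothetical placeholders for illustration; they are not DarkShield's file search context or file mask context settings.

```python
from pathlib import Path
from typing import Dict

# Hypothetical per-file-type options for illustration only;
# these keys are NOT DarkShield's actual file context schema.
FILE_TYPE_OPTIONS: Dict[str, Dict[str, bool]] = {
    "image": {"ocr": True, "template_matching": True},
    "default": {"ocr": False, "template_matching": False},
}

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def options_for(path: str) -> Dict[str, bool]:
    """Choose processing options from the file's extension (case-insensitive)."""
    kind = "image" if Path(path).suffix.lower() in IMAGE_SUFFIXES else "default"
    return FILE_TYPE_OPTIONS[kind]
```

Keeping these options separate from the search and mask rules means the same PAN matcher and masking rule can be reused across file types, while only image files pay the cost of OCR.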

Results of Search and Masking

Original credit card image

Credit card image with redacted numbers

Comparing the before and after images, we can see that the PANs have been identified and masked using black-box redaction. To view the full source code, see the credit card demo hosted in IRI’s GitHub repository. Contact darkshield@iri.com if you would like more information.
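Black-box redaction of OCR results can be sketched as follows, assuming the OCR step returns the recognized text together with one bounding box per character, in pixel coordinates with right and bottom edges exclusive. Each regex match is mapped back to the union of its characters' boxes, and that rectangle is filled with black. The helper names and box format are hypothetical, not DarkShield's.

```python
import re
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right and bottom exclusive


def union_box(boxes: List[Box]) -> Box:
    """Smallest rectangle covering every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def fill_box(pixels: List[List[int]], box: Box, value: int = 0) -> None:
    """Paint the box, clipped to the image, with a solid value (0 is black)."""
    left, top, right, bottom = box
    for y in range(max(top, 0), min(bottom, len(pixels))):
        for x in range(max(left, 0), min(right, len(pixels[y]))):
            pixels[y][x] = value


def redact(pixels: List[List[int]], text: str, char_boxes: List[Box], pattern: str) -> List[Box]:
    """Black out every regex match in the OCR text; return the rectangles painted."""
    if len(text) != len(char_boxes):
        raise ValueError("expected exactly one box per recognized character")
    painted: List[Box] = []
    for m in re.finditer(pattern, text):
        if m.start() == m.end():
            continue  # ignore empty matches
        box = union_box(char_boxes[m.start():m.end()])
        fill_box(pixels, box)
        painted.append(box)
    return painted
```

In practice the pattern would be a PAN pattern and the character boxes would come from the OCR step; painting whole rectangles rather than individual glyphs also hides the gaps between digits.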

  1. Note that the DarkShield-Files API can also mask account and routing numbers on checks when the API call supplies the coordinates where these numbers are located (these numbers always appear in the same place). This is shown in the following article, where the DarkShield-Files API masks credit card numbers as well as checking account and routing numbers by replacing the sensitive data in images with newly generated, realistic data. Alternatively, the data can simply be redacted with a black box.

© 2025 Innovative Routines International (IRI), Inc., All Rights Reserved