Finding and Masking PHI in HL7 and X12 Files
IRI’s reputation in the healthcare industry expanded beyond its traditional roots in claims data processing in 2010, when it released IRI FieldShield to find and de-identify Protected Health Information (PHI) in flat files and relational databases in compliance with the HIPAA Safe Harbor and Expert Determination de-identification methods.
IRI introduced the DarkShield data masking product approximately 5 years ago to address these requirements in semi-structured and unstructured data sources, including XML, Excel, and PDF files, as well as NoSQL databases like MongoDB and image formats like JPEG and DICOM. This article covers DarkShield support for the HL7 and X12 EDI formats through its RPC API for files.
What is HL7?
The Health Level Seven (HL7) document format has been a standards-based “framework for the exchange, integration, sharing, and retrieval of electronic health information” for more than 30 years. There are several HL7 file standards in use today, including the:
- Version 2.x messaging standard for health and medical transactions;
- Version 3 as the next generation of messaging standards;
- Clinical Document Architecture (CDA), the exchange model for clinical documents;
- Continuity of Care Documents (CCD), for the exchange of medical summaries; and,
- Fast Healthcare Interoperability Resources (FHIR), for health care data exchange.
HL7 Message Structure
An HL7 message is strictly structured, and can be broken down into segments, composites (fields) within segments, and sub-composites (sub-fields). An HL7 message must always start with a message header segment (MSH).
The HL7 MSH segment contains information about the message, including which delimiters are used in the message (e.g. ‘|^~\&’). Each line holds one segment, and each segment begins with a segment identifier (e.g. PID, MSA, NK1).
The composites within a segment are separated by the composite delimiter, which defaults to the pipe character (“|”). The sub-composites within a composite are separated by the sub-composite delimiter, which defaults to the caret character (“^”). The full set of default delimiters is:
- Composite delimiter – “|”
- Sub-composite delimiter – “^”
- Repetition delimiter (separates repeating fields) – “~”
- Escape character – “\”
- Sub-sub-composite delimiter – “&”
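The structure described above can be sketched in a few lines of Python. This is a minimal illustration, not how DarkShield parses HL7: the function names and the sample message are made up for the example, and the sketch ignores escape sequences and repeating fields.

```python
def hl7_delimiters(message: str) -> dict:
    """Read the delimiters declared at the start of the MSH segment."""
    if not message.startswith("MSH") or len(message) < 8:
        raise ValueError("an HL7 v2 message must start with an MSH segment")
    # The character right after "MSH" is the composite (field) delimiter.
    # The next four are the sub-composite, repetition, escape, and
    # sub-sub-composite characters, in that order.
    return {
        "composite": message[3],
        "sub_composite": message[4],
        "repetition": message[5],
        "escape": message[6],
        "sub_sub_composite": message[7],
    }


def hl7_segments(message: str) -> list:
    """Split a message into segments, and each segment into composites."""
    composite = hl7_delimiters(message)["composite"]
    # splitlines() accepts \r, \n, or \r\n between segments.
    return [line.split(composite) for line in message.splitlines() if line]


# Made-up sample message with two segments (MSH and PID).
sample = (
    "MSH|^~\\&|APP|FAC|APP2|FAC2|202110201200||ADT^A01|MSG00001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||DOE^JOHN||19800101|M"
)

delims = hl7_delimiters(sample)
segments = hl7_segments(sample)
pid = segments[1]
# Split the name composite of the PID segment into its sub-composites.
name_parts = pid[5].split(delims["sub_composite"])
print(pid[0], name_parts)
```

Reading the delimiters from the MSH segment, rather than hard-coding them, keeps the sketch usable for messages that declare non-default characters.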
What is X12?
X12 is a common format for electronic business documents. The Accredited Standards Committee (ASC) created it in 1979, and it is also widely used in the healthcare industry. Every X12 document has a three-digit identifier that tells the receiver what information it contains.
For example:
- The X12 810 is a transaction set for an invoice.
- The X12 835 Health Care Claim and Remittance Advice is a transaction set specified by HIPAA 5010 requirements for the electronic transmission of healthcare payment and benefit information.
- The X12 837 Health Care Medical Claims is a transaction set formatted to meet HIPAA requirements for the electronic submission of healthcare information.
- The X12 850 is a transaction set for a purchase order.
X12 Message Structure
Like an HL7 message, an X12 message has a strict structure consisting of segments, elements (fields), and composite elements (sub-fields). An X12 message must always start with an Interchange control header (ISA segment).
The ISA segment contains information about the message, including a list of delimiters placed at the end of the ISA segment (e.g. ‘*:~’). Directly following the ISA segment is the Functional Group Header segment (GS), which is also called the inner envelope.
Unlike an HL7 message, an X12 message does not require a carriage return or line feed after each segment. Instead, each X12 segment ends with a segment delimiter.
In this example, the end-of-segment delimiter is the tilde (“~”). Like HL7, each segment has a segment identifier (e.g. N1, AMT) at the beginning of the segment. The elements within the segment are separated by a special character; in this example it is an asterisk (default element delimiter is “*”). The composite delimiter in this example is a colon character (“:”).
For this example, the delimiters of the X12 document are the following:
- Elements delimiter – “*”
- Composite element delimiter – “:”
- End of segment delimiter – “~”
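As a rough illustration of the rules above, the following Python sketch reads the three delimiters from the ISA segment and then splits an interchange into segments and elements. It is not DarkShield's parser: the function names and the sample interchange are invented for the example, and the sketch relies on the ISA segment having 16 elements, with the last one holding the composite element delimiter.

```python
def x12_delimiters(message: str) -> dict:
    """Read the delimiters from the ISA segment of an X12 interchange."""
    if not message.startswith("ISA") or len(message) < 4:
        raise ValueError("an X12 interchange must start with an ISA segment")
    element = message[3]  # the character right after "ISA"
    # ISA has 16 elements. After 16 splits, the remainder starts with the
    # last ISA element (the composite element delimiter), followed
    # immediately by the end-of-segment delimiter.
    parts = message.split(element, 16)
    if len(parts) < 17 or len(parts[16]) < 2:
        raise ValueError("incomplete ISA segment")
    return {
        "element": element,
        "composite": parts[16][0],
        "segment": parts[16][1],
    }


def x12_segments(message: str) -> list:
    """Split an interchange into segments, and each segment into elements."""
    delims = x12_delimiters(message)
    segments = []
    for raw in message.split(delims["segment"]):
        raw = raw.strip()  # tolerate optional line breaks between segments
        if raw:
            segments.append(raw.split(delims["element"]))
    return segments


# Made-up interchange; the ISA fields are padded to their fixed widths.
sample = (
    "ISA*00*" + " " * 10 + "*00*" + " " * 10
    + "*ZZ*" + "SENDER".ljust(15) + "*ZZ*" + "RECEIVER".ljust(15)
    + "*210101*1200*U*00501*000000001*0*P*:~"
    + "GS*HP*SENDER*RECEIVER*20210101*1200*1*X*005010X221A1~"
    + "N1*PR*EXAMPLE PAYER~"
)

x12_delims = x12_delimiters(sample)
x12_segs = x12_segments(sample)
print(x12_delims)
print([seg[0] for seg in x12_segs])
```

Because segments are split on the declared end-of-segment delimiter rather than on line breaks, the same code handles interchanges written on a single line or with one segment per line.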
Sensitive Data in HL7 and X12 Documents
It is no surprise that HL7 documents, which relay information within the healthcare industry, and X12 messages, which serve multiple business sectors, may carry sensitive information like protected health information (PHI) in their bodies.
This creates security issues, especially when these documents are stored. A Forbes Technology Council post observed of HL7 in particular that “from the start, HL7 was arguably built insecurely, making it unsuitable for the public cloud by itself”.
How IRI DarkShield Can Help
Storing data securely is a challenge with which many businesses and regulators are well acquainted. With the latest additions to the IRI DarkShield Files API, locating and de-identifying key identifiers in both HL7 v2 and X12 documents is now possible; HL7 v3 was already supported by virtue of its XML format.
DarkShield already finds and masks PHI and other sensitive data in structured, semi-structured, and unstructured data sources (including DICOM images). Adding support for HL7 and X12 through its Files API provides a callable way to improve security and compliance with data privacy regulations like HIPAA.
DarkShield supports multiple search methods, and uses the same static masking functions as IRI FieldShield to preserve data integrity enterprise-wide.
Configuring HL7 and X12 DarkShield Files API Calls
In the DarkShield Files API samples for HL7 and X12 formats, the file to be searched and its content-type are specified in main.py in a manner like this:
Demonstration of X12 file content-type.
So that the DarkShield Files API knows which format to expect, the content-type must be text/hl7 for HL7 files and text/x12 for X12 files.
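As a hedged sketch of how a client might attach the right content-type, the helper below builds a (filename, content, content-type) tuple, which is the shape the Python requests library accepts in its files= argument for a multipart upload. Only the two content-type strings come from this article. The helper name, the file-extension mapping, and the sample file name are assumptions made for illustration, and the sketch deliberately stops short of calling any DarkShield endpoint.

```python
from pathlib import Path

# The content types are the ones named in this article. The mapping from
# file extensions to content types is an assumption for illustration.
CONTENT_TYPES = {
    ".hl7": "text/hl7",
    ".x12": "text/x12",
    ".edi": "text/x12",
}


def file_part(path: str, data: bytes) -> tuple:
    """Build a (filename, content, content_type) tuple for a multipart upload."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"no content type known for extension {suffix!r}")
    return (Path(path).name, data, CONTENT_TYPES[suffix])


# Hypothetical file name and placeholder content.
part = file_part("claims/sample_835.x12", b"ISA*00*")
print(part)
```

Keeping the mapping in one place makes it harder to send an X12 file with the HL7 content-type, or the reverse, by accident.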
Selecting Specific Segments and Fields to Mask
By default, the DarkShield Files API searches and masks an entire file based on what the searchMatchers find. There are cases, however, where masking needs to be more selective. For example, not every name in an HL7 document may need masking, and a particular name might need to be referenced later.
To mask only specific segments and their fields, a Column Matcher is specified in the file_search_context. Below is a sample configuration for HL7 v2 and X12:
HL7 fileSearchContext of setup.py file.
When targeting specific columns on a segment, note that the syntax for the column matcher uses a pipe delimiter to separate the segment identifier from the target column. Regardless of whether a pipe delimiter is used as a field delimiter in the document, this will be the syntax to indicate specific segments and columns to be masked by the API.
X12 fileSearchContext of setup.py file
When targeting specific columns on a segment, note that the syntax for the column matcher uses an asterisk delimiter to separate the segment identifier from the target column. Regardless of whether an asterisk delimiter is used as a field delimiter in the document, this will be the syntax to indicate specific segments and columns to be masked by the API.
Note that the ISA segment and GS segment are mandatory for the DarkShield API to correctly parse the X12 message.
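To make the idea of targeting a segment and column concrete, here is a small Python sketch that masks one field in already-parsed segments. It is an illustration only and does not reproduce DarkShield's Column Matcher: the selector strings `PID|5` and `N1*2` are hypothetical, the function names are invented, and the convention that column 1 is the first field after the segment identifier is an assumption.

```python
def parse_selector(selector: str, separator: str) -> tuple:
    """Split a selector such as 'PID|5' into ('PID', 5)."""
    segment_id, _, column = selector.partition(separator)
    if not segment_id or not column.isdigit():
        raise ValueError(f"cannot parse selector {selector!r}")
    return segment_id, int(column)


def mask_fields(segments: list, selector: str, separator: str,
                mask: str = "*****") -> list:
    """Return a copy of the segments with the selected column masked.

    Each segment is a list of fields whose first item is the segment
    identifier. Column 1 is assumed to be the first field after it.
    """
    segment_id, column = parse_selector(selector, separator)
    masked = []
    for fields in segments:
        fields = list(fields)  # copy, so the caller's data is not modified
        if fields[0] == segment_id and column < len(fields) and fields[column]:
            fields[column] = mask
        masked.append(fields)
    return masked


# Made-up, already-parsed HL7 and X12 segments.
hl7_example = [
    ["PID", "1", "", "12345", "", "DOE^JOHN"],
    ["NK1", "1", "DOE^JANE"],
]
x12_example = [["N1", "PR", "EXAMPLE PAYER"], ["AMT", "AU", "150.00"]]

hl7_masked = mask_fields(hl7_example, "PID|5", "|")
x12_masked = mask_fields(x12_example, "N1*2", "*")
print(hl7_masked)
print(x12_masked)
```

In this sketch the name in the NK1 segment is left untouched, which mirrors the case described above where only some of the names in a document need masking.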
Before and After Screenshots
HL7 v2 Before Masking:
HL7 v2 After Masking:
X12 835 Before Masking:
X12 835 After Masking:
X12 837 Before Masking:
X12 837 After Masking:
X12 850 Before Masking:
X12 850 After Masking:
In Closing
Because HL7 and X12 documents are highly insecure when stored in a public cloud, the threat of a data privacy breach is significant. Preventing data privacy law violations requires protecting the sensitive data within these documents.
The latest enhancements to the DarkShield Files API address these needs for HL7 v2 and X12 documents, adding a layer of security for the protected health information in stored electronic documents.
Citations
Health Level Seven International. HL7 International. (n.d.). Retrieved October 20, 2021, from https://www.hl7.org/implement/standards/.
Ferrari, M. (2019, July 12). Council Post: HL7: Is your sensitive data secure? Forbes. Retrieved October 20, 2021, from https://www.forbes.com/sites/forbestechcouncil/2019/07/12/hl7-is-your-sensitive-data-secure/?sh=40f121ca678d.