Data Privacy Considerations for DICOM Anonymization

An article by David Di Valentino, AI/ML Solutions Lead, Privacy Analytics

Across industries, demand for medical images is on the rise. One driver is the growing number of use cases, such as AI, which can support diagnostics, tracking disease progression, and planning or gauging the effectiveness of interventions. Demand is also driven by the increasingly mature reuse of structured data, which has organizations looking to new sources.

When it comes to de-identifying or anonymizing images—particularly DICOM images—it can be unclear what options are available.

In this article, we will focus on the DICOM data format, exploring the challenges inherent in de-identifying DICOM data and discussing the presently available solutions.

What is DICOM?

The Digital Imaging and Communications in Medicine (DICOM) data format is widely used in the healthcare industry to store and share medical images, such as X-rays, MRIs, and CT scans.

The DICOM standard guides DICOM data formatting and comprises two main elements:

  • Header data, which is semi-structured data containing metadata about the image, as well as patient contact information, treatment details, and medical history, and
  • The image data itself (e.g., an X-ray image), which is also known as “pixel data.”

Header data can contain different types of identifiable information—such as a patient’s name, date of birth, and demographics—as well as organization-specific ID numbers.

Identified header data can make it easy for unauthorized individuals to single out patients, leading to a breach of patient privacy, identity theft, or other undesirable outcomes. Fortunately, this risk is easily remedied with standard de-identification techniques.

The conventional part of de-identifying DICOM data: Headers

In practice, de-identifying DICOM images means removing identifiable information from the header data, the image data, and the file and folder names, or otherwise rendering it non-identifiable.

As a starting point, it is necessary to transform direct identifiers like patient names, addresses, and ID numbers in the header data. If referential integrity is needed across DICOM images in a dataset due to multiple patient visits, or linkage is desired to other data modalities (e.g., structured data or clinical notes), replacement with synthetic values or encryption (for ID numbers) can be a viable option to transform such data.
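As a sketch of the referential-integrity approach described above, the snippet below uses a keyed hash (HMAC) to map each ID number to a stable pseudonym: the same source ID always produces the same pseudonym, so linkage across images and other data modalities is preserved, but the original value cannot be recovered without the key. The function name and key handling here are illustrative, not part of any specific product or the DICOM standard; in a real deployment the key must be generated securely and stored separately from the de-identified data.

```python
import hashlib
import hmac

# Hypothetical key for illustration only. In practice, generate a strong
# random key and keep it secret and separate from the de-identified data.
SECRET_KEY = b"replace-with-a-securely-generated-key"


def pseudonymize_id(original_id: str) -> str:
    """Map a patient/study ID to a stable pseudonym.

    The same input always yields the same output (preserving referential
    integrity across files and datasets), while the original ID cannot be
    recovered without the secret key.
    """
    digest = hmac.new(SECRET_KEY, original_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# The same source ID maps to the same pseudonym in every file it appears in.
assert pseudonymize_id("PAT-00123") == pseudonymize_id("PAT-00123")
assert pseudonymize_id("PAT-00123") != pseudonymize_id("PAT-00124")
```

If re-identification by an authorized party must remain possible, reversible encryption of the ID can be used instead of a one-way keyed hash; the linkage property is the same.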

Likewise, indirect identifiers such as patient DOB, age, and other demographics may need to be transformed via redaction, generalization, or replacement with synthetic values as appropriate to the laws or regulations governing the sharing and use of the data.
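For generalization specifically, a minimal sketch might reduce a date of birth to its year and bucket an exact age into fixed-width bands. The band width and output formats below are arbitrary choices for illustration; the appropriate level of generalization depends on the dataset and the governing regulations.

```python
from datetime import date


def generalize_dob(dob: date) -> str:
    """Generalize a full date of birth to its year only."""
    return str(dob.year)


def generalize_age(age: int, band_width: int = 5) -> str:
    """Generalize an exact age into a fixed-width band, e.g. 42 -> '40-44'."""
    low = (age // band_width) * band_width
    return f"{low}-{low + band_width - 1}"


print(generalize_dob(date(1987, 6, 14)))  # -> 1987
print(generalize_age(42))                 # -> 40-44
```

Redaction (dropping the field entirely) and replacement with synthetic values follow the same pattern: a deterministic transform applied uniformly across the header data.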

The file and folder names in the source data may also have patient identifiers; typical practice is to replace these names with newly generated names that are not based on any identifiers.
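A simple way to implement that practice is to build a rename plan up front: every distinct folder (often named after the patient) and every file receives a randomly generated name, so nothing in the output layout derives from patient identifiers. The helper below is an illustrative sketch assuming single-level `folder/file` paths, not a complete implementation.

```python
import uuid


def rename_plan(paths: list[str]) -> dict[str, str]:
    """Map original 'folder/file' paths to pseudonymous replacements.

    Each distinct source folder gets one random name (so files from the
    same patient stay grouped together), and each file gets its own
    random name with the original extension preserved.
    """
    folder_names: dict[str, str] = {}
    plan: dict[str, str] = {}
    for p in paths:
        folder, _, filename = p.partition("/")
        if folder not in folder_names:
            folder_names[folder] = uuid.uuid4().hex
        ext = filename[filename.rfind("."):] if "." in filename else ""
        plan[p] = f"{folder_names[folder]}/{uuid.uuid4().hex}{ext}"
    return plan


plan = rename_plan(["SMITH_JOHN/scan001.dcm", "SMITH_JOHN/scan002.dcm"])
```

Keeping the original-to-new mapping in a secured location (rather than discarding it) allows authorized re-linkage later, if the governance framework permits it.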

The complicated part of de-identifying DICOM data: Images

The pixel data of an image also needs to be considered in the de-identification.

For example, an image may include a clear view of the patient’s face or body or longitudinal scans of a patient’s head, allowing identification via facial recognition software. The imaging device may also print identifying text such as patient names, ID numbers, and care provider information onto the image (this is part of what is known as burnt-in text).

Redacting or otherwise obscuring identifying burnt-in text is a good general practice when de-identifying image data. However, caution is advised because some non-identifying burnt-in text, like measurements and technical settings, may add value to the data if it is practical to retain.

For small datasets, this can be achieved easily enough through manual effort. At scale, however, machine learning-driven text detection algorithms are often necessary to flag and remove burnt-in text in an automated fashion.
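Whatever flags the text, whether manual review or a machine learning detector, the removal step itself reduces to overwriting a bounding box in the pixel data. The sketch below assumes the image is a plain 2D array of pixel values and that the bounding box coordinates come from an upstream detection step; real DICOM pixel data would first need to be decoded from its transfer syntax.

```python
def redact_region(pixels, top, left, bottom, right, fill=0):
    """Overwrite a rectangular region of a 2D pixel array with a constant
    value, destroying any burnt-in text it contains. The (top, left,
    bottom, right) box is half-open and would come from a text-detection
    step (manual review or an ML model)."""
    for row in range(top, bottom):
        for col in range(left, right):
            pixels[row][col] = fill
    return pixels


# Toy 4x6 "image" of bright pixels; redact a 2x3 box where text was found.
img = [[255] * 6 for _ in range(4)]
redact_region(img, top=1, left=2, bottom=3, right=5)
```

Note that the detection step is the hard part at scale; the caution above about retaining useful non-identifying text (measurements, technical settings) applies to deciding which detected boxes actually get redacted.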

Similarly, in some cases, it may be necessary to employ image-defacing technologies to blur, redact, or otherwise delete data that reveals the details of a patient’s face or skull. As with burnt-in text removal, such technologies may be applied manually or automatically, depending on the desired scale. It’s worth noting, too, that in the case of, for example, brain MRIs, it may not be possible to transform the image data without severely compromising its utility.

Need help de-identifying DICOM data?

The safe sharing and reuse of DICOM data critically depends on scale and on understanding the entire journey of both the image and the header data, including how the data is transformed and where and how it will ultimately be used.

To learn more about how your organization can safely and efficiently increase the utility of DICOM data in your care, download our DICOM Anonymization overview here.
