Data Privacy Considerations for DICOM Anonymization

An article by David Di Valentino, AI/ML Solutions Lead, Privacy Analytics

Across industries, demand for medical images is on the rise. This is driven by a growing number of use cases, notably AI, which can support diagnostics, tracking of disease progression, and planning or evaluating the effectiveness of interventions. Demand is also driven by the maturing reuse of structured data, which has led organizations to look for new data sources.

When it comes to de-identifying or anonymizing images—particularly DICOM images—it can be unclear what options are available.

In this article, we will focus on the DICOM data format, exploring the challenges inherent in de-identifying DICOM data and discussing the presently available solutions.

What is DICOM?

The Digital Imaging and Communications in Medicine (DICOM) data format is widely used in the healthcare industry to store and share medical images, such as X-rays, MRIs, and CT scans.

The DICOM standard defines how DICOM data is formatted. Each DICOM file comprises two main elements:

Header data, which is semi-structured data containing metadata about the image and which may also include patient contact information, treatment details, and medical history; and
The image data itself (e.g., an X-ray image), which is also known as “pixel data.”

Header data can contain different types of identifiable information—such as a patient’s name, date of birth, and demographics—as well as organization-specific ID numbers.

Identifiable header data can make it easy for unauthorized individuals to single out patients, leading to a breach of patient privacy, identity theft, or other harms. Fortunately, this risk is readily addressed with standard de-identification techniques.

The conventional part of de-identifying DICOM data: Headers

In practice, de-identifying DICOM images means that identifiable information is removed from, or rendered non-identifiable in, the header data, the image data, and the file and folder names.

As a starting point, direct identifiers in the header data, such as patient names, addresses, and ID numbers, must be transformed. If referential integrity must be preserved across the DICOM images in a dataset (for example, because a patient had multiple visits), or if linkage to other data modalities (e.g., structured data or clinical notes) is desired, these values can be replaced with consistent synthetic values or, in the case of ID numbers, encrypted.
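The following is a minimal sketch, not Privacy Analytics’ method, of transforming direct identifiers while preserving referential integrity. It assumes the header has already been read into a plain Python dict keyed by DICOM attribute keywords (`PatientID`, `PatientName`, `PatientAddress`); a real pipeline would typically operate on a DICOM library’s dataset object instead. In place of reversible encryption, it swaps in a keyed hash (HMAC-SHA256): the same ID always maps to the same pseudonym under the same key, so repeat visits still link, but the pseudonym cannot be reversed. Where authorized re-identification is required, genuine encryption would be the better fit. The key, the `PID` prefix, and the `ANON^` synthetic name format are illustrative assumptions.

```python
import hashlib
import hmac

# Illustrative key only (an assumption of this sketch). A real key would be
# generated securely and stored apart from the de-identified data.
SECRET_KEY = b"example-key-do-not-use"


def pseudonym(value: str, prefix: str = "PID", key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym: the same value and key always give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:12].upper()


def transform_direct_identifiers(header: dict) -> dict:
    """Return a copy of `header` with direct identifiers transformed."""
    out = dict(header)
    if "PatientID" in out:
        out["PatientID"] = pseudonym(out["PatientID"])
        # Synthetic name tied to the pseudonymous ID, so it stays consistent per patient.
        if "PatientName" in out:
            out["PatientName"] = "ANON^" + out["PatientID"]
    elif "PatientName" in out:
        out["PatientName"] = "ANON"
    # Addresses are not needed for linkage in this sketch, so they are removed outright.
    out.pop("PatientAddress", None)
    return out


# Two studies from the same (fictional) patient on different visits.
visit1 = {"PatientName": "Doe^Jane", "PatientID": "MRN-00123", "PatientAddress": "1 Main St"}
visit2 = {"PatientName": "Doe^Jane", "PatientID": "MRN-00123", "StudyDate": "20230415"}
a = transform_direct_identifiers(visit1)
b = transform_direct_identifiers(visit2)
print(a["PatientID"] == b["PatientID"])  # True: the visits still link
```

Because the pseudonym depends only on the key and the original ID, the same function could be applied to patient IDs in linked structured data or clinical notes, provided the same key is used.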

Likewise, indirect identifiers such as the patient’s date of birth (DOB), age, and other demographics may need to be transformed via redaction, generalization, or replacement with synthetic values, as required by the laws or regulations governing the sharing and use of the data.
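As a hedged illustration of generalization, the sketch below coarsens two common indirect identifiers. It again assumes a plain dict keyed by DICOM keywords and relies on two DICOM value formats: dates (DA) are written `YYYYMMDD`, and ages (AS) are three digits plus a unit letter, such as `047Y`. Birth dates are reduced to January 1 of the birth year, and ages are replaced by the lower bound of a 5-year band, with everything at or above 90 top-coded. That default echoes the US HIPAA Safe Harbor rule of grouping ages over 89, but the band width and threshold here are illustrative parameters, not a recommendation for any particular jurisdiction. Removing `EthnicGroup` shows plain redaction.

```python
def generalize_birth_date(da: str) -> str:
    """Generalize a DICOM DA value (YYYYMMDD) to January 1 of the same year.

    The YYYYMMDD shape is kept so the result still reads as a DICOM date.
    Values that do not parse are blanked rather than guessed.
    """
    if len(da) != 8 or not da.isdigit():
        return ""
    return da[:4] + "0101"


def generalize_age(age: str, band: int = 5, top_code: int = 90) -> str:
    """Replace a DICOM AS value such as '047Y' with the lower bound of its band.

    Only year-based ages are banded here; day, week, and month ages are blanked.
    All ages at or above `top_code` become the same value.
    """
    if len(age) != 4 or not age[:3].isdigit() or age[3] != "Y":
        return ""
    years = int(age[:3])
    if years >= top_code:
        return f"{top_code:03d}Y"
    return f"{(years // band) * band:03d}Y"


def transform_indirect_identifiers(header: dict) -> dict:
    """Return a copy of `header` with indirect identifiers generalized or redacted."""
    out = dict(header)
    if "PatientBirthDate" in out:
        out["PatientBirthDate"] = generalize_birth_date(out["PatientBirthDate"])
    if "PatientAge" in out:
        out["PatientAge"] = generalize_age(out["PatientAge"])
    out.pop("EthnicGroup", None)  # redaction: removed entirely
    return out


print(transform_indirect_identifiers(
    {"PatientBirthDate": "19760523", "PatientAge": "047Y", "EthnicGroup": "X"}
))
# {'PatientBirthDate': '19760101', 'PatientAge': '045Y'}
```

Keeping generalized values in their original DICOM formats means downstream tools that parse dates and ages should continue to work, at the cost of the values looking more precise than they are.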

The file and folder names in the source data may also have patient identifiers; typical practice is to replace these names with newly generated names that are not based on any identifiers.
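One way to generate replacement names that carry no information from the originals is to draw them at random. The sketch below, which assumes relative paths and only plans the renames without touching the file system, gives every folder and file a `uuid4` name while keeping the folder structure and file extensions. Each distinct source folder receives one new name, so images that were grouped together stay grouped. The resulting mapping links the new names back to the originals, so it would need to be secured or discarded after use.

```python
import uuid
from pathlib import PurePosixPath


def plan_renames(paths):
    """Map each relative source path to a new path made of random names.

    Returns {old_path: new_path}; nothing on disk is changed. Every distinct
    source folder gets exactly one new name, and file extensions are kept.
    """
    folder_names = {}  # original folder prefix -> random replacement name
    plan = {}
    for p in map(PurePosixPath, paths):
        new_parts = []
        for depth in range(1, len(p.parts)):
            folder = PurePosixPath(*p.parts[:depth])
            if folder not in folder_names:
                folder_names[folder] = uuid.uuid4().hex
            new_parts.append(folder_names[folder])
        # uuid4 is random, so the new file name carries nothing from the old one.
        new_parts.append(uuid.uuid4().hex + p.suffix)
        plan[str(p)] = str(PurePosixPath(*new_parts))
    return plan


example = plan_renames([
    "Doe_Jane_MRN00123/CT_20230415/IMG0001.dcm",
    "Doe_Jane_MRN00123/CT_20230415/IMG0002.dcm",
])
for old, new in example.items():
    print(old, "->", new)
```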

The complicated part of de-identifying DICOM data: Images

The pixel data of an image also needs to be considered in the de-identification.

For example, an image may include a clear view of the patient’s face or body, or longitudinal scans of a patient’s head, either of which could allow identification via facial recognition software. The imaging device may also print identifying text, such as patient names, ID numbers, and care provider information, onto the image (one kind of what is known as burnt-in text).

Redacting or otherwise obscuring identifying burnt-in text is good general practice when de-identifying image data. However, caution is advised: some non-identifying burnt-in text, such as measurements and technical settings, can add value to the data and may be worth retaining where practical.
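Deciding which burnt-in text to keep can be framed as an allowlist: retain only snippets that clearly match known non-identifying formats, and redact everything else. The sketch below assumes the text has already been extracted from the image (for example, by OCR). Its patterns for distance measurements, exposure settings, and window/level values are illustrative assumptions; a usable allowlist would be built and validated against the devices and sites that produced the data.

```python
import re

# Illustrative allowlist (an assumption of this sketch, not a validated list).
KEEP_PATTERNS = [
    re.compile(r"\d+(\.\d+)?\s?(mm|cm)"),    # distance measurements, e.g. "12.5 mm"
    re.compile(r"\d+(\.\d+)?\s?(kVp|mAs)"),  # exposure settings, e.g. "120 kVp"
    re.compile(r"W:\s?-?\d+\s+L:\s?-?\d+"),  # window/level, e.g. "W: 400 L: 40"
]


def should_keep(text: str) -> bool:
    """Keep a snippet only if the whole snippet matches an allowlisted pattern.

    Anything unrecognized is redacted, so text that slips past the patterns
    errs on the side of removal rather than retention.
    """
    snippet = text.strip()
    return any(p.fullmatch(snippet) for p in KEEP_PATTERNS)


for snippet in ["12.5 mm", "120 kVp", "W: 400 L: 40", "DOE^JANE", "12.5 mm DOE"]:
    print(f"{snippet!r}: {'keep' if should_keep(snippet) else 'redact'}")
# keep, keep, keep, redact, redact (in that order)
```

Requiring a full match, rather than a partial one, means a measurement with a name appended, such as `12.5 mm DOE`, is still redacted.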

For small datasets, this can be done manually. At scale, however, machine-learning-driven text detection algorithms are often needed to flag and remove burnt-in text automatically.
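Once a text detector has produced bounding boxes, the redaction step itself is simple. The sketch below uses a list of lists of integers as a stand-in for pixel data (in practice this would more likely be a NumPy array, for example from pydicom’s `Dataset.pixel_array`). The `detections` input stands in for the output of a hypothetical machine-learning text detector, which is not implemented here, and the optional `keep` callback is a hypothetical hook for retaining snippets judged non-identifying.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixel coordinates


def redact_burnt_in_text(
    pixels: List[List[int]],
    detections: Sequence[Tuple[str, Box]],
    keep: Callable[[str], bool] = lambda text: False,
    fill: int = 0,
) -> List[List[int]]:
    """Return a copy of `pixels` with each detected text region filled in.

    By default every detected region is redacted; `keep` can exempt snippets.
    """
    out = [row[:] for row in pixels]
    height = len(out)
    width = len(out[0]) if height else 0
    for text, (x, y, w, h) in detections:
        if keep(text):
            continue
        # Clip the box to the image so partially out-of-bounds boxes are safe.
        for row in range(max(0, y), min(height, y + h)):
            for col in range(max(0, x), min(width, x + w)):
                out[row][col] = fill
    return out


image = [[9] * 6 for _ in range(4)]  # tiny 6x4 "image" of constant value
found = [("DOE^JANE", (1, 0, 3, 2)), ("12 mm", (0, 3, 2, 1))]
redacted = redact_burnt_in_text(image, found, keep=lambda t: t.endswith("mm"))
for row in redacted:
    print(row)
```

Filling boxes with a solid value changes the pixels themselves, which is what makes the redaction irreversible. Writing the result back into a DICOM file, and checking that no identifying text remains, are separate steps not shown here.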
