Data Privacy, AI, De-identification, and Anonymization: Putting It All Together

An article by Brian Rasquinha, Associate Director, Solution Architecture, Privacy Analytics

In 2024, we shared several short articles about de-identified and anonymized data. Here, we take a look at how those topics intersect.

AI was a dominant topic, with much focus on the promise of AI tools and some of their inherent privacy concerns. Emerging guidance on defensible AI includes approaches to ensure the responsible development of these tools.

De-identification or anonymization is a powerful privacy tool that can support defensibility in analytics, product development, and AI applications. A de-identified or anonymized dataset could be drawn from one source, but some applications require organizations to link datasets together privately without sharing key identifiers. Our primer on tokenization and linkage concepts can help you understand the challenges and solutions.
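
To make the tokenization idea concrete, below is a minimal sketch of keyed tokenization using Python's standard library; the normalization rule and key handling are illustrative assumptions, not a description of any particular product. Each party derives tokens from direct identifiers with a shared secret key, so datasets can be joined on the token column without exchanging raw identifiers.

```python
import hashlib
import hmac

def tokenize(identifier: str, secret_key: bytes) -> str:
    """Derive a stable, keyed token from a direct identifier.

    The same (identifier, key) pair always yields the same token, so two
    datasets tokenized with the same key can be linked on the token column
    without either side revealing the raw identifier.
    """
    normalized = identifier.strip().lower()  # illustrative normalization only
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()

key = b"shared-secret-held-by-a-trusted-party"  # hypothetical key
print(tokenize("Jane Q. Doe", key) == tokenize("  jane q. doe ", key))  # True
```

An unkeyed hash would not be enough here: names, phone numbers, and similar identifiers come from a small enough space that plain hashes can be reversed by dictionary attack, which is why the keyed construction and careful key custody matter.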

Even if each linked dataset is already de-identified or anonymized, the linkage can affect the overall identifiability of the data. As such, the linkage must be managed carefully, and there are multiple approaches to consider.

Once you have a dataset and need to assess its identifiability, you might perform a re-identification risk determination (RRD), such as a HIPAA Expert Determination. We shared our Best Practices for RRDs, informed by the hundreds of such projects we perform each year.

One consideration for an RRD is the context of the data release, which often has a substantial impact on overall identifiability (and thus on the risk of re-identification).

RRDs are usually sought for structured, tabular data but can also be performed on unstructured data like plain text, for which several organizations have built effective, defensible pipelines. With the advent of AI, there has also been an increased focus on de-identifying DICOM medical images.
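
As a rough illustration of the header side of DICOM de-identification, here is a minimal sketch using the open-source pydicom library; the handful of elements blanked here is an illustrative subset, not a complete profile. A defensible pipeline follows the DICOM PS3.15 confidentiality profiles and must also handle identifiers burned into the pixel data, which this sketch does not touch.

```python
import pydicom

def strip_basic_header_phi(in_path: str, out_path: str) -> None:
    """Blank a few identifying header elements in a DICOM file.

    Illustrative subset only: real de-identification covers the full set
    of elements named in the DICOM PS3.15 confidentiality profiles.
    """
    ds = pydicom.dcmread(in_path)
    ds.remove_private_tags()  # drop vendor-specific private elements
    for keyword in ("PatientName", "PatientID", "PatientBirthDate",
                    "ReferringPhysicianName", "InstitutionName"):
        if keyword in ds:  # pydicom datasets support lookup by keyword
            setattr(ds, keyword, "")
    ds.save_as(out_path)
```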

One approach to managing risk as part of an RRD is to use differential privacy techniques. These inject carefully calibrated randomness into outputs (such as released datasets or query results) to limit what can be learned about any individual.
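
For intuition, here is a minimal sketch of the Laplace mechanism, one classic differential privacy technique, applied to a count query; the toy data and epsilon value are illustrative assumptions.

```python
import numpy as np

def dp_count(values, predicate, epsilon: float, rng=None) -> float:
    """Answer a count query with Laplace noise calibrated to epsilon.

    A count has sensitivity 1 (adding or removing one person changes it
    by at most 1), so noise drawn from Laplace(0, 1/epsilon) provides
    epsilon-differential privacy for this single query.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 41, 29, 58, 63, 47]  # toy data
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))  # varies around the true answer, 4
```

Smaller epsilon means more noise and stronger protection, and repeated queries consume a privacy budget that must be tracked across releases.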

Finally, the privacy protections of a de-identified or anonymized dataset can be validated with a motivated intruder test. In this test, someone attempts to re-identify individuals in the data to determine whether doing so is practical.
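
One narrow component of such a test can be automated: attempt a linkage attack by joining the released data to a plausibly available public dataset on shared quasi-identifiers and counting unambiguous matches. The file names and columns below are hypothetical.

```python
import pandas as pd

# Hypothetical inputs: a de-identified release and a dataset an intruder could obtain.
released = pd.read_csv("released_deidentified.csv")  # zip3, birth_year, sex, diagnosis
public = pd.read_csv("public_registry.csv")          # name, zip3, birth_year, sex

quasi_ids = ["zip3", "birth_year", "sex"]

# A record is a candidate re-identification when its quasi-identifier
# combination is unique in the release and matches exactly one public record.
unique_released = released.drop_duplicates(subset=quasi_ids, keep=False)
unique_public = public.drop_duplicates(subset=quasi_ids, keep=False)
candidates = unique_released.merge(unique_public, on=quasi_ids, how="inner")

print(f"{len(candidates)} candidate re-identifications out of {len(released)} records")
```

A full motivated intruder test goes well beyond this sketch, covering web searches, fuzzy matching, and verification of whether claimed matches are actually correct.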

Contact the experts at Privacy Analytics to learn more about data de-identification, anonymization, or other privacy topics.
