How Context Affects Anonymization in AI Model Development

An article by Santa Borel, Associate Director, Data Privacy Solutions, Privacy Analytics

Building an AI model can require large amounts of data about people, and this data needs to be handled and managed appropriately. To protect the privacy of the people in the data, anonymized data can be used. However, when anonymized data is used to develop AI models, the anonymization could be compromised, making it easier to identify people in the data. It is therefore important to understand how privacy may be affected so that appropriate protections can be defined.

Context changes can make anonymized data identifiable

Anonymizing personal data requires more than just making changes to the data itself. It also involves an assessment of the environment and circumstances in which the anonymized data is made available to the people and systems that will use it (e.g., see the activity map of practices needed to satisfy ISO/IEC 27559). Because context plays a role in determining whether data is considered anonymized, a data set may be anonymized in one context (e.g., when accessed within a secure environment) and not in another (e.g., when released publicly). As a result, data that was anonymized prior to AI model development may no longer be anonymized when it is used within an AI model development environment.

To determine how context changes will affect the identifiability of data, consider how and where an AI model will be used, since this shapes the protections needed for the data used in model development or deployment. Understand which users will have access to the AI model, where the model will be made available, the model's specific use cases, and all the inputs provided to it. Different levels of model availability carry different privacy implications, so the anonymization method and assessment should align with how the anonymized data will be used with the AI model.

The privacy and AI regulatory context may require additional protections to ensure anonymized data remains anonymized during AI development and use. Regulators play a key role in defining what it means for data to be anonymized and which anonymization approaches are acceptable, and their expectations can vary by jurisdiction. New AI regulations will also create additional expectations around how data, whether personal or anonymized, must be protected.

When using anonymized data to develop your AI models, identify and understand the applicable regulations to determine any further protections that could be needed. Some of the regulations to consider are:

  • GDPR, HIPAA, or other data privacy regulations requiring de-identification or anonymization of data
  • The EU AI Act, Colorado AI Act, and other AI regulations defining risk classification for AI use and the associated regulatory requirements

AI model outputs can compromise privacy

The outputs of an AI model can introduce privacy risks even if the input or training data was initially anonymized. Depending on the type of model, outputs could reveal or highlight information that users could apply in unintended ways or that raises ethical concerns. For example, large language models (LLMs) can generate data that is not explicitly in the training data but that provides additional information about individuals in the data. This is why AI regulations and guidance often focus on the safe and responsible use of AI, much as anonymization guidance focuses on the safe and responsible use of anonymized data; the two are closely linked.
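One possible output control, offered here as an illustrative assumption rather than a prescribed method, is to screen generated text for identifier-like patterns before it reaches users. The patterns below are deliberately simplified examples; a production filter would be far more comprehensive:

```python
import re

# Simplified, illustrative patterns for identifier-like strings in model
# output. Real deployments would cover many more identifier types.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def screen_output(text: str) -> tuple[str, list[str]]:
    """Redact identifier-like spans and report which patterns fired."""
    hits = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            hits.append(name)
            text = pattern.sub(f"[REDACTED {name.upper()}]", text)
    return text, hits

redacted, hits = screen_output("Contact jane.doe@example.com or 555-867-5309.")
```

A filter like this only catches surface-level identifiers; it does not address subtler leakage such as a model reconstructing attributes of individuals, which is why it would complement, not replace, the anonymization assessment itself.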

The solution: Robust and effective privacy governance programs

Privacy governance programs enable an organization to identify and track use cases for anonymized data and to monitor compliance. They help the organization account for the factors that affect the identifiability of the data, ensure anonymized data is used appropriately, and introduce mitigations where needed. Proper governance also helps ensure that data protections, regulatory compliance, and output controls are developed and continuously monitored (see our whitepaper on good governance for details).
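As a hypothetical illustration of the tracking such a program performs (the schema and field names are assumptions for this sketch, not a prescribed design), a governance register might record each use of an anonymized data set together with its context, applicable regulations, and mitigations, then flag entries that need review:

```python
from dataclasses import dataclass, field

# Hypothetical governance register entry; all field names are illustrative.
@dataclass
class UseCaseRecord:
    dataset: str
    purpose: str
    environment: str                              # e.g. "secure enclave"
    regulations: list[str] = field(default_factory=list)
    mitigations: list[str] = field(default_factory=list)

    def needs_review(self) -> bool:
        # Flag uses whose context exposes data beyond a controlled
        # environment but which list no additional mitigations.
        return self.environment != "secure enclave" and not self.mitigations

register = [
    UseCaseRecord("claims_2023", "LLM fine-tuning", "secure enclave",
                  regulations=["HIPAA"]),
    UseCaseRecord("claims_2023", "public demo model", "public release",
                  regulations=["HIPAA", "EU AI Act"]),
]

flagged = [r for r in register if r.needs_review()]
```

Keeping such a register current is what lets an organization notice that the same data set has moved into a riskier context and respond before the anonymization assessment is invalidated.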

It’s important to maintain privacy protections throughout all stages of AI development and use to ensure alignment with regulatory and societal expectations. The interplay between existing regulations, upcoming regulations, contextual considerations, and governance of data and AI is key to the safe and effective use of AI and anonymized data. By proactively assessing and monitoring their practices for mitigating risk, organizations can benefit from the use of anonymized data in AI while demonstrating their commitment to privacy protection and responsible innovation.
