Building an AI model can require large amounts of data about people, and that data must be handled and managed appropriately. Anonymization is one way to protect the privacy of the people in the data. However, the way anonymized data is used during AI model development can compromise the anonymization, making it easier to re-identify individuals. When using anonymized data to develop AI models, it's important to understand how privacy may be affected so that appropriate protections can be defined.
Context changes can make anonymized data identifiable
Anonymizing personal data requires more than just making changes to the data itself. It also involves an assessment of the environment and circumstances in which the anonymized data is made available to the people and systems that will use it (e.g., see the activity map of practices needed to satisfy ISO/IEC 27559). Because context plays a role in determining whether data is considered anonymized, a data set may be anonymized in one context (e.g., when accessed within a secure environment) and not in another (e.g., when released publicly). As a result, data that was anonymized prior to AI model development may no longer be anonymized when it is used within an AI model development environment.
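To make the role of context concrete, here is a minimal sketch in Python, using a toy dataset and quasi-identifier sets we chose purely for illustration. It measures k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) under two assumed access contexts: a secure environment where an analyst can only link on coarse attributes, and a public release where an attacker can also link on occupation from auxiliary sources.

```python
from collections import Counter

# Toy records: (age_band, region, occupation) -- illustrative only.
records = [
    ("30-39", "North", "nurse"),
    ("30-39", "North", "teacher"),
    ("30-39", "North", "nurse"),
    ("40-49", "South", "engineer"),
    ("40-49", "South", "engineer"),
    ("40-49", "South", "nurse"),
]

def k_anonymity(rows, quasi_identifier_indices):
    """Smallest equivalence-class size over the chosen quasi-identifiers.

    Lower k means records are easier to single out; k == 1 means at
    least one record is unique on those attributes.
    """
    groups = Counter(
        tuple(row[i] for i in quasi_identifier_indices) for row in rows
    )
    return min(groups.values())

# Context 1: secure environment -- assume linking is only possible on
# age band and region.
print("secure context, k =", k_anonymity(records, [0, 1]))     # k = 3

# Context 2: public release -- assume occupation can also be linked
# from auxiliary sources, shrinking the groups.
print("public context, k =", k_anonymity(records, [0, 1, 2]))  # k = 1
```

The same records drop from k = 3 to k = 1 purely because the context changed what an attacker can link on, which is why an anonymization assessment tied to one environment does not automatically carry over to another.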
To determine how context changes will affect the identifiability of data, consider how and where the AI model will be used, since this determines the protections needed for the data used in model development and deployment. Key questions include: which users will have access to the model, where the model will be made available, what the model's specific use cases are, and what inputs will be provided to it. Different degrees of availability carry different privacy implications, so the anonymization method and its assessment should align with how the anonymized data will be used with the AI model.
The privacy and AI regulatory landscape may also require additional protections to ensure data remains anonymized during AI development and use. Regulators play a key role in defining what it means for data to be anonymized and which anonymization approaches are acceptable, and their expectations vary by jurisdiction. New AI regulations will create additional expectations around how data, whether personal or anonymized, must be protected.
When using anonymized data to develop your AI models, identify and understand the applicable regulations to determine any further protections that may be needed. Regulations to consider include:
- GDPR, HIPAA, or other data privacy regulations requiring de-identification or anonymization of data
- The EU AI Act, the Colorado AI Act, and other AI regulations that define risk classifications for AI uses and the associated regulatory requirements
AI model outputs can compromise privacy
The outputs of an AI model can introduce privacy risks even if the input or training data was initially anonymized. Depending on the type of model, outputs may reveal or highlight information that users could exploit in unintended ways or that raises ethical concerns. For example, large language models (LLMs) can generate content that is not explicitly present in the training data but that nonetheless reveals additional information about the individuals in it. This is why AI regulations and guidance often focus on the safe and responsible use of AI, much as anonymization guidance focuses on the safe and responsible use of anonymized data; the two are intimately linked.
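As one illustration of an output control, the sketch below (our own illustration, not drawn from any specific regulation or product) screens generated text with simple pattern checks before it is released. Real deployments would use far more robust PII detection (named-entity recognition, allow/deny lists, human review), but the structure, checking the output before it leaves the system and logging what was found, is the point.

```python
import re

# Illustrative patterns only; production systems need broader,
# locale-aware detection, not just regular expressions.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def screen_output(text: str) -> tuple[str, list[str]]:
    """Redact pattern matches from model output and report what was found.

    Returns the redacted text plus the list of triggered pattern names,
    which a governance program can log and monitor over time.
    """
    findings = []
    for name, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(name)
            text = pattern.sub(f"[REDACTED {name}]", text)
    return text, findings

redacted, findings = screen_output(
    "Contact Jane at jane.doe@example.com or 555-867-5309."
)
print(redacted)   # Contact Jane at [REDACTED email] or [REDACTED phone].
print(findings)   # ['email', 'phone']
```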
The solution: Robust and effective privacy governance programs
Privacy governance programs enable organizations to identify and track the use cases for anonymized data and to monitor compliance. They help an organization identify and account for the factors that affect the identifiability of the data, ensure anonymized data is used appropriately, and introduce mitigations where needed. Proper governance also helps ensure that data protections, regulatory compliance measures, and output controls are developed and continuously monitored (see our whitepaper on good governance for details).
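One way a governance program can make this tracking concrete is to keep an explicit registry that links each anonymized dataset to the contexts in which its anonymization was assessed, so that any new use triggers a re-assessment. The sketch below is a minimal, hypothetical data model of our own devising; a real program would back this with workflow tooling and audit logs.

```python
from dataclasses import dataclass, field

@dataclass
class AnonymizedDataset:
    name: str
    # Contexts in which the anonymization assessment was performed and
    # found acceptable (e.g., "secure-research-enclave").
    approved_contexts: set[str] = field(default_factory=set)

    def check_use(self, context: str) -> str:
        """Gate a proposed use: approved contexts pass; anything else is
        flagged for re-assessment before the data can be used there."""
        if context in self.approved_contexts:
            return f"{self.name}: approved for '{context}'"
        return f"{self.name}: '{context}' requires a new anonymization assessment"

claims = AnonymizedDataset("claims-2024", {"secure-research-enclave"})
print(claims.check_use("secure-research-enclave"))
print(claims.check_use("public-llm-training"))
```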
It's important to maintain privacy protections throughout all stages of AI development and use to ensure alignment with regulatory and societal expectations. The interplay between existing regulations, upcoming regulations, contextual considerations, and the governance of data and AI is key to the safe and effective use of AI and anonymized data. By proactively assessing and monitoring their practices, organizations can mitigate risks while benefiting from the use of anonymized data in AI, demonstrating their commitment to privacy protection and responsible innovation.