When to Integrate Anonymization of Documents and Data


An article by Sarah Lyons – Senior Director, Operations

Should sponsors integrate risk-based anonymization across both documents and structured individual patient data (S-IPD)?

The question often stems from anticipation of EMA Policy 0070 Phase 2, which is expected to cover S-IPD in addition to the clinical study documents covered in Phase 1.

Three considerations are important in answering this question and reaching a conclusive determination.

1. Anonymizing structured IPD for public release is not well suited to clinical trial data.

An integrated approach measures participant re-identification risk across both data and documents. This approach assumes that both the documents and the data will be shared publicly.

Given the nature of structured individual patient data, as opposed to study documents, the same level of public disclosure is unlikely to be implemented. Public release of S-IPD would require removing most of the information to achieve a threshold of 0.09, which equates to a minimal equivalence class size of 11 (each participant's record must be indistinguishable from at least 10 others on its identifying variables).
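To make the threshold concrete, the following sketch groups records into equivalence classes (records that share the same quasi-identifier values) and checks the smallest class against the size of 11 described above. It is a minimal illustration assuming a simple maximum-risk measure of 1 divided by the smallest class size; the quasi-identifier names `age_band`, `sex`, and `country` and the sample records are hypothetical, and a real assessment would consider many more variables as well as the disclosure context.

```python
from collections import Counter

# Per the article: a 0.09 threshold equates to a minimal equivalence class of size 11.
MIN_CLASS_SIZE = 11

# Hypothetical quasi-identifiers, chosen only for illustration.
QUASI_IDENTIFIERS = ("age_band", "sex", "country")


def equivalence_classes(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_reidentification_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Maximum risk, taken here as 1 / size of the smallest equivalence class."""
    classes = equivalence_classes(records, quasi_identifiers)
    return 1 / min(classes.values())


def meets_public_release_threshold(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """True if every equivalence class contains at least MIN_CLASS_SIZE records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return min(classes.values()) >= MIN_CLASS_SIZE


if __name__ == "__main__":
    # Hypothetical cohort: 11 participants with identical quasi-identifier values.
    cohort = [{"age_band": "40-49", "sex": "F", "country": "UK"} for _ in range(11)]
    print(meets_public_release_threshold(cohort))  # True
    # A single participant with a unique combination forms a class of size 1.
    cohort.append({"age_band": "80-89", "sex": "M", "country": "IS"})
    print(meets_public_release_threshold(cohort))  # False
    print(max_reidentification_risk(cohort))  # 1.0
```

With 11 identical records the maximum risk is 1/11, roughly 0.09; one unique participant is enough to push the maximum risk to 1.0.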

In our experience anonymizing different types of structured data for public release, the transformations needed to withstand privacy attacks are not well suited to clinical trial data. The prevalent populations are relatively small and the number of variables in these datasets is very high, so many participants have unique combinations of values that must be heavily generalized or suppressed to meet the threshold.
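The interaction between variable count and transformation can be sketched with generalization and suppression, two common de-identification transformations. In this hypothetical example, any record in an equivalence class smaller than 11 is counted as needing suppression. Exact ages leave every class too small, generalizing age into 10-year bands merges them, and adding just one more variable (a hypothetical `site` field) fragments the classes again. The field names and the synthetic cohort are assumptions for illustration only.

```python
from collections import Counter

K = 11  # minimal equivalence class size corresponding to the 0.09 threshold


def generalize_age(age, width=10):
    """Generalize an exact age into a band, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def records_to_suppress(records, quasi_identifiers, k=K):
    """Count records in equivalence classes smaller than k (candidates for suppression)."""
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return sum(size for size in classes.values() if size < k)


if __name__ == "__main__":
    # Hypothetical synthetic cohort of 22 participants aged 40-49 across 3 sites.
    cohort = [
        {"age": 40 + i % 10, "sex": "F" if i % 2 == 0 else "M", "site": f"site-{i % 3}"}
        for i in range(22)
    ]
    for record in cohort:
        record["age_band"] = generalize_age(record["age"])

    print(records_to_suppress(cohort, ("age",)))                     # 22
    print(records_to_suppress(cohort, ("age_band",)))                # 0
    print(records_to_suppress(cohort, ("age_band", "sex")))          # 0
    print(records_to_suppress(cohort, ("age_band", "sex", "site")))  # 22
```

Each additional variable multiplies the number of possible value combinations, which is why datasets with many variables and small populations need much heavier transformation to reach the same minimal class size.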

Accordingly, we expect S-IPD disclosure to be subject to certain terms of use, much as S-IPD is shared today through research portals such as ClinicalStudyDataRequest.com and Vivli.

2. A common threshold is not the same as an integrated approach.

A common threshold can be used across documents and data without necessarily integrating anonymization of the two.

A key aspect of risk-based anonymization is proper consideration of the context in which data is disclosed. If the terms of use for S-IPD differ from those for clinical study reports (CSRs), this context should be considered in the risk determination. Even when the same threshold is applied to both, the context will influence the degree of transformation needed.
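One way to see how context changes the transformation needed under a common threshold is a decomposition often used in risk-based de-identification: overall risk is the probability of re-identification given an attempt, multiplied by the probability that an attempt is made in that release context. The sketch below assumes that decomposition, and the two context values are hypothetical placeholders rather than recommended figures. With the same 0.09 threshold, a context with fewer controls leaves less room for risk in the data itself, and therefore calls for more transformation.

```python
THRESHOLD = 0.09  # common threshold applied to both documents and data


def max_data_risk(threshold, context_risk):
    """Largest acceptable Pr(re-id | attempt), given Pr(attempt) for a release context.

    Assumes overall risk = Pr(re-id | attempt) * Pr(attempt), one common
    formulation in risk-based de-identification.
    """
    if not 0 < context_risk <= 1:
        raise ValueError("context_risk must be in (0, 1]")
    return min(1.0, threshold / context_risk)


# Hypothetical context risks for illustration only; real values would come from
# a formal assessment of controls, terms of use, and motives to re-identify.
CONTEXTS = {
    "public release": 1.0,
    "controlled-access portal with terms of use": 0.3,
}

if __name__ == "__main__":
    for name, context_risk in CONTEXTS.items():
        print(f"{name}: data risk may be up to {max_data_risk(THRESHOLD, context_risk):.2f}")
```

Under these assumed values, public release allows a data risk of at most 0.09, while the controlled-access context allows up to 0.30, so the same threshold leads to very different amounts of transformation.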

3. Analytic utility of both documents and data needs to be considered.

Another common question concerns the analytic utility of the documents and data in combination, that is, whether researchers can still link and use them together.

Often, these use cases can be satisfied by applying patient ID pseudonyms consistently across documents and data.
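Consistent pseudonyms can be produced with a keyed hash, so the same patient ID maps to the same pseudonym wherever it appears. The following is a minimal sketch assuming HMAC-SHA256 from Python's standard library, a CDISC-style `USUBJID` column in the structured dataset, and a known list of patient IDs to replace in document text; the key, study IDs, and helper names are hypothetical. A keyed hash is used rather than a plain hash because study IDs often follow predictable patterns that could be brute-forced without the key.

```python
import hashlib
import hmac
import re


def pseudonymize(patient_id: str, secret_key: bytes, prefix: str = "P-") -> str:
    """Derive a stable pseudonym: the same ID and key always give the same output."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; keep more characters if collisions are a concern.
    return prefix + digest[:12]


def pseudonymize_text(text: str, patient_ids: list[str], secret_key: bytes) -> str:
    """Replace every known patient ID in free text in a single pass."""
    if not patient_ids:
        return text
    # Longest IDs first so an ID that is a prefix of another is not matched partially.
    ordered = sorted(patient_ids, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(pid) for pid in ordered))
    return pattern.sub(lambda m: pseudonymize(m.group(0), secret_key), text)


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # in practice, a securely managed secret
    row = {"USUBJID": "STUDY01-1001", "AGE": 54}  # hypothetical dataset record
    narrative = "Subject STUDY01-1001 discontinued treatment on Day 15."  # hypothetical CSR text

    row["USUBJID"] = pseudonymize(row["USUBJID"], key)
    narrative = pseudonymize_text(narrative, ["STUDY01-1001"], key)
    print(row["USUBJID"] in narrative)  # True: the same pseudonym links data and document
```

Replacing all IDs in one regex pass avoids a later replacement accidentally matching text inside a pseudonym that was already inserted.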

Risk-based anonymization balances transparency and privacy.

A risk-based anonymization approach balances two objectives: transparency and privacy.

Transparency is achieved by retaining utility in the data so that it can be used for secondary analysis and research. Privacy is achieved by ensuring sufficiently low risk of re-identification.

“One-size-fits-all” may not fit.

Integration can sound simple, but it often introduces unwanted trade-offs. As many examples in practice show, a one-size-fits-all approach to privacy can fail.

Many of the principles behind best practices in privacy and Privacy by Design are anchored in contextual alignment. Business goals should inform privacy approaches.

We have seen many cases where the context of disclosure suggests that integration is not the best approach, whether the goal is cost reduction or research and analysis. For compliance readiness, committing to an approach based on early speculation may result in downstream rework. Considering the context of the release remains core to adopting the right de-identification solution.
