When to Integrate Anonymization of Documents and Data
by Sarah Lyons – Senior Director, Operations
Should sponsors integrate risk-based anonymization across both documents and structured individual patient data (S-IPD)?
Good question, and a common one.
The question often stems from anticipation of EMA Policy 0070 Phase 2, which is expected to cover S-IPD in addition to the clinical study documents covered in Phase 1.
Three considerations are important in answering this question conclusively.
1. Anonymization for public release is not well suited to structured clinical trial IPD.
An integrated approach implies that participant re-identification risk is measured across both data and documents. This assumes that both the documents and the data will be shared publicly.
Given the nature of structured individual patient data, as opposed to study documents, it is unlikely that the same level of public disclosure will be implemented. Public release of S-IPD would require removing most of the information to achieve a re-identification risk threshold of 0.09 (which equates to a minimum equivalence class size of 11).
In our experience anonymizing different types of structured data for public release, the transformations needed to withstand privacy attacks are not well suited to clinical trial data: the prevalent patient populations are relatively small, and the number of variables in these datasets is very high.
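To make the threshold arithmetic and the effect of many variables concrete, here is a minimal Python sketch. It assumes a simplified maximum-risk model in which a participant's re-identification risk is 1 divided by the size of their equivalence class (the group of records sharing identical quasi-identifier values), and it follows the article in treating the 0.09 threshold as a minimum class size of 11. The cohort, the variable names `sex`, `age_band`, `region`, and `site`, and the helper functions are all hypothetical.

```python
from collections import Counter

# Assumption: the article's 0.09 threshold is treated as a minimum
# equivalence class size of 11 (1/11 is roughly 0.09).
MIN_CLASS_SIZE = 11

def smallest_class(records, quasi_identifiers):
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())

def max_risk(records, quasi_identifiers):
    """Simplified maximum re-identification risk: 1 / smallest class size."""
    return 1 / smallest_class(records, quasi_identifiers)

# Hypothetical cohort of 200 participants in which every combination of
# the four variables below occurs exactly once.
AGE_BANDS = ["18-39", "40-59", "60-79", "80+"]
REGIONS = ["R1", "R2", "R3", "R4", "R5"]
SITES = ["S1", "S2", "S3", "S4", "S5"]
cohort = [
    {
        "sex": "FM"[i % 2],
        "age_band": AGE_BANDS[(i // 2) % 4],
        "region": REGIONS[(i // 8) % 5],
        "site": SITES[i // 40],
    }
    for i in range(200)
]

# Add variables one at a time and watch the smallest class shrink.
variables = ["sex", "age_band", "region", "site"]
for k in range(1, len(variables) + 1):
    qis = variables[:k]
    size = smallest_class(cohort, qis)
    print(qis, size, round(max_risk(cohort, qis), 3), size >= MIN_CLASS_SIZE)
```

In this toy cohort the smallest class falls from 100 participants (one variable) to 25, then 5, then 1 (every participant unique) as variables are added, so only the first two steps meet the threshold. Real S-IPD carries far more variables than four, which is why reaching a minimum class size of 11 would take heavy generalization or suppression.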
2. A common threshold is not the same as an integrated approach.
A common threshold can be applied to both documents and data without integrating the anonymization of the two.
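The distinction can be sketched in Python. In this minimal example, one shared threshold is applied to two releases that are assessed independently. The "integrated" check is our own simplification, modeled as assessing risk over the union of quasi-identifiers a reader could see by linking the two releases; it is not presented as how any particular sponsor or tool performs an integrated assessment. The cohort, the split of variables between documents and data, and all helper names are hypothetical.

```python
from collections import Counter

# One shared threshold (the article's 0.09, treated here as a minimum class size of 11).
MIN_CLASS_SIZE = 11

def smallest_class(records, quasi_identifiers):
    """Size of the smallest group of records sharing identical quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())

def meets_threshold(records, quasi_identifiers):
    return smallest_class(records, quasi_identifiers) >= MIN_CLASS_SIZE

# Hypothetical cohort of 120 participants: every (sex, age_band, region)
# combination appears exactly 5 times.
AGE_BANDS = ["18-44", "45-64", "65+"]
REGIONS = ["North", "South", "East", "West"]
cohort = [
    {"sex": "FM"[i % 2], "age_band": AGE_BANDS[(i // 2) % 3], "region": REGIONS[(i // 6) % 4]}
    for i in range(120)
]

# Hypothetical split: documents disclose sex and age band; the dataset discloses sex and region.
doc_qis = ["sex", "age_band"]
data_qis = ["sex", "region"]

# Common threshold, separate assessments.
print("documents:", meets_threshold(cohort, doc_qis))  # True (smallest class: 20)
print("data:", meets_threshold(cohort, data_qis))      # True (smallest class: 15)

# Simplified integrated assessment: all variables visible across both releases.
joint_qis = sorted(set(doc_qis) | set(data_qis))
print("integrated:", meets_threshold(cohort, joint_qis))  # False (smallest class: 5)
```

Each release passes the shared threshold on its own, while the combined view does not. Using the same threshold therefore does not, by itself, commit a sponsor to the stricter integrated assessment.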
3. Analytic utility of both documents and data needs to be considered.
Another common question concerns the analytic utility of the documents and data when used in combination.
We often hear concerns about being able to use the two together. These use cases can usually be satisfied by applying patient ID pseudonyms consistently across documents and data.
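The article does not say how such pseudonyms are generated. As one possibility, a keyed hash (HMAC-SHA256 from Python's standard library) maps the same patient ID to the same pseudonym wherever it appears, so a document reference and a dataset row for the same participant stay linkable without exposing the original ID. The key, the `P-` prefix, the 16-character truncation, and the sample IDs below are hypothetical choices for illustration.

```python
import hashlib
import hmac

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Deterministically map a patient ID to a pseudonym with HMAC-SHA256."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Hypothetical format: prefix plus the first 16 hex characters of the digest.
    return "P-" + digest[:16]

# Hypothetical key: in practice it would be generated securely and never released.
KEY = b"example-key-do-not-use"

# Hypothetical patient IDs mentioned in a study document and rows from the S-IPD.
document_mentions = ["SUBJ-0042", "SUBJ-0107"]
dataset_rows = [
    {"patient_id": "SUBJ-0107", "adverse_event": "headache"},
    {"patient_id": "SUBJ-0042", "adverse_event": "nausea"},
]

doc_pseudonyms = [pseudonymize(pid, KEY) for pid in document_mentions]
data_pseudonyms = {pseudonymize(row["patient_id"], KEY) for row in dataset_rows}

# Same key and same input give the same pseudonym, so the releases can be joined.
print(set(doc_pseudonyms) == data_pseudonyms)
```

Because the mapping is keyed, whoever holds the key can regenerate it consistently for later releases, while recipients without the key cannot recompute pseudonyms from guessed patient IDs.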
Risk-based anonymization balances transparency and privacy.
A risk-based anonymization approach balances two objectives: transparency and privacy.
Transparency is achieved by retaining utility in the data so that it can be used for secondary analysis and research. Privacy is achieved by ensuring sufficiently low risk of re-identification.
“One-size-fits-all” may not fit.
Integration can sound simple but often introduces unwanted trade-offs. As many practical examples show, a one-size-fits-all approach to privacy can fail.
Many of the principles behind best practices in privacy and Privacy by Design are anchored in contextual alignment. Business goals should inform privacy approaches.
We have seen many cases where the context of disclosure suggests that integration is not the best approach, whether the goal is cost reduction or research and analysis. For compliance readiness, too, acting on speculation too early may result in rework downstream. Considering the context of the release remains central to choosing the right de-identification solution.
Published September 26, 2019.