When to Integrate Anonymization of Documents and Data

by Sarah Lyons – Senior Director, Operations
Should sponsors integrate risk-based anonymization across both documents and structured individual patient data (S-IPD)?

Good question—and a common one.

The question often stems from anticipation of EMA Policy 0070 Phase 2, which is expected to cover S-IPD in addition to the clinical study documents covered in Phase 1.

There are three important considerations in answering this question and making a conclusive determination.

  1. Anonymizing structured IPD for public release is not well suited to clinical trial data.

An integrated approach means that participant re-identification risk is measured across data and documents together. This approach assumes both the document and the data will be shared publicly.

Because structured individual patient data differs in nature from study documents, it is unlikely to be disclosed publicly to the same degree. Public release of S-IPD would require removing most of the information to achieve a threshold of 0.09 (roughly 1/11, which corresponds to a minimum equivalence class size of 11).
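The arithmetic behind that threshold can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular tool: it assumes that a record's re-identification risk is taken as 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values), and the field names and toy records are hypothetical.

```python
from collections import Counter

MIN_CLASS_SIZE = 11  # from the text: a threshold of 0.09 is roughly 1/11


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def max_risk(records, quasi_identifiers):
    """Worst-case record-level risk: 1 / size of the smallest equivalence class."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


def meets_threshold(records, quasi_identifiers, min_class_size=MIN_CLASS_SIZE):
    """True only if every equivalence class has at least min_class_size records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) >= min_class_size


# Hypothetical toy data: 11 participants in one group, 1 participant alone in another.
records = [{"age_band": "40-49", "sex": "F"}] * 11 + [{"age_band": "80-89", "sex": "M"}]

print(round(1 / MIN_CLASS_SIZE, 2))                    # 0.09
print(max_risk(records, ["age_band", "sex"]))          # 1.0
print(meets_threshold(records, ["age_band", "sex"]))   # False
```

In this toy example a single unique participant is enough to fail the check, so that record would have to be generalized or suppressed before release.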

In our experience anonymizing different types of structured data for public release, the transformations needed to withstand privacy attacks are poorly suited to clinical trial data: the prevalent populations are relatively small, and the number of variables in these datasets is very high.

Accordingly, we would expect S-IPD disclosure to be subject to certain terms of use, much as S-IPD is shared today through research portals such as ClinicalStudyDataRequest.com and Vivli.

  2. A common threshold is not the same as an integrated approach.

A common threshold can be used across documents and data without necessarily integrating anonymization of the two.

A key aspect of risk-based anonymization is proper consideration of the context in which data is disclosed. If the terms of use for S-IPD differ from the terms of use for CSRs, this context should be considered in the risk determination. Even with the same threshold applied to both, the context will influence the degree of transformation needed.
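To see how context can change the degree of transformation under a single threshold, consider the following sketch. It rests on a modeling assumption that the article does not spell out: overall risk is treated as the data risk (1 divided by the equivalence class size) multiplied by the probability that a re-identification attempt occurs in the release context. The context probabilities below are invented purely for illustration.

```python
import math

PUBLIC_MIN_CLASS_SIZE = 11  # from the text: a threshold of 0.09 is roughly 1/11


def required_class_size(context_probability, public_min=PUBLIC_MIN_CLASS_SIZE):
    """Smallest class size k with context_probability * (1/k) <= 1/public_min.

    Assumption (not from the article): overall risk = context probability * data risk.
    """
    if not 0 < context_probability <= 1:
        raise ValueError("context_probability must be in (0, 1]")
    # The small epsilon guards against floating-point noise before rounding up.
    return max(1, math.ceil(public_min * context_probability - 1e-9))


# Hypothetical context probabilities, for illustration only.
contexts = {"public release": 1.0, "portal with terms of use": 0.3}
for name, probability in contexts.items():
    print(name, required_class_size(probability))
# public release 11
# portal with terms of use 4
```

Under these assumed numbers the threshold is identical in both contexts, yet the controlled release tolerates much smaller equivalence classes and therefore needs less transformation.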

  3. Analytic utility of both documents and data needs to be considered.

Another common question relates to the analytic utility of the documents and data in combination.

The concern we hear most often is whether the documents and data can still be linked and analyzed together after anonymization. Often, these use cases can be satisfied by applying patient ID pseudonyms consistently across documents and data.
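A minimal sketch of consistent pseudonymization follows. It is one possible approach under stated assumptions rather than a prescribed method: the keyed hash (HMAC-SHA256), the `PT-` prefix, the `USUBJID` field, and the `NNN-NNNN` patient ID format are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import re

SECRET_KEY = b"replace-with-a-securely-stored-key"  # hypothetical key
ID_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")         # hypothetical ID format, e.g. 101-0042


def pseudonym(patient_id, key=SECRET_KEY):
    """Derive a stable pseudonym: the same ID and key always give the same value."""
    digest = hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "PT-" + digest[:10].upper()


def pseudonymize_rows(rows, id_field="USUBJID"):
    """Return copies of the structured records with the ID field replaced."""
    return [{**row, id_field: pseudonym(row[id_field])} for row in rows]


def pseudonymize_text(text):
    """Replace every patient ID found in document text with its pseudonym."""
    return ID_PATTERN.sub(lambda match: pseudonym(match.group(0)), text)


rows = [{"USUBJID": "101-0042", "AETERM": "Headache"}]
narrative = "Participant 101-0042 reported a headache on Day 3."

new_rows = pseudonymize_rows(rows)
new_text = pseudonymize_text(narrative)

# The same pseudonym appears in both outputs, so the link is preserved.
assert new_rows[0]["USUBJID"] in new_text
```

Because both functions call the same `pseudonym` helper with the same key, a participant in the dataset can still be matched to the corresponding narrative in the document. A stored lookup table would serve the same purpose as the keyed hash.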

Risk-based anonymization balances transparency and privacy.

A risk-based anonymization approach balances two objectives: transparency and privacy.

Transparency is achieved by retaining utility in the data so that it can be used for secondary analysis and research. Privacy is achieved by ensuring sufficiently low risk of re-identification.

“One-size-fits-all” may not fit.

Integration can sound simple but often introduces unwanted trade-offs. As with many examples in practice, a one-size-fits-all approach to privacy can fail.

Many of the principles behind best practices in privacy and Privacy by Design are anchored in contextual alignment. Business goals should inform privacy approaches.

We have seen many cases where the context of disclosure suggests that integration is not the best approach, whether the goal is cost reduction or research and analysis. For compliance readiness, speculating too early about requirements that are not yet final may result in rework downstream. Considering the context of the release remains core to adopting the right de-identification solution.
