Risk-based anonymization vs. a motivated intruder

– and the results are in…

Written by Niamh McGuinness, Senior CTT Analyst

The public sharing of clinical documents has trial sponsors caught in a tug-of-war between privacy and utility – how do you anonymize these documents to protect participant privacy without destroying their usefulness for research and secondary analysis?

We believe risk-based anonymization is the answer. Unlike blunt-force methods such as redaction, risk-based anonymization transforms the data just enough to ensure that the identifiability of a clinical trial participant is sufficiently low. The beauty of this “just enough” approach is that it strikes the delicate balance between participant privacy and data utility – thus achieving true clinical trial transparency. After all, there is nothing transparent about an opaque redaction box.
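As a rough illustration of the “just enough” idea, here is a minimal Python sketch (not Privacy Analytics’ actual methodology) that generalizes an age quasi-identifier only as far as needed to push the measured re-identification risk below a chosen threshold. The records, age bands, and threshold are all hypothetical:

from collections import Counter

# Hypothetical records; in practice the quasi-identifiers would come from
# the clinical document being anonymized.
records = [
    {"age": 34, "sex": "F"}, {"age": 36, "sex": "F"},
    {"age": 35, "sex": "M"}, {"age": 38, "sex": "M"},
    {"age": 61, "sex": "F"}, {"age": 64, "sex": "F"},
    {"age": 62, "sex": "M"}, {"age": 67, "sex": "M"},
]

def max_risk(recs, band):
    # Group records by generalized age band plus sex; a record's
    # re-identification risk is 1 / (size of its group), so the worst
    # case is 1 / (size of the smallest group).
    groups = Counter((r["age"] // band, r["sex"]) for r in recs)
    return 1 / min(groups.values())

THRESHOLD = 0.5  # illustrative only; regulatory thresholds are stricter

# Widen the age bands just enough to meet the threshold, retaining as
# much precision (utility) as possible instead of redacting age outright.
for band in (1, 5, 10, 20):
    if max_risk(records, band) <= THRESHOLD:
        print(f"{band}-year age bands suffice: risk = {max_risk(records, band):.2f}")
        break

A redaction-only approach would simply delete the age field; the risk-based approach keeps a 10-year band that is still useful for secondary analysis.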

Thinking like a ‘motivated intruder’

“OK,” you say, “that all sounds well and good, but does it work?”

With a risk-based approach, some of an individual’s original demographic information may simply be generalized or even retained. This can justifiably make some people hesitant to stray from the brute-force approach of redacting everything of interest.

A multinational pharmaceutical company recently funded a study to demonstrate how risk-based anonymization, and Privacy Analytics’ methodology in particular, is robust enough to withstand a commissioned privacy attack.

We had our team anonymize a trial document following EMA guidance on the implementation of EMA Policy 0070. A research organization was then contracted to play the role of a “motivated intruder” and make a concerted effort to identify any of the participants included in the document.

Which data were used?

The document under scrutiny came from a clinical trial that ran in the spring of 2015 for the anti-inflammatory prescription drug Nepafenac. This randomized, double-masked, controlled study assessed the safety and efficacy of a formulation of the drug specifically for diabetic subjects following cataract surgery. Trial subjects must have been 18 years of age or older, must have had a cataract, and must have been planning to undergo cataract extraction by phacoemulsification (a modern approach to cataract surgery). Subjects must also have had a history of diabetes and diabetic retinopathy.

Consider this combination of criteria, plus the inclusion of some demographics, visit dates, and medical events captured during the trial – short of a name and an address, that’s a lot of information that could be used to identify an individual patient.
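To see why the combination matters, consider a hypothetical back-of-the-envelope calculation (every figure below is invented for illustration; none comes from the study):

population = 330_000_000   # rough U.S. population (assumption)
p_diabetes = 0.10          # hypothetical share of people with diabetes
p_retinopathy = 0.30       # hypothetical share of diabetics with retinopathy
p_cataract_surgery = 0.01  # hypothetical share having cataract surgery in a given year

# Assuming rough independence, each criterion multiplies down the pool of
# people who could plausibly be a given trial participant.
candidates = population * p_diabetes * p_retinopathy * p_cataract_surgery
print(f"{candidates:,.0f} plausible candidates")  # roughly 100,000

Every further quasi-identifier (an age, a visit date, a reported adverse event) cuts that pool again, which is why seemingly innocuous fields must be considered in combination rather than in isolation.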

Following guidelines from the UK’s Information Commissioner’s Office (ICO) and the Office for National Statistics (ONS), we define our motivated intruder as an individual without any specialized knowledge. This person is not a dedicated hacker and is unlikely to resort to criminal means. They have a personal computer and an Internet connection, access to public datasets, and the intelligence to apply some basic investigative techniques.

The investigators got to work, digging up any information in the public domain that could be tied to any of the trial’s participants: clinical reports, death records, hospital discharge records, voter registration records, records of attempts to re-contact patients, even Ancestry.com. Social media channels were also explored, since many people voluntarily disclose a lot of information about themselves on platforms like Facebook. The investigators even pursued freedom-of-information requests with the FDA and the EMA, since the trial participants targeted by the privacy attack were American.

What did these ‘motivated intruders’ accomplish?

Out of 500 trial participants, the investigators came up with six potential matches. Even then, confidence in these six was low, well below the acceptable limits suggested in guidance: four of the potential matches were a complete guess and two were a partial guess, for a 39 per cent confidence rating for identification.

Not only that, but it took about 24 hours of work per patient to arrive at these potential matches – far more effort than is recommended in guidance on conducting such a test, because a real motivated intruder is unlikely to commit this level of effort before giving up.

In other words, the realistic risk of identification was low enough to meet the test of reasonableness. This is key – privacy regulations predominantly rely on tests of reasonableness, rather than absolutes.
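How low? A simple calculation using the study’s own figures shows the margin. The 0.09 threshold below is the reference value commonly cited in guidance associated with EMA Policy 0070, and treating the intruders’ 39 per cent confidence as the probability that each claimed match is correct is our simplifying assumption:

TRIAL_PARTICIPANTS = 500
CLAIMED_MATCHES = 6       # candidate matches put forward by the intruders
MATCH_CONFIDENCE = 0.39   # the intruders' own confidence in those matches

# Expected number of correct identifications across the cohort, treating
# confidence as the probability each claimed match is right (assumption).
expected_correct = CLAIMED_MATCHES * MATCH_CONFIDENCE   # ~2.3

# Per-participant identification risk.
measured_risk = expected_correct / TRIAL_PARTICIPANTS   # ~0.005

print(f"Measured risk: {measured_risk:.4f} vs. 0.09 threshold")
print("Meets threshold:", measured_risk < 0.09)

Even under this deliberately pessimistic reading, the measured risk sits well over an order of magnitude below the threshold.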

Empirical evidence makes the case

To our knowledge, this is the first documented empirical evidence that risk-based anonymization of clinical study reports could withstand a commissioned privacy attack. Of particular interest is that these documents were transformed to ensure the richest data for the lowest risk – thus most effectively contributing to secondary research and improved clinical outcomes.

Our risk-based methodology and the expertise of our team were put to the test, and we passed.
