Privacy Analytics - Risk-based Anonymization vs. a motivated intruder

Risk-based anonymization vs. a motivated intruder

– and the results are in…

Written by Niamh McGuinness, Senior CTT Analyst

The public sharing of clinical documents has trial sponsors caught in a tug-of-war between privacy and utility – how do you anonymize these documents to protect participant privacy without destroying their usefulness for research and secondary analysis?

We believe risk-based anonymization is the answer. Unlike blunt-force methods like redaction, anonymization can be applied to transform the data just enough to ensure the identifiability of a clinical trial participant is sufficiently low. The beauty of this “just enough” approach is that it can most effectively achieve the delicate balance between participant privacy and data utility – thus achieving true clinical trial transparency. After all, there is nothing transparent about an opaque redaction box.

Thinking like a ‘motivated intruder’

“OK,” you say, “that all sounds well and good, but does it work?”

With a risk-based approach, some of an individual’s original demographic information may be simply generalized or even retained. This can justifiably make some people hesitant to stray from the brute-force approach of redacting everything of interest.
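
As a rough illustration of what “generalized or even retained” can mean in practice, the sketch below coarsens two fields, retains a third, and compares a simple worst-case risk measure before and after. It is not Privacy Analytics’ methodology: the toy records, the `generalize` and `max_risk` helpers, and the choice of “1 divided by the size of the smallest group of identical records” as the risk measure are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy records: (age, sex, surgery date as YYYY-MM-DD).
# These values are invented for illustration and are not trial data.
records = [
    (61, "F", "2015-03-04"),
    (63, "F", "2015-03-19"),
    (67, "F", "2015-03-27"),
    (72, "M", "2015-04-02"),
    (74, "M", "2015-04-15"),
    (78, "M", "2015-04-28"),
]


def generalize(record):
    """Coarsen age to a 10-year band, retain sex, truncate the date to its month."""
    age, sex, date = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", sex, date[:7])


def max_risk(rows):
    """Assumed risk measure: 1 / size of the smallest group of identical rows."""
    class_sizes = Counter(rows)
    return 1 / min(class_sizes.values())


risk_before = max_risk(records)                          # every record is unique
risk_after = max_risk([generalize(r) for r in records])  # records now share groups

print(f"worst-case risk before: {risk_before:.2f}")
print(f"worst-case risk after:  {risk_after:.2f}")
```

In this toy case every original record is unique, while the generalized records fall into two groups of three, so the worst-case measure drops without any field being blanked out. A real assessment would weigh many more fields and the context in which the document is released.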

A multinational pharmaceutical company recently funded a study to demonstrate how risk-based anonymization, and Privacy Analytics’ methodology in particular, is robust enough to withstand a commissioned privacy attack.

We had our team anonymize a trial document following EMA guidance on the implementation of EMA Policy 0070. A research organization was then contracted to play the role of a “motivated intruder” and make a concerted effort to identify any of the participants included in the document.

Which data were used?

The document under scrutiny came from a clinical trial that ran in the spring of 2015 for the anti-inflammatory prescription drug Nepafenac. This randomized, double-masked, controlled study assessed the safety and efficacy of a formulation of the drug specifically for diabetic subjects following cataract surgery. Trial subjects must have been 18 years of age or older, must have had a cataract, and must have been planning to undergo cataract extraction by phacoemulsification (a modern approach to cataract surgery). Subjects must also have had a history of diabetes and diabetic retinopathy.

Consider this combination of criteria, plus the inclusion of some demographics, visit dates, and medical events captured during the trial – short of a name and an address, that’s a lot of information that could be used to identify an individual patient.

Following guidelines from the UK’s Information Commissioner’s Office (ICO) and the Office for National Statistics (ONS), we define our motivated intruder as an individual without any specialized knowledge. This person is not a dedicated hacker and is unlikely to resort to criminal means. They have a personal computer and an Internet connection, access to public datasets, and the intelligence to apply some basic investigative techniques.

The investigators got to work, digging up any information in the public domain that could be tied to any of the trial’s participants: clinical reports, death records, hospital discharge records, voter registration records, records of attempts to re-contact patients, even Ancestry.com. Social media channels were also explored, since many people voluntarily disclose a lot of information about themselves on platforms like Facebook. The investigators even pursued FDA and EMA Freedom of Information Act (FOIA) requests, since the trial participants under the privacy attack were American.

What did these ‘motivated intruders’ accomplish?

Out of 500 trial participants, the investigators came up with six potential matches. Even then, confidence in these six matches remained low, well below the acceptable limits suggested in guidance. Four of these potential matches were a complete guess, and two were a partial guess, for a 39 per cent confidence rating for identification.
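
For a sense of scale, the counts quoted above can be turned into a simple proportion. The sketch below uses only those figures; it does not attempt to reproduce the 39 per cent confidence rating, since the calculation behind that number is not described here. The variable names are illustrative.

```python
# Figures quoted in the article.
participants = 500
potential_matches = 6
complete_guesses = 4
partial_guesses = 2

# Sanity check: the two kinds of guess account for all potential matches.
assert complete_guesses + partial_guesses == potential_matches

# Share of participants for whom the investigators proposed any match at all.
share_flagged = potential_matches / participants
print(f"{share_flagged:.1%} of participants had a potential match")
```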

Not only that, but it took about 24 hours of work per patient to arrive at these potential matches – far more effort than is recommended in guidance on conducting such a test, because a real motivated intruder is unlikely to commit this level of effort before giving up.

In other words, the realistic risk of identification was low enough to meet the test of reasonableness. This is key – privacy regulations predominantly rely on tests of reasonableness, rather than absolutes.

Empirical evidence makes the case

To our knowledge, this is the first documented empirical evidence that risk-based anonymization of clinical study reports could withstand a commissioned privacy attack. Of particular interest is that these documents were transformed to ensure the richest data for the lowest risk – thus most effectively contributing to secondary research and improved clinical outcomes.
