Motivated Intruder Testing – What, How, and Why?

An article by Brian Rasquinha, Associate Director, Solution Architecture, Privacy Analytics

Many organizations are familiar with statistical assessments of identifiability, like Expert Determinations under HIPAA or anonymization assessments under other regulations like the GDPR. In these assessments, a technical expert evaluates the identifiability of the data by modeling the likelihood that someone has an opportunity to identify a person in the data, together with the probability that such an attempt succeeds. By contrast, a Motivated Intruder Test, also known as adversarial testing, assesses the effectiveness of privacy protections practically by having a data analyst attempt to re-identify people in the data.

Both approaches are considered components of effective anonymization strategies in international standards and guidance, such as ISO/IEC 27559 and guidance released by the United Kingdom’s Information Commissioner’s Office (ICO). However, each approach is different in terms of how it is executed and how its findings are interpreted.

What is a Motivated Intruder Test?

A Motivated Intruder Test involves a data analyst attempting to re-identify individuals in a dataset deemed to be de-identified or anonymized. This is analogous to a penetration test in physical security, where a tester attempts to break into a physical space without the usual authorization, for example, by social engineering or by defeating locks or other security measures.

In a Motivated Intruder Test, an analyst attempts re-identification by flagging fields in the data that they can reference externally and looking for individuals who are unique (or close to unique) in their combinations of those fields. For example, if a dataset pertains to a medical condition primarily affecting a specific demographic, like older adults, an analyst might look for very young patients in the data and cross-reference them against media articles covering young patients with this condition.

For data that is not well de-identified or anonymized, this can yield high-confidence matches between the dataset and externally referenceable individuals.
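The uniqueness search described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any particular test is run: the toy records, the field names (age_band, sex, region), and the helper find_small_groups are all hypothetical, and a real test would work with far larger data and would still need to cross-reference candidates against external sources.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative assumptions.
records = [
    {"id": 1, "age_band": "70-79", "sex": "F", "region": "North"},
    {"id": 2, "age_band": "70-79", "sex": "F", "region": "North"},
    {"id": 3, "age_band": "70-79", "sex": "F", "region": "North"},
    {"id": 4, "age_band": "70-79", "sex": "M", "region": "North"},
    {"id": 5, "age_band": "70-79", "sex": "M", "region": "North"},
    {"id": 6, "age_band": "80-89", "sex": "F", "region": "South"},
    {"id": 7, "age_band": "80-89", "sex": "F", "region": "South"},
    {"id": 8, "age_band": "20-29", "sex": "M", "region": "North"},
]

# Fields the analyst has flagged as externally referenceable (an assumption here).
QUASI_IDENTIFIERS = ("age_band", "sex", "region")


def find_small_groups(rows, fields, max_group_size=1):
    """Return rows whose combination of `fields` is shared by at most
    `max_group_size` rows (1 means unique in the dataset)."""
    def key(row):
        return tuple(row[f] for f in fields)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] <= max_group_size]


# Rows that are unique on the flagged fields are the first candidates
# to cross-reference against external sources.
candidates = find_small_groups(records, QUASI_IDENTIFIERS)
for row in candidates:
    print(row)
```

In this toy data, the only unique record is the single young patient (id 8), mirroring the example of a very young patient in a dataset dominated by older adults. Raising max_group_size widens the search to "close to unique" records.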

Motivated Intruder Tests are conducted under some impactful practical constraints:

  • Effort constraints, where the project scope fits a pre-established budget of hours to attempt re-identification.
  • Financial constraints, which affect what data assets might be purchased and what data storage and computing power might be used in the re-identification attempt.
  • Technical constraints, in terms of analysis tools to detect potential identifiers, particularly in unstructured data like text, images, audio, or video.
  • Ethical constraints, like limiting access to datasets that may be available through illicit means, such as breaches or leaks, prohibiting contact with an individual who may be in the dataset to attempt to validate the identity, and prohibiting any additional monitoring or surveillance of an individual.

How do I interpret the findings of a Motivated Intruder Test?

The illustrative power of a Motivated Intruder Test can be stark. If the test has several high-confidence re-identifications, it is a clear “smoking gun” demonstration that the existing privacy protections applied to a dataset are not sufficient to mitigate re-identification. However, if the test does not yield any re-identification or near re-identification, re-identification could still be possible through other means.

The lack of success may be due to one or more of the constraints listed above or some bad (or is it good?) luck in the re-identification attempt. To bring back the physical security analogy: a burglar failing to break into a building does not mean the building cannot be broken into.

When and why would I choose a Motivated Intruder Test?

Re-identification risk determinations (statistical assessments of identifiability) provide an unambiguous evaluation that incorporates assumptions or modeling about how the data might be breached. They are widely used: they are explicitly required under HIPAA Expert Determination and more implicitly expected by some guidance under the GDPR, like EMA Policy 0070.

Motivated Intruder Tests, on the other hand, provide an unambiguous illustration of data being re-identifiable in cases where a candidate re-identification is found. They also allow the assumptions and modeling used in statistical assessments to be evaluated under real-life scenario testing.

The ISO/IEC 27559 standard recommends using both analytical techniques and motivated intruder testing to challenge assumptions and improve data-sharing strategies. ICO guidance also encourages the use of both assessments.

In short, Motivated Intruder Tests are a great option if you are looking to evaluate your assumptions, validate your methods, or provide additional evidence to support the strength of your de-identification or anonymization scheme. They offer practical proof points when results are positive and are illustrative when results are negative, which is particularly helpful in high-urgency or high-impact cases.

Contact the experts at Privacy Analytics to learn more about these or any of our other offerings.
