Re-identification Attacks

Re-identification attacks happen – but here’s what you can do about them

The biggest risk in sharing sensitive data for research and analytics is the re-identification of individuals in that data. Re-identification leads to a breach, which can damage your organization’s reputation and finances. Recent figures put the average cost of a data breach at $208 per re-identified individual, a figure that includes notification, legal fines, regulatory fines, and more. De-identification lets organizations mitigate this risk, but it must be done well for the organization’s position to be truly defensible.

Re-identification occurs when a record is correctly tied to the person behind it, even though the data was thought to have been made anonymous. Attackers have various motives, including monetary gain, personal gain, and curiosity. These attacks become possible when an attacker has the skills, resources, and motivation to find people hidden in the de-identified data.

There are three types of attacks:

  • The first type of attack aims to re-identify a specific person and relies on preexisting knowledge about someone the attacker knows to be in the de-identified database. We call the associated risk Prosecutor Risk.
  • The second type also aims to re-identify an individual, but instead uses another source of public information about one or more individuals who are also present in the de-identified dataset. We call this Journalist Risk.
  • The third type aims to re-identify as many people as possible from the de-identified data, even if some of them are identified incorrectly. We call this Marketer Risk.

All three contribute to the overall risk of re-identification for a dataset.
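
To make the three risks concrete, here is a minimal Python sketch of one commonly used way to quantify them. It is not a method described in this article. It assumes records are grouped into "equivalence classes" that share the same quasi-identifier values, and that Prosecutor Risk is 1 divided by the smallest class in the dataset, Journalist Risk is 1 divided by the smallest matching class in the external source, and Marketer Risk is the expected fraction of records matched correctly. The function names, the field names (age_band, zip3), and the sample data are all hypothetical.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def prosecutor_risk(records, quasi_identifiers):
    """Worst case when the attacker knows the target is in the dataset:
    1 / size of the smallest equivalence class (assumed formulation)."""
    sizes = class_sizes(records, quasi_identifiers)
    return 1 / min(sizes.values())


def journalist_risk(records, external, quasi_identifiers):
    """Worst case when matching against an external source:
    1 / size of the smallest external class that also appears in the dataset.
    Assumes every class in the dataset also appears in the external source."""
    sample = class_sizes(records, quasi_identifiers)
    population = class_sizes(external, quasi_identifiers)
    return max(1 / population[key] for key in sample)


def marketer_risk(records, quasi_identifiers):
    """Expected fraction of records matched correctly when the attacker
    tries to match everyone: number of classes / number of records."""
    sizes = class_sizes(records, quasi_identifiers)
    return len(sizes) / len(records)


# Hypothetical de-identified dataset: three records share one class, one is unique.
QUASI_IDENTIFIERS = ["age_band", "zip3"]
records = [
    {"age_band": "30-39", "zip3": "902"},
    {"age_band": "30-39", "zip3": "902"},
    {"age_band": "30-39", "zip3": "902"},
    {"age_band": "40-49", "zip3": "100"},
]
# Hypothetical external source: the dataset plus three more people who
# look like the unique record.
external = records + [{"age_band": "40-49", "zip3": "100"}] * 3

p_risk = prosecutor_risk(records, QUASI_IDENTIFIERS)
j_risk = journalist_risk(records, external, QUASI_IDENTIFIERS)
m_risk = marketer_risk(records, QUASI_IDENTIFIERS)
print(p_risk, j_risk, m_risk)
```

In this toy data the single unique record drives Prosecutor Risk to its maximum, while the larger external source dilutes Journalist Risk. Under these assumed formulas, generalizing quasi-identifiers so that no class is very small lowers all three measures.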

