Guess again: De-identification does work

As I scan the media, I often see arguments that de-identification does not work or that you can’t de-identify health data. This blog post serves as my standard rebuttal to those arguments.

The naysayers’ central argument tends to rest on re-identification attacks on data that were not properly de-identified in the first place. When presented with this type of argument, the first question should be: what type of data was it? The second: was it de-identified properly? If the response doesn’t mention a specific standard or include a discussion of risk, then I know the logic is flawed, since those two elements are fundamental to a properly de-identified dataset.

There have been a small number of examples, through real and commissioned attacks, where the data had been de-identified properly (using a particular standard or methodology). In those cases the success rate was very small, ranging from zero to a very small figure (0.013% in one case). So the “de-identification doesn’t work” narrative is faulty: the story is being retold in the absence of actual evidence.

Why is the argument so prevalent?

I think part of the problem can be attributed to “confirmation bias,” a concept found in behavioral economics. Confirmation bias is the tendency to search for, interpret, or recall information in a way that confirms or reinforces one’s beliefs. The problem here is that the naysayer argument becomes a red herring in a conversation that should be centered on increasing the adoption of good practices. Instead, it distracts from, and inhibits, the many benefits of sharing de-identified health data for secondary purposes. This is best exemplified by the Washington State re-identification attack in 2013. The impact was immediate: the State of Washington became less willing to share data. That reaction was unfortunate because it stifled access to valuable datasets that were important for public health and other types of analytics. Thankfully, it was temporary (even though temporary was still a long period of time). This example shows the negative impact red herrings have on the ability of researchers to gain access to very valuable data. The reality is that limiting access to health data for secondary purposes stifles research and innovation that could benefit all of us.

Additional reading on the subject can be found here:

Big Data and Innovation, Setting the Record Straight: De-identification Does Work

Do we have to worry about re-identification attacks upon our health data?

A Systematic Review of Re-Identification Attacks on Health Data

 

@swehbe
