Real Anonymization vs Data Masking

After reading Kalev Leetaru’s article, The Big Data Era of Mosaicked Deidentification: Can We Anonymize Data Anymore?, there are a few things that we can agree on.

Leetaru’s article discusses how “anonymized” data sets are increasingly common and re-identified with ease. He cites Sweeney’s 2000 study and several famously “anonymized” datasets that led to re-identification: Netflix, AOL, and the NYC taxi debacle. These are persuasive examples that can prompt people to assume all anonymization is terrible and easily reversed.

He writes, “As more and more organizations begin to release sensitive datasets to the public, the data science community must spend more time thinking about how to safely and responsibly manage this flow of anonymized data that is the lifeblood of the big data era.” Privacy and data use are key ingredients when considering how anonymization can be incorporated into a data sharing workflow.

Real Anonymization vs Data Masking: Not the Same

But there is one point of disagreement: his use of the term “anonymization”. Anonymization is the process of transforming data into a form that does not identify individuals and where identification is not likely to take place. None of the examples in his article are examples of anonymization. They are examples of data masking, and poorly done data masking at that. This distinction is key because there are people and organizations that anonymize data effectively every day – but they don’t make the news like these sensationalized stories.

In Sweeney’s case, the de-identification performed wasn’t even compliant with HIPAA’s Safe Harbor method (the minimum standard for de-identifying PHI for secondary use). In the AOL example, the scheme used to anonymize users failed to address the most identifying information of all – their search queries. That data was immediately identifying: 56% of internet users have searched for themselves online.

When you incorporate a risk-based de-identification process, you can be confident that PHI in the data has truly been anonymized. That’s why so many standards and industry guidelines advocate this approach, including HITRUST, the Institute of Medicine, and the European Medicines Agency.

Not all regulators and industry groups are ready to dismiss anonymization. To learn more about new and emerging standards around health data de-identification, don’t miss our webinar: De-identification 201.
