Does Data Masking Work?
Safe Harbor data masking looks like de-identification in disguise: don’t be fooled!
We have often heard that data masking does not work. There are a lot of data masking vendors in the marketplace that offer software and services which apply Safe Harbor techniques to diverse datasets. In theory, Safe Harbor’s approach effectively ensures HIPAA compliance and properly de-identifies data for secondary use. In actuality, there are a number of drawbacks to Safe Harbor which contribute to the impression that data masking is limited. So in the end, organizations are left to wonder: does data masking work?
Data masking refers to applying a set of transformative techniques to a dataset that remove the direct identifiers (variables that immediately identify individuals) and generalize elements like zip codes and dates. While this is effective at hiding individuals in the data, it leaves very little information for secondary purposes. In short, data masking on its own is a pretty futile exercise.
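To make the idea concrete, here is a minimal sketch of the kind of transformation described above. It is only an illustration, not a complete Safe Harbor implementation: the field names (`name`, `ssn`, `email`, `phone`, `zip`, `birth_date`) and the record layout are assumptions made for the example, and real Safe Harbor rules cover more identifiers and edge cases than this.

```python
# Hypothetical field names; a real dataset will have its own schema.
DIRECT_IDENTIFIERS = {"name", "ssn", "email", "phone"}


def mask_record(record):
    """Drop direct identifiers and generalize zip code and birth date."""
    masked = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # remove direct identifiers entirely
        if field == "zip":
            masked["zip"] = value[:3] + "**"  # keep only the first three digits
        elif field == "birth_date":
            masked["birth_year"] = value[:4]  # assumes an ISO date, YYYY-MM-DD
        else:
            masked[field] = value  # everything else passes through unchanged
    return masked


patient = {
    "name": "Jane Doe",
    "zip": "99801",
    "birth_date": "1980-04-12",
    "diagnosis": "J45",
}
print(mask_record(patient))
```

Note how much detail disappears: the full zip code and the exact date are gone, which is precisely the information many secondary analyses depend on.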
Why is it so futile? There are a number of drawbacks to using data masking alone. As previously mentioned, data masking addresses direct identifiers. This is quite logical: after all, gleaning direct information about the individuals in the dataset is NOT what organizations want. They perform research and analytics on the indirect identifiers, information that is not immediately identifying on its own but can potentially identify individuals when combined. Indirect identifiers can be tricky because outliers pose risk. A hypothetical 98-year-old Rwandan man in Juneau, Alaska would be an outlier in the dataset. Even without his name, he is at great risk of being re-identified from his age, ethnicity and location alone.
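One common way to spot this kind of outlier is to count how many records share each combination of indirect identifiers and flag the combinations that are rare, which is the idea behind k-anonymity. The sketch below is an illustration under assumptions, not a method described in this post: the function name `find_outliers`, the field names, the sample records and the threshold `k` are all hypothetical.

```python
from collections import Counter


def find_outliers(records, quasi_identifiers, k=2):
    """Return records whose combination of indirect identifiers
    is shared by fewer than k records in the dataset."""

    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] < k]


# Hypothetical, already-masked records: no names, generalized ages.
records = [
    {"age_band": "30-39", "ethnicity": "American", "city": "Juneau"},
    {"age_band": "30-39", "ethnicity": "American", "city": "Juneau"},
    {"age_band": "90+", "ethnicity": "Rwandan", "city": "Juneau"},
]

for outlier in find_outliers(records, ["age_band", "ethnicity", "city"]):
    print(outlier)
```

In this toy dataset only the last record is flagged: no direct identifier remains, yet the combination of age band, ethnicity and city is unique, so masking alone has not protected that individual.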
There are many other drawbacks to using data masking only. They are covered in our white paper, The Top 5 Drawbacks to Using Only Data Masking.