The Problem with Data Masking Techniques
It is no longer a secret: many healthcare organizations are sharing health data. More standards are emerging as a result, including new NIST guidance on de-identifying datasets. Let’s be clear: data masking and de-identification are not the same, and believing that data masking techniques are on par with proper de-identification could prove costly. Confusing the two puts your organization’s brand, reputation, and bottom line at risk.
Problem 1: Data masking techniques do not use metrics to measure the actual risk of re-identification. Without such metrics, there is no way to know whether the transformations performed on the data were sufficient to de-identify it, or to defend that judgment if it is ever challenged.
Solution: Combine masking with de-identification techniques to get the risk-measurement-based approach needed to safeguard privacy. A risk-based approach ensures that the correct techniques are applied where they matter most. As HHS guidance puts it, “Patient demographics could be classified as high-risk features. In contrast, lower risk features are those that do not appear in public records or are less readily available.”
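To make the idea of measuring risk concrete, here is a minimal sketch in Python of one widely used metric: the maximum re-identification risk, taken as one over the size of the smallest group of records sharing the same quasi-identifier values. The dataset, column names, and values are hypothetical; a real assessment would be driven by the actual schema and a documented threat model.

```python
from collections import Counter

# Hypothetical quasi-identifiers; in practice these come from a review
# of the data dictionary and the relevant threat model.
QUASI_IDENTIFIERS = ("birth_year", "zip3", "gender")

def max_reidentification_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Maximum re-identification risk under a simple prosecutor-style model:
    1 / (size of the smallest equivalence class), where an equivalence class
    is a group of records that share the same quasi-identifier values."""
    classes = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    smallest = min(classes.values())
    return 1.0 / smallest

# Example: two records share quasi-identifier values, one record is unique,
# so the maximum risk is 1.0 -- the unique record stands out completely.
records = [
    {"birth_year": 1980, "zip3": "021", "gender": "F"},
    {"birth_year": 1980, "zip3": "021", "gender": "F"},
    {"birth_year": 1975, "zip3": "440", "gender": "M"},
]
print(max_reidentification_risk(records))  # 1.0
```

A defensible de-identification would transform the data until a metric like this falls below an agreed-upon threshold; masking alone never produces that number.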
Problem 2: Data masking only deals with direct identifiers. Masking techniques typically attempt to eliminate direct identifiers: fields that can be used on their own to uniquely identify an individual, such as name, email address, or Social Security Number. These fields are rarely needed for statistical analysis.
Solution: Distinguish the types of identifiers in your data. Quasi-identifiers are fields that can identify individuals when combined with one another or with outside information, and that are also valuable for analysis. Examples include dates, demographic information such as race and ethnicity, and socioeconomic variables such as occupation and income. The distinction matters because masking only the direct identifiers leaves the re-identification risk from the quasi-identifiers untouched.
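As a rough illustration of why that matters, the sketch below (again with hypothetical field names) masks the direct identifiers the way a typical masking tool would and passes the quasi-identifiers through unchanged; whatever re-identification risk those quasi-identifiers carry is exactly the same before and after masking.

```python
# Hypothetical field classification; in practice this comes from a
# review of the data dictionary, not from hard-coded lists.
DIRECT_IDENTIFIERS = {"name", "email", "ssn"}
QUASI_IDENTIFIERS = {"birth_year", "zip3", "gender", "occupation", "income"}

def mask_direct_identifiers(record):
    """Redact direct identifiers only, as a typical masking tool would."""
    return {
        field: "***" if field in DIRECT_IDENTIFIERS else value
        for field, value in record.items()
    }

record = {
    "name": "Jane Doe", "email": "jane@example.com", "ssn": "123-45-6789",
    "birth_year": 1980, "zip3": "021", "gender": "F",
    "occupation": "nurse", "income": 74000,
}
masked = mask_direct_identifiers(record)
# The quasi-identifiers survive untouched, so a risk metric computed over
# them returns the same value for the masked data as for the original.
print(masked["birth_year"], masked["zip3"], masked["occupation"])
```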
Problem 3: Masking effectively eliminates analytic utility. Most masking techniques destroy the usefulness of the masked fields, so masking should only be applied to fields that will never be needed for analytics.
Solution: Use proven de-identification methods that keep data quality high. At the end of the day, the data is being protected so it can be used for secondary purposes such as research, post-marketing surveillance, monetization, and analytics, and those efforts deserve granular, high-value data. De-identification is a risk-management exercise; by measuring the risks and managing them, your organization can reap the rewards.
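For a sense of what a utility-preserving transformation can look like, here is a minimal sketch of generalization, one common de-identification technique: dates become years, exact ages become five-year bands, income becomes a bracket. The field names and cut-points are illustrative only; in practice the transformations are tuned until a measured risk threshold is met while retaining as much analytic detail as possible.

```python
from datetime import date

def generalize(record):
    """Generalize quasi-identifiers instead of blanking them: the fields
    remain analytically useful, unlike a masked (nulled or scrambled) field."""
    out = dict(record)
    if "admission_date" in out:
        out["admission_year"] = out.pop("admission_date").year
    if "age" in out:
        lo = (out["age"] // 5) * 5
        out["age_band"] = f"{lo}-{lo + 4}"
        del out["age"]
    if "income" in out:
        out["income_bracket"] = "<50k" if out["income"] < 50000 else "50k+"
        del out["income"]
    return out

record = {"admission_date": date(2018, 3, 14), "age": 42, "income": 74000}
print(generalize(record))
# {'admission_year': 2018, 'age_band': '40-44', 'income_bracket': '50k+'}
```

An analyst can still study admission trends by year, outcomes by age band, and income brackets, which is far more than a masked field permits.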
Learn more about data masking pitfalls in our whitepaper: Avoid the Blur of Data Masking. Download it here.