Using Safe Harbor De-Identification
When is it appropriate to use Safe Harbor?
With the emergence of new standards and regulations in the field of de-identification, especially risk-based methodologies modeled on HIPAA’s Expert Determination method, where does that leave Safe Harbor? The majority of these new standards advocate going beyond Safe Harbor’s approach to de-identification. Even those that do recommend Safe Harbor still call for a second pass using a risk analysis.
Safe Harbor works by looking at 18 identifiers in the data. Sixteen of these are direct identifiers, such as name, email address, and patient ID number; the other two are indirect identifiers: ZIP/postal code and dates. Once these 18 patient identifiers have been removed or transformed, the data can be released, provided there is no actual knowledge that any of the information left in the dataset can identify an individual. By following this standard of de-identification, an organization’s data is considered safe to release. Safe Harbor is relatively simple, straightforward, and inexpensive to implement, which goes a long way toward explaining why its adoption rate is so high.
It’s important to realize that research and analytics aren’t done on direct identifiers. They are done on the indirect identifiers in a data set: those bits of information that do not immediately identify an individual but, when combined, can. This is the challenge with Safe Harbor: re-identification risk can remain high while the data loses much of its analytical value. First, we know that linking dates of birth to ZIP codes can re-identify people. Second, Safe Harbor generalizes dates to the year and truncates ZIP codes to their first three digits, replacing them with 000 where the three-digit area contains 20,000 or fewer people. These are exactly the fields that researchers and analysts want, and the transformations strip out most of their utility. When tracking the progression of a disease, for example, hospital admission dates are key pieces of information that Safe Harbor reduces to a year. We’ve seen cases where organizations apply Safe Harbor selectively to “get around” its analytical limitations, but that only increases the risk of re-identification.
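To make the utility loss concrete, here is a minimal Python sketch of the two indirect-identifier transformations described above. The function names, the record fields, and the population lookup table are hypothetical and the population figures are made up for illustration; a real implementation would rely on census data for three-digit ZIP areas and would also have to handle the remaining identifiers.

```python
from __future__ import annotations

from datetime import date

# Hypothetical lookup of population per three-digit ZIP prefix.
# The figures are illustrative only, not real census numbers.
ZIP3_POPULATION = {"100": 1_500_000, "036": 12_000}

POPULATION_THRESHOLD = 20_000


def generalize_date(d: date) -> int:
    """Keep only the year; month and day are discarded."""
    return d.year


def generalize_zip(zip_code: str, zip3_population: dict[str, int]) -> str:
    """Keep the first three digits only if the area has more than 20,000 people."""
    prefix = zip_code[:3]
    # Unknown prefixes are treated as small areas and suppressed.
    if zip3_population.get(prefix, 0) > POPULATION_THRESHOLD:
        return prefix
    return "000"


def transform_record(record: dict) -> dict:
    """Apply both generalizations to a record with hypothetical field names."""
    return {
        "admission_year": generalize_date(record["admission_date"]),
        "zip3": generalize_zip(record["zip"], ZIP3_POPULATION),
    }


if __name__ == "__main__":
    first_visit = {"admission_date": date(2019, 1, 9), "zip": "10001"}
    second_visit = {"admission_date": date(2019, 12, 19), "zip": "10001"}
    print(transform_record(first_visit))
    print(transform_record(second_visit))
```

In this sketch, two admissions almost a year apart become indistinguishable after the transformation, which is the kind of detail lost when studying disease progression.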
When is using Safe Harbor advisable? Rarely, if your organization wants real analytics.