Using Safe Harbor De-Identification

When is it appropriate to use Safe Harbor?

With new de-identification standards and regulations emerging, especially risk-based methodologies modeled on HIPAA’s Expert Determination method, where does that leave Safe Harbor? Most of these new standards advocate going beyond Safe Harbor’s approach to de-identification. Even those that do recommend Safe Harbor also recommend a second pass using a risk analysis.

Safe Harbor works by looking at 18 identifiers in the data: 16 direct identifiers, such as name, email and patient ID number, and 2 indirect identifiers, zip/postal code and date. Once these 18 patient identifiers have been removed or transformed, the data can be released, provided there is no actual knowledge that any of the information left in the dataset can identify an individual. Data de-identified to this standard is considered safe to release. Safe Harbor is relatively simple, straightforward and inexpensive to implement, which goes a long way toward explaining its high adoption rate.
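As a rough illustration of the direct-identifier step, the Python sketch below drops a few identifying fields from a single record. The field names and the DIRECT_IDENTIFIERS set are hypothetical and cover only a handful of the identifier categories, so treat this as a sketch under those assumptions rather than a complete Safe Harbor implementation.

```python
# Hypothetical field names; a real dataset would need all of the
# direct identifier categories mapped to its own schema.
DIRECT_IDENTIFIERS = {"name", "email", "patient_id", "phone", "ssn"}


def drop_direct_identifiers(record: dict) -> dict:
    """Return a copy of the record without the listed direct identifier fields."""
    return {key: value for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS}


# Made-up example record.
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "patient_id": "P-0001",
    "zip": "02139",
    "admission_date": "2015-03-14",
    "diagnosis": "asthma",
}

cleaned = drop_direct_identifiers(record)
print(cleaned)
```

Note that the zip code and admission date survive this step untouched; they are indirect identifiers and are handled separately.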


It’s important to realize that research and analytics aren’t done on direct identifiers. But what about the indirect identifiers in a data set, those bits of information that do not immediately identify an individual but can when combined? This is the challenge with Safe Harbor: re-identification risk remains high while the data loses its analytical value. Firstly, we know that linking dates of birth to zip codes can re-identify people. Secondly, Safe Harbor generalizes dates to the year and truncates zip codes to their first three digits, replacing them with 000 where the three-digit area holds 20,000 or fewer people. Now we are getting into data that researchers and analysts want, and removing its utility. When tracking the progression of a disease, for example, hospital admission dates are key pieces of information that Safe Harbor removes. We’ve seen cases where organizations apply Safe Harbor selectively to “get around” these analytical challenges, but that only increases the risk of re-identification.
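To make the loss of utility concrete, here is a minimal Python sketch of the two indirect-identifier generalizations. It assumes the HIPAA rule that a zip code is cut to its first three digits, and set to 000 when that three-digit area holds 20,000 or fewer people. The function names, the PREFIX_POPULATION table and its numbers are made up for illustration.

```python
from datetime import date

# Hypothetical population counts per three-digit zip prefix (made-up numbers).
PREFIX_POPULATION = {"021": 1_200_000, "036": 15_000}


def generalize_date(d: date) -> int:
    """Keep only the year of a date."""
    return d.year


def generalize_zip(zip_code: str, populations: dict) -> str:
    """Keep the first three digits if that area has more than 20,000 people,
    otherwise return "000". Unknown prefixes are treated as small areas."""
    prefix = zip_code[:3]
    return prefix if populations.get(prefix, 0) > 20_000 else "000"


# Two admissions for the same hypothetical patient.
admitted = date(2015, 3, 14)
readmitted = date(2015, 11, 2)

# Before generalization, the interval between admissions is available.
days_between = (readmitted - admitted).days

# After generalization, both dates collapse to the same year.
years = (generalize_date(admitted), generalize_date(readmitted))

print(days_between, years)
print(generalize_zip("02139", PREFIX_POPULATION))
print(generalize_zip("03601", PREFIX_POPULATION))
```

Once both admission dates are reduced to the same year, the interval between them can no longer be computed, which is exactly the kind of detail a disease-progression study depends on.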

When is using Safe Harbor advisable? Rarely, if your organization wants real analytics.
