If your organization is focused on maximizing data value, while also protecting data as a core asset, you may be on the horns of a privacy dilemma. Why?
As an analytics leader or Chief Data Officer, you may be focused on leveraging your organization’s sensitive data assets to drive business impact — you’re likely looking at the potential of personal data.
As a Chief Privacy Officer or Data Protection Officer, your primary focus is likely the protection of personal data.
Whether you’re a CDO or CPO, in the context of U.S. healthcare, you could be up to your neck in HIPAA. And if you operate in the EU as a CDO, CPO, or DPO, or hold data of European residents, you’re in a similar situation with GDPR.
Three big data value questions
You’re likely consumed by three big questions surrounding data value:
- How can we mitigate our legal exposure?
- What’s the best way to unlock value from personal data—without compromising its utility through de-identification or anonymization?
- How can we increase our access to valuable data held by external suppliers?
To help you answer those three questions, this article will help you:
- Learn how to bridge the gap between data protection and data value.
- Find out how de-identification and anonymization can unlock the most business value from personal data.
- Understand the pros and cons of approaches to de-identification and anonymization as they relate to driving business value of information (BVI) and cost value of information (CVI).
To support these goals, review the comparison chart below. It offers side-by-side comparisons to help you match the best de-identification or anonymization approach to your specific business needs.

The need to focus on both offense and defense
If you’re a CDO, CPO, or DPO, you’re in a tricky position, sitting on opposite sides of the table from one another. The CDO advocates the effective use of data, while the CPO or DPO is mandated to adhere to privacy regulations and requirements. You need to work together. You need a strategy to unlock more value from personal data and, at the same time, remain judicious about protecting data privacy.
To use a sports analogy, as coaches of your team, you need to be as focused on defense as you are on offense.
With all the compliance requirements, do you sometimes feel like you’re grinding it out on defense, and making little progress on your value-generating, business-transforming goals? Or perhaps, alternatively, you’re finally generating value—making tangible gains—but are kept awake at night by the fear you’re not fully compliant?
Learn what’s in the winners’ playbook
How about a glimpse into the playbook, showing real business problems that some of the world’s top companies are solving with Privacy Analytics? These organizations have a leading record for maximizing data value:
- A top-tier revenue cycle management company whose use of Privacy Analytics’ risk-based de-identification software resulted in healthcare providers improving the quality and efficiency of patient care.
- A multinational conglomerate with $30 billion in annual sales that worked with Privacy Analytics to anonymize large datasets, enabling revenue growth through information-related initiatives such as third-party research and analytics, as well as healthcare network improvements.
- A U.S.-based private research university that was able to demonstrate alignment with HIPAA de-identification while receiving reimbursements from insurance companies and the federal government.
These organizations are using effective anonymization techniques not only to commercialize sensitive data, but also to put data to work internally, streamlining operational processes while taking full responsibility for sensitive data.
The common denominator for these teams is that they all capture value beyond anonymization itself, using a repeatable process that minimizes subjectivity and can be operationalized successfully.
The bottom line: a win-win for data privacy and business value.
How to adopt an effective anonymization strategy
Many of the new, innovative and value-driving uses of data are enabled by anonymization because consent or authorization is impractical—sometimes even impossible—under privacy laws. Therefore, you need to be committed to anonymization to drive value.
However, if anonymization isn’t done well, several things can happen:
- You risk non-compliance with GDPR or HIPAA.
- You face legal exposure and damage to trust, brand, and reputation.
- You destroy the utility of the data during the anonymization process.
That’s why you need an effective anonymization strategy: one that has operationalized a contextual evaluation of risk.
Let’s look at the two prevailing approaches to data privacy: risk-based and rules-based de-identification.
Risk-based versus rules-based approaches
When it comes to personal data, including personally identifiable information (PII) and protected health information (PHI), producing a high-value dataset that meets specific needs also means addressing privacy concerns. Doing so requires organizations to de-identify personal information using a risk-based approach that goes beyond simple masking techniques.
For a better understanding of why successful organizations choose a risk-based approach, consider the graphics later in this article. These graphics illustrate the major differences between the main approaches in a HIPAA context, which applies to U.S. companies working with PHI.
It’s worth noting that with the advent of GDPR, the bar has been raised. For companies that operate beyond the U.S., HIPAA compliance is no longer ‘good enough’. And as new legislation is proposed and enacted, what was done yesterday to protect personal data may not be good enough tomorrow.
If you are in the healthcare sector in the U.S., you already know HIPAA Safe Harbor, a rules-based de-identification approach that outlines 18 different identifiers that must be removed from data that will be used for secondary purposes.
These identifiers are:
- Name
- Address (all geographic subdivisions smaller than state, including street address, city, county, and zip code)
- All elements (except years) of dates related to an individual (including birthdate, admission date, discharge date, date of death, and exact age if over 89)
- Telephone numbers
- Vehicle identifiers like serial numbers and license plate numbers
- Fax number
- Email address
- Social Security Number
- Medical record number
- Health plan beneficiary number
- Account number
- Certificate or license number
- Device identifiers and serial numbers
- Web URL
- Internet Protocol (IP) Address
- Finger or voice print
- Photographic image – Photographic images are not limited to images of the face.
- Any other characteristic that could uniquely identify the individual
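A rules-based approach like this can be applied mechanically, field by field. The Python sketch below illustrates the idea under stated assumptions: the field names in `DIRECT_IDENTIFIER_FIELDS` and the function `safe_harbor_sketch` are hypothetical, only a handful of the 18 identifiers are covered, and zip codes are simply dropped. It is meant to show how a rules-based method operates, not to serve as a compliance tool.

```python
from datetime import date

# Hypothetical field names standing in for some of the 18 Safe Harbor identifiers.
# A real dataset needs its own mapping that covers every identifier in the list.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "zip_code",
    "phone", "fax", "email", "ssn", "medical_record_number",
    "ip_address", "url",
}


def safe_harbor_sketch(record: dict) -> dict:
    """Apply a few Safe Harbor style rules to one record (illustration only)."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIER_FIELDS:
            continue  # Rule: remove listed identifiers entirely.
        if isinstance(value, date):
            cleaned[field] = value.year  # Rule: keep only the year of a date.
        elif field == "age" and isinstance(value, int) and value > 89:
            cleaned[field] = "90+"  # Rule: do not keep an exact age over 89.
        else:
            cleaned[field] = value  # Everything else passes through untouched.
    return cleaned


# Made-up record for demonstration; it does not reference an actual person.
record = {
    "name": "Jane Example",
    "zip_code": "00000",
    "admission_date": date(2021, 3, 14),
    "age": 93,
    "diagnosis": "asthma",
}
print(safe_harbor_sketch(record))
```

Note that the rules never look at the data itself or at who will receive it: the same fields are removed or generalized whether or not doing so is needed in a given setting, and any field outside the list passes through unchanged.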
HIPAA Expert Determination, on the other hand, is a risk-based de-identification approach that applies statistical or scientific principles to provide a very small, quantifiable risk that an anticipated recipient of the data could identify an individual. This approach also requires that the methods and results of the analysis are documented in a defensible report.
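To make the idea of a "quantifiable risk" more concrete, here is a minimal Python sketch of one common way to measure it. The metric is an assumption, since this article does not say which one an expert would choose: the sketch groups records that share the same values for indirectly identifying fields (the idea behind k-anonymity) and treats each record's risk as one divided by the size of its group. The function name, field names, and sample records are all hypothetical.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum risk, average risk) across records.

    Assumed metric: a record's risk is 1 / (number of records that share
    its combination of quasi-identifier values).
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    # The smallest group contains the most exposed individuals.
    max_risk = 1 / min(group_sizes.values())
    # Averaging 1/size over all records reduces to groups / records.
    avg_risk = len(group_sizes) / len(records)
    return max_risk, avg_risk


# Made-up records for demonstration only.
records = [
    {"age_band": "30-39", "region": "East"},
    {"age_band": "30-39", "region": "East"},
    {"age_band": "40-49", "region": "West"},
    {"age_band": "40-49", "region": "West"},
    {"age_band": "40-49", "region": "West"},
]
max_risk, avg_risk = reidentification_risk(records, ["age_band", "region"])
print(max_risk, avg_risk)
```

A measurement like this is what allows the result to be compared against a threshold and written up in a report, rather than relying on a fixed checklist.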
Here, we’re comparing HIPAA Safe Harbor (a rules-based example that is easy to visualize) with HIPAA Expert Determination, which is a risk-based example. To further simplify the illustration, we are only presenting a subset of data elements, and a small sample of records.
First, consider this example of raw data.
Please note that this example has been created for demonstration purposes only and is not intended to reference actual persons.

Raw personal data can’t be used for most secondary purposes. Although data utility would be high, the risk level would also be very high. And, as such, it would not be compliant under the HIPAA Privacy Rule, nor anonymized under the GDPR.
Next, an example of a HIPAA Safe Harbor (rules-based) approach:

Here, we often see a significant negative impact on data utility. Although the resulting data aligns with HIPAA, risk management is moderate (and can be low in some contexts).
Furthermore, this method leaves many other fields untouched: anything beyond the 18 identifiers that Safe Harbor requires to be removed. Anonymization under the GDPR requires that the identifiability of those other data elements is assessed and that risk is mitigated appropriately.
And, finally, a risk-based de-identification example, like Expert Determination:

With this approach, data utility remains high while the risk level is low. And, of critical importance, a risk-based approach can be aligned to both HIPAA de-identification and anonymization under GDPR (and most other privacy regulations globally) when the appropriate assessments have taken place and been properly documented in a defensible report.
This is because a risk-based approach measures two critical aspects:
- Data Risk: A function of the dataset characteristics
- Context Risk: A function of where and how the de-identified data will be disclosed/used.
Higher-risk contexts require greater data transformations (data disruption/modification to reduce identifiability), whereas lower-risk contexts require less. The risk-based calibration accounts for the protective effects of closely secured analysis environments, preserving utility while ensuring privacy. As such, the opportunity also exists to improve data utility through improved technical and organizational controls, which provide a lower-risk context.
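The relationship between context and the amount of transformation can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the model treats overall risk as data risk multiplied by context risk, measures data risk as one divided by the size of the smallest group of records sharing the same indirect identifiers, and uses a hypothetical threshold of 0.09 with made-up context risk values. An actual assessment would choose and justify its own model and numbers.

```python
import math


def overall_risk(data_risk: float, context_risk: float) -> float:
    """Assumed model: overall risk is the product of data risk and context risk."""
    return data_risk * context_risk


def min_group_size(threshold: float, context_risk: float) -> int:
    """Smallest group size (records sharing the same indirect identifiers)
    that keeps overall risk at or below the threshold in a given context."""
    if not (0 < threshold <= 1 and 0 < context_risk <= 1):
        raise ValueError("threshold and context_risk must be in (0, 1]")
    allowed_data_risk = min(1.0, threshold / context_risk)
    return math.ceil(1 / allowed_data_risk)


THRESHOLD = 0.09  # hypothetical acceptable overall risk

# Hypothetical context risk values: 1.0 for an uncontrolled release,
# 0.3 for a closely secured analysis environment.
for label, context_risk in [("uncontrolled release", 1.0), ("secured environment", 0.3)]:
    print(label, min_group_size(THRESHOLD, context_risk))
```

Under these assumptions, the uncontrolled release needs groups of at least 12 records, while the secured environment needs groups of only 4. Smaller groups mean less generalization of the data, which is how stronger controls translate into higher utility.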
Actual transformations depend on the level of completeness of the dataset as well as context.
These illustrations are intended to help show why a risk-based approach makes the most sense for organizations seeking to unlock maximum value from data while ensuring alignment with privacy laws.
Complex algorithms; practical solutions
In some circles, there’s a perception that risk-based methods are difficult to implement. Although the underlying statistical methods used to assess the risk can be complex, risk-based methods can be oriented around practical implementation to focus on simple, understandable outcomes. There are open methodologies, training programs (e.g., HITRUST), and commercial software for scale and automation.
How do you make your best choice among de-identification and anonymization approaches?
As the data privacy landscape changes, being proactive is the winning strategy. Leading organizations around the globe are adopting best practice frameworks such as Privacy by Design. In fact, GDPR (Article 25) now requires data protection by design and by default, a closely related principle.
Leading organizations know that risk-based approaches to de-identification and anonymization pay off, in terms of innovation, efficiency and revenue.
So where do more and more of these leaders go for guidance regarding data de-identification or anonymization? The answer is Privacy Analytics.
Our executives, and our global team of data scientists and business analysts, are the trusted experts for companies in healthcare, life sciences, medical technology and devices, finance and banking, transportation, communications, advertising, IT, and many other sectors worldwide.
Privacy Analytics’ client success stories include:
- One of the world’s largest multinational pharmaceutical companies, which was able to fulfill its EMA (European Medicines Agency) Policy 0070 document releases.
- A software company that uses artificial intelligence to derive diagnoses from medical images sourced from a variety of hospital partners and hardware configurations, which was able to apply de-identification to the onboarding of new machines.
- A global provider of pre-operative planning software and intra-operative surgical robots, which was able to implement privacy and security controls to ensure data recipients manage data access and use appropriately.
Finally, as a primer on data de-identification and anonymization, we recommend The Five Safes of Risk-Based Anonymization whitepaper.