In 2024, we shared several short articles about de-identified or anonymized data. Here, we look back at those articles and at how the topics they cover intersect.
AI was a dominant topic, with much focus on the promise of AI tools and some of their inherent privacy concerns. Emerging guidance on defensible AI includes approaches to ensure the responsible development of these tools.
De-identification or anonymization is a powerful privacy tool that can support defensibility in analytics, product development, and AI applications. A de-identified or anonymized dataset could be drawn from one source, but some applications require organizations to link datasets together privately without sharing key identifiers. Our primer on tokenization and linkage concepts can help you understand the challenges and solutions.
Even if each linked dataset is already de-identified or anonymized, the linkage can affect the overall identifiability of the data. As such, the linkage must be managed carefully, and there are multiple approaches to consider.
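For illustration only, here is a minimal sketch of one common tokenization approach: keyed hashing (HMAC) lets two parties derive the same token from a shared identifier without ever exchanging the raw value. The key, datasets, and field names below are hypothetical, and real deployments involve careful key management and governance beyond this sketch.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice this would be managed securely
# (e.g., by a trusted third party) and never exposed alongside the data.
SECRET_KEY = b"shared-linkage-key"

def tokenize(identifier: str) -> str:
    """Derive a stable, non-reversible token from an identifier via HMAC-SHA256."""
    return hmac.new(SECRET_KEY, identifier.lower().encode(), hashlib.sha256).hexdigest()

# Two datasets held by different parties, keyed by email (illustrative data).
dataset_a = {tokenize("alice@example.com"): {"age_band": "30-39"}}
dataset_b = {tokenize("alice@example.com"): {"diagnosis_code": "E11"}}

# Records link on matching tokens; the raw emails were never shared.
linked = {tok: {**dataset_a[tok], **dataset_b[tok]}
          for tok in dataset_a.keys() & dataset_b.keys()}
print(linked)
```

Because both parties apply the same keyed hash, matching tokens imply matching identifiers, yet neither the tokens nor the linked output reveal the identifiers themselves.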
Once you have a dataset, if you need to assess its identifiability, you might perform a re-identification risk determination (RRD), such as a HIPAA Expert Determination or similar assessment. We shared our Best Practices for RRDs, informed by the hundreds of such projects we perform yearly.
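One building block that often feeds into such an assessment is measuring how many records share each combination of quasi-identifiers, since small groups are the most exposed to re-identification. The sketch below is illustrative only, with made-up records and field names, and is not a substitute for a full expert determination.

```python
from collections import Counter

# Illustrative records with hypothetical quasi-identifiers:
# a truncated ZIP code, an age band, and sex.
records = [
    {"zip3": "021", "age_band": "30-39", "sex": "F"},
    {"zip3": "021", "age_band": "30-39", "sex": "F"},
    {"zip3": "946", "age_band": "70-79", "sex": "M"},
]
quasi_identifiers = ("zip3", "age_band", "sex")

# Count the size of each equivalence class (records sharing all quasi-identifiers).
groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

smallest_group = min(groups.values())  # the dataset's k value
unique_records = sum(1 for n in groups.values() if n == 1)
print(f"k = {smallest_group}; unique records: {unique_records}")
```

Here the lone record in the second group is unique on its quasi-identifiers (k = 1), flagging it as the highest-risk record; a real assessment would also weigh the release context, as discussed below.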
One consideration for an RRD is the context of the data release, which can strongly influence overall identifiability (and thus the risk of re-identification).
RRDs are usually sought for structured, tabular data but can also be performed on unstructured data like plain text, for which several organizations have built effective, defensible pipelines. With the advent of AI, there has also been an increased focus on de-identifying DICOM medical images.
One approach to managing risk as part of an RRD is to use differential privacy techniques. These techniques inject calibrated randomness into released outputs (such as query results or datasets) so that the presence or absence of any single individual cannot be reliably inferred.
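As a minimal sketch of the idea, the classic Laplace mechanism adds noise scaled to a query's sensitivity. A simple count changes by at most 1 when one person is added or removed, so noise drawn from Laplace(0, 1/ε) yields ε-differential privacy for that count. The function names and parameters below are illustrative, not any particular library's API.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from Laplace(0, scale) via inverse-transform sampling."""
    u = 0.0
    while u == 0.0:          # avoid log(0) at the distribution's tail
        u = random.random()
    u -= 0.5                 # u is now in (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def noisy_count(true_count: int, epsilon: float) -> float:
    """Release a count under the Laplace mechanism.

    A count query has sensitivity 1, so a noise scale of 1/epsilon
    provides epsilon-differential privacy for this single release.
    """
    return true_count + laplace_noise(1.0 / epsilon)

print(noisy_count(42, epsilon=0.5))
```

Smaller ε means more noise and stronger privacy; larger ε means less noise and answers closer to the truth. Releasing many noisy outputs consumes a cumulative privacy budget, which is part of why such releases must be managed carefully.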
Finally, a de-identified or anonymized dataset can be validated for privacy protections with a motivated intruder test. In this test, someone tries to re-identify the data to determine whether doing so is practical.
Contact the experts at Privacy Analytics to learn more about data de-identification, anonymization, or other privacy topics.