De-identifying Unstructured Data
A wealth of analytic value is waiting to be tapped by de-identifying unstructured data resources.
Often overlooked, unstructured data is a rich source of information. It includes the text found in medical and insurance forms, physician and nurse notes, consultation letters, and radiology and pathology reports, to name but a few. Several years ago, IBM forecast that 80% of medical data would be unstructured, and that this body of data would double every five years. This means we are now looking at petabytes of data that sit encrypted and siloed away. Interest in de-identifying unstructured data to unlock its potential is growing.
While unstructured data may contain a treasure trove of information, it is not without its own challenges. Consider electronic health records (EHRs) as an example. EHRs are just one source of unstructured data, but they contain a wealth of free text. That free text can exist as fields in a database, as standardized XML files, or as simple text file feeds from medical records or medical devices. Its value lies in the summaries, observations, notes, and transcriptions it contains.
Using unstructured text data for secondary purposes is slowly becoming a reality. De-identified unstructured data can be used to lower healthcare costs, improve patient outcomes, improve the quality and delivery of care, and add new revenue streams for the organization. For these purposes and more, de-identifying unstructured data is crucial, but it must be done well. Given the varied and loosely structured formats that unstructured data takes, a solid plan is required to de-identify it properly and mitigate risk.
Several major considerations need to be addressed before leveraging unstructured data for secondary use. Get them in our white paper, De-Identification of Unstructured Data, and learn more about how your organization can benefit from the wealth of unstructured data you already have.