Unlocking Insights in Unstructured Data
Unstructured data is the new frontier in healthcare, and a growing number of organizations see its analysis as central to their future. Some estimates put unstructured data at 90% of all data, including social media posts, text files, PDFs, Word documents, video, and more. Unlocking the insights in unstructured healthcare data, however, is no small feat.
Secondary data use, the sharing of personal information outside of direct healthcare delivery, is a significant challenge. Protecting personal health information requires compliance with HIPAA and the legal requirements of other jurisdictions, many of which have data breach notification laws. Breach notification is expensive, costing on average $200 per individual and as much as $7 million per organization. Despite these risks, the argument for secondary data use in healthcare is convincing. Redaction is a common practice when sharing unstructured data, but it strips out the very information that research and analytics depend on, and having no data is not a solution. Proper de-identification mitigates these risks far more effectively while keeping the data useful.
Secondary data use enriches healthcare research, improves the quality and delivery of care, and speeds medical innovations to market. It can also improve provider performance, helping to lower overall healthcare costs. Leveraging data for secondary purposes also creates revenue streams through the collection and sale of healthcare data to third parties such as drug and device manufacturers, governments, payers, and researchers.
Sharing Any Data is a Risk Management Exercise
For many organizations that want to use unstructured data for secondary purposes, the challenge is evaluating the risk of sharing that data methodically. Responsible data sharing is a risk management exercise: once the risk of re-identification has been measured, the appropriate level of de-identification can be established. De-identification of unstructured health data should balance privacy and utility. So how can organizations build a repeatable, scalable, and compliant analytic pipeline that leverages unstructured data for secondary use?
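To make the "measure risk, then choose the level of de-identification" step concrete, here is a minimal Python sketch under stated assumptions. It works on structured rows (for example, fields already extracted from unstructured text), treats the hypothetical fields `age`, `sex`, and `zip3` as quasi-identifiers, and uses a common simplified measure in which each record's re-identification risk is 1 divided by the size of its equivalence class (the group of records sharing the same quasi-identifier values). The function names, candidate age-band widths, and threshold are illustrative choices, not Privacy Analytics' methodology.

```python
from collections import Counter

# Assumed quasi-identifiers: fields an attacker could plausibly link to outside data.
QUASI_IDENTIFIERS = ("age", "sex", "zip3")


def equivalence_class_risk(records, qids=QUASI_IDENTIFIERS):
    """Return (max_risk, avg_risk), where each record's risk is 1 / size of its class."""
    keys = [tuple(r[q] for q in qids) for r in records]
    class_sizes = Counter(keys)
    risks = [1.0 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)


def generalize_age(records, width):
    """Replace exact ages with bands of the given width, e.g. 34 -> '30-39' for width 10."""
    banded = []
    for r in records:
        low = (r["age"] // width) * width
        banded.append({**r, "age": f"{low}-{low + width - 1}"})
    return banded


def least_generalization(records, threshold, widths=(1, 5, 10, 20)):
    """Try the narrowest bands first; return the first width whose max risk meets the threshold."""
    for width in widths:
        max_risk, _ = equivalence_class_risk(generalize_age(records, width))
        if max_risk <= threshold:
            return width, max_risk
    return None, None  # no tested level suffices; other measures would be needed


# Toy data, invented for illustration.
records = [
    {"age": 34, "sex": "F", "zip3": "021"},
    {"age": 36, "sex": "F", "zip3": "021"},
    {"age": 52, "sex": "M", "zip3": "100"},
    {"age": 58, "sex": "M", "zip3": "100"},
]

print(equivalence_class_risk(records))               # (1.0, 1.0): every record is unique
print(least_generalization(records, threshold=0.5))  # (10, 0.5)
```

Trying the mildest generalization first is the design choice that reflects the privacy/utility balance: the data is coarsened only as far as the chosen threshold requires. In a real pipeline the threshold, the quasi-identifiers, and the attack model would be set through the organization's compliance process.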
A systematic approach is required: one that automates de-identification and risk analysis and is governed by rigorous compliance practices. The management of unstructured health data must treat de-identification as a best practice whenever that data is used for secondary purposes. Doing so gives organizations a common, consistent approach to de-identification, which in turn creates a steady pipeline of granular, high-quality data for research and truly brings healthcare analytics into the 21st century.
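The difference between redaction and de-identification can be sketched in a few lines of Python. This is a toy illustration, not how Lexicon or any production tool works: the regular expressions for dates, phone numbers, and medical record numbers (MRNs) are hypothetical and cover only a sliver of real protected health information (PHI), and the key is a placeholder. `redact` blanks every match, while `deidentify` preserves analytic value by generalizing dates to the year, masking phone numbers, and replacing each MRN with a keyed HMAC surrogate so the same patient links consistently across documents without exposing the real number.

```python
import hashlib
import hmac
import re

# Hypothetical, deliberately narrow PHI patterns for illustration only.
DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")   # e.g. 03/14/2019
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")            # e.g. 555-123-4567
MRN = re.compile(r"\bMRN[:\s]*(\d{6,10})\b")            # e.g. MRN: 1234567


def redact(text):
    """Blanket redaction: every detected identifier is removed outright."""
    for pattern in (DATE, PHONE, MRN):
        text = pattern.sub("[REDACTED]", text)
    return text


def deidentify(text, key: bytes):
    """Transform identifiers instead of deleting them, keeping some analytic value."""
    def mrn_surrogate(match):
        # Keyed hash: the same MRN always maps to the same surrogate under one key,
        # but the original number cannot be recovered without the key.
        digest = hmac.new(key, match.group(1).encode(), hashlib.sha256).hexdigest()
        return f"MRN: PT-{digest[:10]}"

    text = MRN.sub(mrn_surrogate, text)
    text = DATE.sub(lambda m: m.group(3), text)  # generalize full dates to the year
    text = PHONE.sub("[PHONE]", text)            # no analytic value, so mask it
    return text


note = "Seen 03/14/2019, MRN: 1234567, call 555-123-4567."
print(redact(note))  # Seen [REDACTED], [REDACTED], call [REDACTED].
print(deidentify(note, key=b"replace-with-a-secret-key"))
```

The redacted note no longer says when the visit happened or whether two notes concern the same patient. The de-identified note keeps both facts in a less identifying form, which is the trade-off the risk measurement step is meant to calibrate.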
Software tools exist to help organizations leverage unstructured data for secondary use. For example, Privacy Analytics' Lexicon software tags PHI in PDFs, XML, Word files, and other formats, and offers a flexible, secure solution for de-identification.