Unlocking Insights in Unstructured Data

Unstructured data is the new frontier in healthcare, and more organizations are recognizing that analyzing it is the future. Some estimates put as much as 90% of all data as unstructured, including social media, text files, PDFs, Word documents, video, and more. Unlocking the insights in unstructured data for healthcare, however, is no small feat.

Secondary data use, the sharing of personal information outside of direct healthcare delivery, is a significant challenge. Protecting personal health information requires compliance with HIPAA and with the legal requirements of other jurisdictions, many of which have data breach notification laws. The costs of breach notification are high, representing on average $200 per individual and as much as $7 million per organization. Despite these risks, the argument for secondary data use in healthcare is convincing. Redaction is a common practice when sharing unstructured data, but having no data is not a solution when sharing for research and analytics. The risks can be mitigated much more effectively with proper de-identification.
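The difference between redaction and de-identification can be illustrated with a small sketch. It is not how any particular product works: the two regular expressions, the sample note, and the function names are assumptions made up for this example, and real PHI detection would need far broader coverage.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real notes contain
# names, addresses, record numbers, and many other identifier types.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def redact(text: str) -> str:
    """Blank out every match, which also removes its analytic value."""
    text = DATE.sub("[REDACTED]", text)
    return PHONE.sub("[REDACTED]", text)

def deidentify(text: str) -> str:
    """Generalize dates to the year and replace phone numbers with a typed tag."""
    text = DATE.sub(lambda m: m.group(1), text)
    return PHONE.sub("[PHONE]", text)

# Hypothetical clinical note used only for demonstration.
note = "Patient seen 2015-03-14 for chest pain. Callback: 613-555-0142."
print(redact(note))      # Patient seen [REDACTED] for chest pain. Callback: [REDACTED].
print(deidentify(note))  # Patient seen 2015 for chest pain. Callback: [PHONE].
```

The redacted note tells a researcher nothing about when the visit happened, while the de-identified note still supports analysis by year.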

Secondary data enriches healthcare research and improves the quality of care and its delivery, while accelerating the marketing of medical innovations. It can also improve the performance of providers, helping to lower overall healthcare costs. Leveraging data for secondary purposes also creates revenue streams from the collection and sale of healthcare data to third parties such as drug and device manufacturers, governments, payers, and researchers.

Sharing Any Data is a Risk Management Exercise

The challenge for many organizations leveraging unstructured data for secondary use is methodically evaluating the risk of sharing that data. Responsible data sharing is a risk management exercise: once the risk of re-identification is determined, the appropriate level of de-identification can be established. De-identification of unstructured health data should balance privacy and utility. So how should organizations establish an approach that enables a repeatable, scalable, and compliant analytic pipeline that can leverage unstructured data for secondary use?

A systematic approach is required, one that automates de-identification and risk analysis and is governed by rigorous compliance practices. The underlying management of unstructured health data must incorporate de-identification as a best practice whenever that data is used for secondary purposes. As a result, organizations can establish a common approach to de-identification. This approach will create a steady pipeline of granular, high-quality data for research efforts and truly bring healthcare analytics into the 21st century.
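As a rough illustration of letting measured risk set the level of de-identification, the sketch below groups records by their quasi-identifiers, takes one divided by the size of the smallest group as the worst-case re-identification risk, and coarsens ages until that risk falls to a chosen threshold. This is one simple, commonly used risk model, not necessarily the one any given tool applies. The field names, sample records, band widths, and 0.5 threshold are all assumptions for this example.

```python
from collections import Counter

def max_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group of records
    that share the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())

def generalize_age(record, width):
    """Replace an exact age with the lower bound of a band `width` years wide."""
    out = dict(record)
    out["age"] = (record["age"] // width) * width
    return out

def deidentify_to_threshold(records, quasi_identifiers, threshold,
                            widths=(1, 5, 10, 20)):
    """Try progressively coarser age bands until the risk meets the threshold."""
    for width in widths:
        candidate = [generalize_age(r, width) for r in records]
        risk = max_risk(candidate, quasi_identifiers)
        if risk <= threshold:
            return candidate, width, risk
    raise ValueError("no tested generalization meets the risk threshold")

# Hypothetical records, e.g. fields extracted from tagged clinical notes.
records = [
    {"age": 31, "region": "K1A"},
    {"age": 34, "region": "K1A"},
    {"age": 47, "region": "K1A"},
    {"age": 49, "region": "K1A"},
]
shared, width, risk = deidentify_to_threshold(
    records, ["age", "region"], threshold=0.5
)
print(width, risk)  # 5 0.5
```

Stopping at the first band width that meets the threshold keeps the data as granular as the risk target allows, which is the privacy and utility balance described above.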

Software tools are available to help leverage unstructured data for secondary use. Privacy Analytics Lexicon software tags PHI in PDFs, XML, Word files, and more, and offers a flexible, secure solution for de-identification.
