The Rise of Big Data in Healthcare

Over the past decade, a metamorphosis has taken place in how health information is collected and stored. Handwritten prescriptions and paper charts have given way to real-time medical monitoring and today's electronic health records (EHRs).

Along with these changes to the clinical record, health data is no longer captured solely at the point of care. Increasingly, it is augmented by information from patient activities such as fitness tracking devices, mobile applications, social media posts, and internet searches. Other sources also contribute to this expanding digital data universe, including gene sequencing, medical research findings, practice guidelines, and administrative data from billing and insurance claims. In fact, healthcare is one of the fastest-growing segments of digital information on the planet, with an estimated annual growth rate of 48%. At this rate, the amount of healthcare data worldwide is expected to reach about 2,000 exabytes (2 trillion gigabytes) by 2020.
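To put the 48% figure in perspective, the short Python sketch below works through the arithmetic of constant compound growth. It assumes the rate stays fixed from year to year and uses decimal (SI) units, where 1 EB = 10^18 bytes and 1 GB = 10^9 bytes. The article does not give a base-year volume, so the projection starts from a normalized value of 1.0. The helper names are illustrative only.

```python
import math

ANNUAL_GROWTH = 0.48  # 48% per year, the estimate quoted in the text
GB_PER_EB = 10**9     # decimal (SI) units: 1 EB = 10^18 B, 1 GB = 10^9 B

def doubling_time_years(rate: float) -> float:
    """Years for a quantity growing at a constant compound rate to double."""
    return math.log(2) / math.log(1 + rate)

def project(volume: float, rate: float, years: int) -> float:
    """Volume after `years` of compound growth at `rate` (assumed constant)."""
    return volume * (1 + rate) ** years

print(round(doubling_time_years(ANNUAL_GROWTH), 2))  # 1.77 (years)
print(2_000 * GB_PER_EB)                             # 2000000000000, i.e. 2 trillion GB
print(round(project(1.0, ANNUAL_GROWTH, 5), 2))      # 7.1 (x the starting volume)
```

Under these assumptions, the data volume doubles roughly every 21 months and grows about sevenfold in five years. The unit conversion also confirms that 2,000 exabytes equals 2 trillion gigabytes.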

The enormous quantity of information being produced, the speed at which it is generated, and the mix of formats in which it is captured correspond to the three defining characteristics of big data: volume, velocity, and variety.

A fourth ‘V’, veracity, is sometimes added when discussing big data in healthcare. Veracity, or data quality, refers to how accurate and trustworthy the data are, which determines whether any analysis built on them can be credible. The quality of healthcare data, particularly unstructured data, can be highly variable. Beyond the issues typical of any text data, such as typos, medical text uses many abbreviations, offers numerous ways to express the same health condition, and varies widely in the level of detail in practitioners' notes. Because identifying details can therefore appear in many inconsistent forms, de-identifying medical text is less clear-cut than de-identifying other forms of data. Effective text anonymization requires expertise not only in masking and de-identification but also in the unique characteristics of this lexicon.
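The minimal, rule-based Python sketch below shows why variable clinical text complicates masking. The date patterns and the `mask_dates` helper are hypothetical illustrations, not Privacy Analytics' method. The pattern list is deliberately incomplete, as any fixed list will be against real notes.

```python
import re

# Hypothetical, deliberately incomplete set of date formats found in notes.
DATE_PATTERNS = [
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",  # 3/14/2019, 03/14/19
    r"\b\d{4}-\d{2}-\d{2}\b",        # 2019-03-14
    # Abbreviated or full month name: "Mar 28 2019", "March 28, 2019"
    r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.? \d{1,2},? \d{4}\b",
]
DATE_RE = re.compile("|".join(DATE_PATTERNS), re.IGNORECASE)

def mask_dates(note: str) -> str:
    """Replace every date matched by DATE_RE with a [DATE] placeholder."""
    return DATE_RE.sub("[DATE]", note)

note = "Pt seen 3/14/2019, f/u Mar 28 2019. Hx of MI on the 14th of March."
print(mask_dates(note))
# Pt seen [DATE], f/u [DATE]. Hx of MI on the 14th of March.
```

The first two dates are masked, but "the 14th of March" slips through because no rule anticipates that phrasing. Each new variant, abbreviation, or typo calls for another rule. This is why the text stresses expertise in the medical lexicon as well as in masking techniques.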

The Rise of Big Data in Healthcare is the second in the Big Data Analytics Series by Privacy Analytics. Next: Challenges in Big Data Analytics.
