The Industry’s First Anonymization Solution

National Institutes of Health Selects Privacy Analytics to Anonymize Unstructured Data for Analysis

WASHINGTON, D.C. (November 12, 2013) – Building on its award-winning anonymization software, PARAT, Privacy Analytics Inc. today announced the availability of PARAT 5.3, which extends its de-identification and masking capabilities to unstructured data. According to one organization, 80 percent of medical record data will be unstructured within two years, making it a critical source of new insights, innovation and knowledge for research hospitals and organizations, medical device companies, and insurance and medical claims providers, among others.[1] The challenge for many organizations is accessing and analyzing both unstructured and structured data while ensuring that the personal information it contains is robustly protected under HIPAA and other legal requirements.

The Biomedical Translational Research Information System (BTRIS) at the National Institutes of Health (NIH) Clinical Center, a biomedical research facility and an agency of the United States Department of Health and Human Services (DHHS), recently purchased (through a competitive bidding process) PARAT Text software, the standalone module of PARAT 5.3.  The BTRIS trans-NIH clinical research data repository plans to anonymize unstructured text data from more than 400,000 patients for research purposes using PARAT Text.   The government agency selected PARAT Text to augment the data currently available in “de-identified format” within the BTRIS repository. The addition of unstructured text data without personal identifiers to the repository will allow researchers access to NIH Clinical Center clinical documentation from 1976 to the present.  Access to clinical documentation in addition to structured data in de-identified form allows researchers to test hypotheses for new research, confirm potential sample sizes for proposed research and find collaborators for cross-disciplinary research studies.

“PARAT 5.3 allows organizations to safeguard their data, while at the same time enabling them to gain richer analysis from an integrated solution that marries structured and unstructured anonymization,” said Khaled El Emam, CEO, Privacy Analytics Inc. “The software matches structured and unstructured data values, to ensure the consistency and integrity of data, while also tying masked personal information to corresponding anonymized unstructured text for richer analysis.  This allows privacy officers to safeguard personal information across their enterprise and statisticians and data analytic professionals to leverage it for secondary use.”

Using a risk-based methodology to anonymize personal information in accordance with HIPAA and other legal requirements, PARAT 5.3 automates the masking and de-identification of data in standard database tables and in text or XML-based documents. Unstructured data very often resides in text-based formats, for example in:

  1. Electronic health records, where personal information resides in free-form text, is often exported in XML format, and must be anonymized for analysis;
  2. Medical devices, where unstructured data or free-form text from machine “dumps” or downloads (e.g., from x-ray machines or CAT scanners) is sent to one or more databases for analysis; and
  3. Online forums, where patients or providers discuss their conditions or cases, and this narrative needs to be anonymized to facilitate sentiment analysis and other forms of information extraction.
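
As a rough illustration of what masking identifiers in free-form text involves, the Python sketch below replaces phone numbers and MRN-like tokens with placeholder tags. It is not a description of how PARAT Text works: the regular expressions, the `mask_free_text` function name and the `[PHONE]` and `[MRN]` tags are all assumptions made for this example, and real clinical text would need far more robust detection.

```python
import re

# Hypothetical patterns for illustration only; they are not taken from PARAT.
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")
MRN_RE = re.compile(r"\bMRN[:\s#]*\d{6,10}\b", re.IGNORECASE)


def mask_free_text(text: str) -> str:
    """Replace phone numbers and MRN-like tokens with placeholder tags."""
    # Mask MRNs first so their digits are never considered as phone numbers.
    text = MRN_RE.sub("[MRN]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


note = "Patient called from 613-555-0142 regarding MRN: 00482913 follow-up."
masked_note = mask_free_text(note)
print(masked_note)
```

The surrounding narrative is left intact, which is what keeps the text usable for sentiment analysis or information extraction after the identifiers are gone.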

PARAT masks direct identifiers, such as names, phone numbers and medical record numbers (MRNs), rendering them unrecognizable; because of the resulting obfuscation, these fields can no longer be analyzed. In the same database, however, PARAT 5.3 can de-identify or alter indirect personal identifiers, such as date of birth, medical facility name, and ZIP or postal code, to enable high-quality aggregate and individual-level analysis while still protecting personal information.
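
The distinction between masking direct identifiers and de-identifying indirect ones can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not PARAT's algorithm: the field names, the salted-hash pseudonym, and the fixed generalization rules (birth year, three-digit ZIP prefix) are hypothetical. A risk-based approach would choose the level of generalization from a measured re-identification risk, which this example does not attempt.

```python
import hashlib
from datetime import date


def mask_value(value: str, salt: str) -> str:
    """Replace a direct identifier with a pseudonym (assumed: salted SHA-256)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


def deidentify_record(record: dict, salt: str) -> dict:
    """Mask direct identifiers and generalize indirect ones."""
    return {
        # Direct identifiers: masked, so no longer analyzable.
        "name": mask_value(record["name"], salt),
        "mrn": mask_value(record["mrn"], salt),
        # Indirect identifiers: generalized, so still useful for analysis.
        "birth_year": record["date_of_birth"].year,
        "zip3": record["zip"][:3],
        # Non-identifying clinical data passes through unchanged.
        "diagnosis": record["diagnosis"],
    }


record = {
    "name": "Jane Example",
    "mrn": "00482913",
    "date_of_birth": date(1976, 5, 17),
    "zip": "20892",
    "diagnosis": "hypertension",
}
safe = deidentify_record(record, salt="example-salt")
print(safe)
```

Because the same input and salt always produce the same pseudonym, a masked value in a database table could be matched to the same masked value in a text document, which is one simple way to keep structured and unstructured data consistent.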

About Privacy Analytics

Privacy Analytics provides organizations with enterprise software to safeguard, operationalize and enable data for secondary use. It is the only company to offer its customers software, peer-reviewed methodology and value-added services that protect the privacy of individuals when conducting critical research and complex analytics. PARAT and PARAT Text are the industry’s most comprehensive software for enabling the analysis of data for secondary use, integrating the anonymization of structured and unstructured information from multiple sources in compliance with HIPAA and other legal requirements.

For more information, please contact us:

1.613.369.4313 x2

[1] Source: ZDNet, April 9, 2013, “Within Two Years, 80% of All Medical Data Will Be Unstructured.”
