Privacy Analytics Lexicon gives medical researchers and data analysts access to valuable insights contained in unstructured data, while allowing data managers to safeguard personal information and ensure regulatory compliance.


The growing volumes of unstructured data from multiple sources heighten organizations’ susceptibility to potential data breaches. With Privacy Analytics Lexicon, data managers safeguard data assets by extending de-identification to unstructured data.


Automate the de-identification of unstructured data from multiple sources to gain richer analytic value and insight

Enhance the value of data assets by maintaining the relationships within de-identified data for more granular, higher quality analyses

Mitigate the risk of re-identification by detecting personal information and applying de-identification techniques that preserve the analytic quality of the data


Lexicon enables both redaction and de-identification of protected health information found in physician notes, CAT scans and other unstructured formats. Whether the data is stored in databases, XML, text files or documents, data analysts can de-identify it so that its valuable insights are retained. Compliance with risk-based de-identification standards can be operationalized.
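To illustrate the difference between redaction and de-identification, here is a minimal Python sketch. It is not Lexicon's implementation: the function names `redact` and `shift_dates`, the ISO date pattern and the fixed shift of 11 days are all assumptions chosen for the example. Redaction removes the value entirely, while shifting every date by the same offset hides the real dates but keeps the interval between them available for analysis.

```python
import re
from datetime import date, timedelta

# Assumption: dates appear in ISO form (YYYY-MM-DD). Real clinical text is far messier.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def redact(text):
    """Redaction: replace each date with a placeholder, losing all temporal detail."""
    return DATE_RE.sub("[DATE]", text)

def shift_dates(text, shift_days):
    """De-identification by date shifting: move every date by the same offset,
    so the real dates are hidden but intervals between events are preserved."""
    def shift(match):
        year, month, day = (int(part) for part in match.groups())
        # date() raises ValueError if the matched digits are not a valid calendar date.
        return (date(year, month, day) + timedelta(days=shift_days)).isoformat()
    return DATE_RE.sub(shift, text)

note = "Admitted 2015-03-02, discharged 2015-03-09."
print(redact(note))           # Admitted [DATE], discharged [DATE].
print(shift_dates(note, 11))  # Admitted 2015-03-13, discharged 2015-03-20.
```

In the shifted output the seven-day length of stay is still measurable, which is the kind of analytic value that plain redaction discards.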


Lexicon allows analytic professionals to configure the redaction or de-identification of unstructured data, preparing it for secondary use and analysis, by:

Discovering and annotating personal information residing in multiple text formats and fields in RDBMS sources, including IDs, credit card numbers, driver's licenses and medical codes


Improving the quality of de-identified data by replacing values with realistically similar values that support research and analysis

Allowing for better insight into temporal and geo-spatial data by preserving the granularity of dates and ZIP codes

Measuring and tuning precision and recall by comparing detection results against predetermined samples of a dataset
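The discovery and evaluation steps above can be sketched in a few lines of Python. This is a hedged illustration only, not how Lexicon works internally: the regular expressions in `PATTERNS`, the helper names `detect`, `span` and `precision_recall`, and the sample note are all hypothetical. The sketch runs a simple pattern-based detector over a text and scores it against a hand-annotated sample using exact span matching.

```python
import re

# Hypothetical detector patterns: 16-digit card-like numbers and ISO dates.
PATTERNS = {
    "CARD": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def detect(text):
    """Return the set of (start, end, label) spans matched by any pattern."""
    return {
        (m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    }

def span(text, value, label):
    """Build a gold annotation for the first occurrence of value in text."""
    start = text.index(value)
    return (start, start + len(value), label)

def precision_recall(predicted, gold):
    """Exact-match precision and recall of predicted spans against gold spans."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

note = "Seen 2015-03-02. Card 4111 1111 1111 1111 on file. Follow-up with Dr. Lee."
gold = {
    span(note, "2015-03-02", "DATE"),
    span(note, "4111 1111 1111 1111", "CARD"),
    span(note, "Lee", "NAME"),  # the patterns above have no rule for names
}

precision, recall = precision_recall(detect(note), gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here every detected span is correct, so precision is perfect, but the physician's name is missed, so recall falls to two thirds. Tuning would mean adding or adjusting detection rules and re-running the comparison on the same annotated sample.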

Our Integrated Discovery Programs rely on the ability to collaborate among scientists, clinicians, industry representatives and patient advocacy groups. An important part of this collaboration is data sharing. Lexicon software allows us to extract valuable information from unstructured data sources that would otherwise not be accessible.
- Francis Jeanson, Manager, Informatics & Analytics at Ontario Brain Institute

See how Privacy Analytics Lexicon can work for you.

Contact us to learn more.

Free Webinar: De-Identification 101

Join Privacy Analytics for a high-level introduction to de-identification and data masking.
Watch now

Free Download: De-Id 101
