Zephyr AI is a small, high-growth company with big ambitions. One of its aims is to predict adverse health outcomes and facilitate proactive healthcare interventions that improve patient outcomes and lower costs. By applying machine learning and artificial intelligence to real-world health data, the company aims to support personalized medicine in serious and chronic diseases. When Zephyr AI partnered with a leading healthcare provider to develop predictive analytics for clinicians treating type 2 diabetes, it knew that it needed to develop in-house data de-identification capabilities. An Eclipse software module from the Privacy Analytics Platform proved to be the flexible solution Zephyr AI needed.
An ongoing need to de-identify data for machine learning
Zephyr AI procures, ingests, and transforms data sets from multiple sources as it develops machine learning algorithms and trains and validates its predictive models. Because those data sets can contain protected health information (PHI) and the models cannot be trained with PHI, Zephyr AI must ensure that the data is de-identified in compliance with applicable privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA).
When Zephyr AI launched its first machine learning project that involved ingesting PHI from electronic health records (EHRs), it sought a data de-identification solution that would allow the company to be self-sufficient. “Paying an individual service fee every time we needed to de-identify data just wasn’t going to be scalable or feasible for our organization,” explained Amy Sheide, Zephyr AI’s VP of Data Platform and Partnerships. “While this happened to be our first instance of receiving PHI, it will not be our last. It is not possible to limit our ML use cases to a single de-identified data schema. We must generate multiple data sets to use in training our models. We needed a solution that empowered us to transform the data in multiple iterations to develop our algorithms.”
The Privacy Analytics Solution
A self-service data de-identification capability
Recognizing that Zephyr AI was a start-up company, Privacy Analytics offered flexible terms for deploying the HIPAA Expert Determination module of its Eclipse software in the cloud. When applied to a data set, the module transforms data to ensure that the resulting output is in full compliance with HIPAA privacy standards.
Critically for Zephyr AI, the tool allows the user to experiment with different redactions to arrive at a data set whose re-identification risk is beneath the desired threshold. Nathaniel Tann, a data management specialist at Zephyr AI, noted, “We have an intuitive sense of what fields might be useful from a clinical perspective, but with the Privacy Analytics software we can run multiple experiments to confirm what would really be best. In this way, we’re able to balance data utility with reducing the likelihood of re-identification.” For example, the tool can reveal how the value of the data and the possibility of re-identification would be affected by retaining or removing a data field such as the patient’s age.
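To make the idea of experimenting with a field such as age concrete, here is a minimal Python sketch. It is not the Eclipse software's method or API, which this case study does not describe. It assumes a simple risk measure (one divided by the size of the smallest group of records sharing the same indirectly identifying values), made-up records, hypothetical field names, and an arbitrary threshold of 0.5 chosen only for illustration.

```python
from collections import Counter

def reidentification_risk(records, quasi_identifiers):
    """Illustrative risk: 1 / size of the smallest group of records
    that share the same values for the given fields."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())

def generalize_age(record, width=10):
    """Return a copy of the record with exact age replaced by a band."""
    out = dict(record)
    low = (record["age"] // width) * width
    out["age"] = f"{low}-{low + width - 1}"
    return out

# Made-up records for illustration only (no real patient data).
records = [
    {"age": 31, "zip3": "021", "sex": "F"},
    {"age": 38, "zip3": "021", "sex": "F"},
    {"age": 42, "zip3": "021", "sex": "F"},
    {"age": 47, "zip3": "021", "sex": "F"},
    {"age": 52, "zip3": "021", "sex": "M"},
    {"age": 55, "zip3": "021", "sex": "M"},
    {"age": 63, "zip3": "021", "sex": "M"},
    {"age": 68, "zip3": "021", "sex": "M"},
]

THRESHOLD = 0.5  # hypothetical risk threshold, not a regulatory value

banded = [generalize_age(r) for r in records]

# Three candidate transformations of the same data set.
candidates = {
    "keep exact age": reidentification_risk(records, ["age", "zip3", "sex"]),
    "age in 10-year bands": reidentification_risk(banded, ["age", "zip3", "sex"]),
    "remove age": reidentification_risk(records, ["zip3", "sex"]),
}

for name, risk in candidates.items():
    status = "meets threshold" if risk <= THRESHOLD else "too risky"
    print(f"{name}: risk={risk:.2f} ({status})")
```

In this toy data, banding age meets the illustrative threshold while still retaining some age information, whereas removing the field lowers the measured risk further at the cost of losing it entirely. That is the kind of utility-versus-risk trade-off the quote describes.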
The Eclipse software module includes a reporting feature that gives an overview of the de-identification output to help in making transformation decisions. Samantha Pindak, also a Zephyr AI data management specialist, added, “Very quickly, we can compare different transformations applied to the same data set and choose the ones that best fit our use case and what we want to accomplish with our model training.”
To ensure that Zephyr AI’s staff could use the software to its fullest, Privacy Analytics delivered an intensive, multi-day training program for representatives from different functions across Zephyr AI. The training opened with an in-depth explanation of Privacy Analytics’ de-identification methodology and how it supports various privacy regulations. It then introduced the Eclipse software and included a demonstration of its functionality with test data. The final phase was a workshop tailored to Zephyr AI’s use case, in which staff used the software on the company’s own data in a real-life application. Nathaniel and Samantha, who attended the training, described it as “very thorough” and “good preparation for running de-identification jobs post training.”
Privacy Analytics also supports Zephyr AI’s users with ongoing coaching to answer any questions that arise as they apply the tool and to offer clarification as needed concerning the results. “Privacy Analytics has always responded promptly when we had questions, and there’s comfort in knowing that we can reach out to a strong team if we need it,” said Samantha. “It’s been a very positive experience.”
The Client Results
On-demand ability to experiment
Zephyr AI’s data management specialists have been able to create and examine different de-identified data sets without having to rely on an expensive and time-intensive external service. In some instances, they have run as many as 10 iterations on the same data. Using the Eclipse software, they have provided half a dozen de-identified data sets to their machine learning team to use in model training and evaluation. Data sets like these can help Zephyr AI to develop tools that extend and improve the quality of patients’ lives and enable clinicians to confidently predict and prevent adverse events.
Amy summed up the value of working with Privacy Analytics: “The main components of value for us have been scalability and applicability. We can use a single tool to perform multiple data transformations. So, we’re not limited to a single data model, and we don’t have to pay a consulting fee every time we need to de-identify a data set. As we enter into more partnerships with other data suppliers, we’ll be using the software again – and will be confident in our ability to do so.”