Zephyr AI benefits from Eclipse software to support machine learning

Self-service de-identification solution from Privacy Analytics provides needed flexibility

Zephyr AI is a small, high-growth company with big ambitions. Among its goals are predicting adverse health outcomes and facilitating proactive healthcare interventions that improve patient outcomes and lower costs. By applying machine learning and artificial intelligence to real-world health data, the company seeks to support personalized medicine in serious and chronic diseases. When Zephyr AI partnered with a leading healthcare provider to develop predictive analytics for clinicians treating type 2 diabetes, it knew that it needed to build its own in-house data de-identification capabilities. An Eclipse software module from the Privacy Analytics Platform proved to be the flexible solution Zephyr AI needed.

The Challenge

An ongoing need to de-identify data for machine learning

Zephyr AI procures, ingests, and transforms data sets from multiple sources as it develops the machine learning algorithms that train and validate its predictive models. Because those data sets can contain protected health information (PHI), and the models cannot be trained with PHI, Zephyr AI must ensure that the data is de-identified in compliance with applicable privacy laws, such as the Health Insurance Portability and Accountability Act (HIPAA).

When Zephyr AI launched its first machine learning project that involved ingesting PHI from electronic health records (EHRs), it sought a data de-identification solution that would allow the company to be self-sufficient. “Paying an individual service fee every time we needed to de-identify data just wasn’t going to be scalable or feasible for our organization,” explained Amy Sheide, Zephyr AI’s VP of Data Platform and Partnerships. “While this happened to be our first instance of receiving PHI, it will not be our last. It is not possible to limit our ML use cases to a single de-identified data schema. We must generate multiple data sets to use in training our models. We needed a solution that empowered us to transform the data in multiple iterations to develop our algorithms.”

The Privacy Analytics Solution

A self-service data de-identification capability

Recognizing that Zephyr AI was a start-up company, Privacy Analytics offered flexible terms for deploying the HIPAA Expert Determination module of its Eclipse software in the cloud. When applied to a data set, the module transforms data to ensure that the resulting output is in full compliance with HIPAA privacy standards.

Importantly – and critically for Zephyr AI – the tool allows the user to experiment with different redactions to arrive at a data set that falls beneath the desired risk threshold. Nathaniel Tann, a data management specialist at Zephyr AI, noted, “We have an intuitive sense of what fields might be useful from a clinical perspective, but with the Privacy Analytics software we can run multiple experiments to confirm what would really be best. In this way, we’re able to balance data utility with reducing the likelihood of re-identification.” For example, the tool can reveal how retaining or removing a data field such as the patient’s age would affect both the value of the data and the risk of re-identification.
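To make the idea of "experimenting" with a field concrete, here is a minimal sketch of one common, simple risk measure: group records into equivalence classes that share the same values for every quasi-identifier, and take the worst-case re-identification risk as 1 divided by the size of the smallest class. This is not the method used by the Eclipse software, which this case study does not describe. The records, the field names (`age`, `sex`, `zip3`), and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical toy records; field names are illustrative only and are not
# taken from the Eclipse software or from any real data set.
RECORDS = [
    {"age": 54, "sex": "F", "zip3": "100"},
    {"age": 61, "sex": "F", "zip3": "100"},
    {"age": 47, "sex": "M", "zip3": "100"},
    {"age": 52, "sex": "M", "zip3": "100"},
    {"age": 47, "sex": "M", "zip3": "112"},
    {"age": 70, "sex": "M", "zip3": "112"},
]


def max_reid_risk(records, quasi_identifiers):
    """Worst-case re-identification risk: 1 / size of the smallest equivalence class.

    An equivalence class is a group of records that share the same values for
    every quasi-identifier. A record in a class of size k can be matched to a
    person with probability 1/k, so the smallest class sets the worst case.
    With no quasi-identifiers, all records fall into one class.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(classes.values())


def compare_field(records, quasi_identifiers, field):
    """Return (risk with `field` retained, risk with `field` removed)."""
    retained = max_reid_risk(records, quasi_identifiers)
    remaining = [q for q in quasi_identifiers if q != field]
    removed = max_reid_risk(records, remaining)
    return retained, removed


if __name__ == "__main__":
    qis = ["age", "sex", "zip3"]
    with_age, without_age = compare_field(RECORDS, qis, "age")
    print(with_age, without_age)  # 1.0 0.5
```

In this toy data every record is unique when age is retained (risk 1.0), while dropping age leaves each remaining combination of sex and ZIP prefix shared by two records (risk 0.5). Against a hypothetical threshold of 0.5, only the second version would pass, at the cost of losing a clinically useful field; that is the kind of utility-versus-risk trade-off the quote describes.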

The Eclipse software module includes a reporting feature that gives an overview of the de-identification output to help in making transformation decisions. Samantha Pindak, also a Zephyr AI data management specialist, added, “Very quickly, we can compare different transformations applied to the same data set and choose the ones that best fit our use case and what we want to accomplish with our model training.”

To ensure that Zephyr AI’s staff is able to use the software to its fullest, Privacy Analytics delivered an intensive, multi-day training program for representatives from different functions across Zephyr AI. The training opened with an in-depth explanation of Privacy Analytics’ de-identification methodology and how it supports various privacy regulations. It then introduced the Eclipse software and included a demonstration of its functionality with test data. The final phase was a workshop tailored to Zephyr AI’s use case in which staff were able to use the software on the company’s own data in a real-life application. Nathaniel and Samantha, who attended the training, described it as “very thorough” and “good preparation for running de-identification jobs post training.”

Privacy Analytics also supports Zephyr AI’s users with ongoing coaching to answer any questions that arise as they apply the tool and to offer clarification as needed concerning the results. “Privacy Analytics has always responded promptly when we had questions, and there’s comfort in knowing that we can reach out to a strong team if we need it,” said Samantha. “It’s been a very positive experience.”

The Client Results

On-demand ability to experiment

Zephyr AI’s data management specialists have been able to create and examine different de-identified data sets without having to rely on an expensive and time-intensive external service. In some instances, they have run as many as 10 iterations on the same data. Using the Eclipse software, they have provided half a dozen de-identified data sets to their machine learning team for model training and evaluation. Data sets like these can help Zephyr AI develop tools that extend and improve the quality of patients’ lives and enable clinicians to confidently predict and prevent adverse events.

Amy summed up the value of working with Privacy Analytics: “The main components of value for us have been scalability and applicability. We can use a single tool to perform multiple data transformations. So, we’re not limited to a single data model, and we don’t have to pay a consulting fee every time we need to de-identify a data set. As we enter into more partnerships with other data suppliers, we’ll be using the software again – and will be confident in our ability to do so.”
