Building a De-identification Pipeline to Support RWE
To use patient-level data for RWE initiatives, biopharmaceutical companies must either obtain patients' consent to share their data for secondary purposes or de-identify the data. While most patients are willing to share their data for use in research, they also expect their privacy to be maintained. As a result, de-identification in some form is recommended even when consent is obtained.
Furthermore, because biopharmaceutical companies operate in a global marketplace, it is prudent for them to follow standards and guidelines that pertain to the use and sharing of healthcare data. A number of respected and internationally recognized groups have published such guidance in recent years. Industry associations like the Health Information Trust Alliance (HITRUST) and advisory bodies like the Institute of Medicine (IOM) in the U.S. and the Council of Canadian Academies have all endorsed the use of a risk-based methodology to de-identify healthcare data.
Creating the Pipeline
Establishing a data de-identification pipeline helps apply risk-based de-identification automatically and consistently to data sets that are continuously updated. Companies that have established a data warehouse for RWE purposes need to refresh the data within it regularly so that the information remains current. A pipeline gives analysts and researchers timely access to the most recently available data in a de-identified format, something that would be nearly impossible with manual processes. A de-identification pipeline pulls in data from the source, an EMR database for example, on a regular basis (e.g., monthly or quarterly). The automated de-identification engine then performs a series of steps that transform the database variables, reducing the risk of re-identification and protecting patients' privacy.
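To make the refresh step concrete, here is a minimal Python sketch of what an engine might do to each incoming record. It is an illustration under stated assumptions, not a description of any particular product: the field names (patient_id, name, birth_date, zip, diagnosis), the SECRET_KEY value, and the specific transformations (keyed hashing of the identifier, keeping only the birth year and the first three ZIP digits, dropping the name) are all hypothetical choices. In a risk-based approach, the actual transformations would be driven by the measured risk.

```python
import hashlib
import hmac

# Hypothetical key; in practice this would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash so the same patient always
    receives the same pseudonym on every refresh."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify_record(record: dict) -> dict:
    """Build a new record containing only transformed or non-identifying fields.
    Anything not copied explicitly (such as the name) is dropped."""
    return {
        "patient_key": pseudonymize(record["patient_id"]),
        "birth_year": record["birth_date"][:4],  # assumes YYYY-MM-DD dates
        "zip3": record["zip"][:3],               # keep first three digits only
        "diagnosis": record["diagnosis"],
    }


def refresh(source_rows, warehouse):
    """Simulate one scheduled refresh: only de-identified rows reach the warehouse."""
    for row in source_rows:
        warehouse.append(deidentify_record(row))
    return warehouse
```

Because the warehouse is only ever written to through refresh, identifiable values never land in it, which mirrors the property described above.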
As with any risk-based de-identification approach, the first step is to assess the risk to privacy by looking at who will have access to the data and what security and privacy controls are in place to protect it from unauthorized access. The next step is to classify the variables in the data that could reveal an individual's identity. While a data warehouse may consist of hundreds of data tables with thousands of variables, only some of these are relevant from a privacy perspective. The final step is to map the data, which ensures that the de-identified data maintains the integrity of the original database. With the work of the de-identification engine complete, the de-identified RWD can be exported to the data warehouse, where analyses can be run. The pipeline limits the risk of a successful re-identification attack on the data warehouse because the warehouse only ever accepts data that is de-identified.
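The classification and risk-assessment steps can be sketched in Python as follows. The article does not name a specific risk metric, so this example assumes one common choice: group records by their quasi-identifier values and take the reciprocal of the smallest group size as the worst-case re-identification risk. The variable names, the three-way classification (direct, quasi, other), and any threshold passed to meets_threshold are hypothetical. In practice the acceptable threshold would depend on who can access the data and what controls are in place.

```python
from collections import Counter

# Hypothetical classification of variables from a privacy perspective.
VARIABLE_CLASSES = {
    "patient_id": "direct",  # identifies a person on its own
    "name": "direct",
    "birth_year": "quasi",   # identifying only in combination
    "zip3": "quasi",
    "sex": "quasi",
    "diagnosis": "other",    # not treated as identifying in this sketch
}


def quasi_identifiers(classes: dict) -> list:
    """Return the variables classified as quasi-identifiers, in sorted order."""
    return sorted(name for name, kind in classes.items() if kind == "quasi")


def max_reidentification_risk(rows: list, quasi: list) -> float:
    """Worst-case risk: 1 / size of the smallest group of records that share
    the same quasi-identifier values. An empty data set has zero risk."""
    if not rows:
        return 0.0
    sizes = Counter(tuple(row[q] for q in quasi) for row in rows)
    return 1.0 / min(sizes.values())


def meets_threshold(rows: list, quasi: list, threshold: float) -> bool:
    """True if the worst-case risk does not exceed the chosen threshold."""
    return max_reidentification_risk(rows, quasi) <= threshold
```

A record whose combination of quasi-identifiers is unique in the data set yields a risk of 1.0, which signals that further generalization or suppression is needed before export.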
Automation, Yes – But Also Compliance
Establishing a de-identification pipeline not only lets biopharmaceutical companies automate the de-identification process when refreshing the content of their data warehouse, it also helps them operate in compliance with privacy legislation. By engaging with experts in the field of data de-identification, companies can implement a pipeline that follows privacy requirements such as the HIPAA Privacy Rule. In the event of a data breach, being able to demonstrate practices that comply with those requirements gives organizations a defensible position.
Almost there – our last piece in our series on RWE: Final Thoughts on RWE.