Rocket Science Happens Every Day
On April 20, 2016, IT News published comments from Australian Information Commissioner Timothy Pilgrim in its article “Pilgrim warns data de-identification is ‘rocket science.’” Pilgrim warns that data de-identification is an elusive goal and that unless it meets strict protection demands, it should not be considered. He compares it to ‘rocket science’, borrowing from the adage that when something is simple, “it’s not rocket science.”
Well, Mr. Pilgrim, you are correct. Data de-identification is a science and should be done well. To claim data has been de-identified when identifying details remain in the data is irresponsible and criminal. But here’s the thing: rocket science happens every day. It just happens to be handled by actual rocket scientists. The same can be said for data de-identification. Leave it to scientists who understand exactly what it is and what it isn’t.
To make his case, Pilgrim pointed to some high-profile cases in the US where researchers have been able to re-identify patients. These are often-cited cases; one is the study done by de Montjoye et al. The problem with this study is that the de-identification was not actually done well. Furthermore, when de-identification was applied to a whole dataset versus the sample [used by de Montjoye et al.], anonymity improved exponentially. It’s easy to make a case for de-identification being ineffective when using a poorly done example.
Already we see that this ‘rocket science’ is changing the future. The American Society of Clinical Oncology launched CancerLinQ, a learning health system intended to connect and analyze real-world cancer care data. Using properly de-identified data, the team at CancerLinQ is able to safely share important, sensitive data with the oncology community. With CancerLinQ, a physician will be able to search through de-identified data on similar patients to determine how others have responded to the planned treatment. In other cases, this data and analytics can help inform the patient’s treatment decision when medical literature is not yet available.
Real de-identification scientists, with years of experience, know how to make people in the data disappear. They incorporate risk measurements, which show how much de-identification is needed, and then verify that the right amount of de-identification has been applied. This process, our process, has led to zero re-identifications.
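To make the measure-then-verify idea concrete, here is a minimal sketch in Python. It is not the process described above, only an illustration of one common way to quantify risk: group records by their quasi-identifiers and take the worst-case risk as one over the size of the smallest group. The field names, the sample records, the `generalize_age` helper, and the 0.5 threshold are all hypothetical and chosen only for the example.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1.0 / min(groups.values())


def generalize_age(record, width=10):
    """Hypothetical de-identification step: replace an exact age with a band such as '30-39'."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Made-up sample data, for illustration only.
records = [
    {"age": 34, "zip3": "100", "diagnosis": "A"},
    {"age": 37, "zip3": "100", "diagnosis": "B"},
    {"age": 52, "zip3": "100", "diagnosis": "C"},
    {"age": 58, "zip3": "100", "diagnosis": "A"},
]
QUASI_IDENTIFIERS = ["age", "zip3"]
THRESHOLD = 0.5  # assumed acceptable risk; a real threshold depends on the release context

# Measure, de-identify, then measure again to verify.
risk_before = max_reidentification_risk(records, QUASI_IDENTIFIERS)
generalized = [generalize_age(r) for r in records]
risk_after = max_reidentification_risk(generalized, QUASI_IDENTIFIERS)

print(f"risk before: {risk_before:.2f}, risk after: {risk_after:.2f}")
print("meets threshold:", risk_after <= THRESHOLD)
```

In this toy data every exact age is unique, so the risk starts at its maximum. Banding ages into decades puts two people in each group, and the second measurement confirms whether the change was enough.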
Mr. Pilgrim was almost right; de-identification is very close to rocket science. So leave it in the hands of scientists and experts. As CancerLinQ demonstrates, they are doing great things.