Sensitive data can be reused in many ways to improve healthcare services, uncover new insights and opportunities that can influence healthcare strategies, and develop data products that address societal health needs. Health data can be particularly sensitive as it can reveal a lot about an individual’s medical history and lifestyle.
There are many dimensions to the safe and responsible reuse of data, which can also be thought of in terms of defense in depth, ie, protecting data from unauthorized access and misuse through layers of administrative and technical controls. Technical privacy models are one such control: they are used to assess the risk of disclosure and determine appropriate data transformations that mitigate those risks.
Differential privacy is a technical privacy model that protects individuals by requiring that the information contributed by any individual does not significantly affect the output. More specifically, differential privacy is a mathematical property that defines an adjustable information limit.
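Formally, a randomized mechanism M is ε-differentially private if, for any two datasets D and D′ that differ in one individual's record, and for any set of possible outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \,\Pr[M(D') \in S]
```

The parameter ε is the adjustable information limit: smaller values of ε force the outputs on D and D′ to be more nearly indistinguishable, so any one individual's contribution matters less.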
Differential Privacy and Risk Metrics
By augmenting differential privacy with a framework of risk metrics and other associated benchmarks, we can enable safe and responsible data sharing. Risk metrics are essential tools as they allow organizations to measure and manage the potential risks associated with various data sharing strategies.
Mechanisms that are differentially private protect outputs (queries or datasets) by incorporating a level of uncertainty through randomness (eg, noise injection, permutation, shuffling). The randomness produces indistinguishable outputs up to a defined information limit.
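As a minimal sketch of noise injection, the Laplace mechanism adds noise scaled to a query's sensitivity divided by the privacy budget ε. The function name and example values below are illustrative, not a specific product implementation:

```python
import math
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise whose scale is sensitivity / epsilon."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5                                  # uniform on [-0.5, 0.5)
    # Inverse-transform sampling of a Laplace(0, scale) draw.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# A counting query has sensitivity 1: adding or removing one person
# changes the count by at most 1.
noisy_count = laplace_mechanism(true_value=128, sensitivity=1.0, epsilon=0.5)
```

A smaller ε yields a larger noise scale, and therefore outputs that are harder to distinguish, which is exactly the trade-off the privacy budget controls.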
The privacy budget is a form of information limit on how much can be inferred or learned about the individuals in a dataset, and the protection it affords is governed by sensitivity, ie, how much any one individual's contribution can change the output. The same privacy budget can therefore result in different levels of protection for different datasets, because their sensitivities differ.
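The dependence on sensitivity can be made concrete with the Laplace mechanism, where the noise scale is sensitivity divided by ε. The dollar cap below is an illustrative assumption, not a prescribed value:

```python
def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Noise scale required for epsilon-DP under the Laplace mechanism."""
    return sensitivity / epsilon

epsilon = 1.0
# A count changes by at most 1 when one person is added or removed.
count_scale = laplace_scale(sensitivity=1.0, epsilon=epsilon)         # 1.0
# A sum of incomes capped at $100,000 can change by up to $100,000.
income_scale = laplace_scale(sensitivity=100_000.0, epsilon=epsilon)  # 100000.0
```

Both queries spend the same budget ε = 1, yet the income sum needs 100,000 times more noise to achieve the same guarantee, which is why identical budgets can mean very different practical protection.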
Benchmarks, expressed as statistical thresholds, provide an objective way to assess whether data sharing is safe. For example, a minimum group size of 10 requires that all information for a group of data subjects with the same set of identifying values be represented by at least 10 individual contributors, such as 10 women aged 45.
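A minimum group size check can be sketched as counting records that share the same combination of identifying values and flagging any combination below the threshold. The field names and records here are hypothetical:

```python
from collections import Counter

def small_groups(records, quasi_identifiers, k=10):
    """Return combinations of identifying values shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {group: n for group, n in counts.items() if n < k}

records = [
    {"sex": "F", "age_band": "45-49"},
    {"sex": "F", "age_band": "45-49"},
    {"sex": "M", "age_band": "60-64"},
]
# Both combinations fall below a minimum group size of 10 and would be flagged.
risky = small_groups(records, ["sex", "age_band"], k=10)
```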
By using a minimum group size to inform the level of randomness required, the privacy budget of differential privacy can be set so that it meets existing benchmarks and sharply reduces the probability of singling out any individual's contribution of data. In other words, the group-size risk threshold determines the privacy budget for that dataset.
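One illustrative way to derive a budget from a group size, a sketch rather than the only possible mapping: with k equally likely candidates (prior 1/k), an ε-DP release bounds an adversary's posterior belief about any one candidate by e^ε / (e^ε + k − 1). Solving that bound for a chosen maximum posterior gives the largest allowable ε:

```python
import math

def epsilon_for_group_size(k: int, max_posterior: float = 0.5) -> float:
    """Largest epsilon keeping the posterior e^eps / (e^eps + k - 1) below max_posterior."""
    if not 1.0 / k < max_posterior < 1.0:
        raise ValueError("max_posterior must lie between the prior 1/k and 1")
    # Solve e^eps / (e^eps + k - 1) = max_posterior for eps.
    return math.log(max_posterior * (k - 1) / (1.0 - max_posterior))

# A minimum group size of 10 with the posterior capped at 50%:
eps = epsilon_for_group_size(10)   # ln(9), about 2.2
```

Larger required group sizes or stricter posterior caps yield smaller budgets, ie, more noise, which is how the benchmark drives the level of randomness.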
By construction, a mechanism that meets the definition of differential privacy limits the information that can be inferred or learned from the dataset. Risk metrics address a known concern with differential privacy, namely that the same privacy budget provides variable protection across datasets, while ensuring that individuals remain hidden in the data and analytics and retain a form of plausible deniability.
Conclusions
The safe and responsible sharing of patient health data is enabled through the use of established benchmarks and emerging technologies. A variety of data sharing scenarios have produced strong precedents for reusing sensitive data. The use of differential privacy is emerging as a common theme for data protection, one that can be made consistent with existing risk metrics in producing safe data.
Contact us to learn more about the practical advancements in using differential privacy, and how our design and engineering services can help you enable the safe and responsible uses of protected data. From the design of perturbation algorithms to system engineering of platforms or environments that protect the lifecycle of data, our experts can help.