Turn Data Assets into Business Opportunity Under CCPA

by Luk Arbuckle – Chief Methodologist

Business Problem

How the CCPA is changing the legislative landscape

Since the California Consumer Privacy Act was signed in 2018, multiple other states have proposed or introduced similar state-level privacy bills, creating general uncertainty around data legislation across the U.S. and around the world. Companies that collect and use data from individuals in multiple states currently have no guidance on what level of data transformation will be required to ensure compliance nationwide.

Who must comply with the CCPA?

At present, very few affected companies are reporting full CCPA compliance. It’s safe to assume that more organizations will push to become compliant before the law takes effect on January 1, 2020 (or, at the very latest, before enforcement begins, which will be no later than July 1, 2020).

The CCPA will apply to any company or person doing business in the state of California that meets at least one of the following thresholds, with fines ranging from $2,500 to $7,500 USD per violation:
• Has more than $25 million in annual revenue;
• Collects information on 50,000+ people (or devices); OR
• Makes 50% or more of its annual revenue from selling personal information.
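
As a rough illustration, the three thresholds above can be expressed as a simple check. This is a minimal sketch that encodes only the criteria as listed in this article; the class and field names are hypothetical, and it is no substitute for reading the statute.

```python
from dataclasses import dataclass

# Thresholds as stated in the article; all names here are hypothetical.
REVENUE_THRESHOLD_USD = 25_000_000
CONSUMER_THRESHOLD = 50_000
PI_SALES_SHARE_THRESHOLD = 0.50


@dataclass
class BusinessProfile:
    does_business_in_california: bool
    annual_revenue_usd: float
    consumers_or_devices: int
    revenue_share_from_selling_pi: float  # fraction between 0 and 1


def ccpa_applies(profile: BusinessProfile) -> bool:
    """Return True if the business meets at least one of the three thresholds."""
    if not profile.does_business_in_california:
        return False
    return (
        profile.annual_revenue_usd > REVENUE_THRESHOLD_USD
        or profile.consumers_or_devices >= CONSUMER_THRESHOLD
        or profile.revenue_share_from_selling_pi >= PI_SALES_SHARE_THRESHOLD
    )


# A small company can still be covered through the consumer-count threshold alone.
small_app = BusinessProfile(True, 2_000_000, 80_000, 0.0)
print(ccpa_applies(small_app))
```

Because the thresholds are alternatives, a company with modest revenue can still be covered if it collects information on enough people or devices.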

How will the CCPA affect my company’s operations?

Broadly speaking, the CCPA applies only to commercial activities involving identifiable consumer or household information. Properly and fully de-identified data will not be subject to the regulations adopted under the CCPA (as the data will no longer connect to particular consumers or households).

If you already employ a solution that appropriately de-identifies your data, the CCPA shouldn’t affect how you use and share those data, no matter which state(s) you do business in or from. In addition, the CCPA excludes certain categories of medical information, as well as health-related data collected for clinical trials.

Business Solution

The Identifiability Spectrum

At Privacy Analytics, we think in terms of what we call the Identifiability Spectrum. To understand this, you need to know these two key terms:

Direct Identifiers – Information that alone directly links the data with a specific person, and is not analytically useful.

  • Name, Address, Social Security number, Unique Identifiers, etc.

Indirect Identifiers – Information that, in combination with other facts, can indirectly link the data with a specific person, and is analytically useful.

  • Date of Birth, Gender, Geography, Date of Service, etc.
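
To make the distinction concrete, here is a minimal sketch of how the two kinds of identifiers might be treated differently: direct identifiers are dropped because they carry no analytic value, while indirect identifiers are generalized to keep some of it. The field names, the sample record, and the specific generalizations (birth year, three-digit ZIP) are assumptions made for illustration only.

```python
# Hypothetical column names for direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "ssn"}


def treat_record(record: dict) -> dict:
    """Drop direct identifiers and coarsen two indirect identifiers."""
    # Direct identifiers are removed outright.
    treated = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Indirect identifiers are generalized rather than removed.
    if "date_of_birth" in treated:
        treated["birth_year"] = treated.pop("date_of_birth")[:4]  # "YYYY-MM-DD" -> "YYYY"
    if "zip_code" in treated:
        treated["zip3"] = treated.pop("zip_code")[:3]  # keep the first three digits
    return treated


# Made-up record, for illustration only.
raw = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "date_of_birth": "1984-06-12",
    "gender": "F",
    "zip_code": "94110",
    "diagnosis": "asthma",
}
print(treat_record(raw))
```

How far to generalize each indirect identifier is exactly the question the rest of this article addresses.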

The Identifiability Spectrum, then, defines how much the data in question have been treated or managed, as follows:

1. Identified – Raw personal data with unaltered direct identifiers included, so that the data relate to particular consumers or households.

2. Reduced Identifiability – Direct identifiers have been masked or altered, but indirect identifiers make it easy to connect to a particular consumer or household.

3. Minimal Identifiability – Direct and some indirect identifiers have been masked or altered so that there are no demographically unique individuals, making it harder to connect to a consumer or household.

4. Non-identifiable – Direct and indirect identifiers have been masked or altered to ensure there is no reasonable possibility to connect to a consumer or household (that is, the data are de-identified).
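
One way to picture the four levels is to count how many records share each combination of indirect identifiers. The sketch below does that under loud assumptions: the group-size cutoff (`min_group_size`) is a hypothetical stand-in for "no reasonable possibility" of connecting to a consumer or household, which in practice depends on context and not on group size alone. All names are illustrative.

```python
from collections import Counter
from enum import Enum


class Level(Enum):
    IDENTIFIED = 1
    REDUCED = 2
    MINIMAL = 3
    NON_IDENTIFIABLE = 4


def classify(records, direct_fields, indirect_fields, min_group_size=10):
    """Place a data set on the spectrum using group sizes as a rough proxy."""
    if not records:
        raise ValueError("no records to classify")
    # Any surviving direct identifier means the data are still identified.
    if any(r.get(f) for r in records for f in direct_fields):
        return Level.IDENTIFIED
    # Group records by their combination of indirect identifiers.
    groups = Counter(tuple(r.get(f) for f in indirect_fields) for r in records)
    smallest = min(groups.values())
    if smallest == 1:
        return Level.REDUCED  # at least one demographically unique record
    if smallest < min_group_size:
        return Level.MINIMAL  # no uniques, but groups are still small
    return Level.NON_IDENTIFIABLE  # assumed cutoff, not a legal test


# Made-up records: every (birth_year, zip3) combination appears twice.
sample = [
    {"birth_year": "1984", "zip3": "941"},
    {"birth_year": "1984", "zip3": "941"},
    {"birth_year": "1990", "zip3": "100"},
    {"birth_year": "1990", "zip3": "100"},
]
print(classify(sample, ["name", "ssn"], ["birth_year", "zip3"]))
```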

Much like the GDPR, the CCPA imposes different obligations depending on the type of data, how the data are used, and how readily a given data set or data asset can be connected to a particular consumer or household.

Consider a contextual, risk-based approach to transforming data

At Privacy Analytics, we recommend an approach to transforming sensitive data called risk-based de-identification, and we have spearheaded the development of a widely recognized, peer-reviewed methodology. It’s a context-driven, statistical method that helps ensure you neither over- nor under-de-identify your data, preserving as much utility as possible while keeping risk acceptably low.
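
The idea of neither over- nor under-de-identifying can be sketched in a few lines. The example below is not the Privacy Analytics methodology; it is a simplified illustration that assumes a single indirect identifier (age), a worst-case risk measure of one divided by the smallest group size, and a risk threshold that would in practice be set from the context of the data release. It tries progressively coarser age bands and stops at the first one that meets the threshold.

```python
from collections import Counter


def max_risk(values):
    """Worst-case risk: 1 / size of the smallest group sharing a value."""
    return 1 / min(Counter(values).values())


def band(age, width):
    """Generalize an age into a band of the given width (width 1 = exact age)."""
    if width == 1:
        return str(age)
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def least_generalization(ages, threshold, widths=(1, 5, 10, 20)):
    """Return the narrowest band width whose worst-case risk meets the threshold."""
    for width in widths:  # ordered from most detailed to least detailed
        banded = [band(a, width) for a in ages]
        if max_risk(banded) <= threshold:
            return width, banded
    return None, None  # no candidate was enough; other treatment is needed


# Made-up ages, for illustration only.
ages = [31, 33, 34, 36, 38, 39, 51, 52, 54, 55, 57, 58]
width, banded = least_generalization(ages, threshold=0.34)
print(width, sorted(set(banded)))
```

With a stricter threshold, the same search settles on wider bands: more protection, at the cost of some analytic detail. That trade-off is what a risk-based approach makes explicit.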

When taking steps toward fully leveraging data for business potential under the CCPA, you reduce the level of identifiability in personal data as you move from basic needs such as security to more advanced business use cases like leveraging sensitive data for innovation.

De-identification required for different business contexts

In the diagram below, we look at the connection between the level of identifiability and specific business needs. Typically, the level of data transformation required increases with the complexity of the business need. Matching the two helps you protect the identity of the people represented in your data sets and, ultimately, your organization’s reputation.

Security – Data that have reduced identifiability meet basic regulatory compliance and provide simple privacy protection to mitigate the impact of a potential data privacy breach. These data require additional technical and organizational controls since they are easy to connect to a particular consumer or household. They are generally not considered appropriate to use for anything other than the purpose for which they were originally collected.

Efficiency – Data that have minimal identifiability may be appropriately reused in-house, where lawful, to make processes, software, or equipment more efficient and thereby lower operational costs. A proper evaluation of the specific circumstances of each use case is required to ensure it is appropriate.

Innovation – Data that are non-identifiable can be used in-house or with a partner to create net new processes, software, or equipment, sometimes employing machine learning and/or artificial intelligence. The CCPA excludes these data from its application, so that obligations such as consumer rights no longer apply.

Revenue – Data that have minimal identifiability can help drive revenue by making an organization more efficient through their in-house reuse of data. Data that are non-identifiable can help drive revenue through broad innovation strategies, including sharing or pooling arrangements that may bring in their own separate revenue streams. Both approaches can enable the safe and responsible reuse of data by effectively managing the level of identifiability for appropriate use cases.
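
The four business needs above can be read as minimum levels on the Identifiability Spectrum. The lookup below is a hypothetical encoding of that reading (the keys and the split of revenue into in-house and sharing cases are our own labels for illustration), and it does not replace the case-by-case evaluation each use requires.

```python
from enum import IntEnum


class Level(IntEnum):
    IDENTIFIED = 1
    REDUCED = 2
    MINIMAL = 3
    NON_IDENTIFIABLE = 4


# Minimum level of treatment each business need calls for, per the descriptions above.
MINIMUM_LEVEL = {
    "security": Level.REDUCED,
    "efficiency": Level.MINIMAL,
    "revenue_in_house": Level.MINIMAL,
    "innovation": Level.NON_IDENTIFIABLE,
    "revenue_sharing": Level.NON_IDENTIFIABLE,
}


def supports(level: Level, need: str) -> bool:
    """Return True if data at this level have been treated enough for the need."""
    return level >= MINIMUM_LEVEL[need]


print(supports(Level.MINIMAL, "efficiency"))   # True
print(supports(Level.MINIMAL, "innovation"))   # False
```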

What is your organization doing to prepare for the CCPA?

As you plan and execute your approach to the CCPA (assessing the risk of connecting a particular consumer or household in your company’s existing process, flow, data, and use case), we want to hear from you.

Tell us about your data challenges, the way your uses for data evolve in the light of new regulations, and what you think about applying the concepts presented in this article at your organization.

If you have any questions about the CCPA or anything to do with data privacy, the spectrum of identifiability, or de-identification, please feel free to contact us.

About the author
Advising enterprises topping the Fortune 500, Luk Arbuckle has co-authored books, scholarly journal articles and patents on assessing the risk of identifying individuals in data and de-identification technology. He brings a decade of experience in the field of de-identification. Luk is co-author of Building an Anonymization Pipeline (O’Reilly Media, 2020) and Anonymizing Health Data: Case Studies and Methods to Get You Started (O’Reilly Media, 2013/2014).

Luk is the Chief Methodologist at Privacy Analytics. He was formerly Director of Technology Analysis at the Office of the Privacy Commissioner of Canada and Research Manager and Data Scientist at the Children’s Hospital of Eastern Ontario (CHEO) Research Institute in Ottawa. He holds multiple graduate degrees and was awarded numerous scholarships and bursaries.
