Comments to the U.S. Office of Management and Budget, Re: New Techniques and Methodologies Based on Combining Data from Multiple Sources
On March 13, 2018, members of the Privacy Tools Project submitted comments to the U.S. Office of Management and Budget (OMB), responding to the request for information on new techniques and methodologies based on combining data from multiple sources.
Advances in the scientific study of privacy in the fields of theoretical computer science, statistics, and information science over the last two decades have demonstrated the inadequacy of widely used privacy protection measures and revealed other challenges in managing information privacy in the modern world. A fundamental challenge revealed by modern privacy research is that every release of data that has any utility leaks some private information, regardless of how it is protected, and that this leakage accumulates across releases. In other words, there is no “free lunch” when using information about people; useful statistics must always be purchased with privacy loss. These advances also point to the benefits of using more recent, scientifically grounded privacy measures, as they can enable analysis of data that would otherwise have been withheld or redacted. Furthermore, such approaches can serve as tools for ensuring the validity of statistical and machine learning analyses, because they protect against overfitting.
In particular, failures of traditional privacy-preserving approaches to control disclosure risks in statistical publications have motivated computer scientists to develop a strong, formal approach to privacy. The main concept currently under study is differential privacy, introduced by Dwork, McSherry, Nissim, and Smith in 2006. Differentially private tools can help researchers, policymakers, and businesses analyze and share sensitive data while providing strong guarantees of privacy to the individuals in the data. Differential privacy is supported by a rich and rapidly advancing theory that enables one to reason with mathematical rigor about privacy risk. Adopting this formal approach to privacy yields a number of practical benefits for users. Systems that adhere to strong formal definitions like differential privacy provide protection that is robust to a wide range of potential privacy attacks, provide provable privacy guarantees with respect to the cumulative risk from successive data releases, have the benefit of transparency (as it is not necessary to maintain secrecy around a differentially private computation or its parameters), and can be used to provide broad, public access to data or data summaries in a privacy-preserving way.
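For readers who want the formal statement behind these guarantees, the following is the standard textbook formulation of (pure) differential privacy and of its basic composition property. It is not quoted from the submitted comments, and the symbols M, D, D', S, and ε are notation introduced here for illustration.

```latex
% A randomized mechanism M is \varepsilon-differentially private if, for every
% pair of datasets D and D' that differ in the data of one individual, and for
% every set S of possible outputs,
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,].
\]
% Basic (sequential) composition: if M_1, \dots, M_k are run on the same data
% and each M_i is \varepsilon_i-differentially private, then releasing all k
% outputs together is differentially private with parameter
\[
  \varepsilon_{\text{total}} \;=\; \sum_{i=1}^{k} \varepsilon_i .
\]
```

A smaller ε means the output distribution changes less when any one person's data changes, which is the sense in which the guarantee holds regardless of what an attacker already knows. The composition bound is what makes cumulative risk from successive releases something that can be tracked.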
Regulations and policies for privacy protection should evolve in light of scientific advances in privacy. In particular, expectations that traditional disclosure control techniques such as de-identification provide sufficient privacy protection are no longer supported by the legal or scientific literature. A key insight from the scientific study of privacy is that data cannot be analyzed or released without some leakage of information about individuals. Differential privacy quantifies this leakage and, furthermore, is equipped with tools for bounding the leakage that accumulates over multiple releases. Setting a limit on total privacy leakage (referred to as the “privacy budget”) and deciding how to act once the budget is exhausted are matters of policy. Policymakers should accordingly consider the importance of setting and monitoring a privacy budget and develop policies specifying how the budget should be used, such as how to choose among analyses when the budget cannot accommodate all of those desired.
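The idea of setting and monitoring a privacy budget can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the submitted comments: the names `PrivacyBudget`, `BudgetExhausted`, and `noisy_count` are hypothetical, the accounting assumes basic sequential composition (the epsilon values of successive releases simply add up), and the only analysis supported is a counting query answered with Laplace noise. A production system would need a vetted library and more careful accounting.

```python
import random


class BudgetExhausted(Exception):
    """Raised when a requested analysis would exceed the privacy budget."""


class PrivacyBudget:
    """Tracks cumulative privacy loss, assuming basic sequential composition."""

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total_epsilon - self.spent

    def spend(self, epsilon):
        """Charge epsilon to the budget, or refuse if it does not fit."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not cause a refusal.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise BudgetExhausted(
                f"requested {epsilon}, only {self.remaining} remaining"
            )
        self.spent += epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(records, predicate, epsilon, budget, rng):
    """Release a count with Laplace noise, charging epsilon to the budget.

    Adding or removing one person's record changes a count by at most 1,
    so noise with scale 1/epsilon gives an epsilon-differentially private
    release of this single query.
    """
    budget.spend(epsilon)  # refuse before touching the data
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    ages = [23, 35, 41, 58, 62, 19, 77]
    budget = PrivacyBudget(total_epsilon=1.0)

    print(noisy_count(ages, lambda a: a >= 40, 0.4, budget, rng))
    print(noisy_count(ages, lambda a: a < 30, 0.4, budget, rng))
    try:
        noisy_count(ages, lambda a: a >= 65, 0.4, budget, rng)
    except BudgetExhausted as exc:
        print("refused:", exc)
```

The third query is refused because only 0.2 of the budget remains. Which analyses get to spend the budget, and what happens after a refusal, is exactly the policy question raised above; the code only enforces whatever limit has been chosen.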