Workshop Program

Part I (Morning)
Part II (Afternoon)

All talks will be located in the Annenberg Center for Information Science and Technology, Room 105.
Breaks and lunch will be held in Annenberg's lobby and vicinity.

Introduction to Differential Privacy and to the Sessions
The Reusable Holdout: Preserving Validity in Adaptive Data Analysis [slides]
Moritz Hardt
Discussant: Antonio Rangel [slides]
Lightning Talks
Lightning Talks
The Sample Complexity of Private Learning [slides]
Kobbi Nissim
Discussant: Pietro Perona [slides]
Privacy as a Tool for Robust Mechanism Design in Large Markets [slides]
Aaron Roth

Discussant: Leeat Yariv [slides]


Talk Abstracts:

Moritz Hardt

The Reusable Holdout: Preserving Validity in Adaptive Data Analysis

A great deal of effort has been devoted to reducing the risk of spurious scientific discoveries, from the use of holdout sets and sophisticated cross-validation techniques, to procedures for controlling the false discovery rate in multiple hypothesis testing. However, there is a fundamental disconnect between the theoretical results and the practice of science: the theory assumes a fixed collection of hypotheses to be tested, or learning algorithms to be applied, selected non-adaptively before the data are gathered, whereas science is by definition an adaptive process, in which data are shared and re-used, and hypotheses and new studies are generated on the basis of data exploration and previous outcomes.

Surprisingly, the challenges of adaptivity can be addressed using insights from differential privacy, a field of study supporting a definition of privacy tailored to private data analysis. As a corollary we show how to safely reuse a holdout set a great many times without undermining its validation power, even when hypotheses and computations are chosen adaptively. Armed with this technique, the analyst is free to explore the data ad libitum, generating and evaluating hypotheses, verifying results on the holdout, and backtracking as needed.
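The reusable-holdout idea can be illustrated with a short sketch in the spirit of the Thresholdout mechanism from this line of work. This is a simplified illustration, not the authors' exact procedure; the parameter values (threshold, noise scale) are placeholders chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def thresholdout(train, holdout, query, threshold=0.04, sigma=0.01):
    """Answer one statistical query phi: x -> [0, 1] about the holdout set.

    The holdout mean is revealed (with noise) only when it disagrees with
    the training mean by more than a noisy threshold -- i.e. only when the
    analyst appears to have overfit. Otherwise the training mean is
    returned, so the holdout's validation power is not consumed.
    """
    t_mean = np.mean([query(x) for x in train])
    h_mean = np.mean([query(x) for x in holdout])
    if abs(t_mean - h_mean) > threshold + rng.laplace(0.0, sigma):
        return h_mean + rng.laplace(0.0, sigma)
    return t_mean

# Usage sketch: a query whose true mean is 0.5 on both sets.
train = [0.0, 1.0] * 50
holdout = [0.0, 1.0] * 50
answer = thresholdout(train, holdout, lambda x: x)
```

Because the holdout is touched through a noisy, thresholded channel, the analyst can ask many adaptively chosen queries before the holdout's guarantees degrade.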

Joint work with Cynthia Dwork, Vitaly Feldman, Toni Pitassi, Omer Reingold and Aaron Roth.

Kobbi Nissim

The sample complexity of private learning

Learning is a task that abstracts many of the computations performed over collections of sensitive individual information, and hence it is natural to examine it in conjunction with differential privacy.

Predating differential privacy, in 2005 Blum, Dwork, McSherry, and Nissim showed that any concept class that is learnable in Kearns' model of statistical queries is also learnable with privacy. Private learning was later formalized by Kasiviswanathan et al. in 2008 as the conjunction of PAC learning and differential privacy. They showed that any finite concept class C can be learned privately with O(log|C|) samples, via a construction that is somewhat analogous to the (non-private) Occam's Razor learner.
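The generic construction can be sketched as an exponential-mechanism-style selection over a finite concept class: each hypothesis is scored by minus its empirical error count (a sensitivity-1 score), so low-error hypotheses are exponentially more likely to be chosen. This is an illustrative sketch, with hypothetical names and toy parameters, not the exact construction from the paper.

```python
import math
import random

random.seed(1)

def private_learner(samples, concept_class, epsilon=2.0):
    """Select a hypothesis from a finite class via the exponential
    mechanism, with utility = minus the empirical error count.

    Misclassifying one example changes the score by at most 1, so
    sampling h with probability proportional to exp(-eps * errors(h) / 2)
    satisfies eps-differential privacy for the score used here.
    """
    def errors(h):
        return sum(1 for (x, y) in samples if h(x) != y)

    weights = [math.exp(-epsilon * errors(h) / 2) for h in concept_class]
    total = sum(weights)
    r = random.random() * total
    for h, w in zip(concept_class, weights):
        r -= w
        if r <= 0:
            return h
    return concept_class[-1]

# Usage sketch: privately learn a threshold function on {0, ..., 9}.
samples = [(x, x >= 5) for x in range(10)] * 10
thresholds = [lambda x, t=t: x >= t for t in range(11)]
learned = private_learner(samples, thresholds)
```

Since the class of thresholds here has size 11, O(log|C|) samples suffice, matching the general bound mentioned above.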

Investigating the gap between the O(log|C|) sample complexity of private learners and the Theta(VC(C)) sample complexity of non-private learners has resulted in a rich picture of private learning. Under pure (epsilon, 0)-differential privacy it is now known that the sample complexity of proper learners (those restricted to output a hypothesis in C) can be significantly higher than that of improper learners, that the latter is fully characterized by the representation dimension of C, and that the representation dimension is generally higher than the VC dimension. Under approximate (epsilon, delta)-differential privacy the picture is currently less clear, but it is known that the sample complexity of proper learners is generally higher than the VC dimension. Other models that have been investigated include active learning, semi-supervised learning, and simultaneously performing k learning tasks.

In the talk, I will define private learning, review the main results, and present some of the more recent technical results. Time permitting, I will mention how some of these results relate to the task of data sanitization.

Based on joint works with: Amos Beimel, Avrim Blum, Hai Brenner, Mark Bun, Cynthia Dwork, Shiva Kasiviswanathan, Homin Lee, Frank McSherry, Sofya Raskhodnikova, Adam Smith, Uri Stemmer, and Salil Vadhan.

Aaron Roth

Privacy as a Tool for Robust Mechanism Design in Large Markets

Differential privacy is an algorithmic stability guarantee that can be applied to derive asymptotically dominant-strategy truthful mechanisms for interesting problems in large markets. Example applications include: computing a school-optimal stable matching while incentivizing truthful reporting by the students; maximizing welfare in any problem in which agents have convex utility functions, using anonymous item pricings (rather than the non-anonymous bundle pricings that arise from the VCG mechanism); and solving the equilibrium selection problem using weak "mediators" in settings of incomplete information.

For some of these problems, there are analogous "large market" results in the economics literature, but the applications of differential privacy tend to give results with the following fundamental difference. In the economics literature, it is typically assumed that agent types are drawn from a prior distribution (which often results in replication economies -- i.e. economies in which there are many individuals of each type), and the incentive properties are studied in settings in which an exact outcome is realized -- i.e. an exact Walrasian equilibrium, or an exactly school-optimal stable matching. In contrast, differentially private mechanisms compute approximate versions of these outcomes, but can guarantee individual incentives in the worst case, without making any assumptions about how type profiles arise. This allows for truthfulness even in settings in which there are individuals with highly unusual type profiles. The differential privacy approach can also give finite-population guarantees even in relatively "small" economies, in which the number of agents can be, e.g., much smaller than the number of types or the number of available actions.