Title: Correctness Protection via Differential Privacy
Speaker: Aaron Roth
Affiliation: UPenn
Abstract:
False discovery is a growing problem in scientific research. Despite
sophisticated statistical techniques for controlling the false discovery rate
and related statistics designed to protect against spurious discoveries, there
is significant evidence that many published scientific papers contain incorrect
conclusions.
In this talk, we consider the role that adaptivity plays in this problem. A
fundamental disconnect between the theorems that control false discovery rate
and the practice of science is that the theorems assume a fixed collection of
hypotheses to be tested, selected non-adaptively before the data is gathered,
whereas science is by definition an adaptive process, in which data is shared
and reused and hypotheses are generated after seeing the results of previous
tests.
We note that false discovery cannot be prevented when a substantial number of
adaptive queries are made to the data and the data is used naively, i.e., when
queries are answered exactly with their empirical estimates on a given finite
data set. However, we show that, remarkably, there is a different way to evaluate
statistical queries on a data set that allows even an adaptive analyst to make
exponentially many queries to the data set, while guaranteeing that, with high
probability, all of the conclusions they draw generalize to the underlying
distribution. Counter-intuitively, this technique involves actively perturbing
the answers given to the data analyst, using techniques developed for privacy
preservation. In our application, however, the perturbations are added entirely
to increase the utility of the data.
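
The contrast between naive and perturbed query answering can be illustrated with
a minimal sketch. The abstract does not name a specific mechanism; the sketch
below assumes the Laplace mechanism, one standard differentially private
technique, applied to queries with values in [0, 1]. The function names and the
parameter epsilon are hypothetical and chosen only for illustration. The
guarantees described in the talk may rely on more sophisticated mechanisms.

```python
import random

def empirical_answer(data, query):
    """Naive answer: the exact empirical mean of a [0, 1]-valued query."""
    return sum(query(x) for x in data) / len(data)

def perturbed_answer(data, query, epsilon, rng):
    """Empirical mean plus Laplace noise (assumed mechanism, not from the talk).

    Changing one record moves the mean of a [0, 1]-valued query by at most
    1/n, so Laplace noise with scale 1 / (n * epsilon) is used.
    """
    scale = 1.0 / (len(data) * epsilon)
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is Laplace-distributed with mean 0 and the given scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return empirical_answer(data, query) + noise

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data set: 1000 records with 5 binary attributes each.
    data = [[rng.randint(0, 1) for _ in range(5)] for _ in range(1000)]
    first_attribute = lambda record: record[0]
    print("exact:    ", empirical_answer(data, first_attribute))
    print("perturbed:", perturbed_answer(data, first_attribute, 0.5, rng))
```

The analyst sees only the perturbed value, which limits how much any later,
adaptively chosen query can depend on the particular sample.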
Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann
Pitassi, and Omer Reingold.