We typically have seminars on Wednesdays at noon in Malone 228. All seminar announcements will be sent to the theory mailing list.
SPEAKER: Ilya Razenshteyn (MIT)
TITLE: Sketching and Embedding are Equivalent for Norms
ABSTRACT: Imagine the following communication task. Alice and Bob each have a point from a metric space. They want to transmit a few bits and decide whether their points are close to each other or far apart. Of particular interest are sketching protocols: Alice and Bob both compute short summaries of their inputs, and then a referee, given these summaries, makes the decision; sketches are very useful for nearest neighbor search, streaming, randomized linear algebra, etc. Indyk (FOCS 2000) showed that the l_p spaces with 0 < p <= 2 admit a very efficient sketching protocol for this problem. Consequently, any metric that can be mapped into an l_p space with all distances approximately preserved has a good protocol as well.
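(For intuition, and not part of the talk: below is a minimal sketch of the p = 2 case of such a protocol, where Alice and Bob compress their points with a shared Gaussian projection and the referee decides close vs. far from the short sketches alone. Indyk's protocol replaces the Gaussians with p-stable distributions for general 0 < p <= 2; all parameters here are illustrative.)

```python
import numpy as np

def make_sketcher(d, k, seed=0):
    """Shared-randomness l_2 sketch: a k x d Gaussian projection
    (the p = 2 instance of the p-stable sketches mentioned above)."""
    rng = np.random.default_rng(seed)
    G = rng.normal(size=(k, d)) / np.sqrt(k)
    return lambda x: G @ x

# Alice and Bob sketch their own points with the same shared matrix;
# a referee estimates their distance from the short sketches alone.
d, k = 10_000, 64
sketch = make_sketcher(d, k)
rng = np.random.default_rng(1)
alice, bob = rng.normal(size=d), rng.normal(size=d)
est = np.linalg.norm(sketch(alice) - sketch(bob))
true = np.linalg.norm(alice - bob)
print(f"sketched estimate {est:.1f} vs true distance {true:.1f}")
```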
I will show that for normed spaces (a very important class of metric spaces) embedding into l_p is the only possible technique for solving the communication problem. Slightly more formally, we show that any normed space that admits a good communication (in particular, sketching) protocol for distinguishing close and far pairs of points embeds well into l_p with p being close to 1. The proof uses tools from communication complexity and functional analysis.
As a corollary, we will show communication lower bounds for the planar Earth Mover’s Distance (minimum-cost matching metric) and for the trace norm (the sum of the singular values of a matrix) by deriving them from the (known) non-embeddability theorems and (the contrapositive of) our result.
The talk is based on a joint paper with Alexandr Andoni and Robert Krauthgamer (arXiv:1411.2577).
Title: Nonconvex Statistical Optimization: Algorithm and Model-based Optimization Theory
Abstract: Nonconvex regularized M-estimators have been widely applied to high-dimensional data analysis. Existing statistical theory has established their statistical properties in high dimensions only when the global optimum or certain local optima can be obtained. Though practitioners have proposed numerous heuristic computational algorithms for solving these nonconvex optimization problems, existing optimization theory does not guarantee that these algorithms obtain the global or local optima with the desired statistical properties in polynomial time. Therefore, there is a significant gap between theory and practice: what is actually computed is not the same as what has been proved. To bridge this gap, we propose a new generation of nonconvex statistical optimization algorithms and model-based theory, which incorporate statistical thinking into modern optimization. When developing computational algorithms, we take the underlying sparse statistical models into consideration. In particular, for nonconvex regularized M-estimation problems, our proposed algorithms devise three different optimization schemes, under which the solutions achieved by the optimization algorithm always fall within a restricted sparse set. The nonconvex objective function thus mimics the behavior of a strongly convex function, which eventually allows our proposed algorithms to obtain an estimator with the desired optimal statistical properties in polynomial time with high probability.
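(Illustrative aside, not the speaker's algorithm: iterative hard thresholding is perhaps the simplest optimization scheme whose iterates are confined to a restricted sparse set, in the spirit of the abstract's description. A minimal sketch, with all parameter choices illustrative:)

```python
import numpy as np

def hard_threshold(v, s):
    """Project onto s-sparse vectors: keep the s largest-magnitude entries."""
    keep = np.argsort(np.abs(v))[-s:]
    out = np.zeros_like(v)
    out[keep] = v[keep]
    return out

def iht(X, y, s, iters=300):
    """Iterative hard thresholding for sparse least squares: every iterate
    stays inside the restricted s-sparse set, mirroring the abstract's
    'solutions always fall within a restricted sparse set' idea."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2   # 1/L for the smooth part
    b = np.zeros(p)
    for _ in range(iters):
        b = hard_threshold(b + step * X.T @ (y - X @ b) / n, s)
    return b

# toy sparse recovery: 200 samples, 1000 features, 5 true nonzeros
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))
beta = np.zeros(1000); beta[:5] = 1.0
y = X @ beta + 0.05 * rng.normal(size=200)
print(np.linalg.norm(iht(X, y, s=5) - beta))   # should be small
```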
Speaker: Aravind Srinivasan
Affiliation: University of Maryland
Title: Properties and Generalizations of the Moser-Tardos Process
Abstract: Moser and Tardos have developed an elegant and powerful algorithmic version of the Lovasz Local Lemma. Since the publication of this work, it has become apparent that this algorithm has some very interesting properties and extensions, and can be viewed as a stochastic process of independent interest. I will survey some of these, especially the ideas of “partial resampling” and the “LLL-distribution” (the properties of the output distribution of Moser-Tardos). I will draw from joint works with Haeupler and Saha, with Harris, and with Chen and Harris.
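(A minimal sketch of the Moser-Tardos process itself for k-SAT, for readers who have not seen it; the clause encoding is illustrative, and the expected-polynomial-time guarantee requires the Local Lemma condition that each clause shares variables with few others:)

```python
import random

def moser_tardos_ksat(n_vars, clauses, rng=random.Random(0)):
    """Moser-Tardos: sample all variables uniformly, then repeatedly pick
    a violated clause and resample just that clause's variables."""
    assign = [rng.random() < 0.5 for _ in range(n_vars)]

    def violated(clause):
        # clause = list of (var, wanted_sign); violated iff every literal is false
        return all(assign[v] != sign for v, sign in clause)

    while True:
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign
        for v, _ in rng.choice(bad):      # resample one bad clause's variables
            assign[v] = rng.random() < 0.5

# 3-SAT instance where each clause shares variables with few others
clauses = [[(0, True), (1, False), (2, True)],
           [(2, False), (3, True), (4, True)],
           [(4, False), (5, True), (0, False)]]
print(moser_tardos_ksat(6, clauses))
```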
Title: Fault Resilient Graph Structures
Speaker: Merav Parter (MIT)
Abstract:
A fault-tolerant (FT) structure for a network is required to continue functioning following the failure of some of the network's edges or vertices. Fault resilience can be introduced into the network in several different ways. This talk will focus on a notion of fault tolerance whereby the structure at hand is augmented (by adding to it various components) so that subsequent to the failure of some of the network's vertices or edges, the surviving part of the structure is still operational. As this augmentation carries certain costs, it is desirable to minimize the number of added components. We will review several constructions of sparse fault-tolerant structures, such as FT spanners and FT shortest-path trees. I will also present a new model for fault-resilient network structures that mixes two orthogonal protection mechanisms: (a) backup, namely, augmenting the structure with many (redundant) low-cost and fault-prone components, and (b) reinforcement, namely, acquiring high-cost but fault-resistant components. A trade-off between these two mechanisms will be presented in the concrete setting of shortest-path trees.
Talk Title:
Dependent Random Graphs and Multiparty Pointer Jumping
Abstract:
We initiate a study of a relaxed version of the standard Erdos-Renyi random graph model, where each edge may depend on a few other edges. We call such graphs *dependent random graphs* and give tight bounds on the clique and chromatic numbers of such graphs. Surprisingly, some of the bounds in the standard random graph model continue to hold in this relaxed setting. For example, the size of the largest clique in a dependent random graph remains roughly log(n)/log(1/p).
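(An easy empirical companion, not from the paper: the same clique-number scale can be observed on a standard, independent Erdos-Renyi graph; the dependent sampler is paper-specific, so plain G(n,p) is used here, and the stated bounds hold only up to constant factors:)

```python
import math
import networkx as nx

# Empirical check of the clique-number scale on an ordinary G(n, p).
n, p = 200, 0.3
G = nx.gnp_random_graph(n, p, seed=1)
omega = max(len(c) for c in nx.find_cliques(G))   # exact; fine for small n
print("clique number:", omega)                     # typically ~2x the ratio below
print("log(n)/log(1/p) =", round(math.log(n) / math.log(1 / p), 2))
```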
As an application, we give a new upper bound on communication complexity of the Multiparty Pointer Jumping (MPJ) problem in the number-on-the-forehead (NOF) model. NOF communication lies at the current frontier of our understanding of communication complexity, and MPJ is one of the canonical problems in this setting. Furthermore, sufficiently strong bounds for MPJ would have important consequences for circuit complexity.
No prior knowledge is assumed aside from basic discrete probability. I hope to motivate both random graphs and the application and demonstrate why NOF communication is an important active research area.
This talk is based on research that is joint work with Mario Sanchez.
Bio:
Joshua Brody received a bachelor’s degree in Mathematics/Computer Science from Carnegie Mellon University, a Master’s Degree in Computer Science from New York University, and a PhD in Computer Science from Dartmouth College. Prior to graduate school, he served in the Peace Corps teaching high school mathematics in Burkina Faso, West Africa. Dr. Brody is currently an Assistant Professor in the Computer Science department at Swarthmore College. His primary research area is communication complexity. Additional research interests include several areas of theoretical computer science, including streaming algorithms, property testing, and data structures.
In a t-out-of-n robust secret sharing scheme, a secret message is shared among n parties who can reconstruct the message by combining their shares. An adversary can adaptively corrupt up to t of the parties, get their shares, and modify them arbitrarily. The scheme should satisfy privacy, meaning that the adversary cannot learn anything about the shared message, and robustness, meaning that the adversary cannot cause the reconstruction procedure to output an incorrect message. Such schemes are only possible in the case of an honest majority, and here we focus on unconditional security in the maximal corruption setting where n = 2t+1. In this scenario, to share an m-bit message with a reconstruction failure probability of at most 2^{-k}, a known lower bound shows that the share size must be at least m + k bits. On the other hand, all prior constructions have share size that scales linearly with the number of parties n, and the prior state-of-the-art scheme due to Cevallos et al. (EUROCRYPT '12) achieves share size m + Õ(k+n).
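(Background sketch, assuming familiarity with Shamir's scheme: plain t-out-of-n Shamir sharing provides the privacy half of the definition but no robustness, since reconstruction trusts every share; it is the baseline that robust schemes augment. A minimal toy implementation:)

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def share(secret, t, n, rng=random.Random(0)):
    """Shamir t-out-of-n: a random degree-t polynomial with f(0) = secret;
    party i receives f(i). Any t shares reveal nothing; t+1 reconstruct."""
    coeffs = [secret] + [rng.randrange(P) for _ in range(t)]
    f = lambda x: sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0. Note it trusts every share as-is;
    robustness against tampered shares is exactly what the talk adds."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(secret=42, t=2, n=5)   # n = 2t + 1, as in the talk
print(reconstruct(shares[:3]))        # any t+1 = 3 honest shares -> 42
```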
In this work, we construct the first robust secret sharing scheme in the maximal corruption setting with n = 2t+1 that avoids the linear dependence between the share size and the number of parties n. In particular, we get a share size of only m + Õ(k) bits. Our scheme is computationally efficient and relies on approximation algorithms for the minimum graph bisection problem.
This talk is based on a EUROCRYPT 2016 paper joint with Allison Bishop, Valerio Pastro, Rajmohan Rajaraman, and Daniel Wichs.
Details TBA
Speaker: Samir Khuller
Affiliation: University of Maryland College Park
Title: To do or not to do: scheduling to minimize energy
Abstract:
Traditional scheduling algorithms, especially those involving job scheduling
on parallel machines, assume that the machines are always available and try
to schedule jobs to minimize specific job-related metrics. Since modern data
centers consume massive amounts of energy, we consider job scheduling
problems that take energy consumption into account, allowing machines to be
turned off, especially during periods of low demand. The ensuing problems
relate very closely to classical covering problems such as capacitated set
cover, and we discuss several recent results in this regard.
(This talk covers two papers, and is joint work with Jessica Chang, Hal Gabow,
and Koyel Mukherjee.)
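(To make the covering connection concrete, here is the classic greedy set-cover heuristic, which achieves a ln n approximation; the capacitated variants in the talk require more machinery, and this sketch with its toy job/interval encoding is only illustrative:)

```python
def greedy_set_cover(universe, sets):
    """Classic greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements. Gives a ln(n)-approximation."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(sets, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("instance is infeasible")
        chosen.append(best)
        uncovered -= best
    return chosen

# jobs as elements, candidate machine "on-intervals" as sets
jobs = range(6)
intervals = [{0, 1, 2}, {2, 3}, {3, 4, 5}, {1, 4}]
print(greedy_set_cover(jobs, intervals))
```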
Speaker: Justin Hsu
Affiliation: University of Pennsylvania
Speaker: David Harris
Affiliation: University of Maryland College Park
Title: Improved parallel algorithms for hypergraph maximal independent set
Abstract:
Finding a maximal independent set in hypergraphs has been a long-standing algorithmic challenge. The best parallel algorithm for hypergraphs of rank $r$ was developed by Beame and Luby (1990) and Kelsen (1992), running in time roughly $(\log n)^{r!}$. This is in RNC for fixed $r$, but is still quite expensive. We improve on the analysis of Kelsen to show that a slight variant of this algorithm runs in time $(\log n)^{2^r}$. We derandomize this algorithm to achieve a deterministic algorithm running in time $(\log n)^{2^{r+3}}$ using $m^{O(1)}$ processors.

Our analysis also applies when $r$ is slowly growing; using this in conjunction with a strategy of Bercea et al. (2015) gives a deterministic algorithm running in time $\exp(O(\log m / \log \log m))$. This is faster than the algorithm of Bercea et al., and in addition it is deterministic. In particular, this is sub-polynomial time for graphs with $m \leq n^{o(\log \log n)}$ edges.
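(For readers new to the problem, a sketch of the trivial sequential greedy baseline that the parallel algorithms above compete with; an independent set in a hypergraph is one containing no hyperedge in full, and the encoding below is illustrative:)

```python
def greedy_hypergraph_mis(n, edges):
    """Sequential baseline: scan vertices in order, adding each unless it
    would complete some hyperedge entirely inside the chosen set."""
    chosen = set()
    for v in range(n):
        if all(not (e - {v} <= chosen) for e in edges if v in e):
            chosen.add(v)
    return chosen

# rank-3 hypergraph: no chosen set may contain a whole edge
edges = [frozenset(e) for e in [{0, 1, 2}, {1, 3, 4}, {2, 4, 5}]]
print(greedy_hypergraph_mis(6, edges))   # e.g. {0, 1, 3, 5}
```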
Speaker: Adam Smith
Affiliation: Penn State University.
Title: Privacy, Information and Generalization
Abstract:
Consider an agency holding a large database of sensitive personal
information — medical records, census survey answers, web search
records, or genetic data, for example. The agency would like to
discover and publicly release global characteristics of the data (say,
to inform policy or business decisions) while protecting the privacy
of individuals’ records. I will begin by discussing what makes this
problem difficult, and exhibit some of the nontrivial issues that
plague simple attempts at anonymization and aggregation. Motivated by
this, I will present differential privacy, a rigorous definition of
privacy in statistical databases that has received significant
attention.
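(As a concrete illustration of the definition, not drawn from the talk: the standard Laplace mechanism answers a sensitivity-1 counting query with epsilon-differential privacy. A minimal sketch, with illustrative data:)

```python
import numpy as np

def laplace_mech(true_answer, sensitivity, epsilon,
                 rng=np.random.default_rng(0)):
    """Laplace mechanism: adding Laplace(sensitivity/epsilon) noise to a
    query answer yields epsilon-differential privacy."""
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

# a counting query ("how many records satisfy the predicate?") has sensitivity 1
records = [25, 31, 47, 52, 38]
count = sum(1 for age in records if age > 40)
print(laplace_mech(count, sensitivity=1, epsilon=0.5))
```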
In the second part of the talk, I will explain how differential
privacy is connected to a seemingly different problem: “adaptive data
analysis”, the practice by which insights gathered from data are used
to inform further analysis of the same data sets. This is increasingly
common in scientific research, where data sets are shared and re-used
across multiple studies. Classical statistical theory assumes that the
analysis to be run is selected independently of the data. This
assumption breaks down when data are re-used; the resulting
dependencies can significantly bias the analyses’ outcomes. I’ll show
how limiting the information revealed about a data set during analysis
allows one to control such bias, and why differentially private
analyses provide a particularly attractive tool for limiting
information.
Based on several papers, including recent joint works with R. Bassily,
K. Nissim, U. Stemmer, T. Steinke and J. Ullman (STOC 2016) and R.
Rogers, A. Roth and O. Thakkar (FOCS 2016).
Bio:
Adam Smith is a professor of Computer Science and Engineering at Penn
State. His research interests lie in data privacy and cryptography,
and their connections to machine learning, statistics, information
theory, and quantum computing. He received his Ph.D. from MIT in 2004
and has held visiting positions at the Weizmann Institute of Science,
UCLA, Boston University and Harvard. In 2009, he received a
Presidential Early Career Award for Scientists and Engineers (PECASE).
In 2016, he received the Theory of Cryptography Test of Time award,
jointly with C. Dwork, F. McSherry and K. Nissim.
Speaker: Justin Thaler
Affiliation: Georgetown University
Title: Approximate Degree, Sign-Rank, and the Method of Dual Polynomials
Abstract:
The eps-approximate degree of a Boolean function f is the minimum degree of a real polynomial that pointwise approximates f to error eps. Approximate degree has wide-ranging applications in theoretical computer science, yet our understanding of it remains limited, with few general results known.
The focus of this talk will be on a relatively new method for proving lower bounds on approximate degree: specifying dual polynomials, which are dual solutions to a certain linear program capturing the approximate degree of any function. I will describe how the method of dual polynomials has recently enabled progress on a variety of open problems, especially in communication complexity and oracle separations.
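(For concreteness, here is the standard dual characterization obtained from LP duality, for f : {-1,1}^n -> {-1,1}: the eps-approximate degree of f exceeds d exactly when a dual polynomial psi exists satisfying the three conditions below.)

```latex
\exists\, \psi : \{-1,1\}^n \to \mathbb{R} \quad \text{such that} \quad
\begin{cases}
\sum_{x} \psi(x)\, f(x) > \varepsilon & \text{(high correlation with } f\text{)} \\
\sum_{x} |\psi(x)| = 1 & \text{(unit } \ell_1 \text{ norm)} \\
\sum_{x} \psi(x)\, p(x) = 0 \ \ \text{for all } p \text{ with } \deg p \le d & \text{(pure high degree)}
\end{cases}
```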
Joint work with Mark Bun, Adam Bouland, Lijie Chen, Dhiraj Holden, and Prashant Nalini Vasudevan.
Speaker: Jalaj Upadhyay
Affiliation: Penn State University
Title: Fast and Space-Optimal Differentially-Private Low-Rank Factorization in the General Turnstile Update Model
Abstract:
The problem of low-rank factorization of an m x n matrix A requires outputting a singular value decomposition: an m x k matrix U, an n x k matrix V, and a k x k diagonal matrix D such that U D V^T approximates the matrix A in the Frobenius norm. In this paper, we study releasing a differentially-private low-rank factorization of a matrix in the general turnstile update model. We give two differentially-private algorithms instantiated with respect to two levels of privacy. Both of our privacy levels are stronger than the privacy levels for this and related problems studied in previous works, namely those of Blocki et al. (FOCS 2012), Dwork et al. (STOC 2014), Hardt and Roth (STOC 2012, STOC 2013), and Hardt and Price (NIPS 2014). Our main contributions are as follows.
1. In our first level of privacy, we consider two matrices A and A’ as neighboring if A – A’ can be represented as an outer product of two unit vectors. Our private algorithm with respect to this privacy level incurs optimal additive error. We also prove a lower bound showing that the space required by this algorithm is optimal up to a logarithmic factor.

2. In our second level of privacy, we consider two matrices as neighboring if their difference has Frobenius norm at most 1. Our private algorithm with respect to this privacy level is computationally more efficient than our first algorithm and incurs optimal additive error.
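(Illustrative aside, not the paper's algorithm: in the batch setting, the general shape of such results is input perturbation, adding Gaussian noise calibrated to the Frobenius-norm neighboring relation of the second privacy level, followed by an ordinary rank-k SVD. The streaming, space-optimal versions in the talk are far more refined; parameters below are illustrative:)

```python
import numpy as np

def dp_low_rank(A, k, epsilon, delta, rng=np.random.default_rng(0)):
    """Batch-mode sketch: perturb A with Gaussian noise calibrated to
    unit-Frobenius-norm neighbors, then take a rank-k SVD of the noisy
    matrix; the noise scale is the standard Gaussian-mechanism choice."""
    sigma = np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noisy = A + rng.normal(scale=sigma, size=A.shape)
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
    return U[:, :k], np.diag(s[:k]), Vt[:k, :].T   # A ~ U D V^T

A = np.random.rand(50, 30)
U, D, V = dp_low_rank(A, k=5, epsilon=1.0, delta=1e-6)
print(np.linalg.norm(A - U @ D @ V.T, "fro"))
```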
Speaker: Mohammad Mahmoody, Assistant Professor, University of Virginia
Abstract: Indistinguishability Obfuscation (IO) has recently emerged as a central primitive in cryptography, enabling many heretofore out-of-reach applications. However, currently all known constructions of IO are based on multilinear maps, which are poorly understood. With the hope of basing IO on more standard assumptions, in this work we ask whether IO could be based on any of the powerful (and recently realized) encryption primitives such as attribute-based/predicate encryption, fully homomorphic encryption, and witness encryption. What connects these primitives is that they are zero-one: either the message is revealed fully by the “right key” or it remains completely hidden.
Our main result is a negative one: we prove there is no black-box construction of IO from any of the above list of “zero-one” encryptions. We note that many IO constructions are in fact non-black-box; e.g., the results of Ananth-Jain’15 and Bitansky-Vaikuntanathan’15 basing IO on functional encryption are non-black-box. In fact, we prove our separations in an extension of the black-box framework of Impagliazzo-Rudich’89 and Reingold-Trevisan-Vadhan’04 which allows such non-black-box techniques as part of the model by default. Thus, we believe our extended model is of independent interest as a candidate for the new “standard” for cryptographic separations.
Speaker: Avishay Tal
Affiliation: IAS
Title: Time-Space Hardness of Learning Sparse Parities
Abstract:
How can one learn a parity function, i.e., a function of the form $f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n \pmod{2}$ where $a_1, \ldots, a_n \in \{0,1\}$, from random labeled examples? One approach is to gather O(n) random labeled examples and perform Gaussian elimination. This requires a memory of size O(n^2) and poly(n) time. Another approach is to go over all 2^n possible parity functions and verify each by checking O(n) random examples. This requires a memory of size only O(n), but O(2^n * n) time. In a recent work, Raz [FOCS, 2016] showed that if an algorithm has memory of size much smaller than n^2, then it has to spend exponential time to learn a parity function. In other words, fast learning requires a good memory. In this work, we show that even if the parity function is known to be extremely sparse, with only log(n) of the a_i’s nonzero, the learning task is still time-space hard. That is, we show that any algorithm with linear-size memory and polynomial time fails to learn log(n)-sparse parities. Consequently, the classical tasks of learning linear-size DNF formulae, linear-size decision trees, and logarithmic-size juntas are all time-space hard. Based on joint work with Gillat Kol and Ran Raz.
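(A minimal sketch of the first, memory-hungry approach from the abstract: Gaussian elimination over GF(2), which uses O(n^2) memory to store and reduce the examples; encodings and sizes are illustrative:)

```python
import numpy as np

def learn_parity(X, y):
    """Recover the coefficient vector a by Gaussian elimination over
    GF(2): the poly-time, O(n^2)-memory route the abstract contrasts
    with O(n)-memory, 2^n-time enumeration."""
    A = np.array(X, dtype=np.uint8) % 2
    b = np.array(y, dtype=np.uint8) % 2
    m, n = A.shape
    row, pivots = 0, []
    for col in range(n):
        piv = next((r for r in range(row, m) if A[r, col]), None)
        if piv is None:
            continue
        A[[row, piv]] = A[[piv, row]]          # swap pivot row into place
        b[[row, piv]] = b[[piv, row]]
        for r in range(m):
            if r != row and A[r, col]:         # eliminate col everywhere else
                A[r] ^= A[row]
                b[r] ^= b[row]
        pivots.append(col)
        row += 1
    a = np.zeros(n, dtype=np.uint8)
    for r, col in enumerate(pivots):           # unique solution when rank = n
        a[col] = b[r]
    return a

rng = np.random.default_rng(1)
n = 30
a_true = np.zeros(n, dtype=np.uint8)
a_true[rng.choice(n, 5, replace=False)] = 1    # a sparse parity
X = rng.integers(0, 2, size=(3 * n, n), dtype=np.uint8)
y = (X @ a_true) % 2
print(np.array_equal(learn_parity(X, y), a_true))   # True w.h.p.
```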
SPEAKER: Yevgeniy Dodis, New York University
TITLE: Fixing Cracks in the Concrete: Random Oracles with Auxiliary Input, Revisited
ABSTRACT: We revisit security proofs for various cryptographic primitives in the random oracle model with auxiliary input (ROM-AI): a (computationally unbounded) attacker A can compute arbitrary S bits of leakage z = z(O) about the random oracle O before attacking the system, and then use additional T oracle queries to O during the attack. This model was explicitly studied by Unruh (CRYPTO 2007), but dates back to the seminal 1980 paper of Hellman on time-space tradeoffs for inverting random functions, and has natural applications in settings where traditional random oracle proofs are not useful: (a) security against non-uniform attackers; (b) space-time tradeoffs; (c) security against preprocessing; (d) resilience to backdoors in hash functions. We obtain a number of new results about ROM-AI, but our main message is that ROM-AI is the “new cool kid in town”: it nicely connects theory and practice, has a lot of exciting open questions, leads to beautiful math, short definitions, elegant proofs, and surprising algorithms, and is still in its infancy. In short, you should work on it! Joint work with Siyao Guo and Jonathan Katz.
Speaker: Ori Rottenstriech, Princeton
Title: Novel Approaches to Challenges in Emerging Network Paradigms
Abstract: SDN (Software Defined Networking) and NFV (Network Function Virtualization) are two emerging network paradigms that enable simplification, flexibility, and cost reduction in network management. We believe that the new paradigms will lead to many interesting research questions. We study how to rely on them for dealing with two common network challenges.
We consider switches that apply network policies in SDN through rule-matching tables of limited size. We study the applicability of rule caching and lossy compression to create packet classifiers requiring much less memory than the theoretical size limits of semantically-equivalent representations. We would like to find limited-size classifiers that can correctly classify a high portion of the traffic. We address different goals with unique settings and explain how to deal with the traffic that cannot be classified correctly.
Network functions such as load balancing and deep packet inspection are often implemented in dedicated hardware called middleboxes. These can suffer from temporary unavailability due to misconfiguration or software and hardware malfunction. We suggest relying on virtualization for planning and deploying backup schemes for network functions. The schemes guarantee high levels of survivability with significant reduction in resource consumption. We discuss different goals that network designers should take into account. We describe a graph-theoretical model for finding properties of efficient solutions and developing algorithms that can build them.
Bio: Ori Rottenstriech is a postdoctoral research associate at the Department of Computer Science, Princeton University. He received his Ph.D. from the Electrical Engineering department of the Technion. His research interests include the intersection of computer networking and algorithms.
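(Toy illustration of the rule-caching constraint discussed in the abstract above, not the speaker's algorithm: a rule can be safely cached only together with every overlapping higher-priority rule, so a greedy heuristic must pull in those dependencies; the flow/weight encoding is hypothetical:)

```python
def dependencies(i, rules):
    """Higher-priority rules whose match sets overlap rule i's, closed
    transitively. Rules are listed in priority order, index 0 highest."""
    needed, stack = set(), [i]
    while stack:
        r = stack.pop()
        for j in range(r):
            if j not in needed and rules[j][0] & rules[r][0]:
                needed.add(j)
                stack.append(j)
    return needed

def greedy_rule_cache(rules, capacity):
    """Greedy heuristic: cache heavy-traffic rules first, pulling in every
    overlapping higher-priority rule so cached lookups stay correct."""
    cached = set()
    for i in sorted(range(len(rules)), key=lambda i: -rules[i][1]):
        group = ({i} | dependencies(i, rules)) - cached
        if len(cached) + len(group) <= capacity:
            cached |= group
    return sorted(cached)

# (match set of flows, traffic weight), highest priority first
rules = [({"f1", "f2"}, 10), ({"f2", "f3"}, 50), ({"f4"}, 30)]
print(greedy_rule_cache(rules, capacity=2))   # -> [0, 1]
```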