Tentative schedule:
All talks will take place in Hodson Hall, auditorium 110 (on the map). The poster session will take place in the lobby on the second floor. Breakout rooms: 301, 305, and 313 on Thursday; 301, 313, and 315 on Friday and Saturday. Internet access is provided through the open WiFi hotspot "JHGuestnet".
Abstracts:
Local Algorithms for Sparse Spanning Graphs
Ronitt Rubinfeld, Massachusetts Institute of Technology, Tel Aviv University
Constructing a spanning tree of a graph is one of the most basic tasks in graph theory.
Motivated by several recent studies of local graph algorithms, we consider the following variant
of this problem. Let G be a connected bounded-degree graph. Given an edge e in G we would
like to decide whether e belongs to a connected subgraph G' consisting of (1 + eps)n edges (for
a prespecified constant eps in (0, 1)), where the decision for different edges should be consistent
with the same subgraph G'. How many queries to G are needed for this task?
We present some answers and some questions.
Joint work with Reut Levi, Guy Moshkovitz, Dana Ron, and Asaf Shapira.
Testing Dynamic Environments
Oded Goldreich, Weizmann Institute of Science
We initiate a study of learning and testing dynamic environments,
focusing on environments that evolve according to a fixed local rule.
The (proper) learning task consists of obtaining the initial configuration
of the environment, whereas for non-proper learning it suffices to predict
its future values. The testing task consists of checking whether
the environment has indeed evolved from some initial configuration
according to the known evolution rule.
We focus on the temporal aspect of these computational problems,
which is reflected in two requirements:
(1) It is not possible to "go back to the past"
and make a query concerning the environment at time t after making
a query at time t' > t;
(2) Only a small portion of the environment
is inspected in each time slot.
We present some general observations, an extensive study of two special cases,
two separation results, and a host of open problems.
The two special cases that we study refer to linear rules of evolution
and to rules of evolution that represent simple movement of objects.
Specifically, we show that evolution according to any linear rule
can be tested within a total number of queries that is sublinear
in the size of the environment, and that evolution according to
a simple one-dimensional movement can be tested within a total number
of queries that is independent of the size of the environment.
In my talk, I plan to focus on the general observations
and state only the results for the aforementioned two special cases.
Approximately counting triangles in sublinear time
Dana Ron, Tel Aviv University
We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a sublinear-time algorithm for approximating the number of triangles in a graph, and prove a matching lower bound.
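For scale, the baseline that a sublinear-time estimator competes against is exact counting, which must read every edge. A minimal sketch of that baseline (function and variable names are ours, not the paper's):

```python
from itertools import combinations

def count_triangles(adj):
    """Exact triangle count by intersecting neighbor sets.

    adj: dict mapping each vertex to its set of neighbors (undirected).
    This reads the entire graph -- precisely the cost a sublinear-time
    approximation algorithm is designed to avoid.
    """
    count = 0
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:                    # (u, v) is an edge
            count += len(adj[u] & adj[v])  # triangles through this edge
    return count // 3                      # each triangle is counted once per edge

# K4 has C(4,3) = 4 triangles; a path has none.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
path = {0: {1}, 1: {0, 2}, 2: {1}}
```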
The Latest on Linear Graph Sketching: Matchings, Densest Subgraphs, and Vertex Connectivity
Andrew McGregor, University of Massachusetts at Amherst
In this talk, we survey recent work on using random linear
projections, a.k.a. sketches, to solve graph problems. Sketches are
useful in a variety of computational models including the dynamic
graph stream model, where the input is defined by a stream of edge
insertions and deletions that need to be processed in small space. A
large number of problems have now been considered in this model
including edge and vertex connectivity, sparsification, densest
subgraph, correlation clustering, vertex cover and matching.
Parameterized and Promised Streaming: Making Big Data More Accessible
MohammadTaghi HajiAghayi, University of Maryland at College Park
Big networks are constantly growing in both size and relevance: from social networks such as Google+, Facebook, and Twitter, to brain networks, gene regulatory networks, and health/disease networks. The traditional approach to analyzing such big datasets is to use powerful supercomputers (clusters), ideally large enough to store the data in main memory. The downsides to this approach are that many potential users of big data lack such powerful computational resources (e.g. point-of-sale Bitcoin blockchain analysis), and it can be difficult to solve unexpected problems within such a large infrastructure (e.g. image analysis after the Boston Marathon Bombing). Our main goal here is to enable the processing of huge datasets on computational devices with a limited amount of fast memory, connected to a relatively slow external data source. In particular, in this talk we introduce two new approaches, parameterized streaming and promised streaming, for optimization problems (often NP-hard) over dynamic graph streams in which edges can be both inserted and deleted.
Algorithms for Querying Noisy Distributed/Streaming Datasets
Qin Zhang, Indiana University Bloomington
This talk concerns the following question: can we run distributed and streaming algorithms directly on the noisy datasets, resolve the noise "on the fly", and retain communication/space efficiency compared with the noise-free setting? Here 'noisy' means that a real-world entity may appear in different forms in different datasets, but those variants should be considered as the same universe element when performing statistical estimations. We give a first set of communication/space-efficient solutions for statistical estimations on distributed/streaming noisy datasets.
Distributed Submodular Maximization in MapReduce
Huy L Nguyen, Toyota Technological Institute at Chicago
A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. A lot of recent effort has been devoted to developing distributed algorithms for these problems. However, these results suffer from a high number of rounds, suboptimal approximation ratios, or both. We develop a framework for bringing existing algorithms in the sequential setting to the distributed setting, achieving near-optimal approximation ratios for many settings in only a constant number of MapReduce rounds.
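For context, the sequential algorithm such distributed frameworks aim to emulate is the classic greedy, shown here on maximum coverage, a canonical monotone submodular maximization problem under a cardinality constraint. This is an illustrative sketch (names and data are ours), not the paper's distributed algorithm:

```python
def greedy_max_coverage(sets, k):
    """Sequential greedy: repeatedly pick the set with the largest
    marginal gain.  For monotone submodular objectives under a
    cardinality constraint this achieves a (1 - 1/e) approximation."""
    covered, chosen = set(), []
    for _ in range(k):
        best = max(sets, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no set adds any marginal coverage
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Toy instance: greedy first takes "c" (gain 4), then "a" (gain 3).
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
chosen, covered = greedy_max_coverage(sets, 2)
```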
High-rate locally-correctable and locally-testable codes with sub-polynomial query complexity
Noga Ron-Zewi, Princeton, Rutgers
Locally-correctable codes (LCCs) and locally-testable codes (LTCs) are special families of error-correcting codes that admit highly efficient sub-linear time algorithms for error correction and detection, respectively, while making a small number of queries to the received word.
LCCs and LTCs were originally studied in complexity theory because of their close relationship to program checking and probabilistically-checkable proofs (PCPs), but in recent years there has been a new incentive for their study due to their potential use for massive data transmission and storage purposes and the successful implementation of related families of local codes in cloud storage systems.
We show constructions of LCCs and LTCs with constant rate, constant relative distance, and sub-polynomial number of queries and running time of the form $exp(sqrt{log n})$ for length n codewords. Prior to our work, such codes were known to exist only with a polynomial number of queries and running time of the form $n^{beta}$ (for a constant $beta>0$), and there were several, quite different, constructions known.
Along the way, we also construct LCCs and LTCs over large (but constant) size alphabets, with the same number of queries and running time of $exp(sqrt{log n})$, which additionally have the property of approaching the Singleton bound: they have almost the best-possible relationship between their rate and distance. Such a result was previously not known for any sub-linear number of queries and running time.
If time permits I will also discuss a more recent work that further improves the running time and number of queries of LTCs to a quasi-polylogarithmic function of the form $(log n)^{O(log log n)}$.
Joint work with Swastik Kopparty, Or Meir and Shubhangi Saraf
Streaming Set Cover
Amit Chakrabarti, Dartmouth
Given a stream of m subsets of the universe {1,...,n}, with m >> n, the Set
Cover problem asks for a small subcollection of these subsets that covers the
universe. Although this is a classic optimisation problem, its space complexity
in such a streaming setting has only recently begun to be understood.
We focus on semi-streaming algorithms: those running in space O(n poly(log n)).
We give tight upper and lower bounds that fully resolve the pass/approximation
tradeoffs achievable: in short, p semi-streaming passes lead to an approximation
ratio of n^{1/(p+1)}. The lower bound uses a novel algebraic construction of an
abstract incidence geometry, which may be interesting in its own right as a
combinatorial result.
Joint work with Tony Wirth.
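One simple multi-pass scheme in the semi-streaming spirit (an illustrative sketch only, not the paper's algorithm): lower an admission threshold geometrically across passes, taking any streamed set whose marginal coverage meets the current threshold; the final pass has threshold 1, so the output is always a cover.

```python
def multipass_set_cover(sets, universe, passes):
    """Thresholded multi-pass greedy for Set Cover.  Each outer
    iteration is one streaming pass over the sets; only the set of
    uncovered elements and the chosen sets are kept in memory."""
    uncovered = set(universe)
    cover = []
    n = len(universe)
    for j in range(passes, 0, -1):
        tau = n ** ((j - 1) / passes)  # thresholds n^{(p-1)/p}, ..., n^0 = 1
        for s in sets:                 # one pass over the streamed sets
            if len(s & uncovered) >= tau:
                cover.append(s)
                uncovered -= s
        if not uncovered:
            break
    return cover, uncovered

universe = set(range(9))
stream = [{0, 1, 2}, {3, 4}, {0, 1, 2, 3, 4, 5}, {6, 7, 8}, {5}]
cover, leftover = multipass_set_cover(stream, universe, 2)
```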
New Algorithms for Testing Monotonicity
Eric Blais, University of Waterloo
Monotonicity testing asks: how many queries to an unknown Boolean function f must we make to distinguish monotone functions from functions that are far from monotone? Algorithms for monotonicity testing have typically been very simple: select a pair of comparable inputs x <= y according to some appropriate distribution, verify that f(x) <= f(y), and repeat. (The analysis of these algorithms, however, is far from simple, and indeed has led to fundamental new structural insights about monotonicity and Boolean functions.)
In this talk, we will explore two fundamentally different ways to design algorithms for testing monotonicity. We will survey how the analysis of these algorithms leads to new results, to connections with the analysis of pair testers, and to new questions. This is joint work with Alexander Belov.
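The pair tester described above, specialized to hypercube edges (comparable pairs differing in one coordinate), can be sketched as follows; the function names are ours:

```python
import random

def edge_tester(f, n, trials, rng):
    """Pair tester for monotonicity on {0,1}^n: sample a random edge
    (x with coordinate i set to 0, x with coordinate i set to 1) and
    check that f does not decrease along it.  Accepts every monotone f;
    rejects far-from-monotone f with probability growing in `trials`."""
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        i = rng.randrange(n)
        lo, hi = x[:], x[:]
        lo[i], hi[i] = 0, 1              # comparable pair: lo <= hi coordinatewise
        if f(tuple(lo)) > f(tuple(hi)):
            return False                 # violated edge certifies non-monotonicity
    return True

majority = lambda x: int(sum(x) >= 2)   # monotone
anti = lambda x: 1 - x[0]               # decreasing in coordinate 0
```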
A New Approach for Distribution Testing
Ilias Diakonikolas, University of Edinburgh
We study problems in distribution property testing:
Given sample access to one or more unknown discrete distributions,
we want to determine whether they have some global property or are $\eps$-far
from having the property in $\ell_1$ distance.
We provide a simple and general approach to obtain upper bounds in this setting,
by reducing $\ell_1$-testing to $\ell_2$-testing.
Our reduction yields optimal $\ell_1$-testers, by using a standard $\ell_2$-tester as a black-box.
Using our framework, we obtain sample-optimal and computationally efficient estimators for
a wide variety of $\ell_1$ distribution testing problems, including the following: identity testing to a fixed distribution,
closeness testing between two unknown distributions (with equal/unequal sample sizes),
independence testing (in any number of dimensions), closeness testing for collections of distributions, and testing
$k$-flatness. For several of these problems, we give the first optimal tester in the literature.
Moreover, our estimators are significantly simpler to state and analyze compared to previous approaches.
As our second main contribution, we provide a direct general approach for proving distribution testing lower bounds,
by bounding the mutual information. Our lower bound approach is not restricted to symmetric properties,
and we use it to prove tight lower bounds for all the aforementioned problems.
(Joint work with Daniel Kane.)
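The $\ell_1$-to-$\ell_2$ reduction ultimately rests on Cauchy-Schwarz: over a support of size $n$, $\|p-q\|_1 \le \sqrt{n}\,\|p-q\|_2$, so an $\ell_2$-tester run with a rescaled parameter yields an $\ell_1$-tester. A toy numerical check of this inequality (not the paper's tester):

```python
import math

def l1_dist(p, q):
    """l1 (total variation, up to a factor 2) distance between vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def l2_dist(p, q):
    """Euclidean distance between vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Support size n = 4: the l1 distance is 0.6, and it is bounded by
# sqrt(n) times the l2 distance, as Cauchy-Schwarz guarantees.
p = [0.5, 0.3, 0.1, 0.1]
q = [0.25, 0.25, 0.25, 0.25]
```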
Towards Resistance Sparsifiers
Michael Dinitz, Johns Hopkins University
We study resistance sparsification of graphs, in which the goal is to find a sparse subgraph (with reweighted edges) that approximately preserves the effective resistances between every pair of nodes. We show that every dense regular expander admits a (1+eps)-resistance sparsifier of size O~(n/eps), and conjecture this bound holds for all graphs on n nodes. In comparison, spectral sparsification is a strictly stronger notion and requires Ω(n/eps^2) edges even on the complete graph.
Our approach leads to the following structural question on graphs: Does every dense regular expander contain a sparse regular expander as a subgraph? Our main technical contribution, which may be of independent interest, is a positive answer to this question in a certain setting of parameters. Combining this with a recent result of von Luxburg, Radl, and Hein (JMLR, 2014) leads to the aforementioned resistance sparsifiers.
Joint work with Robert Krauthgamer and Tal Wagner
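For concreteness, the quantity being preserved is standard: the effective resistance between u and v is the potential difference induced by injecting one unit of current at u and extracting it at v, obtained by solving a Laplacian linear system. A toy computation of this quantity (the function and the choice of solver are ours; this is not a sparsification algorithm):

```python
def effective_resistance(edges, n, u, v):
    """Effective resistance between u and v in an unweighted graph on
    vertices 0..n-1: ground node n-1 and solve the reduced Laplacian
    system by Gaussian elimination (fine for toy sizes)."""
    # Build the Laplacian L = D - A.
    L = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        L[a][a] += 1.0; L[b][b] += 1.0
        L[a][b] -= 1.0; L[b][a] -= 1.0
    # Reduced augmented system: drop grounded node, right-hand side is
    # the current vector (+1 at u, -1 at v).
    m = n - 1
    A = [L[r][:m] + [0.0] for r in range(m)]
    if u < m: A[u][m] += 1.0
    if v < m: A[v][m] -= 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, m):
            if A[r][col] == 0.0:
                continue
            factor = A[r][col] / A[col][col]
            for c in range(col, m + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution; the grounded node keeps potential 0.
    x = [0.0] * n
    for r in range(m - 1, -1, -1):
        s = A[r][m] - sum(A[r][c] * x[c] for c in range(r + 1, m))
        x[r] = s / A[r][r]
    return x[u] - x[v]
```

Sanity checks match the series/parallel rules: a path of two unit resistors has resistance 2, and adjacent nodes of a 4-cycle see 1 || 3 = 3/4.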
Recovering structured probability matrices
Gregory Valiant, Stanford
We consider the problem of accurately recovering an M x M matrix B, which represents a probability distribution over M^2 outcomes,
given access to an observed matrix of "counts" generated by taking
independent samples from the distribution B. How can structural
properties of the underlying matrix B be leveraged to yield
computationally efficient and information theoretically optimal
reconstruction algorithms? When can accurate reconstruction be
accomplished in the sparse data regime? This basic problem lies at the
core of a number of questions that are currently being considered by
different communities, including community detection in graphs,
learning structured models such as topic models or hidden Markov
models, and the efforts from the natural language processing community
to compute "word embeddings". Many aspects of this problem - both in
terms of learning and property testing/estimation and on both the
algorithmic and information theoretic sides - remain open. We will
present some recent progress on this question, and discuss several
intriguing open directions related to achieving computationally
efficient and information theoretically near-optimal algorithms for
extracting structure underlying such count matrices.
Contributed Talks Abstracts:
Streaming Symmetric Norms via Measure Concentration
Lin Yang, Johns Hopkins University
We characterize the streaming space complexity of every symmetric norm l (a norm on R^n invariant under sign-flips and coordinate permutations) by relating this space complexity to the measure-concentration characteristics of l. Specifically, we provide upper and lower bounds on the space complexity of approximating the norm of the stream, where both bounds depend on the median of l(x) when x is drawn uniformly from the l2 unit sphere. The same quantity governs many phenomena in high-dimensional spaces, such as large-deviation bounds and the critical dimension in Dvoretzky's Theorem. The family of symmetric norms contains several well-studied norms, such as all lp norms, and indeed we provide a new explanation for the disparity in space complexity between p ≤ 2 and p > 2. In addition, we apply our general results to easily derive bounds for several norms that were not studied before in the streaming model, including for example the top-k norm and the k-support norm, which was recently shown to be effective for machine learning tasks. Overall, these results make progress on two outstanding problems in the area of sublinear algorithms (Problems 5 and 30 in http://sublinear.info).
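The governing quantity can be estimated directly: draw x uniformly from the l2 unit sphere (normalize a standard Gaussian vector) and take the median of l(x). A Monte Carlo sketch, with two illustrative norms (all names are ours):

```python
import math
import random
import statistics

def median_on_sphere(norm, n, samples, rng):
    """Estimate the median of norm(x) for x uniform on the l2 unit
    sphere in R^n.  Normalizing a standard Gaussian vector gives a
    uniformly distributed point on the sphere."""
    vals = []
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        r = math.sqrt(sum(t * t for t in g))
        vals.append(norm([t / r for t in g]))
    return statistics.median(vals)

l2_norm = lambda x: math.sqrt(sum(t * t for t in x))  # median is exactly 1
linf_norm = lambda x: max(abs(t) for t in x)          # median is o(1) for large n
```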
L-p testing
Grigory Yaroslavtsev, University of Pennsylvania
I will introduce a framework for systematic studies of sublinear algorithms for approximately testing properties of real-valued data with respect to Lp distances for p >= 1. Such algorithms distinguish datasets which either have (or are close to having) a certain property from datasets which are far from having it with respect to Lp distance. For applications involving noisy real-valued data, using Lp distances allows algorithms to withstand noise of bounded Lp norm. While the classical property testing framework developed with respect to Hamming distance has been studied extensively, testing with respect to Lp distances has received little attention.
We use our framework to design simple and fast algorithms for classic problems, such as testing monotonicity, convexity and the Lipschitz property, and also distance approximation to monotonicity. In particular, for functions over the hypergrid domains [n]^d , the complexity of our algorithms for all these properties does not depend on the linear dimension n. I will also describe applications of our Lp-testing algorithms to testing assumptions used in the analysis of gradient descent methods for convex optimization and suggest multiple directions for future work.
Joint work with Piotr Berman and Sofya Raskhodnikova (STOC'14)
k-means for Streaming and Distributed Big Sparse Data
Artem Barger, University of Haifa
We provide the first streaming algorithm for computing a provable approximation to the k-means of sparse Big Data. Here, sparse Big Data is a set of n vectors in R^d, where each vector has O(1) non-zero entries and d > n; e.g., the adjacency matrix of a graph, or web-link, social-network, document-term, or image-feature matrices.
Our streaming algorithm stores at most log(n)*k^{O(1)} input points in memory. If the stream is distributed among M machines, the running time reduces by a factor of M, while communicating a total of Mk^{O(1)} (sparse) input points between the machines.
Differentially-private linear algebra in the Streaming Model
Jalaj Kumar Upadhyay, Pennsylvania State University
In this talk, we show how to perform linear algebraic tasks while preserving privacy when the data is streamed online. We give a new construction of a fast Johnson-Lindenstrauss transform that preserves privacy, and use it to generate a private sketch of the streamed data. We give space-optimal (up to logarithmic factors) data structures that can compute low-rank approximation, linear regression, and matrix multiplication, while preserving differential privacy.
Testing Pattern-Freeness
Daniel Reichman
We consider the problem of testing pattern-freeness: given a string $I$ and a fixed pattern $J$ over a finite alphabet $\Sigma$, decide whether $I$ has no occurrence of $J$, or whether one has to modify $I$ in at least an $\epsilon$-fraction of its locations to obtain a $J$-free string. The two-dimensional analog, with 2D images and patterns, is studied as well.
We show that, other than for a small number of specific patterns, there is a one-sided tester for this problem whose query complexity is $2k/\epsilon$ for patterns of size $k$. Our algorithm is very simple, and its query complexity improves on the known bounds of current sublinear algorithms for this problem. We also show a lower bound, proving that any tester erring with probability at most $1/3$ must make $\Omega(k)$ queries to $I$.
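A concrete handle on the distance being tested: every occurrence of $J$ must be modified somewhere, so the minimum number of positions stabbing all occurrence intervals lower-bounds the distance to $J$-freeness (only a lower bound in general, since a modification can create new occurrences). A sketch, with names of our choosing:

```python
def occurrence_intervals(I, J):
    """All intervals [i, i+k-1] where pattern J occurs in string I."""
    k = len(J)
    return [(i, i + k - 1) for i in range(len(I) - k + 1) if I[i:i + k] == J]

def min_hits(intervals):
    """Minimum number of positions stabbing every interval.  Greedy by
    right endpoint is optimal for interval stabbing: take the right end
    of the first unstabbed interval, skip everything it covers."""
    hits, last = 0, -1
    for lo, hi in sorted(intervals, key=lambda iv: iv[1]):
        if lo > last:
            hits += 1
            last = hi
    return hits
```

For example, making "aaaa" free of "aa" requires changing at least 2 positions (and 2 suffice here, e.g. "abab").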
The space complexity of streaming sums
Stephen R. Chestnut, ETH Zurich
A central problem in the theory of algorithms for data streams is to determine which functions on a stream can be approximated in sublinear, and especially poly-logarithmic, space. Given a function g, we study the space complexity of approximating sum_{i=1}^n g(|f_i|), where f in Z^n is the frequency vector of a turnstile stream. This is a generalization of the well-known frequency moments problem, and previous results apply only when g is monotonic or has a special functional form. Our contribution is to give a condition such that, except for a narrow class of functions g, there is a space-efficient approximation algorithm for the sum if and only if g satisfies the condition. The functions g that we are able to characterize include all convex, concave, monotonic, polynomial, and trigonometric functions, among many others, and ours is the first such characterization for non-monotonic functions. This is joint work with Vladimir Braverman, Lin F. Yang, and David P. Woodruff.
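The exact, linear-space benchmark that the space-efficient characterization is measured against is straightforward to state in code; here g(0) is taken to be 0, as in the frequency-moments convention (names are ours):

```python
from collections import defaultdict

def stream_g_sum(updates, g):
    """Exact evaluation of sum_i g(|f_i|) over a turnstile stream of
    (item, delta) updates.  Uses space linear in the number of distinct
    items -- the benchmark a sublinear-space sketch must approximate."""
    f = defaultdict(int)
    for item, delta in updates:  # turnstile: deltas may be negative
        f[item] += delta
    return sum(g(abs(c)) for c in f.values() if c != 0)

# Final frequencies: a -> 2, b -> 0, c -> 2.
updates = [("a", 3), ("b", 1), ("a", -1), ("c", 2), ("b", -1)]
```

With g(x) = x^2 this is the second frequency moment F2; with g(x) = x it is F1.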
Approximating Functions over Frequencies and Implicit Matrices in Small Space
Alan Roytman, Tel Aviv University
As the amount of data being generated continues to grow at a staggering rate, streaming algorithms are increasingly becoming more important as a practical tool to analyze and make sense of all the information. In practice, such applications generate vast amounts of data in a very short period of time, and hence it is infeasible to store everything. This presents a pressing question: when is it possible to summarize data while still providing approximate solutions with good theoretical guarantees?
We study this question in the context of approximating functions over frequencies and implicit matrices. For frequency-based functions, we give a complete characterization of when the sum of such functions over frequencies of elements can be approximated using small space in the sliding window model. In addition, for a broad class of functions, we formalize the notion of universality for streaming over sliding windows, and construct a universal algorithm that can simultaneously approximate any such function from the class without knowing the function to be approximated in advance.
We also give general techniques for approximating functions over implicit matrices, which we describe as matrices generated via the outer product of two vectors presented as a stream. Our results automatically yield improved space bounds for the problem of identifying correlations in data streams, in which we are given a stream of pairs of integers whose frequencies induce a joint distribution and marginal distributions, and we would like to measure how close the joint distribution is to the product of the marginal distributions.
Local Computation Algorithms - a short survey
Shai Vardi, Weizmann Institute of Science
Local computation algorithms (LCAs) produce small parts of a single solution to a given search problem in sublinear time. Consider, for example, a massive graph on which we would like to compute a maximal independent set (MIS), but never need the entire solution at any given time. Instead, we are occasionally queried about whether certain vertices are in the MIS. An LCA would allow us to reply to these queries consistently (i.e., if the LCA is queried on all of the vertices the replies would conform to a coherent solution), using sublinear time and space. I will give a short overview of some recent results on LCAs.
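One simple construction in the spirit of this survey (a well-known random-ranks scheme; the function names here are illustrative): assign each vertex a random rank, and declare v in the MIS iff no lower-ranked neighbor is. Queries recurse only to lower-ranked neighbors, so all answers are mutually consistent by construction.

```python
import random

def make_mis_lca(adj, seed=0):
    """Query-access LCA for a maximal independent set.  adj maps each
    vertex to its neighbor list.  in_mis(v) is True iff every neighbor
    of v with smaller rank is (recursively) outside the MIS.  Recursion
    only descends in rank, so it terminates; memoization keeps repeated
    queries cheap and consistent."""
    rng = random.Random(seed)
    rank = {v: rng.random() for v in adj}
    memo = {}
    def in_mis(v):
        if v not in memo:
            memo[v] = all(not in_mis(u) for u in adj[v] if rank[u] < rank[v])
        return memo[v]
    return in_mis

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
query = make_mis_lca(adj)
mis = {v for v in adj if query(v)}
```

Whatever the random ranks, the answered set is independent (two adjacent MIS vertices would contradict the lower-ranked one's membership) and maximal (a vertex outside the MIS has a lower-ranked neighbor inside it).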
Streaming Property Testing of Visibly Pushdown Languages
Nathanael Francois, TU Dortmund, Germany
In the context of language recognition, we demonstrate the superiority of streaming property testers over streaming algorithms and property testers used in isolation. Initiated by Feigenbaum et al., a streaming property tester is a streaming algorithm recognizing a language under the property-testing approximation: it must distinguish inputs in the language from those that are ε-far from it, while using the smallest possible memory (rather than limiting its number of input queries).
Our main result is a streaming ε-property tester for visibly pushdown languages (VPL) with one-sided error, using memory space poly((log n)/ε).
This construction relies on a (non-streaming) property tester for weighted regular languages, based on a previous tester by Alon et al. We provide a simple application of this tester for streaming-testing special cases of VPL instances that are already hard for both streaming algorithms and property testers.
Our main algorithm combines an original simulation of visibly pushdown automata, using a stack of small height whose items may be of linear size, with a second step in which those items are replaced by small sketches. These sketches rely on a notion of suffix sampling that we introduce; this sampling is the key idea connecting our streaming tester to property testers.