We typically have seminars on Wednesday at noon in Malone 228.  All seminar announcements will be sent to the theory mailing list.

Lisa Zhang
May 13 @ 12:00 pm – 1:00 pm

Speaker: Lisa Zhang
Affiliation: Alcatel-Lucent Bell Labs

Title: Analysis of k-Anonymity Algorithms for Streaming Location Data

We propose and analyze algorithms to achieve k-anonymity for streaming location data. We consider a framework motivated by European Union privacy requirements, in which location information arrives online into a buffer of fixed size m. Whenever the buffer fills, some data must move to permanent storage in a k-anonymized fashion. This notion of anonymity refers to recording a coarse common region containing at least k points instead of separate exact locations. One primary goal is to minimize the recorded region size so that the anonymized location data is as accurate as possible.

We observe that under competitive analysis, any online algorithm can be arbitrarily bad in terms of the recorded region size. We therefore assume a more benign model in which the location distribution is known. For a uniform distribution, we analyze a simple, natural algorithm that partitions the space into m/k identical regions to ensure k-anonymity, and picks the region with the largest occupancy whenever the buffer fills. Our detailed analysis shows
that the largest occupancy converges to 2k. This implies, perhaps somewhat unintuitively, that it is sufficient to achieve k-anonymity by partitioning space into $2m/k$ regions, which reduces and thereby improves the recorded region size by a factor of 2. We also present an almost matching lower bound of 2m/k. Finally, we discuss generalizations to nonuniform distributions by partitioning the space to match the given distribution.

Sanjeev Khanna
Jun 10 @ 12:00 pm – 1:00 pm

Speaker: Sanjeev Khanna
Affiliation: University of Pennsylvania

Title: Tight Bounds for Linear Sketches of Approximate Matchings

We consider the problem of approximating a maximum matching in dynamic graph streams where the stream may include both edge insertions and deletions. Our main result is a resolution of the space complexity of linear sketches for approximating the maximum matching in this model.

Specifically, we show that for any $\eps > 0$, there exists a single-pass streaming algorithm, which only maintains a linear sketch of size roughly $n^{2-3\eps}$ bits and recovers an $n^\epsilon$-approximate maximum matching in dynamic graph streams, where $n$ is the number of vertices in the graph. We complement this result with the following lower bound result: any linear sketch for approximating the maximum matching to within a factor of $n^\eps$ has to be of size at least $n^{2-3\eps -o(1)}$ bits.

This is based on joint work with Sepehr Assadi, Yang Li, and Grigory Yaroslavtsev.

[Theory Seminar] Harry Lang
Sep 2 @ 12:00 pm – 1:00 pm


Title: A New Algorithm for Accurate and Low-Space k-Median Clustering on Data Streams
Abstract: The k-median problem for insertion-only data streams is an active area of research.  In 2003, Charikar et al provided the first poly(k, log n)-space constant-approximation, albeit with a massive constant (~5500).  In the current work, we introduce a new technique that finds a low-constant approximation (~40) without requiring any additional storage.
Short Bio: Harry Lang is a doctoral student in the mathematics department at JHU.  He studies algebraic topology, and in particular computational methods for detecting topological structure of massive data sets.  In computer science, he works on streaming algorithms for clustering with Professor Vladimir Braverman.
This talk will be given remotely.


Hossein Esfandiari
Sep 30 @ 12:00 pm – 1:00 pm

Details and abstract will be added when available.

[Theory Seminar] Hossein Esfandiari
Sep 30 @ 12:00 pm – 1:00 pm



Streaming Algorithms for Estimating the Matching Size in Planar Graphs and Beyond.


We consider the problem of estimating the size of a maximum matching when the edges are revealed in a streaming fashion. Consider a graph G=(V,E) with n vertices and m edges. The input stream is a permutation of edges S= (e_1,…,e_m) chosen by an adversary.
The goal is to output an estimation of the size of a maximum matching. The algorithm is only allowed to use a small amount of memory (much smaller than n).

When the underlying graph is planar, we present a simple and elegant streaming algorithm that with high probability estimates the size of a maximum matching within a constant factor using O-tilde(n^(2/3)) space. The approach generalizes to the family of graphs that have bounded arboricity. Graphs with bounded arboricity include, among other families of graphs, graphs with an excluded constant-size minor. To the best of our knowledge, this is the first result for estimating the size of a maximum matching in the adversarial-order streaming model (as opposed to the random-order streaming model). We circumvent the barriers inherent in the adversarial-order model by exploiting several structural properties of planar graphs, and more generally, graphs with bounded arboricity. We hope that this approach finds applications in estimating other properties of graphs in the adversarial-order streaming model. We further reduce the required memory size to O-tilde(sqrt(n)) for three restricted settings: (i) when the underlying graph is a forest; (ii) when we have 2-passes over the stream of edges of a graph with bounded arboricity; and (iii) when the edges arrive in random order and the underlying graph has bounded arboricity.

Finally, by introducing a communication complexity problem, we show that the approximation factor of a deterministic algorithm cannot be better than a constant using o(n) space, even if the underlying graph is a collection of paths. We can show that under a plausible conjecture for the hardness of the communication complexity problem, randomized algorithms with o(sqrt(n)) space cannot have an approximation factor better than a fixed constant.

[Theory Seminar] Sofya Raskhodnikova
Oct 7 @ 12:00 pm – 1:00 pm
Title: Fast Algorithms for Testing Geometric Properties
Speaker: Sofya Raskhodnikova


How quickly can we determine if an object satisfies some basic geometric property? For example, is the object a half-plane? Is it convex? Is it connected? If we need to answer such a question exactly, it requires at least as much time as it takes to read the object. In this talk, we will focus on approximate versions of these questions and will discuss how to solve them in time that depends only on the approximation parameter, but not the size of the input.

Specifically, an algorithm is given access to a discretized image represented by an n x n matrix of 0/1 pixel values. Another input to the algorithm is an approximation parameter, epsilon. The algorithm is required to accept images with the desired property and reject (with high probability) images that are far from having the desired property. An image is far if at least an epsilon fraction of its pixels has to be changed to get an image with the required property. For example, in this model, if the algorithm is allowed to read pixels of its choice, the half-plane property and convexity can be tested in time O(1/epsilon). If the algorithm receives access to pixels chosen uniformly and independently at random, then the half-plane property still takes O(1/epsilon) time, but for convexity the (optimal) bound on the running time is O(1/epsilon^(4/3)).

Based on joint work with Piotr Berman and Meiram Murzabulatov.


Sofya Raskhodnikova is an associate professor of Computer Science and Engineering at Penn State. Her research interests include sublinear-time algorithms (in particular, property testing), private data analysis, approximation algorithms, and randomized algorithms. She got her PhD from MIT in 2003. From the fall of 2003 to 2006, she worked at the Hebrew University of Jerusalem, the Weizmann Institute of Science and the Institute for Pure and Applied Mathematics. In 2013–2014, she was on sabbatical leave at Boston University for a special Privacy Year and also participated in thePrivacy Tools project at Harvard University in Spring 2014.
[Theory Seminar] Sofya Raskhodnikova
Oct 7 @ 12:00 pm – 1:00 pm

Details and abstract will be added when available.

[Theory Seminar] Gal Shahaf
Nov 4 @ 12:00 pm – 1:00 pm

Title: Inapproximability of Truthful Mechanisms via Generalizations of the VC Dimension
Speaker: Gal Shahaf
Affiliation: The Hebrew University of Jerusalem

Algorithmic mechanism design (AMD) studies the delicate interplay between computational efficiency, truthfulness, and economic optimality. We focus on AMD’s paradigmatic problem: combinatorial auctions, and present new inapproximability results for truthful mechanisms in this scenario. Our main technique is a generalization of the classical VC dimension and the corresponding Sauer-Shelah Lemma.

Joint work with Amit Daniely and Michael Schapira

The talk is designed to be accessible to M.Sc. students, and includes an elementary introduction to VC dimension, combinatorial auctions and VCG mechanisms.

[Theory Seminar] Ilya Razenshteyn
Nov 9 @ 12:00 pm – 1:00 pm

SPEAKER: Ilya Razenshteyn (MIT)

TITLE: Sketching and Embedding are Equivalent for Norms

ABSTRACT: Imagine the following communication task. Alice and Bob each have a point from a metric space. They want to transmit a few bits and decide whether their points are close to each other or are far apart. Of particular interest are sketching protocols: Alice and Bob both compute short summaries of their inputs and then a referee, given these summaries, makes the decision; sketches are very useful for the nearest neighbor search, streaming, randomized linear algebra etc. Indyk (FOCS 2000) showed that for the l_p spaces with 0 < p <= 2 the above problem allows a very efficient sketching protocol. Consequently, any metric that can be mapped into the l_p space with all the distances being approximately preserved has a good protocol as well. 

I will show that for normed spaces (a very important class of metric spaces) embedding into l_p is the only possible technique for solving the communication problem. Slightly more formally, we show that any normed space that admits a good communication (in particular, sketching) protocol for distinguishing close and far pairs of points embeds well into l_p with p being close to 1. The proof uses tools from communication complexity and functional analysis.

 As a corollary, we will show communication lower bounds for the planar Earth Mover’s Distance (minimum-cost matching metric) and for the trace norm (the sum of the singular values of a matrix) by deriving them from the (known) non-embeddability theorems and (the contrapositive of) our result.


The talk is based on a joint paper with Alexandr Andoni and Robert Krauthgamer (arXiv:1411.2577).

[Theory Seminar] Zeyu Zhang
Nov 18 @ 12:00 pm – 1:00 pm
Approximating Low-Stretch Spanners
Despite significant recent progress on approximating graph spanners (subgraphs which approximately preserve distances), there are still several large gaps in our understanding. We give new results for two of them: approximating basic k-spanner (particularly for small k), and the dependence on f when approximating f-fault tolerant spanners.
We first design an Õ(n^(1/3))-approximation for 4-spanner (both basic and directed). This was the last value of k for which only an O(√n)-approximation was known for basic k-spanner, and thus implies that for any k the approximation ratio is at most Õ(n^(1/3)). For basic k-spanner, we also show an integrality gap for the natural flow-based LP (the main tool in almost all nontrivial spanner approximations) which nearly matches the trivial approximation of n^{\frac{1}{\lfloor (k+1)/2\rfloor}}.
For f-fault tolerant spanners, we show that in the small-stretch setting (k ∈ {3,4}) it is possible to entirely remove the dependence on f from the approximation ratio, at the cost of moving to bicriteria guarantees. The previous best dependence on f was either almost-linear (in the undirected setting) or exponential (in the directed setting for stretch 4).