Seminar

We typically have seminars on Wednesdays at noon in Malone 228.  All seminar announcements will be sent to the theory mailing list.

Sep
3
Wed
Welcome and Introductions
Sep 3 @ 12:00 pm – 1:00 pm
Sep
10
Wed
Rico Zenklusen
Sep 10 @ 12:00 pm – 1:00 pm

Speaker: Rico Zenklusen
Affiliation: ETH Zurich and Johns Hopkins University

Title: Multi-Budgeted Matchings via the Ham Sandwich Theorem

Abstract:
In many applications, one has to deal with multiple, partially conflicting constraints. In this talk, we consider a multi-objective variant of the maximum weight matching problem, which is a classical combinatorial optimization problem with numerous applications. A natural way to deal with several objectives is to turn all of the objectives but one into budget constraints. This leads to the multi-budgeted matching problem, which asks to find a maximum weight matching subject to k linear constraints with nonnegative coefficients. Whereas this problem can easily be shown to be NP hard even for k=1, I will present in this talk a polynomial-time approximation scheme that works for any constant k. Our algorithm is based on rounding an optimal solution x* of an LP relaxation. Starting with a convex decomposition of x* into few matchings, we reduce the problem of rounding x* to an arguably simpler problem of successively merging two matchings in the convex decomposition of x*.

To prove that our algorithm is correct, we leverage two beautiful non-constructive mathematical theorems. More precisely, the Jordan Curve Theorem gives a concise and intuitive proof why our algorithm works for k=1, and a result of Stromquist and Woodall that follows from the Ham Sandwich Theorem allows for showing correctness for any constant k.

Part of this work is joint with Fabrizio Grandoni, R. Ravi and Mohit Singh.

Sep
17
Wed
Rong Ge
Sep 17 @ 12:00 pm – 1:00 pm

Speaker: Rong Ge
Affiliation: Microsoft Research New England

Title: New Algorithms for Learning Incoherent and Overcomplete Dictionary

Abstract:

In sparse recovery we are given a matrix A (“the dictionary”) and a vector of the form AX where X is sparse. and the goal is to recover X. This is a central notion in signal processing, statistics and machine learning. But in applications such as
sparse coding, the dictionary A is unknown and has to be learned from random examples of the form Y = AX where X is drawn from an appropriate distribution — this is the dictionary learning problem. In most settings, A is overcomplete: it has more columns
than rows. This talk presents a polynomial-time algorithm for learning overcomplete dictionaries; Our algorithm applies to incoherent dictionaries which have been a central object of study since they were introduced in seminal work of Donoho and Huo.

Based on joint work with Sanjeev Arora, Tengyu Ma and Ankur Moitra.

Bio:

Rong Ge is currently a post-doc at Microsoft Research, New England. He received his Ph.D. in Princeton University, advised by Prof. Sanjeev Arora. His main research interest is in applying algorithm design techniques from theoretical computer science to
machine learning problems, with the hope of provable algorithms and better understanding of the machine learning models.

Sep
24
Wed
Aaron Roth
Sep 24 @ 12:00 pm – 1:00 pm

Title: Correctness Protection via Differential Privacy

Speaker: Aaron Roth
Affiliation: UPenn

Abstract:
False discovery is a growing problem in scientific research. Despite
sophisticated statistical techniques for controlling the false discovery rate
and related statistics designed to protect against spurious discoveries, there
is significant evidence that many
published scientific papers contain incorrect conclusions.

In this talk we consider the role that adaptivity has in this problem. A
fundamental disconnect between the theorems that control false discovery rate
and the practice of science is that the theorems assume a fixed collection of
hypotheses to be tested, selected non-adaptively before the data is gathered,
whereas science is by definition an
adaptive process, in which data is shared and re-used, while hypotheses are
generated after seeing the results of previous tests.

We note that false discovery cannot be prevented when a substantial number of
adaptive queries are made to the data, and data is used naively — i.e. when
queries are answered exactly with their empirical estimates on a given finite
data set. However we show that remarkably, there is a different way to evaluate
statistical queries on a data set that allows even an adaptive analyst to make
exponentially many queries to the data set, while guaranteeing that with high
probability, all of the conclusions he draws generalize to the underlying
distribution. This technique counter-intuitively involves actively perturbing
the answers given to the data analyst, using techniques developed for privacy
preservation — but in our application, the perturbations are added entirely to
increase the utility of the data.

Joint work with Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann
Pitassi, and Omer Reingold.

Oct
8
Wed
Guy Kortsarz
Oct 8 @ 12:00 pm – 1:00 pm

Speaker: Guy Kortsarz
Affiliation: Rutgers University-Camden

Title: What have we learned about cut expansion and density problems

Abstract: I will survey several problems related to the above subjects. Directed and undirected multicut.  For Directed multicut I will show the approximation algorithm algorithm of Gupta.  Conductance (and sparsest cut), overlapping and non overlapping clustering, the small set expansion conjecture and it equivalent to breaking the ratio of 2 for MINIMUM partial vertex cover problem, the densest subgraph problem and the dense k-subgraph problem.

Oct
22
Wed
Robert Krauthgamer
Oct 22 @ 12:00 pm – 1:00 pm

Speaker: Robert Krauthgamer
Affiliation: Weizmann Institute of Science

Title: The Sketching Complexity of Graph Cuts

Abstract:
We study the problem of sketching an input graph $G$ on $n$ vertices, so that given the sketch, one can estimate the weight (capacity) of any cut in the graph within a small approximation factor $1+\epsilon$. The classic cut-sparsifier construction of Benczur and Karger (STOC 1996) implies a sketch of size $\tilde O(n/\epsilon^2)$ [this notation hides logarithmic factors].

We show that the dependence on $\epsilon$ can be brought down to only linear, at the price of achieving a slightly relaxed guarantee. Specifically, we design a randomized scheme that produces from $G$ a sketch of size $\tilde O(n/\epsilon)$ bits, from which the weight of any cut $(S,\bar S)$ can be reported, with high probability, within factor $1+\epsilon$. We also demonstrate some applications where this “for each” guarantee is indeed useful.

We further prove that that our relaxation is necessary. Specifically, a sketch that can $(1+\epsilon)$-approximate the weight of all cuts in the graph simultaneously (i.e., a “for all” guarantee), must be of size $\Omega(n/\epsilon^2)$ bits.

Joint work with Alexandr Andoni and David Woodruff.

Oct
29
Wed
Calvin Newport
Oct 29 @ 12:00 pm – 1:00 pm

Calvin Newport
Georgetown University

Title: Radio Network Lower Bounds Made Easy

Nov
5
Wed
Amitabh Basu
Nov 5 @ 12:00 pm – 1:00 pm

Speaker: Amitabh Basu
Affiliation: JHU

Title: Cutting Planes and Geometry of Numbers

Abstract: We survey some recent results in cutting plane theory for integer programming. Cutting Planes give a way to reduce the search space for the optimal solution in an integer optimization problem. The results we will present are very recent connections between cutting planes and covering/tiling properties of subsets of euclidean sets. Important structural information about cutting planes can be translated to geometric questions like: Does a particular compact subset B of R^n cover all of R^n when we consider all of its translates by integer vectors. This connects to very classical problems in the geometry of numbers and deep theorems like the Venkov-Alexandrov-McMullen theorem on tilings, and the geometry of zonotopes can be leveraged. Research in this area of integer optimization is very much work-in-progress; we will close the presentation with an invitation to join our quest with some open problems.

Nov
19
Wed
Grigory Yaroslavtsev
Nov 19 @ 12:00 pm – 1:00 pm

Speaker: Grigory Yaroslavtsev
Affiliation: University of Pennsylvania

Title: Parallel Algorithms for Geometric Graph Problems

Abstract:
I will describe algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. The talk will be self-contained, including a formal introduction of the modern theoretical computational models used to capture computations in massively parallel “MapReduce”-like systems. It will also include a sample of major open problems in the area.

For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes an approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms).

I will also show how the main ideas from the MST algorithm can be captured within a general “Solve-and-Sketch” algorithmic framework that we develop. Besides MST it also applies to the approximate Earth-Mover Distance (EMD) and the transportation cost problem. Algorithms designed in the “Solve-and-Sketch” framework have implications which go beyond parallel models. In particular, our work implies new near-linear time algorithms for EMD cost and transportation cost in the plane. Other implications include algorithms in the streaming with sorting model.

Joint work with Alexandr Andoni, Krzysztof Onak and Aleksandar Nikolov.

Jan
28
Wed
Michael Dinitz
Jan 28 @ 12:00 pm – 1:00 pm

Speaker: Michael Dinitz
Affiliation: Johns Hopkins University

Title: Approximating Graph Spanners

Abstract:
Graph spanners (subgraphs which approximately preserve distances) have been studied extensively since the 1980’s. Many of the known results are about the optimal tradeoffs between various parameters, particularly the stretch and size of the spanner. But there has been some recent progress on a different and less developed line of research: fixing the allowable stretch, and optimizing the size. This turns spanners into more of a computational problem, and allows us to use many of the standard techniques from approximation algorithms (convex relaxations in particular). In this talk we will give an overview of some of the progress in this area, its limitations, and some possible future directions.

Feb
18
Wed
David Harris
Feb 18 @ 12:00 pm – 1:00 pm

Speaker: David Harris
Affiliation: University of Maryland – College Park

Title: Lopsidependency in the Moser-Tardos framework: Beyond the Lopsided Lov\'{a}sz Local Lemma

Abstract: The Lopsided Lovasz Local Lemma (LLLL) is a powerful probabilistic principle which has been used in a variety of combinatorial constructions. While this principle began as a general statement about probability spaces, it has recently been transformed into a variety of polynomial-time algorithms. The resampling algorithm of Moser & Tardos is the most well-known example of this. A variety of criteria have been shown for the LLLL; the strongest possible criterion was shown by Shearer, and other criteria which are easier to use computationally have been shown by Bissacot et al, Pegden, and Kolipaka & Szegedy.

We show a new criterion for the Moser-Tardos algorithm to converge. This criterion is stronger than the LLLL criterion, and in fact can yield better results even than the full Shearer criterion. This is possible because it does not apply in the same generality as the original LLLL; yet, it is strong enough to cover many applications of the LLLL in combinatorics. We show a variety of new bounds and algorithms. A noteworthy application is for $k$-SAT, with bounded occurences of variables. As shown in Gebauer, Szabo, and Tardos, a $k$-SAT instance in which every variable appears $L \leq \frac{2^{k+1}}{e (k+1)}$ times, is satisfiable. Although this bound is asymptotically tight (in $k$), we improve it to $L \leq \frac{2^{k+1} (1 – 1/k)^k}{k-1} – \frac{2}{k}$ which can be significantly stronger when $k$ is small.

Mar
4
Wed
Matthew Andrews
Mar 4 @ 12:00 pm – 1:00 pm

Speaker: Matthew Andrews
Affiliation: Alcatel-Lucent Bell Labs

Title: Understanding Sponsored Content in Mobile Data Networks

Abstract:
Sponsored content is a mechanism in which content providers can pay the operator of a wireless network to make their content free to end users. Such offerings have recently been introduced in both the US and Asia and they raise many challenging questions regarding which sites should be candidates for sponsoring and how much the service provider should charge the content provider.

In this talk we introduce a number of models that aim to capture the interactions between the service provider, the content provider and the end users in a sponsored content offering. We show that it is possible to design the system so that it is win-win-win for all players. In many settings the problem is a generalization of the “Adwords” problem that arises in the design of sponsored search. We also show how to analyze network traffic and content provider financial data in order to calculate the input parameters for these models.

Mar
11
Wed
Jeremy Fineman
Mar 11 @ 12:00 pm – 1:00 pm

Speaker: Jeremy Fineman
Affiliation: Georgetown University

Title:
How to Fix Exponential Backoff: Achieving Constant Throughput and Robustness with Polylog Attempts

Abstract:
Randomized exponential backoff is employed in many domains to coordinate access to a shared resource or communication channel. Despite the ubiquity of the protocol, exponential backoff has poor general performance guarantees. Most notably, exponential backoff neither achieves constant throughput in a general online setting, nor is it robust to corrupted or jammed messages. This talk considers a new backoff protocol that achieves constant throughput, even in the presence of an adaptive adversary that jams (or blocks access to) the shared resource at certain times. The protocol also makes relatively few attempts to access the resources, which means that each agent does not expend too much energy. Specifically, we show that the expected energy per agent is O(log^2(n+J)), where n is the number of contenders and J is the amount of time the adversary jams.

Mar
18
Wed
Spring break (no seminar)
Mar 18 @ 12:00 pm – 1:00 pm
Apr
1
Wed
Alex Slivkins
Apr 1 @ 12:00 pm – 1:00 pm

Speaker: Alex Slivkins
Affiliation: Microsoft Research – New York

Title: Bandits with Resource Constraints
Abstract:
Multi-armed bandits is the predominant theoretical model for exploration-exploitation tradeoff in machine learning, with countless applications ranging from medical trials, to communication networks, to Web search and advertising, to dynamic pricing. In many of these application domains the learner may be constrained by one or more supply/budget limits, in addition to the customary limitation on the time horizon. We introduce a general model that encompasses such problems, combining aspects of stochastic integer programming with online learning. A distinctive feature (and challenge) in our model, compared to the conventional bandit problems, is that the optimal policy for a given problem instance may significantly outperform the policy that always chooses the best fixed action. Our main result is an algorithm with near-optimal regret relative to the optimal policy. Also, we extend this result to contextual bandits, and detail an application to dynamic pricing.

Apr
15
Wed
Mohammad Hajiaghayi
Apr 15 @ 12:00 pm – 1:00 pm

Mohammad Hajiaghayi
University of Maryland – College Park

Title: Parameterized and Promised Streaming: Matching and Vertex Cover

Abstract:
As graphs continue to grow in size, we seek ways to effectively
process such data at scale. The model of streaming graph processing, in
which a compact summary is maintained as each edge insertion/deletion
is observed, is an attractive one. However, few results are known for
optimization (often NP-hard) problems over such dynamic graph streams.

In this talk, we introduce a new approach to handling graph streams,
by instead seeking solutions for the parameterized (and promised) versions of
these problems. Here, we are given a parameter k and the objective is to
decide whether there is a solution bounded by k. By combining
kernelization techniques with randomized sketch structures, we obtain the
first streaming algorithms for the parameterized versions of Maximal
Matching and Vertex Cover. We consider various models for a graph stream on n
nodes: the insertion-only model where the edges can only be added, and
the dynamic model where edges can be both inserted and deleted.

Apr
22
Wed
Gordon Wilfong
Apr 22 @ 12:00 pm – 1:00 pm

Gordon Wilfong
Alcatel-Lucent Bell Labs

Title: Optimal Path Encoding
Abstract: Packet networks need to maintain state in the form
of forwarding tables at each switch. The cost of this state
increases as networks support ever more sophisticated per-flow
routing, traffic engineering, and service chaining. Per-flow or per-path
state at the switches can be eliminated by encoding each
packet’s desired path in its header. A key component of such a
method is an efficient encoding of paths.
We introduce a mathematical formulation of this optimal path encoding
problem. We prove that the problem is APX-hard, by
showing that approximating it to within a factor less than 8/7
is NP-hard. We then present an algorithm
approximating the optimal path-encoding problem to within a
factor 2. Finally, we provide empirical results illustrating the
effectiveness of the proposed algorithm.
Joint work with A. Hari (Bell Labs) and U. Niesen (Qualcomm)

May
13
Wed
Lisa Zhang
May 13 @ 12:00 pm – 1:00 pm

Speaker: Lisa Zhang
Affiliation: Alcatel-Lucent Bell Labs

Title: Analysis of k-Anonymity Algorithms for Streaming Location Data

Abstract:
We propose and analyze algorithms to achieve k-anonymity for streaming location data. We consider a framework motivated by European Union privacy requirements, in which location information arrives online into a buffer of fixed size m. Whenever the buffer fills, some data must move to permanent storage in a k-anonymized fashion. This notion of anonymity refers to recording a coarse common region containing at least k points instead of separate exact locations. One primary goal is to minimize the recorded region size so that the anonymized location data is as accurate as possible.

We observe that under competitive analysis, any online algorithm can be arbitrarily bad in terms of the recorded region size. We therefore assume a more benign model in which the location distribution is known. For a uniform distribution, we analyze a simple, natural algorithm that partitions the space into m/k identical regions to ensure k-anonymity, and picks the region with the largest occupancy whenever the buffer fills. Our detailed analysis shows
that the largest occupancy converges to 2k. This implies, perhaps somewhat unintuitively, that it is sufficient to achieve k-anonymity by partitioning space into $2m/k$ regions, which reduces and thereby improves the recorded region size by a factor of 2. We also present an almost matching lower bound of 2m/k. Finally, we discuss generalizations to nonuniform distributions by partitioning the space to match the given distribution.

Jun
10
Wed
Sanjeev Khanna
Jun 10 @ 12:00 pm – 1:00 pm

Speaker: Sanjeev Khanna
Affiliation: University of Pennsylvania

Title: Tight Bounds for Linear Sketches of Approximate Matchings

Abstract:
We consider the problem of approximating a maximum matching in dynamic graph streams where the stream may include both edge insertions and deletions. Our main result is a resolution of the space complexity of linear sketches for approximating the maximum matching in this model.

Specifically, we show that for any $\eps > 0$, there exists a single-pass streaming algorithm, which only maintains a linear sketch of size roughly $n^{2-3\eps}$ bits and recovers an $n^\epsilon$-approximate maximum matching in dynamic graph streams, where $n$ is the number of vertices in the graph. We complement this result with the following lower bound result: any linear sketch for approximating the maximum matching to within a factor of $n^\eps$ has to be of size at least $n^{2-3\eps -o(1)}$ bits.

This is based on joint work with Sepehr Assadi, Yang Li, and Grigory Yaroslavtsev.

Sep
2
Wed
[Theory Seminar] Harry Lang
Sep 2 @ 12:00 pm – 1:00 pm

 

Title: A New Algorithm for Accurate and Low-Space k-Median Clustering on Data Streams
Abstract: The k-median problem for insertion-only data streams is an active area of research.  In 2003, Charikar et al provided the first poly(k, log n)-space constant-approximation, albeit with a massive constant (~5500).  In the current work, we introduce a new technique that finds a low-constant approximation (~40) without requiring any additional storage.
Short Bio: Harry Lang is a doctoral student in the mathematics department at JHU.  He studies algebraic topology, and in particular computational methods for detecting topological structure of massive data sets.  In computer science, he works on streaming algorithms for clustering with Professor Vladimir Braverman.
This talk will be given remotely.