We typically have seminars on Wednesday at noon in Malone 228. All seminar announcements will be sent to the theory mailing list.

Speaker: Sami Davies

Affiliation: University of Washington

Title: A Tale of Santa Claus, Hypergraphs, and Matroids

Abstract:

A well-known problem in scheduling and approximation algorithms is the Santa Claus problem. Suppose that Santa Claus has a set of gifts, and he wants to distribute them among a set of children so that the least happy child is made as happy as possible. Here, the value that a child i has for a present j is of the form p_{ij} \in \{0,p_j\}. A polynomial time algorithm by Annamalai et al. gives a 12.33-approximation algorithm and is based on a modification of Haxell’s hypergraph matching argument.

In this paper, we introduce a matroid version of the Santa Claus problem. Our algorithm is also based on Haxell’s augmentation tree, but with the introduction of the matroid structure we solve a more general problem with cleaner methods. Our result can then be used as a blackbox to obtain a (4 +\varepsilon)-approximation for Santa Claus. This factor also compares against a natural, compact LP for Santa Claus.

Speaker: Jalaj Upadhyay

Affiliation: Johns Hopkins Universit

Title: Towards Robust and Scalable Private Data Analysis

Abstract:

In the current age of big data, we are constantly creating new data which is analyzed by various platforms to improve service and user’s experience. Given the sensitive and confidential nature of these data, there are obvious security and privacy concerns while storing and analyzing such data. In this talk, I will discuss the fundamental challenges in providing robust security and privacy guarantee while storing and analyzing large data. I will also give a brief overview of my contributions and future plans towards addressing these challenges.

To give a glimpse of these challenges in providing a robust privacy guarantee known as differential privacy, I will use spectral sparsification of graphs as an example. Given the ubiquitous nature of graphs, differentially private analysis on graphs has gained a lot of interest. However, existing algorithms for these analyses are tailored made for the task at hand making them infeasible in practice. In this talk, I will present a novel differentially private algorithm that outputs a spectral sparsification of the input graph. At the core of this algorithm is a method to privately estimate the importance of an edge in the graph. Prior to this work, there was no known privacy preserving method that provides such an estimate or spectral sparsification of graphs.

Since many graph properties are defined by the spectrum of the graph, this work has many analytical as well as learning theoretic applications. To demonstrate some applications, I will show more efficient and accurate analysis of various combinatorial problems on graphs and the first technique to perform privacy preserving manifold learning on graphs.

Speaker: Martin Farach-Colton

Affiliation: Rutgers University

Title: TBA

Abstract: TBA

Speaker: Xue Chen

Affiliation: Northwestern University

Title: Active Regression via Linear-Sample Sparsification

Abstract:

E[||X \wt{\beta} – X\beta^*||_2^2] \leq \eps ||X \beta^* – y||_2^2.

This improves on the best previous result of O(d \log d + d/\eps) from leverage score sampling. We also present results for the *inductive* setting, showing when \wt{\beta} will generalize to fresh samples; these apply to continuous settings such as polynomial regression. Finally, we show how the techniques yield improved results for the non-linear sparse Fourier transform setting.

Bio: Xue Chen is broadly interested in randomized algorithms and the use of randomness in computation. Specific areas include Fourier transform, learning theory and optimization, and pseudorandomness. He obtained his Ph.D. at the University of Texas at Austin, under the supervision of David Zuckerman. Currently, he is a postdoctoral fellow in Northwestern University.

Speaker: Rediet Abebe

Affiliation: Cornell University

Title: Using Search Queries to Understand Health Information Needs in Africa

Abstract:

Access to healthcare and health information is of major global

concern. The stark inequality in the availability of health data by

country, demographic groups, and socioeconomic status impedes the

identification of major public health concerns and implementation of

effective interventions. This data gap ranges from basic disease

statistics, such as disease prevalence rates, to more nuanced

information, such as public attitudes. A key challenge is

understanding health information needs of under-served and

marginalized communities. Without understanding people’s everyday

needs, concerns, and misconceptions, health organizations lack the

ability to effectively target education and programming efforts.

In this presentation, we focus on the lack of comprehensive,

high-quality data about information needs of individuals in developing

nations. We propose an approach that uses search data to uncover

health information needs of individuals in all 54 nations in Africa.

We analyze Bing searches related to HIV/AIDS, malaria, and

tuberculosis; these searches reveal diverse health information needs

that vary by demographic groups and geographic regions. We also shed

light on discrepancies in the quality of content returned by search

engines.

We conclude with a discussion on computationally-informed

interventions both on- and off-line in health and related domains and

the Mechanism Design for Social Good research initiative.

Bio:

Rediet Abebe is a computer scientist with a strong interest in the

promotion of equality and justice. Her research focuses on algorithms,

AI, and their applications to social good. As part of this research

agenda, she co-founded and co-organizes Mechanism Design for Social

Good (MD4SG), an interdisciplinary, multi-institutional research

initiative with over 300 individuals. She is also a co-founder and

co-organizer of Black in AI, an international network of over 1000

individuals focused on increasing the presence and inclusion of Black

and African researchers in AI. Her research is deeply influenced by

her upbringing in her hometown of Addis Ababa, Ethiopia, where she

lived until moving to the U.S. in 2009. Her work has been generously

supported by fellowships and scholarships through Facebook, Google,

the Cornell Graduate School, and the Harvard-Cambridge Fellowship.

Speaker: Grigory Yaroslavtsev

Affiliation: Indiana University, Bloomington

Title: Advances in Hierarchical Clustering for Vector Data

Abstract:

Compared to the highly successful flat clustering (e.g. k-means), despite its important role and applications in data analysis, hierarchical clustering has been lacking in rigorous algorithmic studies until late due to absence of rigorous objectives. Since 2016, a sequence of works has emerged and gave novel algorithms for this problem in the general metric setting. This was enabled by a breakthrough by Dasgupta, who introduced a formal objective into the study of hierarchical clustering.

In this talk I will give an overview of our recent progress on models and scalable algorithms for hierarchical clustering applicable specifically to high-dimensional vector data. I will first discuss various linkage-based algorithms (single-linkage, average-linkage) and their formal properties with respect to various objectives. I will then introduce a new projection-based approximation algorithm for vector data. The talk will be self-contained and doesn’t assume prior knowledge of clustering methods.

Based on joint works with Vadapalli (ICML’18) and Charikar, Chatziafratis and Niazadeh (AISTATS’19)

Speaker: Arka Rai Choudhury

Affiliation: Johns Hopkins University

Title: Finding a Nash Equilibrium is No Easier than Breaking Fiat-Shamir

Abstract:

The Fiat-Shamir heuristic transforms a public-coin interactive proof into a non-interactive argument, by replacing the verifier with a cryptographic hash function that is applied to the protocol’s transcript. Constructing hash functions for which this transformation is sound is a central and long-standing open question in cryptography.

We show that solving the END-OF-METERED-LINE problem is no easier than breaking the soundness of the Fiat-Shamir transformation when applied to the sumcheck protocol. In particular, if the transformed protocol is sound, then any hard problem in #P gives rise to a hard distribution in the class CLS, which is contained in PPAD. Our result opens up the possibility of sampling moderately-sized games for which it is hard to find a Nash equilibrium, by reducing the inversion of appropriately chosen one-way functions to #SAT.

Our main technical contribution is a stateful incrementally verifiable procedure that, given a SAT instance over n variables, counts the number of satisfying assignments. This is accomplished via an exponential sequence of small steps, each computable in time poly(n). Incremental verifiability means that each intermediate state includes a sumcheck-based proof of its correctness, and the proof can be updated and verified in time poly(n).

Joint work with Pavel Hubacek, Chethan Kamath, Krzysztof Pietrzak, Alon Rosen, and Guy Rothblum.

Speaker: Amitabh Basu

Affiliation: JHU

Title: Admissibility of solution estimators for stochastic optimization

Abstract:

We look at stochastic optimization problems through the lens of statistical decision theory. In particular, we address admissibility, in the statistical decision theory sense, of the natural sample average estimator for a stochastic optimization problem (which is also known as the empirical risk minimization (ERM) rule in learning literature). It is well known that for general stochastic optimization problems, the sample average estimator may not be admissible. This is known as Stein’s paradox in the statistics literature. We show in this paper that for optimizing stochastic linear functions over compact sets, the sample average estimator is admissible.

Speaker: Daniel Reichman

Affiliation: Princeton University

Title: Contagious sets in bootstrap percolation

Abstract:

Consider the following activation process in undirected graphs: a vertex is active either if it belongs to a set of initially activated vertices or if at some point it has at least r active neighbors. This process (commonly referred to as bootstrap percolation) has been studied in several fields including combinatorics, computer science, statistical physics and viral marketing. A contagious set is a set whose activation results with the entire graph being active. Given a graph G, let m(G,r) be the minimal size of a contagious set.

I will survey upper and lower bounds for m(G,r) in general graphs and for graphs with special properties (random and pseudo-random graphs, graphs without short cycles) and discuss many unresolved questions.

Based on joint work with Amin Coja-Oghlan, Uriel Feige and Michael Krivelevich.

Speaker: Xuan Wu

Affiliation: Johns Hopkins

Title: Coreset for Ordered Weighted Clustering

Abstract:

Ordered k-Median is a generalization of classical clustering problems such as k-Median and k-Center, that offers a more flexible data analysis, like easily combining multiple objectives (e.g., to increase fairness or for Pareto optimization). Its objective function is defined via the Ordered Weighted Averaging (OWA) paradigm of Yager (1988), where data points are weighted according to a predefined weight vector, but in order of their contribution to the objective (distance from the centers).

Coreset is a powerful data-reduction technique which summarizes the data set into a small (weighted) point set while approximately preserving the objective value of the data set to all centers. When there are multiple objectives (weights), the above standard coreset might have limited usefulness, whereas in a \emph{simultaneous} coreset, which was introduced recently by Bachem and Lucic and Lattanzi (2018), the above approximation holds for all weights (in addition to all centers). Our main result is the first construction of simultaneous coreset for the Ordered k-Median problem of small size.

In this talk, I will introduce the basics of coreset construction for the clustering problem and the main ideas of our new results. Finally, we discuss some remaining open problems.

This talk is based on joint work with Vladimir Braverman, Shaofeng Jiang, and Robert Krauthgamer.