We typically have seminars on Wednesdays at noon in Malone 228. All seminar announcements will be sent to the theory mailing list.

Speaker: Maryam Negahbani

Affiliation: Dartmouth College

Title: Revisiting Priority k-Center: Fairness and Outliers

Abstract:

Clustering is a fundamental unsupervised learning and facility location problem extensively studied in the literature. I will talk about a clustering problem called “priority k-center” introduced by Plesnik (in Disc. Appl. Math. 1987). Given a metric space on n points X, with distance function d, an integer k, and radius r_v for each point v in X, the goal is to choose k points S as “centers” to minimize the maximum distance of a point v to S divided by r_v. For uniform r_v’s this is precisely the “k-center” problem where the objective is to minimize the maximum distance of any point to S. In the priority version, points with smaller r_v are prioritized to be closer to S. Recently, a special case of this problem was studied in the context of “individually fair clustering” by Jung et al., FORC 2020. This notion of fairness forces S to open a center in every “densely populated area” by setting r_v to be v’s distance to its closest (n/k)-th neighbor.
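To make the objective above concrete, here is a minimal Python sketch that evaluates the priority k-center objective for a given set of centers and computes the "individually fair" radii. It is only an illustration of the definitions, not an algorithm from the talk. The function names `priority_objective` and `fair_radii` are hypothetical, and two conventions are assumptions: n/k is rounded up, and a point counts as its own first neighbor. The sketch also assumes every radius is positive (for example, distinct points and k < n).

```python
import math

def priority_objective(points, centers, radii, dist):
    """Return the maximum over points v of d(v, S) / r_v,
    where d(v, S) is the distance from v to its nearest center in S."""
    return max(
        min(dist(v, c) for c in centers) / radii[i]
        for i, v in enumerate(points)
    )

def fair_radii(points, k, dist):
    """Set r_v to the distance from v to its ceil(n/k)-th nearest point.
    Assumed convention: v itself is counted as the first neighbor."""
    m = math.ceil(len(points) / k)
    radii = []
    for v in points:
        ds = sorted(dist(v, u) for u in points)  # ds[0] is v itself (distance 0)
        radii.append(ds[m - 1])
    return radii

# Toy instance: six points on a line forming two dense groups, k = 2.
pts = [0, 1, 2, 10, 11, 12]
line = lambda a, b: abs(a - b)
radii = fair_radii(pts, 2, line)
print(radii)
print(priority_objective(pts, [1, 11], radii, line))
print(priority_objective(pts, [0, 12], radii, line))
```

With uniform radii (all r_v equal to 1) the same function returns the ordinary k-center objective, matching the special case described in the abstract.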

In this talk, I show how to approximate priority k-center with outliers: the setting where, for a given integer z, you are allowed to throw away z points as outliers and minimize the objective over the rest of the points. We show there is a 9-approximation, which is morally a 5, if you have constantly many types of radii or if your radii are powers of 2. This is via an LP-aware reduction to min-cost max-flow and is general enough to handle matroid constraints on facilities (where instead of asking to pick at most k facilities, you are asked to pick facilities that are independent in a given matroid). Things become quite interesting for priority knapsack-center with outliers, where opening each center costs something and you have a limited budget to spend on your solution S. In this case, we do not know how to solve the corresponding flow problem, so we alter our reduction to reduce to a simpler problem we do know how to solve, taking a hit of +5 in the approximation ratio. There are still many open problems in this work: in addition to solving the flow problem in the knapsack case, the best LP integrality gap we know for priority k-center with outliers is 3.

Speaker: Leonidas Tsepenekas

Affiliation: University of Maryland

Title: Approximating Two-Stage Stochastic Supplier Problems

Abstract:

The main focus of this talk will be radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. Our eventual goal is to provide results in the most general distributional setting, where there is only black-box access to the underlying distribution. To that end, we follow a two-step approach. First, we develop algorithms for a restricted version of the problem, in which all possible scenarios are explicitly provided; second, we employ a novel scenario-discarding variant of the standard Sample Average Approximation (SAA) method, in which we also crucially exploit structural properties of the algorithms developed for the first step of the framework. In this way, we manage to generalize the results of the latter to the black-box model. Finally, we note that the scenario-discarding modification to the SAA method is necessary in order to optimize over the radius.

Paper: https://arxiv.org/abs/2008.03325

Speaker: Samson Zhou

Affiliation: Carnegie Mellon University

Title: Tight Bounds for Adversarially Robust Streams and Sliding Windows via Difference Estimators

Abstract:

We introduce *difference estimators* for data stream computation, which provide approximations to F(v)-F(u) for frequency vectors v,u and a given function F. We show how to use such estimators to carefully trade error for memory in an iterative manner. The function F is generally non-linear, and we give the first difference estimators for the frequency moments F_p for p between 0 and 2, as well as for integers p>2. Using these, we resolve a number of central open questions in the adversarially robust streaming and sliding window models.

For both models, we obtain algorithms for norm estimation whose dependence on epsilon is 1/epsilon^2, which shows, up to logarithmic factors, that there is no overhead over the standard insertion-only data stream model for these problems.

Speaker: Zeyu Guo

Affiliation: Ohio State University

Title: TBD

Abstract: TBD

Speaker: Yuzhou Gu

Affiliation: NYU Center for Data Science & Courant Institute

Title: Community detection in the hypergraph stochastic block model

Abstract:

Community detection is a fundamental problem in network science, and its theoretical study has received significant attention over the last decade. In this talk I will present some recent advances on the community detection problem in sparse hypergraphs. In particular, we determine the weak recovery threshold for the hypergraph stochastic block model for a wide range of parameters. This resolves conjectures made by physicists in the corresponding regimes and has implications for phase transitions of random constraint satisfaction problems. A key component in this study is to analyze the behavior of information channels under repeated applications of the belief propagation operator. We introduce a framework for performing this analysis based on information-theoretical methods for channel comparison. Along the way, we formulate a rigorous version of the population dynamics algorithm, an approach that is commonly used in practice but lacks theoretical guarantees.