Seminar

We typically have seminars on Wednesdays at noon in Malone 228.  All seminar announcements will be sent to the theory mailing list.

Nov
9
Sat
FOCS Workshops
Nov 9 all-day
Nov
10
Sun
FOCS
Nov 10 – Nov 12 all-day
Dec
4
Wed
[Theory Seminar] Yasamin Nazari
Dec 4 @ 12:00 pm – 1:00 pm

Speaker: Yasamin Nazari
Affiliation: Johns Hopkins University

Title: Sparse Hopsets in Congested Clique

Abstract:

We give the first Congested Clique algorithm that computes a sparse hopset with polylogarithmic hopbound in polylogarithmic time. Given a graph G=(V,E), a (β,ϵ)-hopset H with “hopbound” β, is a set of edges added to G such that for any pair of nodes u and v in G there is a path with at most β hops in GH with length within (1+ϵ) of the shortest path between u and v in G.
Our hopsets are significantly sparser than the recent construction of Censor-Hillel et al. [6], that constructs a hopset of size Õ (n^{3/2}), but with a smaller polylogarithmic hopbound. On the other hand, the previously known constructions of sparse hopsets with polylogarithmic hopbound in the Congested Clique model, proposed by Elkin and Neiman [10],[11],[12], all require polynomial rounds.
One tool that we use is an efficient algorithm that constructs an -limited neighborhood cover, that may be of independent interest.
Finally, as a side result, we also give a hopset construction in a variant of the low-memory Massively Parallel Computation model, with improved running time over existing algorithms.
Dec
11
Wed
[Theory Seminar] Richard Shea
Dec 11 @ 12:00 pm – 1:00 pm

Speaker: Richard Shea

Affiliation: Applied and Computational Math program, Johns Hopkins University

Title: Progress towards building a Dynamic Hawkes Graph

Abstract:

This talk will introduce the Dynamic Hawkes Graph.  It builds from developments in multivariate Hawkes Processes, locally stable Hawkes distributions, and representations of the Hawkes process as an Integer Valued autoregressive (INAR) fit.  The background to enable understanding of the Dynamic Hawkes Graph will be the bulk of the talk. Richard is presenting this as part of his Master’s thesis.
Mar
11
Wed
[Theory Seminar] Aditya Krishnan
Mar 11 @ 12:00 pm – 1:00 pm

Speaker: Aditya Krishnan
Affiliation: Johns Hopkins University

Title: Schatten Norms in Matrix Streams: The Role of Sparsity

Abstract:
Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important to efficiently compute. Many times, large matrices representing real-world data are sparse or doubly sparse (i.e., sparse in both rows and columns), and are accessed as a stream of updates, typically organized in row-order. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is independent of the matrix dimension, assuming the matrix is doubly-sparse and presented in row-order.

In addition, we prove that multiple passes are unavoidable in this setting and show extensions of our primary technique, including a trade-off between space requirements and number of passes. Our algorithms approximate the Schatten p-norms, which we use in turn to approximate other spectral functions, such as logarithm of the determinant, trace of matrix inverse, and Estrada index.

Joint work with Vladimir Braverman, Robert Krauthgamer and Roi Sinoff.

Mar
25
Wed
[Theory Seminar] Arnold Filtser
Mar 25 @ 12:00 pm – 1:00 pm

Speaker: Arnold Filtser
Affiliation: Columbia University

Title: TBD
Abstract: TBD

Sep
2
Wed
[Theory Seminar] Welcome and Introductions
Sep 2 @ 12:00 pm – 1:00 pm
Sep
16
Wed
[Theory Seminar] Jasper Lee
Sep 16 @ 12:00 pm – 1:00 pm

Speaker: Jasper Lee
Affiliation: Brown University

Title: Real-Valued Sub-Gaussian Mean Estimation, Optimal to a (1+o(1)) Multiplicative Factor

Abstract:
We revisit one of the most fundamental problems in statistics: given access to independent samples from a 1D random variable (with finite but unknown mean and variance), what is the best way to estimate the mean, in terms of error convergence with respect to sample size? The conventional wisdom is to use the sample mean as our estimate. However it is known that the sample mean has optimal convergence if and only if the underlying distribution is (sub-)Gaussian. Moreover, it can even be exponentially slower than optimal for certain heavy-tailed distributions. On the other hand, the median-of-means estimator (invented and reinvented in various literature) does have sub-Gaussian convergence for all finite-variance distributions, albeit in the big-O sense with a sub-optimal multiplicative constant. The natural remaining question then, is whether it is possible to bridge the gap, to have an estimator that has optimal sub-Gaussian concentration with an optimal constant, for all finite-variance distributions.

In this talk, we answer the question affirmatively by giving an estimator that converges with the optimal constant inside the big-O, up to a (1+o(1)) multiplicative factor. Our estimator is furthermore computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using tools from linear and convex programming, which may be of independent interest.

Joint work with Paul Valiant.

Sep
23
Wed
[Theory Seminar] Edinah Gnang
Sep 23 @ 12:00 pm – 1:00 pm

Speaker: Edinah Gnang
Affiliation: Johns Hopkins University
Title: On the Kotzig-Ringel-Rosa conjecture

Abstract:
In this talk we describe and motivate the K.R.R. conjecture graph partitioning and describe a functional approach enabling us to tackle the K.R.R. conjecture via a new composition lemma. If time permits I will also discuss algorithmic aspects.

Oct
7
Wed
[Theory Seminar] Aditya Krishnan
Oct 7 @ 12:00 pm – 1:00 pm

Speaker: Aditya Krishnan
Affiliation: Johns Hopkins University

Title: Schatten Norms in Matrix Streams: The Role of Sparsity.

Abstract: Spectral functions of large matrices contain important structural information about the underlying data, and are thus becoming increasingly important to efficiently compute. Many times, large matrices representing real-world data are \emph{sparse} or \emph{doubly sparse} (i.e., sparse in both rows and columns), and are accessed as a \emph{stream} of updates, typically organized in \emph{row-order}. In this setting, where space (memory) is the limiting resource, all known algorithms require space that is polynomial in the dimension of the matrix, even for sparse matrices. We address this challenge by providing the first algorithms whose space requirement is \emph{independent of the matrix dimension}, assuming the matrix is doubly-sparse and presented in row-order.

In addition, we prove that multiple passes are unavoidable in this setting and show extensions of our primary technique, including a trade-off between space requirements and number of passes. Our algorithms approximate the Schatten p-norms, which we use in turn to approximate other spectral functions, such as logarithm of the determinant, trace of matrix inverse, and Estrada index.

Joint work with Vladimir Braverman, Robert Krauthgamer and Roi Sinoff.

Oct
21
Wed
[Theory Seminar] Brian Brubach
Oct 21 @ 12:00 pm – 1:00 pm

Speaker: Brian Brubach
Affiliation: Wellesley College

Title: Online matching under three layers of uncertainty

Abstract:
Online matching problems have become ubiquitous with the rise of the internet and e-commerce. From the humble beginnings of a single problem 30 years ago, the theoretical study of online matching now encompasses dozens of variations inspired by diverse applications. We’ll take a tour through the landscape of online matching problems. As we go, we’ll venture deeper and deeper into the jungle of stochasticity and uncertainty. Finally, we’ll cover some very recent work introducing new algorithms and models. Along the way, I’ll highlight techniques and tricks I’ve found useful in studying these problems.

Oct
28
Wed
[Theory Seminar] Joshua Grochow
Oct 28 @ 12:00 pm – 1:00 pm

Speaker: Joshua Grochow
Affiliation: University of Colorado

Title: Isomorphism of tensors, algebras, and polynomials, oh my!

Abstract: We consider the problems of testing isomorphism of tensors, p-groups, cubic polynomials, quantum states, algebras, and more, which arise from a variety of areas, including machine learning, group theory, and cryptography. Despite Graph Isomorphism now being in quasi-polynomial time, and having long had efficient practical software, for the problems we consider no algorithm is known that is asymptotically better than brute force, and state-of-the-art software cannot get beyond small instances. We approach this situation in two ways:
– Complexity-theoretic: we show that all these problems are polynomial-time equivalent, giving rise to a class of problems we call Tensor Isomorphism-complete (TI-complete). Perhaps surprising here is that we show that isomorphism of d-tensors for any fixed d (at least 3) is equivalent to testing isomorphism of 3-tensors. These equivalences let us focus on just one problem rather than dozens, or a whole infinite hierarchy, separately.
– Algorithmic: Adapting the Weisfeiler-Leman method from Graph Isomorphism to tensors, trying to understand tensor isomorphism by taking advantage of isomorphisms of small sub-tensors. This leads to tensor analogues of the Graph Reconstruction conjecture and related questions.

Based on joint works with Vyacheslav V. Futorny and Vladimir V. Sergeichuk (Lin. Alg. Appl., 2019; arXiv:1810.09219), with Peter A. Brooksbank, Yinan Li, Youming Qiao, and James B. Wilson (arXiv:1905.02518), and with Youming Qiao (arXiv:1907.00309).

Nov
4
Wed
[Theory Seminar] Jingfeng Wu
Nov 4 @ 12:00 pm – 1:00 pm

Speaker: Jingfeng Wu
Affiliation: Johns Hopkins University

Title: Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate

Abstract:
Understanding the algorithmic regularization effect of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on very small or even infinitesimal learning rate regime, and fail to cover practical scenarios where the learning rate is moderate and annealing. In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case, SGD and GD are known to converge to the unique minimum-norm solution; however, with the moderate and annealing learning rate, we show that they exhibit different directional bias: SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions. Furthermore, we show that such directional bias does matter when early stopping is adopted, where the SGD output is nearly optimal but the GD output is suboptimal. Finally, our theory explains several folk arts in practice used for SGD hyperparameter tuning, such as (1) linearly scaling the initial learning rate with batch size; and (2) overrunning SGD with high learning rate even when the loss stops decreasing.

Nov
11
Wed
[Theory Seminar] Arnold Filtser
Nov 11 @ 12:00 pm – 1:00 pm

Speaker: Arnold Filtser
Affiliation: Columbia University

Title: Scattering and Sparse Partitions, and their Applications
Abstract:
A partition $\mathcal{P}$ of a weighted graph $G$ is $(\sigma,\tau,\Delta)$-sparse if every cluster has diameter at most $\Delta$, and every ball of radius $\Delta/\sigma$ intersects at most $\tau$ clusters. Similarly, $\mathcal{P}$ is $(\sigma,\tau,\Delta)$-scattering if instead for balls we require that every shortest path of length at most $\Delta/\sigma$ intersects at most $\tau$ clusters. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-sparse partition for all $\Delta>0$, Jia et al. [STOC05] constructed a solution for the Universal Steiner Tree problem (and also Universal TSP) with stretch $O(\tau\sigma^2\log_\tau n)$. Given a graph $G$ that admits a $(\sigma,\tau,\Delta)$-scattering partition for all $\Delta>0$, we construct a solution for the Steiner Point Removal problem with stretch $O(\tau^3\sigma^3)$. We then construct sparse and scattering partitions for various different graph families, receiving new results for the Universal Steiner Tree and Steiner Point Removal problems.

Dec
2
Wed
[Theory Seminar] Yu Zheng
Dec 2 @ 12:00 pm – 1:00 pm

Speaker: Yu Zheng
Affiliation: Johns Hopkins University

Title: Space Efficient Deterministic Approximation of String Measures

Abstract:
We study approximation algorithms for the following three string measures that are widely used in practice: edit distance (ED), longest common subsequence (LCS), and longest increasing sequence (LIS). All three problems can be solved exactly by standard algorithms that run in polynomial time with roughly $\Theta(n)$ space, where $n$ is the input length, and our goal is to design deterministic approximation algorithms that run in polynomial time with significantly smaller space.

Towards this, we design several algorithms that achieve $1+\eps$ or $1-\eps$ approximation for all three problems, where $\eps>0$ can be any constant and even slightly sub constant. Our algorithms are flexible and can be adjusted to achieve the following two regimes of parameters: 1) space $n^{\delta}$ for any constant $\delta>0$ with running time essentially the same as or slightly more than the standard algorithms; and 2) space $\mathsf{polylog}(n)$ with (a larger) polynomial running time, which puts the approximation versions of the three problems in Steve’s class (SC). Our algorithms significantly improve previous results in terms of space complexity, where all known results need to use space at least $\Omega(\sqrt{n})$. Some of our algorithms can also be adapted to work in the asymmetric streaming model [SS13], and output the corresponding sequence. Furthermore, our results can be used to improve a recent result by Farhadi et. al. [FHRS20] about approximating ED in the asymmetric streaming model, reducing the running time from being exponential in [FHRS20] to a polynomial.

Our algorithms are based on the idea of using recursion as in Savitch’s theorem [Sav70], and a careful adaption of previous techniques to make the recursion work. Along the way we also give a new logspace reduction from longest common subsequence to longest increasing sequence, which may be of independent interest.

Dec
9
Wed
[Theory Seminar] Xuan Wu
Dec 9 @ 12:00 pm – 1:00 pm

Speaker: Xuan Wu
Affiliation: Johns Hopkins University

Title: Coreset for Clustering in Graph Metrics.

Abstract:
Clustering is a fundamental task in machine learning. As the increasing demand for running machine learning algorithms in the huge data sets, classic clustering algorithms were found not to scale well. To this end, coreset is introduced as a powerful data reduction technique that turns a huge dataset into a tiny proxy. Coresets have been also successfully applied to streaming and distributed computing. Coresets for clustering in Euclidean spaces have been very well studied. However, very few results were known about the non-Euclidean space. Beyond Euclidean, graph metrics is a very important family of metric space. In this talk, I will cover my recent work on coreset for k-clustering in graph metrics, including bounded treewidth graph and excluded-minor graph.

Feb
17
Wed
[Theory Seminar] Enayat Ullah
Feb 17 @ 12:00 pm – 1:00 pm

Speaker: Enayat Ullah
Affiliation: Johns Hopkins University

Title: Machine unlearning via algorithmic stability

Abstract: We study the problem of machine unlearning, and identify a notion of algorithmic stability, Total Variation (TV) stability, which we argue, is suitable for the goal of exact efficient unlearning. For convex risk minimization problems, we design TV-stable algorithms based on noisy Stochastic Gradient Descent (SGD). Our key contribution is the design of corresponding efficient unlearning algorithms, which are based on constructing a (maximal) coupling of Markov chains for the noisy SGD procedure. To understand the trade-offs between accuracy and unlearning efficiency, we give upper and lower bounds on excess empirical and population risk of TV stable algorithms for convex risk minimization. Our techniques generalize to arbitrary non-convex functions, and our algorithms are differentially private as well.

Feb
24
Wed
[Theory Seminar] Thomas Lavastida
Feb 24 @ 12:00 pm – 1:00 pm

Speaker: Thomas Lavastida
Affiliation: Carnegie Mellon University

Title: Combinatorial Optimization Augmented with Machine Learning

Abstract:

Combinatorial optimization often focuses on optimizing for the worst-case. However, the best algorithm to use depends on the “relevant inputs”, which is application specific and often does not have a formal definition.

The talk gives a new theoretical model for designing algorithms that are tailored to inputs for the application at hand. In the model, learning is performed on past problem instances to make predictions on future instances. These predictions are incorporated into the design and analysis of the algorithm. The predictions can be used to achieve “instance-optimal” algorithm design when the predictions are accurate and the algorithm’s performance gracefully degrades when there is error in the prediction.

The talk will apply this framework to online algorithm design and give algorithms with theoretical performance that goes beyond worst-case analysis. The majority of the talk will focus on load balancing with restricted assignments.

Mar
3
Wed
[Theory Seminar] Hung Le
Mar 3 @ 12:00 pm – 1:00 pm

Speaker: Hung Le
Affiliation: University of Massachusetts, Amherst

Title: Reliable Spanners: Locality-Sensitive Orderings Strike Back

Abstract:
A highly desirable property of networks is robustness to failures.
Consider a metric space $(X,d_X)$, a graph $H$ over $X$ is a $\vartheta$-reliable $t$-spanner if, for every set of failed vertices $B\subset X$, there is a superset $B^+\supseteq B$ such that the induced subgraph $H[X\setminus B]$ preserves all the distances between points in $X\setminus B^+$ up to a stretch factor $t$, while the expected size of $B^+$ is as most $(1+\vartheta)|B|$. Such a spanner could withstand a catastrophe: failure of even $90\%$ of the network.

Buchin, Har-Peled, and Olah showed how to construct a sparse reliable spanner for Euclidean space from Euclidean locality-sensitive orderings, an object introduced by Chan, Har-Peled, and Jones. In this talk, we extend their approach to non-Euclidean metric spaces by generalizing the ordering of Chan, Har-Peled, and Jones to doubling metrics and introducing new types of locality-sensitive orderings for other metric spaces. We also show how to construct reliable spanners from the newly introduced locality-sensitive orderings via reliable 2-hop spanners for paths. The highlight of our results is that the number of edges in our spanner has no dependency on the spread.

Mar
10
Wed
[Theory Seminar] Teodor Marinov
Mar 10 @ 12:00 pm – 1:00 pm

Speaker: Teodor Marinov
Affiliation: Johns Hopkins University

Title: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

Abstract:
Reinforcement Learning (RL) is a general scenario where agents interact with the environment to achieve some goal. The environment and an agent’s interactions are typically modeled as a Markov decision process (MDP), which can represent a rich variety of tasks. But, for which MDPs can an agent or an RL algorithm succeed? This requires a theoretical analysis of the complexity of an MDP. In this talk I will present information-theoretic lower bounds for a large class of MDPs. The lower bounds are based on studying a certain semi-infinite LP. Further, I will show that existing algorithms enjoy tighter gap-dependent regret bounds (similar to the stochastic multi-armed bandit problem), however, they are unable to achieve the information-theoretic lower bounds even in deterministic transition MDPs, unless there is a unique optimal policy.