Speaker: Nith
in Varma

\nAffiliation: Boston University

Title: Separating e rasures and errors in property testing using local list decoding

\nA
bstract:

\nCorruption in data can be in the form of erasures (missing
data) or errors (wrong data). Erasure-resilient property testing (Dixit\,
Raskhodnikova\, Thakurta\, Varma ’16) and tolerant property testing (Parn
as\, Ron\, Rubinfeld ’06) are two formal models of sublinear algorithms th
at account for the presence of erasures and errors in input data\, respect
ively.

We first show that there exists a property P that has an e rasure-resilient tester whose query complexity is independent of the input size n\, whereas every tolerant tester for P has query complexity that de pends on n. We obtain this result by designing a local list decoder for th e Hadamard code that works in the presence of erasures\, thereby proving a n analog of the famous Goldreich-Levin Theorem. We also show a strengthene d separation by proving that there exists another property R such that R h as a constant-query erasure-resilient tester\, whereas every tolerant test er for R requires n^{Omega(1)} queries. The main tool used in proving the strengthened separation is an approximate variant of a locally list decoda ble code that works against erasures.

\nJoint work with Sofya Raskho dnikova and Noga Ron-Zewi.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-256@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Jalaj Upadhyay\nAffiliation: JHU\nTitle: Differentiall y Private Spectral Sparsification of Graphs\nAbstract:\nIn this talk\, we will discuss differentially private spectral sparsification of graphs. We argue that traditional spectral sparsification where the output graph is a subgraph of the input graph is not possible with differential privacy. Th is motivates us to define a relaxed version of spectral sparsification of graphs.\nWe consider edge-level privacy\, i.e.\, neighboring graphs differ s in one edge with weight one. We give efficient $(\\alpha\,\\beta)$-diffe rentially private algorithms that\, on input a dense graph G\, construct a spectral sparsification of G. Our output graphs has $ O(n/\\eps^2)$ weigh ted edges\, which matches the best known non-private algorithms.\nWe can u se our private sparse graph to solve various combinatorial and learning pr oblems on graphs efficiently while preserving differential privacy. Some e xamples include all possible cut queries\, Max-Cut\, Sparse-Cut\, Edge-Exp ansion\, Laplacian eigenmaps\, etc. \nThis talk is based on a joint work w ith Raman Arora and Vladimir Braverman. DTSTART;TZID=America/New_York:20181107T120000 DTEND;TZID=America/New_York:20181107T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Jalaj Upadhyay URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-jalaj-upadh yay-2/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Jala
j Upadhyay

\nAffiliation: JHU

Title: Differentially Private S
pectral Sparsification of Graphs

\nAbstract:

\nIn this talk\, we
will discuss differentially private spectral sparsification of graphs. We
argue that traditional spectral sparsification where the output graph is
a subgraph of the input graph is not possible with differential privacy. T
his motivates us to define a relaxed version of spectral sparsification of
graphs.

We consider edge-level privacy\, i.e.\, neighboring graph s differs in one edge with weight one. We give efficient $(\\alpha\,\\beta )$-differentially private algorithms that\, on input a dense graph G\, con struct a spectral sparsification of G. Our output graphs has $ O(n/\\eps^2 )$ weighted edges\, which matches the best known non-private algorithms.\n

We can use our private sparse graph to solve various combinatorial and learning problems on graphs efficiently while preserving differential privacy. Some examples include all possible cut queries\, Max-Cut\, Sparse -Cut\, Edge-Expansion\, Laplacian eigenmaps\, etc.

\nThis talk is b ased on a joint work with Raman Arora and Vladimir Braverman.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-264@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Ke Wu\nAffiliation: Johns Hopkins University\nTitle: S ynchronization Strings: Efficient and Fast Deterministic Constructions ove r Small Alphabets\nAbstract:\nSynchronization strings are recently introdu ced by Haeupler and Shahrasbi (STOC 2017) in the study of codes for correc ting insertion and deletion errors (insdel codes). They showed that for an y parameter ε>0\, synchronization strings of arbitrary length exist over a n alphabet whose size depends only on ε. Specifically\, they obtained an a lphabet size of O(ε^{−4})\, which left an open question on where the minim al size of such alphabets lies between Ω(ε^{1}) and O(ε^{−4}). In this wor k\, we partially bridge this gap by providing an improved lower bound of Ω (ε^{−3/2})\, and an improved upper bound of O(ε^{−2}). We also provide fas t explicit constructions of synchronization strings over small alphabets. \nFurther\, along the lines of previous work on similar combinatorial obje cts\, we study the extremal question of the smallest possible alphabet siz e over which synchronization strings can exist for some constant ε DTSTART;TZID=America/New_York:20181205T120000 DTEND;TZID=America/New_York:20181205T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Ke Wu URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-ke-wu/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Ke W
u

\nAffiliation: Johns Hopkins University

Title: Synchronizat
ion Strings: Efficient and Fast Deterministic Constructions over Small Alp
habets

\nAbstract:

\nSynchronization strings are recently introd
uced by Haeupler and Shahrasbi (STOC 2017) in the study of codes for corre
cting insertion and deletion errors (insdel codes). They showed that for a
ny parameter ε>0\, synchronization strings of arbitrary length exist over
an alphabet whose size depends only on ε. Specifically\, they obtained an
alphabet size of O(ε^{−4})\, which left an open question on where the mini
mal size of such alphabets lies between Ω(ε^{1}) and O(ε^{−4}). In this wo
rk\, we partially bridge this gap by providing an improved lower bound of
Ω(ε^{−3/2})\, and an improved upper bound of O(ε^{−2}). We also provide fa
st explicit constructions of synchronization strings over small alphabets.

Further\, along the lines of previous work on similar combinator ial objects\, we study the extremal question of the smallest possible alph abet size over which synchronization strings can exist for some constant ε <1. We show that one can construct ε-synchronization strings over alphabet s of size four while no such string exists over binary alphabets. This red uces the extremal question to whether synchronization strings exist over t ernary alphabets.\n

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-270@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Sami Davies\nAffiliation: University of Washington\nTi tle: A Tale of Santa Claus\, Hypergraphs\, and Matroids\nAbstract:\nA well -known problem in scheduling and approximation algorithms is the Santa Cla us problem. Suppose that Santa Claus has a set of gifts\, and he wants to distribute them among a set of children so that the least happy child is m ade as happy as possible. Here\, the value that a child i has for a presen t j is of the form p_{ij} \\in \\{0\,p_j\\}. A polynomial time algorithm b y Annamalai et al. gives a 12.33-approximation algorithm and is based on a modification of Haxell’s hypergraph matching argument.\nIn this paper\, w e introduce a matroid version of the Santa Claus problem. Our algorithm is also based on Haxell’s augmentation tree\, but with the introduction of t he matroid structure we solve a more general problem with cleaner methods. Our result can then be used as a blackbox to obtain a (4 +\\varepsilon)-a pproximation for Santa Claus. This factor also compares against a natural\ , compact LP for Santa Claus. DTSTART;TZID=America/New_York:20190116T120000 DTEND;TZID=America/New_York:20190116T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Sami Davies URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-sami-davies / X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Sami
Davies

\nAffiliation: University of Washington

Title: A Tale
of Santa Claus\, Hypergraphs\, and Matroids

\nAbstract:

\nA wel
l-known problem in scheduling and approximation algorithms is the Santa Cl
aus problem. Suppose that Santa Claus has a set of gifts\, and he wants to
distribute them among a set of children so that the least happy child is
made as happy as possible. Here\, the value that a child i has for a prese
nt j is of the form p_{ij} \\in \\{0\,p_j\\}. A polynomial time algorithm
by Annamalai et al. gives a 12.33-approximation algorithm and is based on
a modification of Haxell’s hypergraph matching argument.

In this p aper\, we introduce a matroid version of the Santa Claus problem. Our algo rithm is also based on Haxell’s augmentation tree\, but with the introduct ion of the matroid structure we solve a more general problem with cleaner methods. Our result can then be used as a blackbox to obtain a (4 +\\varep silon)-approximation for Santa Claus. This factor also compares against a natural\, compact LP for Santa Claus.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-275@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Jalaj Upadhyay\nAffiliation: Johns Hopkins Universit\n Title: Towards Robust and Scalable Private Data Analysis\nAbstract:\nIn th e current age of big data\, we are constantly creating new data which is a nalyzed by various platforms to improve service and user’s experience. Gi ven the sensitive and confidential nature of these data\, there are obviou s security and privacy concerns while storing and analyzing such data. In this talk\, I will discuss the fundamental challenges in providing robust security and privacy guarantee while storing and analyzing large data. I w ill also give a brief overview of my contributions and future plans toward s addressing these challenges. \nTo give a glimpse of these challenges in providing a robust privacy guarantee known as differential privacy\, I wi ll use spectral sparsification of graphs as an example. Given the ubiquito us nature of graphs\, differentially private analysis on graphs has gained a lot of interest. However\, existing algorithms for these analyses are t ailored made for the task at hand making them infeasible in practice. In t his talk\, I will present a novel differentially private algorithm that ou tputs a spectral sparsification of the input graph. At the core of this al gorithm is a method to privately estimate the importance of an edge in the graph. Prior to this work\, there was no known privacy preserving method that provides such an estimate or spectral sparsification of graphs. \nSin ce many graph properties are defined by the spectrum of the graph\, this w ork has many analytical as well as learning theoretic applications. To dem onstrate some applications\, I will show more efficient and accurate analy sis of various combinatorial problems on graphs and the first technique to perform privacy preserving manifold learning on graphs. DTSTART;TZID=America/New_York:20190206T120000 DTEND;TZID=America/New_York:20190206T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Jalaj Upadhyay URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-jalaj-upadh yay-3/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Jala
j Upadhyay

\nAffiliation: Johns Hopkins Universit

Title: Towa rds Robust and Scalable Private Data Analysis

\nAbstract:

\nIn
the current age of big data\, we are constantly creating new data which is
analyzed by various platforms to improve service and user’s experience.
Given the sensitive and confidential nature of these data\, there are obvi
ous security and privacy concerns while storing and analyzing such data. I
n this talk\, I will discuss the fundamental challenges in providing robus
t security and privacy guarantee while storing and analyzing large data. I
will also give a brief overview of my contributions and future plans towa
rds addressing these challenges.

To give a glimpse of these chal lenges in providing a robust privacy guarantee known as differential priva cy\, I will use spectral sparsification of graphs as an example. Given the ubiquitous nature of graphs\, differentially private analysis on graphs h as gained a lot of interest. However\, existing algorithms for these analy ses are tailored made for the task at hand making them infeasible in pract ice. In this talk\, I will present a novel differentially private algorith m that outputs a spectral sparsification of the input graph. At the core o f this algorithm is a method to privately estimate the importance of an ed ge in the graph. Prior to this work\, there was no known privacy preservin g method that provides such an estimate or spectral sparsification of grap hs.

\nSince many graph properties are defined by the spectrum of th e graph\, this work has many analytical as well as learning theoretic appl ications. To demonstrate some applications\, I will show more efficient an d accurate analysis of various combinatorial problems on graphs and the fi rst technique to perform privacy preserving manifold learning on graphs.\n END:VEVENT BEGIN:VEVENT UID:ai1ec-243@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Martin Farach-Colton\nAffiliation: Rutgers University \nTitle: TBA\nAbstract: TBA DTSTART;TZID=America/New_York:20190213T120000 DTEND;TZID=America/New_York:20190213T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Martin Farach-Colton URL:https://www.cs.jhu.edu/~mdinitz/theory/event/martin-farach-colton/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n

Speaker: Mart
in Farach-Colton

\nAffiliation: Rutgers University

Title: TBA

\nAbstract: TBA

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-272@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Xue Chen\nAffiliation: Northwestern University\nTitle: Active Regression via Linear-Sample Sparsification\nAbstract:\nWe present an approach that improves the sample complexity for a variety of curve fi tting problems\, including active learning for linear regression\, polynom ial regression\, and continuous sparse Fourier transforms. In the active l inear regression problem\, one would like to estimate the least squares so lution \\beta^* minimizing ||X \\beta – y||_2 given the entire unlabeled d ataset X \\in \\R^{n \\times d} but only observing a small number of label s y_i. We show that O(d/\\eps) labels suffice to find an \\eps-approximati on \\wt{\\beta} to \\beta^*:\n\nE[||X \\wt{\\beta} – X\\beta^*||_2^2] \\le q \\eps ||X \\beta^* – y||_2^2.\nThis improves on the best previous result of O(d \\log d + d/\\eps) from leverage score sampling. We also present r esults for the inductive setting\, showing when \\wt{\\beta} will generali ze to fresh samples\; these apply to continuous settings such as polynomia l regression. Finally\, we show how the techniques yield improved results for the non-linear sparse Fourier transform setting.\n \nBio: Xue Chen is broadly interested in randomized algorithms and the use of randomness in c omputation. Specific areas include Fourier transform\, learning theory and optimization\, and pseudorandomness. He obtained his Ph.D. at the Univers ity of Texas at Austin\, under the supervision of David Zuckerman. Current ly\, he is a postdoctoral fellow in Northwestern University. DTSTART;TZID=America/New_York:20190227T120000 DTEND;TZID=America/New_York:20190227T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Xue Chen URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-xue-chen/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Xue
Chen

\nAffiliation: Northwestern University

Title: Active Reg ression via Linear-Sample Sparsification

\nAbstract:

\nWe pr
esent an approach that improves the sample complexity for a variety of cur
ve fitting problems\, including active learning for linear regression\, po
lynomial regression\, and continuous sparse Fourier transforms. In the act
ive linear regression problem\, one would like to estimate the least squar
es solution \\beta^* minimizing ||X \\beta – y||_2 given the entire unlabe
led dataset X \\in \\R^{n \\times d} but only observing a small number of
labels y_i. We show that O(d/\\eps) labels suffice to find an \\eps-approx
imation \\wt{\\beta} to \\beta^*:

\n\nE[||X \\wt{\\beta } – X\\beta^*||_2^2] \\leq \\eps ||X \\beta^* – y||_2^2.

\nThis impr
oves on the best previous result of O(d \\log d + d/\\eps) from leverage s
core sampling. We also present results for the *inductive* setting\,
showing when \\wt{\\beta} will generalize to fresh samples\; these apply t
o continuous settings such as polynomial regression. Finally\, we show how
the techniques yield improved results for the non-linear sparse Fourier t
ransform setting.

\n

Bio: Xue Chen is broadly interested in randomized al gorithms and the use of randomness in computation. Specific areas include Fourier transform\, learning theory and optimization\, and pseudorandomnes s. He obtained his Ph.D. at the University of Texas at Austin\, under the supervision of David Zuckerman. Currently\, he is a postdoctoral fellow in Northwestern University.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-282@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Rediet Abebe\nAffiliation: Cornell University\nTitle: Using Search Queries to Understand Health Information Needs in Africa\nAbs tract:\nAccess to healthcare and health information is of major global\nco ncern. The stark inequality in the availability of health data by\ncountry \, demographic groups\, and socioeconomic status impedes the\nidentificati on of major public health concerns and implementation of\neffective interv entions. This data gap ranges from basic disease\nstatistics\, such as dis ease prevalence rates\, to more nuanced\ninformation\, such as public atti tudes. A key challenge is\nunderstanding health information needs of under -served and\nmarginalized communities. Without understanding people’s ever yday\nneeds\, concerns\, and misconceptions\, health organizations lack th e\nability to effectively target education and programming efforts.\nIn th is presentation\, we focus on the lack of comprehensive\,\nhigh-quality da ta about information needs of individuals in developing\nnations. We propo se an approach that uses search data to uncover\nhealth information needs of individuals in all 54 nations in Africa.\nWe analyze Bing searches rela ted to HIV/AIDS\, malaria\, and\ntuberculosis\; these searches reveal dive rse health information needs\nthat vary by demographic groups and geograph ic regions. We also shed\nlight on discrepancies in the quality of content returned by search\nengines.\nWe conclude with a discussion on computatio nally-informed\ninterventions both on- and off-line in health and related domains and\nthe Mechanism Design for Social Good research initiative.\nBi o:\nRediet Abebe is a computer scientist with a strong interest in the\npr omotion of equality and justice. Her research focuses on algorithms\,\nAI\ , and their applications to social good. As part of this research\nagenda\ , she co-founded and co-organizes Mechanism Design for Social\nGood (MD4SG )\, an interdisciplinary\, multi-institutional research\ninitiative with o ver 300 individuals. She is also a co-founder and\nco-organizer of Black i n AI\, an international network of over 1000\nindividuals focused on incre asing the presence and inclusion of Black\nand African researchers in AI. Her research is deeply influenced by\nher upbringing in her hometown of Ad dis Ababa\, Ethiopia\, where she\nlived until moving to the U.S. in 2009. Her work has been generously\nsupported by fellowships and scholarships th rough Facebook\, Google\,\nthe Cornell Graduate School\, and the Harvard-C ambridge Fellowship. DTSTART;TZID=America/New_York:20190304T120000 DTEND;TZID=America/New_York:20190304T130000 SEQUENCE:0 SUMMARY:Rediet Abebe URL:https://www.cs.jhu.edu/~mdinitz/theory/event/rediet-abebe/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Redi
et Abebe

\nAffiliation: Cornell University

Title: Using Searc h Queries to Understand Health Information Needs in Africa

\nAbstrac
t:

\nAccess to healthcare and health information is of major global**\nconcern. The stark inequality in the availability of health data by<
br />\ncountry\, demographic groups\, and socioeconomic status impedes the
\nidentification of major public health concerns and implementation
of\neffective interventions. This data gap ranges from basic disease
\nstatistics\, such as disease prevalence rates\, to more nuanced\ninformation\, such as public attitudes. A key challenge is\nund
erstanding health information needs of under-served and\nmarginalize
d communities. Without understanding people’s everyday\nneeds\, conc
erns\, and misconceptions\, health organizations lack the\nability t
o effectively target education and programming efforts.**

In this pr
esentation\, we focus on the lack of comprehensive\,

\nhigh-quality d
ata about information needs of individuals in developing

\nnations. W
e propose an approach that uses search data to uncover

\nhealth infor
mation needs of individuals in all 54 nations in Africa.

\nWe analyze
Bing searches related to HIV/AIDS\, malaria\, and

\ntuberculosis\; t
hese searches reveal diverse health information needs

\nthat vary by
demographic groups and geographic regions. We also shed

\nlight on di
screpancies in the quality of content returned by search

\nengines.\n

We conclude with a discussion on computationally-informed

\nin
terventions both on- and off-line in health and related domains and

\nthe Mechanism Design for Social Good research initiative.

Bio:**\nRediet Abebe is a computer scientist with a strong interest in the \npromotion of equality and justice. Her research focuses on algorithm
s\,**

\nAI\, and their applications to social good. As part of this res earch

\nagenda\, she co-founded and co-organizes Mechanism Design for Social

\nGood (MD4SG)\, an interdisciplinary\, multi-institutional r esearch

\ninitiative with over 300 individuals. She is also a co-foun der and

\nco-organizer of Black in AI\, an international network of o ver 1000

\nindividuals focused on increasing the presence and inclusi on of Black

\nand African researchers in AI. Her research is deeply i nfluenced by

\nher upbringing in her hometown of Addis Ababa\, Ethiop ia\, where she

\nlived until moving to the U.S. in 2009. Her work has been generously

\nsupported by fellowships and scholarships through Facebook\, Google\,

\nthe Cornell Graduate School\, and the Harvard-C ambridge Fellowship.

Speaker: Grig
ory Yaroslavtsev

\nAffiliation: Indiana University\, Bloomington

Title: Advances in Hierarchical Clustering for Vector Data

\nAbs
tract:

\nCompared to the highly successful flat clustering (e.g. k-me
ans)\, despite its important role and applications in data analysis\, hier
archical clustering has been lacking in rigorous algorithmic studies until
late due to absence of rigorous objectives. Since 2016\, a sequence of wo
rks has emerged and gave novel algorithms for this problem in the general
metric setting. This was enabled by a breakthrough by Dasgupta\, who intro
duced a formal objective into the study of hierarchical clustering.

Based on joint works with Vadapalli (ICML’18) and Charikar\, Chatziafratis and Niazadeh (AISTATS’19)

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-286@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Arka Rai Choudhury\nAffiliation: Johns Hopkins Univers ity\nTitle: Finding a Nash Equilibrium is No Easier than Breaking Fiat-Sha mir\nAbstract:\nThe Fiat-Shamir heuristic transforms a public-coin interac tive proof into a non-interactive argument\, by replacing the verifier wit h a cryptographic hash function that is applied to the protocol’s transcri pt. Constructing hash functions for which this transformation is sound is a central and long-standing open question in cryptography.\nWe show that s olving the END-OF-METERED-LINE problem is no easier than breaking the soun dness of the Fiat-Shamir transformation when applied to the sumcheck proto col. In particular\, if the transformed protocol is sound\, then any hard problem in #P gives rise to a hard distribution in the class CLS\, which i s contained in PPAD. Our result opens up the possibility of sampling moder ately-sized games for which it is hard to find a Nash equilibrium\, by red ucing the inversion of appropriately chosen one-way functions to #SAT.\nOu r main technical contribution is a stateful incrementally verifiable proce dure that\, given a SAT instance over n variables\, counts the number of s atisfying assignments. This is accomplished via an exponential sequence of small steps\, each computable in time poly(n). Incremental verifiability means that each intermediate state includes a sumcheck-based proof of its correctness\, and the proof can be updated and verified in time poly(n).\n Joint work with Pavel Hubacek\, Chethan Kamath\, Krzysztof Pietrzak\, Alon Rosen\, and Guy Rothblum. DTSTART;TZID=America/New_York:20190327T120000 DTEND;TZID=America/New_York:20190327T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Arka Rai Choudhury URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-arka-rai-ch oudhury/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Arka
Rai Choudhury

\nAffiliation: Johns Hopkins University

Title: Finding a Nash Equilibrium is No Easier than Breaking Fiat-Shamir

\nAbstract:

\nThe Fiat-Shamir heuristic transforms a public-coin inter
active proof into a non-interactive argument\, by replacing the verifier w
ith a cryptographic hash function that is applied to the protocol’s transc
ript. Constructing hash functions for which this transformation is sound i
s a central and long-standing open question in cryptography.

We sh ow that solving the END-OF-METERED-LINE problem is no easier than breaking the soundness of the Fiat-Shamir transformation when applied to the sumch eck protocol. In particular\, if the transformed protocol is sound\, then any hard problem in #P gives rise to a hard distribution in the class CLS\ , which is contained in PPAD. Our result opens up the possibility of sampl ing moderately-sized games for which it is hard to find a Nash equilibrium \, by reducing the inversion of appropriately chosen one-way functions to #SAT.

\nOur main technical contribution is a stateful incrementally verifiable procedure that\, given a SAT instance over n variables\, counts the number of satisfying assignments. This is accomplished via an exponen tial sequence of small steps\, each computable in time poly(n). Incrementa l verifiability means that each intermediate state includes a sumcheck-bas ed proof of its correctness\, and the proof can be updated and verified in time poly(n).

\nJoint work with Pavel Hubacek\, Chethan Kamath\, Kr zysztof Pietrzak\, Alon Rosen\, and Guy Rothblum.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-289@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Amitabh Basu\nAffiliation: JHU\nTitle: Admissibility o f solution estimators for stochastic optimization\nAbstract:\nWe look at s tochastic optimization problems through the lens of statistical decision t heory. In particular\, we address admissibility\, in the statistical decis ion theory sense\, of the natural sample average estimator for a stochasti c optimization problem (which is also known as the empirical risk minimiza tion (ERM) rule in learning literature). It is well known that for general stochastic optimization problems\, the sample average estimator may not b e admissible. This is known as Stein’s paradox in the statistics literatur e. We show in this paper that for optimizing stochastic linear functions o ver compact sets\, the sample average estimator is admissible. DTSTART;TZID=America/New_York:20190410T120000 DTEND;TZID=America/New_York:20190410T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Amitabh Basu URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-amitabh-bas u/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Amit
abh Basu

\nAffiliation: JHU

Title: Admissibility of solution estimators for stochastic optimization

\nAbstract:

\nWe look at
stochastic optimization problems through the lens of statistical decision
theory. In particular\, we address admissibility\, in the statistical dec
ision theory sense\, of the natural sample average estimator for a stochas
tic optimization problem (which is also known as the empirical risk minimi
zation (ERM) rule in learning literature). It is well known that for gener
al stochastic optimization problems\, the sample average estimator may not
be admissible. This is known as Stein’s paradox in the statistics literat
ure. We show in this paper that for optimizing stochastic linear functions
over compact sets\, the sample average estimator is admissible.

Speaker: Dani
el Reichman

\nAffiliation: Princeton University

Title: Contag ious sets in bootstrap percolation

\nAbstract:

\nConsider the
following activation process in undirected graphs: a vertex is active eith
er if it belongs to a set of initially activated vertices or if at some po
int it has at least r active neighbors. This process (commonly referred to
as bootstrap percolation) has been studied in several fields including co
mbinatorics\, computer science\, statistical physics and viral marketing.
A contagious set is a set whose activation results with the entire graph b
eing active. Given a graph G\, let m(G\,r) be the minimal size of a contag
ious set.

I will survey upper and lower bounds for m(G\,r) in gene ral graphs and for graphs with special properties (random and pseudo-rando m graphs\, graphs without short cycles) and discuss many unresolved questi ons.

\nBased on joint work with Amin Coja-Oghlan\, Uriel Feige and Michael Krivelevich.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-292@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Xuan Wu\nAffiliation: Johns Hopkins\nTitle: Coreset fo r Ordered Weighted Clustering\nAbstract: \nOrdered k-Median is a generali zation of classical clustering problems such as k-Median and k-Center\, th at offers a more flexible data analysis\, like easily combining multiple o bjectives (e.g.\, to increase fairness or for Pareto optimization). Its ob jective function is defined via the Ordered Weighted Averaging (OWA) parad igm of Yager (1988)\, where data points are weighted according to a predef ined weight vector\, but in order of their contribution to the objective ( distance from the centers). \nCoreset is a powerful data-reduction techni que which summarizes the data set into a small (weighted) point set while approximately preserving the objective value of the data set to all center s. When there are multiple objectives (weights)\, the above standard cores et might have limited usefulness\, whereas in a \\emph{simultaneous} cores et\, which was introduced recently by Bachem and Lucic and Lattanzi (2018) \, the above approximation holds for all weights (in addition to all cente rs). Our main result is the first construction of simultaneous coreset for the Ordered k-Median problem of small size. \nIn this talk\, I will int roduce the basics of coreset construction for the clustering problem and t he main ideas of our new results. Finally\, we discuss some remaining open problems. \nThis talk is based on joint work with Vladimir Braverman\, Sh aofeng Jiang\, and Robert Krauthgamer. DTSTART;TZID=America/New_York:20190501T120000 DTEND;TZID=America/New_York:20190501T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Xuan Wu URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-xuan-wu/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Xuan
Wu

\nAffiliation: Johns Hopkins

Title: Coreset for Ordered W eighted Clustering

\nAbstract:

\nOrdered k-Median is a gener alization of classical clustering problems such as k-Median and k-Center\, that offers a more flexible data analysis\, like easily combining multipl e objectives (e.g.\, to increase fairness or for Pareto optimization). Its objective function is defined via the Ordered Weighted Averaging (OWA) pa radigm of Yager (1988)\, where data points are weighted according to a pre defined weight vector\, but in order of their contribution to the objectiv e (distance from the centers).

\nCoreset is a powerful data-reduct ion technique which summarizes the data set into a small (weighted) point set while approximately preserving the objective value of the data set to all centers. When there are multiple objectives (weights)\, the above stan dard coreset might have limited usefulness\, whereas in a \\emph{simultane ous} coreset\, which was introduced recently by Bachem and Lucic and Latta nzi (2018)\, the above approximation holds for all weights (in addition to all centers). Our main result is the first construction of simultaneous c oreset for the Ordered k-Median problem of small size.

\nIn this talk\, I will introduce the basics of coreset construction for the cluster ing problem and the main ideas of our new results. Finally\, we discuss so me remaining open problems.

\nThis talk is based on joint work with Vladimir Braverman\, Shaofeng Jiang\, and Robert Krauthgamer.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-303@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Christopher Musco\nAffiliation: NYU\nTitle: Structured Covariance Estimation\nAbstract:\nGiven access to samples from a distribu tion D over d-dimensional vectors\, how many samples are necessary to lear n the distribution’s covariance matrix\, T? Moreover\, how can we leverage a priori knowledge about T’s structure to reduce this sample complexity? \nI will discuss this fundamental statistical problem in the setting where T is known to have Toeplitz structure. Toeplitz covariance matrices arise in countless signal processing applications\, from wireless communication s\, to medical imaging\, to time series analysis. In many of these applica tions\, we are interested in learning algorithms that only view a subset of entries in each d-dimensional vector sample from D. We care about minim izing two notions of sample complexity 1) the total number of vector sampl es taken and 2) the number of entries accessed in each vector sample. The later goal typically equates to minimizing equipment or hardware requireme nts.\nI will present several new non-asymptotic bounds on these sample com plexity measures. We will start by taking a fresh look at classical and wi dely used algorithms\, including methods based on selecting entries from e ach sample according to a “sparse ruler”. Then\, I will introduce a novel sampling and estimation strategy that improves on existing methods in many settings. Our new approach for learning Toeplitz structured covariance ut ilizes tools from random matrix sketching\, leverage score sampling for co ntinuous signals\, and sparse Fourier transform algorithms. It fits into a broader line of work which seeks to address fundamental problems in signa l processing using tools from theoretical computer science and randomized numerical linear algebra.\nBio:\nChristopher Musco is an Assistant Profess or in the Computer Science and Engineering department at NYU’s Tandon Scho ol of Engineering. His research focuses on the algorithmic foundations of data science and machine learning. Christopher received his Ph.D. in Compu ter Science from the Massachusetts Institute of Technology and B.S. degree s in Applied Mathematics and Computer Science from Yale University. DTSTART;TZID=America/New_York:20191011T123000 DTEND;TZID=America/New_York:20191011T133000 SEQUENCE:0 SUMMARY:[Theory Seminar] Christopher Musco URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-christopher -musco/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Chri
stopher Musco

\nAffiliation: NYU

Title: Structured Covariance Estimation

\nAbstract:

\nGiven access to samples from a distri
bution D over d-dimensional vectors\, how many samples are necessary to le
arn the distribution’s covariance matrix\, T? Moreover\, how can we levera
ge a priori knowledge about T’s structure to reduce this sample complexity
?

I will discuss this fundamental statistical problem in the setti ng where T is known to have Toeplitz structure. Toeplitz covariance matric es arise in countless signal processing applications\, from wireless commu nications\, to medical imaging\, to time series analysis. In many of these applications\, we are interested in learning algorithms that only view a subset of entries in each d-dimensional vector sample from D. We care abo ut minimizing two notions of sample complexity 1) the total number of vect or samples taken and 2) the number of entries accessed in each vector samp le. The later goal typically equates to minimizing equipment or hardware r equirements.

\nI will present several new non-asymptotic bounds on t hese sample complexity measures. We will start by taking a fresh look at c lassical and widely used algorithms\, including methods based on selecting entries from each sample according to a “sparse ruler”. Then\, I will int roduce a novel sampling and estimation strategy that improves on existing methods in many settings. Our new approach for learning Toeplitz structure d covariance utilizes tools from random matrix sketching\, leverage score sampling for continuous signals\, and sparse Fourier transform algorithms. It fits into a broader line of work which seeks to address fundamental pr oblems in signal processing using tools from theoretical computer science and randomized numerical linear algebra.

\nBio:

\nChristopher M
usco is an Assistant Professor in the Computer Science and Engineering dep
artment at NYU’s Tandon School of Engineering. His research focuses on the
algorithmic foundations of data science and machine learning. Christopher
received his Ph.D. in Computer Science from the Massachusetts Institute o
f Technology and B.S. degrees in Applied Mathematics and Computer Science
from Yale University.

Speaker: Guy
Kortsarz

\nAffiliation: Rutgers Universty – Camden

Title: A s
urvey on the Directed Steiner tree problem

\nAbstract:

\nThe dir
ected Steiner problem is one of the most important problems in optimizatio
n\, and in particular is more general than Group Steiner and other problem
s.

I will discuss the (by now classic) 1/\\epsilon^3 n^epsilon app roximation for the problem by Charikar et al (the algorithm was invented b y Kortsarz and Peleg and is called recursive greedy. A technique who peopl e in approximation should know). The running time is more than n^{1/epsil on}. One of the most important open questions in Approximation Algorithms is if there is a polynomial time polylog ratio for this problem. This is open from 1997.

\nI will discuss the Group Steiner problem ( a spec ial case of the Directed Steiner problem) and the Directed Steiner Forest (a generalization of the Directed Steiner problem) and many more related p roblems.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-294@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Jiapeng Zhang\nAffiliation: Harvard University\nTitle: An improved sunflower bound\nAbstract:\n\n\nA sunflower with $r$ petals is a collection of $r$ sets so that the intersection of each pair is equal t o the intersection of all. Erdos and Rado in 1960 proved the sunflower lem ma: for any fixed $r$\, any family of sets of size $w$\, with at least abo ut $w^w$ sets\, must contain a sunflower. The famous sunflower conjecture is that the bound on the number of sets can be improved to $c^w$ for some constant $c$. Despite much research\, the best bounds until recently were all of the order of $w^{cw}$ for some constant c. In this work\, we improv e the bounds to about $(log w)^w$.\n\n\nJoint work with Ryan Alweiss\, Sha char Lovett and Kewen Wu. DTSTART;TZID=America/New_York:20191106T120000 DTEND;TZID=America/New_York:20191106T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Jiapeng Zhang URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-jiapeng-zha ng/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Jiap
eng Zhang

\nAffiliation: Harvard University

Title:An improved sunflower bound

\nAbstract:

\n\n

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-310@www.cs.jhu.edu/~mdinitz/theory
DTSTAMP:20211018T045941Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: Robert Krauthgamer\nAffiliation: Weizmann Institute of
Science\nTitle: On Solving Linear Systems in Sublinear Time\nAbstract:\nI
will discuss sublinear algorithms that solve linear systems locally. In\n
the classical version of this problem\, the input is a matrix S and a vect
or\nb in the range of S\, and the goal is to output a vector x satisfying
Sx=b.\nWe focus on computing (approximating) one coordinate of x\, which p
otentially\nallows for sublinear algorithms. Our results show that there i
s a\nqualitative gap between symmetric diagonally dominant (SDD) and the m
ore\ngeneral class of positive semidefinite (PSD) matrices. For SDD matric
es\, we\ndevelop an algorithm that runs in polylogarithmic time\, provided
that S is\nsparse and has a small condition number (e.g.\, Laplacian of a
n expander\ngraph). In contrast\, for certain PSD matrices with analgous a
ssumptions\, the\nrunning time must be at least polynomial.\nJoint work wi
th Alexandr Andoni and Yosef Pogrow.
DTSTART;TZID=America/New_York:20191108T120000
DTEND;TZID=America/New_York:20191108T130000
SEQUENCE:0
SUMMARY:[Theory Seminar] Robert Krauthgamer
URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-robert-krau
thgamer/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\n\n

\n\nA sunflower with $r$ petals is a collection of $r$ sets so that
the intersection of each pair is equal to the intersection of all. Erdos a
nd Rado in 1960 proved the sunflower lemma:
for any fixed $r$\, any family of sets of size $w$\, with at least about $
w^w$ sets\, must contain a sunflower. The fa
mous sunflower conjecture is that the bound
on the number of sets can be improved to $c^w$ for some constant $c$. Desp
ite much research\, the best bounds until recently were all of the order o
f $w^{cw}$ for some constant c. In this work\, we improve the bounds to ab
out $(log w)^w$.

\nJoint work with Ryan Alweiss\, Sh
achar Lovett and Kewen Wu.

\nSpeaker: Robe
rt Krauthgamer

\nAffiliation: Weizmann Institute of Science

T itle: On Solving Linear Systems in Sublinear Time

\nAbstract:

\nI will discuss sublinear algorithms that solve linear systems locally. I
n

\nthe classical version of this problem\, the input is a matrix S a
nd a vector

\nb in the range of S\, and the goal is to output a vecto
r x satisfying Sx=b.

We focus on computing (approximating) one coo
rdinate of x\, which potentially

\nallows for sublinear algorithms. O
ur results show that there is a

\nqualitative gap between symmetric d
iagonally dominant (SDD) and the more

\ngeneral class of positive sem
idefinite (PSD) matrices. For SDD matrices\, we

\ndevelop an algorith
m that runs in polylogarithmic time\, provided that S is

\nsparse and
has a small condition number (e.g.\, Laplacian of an expander

\ngrap
h). In contrast\, for certain PSD matrices with analgous assumptions\, the

\nrunning time must be at least polynomial.

Joint work with Alexandr Andoni and Yosef Pogrow.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-314@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Yasamin Nazari\nAffiliation: Johns Hopkins University \nTitle: Sparse Hopsets in Congested Clique\nAbstract:\nWe give the first Congested Clique algorithm that computes a sparse hopset with polylogarith mic hopbound in polylogarithmic time. Given a graph G=(V\,E)\, a (β\,ϵ)-ho pset H with “hopbound” β\, is a set of edges added to G such that for any pair of nodes u and v in G there is a path with at most β hops in G∪H with length within (1+ϵ) of the shortest path between u and v in G.\n\nOur hop sets are significantly sparser than the recent construction of Censor-Hill el et al. [6]\, that constructs a hopset of size Õ (n^{3/2})\, but with a smaller polylogarithmic hopbound. On the other hand\, the previously know n constructions of sparse hopsets with polylogarithmic hopbound in the Con gested Clique model\, proposed by Elkin and Neiman [10]\,[11]\,[12]\, all require polynomial rounds.\n\nOne tool that we use is an efficient algorit hm that constructs an ℓ-limited neighborhood cover\, that may be of indepe ndent interest.\n\nFinally\, as a side result\, we also give a hopset cons truction in a variant of the low-memory Massively Parallel Computation mod el\, with improved running time over existing algorithms. DTSTART;TZID=America/New_York:20191204T120000 DTEND;TZID=America/New_York:20191204T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Yasamin Nazari URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-yasamin-naz ari-2/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Yasa
min Nazari

\nAffiliation: Johns Hopkins University

Title: Spa rse Hopsets in Congested Clique

\nAb stract:

\nWe give the first Congested Clique algorithm that
computes a sparse hopset with polylogarithmic hopbound in polylogarithmic
time. Given a graph G=(V\,E)\, a (β\,ϵ)<
/span>-hopset H with “hopbound” β\, is a set of edges added to G such that for any pair of nodes u and v in G there is a path with at most
β hops in ~~G∪H<
/span> with length within (1+ϵ) of the shortest path
between <
span id='MathJax-Span-50' class='math'>u~~~~ a
nd
v in ~~G ~~.~~

\nOur hopsets are significantly sparser than the recent construction of C
ensor-Hillel et al. [6]\, that constructs a hopset of size Õ (n^{3/2}
)\, but with a smaller polylogarithmic hopboun
d. On the other hand\, the previously known constructions of sparse hopset
s with polylogarithmic hopbound in the Congested Clique model\, proposed b
y Elkin and Neiman [10]\,[11]\,[12]\, all require polynomial rounds.

\n\nOne tool that we use is an efficient algorithm that constructs an
~~ℓ~~~~-limited
neighborhood cover\, that may be of independent interest.~~

\nFi
nally\, as a side result\, we also give a hopset construction in a variant
of the low-memory Massively Parallel Computation model\, with improved ru
nning time over existing algorithms.

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-315@www.cs.jhu.edu/~mdinitz/theory
DTSTAMP:20211018T045941Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: Richard Shea\nAffiliation: Applied and Computational M
ath program\, Johns Hopkins University\nTitle: Progress towards building a
Dynamic Hawkes Graph\nAbstract:\nThis talk will introduce the Dynamic Haw
kes Graph. It builds from developments in multivariate Hawkes Processes\,
locally stable Hawkes distributions\, and representations of the Hawkes p
rocess as an Integer Valued autoregressive (INAR) fit. The background to
enable understanding of the Dynamic Hawkes Graph will be the bulk of the t
alk. Richard is presenting this as part of his Master’s thesis.
DTSTART;TZID=America/New_York:20191211T120000
DTEND;TZID=America/New_York:20191211T130000
SEQUENCE:0
SUMMARY:[Theory Seminar] Richard Shea
URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-richard-she
a/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Rich ard Shea

\nAffiliation: Applied and Computational Math program\, Joh ns Hopkins University

\nTitle: Progress towards building a Dynamic H awkes Graph

\nAbstract:

\nThis talk will introduce the Dynamic Hawkes Graph. It builds from de
velopments in multivariate Hawkes Processes\, locally stable Hawkes distri
butions\, and representations of the Hawkes process as an Integer Valued a
utoregressive (INAR) fit. The background to enable understanding of the D
ynamic Hawkes Graph will be the bulk of the talk. Richard is presenting th
is as part of his Master’s thesis.

\n
END:VEVENT
BEGIN:VEVENT
UID:ai1ec-317@www.cs.jhu.edu/~mdinitz/theory
DTSTAMP:20211018T045941Z
CATEGORIES:
CONTACT:
DESCRIPTION:Speaker: Aditya Krishnan\nAffiliation: Johns Hopkins University
\nTitle: Schatten Norms in Matrix Streams: The Role of Sparsity\nAbstract:
\nSpectral functions of large matrices contain important structural inform
ation about the underlying data\, and are thus becoming increasingly impor
tant to efficiently compute. Many times\, large matrices representing real
-world data are sparse or doubly sparse (i.e.\, sparse in both rows and co
lumns)\, and are accessed as a stream of updates\, typically organized in
row-order. In this setting\, where space (memory) is the limiting resource
\, all known algorithms require space that is polynomial in the dimension
of the matrix\, even for sparse matrices. We address this challenge by pro
viding the first algorithms whose space requirement is independent of the
matrix dimension\, assuming the matrix is doubly-sparse and presented in r
ow-order.\nIn addition\, we prove that multiple passes are unavoidable in
this setting and show extensions of our primary technique\, including a tr
ade-off between space requirements and number of passes. Our algorithms ap
proximate the Schatten p-norms\, which we use in turn to approximate other
spectral functions\, such as logarithm of the determinant\, trace of matr
ix inverse\, and Estrada index.\nJoint work with Vladimir Braverman\, Robe
rt Krauthgamer and Roi Sinoff.
DTSTART;TZID=America/New_York:20200311T120000
DTEND;TZID=America/New_York:20200311T130000
SEQUENCE:0
SUMMARY:[Theory Seminar] Aditya Krishnan
URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-aditya-kris
hnan/
X-COST-TYPE:free
X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Adit
ya Krishnan

\nAffiliation: Johns Hopkins University

Title: Sc hatten Norms in Matrix Streams: The Role of Sparsity

\nAbstract:

\nSpectral functions of large matrices contain important structural info
rmation about the underlying data\, and are thus becoming increasingly imp
ortant to efficiently compute. Many times\, large matrices representing re
al-world data are *sparse* or *doubly sparse* (i.e.\, sparse
in both rows and columns)\, and are accessed as a *stream* of upda
tes\, typically organized in *row-order*. In this setting\, where s
pace (memory) is the limiting resource\, all known algorithms require spac
e that is polynomial in the dimension of the matrix\, even for sparse matr
ices. We address this challenge by providing the first algorithms whose sp
ace requirement is *independent of the matrix dimension*\, assuming
the matrix is doubly-sparse and presented in row-order.

In additi on\, we prove that multiple passes are unavoidable in this setting and sho w extensions of our primary technique\, including a trade-off between spac e requirements and number of passes. Our algorithms approximate the Schatt en p-norms\, which we use in turn to approximate other spectral functions\ , such as logarithm of the determinant\, trace of matrix inverse\, and Est rada index.

\nJoint work with Vladimir Braverman\, Robert Krauthgame r and Roi Sinoff.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-305@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Arnold Filtser\nAffiliation: Columbia University\nTitl e: TBD\nAbstract: TBD DTSTART;TZID=America/New_York:20200325T120000 DTEND;TZID=America/New_York:20200325T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Arnold Filtser URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-arnold-filt ser/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Arno
ld Filtser

\nAffiliation: Columbia University

Title: TBD

\nAbstract: TBD

Speaker: Jasp
er Lee

\nAffiliation: Brown University

Title: Real-Valued Sub -Gaussian Mean Estimation\, Optimal to a (1+o(1)) Multiplicative Factor

\nAbstract:

\nWe revisit one of the most fundamental problems in
statistics: given access to independent samples from a 1D random variable
(with finite but unknown mean and variance)\, what is the best way to esti
mate the mean\, in terms of error convergence with respect to sample size?
The conventional wisdom is to use the sample mean as our estimate. Howeve
r it is known that the sample mean has optimal convergence if and only if
the underlying distribution is (sub-)Gaussian. Moreover\, it can even be e
xponentially slower than optimal for certain heavy-tailed distributions. O
n the other hand\, the median-of-means estimator (invented and reinvented
in various literature) does have sub-Gaussian convergence for all finite-v
ariance distributions\, albeit in the big-O sense with a sub-optimal multi
plicative constant. The natural remaining question then\, is whether it is
possible to bridge the gap\, to have an estimator that has optimal sub-Ga
ussian concentration with an optimal constant\, for all finite-variance di
stributions.

In this talk\, we answer the question affirmatively b y giving an estimator that converges with the optimal constant inside the big-O\, up to a (1+o(1)) multiplicative factor. Our estimator is furthermo re computable in time linear in the sample size. The convergence analysis involves deriving tail bounds using tools from linear and convex programmi ng\, which may be of independent interest.

\nJoint work with Paul Va liant.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-324@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Edinah Gnang\nAffiliation: Johns Hopkins University\nT itle: On the Kotzig-Ringel-Rosa conjecture\nAbstract:\nIn this talk we des cribe and motivate the K.R.R. conjecture graph partitioning and describe a functional approach enabling us to tackle the K.R.R. conjecture via a new composition lemma. If time permits I will also discuss algorithmic aspec ts. DTSTART;TZID=America/New_York:20200923T120000 DTEND;TZID=America/New_York:20200923T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Edinah Gnang URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-edinah-gnan g/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Edin
ah Gnang

\nAffiliation: Johns Hopkins University

\nTitle: On the
Kotzig-Ringel-Rosa conjecture

Abstract:

\nIn this talk we de
scribe and motivate the K.R.R. conjecture graph partitioning and describe
a functional approach enabling us to tackle the K.R.R. conjecture via a ne
w composition lemma. If time permits I will also discuss algorithmic aspe
cts.

Speaker: Adit
ya Krishnan

\nAffiliation: Johns Hopkins University

Title: Sc hatten Norms in Matrix Streams: The Role of Sparsity.

\nAbstract: Spectral functions of large matrices contain important structural informa tion about the underlying data\, and are thus becoming increasingly import ant to efficiently compute. Many times\, large matrices representing real- world data are \\emph{sparse} or \\emph{doubly sparse} (i.e.\, sparse in b oth rows and columns)\, and are accessed as a \\emph{stream} of updates\, typically organized in \\emph{row-order}. In this setting\, where space (m emory) is the limiting resource\, all known algorithms require space that is polynomial in the dimension of the matrix\, even for sparse matrices. W e address this challenge by providing the first algorithms whose space req uirement is \\emph{independent of the matrix dimension}\, assuming the mat rix is doubly-sparse and presented in row-order.

\nIn addition\, we prove that multiple passes are unavoidable in this setting and show exten sions of our primary technique\, including a trade-off between space requi rements and number of passes. Our algorithms approximate the Schatten p-no rms\, which we use in turn to approximate other spectral functions\, such as logarithm of the determinant\, trace of matrix inverse\, and Estrada in dex.

\nJoint work with Vladimir Braverman\, Robert Krauthgamer and Roi Sinoff.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-331@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Brian Brubach\nAffiliation: Wellesley College\nTitle: Online matching under three layers of uncertainty\nAbstract:\nOnline match ing problems have become ubiquitous with the rise of the internet and e-co mmerce. From the humble beginnings of a single problem 30 years ago\, the theoretical study of online matching now encompasses dozens of variations inspired by diverse applications. We’ll take a tour through the landscape of online matching problems. As we go\, we’ll venture deeper and deeper in to the jungle of stochasticity and uncertainty. Finally\, we’ll cover some very recent work introducing new algorithms and models. Along the way\, I ’ll highlight techniques and tricks I’ve found useful in studying these pr oblems. DTSTART;TZID=America/New_York:20201021T120000 DTEND;TZID=America/New_York:20201021T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Brian Brubach URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-brian-bruba ch/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Bria
n Brubach

\nAffiliation: Wellesley College

Title: Online matc hing under three layers of uncertainty

\nAbstract:

\nOnline mat
ching problems have become ubiquitous with the rise of the internet and e-
commerce. From the humble beginnings of a single problem 30 years ago\, th
e theoretical study of online matching now encompasses dozens of variation
s inspired by diverse applications. We’ll take a tour through the landscap
e of online matching problems. As we go\, we’ll venture deeper and deeper
into the jungle of stochasticity and uncertainty. Finally\, we’ll cover so
me very recent work introducing new algorithms and models. Along the way\,
I’ll highlight techniques and tricks I’ve found useful in studying these
problems.

Speaker: Josh
ua Grochow

\nAffiliation: University of Colorado

Title: Isomo rphism of tensors\, algebras\, and polynomials\, oh my!

\nAbstract:
We consider the problems of testing isomorphism of tensors\, p-groups\, cu
bic polynomials\, quantum states\, algebras\, and more\, which arise from
a variety of areas\, including machine learning\, group theory\, and crypt
ography. Despite Graph Isomorphism now being in quasi-polynomial time\, an
d having long had efficient practical software\, for the problems we consi
der no algorithm is known that is asymptotically better than brute force\,
and state-of-the-art software cannot get beyond small instances. We appro
ach this situation in two ways:

\n – Complexity-theoretic: we show th
at all these problems are polynomial-time equivalent\, giving rise to a cl
ass of problems we call Tensor Isomorphism-complete (TI-complete). Perhaps
surprising here is that we show that isomorphism of d-tensors for any fix
ed d (at least 3) is equivalent to testing isomorphism of 3-tensors. These
equivalences let us focus on just one problem rather than dozens\, or a w
hole infinite hierarchy\, separately.

\n – Algorithmic: Adapting the
Weisfeiler-Leman method from Graph Isomorphism to tensors\, trying to unde
rstand tensor isomorphism by taking advantage of isomorphisms of small sub
-tensors. This leads to tensor analogues of the Graph Reconstruction conje
cture and related questions.

Based on joint works with Vyacheslav V. Futorny and Vladimir V. Sergeichuk (Lin. Alg. Appl.\, 2019\; arXiv:1810 .09219)\, with Peter A. Brooksbank\, Yinan Li\, Youming Qiao\, and James B . Wilson (arXiv:1905.02518)\, and with Youming Qiao (arXiv:1907.00309).

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-338@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Jingfeng Wu\nAffiliation: Johns Hopkins University\nTi tle: Direction Matters: On the Implicit Regularization Effect of Stochasti c Gradient Descent with Moderate Learning Rate\nAbstract:\nUnderstanding t he algorithmic regularization effect of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works\, however\, focus on very small or even infinitesimal learning rate regime\, and fail to cover practical scenario s where the learning rate is moderate and annealing. In this paper\, we ma ke an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case\, SGD and GD are known to converge to the unique minimum-norm solution\; how ever\, with the moderate and annealing learning rate\, we show that they e xhibit different directional bias: SGD converges along the large eigenvalu e directions of the data matrix\, while GD goes after the small eigenvalue directions. Furthermore\, we show that such directional bias does matter when early stopping is adopted\, where the SGD output is nearly optimal bu t the GD output is suboptimal. Finally\, our theory explains several folk arts in practice used for SGD hyperparameter tuning\, such as (1) linearly scaling the initial learning rate with batch size\; and (2) overrunning S GD with high learning rate even when the loss stops decreasing. DTSTART;TZID=America/New_York:20201104T120000 DTEND;TZID=America/New_York:20201104T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Jingfeng Wu URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-jingfeng-wu / X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Jing
feng Wu

\nAffiliation: Johns Hopkins University

Title: Direct ion Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate

\nAbstract:

\nUnderstanding
the algorithmic regularization effect of stochastic gradient descent (SGD
) is one of the key challenges in modern machine learning and deep learnin
g theory. Most of the existing works\, however\, focus on very small or ev
en infinitesimal learning rate regime\, and fail to cover practical scenar
ios where the learning rate is moderate and annealing. In this paper\, we
make an initial attempt to characterize the particular regularization effe
ct of SGD in the moderate learning rate regime by studying its behavior fo
r optimizing an overparameterized linear regression problem. In this case\
, SGD and GD are known to converge to the unique minimum-norm solution\; h
owever\, with the moderate and annealing learning rate\, we show that they
exhibit different directional bias: SGD converges along the large eigenva
lue directions of the data matrix\, while GD goes after the small eigenval
ue directions. Furthermore\, we show that such directional bias does matte
r when early stopping is adopted\, where the SGD output is nearly optimal
but the GD output is suboptimal. Finally\, our theory explains several fol
k arts in practice used for SGD hyperparameter tuning\, such as (1) linear
ly scaling the initial learning rate with batch size\; and (2) overrunning
SGD with high learning rate even when the loss stops decreasing.

Speaker: Arno
ld Filtser

\nAffiliation: Columbia University

Title: Scatteri
ng and Sparse Partitions\, and their Applications

\nAbstract:

\n
A partition $\\mathcal{P}$ of a weighted graph $G$ is $(\\sigma\,\\tau\,\\
Delta)$-sparse if every cluster has diameter at most $\\Delta$\, and every
ball of radius $\\Delta/\\sigma$ intersects at most $\\tau$ clusters. Si
milarly\, $\\mathcal{P}$ is $(\\sigma\,\\tau\,\\Delta)$-scattering if inst
ead for balls we require that every shortest path of length at most $\\Del
ta/\\sigma$ intersects at most $\\tau$ clusters. Given a graph $G$ that a
dmits a $(\\sigma\,\\tau\,\\Delta)$-sparse partition for all $\\Delta>0$\,
Jia et al. [STOC05] constructed a solution for the Universal Steiner Tree
problem (and also Universal TSP) with stretch $O(\\tau\\sigma^2\\log_\\ta
u n)$. Given a graph $G$ that admits a $(\\sigma\,\\tau\,\\Delta)$-scatte
ring partition for all $\\Delta>0$\, we construct a solution for the Stein
er Point Removal problem with stretch $O(\\tau^3\\sigma^3)$. We then cons
truct sparse and scattering partitions for various different graph familie
s\, receiving new results for the Universal Steiner Tree and Steiner Point
Removal problems.

Speaker: Yu Z
heng

\nAffiliation: Johns Hopkins University

Title: Space Eff icient Deterministic Approximation of String Measures

\nAbstract:

\nWe study approximation algorithms for the following three string meas
ures that are widely used in practice: edit distance (ED)\, longest common
subsequence (LCS)\, and longest increasing sequence (LIS). All three prob
lems can be solved exactly by standard algorithms that run in polynomial t
ime with roughly $\\Theta(n)$ space\, where $n$ is the input length\, and
our goal is to design deterministic approximation algorithms that run in p
olynomial time with significantly smaller space.

Towards this\, we design several algorithms that achieve $1+\\eps$ or $1-\\eps$ approximati on for all three problems\, where $\\eps>0$ can be any constant and even s lightly sub constant. Our algorithms are flexible and can be adjusted to a chieve the following two regimes of parameters: 1) space $n^{\\delta}$ for any constant $\\delta>0$ with running time essentially the same as or sli ghtly more than the standard algorithms\; and 2) space $\\mathsf{polylog} (n)$ with (a larger) polynomial running time\, which puts the approximatio n versions of the three problems in Steve’s class (SC). Our algorithms sig nificantly improve previous results in terms of space complexity\, where a ll known results need to use space at least $\\Omega(\\sqrt{n})$. Some of our algorithms can also be adapted to work in the asymmetric streaming mod el [SS13]\, and output the corresponding sequence. Furthermore\, our resul ts can be used to improve a recent result by Farhadi et. al. [FHRS20] abou t approximating ED in the asymmetric streaming model\, reducing the runnin g time from being exponential in [FHRS20] to a polynomial.

\nOur al gorithms are based on the idea of using recursion as in Savitch’s theorem [Sav70]\, and a careful adaption of previous techniques to make the recurs ion work. Along the way we also give a new logspace reduction from longest common subsequence to longest increasing sequence\, which may be of indep endent interest.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-341@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Xuan Wu\nAffiliation: Johns Hopkins University\nTitle: Coreset for Clustering in Graph Metrics.\nAbstract:\nClustering is a fund amental task in machine learning. As the increasing demand for running mac hine learning algorithms in the huge data sets\, classic clustering algori thms were found not to scale well. To this end\, coreset is introduced as a powerful data reduction technique that turns a huge dataset into a tiny proxy. Coresets have been also successfully applied to streaming and distr ibuted computing. Coresets for clustering in Euclidean spaces have been v ery well studied. However\, very few results were known about the non-Eucl idean space. Beyond Euclidean\, graph metrics is a very important family o f metric space. In this talk\, I will cover my recent work on coreset for k-clustering in graph metrics\, including bounded treewidth graph and excl uded-minor graph. DTSTART;TZID=America/New_York:20201209T120000 DTEND;TZID=America/New_York:20201209T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Xuan Wu URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-xuan-wu-2/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Xuan
Wu

\nAffiliation: Johns Hopkins University

Title: Coreset fo r Clustering in Graph Metrics.

\nAbstract:

\nClustering is a fu
ndamental task in machine learning. As the increasing demand for running m
achine learning algorithms in the huge data sets\, classic clustering algo
rithms were found not to scale well. To this end\, coreset is introduced a
s a powerful data reduction technique that turns a huge dataset into a tin
y proxy. Coresets have been also successfully applied to streaming and dis
tributed computing. Coresets for clustering in Euclidean spaces have been
very well studied. However\, very few results were known about the non-Eu
clidean space. Beyond Euclidean\, graph metrics is a very important family
of metric space. In this talk\, I will cover my recent work on coreset fo
r k-clustering in graph metrics\, including bounded treewidth graph and ex
cluded-minor graph.

Speaker: Enay
at Ullah

\nAffiliation: Johns Hopkins University

Title: Machi ne unlearning via algorithmic stability

\nAbstract: We study the pro blem of machine unlearning\, and identify a notion of algorithmic stabilit y\, Total Variation (TV) stability\, which we argue\, is suitable for the goal of exact efficient unlearning. For convex risk minimization problems\ , we design TV-stable algorithms based on noisy Stochastic Gradient Descen t (SGD). Our key contribution is the design of corresponding efficient unl earning algorithms\, which are based on constructing a (maximal) coupling of Markov chains for the noisy SGD procedure. To understand the trade-offs between accuracy and unlearning efficiency\, we give upper and lower boun ds on excess empirical and population risk of TV stable algorithms for con vex risk minimization. Our techniques generalize to arbitrary non-convex f unctions\, and our algorithms are differentially private as well.

\nSpeaker: Thom
as Lavastida

\nAffiliation: Carnegie Mellon University

Title: Combinatorial Optimization Augmented with Machine Learning

\nAbstra ct:

\nCombinatorial optimization often focuses on optimizing for the worst-case. However\, the best algorithm to use depends on the “relevant inputs”\, which is application specific and often does not have a formal d efinition.

\nThe talk gives a new theoretical model for designing algorithms that are tailored to inputs for the application at hand. In the model\, learning is performed on past problem instances to make predictio ns on future instances. These predictions are incorporated into the design and analysis of the algorithm. The predictions can be used to achieve “i nstance-optimal” algorithm design when the predictions are accurate and th e algorithm’s performance gracefully degrades when there is error in the p rediction.

\nThe talk will apply this framework to online algorithm design and give algorithms with theoretical performance that goes beyond w orst-case analysis. The majority of the talk will focus on load balancing with restricted assignments.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-349@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Hung Le\nAffiliation: University of Massachusetts\, Am herst\nTitle: Reliable Spanners: Locality-Sensitive Orderings Strike Back \nAbstract:\nA highly desirable property of networks is robustness to fail ures.\nConsider a metric space $(X\,d_X)$\, a graph $H$ over $X$ is a $\\v artheta$-reliable $t$-spanner if\, for every set of failed vertices $B\\su bset X$\, there is a superset $B^+\\supseteq B$ such that the induced subg raph $H[X\\setminus B]$ preserves all the distances between points in $X\\ setminus B^+$ up to a stretch factor $t$\, while the expected size of $B^+ $ is as most $(1+\\vartheta)|B|$. Such a spanner could withstand a catastr ophe: failure of even $90\\%$ of the network.\nBuchin\, Har-Peled\, and Ol ah showed how to construct a sparse reliable spanner for Euclidean space f rom Euclidean locality-sensitive orderings\, an object introduced by Chan\ , Har-Peled\, and Jones. In this talk\, we extend their approach to non-Eu clidean metric spaces by generalizing the ordering of Chan\, Har-Peled\, a nd Jones to doubling metrics and introducing new types of locality-sensiti ve orderings for other metric spaces. We also show how to construct reliab le spanners from the newly introduced locality-sensitive orderings via rel iable 2-hop spanners for paths. The highlight of our results is that the n umber of edges in our spanner has no dependency on the spread. DTSTART;TZID=America/New_York:20210303T120000 DTEND;TZID=America/New_York:20210303T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Hung Le URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-hung-le/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Hung
Le

\nAffiliation: University of Massachusetts\, Amherst

Titl e: Reliable Spanners: Locality-Sensitive Orderings Strike Back

\nAb
stract:

\nA highly desirable property of networks is robustness to fa
ilures.

\nConsider a metric space $(X\,d_X)$\, a graph $H$ over $X$ i
s a $\\vartheta$-reliable $t$-spanner if\, for every set of failed vertice
s $B\\subset X$\, there is a superset $B^+\\supseteq B$ such that the indu
ced subgraph $H[X\\setminus B]$ preserves all the distances between points
in $X\\setminus B^+$ up to a stretch factor $t$\, while the expected size
of $B^+$ is as most $(1+\\vartheta)|B|$. Such a spanner could withstand a
catastrophe: failure of even $90\\%$ of the network.

Buchin\, Har -Peled\, and Olah showed how to construct a sparse reliable spanner for Eu clidean space from Euclidean locality-sensitive orderings\, an object intr oduced by Chan\, Har-Peled\, and Jones. In this talk\, we extend their app roach to non-Euclidean metric spaces by generalizing the ordering of Chan\ , Har-Peled\, and Jones to doubling metrics and introducing new types of l ocality-sensitive orderings for other metric spaces. We also show how to c onstruct reliable spanners from the newly introduced locality-sensitive or derings via reliable 2-hop spanners for paths. The highlight of our result s is that the number of edges in our spanner has no dependency on the spre ad.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-356@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Teodor Marinov\nAffiliation: Johns Hopkins University \nTitle: Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bo unds for Episodic Reinforcement Learning\nAbstract:\nReinforcement Learnin g (RL) is a general scenario where agents interact with the environment to achieve some goal. The environment and an agent’s interactions are typica lly modeled as a Markov decision process (MDP)\, which can represent a ric h variety of tasks. But\, for which MDPs can an agent or an RL algorithm s ucceed? This requires a theoretical analysis of the complexity of an MDP. In this talk I will present information-theoretic lower bounds for a large class of MDPs. The lower bounds are based on studying a certain semi-infi nite LP. Further\, I will show that existing algorithms enjoy tighter gap- dependent regret bounds (similar to the stochastic multi-armed bandit prob lem)\, however\, they are unable to achieve the information-theoretic lowe r bounds even in deterministic transition MDPs\, unless there is a unique optimal policy. DTSTART;TZID=America/New_York:20210310T120000 DTEND;TZID=America/New_York:20210310T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Teodor Marinov URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-teodor-mari nov/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Teod
or Marinov

\nAffiliation: Johns Hopkins University

Title: Bey ond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Epi sodic Reinforcement Learning

\nAbstract:

\nReinforcement Learni
ng (RL) is a general scenario where agents interact with the environment t
o achieve some goal. The environment and an agent’s interactions are typic
ally modeled as a Markov decision process (MDP)\, which can represent a ri
ch variety of tasks. But\, for which MDPs can an agent or an RL algorithm
succeed? This requires a theoretical analysis of the complexity of an MDP.
In this talk I will present information-theoretic lower bounds for a larg
e class of MDPs. The lower bounds are based on studying a certain semi-inf
inite LP. Further\, I will show that existing algorithms enjoy tighter gap
-dependent regret bounds (similar to the stochastic multi-armed bandit pro
blem)\, however\, they are unable to achieve the information-theoretic low
er bounds even in deterministic transition MDPs\, unless there is a unique
optimal policy.

Speaker: Domi
nik Kempa

\nAffiliation: Johns Hopkins University

Title: How to store massive sequence collections in a searchable form

\nAbstrac
t:

\nCompressed indexing is concerned with the design and constructio
n of data structures to store massive sequences in space close to the size
of compressed data\, while simultaneously providing searching functionali
ty (such as pattern matching) on the original uncompressed data. These ind
exes have a huge impact on the analysis of large-scale DNA databases (cont
aining hundreds of thousands of genomes) but until recently the size of ma
ny indexes lacked theoretical guarantees and their construction remains a
computational bottleneck.

In this talk I will describe my work pro ving theoretical guarantees on index size as a function of compressed data size\, resolving a key open question in this field. Related to that\, I w ill also describe new algorithms for converting between two conceptually d istinct compressed representations\, Lempel-Ziv and the Burrows-Wheeler Tr ansform. I will show how these new findings enable advanced computation di rectly on compressed data. I will conclude by describing avenues for futur e work\, including the new algorithms for the construction of compressed i ndexes that have the potential to cut indexing time by further orders of m agnitude\, unlocking the ability to search truly enormous text or DNA data sets.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-353@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Audra McMillan\nAffiliation: Apple\nTitle: Hiding amon g the clones: a simple and nearly optimal analysis of privacy amplificatio n by shuffling\nAbstract:\nDifferential privacy (DP) is a model of privacy -preserving machine learning that has garnered significant interest in rec ent years due to its rigorous privacy guarantees. An algorithm differentia lly private if the output is stable under small changes in the input datab ase. While DP has been adopted in a variety of applications\, most applica tions of DP in industry actually satisfy a stronger notion called local di fferential privacy. In local differential privacy data subjects perturb th eir data before it reaches the data analyst. While this requires less trus t\, it comes a substantial cost to accuracy. Recent work of Erlingsson\, F eldman\, Mironov\, Raghunathan\, Talwar\, and Thakurta [EFMRTT19] demonstr ated that random shuffling amplifies differential privacy guarantees of lo cally randomized data. Such amplification implies substantially stronger p rivacy guarantees for systems in which data is contributed anonymously [BE MMRLRKTS17] and has led to significant interest in the shuffle model of pr ivacy [CSUZZ19\, EFMRTT19]. In this talk\, we will discuss a new result on privacy amplification by shuffling\, which achieves the asymptotically op timal dependence in the local privacy parameter. Our result is based on a new proof strategy which is simpler than previous approaches\, and extends to a lightly weaker notion known as approximate differential privacy with nearly the same guarantees. Based on joint work with Vitaly Feldman and K unal Talwar (https://arxiv.org/abs/2012.12803) DTSTART;TZID=America/New_York:20210324T120000 DTEND;TZID=America/New_York:20210324T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Audra McMillan URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-audra-mcmil lan/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Audr
a McMillan

\nAffiliation: Apple

Title: Hiding among the clone s: a simple and nearly optimal analysis of privacy amplification by shuffl ing

\nAbstract:

\nDifferential privacy (DP) is a model of priva
cy-preserving machine learning that has garnered significant interest in r
ecent years due to its rigorous privacy guarantees. An algorithm different
ially private if the output is stable under small changes in the input dat
abase. While DP has been adopted in a variety of applications\, most appli
cations of DP in industry actually satisfy a stronger notion called local
differential privacy. In local differential privacy data subjects perturb
their data before it reaches the data analyst. While this requires less tr
ust\, it comes a substantial cost to accuracy. Recent work of Erlingsson\,
Feldman\, Mironov\, Raghunathan\, Talwar\, and Thakurta [EFMRTT19] demons
trated that random shuffling amplifies differential privacy guarantees of
locally randomized data. Such amplification implies substantially stronger
privacy guarantees for systems in which data is contributed anonymously [
BEMMRLRKTS17] and has led to significant interest in the shuffle model of
privacy [CSUZZ19\, EFMRTT19]. In this talk\, we will discuss a new result
on privacy amplification by shuffling\, which achieves the asymptotically
optimal dependence in the local privacy parameter. Our result is based on
a new proof strategy which is simpler than previous approaches\, and exten
ds to a lightly weaker notion known as approximate differential privacy wi
th nearly the same guarantees. Based on joint work with Vitaly Feldman and
Kunal Talwar (https://arxiv.org/abs/2012.12803)

Speaker: Mary
am Negahbani

\nAffiliation: Dartmouth University

Title: “Revi siting Priority k-Center: Fairness and Outliers

\nAbstract:

\nC
lustering is a fundamental unsupervised learning and facility location pro
blem extensively studied in the literature. I will talk about a clustering
problem called “priority k-center” introduced by Plesnik (in Disc. Appl.
Math. 1987). Given a metric space on n points X\, with distance function
d\, an integer k\, and radius r_v for each point v in X\, the goal is to c
hoose k points S as “centers” to minimize the maximum distance of a point
v to S divided by r_v. For uniform r_v’s this is precisely the “k-center”
problem where the objective is to minimize the maximum distance of any poi
nt to S. In the priority version\, points with smaller r_v are prioritized
to be closer to S. Recently\, a special case of this problem was studied
in the context of “individually fair clustering” by Jung et al.\, FORC 20
20. This notion of fairness forces S to open a center in every “densely po
pulated area” by setting r_v to be v’s distance to its closest (n/k)-th ne
ighbor.

In this talk\, I show how to approximate priority k-center with outliers: When for a given integer z\, you are allowed to throw away z points as outliers and minimize the objective over the rest of the poin ts. We show there is 9-approximation\, which is morally a 5\, if you have constant many types of radii or if your radii are powers of 2. This is via an LP-aware reduction to min-cost max-flow and is general enough that cou ld handle Matroid constraints on facilities (where instead of asking to pi ck at most k facilities\, you are asked to pick facilities that are indepe ndent in a given matroid). Things become quite interesting for priority kn apsack-center with outliers: where opening each center costs something and you have a limited budget to spend on your solution S. In this case\, we do not know how to solve the corresponding flow problem\, so we alter our reduction to reduce to a simpler problem we do know how to solve taking a hit of +5 in the approximation ratio. There are still many open problems i n this work\, in addition to solving the flow problem in the knapsack case \, the best LP integrality gap we know for priority k-center with outliers is 3.

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-346@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Leonidas Tsepenekas\nAffiliation: University of Maryla nd\nTitle: Approximating Two-Stage Stochastic Supplier Problems\nAbstract: \nThe main focus of this talk will be radius-based (supplier) clustering i n the two-stage stochastic setting with recourse\, where the inherent stoc hasticity of the model comes in the form of a budget constraint. Our event ual goal is to provide results in the most general distributional setting\ , where there is only black-box access to the underlying distribution. To that end\, we follow a two-step approach. First\, we develop algorithms fo r a restricted version of the problem\, in which all possible scenarios ar e explicitly provided\; second\, we employ a novel scenario-discarding var iant of the standard Sample Average Approximation (SAA) method\, in which we also crucially exploit structural properties of the algorithms develope d for the first step of the framework. In this way\, we manage to generali ze the results of the latter to the black-box model. Finally\, we note tha t the scenario-discarding modification to the SAA method is necessary in o rder to optimize over the radius.\nPaper: https://arxiv.org/abs/2008.03325 DTSTART;TZID=America/New_York:20210407T120000 DTEND;TZID=America/New_York:20210407T130000 LOCATION:https://wse.zoom.us/j/91450299380 SEQUENCE:0 SUMMARY:[Theory Seminar] Leonidas Tsepenekas URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-leonidas-ts epenekas/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Leon
idas Tsepenekas

\nAffiliation: University of Maryland

Title: Approximating Two-Stage Stochastic Supplier Problems

\nAbstract:

\nThe main focus of this talk will be radius-based (supplier) clustering
in the two-stage stochastic setting with recourse\, where the inherent st
ochasticity of the model comes in the form of a budget constraint. Our eve
ntual goal is to provide results in the most general distributional settin
g\, where there is only black-box access to the underlying distribution. T
o that end\, we follow a two-step approach. First\, we develop algorithms
for a restricted version of the problem\, in which all possible scenarios
are explicitly provided\; second\, we employ a novel scenario-discarding v
ariant of the standard Sample Average Approximation (SAA) method\, in whic
h we also crucially exploit structural properties of the algorithms develo
ped for the first step of the framework. In this way\, we manage to genera
lize the results of the latter to the black-box model. Finally\, we note t
hat the scenario-discarding modification to the SAA method is necessary in
order to optimize over the radius.

Paper: https://arxiv.org/abs/2 008.03325

\n END:VEVENT BEGIN:VEVENT UID:ai1ec-358@www.cs.jhu.edu/~mdinitz/theory DTSTAMP:20211018T045941Z CATEGORIES: CONTACT: DESCRIPTION:Speaker: Samson Zhou\nAffiliation: Carnegie Mellon University\n Title: Tight Bounds for Adversarially Robust Streams and Sliding Windows v ia Difference Estimators\nAbstract:\nWe introduce difference estimators fo r data stream computation\, which provide approximations to F(v)-F(u) for frequency vectors v\,u and a given function F. We show how to use such est imators to carefully trade error for memory in an iterative manner. The fu nction F is generally non-linear\, and we give the first difference estima tors for the frequency moments F_p for p between 0 and 2\, as well as for integers p>2. Using these\, we resolve a number of central open questions in adversarial robust streaming and sliding window models.\nFor both model s\, we obtain algorithms for norm estimation whose dependence on epsilon i s 1/epsilon^2\, which shows\, up to logarithmic factors\, that there is no overhead over the standard insertion-only data stream model for these pro blems. DTSTART;TZID=America/New_York:20210414T120000 DTEND;TZID=America/New_York:20210414T130000 SEQUENCE:0 SUMMARY:[Theory Seminar] Samson Zhou URL:https://www.cs.jhu.edu/~mdinitz/theory/event/theory-seminar-samson-zhou -3/ X-COST-TYPE:free X-ALT-DESC;FMTTYPE=text/html:\\n\\n\\nSpeaker: Sams
on Zhou

\nAffiliation: Carnegie Mellon University

Title: Tigh t Bounds for Adversarially Robust Streams and Sliding Windows via Differen ce Estimators

\nAbstract:

\nWe introduce *difference estimat
ors* for data stream computation\, which provide approximations to F(v
)-F(u) for frequency vectors v\,u and a given function F. We show how to u
se such estimators to carefully trade error for memory in an iterative man
ner. The function F is generally non-linear\, and we give the first differ
ence estimators for the frequency moments F_p for p between 0 and 2\, as w
ell as for integers p>2. Using these\, we resolve a number of central open
questions in adversarial robust streaming and sliding window models.

For both models\, we obtain algorithms for norm estimation whose depe ndence on epsilon is 1/epsilon^2\, which shows\, up to logarithmic factors \, that there is no overhead over the standard insertion-only data stream model for these problems.

\n END:VEVENT END:VCALENDAR