We typically have seminars on Wednesday at noon in Malone 228. All seminar announcements will be sent to the theory mailing list.
Speaker: Samson Zhou
Affiliation: Purdue University
Title: Pattern Matching over Noisy Data Streams
Abstract: The identification of low-complexity structure in strings is a fundamental building block for many algorithms in computational biology or natural language processing. The general paradigm in these algorithms is to find highly repetitive structure, in the form of periodicity or palindromes, in a pre-processing stage and to use it to filter out locations where a certain pattern cannot occur, thus improving efficiency.
Unfortunately, we must expect massive data to contain a number of small imperfections, such as those caused by human error or mutations. This motivates the need to study structure in models of sublinear space, resilient to sources of noise. In this talk, we introduce several types of structure and noise, and discuss efficient algorithms to identify these structures over data streams.
As a warm-up, we provide an algorithm for identifying a longest common aligned substring of two inputs, resilient up to d errors of insertions, substitutions, or deletions. We then present a streaming algorithm for identifying the longest palindrome, resilient up to a threshold of d substitution errors. Finally, we discuss the problem of finding all periods of a string including up to d persistent changes or erasures. For each of these scenarios, we also provide complementary lower bounds.
Joint work with Funda Ergun, Elena Grigorescu, and Erfan Sadeqi Azer.
Samson is a PhD candidate in the Department of Computer Science at Purdue University, under the supervision of Greg Frederickson and Elena Grigorescu. He received his undergraduate education at MIT, where he obtained a Bachelor’s in math and computer science, as well as a Master’s in computer science. He is a member of the Theory Group at Purdue, and his current research interests are sublinear and approximation algorithms, with an emphasis on streaming algorithms.
Speaker: Venkata Gandikota
Affiliation: Johns Hopkins University
Title: NP-Hardness of Reed-Solomon Decoding and the Prouhet-Tarry-Escott Problem
Abstract: Establishing the complexity of Bounded Distance Decoding for Reed-Solomon codes is a fundamental open problem in coding theory, explicitly asked by Guruswami and Vardy (IEEE Trans. Inf. Theory, 2005). The problem is motivated by the large current gap between the regime when it is NP-hard, and the regime when it is efficiently solvable (i.e., the Johnson radius).
We show the first NP-hardness results for asymptotically smaller decoding radii than the maximum likelihood decoding radius of Guruswami and Vardy. Specifically, for Reed-Solomon codes of length N and dimension K = O(N), we show that it is NP-hard to decode more than N-K-O(log N / log log N) errors.
These results follow from the NP-hardness of a generalization of the classical Subset Sum problem to higher moments, called Moments Subset Sum, which has been a known open problem, and which may be of independent interest. We further reveal a strong connection with the well-studied Prouhet-Tarry-Escott problem in Number Theory, which turns out to capture a main barrier in extending our techniques. We believe the Prouhet-Tarry-Escott problem deserves further study in the theoretical computer science community.
This is joint work with Badih Ghazi (MIT) and Elena Grigorescu (Purdue).
Speaker: Amirbehshad Shahrasbi
Affiliation: Carnegie Mellon University