Sign reading "Cold Spring Harbor Laboratory. Founded 1890. CSH. Research and education in molecular biology, genetics, cancer, and neuroscience."
Image Credit: Cold Spring Harbor Laboratory

Students from the Department of Computer Science have been selected to present their genome informatics research at the 2025 Cold Spring Harbor Laboratory Conference on Genome Informatics, to be held November 5–8 at Cold Spring Harbor, New York.

Renowned for its collaborative environment and cutting-edge discussions, the conference is a distinguished gathering in the field of genomics that brings together leading minds from academia and industry to explore the latest advancements in genome informatics. Johns Hopkins’ Professor Ben Langmead is a co-organizer of this year’s event.

PhD student Nicole Brown—who is advised by Bloomberg Distinguished Professor of Computational Biology and Oncology Michael Schatz and is also a data scientist at the Johns Hopkins Applied Physics Laboratory—will present “Identifying Introgressions Across Pangenomes with Panagram.” The talk covers an alignment-free pangenome analysis tool developed to capture and interpret genomic diversity at scale. Specifically, Panagram accurately and rapidly visualizes introgressions—the incorporation of a DNA segment from one species into another, often used to improve agricultural traits—thus providing a scalable framework for tracking trait transfer in domestication and breeding.

Yuchen “Peter” Ge, a PhD student of biomedical engineering advised by Bloomberg Distinguished Professor of Computational Biology and Genomics Steven Salzberg, and Edward Li, a second-year undergraduate student of computer science, will give a talk on “Improving Metagenomics Classification with Kmask: Entropy-Based Masking of Low-Complexity Regions,” in which they present an entropy-based masking tool that improves the accuracy of metagenomic analyses when very large microbial databases are used as a reference.

CS PhD student Wui Wang “Edward” Lui, who is advised by Liliana Florea, an associate professor of genetic medicine and computer science, will present “SpliSync: Genomic-Language-Model-Driven Splice Site Correction of Long RNA Sequencing Reads.” The work, funded by the National Institutes of Health, introduces a splice site corrector based on a genomic language model that, when integrated with a short-read assembler, outperforms the reference tools IsoQuant and FLAIR. SpliSync precisely corrects splice sites in long-read alignments, improving intron and transcript reconstruction and enhancing novel-variant discovery, making it a promising tool for long-read transcriptomics.

Matthew Nguyen, another PhD student in the Schatz Lab, will give a talk on “Refining Kraken2 Long-Read Taxonomic Classifications Using Convolutional Neural Networks,” which presents a framework that predicts the probability that a Kraken2 taxonomic classification is a true positive and helps reduce Kraken2's false-positive rate by over 70%. The method is computationally lightweight and can be integrated as a post-processing step in existing Kraken2 workflows to yield interpretable scores that aid in taxonomic classification.

Finally, PhD student Vikram Shivakumar, who is advised by Langmead, will present “Mumemto—Scalable Multi-MUM Finding for Pangenomes.” This talk introduces a technique for rapidly finding similarities among the genomes in large collections called pangenomes; the technique scales to hundreds of human genomes. Using it, the researchers were able to visualize large pangenomes, identify genetic variation, and find conserved genomic sequences across the tree of life.

Computer Science PhD students Mao-Jan Lin and Alex Sweeten and postdoctoral fellow Sina Majidian will also be presenting posters at the conference.