Feeling inspired to start your own project? Learn more about research opportunities for CS undergraduate students here.
Research Projects
Faculty Research Advisor: Randal Burns
Abstract: Modern AI models like GPT and LLaMA rely heavily on matrix multiplication, a mathematically simple but computationally expensive operation that dominates large language model inference cost. A key insight is that during a forward pass, large portions of the matrices being multiplied are effectively zero, a phenomenon called activation sparsity. The problem is that this sparsity only appears at runtime and changes with every input, making it invisible to traditional sparse computing libraries that require structure to be fixed in advance. Efficient Sparse Matrix Multiplication (ESMM) is a family of CUDA kernels that detects and exploits this sparsity on the fly, skipping computation over zero-contributing regions within a single matrix multiplication with no model modifications, retraining, or format conversion required. On an NVIDIA A10G GPU, ESMM achieves up to 2.65× speedup over NVIDIA’s own cuBLAS library at high sparsity levels. A secondary finding from evaluating real LLaMA model weights reveals a structural gap: standard pruning methods produce sparsity patterns that are incompatible with block-structured skipping, identifying block-aware pruning as a critical open problem for the field.
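The idea of skipping zero-contributing regions at runtime can be illustrated with a small CPU sketch. This is not the ESMM CUDA kernel family described above; it is a minimal pure-Python illustration that assumes a fixed square tile size and a simple all-zero test per tile. The names `block_is_zero` and `blockskip_matmul`, and the tile size `bs`, are hypothetical and chosen only for this example.

```python
def block_is_zero(a, row0, col0, bs):
    """Return True if the bs x bs tile of `a` starting at (row0, col0) is all zeros."""
    for i in range(row0, min(row0 + bs, len(a))):
        for k in range(col0, min(col0 + bs, len(a[0]))):
            if a[i][k] != 0.0:
                return False
    return True


def blockskip_matmul(a, b, bs=2):
    """Compute a @ b, skipping tiles of `a` that are entirely zero.

    The sparsity check happens at call time on the dense input, so no
    sparse storage format has to be prepared in advance.
    Returns (product, number_of_skipped_tiles).
    """
    m, k_dim, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    skipped = 0
    for i0 in range(0, m, bs):
        for k0 in range(0, k_dim, bs):
            if block_is_zero(a, i0, k0, bs):
                skipped += 1  # this tile contributes nothing to the product
                continue
            for i in range(i0, min(i0 + bs, m)):
                for k in range(k0, min(k0 + bs, k_dim)):
                    aik = a[i][k]
                    for j in range(n):
                        c[i][j] += aik * b[k][j]
    return c, skipped


def dense_matmul(a, b):
    """Reference dense product used to check the block-skipping result."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical activations in which two of the four 2x2 tiles are all zero.
activations = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0],
]
weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0]]
result, skipped = blockskip_matmul(activations, weights, bs=2)
```

The sketch also shows why the pattern of zeros matters: a tile can be skipped only when every entry in it is zero, so zeros scattered across otherwise nonzero tiles yield no savings.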
About Avery: Avery Clapp double-majored in computer science and economics at Johns Hopkins, where he conducted research as a Pistritto Fellow under Bill and Lisa Stromberg Department Head Randal Burns. What draws him to computer science and programming is the mental challenge; he is most interested in the kinds of problems where software has to be deeply reasoned about, which is what pulls him towards AI infrastructure and quantitative finance. Avery has worked in trading development and will be joining Bloomberg after graduation. He also just wrapped up his swimming career at Hopkins, outside of which he’s spent time reading and writing about philosophy, listening to and producing music, and playing chess.
Faculty Research Advisor: Alexis Battle
Abstract: High-throughput proteomics data provide dense individual-level molecular readouts, enabling the development of machine learning models for predicting diverse phenotypes relevant to patient health. Proteins interact in the cell through complex, nonlinear relationships that linear models or simple machine learning approaches may not capture, highlighting the potential for more expressive deep neural networks to improve performance. Despite this possibility, developing neural network approaches in biological domains has been a significant challenge in practice. We developed a deep learning framework for predicting disease-related traits from protein expression data using an innovative model architecture designed to exploit structured biological knowledge. The core of the model is a graph neural network (GNN) operating on bipartite graphs where one set of nodes represents protein expression levels, and the other represents hundreds of protein sets derived from gene ontology libraries. Edges encode set membership, providing a compact and biologically meaningful structure. We trained our model using the UK Biobank plasma proteomics and individual phenotype data. Of the architectures we examined, the best-performing one had three parallel heads: two GNNs, each using graphs constructed with independent protein set libraries, and one global head operating on tabular protein expression data. Their outputs are concatenated and passed through a dense feed-forward network to predict phenotype. When applied to predicting glycated hemoglobin levels and a range of other phenotypes, our model showed strong predictive performance, outperforming other deep learning architectures and simpler linear models. Control models with permuted protein labels displayed worse performance, demonstrating that the model benefits from the inductive bias of incorporating prior knowledge, especially in settings with limited training data.
We present an innovative model architecture incorporating biological domain knowledge to predict complex traits from large-scale proteomic data.
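The bipartite structure described in the abstract, with protein nodes on one side, protein-set nodes on the other, and edges for set membership, can be sketched in a few lines. This is only an illustration under simplifying assumptions: it uses a single unweighted mean-aggregation step in place of a trained GNN, and the protein labels (`P1` to `P3`), set names, and function names are hypothetical placeholders rather than entries from any real ontology library.

```python
def build_bipartite_edges(protein_sets):
    """Turn {set_name: [protein, ...]} into a list of (protein, set_name) edges."""
    return [(p, s) for s, members in protein_sets.items() for p in members]


def aggregate_to_sets(expression, edges):
    """One protein -> set message-passing step using mean aggregation.

    A trained GNN would apply learned weights and nonlinearities here;
    this sketch only averages the expression of each set's members.
    """
    totals, counts = {}, {}
    for protein, set_name in edges:
        if protein not in expression:
            continue  # ignore set members that were not measured
        totals[set_name] = totals.get(set_name, 0.0) + expression[protein]
        counts[set_name] = counts.get(set_name, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}


# Hypothetical expression levels for one individual and two protein sets.
expression = {"P1": 2.0, "P2": 4.0, "P3": 6.0}
protein_sets = {"set_A": ["P1", "P2"], "set_B": ["P2", "P3"]}

edges = build_bipartite_edges(protein_sets)
set_features = aggregate_to_sets(expression, edges)

# Concatenate a graph-derived head with a "global" head of raw expression values,
# mirroring the idea of combining parallel heads before a feed-forward network.
graph_head = [set_features[s] for s in sorted(set_features)]
global_head = [expression[p] for p in sorted(expression)]
combined = graph_head + global_head
```

Permuting the protein labels in `expression` would change `set_features` while leaving `global_head` values unchanged up to order, which is the intuition behind the permuted-label control mentioned in the abstract.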
About Prabuddha: Prabuddha Ghosh Dastidar recently completed his undergraduate degrees in computer science and applied mathematics at the Johns Hopkins University. Eager to combine his interest in biology and genetics with his training in computer science and mathematics, he reached out to Alexis Battle to begin a research position in her lab, which ultimately defined his undergraduate experience. At the Battle Lab, he developed a series of machine learning models that incorporate biological knowledge to learn increasingly rich patterns in the molecular underpinnings of disease risk. Prabuddha’s research was recognized by the Department of Computer Science, which awarded him a Pistritto Research Fellowship. By integrating structured biological knowledge with machine learning models, he hopes to develop generalizable tools that reveal new principles of biological networks, bridging the gap between accuracy-driven modeling and mechanistic insight. He will continue his training at Stanford University, where he is pursuing a PhD in computer science.
Faculty Research Advisor: Russell Taylor
Abstract: This research develops a speech-guided computer vision system for video-guided skull base surgery. The goal is to let surgeons ask for visual assistance using natural language—such as identifying a surgical tool, tracking it in the video, registering anatomical structures, or showing guidance overlays—without needing to stop the procedure or interact with extra hardware. Unlike many traditional surgical navigation systems that require optical trackers or additional setup, this system works directly from the surgical video. The system first uses interactive segmentation to identify and label the surgical instrument in the video. Once the instrument is segmented, it is tracked automatically and used as a spatial reference for several downstream tasks, including anatomy segmentation, registration of preoperative 3D anatomical models, estimation of surgical tool pose from monocular video, and real-time anatomical overlays. In skull base surgery experiments, we compared the vision-based tracking method with a commercial tracking system. Across three trials, the method achieved a mean tool-tip position error of 2.32 mm, with small frame-to-frame orientation differences in yaw and pitch. The system also completed tool segmentation and anatomy registration in about two minutes, showing that speech-guided video-based systems can provide useful surgical guidance while reducing setup complexity. This project is funded by the Pistritto Fellowship; its preliminary version was accepted to a workshop at the 2025 International Conference on Medical Image Computing and Computer Assisted Intervention, and the extended version is currently under review by IEEE Transactions on Medical Robotics and Bionics. In addition to video-based surgical guidance, this research also explores speech-guided robotic path planning for endoscopic sinus and skull base navigation. 
It focuses on enabling a robotic endoscope to interpret surgeon commands—such as moving toward an anatomical landmark, zooming in or out, or adjusting the viewing direction—and then convert those commands into safe, anatomy-aware motion plans. This method uses segmented anatomical structures and signed distance fields (SDFs) to represent the nasal workspace, allowing the planner to generate collision-free trajectories while maintaining awareness of nearby critical anatomy. A graph-based planning framework is used to search feasible paths through the constrained nasal cavity, and viewpoint-aware orientation proposals are generated so that the endoscope can preserve useful visualization of surgical landmarks along the path. The sinus path planning system further incorporates local 3D scene memory and viewpoint preference modeling to improve navigation quality beyond simply reaching a target point. During navigation, depth observations are accumulated into local anatomical memory, and geometric descriptors such as surface distance and SDF gradient directions are used to evaluate whether a planned view provides useful surgical context. The goal is to model not only where the endoscope should move, but also what it should look at during the motion. In simulation experiments, the system demonstrated the ability to generate collision-free, executable endoscope trajectories in the nasal workspace, with average command-to-motion runtime on the order of seconds and sub-millimeter-level final pose error between planned and executed tip positions. Together, this work supports a broader vision of intelligent surgical assistance in which natural language commands, anatomical understanding, and robotic motion planning are combined to provide hands-free, context-aware guidance during endoscopic skull base procedures.
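The combination of a distance field and graph search described above can be sketched on a toy 2D grid. This is not the project's planner: it assumes a small occupancy grid in place of segmented 3D anatomy, a brute-force clearance map as a stand-in for a signed distance field, and breadth-first search as the graph search. The function names and the `margin` parameter are hypothetical.

```python
from collections import deque
import math


def clearance_field(grid):
    """Distance from each free cell to the nearest obstacle cell (0 for obstacles)."""
    obstacles = [(r, c) for r, row in enumerate(grid)
                 for c, v in enumerate(row) if v == 1]
    field = []
    for r, row in enumerate(grid):
        field.append([
            0.0 if v == 1 else min(
                (math.hypot(r - o_r, c - o_c) for o_r, o_c in obstacles),
                default=math.inf,
            )
            for c, v in enumerate(row)
        ])
    return field


def plan_path(grid, start, goal, margin=1.0):
    """Breadth-first search over cells whose clearance is at least `margin`.

    Returns a list of (row, col) cells from start to goal, or None if no
    path keeps the required distance from obstacles.
    """
    field = clearance_field(grid)
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in parent
                    and field[nr][nc] >= margin):
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# Hypothetical workspace: 1 marks an obstacle cell, 0 marks free space.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
path = plan_path(grid, (0, 0), (2, 3), margin=1.0)
blocked = plan_path(grid, (0, 0), (2, 3), margin=1.5)
```

Raising `margin` trades reachability for safety: with a clearance requirement of 1.5 cells, the corridor around the obstacle is too narrow and the search returns no path.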
About Jecia: Jecia Z.Y. Mao is an undergraduate student at the Johns Hopkins University majoring in computer science and minoring in robotics, computer integrated surgery, and applied mathematics and statistics. Her research interests include computer vision, robotics, surgical AI, vision-language interaction, and image-guided surgical systems. Jecia has worked on speech-guided surgical scene segmentation, anatomy-aware surgical navigation, robotic endoscope control, 3D tool pose tracking, and computer vision for microsurgical robotics, and her research has been recognized through the Pistritto Research Fellowship. She is especially interested in building intelligent robotic systems that combine perception and language to assist clinicians in complex surgical environments.
Faculty Research Advisors: Mathias Unberath, Lalithkumar Seenivasan
Abstract: A surgical world model capable of generating realistic surgical action videos with precise control over tool-tissue interactions can address fundamental challenges in surgical AI and simulation, from data scarcity and rare event synthesis to bridging the sim-to-real gap for surgical automation. However, current video generation methods, the very core of such surgical world models, require expensive annotations or complex structured intermediates as conditioning signals at inference, limiting their scalability. Other approaches exhibit limited temporal consistency across complex laparoscopic scenes and do not possess sufficient realism. We propose Surgical Action World (SAW), a step toward surgical action world modeling through video diffusion conditioned on four lightweight signals: language prompts encoding tool-action context, a reference surgical scene, tissue affordance masks, and 2D tool-tip trajectories. We design a conditional video diffusion approach that reformulates video-to-video diffusion into trajectory-conditioned surgical action synthesis. The backbone diffusion model is fine-tuned on a custom-curated dataset of 12,044 laparoscopic clips with lightweight spatiotemporal conditioning signals, leveraging a depth consistency loss to enforce geometric plausibility without requiring depth at inference. SAW achieves state-of-the-art temporal consistency (CD-FVD: 199.19 vs. 546.82) and strong visual quality on held-out test data. Furthermore, we demonstrate its downstream utility for (a) surgical AI, where augmenting rare actions with SAW-generated videos improves action recognition (clipping F1-score: 20.93% to 43.14%; cutting: 0.00% to 8.33%) on real test data, and (b) surgical simulation, where rendering tool-tissue interaction videos from simulator-derived trajectories points toward a visually faithful simulation engine.
About Sampath: Sampath Rampuri is a recent graduate from the Johns Hopkins University with dual bachelor’s degrees in biomedical engineering and computer science, as well as a concurrent master’s degree in biomedical engineering. He is interested in the application of AI in healthcare, with a particular focus on surgical robotics and computer vision, as well as AI-related health policy and medical technology translation. Sampath’s research spans surgical world modeling, wearable devices, and critical care risk stratification, with work featured in Brain, the British Journal of Anaesthesia, JMIR Medical Informatics, and other venues. His research has been supported through a Summer Student Fellowship from the Institute for Data-Intensive Engineering and Science, the Pistritto Research Fellowship, the Life Design Lab, Summer Provost’s Undergraduate Research Awards, and industry partnerships.
Faculty Research Advisors: Swaroop Vedula, Vishal Patel, Shameema Sikder
Abstract: This research explores a data-driven approach for assessing surgical skill in operating room videos of cataract surgery, focusing on two critical steps: the main incision and capsulorhexis. Surgical skill assessment and feedback have traditionally been subjective and inconsistent, heavily dependent on human raters. AI, by contrast, can enable objective and consistent assessment and feedback, offering a promising solution to these limitations. To harness this potential, we developed an AI-driven assessment pipeline using a Video Masked Autoencoder (VideoMAE) model to extract temporally and spatially rich features from surgical videos. The model was pre-trained on full-length cataract surgery videos and tested on curated datasets with ground truth skill ratings provided by an expert ophthalmologist using the International Council of Ophthalmology’s Ophthalmology Surgical Competency Assessment Rubric criteria. Pre-training provided superior spatial-temporal representations for skill assessment, outperforming convolutional neural network-based architectures like CNN-Long Short-Term Memory and CNN-Vision Transformer. Specifically, pre-training significantly enhanced the specificity and sensitivity of the assessment for both the main incision and capsulorhexis, surpassing the performance of previous methods. We compared the performance of the VideoMAE-based transformer model against baseline architectures using multiple random splits and standard evaluation metrics such as accuracy, area under the curve, sensitivity, and specificity. For the main incision, the model achieved an accuracy of 0.76, sensitivity of 0.73, and specificity of 0.80. For capsulorhexis, it achieved an accuracy of 0.71, sensitivity of 0.85, and specificity of 0.90. Importantly, while the AI models were validated using cross-validation, further validation on new datasets is needed to confirm their generalizability.
These findings highlight that VideoMAE enables an objective, automated evaluation of critical surgical steps in cataract surgery, improving assessment accuracy and consistency. Its ability to capture subtle temporal transitions and provide fine-grained spatial analysis makes it more effective than conventional frame-level CNN features. These results suggest a promising future for transformer-based video models in automating surgical training and delivering objective, actionable feedback to surgeons.
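For readers less familiar with the evaluation metrics named above, the sketch below shows how accuracy, sensitivity, and specificity are computed for a binary skill rating. The labels and predictions are hypothetical and are not the study's data; the function name `binary_metrics` and the convention that 1 denotes the positive class are assumptions made for this example.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, and specificity for 0/1 labels.

    Sensitivity is the fraction of true positives that are detected;
    specificity is the fraction of true negatives that are detected.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical ratings for 20 video clips: 10 positive and 10 negative.
y_true = [1] * 10 + [0] * 10
# Hypothetical model output: 8 of 10 positives and 9 of 10 negatives correct.
y_pred = [1] * 8 + [0] * 2 + [0] * 9 + [1] * 1

metrics = binary_metrics(y_true, y_pred)
```

With these made-up counts the function reports a sensitivity of 0.8, a specificity of 0.9, and an accuracy of 0.85.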
About Subhasri: Subhasri Vijay is a master’s student of computer science at the Johns Hopkins University, where she also earned her undergraduate degree in computer science with minors in computational medicine and entrepreneurship and management. She has a strong foundation in AI, full-stack software development, data analysis, and problem-solving, with a deep passion for using technology to drive real-world impact. As a research assistant at the Wilmer Eye Institute and the Malone Center for Engineering in Healthcare, Subhasri collaborates with ophthalmologists and engineers to develop AI solutions for surgical skill assessment that leverage deep learning and video analysis to enhance surgical training and patient outcomes. Her work has been presented at leading conferences including the International Symposium on Biomedical Imaging, the Annual Meeting of the Association of University Professors of Ophthalmology, and the Medical Imaging with Deep Learning conference, and has been recognized through internal honors such as the department’s Masson Fellowship. Subhasri has served as the president of the Software Engineering Club and has helped to build an inclusive and vibrant software engineering community here at Hopkins. She currently serves as head course assistant for Data Structures, mentoring her peers and supporting the course’s teaching staff. Whether advancing AI in health care, improving emergency response through her app RescueReady, or empowering the next generation of engineers, Subhasri brings a hands-on, mission-driven approach to innovation.