
Johns Hopkins researchers, including several affiliated with the Department of Computer Science, will present their research at the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), to be held June 11–15 in Nashville, Tennessee.
CVPR is the premier annual computer vision conference, showcasing the field's latest advances alongside several co-located workshops and short courses.
In addition to presenting papers and posters, Johns Hopkins will host a booth at the conference. Attendees are encouraged to stop by booth 1317 to learn more about Johns Hopkins’ transformational investment in the power and promise of data science and AI.
Johns Hopkins researchers will present the following papers:
- “Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency” by Feng Wang; visiting student Timing Yang; Yaodong Yu; Sucheng Ren; Guoyizhe Wei; Angtian Wang, Engr ’24 (PhD); Wei Shao; Yuyin Zhou, Engr ’20 (PhD); Alan Yuille; and Cihang Xie, Engr ’20 (PhD)
- “CamFreeDiff: Camera-Free Image to Panorama Generation With Diffusion Model” by Xiaoding Yuan, Shitao Tang, Kejie Li, Alan Yuille, and Peng Wang
- “Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning” by Bardia Safaei, Faizan Siddiqui, Jiacong Xu, Vishal M. Patel, and Shao-Yuan Lo
- “Flowing From Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution” by Qihao Liu, Xi Yin, Alan Yuille, Andrew Brown, and Mannat Singh
- “Lux Post Facto: Learning Portrait Performance Relighting With Conditional Video Diffusion and a Hybrid Dataset” by Yiqun Mei, Engr ’25 (PhD); Mingming He; Li Ma; Julien Philip; Wenqi Xian; David M. George; Xueming Yu; Gabriel Dedic; Ahmet Levent Taşel; Ning Yu; Vishal M. Patel; and Paul Debevec
- “MambaReg: Vision Mamba Also Needs Registers” by Feng Wang; Jiahao Wang; Sucheng Ren; Guoyizhe Wei; Jieru Mei, Engr ’24 (PhD); Wei Shao; Yuyin Zhou, Engr ’20 (PhD); Alan Yuille; and Cihang Xie, Engr ’20 (PhD)
- “MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval” by Reno Kriz; Kate Sanders; David Etter; Kenton Murray; Cameron Carpenter; Hannah Recknor; Jimena Guallar-Blasco, Engr ’24, ’24 (MSE); Alexander Martin; Eugene Yang; and Benjamin Van Durme
- “Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models” by Xingrui Wang, Wufei Ma, Tiezheng Zhang, Celso M. de Melo, Jieneng Chen, and Alan Yuille
- “SpatialLLM: A Compound 3D-Informed Design Toward Spatially-Intelligent Large Multimodal Models” by Wufei Ma, Luoxin Ye, Nessa McWeeney, Celso M. de Melo, Alan Yuille, and Jieneng Chen
- “Towards Zero-Shot Anomaly Detection and Reasoning With Multimodal Large Language Models” by Jiacong Xu, Shao-Yuan Lo, Bardia Safaei, Vishal M. Patel, and Isht Dwivedi
- “Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval” by Arun Reddy, Alexander Martin, Eugene Yang, Andrew Yates, Kate Sanders, Kenton Murray, Reno Kriz, Celso M. de Melo, Benjamin Van Durme, and Rama Chellappa
Also to be presented is the demo “SimWorld: A World Simulator for Scaling Photorealistic Multi-Agent Interactions” by Yan Zhuang, Jiawei Ren, Xiaokang Ye, Xuhong He, Zijun Gao, Ryan Wu, Mrinaal Dogra, Cassie Zhang, Ziqiao Ma, Tianmin Shu, Zhiting Hu, and Lianhui Qin.