Johns Hopkins investigators create open-source database to more easily study cancer

Published: March 18, 2026

Category: Research

Image: Johns Hopkins astrophysicist Alex Szalay and pathologist Janis M. Taube working on the AstroPath platform, which uses astronomy algorithms to map tumor biomarkers and identify predictive biomarkers for cancer immunotherapy. Image credit: Sidney Kimmel Comprehensive Cancer Center

Researchers from the Sidney Kimmel Comprehensive Cancer Center and the Johns Hopkins University have created a novel database structure that allows investigators anywhere to more easily study multiple types of cancer data—including laboratory results, genetic sequencing, and imaging data—in one setting.

Called AstroID, the resource organizes clinical information and the correlating blood and tissue specimen information in six tiers: the patient (de-identified to protect privacy); the diagnosis; clinical events such as a treatment or a blood draw; specimens such as material from a biopsy or serology; the tissue blocks and vials that the lab processes those specimens into; and, finally, the individual slides or aliquots.
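For readers who think in code, the six-tier nesting described above can be sketched as a set of Python dataclasses. This is only an illustration of the hierarchy as the article describes it, not AstroID's actual schema: every class name, field name, and identifier below is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import List

# All class and field names are hypothetical; they mirror the tiers named
# in the article, not the real AstroID data dictionary.

@dataclass
class SlideOrAliquot:            # tier 6: an individual slide or aliquot
    item_id: str

@dataclass
class BlockOrVial:               # tier 5: a tissue block or vial
    block_id: str
    items: List[SlideOrAliquot] = field(default_factory=list)

@dataclass
class Specimen:                  # tier 4: e.g. biopsy material or serology
    specimen_id: str
    kind: str
    blocks: List[BlockOrVial] = field(default_factory=list)

@dataclass
class ClinicalEvent:             # tier 3: e.g. a treatment or a blood draw
    event_id: str
    description: str
    specimens: List[Specimen] = field(default_factory=list)

@dataclass
class Diagnosis:                 # tier 2
    diagnosis_id: str
    tumor_type: str
    events: List[ClinicalEvent] = field(default_factory=list)

@dataclass
class Patient:                   # tier 1: de-identified patient
    patient_id: str
    diagnoses: List[Diagnosis] = field(default_factory=list)

def count_slides_or_aliquots(patient: Patient) -> int:
    """Walk all six tiers and count the leaf-level slides or aliquots."""
    return sum(
        len(block.items)
        for diagnosis in patient.diagnoses
        for event in diagnosis.events
        for specimen in event.specimens
        for block in specimen.blocks
    )

# A made-up patient: one diagnosis, one biopsy, two blocks, three slides.
example = Patient("P-001", diagnoses=[
    Diagnosis("D-1", "melanoma", events=[
        ClinicalEvent("E-1", "biopsy", specimens=[
            Specimen("S-1", "tissue", blocks=[
                BlockOrVial("B-1", items=[SlideOrAliquot("SL-1"),
                                          SlideOrAliquot("SL-2")]),
                BlockOrVial("B-2", items=[SlideOrAliquot("SL-3")]),
            ]),
        ]),
    ]),
])
```

In this sketch, each tier holds a list of the tier beneath it, so a question about slides can always be traced back up to the clinical event, diagnosis, and patient it came from.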

The structure, built in a commercial web-based application called REDCap, can be scaled to accommodate thousands of patients and the spatial characterization of billions of cancer cells. A description of AstroID, which has been made available for any researcher to use, was published December 25, 2025, in the Journal for ImmunoTherapy of Cancer. The work was supported in part by the National Institutes of Health.

Researchers at Johns Hopkins Medicine have now deployed this structure in their laboratories for 16 different patient groups with multiple tumor types, and have spatially mapped more than one billion cells and tagged them with clinical information from patient experiences.

Typically, in oncology, each patient’s course includes multiple visits, treatments, and outcome measures, explains Janis M. Taube, the director of the Division of Dermatopathology and Oral Pathology and the co-director of the Tumor Microenvironment Laboratory at the Bloomberg~Kimmel Institute for Cancer Immunotherapy. To identify and characterize biomarkers, these parameters need to be linked to multiple tests and assessments, including blood-based laboratory values, tissue-based pathology, radiography, genomic studies, and more.

“What this structure does is allow me to ask questions across all of this data that’s already been gathered, across tumor types, and combine it all together in the context of the longitudinal patient experience,” Taube says.

For example, her lab often conducts studies of patients with melanoma. If she conducted a study 10 years ago looking at patients’ age at diagnosis and what therapies they received, and then later wanted to do another study of this patient population and survival, she might have had to repeat some of the steps to compile a new cohort and regather information about treatments received, what specimens were collected, and clinical outcomes.

“Investigators across the whole institution are also trying to tap into these patients and collect this information,” she says. “There were really huge inefficiencies across how we were working, and lots of duplicating efforts.”

It had been painstaking for researchers to manually enter data, so cancer studies typically were designed around relatively small cohorts, adds Alexander Szalay, the Bloomberg Distinguished Professor of Big Data.

“What we are trying to do is scale out so we can handle patients on the order of hundreds or thousands of patients in a study,” says Szalay, who also is the director of the Institute for Data-Intensive Engineering and Science at Johns Hopkins. “One of our postdoctoral students, Elizabeth Will, in partnership with graduate student Benjamin Green, came up with this wonderful idea of how to organize all the medical and specimen data into multiple hierarchical tiers, which then can be easily translated to a query-oriented platform based on a large relational database.”
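As a rough illustration of how hierarchical tiers can translate to a relational database, the sketch below uses Python's built-in sqlite3 module with one table per tier, each pointing to its parent through a foreign key. The table names, column names, and sample rows are all invented for the example; this is not the AstroID schema or the team's query platform.

```python
import sqlite3

# Hypothetical schema: one table per tier, each linked to the tier above it.
SCHEMA = """
CREATE TABLE patient (
    patient_id TEXT PRIMARY KEY);
CREATE TABLE diagnosis (
    diagnosis_id TEXT PRIMARY KEY,
    patient_id TEXT NOT NULL REFERENCES patient(patient_id),
    tumor_type TEXT);
CREATE TABLE clinical_event (
    event_id TEXT PRIMARY KEY,
    diagnosis_id TEXT NOT NULL REFERENCES diagnosis(diagnosis_id),
    event_type TEXT);
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    event_id TEXT NOT NULL REFERENCES clinical_event(event_id),
    specimen_type TEXT);
CREATE TABLE tissue_block (
    block_id TEXT PRIMARY KEY,
    specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id));
CREATE TABLE slide (
    slide_id TEXT PRIMARY KEY,
    block_id TEXT NOT NULL REFERENCES tissue_block(block_id));
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?)", [("P1",), ("P2",)])
    conn.executemany("INSERT INTO diagnosis VALUES (?, ?, ?)", [
        ("D1", "P1", "melanoma"),
        ("D2", "P2", "basal cell carcinoma"),
    ])
    conn.executemany("INSERT INTO clinical_event VALUES (?, ?, ?)", [
        ("E1", "D1", "biopsy"),
        ("E2", "D2", "biopsy"),
    ])
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [
        ("S1", "E1", "tissue"),
        ("S2", "E2", "tissue"),
    ])
    conn.executemany("INSERT INTO tissue_block VALUES (?, ?)", [
        ("B1", "S1"),
        ("B2", "S2"),
    ])
    conn.executemany("INSERT INTO slide VALUES (?, ?)", [
        ("SL1", "B1"), ("SL2", "B1"), ("SL3", "B2"),
    ])
    conn.commit()
    return conn

def slides_per_patient(conn: sqlite3.Connection, tumor_type: str):
    """Count slides per patient for one tumor type by joining all six tiers."""
    return conn.execute(
        """
        SELECT p.patient_id, COUNT(sl.slide_id)
        FROM patient p
        JOIN diagnosis d       ON d.patient_id   = p.patient_id
        JOIN clinical_event e  ON e.diagnosis_id = d.diagnosis_id
        JOIN specimen s        ON s.event_id     = e.event_id
        JOIN tissue_block b    ON b.specimen_id  = s.specimen_id
        JOIN slide sl          ON sl.block_id    = b.block_id
        WHERE d.tumor_type = ?
        GROUP BY p.patient_id
        ORDER BY p.patient_id
        """,
        (tumor_type,),
    ).fetchall()
```

With the demo rows above, calling slides_per_patient(build_demo_db(), "melanoma") walks from patient down to slide in a single query, which is the kind of cross-tier question the hierarchical layout is meant to make easy.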

While for now the Johns Hopkins researchers are using this platform for cancer studies, the structure could be adapted to characterize longitudinal biospecimens from any disease process, they say.

The code for AstroID is publicly available here. Additional information is available here. Exported data can be explored on its own for research purposes on clinical outcomes, independent of additional biomarker correlates, or merged and queried with a variety of scientific correlates.

Study coauthors include Scott Carey, Govind Warrier, Aasheen Qadri, Andrew Jorquera, Sigfredo Soto-Diaz, Daphne Wang, Joel C. Sunshine, Julie Stein Deutsch, Robert A. Anders, Qingfeng C. Zhu, Ludmila Danilova, Leslie Cope, Evan J. Lipson, and Logan L. Engle of Johns Hopkins, and Tricia R. Cottrell of Queen’s University in Ontario, Canada.

The work was supported by the Mark Foundation for Cancer Research, the Melanoma Research Alliance, the Marilyn and Michael Glosserman Fund for Basal Cell Carcinoma and Melanoma Research, the Bloomberg~Kimmel Institute for Cancer Immunotherapy, and the National Cancer Institute (grants R01CA142779 and T32CA009071).

Taube and Szalay report receiving research support and stock options from Akoya Biosciences. Taube also receives research support from Bristol Myers Squibb and has served as a consultant/advisory board member to Bristol Myers Squibb, Merck & Co., Moderna, Roche/Genentech, Elephas, Regeneron, NextPoint, and Akoya Biosciences. Taube, Szalay, Will, Green, Cottrell, and Engle have patents and pending patents related to the AstroPath platform and associated biomarker discovery. These relationships are managed by the Johns Hopkins University in accordance with its conflict-of-interest policies.

This article originally appeared in the Johns Hopkins Medicine Newsroom.
