Large (big?) data: diseases, medical records, genes and environment

I will attempt to cover several interrelated topics in the analysis of big biomedical data, spending more time on the parts that generate feedback.

First, I will introduce our recent study analyzing phenotypic data harvested from over 100 million unique patients. Curiously, these non-genetic large-scale data can be used for genetic inferences. We discovered that complex diseases are associated with unique sets of rare Mendelian variants, referred to as the “Mendelian code.” We found that the genetic loci indicated by this code were enriched for common risk alleles. Moreover, we used probabilistic modeling to demonstrate for the first time that deleterious Mendelian variants likely contribute to complex disease risk in a non-additive fashion.

The second topic that I hope to cover is the analysis of apparent clusters of neurodevelopmental disorders. Disease clusters are defined as geographically compact areas where a particular disease, such as a cancer, shows a significantly increased rate. It is presently unclear how common such clusters are for neurodevelopmental maladies, such as autism spectrum disorders (ASD) and intellectual disability (ID). As in the first story, examining data for one third of the whole US population, we demonstrated that (1) ASD and ID show strong clustering across US counties; (2) counties with high ASD rates also appear to have high ID rates; and (3) the spatial variation of both phenotypes appears to be driven by environment and, to a lesser extent, by economic incentives at the state level.

Speaker Biography

Andrey Rzhetsky is a Professor of Medicine and Human Genetics at the University of Chicago. He is also a Pritzker Scholar and a Senior Fellow of both the Computation Institute and the Institute for Genomics and Systems Biology at the University of Chicago. His research is focused on computational analysis of complex human phenotypes in the context of changes and perturbations of underlying molecular networks and environmental insults.