Research Overview

Image of Tim Randolph and colleagues at a white board

Research Focus

Tim Randolph's work focuses on mathematical and statistical methods that facilitate the analysis of data in clinical, laboratory and public health sciences research. These data often comprise many (thousands or more) measurements per sample, which represent the presence and/or activity of molecular functions. For example, groups of metabolites may be indicative of a drug’s success, or groups of gut bacteria may be associated with health. The goal is to help researchers understand how these types of data, when analyzed together, may reveal additional insights. An interesting challenge is to put this into a statistical framework that accounts for the many uncertainties in the data so that inferences can be made about relationships between molecular measurements and health or disease.

Statistical Research Projects

Kernel Penalized Regression: This is motivated by a popular use of distance-based tests for analyzing multivariate data. Briefly, given n samples each with p measurements, many different n-by-n (dis)similarity matrices can be formed to summarize relationships between the samples. These relationships may be plotted or further summarized to investigate whether the p measurements are associated with a disease or phenotype of the n individuals. We incorporate these structures into a linear regression model that selects variables (e.g., metabolites or genes) that are associated with an outcome. Examples include investigating whether patterns of metabolite measures or microbial abundances are related to disease.
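As a rough illustration of how an n-by-n (dis)similarity matrix can enter a regression, the Python sketch below converts Euclidean distances between samples into a kernel by Gower centering and then fits ordinary kernel ridge regression. This is not the kernel penalized regression method itself, and ridge-type shrinkage does not select variables; it only shows the general mechanism. The function names, the simulated data and the penalty value lam=1.0 are all assumptions made for the example.

```python
import numpy as np

def euclidean_distances(X):
    """Pairwise Euclidean distances between the rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(D2, 0.0))  # clip tiny negatives from round-off

def gower_kernel(D):
    """Turn an n-by-n distance matrix into a similarity (kernel) matrix
    by double centering: K = -1/2 * J D^2 J, with J = I - 11'/n."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ (D ** 2) @ J

def kernel_ridge_coefficients(X, y, K, lam):
    """Kernel ridge fit in sample space, mapped back to the p variables.
    For the Euclidean kernel this equals ridge regression on centered data."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    alpha = np.linalg.solve(K + lam * np.eye(n), yc)  # n dual weights
    return Xc.T @ alpha                               # p coefficients

# Simulated data (hypothetical): n samples, p > n measurements.
rng = np.random.default_rng(0)
n, p = 20, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)

D = euclidean_distances(X)
K = gower_kernel(D)
beta_hat = kernel_ridge_coefficients(X, y, K, lam=1.0)
```

With Euclidean distances the Gower-centered kernel is the Gram matrix of the column-centered data, so the result coincides with a standard ridge fit. Swapping in a different distance (for example, one suited to microbial abundances) changes K and therefore which sample relationships drive the fit; mapping the dual weights back to variables in that case is a heuristic rather than an exact equivalence.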

Partially Empirical Eigenvectors for Regression (PEER): This family of projects is aimed at analyses of data having many (many) variables that may relate to one another through spatial, temporal, biochemical or functional structure. These extensions of penalized regression (such as ridge regression and the lasso) provide a statistically tractable way of incorporating biological context into the analysis of an otherwise ill-posed (intractable, underdetermined) problem. Applications include longitudinally sampled functions, metabolic networks, neuro-connectivity data, and phylogenetic structure for microbiome data.
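To make the idea of building structure into a penalty concrete, here is a minimal Python sketch of generalized ridge (Tikhonov) regression, where the usual ridge penalty on the coefficients is replaced by a penalty on L applied to the coefficients. It is not the PEER estimator; it only illustrates how a structured penalty can make an underdetermined problem solvable. The second-difference operator L (which encodes the assumption that neighboring variables have similar coefficients, as with longitudinally sampled data), the small stabilizing term eps, the simulated data and lam=100.0 are all assumptions made for the example.

```python
import numpy as np

def second_difference_operator(p):
    """(p-2)-by-p matrix whose rows compute b[i] - 2*b[i+1] + b[i+2].
    Penalizing ||L b||^2 favors coefficient vectors that vary smoothly."""
    L = np.zeros((p - 2, p))
    for i in range(p - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L

def structured_ridge(X, y, L, lam, eps=1e-8):
    """Minimize ||y - X b||^2 + lam * ||L b||^2 + eps * ||b||^2.
    The eps term is only a numerical safeguard (an assumption of this sketch)."""
    p = X.shape[1]
    A = X.T @ X + lam * (L.T @ L) + eps * np.eye(p)
    return np.linalg.solve(A, X.T @ y)

# Simulated underdetermined problem (hypothetical): p = 100 ordered
# variables, n = 30 samples, and a smooth true coefficient function.
rng = np.random.default_rng(1)
n, p = 30, 100
t = np.linspace(0.0, 1.0, p)
beta_true = np.sin(2.0 * np.pi * t)
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

L = second_difference_operator(p)
beta_smooth = structured_ridge(X, y, L, lam=100.0)          # structured penalty
beta_ridge = structured_ridge(X, y, np.eye(p), lam=100.0)   # plain ridge

err_smooth = np.linalg.norm(beta_smooth - beta_true)
err_ridge = np.linalg.norm(beta_ridge - beta_true)
```

Setting L to the identity recovers ordinary ridge regression, which is why the same function serves as the baseline. In this simulation the smoothness penalty recovers the coefficient function more accurately than plain ridge, because the penalty matches the structure of the truth; with a mismatched L that advantage would not be expected.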

Tissue Array Co-Occurrence Matrix Analysis (TACOMA): An accurate and interpretable open-source machine-learning algorithm for quantifying immunohistochemically stained tissue images.

Clinical and Public Health Science Collaborations

Tim Randolph collaborates with clinical and laboratory scientists who are using data, as described above, in studies on a wide range of topics: early detection of colorectal cancer; body fat and cancer risk; brain imaging studies of HIV-associated cognitive decline; personalizing drug dose in hematopoietic stem cell transplant recipients; and understanding why leukemia patients respond differently to the same drug.