Research Overview


Clinical and Public Health Science Collaborations

Tim Randolph collaborates with clinical, laboratory, and public health scientists who use a wide range of data types to study health: early detection of cancer, classification of cancer subtypes, the influence of the microbiome on cancer risk and treatment, neuro-connectivity and cognitive health, personalized treatment in transplant recipients, and patient-specific response to drugs, including lung cancer therapies. In connection with this last area, Tim directs the Bioinformatics Core of the Fred Hutch Lung SPORE, which is led by McGarry Houghton.

Statistical Research

Tim's mathematical and statistical work focuses on methods that facilitate the analysis of data in clinical, laboratory, and public health research. Such data may comprise dozens to thousands (or more) of measurements per sample, representing the presence and/or activity of molecular functions. Groups of metabolites, for example, may be indicative of health status, and the abundance of these metabolites may in turn be determined by (groups of) gut bacteria. The goal is to understand how these types of data, analyzed together, reveal insights that are not apparent when each is analyzed separately. An interesting challenge is to place these analyses in a statistical framework that accounts for the many sources of uncertainty in the data, so that inferences can be drawn about relationships between molecular measurements and human health. Active collaborators on these topics include Jarek Harezlak, Parker Knight, Jing Ma, Ali Shojaie, Yue Wang, and Mike Wu.

Examples of Statistical Research Projects

Kernel Penalized/Generalized Matrix Decomposition Regression: This family of projects is aimed at analyzing data with many (many) variables that may relate to one another through spatial, temporal, biochemical, or functional structure. Extensions of penalized regression (such as ridge regression and the lasso) provide a statistically tractable way to incorporate this biological context into the analysis of otherwise intractable, underdetermined models. Applications include longitudinally sampled functions; metabolic networks; neuro-connectivity data; and phylogenetic structure for microbiome data.
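To make "incorporating structure through a penalty" concrete, here is a minimal sketch assuming NumPy and simulated data. It replaces the ordinary ridge penalty λ‖β‖² with a structured quadratic penalty λβᵀQβ, where Q is built from a graph Laplacian over a hypothetical chain of features (as for a sampled function). The helper names `graph_laplacian` and `generalized_ridge` are illustrative; this is a generic structured-ridge example, not the KPR or GMDR estimator itself.

```python
import numpy as np

def graph_laplacian(adjacency: np.ndarray) -> np.ndarray:
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

def generalized_ridge(X: np.ndarray, y: np.ndarray, Q: np.ndarray, lam: float) -> np.ndarray:
    """Minimize ||y - X b||^2 + lam * b^T Q b for a positive-definite penalty matrix Q.

    Setting the gradient to zero gives (X^T X + lam Q) b = X^T y, which is solved directly.
    """
    return np.linalg.solve(X.T @ X + lam * Q, X.T @ y)

rng = np.random.default_rng(0)
n, p = 20, 50                                  # n << p: least squares alone is underdetermined
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[10:20] = 1.0                         # a block of neighbouring features shares an effect
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Hypothetical structure: features ordered on a chain, each linked to its neighbours
A = np.diag(np.ones(p - 1), 1) + np.diag(np.ones(p - 1), -1)
Q = graph_laplacian(A) + 1e-2 * np.eye(p)      # small identity term makes Q positive definite
beta_hat = generalized_ridge(X, y, Q, lam=1.0)
```

Because βᵀLβ equals the sum of (β_i − β_j)² over linked pairs, the Laplacian penalty favours coefficients that vary smoothly along the graph, which is one way prior biological structure can stabilize an n ≪ p fit.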

Motivated by the popular use of distance-based methods for analyzing multivariate data, we also incorporate (nonlinear) similarities between many-variate samples to inform high-dimensional linear regression models. Briefly, given n samples, each with p molecular measurements (typically n << p), a set of (dis)similarity matrices (kernels) summarizes relationships among the patient samples, and we investigate whether the p molecular measures (e.g., genes or microbes) are associated with a disease outcome among the n individuals. We incorporate these structures into a linear regression model that selects variables (e.g., metabolites or genes) associated with disease. See the work on KPR, GMDR, and SpINNEr.
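A common first step in distance-based analyses is turning a sample-by-sample dissimilarity matrix into a kernel. The sketch below, assuming NumPy and simulated counts, uses Bray-Curtis dissimilarity (one widely used choice for microbiome data, picked here for illustration) and the classical Gower centering K = −½ J D² J, then clips negative eigenvalues so K is positive semidefinite. The function names are hypothetical, and how such a kernel then enters the KPR or GMDR regression model is not reproduced here.

```python
import numpy as np

def bray_curtis(counts: np.ndarray) -> np.ndarray:
    """Pairwise Bray-Curtis dissimilarity between the rows (samples) of a count matrix."""
    n = counts.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = (counts[i] + counts[j]).sum()
            if total > 0:
                D[i, j] = D[j, i] = np.abs(counts[i] - counts[j]).sum() / total
    return D

def distance_to_kernel(D: np.ndarray) -> np.ndarray:
    """Gower-centred kernel K = -1/2 J D^2 J, projected onto positive semidefinite matrices."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    K = -0.5 * J @ (D ** 2) @ J
    K = (K + K.T) / 2                             # remove floating-point asymmetry
    w, V = np.linalg.eigh(K)
    # Non-Euclidean dissimilarities (such as Bray-Curtis) can yield negative eigenvalues.
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(6, 30))          # 6 samples x 30 taxa (simulated)
K = distance_to_kernel(bray_curtis(counts))
```

When D holds ordinary Euclidean distances, this construction returns exactly the Gram matrix of the column-centred data, which is why it is a natural bridge from distances to inner-product (kernel) methods.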

Joint Matrix Decomposition Regression: Biomedical and biochemical information can arise from multiple sources in the same individual: proteins, genes, metabolites, and/or gut bacteria. Integrating these data types into a single analysis is valuable because they are interdependent and may jointly inform human health. Here, we mathematically decompose the variation in multiple datasets into the part shared among data types and the part specific to each one (omic-specific). This project uses the joint decomposition to build a statistical model that regresses a phenotype or health outcome on the shared signal (as well as on the individual-data signals). We estimate the association of each variable (e.g., each metabolite and each bacterial taxon) with the outcome and assess the statistical significance of each association.
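The shared-versus-specific split can be illustrated with a deliberately simplified sketch, assuming NumPy and two simulated blocks measured on the same samples. Joint scores come from a truncated SVD of the concatenated, column-centred blocks; each block's residual then yields its own individual scores, and the outcome is regressed on both sets of scores. The function `joint_individual` and the fixed ranks are hypothetical choices; this is not the project's estimator and it omits block weighting, rank selection, and the per-variable significance testing described above.

```python
import numpy as np

def joint_individual(blocks, r_joint, r_indiv):
    """Split column-centred data blocks (same n rows) into joint and individual parts."""
    centered = [B - B.mean(axis=0) for B in blocks]
    U, _, _ = np.linalg.svd(np.hstack(centered), full_matrices=False)
    Uj = U[:, :r_joint]                          # joint sample scores (n x r_joint)
    P = Uj @ Uj.T                                # projector onto the joint score space
    joint = [P @ B for B in centered]
    indiv_scores, indiv = [], []
    for B in centered:
        R = B - P @ B                            # variation not explained by the joint part
        Ui, si, Vti = np.linalg.svd(R, full_matrices=False)
        indiv_scores.append(Ui[:, :r_indiv])     # block-specific sample scores
        indiv.append((Ui[:, :r_indiv] * si[:r_indiv]) @ Vti[:r_indiv])
    return Uj, joint, indiv_scores, indiv

rng = np.random.default_rng(1)
n = 40
shared = rng.standard_normal((n, 2))             # latent signal common to both blocks
X1 = shared @ rng.standard_normal((2, 30)) + 0.5 * rng.standard_normal((n, 30))  # e.g. metabolites
X2 = shared @ rng.standard_normal((2, 60)) + 0.5 * rng.standard_normal((n, 60))  # e.g. bacterial taxa
y = shared[:, 0] + 0.1 * rng.standard_normal(n)

Uj, joint, indiv_scores, indiv = joint_individual([X1, X2], r_joint=2, r_indiv=1)
design = np.hstack([np.ones((n, 1)), Uj, *indiv_scores])   # intercept, joint, individual scores
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
```

By construction the individual scores are orthogonal to the joint scores, so the regression separates the contribution of the shared signal from that of each omic-specific signal.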

Multiplex Immunohistochemistry (mIHC) analysis of tissue biopsies: Our mIHC local analysis of neighborhood densities (mIHC-LAND) quantifies regionally defined, cell-specific immune markers based on their local densities in the tumor microenvironment (TME) and the densities of all neighboring cell-type markers. Using these region-specific densities (markers and their neighborhoods), we transform a tissue image into a high-dimensional array of marker-based density patterns. This quantifies proteins (and their proximities) in the TME in a way that successfully classifies lung cancer subtypes that exhibit different treatment responses.
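The notion of a local neighborhood density can be sketched as follows, assuming NumPy, cell centroids already extracted from an image, and one integer marker label per cell. For each cell we count cells of every marker type within a fixed radius and divide by the disk area; averaging these densities by cell type gives a flattened marker-by-marker profile. The radius, the per-type averaging, and the function names are hypothetical illustrations, not the mIHC-LAND pipeline, and edge effects at the tissue boundary are ignored.

```python
import numpy as np

def neighborhood_densities(coords, markers, n_markers, radius):
    """Density (cells per unit area) of each marker within `radius` of every cell.

    coords: (n_cells, 2) centroids; markers: (n_cells,) ints in [0, n_markers).
    Each cell counts itself as part of its own neighbourhood.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    within = np.sqrt((diff ** 2).sum(axis=-1)) <= radius   # boolean neighbourhoods
    onehot = np.eye(n_markers)[markers]                      # (n_cells, n_markers)
    counts = within.astype(float) @ onehot                   # neighbours of each marker type
    return counts / (np.pi * radius ** 2)

def density_profile(dens, markers, n_markers):
    """Mean neighbourhood density of each marker around cells of each type, flattened."""
    prof = np.zeros((n_markers, n_markers))
    for a in range(n_markers):
        if np.any(markers == a):
            prof[a] = dens[markers == a].mean(axis=0)
    return prof.ravel()

coords = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]])
markers = np.array([0, 1, 0])
dens = neighborhood_densities(coords, markers, n_markers=2, radius=2.0)
profile = density_profile(dens, markers, n_markers=2)
```

With many markers, such profiles become the kind of high-dimensional, image-derived feature vectors that a downstream classifier can use.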

Tissue Array Co-Occurrence Matrix Analysis (TACOMA): an accurate, interpretable, open-source machine-learning algorithm for quantifying and classifying immunohistochemically stained tissue images.
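The co-occurrence matrix named in TACOMA is, in its standard form, the gray-level co-occurrence matrix (GLCM): entry (i, j) counts how often a pixel of intensity i has a neighbour of intensity j at a fixed offset. The sketch below, assuming NumPy and an image already quantized to a few integer levels, computes a GLCM and one classical texture summary (contrast). Function names and the offset convention are illustrative assumptions; TACOMA's own preprocessing, feature construction, and classifier are not reproduced.

```python
import numpy as np

def cooccurrence_matrix(img, levels, offset=(0, 1), symmetric=True, normed=True):
    """Gray-level co-occurrence matrix of an integer image for one non-negative offset (dr, dc)."""
    dr, dc = offset
    h, w = img.shape
    ref = img[: h - dr, : w - dc]                # reference pixels
    nbr = img[dr:, dc:]                          # neighbours at the given offset
    M = np.zeros((levels, levels))
    np.add.at(M, (ref.ravel(), nbr.ravel()), 1)  # accumulate pair counts
    if symmetric:
        M = M + M.T                              # count (i, j) and (j, i) together
    if normed and M.sum() > 0:
        M = M / M.sum()                          # convert counts to joint frequencies
    return M

def contrast(P):
    """Texture contrast: sum over (i, j) of (i - j)^2 * P[i, j]."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

img = np.array([[0, 0, 1],
                [1, 2, 2]])
M = cooccurrence_matrix(img, levels=3, offset=(0, 1), symmetric=False, normed=False)
```

Summaries such as contrast turn each image into a short, interpretable feature vector, which is one reason co-occurrence features are popular for classifying stained tissue.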