Upcoming Schedule
- Tuesday, October 7, 2025, 12:10-1pm
Hybrid: Arnold Building M3-A805 and Zoom
Speaker: Guanghua Xiao, UT Southwestern
Title: To come
- Tuesday, November 4, 2025, 12:10-1pm
Hybrid: Arnold Building M3-A805 and Zoom
Speaker: Carrie Wright, FHCC
Title: To come
- Tuesday, December 2, 2025, 12:10-1pm
Hybrid: Arnold Building M3-A805 and Zoom
Speaker: Anton Arkhipov, Allen Institute
Title: To come
Past Speakers
- September 2, 2025
Speaker: Matheus Viana, Allen Institute
Title: From Pixels to Principles: Revealing the Intracellular Organization of Human Stem Cells
Abstract: At the Allen Institute for Cell Science, we investigate the principles of cellular organization in human induced pluripotent stem cells (hiPSCs) and how these principles change during differentiation and disease. To visualize this dynamic organization, we used CRISPR/Cas9 to generate a large collection of endogenously tagged hiPSC lines. This talk will present the computational workflows we have developed to analyze high-resolution 3D live images from these cells. I will focus on our advanced methods for cellular representation, including the use of spherical harmonics to model cell and nuclear shape, and diffusion autoencoders to study how hiPSC-derived endothelial cells respond to shear stress.
- July 17, 2025
Speaker: Larry Han, Northeastern University
Title: Two Frontiers in Statistical Learning: Data Fusion for Survival Analysis and Fair Conformal Prediction
Abstract: This talk will highlight recent methodological advances at the intersection of causal inference, data fusion, and conformal prediction. In the first part, I will cover methods for causal survival analysis across multiple data sources under distribution shift, including a semiparametric efficient estimator for shared individual-level data and a federated learning approach for privacy-constrained settings. These methods flexibly model time-to-event outcomes while addressing censoring and heterogeneous populations, illustrated through multi-site HIV-1 prevention trials conducted across the United States, Europe, South America, and sub-Saharan Africa. In the second part, I will introduce Surrogate-Assisted Group-Clustered Conformal Inference (SAGCCI), which improves fairness and efficiency in conformal prediction by clustering protected groups with similar conformal score distributions and incorporating surrogate outcomes to produce narrower prediction sets than state-of-the-art methods. SAGCCI ensures approximate group-conditional coverage and demonstrates strong empirical performance in simulations and an application to the Moderna COVE COVID-19 vaccine trial. For more details, see our papers: Targeted Data Fusion for Causal Survival Analysis Under Distribution Shift (arXiv:2501.18798); Bridging Fairness and Efficiency in Conformal Inference: A Surrogate-Assisted Group-Clustered Approach (ICML 2025).
- June 3, 2025
Speaker: Jean Feng, UCSF
Title: Bayesian Concept Bottleneck Models with LLM Priors
Abstract: Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a candidate set of human-interpretable concepts, extract their values from the training data, and identify a sparse subset as inputs to a transparent prediction model. However, such approaches are often hampered by the tradeoff between enumerating a sufficiently large set of concepts to include those that are truly relevant versus controlling the cost of obtaining concept extractions. This work investigates a novel approach that sidesteps these challenges: BC-LLM iteratively searches over a potentially infinite set of concepts within a Bayesian framework, in which Large Language Models (LLMs) serve as both a concept extraction mechanism and prior. BC-LLM is broadly applicable and multi-modal. Despite imperfections in LLMs, we prove that BC-LLM can provide rigorous statistical inference and uncertainty quantification. In experiments, it outperforms comparator methods including black-box models, converges more rapidly towards relevant concepts and away from spuriously correlated ones, and is more robust to out-of-distribution samples.
- May 6, 2025
Speaker: William Hsu, UCLA
Title: Using Multimodal Models to Infer Cancer Biology from Medical Images
Abstract: Clinicians use increasingly diverse data types, including demographics, lab results, clinical notes, radiologic and histopathologic images, and blood-based biomarkers, to characterize the genetic and environmental factors that influence the development and progression of diseases such as cancer. This talk will detail my lab's efforts to develop and validate multimodal machine learning methods that integrate clinical, diagnostic imaging, and molecular data to provide novel insights into cancer behavior and progression. I will discuss strategies for multimodal data analysis and informatics methods to effectively combine and interpret information across various data modalities.
- April 1, 2025
Speaker: Kyle Lafata, Duke
Title: Computational Tumor Phenotyping and Multiscale Mathematical Modeling to Study Radiation Resistance and Immune Dysregulation
Abstract: Cancer heterogeneity spans multiple length-scales of biological organization, including tissue, cellular, and molecular levels. Characterization of these domains is essential to overcoming therapeutic resistance and guiding personalized treatment strategies. This talk will focus on computational tumor phenotyping strategies and multiscale mathematical modeling of treatment resistance and immune dysregulation. I will demonstrate how imaging, digital pathology, and spatial transcriptomics enable a multiscale representation of tumor appearance and behavior. By integrating physics-informed, mathematical tumor models (theory) with image-based, data-driven solutions (observables), I will demonstrate that these techniques can capture both clinically relevant and biologically sound phenomena. Illustrative examples will include radiation-induced changes in tumor dynamics, single-cell evaluation of the tumor immune microenvironment and immune response, molecular insight into tumor heterogeneity, and biologically guided adaptive treatment strategies.
- March 5, 2025
Radiation Oncology Seminar: Andrew Trister, Verily
Title: Plaiting the Golden Braid: How Artificial Intelligence Can Lead to Greater Humanity in Healthcare
Andrew Trister, MD, PhD, is the Chief Medical and Scientific Officer at Verily where he leads the company’s research, science and population health initiatives, providing expertise across Verily’s portfolio as the company furthers its precision health strategy. He specializes in digital health and artificial intelligence.
- February 4, 2025
Speaker: Hoifung Poon, Microsoft Research
Title: Multimodal Generative AI for Precision Health
Abstract: The dream of precision health is to develop a data-driven, continuous learning system where new health information is instantly incorporated to optimize care delivery and accelerate biomedical discovery. The confluence of technological advances and social policies has led to rapid digitization of multimodal, longitudinal patient journeys, such as electronic medical records (EMRs), imaging, and multiomics. Our overarching research agenda lies in advancing multimodal generative AI for precision health, where we harness real-world data to pretrain powerful multimodal patient embedding, which can serve as digital twins for patients. This enables us to synthesize multimodal, longitudinal information for millions of cancer patients, and apply the population-scale real-world evidence to advancing precision oncology in deep partnerships with real-world stakeholders such as large health systems and life sciences companies.
- January 7, 2025
Speaker: Gary Zhao, FHCC Translational Science and Therapeutics
Title: Detection of Mutant Blood Cells by Trans-species Morphological Learning. (No recording available)
Abstract: Detection of sparse mutant cells in blood samples has broad implications in precision medicine and cancer prevention. The intrinsic technical limitations of DNA sequencing-based mutation detection methods have been limiting the sensitivity, cost effectiveness, and turnaround time of such tasks. We broke away from the “ball and chain” of DNA sequencing and developed a trans-species single-cell morphology learning system that allows detection of mutant blood cells with improved sensitivity, lower cost, and faster turnaround, making population-wide screening and dense-time scale analysis possible.
- November 5, 2024
Speaker: Adam Visokay, University of Washington
Title: Inference on Predicted Data: Examples from Verbal Autopsies and the BMI
Abstract: As AI and ML tools become more accessible, and scientists face new obstacles to data collection (e.g. rising costs, declining survey response rates), researchers increasingly use relatively cheap predictions from pre-trained algorithms in place of more expensive "ground truth" data. Standard tools for inference can misrepresent the association between independent variables and the outcome of interest when the true, unobserved outcome is replaced by a predicted value. In this talk, I present an overview detailing how to perform valid inference when working with predicted data. I will share two examples of this method in practice: one in the context of global public health, working with Verbal Autopsy data, and the other in the context of medicine, working with BMI data.
Bonus talk!
Speaker: Ben McGough, FHCC Scientific Computing
Title: Hutch Scientific Computing HPC Cluster Roadmap
- October 1, 2024
Speaker: Pang Wei Koh, University of Washington
Title: Reliable data use: Synthesis, retrieval, and interaction
Abstract: How can we better use our data to build more reliable and responsible AI models? I will first discuss when it might be useful to train on synthetic image data derived, in turn, from a generative model trained on the available real data. Next, I will describe how scaling up the datastore for retrieval-based language models can significantly improve performance, indicating that the amount of data used at inference time, and not just at training time, should be considered as a new dimension of scaling language models. Finally, I will discuss how the static nature of most of our training data leads to language model failures in interactive settings.
- June 18, 2024
Speaker: James Zou, Stanford University
Title: How Generative AI Can Transform Biomedical Research (recording not available)
Abstract: This talk explores how we can develop and use generative AI to help researchers and clinicians. I will first discuss how generative AI can act as research co-advisors. Then I will present how we developed visual-language AI to help clinicians aggregate and interpret noisy data. Finally, I will explore the role of language as the foundational data modality for biomedicine.
- May 7, 2024
Speaker: Zheng Wei, FHCC Comp Bio
Title: Deciphering Gene Regulatory Logic Through Deep Learning Interpretation
Abstract: Discovering DNA/RNA regulatory sequence motifs and their relative positions is fundamental to understanding the mechanisms of gene expression regulation across development, tumors, and various diseases. Although deep convolutional neural networks (CNNs) have achieved great success in predicting different cis-regulatory elements, the discovery of motifs and their combinatorial patterns from these CNN models has remained difficult. We show that the main difficulty is due to the problem of multifaceted neurons that respond to multiple types of sequence patterns. To overcome this problem, we propose the NeuronMotif algorithm to interpret such neurons. NeuronMotif can output the sequence motifs, and the syntax rules governing their combinations are depicted by position weight matrices organized in tree structures, which are supported by multiple sources including existing knowledge databases, ATAC-seq data, and the literature. Currently, we are developing CNN-transformer hybrid models and corresponding interpretation algorithms to decode the regulatory logic involving long distances (> 1 Mb) in a networked manner. Our long-term goal is to use the regulatory logic to understand how non-coding mutations drive cancer risk and to develop accurate risk prediction scores for precision oncology.
- April 16, 2024
Speaker: Dominik Otto, FHCC Setty Lab
Title: Quantifying Cell-State Densities using Mellon and Inferring Differentiation Dynamics.
Abstract: Cell-state density characterizes the distribution of cells along phenotypic landscapes and is crucial for unraveling the mechanisms that drive cellular differentiation, regeneration, and disease. We present Mellon, a novel computational algorithm for estimation of cell-state densities from high-dimensional representations of single-cell data. We demonstrate Mellon's efficacy by dissecting the density landscape of various differentiating systems, revealing a consistent pattern of high-density regions corresponding to major cell types intertwined with low-density, rare transitory states. Mellon offers the flexibility to perform temporal interpolation of time-series data, providing a detailed view of cell-state dynamics during the inherently continuous developmental processes. I will elaborate on how to use these cell-state density estimates for comprehensive training of cell-differentiation models based on deep neural networks and Gaussian processes. I will showcase an existing publication [1] and further advancements we are working on with Mellon.
[1] Sha, Yutong, Yuchi Qiu, Peijie Zhou, and Qing Nie. “Reconstructing Growth and Dynamic Trajectories from Single-Cell Transcriptomics Data.” Nature Machine Intelligence 6, no. 1 (January 2024): 25–39. https://doi.org/10.1038/s42256-023-00763-w.
- March 5, 2024
Speaker: Rachel Thomas, fast.ai
Title: Medical AI Needs You
Abstract: It is not only possible, but also crucial, for you to get involved with medical AI. Advances in AI are having a transformative impact on many fields, including medicine. However, AI also creates and amplifies a number of ethical risks. Having researchers and practitioners from a variety of backgrounds helps to mitigate these risks and to take advantage of opportunities. Through my work co-founding fast.ai, I helped to create the longest running deep learning course in the world and reached a diverse group of students. Fast.ai alumni from a range of unconventional backgrounds have been able to make a positive impact. This talk will cover details of the particular risks and challenges impacting medical AI, as well as the positives of when people from unlikely backgrounds get involved.
Speaker's Erratum: Lauren Oakden-Rayner is not a fast.ai alum. Two radiologists who are fast.ai alums are Alexandre Cadrin-Chênevert and Judy Gichoya.
- February 6, 2024
Speakers: Harsha Nori and Rich Caruana, Microsoft Research
Title: Large Language Models (LLMs), Healthcare, and Interpretable Machine Learning
Abstract: Recent language models like GPT-4 and MedPaLM have shown remarkable capabilities in various aspects of medicine, including clinical reasoning, diagnostics, and even augmenting clinician-patient interactions. These models have achieved expert-level performance on competency exams like the USMLE and a battery of specialty board exams. We'll begin with a discussion on how we've been assessing these models at Microsoft, including a dive into how to elicit maximal performance from these models and a reflection on translating benchmark performance to the real world. We'll then share work we're doing to bring the power of LLMs into more traditional machine learning for healthcare tasks, like predicting 30-day readmission on structured, tabular datasets. We'll show how LLMs and interpretable machine learning models commonly used in healthcare can work surprisingly well together, especially on tasks that LLMs alone are not naturally suited for. Finally, we'll discuss promising trends on the horizon for the future of ML for healthcare.
- January 9, 2024
Speaker: Daniel Jones, Newell Lab, Vaccine and Infectious Disease Division
Title: Cell Simulation as Cell Segmentation (recording not available)
Abstract: Single-cell spatial transcriptomics promises a highly detailed view of a cell's transcriptional state and microenvironment, yet inaccurate cell segmentation can render this data murky by misattributing large numbers of transcripts to nearby cells or conjuring nonexistent cells. We adopt methods from ab initio cell simulation to rapidly infer morphologically plausible cell boundaries that preserve cell type heterogeneity.
- December 5, 2023
Speaker: Clemens Grassberger, Department of Radiation Oncology, University of Washington
Title: Outcome Prediction in Radiation Oncology – from Machine Learning to Mechanistic Models
Abstract: Based on a series of papers, we will discuss the role of data-driven AI/ML methods in radiation oncology outcome prediction, and how they compare to more knowledge-driven mechanistic models.
- November 7, 2023
Speaker: Michael Haffner, Human Biology Division and Clinical Research Division, Fred Hutchinson Cancer Center
Title: Machine Learning-Based Morphologic Characterization of Genitourinary Malignancies
Abstract: In recent years, the field of medicine has witnessed a remarkable transformation with the advent of cutting-edge computer-based image and pattern analyses. These approaches have proven exceptionally adept at efficiently detecting and classifying objects, including the identification of cancer from large and complex medical images. This is particularly invaluable, as such tasks can be exceedingly time-consuming for healthcare professionals. Moreover, the remarkable power of these methods lies in their capacity to delineate patterns that were previously not recognized by physicians and researchers. In this seminar, I will share our efforts in harnessing machine learning-based approaches to define morphologic features of advanced metastatic prostate cancer and bladder cancer from standard pathology images, with a focus on developing biomarkers that can guide clinical decision making.
- October 3, 2023
Speaker: Lucas Liu, Biostatistics Program, Public Health Sciences Division, Fred Hutchinson Cancer Research Center
Title: Deep Learning for Data Representation in Temporal Electronic Health Records
Abstract: Analyzing and mining Electronic Health Records (EHR) data is essential for improving patient care and reducing healthcare costs. Sequential deep learning (DL) methods have become increasingly popular for analyzing temporal EHR data in various healthcare applications. Despite their promising performance, DL methods face significant challenges in being adopted in real-world healthcare settings. These challenges include handling irregular temporal scales and asynchronous multi-variable EHR inputs, ensuring model interpretability, and addressing fairness and performance disparities. In this talk, we will introduce DL algorithms for learning data representation in temporal EHR data and propose solutions to overcome these challenges.
- September 5, 2023
Speaker: John Kang, Department of Radiation Oncology, University of Washington
Title: Unlocking Information Extraction in Oncology using NLP
Abstract: Being able to detect and predict patient outcomes is an aspirational goal for AI in oncology. In this talk, we discuss how deeper and improved representation methods can unlock better methods for detection and prediction. We begin with current applications of AI in clinical trials using structured data, and then discuss how improved representation using NLP can improve performance and decrease labeling burden.
- June 15, 2023
Speaker: Young Hwan Chang, Computational Biology Program, OHSU
Title: Representation Learning and its Application on Multiplex Tissue Imaging Data
Abstract: Multiplexed Tissue Imaging (MTI) techniques have revolutionized tissue sample analysis by enabling simultaneous measurement of numerous biomarkers. However, challenges such as technical artifacts, tissue loss, long acquisition times, and limitations in current MTI analyses hinder its full potential. In this talk, we will address these challenges and propose solutions to enhance the capabilities and accessibility of MTI, with a focus on representation learning. Our emphasis lies in the need for comprehensive representations of multiplexed single-cell images, encompassing morphology, cell shape, and texture beyond mean intensity features. We will also explore techniques like image-to-image translation and image-to-omics integration to obtain transferable multimodal representations, facilitating a holistic interpretation of cellular data. Through the utilization of representation learning on MTI, we uncover diagnostically significant features in standard histopathology images, advancing our understanding of tumor biology and improving cancer diagnosis and treatment. Furthermore, we will discuss strategies to overcome cost and time challenges associated with MTI, making it more accessible in cancer research and clinical settings. These advancements propel the field forward, unlocking the potential of MTI in cancer diagnosis and treatment, driving scientific discoveries, and ultimately improving patient outcomes.
- May 2, 2023
Speaker: William Stafford Noble, Department of Genome Sciences, University of Washington
Title: Deep Learning for Mass Spectrometry Proteomics
Abstract: In this talk, I will describe several recent and ongoing projects that apply deep neural networks to the analysis of protein tandem mass spectrometry data. We first show how a Siamese architecture can be trained in a supervised fashion to embed individual mass spectra into a 32-dimensional space, yielding a compact representation that enables large-scale, highly accurate clustering of the spectra and significantly enhances our ability to assign observed spectra to their corresponding peptide sequences. The second project uses a model with a transformer architecture to perform de novo peptide sequencing by translating directly from a mass spectrum (a sequence of peaks) to a peptide (a sequence of amino acids). The resulting model, trained on 30 million spectra, outperforms existing methods and enhances our ability to interpret various types of mass spectrometry data.