The central theme of the Lin Lab is to study the regulatory logic of the non-coding regulatory network in the oncogenic 3D genome, with two aims: to develop cancer risk prediction schemes that integrate the regulatory consequences of non-coding mutations, and to develop broadly targeted therapies by deciphering the sequence codes that govern this regulatory logic.
About 98% of the human genome is made up of non-coding regions. These regions contain various non-coding elements that collectively determine cell fates and disease stages. However, it is still unclear how these non-coding elements work together through the 3D genome. We have developed and applied computational algorithms (for instance, "BSeQC: quality control of bisulfite sequencing experiments", "Sparse conserved under-methylated CpGs are associated with high-order chromatin structure", and "Comparative analysis of metazoan chromatin organization") to multimodal omics data to identify non-coding regulatory elements and elucidate their regulatory logic in the context of cell identity and cancer (for instance, "DNMT3A Loss Drives Enhancer Hypomethylation in FLT3-ITD-Associated Leukemias", "Broad H3K4me3 is associated with increased transcription elongation and enhancer activity at tumor-suppressor genes", and "Homeobox oncogene activation by pan-cancer DNA hypermethylation").
High-content CRISPR screens, which enable a functional understanding of non-coding regulatory networks, bring great opportunities for systematically identifying the regulatory logic underlying development and disease. We have developed and applied computational methods to maximize the power of the dCas9 CRISPR system, both for building diverse tools for chromatin and 3D genome perturbation and for mining CRISPR screen data (for instance, "Computational Methods for Analysis of Large-Scale CRISPR Screens", "CRISPR Activation Screens Systematically Identify Factors that Drive Neuronal Fate and Reprogramming", and "CRISPhieRmix: a hierarchical mixture model for CRISPR pooled screens"). In addition, we are developing multiplexed CRISPR interference (CRISPRi) screens with single-cell sequencing that perturb pairs of non-coding regulatory elements (e.g. enhancers) to infer their functional interactions in oncogenic regulatory networks. To study the generalized DNA/chromatin codes that determine the regulatory logic of oncogenic regulatory networks, we are developing machine learning and deep learning models that integrate multimodal data to predict functional interactions between non-coding elements and decipher the latent codes.
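To make the idea of inferring a functional interaction from paired perturbations concrete, here is a minimal sketch. It is not the lab's actual pipeline. It assumes a simple multiplicative null model in which two independent enhancers have additive effects on the log2 fold change of a target gene, so that any deviation from additivity is read as an interaction. The function names and the expression values are hypothetical.

```python
import math


def log2_fold_change(perturbed: float, control: float) -> float:
    """Log2 ratio of target-gene expression in perturbed vs. control cells."""
    return math.log2(perturbed / control)


def interaction_score(expr_control: float, expr_a: float,
                      expr_b: float, expr_ab: float) -> float:
    """Deviation of the double perturbation from the additive expectation.

    Assumption: under no interaction, log2FC(A+B) = log2FC(A) + log2FC(B).
    A score near 0 suggests independent enhancers; a positive score means
    the double perturbation reduces expression less than expected (for
    example, redundant enhancers); a negative score means it reduces
    expression more than expected (for example, synergistic enhancers).
    """
    lfc_a = log2_fold_change(expr_a, expr_control)
    lfc_b = log2_fold_change(expr_b, expr_control)
    lfc_ab = log2_fold_change(expr_ab, expr_control)
    return lfc_ab - (lfc_a + lfc_b)


# Hypothetical mean expression of one target gene in four cell groups.
independent = interaction_score(100.0, 50.0, 50.0, 25.0)
buffered = interaction_score(100.0, 50.0, 50.0, 50.0)
```

In a real single-cell screen the four expression values would be estimated from many cells per guide pair, and the score would need a statistical test against guide-level and cell-level noise, which this sketch leaves out.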
Recent GWAS have revealed that more than 90% of cancer-associated variants lie in the non-coding genome and have small effects. Despite the accumulating genome-wide sequencing datasets, interpreting the biological and pathological functions of non-coding variants remains a major challenge. This bottleneck limits our ability to translate genomic sequencing information into clinical applications. We believe the key lies in understanding how the millions of non-coding variants function as a whole through oncogenic regulatory networks. We are leveraging the generalized DNA/chromatin codes in oncogenic regulatory networks to design a novel and efficient strategy for studying the collective consequences of low-risk non-coding variants for cancer risk. Our ultimate goal is to develop epistasis-aware genomic risk prediction schemes for precision oncology. Our current focus is on using the large amount of data in GECCO to develop a personalized colorectal cancer risk prediction model for early detection in high-risk individuals.
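As an illustration of what "epistasis-aware" can mean in a risk score, the sketch below extends a standard additive polygenic score with pairwise interaction terms inside a logistic model. It is an assumption-laden toy and not the lab's model: the variant identifiers, effect sizes, and the function name `risk_probability` are all hypothetical, and genotypes are coded as the count of risk alleles (0, 1, or 2).

```python
import math


def risk_probability(genotypes: dict, additive: dict, pairwise: dict,
                     intercept: float = 0.0) -> float:
    """Logistic risk from additive effects plus pairwise epistatic terms.

    logit(p) = intercept
               + sum_i beta_i * g_i
               + sum_(i,j) gamma_ij * g_i * g_j

    Variants missing from `genotypes` are treated as carrying 0 risk alleles.
    """
    score = intercept
    # Additive part: the usual polygenic risk score.
    for variant, beta in additive.items():
        score += beta * genotypes.get(variant, 0)
    # Epistatic part: each pair contributes only when both variants are present.
    for (variant_1, variant_2), gamma in pairwise.items():
        score += gamma * genotypes.get(variant_1, 0) * genotypes.get(variant_2, 0)
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical individual and hypothetical effect sizes.
person = {"var1": 1, "var2": 2}
additive_effects = {"var1": 0.1, "var2": 0.2}
pairwise_effects = {("var1", "var2"): 0.05}

p_additive_only = risk_probability(person, additive_effects, {})
p_with_epistasis = risk_probability(person, additive_effects, pairwise_effects)
```

With millions of variants the number of possible pairs is far too large to fit directly, which is one reason to use regulatory network structure to decide which interactions to model.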
CRISPR systems enable us to target specific sequence patterns and perturb them. We have developed computational methods that mine the treasure trove of RNA virus genome sequences to identify CRISPR-targetable sequence patterns for the development of broadly targeted therapies. Deep convolutional neural networks have achieved great success in predicting cis-regulatory elements. Our current work focuses on deciphering the sequence codes, and their syntax rules, that govern the regulatory logic of oncogenic regulatory networks. These sequence codes will be used with diverse CRISPR systems for targeted cancer therapy.
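The search for broadly targetable patterns in viral genomes, mentioned earlier in this paragraph, can be illustrated with a very simple sketch: collect the subsequences of a fixed length (k-mers) that occur in every genome of a set, since a guide designed against such a conserved site could in principle target all of them. This is only an assumed simplification, not the published method. The sequences are made up, and real guide design would also account for mismatches, secondary structure, and off-targets in the host transcriptome.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def conserved_kmers(genomes: list, k: int) -> set:
    """Return the k-mers that are present in every genome (exact matches only)."""
    if not genomes:
        return set()
    shared = kmers(genomes[0], k)
    for genome in genomes[1:]:
        shared &= kmers(genome, k)  # keep only k-mers also seen in this genome
    return shared


# Hypothetical toy RNA sequences standing in for viral genomes.
toy_genomes = ["ACGUACGU", "GGACGUAA", "UACGUCC"]
candidates = conserved_kmers(toy_genomes, 4)
```

For realistic guide lengths (for example, 20 to 30 nucleotides) and thousands of genomes, exact conservation is usually too strict, so a practical version would rank sites by the fraction of genomes they cover rather than require presence in all of them.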