Single-cell RNA-sequencing (scRNA-seq) can be used to dissect transcriptomic heterogeneity that is masked in population-averaged measurements. We validated a fully integrated and robust droplet-based system that enables 3′ mRNA digital profiling of thousands of single cells in a highly multiplexed fashion. We demonstrate the clinical utility of our technology to characterize both immune cell subtypes and genotypes by integrating single-cell digital RNA profiling with de novo single nucleotide variant (SNV) calling.
To permit the measurement of spontaneous and induced nuclear and mitochondrial mutations, we developed the digital Random Mutation Capture assay (dRMC). The dRMC permits the analysis of millions of nucleotides, and can identify one mutant base pair among 10⁹ wild-type base pairs. In our approach, enrichment for mutant mtDNA with restriction endonucleases precedes single molecule amplification, effectively eliminating issues with polymerase fidelity.
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. We developed a method, termed CypherSeq, which combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection.
Multiple independent studies have documented that the presence and quantity of tumor-infiltrating lymphocytes (TILs) are strongly correlated with increased survival. However, because of methodological factors, the exact effect of TILs on prognosis has remained enigmatic, and inclusion of TILs in standard prognostic panels has been limited. To address this limitation, we introduced a robust digital DNA-based assay, termed QuanTILfy, to count TILs and assess T cell clonality in tissue samples, including tumors.
Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We validated a fully integrated and robust droplet-based system that enables 3′ mRNA digital profiling of thousands of single cells in a highly multiplexed fashion. Cell encapsulation of up to eight samples at a time takes place in ∼6 min, with ∼50% cell capture efficiency.
Single-cell RNA-sequencing (scRNA-seq) can be used to dissect transcriptomic heterogeneity that is masked in population-averaged measurements. scRNA-seq studies have led to the discovery of novel cell types and provided insights into regulatory networks during development. However, previously described scRNA-seq methods face practical challenges when scaling to tens of thousands of cells or when it is necessary to capture as many cells as possible from a limited sample. Commercially available, microfluidic-based approaches have limited throughput. Plate-based methods often require time-consuming fluorescence-activated cell sorting (FACS) into many plates that must be processed separately. Droplet-based techniques have enabled processing of tens of thousands of cells in a single experiment, but current approaches require generation of custom microfluidic devices and reagents.
To overcome these challenges, we developed a droplet-based system that enables 3′ messenger RNA (mRNA) digital counting of thousands of single cells (Figure 1). Approximately 50% of cells loaded into the system can be captured, and up to eight samples can be processed in parallel per run. Reverse transcription takes place inside each droplet, and barcoded complementary DNAs (cDNAs) are amplified in bulk. The resulting libraries then undergo Illumina short-read sequencing. An analysis pipeline, Cell Ranger, processes the sequencing data and enables automated cell clustering. Here we first demonstrated comparable sensitivity of the system to existing droplet-based methods by performing scRNA-seq on cell lines and synthetic RNAs. Next, we profiled 68k fresh peripheral blood mononuclear cells (PBMCs) (Figure 2) and demonstrated the scRNA-seq platform’s ability to dissect large immune populations. Last, we developed a computational method to distinguish donor from host cells in bone marrow transplant samples by genotype. We combined this method with clustering analysis to compare subpopulation changes in acute myeloid leukemia (AML) patients. This analysis enables transplant monitoring of the complex interplay between donor and host cells.
We demonstrated the scalability and robustness of the system through transcriptome analysis of ∼250k single cells across 29 samples. scRNA-seq of cell lines and synthetic RNAs showed the system’s comparable sensitivity to other droplet-based methods.
The GemCode technology platform enables high-throughput scRNA-seq with rapid cell encapsulation and a high cell capture rate that addresses the challenges associated with existing scRNA-seq platforms. Single gel beads are encapsulated into GEMs at ∼80% fill rate. This fill rate combined with Poisson loading of cells results in ∼50% cell capture rate, enabling the processing of samples with limited cell input material. We demonstrate the ability to load from 1,000 to 23,000 cells per well, from four different cell lines and two primary cell types (PBMCs and BMMCs), illustrating the applicability of the GemCode platform to a wide variety of cell types. The GEM-based encapsulation of single cells within the microfluidics platform reduces the need for expensive sorting equipment and complicated workflows involving large numbers of plates. The scalability and high-throughput nature of the GemCode platform are achieved in two ways: hundreds to thousands of cells can be encapsulated per channel, and each chip has eight channels. Therefore, a large number of cells can be processed within a very short period of time, minimizing the perturbation of the cellular transcriptome. In addition, multiple samples can be processed simultaneously, a key advantage for experimental setups that involve a time course or multiple treatments.
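The interaction between bead fill rate and Poisson cell loading can be illustrated with a small calculation. The sketch below is a deliberately simplified model, not the platform's actual performance model: it assumes the number of cells per GEM is Poisson-distributed with a hypothetical mean `cells_per_gem`, that bead presence is independent of cell presence with probability `bead_fill_rate` (0.8, from the text), and that no other losses occur. The function name and the returned keys are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events for a Poisson mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def loading_summary(cells_per_gem: float, bead_fill_rate: float = 0.8) -> dict:
    """Simplified loading statistics for one channel.

    cells_per_gem is a hypothetical mean occupancy (must be > 0);
    bead_fill_rate is the fraction of GEMs that contain a gel bead.
    """
    if cells_per_gem <= 0:
        raise ValueError("cells_per_gem must be positive")
    p_empty = poisson_pmf(0, cells_per_gem)
    p_single = poisson_pmf(1, cells_per_gem)
    p_multi = 1.0 - p_empty - p_single
    # Fraction of all loaded cells that sit alone in their GEM:
    # P(K = 1) / E[K], which equals exp(-lambda) for a Poisson distribution.
    frac_cells_alone = p_single / cells_per_gem
    return {
        "gems_with_one_cell": p_single,
        "gems_with_multiple_cells": p_multi,
        "multiplet_rate_among_occupied": p_multi / (1.0 - p_empty),
        # Cells that are alone AND share their GEM with a bead.
        "cells_captured_as_singlets": bead_fill_rate * frac_cells_alone,
    }


summary = loading_summary(cells_per_gem=0.1)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the bead fill rate sets a ceiling of 80% on capture, and co-encapsulation lowers it further as occupancy rises. The ∼50% figure quoted above is an empirical measurement; this sketch only shows how the two named factors bound it.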
Previous mutational assays able to identify rare random spontaneous mutations have ultimately been restricted to model systems. Although tissue culture and transgenic animal systems are powerful tools for identifying potential mutagens, they cannot accurately predict mutagenesis in humans. To permit the measurement of rare random mutation in human tissues, we developed the Random Mutation Capture (RMC) assay (Figure 1). The RMC assay is >100-fold more sensitive than previous methods that employ genomic selection, permits analysis of a large number of nucleotides, and can identify one mutant base pair among 10⁹ wild-type base pairs.
With the development of this technology, we were first able to provide the most convincing evidence to date for the existence of a mutator phenotype in human cancers, a hypothesis proposed more than 30 years earlier.
Although this assay was initially developed to study point mutation accumulation in the nuclear genome, we have since adapted it to resolve mitochondrial mutations and increased its resolution and throughput by “digitizing” the assay to more sensitively monitor base substitution and deletion mutations (Figure 1). This has allowed us to redefine the relationship among mitochondrial mutagenesis, cancer and aging.
For example, we recently demonstrated two surprising phenomena: 1) far fewer mitochondrial mutations arise in tumors than in normal healthy tissue, and 2) mitochondrial DNA exhibits mutagenic resistance to DNA-damaging agents.
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. We have developed a method, termed CypherSeq, that combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection.
CypherSeq (Figure 1) is designed to overcome the three main barriers to rare variant detection: (i) error correction, (ii) read depth and (iii) enrichment. CypherSeq employs double-stranded molecular barcoding to achieve high sensitivity base calling. Additionally, we exploit the circular nature of the plasmid-based sequencing library to enrich for specific targets using rolling circle amplification (RCA) based enrichment (Figure 2) to reduce off-target reads and maximize read depth. CypherSeq's combination of accuracy and enrichment will enable the full potential of personalized, sequencing-based clinical applications to be realized.
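The general logic of double-stranded barcode error correction can be sketched in a few lines. This is a minimal illustration of the idea, not the CypherSeq analysis pipeline: reads sharing a barcode are grouped by strand, each strand is collapsed to a majority-vote consensus, and a base is kept only when the two strand consensuses agree. The input format, the `top`/`bottom` strand labels, and the `min_reads` and `min_fraction` thresholds are all assumptions made for this example; reads are assumed to be equal in length and already oriented to the same strand.

```python
from collections import Counter, defaultdict


def strand_consensus(reads, min_reads=3, min_fraction=0.7):
    """Majority-vote consensus of reads from one strand of one molecule.

    Returns None when the family is too small; positions without a clear
    majority are masked as 'N'.
    """
    if len(reads) < min_reads:
        return None
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(tagged_reads, min_reads=3, min_fraction=0.7):
    """Collapse (barcode, strand, sequence) records to duplex consensus calls.

    strand must be 'top' or 'bottom'. Molecules lacking a usable consensus
    on either strand are dropped.
    """
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for barcode, strand, sequence in tagged_reads:
        families[barcode][strand].append(sequence)
    calls = {}
    for barcode, strands in families.items():
        top = strand_consensus(strands["top"], min_reads, min_fraction)
        bottom = strand_consensus(strands["bottom"], min_reads, min_fraction)
        if top is None or bottom is None:
            continue
        # Keep a base only if both strands independently support it.
        calls[barcode] = "".join(a if a == b else "N" for a, b in zip(top, bottom))
    return calls


example_reads = (
    [("BC1", "top", "ACGT")] * 3 + [("BC1", "top", "ACTT")]  # one read-level error
    + [("BC1", "bottom", "ACGT")] * 3
    + [("BC2", "top", "ACAT")] * 3                           # error on one strand only
    + [("BC2", "bottom", "ACGT")] * 3
    + [("BC3", "top", "ACGT")] * 3                           # no bottom-strand reads
)
result = duplex_consensus(example_reads)
print(result)
```

In this toy input, the isolated read error in `BC1` is outvoted within its strand family, the strand-specific change in `BC2` is masked because the complementary strand does not confirm it, and `BC3` is discarded for lack of a second strand.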
The number of reads produced by an NGS instrument is an important factor for rare variant detection. The coverage depth required at a site in order to detect a variant is inversely proportional to its frequency within a sample, requiring ever greater depth to detect rarer variants. For example, detecting a variant in ‘gene X’ present in 1 out of every 10⁵ genomes would require at least 10⁵-fold coverage of ‘gene X’. Obtaining 10⁵ reads is not difficult; however, with conventional approaches the rest of the genome, roughly 3 × 10⁹ bp, would also be sequenced at a depth of 10⁵, requiring 2.4 × 10¹² (2.4 trillion) 125 bp reads or the equivalent of 1200 HiSeq lanes, which is cost prohibitive. This problem is compounded when combined with error-correcting sequencing technologies which, owing to the need for redundant barcoded reads, reduce the number of unique reads produced. As there are practical constraints on the read yield available from current sequencing platforms, detection of extremely rare variants cannot be performed quantitatively for each site genome-wide and must be limited to specific genomic targets of interest. In order to ensure adequate read depth, target sequences must be enriched within the heterogeneous input sample to limit off-target sequence reads.
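The read-count arithmetic in this paragraph can be reproduced directly. The sketch below recomputes the whole-genome figure from the numbers given in the text and contrasts it with an enriched target; the 10 kb target size is a hypothetical value chosen only for illustration, and the helper name `reads_required` is likewise an assumption.

```python
def reads_required(region_bp: float, depth: float, read_length_bp: float) -> float:
    """Number of reads needed to cover region_bp at the given mean depth."""
    return region_bp * depth / read_length_bp


GENOME_BP = 3e9      # approximate genome size, from the text
DEPTH = 1e5          # depth needed for a 1-in-10^5 variant, from the text
READ_LENGTH_BP = 125 # read length, from the text
HISEQ_LANES = 1200   # lane equivalent quoted in the text

whole_genome_reads = reads_required(GENOME_BP, DEPTH, READ_LENGTH_BP)
implied_reads_per_lane = whole_genome_reads / HISEQ_LANES

# Hypothetical 10 kb enriched target (not a figure from the source).
TARGET_BP = 10_000
targeted_reads = reads_required(TARGET_BP, DEPTH, READ_LENGTH_BP)

print(f"whole genome: {whole_genome_reads:.2e} reads")
print(f"implied yield per lane: {implied_reads_per_lane:.2e} reads")
print(f"10 kb target: {targeted_reads:.2e} reads")
```

With these inputs the whole-genome requirement matches the 2.4 × 10¹² reads stated above, whereas the hypothetical 10 kb target would need several orders of magnitude fewer reads at the same depth, which is the motivation for enrichment.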
The CypherSeq methodology incorporates the error-correcting capabilities of double-stranded barcodes into a circular construct that carries all the components required for NGS. The sequencing construct is cloned into a bacterial plasmid, and thus permits the replication and storage of the barcoded CypherSeq vectors in bacteria, whereas its circular nature allows for enrichment and amplification of specific targets via RCA. The CypherSeq workflow is compatible across many NGS platforms including the Illumina, Ion Torrent, Pacific Bio, 454 and SMRT systems, and is also capable of large-scale multiplexing using conventional indexes.
We demonstrate that CypherSeq corrects errors inherent in NGS sequencing outputs, allowing detection of mutations down to a frequency of 2.4 × 10⁻⁷ per base pair. However, the sensitivity of the CypherSeq methodology is likely even greater, as double-stranded barcoding-based error correction can theoretically permit the resolution of mutation frequencies as low as 10⁻⁹–10⁻¹⁰ per nucleotide and depends upon the number of unique reads generated.
Translation of robust rare variant detection methods, such as CypherSeq, to the clinic has the potential to dramatically transform disease diagnostics, monitoring and prognostication. Circulating tumor DNA (ctDNA) and circulating tumor cells (CTCs) are detectable in the blood of most patients with advanced cancer and in a significant percentage of patients in the early stages of cancer. Early cancer diagnosis is currently the most promising approach to reducing mortality, as early detection is associated with more favorable prognosis for nearly all cancer types. Reliable detection of early-stage cancer, by quantifying ctDNA or CTCs marked by cancer-specific mutations, will require the most highly sensitive and specific rare variant detection assays to enable screening in a vast background of wild-type normal cells. By exploiting CypherSeq's highly sensitive error correction abilities and by targeting the enrichment step to a panel of genes known to be mutated in cancers, we expect CypherSeq will be able to achieve the sensitivity and specificity required for the early detection of disease.
The human cellular adaptive immune system identifies and destroys cells expressing aberrant proteins or protein fragments. The source of the abnormal protein fragments can include intracellular pathogenic infection, genomic mutations, or deregulation of gene expression. Cancerous cells often express such aberrant peptides, prompting a cellular adaptive immune response. These peptides are presented on the surface of cells by human leukocyte antigen molecules for binding by T cell receptors (TCRs) on the surface of T-lymphocytes, the primary mediators of the cellular adaptive immune response.
Tumor-infiltrating lymphocytes (TILs) have been shown to directly attack tumor cells in a variety of types of cancer, and multiple independent studies have demonstrated that the presence of TILs is strongly correlated with increased survival. For both colorectal and ovarian carcinoma patients, the presence or absence of TILs provides a strong prognostic marker for survival independent of current staging methods. However, existing assays and pathology tests to measure TILs are cumbersome, have inherent variability, are mostly restricted to research studies, and thus are not used for clinical decision-making.
As the importance of TILs gains appreciation, particularly given their potential utility for cancer prognostication and their role in immunotherapeutic response, new technologies to quantitatively measure TILs are needed. Fortunately, adaptive immune cells have a molecular signature that can be exploited for direct measurement. T cells have gene rearrangements in their TCR loci. The nucleotide sequences that encode the TCR regions are generated by somatic rearrangement of noncontiguous variable (V), diversity (D), and joining (J) region gene segments for the β chain, and V and J segments for the α chain. The existence of multiple V, D, and J gene segments in germline DNA permits substantial combinatorial diversity in receptor composition, and receptor diversity is further increased by the deletion of nucleotides adjacent to the recombination signal sequences (RSSs) of the V, D, and J segments, and template-independent insertion of nucleotides at the Vβ-Dβ, Dβ-Jβ, and Vα-Jα junctions.
We have developed QuanTILfy to measure the number of T-lymphocytes and assess clonality in a tissue using droplet digital polymerase chain reaction (ddPCR) technology. The massive sample partitioning is a key aspect of the ddPCR technique and a vital component of the QuanTILfy assay (Figure 1). ddPCR surpasses the performance of earlier techniques by introducing a scalable implementation of digital PCR, where the creation of tens of thousands of droplets allows for the generation of tens of thousands of data points, bringing the power of statistical analysis inherent to digital PCR into practical application.
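The statistical step that digital PCR relies on is the Poisson correction: if a fraction p of droplets is positive, the mean number of template copies per droplet is −ln(1 − p). The sketch below applies that standard correction to estimate a T cell fraction from droplet counts. It is an illustration under stated assumptions rather than the QuanTILfy analysis itself: the droplet counts are invented, the reference locus is assumed to be present at two copies per cell, each T cell is assumed to contribute one detectable rearranged TCR template, and both reactions are assumed to use equal input per droplet.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per droplet."""
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated wells cannot be quantified)")
    return -math.log(1.0 - positive / total)


def t_cell_fraction(tcr_positive: int, tcr_total: int,
                    ref_positive: int, ref_total: int,
                    ref_copies_per_cell: float = 2.0) -> float:
    """Estimated fraction of cells that are T cells.

    Assumes one rearranged TCR template per T cell and ref_copies_per_cell
    copies of the reference locus in every cell (both are assumptions).
    """
    tcr_copies = copies_per_droplet(tcr_positive, tcr_total)
    ref_copies = copies_per_droplet(ref_positive, ref_total)
    cells_per_droplet = ref_copies / ref_copies_per_cell
    return tcr_copies / cells_per_droplet


# Hypothetical droplet counts, for illustration only.
fraction = t_cell_fraction(tcr_positive=1_000, tcr_total=20_000,
                           ref_positive=10_000, ref_total=20_000)
print(f"estimated T cell fraction: {fraction:.3f}")
```

The correction matters most when many droplets are positive: at 50% positive droplets the mean occupancy is ln 2 ≈ 0.69 copies per droplet, not 0.5, because some droplets hold more than one template.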