Functional Genomics

What is functional genomics?

Functional genomics is the study of the function of genes contained within an organism's genome, or, put another way, an attempt to figure out what roles genes have in an organism. The earliest examples of functional genomics came in the form of "forward genetic" screens in model organisms such as bacteriophages, bacteria, budding yeast, fruit flies, and roundworms. Their genomes could be mutated at random (e.g., with chemical mutagens) to identify genetic loci (i.e., genes) required for developmental, physiological, or molecular processes of interest (e.g., viral particle formation, cell growth, eye formation, reproductive cycle, etc.). These types of studies allowed researchers to infer a gene's function by mutating it: an induced mutation in a particular gene causes a specific, measurable change. Importantly, this inference could be made because the screens and follow-up experiments were performed in a "wild type" or reference organism. This allowed researchers to introduce only the single genetic change of interest into the reference genome. Since nothing else changes during the experiment, only the DNA mutation of interest could be responsible for the phenotypic change. This became a very powerful tool to assign functions to genes and, in effect, create functional genomic portraits of these model organisms.

However, these same techniques were difficult to carry out in mammals due to multiple experimental barriers, largely arising from the nature of the mammalian genome. First, the genome is diploid: at least two copies of each gene are present. Mutating or altering one gene copy still leaves the other one intact, foiling traditional mutagenesis approaches. Most model organisms allow for a "haploid" phase whereby mutations in single gene copies are "uncovered". Second, classic gene mutagenesis techniques are further hampered by the fact that less than 2% of a mammalian genome is protein-coding sequence (compared to ~70% for yeast), which makes random mutagenesis inefficient. Third, for human cells, there is no definable "wild type". This is because individual genomes naturally vary from one another by >10 million bases (usually occurring as single nucleotide polymorphisms). So no two humans or human cell lines are exactly the same (even monozygotic [identical] twins, which should be genetically identical, have small numbers of mutations that arise during development). This raises a question: if we mutate or inhibit a gene in one human cell isolate, will it have the same effect in another? (reviewed in Paddison and Hannon, Cancer Cell 2002)

The lack of expeditious gene manipulation techniques with which to perform functional genomic studies in mammals has hampered basic mammalian biological research and human disease research for decades. Fortunately, two powerful homology-based gene targeting technologies have come along that have helped overcome these barriers, revolutionizing functional genomics in mammals. These are RNAi and CRISPR-Cas9.

The RNAi pathway in mammals (as it pertains to gene silencing)

RNA interference or RNAi

RNAi emerged out of the pioneering work of Fire, Mello, and colleagues (1998) in the nematode Caenorhabditis elegans. Attempting to use antisense RNA to knock down gene expression, they found synergistic effects on gene silencing when antisense and sense RNA strands were delivered together as double-stranded RNA (dsRNA). While at first RNAi seemed a peculiarity of nematodes, the core machinery that underlies RNAi is conserved in virtually every experimental eukaryotic system and has been co-opted in most of them to trigger gene silencing. At least three core components of the RNAi pathway appear to be generally required for dsRNA-dependent silencing phenomena in higher eukaryotes: the Drosha, Dicer, and Argonaute (Ago) gene family members. Drosha and Dicer proteins sit atop the RNAi pathway in the first catalytic steps that convert various forms of dsRNA into smaller, guide dsRNAs of 21–25 nt. Ago proteins incorporate these small dsRNAs and use their sequence as a guide to identify and target homologous mRNAs for silencing. Some Ago proteins have nuclease activities that can cut or "slice" mRNA targets, triggering their destruction (reviewed in Paddison, 2008).

Uncovering and characterizing many of the components and biochemical determinants of RNAi in invertebrate systems has helped translate RNAi into a genetic tool in mammals that works by inhibiting mRNA translation. Today, we generally use two types of RNAi triggers to inhibit gene function in mammals: small interfering RNAs (siRNAs) and short hairpin RNAs (shRNAs). SiRNAs are generally chemically synthesized RNA duplexes that contain 21 nt of identity to a homologous mRNA target, 19 nt of dsRNA, and a 2-nt 3′ overhang. ShRNAs are RNA duplexes of 23–29 nt whose two strands are joined by a loop structure; they are expressed from DNA-based plasmids or viral vectors. The Paddison Lab routinely performs siRNA and shRNA functional genomic screens in numerous cell types, including human and mouse stem and progenitor cells. As a graduate student, Dr. Paddison helped design some of the first RNAi libraries targeting the human genome (Paddison et al., Nature 2004).
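
The siRNA geometry described above (a 19-nt duplex with 2-nt 3′ overhangs on 21-nt strands) can be illustrated with a short Python sketch. This is only a toy, not the lab's design procedure: the function names are hypothetical, the fixed "TT" (dTdT) overhang is an assumed common convention, and no real design rules (GC content, off-target filtering, strand bias) are applied. Because the overhang is fixed, only the 19-nt core is guaranteed to match the target.

```python
# Toy illustration of siRNA strand geometry; names and defaults are assumptions.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return seq.translate(COMPLEMENT)[::-1]


def design_sirna(mrna, start, core_len=19, overhang="TT"):
    """Build sense and antisense strands for a window of the target mRNA.

    The sense strand copies the 19-nt core of the mRNA; the antisense
    (guide) strand is its reverse complement. Each strand gets a 2-nt
    3' overhang, so the two strands pair over the 19-nt core only.
    """
    mrna = mrna.upper().replace("T", "U")  # accept DNA-style input
    core = mrna[start:start + core_len]
    if len(core) != core_len:
        raise ValueError("target window runs past the end of the mRNA")
    sense = core + overhang
    antisense = reverse_complement_rna(core) + overhang
    return sense, antisense


if __name__ == "__main__":
    # Made-up 21-nt sequence, used only to show the strand layout.
    sense, antisense = design_sirna("AUGGCAUUGACCGGAUCUUAA", start=1)
    print("sense     5'-" + sense + "-3'")
    print("antisense 5'-" + antisense + "-3'")
```

Both strands come out 21 nt long, and the first 19 nt of the antisense strand pair with the first 19 nt of the sense strand, which leaves the two overhang bases unpaired at each 3′ end.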


In bacteria, the CRISPR-Cas (Clustered, Regularly Interspaced, Short Palindromic Repeats (CRISPR)–CRISPR-associated (Cas)) pathway acts as an adaptive immune system, conferring resistance to genetic parasites and bacteriophage. Similar to the RNAi pathway in eukaryotes, the CRISPR-Cas system utilizes a guide RNA (a single guide RNA, or sgRNA, in engineered systems) that is incorporated into a protein effector nuclease (e.g., Cas9) to target exogenous genomic sequences. Unlike RNAi, however, CRISPR-Cas systems are able to target and degrade DNA. This property has been harnessed for sgRNA-directed genome editing in prokaryotes and now eukaryotes. Specifically, the type II CRISPR-Cas system from Streptococcus pyogenes has been shown to elicit robust RNA-guided gene editing in multiple eukaryotic systems, including mammalian cells (Gasiunas et al., PNAS 2012; Jinek et al., Science 2012; Wiedenheft et al., Nature 2012; Mali et al., Science 2013; Cho et al., Nature Biotech. 2013).

In its simplest form for gene editing studies, the CRISPR-Cas system consists of two components, an sgRNA and a Cas protein (e.g., Cas9 from S. pyogenes). The sgRNA is a chimeric guide RNA composed of a ~20-nt ‘protospacer’ sequence, which is used for target recognition, and a structural RNA required for sgRNA-Cas9 complex formation (i.e., tracrRNA). In addition, DNA cleavage is "licensed" by an appropriate protospacer adjacent motif (PAM) at the 3' end of the protospacer sequence in the targeted gene. For the type II S. pyogenes system, this sequence is "NGG", where N is any nucleotide. PAMs allow self versus non-self recognition in bacteria, and without an appropriate PAM the Cas9-sgRNA complex fails to cut target DNA. The Cas9 protein from S. pyogenes has two catalytic nuclease domains (HNH and RuvC-like) that generate a blunt-ended, double-stranded break 3 bp upstream of the PAM. The HNH domain cleaves the strand of DNA complementary to the sgRNA, while the RuvC-like domain cleaves the non-complementary strand (reviewed by Jinek et al., Science 2012; Wiedenheft et al., Nature 2012; Mali et al., Nature Methods 2013).
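
The targeting rules above (a ~20-nt protospacer immediately 5' of an "NGG" PAM, with a blunt cut 3 bp upstream of the PAM) can be sketched in Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, only the strand given as input is scanned (a real search would also scan the reverse complement), and no scoring of on-target or off-target activity is attempted.

```python
import re


def find_spcas9_sites(seq, protospacer_len=20):
    """List candidate S. pyogenes Cas9 sites on the given strand.

    Returns tuples of (protospacer, pam, cut_index), where cut_index is
    the 0-based position of the first base 3' of the blunt cut, i.e. the
    cut falls 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAMs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < protospacer_len:
            continue  # not enough sequence 5' of the PAM for a protospacer
        protospacer = seq[pam_start - protospacer_len:pam_start]
        pam = seq[pam_start:pam_start + 3]
        cut_index = pam_start - 3
        sites.append((protospacer, pam, cut_index))
    return sites


if __name__ == "__main__":
    # Made-up sequence: a 20-nt protospacer, an AGG PAM, and 3 trailing bases.
    example = "GACGTTACCGATCAGTCAAT" + "AGG" + "TTT"
    for protospacer, pam, cut_index in find_spcas9_sites(example):
        print(protospacer, pam, cut_index)
```

For the made-up sequence, the single reported site has its cut between the 17th and 18th bases of the protospacer, leaving 3 bp between the break and the PAM.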

The use of the two-component CRISPR-Cas system in mammalian cells generally involves the expression of a codon-optimized Cas9 gene with a nuclear localization sequence and expression of an sgRNA from an RNA polymerase III promoter. When the two are expressed together in a mammalian cell, Cas9 promotes gene editing by triggering a DNA double-stranded break (DSB), which is repaired by either the error-prone non-homologous end joining (NHEJ) pathway or the higher fidelity homology-directed repair (HDR) pathway. While HDR can be error-free, it requires the presence of a homologous repair template, such as a sister chromatid or homology arms from an insertion construct, and it appears to be active only in dividing cells, where its efficiency can vary widely. By contrast, the NHEJ pathway repairs DSBs throughout the cell cycle in the absence of repair templates through DSB trimming, processing, and re-ligation. NHEJ leaves repair scars in the form of small insertion/deletion (indel) mutations. Thus, in the absence of a repair template, the vast majority of Cas9-directed dsDNA cleavage events lead to indel formation at the target site, which, when it occurs in an exon, can cause frameshifts and premature stop codons in the target gene (reviewed by Jinek et al., Science 2012; Wiedenheft et al., Nature 2012; Mali et al., Nature Methods 2013).
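
Why an indel in an exon so often disrupts a gene comes down to arithmetic on the reading frame: a net length change that is not a multiple of 3 shifts every downstream codon. The Python sketch below shows this on a made-up coding sequence. The function names and the toy sequence are assumptions for illustration only, and a frameshift in a real gene will not always produce a stop codon this quickly.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(net_length_change):
    """True if an indel's net length change is not a multiple of 3."""
    return net_length_change % 3 != 0


def apply_deletion(cds, start, length):
    """Remove `length` bases from the coding sequence, beginning at `start`."""
    return cds[:start] + cds[start + length:]


def first_stop_codon(cds):
    """Return the index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


if __name__ == "__main__":
    # Made-up coding sequence: ATG GCA TTG ACC GGA TCT TAA
    wild_type = "ATGGCATTGACCGGATCTTAA"
    one_bp_del = apply_deletion(wild_type, start=3, length=1)    # frameshift
    three_bp_del = apply_deletion(wild_type, start=3, length=3)  # in-frame

    print("wild type stop at", first_stop_codon(wild_type))
    print("1-bp deletion stop at", first_stop_codon(one_bp_del))
    print("3-bp deletion stop at", first_stop_codon(three_bp_del))
```

In this toy case the 1-bp deletion brings a TGA into frame at the third codon, while the 3-bp deletion removes one codon and leaves the original stop in frame.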

My interest in functional genomics arose from working with model genetic organisms like bacteriophage T4 and budding yeast as an undergrad and also as a technician in Lee Hartwell's lab in the late 1990s. During my time in his lab, Lee Hartwell was fascinated by the notion of synthetic lethality. He thought this could be applied to cancer because cancer is a disease of genomic alterations. Synthetic lethality occurs when a cell or organism can tolerate the loss of gene A or gene B but not the loss of gene A and gene B together. This usually means that gene A can compensate for gene B function and vice versa. Because cancer cells are riddled with genetic alterations, it is possible that some of these alterations will result in the loss of this type of redundancy. In other words, gene A goes missing and now gene B's loss cannot be tolerated. For cancer therapeutic targets, this represents a potentially ideal scenario because normal cells can live without gene B, while cancer cells cannot. This remains a long-term focus of the field of cancer therapeutics and also of my lab.

Validation of CRISPR-Cas9-based gene targeting in human GSCs and NSCs

(A) Cartoon of lentiviral construct used for sgRNA:Cas9 expression.

(B) sgEGFP:Cas9 was used to target stably expressed H2B-EGFP in GSCs and NSCs. Cells were first infected with LV-EGFP-H2B, and then infected with sgControl or sgEGFP at MOI<1, selected, outgrown for 14 days, and flow analyzed. At day 5 post-selection, for EGFP+sgEGFP NSC-CB660s, we noted 19.5% of cells still positive for GFP, while by D12, this number was reduced to <1%, suggesting that peak suppression probably occurs around D10 for a single, mono-allelic genomic target.

(C) Western blot confirmation of TP53 protein expression after targeting TP53 gene with sgRNA:Cas9 in NSC-U5s. Cells were outgrown for >21 days following selection. Doxorubicin treatment (0.75 μg/ml for 6 hours) was used to stabilize TP53 in response to DNA damage.

(D) CRISPR-Cas9-based targeting of an essential gene, MCM2. Cells were infected with sgRNAs and seeded 3 days post-selection for a 10-day culture in triplicate. Cell viability was then measured using alamarBlue reagent. *p<0.01, Student's t-test (unpaired, unequal variance).
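
An unpaired t-test with unequal variance, as named in panel (D), is often called Welch's t-test. The Python sketch below computes only the t statistic and the Welch-Satterthwaite degrees of freedom using the standard library. The function name is hypothetical and the numbers are made up, not data from the figure. Turning the statistic into a p-value needs a t-distribution, which a statistics package would supply.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's unpaired t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Per-group sample variance divided by group size (squared standard errors).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up triplicate readings, for illustration only.
    group_a = [1.0, 2.0, 3.0]
    group_b = [4.0, 5.0, 6.0]
    t, df = welch_t(group_a, group_b)
    print(f"t = {t:.4f}, df = {df:.2f}")
```

Unlike the pooled-variance test, this version does not assume the two groups share a variance, which is why the degrees of freedom are estimated rather than fixed at n_a + n_b - 2.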