Nucleosome dynamics behind the replication fork
Nucleosome disruption during transcription
Dynamic maintenance of nucleosome-depleted regions by SWI-SNF remodelers
Pioneer factor binding in vivo
Deregulation of nucleosome dynamics in cancer
Nucleosomes in giant viruses
Understanding centromere evolution
In recent years, we have developed new chromatin profiling tools both to enable our continuing studies of chromatin dynamics and to lower the bar to entry into the increasingly high-tech field of epigenomics. My lab has pioneered enzyme-tethering strategies beginning with DamID1 in 2000, re-emerging with ChEC-seq2 in 2015, followed by CUT&RUN3 in 2017, CUT&Tag4 in 2019, CUTAC5 in 2020 and CUT&Tag2for16 and MulTI-Tag7 in 2021, including adaptation of these technologies to full automation8,9 and scalable single-cell chromatin profiling10. Our technological advances have led us to a deeper understanding of the competition between transcription factors and nucleosomes behind the replication fork11,12, of nucleosomal unwrapping intermediates driven by RNA Polymerase II (RNAP2)13, of transcription factor “pioneering”14, of nucleosome depletion by SWI/SNF remodelers15 and RNAP216, and of chromatin deregulation in cancer8, 17, 18, 19, 20. In the meantime, our CUT&RUN/Tag technologies have already been adopted by several hundred laboratories and have fueled development of several commercial products and kits. My laboratory's outreach activities include our COVID-responsive CUT&Tag@home project21 chosen by The Scientist magazine as one of the top technical advances of 202022.
Work in the lab continues to be sharply focused on interaction dynamics of chromatin components, “where the rubber hits the road”. Our technology development efforts remain focused on better ways of addressing dynamics at the interfaces between DNA and the proteins and protein complexes involved in these fundamental genetic processes, with the expectation that our methods will continue to be adopted by researchers working on diverse biological problems23. Recently we have also developed RT&Tag to profile RNA bound to chromatin epitopes24.
ChEC-seq -> CUT&RUN -> CUT&Tag -> CUTAC -> CUT&Tag2for1 and MulTI-Tag
ChEC-seq: The dominant technology for mapping protein-DNA interactions has been Chromatin Immunoprecipitation (ChIP) since the mid-1980s25, with successive advances in read-out platforms including PCR, microarrays and ‘Next-generation’ DNA sequencing. ChIP begins with solubilization of chromatin, typically by sonication of cross-linked cells, followed by immune precipitation with an antibody (Figure 1a). In contrast, enzyme tethering methods are performed in intact cells or nuclei. For example, to perform Laemmli’s ChEC (Chromatin Endogenous Cleavage) method26, targeted chromatin proteins are fused to Micrococcal Nuclease (MNase), which is activated in situ by addition of Ca++ to permeabilized cells, taking advantage of the base-pair resolution possible with MNase. In 2015 we adapted ChEC for genome-wide application (ChEC-seq)2. ChEC-seq has since gained popularity in the yeast chromatin field, where strains are available with epitope tags on chromatin proteins and transcription factors (TFs). ChEC-seq projects moved to the laboratory of former postdoc Gabriel Zentner, Assistant Professor at Indiana University27.
CUT&RUN: The success of ChEC-seq encouraged us to adapt Laemmli’s ChIC (Chromatin Immuno-Cleavage) method26. ChIC uses a protocol similar to that of ChEC, except that instead of tethering MNase via a fusion to the target protein, the enzyme is purified as a fusion with Protein A, which binds avidly to most antibodies (Figure 1b)3. In addition, our simple workflow using magnetic beads makes the method suitable for full automation. Post-doc Peter Skene, who helped develop CUT&RUN, left the lab in 2017 and is currently Director of Molecular Biology and Biochemistry at the Allen Institute of Immunology.
CUT&Tag: Post-doc Hatice Kaya-Okur modified the CUT&RUN protocol to utilize a Protein A-Tn5 (pA-Tn5) fusion protein, where Tn5 is the cut-and-paste transposase used in Illumina’s Nextera system and in ATAC-seq28 (Figure 1c). CUT&Tag (Cleavage Under Targets & Tagmentation)4 eliminates the library preparation step, while providing even lower backgrounds than CUT&RUN. Our outreach efforts over the past two years have helped to gain rapid acceptance for CUT&Tag in both academia and industry. We later simplified CUT&Tag29 so that all steps from nuclei to sequencing-ready libraries are performed in single PCR tubes in a day or on a general-purpose robot8. In 2019, Hatice Kaya-Okur joined the Altius Institute for Biomedical Sciences as an Altius Scholar.
Figure 2: a) CUTAC tethers Tn5 to either H3K4me25 or RNA Pol2 Ser5 phosphate16, where the active site of Pol2 is ~130 bp from accessible “open” chromatin sites genome-wide. b) CUT&Tag2for1 deconvolves CUTAC signals from Pol2S5p and H3K27me3 mixtures based on fragment size and density. c) MulTI-Tag sequentially binds and tagments barcoded-antibody/Tn57.
CUTAC: ChEC-seq, CUT&RUN and CUT&Tag were expressly intended to replace ChIP-seq; however, our novel chromatin accessibility profiling method (CUTAC, for Cleavage Under Targeted Accessible Chromatin) began with a serendipitous observation during my CUT&Tag@home project. When using an antibody to H3K4me2, which marks nucleosomes flanking both promoters and enhancers, and performing the Mg++-catalyzed tagmentation step in low salt, I noticed that antibody-tethered Tn5 integrated into accessible chromatin sites genome-wide5. Chromatin accessibility mapping using CUTAC works with antibodies to H3K4me2 and H3K4me3, but not to any other tested histone modification, and with antibodies to RNA Polymerase II Serine-5 phosphate (Pol2S5p)16 (Figure 2a). The distribution of CUTAC peaks closely resembles that of ATAC-seq peaks, with especially high signal-to-noise attributable to tethering of Tn5 to nearby epitopes, and Pol2S5p-CUTAC provides the best accessible chromatin data quality. The close correspondence between Pol2S5p-CUTAC and chromatin accessibility mapping implies that accessibility is coupled to Pol2 pausing, and that promoters and enhancers share the same basic chromatin configuration30.
CUT&Tag2for1: Technological progress in single-cell read-out technologies has fueled interest in “Multi-OMICs”, where two different modalities, such as RNA-seq and ATAC-seq, are performed in the same cells. However, multimodal methods require complicated workflows and deconvolution methods to take advantage of multiple modalities for cell-state identification. Pol2S5p-CUTAC releases mostly subnucleosome-sized fragments corresponding to peaks of TF binding at promoters and enhancers, whereas H3K27me3-marked nucleosomes are in broad domains. We have taken advantage of these differences to use two antibodies for CUT&Tag simultaneously and then use a Bayesian deconvolution strategy to computationally separate active regulatory sites from developmentally silenced Polycomb domains based on fragment size and feature width (Figure 2b). The high efficiency of CUTAC using these two antibodies in a mixture has allowed us to profile the active and repressive regulomes in the same single cells6.
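The idea of separating the two signals by fragment size and feature width can be illustrated with a toy two-class Bayes calculation. This is only a sketch under stated assumptions, not the published CUT&Tag2for1 deconvolution: the class names, the Gaussian likelihoods on log-scaled values, and every numeric parameter in `CLASSES` are hypothetical choices made for illustration.

```python
import math

# Hypothetical class parameters (assumptions, not values from the study):
# (mean, sd) of log10 fragment length and of log10 feature width, in bp.
CLASSES = {
    "active": {
        "prior": 0.5,
        "length": (math.log10(80), 0.15),     # mostly subnucleosome-sized fragments
        "width": (math.log10(500), 0.30),     # narrow peaks
    },
    "polycomb": {
        "prior": 0.5,
        "length": (math.log10(170), 0.10),    # nucleosome-sized fragments
        "width": (math.log10(20000), 0.50),   # broad domains
    },
}


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def posterior(fragment_len_bp, feature_width_bp):
    """Posterior probability of each class for one fragment, via Bayes' rule."""
    log_joint = {}
    for name, params in CLASSES.items():
        log_joint[name] = (
            math.log(params["prior"])
            + gaussian_logpdf(math.log10(fragment_len_bp), *params["length"])
            + gaussian_logpdf(math.log10(feature_width_bp), *params["width"])
        )
    # Normalize in log space to avoid underflow.
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    return {name: math.exp(v - top) / total for name, v in log_joint.items()}


if __name__ == "__main__":
    print(posterior(70, 400))       # short fragment in a narrow feature
    print(posterior(180, 30000))    # nucleosome-sized fragment in a broad domain
```

In this toy model a short fragment in a narrow feature is assigned almost entirely to the "active" class, and a nucleosome-sized fragment in a broad domain to the "polycomb" class; a real analysis would estimate the class parameters from the data rather than fix them by hand.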
MulTI-Tag: Post-doc Michael Meers has developed Multiple Targets Identified via Tagmentation (MulTI-Tag), a CUT&Tag-based approach that uses identifying barcodes to profile multiple chromatin-associated proteins in the same individual cells7 (Figure 2c). MulTI-Tag is as efficient as single-antibody CUT&Tag both in bulk and in single cells and represents a landmark advance in single cell chromatin profiling. Mike has taken the MulTI-Tag project with him to continue his pioneer factor project14 (described below) at his own lab at Washington University in St. Louis, while we will continue using CUTAC and CUT&Tag2for1 in our research.
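The barcode-based readout can be pictured as a simple demultiplexing step in which each read is assigned to a target according to the barcode it carries. The sketch below is hypothetical throughout: the barcode sequences, their length and position at the start of the read, the target names in `BARCODE_TO_TARGET`, and the `(cell_id, sequence)` read format are assumptions for illustration, and the actual MulTI-Tag barcode design and analysis pipeline are not described here.

```python
from collections import Counter

# Hypothetical barcode-to-target table; real barcodes and targets would come
# from the experimental design, not from this sketch.
BARCODE_TO_TARGET = {
    "ACGT": "H3K27me3",
    "TGCA": "H3K4me2",
    "GATC": "Pol2S5p",
}


def assign_targets(reads, barcode_len=4):
    """Count reads per (cell, target) from (cell_id, sequence) pairs.

    The barcode is assumed to be the first `barcode_len` bases of each read;
    reads with an unknown barcode are counted as "unassigned".
    """
    counts = Counter()
    for cell_id, sequence in reads:
        target = BARCODE_TO_TARGET.get(sequence[:barcode_len], "unassigned")
        counts[(cell_id, target)] += 1
    return counts


if __name__ == "__main__":
    example_reads = [
        ("cell1", "ACGTAAAA"),
        ("cell1", "TGCATTTT"),
        ("cell1", "ACGTCCCC"),
        ("cell2", "NNNNAAAA"),
    ]
    for key, n in sorted(assign_targets(example_reads).items()):
        print(key, n)
```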
RT&Tag: In addition to proteins and DNA, chromatin contains substantial amounts of RNA, both as nascent transcripts and as RNAs associated with chromatin. Post-doc Nadiya Khyzha developed Reverse Transcribe and Tagment (RT&Tag) by adapting CUT&Tag to profile RNAs in proximity to chromatin epitopes for which a primary antibody exists31. RT&Tag uses a secondary antibody conjugated to streptavidin with a biotinylated oligo-dT fused to an adapter and protein A-Tn5 loaded with a second adapter, then reverse transcribes transcripts and tagments RNA-cDNA hybrids simultaneously. She detected the Drosophila roX2 RNA associated with the male dosage compensation complex and low-expressing developmental RNAs associated with Polycomb complex (H3K27me3) domains, and found that N6-methyladenine-modified mRNAs are correlated with paused RNAP2.
Figure 3: Metabolic labeling with EdU followed by ‘click’ chemistry to attach biotin, MNase digestion, streptavidin pulldown and MINCE-seq library preparation maps newly replicated chromatin12.
Every nucleosome across the genome must be disrupted and reformed when the replication fork passes, but how chromatin organization is re-established following replication was unknown. To address this problem, post-doc Srinivas Ramachandran developed a metabolic labeling method using 5-Ethynyl-2'-deoxyuridine (EdU) uptake followed by MNase-seq and ‘click’ chemistry to characterize the genome-wide location of nucleosomes and other chromatin proteins behind replication forks at high temporal and spatial resolution12 (Figure 3). We found that the characteristic chromatin landscape at Drosophila promoters and enhancers is lost upon replication. The most conspicuous changes are at promoters that have high levels of RNAP2 stalling and DNA accessibility and show specific enrichment for the BAF (Brahma-associated factor) remodeler complex. Enhancer chromatin is also disrupted during replication, suggesting a role for TF competition in nucleosome re-establishment. Thus, the characteristic nucleosome landscape emerges from a uniformly packaged genome by the action of TFs, RNAP2, and remodelers minutes after replication fork passage.
Figure 4: Transcription produces asymmetrically unwrapped nucleosomal intermediates13.
Nucleosomes are disrupted during transcription, but the structural intermediates during nucleosome disruption in vivo had been unknown. To identify transcriptional intermediates, Srinivas Ramachandran mapped subnucleosomal protections in Drosophila cells using MNase-seq and CUT&RUN. At the first nucleosome position downstream of the transcription start site, we identified unwrapped intermediates, including hexasomes that lack either proximal or distal contacts13. Inhibiting topoisomerases or depleting histone chaperones increased unwrapping, whereas inhibiting release of paused RNAP2 or reducing RNAP2 elongation decreased unwrapping (Figure 4). Our results indicated that positive torsion generated by elongating RNAP2 causes transient loss of histone-DNA contacts. Using this “structural epigenomics” approach, we found that nucleosomes flanking human CTCF insulation sites are similarly disrupted.
We also identified diagnostic subnucleosomal particle remnants in cell-free human DNA data as a relic of transcribed genes from apoptosing cells. Thus identification of subnucleosomal fragments from nuclease protection data represents a general strategy for structural epigenomics. Cell-free DNA and structural epigenomics projects have moved to the lab of former post-doc Srinivas Ramachandran, who is currently Assistant Professor at U. Colorado HSC.
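A first step in this kind of analysis is to sort nuclease-protected fragments by length. The sketch below bins fragment lengths into size classes; the bin edges and labels are illustrative assumptions rather than thresholds from the studies cited above, and real analyses would also use fragment positions relative to nucleosome dyads.

```python
from bisect import bisect_right
from collections import Counter

# Illustrative bin edges in bp (assumptions, not published thresholds).
BIN_EDGES = [80, 125, 170]
BIN_LABELS = [
    "small (<80 bp)",
    "subnucleosomal (80-124 bp)",
    "nucleosomal (125-169 bp)",
    "larger (>=170 bp)",
]


def classify_fragment(length_bp):
    """Return the size-class label for one fragment length."""
    return BIN_LABELS[bisect_right(BIN_EDGES, length_bp)]


def size_profile(lengths):
    """Fraction of fragments falling in each size class."""
    counts = Counter(classify_fragment(n) for n in lengths)
    total = sum(counts.values())
    return {label: counts[label] / total for label in BIN_LABELS}


if __name__ == "__main__":
    fragment_lengths = [60, 95, 110, 147, 150, 165, 200]
    for label, fraction in size_profile(fragment_lengths).items():
        print(f"{label}: {fraction:.2f}")
```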
Figure 5: Both H3K4me2 and Pol2Ser5p CUTAC robustly correspond to ENCODE ATAC-seq sites relative to the best available ATAC-seq data (Omni-ATAC).
Our discovery that RNAP2S5p-CUTAC maps chromatin accessibility provides direct evidence that paused RNAP2 is engaged immediately adjacent to ATAC-seq and DNase-seq peaks at enhancers and promoters genome-wide (Figure 5). CUTAC replaces the “open chromatin” metaphor for gene regulatory elements based on unrelated enzymatic and physical accessibility assays with a rigorous definition based on the well-established role of Pol2 pausing in gene regulation5,16.
Figure 6: Model for chromatin dynamics at yeast promoters15.
The classic view of nucleosome organization at active promoters is that two well-positioned nucleosomes flank a nucleosome-depleted region (NDR). However, this view has been disputed by contradictory reports as to whether wider (>150 bp) NDRs instead contain unstable, micrococcal nuclease-sensitive (‘fragile’) nucleosomal particles. To determine the composition of fragile particles, post-doc Sandipan Brahma applied CUT&RUN.ChIP, in which targeted nuclease cleavage and release is followed by chromatin immunoprecipitation. He found that fragile particles represent the occupancy of the RSC (Remodeling the Structure of Chromatin) complex, a member of the SWI-SNF remodeler family, and RSC-bound, partially unwrapped nucleosomal intermediates. Sandipan also found that general regulatory factors (GRFs) bind to partially unwrapped nucleosomes at these promoters. We proposed that RSC binding and its action cause nucleosomes to unravel, facilitate subsequent binding of GRFs, and constitute a dynamic cycle of nucleosome deposition and clearance at the subset of wide Pol2 promoter NDRs (Figure 6)15.
In mouse embryonic stem cells, Sandipan found that the SWI-SNF family remodeler Brahma-Associated Factor (BAF) is stabilized by paused RNAP2, which enhances nucleosome eviction by BAF32. BAF and RNAP2 probe both active and Polycomb-repressed chromatin, and transient site exposure due to BAF-mediated nucleosome unwrapping allows TFs to bind and promote cell-type-specific chromatin accessibility. Remodeler-specific projects will move to the lab of Sandipan Brahma, currently a K99/R00 post-doctoral fellow.
Figure 7: Using CUT&RUN to directly test genome-wide nucleosome binding predicted by the pioneer factor hypothesis14. Top: Experimental scheme; Bottom: During differentiation 29% of FoxA2 sites show pioneering, but only a handful of sites for the CTCF control.
Although the in vitro structural and in vivo spatial characteristics of TF binding are well defined, TF interactions with chromatin and other companion TFs during development were poorly understood. To analyze such interactions in vivo, post-doc Michael Meers used CUT&RUN to profile several TFs across a time course of human embryonic stem cell differentiation, and studied their interactions with nucleosomes and co-occurring TFs by Enhanced Chromatin Occupancy (EChO), a computational strategy for classifying TF interactions with chromatin (Figure 7)14. EChO showed that multiple individual TFs can employ either direct DNA binding or “pioneer” nucleosome binding at different enhancer targets. Nucleosome binding is not exclusively confined to inaccessible chromatin, but rather is correlated with local binding of other TFs, and with degeneracy at key bases in the pioneer factor target motif responsible for direct DNA binding. Our strategy revealed a dynamic exchange of TFs at enhancers across developmental time that is aided by pioneer nucleosome binding. Pioneer factor projects have moved to the lab of Michael Meers at Washington University in St. Louis.
Figure 8: a) A Drosophila retina development model for H3.3K27M-driven pediatric glioblastoma shows inhibition (red cells) behind the morphogenetic furrow (yellow arrow), but not when the cell cycle is inhibited by p2117. b) Based on our Drosophila and glioma cell line evidence we explain the differences between replication-independent (RI, H3.3) and replication coupled (RC, H3.2) inhibition of PRC2 in terms of the different histone deposition pathways for these two histone variants.
In 2002, then-post-doc Kami Ahmad discovered that the three histone fold domain amino acids that distinguish the conserved histone variant, H3.3, from canonical H3 (H3.1/H3.2 in humans) specify replication-independent (H3.3) versus replication-coupled (H3.1/3.2) nucleosome assembly33. His Drosophila cytological study using GFP-labeled histones also revealed that H3.3 incorporated genome-wide at active chromatin, including the active but not the inactive rDNA loci. Subsequent work from many groups built on Kami’s findings by molecular characterization of dedicated chaperones and other features of the two pathways34. In 2012, the first “oncohistones” were discovered in pediatric diffuse midline gliomas (DMGs) characterized by lysine 27-to-methionine (K27M) mutations in either H3.3 or H3.135,36. These oncohistone mutations dominantly inhibit histone H3K27 trimethylation and silencing, but it was unknown how oncohistone type affected gliomagenesis. Again using Drosophila as a model, Kami, now a Principal Investigator in the adjacent laboratory, demonstrated that inhibition of H3K27 trimethylation occurs only when H3K27M oncohistones are deposited into chromatin and only when expressed in cycling cells (Figure 8a). Using CUT&RUN on human DMG cell lines, post-doc Jay Sarthy showed that the genomic distributions of H3.1 and H3.3 oncohistones in human patient-derived DMG cells are consistent with the DNA replication-coupled deposition of histone H3.1 and the predominant replication-independent deposition of histone H3.3. Although H3K27 trimethylation is reduced for both oncohistone types, H3.3K27M-bearing cells retain some domains, and only H3.1K27M-bearing cells lack H3K27 trimethylation. We proposed that oncohistones inhibit the H3K27 methyltransferase as chromatin patterns are being duplicated in proliferating cells, predisposing them to tumorigenesis (Figure 8b).
In a follow-up study, Kami Ahmad used H3K27M inhibition of PRC2 in fly imaginal discs to show that simultaneous misexpression of a master regulatory TF and H3K27M results in an overexpression phenotype reminiscent of oncogenesis37.
In other collaborative work, Jay Sarthy found that the testes-specific histone variant, H2A.B, is significantly over-expressed in a variety of tumors, including about half of all diffuse large B-cell lymphomas18. These first examples of “ready-made” oncohistones, which are potentially oncogenic without a mutation, wrap less DNA than canonical H2A and destabilize nucleosomes in vivo38. Oncohistone cancer projects have now moved to the lab of Jay Sarthy, a pediatric oncologist at the Ben Towne Center for Childhood Cancer Research, Seattle Children’s Hospital.
Anthracyclines are widely prescribed anti-cancer drugs that disrupt chromatin by intercalating into DNA and enhancing nucleosome turnover39. Postdoc Matt Wooten observed that treatment with the anthracycline aclarubicin leads to elevated levels of elongating RNA polymerase II and changes in chromatin accessibility in Drosophila S2 cells40. He found that closely spaced divergent promoter pairs show greater chromatin changes when compared to codirectionally-oriented tandem promoters, and that aclarubicin treatment changes the distribution of non-canonical DNA G-quadruplex structures both at promoters and at G-rich pericentromeric repeats. Matt’s work suggests that aclarubicin’s effects on nucleosome disruption, RNA polymerase II, chromatin accessibility and DNA structures underlie its anti-cancer activity.
Nucleosome cores comprised of octamers containing two molecules each of the histones H2A, H2B, H3, and H4 arranged in dimers of H2A with H2B and H3 with H4 are characteristic of eukaryotes, and contrast with histones in archaea that can assemble an indeterminate number of homodimers or heterodimers into a slinky-like “hypernucleosome”41. The N-terminal tails of eukaryotic histones that are post-translationally modified and are important for transcriptional regulation are generally lacking in archaeal histones. In 2009 a full set of divergent histones was found in the giant virus Marseillevirus42, with histones arranged in fused doublets, designated Hβ-Hα and Hδ-Hγ, which are specifically related to H2B and H2A, and to H4 and H3, respectively. A third doublet, Hζ-Hε, appears to be a more divergent homolog of H2B-H2A. All three doublet histones are found in the viral capsid, with abundant Hβ-Hα and Hδ-Hγ, and smaller amounts of Hζ-Hε. These histones appear to have diverged from proto-eukaryotic counterparts prior to the divergence of modern eukaryotes43.
In collaboration with the lab of Karim-Jean Armache at the Skirball Institute, NYU, we44 and others45 found that Hβ-Hα and Hδ-Hγ can form nucleosomes from two molecules of each doublet that wrap 121 bp of DNA in vitro, and are very similar to eukaryotic nucleosomes. Technician Terri Bryson permeabilized Marseillevirus capsids to digest viral chromatin in virio with MNase or Methidiumpropyl-EDTA Fe(II) (MPE) and extract the DNA for MNase-seq or MPE-seq46. We found that the viral genome in the capsid is tightly packed with abutting nucleosomes that wrap 121 bp, lack linkers, and show no phasing of nucleosomes over genes, unlike the characteristic “beads on a string” of eukaryotic nucleosomes and linkers. Similarly divergent viral histone doublets, and even triplets and quadruplets, are found in many giant viruses47, and evoke a phase in the evolution of histones in which viruses evolved histone doublets for tight genome packaging in the virion, perhaps prior to the origin of the eukaryotic nucleus and the evolution of post-translational modifications on histones to regulate nuclear gene transcription.
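The consequence of linker-free packing can be made concrete with simple arithmetic: the number of nucleosomes that fit on a genome is its length divided by the nucleosome repeat length (wrapped DNA plus linker). In the sketch below, only the 121 bp wrap with no linker comes from the text above; the genome length, the 147 bp eukaryotic core size, and the 50 bp linker are round illustrative assumptions.

```python
def nucleosome_count(genome_bp, wrapped_bp, linker_bp=0):
    """Number of whole nucleosome repeats that fit on a genome of given length."""
    repeat_bp = wrapped_bp + linker_bp
    return genome_bp // repeat_bp


if __name__ == "__main__":
    genome_bp = 360_000  # hypothetical round genome length, for illustration only

    # Abutting viral nucleosomes: 121 bp wrapped, no linker (from the text).
    viral = nucleosome_count(genome_bp, wrapped_bp=121)

    # "Beads on a string" comparison: 147 bp core plus an assumed 50 bp linker.
    beads_on_string = nucleosome_count(genome_bp, wrapped_bp=147, linker_bp=50)

    print(f"abutting 121-bp nucleosomes: {viral}")
    print(f"147-bp cores with 50-bp linkers: {beads_on_string}")
```

Under these assumptions the abutting arrangement fits roughly 60% more nucleosomes on the same length of DNA, which is one way to see why linker-free packing suits a space-constrained capsid.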
Figure 9: Two non-exclusive models for how non-B DNA specifies centromeres throughout the eukaryotic kingdom57.
In a series of four publications from 2015-2018, post-doc Jitendra Thakur developed a research program based on application of our chromatin profiling advances to centromeric chromatin. We first applied our native MNase ChIP-seq method48 to human centromeric (CENP-A) nucleosomes, and showed that specific dimeric α-satellite units shared by multiple individuals dominate functional CENP-bound human centromeres49, recently confirmed by the Telomere-to-Telomere Consortium50. We also used native MNase ChIP-seq to delineate the centromeric complexes at fission yeast ‘regional’ centromeres51 and human ‘satellite’ centromeres52. Our evidence suggested that fission yeast centromeres are wrapped by CENP-A nucleosomes and CENP-T nucleosome-like particles in a dispersed non-sequence-specific manner51. In contrast, human CENP-A and CENP-T orthologs are part of a coherent CENP-A/CENP-B/CENP-C/CENP-T (“CCAN”) complex at α-satellite dimers that comprise the fundamental unit of centromeric chromatin52. Salt-fractionation applied to native MNase ChIP-seq and CUT&RUN revealed that the occupancy of the CCAN complex is highly variable, even for α-satellite dimers that are targeted by the sequence-specific CENP-B protein53. Centromere projects begun by Jitendra are being continued in her own lab at Emory University where she is currently an Assistant Professor. Jitendra’s lab will also continue the CUT&RUN.RNase54,55,56 projects that she developed as a postdoc.
Meanwhile, MSTP student Siva Kasinathan discovered that there are two determinants of centromere sequence specificity: CENP-B and predicted short non-B DNA foldback elements. Siva found that non-B DNA is a characteristic of centromeres and neocentromeres throughout the eukaryotic kingdom in organisms that lack CENP-B57 (Figure 9). Siva’s evidence that non-B DNA is a general property of centromeres challenged the dogma of “epigenetic” centromeres while also rekindling our interest in what former post-docs Harmit Malik, Kami Ahmad and I had termed the “centromere paradox”: stable inheritance despite rapid evolution of DNA and centromere proteins. At the time we explained the centromere paradox by invoking centromere drive during female meiosis, which has gained general acceptance over the past 20 years58,59,60. However, the molecular basis for rapid evolution of centromere satellites remains speculative. For instance, using native MNase ChIP-seq we found that the functional centromeres of D. simulans include complex satellite families that are entirely absent from the genome of a sibling species, Drosophila melanogaster61. New light on the molecular mechanism of satellite DNA evolution derives from evidence that break-induced repair (BIR) replication underlies centromere drive62,63.