Recent Research

Research Areas in the Henikoff Lab

Profiling of FFPE samples by CUTAC
Dynamic maintenance of nucleosome-depleted regions by SWI-SNF remodelers
Automated profiling of KMT2A oncogenic fusion proteins
Understanding centromere evolution
Nucleosome dynamics behind the replication fork
Nucleosome disruption during transcription
Pioneer factor binding in vivo
Deregulation of nucleosome dynamics in cancer
Nucleosomes in giant viruses

In recent years, we have developed new chromatin profiling tools both to enable our continuing studies of chromatin dynamics and to lower the bar to entry into the increasingly high-tech field of epigenomics. My lab has pioneered enzyme-tethering strategies, beginning with DamID¹ in 2000, re-emerging with ChEC-seq² in 2015, followed by CUT&RUN³ in 2017, CUT&Tag⁴ in 2019, CUTAC⁵ in 2020, and CUT&Tag2for1⁶ and MulTI-Tag⁷ in 2021, including adaptation of these technologies to full automation^{8,9} and scalable single-cell chromatin profiling¹⁰. Our technological advances have led us to a deeper understanding of the competition between transcription factors and nucleosomes behind the replication fork^{11,12}, of nucleosomal unwrapping intermediates driven by RNA Polymerase II (RNAP2)¹³, of transcription factor “pioneering”¹⁴, of nucleosome depletion by SWI/SNF remodelers¹⁵ and RNAP2^{16,17}, and of chromatin deregulation in cancer^{8,18,19,20,21,22,23}. In the meantime, our CUT&RUN/Tag technologies have been adopted by several hundred laboratories and have fueled development of several commercial products and kits. My laboratory's outreach activities include our COVID-responsive CUT&Tag@home project²⁴, chosen by The Scientist magazine as one of the top technical advances of 2020²⁵.

Work in the lab continues to be sharply focused on interaction dynamics of chromatin components, “where the rubber hits the road”. Our technology development efforts remain focused on better ways of addressing dynamics at the interfaces between DNA and the proteins and protein complexes involved in these fundamental genetic processes, with the expectation that our methods will continue to be adopted by researchers working on diverse biological problems²⁶. Recently, postdoc Nadiya Khyzha developed RT&Tag to profile RNA bound to chromatin epitopes²⁷, and graduate student James Anderson coupled fluorescence-activated cell sorting with CUT&Tag (FACS-CUT&Tag) to investigate chromatin maturation in Drosophila spermatocytes²⁸.

ChEC-seq -> CUT&RUN -> CUT&Tag -> CUTAC -> CUT&Tag2for1 and MulTI-Tag

Figure 1: Differences between ChIP-seq, CUT&RUN and CUT&Tag^{3,4}.

ChEC-seq: The dominant technology for mapping protein-DNA interactions has been Chromatin Immunoprecipitation (ChIP) since the mid-1980s²⁹, with successive advances in read-out platforms including PCR, microarrays and ‘Next-generation’ DNA sequencing. ChIP begins with solubilization of chromatin, typically by sonication of cross-linked cells, followed by immunoprecipitation with an antibody (Figure 1a). In contrast, enzyme-tethering methods are performed in intact cells or nuclei. For example, in Laemmli’s ChEC (Chromatin Endogenous Cleavage) method³⁰, targeted chromatin proteins are fused to Micrococcal Nuclease (MNase), which is activated in situ by addition of Ca⁺⁺ to permeabilized cells, taking advantage of the base-pair resolution possible with MNase. In 2015 we adapted ChEC for genome-wide application (ChEC-seq)². ChEC-seq has since gained popularity in the yeast chromatin field, where strains are available with epitope tags on chromatin proteins and transcription factors (TFs). ChEC-seq projects moved to the laboratory of former postdoc Gabriel Zentner, Assistant Professor at Indiana University³¹.

CUT&RUN: The success of ChEC-seq encouraged us to adapt Laemmli’s ChIC (Chromatin Immuno-Cleavage) method³⁰. ChIC uses a protocol similar to that of ChEC, except that instead of tethering MNase via a fusion to the target protein, the enzyme is purified as a fusion with Protein A, which binds avidly to most antibodies (Figure 1b³). In CUT&RUN (Cleavage Under Targets and Release Using Nuclease), antibody-bound Protein A-MNase is activated in situ, and the cleaved fragments diffuse out of the nuclei for sequencing. In addition, our simple workflow using magnetic beads makes the method suitable for full automation. Post-doc Peter Skene, who helped develop CUT&RUN, left the lab in 2017 and is currently Director of Molecular Biology and Biochemistry at the Allen Institute for Immunology.

CUT&Tag: Post-doc Hatice Kaya-Okur modified the CUT&RUN protocol to use a Protein A-Tn5 (pA-Tn5) fusion protein, where Tn5 is the cut-and-paste transposase used in Illumina’s Nextera system and in ATAC-seq³² (Figure 1c). Because tethered Tn5 inserts sequencing adapters directly into target chromatin, CUT&Tag (Cleavage Under Targets & Tagmentation)⁴ eliminates the separate library preparation step, while providing even lower backgrounds than CUT&RUN. Our outreach efforts over the past two years have helped CUT&Tag gain rapid acceptance in both academia and industry. We later simplified CUT&Tag³³ so that all steps from nuclei to sequencing-ready libraries are performed in single PCR tubes in a day or on a general-purpose robot⁸. In 2019, Hatice Kaya-Okur joined the Altius Institute for Biomedical Sciences as an Altius Scholar.

Figure 2: a) CUTAC tethers Tn5 to either H3K4me2⁵ or RNA Pol2 Ser5 phosphate¹⁶, where the active site of Pol2 is ~130 bp from accessible “open” chromatin sites genome-wide. b) CUT&Tag2for1 deconvolves CUTAC signals from Pol2S5p and H3K27me3 mixtures based on fragment size and density. c) MulTI-Tag sequentially binds and tagments barcoded-antibody/Tn5⁷.

CUTAC: ChEC-seq, CUT&RUN and CUT&Tag were expressly intended to replace ChIP-seq; however, our novel chromatin accessibility profiling method (CUTAC, for Cleavage Under Targeted Accessible Chromatin) began with a serendipitous observation during my CUT&Tag@home project. When using an antibody to H3K4me2, which marks nucleosomes flanking both promoters and enhancers, and performing the Mg⁺⁺-catalyzed tagmentation step in low salt, I noticed that antibody-tethered Tn5 integrated into accessible chromatin sites genome-wide⁵. CUTAC maps chromatin accessibility when Tn5 is tethered to H3K4me2, H3K4me3 or RNA Polymerase II Serine-5 phosphate (Pol2S5p)¹⁶, but not to any other histone modification tested (Figure 2a). The distribution of CUTAC peaks closely resembles that of ATAC-seq peaks, with especially high signal-to-noise attributable to tethering of Tn5 to nearby epitopes, and Pol2S5p-CUTAC provides the best accessible-chromatin data quality. The close correspondence between Pol2S5p-CUTAC and chromatin accessibility implies that accessibility is coupled to Pol2 pausing, and that promoters and enhancers share the same basic chromatin configuration³⁴.

CUT&Tag2for1: Technological progress in single-cell read-out technologies has fueled interest in “Multi-OMICs”, in which two different modalities, such as RNA-seq and ATAC-seq, are profiled in the same cells. However, multimodal methods require complicated workflows and deconvolution methods to take advantage of multiple modalities for cell-state identification. Pol2S5p-CUTAC releases mostly subnucleosome-sized fragments corresponding to peaks of TF binding at promoters and enhancers, whereas H3K27me3-marked nucleosomes lie in broad domains. We have taken advantage of these differences by using two antibodies for CUT&Tag simultaneously and then applying a Bayesian deconvolution strategy to computationally separate active regulatory sites from developmentally silenced Polycomb domains based on fragment size and feature width (Figure 2b). The high efficiency of CUTAC using these two antibodies in a mixture has allowed us to profile the active and repressive regulomes in the same single cells⁶.
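The fragment-size half of this deconvolution can be illustrated with a toy Bayes classifier. The sketch below is not the published CUT&Tag2for1 model⁶, which also uses feature width; it assumes, hypothetically, that log fragment lengths from each antibody follow a normal distribution with the made-up means, spreads and priors shown, and labels each fragment by its posterior probability.

```python
import math

# Hypothetical fragment-length models (assumptions, not fitted values):
# each source is modeled as a normal distribution over log(fragment length).
COMPONENTS = {
    "Pol2S5p": (math.log(80.0), 0.35),    # mostly subnucleosome-sized fragments
    "H3K27me3": (math.log(160.0), 0.25),  # mostly nucleosome-sized fragments
}
PRIORS = {"Pol2S5p": 0.5, "H3K27me3": 0.5}  # hypothetical mixing proportions


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def posterior(fragment_length: int) -> dict:
    """Posterior probability of each source given one fragment length (Bayes' rule)."""
    x = math.log(fragment_length)
    joint = {name: PRIORS[name] * normal_pdf(x, mu, sd)
             for name, (mu, sd) in COMPONENTS.items()}
    total = sum(joint.values())
    return {name: p / total for name, p in joint.items()}


def assign(fragment_lengths, min_posterior=0.9):
    """Label each fragment with its most probable source, or 'ambiguous'."""
    labels = []
    for length in fragment_lengths:
        post = posterior(length)
        best = max(post, key=post.get)
        labels.append(best if post[best] >= min_posterior else "ambiguous")
    return labels


print(assign([60, 115, 180]))  # ['Pol2S5p', 'ambiguous', 'H3K27me3']
```

Fragments whose length is compatible with both sources (here around 115 bp) stay "ambiguous", which is why the published strategy adds a second feature (domain width) rather than relying on size alone.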

MulTI-Tag: Post-doc Michael Meers has developed Multiple Targets Identified via Tagmentation (MulTI-Tag), a CUT&Tag-based approach that uses identifying barcodes to profile multiple chromatin-associated proteins in the same individual cells⁷ (Figure 2c). MulTI-Tag is as efficient as single-antibody CUT&Tag both in bulk and in single cells and represents a landmark advance in single cell chromatin profiling. Mike has taken the MulTI-Tag project with him to continue his pioneer factor project¹⁴ (described below) at his own lab at Washington University in St. Louis, while we will continue using CUTAC and CUT&Tag2for1 in our research.
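Assigning reads to their antibody of origin by barcode is the core bookkeeping step of any barcoded multi-target method. The sketch below assumes, hypothetically, that each read begins with an 8-nt antibody barcode and allows one mismatch; the barcode sequences and read layout are invented for illustration and are not the MulTI-Tag design⁷.

```python
# Hypothetical 8-nt barcodes; real MulTI-Tag barcodes and read layout differ.
BARCODES = {
    "ACGTACGT": "H3K27me3",
    "TGCATGCA": "H3K4me2",
    "GATCGATC": "Pol2S5p",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_target(read: str, barcodes=BARCODES, max_mismatch: int = 1):
    """Return the antibody target whose barcode matches the read prefix, else None."""
    hits = [target for bc, target in barcodes.items()
            if len(read) >= len(bc) and hamming(read[:len(bc)], bc) <= max_mismatch]
    # Reads matching zero barcodes, or more than one, are left unassigned.
    return hits[0] if len(hits) == 1 else None


for read in ["ACGTACGTTTAGGC", "TGCATGCCAATTGC", "AAAAAAAAGGCC"]:
    print(read, assign_target(read))
```

Because these hypothetical barcodes differ from one another at every position, a single sequencing error cannot make a read match two barcodes; real barcode sets are chosen with a minimum pairwise distance for the same reason.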

RT&Tag: In addition to proteins and DNA, chromatin contains substantial amounts of RNA, both as nascent transcripts and as RNAs associated with chromatin. Post-doc Nadiya Khyzha developed Reverse Transcribe and Tagment (RT&Tag) by adapting CUT&Tag to profile RNAs in proximity to chromatin epitopes for which a primary antibody exists²⁷. In RT&Tag, a streptavidin-conjugated secondary antibody carries a biotinylated oligo-dT primer fused to one adapter, and protein A-Tn5 is loaded with a second adapter; transcripts are then reverse transcribed and the resulting RNA-cDNA hybrids are tagmented simultaneously. She detected the Drosophila roX2 RNA associated with the male dosage compensation complex and lowly expressed developmental RNAs associated with Polycomb-complex (H3K27me3) domains, and found that N6-methyladenosine (m6A)-modified mRNAs are correlated with paused RNAP2.

Profiling of FFPE samples by CUTAC

One important area that has been challenging for chromatin profiling is formalin-fixed paraffin-embedded (FFPE) samples, which have been used for more than a century to preserve biological specimens. While profiling of regulatory elements in tumors is particularly attractive to cancer researchers, solubilization of heavily cross-linked samples has required strong ionic detergents, proteases, sonication and/or micrococcal nuclease digestion, resulting in long, complicated modifications to existing chromatin profiling protocols. We found that simple modifications to our CUTAC protocol, either in single tubes or directly on slides (Figure 3), produced high-quality maps of paused RNAP2 and H3K27ac at promoters and enhancers in FFPE samples from three types of mouse brain tumors and normal brains²². FFPE-CUTAC profiles could distinguish the three tumor types from each other and from normal brain, even within the same FFPE sample. More than 90% of the differences between tumor and normal signal at candidate cis-regulatory elements (cCREs) were increases in RNAP2 or H3K27ac, indicating hypertranscription. FFPE-CUTAC signal at cCREs correlated well with high-quality RNA-seq from the same mouse brain tumors, and could additionally detect excised microRNAs that are missed by poly(A)-dependent RNA-seq. The degree of hypertranscription measured by FFPE-CUTAC is tumor-specific in both human and mouse tumors, even between tumors from genetically identical mice, but is not specific to a particular type of regulatory element (promoter, proximal enhancer, distal enhancer, etc.) within a tumor²³. Hypertranscription measured at the histone locus directly reflects elevated cell proliferation. Unsupervised peak calling by our SEACR algorithm³⁵ identified all 100 top-ranked cCREs in at least one tumor, and the majority of hypertranscribed regulatory elements are hypertranscribed in multiple tumor types²³.
Fifty-five of the 100 top-ranked hypertranscribed cCREs lie in a 5 Mb region of chromosome 17 near the ERBB2 (HER2) gene at 17q12, likely reflecting independent amplification of HER2 in breast and colon cancers. FFPE-CUTAC should be readily adaptable for cancer screening and other applications.
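The “more than 90% increases” statistic amounts to tallying the direction of tumor-versus-normal changes at each cCRE. A minimal sketch, assuming per-cCRE fragment counts as input and hypothetical thresholds; it scales each sample by its total mapped fragments, because normalizing only to reads inside cCREs would hide a global increase such as hypertranscription. The published analysis²³ may normalize differently.

```python
import math


def hypertranscription_summary(tumor, normal, tumor_total, normal_total,
                               min_abs_log2fc=1.0, pseudocount=1.0):
    """Count cCREs that go up or down in tumor vs normal.

    tumor, normal: dicts mapping cCRE id -> fragment count (hypothetical input).
    tumor_total, normal_total: total mapped fragments per sample, used as the
    scaling factor so that a global increase in signal is not normalized away.
    Returns (n_up, n_down, fraction of changed cCREs that went up).
    """
    up = down = 0
    for cre in sorted(tumor.keys() & normal.keys()):
        t = 1e6 * tumor[cre] / tumor_total    # counts per million mapped fragments
        n = 1e6 * normal[cre] / normal_total
        lfc = math.log2((t + pseudocount) / (n + pseudocount))
        if lfc >= min_abs_log2fc:
            up += 1
        elif lfc <= -min_abs_log2fc:
            down += 1
    changed = up + down
    fraction_up = up / changed if changed else float("nan")
    return up, down, fraction_up


tumor = {"cCRE_a": 400, "cCRE_b": 300, "cCRE_c": 50, "cCRE_d": 100}
normal = {"cCRE_a": 100, "cCRE_b": 100, "cCRE_c": 50, "cCRE_d": 400}
print(hypertranscription_summary(tumor, normal, 1_000_000, 1_000_000))
```

With these invented counts, two cCREs increase, one decreases and one is unchanged, so two thirds of the changes are increases.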

Figure 4: Model of RNAPII, BAF and DNA-sequence-specific TFs working synergistically for productive chromatin remodeling and nucleosome eviction. (a) RNAPII and BAF dynamically engage chromatin in an abortive manner and require chromatin binding by DNA-sequence-specific TFs for productive chromatin remodeling and histone eviction to form/maintain an NDR. (b) The dynamic cycle can start at any step: RNAPII loading at nucleosome-depleted regions (step 1) and transcription initiation (step 2), BAF binding to nucleosomes (step 3), or TFs binding to nucleosomes that are partially unwrapped due to spontaneous thermal fluctuations in histone–DNA interactions or to BAF binding and remodeling (step 4).

Dynamic maintenance of nucleosome-depleted regions by SWI-SNF remodelers

The classic view of nucleosome organization at active promoters is that two well-positioned nucleosomes flank a nucleosome-depleted region (NDR). However, contradictory reports suggested that wider (>150 bp) NDRs instead contain unstable, micrococcal nuclease-sensitive (‘fragile’) nucleosomal particles. Post-doc Sandipan Brahma applied CUT&RUN.ChIP, in which targeted nuclease cleavage and release is followed by chromatin immunoprecipitation, and found that fragile particles represent occupancy by the RSC (Remodeling the Structure of Chromatin) complex, a member of the SWI-SNF remodeler family, and RSC-bound, partially unwrapped nucleosomal intermediates. Sandipan also found that general regulatory factors (GRFs) bind to partially unwrapped nucleosomes at these promoters. We proposed that RSC binding and action cause nucleosomes to unravel, facilitate subsequent binding of GRFs, and constitute a dynamic cycle of nucleosome deposition and clearance at a subset of wide Pol2 promoter NDRs¹⁵.

In mouse embryonic stem cells, Sandipan found that the SWI-SNF family remodeler Brahma-Associated Factor (BAF) is stabilized by paused RNAP2, which enhances nucleosome eviction by BAF¹⁷. BAF and RNAP2 probe both active and Polycomb-repressed chromatin, and transient site exposure due to BAF-mediated nucleosome unwrapping allows TFs to bind and promote cell-type-specific chromatin accessibility (Figure 4). This work also led to a model of how transient enhancer-promoter interactions facilitate transcription, in which BAF is handed off from enhancers to nucleosomes flanking promoters to release paused RNAP2 into a transcriptional burst³⁶. Remodeler-specific projects have moved to the lab of Sandipan Brahma, currently Assistant Professor in the Department of Genetics, Cell Biology and Anatomy at the University of Nebraska Medical Center.

Figure 5: Model of lineage switching mediated by KMT2A-fusion oncoprotein expression levels. At high oncoprotein expression levels the B cell lymphocytic leukemia regulatory network is favored, but at low levels a default myeloid regulatory network takes over.

Automated profiling of KMT2A oncogenic fusion proteins

Automation of our CUT&RUN and CUT&Tag methods was developed by postdoc Derek Janssens^{8,9}. Derek applied Auto CUT&RUN and Auto CUT&Tag to profiling myeloid, lymphoid and mixed-phenotype leukemias that result from translocations involving the lysine methyltransferase KMT2A (also known as mixed-lineage leukemia 1, or MLL1), which normally methylates H3K4 via its C-terminal SET domain. Oncogenic translocations fuse the N-terminal portion, which contains the DNA-binding domain, to a variety of other chromatin-regulatory proteins³⁷. Derek used antibodies to the N- and C-terminal portions of KMT2A to distinguish binding profiles of the wild-type and fusion proteins, where enrichment of the N-terminus over the C-terminus indicated fusion-protein binding, and identified chromatin features characteristic of the different tumor types⁹. A subset of sites carried bivalent (H3K4me3 and H3K27me3) marks, and these showed heterogeneity among single cells of the same tumor, suggesting that chromatin dynamics may underlie the heterogeneity of mixed-lineage tumors. Derek found that the most common fusions (to AF4, AF9, ENL and AF10) co-localize with the DOT1L and ENL proteins over gene bodies, suggesting that elongation complexes recruit the fusions into gene bodies. AF9 fusions are particularly sensitive to DOT1L inhibitors, whereas fusions whose bound gene bodies are enriched for H3K4me3 and RNAP2 are susceptible to menin inhibitors, suggesting that tumor profiling may help guide therapeutic choices.
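The N-versus-C-terminus logic can be expressed as a per-site log ratio. In this sketch the inputs are assumed to be comparably normalized signals for the same sites, and the cutoffs (a two-fold excess and a minimum N-terminal signal) and site names are hypothetical, not values from ref. 9.

```python
import math


def fusion_sites(n_term, c_term, min_log2_ratio=1.0, min_n_signal=10.0, pseudocount=1.0):
    """Return (site, log2 N/C) for sites where N-terminal KMT2A signal exceeds C-terminal.

    n_term, c_term: dicts mapping site id -> normalized signal (hypothetical input).
    Sites bound by wild-type KMT2A should show comparable N- and C-terminal signal,
    whereas a fusion protein retains only the N-terminal portion.
    """
    flagged = []
    for site in sorted(n_term.keys() & c_term.keys()):
        n, c = n_term[site], c_term[site]
        if n < min_n_signal:  # too little N-terminal signal to judge
            continue
        log2_ratio = math.log2((n + pseudocount) / (c + pseudocount))
        if log2_ratio >= min_log2_ratio:
            flagged.append((site, round(log2_ratio, 2)))
    return flagged


n_signal = {"site1": 200, "site2": 150, "site3": 40, "site4": 5}
c_signal = {"site1": 30, "site2": 140, "site3": 35, "site4": 0}
print(fusion_sites(n_signal, c_signal))  # [('site1', 2.7)]
```

The minimum-signal filter keeps low-coverage sites (like the invented `site4`) from being called as fusion targets purely because the C-terminal antibody happened to return no reads.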

We profiled the binding of KMT2A fusion proteins in 34 tumors that spanned the diversity of fusion proteins and leukemia types, and found considerable heterogeneity both in binding sites and in levels of expression³⁸. Much of the variance was attributable to the fusion protein, but high levels of expression were instructive for a lineage pathway, whereas low-level expression was non-instructive. We profiled a patient sample that underwent lineage switching from a B-cell acute lymphoid leukemia to an acute myeloid leukemia in response to chemotherapy, and found that reduced oncoprotein levels allowed the oncoprotein to direct a non-canonical myeloid program (Figure 5). The same program was activated in two samples from patients who relapsed soon after an initial response to Revumenib, which targets menin, suggesting that this pathway may be a mechanism of epigenetic resistance to treatment. Derek will begin his assistant professor position at Van Andel Research Institute in Grand Rapids, Michigan, in June 2024.

Figure 6: Model of Break-Induced Replication (BIR). Replication fork stalling results in a one-sided double-strand break. RAD52 mediates annealing of the resected end and D-loop formation. PIF1 helicase facilitates D-loop migration with DNA synthesis. The second strand is synthesized conservatively later. In tandem arrays, re-annealing can occur out-of-register, resulting in expansions or contractions.

Understanding centromere evolution

Post-doc Jitendra Thakur applied our native MNase ChIP-seq method³⁹ to human centromeric (CENP-A) nucleosomes and showed that specific dimeric α-satellite units dominate functional CENP-A-bound human centromeres⁴⁰, a finding recently confirmed by the Telomere-to-Telomere Consortium⁴¹. We also used native MNase ChIP-seq to delineate the centromeric complexes at human ‘satellite’ centromeres⁴². Our evidence suggested that human CENP-A and CENP-T orthologs are part of a coherent CENP-A/CENP-B/CENP-C/CENP-T (“CCAN”) complex at α-satellite dimers that comprise the fundamental unit of centromeric chromatin. The occupancy of the CCAN complex is highly variable, even for α-satellite dimers that are targeted by the sequence-specific CENP-B protein⁴³. Centromere projects begun by Jitendra are being continued in her own lab at Emory University, where she is currently an Assistant Professor. Jitendra’s lab will also continue the CUT&RUN.RNase^{44,45,46} projects that she developed as a postdoc.

Meanwhile, Medical Scientist Training Program student Siva Kasinathan discovered that there are two determinants of centromere sequence specificity: CENP-B and predicted short non-B DNA foldback elements. Siva found that non-B DNA is a characteristic of centromeres and neocentromeres across eukaryotes in organisms that lack CENP-B⁴⁷ (Figure 9). Siva’s evidence that non-B DNA is a general property of centromeres challenged the dogma of purely “epigenetic” centromeres and has been confirmed by others^{48,49,50}. Such non-B-form DNA can cause replication fork uncoupling and breakage⁵¹. This offers potential insight into what former post-docs Harmit Malik, Kami Ahmad and I had termed the “centromere paradox”: stable inheritance despite rapid evolution of centromeric DNA and centromere proteins⁵². We explained the centromere paradox by invoking centromere drive, the preferential segregation of a favored centromere during female meiosis in animals and seed plants, which has gained general acceptance over the past 20 years^{53,54,55}. However, the molecular basis for rapid evolution of centromere satellites remains speculative. For instance, using native MNase ChIP-seq we found that the functional centromeres of Drosophila simulans include complex satellite families that are entirely absent from the genome of its sibling species D. melanogaster⁵⁶. New light on the molecular mechanism of rapid satellite DNA evolution comes from evidence that break-induced replication (BIR, Figure 6) underlies centromere expansion^{57,58}. Molecular and Cellular Biology student Soyeon Showman detected copy-number changes of centromere satellites on chromosome 11 of U2OS cells, which undergo BIR, within ~20 cell generations; these changes depended on the recombination protein RAD52 and the PIF1 helicase, both of which are required for BIR⁵⁹.
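Predicted foldback (non-B) elements are often located by searching for short inverted repeats: a stem whose reverse complement appears a short loop downstream, so that a single strand can fold into a hairpin. The sketch below finds perfect inverted repeats with hypothetical stem and loop limits; it is a simple illustration, not the prediction pipeline used in ref. 47.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 6, min_loop: int = 3, max_loop: int = 10):
    """Return (start, stem, loop) for each perfect inverted repeat.

    seq[start:start+stem] must equal the reverse complement of the segment that
    begins `loop` bases after it, so the strand could fold back into a hairpin.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        target = revcomp(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, stem, loop))
    return hits


# Hypothetical test sequence: stem ACCGGA, loop TTTT, stem TCCGGT.
print(find_inverted_repeats("TTACCGGATTTTTCCGGTTT"))  # [(2, 6, 4)]
```

Real satellite analyses would scan much longer sequences and allow imperfect stems; the exhaustive scan here is quadratic in the loop range and intended only to show the dyad-symmetry test.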

Figure 7: Metabolic labeling with EdU followed by ‘click’ chemistry to attach biotin, MNase digestion, streptavidin pulldown and MINCE-seq library preparation maps newly replicated chromatin¹².

Nucleosome dynamics behind the replication fork

Every nucleosome across the genome must be disrupted and reformed when the replication fork passes, but how chromatin organization is re-established following replication was unknown. To address this problem, post-doc Srinivas Ramachandran developed a metabolic labeling method using 5-Ethynyl-2'-deoxyuridine (EdU) uptake followed by MNase-seq and ‘click’ chemistry to characterize the genome-wide location of nucleosomes and other chromatin proteins behind replication forks at high temporal and spatial resolution¹² (Figure 7). We found that the characteristic chromatin landscape at Drosophila promoters and enhancers is lost upon replication. The most conspicuous changes are at promoters that have high levels of RNAP2 stalling and DNA accessibility and show specific enrichment for the BAF (Brahma-associated factor) remodeler complex. Enhancer chromatin is also disrupted during replication, suggesting a role for TF competition in nucleosome re-establishment. Thus, the characteristic nucleosome landscape emerges from a uniformly packaged genome by the action of TFs, RNAP2, and remodelers minutes after replication fork passage.
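Comparing the characteristic chromatin landscape between newly replicated and mature chromatin typically starts from an aggregate profile of nucleosome-sized fragment midpoints around TSSs. The sketch below computes such a profile, orienting offsets by strand; the 140-200 bp size window, bin size and input tuples are assumptions, and the nested loop suits illustration but would need an interval index for genome-scale data.

```python
from collections import Counter


def midpoint_profile(fragments, tss_list, flank=500, bin_size=10, min_len=140, max_len=200):
    """Aggregate nucleosome-sized fragment midpoints around TSSs.

    fragments: iterable of (chrom, start, end); tss_list: iterable of (chrom, pos, strand).
    Offsets are oriented so that positive values are downstream of each TSS.
    Returns {bin_start: count}, with bins of `bin_size` bp.
    """
    tss_by_chrom = {}
    for chrom, pos, strand in tss_list:
        tss_by_chrom.setdefault(chrom, []).append((pos, strand))
    counts = Counter()
    for chrom, start, end in fragments:
        if not (min_len <= end - start <= max_len):
            continue  # keep only nucleosome-sized fragments
        mid = (start + end) // 2
        for pos, strand in tss_by_chrom.get(chrom, []):
            offset = mid - pos if strand == "+" else pos - mid
            if -flank <= offset <= flank:
                counts[(offset // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))


fragments = [("chr1", 1060, 1210), ("chr1", 820, 970), ("chr1", 1000, 1050),
             ("chr2", 4800, 4950)]
tss = [("chr1", 1000, "+"), ("chr2", 5000, "-")]
print(midpoint_profile(fragments, tss))  # {-110: 1, 120: 1, 130: 1}
```

Running the same function on EdU-labeled fragments collected at successive chase times and on steady-state fragments would show how quickly the +1 and -1 nucleosome peaks re-emerge after replication.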

Figure 8: Transcription produces asymmetrically unwrapped nucleosomal intermediates¹³.

Nucleosome disruption during transcription

Nucleosomes are disrupted during transcription, but the structural intermediates during nucleosome disruption in vivo had been unknown. To identify transcriptional intermediates, Srinivas Ramachandran mapped subnucleosomal protections in Drosophila cells using MNase-seq and CUT&RUN. At the first nucleosome position downstream of the transcription start site, we identified unwrapped intermediates, including hexasomes that lack either proximal or distal contacts¹³. Inhibiting topoisomerases or depleting histone chaperones increased unwrapping, whereas inhibiting release of paused RNAP2 or reducing RNAP2 elongation decreased unwrapping (Figure 8). Our results indicated that positive torsion generated by elongating RNAP2 causes transient loss of histone-DNA contacts. Using this “structural epigenomics” approach, we found that nucleosomes flanking human CTCF insulation sites are similarly disrupted.
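Subnucleosomal intermediates such as hexasomes are commonly visualized by plotting fragment length against fragment-midpoint position relative to a feature, a representation often called a V-plot. The sketch below builds that count matrix for fragments on one chromosome; it is unstranded, and the flank width, maximum length and inputs are hypothetical. In such a matrix, intact nucleosomes appear as ~147 bp fragments centered on the dyad, while unwrapped intermediates appear as shorter fragments whose midpoints are shifted to one side.

```python
def v_plot(fragments, centers, flank=200, max_len=250):
    """Count matrix indexed [fragment_length][midpoint_offset + flank].

    fragments: (start, end) pairs on one chromosome; centers: feature positions
    (e.g. nucleosome dyads downstream of TSSs) on the same chromosome.
    """
    width = 2 * flank + 1
    matrix = [[0] * width for _ in range(max_len + 1)]
    for start, end in fragments:
        length = end - start
        if length > max_len:
            continue  # ignore fragments longer than the plotted range
        mid = (start + end) // 2
        for center in centers:
            offset = mid - center
            if -flank <= offset <= flank:
                matrix[length][offset + flank] += 1
    return matrix


m = v_plot([(930, 1070), (1010, 1110), (0, 300)], centers=[1000])
print(m[140][200], m[100][260])  # 1 1
```

Summing rows or columns of the matrix recovers the ordinary fragment-length distribution or the midpoint profile, so the 2D view adds information without discarding either 1D summary.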

We also identified diagnostic subnucleosomal particle remnants in human cell-free DNA data as a relic of transcribed genes from apoptosing cells. Thus, identification of subnucleosomal fragments from nuclease protection data represents a general strategy for structural epigenomics. Cell-free DNA and structural epigenomics projects have moved to the lab of former post-doc Srinivas Ramachandran, who is currently Assistant Professor at U. Colorado HSC.

Figure 9: Model of lineage switching mediated by KMT2A-fusion oncoprotein expression levels. At high oncoprotein expression levels the B cell lymphocytic leukemia regulatory network is favored, but at low levels a default myeloid regulatory network takes over.

Our discovery that RNAP2S5p-CUTAC maps chromatin accessibility provides direct evidence that paused RNAP2 is engaged immediately adjacent to ATAC-seq and DNase-seq peaks at enhancers and promoters genome-wide (Figure 9). CUTAC replaces the “open chromatin” metaphor for gene regulatory elements, which rests on unrelated enzymatic and physical accessibility assays, with a rigorous definition based on the well-established role of Pol2 pausing in gene regulation^{5,16}.
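Whether Pol2S5p-CUTAC signal sits immediately adjacent to accessibility peaks can be checked by measuring, for each CUTAC summit, the signed distance to the nearest ATAC-seq summit on the same chromosome. A minimal sketch using binary search; the summit positions below are hypothetical, and distances are in genome coordinates rather than strand-aware.

```python
from bisect import bisect_left


def nearest_distances(query, reference):
    """Signed distance from each query summit to the nearest reference summit.

    Both inputs are positions on one chromosome; positive values mean the query
    lies at a higher coordinate than its nearest reference summit.
    """
    ref = sorted(reference)
    if not ref:
        raise ValueError("reference must contain at least one summit")
    distances = []
    for q in query:
        i = bisect_left(ref, q)
        # The nearest reference summit is either just before q or at/after q.
        candidates = [ref[k] for k in (i - 1, i) if 0 <= k < len(ref)]
        nearest = min(candidates, key=lambda r: abs(r - q))
        distances.append(q - nearest)
    return distances


cutac_summits = [1130, 4870, 3000]  # hypothetical Pol2S5p-CUTAC summits
atac_summits = [5000, 1000]         # hypothetical ATAC-seq summits
print(nearest_distances(cutac_summits, atac_summits))  # [130, -130, 2000]
```

A histogram of these distances pooled over all summits would show whether they cluster within a nucleosome-scale window (the Figure 2 caption cites ~130 bp) or are spread widely.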

Figure 10: Using CUT&RUN to directly test genome-wide nucleosome binding predicted by the pioneer factor hypothesis¹⁴. Top: Experimental scheme; Bottom: During differentiation 29% of FoxA2 sites show pioneering, but only a handful of sites for the CTCF control.

Pioneer factor binding in vivo

Although the in vitro structural and in vivo spatial characteristics of TF binding are well defined, TF interactions with chromatin and other companion TFs during development were poorly understood. To analyze such interactions in vivo, Post-doc Michael Meers used CUT&RUN to profile several TFs across a time course of human embryonic stem cell differentiation, and studied their interactions with nucleosomes and co-occurring TFs by Enhanced Chromatin Occupancy (EChO), a computational strategy for classifying TF interactions with chromatin (Figure 10¹⁴). EChO showed that multiple individual TFs can employ either direct DNA binding or “pioneer” nucleosome binding at different enhancer targets. Nucleosome binding is not exclusively confined to inaccessible chromatin, but rather is correlated with local binding of other TFs, and with degeneracy at key bases in the pioneer factor target motif responsible for direct DNA binding. Our strategy revealed a dynamic exchange of TFs at enhancers across developmental time that is aided by pioneer nucleosome binding. Pioneer factor projects have moved to the lab of Michael Meers at Washington University in St. Louis.
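The central idea behind EChO is that the average size of fragments covering each position reveals how a TF contacts chromatin: small fragments at the binding focus suggest direct DNA binding, whereas nucleosome-sized fragments suggest nucleosome binding. The sketch below is a simplified stand-in, not the published EChO implementation¹⁴: it takes the single position with the smallest mean fragment size as the focus, and the 120 bp cutoff is a hypothetical choice.

```python
def local_mean_fragment_size(fragments, region_start, region_end):
    """Mean length of fragments covering each base in [region_start, region_end).

    Returns a list with None where no fragment covers the base.
    """
    size = region_end - region_start
    sums, counts = [0] * size, [0] * size
    for start, end in fragments:
        length = end - start
        for pos in range(max(start, region_start), min(end, region_end)):
            sums[pos - region_start] += length
            counts[pos - region_start] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def classify_focus(fragments, region_start, region_end, nucleosomal_cutoff=120):
    """Find the position with the smallest mean fragment size and classify it.

    Simplification: one global minimum per region stands in for EChO's foci, and
    the cutoff separating direct from nucleosomal binding is hypothetical.
    """
    means = local_mean_fragment_size(fragments, region_start, region_end)
    covered = [(m, region_start + i) for i, m in enumerate(means) if m is not None]
    if not covered:
        return None
    focus_size, focus_pos = min(covered)
    mode = "nucleosome-bound" if focus_size >= nucleosomal_cutoff else "direct DNA binding"
    return focus_pos, focus_size, mode


print(classify_focus([(40, 80), (45, 75), (0, 150)], 0, 100))
print(classify_focus([(20, 180), (30, 190)], 0, 100))
```

In the first invented peak, short fragments pile up over positions 45 to 74 and pull the local mean below the cutoff; in the second, every covering fragment is nucleosome-sized.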

Figure 11: a) A Drosophila retina development model for H3.3K27M-driven pediatric glioblastoma shows inhibition (red cells) behind the morphogenetic furrow (yellow arrow), but not when the cell cycle is inhibited by p21¹⁸. b) Based on our Drosophila and glioma cell line evidence, we explain the differences between replication-independent (RI, H3.3) and replication-coupled (RC, H3.2) inhibition of PRC2 in terms of the different histone deposition pathways for these two histone variants.

Deregulation of nucleosome dynamics in cancer

In 2002, then-post-doc Kami Ahmad discovered that the three histone-fold-domain amino acids that distinguish the conserved histone variant H3.3 from canonical H3 (H3.1/H3.2 in humans) specify replication-independent (H3.3) versus replication-coupled (H3.1/H3.2) nucleosome assembly⁶⁰. His Drosophila cytological study using GFP-labeled histones also revealed that H3.3 is incorporated genome-wide at active chromatin, including the active but not the inactive rDNA loci. Subsequent work from many groups built on Kami’s findings by molecular characterization of dedicated chaperones and other features of the two pathways⁶¹. In 2012, the first “oncohistones” were discovered in pediatric diffuse midline gliomas (DMGs) characterized by lysine 27-to-methionine (K27M) mutations in either H3.3 or H3.1^{62,63}. These oncohistone mutations dominantly inhibit H3K27 trimethylation and silencing, but it was unknown how oncohistone type affected gliomagenesis. Again using Drosophila as a model, Kami, now a Principal Investigator in the adjacent laboratory, demonstrated that inhibition of H3K27 trimethylation occurs only when H3K27M oncohistones are deposited into chromatin and only when they are expressed in cycling cells (Figure 11a). Using CUT&RUN on human DMG cell lines, post-doc Jay Sarthy showed that the genomic distributions of H3.1 and H3.3 oncohistones in human patient-derived DMG cells are consistent with the DNA replication-coupled deposition of histone H3.1 and the predominantly replication-independent deposition of histone H3.3¹⁸. Although H3K27 trimethylation is reduced for both oncohistone types, H3.3K27M-bearing cells retain some H3K27me3 domains, whereas H3.1K27M-bearing cells lack H3K27 trimethylation. We proposed that oncohistones inhibit the H3K27 methyltransferase as chromatin patterns are being duplicated in proliferating cells, predisposing them to tumorigenesis (Figure 11b).

In a follow-up study, Kami Ahmad used H3K27M inhibition of PRC2 in fly imaginal discs to show that simultaneous misexpression of a master regulatory TF and H3K27M results in an overexpression phenotype reminiscent of oncogenesis⁶⁴.

In other collaborative work, Jay Sarthy found that the testis-specific histone variant H2A.B is significantly over-expressed in a variety of tumors, including about half of all diffuse large B-cell lymphomas¹⁹. These first examples of “ready-made” oncohistones, which are potentially oncogenic without a mutation, form nucleosomes that wrap less DNA than those containing canonical H2A and destabilize nucleosomes in vivo⁶⁵. Oncohistone cancer projects have now moved to the lab of Jay Sarthy, a pediatric oncologist at the Ben Towne Center for Childhood Cancer Research, Seattle Children’s Hospital.

Anthracyclines are widely prescribed anti-cancer drugs that disrupt chromatin by intercalating into DNA and enhancing nucleosome turnover⁶⁶. Postdoc Matt Wooten observed that treatment with the anthracycline aclarubicin leads to elevated levels of elongating RNA polymerase II and changes in chromatin accessibility in Drosophila S2 cells⁶⁷. He found that closely spaced divergent promoter pairs show greater chromatin changes than co-directionally oriented tandem promoters, and that aclarubicin treatment changes the distribution of non-canonical DNA G-quadruplex structures both at promoters and at G-rich pericentromeric repeats. Matt’s work suggests that aclarubicin’s effects on nucleosome disruption, RNA polymerase II, chromatin accessibility and DNA structures underlie its anti-cancer activity.
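Classifying neighboring promoters as divergent or tandem is a simple sort-and-compare over TSS positions. A minimal sketch, assuming each gene is given as (name, chromosome, TSS, strand); the gene names and the 1 kb spacing cutoff are hypothetical, not the threshold used in ref. 67.

```python
def classify_promoter_pairs(genes, max_distance=1000):
    """Classify neighboring TSS pairs on each chromosome.

    genes: iterable of (name, chrom, tss, strand). Returns (left, right, kind) for
    adjacent TSSs within max_distance bp, where kind is 'divergent' (- then +,
    head-to-head), 'tandem' (same strand) or 'convergent' (+ then -).
    """
    by_chrom = {}
    for gene in genes:
        by_chrom.setdefault(gene[1], []).append(gene)
    pairs = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda g: g[2])
        for a, b in zip(ordered, ordered[1:]):
            if b[2] - a[2] > max_distance:
                continue
            if a[3] == "-" and b[3] == "+":
                kind = "divergent"
            elif a[3] == b[3]:
                kind = "tandem"
            else:
                kind = "convergent"
            pairs.append((a[0], b[0], kind))
    return pairs


genes = [("g1", "chr2L", 1000, "-"), ("g2", "chr2L", 1300, "+"), ("g3", "chr2L", 1800, "+"),
         ("g4", "chr2L", 9000, "+"), ("g5", "chr3R", 500, "+"), ("g6", "chr3R", 700, "-")]
print(classify_promoter_pairs(genes))
```

With these invented genes, g1/g2 form a divergent pair, g2/g3 a tandem pair, g5/g6 a convergent pair, and g4 is too far from g3 to be paired.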

Figure 12: Marseillevirus nucleosomes. Nucleosomes formed from two Hβ-Hα and two Hδ-Hγ doublets closely resemble human histone octameric nucleosomes. They differ, however, in forming densely packed nucleosomes lacking linkers inside the viral capsid.

Nucleosomes in giant viruses

Nucleosome cores composed of octamers containing two molecules each of the histones H2A, H2B, H3 and H4, arranged in dimers of H2A with H2B and H3 with H4, are characteristic of eukaryotes and contrast with histones in archaea, which can assemble an indeterminate number of homodimers or heterodimers into a slinky-like “hypernucleosome”⁶⁸. The N-terminal tails of eukaryotic histones, which are post-translationally modified and important for transcriptional regulation, are generally lacking in archaeal histones. In 2009 a full set of divergent histones was found in the giant virus Marseillevirus⁶⁹, with histones arranged in fused doublets, designated Hβ-Hα and Hδ-Hγ, which are specifically related to H2B and H2A, and to H4 and H3, respectively. A third doublet, Hζ-Hε, appears to be a more divergent homolog of H2B-H2A. All three doublet histones are found in the viral capsid, with abundant Hβ-Hα and Hδ-Hγ and smaller amounts of Hζ-Hε. These histones appear to have diverged from proto-eukaryotic counterparts prior to the divergence of modern eukaryotes⁷⁰.

In collaboration with the lab of Karim-Jean Armache at the Skirball Institute, NYU, we⁷¹ and others⁷² found that Hβ-Hα and Hδ-Hγ can form nucleosomes from two molecules of each doublet that wrap 121 bp of DNA in vitro and are very similar to eukaryotic nucleosomes. Technician Terri Bryson permeabilized Marseillevirus capsids to digest viral chromatin in virio with MNase or Methidiumpropyl-EDTA Fe(II) (MPE) and extract the DNA for MNase-seq or MPE-seq⁷³. We found that the viral genome in the capsid is tightly packed with abutting nucleosomes that wrap 121 bp, lack linkers, and show no phasing of nucleosomes over genes, unlike the characteristic “beads on a string” of eukaryotic nucleosomes and linkers (Figure 12). Similarly divergent viral histone doublets, and even triplets and quadruplets, are found in many giant viruses⁷⁴, and evoke a phase in the evolution of histones in which viruses evolved histone doublets for tight genome packaging in the virion, perhaps prior to the origin of the eukaryotic nucleus and the evolution of post-translational modifications on histones to regulate nuclear gene transcription.
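The absence of linkers has a simple quantitative signature: the nucleosome repeat length, estimated from the spacing between adjacent nucleosome midpoints, should equal the length of DNA each nucleosome wraps (121 bp), whereas eukaryotic arrays repeat at the core length plus a linker. The sketch below estimates repeat length as the most common pairwise midpoint spacing within a lag window; the midpoints would come from MNase-seq or MPE-seq fragments, and the window and example arrays are hypothetical.

```python
def midpoint_spacing_counts(midpoints, max_lag=400):
    """counts[d] = number of midpoint pairs separated by exactly d bp (d <= max_lag)."""
    mids = sorted(midpoints)
    counts = [0] * (max_lag + 1)
    for i, left in enumerate(mids):
        for right in mids[i + 1:]:
            d = right - left
            if d > max_lag:
                break  # midpoints are sorted, so later ones are even farther away
            counts[d] += 1
    return counts


def estimate_repeat_length(midpoints, min_lag=50, max_lag=400):
    """Most frequent pairwise spacing in [min_lag, max_lag], a crude repeat-length estimate."""
    counts = midpoint_spacing_counts(midpoints, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda d: counts[d])


abutting = [60 + 121 * k for k in range(20)]      # hypothetical linker-free array
with_linkers = [90 + 187 * k for k in range(20)]  # hypothetical 147 bp core + 40 bp linker
print(estimate_repeat_length(abutting), estimate_repeat_length(with_linkers))  # 121 187
```

For a regular array, the first-neighbor spacing occurs once more than the second-neighbor spacing, so the maximum lands on the repeat length; noisy real data would usually be smoothed before taking the peak.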