The exponential decrease in the cost of DNA sequencing has revolutionized genetics and genomics, but we still have only a sketchy understanding of how the genome is packaged, read and interpreted to program the astonishing complexity of cells and organisms. To chip away at long-standing questions in developmental biology, evolution, cellular physiology and disease research, we apply genomic tools to Drosophila, yeast and mammalian models. We ask questions such as: How and why are centromeres so different from the rest of the genome? How can replication and transcription machineries move along a DNA template that is tightly wound around nucleosome cores? How are genes silenced? How are gaps created in the nucleosome landscape for gene activation? What is the basis for the specificity of DNA-binding proteins? To facilitate our research on these problems, we have developed several experimental and computational tools over the years.
Over the past decade we have introduced genomic tools to probe the dynamic structure of the chromatin landscape and explore its relationship to gene regulation and centromere function. These tools include salt fractionation, a method to separate classically ‘active’ from ‘silent’ chromatin; CATCH-IT, a metabolic labeling strategy to directly measure nucleosome turnover; INTACT, a cell-type-specific nuclear purification method to determine chromatin differences between tissues; ORGANIC, a method for mapping native chromatin at base-pair resolution; 3'NT, a method for determining the last base added onto a nascent chain within the active site of RNAPII; TMP-seq, a method to map DNA torsion genome-wide; MNase X-ChIP-seq, a high-resolution cross-linked chromatin immunoprecipitation (ChIP) protocol for large insoluble complexes; ChEC-seq, an in situ alternative to ChIP; MINCE-seq, a metabolic labeling strategy for observing changes in nucleosomes and transcription factors (TFs) during DNA replication; H3 chemical cleavage mapping and Quantitative MNase-seq for precise genome-wide mapping of single nucleosomes and linkers in vivo; and CUT&RUN, which uses antibody-targeted tethered MNase to release DNA adjacent to proteins of interest. The low background and high resolution of CUT&RUN permit profiling with very low cell numbers and require about one tenth the sequencing depth of ChIP-seq. CUT&RUN has spawned variations including AutoCUT&RUN for economical high-throughput profiling of cell and tissue samples; CUT&RUN.Salt and CUT&RUN.ChIP for profiling chromatin complexes based on chromatin solubility and complex composition, respectively; and CUT&Tag, which outperforms previous chromatin profiling methods for small samples and single cells.
The high efficiency of CUT&RUN and CUT&Tag, together with the SEACR and EChO computational tools designed to analyze the resulting high-resolution maps, has led to new insights into nucleosome dynamics and transcription factor binding.
Nucleosomes are disrupted during transcription and other active processes, but the structural intermediates of nucleosome disruption in vivo are unknown. To identify these intermediates, we mapped subnucleosomal protections in Drosophila cells using micrococcal nuclease (MNase) digestion followed by sequencing. At the first nucleosome position downstream of the transcription start site, we identified unwrapped intermediates, including hexasomes that lack either proximal or distal contacts. Inhibiting topoisomerases or depleting histone chaperones increased unwrapping, whereas inhibiting release of paused RNAPII or reducing RNAPII elongation decreased unwrapping. Our findings indicate that positive torsion generated by elongating RNAPII causes transient loss of histone-DNA contacts. Using this mapping approach, we found that nucleosomes flanking human CTCF insulation sites are similarly disrupted. We also identified diagnostic subnucleosomal particle remnants in cell-free human DNA data as a relic of transcribed genes from apoptosing cells. Thus, identification of subnucleosomal fragments from nuclease protection data represents a general strategy for structural epigenomics.
We have used MINCE-seq to characterize the genome-wide location of nucleosomes and other chromatin proteins behind replication forks at high temporal and spatial resolution. We found that the characteristic chromatin landscape at Drosophila promoters and enhancers is lost upon replication. The most conspicuous changes are at promoters that have high levels of RNAPII stalling and DNA accessibility and show specific enrichment for the BRM remodeler. Enhancer chromatin is also disrupted during replication, suggesting a role for TF competition in nucleosome re-establishment. Thus, the characteristic nucleosome landscape emerges from a uniformly packaged genome by the action of TFs, RNAPII, and remodelers within minutes of replication fork passage. MINCE-seq provides a first glimpse into the dynamic processes that establish and maintain the chromatin landscape every cell generation.
A class of histone variants in which we have a long-standing interest mediates chromosome segregation. Centromere-specific histone H3 variants, called cenH3, CENP-A (in mammals), or Cse4 (in yeast), mark the location of the kinetochore, which attaches to microtubules to segregate chromosomes in mitosis and meiosis. We previously showed that cenH3 nucleosomes of budding yeast wrap DNA to form positive supercoils, in contrast to conventional nucleosomes, which form negative supercoils. Later, we precisely characterized this “point” centromeric nucleosome in vitro and in vivo, and contrasted its unique features with those of the “regional” centromeres of fission yeast. In repeat-based centromeres of plants and animals, satellite sequences position cenH3 nucleosomes, so that they are translationally and rotationally phased. Applying new experimental and computational tools, we have elucidated the molecular organization of animal and plant centromeres embedded in homogeneous satellite repeats, which had proven intractable using conventional mapping strategies. Using hierarchical clustering of sequences immunoprecipitated with kinetochore proteins of the constitutive centromere-associated network (CCAN), we found that a unique chromatin complex occupies the young dimeric α-satellite arrays that dominate functional human centromeres, which are occupied by CENP-A, CENP-B, CENP-C and CENP-T. Using CUT&RUN.Salt to map the centromere complex to human α-satellite dimers, we discovered a surprising diversity of CCAN structures on neighboring dimers that diverged by only ~10%, raising questions about the evolutionary processes that have resulted in such an extraordinary degree of structural diversity at human centromeres.
Despite the conserved function of centromeres, which is necessary at every cell division, centromere sequences are not conserved and show a paradoxical diversity in both sequence and organization. This is consistent with the centromere drive model, which predicts that tandem repeats will compete in female meiosis for inclusion in the egg, rather than be lost in polar bodies. If an expanded satellite array can attract more CENP-A nucleosomes, it may become favorably oriented to be passed into the egg. Recent expansions of 5- and 10-bp repeats in Drosophila melanogaster and a 10-bp WW dinucleotide periodicity (W = A or T) in CentO repeats from rice suggest that rotational phasing may stabilize cenH3 nucleosomes against the pulling forces of the spindle in anaphase and thereby favor the inclusion of these repeats in the egg.
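As a rough illustration of what a 10-bp WW periodicity means in sequence terms, the sketch below counts how often two WW dinucleotides occur at each spacing and reports the most frequent one. It is a minimal example on an invented toy repeat, not the analysis used in our studies; the function names, the `max_dist` parameter and the toy sequence are all assumptions made for illustration.

```python
def ww_positions(seq):
    """Return start indices of WW dinucleotides (W = A or T)."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] in "AT" and s[i + 1] in "AT"]


def ww_distance_counts(seq, max_dist=30):
    """Count pairs of WW dinucleotides separated by each distance 1..max_dist."""
    positions = ww_positions(seq)
    present = set(positions)
    counts = {d: 0 for d in range(1, max_dist + 1)}
    for p in positions:
        for d in range(1, max_dist + 1):
            if p + d in present:
                counts[d] += 1
    return counts


# Hypothetical toy sequence: a 10-bp unit with a single WW step, repeated 12 times.
toy = "AAGCGCGCGC" * 12
counts = ww_distance_counts(toy)
peak = max(counts, key=counts.get)
print("most frequent WW spacing:", peak)
```

On real satellite sequence the signal would be noisier, and one would typically compare the spacing profile against shuffled or non-centromeric sequence before drawing conclusions.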
Centromeres also appear to be selected for their content of non-B form DNA. Cruciform extrusion is a form of non-B DNA that is promoted by short (<10 bp) dyad symmetries, which are widespread at centromeres throughout eukaryotes, including satellite centromeres of primates, mouse, horse, chicken, stickleback and plants; regional centromeres of fission yeast; and point centromeres of budding yeasts. We have proposed that the 4-way junctions of cruciforms are bound through the Holliday junction-binding activity of the dedicated CENP-A chaperone HJURP and its Scm3 ortholog in yeast. Given the enrichment of non-B form DNA at centromeres throughout eukaryotes, we suggest that this feature of centromeres can provide a basis for centromere specification despite the lack of primary sequence conservation.
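A dyad symmetry is a stretch of sequence followed, after an optional short spacer, by its own reverse complement; such inverted repeats are the sequence feature that can extrude into a cruciform. The sketch below scans a sequence for short dyad symmetries by exact matching. It is only a minimal illustration: the function names, the arm-length and spacer limits, and the toy sequence are assumptions, and it is not the method used in our published analyses.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_dyad_symmetries(seq, min_arm=4, max_arm=9, max_spacer=4):
    """Return (start, arm_length, spacer_length) for each exact inverted repeat.

    Arms are limited to max_arm bases (assumed here to mirror 'short, <10 bp'),
    and the two arms may be separated by up to max_spacer bases.
    """
    s = seq.upper()
    hits = []
    for start in range(len(s)):
        for arm in range(min_arm, max_arm + 1):
            left = s[start:start + arm]
            if len(left) < arm:
                break  # ran off the end of the sequence
            target = reverse_complement(left)
            for spacer in range(max_spacer + 1):
                right_start = start + arm + spacer
                if s[right_start:right_start + arm] == target:
                    hits.append((start, arm, spacer))
    return hits


# Hypothetical toy sequence: a 6-bp arm, a 2-bp spacer, then the arm's reverse complement.
toy = "GGGG" + "ACGTTC" + "AA" + "GAACGT" + "CCCC"
hits = find_dyad_symmetries(toy)
print(hits)
```

Because shorter arms nested inside a longer dyad are reported as well, a practical analysis would likely keep only the longest hit at each location and compare hit density with that of control sequence.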