The histone variant cenH3 (CENP-A in mammals, Cse4 in yeast) is an essential component of centromeres1 in nearly all eukaryotes. Although there has been general agreement in the field that cenH3 nucleosomes determine centromere identity (rather than the DNA they wrap), the exact composition and structure of the centromeric nucleosome itself has been controversial. Recent developments in our lab indicate that there is a surprising diversity of centromeric structures and organizations in different organisms.
The budding yeast centromeric nucleosome is a hemisome
The debate over centromeric nucleosome structure began in 2007 with our publication of evidence for a cenH3/H4/H2A/H2B tetramer (“hemisome”) at Drosophila centromeres2 and publication by Carl Wu of evidence for a hexamer containing the budding yeast Cse4 protein, histone H4, and the non-histone protein Scm3 in place of H2A and H2B3.
Over the past several years, we have focused on understanding the “point centromere”4 of budding yeast, which is genetically defined by a ~120-bp sequence, so that we can be confident that the single cenH3 nucleosome that we have mapped there is functional. We have characterized the Cse4 nucleosome at the yeast centromere in vivo and in vitro. First, former postdoc Takehito Furuyama demonstrated positive supercoiling in vivo for functional centromeres using yeast minichromosomes and conditional mutants5. Positive supercoiling implies a right-handed DNA wrap, which is opposite to the wrapping of conventional nucleosomes6. The right-handed wrapping of DNA around the canonical histone core means that interaction surfaces between histones that prevent the nucleosome core from springing apart would be facing away from one another. Thus, right-handed wrapping is inconsistent with octamer formation, in accordance with the hemisome model for cenH3 nucleosome structure. The mutual incompatibility of nucleosomes with opposite topologies can potentially explain how centromeres are efficiently maintained as unique loci on chromosomes: Incorporation of cenH3 into chromosome arms where H3 nucleosomes dominate would create incomplete particles removed by proteolysis7,8, whereas the right-handed wrapping of centromeric DNA would resist octamer formation5. Our findings raised the possibility that DNA topology, rather than DNA sequence, underlies centromere identity and inheritance.
Further support for the hemisome model came from our application of native ChIP (ORGANIC) profiling, as described above, which included the demonstration by graduate student Kristina Krassovsky of the presence of H2A in the particle, inconsistent with the particle being a (Cse4/H4)2 “tetrasome”9. Next, Takehito Furuyama used conventional salt dialysis to assemble either octasomes or hemisomes in vitro simply by using either 145-147 bp DNA duplexes (for octasomes) or 62-78 bp duplexes (for hemisomes). Importantly, we found that hemisomes assembled on the 78-bp 92% A+T cen4 CDEII sequence remained stable even in 4M urea10,11. The exceptional stiffness that is predicted for CDEII DNA suggests that AT-richness evolved to favor hemisome over octasome formation.
Recently, we applied H4S47C-anchored cleavage mapping, which reveals the precise position of histone H4 in every nucleosome in the genome12. We found that cleavage patterns at centromeres are unique within the genome and are incompatible with symmetrical structures, including octasomes and (Cse4-H4)2 tetrasomes10. A single core structure is compatible with centromere cleavage patterns and distances, one in which oppositely oriented Cse4-H4-H2A-H2B hemisomes occupy one of two rotationally phased positions on each of the 16 yeast centromeres at similar frequencies within the population. Centromeric Cse4 hemisomes are stable, remaining intact under ex vivo conditions that evict “fragile” H3 nucleosomes. Our results indicated that the orientation and rotational position of the stable hemisome at each yeast centromere is not specified by the functional centromere sequence. From a chromatin perspective, the Cse4 hemisome over CDEII is an odd particle indeed: it is precisely constrained in position to the base pair, but shows full reflectional and rotational flexibility.
Fission yeast centromeres form dense arrays of unpositioned cenH3 nucleosomes
Postdoc Jitendra Thakur then applied H4S47C-anchored cleavage mapping and high-resolution ORGANIC and cross-linking ChIP to centromeric nucleosomes of the distantly related fission yeast Schizosaccharomyces pombe, with surprisingly different results. Fission yeast has classic “regional centromeres”4 with a 4-7 kb central domain of unique or low-copy sequence that is flanked by outer repeats that assemble heterochromatin. Unlike repeat-based centromeres in plants and animals, fission yeast central domain sequences can be mapped to specific locations, allowing us to use ChIP-Seq data to determine that H3 nucleosomes are virtually absent from the central domains, which are instead occupied by arrays of cenH3 (Cnp1 or CENP-A) nucleosomes. In contrast to the precisely positioned Cse4 hemisomes in budding yeast, the cenH3 nucleosomes of fission yeast are unpositioned, variably spaced, and show no evidence of rotational phasing13. The distances between cleavage fragment endpoints are consistent with nucleosomes with two H4 molecules, meaning there are few or no hemisomes, and the bulk of centromeric nucleosomes must be octasomes, hexasomes or tetrasomes. Other inner kinetochore proteins, including CENP-C, CENP-T, CENP-I (Mis6) and the cenH3 chaperone Scm3, are also found throughout the central domain with no indication of preferred kinetochore assembly sites. Inner kinetochore proteins are also found at low levels in the pericentric heterochromatin, but they appear to be less stably incorporated than they are in the central domains, suggesting cenH3 nucleosomes have greater stability in continuous arrays.
CENP-C and CENP-T have been previously proposed to interact with H3 nucleosomes14, but our data indicate they interact primarily with cenH3 nucleosomes. In vitro, CENP-T together with its binding partner CENP-W has been shown to protect DNA from micrococcal nuclease (MNase) in a continuous manner, rather than forming discrete particles. This property, as well as the variable spacing of cenH3 nucleosomes, may contribute to the chromatin “smear” that has long been observed in the central domain following MNase digestion, rather than a typical nucleosome ladder of mononucleosomes, dinucleosomes, trinucleosomes, etc.
Though both fission yeast centromeres and the repeat-based centromeres of most plants and animals are called “regional”, they differ dramatically in structure, organization, and size. In plants and animals, centromeric DNA is typically composed of megabases of ‘satellite’ repeats, on which alternating arrays of H3 and cenH3 nucleosomes assemble. The highly repetitive nature of these sequences has been an obstacle to assembling centromeric sequences and mapping cenH3 nucleosomes. To circumvent this difficulty, we used a “bottom-up” approach to understand the organization of human centromeres. Using high-resolution native ChIP-Seq with 100 x 100 paired-end reads to obtain functional centromeric sequences, we clustered sequence data to find the most abundant sequences that assemble cenH3 and therefore represent the functional centromere15. We found that the sequences were dominated by two distantly related families of alpha satellite dimers of 340 and 342 bp that comprise longer arrays on at least 20 of the 23 human chromosomes. The two halves of the dimers are separated by a CENP-B box, the 17 bp recognition sequence for binding the CENP-B protein, and cenH3 (CENP-A) nucleosomes are precisely positioned on a 100 bp sequence in each monomer, with a 60 bp linker containing the CENP-B box between them. CENP-C ChIP is nearly identical, producing the same set of dimers and protecting the same 100 bp positions from MNase with added protection of the CENP-B box. The precisely positioned 100 bp particles suggest a single wrap of DNA as in budding yeast hemisomes. On more divergent alpha satellite, positioning rapidly becomes less precise and other sizes of protected particles emerge, such as a ~130 bp particle that may be consistent with octasomes.
The cenH3 occupancy of the most homogeneous, youngest dimers supports a model of tandem repeat evolution by unequal crossover, with progressively more divergent monomers in the sequences at the centromere edges, including the higher order repeats (HORs) that have been mapped at the edges of human centromeres.
In contrast to what we observed in fission yeast centromeres, Jitendra Thakur found no enrichment of CENP-T over alpha satellite using our standard native ChIP protocol, but under low MNase conditions we observed modest enrichment, suggesting that CENP-T localization is unusually sensitive to MNase16. Using MNase cross-linking ChIP, which is expected to link kinetochore components together, we obtained robust enrichment of CENP-T on alpha satellite, confirming that CENP-T was being lost during chromatin preparation in native ChIP. Extremely similar size distributions of X-ChIP-seq fragments from CENP-A, -C, and -T mapped onto the same alpha satellite sequences suggested that all three are present in the same large complex on dimeric alpha satellite. To verify this, we expressed CENP-A-FLAG and performed sequential ChIP on anti-FLAG precipitated DNA fragments with antibodies to CENP-A, -B, -C, and -T, and found all to be enriched over anti-GFP and input controls, indicating these proteins are found in the same complex, the Constitutive Centromere-Associated Network (CCAN). X-ChIP profiles for CENP-A, -C, and -T on homogeneous dimeric alpha satellite all gave the same profile of a single complex encompassing the positions of the 100 bp CENP-A/C particles as well as the 60 bp linker containing the CENP-B box observed in N-ChIP, suggesting CENP-T fills in the linker region between CENP-A nucleosomes. This is consistent with the co-localization of CENP-A, -C, and -T observed in fission yeast13 and clarifies how CENP-C and CENP-T can interact genetically17, in contrast to models in which CENP-T interacts with H314.
The low salt conditions typically used in ChIP leave more than 80% of kinetochore proteins insoluble, raising the possibility that we were looking at a structurally distinct fraction of CCAN complexes18. Indeed, CCAN complexes extracted by native ChIP in high salt yielded heterogeneous larger fragments of ~100-450 bp. By combining classical salt fractionation of chromatin with CUT&RUN (CUT&RUN.Salt), in which MNase is tethered and does not nibble or cut particle fragments internally, we observed primarily fragments of ~160-185 bp with a smaller peak at ~340 bp, regardless of salt solubility. Low salt fragments were less enriched for CENP-B, suggesting that CENP-B contributes to stability, and both CENP-B box density and match to the CENP-B box consensus sequence correlated with the efficiency of CCAN formation on α-satellite dimeric arrays. Surprisingly, we found a diversity of CCAN structures on neighboring dimers that diverged by as little as 5%, with sharply different occupancies on some adjacent monomers, and with differences in orientation of the complex relative to the CENP-B box.
In work with our collaborators in the laboratory of Jiming Jiang, we find translational and rotational phasing of cenH3 particles of ~100 bp in rice, similar to what we see in human centromeres. Rice Cen8 has both unique sequences and satellite repeats in the centromere, and cenH3 nucleosomes are less precisely phased on unique sequences, suggesting tandem repeats evolve to favor the translational and rotational phasing of cenH3 nucleosomes19. Rotational phasing is thought to contribute stability to the nucleosome, which may give these tandem repeats an advantage in the competition between centromere variants for inclusion in the egg or megaspore in asymmetric female meiosis, where only one of two variants will survive to be passed into the next generation20. This competition, known as centromere drive, may favor large arrays of precisely phased ~100 bp nucleosome particles, whereas in the symmetric meiosis of fission yeast, precise positioning may be irrelevant, since there is no competition between variants.
The centromere drive model was proposed to explain the rapid evolution of CENP-A (Cid) between the sibling species Drosophila melanogaster and D. simulans21. Consistent with rapid evolution of centromeres driving divergences of CENP-A proteins, the centromeres of D. melanogaster and D. simulans are dramatically different22. In D. melanogaster, centromeres comprise a few families of short 5- and 10-bp repeats, whereas D. simulans centromeres comprise a complex family of diverged 500 bp repeats. Comparison of the quantity of each repeat found across related Drosophila species indicates that individual centromere repeat families are expanding in the lineages where they serve as centromeres. This is consistent with the centromere drive model, which predicts that satellite expansions may be able to recruit more CENP-A and form a ‘stronger’ centromere that favors their own inclusion in the egg, as has been recently seen in mice23. It also provides support for the notion that rotational phasing of nucleosomes may be advantageous. Because a complete turn of the DNA double helix takes about ten base pairs, a 10-bp repeat will always present the same face to the histone octamer, and a 10 bp periodicity of an AA dinucleotide, present in the D. melanogaster 5- and 10-bp repeats, reduces the energy of wrapping and stabilizes the nucleosome24, as in rice. Rotational phasing of nucleosomes may therefore stabilize centromeres against the pulling forces of the spindle in anaphase and favor their own inclusion in the egg.
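The periodicity argument above can be illustrated with a small sketch. The code below measures how often AA dinucleotides in a sequence are separated by exact multiples of a chosen period. It is only an illustration: the function names, the 30-bp distance window, and the test sequences are hypothetical choices made here, the sequences are not taken from any centromere, and the helical repeat is treated as exactly 10 bp for simplicity.

```python
from collections import Counter


def aa_positions(seq):
    """Return 0-based start positions of every AA dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AA"]


def spacing_histogram(positions, max_dist=30):
    """Count pairwise distances (up to max_dist) between sorted positions."""
    counts = Counter()
    for i, p in enumerate(positions):
        for q in positions[i + 1:]:
            d = q - p
            if d > max_dist:
                break  # positions are sorted, so later ones are farther away
            counts[d] += 1
    return counts


def helical_phase_fraction(seq, period=10, max_dist=30):
    """Fraction of AA-to-AA distances that are exact multiples of period."""
    hist = spacing_histogram(aa_positions(seq), max_dist)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    in_phase = sum(n for d, n in hist.items() if d % period == 0)
    return in_phase / total


# Hypothetical repeat units, each containing a single AA dinucleotide.
tandem_10bp = "AACGTCGTCG" * 12  # 10-bp unit: AA recurs every 10 bp
tandem_7bp = "AACGTCG" * 17      # 7-bp unit: AA recurs every 7 bp

print(helical_phase_fraction(tandem_10bp))
print(helical_phase_fraction(tandem_7bp))
```

In the 10-bp tandem array every AA falls on the same helical face, so all counted distances are in phase, whereas the 7-bp array has none within the window. A real analysis would need to allow for a non-integer helical repeat and for sequence divergence between repeat copies.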
In some plants and animals, the entire chromosome appears to act as a centromere. Previous work showed that in nematodes cenH3 (hcp3) could be found throughout the chromosome, but the exact sites and organization of cenH3 nucleosomes were unknown. Former postdoc Florian Steiner, now at the University of Geneva, used high resolution native ChIP to find ~700 sites in the genome of the nematode Caenorhabditis elegans that had high occupancy for cenH325. The same sites were enriched in CENP-C ChIP. The cenH3 nucleosomes protect ~100 bp of DNA from MNase, similar to what has been observed in humans, rice, and budding yeast. The high occupancy sites each have a single cenH3 nucleosome flanked by well-positioned H3 nucleosomes, similar to the point centromeres of budding yeast, leading to a view of holocentromeres as dispersed point centromeres. These point centromeres share a consensus DNA motif, which is extremely similar to the consensus for transcription factor hotspots, sites where multiple transcription factors bind. In non-dividing cells, cenH3 is undetectable, but the centromeric sites become occupied by transcription factors.
Holocentromeres have evolved in other plants and animals, and are common in insects, where they have arisen independently at least four times. Postdoc Anna Drinnenberg, co-mentored by Harmit Malik and now at the Curie Institute, discovered that insect holocentromeres are completely different from nematode holocentromeres, and indeed from all other centromeres, since they lack cenH3 and CENP-C, though they retain outer kinetochore components and some inner kinetochore components26. Surprisingly, even though the four clades of insects that have holocentromeres diverged from monocentric insects separately at times more than 100 million years apart, all four lineages have lost cenH3, suggesting some ancient change in the insect kinetochore that tolerates and perhaps facilitates loss of cenH3 and transition to holocentromeres.
Non-B form DNA in centromeres
The obvious plasticity of centromere structure and lack of conservation of centromere sequences has led to the view that centromeric DNA sequence does not matter. Yet graduate student Siva Kasinathan found that a surprisingly broad range of centromeres and neocentromeres are enriched in the potential to form non-B form DNA. Non-B form DNA regions can be detected by Permanganate/S1 nuclease sequencing27,28. Such partially denatured non-B form DNA regions were found to be abundant in both human α-satellite and mouse major and minor satellites from activated B-cells29. Non-B DNA has frequently been predicted to form at satellite centromeres from many eukaryotes29-36. A form of non-B DNA that could account for its detection in activated mouse and human B-cells is cruciform extrusion, promoted by short (<10 bp) dyad symmetries. Significant dyad symmetries are widespread at centromeres throughout the eukaryotic domain, including satellite centromeres of primates, mouse, horse, chicken, stickleback, and plants, regional centromeres of fission yeast, and point centromeres of budding yeasts29. Interestingly, non-proliferating (“resting”) mouse B-cells showed reduced levels of centromeric non-B DNA, which is consistent with the possibility that proliferation induces cruciform extrusion for “seeding” centromeres.
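To make the idea of a short dyad symmetry concrete, the following sketch scans a sequence for an arm that is followed, after a short spacer, by its own reverse complement, which is the arrangement that could in principle extrude as a cruciform. This is a naive exact-match scan written for illustration and is not the method used in the cited analysis; the function names, the default arm length and spacer limit, and the example sequence are all assumptions made here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_dyads(seq, arm_len=5, max_spacer=4):
    """List (start, arm, spacer_len) for each arm followed by its reverse
    complement within max_spacer bases. Exact matches only."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * arm_len + 1):
        arm = seq[start:start + arm_len]
        if set(arm) - set("ACGT"):
            continue  # skip windows containing ambiguous bases such as N
        target = reverse_complement(arm)
        for spacer in range(max_spacer + 1):
            right = start + arm_len + spacer
            if right + arm_len > len(seq):
                break
            if seq[right:right + arm_len] == target:
                hits.append((start, arm, spacer))
    return hits


# Hypothetical sequence: arm GACCG, 3-base spacer, then CGGTC (its reverse complement).
example = "TTGACCGAAACGGTCTT"
print(find_dyads(example))
```

Scoring whether dyad symmetries are enriched at centromeres would additionally require a background model, for example shuffled or composition-matched sequences, which this sketch does not attempt.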
Although the mechanism whereby non-B DNA forms at centromeres is unknown, two hypotheses have been proposed29. One is that the 4-way junctions of cruciforms are bound by Holliday junction binding activity of HJURP and its Scm3 ortholog in yeast, whereupon HJURP would load CENP-A/H4. Alternatively, non-B DNA might result from transcriptional initiation, where melting of DNA is required for engagement of Pol II, and from Pol II elongation, which moves the denaturation bubble forward. These hypotheses are not mutually exclusive, and in both cases the enigmatic CENP-B sequence-specific DNA binding protein likely plays a role. Centromeres that are not predicted to be enriched for non-B form DNA, such as our own, may have DNA-binding proteins like CENP-B or Reb1 that induce sharp bends in DNA, which may serve the same function of initiating non-B form DNA to relieve the stress of accommodating a 60° bend in the DNA. Given the enrichment of non-B form DNA at centromeres throughout the eukaryotic domain, it seems likely that this feature of centromeres can provide a basis for centromere specification despite the lack of primary sequence conservation. The ability of centromeres to support non-B form DNA may be an evolutionary constraint on them, quite independent of centromere drive.
We continue to be intrigued by the diversity and evolution of centromeres and pericentric regions, and our application of powerful epigenomic tools described here should allow us to gain insights into what has long been considered an intractable problem.