A Johns Hopkins engineer co-led a team that has sequenced the genome of the world’s most widely used model plant species, Arabidopsis thaliana, at a level of detail never previously achieved. Until now, several regions of this genome have remained uncharted territory because of their complex structure. These include the centromeres, the chromosome regions where spindle fibers attach to guide chromosomes into the daughter cells each time a cell divides, as an organism grows from one cell to trillions. Now, for the first time, researchers have revealed the secrets of the Arabidopsis centromeres, shedding light on their evolution and providing insights into a paradox that has mystified scientists for decades. Their results were published today in Science.
“In this study, we resolve the sequence and structure of the centromeres of the most-studied plant species for genetics research: the one which we use for understanding the genetics of rice or corn or wheat or tomatoes and beyond,” said Michael Schatz, Bloomberg Distinguished Professor of Computer Science and Biology at Johns Hopkins University, who co-led the study with Ian Henderson, head of the Genetic and Epigenetic Inheritance in Plants Group at the University of Cambridge. “And though the research was in plants, it certainly has implications for human genetics, and understanding how human cells grow and divide so precisely.” The study also included co-first-author Michael Alonge, who recently completed his PhD in the Department of Computer Science at Johns Hopkins Whiting School of Engineering, working with Schatz.
Arabidopsis thaliana was adopted as a model plant due to its short generation time, small size, ease of growth and prolific seed production through self-pollination. Its fast life cycle and small genome make it well suited for genetics and for mapping key genes that underpin traits of interest. The small flowering plant, often found on roadsides, has led to a multitude of discoveries, and in 2000 it became the first plant to have its genome sequenced, apart from its centromeres, its telomeres (the protective structures at the ends of chromosomes), and a few other complex regions.
Since then, long-read sequencing technologies have advanced, allowing researchers to read the genome in pieces longer than 100,000 nucleotides instead of 100 to 200 nucleotides. This is thanks to the introduction of nanopore sequencing, which measures the electrical current across a membrane as nucleic acids are passed through a protein nanopore, a hollow structure inserted in that membrane. As DNA passes through the nanopore, different nucleotide bases change the current in distinct ways, and the resulting electrical signal is then decoded to give the DNA sequence.
These data, combined with algorithmic advances in assembling the reads, mean that solving the “genomic jigsaw puzzle” is suddenly possible in a way it wasn’t before. Critically, this also means researchers can now probe the genetic makeup of the centromere, which had previously proved a dead end because of its challenging structure.
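Why were repeats such a dead end for short reads? A toy model helps (our own illustration, not the assembly method used in the study): take a genome made of random, unique flanking sequence around a tandem array of identical repeat units, and ask what fraction of reads of a given length can be placed unambiguously, meaning their sequence occurs exactly once in the genome. The function names, the repeat unit and the sizes below are hypothetical choices for illustration.

```python
import random
from collections import Counter


def build_toy_genome(unit: str, copies: int, flank_len: int, seed: int = 0) -> str:
    """Hypothetical toy genome: random unique flanks around a perfect tandem repeat array."""
    rng = random.Random(seed)
    left = "".join(rng.choice("ACGT") for _ in range(flank_len))
    right = "".join(rng.choice("ACGT") for _ in range(flank_len))
    return left + unit * copies + right


def fraction_uniquely_placeable(genome: str, read_len: int) -> float:
    """Fraction of all read-length windows whose sequence occurs exactly once in the genome."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    unique = sum(1 for w in windows if counts[w] == 1)
    return unique / len(windows)


if __name__ == "__main__":
    # 50 copies of a 10-nt unit (500 nt of repeats) between two 200-nt unique flanks
    genome = build_toy_genome("GATTACAGGC", copies=50, flank_len=200)
    for read_len in (20, 100, 800):
        frac = fraction_uniquely_placeable(genome, read_len)
        print(f"read length {read_len:>3}: {frac:.0%} of reads map to a single position")
```

In this sketch, every short read that falls entirely inside the repeat array matches many positions, so it cannot be placed, while reads long enough to span the whole array and reach the unique flanks each land in exactly one spot. Real centromeric repeats are far longer and not perfectly identical, but the same intuition explains why very long reads made the centromere puzzle solvable.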
“It’s fantastic to be able to see into the centromeres for the first time and use this to understand their unusual modes of evolution,” Henderson says.
For decades, researchers have been trying to understand the paradox of how and why centromeric DNA evolves with extraordinary rapidity while remaining stable enough to perform its job during cell division. In contrast, other ancient, essential parts of the cell tend to evolve very slowly. This study, by revealing the genetic and epigenetic topography of Arabidopsis centromeres, marks a step change in our understanding of this paradox.
“What is amazing about it is all higher organisms use this process, including all 10 trillion cells in your body, as well as the quadrillions of cells in other plant and animal species. What is surprising about this is even though the function of centromeres was established and maintained over billions of years, the DNA sequence for centromeres is extremely variable; it’s actually one of the most variable parts of any genome,” Schatz says.
The study’s “maps” provide new insights into the “repeat ecosystem” found in the centromere, revealing an architecture of repeat arrays that has implications for how centromeres evolve. The authors’ model indicates that centromeres evolve through cycles of sequence duplication and diversification. The research team plans to use these maps as a foundation to understand how and why centromeres are evolving so rapidly.