Summary: Researchers successfully sequenced the entire Y chromosome, previously considered the most elusive part of the human genome.
This feat enhances DNA sequencing accuracy for this chromosome, aiding the identification of genetic disorders. Using state-of-the-art technologies, the team pieced together over 62 million letters of genetic code.
This breakthrough, in tandem with the previous reference genome T2T-CHM13, offers the first complete genome for those with a Y chromosome.
Over half of the Y chromosome contains very repetitive DNA, making it especially challenging to sequence.
The completed Y chromosome sequence, termed T2T-Y, paired with T2T-CHM13, presents the first-ever complete genome for individuals with a Y chromosome.
The newfound understanding of the Y chromosome has the potential to uncover previously unknown genes and their functionalities, especially related to fertility and certain genetic disorders.
What was once the final frontier of the human genome — the Y chromosome — has just been mapped out in its entirety.
Led by the National Human Genome Research Institute (NHGRI), a team of researchers at the National Institute of Standards and Technology (NIST) and many other organizations used advanced sequencing technologies to read out the full DNA sequence of the Y chromosome — a region of the genome that typically drives male reproductive development.
The results of a study published in Nature demonstrate that this advance improves DNA sequencing accuracy for the chromosome, which could help identify certain genetic disorders and potentially uncover the genetic roots of others.
DNA sequencing isn’t as simple as reading genetic material from a genome’s beginning to its end. DNA gets chopped up when it is extracted from cells, and even the best sequencing equipment can handle only relatively small bits of DNA at a time. So researchers and clinicians rely on special software to piece together fragments of sequenced code in the correct order, like a puzzle.
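To make the puzzle idea concrete, here is a minimal sketch of one classic textbook approach, greedy overlap merging. It is an illustration only, not the software used in the study; the function names and the toy fragments are assumptions made up for this example, and real assemblers must also cope with sequencing errors, both DNA strands and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two fragments that share the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; leave the rest unjoined
        merged = frags[best_i] + frags[best_j][best_size:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return max(frags, key=len)


# Hypothetical toy fragments cut from one short sequence, given out of order.
toy_fragments = ["GTACGTTA", "CGTTAGC", "ATGCGTAC"]
assembled = greedy_assemble(toy_fragments)
print(assembled)
```

The greedy strategy works on this toy input because every overlap is unambiguous. When many fragments look alike, the largest overlap is no longer a reliable clue to the correct neighbor.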
A reference genome is a separate, already pieced-together genome that serves as a guide, similar to the pictures on the front of puzzle boxes. And because 99.9% of our species’ genetic code is shared, any human genome would closely match a reference.
Last year, a team from the Telomere-to-Telomere (T2T) consortium, which is made up of experts from dozens of organizations such as NIST, generated the most complete reference genome at the time by using new sequencing technologies to crack previously indecipherable regions of the genome. But the cells used in that work did not contain the most puzzling piece of all, the Y chromosome.
“Chromosomes all contain sections of very repetitive DNA, but well over half of the Y chromosome is like that,” said study co-author Justin Zook, who leads NIST’s Genome in a Bottle (GIAB) consortium. “If you use the puzzle analogy, a lot of the Y chromosome looks like the backgrounds often do, where all the pieces look really similar.”
With this new endeavor, T2T was not starting at zero as the GIAB had already gotten the ball rolling.
The GIAB’s mission is to produce test materials, or benchmarks, that can be used to evaluate sequencing technologies or methods. The materials themselves are highly accurate readouts of specific genes that can act as an answer key for checking the results of a particular sequencing method.
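The answer-key idea can be sketched in a few lines: compare the variants a sequencing method reports against a trusted benchmark set and count the matches, the false alarms and the misses. The sketch below is a simplified illustration under stated assumptions. The variant tuples are invented for the example, and real benchmarking tools also handle details such as equivalent representations of the same variant and restricting the comparison to confidently characterized regions.

```python
# A variant is described here as (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def score_calls(truth: set[Variant], calls: set[Variant]) -> dict[str, float]:
    """Score a set of variant calls against a benchmark ("answer key") set."""
    true_pos = len(truth & calls)    # reported and in the benchmark
    false_pos = len(calls - truth)   # reported but not in the benchmark
    false_neg = len(truth - calls)   # in the benchmark but missed
    precision = true_pos / (true_pos + false_pos) if calls else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    return {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical benchmark and hypothetical output of a method under test.
benchmark = {
    ("chrY", 100, "A", "G"),
    ("chrY", 200, "C", "T"),
    ("chrY", 300, "G", "A"),
    ("chrY", 400, "T", "C"),
}
method_calls = {
    ("chrY", 100, "A", "G"),
    ("chrY", 200, "C", "T"),
    ("chrY", 300, "G", "A"),
    ("chrY", 555, "A", "T"),  # not in the benchmark
}
scores = score_calls(benchmark, method_calls)
print(scores)
```

Precision answers "how many reported variants were right?" and recall answers "how many real variants were found?", which is why a benchmark needs to be both accurate and as complete as possible.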
NIST has rigorously analyzed several individual human genomes to create its benchmarks. While GIAB has not yet produced a benchmark for the Y chromosome specifically, the consortium has studied one genome extensively, accumulating the largest collection of Y chromosome data prior to the new study.
That data served as a jumping-off point for the new study’s authors, who focused their analysis on the best understood GIAB Y chromosome. They examined the sample with a combination of cutting-edge technologies — namely high fidelity and nanopore sequencing — that make the DNA fragment puzzle pieces larger and thus easier to assemble.
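Why larger pieces help can be shown with a toy example. In the sketch below, a made-up "chromosome" consists of a tandem repeat between two unique flanks. A short read that falls entirely inside the repeat fits in several places, while a long read that spans the repeat and reaches into both flanks fits in only one. The sequence and the helper name are assumptions for illustration, and exact string matching stands in for the error-tolerant alignment that real tools perform.

```python
def placements(sequence: str, read: str) -> list[int]:
    """Return every start position where `read` matches `sequence` exactly."""
    hits = []
    start = sequence.find(read)
    while start != -1:
        hits.append(start)
        start = sequence.find(read, start + 1)
    return hits


# Hypothetical toy sequence: unique flank + six copies of a repeat unit + unique flank.
left_flank, repeat_unit, right_flank = "ACGTTGCA", "GATTA", "CCGGATCG"
toy = left_flank + repeat_unit * 6 + right_flank

short_read = repeat_unit * 2   # 10 letters, lies entirely inside the repeat
long_read = toy[5:41]          # 36 letters, spans the repeat and touches both flanks

print(len(placements(toy, short_read)), "possible placements for the short read")
print(len(placements(toy, long_read)), "possible placement for the long read")
```

The short read is like a piece of plain blue sky in a jigsaw puzzle; the long read carries enough of the surrounding picture to be placed with confidence.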
A machine-learning analysis tool and a gamut of other advanced programs helped the team identify and assemble the pieces of the chromosome. More than 62 million letters of genetic code later, the authors had spelled out the GIAB Y chromosome front to back.
The researchers pitted their complete Y chromosome sequence, named T2T-Y, against the most widely used reference genome’s Y chromosome parts, which are riddled with stretches of absent code. Using them both as guides for sequencing a diverse group of over 1,200 separate genomes, they found that T2T-Y drastically improved the outcomes.
T2T-Y, in combination with the group’s previous reference genome, T2T-CHM13, represents the world’s first complete genome for the half of the population with a Y chromosome.
The newest addition could be useful in identifying and diagnosing the few known conditions related to genes in the Y chromosome. What’s more, the new reference could shed light on new genes and their functions.
“There are certainly aspects of fertility and some genetic disorders that are connected to genes in the Y chromosome,” Zook said. “But because it’s been so hard to analyze up to this point, we may not even know yet just how important the Y chromosome is.”
At NIST, Zook and his fellow GIAB researchers have developed a new benchmark based on the X and Y chromosomes assembled by T2T to help translate the potential impact of the new reference material into reality.
About this genetics research news
Author: Ben Stein
Source: NIST
Contact: Ben Stein – NIST
Image: The image is credited to Neuroscience News
The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications.
As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished.
Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region.
We have combined T2T-Y with a previous assembly of the CHM13 genome and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.