
The Human Genome Project, which published its results 20 years ago last month, was a milestone in biology. It was also given a somewhat misleading name. After all, there is no such thing as “the” human genome. Instead, there are 8 billion individual humans, who share the vast majority of their DNA but not all of it. The genome published by the Human Genome Project in 2003 was assembled from a dozen anonymous blood donors in and around Buffalo, in New York State.
But there is more to life than Buffalo. That, in short, is the point of this week’s publication in Nature of a set of 47 new “reference” genomes taken from individuals on four continents (Africa, both the Americas and Asia). The idea of the Human Pangenome Project, the organization behind the publications, is that instead of relying on a single “reference” genome, it would be better to have several, and to ensure that between them they capture as much of the genetic diversity of Homo sapiens as possible.
Compared with the total size of the genome, the amount of diversity in question is small. Two people chosen at random will share approximately 99.6% of their DNA. It is because of this similarity that the original genome produced by the Human Genome Project has proved so useful. Its annotated strings of genetic code serve as a baseline against which other genomes can be compared to look for variations, whether harmful or beneficial.
Yet although human beings are mostly alike, their differences matter. For example, a relatively recent mutation means that adults whose ancestors came from northern Europe, or from parts of India and the Middle East, are more likely to be able to digest lactose (the sugar found in milk) than adults whose ancestors came from elsewhere. Which version should be considered the standard?
Sometimes the limitations of using a single reference have direct medical consequences. Take, for example, a set of genes called HLA, which are involved in directing the immune system. They are highly variable, and mutations in them have been associated with autoimmune diseases such as type-1 diabetes. A study published in 2015 found that, because many gene-sequencing technologies are not completely accurate, comparing readouts from this region with a single reference genome produced mistakes about 20% of the time. Another paper, published in 2022, found that relying on a reference genome means that the details of some gene variants found in people of African ancestry, which seem to be linked to cancer, are poorly understood.
In an age of home gene-testing kits, made possible by falling sequencing costs (see chart), 47 genomes may not sound impressive. But existing sequencing technologies produce incomplete results. They rely on reading short snippets of DNA, and do not deal well with the long, repetitive regions that dot the genome. As Evan Eichler, a geneticist at the University of Washington, said at a press conference: “There are complex forms of [genetic] variation where we know the current technology doesn’t do a good job… it misses about two-thirds of them.” The Pangenome Project uses new, more accurate methods. These allow researchers to spot variants that might otherwise be missed, and to gain a better understanding of how, exactly, mutations arise.
The new genomes, then, represent a major improvement on the status quo. But gaps remain. All the genomes were produced from material donated to the 1000 Genomes Project, a collection of anonymous samples begun in 2008. That collection lacks donations from the Pacific islands and the Middle East, something the researchers plan to fix. But maximizing diversity does not mean sampling every part of the world equally. Most human genetic diversity is found within Africa, the ancestral homeland of the species. (The rest of the world’s people are descendants of a relatively small group that left Africa between 50,000 and 70,000 years ago.)
The researchers do not intend to list every genetic variation. That would be a Sisyphean task: as Tobias Marschall, a computational geneticist at Heinrich Heine University, points out, each child is born with dozens of mutations that neither of its parents has. Benedict Paten, a geneticist at the University of California, Santa Cruz, and one of the authors of this week’s batch of papers, says the aim is to reach 350 high-quality genomes. That should allow researchers to capture the vast majority of the genetic variation thought to be out there, and give humanity a more representative picture of one of its favorite research subjects: itself.