Human Genome Project twenty years later
Scaling the highest peak of biological achievement: the first human genome sequenced in the world
Today you can sequence your entire genome in mere hours, but decoding the very first human genome sequence was an enormous undertaking involving thousands of scientists from around the world, in a project spanning more than a decade at an estimated cost of $3 billion of taxpayers’ money!
On this day (June 26th) in 2000, the completion of a working draft of the human genome reference was announced at the White House by President Bill Clinton and Britain’s Prime Minister Tony Blair, who were accompanied by representatives from two groups that completed the genome independently at the same time. Clinton declared, “We are here to celebrate the completion of the first survey of the entire human genome. Without a doubt, this is the most important, most wondrous map ever produced by humankind.” If you are a fan of genetics and DNA testing, this sure was a fun day full of great quotes from Clinton and others. What did Clinton mean by “completion of the first survey”? He meant that they were cheating a bit by making the announcement, as the genome was not completed (and in fact, it has never been “fully” completed to this day). The first draft DNA sequence of the human genome contained gaps and errors, but it represented about 95% of all genes, so it was certainly enough for the big fanfare, feel-good announcement. Ambassadors from the United Kingdom, Japan, France and Germany were in the audience representing the major contributing countries in this international endeavour, along with James Watson (who would later become the second person in history to have their genome decoded) as well as many other scientists involved in the Human Genome Project.
Clinton had a great day that day. He proclaimed, “In coming years, doctors increasingly will be able to cure diseases like Alzheimer's, Parkinson's, diabetes and cancer by attacking their genetic roots. Just to offer one example, patients with some forms of leukemia and breast cancer already are being treated in clinical trials with sophisticated new drugs that precisely target the faulty genes and cancer cells, with little or no risk to healthy cells. In fact, it is now conceivable that our children's children will know the term cancer only as a constellation of stars.” Perhaps the notion of not knowing what cancer is seems too optimistic (unless we are willing to rewire our genetic code into a new form of synthetic species) but in the two decades since that announcement, indeed the ability to sequence human genomes has ushered in a medical revolution, especially in the area of cancer treatment options.
Fighting for bragging rights: the world against one smart company
But who were the two groups that raced towards the first human genome sequence, and along the way became bitter rivals? The Human Genome Project, represented by Dr. Francis Collins, the director of the National Institutes of Health, was the publicly funded effort. The other group was a private company, Celera Genomics, represented by Dr. J. Craig Venter, who was its president at the time.
Celera was well equipped for this challenge against the entire scientific world because of Dr. Venter’s work at The Institute for Genomic Research, where he had previously participated in sequencing the first complete genome of any living organism - that of the Haemophilus influenzae bacterium. So he came with ample experience. He then became president of Celera, a company formed for the purpose of commercializing genomic information. Celera entered the race to decode the first human genome pretty late in the game, in 1998, whereas the Human Genome Project had commenced its efforts in 1990. To catch up, Celera adopted a new type of approach to decoding the DNA, called the shotgun approach, which focused on sequencing many fragments in parallel and then combining them into a whole using computer programs that could figure out where the different fragments overlapped in their sequence.
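To make the idea of stitching overlapping fragments together more concrete, here is a minimal toy sketch in Python. It is not the software Celera used; it is a simple greedy merge that assumes error-free fragments, all read from the same DNA strand, and no repeated sequences. The function names and the example fragments are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is also a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the largest overlap.

    Toy assumptions: no sequencing errors, one strand only, no repeats.
    Returns the remaining contigs (ideally a single reconstructed sequence).
    """
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Three made-up fragments, given out of order
fragments = ["CGTTAGC", "ATGGCGTA", "CGTACGTT"]
print(greedy_assemble(fragments))
```

Real genomes are full of repeated sequences and sequencing errors, which is why actual assembly programs are far more sophisticated than this sketch.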
This process was so successful that the Human Genome Project consortium was forced to adopt the new strategy or be left in the dust. At the time there was a lot of criticism and bitterness towards Celera from the public sector, because Celera was able to capitalize on freely available public data from years of scientific effort to catapult its own process of decoding the human genome and jump dramatically ahead in the race with the Human Genome Project consortium. Those arguments are somewhat ridiculous though. That is like saying that we would have preferred to take a decade longer to complete the human genome, and that it is not fair for a private company to use public data to improve scientific knowledge. After all, in 1998, when Celera decided to enter the competition and show the world how to “get it done”, not even 10% of the human genome had been decoded.
In the end, Celera managed to get the human genome done at approximately one tenth of the astronomical cost of the Human Genome Project and in just a fraction of the time. The final draft of the genome was published three years later.
In the final product, the human genome was made up from sequences of about 50 people, the bulk of which came from one volunteer, termed RP11, now thought to be of African American heritage. While the final version has undergone multiple enhancements, as the understanding of the complexity of the human genomes continues to grow, this has become the reference genome for most currently sequenced human genomes. If you have sequenced your genome, or will decide to sequence your genome, it will be this reference that will guide how the millions of pieces of your DNA code should be put back together to give you your own blueprint of life. You can decode your DNA without the use of a reference (termed de novo genome sequencing), but that requires different technology and you would be lucky if you could tap into such an approach (but if you are really so inclined, we can help you out). Practically all current human genomes that get decoded are assembled against this reference.
How many genes are in the human genome?
One of the big surprises that came out of the publication of the first human genome was how few genes the human genome actually contains: approximately 21,000. The original strategic plan of the Human Genome Project expected to discover at least 100,000 genes! Actually, the truth is that we still do not know the total number of genes in the human genome, as disagreements between different databases exist. By comparison, the recently published genome of wheat has over 100,000 genes! But the reason why we can have so few genes compared to other organisms is that our genes are made up of independent blocks of DNA that get copied and stitched together to produce RNA (it is the RNA that is then used as a blueprint for the production of proteins), and these blocks can be used in very diverse ways. There are alternative ways in which these gene blocks (they are called exons) are spliced together in the final RNA, so that one gene can produce multiple different versions of RNA (these are called transcripts), which can then produce different versions of proteins. Furthermore, there are alternative start sites and stop sites where the gene DNA is copied into RNA (called transcription start and transcription stop, respectively). So, while we might have a smaller number of genes, almost all human genes can be used for a number of alternative products. Thus, the reality is that we have no clue at the moment how many different transcripts truly exist and how many different proteins can be produced in this way.
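To get a feel for how quickly alternative splicing multiplies the number of possible products, here is a small toy sketch in Python. It is a deliberately simplified model: it assumes that the first and last exons are always kept and that every internal exon can independently be kept or skipped, which is only one of several splicing patterns seen in real genes. The function name and the exon labels are made up for illustration.

```python
from itertools import combinations


def exon_skipping_isoforms(exons: list[str]) -> list[tuple[str, ...]]:
    """List every transcript obtainable by skipping internal exons.

    Toy model: the first and last exons are always kept, and the order
    of the exons is never changed.
    """
    if len(exons) < 2:
        return [tuple(exons)]
    first, *internal, last = exons
    isoforms = []
    # Go from "keep every internal exon" down to "skip every internal exon"
    for n_kept in range(len(internal), -1, -1):
        for kept in combinations(internal, n_kept):
            isoforms.append((first, *kept, last))
    return isoforms


# A hypothetical gene with four exons
for transcript in exon_skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(transcript))
```

Under this simplified model a gene with n exons can produce up to 2^(n-2) transcripts, so a hypothetical gene with 12 exons would already allow 1,024 combinations, before alternative start and stop sites are even considered.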
The discovery of the genetic map has had important implications for the study of human diseases, which currently encompasses over 6,300 genetic disorders compared to only a few dozen known prior to the start of the Human Genome Project. It has not only enhanced our ability to treat and prevent disease, it has also transformed our understanding of non-coding DNA, the part of the genome that does not code for proteins and that makes up the vast majority of human DNA. This includes not only the complex web of regulation of the genes themselves, but also the discovery of RNAs that are not just used for protein production but function biologically in their own right.
The Human Genome Project has created a template for successful large-scale international scientific projects to benefit mankind. International scientific consortiums that drive discovery are now the norm rather than the exception. For example, efforts are now under way to unify all relevant digital medical knowledge in order to prevent disease and not just treat the outcome. In a bid to improve the scope of treatment, the collective molecular knowledge of the human body, together with monitoring of its behaviour, is being used to build predictive models of potential future outcomes based on molecular markers found in the body. We are already entering the era of personalized medicine, and with that era also comes an enhanced ability to practise preventive medicine.
Thanks to this unification of medically relevant information, data sources are no longer examined separately but together, to see how they correlate with one another for a better and deeper understanding of how the body functions in health and disease.
The future of human genomics
Where do we go from here?
While DNA testing is continuously expanding its reach in medical research, full genome sequencing of many more humans is still needed, especially the de novo sequencing mentioned above. A constant increase in human genomic data leads to constant growth in the understanding of genetic events affecting health. As this process of discovery becomes more refined, it is now allowing the study of complex genetic health conditions which can have contributing genetic factors from many different locations spread out across the genome. The sum influence of many such mutations is now being assembled into what are referred to as polygenic risk scores.
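In its simplest form, a polygenic risk score is a weighted sum: for each genetic variant, the number of risk copies a person carries (0, 1 or 2) is multiplied by a weight reflecting how strongly that variant is associated with the condition, and the results are added up. The Python sketch below illustrates only this arithmetic. The variant names and weights are invented for the example and do not come from any real study, and real scores involve many more variants and further statistical adjustments.

```python
def polygenic_risk_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted sum of risk-allele counts.

    dosages: copies of the risk allele carried at each variant (0, 1 or 2)
    weights: effect size assigned to each variant (hypothetical values here)
    Variants missing from `dosages` are treated as 0 copies.
    """
    return sum(weight * dosages.get(variant, 0) for variant, weight in weights.items())


# Made-up weights and one made-up person, purely for illustration
weights = {"variant_A": 0.5, "variant_B": 0.25, "variant_C": -0.125}
person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

print(polygenic_risk_score(person, weights))
```

A score like this only becomes meaningful when it is compared against the distribution of scores in a reference population.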
De novo sequencing is important as it helps uncover the vast complexity of the human genome that we have yet to fully grasp, far beyond what is currently captured by the human genome reference alone. In one recent example, de novo assembly of 31 human genomes revealed new non-reference DNA sequences equivalent to over 4% of the current human genome reference size. That is just one instance of discovery! The genetic diversity among different ethnicities can be so large that ethnicity-specific human genome references are being created around the world.
Using de novo assembly along with long-read DNA sequencing (technology that can decode very long stretches of DNA at a time) will help to close the remaining gaps in the human genome, most of which are concentrated around centromeres. Centromeres are the areas of DNA that are used for the separation of chromosomes during cell replication and division. Chromosomes are structures of tightly packed DNA formed for the purpose of cell division. This is often how DNA is presented to us in popular media, but in reality, it is like a massive tangle of yarn, although how that seemingly messy structure is actually governed still needs to be understood. The DNA around centromeres is made up of sequence codes repeated over and over, which makes its assembly challenging, and that is the reason why it has not been properly mapped. Thus far only the centromere of the Y chromosome has been mapped, and not that long ago. Over time we are likely to learn that each chromosome might have alternate structures among the different populations of the world, and many more surprise gaps will be filled. No one can truly know how much of the actual human genome sequence is still missing from what would be considered a complete assembly, but estimates range from 1 to 10 percent. A full understanding of these differences will be important for the elucidation of different patterns of disease predisposition among different ethnicities.
Another frontier to be confronted is achieving population-wide genomic comprehension, meaning having a grasp of the genomic identity of an entire population. This would have a profound impact on the provision of healthcare as it would allow providers to know with a higher precision what the expected clinical diagnoses could be for people affected with genetic predispositions to diseases – a good step towards enabling a more preventative approach to healthcare. This is quite an achievable goal as it only requires a small percentage of the population to be genetically sequenced to get a good genetic grasp on the entire population.
The final frontier, and perhaps the most controversial, will be the notion of altering our genetic code in a bid to improve our genetic destiny. We have already touched on this topic elsewhere, from a concept that Merogenomics supports, such as preventing the inheritance of genetic diseases (this concept is also supported by the majority of people in past polls), to a far more controversial concept like designer babies (a concept definitely not supported by the population in polls). But imagine if we could prevent aging as we know it. Would that be a concept to entice our species to re-engineer ourselves? That barrier of redesigning the genetic code in human beings has already been breached, to massive global condemnation.
How we move forward with this power to understand, map and even transform the human genome code depends on our collective concept of what we see as right and wrong at a given moment in time. The current uproar about past social and historical injustices against different groups of people, or the continuing exploitation of underserved and economically destitute members of our global society clearly indicates how much our moral compass can shift from generation to generation. Thus, we cannot know now how that might guide our future approach to genetic manipulation but luckily the current prevailing guidance promotes safety, greater depth of understanding prior to extravagant experimentation, and ever greater compassion towards participants.
On this twentieth anniversary of the unveiling of the first draft of the human genome, the future of genetics looks bright and promising. And that is just a beginning.
This article has been produced by Merogenomics Inc. and edited by Jason Chouinard, BSc. Reproduction and reuse of any portion of this content requires Merogenomics Inc. permission and source acknowledgment. It is your responsibility to obtain additional permissions from the third party owners that might be cited by Merogenomics Inc. Merogenomics Inc. disclaims any responsibility for any use you make of content owned by third parties without their permission.