The Human Genome Project: Hype meets reality

I’ve had the immense good fortune to have trained and ultimately become a physician-scientist during a time of staggering change in science. The pace of discovery and the number of paradigm shifts that have occurred just over the course of my career in medicine and science have been astonishing: microRNA, the shift from single-gene studies to genomics, the development of targeted therapies, and the completion of the Human Genome Project are but a few examples. Of these, the Human Genome Project is arguably the granddaddy of all the paradigm shifts that have revolutionized biomedical research. Back when I was in medical school and then later graduate school, it was inconceivable to me that we would ever be able to sequence the entire human genome in my lifetime. Back then, DNA sequencing was a tedious affair requiring tricky reactions, difficult-to-pour gels to separate DNA fragments, and hours of poring over radiographs and manually matching fragments. Indeed, my PhD thesis project in the early 1990s involved cloning a gene approximately 2,300 bases long; once it was isolated, it took months to sequence, and the tools to search DNA sequence databases to verify that it was a novel gene were primitive at best, involving e-mailing the sequence to an NIH server and then waiting for the results to come back. Does anyone remember doing that?

Arguably the most useful spinoffs of the Human Genome Project and Craig Venter’s competing project to sequence the human genome were the high-throughput techniques developed to sequence DNA rapidly and to match and line up the appropriate sequences to produce an actual sequence of each chromosome, as well as the computational tools to analyze the results. The results over the last decade have been nothing short of paradigm-changing. Add to that the ability to measure the expression levels of every gene in the human genome simultaneously on a chip, which, while not part of the genome project, did use similar technology and computational tools, and the revolution in molecular biology has been unprecedented. Virtually overnight, we have gone from studying single genes, looking for the effects of increasing or decreasing their expression and studying the functions of their individual protein products, to studying large numbers of genes grouped into networks of similar function and related signaling, tied together in “hubs” whose perturbation may be at the heart of much of the dysfunction leading to cancer and other diseases. There’s just one problem (well, there are several, but I’m going to address primarily one in this post), and it is described in an article by Nicholas Wade in the Sunday edition of the New York Times entitled A Decade Later, Genetic Map Yields Few New Cures:

Ten years after President Bill Clinton announced that the first draft of the human genome was complete, medicine has yet to see any large part of the promised benefits.

For biologists, the genome has yielded one insightful surprise after another. But the primary goal of the $3 billion Human Genome Project — to ferret out the genetic roots of common diseases like cancer and Alzheimer’s and then generate treatments — remains largely elusive. Indeed, after 10 years of effort, geneticists are almost back to square one in knowing where to look for the roots of common disease.

Disappointingly, this is more or less true. The Human Genome Project did spawn a revolution in studying the biology of disease. Unfortunately, that revolution has not yet extended to using the information to develop treatments for the diseases that plague humanity, particularly those that claim the most lives (heart disease and cancer) or take the greatest toll on quality of life (Alzheimer’s disease, for example). These are common diseases, particularly heart disease and cancer, although in all fairness it should be pointed out that cancer is not a single disease. Be that as it may, here’s an example Wade provides of how disappointing the results of the Human Genome Project have been thus far when applied to predicting or treating human disease:

One sign of the genome’s limited use for medicine so far was a recent test of genetic predictions for heart disease. A medical team led by Nina P. Paynter of Brigham and Women’s Hospital in Boston collected 101 genetic variants that had been statistically linked to heart disease in various genome-scanning studies. But the variants turned out to have no value in forecasting disease among 19,000 women who had been followed for 12 years.

The old-fashioned method of taking a family history was a better guide, Dr. Paynter reported this February in The Journal of the American Medical Association.

This is the paper to which Wade is referring. Basically, Paynter et al examined a cohort of 19,313 initially healthy women enrolled in the Women’s Genome Health Study and followed for a median of 12.3 years, constructing genetic risk scores from the National Human Genome Research Institute’s catalog of genome-wide association study (GWAS) results published between 2005 and June 2009. The endpoints were myocardial infarction, stroke, arterial revascularization, and cardiovascular death. Unfortunately, they found that the genetic risk score developed from the 101 single nucleotide polymorphisms (SNPs) did not predict cardiovascular disease as manifested by MIs, strokes, or the need for angioplasty or coronary artery bypass surgery, nor did it predict death from heart disease. Clearly, this study was quite disappointing, although the recent paper by Pinto et al on the genetics of autism may give some hope that this is not the final word, because that paper found that it was primarily rare genetic variants that were associated with autism. A major limitation of the Paynter study, as discussed by the authors, was that it looked only at common SNPs:

Limitations of our study merit consideration. As suggested by the strong effect of family history on cardiovascular disease risk, there is a substantial risk component due to genes and shared environment, which may be elucidated by future genetic research. While the NHGRI catalog is based on all available published genome-wide studies, these have focused to date only on common SNPs and, thus, we also were unable to assess the potential contributions of rare alleles. However, if only discovered through a major increase in sample size, it is possible that unidentified variants will have increasingly small effects.22 It also may be possible in the future to obtain stable estimates of the exact effect or HR for use in a weighted score and to find interactions between genes or within genes and other markers, both of which may improve predictive ability.
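To make concrete what a “genetic risk score” is, here is a minimal Python sketch of the two kinds of score the quoted passage contrasts. It assumes each person’s genotype is coded as the number of risk alleles (0, 1, or 2) carried at each SNP; the SNP names, hazard ratios, and function names are all hypothetical. The unweighted version simply counts risk alleles, which the authors’ mention of a possible future “weighted score” suggests is close to what they used; the weighted version multiplies each allele count by the log of that SNP’s per-allele hazard ratio. It is a sketch of the general idea, not a reproduction of Paynter et al’s analysis.

```python
import math
from typing import Dict

# Hypothetical genotype encoding: SNP name -> number of risk alleles carried (0, 1, or 2).
Genotype = Dict[str, int]

def unweighted_score(genotype: Genotype) -> int:
    """Count risk alleles across all SNPs; every allele adds 1, whatever its effect size."""
    for snp, count in genotype.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1, or 2, got {count}")
    return sum(genotype.values())

def weighted_score(genotype: Genotype, hazard_ratios: Dict[str, float]) -> float:
    """Weight each risk allele by the log of its (assumed) per-allele hazard ratio."""
    return sum(count * math.log(hazard_ratios[snp]) for snp, count in genotype.items())

# Illustrative values only; these are not real SNP effect estimates.
person = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
hrs = {"snp_a": 1.10, "snp_b": 1.25, "snp_c": 1.05}

print(unweighted_score(person))                # 3
print(round(weighted_score(person, hrs), 4))   # 0.2394, i.e. log(1.10**2 * 1.05)
```

Either score is just a number; whether it predicts heart attacks or strokes can only be established by following a cohort over time, which is exactly the test the 101-SNP score failed.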

As Wade states, describing a second project derived from the Human Genome Project, the HapMap, which catalogs common genetic variants in European, East Asian, and African genomes:

It now seems more likely that each common disease is mostly caused by large numbers of rare variants, ones too rare to have been cataloged by the HapMap.

Which is exactly what Pinto et al found regarding autism, by the way.

Still, it’s hard to rate attempts thus far to apply findings from the Human Genome Project to predicting or treating disease as anything more than highly disappointing, at least to the public who funded the project. Part of the problem was the hype surrounding the Human Genome Project during the 1990s, when it was being carried out, and particularly shortly after its first results were announced in 2000 and published in 2001. As Wade points out, Francis Collins, who was in charge of the Genome Project at the time, was as guilty as anyone of feeding this hype. Remember, he predicted that the genetic diagnosis of diseases would be accomplished in ten years (i.e., right about now) and that five years after that the treatments and cures would start rolling out, something that now appears unlikely. Clearly, those grand predictions have not panned out to the extent expected in those heady days right after the human genome sequence and map were first published. The pharmaceutical industry has spent billions of dollars, as Wade points out, and by and large failed to come up with the expected results, largely because the genetics of most common human diseases is far more complex than we had expected. Wade quotes Harold Varmus, who gets it quite right:

“Genomics is a way to do science, not medicine,” said Harold Varmus, president of the Memorial Sloan-Kettering Cancer Center in New York, who in July will become the director of the National Cancer Institute.

The last decade has brought a flood of discoveries of disease-causing mutations in the human genome. But with most diseases, the findings have explained only a small part of the risk of getting the disease. And many of the genetic variants linked to diseases, some scientists have begun to fear, could be statistical illusions.

None of this is surprising to those who kept a level head on their shoulders ten years ago. For one thing, as has been pointed out ad nauseam on this blog, whenever you look at large numbers of anything and try to link them to something, there will be many false positives and a lot of noise. Because of the enormous amount of data generated in GWAS, it’s not at all surprising that, statistical tests notwithstanding, most of the associations detected would be due to chance or statistical flukes. This is particularly true because scientists don’t actually sequence the genomes of people in these studies. Rather, they look at sites in the genome where many people have a variant bit of DNA, known as a single nucleotide polymorphism, or SNP. When you start looking for differences in 1.2 million SNPs, you will find them. Lots of them. It isn’t the SNPs per se that tell us a lot, but rather the genes implicated by the SNPs and, more importantly, the biological pathways and functions of the networks of genes implicated by them. That’s why I liked Pinto et al so much. That study did not just implicate individual variants; it identified potential biological pathways that are altered in autism and autism spectrum disorders. This is information that scientists can sink their teeth into in order to really understand the biology of autism, and from an understanding of the biology, treatments will eventually emerge. Moreover, as has been pointed out, the cost of sequencing a genome has fallen dramatically over the last decade. Within the next year, the cost is expected to fall to between $5,000 and $10,000 per genome, and I have been to talks where it was predicted that within three years the cost will fall further to around $1,000.
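The arithmetic behind “you will find them. Lots of them” is easy to sketch. Assuming no SNP is truly associated with disease, null p-values are uniformly distributed between 0 and 1, so testing 1.2 million SNPs at the conventional p < 0.05 cutoff yields about 60,000 “hits” by chance alone. The hypothetical Python helpers below compute that expectation, check it with a seeded simulation, and show the Bonferroni-corrected per-test cutoff (0.05 divided by the number of tests), which lands in the same range as the 5 × 10^-8 genome-wide significance threshold conventionally used in GWAS. The sketch assumes the tests are independent, which real SNPs are not (nearby SNPs are correlated), so treat it as an illustration rather than a model of any actual study.

```python
import random

def expected_false_positives(n_tests: int, alpha: float) -> float:
    """With no true associations, each test crosses the alpha cutoff with probability alpha."""
    return n_tests * alpha

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

def simulate_null_hits(n_tests: int, alpha: float, seed: int = 0) -> int:
    """Draw one uniform null p-value per SNP and count how many fall below alpha by chance."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_tests) if rng.random() < alpha)

n_snps = 1_200_000
print(expected_false_positives(n_snps, 0.05))   # about 60,000
print(simulate_null_hits(n_snps, 0.05))          # close to 60,000
print(bonferroni_threshold(n_snps))              # about 4.2e-08
```

The correction buys protection against chance findings at the price of needing very large effects or very large samples to reach significance, which is one reason rare variants with modest effects are so hard to find.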

I’d pay $1,000 to have my genome sequenced. That’s cheaper than a typical MRI scan.

One thing that I couldn’t help but notice in the blogospheric discussions of this article is that Wade’s description of evolution, as it relates to the Human Genome Project and to applying its results to human disease, came under the most fire, while the broader medical implications of the article were mostly ignored. In particular, Larry Moran, P.Z. Myers, and Jonathan Eisen are peeved that Wade commented that the number of genes in humans is “astonishingly small” compared to that of “lower” animals like the roundworm and the fruit fly, which have numbers of genes comparable to ours. I guess it’s just the difference between my perspective as a physician, which led me to see the main point of the article as the difficulty we’ve discovered over the last decade in translating genetic information into treatments for common human diseases, and that of evolutionary biologists. In fact, the whole bit by Wade comparing the genomes of worms and flies to the human genome struck me as almost a throwaway point not necessary to the article, and leaving it out would have kept the science blogosphere from being distracted from the article’s main point. While I can see Larry’s, P.Z.’s, and Jonathan’s points and did cringe inwardly just a bit when Wade made that comparison, I can’t help but think that they are missing the forest for the trees in this particular instance. I would posit that the forest is this statement in Wade’s article:

As more people have their entire genomes decoded, the roots of genetic disease may eventually be understood, but at this point there is no guarantee that treatments will follow. If each common disease is caused by a host of rare genetic variants, it may not be susceptible to drugs.

“The only intellectually honest answer is that there’s no way to know,” Dr. Lander said. “One can prefer to be an optimist or a pessimist, but the best approach is to be an empiricist.”

In other words, what we are finding as a result of the Human Genome Project is that the actual physical sequencing and deducing of the sequence of the human genome was the easy part of the project. Figuring out what all those genes do, how they do it, how they interact, and what perturbations in them lead to disease is going to be much harder than the sequencing, which, when it comes down to it, was primarily a technical, chemical, and engineering problem. Further, figuring out how to intervene will be even more difficult than figuring out the function. It may turn out that, for many common diseases, how genes are regulated is far more important than the actual sequences of the genes. Even so, the Human Genome Project and the projects derived from it, related to it, or spun off from it have been a boon to basic science in so many ways, particularly in comparative genomics. The more organisms that have their genomes sequenced, the more we learn about how different sets of genes result in different phenotypes.

Indeed, as I’ve said on other occasions, our understanding of breast cancer has undergone a sea change over the last decade, largely due to results and technology derived from the Human Genome Project, as well as our ability to measure the expression levels of every gene in the genome simultaneously using cDNA microarray analysis (also called whole genome expression profiling), a technology developed as the Human Genome Project was in its later stages. For example, where once we simply classified tumors by whether or not they made the estrogen receptor, whole genome expression profiling has identified biological subtypes of breast cancer based on gene expression, including the less aggressive luminal subtypes and the more aggressive basal subtypes. Multiple prognostic assays based on gene expression have been developed, the most commonly used and reliable of which is the Oncotype DX assay, which started out with 250 genes linked to breast cancer progression selected from the Human Genome Project. That set was whittled down to a 21-gene assay that can be performed on paraffin-embedded tissue and that reliably predicts prognosis and which women with estrogen receptor-positive, node-negative breast cancer do and do not require chemotherapy. We use this test right now. Coming into use is a test known as the MammaPrint assay, which looks at 70 genes and can predict the risk of distant metastases.

In the end, we have to remember that the translation of basic science discoveries into treatments actually used in humans typically takes on the order of 24 years, as John Ioannidis has documented, and many of the potentially useful discoveries based on the Human Genome Project probably haven’t even been made yet. What was wrong was not the Human Genome Project itself but rather our expectations of how fast the project would bear fruit in the form of amazing new treatments for diseases based on personal genomics. We have learned that, as is so often the case, our preconceptions about nature and genetics from ten years ago were far too simplistic and optimistic. Indeed, given how complex the interplay between genes, proteins, and gene regulation is, we may actually not be doing so badly, and we may not yet have the resources, mathematical models, and computing power to fully exploit genomic medicine based on the Human Genome Project and its spinoffs. Given that in 20 years I’ll be in my late 60s, I certainly hope that by then we will have managed to use the fruits of the Human Genome Project to improve the care of common diseases like heart disease and cancer.