
Precision medicine: Hype over hope?

I am fortunate to have become a physician in a time of great scientific progress. Back when I was in college and medical school, the thought that we would one day be able to sequence the human genome (and now sequence hundreds of cancer genomes), to measure the expression of every gene in the genome simultaneously on a single “gene chip,” and to assess the relative abundance of every RNA transcript, coding and noncoding (such as microRNAs), simultaneously through next generation sequencing (NGS) techniques was considered, if not science fiction, so far off in the future as to be unlikely to affect medicine during my career. Yet here I am, mid-career, and all of these are a reality. The cost of rapidly sequencing a genome has plummeted. The first human genome cost nearly $3 billion to sequence, while recent developments in sequencing technology have brought that cost down to the point where the “$1,000 genome” is within sight, if not already here, as cost data published by the National Human Genome Research Institute illustrate. Whether the “$1,000 genome” is truly here or not, the price is down to a few thousand dollars. Compare that to the cost of, for instance, the OncoType DX 21-gene assay for estrogen receptor-positive breast cancer, which costs nearly $4,000 and is paid for by insurance because its results can spare many women from even more expensive chemotherapy.

So, ready or not, genomic medicine is here, whether or not we know enough to interpret the results in individual patients and use them to those patients’ benefit, so much so that President Obama announced a $215 million plan for research in genomic mapping and precision medicine known as the Precision Medicine Initiative. Meanwhile, the deeply flawed yet popular 21st Century Cures bill, which passed the House of Representatives, bets heavily on genomic research and precision medicine. As I mentioned when I discussed the bill, its major flaw is not so much the genomic medicine funding as its underlying assumption that encouraging the FDA to decrease the burden of evidence needed to approve new drugs and devices will magically lead to an explosion in “21st century cures,” the same old antiregulatory wine in a slightly new bottle. Be that as it may, one way or the other, the federal government is poised to spend a lot of money on precision medicine.

Because I’m a cancer doctor, and because oncology is the area of medicine in which precision medicine is being hyped the hardest, it’s hard for me not to think that the sea change going on in medicine really hit the national consciousness four years ago. That was when Walter Isaacson’s biography of Steve Jobs revealed that, after his cancer had recurred as metastatic disease in 2010, Jobs had consulted with research teams at Stanford, Johns Hopkins, and the Broad Institute to have the genomes of his cancer and his normal tissue sequenced, becoming one of the first twenty people in the world to have this information. At the time (2010-2011), each genome sequence cost $100,000, which Jobs could easily afford. Scientists and oncologists used this information to choose various targeted therapies for Jobs throughout the remainder of his life, and Jobs met with all of his doctors and researchers from the three institutions working on the DNA from his cancer at the Four Seasons Hotel in Palo Alto to discuss the genetic signatures found in his cancer and how best to target them. Jobs’ case, as we now know, was a failure. However much his team tried to stay one step ahead of his cancer, the cancer caught up and passed whatever they could do.

That’s not to say that there haven’t been successes. For instance, in 2012 I wrote about Dr. Lukas Wartman, at the time a recently minted oncologist who had been diagnosed with acute lymphoblastic leukemia as a medical student, was successfully treated, but relapsed five years later. He underwent an apparently successful bone marrow transplant, but his leukemia recurred yet again. At that point, there appeared to be little that could be done. However, Dr. Timothy Ley at The Genome Institute at Washington University in St. Louis decided to do something radical. He sequenced the genes of Wartman’s cancer cells and normal cells:

The researchers on the project put other work aside for weeks, running one of the university’s 26 sequencing machines and supercomputer around the clock. And they found a culprit — a normal gene that was in overdrive, churning out huge amounts of a protein that appeared to be spurring the cancer’s growth.

That was in 2011 as well. Today, the sequencing could be done much more rapidly. In any case, Ley identified a gene that was overactive and could be targeted by a new drug for kidney cancer. Wartman’s cancer went into remission, and he is now the assistant director of cancer genomics at Washington University.

The technology now, both in terms of sequencing and bioinformatics, has advanced enormously even since 2011. With it has advanced the hype. But how much is hype and how much is really hope? Let’s take a look. Also, don’t get me wrong. I do believe there is considerable promise in precision medicine. However, having personally begun my research career in the 1990s, when angiogenesis inhibitors were being touted as the cure to all cancer (and we know what happened there), I am also skeptical that the benefits can ever live up to the hype.

The origin of “precision” medicine

“Precision medicine” is now the preferred term for what used to be called “personalized medicine.” From my perspective, it is a more accurate description of what “personalized medicine” meant, given that many doctors objected to the term because they felt that every good doctor practices personalized medicine. Even so, “precision medicine” is no less a marketing term than was “personalized medicine.” If you don’t believe this, look at the hype on the White House website:

Today, most medical treatments have been designed for the “average patient.” In too many cases, this “one-size-fits-all” approach isn’t effective, as treatments can be very successful for some patients but not for others. Precision medicine is an emerging approach to promoting health and treating disease that takes into account individual differences in people’s genes, environments, and lifestyles, making it possible to design highly effective, targeted treatments for cancer and other diseases. In short, precision medicine gives clinicians new tools, knowledge, and therapies to select which treatments will work best for which patients.

If you think this sounds like what alternative medicine quacks (but I repeat myself) routinely say about “conventional medicine,” you’d be right. It’s not that precision medicine advocates don’t have a germ of a point, but they fail to put this criticism into historical context. Medicine has always been personalized or “precision.” It’s just that in the past the only tools we had to personalize our care were things like family history, comorbid conditions, patient preferences, and aspects of the patient’s history that might affect which treatment would be most appropriate. In other words, our tools to personalize care weren’t that “precise,” making our precision far less than we as physicians might have liked. Genomics and other new sciences offer the opportunity to change that, but at the risk that too much information will paralyze decision making. Still, at its best, precision medicine offers the opportunity to “personalize” medicine in a science-based manner, rather than the “make it up as you go along” and “pull it out of my nether regions” method of so many alternative medicine practitioners. It could also offer the clinical trial tools to do it, such as NCI-MATCH. At its worst, precision medicine is companies jumping the gun and selling genomic tests directly to consumers without an adequate scientific basis to know what the results mean or what should be done with them.

In any case, up until 2011, the term “personalized medicine” tended to describe a form of medicine not yet in existence, in which each patient’s unique genomic makeup would serve as the basis for guiding therapy. Then the National Academy of Sciences committee issued a report, “Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease,” which advocated the term “precision medicine” and differentiated it from “personalized medicine” thusly:

“Personalized medicine” refers to the tailoring of medical treatment to the individual characteristics of each patient. It does not literally mean the creation of drugs or medical devices that are unique to a patient, but rather the ability to classify individuals into subpopulations that differ in their susceptibility to a particular disease or their response to a specific treatment. Preventive or therapeutic interventions can then be concentrated on those who will benefit, sparing expense and side effects for those who will not. (PCAST 2008) This term is now widely used, including in advertisements for commercial products, and it is sometimes misinterpreted as implying that unique treatments can be designed for each individual. For this reason, the Committee thinks that the term “Precision Medicine” is preferable to “Personalized Medicine” to convey the meaning intended in this report.

As I said, “precision medicine” is a marketing term, but it’s actually a better marketing term than “personalized medicine” because it is closer to what is really going on. That’s why I actually prefer it to “personalized medicine,” even though I wish there were a better term. Whatever it is called, however, the overarching belief that precision medicine is the future of medicine has led to what has been called an “arms race” or “gold rush” among academic medical centers to develop precision medicine initiatives, complete with banks of NGS machines, new departments of bioinformatics and genomics, and, of course, big, fancy computers to analyze the many petabytes of data produced, so much data that it’s hard to find enough media on which to store it, never mind knowing what to do with it all. Genomic sequencing is producing so much data that IBM’s Watson is being used to analyze cancer genetics. It’s not for nothing that precision medicine is being likened to biology’s “moon shot”—and not always in a flattering way.

So what is the real potential of precision medicine?

Complexity intrudes

I discussed some of the criticism of precision medicine when I discussed the 21st Century Cures Act three weeks ago. I’ll try to build on that, but first a brief recap. Basically, I mentioned that I was of a mixed mind about the bill’s emphasis on precision medicine, bemoaning how now, at arguably the most exciting time in the history of biomedical research, a dearth of funding means that, although we’ve developed all these fantastically powerful tools to probe the deepest mysteries of the genome and use the information to design better treatments, scientists lack the money to do so. I even likened the situation to owning a brand new Maserati with no gasoline to be found to drive it, or maybe to having the biggest, baddest car of all in the world of Mad Max but having to fight for precious gasoline to run it. I also noted that I thought precision medicine was overhyped (as I am noting again in this post), referencing skeptical takes on precision medicine in recent op-eds by Michael Joyner in The New York Times, Rita Rubin in JAMA declaring precision medicine to be more about politics, Cynthia Graber in The New Yorker, and Ronald Bayer and Sandro Galea in The New England Journal of Medicine. Basically, the number of conditions whose outcomes can be greatly improved by targeting specific mutations is relatively small, and the likely impact is far smaller than what could be achieved through duller, less “sexy” interventions, such as figuring out how to get people to lose weight, exercise more, and drink and smoke less. The question is whether focusing on the genetic underpinnings of disease will provide the “most bang for the buck,” given how difficult and expensive targeted drugs are to develop.

Over the weekend, there was a great article in The Boston Globe by Sharon Begley entitled “Precision medicine, linked to DNA, still too often misses,” which gives an idea of just how difficult reaching this new world of precision medicine will be. It’s the story of a man named John Moore, who lives in Apple Valley, UT. Moore has advanced melanoma and participated in a trial of precision medicine for melanoma. His outcome shows both the promise and the limitations of such approaches:

Back in January, when President Obama proposed a precision medicine initiative with a goal of “matching a cancer cure to our genetic code,” John Moore could have been its poster child. His main tumors were shrinking, and his cancer seemed to have stopped spreading because of a drug matched to the cancer’s DNA, just as Obama described.

This summer, however, after a year’s reprieve, Moore, 54, feels sick every day. The cancer — advanced melanoma like former president Jimmy Carter’s — has spread to his lungs, and he talks about “dying in a couple of months.”

The return and spread of Moore’s cancer in a form that seems impervious to treatment shows that precision medicine is more complicated than portrayed by politicians and even some top health officials. Contrary to its name, precision medicine is often inexact, which means that for some patients, it will offer false hope rather than a cure.

On the other hand, in the Intermountain Healthcare study in which Moore participated, after two years, progression-free survival in the group with advanced cancer treated using precision medicine techniques was nearly twice that of those who underwent standard chemotherapy, 23 months versus 12 months. Moore himself reports that, with a pill, he had one year of improved health and quality of life before his cancer started progressing again. It’s not yet clear whether this will translate into an improvement in overall survival, the gold standard endpoint, but it’s a very promising start. It is, however, not a miraculous start.

Here’s the problem. I’ve alluded to it before. Cancer genomes are messed up. Really messed up. And, as they progress, thanks to evolution they become even more messed up, and messed up in different ways, so that the tumor cells in one part of a tumor are messed up in a different way than the tumor cells in another part of the tumor, which are messed up in a different way than the metastases. It’s called tumor heterogeneity.

Now enter the problem in determining which mutations are significant (commonly called “driver” mutations) and which are secondary or “just along for the ride” (commonly called “passenger” mutations):

But setbacks like Moore’s show that genetic profiling of tumors is, at this point, no more a cure for every cancer than angiogenesis inhibitors, which cut off a tumor’s blood supply, or other much-hyped treatments have been.

A big reason is that cancer cells are genetically unstable as they accumulate mutations. As a result, a biopsy might turn up dozens of mutations, but it is not always clear which ones are along for the ride and which are driving the cancer. Only targeting the latter can stop a tumor’s growth or spread.

Knowing which mutation is the driver and which are passenger mutations is so complicated that the Intermountain researchers established a “molecular tumor board” to help.

Composed of six outside experts in cancer genomics, the board meets by conference call to examine the list of a patient’s tumor mutations and reach a consensus about which to target with drugs. Tumor profiling typically finds up to three driver mutations for which there are known drugs, and the board reviews data on how well these drugs have worked in other patients with similar tumors.

And:

The next difficulty, Nadauld said, is that “the mutations may be different at different places in a tumor.” But oncologists are reluctant to perform multiple biopsies. The procedures can cause pain and complications such as infection, and there is no rigorous research indicating how many biopsies are necessary to snare every actionable mutation.

But a cancer-driving mutation that happens to lie in cells a mere millimeter away from those that were biopsied can be missed. Similarly, cancer cells’ propensity to amass mutations means that metastases, the far-flung descendants of the primary tumor, might be driven by different mutations and therefore need different drugs.

Or, as I like to say: Cancer is complicated. Really complicated. You just won’t believe how vastly, hugely, mind-bogglingly complicated it is. I mean, you may think it was tough to put a man on the moon, but that’s just peanuts to curing cancer, especially metastatic cancer. (Apologies to Douglas Adams.) Because of this, precision medicine as it exists now can lead to what Dr. Don S. Dizon calls a new kind of disappointment when genomic testing fails to identify any driver mutations for which targeted drugs exist because “discovery is an ongoing process and for many, we have not yet discovered the keys that drive all cancers, the therapies to address those mutations, and the tools to predict which treatment will afford the best response and outcome—an outcome our patients (and we) hope will mean a lifetime of living, despite cancer.”

Too true.

None of this is to say that precision medicine can’t be highly effective in cancer. I’ve already described one patient for whom it was. It’s also important to consider that even an extra year of life taking a pill with few side effects is “not too shabby” if the alternative is death a year sooner. Prolonging life with good quality is a favorable outcome, even if the patient can’t be saved in the end.

What is precision medicine, anyway?

As I thought about precision medicine while writing this post, one thing that stood out to me is that, although precision medicine is rather broadly defined, in the public eye (and, indeed, in the eyes of most physicians and scientists) its definition is much narrower. That narrower definition is the sequencing of patient genomes in order to find genetic changes that can be targeted for treatment, that predict the response to various pharmaceuticals or dietary interventions, or that predict disease susceptibility. In other words, it’s all genomics, genomics, genomics, much of it heavily concentrated in oncology. (I concentrated on oncology for this post because it is what I know best.) If you reread the definition from the National Academy of Sciences committee report, you’ll see that precision medicine is defined much more broadly. Broader definitions also encompass metabolomics, environmental factors and susceptibilities, immunological factors, the microbiome, and more, although even a recent editorial in Science Translational Medicine emphasized genomics over these other factors.

In fact, in the most recent JAMA Oncology, there are two articles, a study and a commentary, examining the effect of precision medicine in breast cancer. What is that “precision medicine”? It’s the OncoType DX assay, which is generically referred to as the 21 Gene Recurrence Score Assay.

Basically, this assay is used for estrogen receptor-positive (i.e., hormone-responsive) breast cancer that has not yet spread to the axillary lymph nodes. The expression of twenty-one genes related to proliferation, invasion, and other functions is measured, and an empirically derived formula is used to calculate a “recurrence score.” Scores below 18 indicate a low risk of recurrence as metastatic disease and insensitivity to chemotherapy; such patients generally receive hormonal therapy but not chemotherapy. Scores over 30 indicate high risk and greater sensitivity to chemotherapy; for these patients, chemotherapy and hormonal therapy are recommended. Patients who score in the “gray” area from 18-30 remain a conundrum, but clinical trials are under way to better define the cutoff point for a chemo/no chemo recommendation. This study indicates that the use of OncoType DX is associated with decreased use of chemotherapy, but because of limitations in the Surveillance, Epidemiology, and End Results (SEER) data set with linked Medicare claims, it wasn’t clear whether this decline occurred in appropriate patients. In any case, there’s no reason why genomic tests like Oncotype DX, which are rapidly proliferating, shouldn’t be considered “precision medicine,” and in practice they already are. Contrary to the image of oncologists wanting to push that poisonous chemotherapy, OncoType DX was designed with the intent of decreasing chemotherapy use in patients who will not benefit from it. Imagine that.
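To make the decision rules I just described concrete, here is a minimal sketch in Python of the recurrence-score cutoffs as described above (below 18 low risk, 18-30 the “gray” zone, over 30 high risk). The function and its phrasing are mine and purely illustrative; it is emphatically not a clinical tool, and actual use of the assay involves far more than a threshold check.

```python
# Illustrative only: encodes the recurrence-score bands described above
# (<18 low risk, 18-30 intermediate "gray" zone, >30 high risk).
# Not a clinical tool.

def classify_recurrence_score(score: float) -> str:
    """Map a 21-gene recurrence score to the risk bands described in the post."""
    if score < 18:
        return "low risk: hormonal therapy alone typically recommended"
    elif score <= 30:
        return "gray zone: benefit of adding chemotherapy unclear"
    else:
        return "high risk: chemotherapy plus hormonal therapy recommended"

for s in (11, 25, 38):
    print(s, "->", classify_recurrence_score(s))
```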

Conclusion: Medicine that works is just medicine

In the end, I don’t really like the term “precision medicine” all that much. More than anything, it reminds me of Humpty Dumpty’s famously scornful boast, “When I use a word, it means just what I choose it to mean—neither more nor less.” That sentiment definitely seems to apply to “precision medicine.” To me, when new tests or factors that predict prognosis or response to therapy, or that suggest which therapies are likely to be most effective, are developed and validated, it’s an artificial distinction to link them to genomics, proteomics, or whatever, along with “big data,” and call them “precision medicine.” To me, medicine that works is just “medicine.”

By Orac

Orac is the nom de blog of a humble surgeon/scientist who has an ego just big enough to delude himself that someone, somewhere might actually give a rodent's posterior about his copious verbal meanderings, but just barely small enough to admit to himself that few probably will. That surgeon is otherwise known as David Gorski.

That this particular surgeon has chosen his nom de blog based on a rather cranky and arrogant computer shaped like a clear box of blinking lights that he originally encountered when he became a fan of a 35 year old British SF television show whose special effects were renowned for their BBC/Doctor Who-style low budget look, but whose stories nonetheless resulted in some of the best, most innovative science fiction ever televised, should tell you nearly all that you need to know about Orac. (That, and the length of the preceding sentence.)

DISCLAIMER: The various written meanderings here are the opinions of Orac and Orac alone, written on his own time. They should never be construed as representing the opinions of any other person or entity, especially Orac's cancer center, department of surgery, medical school, or university. Also note that Orac is nonpartisan; he is more than willing to criticize the statements of anyone, regardless of political leanings, if that anyone advocates pseudoscience or quackery. Finally, medical commentary is not to be construed in any way as medical advice.

To contact Orac: [email protected]

803 replies on “Precision medicine: Hype over hope?”

“Precision medicine” is an unfortunate term for a variety of reasons, among which is that “precision” is not the same thing as “accuracy”. In a medical context, we can take “accuracy” to mean “treatment that addresses, if not cures, the underlying condition.” Some diseases have well-known cures that don’t need to be precise, and with others, like cancer, precision does not always help.

I like to illustrate the distinction between precision and accuracy by quoting Archbishop Ussher’s estimate for the creation of the Earth. He names a precise time (6 PM local time at the Garden of Eden; the often-quoted 9 AM, which contradicts “the evening and the morning were the first day”, is apocryphal) on a precise date about 6 ka ago, with the main uncertainty being due to not knowing exactly where the Garden of Eden was. That’s on the order of a part in 10^8, which is extraordinarily precise for Ussher’s day. But it’s not accurate; the actual age of the Earth is closer to 4.5 Ga.

I don’t know of any case in medicine where “precision” leads to a “precisely wrong” treatment in the way Ussher’s Bible studies led him to his precisely wrong answer about the age of the Earth, but the field is yet young, and there will likely be plenty of opportunities to make such a mistake.
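For readers who prefer numbers to archbishops, here is a toy sketch (my own illustration, not the commenter’s) of the precision/accuracy distinction: one set of estimates is tightly clustered around the wrong value, the other loosely clustered around roughly the right one.

```python
# Toy illustration of precision vs. accuracy, using made-up "estimates" of
# the Earth's age: one set is precise but inaccurate, the other accurate
# but less precise.
from statistics import mean, stdev

true_age_years = 4.5e9

ussher_like = [6018, 6019, 6018, 6017, 6018]             # precise, not accurate
radiometric_like = [4.3e9, 4.6e9, 4.4e9, 4.7e9, 4.5e9]   # accurate, less precise

for label, estimates in [("precise but inaccurate", ussher_like),
                         ("accurate but less precise", radiometric_like)]:
    print(f"{label}: mean={mean(estimates):.3g} yr, "
          f"spread={stdev(estimates):.2g} yr, "
          f"error vs true age={abs(mean(estimates) - true_age_years):.2g} yr")
```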

Orac does stipulate that he is first discussing this in relation to his specialty, which is understandable. But for more general application– if this 1K figure is indeed realistic, we could sequence every child born in the US in one year for the cost of a few dozen F-35’s or similar tradeoffs.

Now that would be a study.

It might actually tell us something useful about autism and obesity and such, as well as more physically acute conditions, as we follow the cohort through life.

Gets my vote.

Ah, but medico-marketing can be so persuasive–if you practice precision medicine, then ain’t you a “precise doctor?” Yup, a sad state when marketing/propaganda trumps science.

we could sequence every child born in the US in one year

It seems to me that would be an expensive way of generating vast amounts of data that no one would have any idea what to do with.

From Sam Kean’s ‘The Violinist’s Thumb’ (Venter and Collins were big players in the HGP):

Most human geneticists aim to cure diseases, and they felt certain that the HGP would reveal which genes to target for heart disease, diabetes, and other widespread problems. Congress in fact spent $3 billion largely on this implicit promise. But as Venter and others have pointed out, virtually no genetic-based cures have emerged since 2000; virtually none appear imminent, either. Even Collins has swallowed hard and acknowledged, as diplomatically as possible, that the pace of discoveries has frustrated everyone.

It turns out that many common diseases have more than a few mutated genes associated with them, and it’s nigh impossible to design a drug that targets more than a few genes. Worse, scientists can’t always pick out the significant mutations from the harmless ones. And in some cases, scientists can’t find mutations to target at all. Based on inheritance patterns, they know that certain common diseases must have significant genetic components—and yet, when scientists scour the genes of victims of those diseases, they find few if any shared genetic flaws. The “culprit DNA” has gone missing.

I think we may have to wait until computer processing power has become even cheaper before that kind of venture would be worthwhile.

I thought this was also interesting (op cit):

In addition, a comparison between Venter’s genome and the Platonic HGP genome revealed far more deviations than anyone expected—four million mutations, inversions, insertions, deletions, and other quirks, any of which might have been fatal. Yet Venter, now approaching seventy years old, has skirted these health problems. Similarly, scientists have noted two places in Watson’s genome with two copies of devastating recessive mutations—for Usher syndrome (which leaves victims deaf and blind), and for Cockayne syndrome (which stunts growth and prematurely ages people). Yet Watson, well over eighty, has never shown any hint of these problems.

It looks as if knowing a person’s genome isn’t quite as useful as one might think.

One major problem with precision medicine is that it relies on the false idea that a complex disease requires a complex treatment. This is not true, since many tumors can be treated by surgery without extensive molecular knowledge.
The same principle can apply to chemotherapy, by using the fact that all cancers have a common feature, uncontrolled growth. So the future of cancer treatment, immunotherapy aside, will come from G2 checkpoint inhibitors and protection of normal cells by cell inflation, associated to chemotherapy targeting dividing cells.
http://www.ncbi.nlm.nih.gov/pubmed/24156014

[email protected]

Krebiozen once again quotes out of context so he can reply with a non-sequitur.

He had a valid point regarding processing power. That much data would be difficult just to store let alone process. Why bother collecting all that data when we don’t even have the infrastructure to store it, let alone analyze it? From the conclusion of the study talked about in an article Orac linked to:

Genomics clearly poses some of the most severe computational challenges facing us in the next decade. Genomics is a “four-headed beast”; considering the computational demands across the lifecycle of a dataset—acquisition, storage, distribution, and analysis—genomics is either on par with or the most demanding of the Big Data domains. New integrative approaches need to be developed that take into account the challenges in all four aspects: it is unlikely that a single advance or technology will solve the genomics data problem.

It seems to me that would be an expensive way of generating vast amounts of data that no one would have any idea what to do with.

Let me calculate, by Fermi problem methods, how much information is involved here. For purposes of this post, cows are spherical, etc. (I am a physicist, after all).

A human has a few tens of thousands of genes. (Probably more than a zebrafish, which has about 20k, but probably less than 100k.) Let’s call it 30k, just to keep things in round numbers. Each gene codes for a protein that has between a few hundred and a few thousand amino acids–let’s take 1000 for an average figure. At three base pairs per amino acid, that’s around 100 million base pairs per genome, not counting junk DNA. There are four bases, so we are discussing something like 30 MB of data for one person. The US has a population of a bit over 300 million, which implies about 5 million births per year. So we are looking at 100-200 TB of data for a single year’s birth cohort, or around 10 PB of data for the entire US population. That’s large but not excessive for a Big Data project these days (many physics research projects will produce hundreds of terabytes per year, and some discard large amounts of data to keep the total that low).

But I agree that it won’t do any good to collect the data if you don’t know what you are going to do with it. At least with a Big Data physics project, you have some well-defined science question, and to undertake it you need to convince a leading funding agency that your project is worth funding. Zebra’s proposed data collection effort sounds (if I may put on my reviewer’s hat for a moment) like a solution in search of a problem. I can’t answer for NIH, but I know that NSF and NASA do not like to fund fishing expeditions like that.
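For anyone who wants to rerun Eric Lund’s spherical-cow arithmetic, here is a minimal sketch using the same deliberately rough assumptions from his comment; nothing below is a precise genomics figure.

```python
# Reproduces the Fermi estimate above: ~30,000 genes, ~1,000 amino acids per
# protein, 3 base pairs per amino acid, 2 bits per base (four possible bases).
# All figures are deliberately rough, per the comment.
genes = 30_000
amino_acids_per_protein = 1_000
bp_per_amino_acid = 3
bits_per_base = 2

bp_per_genome = genes * amino_acids_per_protein * bp_per_amino_acid  # ~9e7 bp
mb_per_genome = bp_per_genome * bits_per_base / 8 / 1e6              # ~20-30 MB

births_per_year = 5_000_000      # the comment's round figure for US births
us_population = 300_000_000

print(f"coding base pairs per genome: {bp_per_genome:.1e}")
print(f"per-person data: ~{mb_per_genome:.0f} MB")
print(f"one birth cohort: ~{births_per_year * mb_per_genome / 1e6:.0f} TB")
print(f"entire US population: ~{us_population * mb_per_genome / 1e9:.1f} PB")
```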

Eric Lund,
It wasn’t storage I was thinking of so much as the processing power to find associations between groups of mutations and specific conditions. I thought it was interesting that Venter and Watson (as in Crick) both had mutations that are associated with serious physical conditions that they did not have – presumably this is due to some epigenetic phenomenon that turns the relevant genes off (or on). When you add in the epigenetic data that would be required to figure out what is going on, I don’t think current computers have the necessary power.

[email protected]
From that PLoS Biology paper:

For population and medical genomics, identifying the genomic variants in each individual genome is currently one of the most computationally complex phases. Variant calling on 2 billion genomes per year, with 100,000 CPUs in parallel, would require methods that process 2 genomes per CPU-hour, three-to-four orders of magnitude faster than current capabilities [42].

It goes on to say that this is an issue not necessarily solved by Moore’s law:

Improvements to CPU capabilities, as anticipated by Moore’s Law, should help close the gap, but trends in computing power are often geared towards floating point operations and do not necessarily provide improvements in genome analysis, in which string operations and memory management often pose the most significant challenges. Moreover, the bigger bottleneck of Big Data analysis in the future may not be in CPU capabilities but in the input/output (I/O) hardware that shuttles data between storage and processors [44], a problem requiring research into new parallel I/O hardware and algorithms that can effectively utilize them.

Krebiozen’s point was entirely valid, not a “non-sequitur”.

Eric Lund #8,

Eric, this would just be a database, like census information, that could be used by individual research projects. The sooner we have the data, the sooner that can begin to happen. Is the census a “fishing expedition”?

The numbers are big because this is a big country; maybe we could pay Canada to do it?

capnkrunch #10

Still having language problems? Look up “non-sequitur” and look up “valid”.

Eric Lund, I run a CLIA NGS lab, and you’re underestimating 🙂 We don’t just sequence each base once; we do it 30x, on average, for statistical power. Our latest exome data (which is what you’ve described) is ~15-20GB in its raw form, not to mention any files that are made for analysis purposes. Genomes are closer to 120 GB.

As everyone’s been describing though, the big reason we don’t just “sequence babies” – or the whole population, as some have suggested – now really is because sequencing is the easy part. We can generate genomes, and even store them (though that’s a pain), until we’re blue in the face, but determining the needle in the haystack causative mutation in sick people is hard enough, much less predicting what may go wrong in healthy folks. We just haven’t done enough of the (much harder) genetics work to figure out how to appropriately interpret the data.
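As a rough illustration of why malia’s raw-file sizes dwarf the ~30 MB information-content estimate earlier in the thread, here is a back-of-envelope sketch; the numbers are my ballpark assumptions, not her lab’s specifications. The whole genome (including non-coding DNA) is read about 30 times over, and raw FASTQ files carry a quality score plus header overhead for every base call.

```python
# Rough estimate of raw sequencing data volume for one 30x whole genome.
# Ballpark assumptions only, not lab specifications.
genome_size_bp = 3.2e9    # whole human genome, including non-coding DNA
mean_coverage = 30        # each position sequenced ~30 times, as described above
bytes_per_base = 2.1      # base call + quality character + amortized header overhead

raw_fastq_bytes = genome_size_bp * mean_coverage * bytes_per_base
print(f"~{raw_fastq_bytes / 1e9:.0f} GB of raw FASTQ per 30x genome (uncompressed)")
# Compression, and exome capture (sequencing only ~1-2% of the genome, though
# at higher depth), bring the numbers toward the tens of gigabytes quoted above.
```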

Daniel, how’s that search for evidence demonstrating that cell inflation represents an effective treatment for cancer coming? Got anything that approximates proof of concept, let alone support for your claim (made when you first appeared on RI a couple of years ago) that it represents a ‘universal’ treatment for cancer?

The numbers are big because this is a big country; maybe we could pay Canada to do it?

Oh, yes, please! Our little hamlet will get right on that.

[email protected]

Still having language problems? Look up “non-sequitur” and look up “valid”

non-sequitur noun
: a statement that is not connected in a logical or clear way to anything said before it

valid adjective
: fair or reasonable

[email protected]

But for more general application– if this 1K figure is indeed realistic, we could sequence every child born in the US in one year for the cost of a few dozen F-35’s or similar tradeoffs.

[email protected] (responding)

I think we may have to wait until computer processing power has become even cheaper before that kind of venture would be worthwhile.

[email protected]

Krebiozen’s point was entirely valid, not a “non-sequitur”.

Swapping in definitions:
Krebiozen’s point was entirely reasonable, it is a statement that is connected in a logical or clear way to what was said before it.

I believe you misused “non-sequitur” to simply dismiss a valid point without addressing it.

Still having language problems? Look up “non-sequitur” and look up “valid”.

It would help if you looked up the former, as your attachment to that erroneous hyphen is quite grating.

JGC
What you do not seem to realize is that CIAC represents a lot of investment with no financial return. For the evidence, to make a comparison, it is like saying that gene therapy is the way to treat genetic disease: you don’t really need a proof of concept. The only thing needed is enough money to make things work. For me it’s better to work on G2 abrogation because, with drugs, you can attract investors, but I think that CIAC is safer. And I am quite optimistic that it will be done in one country or another.

capnkrunch #15

“Krebiozen’s point was entirely valid, not a “non-sequitur”.”

Is itself a non-sequitur.

“Krebiozen’s point is not a non-sequitur, [because it is] entirely valid.”

But “being valid” does not refute the claim of something being a “non-sequitur”.

Why don’t you just admit that you learned something again instead of wasting bandwidth? I already know how to suck eggs.

Are you really trying to correct someone over a word you can’t even spell correctly, or am I have a stroke?

Why bother collecting all that data when we don’t even have the infrastructure to store it, let alone analyze it?

While the computing power needed to analyze the data would be expensive, I’m not sure the algorithms needed for bulk comparison are there, and we likely don’t even know where to look. Still, getting the data would be a first step once it’s cheap enough. We might not be able to use the full data set in any good way for 10 or 20 years, but then the children won’t have developed all of the conditions one might correlate to the genetic data, either.

Other issues of such a study would be providing the ongoing follow-up and maintaining adequate confidentiality. Each genome would need to be linked to the individual’s medical records for the patient’s lifetime in order to get the best data. This has the potential for abuse or inadvertent disclosure.

@ Mephistoles
Another problem with big data is that the current academic reward system is based on experimental papers. If you just come up with a new interpretation of published data, you will have a hard time getting it published.

Mephistopheles O’Brien #21

Basically correct on the first point. If we had the data tomorrow, it would be possible to begin preliminary sorting, which might well reduce computational cost later on as “sick” or otherwise characterized populations are identified over the decades.

Issues of confidentiality don’t seem that big a problem to me, unless it gets hacked and published on the internet… oh wait… . But realistically, no, assuming reasonable care and stiff penalties for misuse, I can’t see any real risk.

Mephistopheles O’Brien

While the computing power needed to analyze the data would be expensive, I’m not sure the algorithms needed for bulk comparison are there, and we likely don’t even know where to look. Still, getting the data would be a first step once it’s cheap enough.

One thing I would be concerned about is that I/O speed might be the bottleneck (see my post #10). It seems like a waste of resources to store data we can’t use and that will likely need to be copied to new hardware just to be usable. The logistics involved in a project like that would be a nightmare. Heck, with that amount of data, software changes would be a nightmare as well. With the amount of resources required, it’s really best to have the proper container prepared before trying to fill it. I think it would be a far more efficient use of resources to sort out the hardware and software requirements prior to mass collection of data.

I totally agree with you about confidentiality. The government, insurers, hospitals, etc haven’t been inspiring much faith in their ability to protect confidential data.

Is the census a “fishing expedition”?

No, because there are specified uses for that data set. In order to properly apportion representatives, we need to know how many people there are and where they live. In the US, the Constitution provides for an “actual Enumeration” every ten years. (Details are different in other countries, but any halfway functional democracy needs similar data at regular intervals for this purpose–the alternative is “rotten boroughs” and “pocket boroughs” such as existed in the UK at various times in history.) Other data collected by the Census Bureau is routinely used for a number of studies: demographics, wealth distribution, and many others of this kind. It’s also a much smaller data set than individual genomes: these days, you could store all of that data on one commercially available hard drive. There are also laws (at least in the US) requiring that the data remain confidential for a period of time (72 years IIRC), after which they are made public–those census records are handy for people who do genealogical research since they can often be used to track where certain people move.

A population-wide genome database would presumably also be confidential–HIPAA either would or should cover it. But it involves quite a bit more storage. As others note, data analysis gets a lot trickier; e.g., you would have to have some way of tying it to medical records for it to be useful in any way (as MO’B notes above). Comparisons are a hard problem as well; if you are not careful about how you design the algorithm, you will get something that scales as N^2 where N is the number of people in the database (because with N people you have N[N-1]/2 pairs of people), or worse if you are doing multi-way comparisons.

To me, the confidentiality issues MO’B brings up are sufficient reason not to collect the data any sooner than it would be of practical use, because IME any sufficiently large database of confidential information eventually will be abused in some fashion. Think hackers stealing credit card info, or the NSA’s dragnet collection of telephone metadata, but on a much larger scale.
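To put numbers on the N(N-1)/2 scaling Eric Lund mentions above, here is a trivial, purely illustrative sketch:

```python
# Illustrates the N*(N-1)/2 pairwise-comparison scaling mentioned above:
# the number of pairs grows quadratically, which is what makes naive
# all-against-all genome comparisons so expensive.
def pair_count(n: int) -> int:
    return n * (n - 1) // 2

for n in (1_000, 5_000_000, 300_000_000):
    print(f"{n:>11,} genomes -> {pair_count(n):.2e} pairwise comparisons")
```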

Since this does begin to intrude on my area of expertise – I can say that Zebra has no idea exactly how much data he’s talking about – both storing and processing.

The data alone would take years to process – even the initial collection and export into a database. There are also few, if any, current database technologies that allow for the storing and analysis of datasets even approaching the size in question & none that would allow for the connection of so many disparate individual items of comparison.

By the time the technology “might” be available to do it, it is probable that better diagnostic tests would already be available to solve many of the issues that would have made this data interesting in the first place.

The Census is also a relatively small amount of data – meaning that the variables are both known and quantifiable (there are only a set number of questions on the forms & check boxes for the individual person).

Full gene sequencing for hundreds of thousands, if not millions, involves an unknown number of variables – and no idea, how any or all of the variables might be connected to each other.

Eric [email protected]

No, because there are specified uses for that data set.

Therein lies the issue I was trying to get at in #25. I was once involved in a project to upgrade server hardware and software as well as migrate our code base from VB 6 to VB.NET. It was a mess. There’s no reason to set ourselves up for that when there’s no current use for the data. It is so much easier to build the proper infrastructure (both hardware and software) the first time around than to have to upgrade later.

Lawrence #27,

What “disparate individual items of comparison” are you talking about?

I get the sense that people are projecting some complicated scenario onto a simple suggestion. You record the genome– admittedly a long bit of information– along with, say, social security number.

What’s the problem?

[email protected]

…I can say that Zebra has no idea exactly how much data he’s talking about…

This is kind of zebra’s MO. I wouldn’t be surprised if he comes back and claims that he knows better than you.

Shorter Daniel @18

“No, I don’t have any evidence because MONEY.”

Where else have I heard this? Oh, yes–every alt-med proponent arguing that there are no funding for studies proving what they know to be true and that vitamin C/baking soda/aromatherapy etc. cures cancer because they aren’t patentable.

@zebra – and exactly how big is a single genome? How do all of the genes relate to one another & in what context?

You’ve put forth the idea that this data could be collected simply and just stuck somewhere – but by what method would you do so?

Having the data is merely one thing, but that data must be processed, stored and retrieved in some fashion for that information to be valuable – I am merely pointing out that the storage systems (i.e. databases) don’t exist to be able to handle this quantity of information in any meaningful way to allow for searching or using the results…..not in any sense that would take less than years – by which time, the information is no longer valuable.

Lawrence #33,

I just don’t understand what it is you are imagining.

Let’s begin like this: Can you tell me the maximum length a sequence would have to be in order for you (you imply you are an IT person) to be able to handle it?

capnkrunch #32

“This is kind of zebra’s MO. I wouldn’t be surprised if he comes back and claims that he knows better than you.”

Given how easily I got him to fold, maybe I do.

Some of you may have missed malia’s comment at #13 which was held up in moderation. It’s worth reading, I think.

My point, which I thought was obvious, was that most people seem to think it’s simply a matter of getting lots of genomes, correlating them with physical illnesses, including autism and obesity, and figuring out which genes differing from the ‘standard’ version* are responsible. It is a great deal more complex than that, with some mutated genes being turned off or on by other genes and by other epigenetic factors we do not yet understand. I think there are better ways the NIH or whoever** could spend $5 billion (assuming $1,000 per genome and five million births per year).

* How do we establish what is the normal human genome? The current ‘standard’ HGP genome is an average of a number of different people’s genome, but since we all have dozens of serious mutations this is a somewhat moot point.

** Oddly it was the US Energy Department that started on the HGP, the rationale being that they were investigating the effects of radiation on DNA.

malia #13

It’s great to have a real expert pitching in.

“We just haven’t done enough of the (much harder) genetics work to figure out how to appropriately interpret the data.”

For those of us who are not experts (and don’t pretend to be), could you explain what the “genetics work” entails?

A quick question – we each have two copies of each chromosome (apart from the X and Y chromosomes in men, of course). Presumably both are sequenced – does anyone know how that works?

@ zebra – the reason the genetics work is hard is that there’s lots of ways to skin a cat, but I’ll try.
– A lot of it is basic genetics – we “break” a gene in a model organism and see what happens phenotypically. But this only works for genes that change a phenotype. Krebiozen’s post above, where he mentions epigenetics, talks about some of the reasons why it’s hard to correlate genotype and phenotype; but there are a myriad of others that we have to take into account – gene families where one gene can “rescue” another can hide a gene’s function, for instance, or genes that have such a subtle phenotype that we can’t pinpoint a change by looking.
– Some of it is looking at patterns in large cohorts of people – but again, a lot of that information can be masked by the same issues I mentioned above, and in many cases, when we start with “phenotype first”, there may be lots of different genes creating something that *looks* the same to us on the outside, which can confuse the issue.
These types of research aren’t sexy, nor do they use fancy machines, so they’re rather under-funded, to the frustration of every working geneticist, ever.

@Krebiozen – being diploid is one reason why we sequence at depth rather than just 1x. We physically chop all 46 chromosomes (23 pairs) in to manageable bits, and sequence them, presumably equally; then to put them back together, we compare the sequence data to the “human genome” (called hg19, which is really a mix of about 6 people) to find canonical differences from that reference, and we look for any differences in our patient’s sequence – where we have 50% of one nucleotide, and 50% of the other nucleotide, we know we have a difference between the maternal and paternal chromosome. But, what we can’t easily do is “phase” these differences – so, we don’t know whether a set of mutations that are physically close to one another live together on a single chromosome, or whether they’re dispersed between a chromosome pair. (FYI – this problem with phasing is also a contributing factor in determining if a mutation profile is “disease causing” or benign in some cases)
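For what it’s worth, the 50/50 logic malia describes can be sketched in a few lines. This is a deliberately naive illustration of the idea; real variant callers use statistical models of base and mapping quality, and the thresholds below are arbitrary rather than anything a production pipeline would use.

```python
# Naive sketch of the logic described above: at each position we have a pile
# of overlapping reads; ~100% agreement with the reference suggests homozygous
# reference, ~50/50 suggests a heterozygous variant (one allele per parental
# chromosome), ~100% non-reference suggests a homozygous variant.
# Thresholds are arbitrary; real callers use statistical models.
from collections import Counter

def naive_genotype(ref_base: str, observed_bases: str) -> str:
    counts = Counter(observed_bases.upper())
    depth = sum(counts.values())
    ref_fraction = counts[ref_base.upper()] / depth
    if ref_fraction > 0.8:
        return "homozygous reference"
    if ref_fraction < 0.2:
        return "homozygous variant"
    return "heterozygous variant (phase unknown)"

# 30 reads covering one position: 16 match the reference 'A', 14 read 'G'
print(naive_genotype("A", "A" * 16 + "G" * 14))   # -> heterozygous variant
```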

malia #43.

Your reply is greatly appreciated. But I remain puzzled as to why anyone would object to my suggestion, since I am offering what you appear to need. Let’s do this with Canada as a more manageable source of data:

Yearly births of 385,000 times $1,000 per genome is $385,000,000.
An F-35 (USAF, the cheaper model) is about $150,000,000.

So the US could buy 4 fewer of these (in a projected fleet of 450 plus) and easily cover data collection costs for our friendly neighbors to the north with their more rational health care system. Need more data, say from a larger country, lose a few more planes.

So this is what I don’t get. You say you are underfunded, but here I am giving you (and geneticists everywhere) free access to all the data you could possibly use. I understand that you need computing power to work with it, but since you aren’t paying for the sequencing, you have more funds to do the cat-skinning.

And the data will be useful even after every baby has grown old and died, assuming other records are maintained. The processing is only going to get easier and cheaper over time.

So really, what is the problem?

Thanks malia, that answers my question perfectly. It seems to me that phasing is going to be a serious issue in the future, unless/until someone finds an ingenious way of figuring it out.

And the data will be useful even after every baby has grown old and died, assuming other records are maintained. [emphasis added, obviously]

And that the gene data is collected and labeled properly, and the other data is accurate, and that the other data can be perfectly correlated with the gene data, and that the other data is actually the data that is needed for the unforeseen analyses, and …

Think it through, don’t just go off half cocked, like you seem so fond of doing. Do like adults do: foresee problems and address them before they bite. This is your brainfartstorm; be responsible rather than defensive.

assuming other records are maintained

That, historically, has been a very generous assumption. Storage media age–yes, even hard drives, but this was even more of an issue with magnetic tapes, which were standard for decades. Interface technologies change. Software that was designed for a particular computer architecture may not be maintained as computers using that architecture age out of service. Et cetera. Dealing with these issues takes time and resources. Only recently have people in my (relatively data-heavy) field begun to devote the necessary time and resources to addressing this problem.

Show of hands here: how many of you who are over 30 can read every single computer file you have created over the last 20 years? I certainly can’t. At least two pieces of software I used extensively in the late 1990s (ClarisWorks and Canvas) no longer exist, but I still have files I created with those programs. In 1995 my home machine was still a Mac Plus (new in 1988) with a 400k floppy drive and a 20 MB external hard drive with SCSI connection–I still have the machine in a box in my basement somewhere, but I have no way of exporting data on those media to something my current machine can read, without paying major bucks to somebody who has maintained such a machine. I also have zip disks.

I work with people who have decades worth of data collected, some of it on nine-track tapes. Nine-track tape readers were once ubiquitous in this field; today the number of still-operating readers in the world can be counted on the fingers of your hands. And many of those tapes are too brittle to read. Even when data are on media we can read, we have to hope that documentation was kept of the data format (the data is generally in a binary format, because data storage was at a premium in those days). In some cases software to read the data exists, but is in some ancient version of Fortran that won’t necessarily compile on a modern computer, even if it had a Fortran compiler (they are no longer automatically included with many operating systems). A significant fraction of the data were never examined more than superficially.

Now scale this problem up to the size that Malia mentions for the human genome. And ponder the question of who is going to maintain such a database, and who is going to cover the costs of maintaining it–which are likely to be the same order of magnitude, per year, as it would cost to collect all of that data in the first place.

There is no shame in underestimating how much of a problem this is–lots of people do that. I have had occasion to recommend rejection of a proposal that I thought was making that mistake. But it’s better to understand the magnitude of the problem before we collect a bunch of data we will never be able to use.

Eric Lund #48,

This is very strange reasoning. We would “be able to use” the data immediately– my suggestion that it might still be usable 100 years from now is only to illustrate that this is a long-term project.

I also think your experience with magnetic tapes and consumer-type software is truly irrelevant– this would be a serious scientific endeavor backed by world governments and scientific institutions. (In the 21st century– no “stone knives and bearskins”; no slide rules and punch cards.)

So I still await an objection from someone (who one hopes would be an actual expert) who can explain why he or she would not like to have this resource available for research.

Eric Lund,

After having just spent hours transferring files on floppy disks using a borrowed external floppy disk drive (because they no longer come installed in computers) I can very much relate to what you are saying. It also reminded me of how painfully slow the buggers are. Now all I have left to do is find a way to get files off of a zip disk I have. Yay.

zebra, I’m sure everyone would like this as a resource. However, yours is a hollow victory unless you can will it to happen or pony up the money and resources to make it happen.

Daniel Corcos,

Performing single sperm sequencing in the same male individuals?

Good thinking, but I’m not sure that would help. Even if a single sperm contained enough DNA to sequence (it doesn’t*) it will carry chromosomes randomly selected, so you wouldn’t know if a gene was from a maternal or paternal chromosome. Also you would only sequence half the man’s DNA, and sequencing multiple sperm to get the full genome would lead to the same problem i.e.not knowing which genes came from which chromosome. Until we can isolate a single chromosome and extract enough DNA to sequence that, I see no way of overcoming this, yet.

* Genome sequencing requires 250 ng DNA. A single sperm contains only about 3 pg, i.e., 0.003 ng of DNA. That’s nearly five orders of magnitude difference, which will doubtless take a few years to overcome.

Not a troll #50,

“zebra, I’m sure everyone would like this as a resource.”

Ummm…. no. Apparently several people think it would be a Bad Idea. Including Eric.

Daniel Corcos #52,
Wow! That’s impressive. I wondered about PCR but dismissed it. It still doesn’t really solve the problem though, since we still don’t know which genes came from which copy of each chromosome. Or am I missing something?

It’s the $5 billion cost that makes it a bad idea. I’m all for collecting data just in case it comes in useful, but not when it costs lots of money that could be spent on something with immediate practical uses.

Krebiozen
I would say that with enough sperms, you would be able to say that the genes come from the same chromosome and answer the question of whether several mutations are on the same chromosome or not.

[email protected]

We would “be able to use” the data immediately–

No we wouldn’t. For a number of reasons that have been explained already.

I also think your experience with magnetic tapes and consumer-type software is truly irrelevant– this would be a serious scientific endeavor backed by world governments and scientific institutions.

Eric Lund’s comparison is apt. Recall that the paper I linked to said I/O speed is likely to be a major bottleneck. Our current storage media is inadequate. To be able to make practical use of the data, it would need to be moved onto faster media when that becomes available. This too has already been explained.

Krebiozen is absolutely correct in #56

It’s the $5 billion cost that makes it a bad idea. I’m all for collecting data just in case it comes in useful, but not when it costs lots of money that could be spent on something with immediate practical uses.

I would also add that doing the data collection now would unnecessarily incur additional future costs to upgrade the software and hardware infrastructure as better technology becomes available. As I said before, it would make much better use of resources to take those funds and put them towards creating the necessary big data technologies before collecting the data.

capnkrunch #58,

We would “be able to use” the data immediately–

No we wouldn’t. For a number of reasons that have been explained already.

No reasons have been “explained” at all.

Are you saying that malia is some kind of psycho troll who is claiming to use genomes when in fact she isn’t? Or any of the other “working geneticists” she mentions? You must be really out of touch with the 21st century, just like Eric appears to be.

But I remain puzzled as to why anyone would object to my suggestion, since I am offering what you appear to need.

Could you point me toward the part of Malia’s comment @#43 that you thought showed an apparent need for sequencing every child born in the US for a year?

Because I don't see one. On the contrary. She seems to me to be saying that the data they already have is already more than enough of a research imperative on its own, with no need or use for more.

ann #60,

By “you”, I am (obviously to me at least) referring to malia and all those working geneticists she invokes, and all future geneticists who might be able to do research because this data would be freely available. As I very clearly pointed out, the costs saved on sequencing should allow for expanding the more substantive research activity.

[Yes, obvious to me, but of course, we can always distract from the topic by now discussing whether I should have said “you geneticists”, or “y’all”, or something else, and then go on to whether y’all is properly used as singular or plural, and so on. Or is there a hyphen in there somewhere?]

@zebra: There is a definite way that you can settle this dispute in your favor. To wit: write a proposal to the NIH or NSF (or equivalent body if you are outside the US) in which you will describe how you will collect the data, and what science question you will use the data to answer. You will need to convince the funding agency that you can do it within the constraints of the program to which you propose, and that your science question is of sufficient interest that the agency should fund your proposal rather than one of the competing proposals of comparable merit that they would otherwise fund. Then go out and achieve the proposed goal. If you can do this, you will have proved yourself right. The various people who are skeptical of your proposal, myself included, have given reasons why we think you won’t be able to achieve the objective within the allotted resources.

Right now what you are proposing is an “underpants gnomes” scheme: 1. Collect large-scale genome data. 2. ??? 3. Science! I know from experience, as do several others in the commentariat, that funding agencies aren’t going to fund proposals like that when there are already many more proposals with an explicit step two than they can afford to fund.

No reasons have been “explained” at all.

Just because you have hand-waved them away doesn't mean we didn't explain them. First, there are the technology issues I've brought up numerous times. Read the PLoS Biology paper I linked to for more detail, but here are some of the issues (a quick arithmetic check on the CPU figure follows the quotes):

Protecting confidential data:

But in addition to tailoring genomics applications for the cloud, new methods of data reliability and security are required to ensure privacy, much more so than for the other three domains.

CPU speed:

Variant calling on 2 billion genomes per year, with 100,000 CPUs in parallel, would require methods that process 2 genomes per CPU-hour, three-to-four orders of magnitude faster than current capabilities.

Aligning all pairs of the ~2.5 million species expected to be available by 2025 amounts to 50–100 trillion such whole genome alignments, which would need to be six orders of magnitude faster than possible today.

I/O speed:

Moreover, the bigger bottleneck of Big Data analysis in the future may not be in CPU capabilities but in the input/output (I/O) hardware that shuttles data between storage and processors [44], a problem requiring research into new parallel I/O hardware and algorithms that can effectively utilize them.

Database size and search speed:

Similarly, efficient compression and indexing systems are critical to make the best use out of each available byte while making the data highly accessible.
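As a quick sanity check on the CPU-speed figure quoted above (the 2 billion genomes and 100,000 CPUs are the paper's projections, not mine), the arithmetic works out to roughly the number in the quote:

genomes_per_year = 2e9   # the paper's projected sequencing volume
cpus = 1e5               # the paper's assumed level of parallelism
cpu_hours_per_year = cpus * 365.25 * 24
throughput_needed = genomes_per_year / cpu_hours_per_year
print(f"required: ~{throughput_needed:.1f} genomes per CPU-hour")
# prints: required: ~2.3 genomes per CPU-hour, i.e. the "2 genomes per CPU-hour"
# in the quote, which the paper says is three-to-four orders of magnitude
# beyond current variant-calling methods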

On the other hand there’s the issue that malia brought up in #13:

We just haven’t done enough of the (much harder) genetics work to figure out how to appropriately interpret the data.

Throwing more data at the problem isn’t going to solve it. Just because you didn’t understand malia’s explanation in #43 doesn’t make it wrong.

As I very clearly pointed out, the costs saved on sequencing should allow for expanding the more substantive research activity.

And as she very clearly pointed out, since there is no present need to spend any time, money, or energy sequencing more data, that actually wouldn't be saving costs. It would be a waste.

It’s basically like saying:

“I don’t have enough clothes. I need to buy some, which I can barely afford to do. But I know! I can save money by buying the clothes I’ll wear in thirty years now!”

all those working geneticists she invokes, and all future geneticists who might be able to do research because this data is freely available.

Let me walk you through this.

(1) All those working geneticists she invokes have unfunded priorities right now.

(2) Having more data “freely available” would not help achieve them.

(3) It also wouldn’t be free. You’d have to spend money creating the database and making it available.

(4) That money is presently needed for other priorities.

(5) So spending it on something else now would detract from rather than aid presently ongoing research.

(6) Furthermore, there’s no way to even say whether the research being done now will result in findings that could be better translated to practical applications in the future if such a database were “freely” available.

(7) So the whole thing might easily be a great big waste of money, now and always. Because:

(8) Spending money supplying people with something for which there’s no demand just about always is.

Eric Lund #62,

Since I never proposed getting funding from any of those agencies, such a test would be irrelevant. I merely did a first approximation à la Enrico; see #44 for the less ambitious version.

If you would read carefully, you would see that I have consistently implied financing from the general fund, by invoking a metric – the much-maligned F-35 – which is often used for this kind of analysis. If the right people got the contracts, I could imagine even this US Congress finding a way to fund the project. Probably by cutting food stamps and not airplanes, but you never know.

So the real issue is whether such a database would be useful.

“Probably by cutting food stamps and not airplanes, but you never know.”

“(4) That money is presently needed for other priorities.”

You heard it here first: Let them eat genome database entries.

Basically correct on the first point. If we had the data tomorrow, it would be possible to begin preliminary sorting, which might well reduce computational cost later on as “sick” or otherwise characterized populations are identified over the decades.

“Sorting”? Sorting what into what?

You're magically* going to get 4 million human genomes, each with over 3 billion base pairs, and then… do 8 trillion whole-genome comparisons? Why? Babbling about “computational cost,” a subject that you may reliably be assumed to know nothing whatever about (what word size for the sequences?), doesn't cut it.

What sort of data structure do you imagine resulting from this exercise?
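To put rough numbers on the scale in question, here is a back-of-the-envelope sketch; it assumes ~3.1 billion bases per genome and 2 bits per base for the raw calls, with no quality scores, read data, or metadata:

genomes = 4e6              # roughly one US birth-year cohort
bases_per_genome = 3.1e9   # approximate haploid human genome length
raw_bytes = genomes * bases_per_genome * 2 / 8  # 2 bits per base, 8 bits per byte
all_pairs = genomes * (genomes - 1) / 2         # every genome against every other
print(f"raw calls alone: ~{raw_bytes / 1e15:.1f} petabytes")
print(f"pairwise comparisons: ~{all_pairs:.1e}")
# prints ~3.1 petabytes and ~8.0e+12 comparisons (the "8 trillion" above),
# before quality scores, alignments, variant calls, or any clinical metadata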

Issues of confidentiality don't seem that big a problem to me, unless it gets hacked and published on the internet… oh wait…. But realistically, no, assuming reasonable care and stiff penalties for misuse, I can't see any real risk.

That’s because you’re a simpleton. Remember:

You say you are underfunded, but here I am giving you (and geneticists everywhere) free access to all the data you could possibly use

No, the security would be on the level of that for access to the VSD. Everything has to be de-identified, which is no small feat when one has a lifetime of medical records.

Speaking of which, how precisely do you figure that’s going to happen?

* You forgot about consent, now didn’t you?

So the real issue is whether such a database would be useful.

And @#13, you have a person who’s in a position to know telling you that it wouldn’t, then explaining why @#43.

see #44 for the less ambitious version

Where you underestimated the size of the U.S. birth cohort by an order of magnitude?

Capnkrunch,

Listing out-of-context quotes doesn't constitute an “explanation”, and anyway, whether I call it a non sequitur or a strawman or a Gish Gallop, this has nothing to do with my suggested project.

And, saying “it’s hard” is not an explanation. Nor is “it’s not perfect”.

And in particular, when we are talking about basic research, “but it might not yield useful results” is not just a poor argument, but actually stupid.

If this database existed, people would use it. It’s absurd to suggest otherwise. People would choose genetics as a career exactly because of the opportunity. And almost certainly, it would spur the kind of innovation and development of tools that you are talking about. Kind of like DARPA’s little experiment with connecting computers in different locations, you know…

I really do think a lot of people here are simply “stuck” with their attachment to 20th and even 19th century paradigms. You can’t imagine a different way of doing things, or it makes you uncomfortable, or you feel threatened. Too bad.

@Krebiozen/Daniel Corcos

Single-cell sequencing is a thing; you can sequence a sperm if you really want to (and we have) – and lots of animal-science researchers want to. However, for humans it's not super practical. First off, it's only useful in biosex males. Second, those are germline cells, which are arguably different from the somatic cells (the cells that make up the rest of your body) that you'd be interested in if you were doing diagnostic sequencing. We're getting closer to being able to do phasing – there are some nifty biological tricks we can use with standard sequencing, and there is an adorable little sequencer, called the MinION, that's in beta testing right now (our lab got 30 kb fragments off of it) – but neither of these options is very cost- or time-effective… yet.

@zebra
Would such a database be useful in the real world? Honestly, the answer is that we don't know, because we don't have enough of the biological groundwork to make that decision yet. Might it be useful in the future? Perhaps.

IF (and only if) money and time and storage space and processing power were not a limitation – AND doctors took a thorough, objectively defined, descriptive medical history that was always coded properly in an EMR that followed each person through their life, so we had good phenotype data on every condition on every human ever – sure, future human genetics researchers would theoretically love a resource that could interrogate the genomes of all humans everywhere.

However, @ann has a pretty good run-down of the reasons why it's not being prioritized. Avenues of research that are arguably more fruitful are currently underfunded, and there are lots of technical hurdles, mentioned up-thread, that would need to be overcome and properly managed to make such an endeavor feasible. Beyond that, there are some HUGE ethical considerations that are still being hammered out. Lots of people are categorically NOT OK with having their genome stored somewhere. Lots of insurance companies are looking for grounds to deny coverage based on preliminary genomic data as “pre-existing conditions”. There are questions about whether adults should be deciding that a baby's genome will be sequenced, rather than leaving that decision to the child when they reach majority. Etc, etc.

You can call something out of context, but that doesn't make it so. That paper quite clearly explains the technological challenges involved in big data genomics.

Similarly, you can say people would use the data, and they certainly would want to, but neither the technology nor our understanding of genetics is at the level where this data would be useful. You have yet to provide any counterpoint beyond “I don't think so.”

Our CPUs, storage media, and database technology are not fast enough. Our security is not good enough. And even if it were, we don't understand enough to make use of the data. I provided references about the technology, and malia is an expert in the field who told you how our fundamental genetics knowledge is lacking. You, on the other hand, have offered nothing in defense of your idea.

@malia – a question almost, but not completely, off topic if you please.

Do you have any opinions on the commercial DNA testing companies that scan for genealogical ‘roots’? Any opinions on the validity of the results?

The word is that there is Native American ancestry on both sides of my family tree, but as near as I can tell, it would be so far back that I'd probably be eligible for the Mayflower Society. I'd like to try to settle it one way or another.

