Pity poor John Ioannidis.
The man does provocative work on the reliability of scientific studies as published in the peer-reviewed literature, and his reward for trying to point out shortcomings in how we as scientists and clinical researchers do studies and evaluate evidence is to be turned into an icon for cranks and advocates of pseudoscience, or even antiscience. I first became aware of Ioannidis two years ago, around the time he published a paper that caused a stir, entitled “Contradicted and Initially Stronger Effects in Highly Cited Clinical Research.” In that study, Ioannidis concluded that approximately 1/3 of highly cited clinical studies are later found to be incorrect and that therapeutic effects initially reported in clinical trials often turn out in later studies to be smaller or even nonexistent. Ioannidis then followed up with an editorial entitled “Why Most Published Research Findings Are False.” In response, I noted how alternative medicine mavens and others who are into pseudoscience jumped all over the study, and how it even explained why antivaccinationists should not have been surprised when effects attributed to thimerosal in vaccines in early studies disappeared on further analysis. Indeed, I even pointed out how some prominent credulous bloggers were citing Ioannidis as evidence that most scientists are “lousy.”
It’s happening again.
Fellow ScienceBlogger Mark Hoofnagle pointed out that this resurgence of crank interest in Ioannidis’ work seems to be due to a rather poorly written Wall Street Journal editorial, apparently inspired by Ioannidis’ most recent article, that almost totally missed the point of his studies, seemingly concluding, as prominent bloggers did two years ago, that the problems with published research were due to “lousy” scientists.
Predictably, quacks and advocates of pseudoscience have jumped all over Ioannidis’ findings, erroneously and foolishly representing them as “proof” that the science they don’t like is hopelessly wrong; HIV/AIDS denialists and global warming “skeptics” are among the offenders. Sadly, Ioannidis’ work lends itself rather easily to being misinterpreted by pseudoscientists as a reason to dismiss the findings of science, as if doing so would make their pseudoscience correct absent good evidence for their position.
None of Ioannidis’ work should come as any shock to clinical investigators or scientists. Indeed, it did not. Two years ago, I actually took the opportunity to present Ioannidis’ JAMA article at our weekly journal club. Although it provoked a lively discussion, not a single one of my surgical colleagues was the least bit surprised or disturbed by its findings. Of course first attempts to answer a clinical question often produce incorrect or exaggerated results! It is the totality of evidence that has to be examined, and, until it is, new findings should be treated with care and skepticism.
There are, of course, many systemic problems in biomedical research. No one “in the biz” would deny that. But, in many ways, the present system of randomized clinical trials and peer-review is, to paraphrase Winston Churchill regarding democracy, the worst system for finding the best treatments–except for all the rest. It is indeed true that there are far too many crappy studies published. There is indeed a bias towards publishing studies that actually show a correlation, treatment effect, or other positive result, rather than a negative result. There has been a push towards encouraging the publication of negative studies, but the bias has not disappeared. Despite all that, it is a big mistake to take Ioannidis’ findings as “proof” that science is not the best methodology we have for answering fundamental questions about how the universe works, the pathogenesis of disease, or for identifying the most efficacious treatments. Certainly, it far surpasses any alternatives.
Perhaps the best analyses of the real significance of Ioannidis’ findings come from Steve Novella and Alex Tabarrok. Tabarrok, for example, explains very eloquently why, even under “perfect” conditions, as many as 25% of hypotheses that test as “true” may in fact be false:
Suppose there are 1000 possible hypotheses to be tested. There are an infinite number of false hypotheses about the world and only a finite number of true hypotheses so we should expect that most hypotheses are false. Let us assume that of every 1000 hypotheses 200 are true and 800 false.
It is inevitable in a statistical study that some false hypotheses are accepted as true. In fact, standard statistical practice guarantees that at least 5% of false hypotheses are accepted as true. Thus, out of the 800 false hypotheses 40 will be accepted as “true,” i.e. statistically significant.
It is also inevitable in a statistical study that we will fail to accept some true hypotheses (Yes, I do know that a proper statistician would say “fail to reject the null when the null is in fact false,” but that is ugly). It’s hard to say what the probability is of not finding evidence for a true hypothesis because it depends on a variety of factors such as the sample size but let’s say that of every 200 true hypotheses we will correctly identify 120 or 60%. Putting this together we find that of every 160 (120+40) hypotheses for which there is statistically significant evidence only 120 will in fact be true or a rate of 75% true.
Thus, even if the research is “perfect,” with no flaws in the experimental design and no biases, it is not unreasonable to predict that at least 25% of apparently positive results would ultimately be found to be incorrect, assuming the standard cutoff for statistical significance of p < 0.05. I've pointed out before, in the context of discussing clinical trials of homeopathy, that even under “perfect” conditions at least 5% of trials studying homeopathy would appear to be “positive” due to random chance alone, but I didn’t take into consideration all the factors that Tabarrok did. In fact, in retrospect, I realize that I was wildly naïve and optimistic in my estimate. Taking into account deficiencies in study design, it’s not at all difficult to see how around half of medical studies could come up with incorrect results. Unfortunately, one way to reduce that number is a way that cranks will not like at all–not one bit.
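For those who want to see the arithmetic laid out, here’s a minimal sketch in Python of Tabarrok’s back-of-the-envelope calculation. The inputs (1,000 hypotheses, a 20% prior, a 5% false-positive rate, and 60% power) are his illustrative assumptions, not empirical estimates:

```python
# A sketch of Tabarrok's back-of-the-envelope arithmetic.
# All numbers are his illustrative assumptions, not empirical estimates.

n_hypotheses = 1000
prior_true   = 0.20   # fraction of hypotheses assumed to be true
alpha        = 0.05   # significance cutoff: false-positive rate per test
power        = 0.60   # chance a true hypothesis is detected

true_h  = n_hypotheses * prior_true        # 200 true hypotheses
false_h = n_hypotheses - true_h            # 800 false hypotheses

false_positives = false_h * alpha          # 40 false hypotheses pass p < 0.05
true_positives  = true_h * power           # 120 true hypotheses detected

significant = true_positives + false_positives   # 160 "significant" findings
ppv = true_positives / significant               # fraction actually true

print(f"'Significant' findings: {significant:.0f}")      # -> 160
print(f"Actually true: {true_positives:.0f} ({ppv:.0%})") # -> 120 (75%)
```

Run it and you get Tabarrok’s 160 “significant” findings, of which only 120 (75%) are actually true.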
What I’m talking about is the “prior probability” of a hypothesis (i.e., the scientific plausibility, or likelihood of its being correct, based on basic science and previous data). By taking prior probability into account, we can decrease the likelihood of false positive results. As Steve Novella put it:
Tabarrok points out that the more we can rule out false hypotheses by considering prior probability the more we can limit false positive studies. In medicine, this is difficult. The human machine is complex and it is very difficult to determine on theoretical grounds alone what the net clinical effect is likely to be of any intervention. This leads to the need to test a very high percentage of false hypotheses.
What struck me about Tabarrok’s analysis (which he did not point out directly himself) is that removing the consideration of prior probability will make the problem of false positive studies much worse. This is exactly what so-called complementary and alternative medicine (CAM) tries to do. Often the prior probability of CAM modalities – like homeopathy or therapeutic touch – is essentially zero.
If we extend Tabarrok’s analysis to CAM it becomes obvious that he is describing exactly what we see in the CAM literature – namely a lot of noise with many false-positive results.
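To make Novella’s point concrete, here’s the same arithmetic with the prior probability dialed down. The priors below are illustrative guesses of my own, not measured values; the point is the trend, not the exact numbers:

```python
# Same arithmetic as above, but sweeping the prior probability that a
# hypothesis is true. The priors are illustrative, not measured values.

alpha, power = 0.05, 0.60

for label, prior in [("plausible drug target", 0.20),
                     ("long-shot hypothesis",  0.01),
                     ("homeopathy-like claim", 0.001)]:
    true_pos  = power * prior            # true hypotheses detected
    false_pos = alpha * (1 - prior)      # false hypotheses passing p < 0.05
    ppv = true_pos / (true_pos + false_pos)
    print(f"{label:22}  prior={prior:6.1%}  ->  {ppv:.0%} of 'positive' studies true")
```

At a 20% prior, three quarters of “positive” studies are true; at a prior near zero, virtually every “positive” study is a false positive. Under these assumptions, a literature built on near-zero-prior modalities should be expected to be mostly noise.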
That is exactly what we see, particularly when homeopaths can rattle off studies that appear to support the efficacy of homeopathy. However, if one examines the totality of the evidence, these apparently “positive” studies turn out to be just background noise. What’s instructive is to look at Tabarrok’s observations and suggestions about what can be done about the problems that plague biomedical research:
- In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.
- Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).
- Small effects are to be distrusted.
- Multiple sources and types of evidence are desirable.
- Evaluate literatures not individual papers.
- Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.
- As an editor or referee, don’t reject papers that fail to reject the null.
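The second point on that list is worth a quick illustration. In the same toy framework, a bigger sample buys you more statistical power, and higher power raises the fraction of “significant” results that are actually true (holding Tabarrok’s other assumptions fixed):

```python
# Why "bigger samples are better" in this toy framework: larger samples
# mean higher statistical power, and higher power raises the share of
# significant results that are actually true. Assumptions as before,
# illustrative only.

alpha, prior = 0.05, 0.20

for power in (0.2, 0.4, 0.6, 0.8, 0.95):
    ppv = (power * prior) / (power * prior + alpha * (1 - prior))
    print(f"power={power:.2f}  ->  {ppv:.0%} of significant findings true")
```

Going from 20% power to 95% power raises the proportion of true “significant” findings from 50% to about 83%, and that’s before considering that underpowered studies also tend to exaggerate effect sizes on the occasions when they do reach significance.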
All of Tabarrok’s points are good bits of advice for evaluating the scientific literature. From my perspective, it’s critical to evaluate the totality of the scientific evidence. Single studies may produce questionable results, but eventually science will correct them. To me, this is the most telling difference between evidence-based medicine and “alternative” medicine, between scientific medicine and pseudoscience: the ability and willingness to change hypotheses based on the evidence. Indeed, Skeptico shows us an example of just this difference in discussing an HIV vaccine that was abandoned because studies showed that it didn’t work:
The difference between this and complementary and alternative “medicine” (CAM) is starkly shown. Real medicine is tested for efficacy, and abandoned if it doesn’t work. When was the last time any CAM treatment was publicly abandoned by its practitioners because they discovered it didn’t work?
The answer is: Never.
Alternative medicine mavens frequently accuse us “conventional” doctors of being “dogmatic” or otherwise unwilling to consider “different” ideas (specifically their ideas) about medicine and the treatment of disease. In actuality, it is supporters of alternative medicine and pseudoscience who tend to be far more dogmatic than any conventional physician. If Ioannidis is correct, the results of 1/3 of seemingly very important papers were ultimately refuted, resulting in the abandonment of accepted treatments once thought sound. Doesn’t that tell you something? It should! It tells me that “conventional” medicine changes its tests, its understanding of disease, and its treatments on the basis of new evidence and, more importantly, abandons treatments found to be ineffective or not as effective as newer therapies. This winnowing and optimization process may be messy to watch, and it may not happen as fast as we’d like, but happen it eventually does. Contrast this to alternative medicine, where there are still practitioners pushing Laetrile (despite the fact that it was shown to have no efficacy against cancer in well-designed clinical trials 25 years ago), chelation therapy for coronary artery and peripheral vascular disease (despite multiple randomized studies during the 1990s showing it to be no better than placebo), and homeopathy (despite 200 years of science showing that it, too, is no better than an elaborate placebo). Unlike conventional medicine, alternative medicine does not change, other than to gussy up its woo with “science-y”-sounding terminology (especially quantum theory) or to add another scientifically highly improbable “treatment” to its panoply of scientifically improbable treatments. (Hulda Clark’s Zapper, anyone?) More importantly, it almost never abandons therapies found by sound research to be ineffective, as Skeptico so ably (and sarcastically) pointed out.
In the end, contrary to the best efforts of cranks, pseudoscientists, and quacks to portray its conclusions as indicating that science is “basically fraudulent,” remember that Ioannidis’ work does not give any succor at all to advocates of pseudoscience, be they alternative medicine mavens, HIV/AIDS denialists, or any other. In fact, it is work like his that differentiates science and evidence-based medicine from pseudoscience and alternative medicine. Ioannidis looks at how we as a profession do biomedical research and clinical trials and finds the faults even in studies thought to be the gold standard, all with a mind to improving how we do research: more replication, more care, and more caution about initial findings. There’s also an irony in this, given what Ioannidis is saying and how cranks are representing it: Ioannidis would have no way of determining what percentage of scientific findings are “wrong” if science weren’t a self-correcting enterprise, if scientists hadn’t continued to work on the same problems and published findings that contradicted and ultimately refuted the early ones. Again, contrast that to how alternative medicine operates, where, once a treatment becomes popular (homeopathy, for example), no matter how much science shows it to be implausible or ineffective, it has been defended for over 200 years, even to the point of torturing the findings of the latest science to explain it.
The misuse and abuse of Ioannidis’ work is, when you come right down to it, nothing more than a variant of the old crank chestnut of “science has been wrong before.” My usual response is: So what? Science has been wrong before, but it was generally scientists, not pseudoscientists, who found the error and corrected it. Moreover, they did it based on the evidence, not on cranks’ favored techniques of logical fallacies, cherry picking data (HIV/AIDS denialists’ and creationists’ favorite technique), and misrepresenting what science actually says (creationists’ favorite technique). It does not follow from the past mistakes of science that the science now is necessarily wrong. If you want to show that, then you need evidence, not appeals to past findings of science that were later found to be incorrect.
Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most of us, though sadly not all, know that early findings that haven’t yet been replicated should be viewed with skepticism and that we can become more confident in results the more they are replicated and built upon, particularly when multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer. The public, on the other hand, tends not to understand this. Where we in science see changes in proclamations about a health question as the natural outgrowth of the self-correcting nature of science, the public sees them as confusing, particularly when the evidence is not yet as clear as we would like and science shifts back and forth between two positions. This problem is frequently exacerbated by shoddy science reporting, in which each new study is breathlessly reported as the latest, greatest, and seemingly final word on a topic, even though the paper itself will often contain a detailed discussion of the uncertainties and possible sources of error in its results. Somehow, that uncertainty gets lost in the reporting. Certainty is so much more interesting and satisfying. That’s why it’s a good thing that Ioannidis and others like him remind us from time to time of the uncertainty inherent in biomedical science. Even if cranks have a field day with these sorts of findings, that’s just the price of good science.