NOTE: Orac was actually out rather late last night. It turns out that the more administrative responsibility he somehow seems to accumulate, the more he has to go out to dinner as part of various cancer center-related functions. As a result, he is recycling a bit of recent material from elsewhere that he, in his extreme arrogance, considers just too good not to post on this blog too. In any case, it’s always interesting to see how a different audience reacts to his stuff, and he did make some alterations to this post.
‘Tis the season, it would seem, for questioning science. Not that there’s necessarily anything wrong with questioning science and how it is done. Certainly, right here on this very blog I’ve not infrequently pointed out problems with how science, particularly medical science, is done.
This time around, though, the challenge to science comes from an unexpected source in the form of an article in The New Yorker by Jonah Lehrer entitled The Truth Wears Off: Is There Something Wrong With the Scientific Method? Unfortunately, the full article is restricted only to subscribers. Fortunately, a reader sent me a PDF of the article; otherwise, I wouldn’t have bothered to discuss it. Also, Lehrer himself has elaborated a bit on questions asked of him since the article’s publication and published fairly sizable excerpts from his article here and here. In any case, I’ll try to quote as much of the article as I think I can get away with without violating fair use, and those of you who don’t have a subscription to The New Yorker might just have to trust my characterization of the rest. It’s not an ideal situation, but it’s what I have to work with.
The decline effect
I’m going to go about this in a slightly different manner than one might normally expect. First, I’m going to quote a few sentences near the end of the article right now at the beginning, because you’ll rapidly see why Orac might find them provocative, perhaps even a gauntlet thrown down. Before I do that, I should define the topic of the article, namely something that has been dubbed “the decline effect.” Basically, this is a term for a phenomenon in which initial results from experiments or studies of a scientific question are highly impressive, but, over time, become less so as the same investigators and other investigators try to replicate the results, usually as a means of building on them. In fact, Googling “the decline effect” brought up an entry from The Skeptic’s Dictionary, in which the decline effect is described thusly:
The decline effect is the notion that psychics lose their powers under continued investigation. This idea is based on the observation that subjects who do significantly better than chance in early trials tend to do worse in later trials.
In his article, Lehrer actually does cite paranormal research by Joseph Banks Rhine in the 1930s, whose testing of a self-proclaimed psychic demonstrated lots of “hits” early on, far more than were likely to be due to random chance. Rhine’s early results appeared to support the existence of extrasensory perception (ESP). However, as further testing progressed, the number of hits fell towards what would be expected by random chance alone, hence Rhine’s coining of the term “decline effect” to describe it. Lehrer spends the bulk of his article describing examples of the decline effect, discussing potential explanations for this observation, and, the part that rated a bit of Insolence–the Respectful kind, this time!–trying to argue that the effect can be generalized to nearly all of science. Longtime readers would probably not find all that much particularly irksome or objectionable in his article (well, for the most part, anyway); that is, until we get to the final paragraph:
Such anomalies demonstrate the slipperiness of empiricism. Although many scientific ideas generate conflicting results and suffer from falling effect sizes, they continue to get cited in the textbooks and drive standard medical practice. Why? Because these ideas seem true. Because they make sense. Because we can’t bear to let them go. And this is why the decline effect is so troubling. Not because it reveals the human fallibility of science, in which data are tweaked and beliefs shape perceptions. (Such shortcomings aren’t surprising, at least for scientists.) And not because it reveals that many of our most exciting theories are fleeting fads and will soon be rejected. (That idea has been around since Thomas Kuhn.) The decline effect is troubling because it reminds us how difficult it is to prove anything. We like to pretend that our experiments define the truth for us. But that’s often not the case. Just because an idea is true doesn’t mean it can be proved. And just because an idea can be proved doesn’t mean it’s true. When the experiments are done, we still have to choose what to believe.
As you might imagine, this passage rather irritated me with what appears on the surface to border on a postmodernist rejection of the scientific method as “just another way of knowing” in which we as scientists have to “choose what to believe.” Moreover, many of the examples provided by Lehrer are indeed curious examples of effect sizes declining in a variety of scientific areas over time as more work is done. What’s not quite as compelling is the way Lehrer, whether intentionally or inadvertently, gives the impression (to me, at least) of painting the decline effect as some sort of mysterious and unexplained phenomenon that isn’t adequately accounted for by the various explanations he describes in his article. He seems to paint the decline effect as a phenomenon that casts serious doubt on the whole enterprise of science in general and science-based medicine in particular, given that many of his examples come from medicine. In all fairness, Lehrer did later try to justify the way he concluded his article. To boil it all down, Lehrer basically equivocated, saying that all he meant by the above passage was that science is “a lot messier” than experiments, clinical trials, and peer review and that “no single test can define the truth.” Well, duh. (The snark in me might also say that science itself can’t actually define “The Truth.”) But if that’s all that Lehrer really meant, then why didn’t he just say so in the first place instead of going all postmodernist-y on us, as though science can’t ever make any conclusions that are any more valid than “other ways of knowing”?
So which examples does Lehrer choose to bolster his case that the decline effect is a serious and underrecognized problem in science? He uses quite a few, several from medical sciences (in particular psychiatry), starting the article out with the example of second generation antipsychotics, such as Zyprexa, which appeared to be so much more effective than older antipsychotics in earlier studies but whose efficacy has recently been called into question, as more recent studies have shown lower levels of efficacy, levels no better than those of the older drugs. Of course Lehrer seems never to have heard of the “dilution effect,” whereby new drugs, once approved, are tried in larger and broader ranges of conditions and patients, in particular, in patients with milder cases of the diseases for which the drugs were designed. Over time, this frequently results in the appearance of declining efficacy, when in reality all that is happening is that physicians and scientists are pushing the envelope by testing the drugs in patients who are less carefully selected than patients in the early trials. No real mystery here.
Another example came from evolutionary biology, specifically observations on fluctuating symmetry. This passage is taken from a blog post quoting Lehrer’s article:
In 1991, the Danish zoologist Anders Møller, at Uppsala University, in Sweden, made a remarkable discovery about sex, barn swallows, and symmetry. It had long been known that the asymmetrical appearance of a creature was directly linked to the amount of mutation in its genome, so that more mutations led to more “fluctuating asymmetry.” (An easy way to measure asymmetry in humans is to compare the length of the fingers on each hand.) What Møller discovered is that female barn swallows were far more likely to mate with male birds that had long, symmetrical feathers. This suggested that the picky females were using symmetry as a proxy for the quality of male genes. Møller’s paper, which was published in Nature, set off a frenzy of research. Here was an easily measured, widely applicable indicator of genetic quality, and females could be shown to gravitate toward it. Aesthetics was really about genetics.
In the three years following, there were ten independent tests of the role of fluctuating asymmetry in sexual selection, and nine of them found a relationship between symmetry and male reproductive success. It didn’t matter if scientists were looking at the hairs on fruit flies or replicating the swallow studies–females seemed to prefer males with mirrored halves. Before long, the theory was applied to humans. Researchers found, for instance, that women preferred the smell of symmetrical men, but only during the fertile phase of the menstrual cycle. Other studies claimed that females had more orgasms when their partners were symmetrical, while a paper by anthropologists at Rutgers analyzed forty Jamaican dance routines and discovered that symmetrical men were consistently rated as better dancers.
Then the theory started to fall apart. In 1994, there were fourteen published tests of symmetry and sexual selection, and only eight found a correlation. In 1995, there were eight papers on the subject, and only four got a positive result. By 1998, when there were twelve additional investigations of fluctuating asymmetry, only a third of them confirmed the theory. Worse still, even the studies that yielded some positive result showed a steadily declining effect size. Between 1992 and 1997, the average effect size shrank by eighty per cent.
And it’s not just fluctuating asymmetry. In 2001, Michael Jennions, a biologist at the Australian National University, set out to analyze “temporal trends” across a wide range of subjects in ecology and evolutionary biology. He looked at hundreds of papers and forty-four meta-analyses (that is, statistical syntheses of related studies), and discovered a consistent decline effect over time, as many of the theories seemed to fade into irrelevance. In fact, even when numerous variables were controlled for — Jennions knew, for instance, that the same author might publish several critical papers, which could distort his analysis–there was still a significant decrease in the validity of the hypothesis, often within a year of publication. Jennions admits that his findings are troubling, but expresses a reluctance to talk about them publicly. “This is a very sensitive issue for scientists,” he says. “You know, we’re supposed to be dealing with hard facts, the stuff that’s supposed to stand the test of time. But when you see these trends you become a little more skeptical of things.”
Jennions’ article was entitled Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Reading the article, I was actually struck by how small, at least compared to the impression that Lehrer gave in his article, the decline effect in evolutionary biology was estimated to be in Jennions’ study. Basically, Jennions examined 44 peer-reviewed meta-analyses and analyzed the relationship between effect size and year of publication; the relationship between effect size and sample size; and the relationship between standardized effect size and sample size. To boil it all down, Jennions et al concluded, “On average, there was a small but significant decline in effect size with year of publication. For the original empirical studies there was also a significant decrease in effect size as sample size increased. However, the effect of year of publication remained even after we controlled for sampling effort.” They concluded that publication bias was the “most parsimonious” explanation for this declining effect.
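To make concrete what “controlling for sampling effort” means, here’s a simplified sketch with made-up numbers (my own toy code, not Jennions’ actual analysis): regress standardized effect size on both publication year and sample size, and see whether the year term survives.

```python
# A simplified, synthetic-data sketch (not Jennions' actual analysis) of a
# meta-regression: does effect size still decline with publication year even
# after sample size ("sampling effort") is included in the model?
import numpy as np

rng = np.random.default_rng(1)
n = 200
year = rng.integers(1990, 2001, n)           # year of publication
sample_size = rng.integers(10, 200, n)       # sample size of each study
# Hypothetical effect sizes: a small decline with year, noisier in small studies
effect = 0.5 - 0.02 * (year - 1990) + rng.normal(0, 1 / np.sqrt(sample_size))

# Ordinary least squares with an intercept, a year term, and a sample-size term
X = np.column_stack([np.ones(n), year - 1990, np.log(sample_size)])
coef, *_ = np.linalg.lstsq(X, effect, rcond=None)
print(dict(zip(["intercept", "per_year_change", "log_sample_size"], np.round(coef, 3))))
# A negative "per_year_change" coefficient, with sample size already in the
# model, is the kind of residual decline Jennions reported.
```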
Personally, I’m not sure why Jennions was so reluctant to talk about such things publicly. You’d think from the quotes chosen by Lehrer for his article that scientists were all ready to come after him with pitchforks, hot tar, and feathers if he dared to point out that effect sizes reported by investigators in his scientific discipline exhibit apparent declines over the years due to publication bias and the bandwagon effect. Perhaps it’s because Jennions is not in medicine; after all, we’ve been speaking of such things publicly for a long time. Indeed, physicians generally expect that most initially promising results, even in randomized trials, will probably fail to ultimately pan out. In any case, those of us in medicine who might not have been willing to talk about such phenomena became more than willing after John Ioannidis published his provocatively titled article Why Most Published Research Findings Are False around the time of his study Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most, but sadly not all of us, know that early findings that haven’t been replicated yet should be viewed with extreme skepticism and that we can become more confident in results the more they are replicated and built upon, particularly if multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer. The public, on the other hand, tends not to understand this, while cranks tend to jump all over Ioannidis’ work as though it is somehow a lethal indictment of science-based medicine.
John Ioannidis, John Lehrer, and the decline effect
The work of John Ioannidis, as discussed here numerous times before, provides an excellent framework to understand why effect sizes appear to decline over time. Although Ioannidis has been criticized for exaggerating the extent of the problem and even using circular reasoning, for the most part I find his analysis compelling. In medicine, in particular, early reports tend to be smaller trials and experiments that, because of their size, tend to be more prone to false positive results. Such false positive results (or, perhaps, exaggerated results that appear more positive than they really are) generate enthusiasm, and more investigators pile on. There’s often a tendency to want to publish confirmatory papers early on (the “bandwagon effect”), which might further skew the literature too far towards the positive. Ultimately, larger, more rigorous studies are done, and these studies result in a “regression to the mean” of sorts, in which the newer studies fail to replicate the large effects seen in earlier results. This is nothing more than what I’ve been saying time and time again, namely that the normal course of clinical research is to start out with observations from smaller studies, which are inherently less reliable because they are small and thus more prone to false positives or exaggerated effect sizes, and then to test them with larger, more rigorous studies that often fail to confirm the initial exciting results.
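To see how that plays out numerically, here’s a minimal simulation of my own (hypothetical numbers, nothing from Lehrer’s article or Ioannidis’ papers): small studies that happen to clear the p < 0.05 bar overestimate a modest true effect, so larger follow-up studies inevitably make the effect look as though it is “declining.”

```python
# A minimal simulation (hypothetical numbers, my own sketch) of how small
# studies that reach p < 0.05 exaggerate a modest true effect, so that larger
# follow-up studies then appear to show a "decline".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
TRUE_EFFECT = 0.2     # true standardized difference between treatment and control
N_STUDIES = 2000      # simulated studies per scenario

def significant_effects(n_per_arm):
    """Estimated effect sizes from the simulated studies that reach p < 0.05."""
    kept = []
    for _ in range(N_STUDIES):
        treated = rng.normal(TRUE_EFFECT, 1.0, n_per_arm)
        control = rng.normal(0.0, 1.0, n_per_arm)
        result = stats.ttest_ind(treated, control)
        if result.pvalue < 0.05 and result.statistic > 0:  # only "positive" results get noticed
            kept.append(treated.mean() - control.mean())
    return np.array(kept)

small = significant_effects(n_per_arm=20)     # early, pilot-sized studies
large = significant_effects(n_per_arm=200)    # later, larger replications

print(f"true effect:                            {TRUE_EFFECT:.2f}")
print(f"mean significant effect, small studies: {small.mean():.2f}")  # well above the true value
print(f"mean significant effect, large studies: {large.mean():.2f}")  # much closer to the true value
```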
In his article, Lehrer blames in essence three things for the decline effect: publication bias, selective reporting, and the culture of science, which contributes to the proliferation of the first two problems. Publication bias has been discussed here on multiple occasions and in various contexts. Basically, it’s the phenomenon in which there is a marked bias towards the publication of “positive” data; in other words, negative studies tend not to be reported as often or tend to end up being published in lower tier, lower “impact” journals. To Lehrer, however, publication bias is not adequate to explain the decline effect because, according to him:
While publication bias almost certainly plays a role in the decline effect, it remains an incomplete explanation. For one thing, it fails to account for the prevalence of positive results among studies that never even get submitted to journals. It also fails to explain the experience of people like Schooler, who have been unable to replicate their initial data despite their best efforts.
This is what is known as being (probably) right for the wrong reasons. I would certainly agree that publication bias is probably an incomplete explanation for the decline effect, although I would be very curious about the prevalence of positive results among studies that never get submitted to journals; it’s pretty darned rare, in my experience, for positive results not to be submitted for publication unless there are serious flaws in the studies with positive results or some other mitigating circumstance takes hold, such as the death of the principal investigator, a conflict over the results between collaborating laboratories, or a loss of funding that prevents the completion of necessary controls or additional experiments. If Lehrer has evidence contradicting my impression that failure to publish positive results is rare, he does not present it.
I would also argue that Lehrer is probably only partially right (and makes a huge assumption to boot) when he argues that publication bias fails to explain why individual investigators can’t replicate their own results. Such investigators, it needs to be remembered, initially published highly positive results. When they have trouble showing effect sizes as large and seemingly robust as their initial results, doubt creeps in. Were they wrong the first time? Will reviewers give them a hard time because their current results do not show the same effect sizes as their original results? They hold back. True, this is not the same thing as publication bias, but publication bias contributes to it. A journal’s peer reviewers are probably going to give an investigator a much harder time for a result showing a smaller effect size if there is published data from before that shows a much larger effect size; better journals will be less likely to publish such a result, and investigators know it. Consequently, publication bias and selective reporting (the investigator holding back the newer, less compelling results, knowing the lower likelihood of getting them published in a top tier journal) reinforce each other. Other investigators, not invested in the original investigator’s initial highly positive results, are less likely to hold back, and, indeed, there may even be an incentive to try to disprove a rival’s results.
Lehrer makes a good point when he points out that there is such a thing as selective reporting, wherein investigators tend to be less likely to report findings that do not fit into their current world view and might even go so far as to try to shoehorn findings into the paradigm they currently favor. He even goes so far as to give a good example of cultural effects on selective reporting, specifically the well-known tendency of studies of acupuncture from China to be far more likely to report positive results than studies of acupuncture done in “Western” nations. He points out that this discrepancy “suggests that scientists find ways to confirm their preferred hypothesis, disregarding what they don’t want to see.” Or, as Simon and Garfunkel once sang in The Boxer, “a man hears what he wants to hear and disregards the rest.” It is not surprising that scientists would share this quality with their fellow human beings, but it is devilishly difficult to identify and quantify such biases. That, of course, doesn’t stop proponents of pseudoscience from crying “bias!” whenever their results are rejected by mainstream science.
There is one other potential explanation that Lehrer seems not to consider at all: Popularity. About a year and a half ago, I discussed a fascinating study that examined the effect of popularity on the reliability of the medical literature about a topic. The study, by Pfeiffer and Hoffmann, was published in PLoS ONE and entitled Large-Scale Assessment of the Effect of Popularity on the Reliability of Research, and its introduction lays out the problem:
In this context, a high popularity of research topics has been argued to have a detrimental effect on the reliability of published research findings [2]. Two distinctive mechanisms have been suggested: First, in highly competitive fields there might be stronger incentives to “manufacture” positive results by, for example, modifying data or statistical tests until formal statistical significance is obtained [2]. This leads to inflated error rates for individual findings: actual error probabilities are larger than those given in the publications. We refer to this mechanism as “inflated error effect”. The second effect results from multiple independent testing of the same hypotheses by competing research groups. The more often a hypothesis is tested, the more likely a positive result is obtained and published even if the hypothesis is false. Multiple independent testing increases the fraction of false hypotheses among those hypotheses that are supported by at least one positive result. Thereby it distorts the overall picture of evidence. We refer to this mechanism as “multiple testing effect”. Putting it simple, this effect means that in hot research fields one can expect to find some positive finding for almost any claim, while this is not the case in research fields with little competition [1], [2].
I discussed the implications of this paper in my usual nauseating level of detail here. Suffice to say, the more scientists working on a problem there are, the more false positives there are likely to be, but, as the field matures, there is a regression to the mean. Also, don’t forget that initial exciting results are often published in the “highest” impact journals, publication in which can really make a scientist’s career take off. However, because these results are the most provocative and might even challenge the scientific consensus strongly, they also have a tendency to turn out later to be wrong. Leaving out this aspect is a major weakness in Lehrer’s analysis, particularly given that each of the examples he provided could easily have a major component of the “popularity effect” going on.
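A quick back-of-the-envelope illustration of that “multiple testing effect” (my own numbers, not anything from the Pfeiffer and Hoffmann paper): with the conventional 5% false positive rate, the chance that at least one of several independent groups testing a false hypothesis obtains a “positive” result climbs quickly as a field gets hot.

```python
# Back-of-the-envelope numbers (mine, not from the Pfeiffer/Hoffmann paper) for
# the "multiple testing effect": the probability that at least one of N
# independent groups testing a *false* hypothesis gets p < 0.05 purely by chance.
ALPHA = 0.05
for n_groups in (1, 5, 10, 20, 50):
    p_any_false_positive = 1 - (1 - ALPHA) ** n_groups
    print(f"{n_groups:3d} independent tests -> "
          f"P(at least one false positive) = {p_any_false_positive:.2f}")
```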
The bottom line: Is science unreliable?
As I read Lehrer’s article, I was troubled. No, I wasn’t troubled because the implications of his article were somehow shaking my view of the reliability of science. I certainly wasn’t troubled by his discussing known problems with how science is practiced by fallible human beings, how it almost always isn’t done completely according to the idealized version of the scientific method taught to us in high school. After all, I’ve discussed the problems of publication bias and deficiencies in the peer review system seemingly ad nauseam. Rather, I was troubled by the final paragraph, quoted above, in which Lehrer seems to be implying, if not outright arguing, that science is nothing more than competing narratives between which scientists must choose, each of them not particularly well supported by data. Jerry Coyne nails it when he comments:
But let’s not throw out the baby with the bathwater. In many fields, especially physics, chemistry, and molecular biology, workers regularly repeat the results of others, since progress in their own work demands it. The material basis of heredity, for example, is DNA, a double helix whose sequence of nucleotide bases codes (in a triplet code) for proteins. We’re beginning to learn the intricate ways that genes are regulated in organisms. The material basis of heredity and development is not something we “choose” to believe: it’s something that’s been forced on us by repeated findings of many scientists. This is true for physics and chemistry as well, despite Lehrer’s suggestion that “the law of gravity hasn’t always been perfect at predicting real-world phenomena.”
Lehrer, like Gould in his book The Mismeasure of Man, has done a service by pointing out that scientists are humans after all, and that their drive for reputation–and other nonscientific issues–can affect what they produce or perceive as “truth.” But it’s a mistake to imply that all scientific truth is simply a choice among explanations that aren’t very well supported. We must remember that scientific “truth” means “the best provisional explanation, but one so compelling that you’d have to be a fool not to accept it.” Truth, then, while always provisional, is not necessarily evanescent. To the degree that Lehrer implies otherwise, his article is deeply damaging to science.
Indeed. I would argue that there is really no such thing as scientific “truth.” In fact, one thing I noticed right away in Lehrer’s articles is that the examples he chose were, by and large, taken from either psychology, parapsychology, or ecology, rather than physics and chemistry. True, he did point out an anomalous experiment that was off by 2% in estimating the gravitational constant. Given how difficult it is to measure the gravitational constant and how many scientists have done it over the years, I was actually surprised that Lehrer could only find one example of an anomalous measurement. In addition, Lehrer did point out how most gene association studies with diseases thus far have not been confirmed and how different groups find different results, but finding such associations is something that is currently popular but not a mature field. According to the “popularity effect,” it is not surprising that there is currently a lot of “noise” out in the scientific and medical literature in terms of what gene expression patterns and SNPs correlate with what disease. Over the next decade, it is very likely that many of these questions and disagreements will be sorted out scientifically.
Finally, Lehrer’s view also seems not entirely consistent in some ways. I’ll show you what I mean. On his blog, as I mentioned before, Lehrer answers reader questions and expands upon his ideas a bit. A reader asks Lehrer, “Does this mean I don’t have to believe in climate change?” Lehrer’s response is, basically, that “these are theories that have been verified in thousands of different ways by thousands of different scientists working in many different fields,” which is, of course, true, but almost irrelevant given Lehrer’s previous arguments. After all, even though I accept the scientific consensus regarding anthropogenic global warming, if publication bias and selective reporting can so distort science for so long in other fields, I have to ask on what basis Lehrer would say we should accept the science of global warming. One way is that he quite correctly points out that the “truths” of science (I really hate using that word with respect to science) depend upon the strength of the “web” supporting them, namely the number of interconnections. We say that here ourselves time and time again in arguments against pseudoscience such as, for example, homeopathy. However, if, as Lehrer seems to be arguing, scientists already put their results into the context of what is known before, isn’t he just basically arguing for doing what we are already doing, even though he has just criticized science for being biased by selective reporting rooted in scientists’ existing preconceptions?
Although Lehrer makes some good points, where he stumbles, from my perspective, is when he appears to conflate “truth” with science or, more properly, accept the idea that there are scientific “truths,” even going so far as to use the word in the title of his article. That is a profound misrepresentation of the nature of science, in which all “truths” are provisional and all “truths” are subject to revision based on evidence and experimentation. The decline effect–or, as Lehrer describes it in the title of his article, the “truth wearing off”–is nothing more than science doing what science does so well: Correcting itself in its usual messy and glorious way.
32 replies on “Is the “decline effect” really so mysterious?”
Re: Individual researchers’ failure to confirm their own results and whether that can be explained by publication bias… well, that is easy to explain!
To make it easy to think through, imagine a researcher by the name of Dr. Alwayswrong, whose hypotheses are always false. Somehow he manages to continue to secure funding for more RCTs, though. Dr. Alwayswrong always publishes positive results, and he never publishes negative results except on topics on which he’s previously reported a positive result.
About 1 in 20 times (please, statistics geeks, let this approximation slide even though I know it’s not quite a valid statement — it’s close enough!) he’s going to get a positive result with p < 0.05 even though the hypothesis was wrong. That will always be the first paper he publishes on a given topic. He will then study the effect further, and with a very high probability he will fail to replicate it (since the initial effect was a false positive). The totality of Dr. Alwayswrong’s published work consists of an initially promising positive result, followed by one or more follow-up studies contradicting it -- an instant "decline effect". It's not difficult to see how the outcome of this idealized example would still apply to a lesser extent to researchers whose hypotheses are not so uniformly false, or whose decisions to publish are not so predictable. By definition, if at any point in one’s research career you encounter a false positive due to chance, there is an extremely high likelihood that you will fail to replicate your own work. And if, like pretty much everyone in the whole frakkin’ world, you are more likely to publish false positives (which of course you don’t yet know to be false) than you are to publish boring negative results… there you go. Easily explained.
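A quick numerical sketch of this thought experiment (hypothetical numbers: every hypothesis is false, an initial positive result is always published, and each one triggers a few follow-up studies):

```python
# A quick numerical sketch of the Dr. Alwayswrong thought experiment above
# (hypothetical numbers: every hypothesis is false, an initial positive result
# is always published, and it triggers a few follow-up studies).
import random

random.seed(42)
ALPHA = 0.05        # chance of a false positive per study
N_TOPICS = 10_000   # topics investigated over a career
N_FOLLOWUPS = 3     # follow-up studies after each initial "hit"

initial_hits = 0
replications = 0
for _ in range(N_TOPICS):
    if random.random() < ALPHA:              # spurious initial positive result
        initial_hits += 1
        for _ in range(N_FOLLOWUPS):         # attempts to replicate it
            if random.random() < ALPHA:      # a "replication" succeeds only by chance
                replications += 1

print(f"initial positive papers:   {initial_hits}")
print(f"successful 'replications': {replications} out of {initial_hits * N_FOLLOWUPS}")
# Each topic's literature starts with a striking positive finding followed
# almost entirely by failed replications: an instant "decline effect".
```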
Add my (first) name to the list of scientists who are not surprised by this “effect.” It is probably best described as a phenomenon, and I’m not certain it is even a uniform one. I’m an ecologist and the dynamics of the belief in hypotheses is something I regularly teach in my introductory classes. I have little to add to the causes of declines of effect sizes, but I have a little to add in how it relates to ecology.
Most hypotheses in ecology are applied to a very wide variety of systems that vary by species involved, habitats, time and so on. Laboratory experiments can be replicated, but field experiments basically are all special cases due to the effect of climate and other extrinsic effects on experiments. So repeating experiments is really just doing comparative work. In fields like ecology and evolution (and really all biology), the “decline effect” is usually paired with adding contingency and subtlety to the original simple hypothesis. The hard sell that Lehrer adds (I haven’t read the article, but will trust your reading) is similar to the depression that first/second year graduate students have in these fields as their illusions that there will be some critical experiment that will solve a question are crushed.
Re my comment at #1… I always forget I need to be extra careful when I type “p < 0.05". The third paragraph should read something like this:
About the “dilution effect”: ( alt med usually comments about this in regard to anti-depressants : ” See, it *proves* they don’t *work*!”) new-stylee, high-priced anti-psychotics, such as Zyprexa and Abilify, are not only tried on people with milder symptoms, but on people with conditions other than the original schizophrenia ( e.g. for depression as an add-on, bipolar, dementia, for “irritability” in *autism*). Interestingly, Abilify is supposed to affect the difficult-to-treat “negative symptoms” ( i.e. deficits) as well as “positive” (i.e. psychotic) symptoms, so I can see why it’s tried in those other conditions which are often characterized by deficits. Actually, I’m surprised we haven’t heard more from alt med on this because side-effects are a realistic concern.
Yawwwwwn, so boring.
By pure random chance you would expect some effects to strengthen and some to weaken, because errors occur.
Except that, oh wait, the ones that would strengthen get discarded silently and therefore don’t get a chance to strengthen.
(And probably get brought back up later on when everyone thinks it’s a new idea because nobody knows about the initial (mistaken) failure…)
I mean, I guess it’s worth talking about? But who really cares, jeeze.
I have read what was available of Lehrer’s article and the article referring to the work of John Ioannidis. Thanks for putting things more in perspective/context.
The reason for commenting is this:
I do not know if you are familiar with the XMRV hype, but as an interested party I try to follow things up a bit. I haven’t seen anything to indicate that the scientists/doctors involved are familiar with the decline effect or with the work of Ioannidis. It is almost like there is no buffer, no context, … for all the news and this cannot be entirely blamed on patients and journalists.
I guess I don’t see what’s so surprising about the “decline effect”. As I see it, it’s analogous to evolution: random fluctuations selected in a non-random fashion.
Let me explain:
Small “pilot” studies (or “preliminary reports”) are much more likely to, by random chance, have either false-positive or false-negative results than larger, more rigorous studies. That’s the “random fluctuation” part.
Within this normal distribution of small studies, a few will be sufficiently outside the expected (e.g. female swallows prefer symmetrical mates) that they will prompt the researcher(s) to submit them for publication and will intrigue the editors enough to publish them. Studies with unexpected results are more interesting and more likely to be submitted and published. This is the non-random selection part.
Studies that fail to find anything interesting or unexpected (e.g. female swallows don’t care whether their mate is symmetrical or not) aren’t submitted for publication (why would they? “Nothing to see here folks, move along.”) and if – for some reason – they are, they are rarely published (“This just in, Generalissimo Franco is still dead.”).
Of course, once something unexpected is published – especially a small study, which has a higher likelihood of being wrong – then there will be other researchers who try to either replicate it or knock it down (and let’s be honest, a lot of the folks trying to “replicate” the results are trying to prove the original researchers wrong). As time passes, the studies that are completed will be getting larger (‘cuz bigger studies take longer to do) and the original result – if it is spurious – will get smaller and smaller until it finally disappears into the randomness.
From my perspective, the “decline effect” is simply the business of science – someone finds something “interesting” and puts it out in the literature, then other researchers try to do them one better by either showing how big the effect is or (even better) that the effect is spurious. That way, we eventually get to the “facts” by successive approximations.
I suppose the “take home lesson” – and the advice I give to my students – is that “preliminary reports” and “pilot studies” are, at best, “provisional data” and should not be taken seriously until they are replicated.
One statement by Jonah Lehrer struck me as rather odd:
“For one thing, it fails to account for the prevalence of positive results among studies that never even get submitted to journals.”
Unless he’s psychic, how does he know how many studies with positive results aren’t submitted to journals? That would seem to reside in the set of unobtainable data.
Prometheus
Medicine is not the same as science. That is Lehrer’s first mistake, and as a physical scientist, I am angered at his loose use of the words “science” and “scientific.” There is nothing that is scientific about a double-blind clinical trial. The clinician takes a drug that has passed toxicity studies, has shown some efficacy in patients, and gives the drug to hundreds of people looking for an effect versus placebo. The hypothesis is “this drug is effective.” Unblinding data to confirm or deny the hypothesis is NOT scientific.
I am a synthetic chemist, and I am certain that essentially every result I have ever published in more than 100 papers is reproducible and the compounds we made were as described. Our chemistry may not work for every chemist, as it may require a great deal of finesse or experience to carry out, but the results are still valid after more than 25 years. I would guess that the vast majority of my colleagues would say exactly the same thing.
Comparing the physical sciences (chemistry, physics, geology, etc.) to clinical medicine is like comparing clinical medicine to sociology.
@ Rob:
That’s rather unfair. Medicine most certainly is a science – and I say this as a high-energy physicist, so my field is even “harder” than yours. It’s a science with complexities you and I don’t deal with (an electron is an electron, but people aren’t so interchangeable). Greater uncertainties do NOT make a field not science. What makes a field a science is the scientific method, and medical science most certainly qualifies. (Medical PRACTICE, not so much. Informed by science, but not itself a science.)
“Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most, but sadly not all of us, know that early findings that haven’t been replicated yet should be viewed with extreme skepticism and that we can become more confident in results the more they are replicated and built upon, particularly if multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer.”
The problem is this: physicians and scientists, just like every other group of professionals, are of varying competence and integrity. The bottom third of these populations put the public at serious risk of harm. I went into the medical field from the humanities and was shocked to find how little anyone seemed to actually know about the history and philosophy of science. I did an NIH-funded summer program and found it wasn’t much better among the PhDs. “Scientists understand the scientific method like fish understand aquadynamics,” I read somewhere. Frightening!
The planet is pretty bad off, you must admit, unless you are in global warming denial. And now we are talking again about the threat of a nuclear attack on the US. How did we get to where we are? The scientific method, of course! We use the scientific method because it is useful, not because it necessarily gives us some “true” picture of the world. It pains me to admit this, because I love science and math more than anything, and always have, but when science looks upon the earth, historically, its gaze scorches everything in its path. Bigger weapons, bigger SUVs, etc. Medicine has some dark secrets, with experiments on minorities, involvement with the Holocaust, eugenics, lobotomies… We so quickly forget this stuff, but we shouldn’t. If you assert that you are of a certain standard of personal integrity and adhere to truly ethical behavior, then you have to start taking seriously what your professional opposite on that competence/integrity scale could be doing. Remember that the middle third in the integrity/competence scale are more likely to look the other way. It’s harsh, but true, you must admit: there are a lot of morally flexible idiots out there practicing medicine and science.
Pure arrogant bullshit. In fact, I will tell you, having done both chemistry and clinical trials, that clinical trials are science and that they’re way harder to do than synthetic chemistry because in clinical trials it is much harder to control all the variables to make the result reliable and reproducible.
Synthetic chemistry reproducible? Since when?
Like, half of published syntheses we’ve tried are completely unreproducible (90% of those in Tet Lett)
@8
“Medicine is not the same as science. That is Leher’s first mistake, and as a physical scientist, I am angered at his loose use of the word “science” and “scientific.” There is nothing that is scientific about a double-blind clinical trial.”
Thank-you. These wannabe scientist doctors keep wanting to dilute the word science in order to claim far more certainty than they really have, and ignore dissenting opinion.
Medicine is an art. Always has been, and likely always will be. Being an artist is harder, so I guess they prefer the easy way out and want to rely strictly on “science” so they don’t have to use clinical judgement, intuition (another word for discerning experience) or deductive reasoning to solve patients’ health problems. Far easier to look up symptoms in a book, order a few tests and let the lab do all the work, or better yet, just blame the patient when they can’t figure out the problem using their “scientific” methods.
@Pablo
As a PhD student in organic synthesis, I found plenty of unreproducible results within my own lab!
pablo said:
“Synthetic chemistry reproducible? Since when?
Like, half of published syntheses we’ve tried are completely unreproducible”
Chemmomo said:
“As a PhD student in organic synthesis, I found plenty of unreproducible results within my own lab”
Do you guys even read what you are responding to?
Here is what Orac said:
“in clinical trials it is much harder to control all the variables to make the result reliable and reproducible”
All this implies is that in medicine it is harder to get reliable and reproducible results than in chemistry, not that results in chemistry are reliable and reproducible.
I hope you see the difference.
anon,
“These wannabe scientist doctors keep wanting to dilute the word science in order to claim far more certainty than they really have, and ignore dissenting opinion.”
On the contrary, it has been hard work to get most doctors to base their treatments on evidence of effectiveness obtained via meta-analyses of clinical trials. Many prefer to do what feels right (intuition), or do what seems to have worked before (experience), which are both notoriously error prone methods.
“Medicine is an art. Always has been, and likely always will be. Being an artist is harder, so I guess they prefer the easy way out and want to rely strictly on “science” so they don’t have to use clinical judgement, intuition (another word for discerning experience) or deductive reasoning to solve patient’s health problems.”
Medicine is not purely an art. It is, or should be, an art based on science (the results of meta-analyses of clinical trials). And contrary to what you say, doing science based medicine is much harder and more time consuming than relying on experience and intuition to diagnose and manage a patient’s health problems. It adds another layer that can, in fact, veto any tentative decisions based purely on experience and intuition.
“Far easier to look up symptoms in a book, order a few tests and let the lab do all the work, or better yet, just blame the patient when they can’t figure the problem using their “scientific” methods”.
Oh well, you are a cynic. Nevermind then.
I wonder what I’m missing. There’s a chance for error in studies? Isn’t that why people try to reproduce results to begin with? Did I drop into an alternate history where the state of science is still, “I had an idea so I must be right”?
And what’s with the odd formulations implying that the truth is changing? Am I to understand that once upon a time the Earth really was hollow, but when we stopped believing it was hollow, it started to fill up as a result?
“I had another idea so the world must have changed to accomodate my cleverness!”
You don’t know what you’re talking about, do you?
@18
Probably not. Just spewing some idiotic nonsense, but then again, what can one expect from it….
Billy Joe,
Do you not have a sense of humor?
Can we not poke fun at our own profession just because other fields are even less reproducible?
Chemmomo,
“Do you not have a sense of humor?”
If you did not want to seem to support pablo’s erroneous implication about what Orac said, you had the opportunity of calling him on it before adding your “humor”. That way I may have had the opportunity of obtaining a “sense” of it. 😉
BillyJoe, pablo was responding to Rob @8, not Orac -.-;
nsib,
Pablo was clearly responding to Orac.
Post #11 by Orac:
“In fact, I will tell you, having done both chemistry and clinical trials, that clinical trials are science and that they’re way harder to do than synthetic chemistry because in clinical trials it is much harder to control all the variables to make the result reliable and reproducible.”
Post #12 by pablo:
“Synthetic chemistry reproducible? Since when?”
Billy Joe,
Yes, but it is Rob @8 who originally made the claim that synthetic chemistry is reproducible, not Orac. What Orac seems to be saying is that controlling the variables in the clinic is harder, not that it’s easy to do in synthetic chemistry.
Like nsib, I interpreted pablo’s comment to be directed at Rob’s claim.
And how is pointing out that the reproducibility of synthetic chemistry has been overstated “erroneous”?
Chemmomo,
In future perhaps pablo can indicate who he is responding to.
Well, unlike the author of the article, I’ve read The Structure of Scientific Revolutions. And I don’t remember a single allusion to “fleeting fads”. I see how Kuhn can be exploited by post-modernists, but he’s not a proponent of their infamous idea according to which “science is just another narrative”. Only a careless and/or dishonest reading can make anyone believe the contrary.
“In future perhaps pablo can indicate who he is responding to.”
All you had to do was read Orac’s post to know that it wasn’t in response to him.
Pablo was apparently responding to someone who claimed that synthetic chemistry was reproducible. Since Orac never claimed that, why would you think it was a response to him?
Since Orac never claimed that, what?
“Truth”: the ultimate one-word oxymoron, perhaps?
The decline effect does concern us though. Much of the cutting edge of medicine is evolving so rapidly that we never have time for the decline effect to be observed, before the modality is superseded. The “art” is to remain vigilant/skeptical/open and honest enough to change when the time comes. (BUT not to bend with every fashion).
That’s an unnecessarily long rebuttal. The Decline Effect is nothing to worry about: surely it will itself decline.
I’ve been doing several demonstrations — the self-reference effect (SRE) recall study and the Bousfield category organization recall study — in my Human Memory classes for over 25 years. They Always work and they work in a robust manner (you don’t need statistics to verify the relative differences obtained — they are Very large in absolute terms and would easily overwhelm any within-cell variance).
So, my question — why, oh why my lord, has the decline effect declined to commune with my meager research endeavors (I have been replicating and extending SRE results in studies from 1986-2010, having been a key player in that paradigm/field from near its inception)?