Is the “decline effect” really so mysterious? (revisited)

Back in December, I took issue with a highly irritating article by someone who normally should know better, Jonah Lehrer, entitled The Truth Wears Off: Is There Something Wrong With the Scientific Method?, so much so that I wrote one of my typical long-winded deconstructions of the article. One thing that irritated me was contained in the very title itself, namely the insinuation that the “decline effect,” which is the tendency of effects observed in early scientific experiments demonstrating a phenomenon to “decline” or become less robust as more and more experiments are performed, is somehow some mysterious phenomenon that scientists deny. If you want the long version of what I found so wrong about Lehrer’s article, you can go to the link. The short version is that not only is the “decline effect” not nearly as mysterious as Lehrer made it sound but it’s not some sort of serious, near fatal problem with how science is done. Indeed, it’s not particularly mysterious at all to many of us who actually–oh, you know–do science, particularly those of us who do medical science and clinical trials. Nor was I the only one to take serious issue with Lehrer’s article. Steve Novella and P.Z. Myers did as well (not to mention a certain “friend” of the blog). None of us were pleased with Lehrer’s characterization of the “decline effect” as the “truth wearing off” and his implication at the end of the article that this effect means that it’s impossible ever really to “prove” anything.

It turns out that Jonah Lehrer has responded to some of the criticisms of his article in a piece entitled More Thoughts on the Decline Effect. Unfortunately, his response to criticism demonstrates that he’s pretty much missed the point again:

This week, the magazine published four very thoughtful letters in response to the piece. The first letter, like many of the e-mails, tweets, and comments I’ve received directly, argues that the decline effect is ultimately a minor worry, since “in the long run, science prevails over human bias.”


This is, of course, not a bad argument against the sensationalism that Lehrer demonstrated regarding the decline effect, and, as an example of this argument, Howard Stuart cited the Millikan oil drop experiment, in which Robert Millikan calculated the charge of the electron. His first estimate of the charge was too small, and it took several years before the correct value was finally agreed upon after many investigators tried to replicate Millikan’s results. As Stuart points out, the reason it took so long for Millikan’s result to be corrected to the currently accepted, higher value for the charge of an electron was that scientists were biased towards rejecting results that differed too far from Millikan’s. Lehrer responds by in essence repeating Stuart’s point using an excerpt from a talk by Richard Feynman and then pointing out that this is a good example of selective reporting. Well, yes, no one is denying that, but the point is that science is self-correcting and that science does ultimately prevail over human bias.

Lehrer’s response:

But that’s not always the case. For one thing, a third of scientific papers never get cited, let alone repeated, which means that many errors are never exposed. But even those theories that do get replicated are shadowed by uncertainty. After all, one of the more disturbing aspects of the decline effect is that many results we now believe to be false have been replicated numerous times. To take but one example I cited in the article: After fluctuating asymmetry, a widely publicized theory in evolutionary biology, was proposed in the early nineteen-nineties, nine of the first ten independent tests confirmed the theory. In fact, it took several years before an overwhelming majority of published papers began rejecting it. This raises the obvious problem: If false results can get replicated, then how do we demarcate science from pseudoscience? And how can we be sure that anything–even a multiply confirmed finding–is true?

Let’s just put it this way. If a scientific paper is never cited and the experiments described in the paper never repeated, then I would argue that the science in the paper is probably just not that important. Think about it. The reason the Millikan oil drop experiment was repeated time and time again until science got it right is that the value of the charge of an electron is a very basic, fundamental value in physics. It was (and is) important to know what it is. Science that reports fundamentally important results will be replicated. Science that does not might never be replicated, but, when you come right down to it, is it really that big a deal that it isn’t? I’d say that it probably isn’t. Yes, sometimes a bit of important science lies buried in the literature for years without anyone appreciating its importance, only to be near-miraculously discovered and extended by another scientist, but such stories are relatively uncommon.

More importantly, I don’t understand why Lehrer rehashes an example he used in his original article. As you recall, he used fluctuating asymmetry as his most prominent example of the decline effect, devoting considerable verbiage in his article to it. Basically, as described, fluctuating asymmetry appeared to be an important and robust result, with a number of papers finding results that supported the hypothesis. However, over time, the results became less and less robust to the point where it appears that the hypothesis is not so well supported after all. Again, this is nothing more than science correcting itself. As I’ve said before with regard to medicine, it may take longer than we like. It might be a lot messier than we like. It might even be uglier than we like. But eventually, science finds the way, and false hypotheses are rejected. Lehrer seems to think this should happen instantaneously, but if science were that clear cut it wouldn’t be so hard to do at the cutting edge. Even more distressingly, Lehrer goes one worse than his previous article, which he concluded by implying that the decline effect somehow makes it impossible to know anything about the universe with any degree of certainty. Now he’s implying that somehow the decline effect makes it horribly difficult to differentiate science from pseudoscience. Now I don’t want to try to make light of the demarcation problem or imply that it’s always easy to distinguish pseudoscience from science, but let’s just put it this way: The decline effect is not what makes demarcation between science and pseudoscience difficult. It’s probably not even a major consideration.

I find Lehrer’s question off-base as well. How can we be sure that even a multiply confirmed finding is true? He can’t! We can’t! Repeat after me, Jonah: Scientific conclusions are always provisional, always subject to change. We can never be sure that even multiply confirmed findings are “true,” whatever “true” means! Why is this such a difficult concept for Lehrer to grasp? He seems to think that science has to be able to “prove” something once and for all, or else it’s a failure, kind of like the way Mike Myers says, “If it isn’t Scottish, it’s crap!” If a scientific conclusion isn’t, well, conclusive, Lehrer seems to be saying, it’s crap. However, there’s almost certainly no such thing as a “perfect” or final understanding of anything. This is science’s greatest strength, but also its greatest Achilles heel in terms of acceptance by the public. How often do we hear people complaining about how one week there is a study concluding that this or that is unhealthy, only to be followed less than a year later by another study claiming that this or that is healthy after all? Or how often do we hear cranks use changes in scientific “truth” as “evidence” that science is inherently unreliable, the discovery of H. pylori as the most common cause of duodenal ulcers being a favorite example? Yes, because the discovery in the early 1980s that H. pylori causes ulcers led to a radical change in how physicians treat them, cranks, particularly alt-med cranks, like to cite the initial resistance to the H. pylori hypothesis as proof that science is unreliable and changes too radically–and therefore by implication their woo must work.

Lehrer instead chooses to take an easy swipe at his critics:

These questions have no easy answers. However, I think the decline effect is an important reminder that we shouldn’t simply reassure ourselves with platitudes about the rigors of replication or the inevitable corrections of peer review. Although we often pretend that experiments settle the truth for us–that we are mere passive observers, dutifully recording the facts–the reality of science is a lot messier. It is an intensely human process, shaped by all of our usual talents, tendencies, and flaws.

Actually, what Lehrer’s critics have been doing is anything but reassuring ourselves with platitudes about the rigors of replication. Indeed, all of us who bothered to write about Lehrer’s article spent considerable time pointing out how regression to the mean, publication bias, and a variety of other factors could explain much of the decline effect. We spent a lot of effort trying to explain how it is unsurprising that initial promising results often appear less so as more and more scientists investigate a question, developing along the way better techniques and approaches to investigating the question and approaching it from different angles. We spent a lot of verbiage describing how it is not at all surprising that new drugs, which seem to work so well in early clinical trials, appear to lose efficacy as their indication is broadened beyond the homogeneous initial small groups of subjects to more patients whose characteristics are less tightly controlled. Indeed, one of the letter writers pointed this very fact out to Lehrer, but he chose not to address this point directly.
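For readers who like to see this sort of thing rather than take my word for it, here is a minimal sketch of how publication bias plus regression to the mean can manufacture a "decline effect" all by themselves. Everything in it is an assumption made up for illustration: the true effect size of 0.2, the study sizes, the rule that early small studies get "published" only if they are positive and statistically significant, and the function names `run_study` and `simulate`. It is a toy model, not an analysis of any real literature.

```python
import random
import statistics


def run_study(true_effect, n, rng):
    """Simulate one two-arm study; return (estimated effect, standard error)."""
    treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.fmean(treated) - statistics.fmean(control)
    se = (statistics.variance(treated) / n + statistics.variance(control) / n) ** 0.5
    return diff, se


def simulate(true_effect=0.2, n_small=20, n_large=500,
             n_early=2000, n_later=200, seed=0):
    """Return (mean published early estimate, mean later estimate)."""
    rng = random.Random(seed)

    # Early literature: many small studies, but only positive results with
    # z > 1.96 are "published" (a hypothetical publication filter).
    early = []
    for _ in range(n_early):
        diff, se = run_study(true_effect, n_small, rng)
        if diff / se > 1.96:
            early.append(diff)

    # Later literature: larger studies, all reported regardless of outcome.
    later = [run_study(true_effect, n_large, rng)[0] for _ in range(n_later)]

    return statistics.fmean(early), statistics.fmean(later)


early_mean, later_mean = simulate()
print(f"mean published early estimate: {early_mean:.2f}")
print(f"mean later estimate:           {later_mean:.2f}")
```

Under these assumptions, the published early estimates come out well above the true effect, while the later, larger, unfiltered studies cluster around it. The effect itself never changed; only the filter on which results we got to see did.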

The decline effect is something any physician who does clinical research knows from experience (although he may not call it that) because he sees it so often. To Lehrer it seemed to be some sort of shocking revelation in clinical research. The expectation that randomized clinical trials can overestimate the efficacy of new drugs is the very reason why, after drugs are released, physicians sometimes carry out what are known as “pragmatic trials,” which are designed to find out how effective a treatment is in everyday, real-world practice, where the conditions are not nearly as controlled and the patient populations not nearly as homogeneous as they are in randomized clinical trials. Efficacy results determined in pragmatic trials are virtually always less robust than what was measured in the original randomized clinical trials. Not that any of this stops Lehrer from simply repeating the same stuff about big pharma having incentives to shape the results of its science and clinical trials. We get it; we get it. Science is done by humans, and sometimes human biases and motivations other than scientific discovery influence the humans who do science.

Finally, unfortunately, Lehrer strikes the same wrong notes as he did before when trying to answer criticisms that he’s giving aid and comfort to denialists. Here’s what he writes in response to just such a criticism:

One of the sad ironies of scientific denialism is that we tend to be skeptical of precisely the wrong kind of scientific claims. Natural selection and climate change have been verified in thousands of different ways by thousands of different scientists working in many different fields. (This doesn’t mean, of course, that such theories won’t change or get modified–the strength of science is that nothing is settled.) Instead of wasting public debate on solid theories, I wish we’d spend more time considering the value of second-generation antipsychotics or the verity of the latest gene-association study.

Here Lehrer demonstrates a profound misunderstanding of how science denialism works. Here’s a hint: The reason why such topics become the targets of scientific denialism is that the conclusions of science run up against very strong religious, political, or primal views. Evolution runs up against fundamentalist religion that cannot abide the concept that humans evolved from “lower” creatures. Those with political views that oppose government-mandated action to lower the emissions of greenhouse gases attack the science of anthropogenic global warming (AGW) because of its implications. Although the treatment of mental illness can certainly bring out the crazy, if you’ll excuse the possible insensitivity of the term (for example: Scientology), for most people there just isn’t the same level of intense ideological investment in the efficacy of second-generation antipsychotics as there is in whether or not our understanding of AGW is accurate or whether humans evolved from “lower” creatures. Ditto whether the latest gene association study is correct. Besides, the efficacy of second-generation antipsychotics and results of the latest gene association study are not yet settled science in anything like the way that evolution is. Scientists still debate them intensely, and, particularly for the latest gene association studies, they are not accepted as anything near settled science. Consequently, when the public hears about such studies, they usually don’t know what to think of them and promptly forget them.

More importantly, after discussing the decline effect and impugning the reliability of science, Lehrer still can’t seem to give a coherent explanation as to why AGW and evolution are such reliable, well-founded scientific theories compared to what he seems to perceive as the unreliability of the rest of science. Worse, he hasn’t addressed many of the more cogent criticisms of his work, in particular the numerous attempts to explain to him why it is not at all remarkable that second generation antipsychotics have not proven to be as effective as initial results suggested or why it is not particularly surprising or disturbing that fluctuating asymmetry never panned out. Lehrer had a great opportunity to explain why making scientific conclusions is so difficult and why all scientific knowledge is provisional. Those points are in his articles on the decline effect, but they’re buried in the surrounding implication that the decline effect is mysterious. Then in the last paragraph of his response to critics Lehrer has the chutzpah to declare that “there is nothing inherently mysterious about why the scientific process occasionally fails or the decline effect occurs.”

That’s what we’ve been trying to tell Lehrer since he wrote his first article on the decline effect, but he hasn’t been listening! Lehrer is, of course, correct when he quotes a scientist asserting that the decline effect can be studied by science; it’s just that he doesn’t seem to realize that how and why the scientific method fails have been subjects of research ever since there has been a scientific method.