Quoth Dean Ornish and NCCAM: Your randomized clinical trials can’t study my CAM!

Practitioners of “complementary and alternative medicine” (CAM) have a love-hate relationship with randomized clinical trials (RCTs). Actually, it’s mostly hate, but they do crave the validation that only randomized clinical trials can provide within the paradigm of evidence-based medicine (EBM). Yes, I intentionally said EBM, rather than science-based medicine (SBM), because, as I’ve described so many times before, the two are not the same thing. EBM fetishizes clinical trials, a fixation that I sometimes call “methodolatry,” defined by a blog bud of mine from long ago as the “profane worship of the randomized clinical trial as the only valid method of investigation.” Of course, as I’ve also explained many times, the reason SBM is needed is that EBM mostly ignores prior plausibility and basic science in favor of clinical trials. The EBM “pyramid” of evidence places basic science at the very bottom and, as one climbs, progresses through case reports, case series, case-control studies, and cohort studies to RCTs and, finally, systematic reviews and meta-analyses. If you look at a typical EBM pyramid, you’ll see that animal research and in vitro research are ranked below even “ideas, editorials, opinions.” That’s how CAM research infiltrated EBM. Modalities that can be dismissed as highly improbable or even impossible on basic science considerations alone, such as homeopathy or reiki, are not considered “impossible” for purposes of EBM because basic science is so low on the EBM pyramid. SBM is intended to correct this oversight.

The reason for this oversight is an implicit assumption underlying EBM: that RCTs are not performed on a new treatment until it has “climbed the pyramid,” going through stages of preclinical basic science that supports it, then through less rigorous preliminary study designs, and finally to RCTs, before it’s considered validated and can become a standard of care. In other words, none of the shortcomings of EBM should be construed as meaning that RCTs don’t remain at the apex of clinical evidence for a treatment or drug. They do. It’s just that for modalities with very low prior plausibility, the “noise” of RCTs can lead to a lot of false positives. As I’ve said before, it’s not plausibility bias that leads me to these conclusions. It’s reality bias. That’s the reason why CAM advocates try very, very hard to do what I like to call the CAM “appeal to other ways of knowing.” Specifically, that’s why they try to discount the value of RCTs and argue that we should use other methods to prove that their woo works. I’ve seen it time and time again.
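To make the false-positive problem concrete, here’s a minimal back-of-the-envelope sketch in Python. The priors, power, and alpha are illustrative numbers of my own choosing, not measurements of anything, but they show why a “statistically significant” RCT of a highly implausible modality tells you far less than the same result for a plausible one.

```python
# Back-of-the-envelope Bayesian sketch: what a "positive" RCT (p < 0.05)
# means at different levels of prior plausibility. Numbers are illustrative.

def prob_treatment_works(prior, power=0.8, alpha=0.05):
    """Posterior probability the treatment is real, given one 'positive' trial.

    prior -- prior probability that the treatment works at all
    power -- probability a real effect yields a positive trial
    alpha -- probability a null effect yields a (false) positive trial
    """
    true_pos = prior * power
    false_pos = (1 - prior) * alpha
    return true_pos / (true_pos + false_pos)

# A plausible drug (say, a 50% prior) vs. homeopathy-level prior plausibility.
for prior in (0.5, 0.1, 0.001):
    print(f"prior = {prior:6.3f} -> posterior after one positive RCT = "
          f"{prob_treatment_works(prior):.3f}")
```

With these assumed numbers, a positive trial of a 50%-plausible treatment leaves you around 94% confident it works, while the same positive trial of a one-in-a-thousand-plausible treatment leaves you under 2% confident: the “positive” result is overwhelmingly likely to be noise.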

And I’ve seen it again just this week, not once but twice.

The first example comes from that citadel of quackademic medicine, that blight on the National Institutes of Health (NIH). I’m referring, of course, to the National Center for Complementary and Alternative Medicine (NCCAM). It’s right there on the director’s blog, where Dr. Josephine Briggs speculates on When to Get Pragmatic. It is, unsurprisingly, an argument for applying pragmatic trials to CAM interventions, and it’s full of the same fallacies CAM advocates use to argue for “other ways of knowing” about CAM that don’t involve rigorous RCTs. I normally expect better from Dr. Briggs; indeed, I’ve felt a bit sorry for her. She’s an actual scientist overseeing a Center that by its very nature can never be scientific, which is why I sympathized with her effort in the NCCAM five-year plan, which I referred to as “let’s do some real science for a change.” This argument for pragmatic trials, however, is just sad:

Many complementary approaches are readily available in the marketplace. As a consequence, NCCAM sits at the crossroads between research and real-world consumer use. The general public wants to know what works and what doesn’t, and increasingly health care providers also want reliable information. Complementary health approaches are being integrated into the care offered in many nursing homes, hospices, and hospitals, and these health care organizations want good information to drive decisions about which therapies to provide or recommend. NCCAM wants to take on the challenge of meeting this need, but by and large we do not have the kind of rigorous, high-quality data that would help answer these questions.

None of this is a valid argument for decreasing the rigor of clinical trials. Rather, it is an argument either for doing the rigorous clinical trials that would be necessary to “answer these questions,” as Dr. Briggs put it, or for filtering what we already know through the Bayesian lens of prior plausibility based on scientific knowledge. But that’s not what NCCAM is about, its protestations to the contrary notwithstanding. It thinks that that’s what it’s about, and it even walks the walk when it comes to pure methodology. What the leadership of NCCAM can’t seem to understand is that, when it comes to hardcore CAM modalities, what it is doing is “tooth fairy science,” as Harriet Hall calls it. As she points out, we can study the amount of money left by the Tooth Fairy in different settings, but since we haven’t established that there really is a Tooth Fairy, whatever effects we find will be falsely attributed to an imaginary being rather than to the real cause (parental behavior). As for other modalities that have fallen under the rubric of CAM, such as dietary interventions, lifestyle, and exercise, these are nothing more than science-based modalities magically “rebranded” as CAM.

So what are “pragmatic” trials? They’re basically an attempt to determine whether treatments validated in RCTs work under “real world” conditions. RCTs are intentionally designed to make the population studied as homogeneous as possible, both to minimize differences between the control and experimental groups and to decrease variability within groups, the better to isolate the signal due to the treatment itself. However, once a treatment gets out into the community, it becomes more widely used, and the rigid inclusion and exclusion criteria used to select subjects for clinical trials fly out the window. The patients on whom the treatment is used become much less homogeneous, and differences between academic medical centers and the community can change how the treatment is delivered. So “pragmatic” trials seek to determine effectiveness in the real world, which is a different thing from the efficacy determined in the rarefied, tightly controlled world of RCTs. Here’s the problem: pragmatic trials in CAM put the cart before the horse. You need to demonstrate efficacy in RCTs before it’s appropriate to consider doing pragmatic trials to determine real-world effectiveness.

None of this stops Dr. Briggs from writing:

Two of the Collaboratory studies will address pain management. The LIRE study, a partnership with the National Institute of Arthritis and Musculoskeletal and Skin Diseases and a number of Health Maintenance Organizations, looks at the impact of more detailed radiology reports for back pain imaging studies on subsequent use of resources. The second partnership, a study called PPACT, which involves a number of Kaiser health systems, with oversight from the National Institute on Drug Abuse and the National Institute of Neurological Disorders and Stroke, will examine the impact of an integrated pain management strategy implemented in primary care practices.

Notice that neither of these studies involves CAM. The LIRE study looks at whether more detailed radiology reports for back pain imaging are useful for guiding subsequent management, and PPACT examines an integrated pain management strategy in primary care practices. It’s yet another example of how CAM appropriates and “rebrands” science-based therapies as somehow being “alternative.” These trials are not inappropriate, but no doubt NCCAM will be funding trials that are, such as “pragmatic” trials of acupuncture and various other forms of placebo medicine.

As bad as what Dr. Briggs argues is, at least she still respects the importance of RCTs and clinical trials, whether they be tightly controlled efficacy trials or “pragmatic” effectiveness trials. Dean Ornish, on the other hand, was asked what scientific idea is ready for retirement. Guess what scientific idea he picked? You guessed it: The randomized clinical trial:

It is a commonly held but erroneous belief that a larger study is always more rigorous or definitive than a smaller one, and a randomized controlled trial is always the gold standard. However, there is a growing awareness that size does not always matter and a randomized controlled trial may introduce its own biases. We need more creative experimental designs.

None of this is anything that clinical trialists don’t already know and recognize. Does Ornish think that clinical trialists are unaware of the perils and pitfalls of RCTs? Seriously? So contemptuous is he of clinical trialists that he presumes to lecture on what, exactly, statistical significance in a clinical trial means and what a p-value less than 0.05 means. Thanks a lot, Dean. I never would have figured that out myself. Seriously. Thanks for that, and for pointing out that variables are not necessarily independent and that their lack of independence can, if not adjusted for, produce confounding. Clinical trialists never would have thought of that! Never mind that there are all sorts of methodologies designed to prevent such biases and to account for the ones that can’t be eliminated.
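Since we’re on the subject, here’s a minimal toy simulation in Python of the kind of confounding Ornish describes. The variable names and probabilities are invented for illustration only; the point is simply that when a do-nothing “treatment” travels with a lifestyle factor that does matter, the naive comparison shows a spurious effect, while stratifying on the confounder (or randomizing, which breaks the association in the first place) makes it vanish. This is exactly the sort of thing trial methodology is built to handle.

```python
# Toy confounding simulation: the "supplement" does nothing, but the hidden
# lifestyle variable correlated with taking it does. Illustrative only.
import random

random.seed(0)
rows = []
for _ in range(100_000):
    health_conscious = random.random() < 0.5                       # hidden confounder
    takes_supplement = random.random() < (0.7 if health_conscious else 0.3)
    # Outcome depends only on the confounder, not on the supplement.
    good_outcome = random.random() < (0.6 if health_conscious else 0.4)
    rows.append((takes_supplement, health_conscious, good_outcome))

def outcome_rate(subset):
    return sum(r[2] for r in subset) / len(subset)

users    = [r for r in rows if r[0]]
nonusers = [r for r in rows if not r[0]]
# Naive comparison: a spurious "benefit" of roughly 8 percentage points.
print("Unadjusted difference:", round(outcome_rate(users) - outcome_rate(nonusers), 3))

# Crude adjustment: stratify on the confounder; the "effect" disappears.
for hc in (True, False):
    u  = [r for r in rows if r[0] and r[1] == hc]
    nu = [r for r in rows if not r[0] and r[1] == hc]
    print(f"Within health_conscious={hc}:",
          round(outcome_rate(u) - outcome_rate(nu), 3))
```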

It doesn’t take Ornish long to reveal what his real agenda is, though. Basically, what he is saying is nothing more than the old, tried-and-not-so-true trope of CAM advocates everywhere that “your randomized trials are not up to proving that my woo works.” Or, as I like to put it, I reject your reality and substitute my own because I know my woo works. RCTs can’t persuade me otherwise. I’ll show you what I mean:

For example, a RCT may be designed to determine if dietary changes may prevent heart disease and cancer. Investigators identify patients who meet certain selection criteria, e.g., that they have heart disease. When they meet with prospective study participants, investigators describe the study in great detail and ask, “If you are randomly-assigned to the experimental group, would you be willing to change your lifestyle?” In order to be eligible for the study, the patient needs to answer, “Yes.”

However, if that patient is subsequently randomly-assigned to the control group, it is likely that this patient may begin to make lifestyle changes on their own, since they have already been told in detail what these lifestyle changes are. If they’re studying a new drug that only is available to the experimental group, then it is less of an issue. But in the case of behavioral interventions, those who are randomly-assigned to the control group are likely to make at least some of these changes because they believe that the investigators must think that these lifestyle changes are worth doing or they wouldn’t be studying them.

Or, they may be disappointed that they were randomly-assigned to the control group, and so they are more likely to drop out of the study, creating selection bias.

Thank you, Dr. Ornish. Thank you so much. None of us poor, benighted clinical investigators would ever have figured out that these problems with RCTs exist were it not for your fantastic wisdom in pointing them out to us! OK, OK, I know I’m being snarky. Ornish is writing for a general audience, but that doesn’t excuse him from making it sound as though these problems are insurmountable defects in RCTs that doom them to extinction as no longer useful in this brave new genomic world.

In all seriousness, though, does Dr. Ornish really think that clinical trialists are unaware of these particular pitfalls of clinical trials of dietary or lifestyle interventions? Does he seriously argue that they haven’t developed methods to compensate for them? To correct for them? One might think that Ornish thinks that clinical trialists have never thought of these issues or considered these problems. Or maybe—just maybe—he thinks that clinical trialists have never come up with methodology that could correct for these biases, shortcomings, and difficulties in making sure there are adequate controls. True, RCTs are not perfect, but no one is more aware of that than the people who actually design them and carry them out. Particularly hilarious is the part where Ornish points out that small trials “may be more likely to show significant differences between groups than a large one.” Well, duh. Double “Well, duh!” when it comes to Ornish’s conclusion:

We need new, more thoughtful experimental designs and systems approaches that take into account these issues. Also, new genomic insights will make it possible to better understand individual variations to treatment rather than hoping that this variability will be “averaged out” by randomly-assigning patients.

One wonders if Ornish means something like the I-SPY 2 trial, in which the trial design adapts based on accumulating results. Would that do it? Or would any of the many other innovations in RCT design developed over the last several years be enough for Dr. Ornish?

One study that Ornish holds up as an example of the “failure” of clinical trials is the Women’s Health Initiative. He’s unhappy because the study reported that the dietary interventions tested (a reduction in dietary fat and an increase in the amount of fruits, vegetables, and whole grains in the diet) didn’t protect against heart disease or cancer:

However, the experimental group participants did not reduce their dietary fat as recommended—over 29 percent of their diet was comprised of fat, not the study’s goal of less than 20 percent. Also, they did not increase their consumption of fruits and vegetables very much. In contrast, the control group reduced its consumption of fat almost as much and increased its consumption of fruits and vegetables, diluting the between-group differences to the point that they were not statistically significant. The investigators reported that these dietary changes did not protect against heart disease or cancer when the hypothesis was not really tested.

Dr. Ornish overstates his case a bit. Here are the three papers (1, 2, 3) reporting the results Dr. Ornish cites. The results are actually nicely summarized in this NIH press release. While it’s true that the women didn’t reach the goal of getting less than 20% of their calories from fat, it’s a bit of an exaggeration to say that the control group decreased its intake of total fat almost as much. For instance, at year one the experimental group reduced its percentage of calories from fat from 37.8% to 24.3%, and the difference between the groups was still 8.1 percentage points at six years. That’s not perfect, but it’s pretty good. What it indicates to me more than anything else is that dietary interventions are hard. Very hard. It’s very difficult to change one’s diet over the long term. I think what Ornish was actually complaining about, although he wasn’t specific, is that the differences in intake of specific kinds of fat were much smaller; for example, the decrease in saturated fat intake was only 2.9 percentage points. So, while Ornish has a point to some extent, he uses that point to drive straight off the cliff.

He also neglects to mention some rather important findings. For example, there was a trend toward a decrease in breast cancer risk; the 9% difference reported just didn’t achieve statistical significance at the follow-up available (8.1 years on average). Given that trend, and in line with other studies, it’s quite possible that the decrease in breast cancer risk would be statistically significant if the results were reported now. Cancer takes decades to develop; it’s unreasonable to expect the decreased cancer risk from a dietary intervention to become apparent in much less time. Ditto the risk of colorectal cancer, which was also decreased, although the difference hadn’t achieved statistical significance as of 2006. In fact, this brings up an unspoken assumption on Ornish’s part, which is embodied in this paragraph of his:

Paradoxically, a small study may be more likely to show significant differences between groups than a large one. The Women’s Health Initiative study cost almost a billion dollars yet did not adequately test the hypotheses. A smaller study provides more resources per patient to enhance adherence at lower cost.

Did you figure out what the unspoken assumption is? It’s this: that the effect of dietary and lifestyle interventions will be so large that it will be detectable in much smaller studies in which the dietary interventions are more rigorously enforced. There’s a reason why the Women’s Health Initiative dietary intervention study involved nearly 50,000 women: the differences were not expected to be dramatic, and they weren’t. Even if future reports show a statistically (and clinically) significant difference, as of eight years of follow-up they hadn’t. Doing smaller studies, as Ornish advocates, would be more likely to result in a bunch of equivocal or negative studies because they wouldn’t have the statistical power to detect the differences. Maybe that’s what Ornish wants.
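To get a feel for the numbers, here’s a minimal sketch in Python of a standard sample-size calculation for comparing two proportions. The baseline risk and relative reduction are illustrative assumptions of mine, not the WHI’s actual design inputs, but they show why detecting an effect of roughly this size requires enrolling tens of thousands of women per arm rather than the few hundred a “smaller study” could afford.

```python
# Rough power calculation: how many participants per arm does it take to
# detect a ~9% relative reduction in an outcome with a baseline risk of a
# few percent? Illustrative numbers only, not the WHI's actual parameters.
import math

def n_per_arm(p_control, rel_reduction, z_alpha=1.96, z_beta=0.84):
    """Approximate per-arm sample size for comparing two proportions
    (normal approximation, two-sided alpha = 0.05, power = 0.80)."""
    p_treat = p_control * (1 - rel_reduction)
    var = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * var / (p_control - p_treat) ** 2)

# e.g. ~4% cumulative risk over follow-up, 9% relative reduction:
print(n_per_arm(0.04, 0.09))   # on the order of tens of thousands per arm
```

Run with these assumed inputs, the calculation lands in the tens of thousands of participants per arm; shrink the trial by an order of magnitude and the expected result is an underpowered, equivocal study, no matter how strictly adherence is enforced.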

Of course, Ornish is no different from CAM advocates in that he’s unhappy that the clinical trials being done on “alternative” interventions have been so unremarkable in their results, with either no effect above placebo for the hardcore alternative interventions or, at best, modest effects for science-based modalities that have been “rebranded” as CAM, such as diet and lifestyle interventions. RCTs are not showing what CAM advocates had hoped for their favorite CAM or rebranded CAM, so they resort to special pleading, declaring that RCTs are not up to studying their woo or, as Ornish declares, that the very concept of the large RCT needs to be “retired.” It’s quite a convenient argument when RCTs don’t show what you want them to show about your favorite woo.