Yes, Virginia, unlike CAM, science-based medicine does change based on science

“Alternative medicine,” so-called “complementary and alternative medicine” (CAM), or, as it’s become fashionable to call it, “integrative medicine” is a set of medical practices that are far more based on belief than science. As my good bud and collaborator Mark Crislip so pointedly reminded us last week, CAM is far more akin to religion than science-based medicine (SBM). However, as I’ve discussed more times than I can remember over the years, both here and at my not-so-super-secret-other blog, CAM practitioners and advocates, despite practicing what is in reality mostly pseudoscience-based medicine, crave the imprimatur that science can provide, the respect that science has. That is why, no matter how scientifically implausible the treatment, CAM practitioners try to tart it up with science. I say “tart it up” because they aren’t really providing a scientific basis for their favored quackery. In reality, what they are doing is choosing science-y words and using them as explanations without actually demonstrating that these words have anything to do with how their favored CAM works.

An even more fundamental difference between CAM and real medicine is that CAM practices are not rejected based on evidence. Basically, they never go away. Take homeopathy, for example. (Please!) It’s the ultimate chameleon. Even 160 years ago, it was obvious from a scientific point of view that homeopathy was nonsense and that diluting something doesn’t make it stronger. When it became undeniable that this was the case, through the power of actually knowing Avogadro’s number, homeopaths were undeterred. They concocted amazing explanations of how homeopathy “works” by claiming that water has “memory.” It supposedly “remembers” the substances with which it’s been in contact and transmits that “information” to the patient. No one’s ever been able to explain to me why transmitting the “information” from a supposed memory of water is better than the information from the real drug or substance itself, but that’s just my old, nasty, dogmatic, reductionist, scientific nature being old, nasty, dogmatic, reductionist, and scientific. Then, of course, there’s the term “quantum,” which has been so widely abused by Deepak Chopra, his acolytes, and the CAM community in general. Meanwhile, the new CAM buzzword these days to explain why quackery “works” is epigenetics. Basically, as I pointed out earlier this week, whenever a proponent of alternative medicine uses the word “epigenetics” or “quantum” to explain how an alternative medicine treatment “works,” what he really means is, “It’s magic.” This is a near-universal truth, and even the most superficial probing of such justifications will virtually always reveal magical thinking combined with an utter ignorance of the science of quantum mechanics or epigenetics.
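Just to drive home how extreme those dilutions are, a bit of back-of-the-envelope arithmetic helps. Here’s a minimal sketch in Python, under illustrative assumptions (a full mole of starting substance and standard homeopathic “C” potencies, where each C step is a 1:100 dilution):

```python
# Back-of-the-envelope: expected molecules of active ingredient
# remaining in a homeopathic preparation of a given "C" potency.
# Illustrative assumption: we start with a full mole of substance.
AVOGADRO = 6.022e23  # molecules per mole

def expected_molecules(moles_start: float, c_potency: int) -> float:
    """Each "C" step is a 1:100 dilution, so NC means a 10^(-2N) dilution."""
    dilution_factor = 100.0 ** c_potency
    return moles_start * AVOGADRO / dilution_factor

print(expected_molecules(1.0, 12))  # ~0.6 -- 12C is roughly the Avogadro limit
print(expected_molecules(1.0, 30))  # ~6e-37 -- effectively zero molecules
```

Past roughly 12C, the expected number of surviving molecules drops below one; a 30C remedy, a very common potency, is for all practical purposes pure water.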

So, yes, much of CAM is very much more like religion than science in that CAM is immune to evidence. True, the scientific “explanations” change, and CAM practices might evolve at the edges based on evidence, but the core principles remain. You don’t see, for example, homeopaths or naturopaths deciding that homeopathy doesn’t work because science and clinical trials overwhelmingly show that it is nonsense. You don’t see chiropractors leaving chiropractic in droves because they’ve come to the realization that subluxations don’t exist, that they can’t cure allergies, heart disease, gastrointestinal ailments (or anything else), and that they are in reality physical therapists with delusions of grandeur. Ditto reiki, acupuncture, therapeutic touch, and “energy healing.” These practices persist despite overwhelming evidence that they do not work and are based on magical thinking, not science. For all the scientific studies and clinical trials funded by NCCAM and other CAM-friendly organizations, CAM practitioners never take the next step from all the negative studies and conclude that they should stop using such modalities.

No one is saying that the record of SBM is perfect when it comes to changing nimbly with new evidence, and any imperfection in the record of SBM and evidence-based medicine (EBM) actually being, well, science- and evidence-based is a favorite target of CAM apologists. Hence the frequently circulating claim that only 15% of medicine is actually evidence-based. It’s a bogus claim, a myth, as Steve Novella has pointed out. In reality, studies appear to converge on estimates that approximately 80% of interventions are based on compelling evidence and that between 30% and 60%, depending on the specialty, are based on randomized clinical trials. That’s not good enough, but it’s far better than CAM apologists would lead you to believe, and it’s certainly far better than anything in CAM.

Nonetheless, it has been recognized for a long time that EBM/SBM is sometimes slow to change in response to new evidence. Indeed, there was an aphorism I heard while in medical school that outdated treatments and procedures don’t die off completely until the physicians who learned them during their residencies or fellowships die off. I’ve since learned that that’s not entirely true. There is, after all, a gap of around 20 years between the time a generation of physicians retires and the time it dies off, meaning such practices actually die off much sooner than the aphorism would suggest. I keed, I keed, of course, but the point is valid.

There is the opposite problem in EBM/SBM as well, namely a tendency towards a “bandwagon” effect wherein a new therapy is widely adopted before there is solid evidence of its superiority (or at least of its non-inferiority with alternative benefits). I’m a surgeon, so I know that, unfortunately, the surgical world is very much prone to this sort of problem. Surgeons tend to like shiny, pretty new toys and to do spiffy new procedures that prove that they are the biggest, baddest scalpel cowboys in all the land. These tendencies have led to a number of procedures becoming widely adopted before they were definitively shown to be superior. Laparoscopic cholecystectomy is the example that I like to use the most; it swept the surgical world over 20 years ago without compelling evidence for its safety. Later, it was found that the incidence of common bile duct injury was much higher after laparoscopic cholecystectomy than after conventional cholecystectomy. That incidence fell as more surgeons became more facile at the procedure, but it was years before there was compelling evidence that the laparoscopic approach was truly superior. History seems to be repeating itself today with robotic surgery. At the risk of offending some of my surgical colleagues, I’ve yet to see compelling evidence that doing, for example, a radical prostatectomy with the da Vinci robot is truly superior to doing it using what was the new way ten or fifteen years ago but is now the old way, laparoscopy. From my reading of the existing evidence, the da Vinci is as safe and effective as laparoscopy, but if it is sufficiently more so to justify its much greater cost, I haven’t seen the evidence yet. I sometimes joke that if it were possible to do breast surgery (my specialty) with the da Vinci, then I’d be all for it. Maybe I’ll have to look into that. I could be bigger than Armando Giuliano, and time’s wasting. I probably only have 15 or 20 years left in my career to make an international name for myself.

Yeah, that’s the ticket. I think I’ll sign myself up for a course in using the da Vinci robot and then figure out how to use it for breast surgery. I’m sure it’ll be a hit. So what if a lumpectomy and sentinel lymph node biopsy takes six hours instead of less than an hour and a half to do?

That is sarcasm, in case anyone’s thinking I’m serious.

But how often are medical practices found to be ineffective and abandoned? How much do we test existing practices in light of new data? There have been a number of studies looking at this issue, which is already a marked contrast to CAM, where ineffective practices are, as far as I can tell, never abandoned. The most recent of these caught my eye last week. Published in the Mayo Clinic Proceedings by a team from the National Cancer Institute, the University of Chicago (one of my alma maters!), Northwestern University, George Washington University, and Lankenau Medical Center and entitled A Decade of Reversal: An Analysis of 146 Contradicted Medical Practices, this study seeks to get a handle on the answer to that very question for these reasons:

We expect that new medical practices gain popularity over older standards of care on the basis of robust evidence indicating clinical superiority or noninferiority with alternative benefits (eg, easier administration and fewer adverse effects). The history of medicine, however, reveals numerous exceptions to this rule. Stenting for stable coronary artery disease was a multibillion dollar a year industry when it was found to be no better than medical management for most patients with stable coronary artery disease.1 Hormone therapy for postmenopausal women intended to improve cardiovascular outcomes was found to be worse than no intervention,2 and the routine use of the pulmonary artery catheter in patients in shock was found to be inferior to less invasive management strategies.3 Previously, we have called this phenomenon (when a medical practice is found to be inferior to some lesser or prior standard of care) a medical reversal.4, 5, 6 Medical reversals occur when new studies—better powered, controlled, or designed than their predecessors—contradict current practice.4 In a prior investigation of 1 year of publications in a high-impact journal, we found that of 35 studies testing standard of care, 16 (46%) constituted medical reversals.4 Another review of 45 highly cited studies that claimed some therapeutic benefit found that 7 (16%) were contradicted by subsequent research.7

Identifying medical practices that do not work is necessary. The continued use of such practices wastes resources, jeopardizes patient health, and undermines trust in medicine. Interest in this topic has grown in recent years. The American Board of Internal Medicine launched the Choosing Wisely campaign,8 a call on professional societies to identify the top 5 diagnostic or therapeutic practices in their field that should not be offered.9 In England, the National Institute for Health and Clinical Excellence has tried to “disinvest” from low-value practices, identifying more than 800 such practices in the past decade.10 Other researchers have found that scanning a range of existing health care databases can easily generate more than 150 low-value practices.11 Medical journals have specifically focused on instances in which more health care is not necessarily better. The Archives of Internal Medicine created a new feature series in 2010 entitled “Less is More.”12

One can’t help but note right from the introduction of this paper that SBM/EBM does continually reevaluate its practices and treatments, testing which ones work and which ones do not and comparing current practice against new treatments. Granted, the intensity of this effort seems to be a more recent development, with the implementation of the Patient Protection and Affordable Care Act, but is it really? This article suggests that the answer is: perhaps not.

The authors specifically examine the question of how much of the medical literature consists of what they refer to as “medical reversals,” as described above. Specifically, they tried to estimate what percentage of the medical literature consists of articles that question current medical practice, particularly articles presenting high-quality evidence suggesting that current practice needs to be changed or that a standard-of-care intervention doesn’t work, doesn’t work as well as a non-standard-of-care intervention, or is actually harmful. As for how the authors did this, I find it easier to let them describe it themselves:

Two reviewers (C.T., A.V., M.C., J.R., S.Q., S.J.C., D.B., V.G., or S.S.) and V.P. read articles addressing a medical practice in full. On the basis of the abstract, introduction, and discussion, articles were classified as to whether the practice in question was new or existing. Methods were classified as one of the following: randomized controlled trial, prospective controlled (but nonrandomized) intervention study, observational study (prospective or retrospective), case-control study, or other methods. End points for articles were classified into those that reached positive conclusions and those that found negative or no difference in end points. Lastly, articles were given 1 of 4 designations. Replacement was defined as a new practice surpassing an older standard of care. Back to the drawing board was defined as a new practice failing to surpass an older standard. Reversal was designated when a current medical practice was found to be inferior to a lesser or prior standard. Reaffirmation was defined as an existing medical practice being found to be superior to a lesser or prior standard. Finally, articles in which no firm conclusion could be reached were termed inconclusive. The designation of an article was also performed in duplicate. When there were differences in opinion between the 2 reviewers, adjudication first involved discussion between the 2 readers to see whether agreement could be reached. If disagreement persisted, a third reviewer (A.C.) adjudicated the discrepancy. Less than 3% of articles required discussion, and less than 1% required adjudication. A table detailing each medical reversal was constructed (Supplemental Appendix; available online at http://www.mayoclinicproceedings.com), and the third reviewer (A.C.) reviewed all reversals.
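In other words, each article was effectively run through a simple decision table. Here’s a minimal sketch of that logic as I read it from the quoted methods; the function and argument names are my own invention, not the authors’:

```python
from typing import Optional

# A sketch of the designation logic described in the quoted methods.
# "None" for the finding stands in for studies with no firm conclusion.
def classify(practice_is_new: bool, finding_positive: Optional[bool]) -> str:
    if finding_positive is None:
        return "inconclusive"
    if practice_is_new:
        # A new practice tested against an older standard of care.
        return "replacement" if finding_positive else "back to the drawing board"
    # An existing (standard-of-care) practice retested.
    return "reaffirmation" if finding_positive else "reversal"

print(classify(practice_is_new=False, finding_positive=False))  # "reversal"
```

The cell in that table that matters most for this post is, of course, “reversal.”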

So what did the investigators (Prasad et al) find? They examined ten years’ worth of NEJM original reports, from 2001 through 2010, for a total of 2,044 original articles. Of these, 1,344 (65.8%) addressed a medical practice, of which 911 (68%) were randomized controlled trials, 220 (16%) were prospective controlled but non-randomized studies, 117 (9%) were observational studies, 43 (3%) were case-control studies, and 53 (4%) used other methods. Of these 1,344 reports, 981 (73%) studied a new medical practice, while 363 (27%) addressed an existing practice. Overall, 756 articles (56%) found that a new practice surpassed the existing standard of care at the time (replacement), while 165 (12%) failed to find that a new practice was better than existing practices. In terms of what we’re really interested in, of the 363 studies examining an existing practice, 146 studies (40%) were reversals, while 138 (38%) upheld standard practices. Here’s a breakdown from the article for your edification:

[Figure 1 from the article: breakdown of how the articles were classified]
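Since several of these percentages get reused below, here’s a quick sanity check of the arithmetic, with the raw counts copied from the numbers reported above (a convenience sketch, not anything from the paper itself):

```python
# Raw counts as reported in Prasad et al (NEJM original reports, 2001-2010).
total_articles     = 2044
addressed_practice = 1344  # articles addressing a medical practice
new_practice       = 981   # tested a new practice
existing_practice  = 363   # tested an existing practice
replacement        = 756   # new practice beat the old standard
back_to_drawing    = 165   # new practice failed to beat it
reversal           = 146   # existing practice found inferior
reaffirmation      = 138   # existing practice held up

def pct(part: int, whole: int) -> float:
    return round(100 * part / whole, 1)

print(pct(addressed_practice, total_articles))  # 65.8
print(pct(reversal, existing_practice))         # 40.2
print(pct(reaffirmation, existing_practice))    # 38.0
# The comparison cited a few paragraphs below:
print(pct(replacement, new_practice))           # 77.1 -- new practices found beneficial
print(pct(back_to_drawing, new_practice))       # 16.8 -- i.e., ~17%
```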

Of the reversal articles, not surprisingly, most (76%) turned out to be randomized clinical trials, and, interestingly, the percentage of each type of trial didn’t change much over the decade-long study period:

[Figure 2 from the article: types of studies among the reversal articles over the study period]

The one problem I had with this study was that it only looked at one journal: The New England Journal of Medicine. I can understand why the authors might have chosen that particular journal. It’s very high impact and, with the exception of a recent distressing tendency to let some low-quality CAM articles slip in, one of the more rigorous medical journals out there that isn’t a specialty journal; i.e., it accepts articles covering all areas of medicine. It’s not a basic science journal; it generally only publishes original studies that are clinical trials, epidemiological studies, or at the very least highly translational work. It also, from my reading, only rarely publishes really preliminary clinical work, such as phase I clinical trials. On the other hand, one has to wonder whether the results would be generalizable to the rest of the medical literature.

For example, according to this study, articles in the NEJM that tested new practices were far more likely to find them beneficial than articles that tested existing ones (77.1% vs 38.0%), while articles that tested existing standard-of-care practices were far more likely to find those practices ineffective than articles testing new practices (40.2% vs 17.0%). Looking at such numbers, I can’t help but wonder if there is a publication bias for finding new therapies effective and/or for finding existing therapies either ineffective or harmful, particularly in the NEJM, which is among the highest of high-impact medical journals. Think about it. Who thinks that their findings are substantial enough and interesting enough to be seriously considered for publication in the NEJM? It’s investigators who have found that some new therapy works for a common or very serious disease, but it wouldn’t surprise me if it’s also authors who have found compelling evidence that a commonly used existing standard of care is either not effective or is even dangerous.

It’s also informative to look at some of the medical practices that were the subject of reversal articles. For instance, it was thought that certain vaccinations could increase the risk of relapse in multiple sclerosis, but two studies showed no increased risk. One looked at tetanus, hepatitis B, and influenza vaccination; the other at hepatitis B vaccination. Another showed that delayed placement of drainage tubes for persistent effusion in otitis media did not result in worse outcomes than immediate placement, resulting in a change in practice. Another key reversal came in the form of a 2003 study showing that high-dose chemotherapy followed by bone marrow transplantation did not improve survival in advanced breast cancer. This was a huge one, and almost immediately oncologists stopped doing bone marrow transplants for breast cancer. Yet another showed that the use of pulmonary artery catheters in acute lung injury didn’t improve outcomes and was associated with more complications. (When I was a resident in the 1990s, all of these patients got pulmonary artery catheters.) A couple of these I’ve written about before, such as vertebroplasty. More recently, there was a study that showed no benefit to routine PSA screening for prostate cancer in American men.

Indeed, I can’t help but mention here that the whole reevaluation of routine screening for cancer, such as PSA screening for prostate cancer and mammography for breast cancer, topics I’ve written about numerous times for this blog, is an example of exactly that: SBM/EBM evaluating current practices in light of new data and determining whether they should be changed or abandoned. Routine PSA screening for men at average risk of prostate cancer has more or less been abandoned, for example, while current mammography practices are being questioned as promoting too much overdiagnosis and will likely evolve in response.

Perhaps the most prominent example of the efforts EBM/SBM makes to continually reevaluate its practices is the Choosing Wisely initiative. Scott Gavura brought it up last year, and I’ve discussed it in depth myself. It’s an impressive undertaking, in which major medical societies have made a concerted effort to identify the top five “low value” tests members of their specialties routinely use and then to try to get doctors to stop doing them. You will never—never—see CAM doing such a thing, mainly because the CAM practices that have value are in reality “rebranded” SBM, such as nutrition and exercise, and the practices that are really “alternative” are virtually universally “low value.” Actually, they’re of no value, most being based on long-disproven prescientific notions of disease.

One reason why EBM/SBM is slower than we might like to eliminate outdated and ineffective practices is simple. It’s not easy. Evidence from science, epidemiology, and clinical trials takes a long time to come in. It’s often very messy. When a practice comes into question, there will often be conflicting evidence, and it often takes a number of studies before conclusions about the practice firm up to the point where they are incorporated into evidence-based guidelines and become standard of care.

Often, practices that are later reversed come into usage based on premature and inadequate evidence. Small trials look promising, and physicians start using a treatment based on them. Sometimes such practices become standard based on short-term outcome measures, and when long-term data become available, previously unsuspected harms become apparent. Sometimes the culprit is excessive confidence in the proposed mechanism used to explain why the treatment should work. What is needed, according to Prasad et al (and I agree), is more rigor:

As such, we favor policies that minimize reversal. Nearly all such measures involve raising the bar for the approval of new therapies6, 83, 84 and asking for evidence before the widespread adoption of novel techniques. In all but the rarest cases,82 large, robust, pragmatic randomized trials measuring hard end points (with sham controls for studies of subjective end points) should be required before approval or acceptance. Our position is in contrast to efforts to lower standards for device and drug approval,85 which further erodes the value of the regulatory process.

One can’t help but note that this is in marked contrast to CAM studies, in which CAM advocates ask us to accept much less rigorous types of evidence for their modalities. As Steve Novella has frequently pointed out, as rigorous randomized clinical trials show that most CAM interventions are no better than placebo, the refrain we frequently hear is that we should look at “pragmatic” trials. In this context, pragmatic doesn’t mean the same thing. What Prasad et al are referring to are randomized trials that reflect real-world practices. What CAM advocates mean by “pragmatic” trials in the context of, say, acupuncture is observational trials of how the treatment is used in the real world. As I’ve said many times, this is putting the cart before the horse. Normally, pragmatic trials are done for treatments that have already been shown to be efficacious in randomized clinical trials. They can’t show efficacy by themselves. They are designed to test how treatments already shown to be efficacious in randomized trials function once let “out into the wild” (i.e., the real world). Frequently, outside the rarefied, rigorous world of randomized clinical trials, treatments are less effective.

It should also be pointed out that just because a treatment was “reversed” in a clinical trial doesn’t necessarily mean that the older practice was wrong. As Prasad et al put it:

The reversals we have identified by no means represent the final word for any of these practices. Simply because newer, larger, better controlled or designed studies contradict standard of care does not necessarily mean that older practices are wrong and new ones are right. On average, however, better designed, controlled, and powered studies reach more valid conclusions.94 Nevertheless, the reversals we have identified at the very least call these practices into question. Some practices ought to be abandoned, whereas others warrant retesting in more powerful investigations. One of the greatest virtues of medical research is our continual quest to reassess it.

So, yes, “conventional” medicine doesn’t always get it right. Occasionally it gets it wrong, on rare occasions spectacularly wrong. But unlike most CAM modalities, EBM/SBM is self-correcting. It actually does abandon treatments that don’t work. The process might be messy and ugly at times, but it does happen. For example, many years ago, angina pectoris was sometimes treated with a surgical procedure known as internal mammary artery ligation. The idea was that tying off these arteries would divert more blood to the heart. The operation became popular on the basis of relatively small, uncontrolled case series. Then two randomized, sham surgery-controlled clinical trials were published in 1959 and 1960. Both of these trials showed no difference between bilateral internal mammary artery ligation and sham surgery. Very rapidly, surgeons stopped doing this operation. A similar example is one I mentioned above: bone marrow transplantation for advanced breast cancer, which was likewise rapidly abandoned after randomized clinical trials showed it to be no better than the previous standard of care. I’m not saying that this happened without conflict or disagreement; proponents of these therapies can always find reasons to discount the clinical trial evidence. But in the end, evidence and science do eventually win out.

Now compare this to CAM practices. Can anyone name a CAM treatment that was abandoned by CAM practitioners as a result of research and randomized clinical trials showing that it doesn’t work? A single one? I can’t, but I don’t claim comprehensive knowledge; so if anyone can answer my question, please do.

In the meantime, the abandonment of therapies based on science and evidence showing that they don’t work, or that they work far less well than previously thought, remains the key difference between CAM and EBM/SBM. The day that I see a CAM practice go extinct, like bilateral internal mammary artery ligation for angina pectoris, is the day that I might start to take CAM practitioners’ claims that they are science-based seriously. I doubt that I will see such a thing happen in my lifetime. I doubt it will happen in the lifetime of the current generation of medical students. In fact, I doubt that it will ever happen, because CAM is based far more on belief than science.