Over the years, the criticism of “evidence-based medicine” (EBM) that I have repeated here, and that I and others have repeated at my not-so-super-secret other blog, is that its levels of evidence relegate basic science considerations to the lowest level of evidence and elevate randomized clinical trial evidence to the highest rung, in essence fetishizing it above all else, a form of thinking that I like to call methodolatry. Now, when EBM works correctly, this is not an entirely unreasonable way to look at things. After all, we just want to know what works in patients. When EBM is working properly, its underlying assumption is that treatments don’t reach the level of double-blind randomized clinical trials (RCTs) without first having gone through several steps, beginning with basic science considerations, progressing to early-stage clinical trials, and finally reaching the stage of large RCTs. In other words, preclinical studies (basic biochemistry and animal studies) produce the biological plausibility that justifies testing a new drug or treatment in clinical trials.
Another important point is that basic science alone can’t demonstrate efficacy. However, it can show that a proposed treatment is so implausible, based on its purported mechanism of action, as to be utterly not worth testing in RCTs, particularly given that clinical equipoise is essential in clinical trials. Think of it this way. For homeopathy to be a valid treatment, our understanding of physics and chemistry would have to be not just wrong, but spectacularly wrong. While it is theoretically possible for scientists to be so wrong about such fundamental physical laws and theories, what’s more likely: that scientists are completely wrong about laws and theories that rest upon a very solid evidence base, or that homeopathy is bunk? The same thing can be said of mystical modalities like “energy healing.” For example, no one has ever demonstrated the existence of the “life energy” that is redirected to heal or the “universal source” that supposedly provides the energy that reiki masters claim to be able to use to heal. In a nutshell, basic science can tell us that it is, for all practical purposes, impossible for a treatment to work. It can tell us that a treatment doesn’t work or can’t work, but it can’t by itself tell us that a treatment works. Thus has the phenomenon of quackademic medicine entered medical academia through the blind spot in EBM, namely its assumption that a treatment won’t reach the stage of RCTs without first having “proven its plausibility” through preclinical basic science investigation. In a sense, EBM was blindsided by “complementary and alternative medicine” (CAM), which is why I’ve supported the concept of science-based medicine (SBM), which takes the prior plausibility of proposed treatments into account.
So it was that I took a lot of interest in an article by the director of a not-infrequent topic of this blog, namely the misbegotten NIH center known as the National Center for Complementary and Alternative Medicine (NCCAM), a branch of the NIH that studies magic. Ever since I first became aware of its existence, I’ve kept an eye on the NCCAM research blog, where Josephine Briggs, MD, the director of NCCAM, occasionally posts. This time around, Dr. Briggs tackles the plausibility issue head-on and comes out of it the worse for wear, so poor are her arguments, in a post entitled Bayes’ Rule and Being Ready To Change Our Minds. Regular readers might remember that applying Bayesian analysis to an RCT involves assigning a prior probability that the hypothesis being tested is true and using that estimate to weight the interpretation of the trial’s statistics. Basically, the lower the prior probability, the less likely a “positive” trial (with the classic p-value less than 0.05) is to represent a “true” positive.
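To make that concrete, here is a minimal back-of-the-envelope sketch in Python. The numbers are assumptions for illustration only: the conventional alpha of 0.05, 80% power, and a few hypothetical priors, with 1e-6 standing in for homeopathy-grade plausibility.

```python
# A sketch (illustrative numbers only) of how prior plausibility determines
# what a "positive" trial is actually worth.
def prob_true_positive(prior, alpha=0.05, power=0.80):
    """P(treatment really works | trial came out 'positive').

    prior: prior probability that the treatment works
    alpha: false-positive rate (the classic p < 0.05 threshold)
    power: probability the trial detects a real effect if one exists
    """
    true_pos = power * prior
    false_pos = alpha * (1 - prior)
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01, 1e-6):
    print(f"prior {prior:>8}: P(real effect | positive trial) = "
          f"{prob_true_positive(prior):.5f}")
# prior      0.5: ~0.94
# prior      0.1: ~0.64
# prior     0.01: ~0.14
# prior    1e-06: ~0.000016
```

Under these assumptions, even a treatment with a 1% prior probability of working produces “statistically significant” results that are false positives about 86% of the time, and at near-zero priors a positive trial is essentially guaranteed to be noise.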
Where Bayesian considerations are most useful in the discussion of CAM is for modalities that have very low prior probabilities, particularly those that are about as close to zero as you can imagine, like homeopathy, reiki, acupuncture, and the like. If you take Bayes’ rule into account, the “positive” RCTs touted by CAM practitioners and quackademics are almost certainly not “true positives.” Rather, they’re false positives, noise. It’s also important to point out that we’re not arguing over whether a treatment with an estimated prior probability of, say, 10% is too “improbable” to be worth testing. It probably isn’t. What we’re talking about is something like homeopathy, whose pre-trial probability, based on the sheer scientific nonsensicalness of its purported mechanism, is so close to zero as to be, for all intents and purposes, indistinguishable from zero.
This brings us back to Dr. Briggs, who’s flogging a study that has also been a fairly frequent topic of this blog over the years, namely the Trial To Assess Chelation Therapy (TACT), a trial that tested chelation therapy, a common quack therapy used by naturopaths and others to treat cardiovascular disease, for its effects on cardiovascular complications and death. For the gory details of why this $30 million boondoggle was a complete waste of taxpayer money that endangered patients testing a treatment with close to zero prior plausibility, you can go back and read Kimball Atwood’s criticism of the trial design itself and my discussions of the completely underwhelming results here, here, here, and here. (If you doubt me, you really should check out Dr. R. W. Donnell’s Magical Mystery Tour of NCCAM Chelation Study Sites, Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, and Part 7, asking yourself if you would trust any data coming from such sites.) As Dr. Donnell points out, only 12 of the 110 TACT study sites were academic medical centers. Many of the study sites were highly dubious clinics touting highly dubious therapies, including heavy metal analysis for chronic fatigue, intravenous infusions of vitamins and minerals (I could never figure out how infusing minerals could be reconciled with chelation therapy to remove minerals, but that’s just me), anti-aging therapies, assessment of hormone status by saliva testing, and much more. Dr. Donnell also points out that the blinding of the study groups to local investigators was likely to have been faulty. So right off the bat, this study was dubious for so many reasons, not the least of which was that some of its site investigators were felons, a problem blithely dismissed by the NIH as being in essence irrelevant to whether the study could be done safely.
For those who aren’t inclined to click on a bunch of links and read fairly lengthy deconstructions, there were many problems with the design of the study, and it turned out to be basically a negative study: no decrease in a composite outcome of aggregated cardiovascular events in nondiabetics, and a questionable (at best) claimed improvement in cardiovascular outcomes, found by pre-specified subgroup analysis, only in diabetics. As I’ve pointed out before, if CAM practitioners considered this study valid, they would have stopped using chelation therapy for cardiovascular disease in nondiabetics right away. They didn’t, and the results in diabetics were not persuasive, for a number of reasons.
So what does Dr. Briggs have to say about this trial?
She begins by contrasting TACT with what usually happens, namely that treatments expected to be useful often fail to show evidence of efficacy in clinical trials, saying that every once in a while “the opposite happens.” Her lead-in to TACT thus established, she cites the most recent publication based on TACT. There have already been multiple publications, a clear example of what I like to call publishing the MPU (minimal publishable unit), and this looks like yet another reanalysis of the very same data coming to the same conclusion. Seriously, this is touted as a “factorial analysis,” but it’s the same vinegary wine in a different bottle, which is why I don’t plan on doing a particularly deep analysis of the paper. The authors tout the results as showing a benefit in the entire TACT experimental population (which sure sounds to me like post hoc analysis compared to the first primary analysis published in JAMA), and they simply repeat the claim that the results are especially compelling in diabetics. I’ve looked at these claims before, for the previous MPUs from TACT. Suffice to say that there were no statistically significant differences in all-cause mortality, MI, stroke, or hospitalization for angina. There was a statistically significant difference in coronary revascularization, but what that means is uncertain, particularly given that, if there was any failure of blinding, patients in the treatment group might be less likely to be referred for revascularization for mild symptoms. Only the same composite endpoint, a mixture of a “hard” endpoint like death and much “softer” endpoints subject to judgment calls, such as hospitalization and coronary revascularization, showed a statistically significant difference. Be that as it may, Dr. Briggs seems inordinately and unjustifiably impressed by the results:
The authors found that those receiving the active treatment clearly fared better than those receiving placebo. The accompanying editorial in the AHJ reminds readers about the value of equipoise and the need to “test our beliefs against evidence.”[3]
Most physicians did not expect benefit from chelation treatment for cardiovascular disease. I readily admit, initially, I also did not expect we would find evidence that these treatments reduce heart attack, strokes, or death. So, the evidence of benefit coming from analyses of the TACT trial has been a surprise to many of us. The subgroup analyses are suggesting sizable benefit for diabetic patients—and also, importantly, no benefit for the non-diabetic patient. Clearly subgroup analyses, even if prespecified, do not give us the final answer. But it is also clear that more research is needed to test these important findings.
No. It. Is. Not.
Dr. Steven Nissen explained why in an editorial that accompanied one of the first MPUs published from TACT. I’ve explained why ad nauseam in the links I’ve included above. This was, in essence, a negative trial. Indeed, its results were just what both Kimball Atwood and I predicted: negative overall, but with one subgroup showing a suggestion of benefit that lets the authors claim that “more research is needed,” which is the same thing that Dr. Briggs is saying. She has to, given that TACT began at NCCAM (albeit before her tenure) before being taken over by NHLBI. Of course, let’s say that, against all reason, you take TACT and its findings at face value. Remember that the trial took $30 million and a decade to do. Are the TACT findings reported by Gervasio Lamas and colleagues, even if completely reliable, compelling enough to justify spending a similar amount of money to follow up? Even if a future study were limited to diabetic patients and could thus be smaller, we’re still talking several million precious research dollars, minimum, to do the followup study, probably at least $10 million. Do the findings of TACT justify such an expenditure? I argue that they most definitely do not. It would be, at best, investing a lot of money to study a question that is simply not that compelling and not that likely to help very many people (under the most charitable interpretation of the results) and, at worst, throwing good money after bad, endangering more patients in the process and thus destroying equipoise.
Dr. Briggs then makes an argument that, while seeming persuasive on the surface, is actually less so if you look at it closely:
And TACT findings are indeed a reminder of the importance of retaining equipoise, seeking further research aimed at replicating the findings, and neither accepting nor rejecting findings based on personal biases. The scientific process is designed to weed out our preconceived notions and replace them with evidence.
Note the not-so-subtle implication that critics of TACT are rejecting its findings based not on their amazing unimpressiveness, coupled with the very low prior plausibility, but rather because of “personal biases” and how the scientific method (as represented by TACT, naturally) will weed out those “preconceived notions” and replace them with evidence. Dr. Briggs is very obviously trying to paint critics of TACT as unscientific zealots with an ax to grind. To do this, she cleverly tries to reclaim Bayes for herself, knowing that Bayes is a frequent argument against not just TACT and chelation therapy for heart disease but against CAM itself:
Bayesian methods are getting a lot of attention in the clinical research literature these days. The Bayes rule involves estimating the probability of a result—the prior—then modifying it with each round of new evidence. Another editorialist, a statistician, examined the TACT results, using a Bayesian approach, and comments: “If we start from a position of skepticism, the results of the TACT trial reduces the degree of skepticism. This is exactly how Bayes analysis helps modify prior beliefs by incorporating new evidence and upgrading knowledge.”[4]
One paper that Briggs cites echoes the sentiment:
When evidence conflicts with expectations, the findings are typically discounted. This response is rational from a Bayesian perspective—if the pretest probability (read “pretrial beliefs”) is low, a positive test (trial) should revise the posttest probability upward, but the result is not conclusive. Scientific paradigms shift only after the weight of evidence builds up sufficiently to move from hypothesis to proven fact. There are some classic examples of clinical trials that overturned conventional wisdom. Postmenopausal estrogen therapy was believed to prevent coronary disease events based on observational studies, but randomized clinical trials showed harm rather than benefit.[8,9] Antiarrhythmic drug therapy suppresses ventricular ectopy after MI and was therefore widely believed to reduce the risk of sudden cardiac death, but a randomized controlled trial showed that it did not.[10] β-Blockers were contraindicated in heart failure until randomized controlled trials proved they were indicated.[11-13] There are many examples of interventions commonly used—or not used—in practice that failed to show the expected result when tested in carefully conducted randomized controlled trials. In the case of TACT, the intervention (chelation therapy) is not commonly used in practice and most physicians expected the trial to show no benefit, yet a benefit was seen. Either way, we should not let our biases blind us to the possibility that unexpected results might provide an important clue for a new approach.
It is critical to use the scientific method to test our beliefs against the evidence. Simply dismissing results that we did not expect would ignore opportunities to expand knowledge and the armamentarium of effective therapies. This latest report is a useful extension of the previously published work from TACT and should prompt new research to replicate the initial provocative findings and base decisions about chelation on strong scientific evidence, not on beliefs, either pro or con.
Um, no. Not quite. Yes, the results of a prior trial can modify Bayesian considerations for a future trial, but in reality the most parsimonious interpretation of the results of TACT is that chelation therapy for cardiovascular disease does not work. There was basically no effect on mortality, no effect on myocardial infarction, no effect on stroke, no effect on any individual cardiovascular outcome, and only a relatively marginal effect observed in diabetics that could well be spurious. This is thin gruel to put up against all the basic science that fails to find a plausible mechanism for chelation therapy in cardiovascular disease. Basically, Briggs is using a variant of the “science was wrong before” argument, while the authors of the editorial that she cites, David J. Maron and Mark A. Hlatky, mistakenly accept the TACT trial at face value, ignoring its inherent flaws and focusing on Bayes like a laser beam to dismiss TACT critics as hopelessly biased and so hostile to the thought of chelation therapy that we are upset by the results of the study. Believe me, I’m not particularly upset by the results of the study. Equivocal results that show up in only one subgroup or require some factorial prestidigitation to be demonstrated are exactly what critics of TACT predicted given the trial design and the problems in its implementation.
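Indeed, you can grant the editorialists their Bayesian framing and still come out a skeptic. Here is a minimal sketch in Python of what “reducing the degree of skepticism” amounts to; the prior and the Bayes factor are my assumptions, chosen generously, not numbers taken from TACT itself.

```python
# Bayesian updating in odds form. Assumed numbers for illustration:
# a 0.1% prior for chelation-for-CVD (arguably generous) and a Bayes
# factor of 3 (weak evidence, roughly what a marginal subgroup result
# is worth at best).
def posterior(prior, bayes_factor):
    """Posterior probability after one piece of evidence (odds form of Bayes' rule)."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)

print(f"{posterior(0.001, 3):.4f}")  # ~0.0030: skepticism reduced, but barely
```

A posterior of roughly 0.3% is indeed “less skeptical” than 0.1%, exactly as the editorialist says, but it is nowhere near a level that would justify another eight-figure trial.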
I often think: What would it take for me to believe that, for example, homeopathy works—or at least to start changing my mind? Given its incredible scientific implausibility, to me it would take an utterly undeniable result, such as cures of several patients with stage IV pancreatic cancer. Chelation therapy isn’t quite as implausible as homeopathy, because chelation, at least, doesn’t involve magic and the memory of water. It is, however, pretty damned implausible. The pharmacology doesn’t work. The mechanism is not remotely plausible. It’s basically a load of fetid dingo’s kidneys. So while Dr. Briggs is correct, as far as she goes, that a surprising result in a clinical trial can modify the plausibility calculations for further clinical trials, TACT just isn’t particularly persuasive evidence that we should do so. To illustrate, let me just ask a single question: Should we add chelation therapy to the armamentarium of treatments used in cardiovascular disease, based on this study? Even Dr. Lamas doesn’t think this study is enough, nor does Dr. Briggs. Then ask yourself this: Should we do another trial costing many millions of dollars to nail down this result? The answer is obvious: No. There are lots of other pressing questions to study for which the funds could be better used.
Proponents of TACT or those who don’t know much about chelation who were surprised by the results of the study like to paint themselves as being open-minded and following “true science” while we nasty critics are portrayed as hopelessly biased and close-minded. In reality, proponents of TACT are being so open-minded that their brains have fallen out.
14 replies on “The director of NCCAM discovers Bayesian probability. Hilarity ensues.”
Do turkeys vote for Christmas?
Dr. Briggs and the NCCAM share the turkey’s predicament.
NCCAM have been testing CAM treatments long enough now to have proven sceptics correct in their negative expectations of success when scientifically nonsensical – but popular – treatments are subjected to careful empirical scrutiny. At what point does NCCAM get wound up, so that we cease spending tax revenues testing dumb-ass treatments that there is no reason whatsoever to suppose could out-perform placebo? At best.
Briggs’s attempted resurrection of chelation therapy smacks of the turkey’s fear of Christmas. We will keep testing our dead turkeys; one of them must eventually be found to be living. The fat turkey that is NCCAM no doubt wants to live forever too.
“The Camptown ladies sing this song
Doo-dah, doo-dah,
I bet my money on a bob-tailed nag
Somebody bet on the Bayes”
Doo-dah, indeed.
It was great to attend the Science and Engineering Festival in DC a few weeks ago. Other than the questionable Avon booth, there appeared to be only one booth that promoted pseudoscience: the NCCAM booth, which featured a vapid (and not engaging) “herb search” for the kids.
Here’s a smaller version. The version I received had more of these herbs.
I’m sure that Dr. Briggs is trying to justify the continued existence of her little empire. However, she needs to be reminded of the First Rule of Holes: If you’re in one, stop digging. Instead, she’s asking for a bigger shovel.
OB XKCD: http://xkcd.com/1132/
Really? She’s citing Bayesian statistics, for CAM?
Prior plausibility is LOW. Far less than 1%. You check multiple measures, one comes up positive. Frequentist methods say, “There might be something there.” Bayesian methods say, “OK, the probability that this works has just ticked up a tiny bit. Still less than 1%.”
I’m about as impressed with her use of Bayesian statistics as I am with the average non-physicist’s use of “quantum.”
“Quantum” has to do with the fact that energy comes in discrete packets, it does not mean that you can do magic. Bayesian statistics says that a collection of negative evidences makes one piece of positive evidence weaker, not that a single piece of positive evidence throws out a collection of negative evidence.
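To put rough numbers on that point, here is a quick sketch; all the values (five endpoints, the conventional alpha of 0.05, 80% power, a prior well under 1%) are assumptions for illustration.

```python
# "You check multiple measures, one comes up positive": what that is worth.
alpha, k, power = 0.05, 5, 0.80
eff_alpha = 1 - (1 - alpha) ** k   # chance of >= 1 false positive among k endpoints
bayes_factor = power / eff_alpha   # crude evidential weight of "one of k endpoints hit"
prior = 0.002                      # "far less than 1%"
odds = prior / (1 - prior) * bayes_factor

print(f"P(>=1 false positive by chance alone) = {eff_alpha:.2f}")  # ~0.23
print(f"posterior = {odds / (1 + odds):.4f}")                      # ~0.007: still < 1%
```

The frequentist sees a nominally significant endpoint; the Bayesian sees a probability that ticked up from 0.2% to about 0.7%.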
I’ve brought up the appropriate application of Bayesian analysis in the process of validating Natural/ALT/CAM/Integrative modalities in many of the altie forums I frequent and irritate. It usually generates a blank response!
And wasn’t there some talk back when this study first hit that the slight apparent benefit for diabetics could potentially be explained by sugar pill placebos making things worse? In other words, that the treatment’s only apparent success was that it did nothing?
I think NCCAM’s objections have little to do with science and much more, if not all, to do with full employment: logically, if they rule out a form of alternative medicine (which, it seems, they haven’t despite 20 years of research) then some staff would have to be laid off or some studies cancelled, leading to lower funding.
@Gary9
The entire enterprise is a sham. Those who do the testing are almost entirely practitioners of that which they are testing. They are true believers before they even plan their study. And when they do perform a high-quality study, they don’t get a positive result. Do they then say, hey, I’ve just discovered that what I’ve believed forever is mistaken, so that’s the end of that?
No they do not. They wring their study dry, squeezing out some fatuous conclusion, and sign off with “more research is needed.” And so the practitioners carry on practising and testing their treatments year after year. And NCCAM carry on finding ways to avoid saying what a bloody load of nonsense CAM has been proven to be. US taxpayers are being taken for a merry old ride.
The director of NCCAM discovers Bayesian probability.
…and NCCAM stops issuing grants.
No, wait, sorry, I confused ‘discovers’ with ‘understands’.
I’m going to make a slightly contrarian case here.
Bayesian priors appear to be an exercise in positing subjective value judgements as ratio-scale variables. That’s a big no-no, quite like trying to rate ‘love’ or ‘loyalty’ or whatever on a scale of 0 to 100. ‘How do I love thee? Let me count the ways,’ makes for touching Victorian poetry but does not translate to an objective measurement.
And worst of all, it isn’t even necessary, so it muddies the waters and gives quacks & their enablers an escape hatch of arguing over Bayes numbers.
Given a data set from a faith healing experiment, shall we argue whether it should be interpreted with an atheistic prior of 0, or a faith-based prior of 1, or perhaps an agnostic prior of 0.5? The atheist will find much favour from scientists but have no credibility with the religious public, thereby creating a backlash in support of faith healing. The theist’s situation is reversed, but the religious public will take it as affirmation and demand laying on of hands in hospital. The agnostic is on solid ground empirically (since the existence of deities is not testable) but nonetheless will be criticised by atheists and the religious public alike. Either way, the argument will continue until the proverbial cows come home.
The only people to benefit from that, are the likes of Mercola and the Wizard named Oz.
Falsification of quackery should be entirely possible with frequentist methods, leaving nothing to be argued except possibly the details of methodology, that themselves should be purely objective (e.g. quantity of compound administered, reduction in measurable objective signs of illness, etc.).
Here we have TACT, a study that demonstrates insignificant results on most measures and a slightly significant result in one area. If we apply a low prior to make the latter disappear, we only end up creating ground for argument.
Or we can show that the significant outcome is irrelevant because other treatments that are well supported have much more significant outcomes. Why, after all, should anyone in their right mind seek out a treatment modality with questionable efficacy that is at best low, when there are other treatments with reliable empirical support and much higher efficacy?
The fact that Bayesian methods work for certain applications, such as the maths used in cryptology, does not make them universally relevant. The frequentist case for dismissing TACT, and homeoquackery, and ‘energy healing,’ and the rest of that rubbish, is much stronger.
Jenora has made some important remarks here; I had a similar thing in mind.
Jenora:
And wasn’t there some talk back when this study first hit, that the slight apparent benefit for diabetics could be potentially explained by sugar pill placebos making things worse? In other words, that the treatment’s only apparent success was that it only did nothing?
@Lurker
I’m afraid I must disagree very strongly with you.
First, your description of Bayesian priors and their setting is just wrong. It repeats the old trope that somehow we are being ‘subjective’, and furthermore that being ‘subjective’ implies ‘making it up’. Perhaps in the history of Bayesian inference there has been an unfortunate tendency to use the word ‘belief’, which has led to this, but really it comes from a ‘philosophical’ objection by frequentists which is largely an objection to their own interpretation, which they stick to despite all evidence that their approach is flawed.
You suggest that we shouldn’t use Bayesian inference because it appears to give our ‘opponents’ opportunity for misuse and abuse. Indeed it does, but when we are doing science we don’t have ‘opponents’; we seek the truth with the best possible tools to hand, not the tools we think most likely to ‘win an argument’, whatever their standing. It’s a truly bad argument for using one methodology over another, and it would be just as bad if you had been arguing the opposite.
You claim that somehow Bayesian inference isn’t universally applicable. Really? Do you seriously believe that we pick and choose our methods to suit our cause, and that for some reason this system of inference (i.e., the logic of dealing with uncertainty) is applicable to some domains and not others? In what possible way can you justify this statement? How do you differentiate yourself from, for instance, homeopaths who claim that RCTs (and the rest) are not applicable to their particular science? I see no difference.
Indeed, frequentist statistics as a ‘methodology’ stands to statistical inference rather as homeopathy does to medicine. When confronted with a problem, one consults the ‘big book of recipes’ and extracts a procedure that received wisdom says one should use. There is some vague talk that sounds like a rational basis, and after all there are symbols used that look like mathematical symbols. But lift the hood and we find a lack of theory and grounding, and indeed of the ability to ask ‘why?’
Bayesian inference, on the other hand, is a true theory of statistical inference, derivable from pellucid, up-front assumptions (axioms). It justifies many frequentist procedures – a difference from the homeopathy analogy, of course, being that frequentist statistics does reach the right conclusion a lot of the time, though it doesn’t know why; Bayesian inference puts it on a solid footing. And of course the frequentist approach is wrong some of the time, and there are issues that it simply can’t deal with in any consistent manner.
So, on one hand we have some big books of ad hocceries which may or may not be justified; on the other we have a rigorous, mathematical approach to logical inference (which is of course universally applicable – sorry, but that really was just silly!) which condenses, explains, and is very successful in practical applications. Take the Bertrand paradox – a paradox for frequentists because their ad hocceries simply can’t differentiate among the three possible answers they come out with. It is no issue for Bayesian analysis: rigorous mathematical deduction leads to a unique answer which is both intellectually satisfying and has been empirically tested. This tour de force of reasoning coupled with empirical validation, not to mention all the other advances Bayesian inference has made, would usually be enough for people to declare an advance in science and drop their old ways; not so in statistics, for reasons that often appear not so different from those touted by quacks and woomeisters.
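For anyone who hasn't met the paradox: "pick a chord of a circle at random" has at least three 'obvious' uniform recipes, and they disagree about the probability that the chord is longer than the side of the inscribed equilateral triangle. A minimal Monte Carlo sketch (Python, unit circle):

```python
import numpy as np

# The three classical "random chord" recipes for Bertrand's paradox.
rng = np.random.default_rng(0)
n = 1_000_000
side = np.sqrt(3.0)  # side of the equilateral triangle inscribed in a unit circle

# Recipe 1: chord between two uniform random points on the circle.
a = rng.uniform(0.0, 2.0 * np.pi, n)
b = rng.uniform(0.0, 2.0 * np.pi, n)
len1 = 2.0 * np.abs(np.sin((a - b) / 2.0))

# Recipe 2: pick a random radius, then a uniform midpoint along it.
d2 = rng.uniform(0.0, 1.0, n)              # midpoint's distance from the centre
len2 = 2.0 * np.sqrt(1.0 - d2**2)

# Recipe 3: chord midpoint uniform over the whole disc.
d3 = np.sqrt(rng.uniform(0.0, 1.0, n))     # sqrt trick: uniform-in-disc radius
len3 = 2.0 * np.sqrt(1.0 - d3**2)

for i, length in enumerate((len1, len2, len3), start=1):
    print(f"recipe {i}: P(chord > side) ~ {np.mean(length > side):.3f}")
# roughly 0.333, 0.500, 0.250: three answers to "the same" question
```

Recipe-following gives you whichever of the three you happened to pick; Jaynes' invariance argument singles out recipe 2, and that is the answer that has been checked empirically.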
Of course, Bayesian analysis is more difficult and more mathematical, and there are acknowledged problems to be resolved – the setting of priors being top of the list – though symmetry principles, maximum entropy, etc. appear to be making a start. (By the way, where in ‘The Well-Posed Problem’ do you see any ‘subjective’ priors?) There is more to say on priors, but plucking numbers out of the air is in no way part of Bayesian methodology.
My friend, no procedure, whatever its provenance, is likely to talk these people out of their barmy beliefs. Instead of following them by picking and choosing, and making specious and unfounded declarations about applicability and relevance, drop the ‘intuitive magic’ that is frequentist statistics and come into the fold of the rigorous scientific analysis that is currently best represented by Bayesian methods. Perhaps if we teach the next generation how to think properly about uncertainty, they will have a chance to avoid the new dark age that sometimes, when I read this blog and others like it, seems imminent.