Epi Wonk versus Mark and David Geier: Guess who wins?

There’s a new blog in town that I’ve been meaning to pimp. It’s a blog by a retired epidemiologist who got things started looking at the role of diagnostic substitution in autism diagnoses and argued that the autism “epidemic” is an artifact of changing diagnostic criteria.

The blog is Epi Wonk, and it’s a good one so far.

This week, I’m really glad Epi Wonk exists. The reason is that another Geier père et fils crapfest of dumpster-diving has somehow slimed its way into the medical literature, just in time to be used in the Autism Omnibus hearings, no doubt. The “study” (if you can call it that) is entitled Thimerosal exposure in infants and neurodevelopmental disorders: An assessment of computerized medical records in the Vaccine Safety Datalink, posted as an article in press at the Journal of the Neurological Sciences. For reasons that escape me, Pharmalot has a direct link to the PDF and appears to be pimping this execrable exercise in cesspit (I mean data) mining. I understand why Age of Autism is so eager to pimp this article, but why Pharmalot?

In any case, a number of people have pointed this study out to me. However, after diving into the craptacular exercise in bad science known as the “13 monkeys” study by Hewitson and Wakefield that Age of Autism has been touting, the thought of diving so soon into yet another exercise in tortured statistics, this time by the Geiers, was about as appealing as diving head first into the Cuyahoga River in 1969, except that likening the Geiers’ “research” to industrial waste is an insult to industrial waste. At least industrial waste comes about as a result of producing useful goods and products. Not so anything the Geiers have published in the last decade or two, at least.

Fortunately, Epi Wonk is taking on this study so that I don’t have to. She’s far more able to see through the data torturing than I am (although the manipulation of data is so blatant that anyone who’s taken Biostatistics 101 should see through it immediately). A taste:

This study has a lot of problems, and I predict that it will take me at least five posts to go through the article point by point to explain all the flaws. However, there’s one trick that the authors play that’s so glaring that I have to point it out immediately. In fact, I’ve spent two days in shock that the journal editor and reviewers let the authors get away with it.

See what I mean? I could deconstruct the latest Geier opus, but I couldn’t do it in such ruthless, clinical detail. I know some epidemiology, but I’m not a professional. My first thought when seeing this new study was that it reminded me of the last time the Geiers tried to mine the Vaccine Safety Datalink.

Epi Wonk’s most damning finding is that the Geiers used statistical sleight-of-hand to impute extra cases of autism to certain birth cohorts, something that cannot legitimately be done:

Let’s quote directly from the Young, Geier & Geier paper to make sure I have this right. “Because of concern that the cohorts from 1995-1996 had only 4-6 years of follow-up, frequency distributions of age at diagnosis were examined for all years. This revealed that for some of the disorders a sizable proportion of children were diagnosed after 4.5 years. Adjustments were made for counts of cases as needed for birth cohorts depending upon the disorder examined to correct for under ascertainment that occurred due to shorter follow-up times. These adjustments were made for all disorders including the control disorders as appropriate based on the age distribution….”

“For example, 37% of autism cases in the study were diagnosed after 5 years old with about 50% diagnosed after 4.5 years old. This is a conservative estimate since it includes the 2 years (1995-1996) that had shorter follow-up times. Examination of the distribution of age of diagnosis by birth year for autism revealed that only about 15% of cases were diagnosed after 5 years of age in the 1995 birth cohort while the 1996 birth cohort had no cases diagnosed after 5 years of age and only 3.5% of cases diagnosed between 4.5 and 5 years of age. Based on the average age at diagnosis for all cohorts the 1995 count of autism cases was increased by 45 cases with the assumption that all of these would have been added in the 5 year+ age group (bringing this percentage close to the overall average of 37% diagnosed after 5 years of age.) The same was done for 1996, but the number of cases was augmented by 80 because it was assumed that these would be diagnosed in the 4.5 to 5 and 5+ groups essentially bringing the percentage after age 4.5 close to the overall average of 50% diagnosed after 4.5 years of age. The new augmented frequency counts of cases in 1995 and 1996 birth cohorts were then used as new case counts in the analysis.”

This is just not done. It’s not valid. It’s not ethical. Adding imaginary cases into a data set borders on scientific fraud. I’ve been trying to wrap my mind around some sort of rationale for the authors “imputing” extra cases and to me it’s just fudging the data. What they’ve done bears some relationship to a procedure called “direct age standardization,” but age standardization might be useful in a situation where investigators were comparing birth cohorts — not where the birth cohorts are the units of analysis (more on this “units of analysis” concept later). I don’t think this is downright scientific fraud for two reasons. First, they carried out this procedure of “imputing” imaginary cases for the control disorders, as well as autism and five other neurodevelopmental disorders. (I’ll explain this in more detail in upcoming posts.) Second, they come right out and admit that they cooked the data by adding imaginary cases — it’s not as if they’re trying to hide anything.

The Geiers doing something unethical? Imagine that. I urge you to go and read Epi Wonk’s entire post. I look very much forward to seeing the rest of her deconstruction of this study.
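
For anyone who wants to see the arithmetic behind that sort of “adjustment,” here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not the authors’ code, and the paper does not give enough detail to reproduce their exact numbers. The function names and the cohort size of 100 are hypothetical; only the 15% and 37% figures come from the passage quoted above. The sketch asks how many extra cases, all assumed to be late diagnoses, must be added to a cohort to drag its share of late diagnoses up to a target.

```python
def cases_to_add(observed_cases, observed_late_fraction, target_late_fraction):
    """Extra cases, all assumed late-diagnosed, needed to raise a cohort's
    late-diagnosed share to the target share."""
    late = observed_cases * observed_late_fraction
    # Solve (late + k) / (observed_cases + k) = target_late_fraction for k.
    return (target_late_fraction * observed_cases - late) / (1.0 - target_late_fraction)


def augmented_late_fraction(observed_cases, observed_late_fraction, added_cases):
    """Late-diagnosed share after the added cases are counted as late diagnoses."""
    late = observed_cases * observed_late_fraction
    return (late + added_cases) / (observed_cases + added_cases)


# Hypothetical cohort size; this number is NOT from the paper.
observed = 100
# 15% observed after age 5 and a 37% target, as in the quoted passage.
added = cases_to_add(observed, 0.15, 0.37)
new_share = augmented_late_fraction(observed, 0.15, added)

print(f"cases added: {added:.1f}")
print(f"late-diagnosed share after augmentation: {new_share:.2f}")
print(f"inflation of the cohort's case count: {added / observed:.0%}")
```

The thing to notice is that every one of the added cases is a child nobody ever diagnosed, and under these assumptions the cohort’s case count grows by roughly a third.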

The Geiers’ fudging data to make it show what they want it to show I can understand. After all, that’s just what the Geiers do. It’s like a dog rolling in excrement or a dead and decaying bird. They just can’t help themselves, as it’s natural to them. But what about Heather Young, an Assistant Professor of Epidemiology and Biostatistics at the George Washington University School of Public Health? Does she realize that she’s committing academic suicide to be associating with the Geiers? Surely she doesn’t think that a tenure committee will look favorably on a publication record that includes articles like this, does she? More importantly, surely she must have known that the methodology she used is dubious.

I will give the Geiers that the VSD database is a much better source of data for studies than the VAERS, their previous favorite stomping ground. The reason VAERS is so crappy for incidence and prevalence data is that anyone can submit a report to it and blame anything on a vaccine reaction. This is good for looking for early warnings of problems. However, it’s bad for any useful data about correlations. Indeed, Dr. Laidler once famously demonstrated how ridiculous reports to the VAERS database can be when he submitted a report claiming that vaccines had turned him green and large; i.e., had turned him into The Incredible Hulk:

The chief problem with the VAERS data is that reports can be entered by anyone and are not routinely verified. To demonstrate this, a few years ago I entered a report that an influenza vaccine had turned me into The Hulk. The report was accepted and entered into the database.

Because the reported adverse event was so… unusual, a representative of VAERS contacted me. After a discussion of the VAERS database and its limitations, they asked for my permission to delete the record, which I granted. If I had not agreed, the record would be there still, showing that any claim can become part of the database, no matter how outrageous or improbable.

Another problem with the VAERS database is that vaccine litigants have been urging people to report any and all perceived vaccine reactions, particularly cases of autism, to the VAERS database. As I discussed two years ago, this has hopelessly distorted the database.

In contrast, the VSD database is a collaborative passively collected database managed by the CDC with several large HMOs. It was last used to demonstrate that neurodevelopmental disorders other than autism are not associated with vaccines. Medical professionals are entering data, and there is a system for monitoring. However, individual-level data takes work to derive, and the Geiers probably didn’t have access to individual-level data because they had been busted for trying to merge datasets in such a way that would have compromised patient confidentiality. Instead, they used average estimated levels of exposure for each of the seven birth cohorts, a point that I had planned on emphasizing heavily when I got around to analyzing the study in more detail. Given that, I might as well let Epi Wonk tell us why this is inherently a bad way to analyze such data:

Aside from the fact that a regression analysis based on an N of 7 is unstable and not robust at all, it has been known in the social sciences since 1950 and in epidemiology since about 1973 that in general, regression estimates from ecological analyses tend to be hugely magnified compared to individual-level analyses. (By individual-level analysis I simply mean the type of study where individual exposure data and individual level outcome data is used in the analysis for every study participant.)
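
The instability of a seven-point regression is easy to see with a toy simulation. The sketch below is my own illustration and uses made-up random data, not anything from the VSD. It only addresses the small-N problem, not the separate issue of ecological estimates being magnified relative to individual-level ones. It generates an “outcome” that is pure noise, unrelated to the seven “exposure” values, and counts how often the correlation nonetheless looks strong. The function names, the 0.5 threshold, and the number of simulations are all arbitrary choices of mine.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def share_of_strong_correlations(n_points=7, n_sims=2000, threshold=0.5, seed=0):
    """Fraction of simulated data sets with |r| above the threshold when the
    outcome is pure noise with no relationship to the exposure."""
    rng = random.Random(seed)
    xs = list(range(1, n_points + 1))  # stand-in for cohort-level exposure averages
    strong = 0
    for _ in range(n_sims):
        ys = [rng.gauss(0.0, 1.0) for _ in xs]  # outcome unrelated to xs
        if abs(pearson_r(xs, ys)) > threshold:
            strong += 1
    return strong / n_sims


seven_points = share_of_strong_correlations(n_points=7)
hundred_points = share_of_strong_correlations(n_points=100)
print(f"share of |r| > 0.5 with 7 points:   {seven_points:.3f}")
print(f"share of |r| > 0.5 with 100 points: {hundred_points:.3f}")
```

With only seven points, a sizable minority of pure-noise data sets produce a correlation that looks impressive; with a hundred points, essentially none do.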

Indeed, compare how the Geiers mined the VSD to the way that a similar sort of analysis was done in the other use of the VSD:

We determined the mercury content of vaccines and immune globulins that the study children received when they were infants (1993-1998) from published data and the FDA (Table B of the Supplementary Appendix). We identified vaccines and immune globulins that children had received from HMO computerized immunization records, paper medical records, personal immunization records, and maternal interviews. Prenatal exposure to mercury included all known exposures of the mother to thimerosal-containing vaccines and immune globulins during pregnancy. We defined postnatal exposure as micrograms of mercury divided by the weight of the child in kilograms at the time of administration of each vaccine or immune globulin. Individual exposures were summed during the period of interest: birth to 1 month and birth to 7 months (1 to 214 days).

In other words, in that study, the investigators estimated each child’s exposure to mercury to the best of their abilities based on the data in VSD. Moreover, children enrolled in the study then underwent neuropsychological testing to seek out any diagnoses that could be correlated with vaccination status.
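
To show what “individual-level” means in practice, here is a minimal sketch of the exposure calculation described in that passage: micrograms of mercury divided by the child’s weight in kilograms at each administration, summed over the window of interest. This is my own illustration, not the investigators’ code. The `Dose` class, the function name, and every number in the example record are hypothetical; only the formula and the 1-to-214-day window come from the quoted methods.

```python
from dataclasses import dataclass


@dataclass
class Dose:
    age_days: int       # child's age when the product was given
    mercury_ug: float   # mercury content of the product, in micrograms
    weight_kg: float    # child's weight at the time of administration


def cumulative_exposure(doses, first_day, last_day):
    """Sum of micrograms of mercury per kilogram of body weight over all
    doses given between first_day and last_day, inclusive."""
    return sum(
        d.mercury_ug / d.weight_kg
        for d in doses
        if first_day <= d.age_days <= last_day
    )


# Hypothetical record for one child; none of these numbers come from the study.
child = [
    Dose(age_days=1, mercury_ug=12.5, weight_kg=3.5),
    Dose(age_days=60, mercury_ug=25.0, weight_kg=5.0),
    Dose(age_days=120, mercury_ug=25.0, weight_kg=6.25),
    Dose(age_days=300, mercury_ug=25.0, weight_kg=9.0),  # outside the window
]

birth_to_7_months = cumulative_exposure(child, first_day=1, last_day=214)
print(f"birth to 7 months: {birth_to_7_months:.2f} micrograms/kg")
```

Each child gets his or her own number, computed from his or her own records. That is a very different thing from assigning one average exposure to an entire birth cohort.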

Of course, magnifying the regression estimates is exactly what the Geiers wanted to do. What I find hard to believe is that an epidemiologist like Dr. Young would go along with this. If she is unaware of the problem, perhaps it was naïveté that led her to trust the Geiers. At least, I hope that’s the case, because the other possibilities (that she’s either incompetent, intentionally cooked the data, or both) are ones that I would prefer not to contemplate. I also hope that her career doesn’t spiral downwards into autism crankery after this.

The other thing that I can’t figure out is how this sort of blatantly obvious crap got through peer review. The only explanation I can think of is that most peer reviewers also aren’t aware of the limitations of the database. In a way, this does provide a lesson for lay people who assume that peer reviewed studies are reliable, namely that there is a lot of crap out there, even in the peer-reviewed literature. The journal in which a study appears can give a reader an idea of the probability that an article within is lousy, low for high quality, high impact journals, and higher for lower quality, lower impact journals, but appearance of an article in a peer-reviewed journal, even a good one, is no guarantee that the science is solid. (That’s why when I see something in non-peer-reviewed ideological journals like the Journal of American Physicians and Surgeons, I know right away that it’s almost certainly garbage; if the authors could have gotten it into a peer-reviewed journal, even a dumping ground of a journal, rather than JPANDS, they almost certainly would have.) Each study has to be evaluated on its own according to methodology. Sadly, the appearance of this article in the Journal of the Neurological Sciences represents a failure of peer review for which the editors should be deeply ashamed.

I look forward to the next installment of Epi Wonk’s deconstruction of this article, and I’m going to add her blog to my blogroll the next time I update it. In the meantime, please head on over and check out Epi Wonk. Tell ’em Orac sent you.

ADDENDUM: Here’s an acknowledgment from the paper:

This study received funding from the Autism Petitioners’ Steering Committee of the no-fault National Vaccine Injury Compensation Program (NVICP). Dr. Heather Young has been a consultant in vaccine cases before the no-fault NVICP. David Geier has been a consultant in vaccine/biologic cases before the no-fault NVICP and in civil litigation. Dr. Mark Geier has been a consultant and expert witness in vaccine/biologic cases before the no-fault NVICP and in civil litigation.

Lovely. This means this study was almost certainly funded as part of the Autism Omnibus. Talk about our government bending over backwards to give the petitioners every benefit of the doubt; it’s even funding “studies” for them to use!