A lesson about correlation and causation

Besides yesterday being Mother’s Day, I had a lot of grant stuff to do, which means that this one will be a quickie. On Saturday, a reader sent me a link to one of the most useful sites I’ve ever encountered. I realize that over the weekend it spread around the skeptical blogosphere like the proverbial wildfire, which is unfortunate (for me), given that I’ve made it a personal rule not to post on weekends anymore, barring amazing developments. Still, this one tempted me. It’s a website called Spurious Correlations, and it is exactly what it claims to be. Its usefulness will become apparent quickly.

One of the key arguments, if not the key argument, made by antivaccinationists is that the correlation between the onset of the “autism epidemic” (i.e., the large increase in the prevalence of autism and autism spectrum disorders beginning in the 1990s) and the expansion of the vaccine schedule is strong evidence of causation. In other words, correlation equals causation. Of course, correlation can reflect causation, but you need a lot of other evidence to show that. Most correlations don’t equal causation, and that’s where Spurious Correlations comes in. It has thousands of such spurious correlations, complete with graphs, numbers, and even correlation coefficients, many of which are well above 0.9, which is considered a very strong correlation indeed. For example, on the front page right now is a correlation between U.S. spending on space, science, and technology and suicides by hanging, strangulation, and suffocation, and the correlation coefficient (r) is 0.992082 (for a perfect correlation, r=1.0; for no linear correlation at all, r=0; negative numbers mean a negative correlation, in which an increase in one variable is correlated with a decrease in the other).
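For readers who want to see where a number like 0.992082 comes from, here is a minimal Python sketch of the standard Pearson correlation coefficient, assuming that is the statistic the site reports (it is the usual meaning of r). The function name `pearson_r` and the toy inputs are illustrative only, not anything taken from the site.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (the covariance numerator)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either series is constant")
    return cov / math.sqrt(ss_x * ss_y)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))    # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))    # -1.0 (perfect negative)
print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1]))  # 0.0  (no linear relationship)
```

Note that r only measures how well the points fit a straight line; nothing in the formula knows or cares whether one variable causes the other.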


Amusingly, you can also search for your own spurious correlations. I wanted to look for more spurious correlations with autism, but I couldn’t find autism among the site’s data sets. So instead I looked for things that correlate with per capita consumption of high fructose corn syrup (HFCS), because HFCS is considered evil in alt-med circles, and found that it correlated with (among other things):

Clearly, if I ever go back to doing abdominal surgery, I’ll have to be careful about operating on people who consume a lot of HFCS, and people who consume a lot of HFCS need to stay indoors during thunderstorms. Meanwhile, HFCS consumption showed strong negative correlations with (among other things):

Hmmm. Perhaps there is something to all that HFCS fearmongering. After all, as HFCS consumption falls, look at how there are more lawyers, more biomedical doctorates, and fewer suicides by hanging!

Ah, you ask, but what doesn’t correlate with HFCS consumption? Ask no more! Here they are:

I think that last one was messed up by the spike in deaths in 2005 due (clearly) to Hurricane Katrina. As for the rest, if you want to go hang gliding, work with agricultural machinery, or go canoeing or kayaking, drink those sugary drinks up!
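To see how a single bad year can wreck a correlation, here is a sketch using made-up numbers (not the site’s data): two hypothetical series that track each other perfectly, then the same pair with one large, unrelated spike added in a single year. The variable names and values are assumptions chosen purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical, made-up data for ten years (2000-2009), NOT real statistics
years = list(range(2000, 2010))
exposure = [float(i) for i in range(10)]     # rises steadily each year
deaths = [2.0 * e + 1.0 for e in exposure]   # tracks exposure exactly

# Same series, but one year (2005) gets a large spike unrelated to exposure
deaths_with_spike = list(deaths)
deaths_with_spike[years.index(2005)] += 60.0

print(round(pearson_r(exposure, deaths), 3))             # 1.0
print(round(pearson_r(exposure, deaths_with_spike), 3))  # 0.353
```

One outlier drops r from a perfect 1.0 to about 0.35, which is one reason to look at the graph and not just the coefficient.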

Of course, skeptics aren’t suspicious of correlations because correlation never equals causation. It’s just that we realize there really are lots of spurious correlations like these, far more than the average person appreciates. A person who hasn’t taken the time to understand just how common such correlations are, and how easy it is to mine data sets to find them, will find correlations compelling, particularly if they have a modicum of seeming plausibility, as the vaccine-autism link once did before so many studies demolished it. A skeptic, however, will realize that such a correlation is only a starting point, one that probably doesn’t mean anything but might. Further evidence, such as testing other data sets, doing controlled experiments (when they are possible), and using other means of testing, is essential to determine whether an observed correlation is merely spurious or really does reflect causation.

We also know that confounding factors can easily produce the appearance of a correlation. A common example is two variables that both increase steadily over the same time frame but have nothing to do with each other, such as computer or cell phone use and autism. One spurious correlation of this sort can spawn others, such as one between increasing wifi exposure and autism, given that the growth in computer and Internet use over the last 20 years has led to the proliferation of wifi hotspots and increasing exposure to radio waves from cell phones. The list goes on and on.
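The “two things that both go up over time” trap is easy to reproduce. The sketch below builds two hypothetical series that share nothing but an upward drift (each is a linear trend plus its own independent random noise), then compares r for the raw yearly values with r for their year-over-year changes. Taking first differences is a standard way to strip out a shared linear trend; the series names, slopes, noise level, and seed are all assumptions chosen for illustration, not real data.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def first_differences(series):
    """Year-over-year changes: series[t] - series[t - 1]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(42)  # fixed seed so the run is repeatable
years = range(40)

# Two hypothetical, causally unrelated series (NOT real data): each is a
# linear upward trend plus its own independent Gaussian noise.
phone_use = [3.0 * t + rng.gauss(0, 1) for t in years]
diagnoses = [0.5 * t + 10.0 + rng.gauss(0, 1) for t in years]

r_levels = pearson_r(phone_use, diagnoses)
r_changes = pearson_r(first_differences(phone_use), first_differences(diagnoses))

print(f"r of raw yearly values:    {r_levels:.3f}")
print(f"r of year-to-year changes: {r_changes:.3f}")
```

With these settings the raw series correlate at well over 0.9, while the year-to-year changes show little correlation, because the only thing the two series ever had in common was the passage of time.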

I really wanted to go into the data and see whether, for instance, autism really does correlate with vaccination or whether brain cancer correlates with mobile phone use. More importantly, I wanted to find a bunch of ready-made spurious correlations for those two conditions to demonstrate the principle that a correlation does not mean the relationship is causal, as when I point out that autism also correlates with Internet usage, home computer ownership, CD sales (at least until around 10 years ago, when CD sales started to take a nosedive and sales of downloadable digital music took off), and a variety of other things. Similarly, given that the incidence of brain cancers doesn’t appear to be increasing significantly (if anything, it appears to be decreasing), the suggestion that cell phone radiation causes brain cancer seems no more convincing than the long-refuted claim that vaccines cause autism, and that’s leaving aside the monumental physical and biological implausibility of the claim on the basis of simple physics alone.

Then I thought: Maybe that’s the point. If this website had data for the sorts of things about which cranks often confuse correlation with causation, like cancer, autism, asthma, and autoimmune diseases (to name a few), it would risk changing from a useful tool for teaching critical thinking into a site that cranks could data mine to support their favorite pet hypotheses. As it stands, readers can explore it to find the most ridiculous spurious correlations they can, thus demonstrating that correlation is not the same thing as causation, no matter how much the human brain likes to grasp onto such correlations. Losing that would, of course, be bad. On the other hand, such data would allow people like me to demonstrate all the other things that correlate with autism prevalence, so that I could ask antivaccinationists why they think vaccines are the culprit rather than any of those other things. Likewise, I could show all the things that correlate with cancer and ask why it has to be cell phones, or any of the other bogeymen blamed for cancer, rather than all the other (heh, heh) spurious correlations. Of course, to cranks, their correlations can’t be spurious and must be strong evidence of causation, while all those other correlations are obviously nonsense.
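The data-mining worry is also easy to demonstrate. This sketch compares one meaningless “target” series against 1,000 other equally meaningless random walks and reports the strongest correlation found. Everything here is simulated; the series length (11 yearly points), the number of candidates, and the seed are arbitrary assumptions, and the helper names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


def random_walk(rng, n):
    """A series with no meaning: each step adds independent Gaussian noise."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        walk.append(level)
    return walk


rng = random.Random(2014)  # fixed seed so the run is repeatable
n_years = 11
target = random_walk(rng, n_years)  # stands in for any "condition" of interest
candidates = [random_walk(rng, n_years) for _ in range(1000)]

# Strength of each candidate's correlation with the target, strongest first
scores = sorted((abs(pearson_r(target, c)) for c in candidates), reverse=True)

print(f"best |r| among {len(candidates)} unrelated series: {scores[0]:.3f}")
print(f"series with |r| > 0.8: {sum(s > 0.8 for s in scores)}")
```

Search enough unrelated series and some will line up with almost anything, which is exactly what a crank mining such a site would find.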

In any case, I think it would be a fun exercise if you, my readers, would play around with the Spurious Correlations site and find the most interesting or bizarre spurious correlations for use in educating people in critical thinking. And, remember, the per capita consumption of cheese in the US correlates strongly with the number of people who died by becoming tangled in their bedsheets (r=0.947091). So, please, people, whatever you do, don’t eat cheese right before going to bed. Oh, and if you are unfortunate enough to be confined to a wheelchair, either temporarily or permanently, whatever you do, don’t you eat cheese either!