This blog is intended to go along with Population: An Introduction to Concepts and Issues, by John R. Weeks, published by Cengage Learning. The latest edition is the 12th (it came out in 2015), but this blog is meant to complement any edition of the book by showing the way in which demographic issues are regularly in the news.

If you are a user of my textbook and would like to suggest a blog post idea, please email me at:

Tuesday, October 2, 2012

Correlation Does Not Imply Causation

Our modern understanding of the world depends upon science, an essential part of the Enlightenment. An important part of science is organizing information into something meaningful, and this is the role of statistics. A key statistical concept that underlies virtually all of modern analysis is correlation. We use it all the time--we wouldn't know what we know without it. But we are also routinely admonished not to equate correlation with causation. Two things can be correlated without one causing the other. Does that matter? Daniel Engber of Slate discusses this issue very intelligently and with some new historical information generated from Google books that I, for one, find fascinating.
Those first, modest peaks of "correlation is not causation" show up in print in the 1890s—a date that happens to coincide with the discovery of correlation itself. That's when the British statistician Karl Pearson introduced a powerful idea in math: that a relationship between two variables could be characterized according to its strength and expressed in numbers. Francis Galton had futzed around with correlations some years before, and a French naval officer named Auguste Bravais sketched out some relevant equations. But it was Pearson who gave the correlation its modern form and mathematics. He defined its role in science.
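The blog doesn't include code, but Pearson's idea is easy to make concrete. As an illustrative sketch (not part of the original post), here is his product-moment coefficient computed straight from its definition: the covariance of two variables divided by the product of their standard deviations, giving a strength-of-relationship number between -1 and 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient:
    covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# A perfectly linear relationship gives r = 1.0
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A value near 1 or -1 means a strong linear relationship; a value near 0 means little or none. Note that the number says nothing about *why* the two variables move together.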
And he digs up a great quote from more than 100 years ago:
The father of correlation did worry about its overuse, says Theodore Porter, a historian of science at UCLA and a Pearson specialist. A footnote to the second edition of The Grammar of Science, published in 1900, lays out a critique of spurious relationships in terms that would not look out of place on an Internet message board:
All causation as we have defined it is correlation, but the converse is not necessarily true, i.e. where we find correlation we cannot always predict causation. In a mixed African population of Kaffirs and Europeans, the former may be more subject to smallpox, yet it would be useless to assert darkness of skin (and not absence of vaccination) as a cause.
So it seems the fear of correlations was formalized—made into a turn of phrase, I mean—at around the time that correlations came into formal being. One might say (citing another correlation) that Pearson's work marks the transition from an age of causal links to one of mere relationships—from anecdotal science to applied statistics. As correlations split and multiplied, we needed to remind ourselves of what they meant and what they didn't.
The bottom line, though, is that correlations are important. You can't have causation without them, even if they don't imply causation on their own. But they do tell us that something is going on, and that we then need to figure out what that is. That's how science moves forward.
