Friday, September 9, 2011

Correlation is not causation

Product managers have to do a certain amount of statistical research to understand the likely benefits of a new product feature.  We all know that two things happening at the same time, or one after the other, doesn't mean one causes the other, yet it's easy to fall into that trap.

For example, I recently got a new water bottle.  I find that when I use it, I drink a lot more water than when I go to the break room for a cup or sip from a prepackaged water bottle (e.g. Crystal Geyser).  If I wanted to market that water bottle, I might look for research about the health benefits of drinking more water.  A study done some time ago claimed that drinking more water helps you lose weight.  It compared groups of people and found that those who drank more water tended to be less prone to obesity than a comparison group that drank much less.

From this, you might conclude that you can advertise that this water bottle will help people lose weight.  Right?

Well, you may be able to advertise it, but no, it's not ironclad proof of the benefits.  ...and if you didn't see that coming, I'm sure you're not the only one.  It's a correlation: just because two things happen together doesn't mean one causes the other.  Suppose the groups were selected based on water consumption per floor of a building.  Suppose the group that drank more water turned out to be health fanatics who eat right, exercise regularly, and believe drinking more water has health benefits, whereas on the comparison floor someone brings in donuts every morning.  The link between eating right, exercising, and weighing less is also just a correlation in this data, and it introduces an additional unknown: which factor is most responsible for these people being less obese?  Exercise, diet, drinking more water, or something else?
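To see how a hidden factor like that can manufacture a correlation on its own, here's a small Python simulation.  It's a toy sketch with made-up numbers, not data from the study: a hypothetical "health-conscious" trait makes people both drink more water and weigh less, while water itself has no effect on weight at all.  The function names (`pearson`, `simulate`) and every parameter are my own assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=10_000, seed=42):
    """Hypothetical model: a hidden 'health-conscious' trait (the confounder)
    drives both water intake and weight. Water has NO effect on weight here."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        health_conscious = rng.random() < 0.5
        # Health-conscious people drink more water (cups per day)...
        water = rng.gauss(8 if health_conscious else 4, 1.5)
        # ...and, thanks to diet and exercise, have a lower BMI.
        # Note that 'water' does not appear in this line at all.
        bmi = rng.gauss(23 if health_conscious else 28, 2.0)
        people.append((health_conscious, water, bmi))
    return people


people = simulate()
overall = pearson([p[1] for p in people], [p[2] for p in people])
print(f"everyone: r = {overall:.2f}")  # clearly negative: more water, lower BMI

# Compare like with like: within each group the correlation vanishes.
for group in (True, False):
    sub = [p for p in people if p[0] == group]
    r = pearson([p[1] for p in sub], [p[2] for p in sub])
    print(f"health_conscious={group}: r = {r:.2f}")  # close to zero
```

Comparing health fanatics only with other health fanatics, and the donut floor only with itself, makes the apparent "water effect" disappear, which is what you'd expect when a confounder is doing the work.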

Proving causation requires much more research, so the most common workaround is for the marketing (or legal) team to claim something like "in a study, people who drank more water also tended to have fewer weight problems."  In this case that statement is true and implies a connection, and to be fair, the connection hasn't been proven, but it hasn't been disproven either.  (Some geographies have specific laws about health benefit claims in advertising, so if you're managing a product and claiming there's a health benefit, you'll want to look into that.)

Let's use another example: JibberJobber is a web site that provides tools for organizing a job search.  JibberJobber could advertise that people using their tool tend to get hired sooner than people who track job leads in a spreadsheet.  There may be a causal relationship (it was immensely helpful in my job search a couple of years ago!).  However, if you look at their clientele, these people are probably more organized to begin with and often have skills that help them stay organized, which also makes them more desirable candidates.  It's a fantastic tool, but from that comparison alone there's no way to tell whether using JibberJobber actually accelerates the job search or merely makes it easier to track job leads.

Correlation is not causation.