Big Data misses the difference between correlation and causation


Big data. It’s all the rage. Somewhere, someone decided that, given our awesome computing capabilities, we can use them to predict behavior, or to understand the world, simply because our computer systems let us crunch the numbers.

Somehow, we’ve decided that we no longer have to know why we do things. We just have to watch others, and then, voilà, we’ll know what they are about to do.

That’s really the problem. Big data, and big number crunching, has no underlying theory of why things ‘are’ or why things ‘do’. Basically, the concept is to observe the patterns and estimate the probabilities of how things will occur.

No one seems to have remembered that this is why a lot of solutions fail. The proposed “prediction” lacks the theory to include causation, and just wants to solve problems by correlation. [I’ve written often about causation and correlation. Some examples are found here, here, and here, for starters.] And the scary thing is that many believe this technique is infallible. As Mayer-Schönberger and Cukier claim in their tome, “Big Data”, correlations offer insights that are relatively clear, and “[t]hese insights often get obscured when we bring causality back into the picture.”


Really? It’s a problem if we try to discern what caused something? Wow! That explains a lot of politics: let’s correlate things and never determine if there is any causality, and let America (or France, or Germany) be damned! That’s the problem with this concept. The scientific process of hypothesis and testing no longer applies, even though it’s the primary process that works. So, you never understand what is going on. You can only predict… as long as there is NO new deal breaker, no singularity, no great disrupter.
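For anyone who wants to see the gap between correlation and causation in numbers, here is a minimal sketch in Python. It is a toy model, not anything taken from the book: the hidden factor `z`, the function names, and the sample size are all assumptions made up for illustration. A hidden cause drives both `x` and `y`, so the two move together when we merely watch. When we run the test ourselves and set `x` independently, the link should vanish.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_xy(n=5000, seed=0, intervene=False):
    """Correlation between x and y when a hidden factor z drives both.

    In this toy model x never causes y (that is an assumption we build in).
    With intervene=True we set x ourselves, ignoring z, which is what a
    controlled experiment does.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)            # hidden common cause (hypothetical)
        noise_x = rng.gauss(0, 1)
        x = noise_x if intervene else z + noise_x
        y = z + rng.gauss(0, 1)        # y depends on z only, never on x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


observed = correlation_xy(intervene=False)
experiment = correlation_xy(intervene=True)
print(f"watching only: r = {observed:.2f}")
print(f"after testing: r = {experiment:.2f}")
```

Under these assumptions the first figure should land near 0.5 and the second near zero. The pattern in the observed data is real, but acting on `x` would change nothing about `y`, and only the experiment reveals that.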

And it also assumes that we, the humans, never do things because we are bored, because we deceive ourselves, or because we learn from experience. Or because we can be convinced by others of the folly of our ways.

Maybe that’s why we keep checking for shoe bombs and airplanes hitting buildings, instead of pressure-cooker bombs (easy to make) stuffed inside backpacks or machine guns with 30-round clips killing six-year-olds.

Maybe these folks need to remember that when things go right and we succeed, we are sure it’s due to our own efforts. But when we fail, well, something else screwed it up. No, folks, causality is key. We can learn more by searching for errors and deviations, discerning how those ‘outliers’ happened, and then keeping them from recurring.

Big data can help us discern that poor kids who don’t have decent nutrition will do poorly in school. Using that, some will decide not to bother educating them, since they are bound to fail, instead of intervening, feeding them, helping them succeed, and thereby increasing the success and market size for all of us.

Oh, and didn’t big data predict that the housing boom would continue forever?



10 thoughts on “Big Data misses the difference between correlation and causation”

  1. Now I understand why so much in politics fails. Surely everyone with blue eyes will also be left-handed. I don’t trust the numbers; they change the base until they get the numbers to work the way they want them to.
    Chef William
