Simple-Talk columnist

Big Data and the Slough of Despond

Published 26 February 2013 1:21 pm

“This miry Slough is such a place as cannot be mended; it is the descent whither the scum and filth that attends conviction for sin doth continually run, and therefore is it called the Slough of Despond: for still as the sinner is awakened about his lost condition, there ariseth in his soul many fears, and doubts, and discouraging apprehensions, which all of them get together, and settle in this place; and this is the reason of the badness of this ground”.

John Bunyan, The Pilgrim’s Progress, 1678

For the past thirty-five years, I’ve been regularly staring at the promises and announcements of the IT marketing men and thinking, “what is the incredible technical breakthrough that has allowed this to happen?” Every time, it turns out that there has been no giant leap forward in the technology, only blather. It is the way the industry works.

It is a particular relief that the Big Data soufflé has at last collapsed into froth. The idea that an analysis of the background noise and chatter of the interwebs, or logs of consumer behaviour, can reliably give you commercial insights is as daft as listening to the rushing of the waves through a seashell and imagining the voices of the sirens. The trouble is that, without a knowledge of statistics and a serious approach to rejecting any conclusion that could have arisen by chance, or from some unknown factor, you are a danger to anyone who respects your data analysis. If you rummage through data merely to fish for ‘insights’, you will always come up with exciting correlations and trends that are likely to have occurred by chance.
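To see how easily fishing produces 'insights', here is a minimal sketch in Python. It is an illustration under stated assumptions, not an analysis of any real data: the series are pure random noise, and the function names, the sample size of 30 and the count of 1,000 candidate series are all invented for the example. The cut-off of 0.361 is the usual tabulated critical value of the correlation coefficient for 30 points at the two-tailed 5% level.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_series=1000, n_points=30, threshold=0.361, seed=0):
    """Count noise series that 'correlate' with an unrelated noise target.

    Every series is independent Gaussian noise, so any correlation that
    clears the threshold is a false positive by construction.
    (All parameter values here are assumptions made for illustration.)
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    hits = 0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson(target, candidate)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious()
    print(f"{hits} of 1000 noise series look 'significant' at the 5% level")
```

Roughly one series in twenty should clear the bar despite having no connection whatever to the target, which is all that a 5% significance level promises. Test enough candidate factors and some will always 'work'; raising the threshold (try `count_spurious(threshold=0.7)`) or correcting for the number of comparisons is the sort of discipline the paragraph above is asking for.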

I object not to the science but to the marketing hype. The tools are certainly there to help you to tease out factors that relate to any commercial trend. I remember when a ‘rocket scientist’ friend of mine in the City of London worked out, a while ago, that the price of sugar futures contracts correlated directly with the weather in Chicago. He used an Exploratory Factor Analysis technique with a huge bank of data. Sure, heavy rains in Brazil, India and the United States, especially around harvest time, can hike the price of sugar futures, but the most significant correlation in price movement was with the occurrence of weather depressions in Chicago. Why? At the time, Chicago was where the bulk of sugar futures trading took place. The traders merely looked out of the windows and reacted instinctively. It was an insight that the bank that employed my statistician friend used very profitably.

Nothing much has changed in the art of analyzing data to gain marketing insights. It is as hard as it always was, except that it’s now possible to draw entirely the wrong conclusion from data much faster than ever before. As always, the problem is an intellectual, rather than a technical, one. It’s a problem of ensuring the quality of the data, understanding probability and population samples, and resisting the inclination to project your own beliefs onto your findings. Having the technological tools to hand to allow you to see further is no use if you’re looking in the wrong direction.

Here is just a sample of the current debate about Big Data.

3 Responses to “Big Data and the Slough of Despond”

  1. Rockstar says:

    Agree!

    Phil, you are familiar with my love of statistics. I couldn’t agree more as to how the vultures have come out in force to insist that “big data” solutions are going to give them the insights to remain competitive (or gain an advantage).

    At the end of the day the formula for success is fairly simple: build something awesome that people want and provide amazing support for whatever you build.

    Do those two things and you don’t need insights from Big Data. You get insights by talking to your customers. Even your friend with the weather in Chicago…he got his info by talking to someone and using logic, not by mining exabytes of data.

    All this talk of Big Data and statistics reminds me that I’ve got a blog series I’ve been meaning to get done…

    Tom

  2. Robert Young says:

    The problem with Big Data is the Big part. One can spend time looking in a haystack for a needle; however, it’s only a meaningful exercise if that needle is rather large and made of platinum. CIA, NSA, and MI6 have been doing Big Data for decades (with obviously mixed results); Cray made a ton of money selling them very fast Big Iron.

    The Big Data folks remind me of the Long Tail folks of a decade or so ago. Amazon has proved that there’s very little profit in Long Tail. Apple, on the other hand, has shown that one can get rich shifting gazillions of one or two SKUs.

    The other silly factor emerging, and whether there’s a correlation I’m not sure, is the surge in Bayesianism. In a nutshell, the Bayesian methods “allow” analyst bias to be raised to conclusive evidence. I’m reading up a recent text on Bayes, know your enemy kind of thing, and the author boasts that 20th century stats (so-called Frequentist) is being overturned in the 21st by an 18th century cleric. What isn’t a coincidence is that modern microprocessors (and mainframe, too; what is called a Cray today is assembled from commodity cpus/gpus) have made the simulations needed to do such analysis nearly trivial. Kind of: if the only tool you have is a hammer, everything tends to look like a nail. Doesn’t mean the needle is worth the effort.

    Yet, Nate Silver, and others, managed to execute meta-analyses on very small sample surveys of voters to predict elections.

  3. Duke Ganote says:

    I love the reference to Pilgrim’s Progress! However, I think Vincent McBurney is “spot on” when he traces Big Data back to the tabulating machines of the 1890 U.S. Census. The new Frontier of Big Data opened as the American Frontier closed.
