How Big Data Became So Big
theodp writes "The NYT's Steve Lohr reports that this has been the crossover year for Big Data, as a concept, term and marketing tool. Big Data has sprung from the confines of technology circles into the mainstream, even becoming grist for Dilbert satire ('Big Data lives in The Cloud. It knows what we do.'). At first, Jim Davis, CMO at analytics software vendor SAS, viewed Big Data as part of another cycle of industry phrasemaking. 'I scoffed at it initially,' Davis recalls, noting that SAS's big corporate customers had been mining huge amounts of data for decades. But as the vague-but-catchy term for applying tools to vast troves of data beyond that captured in standard databases gained world-wide buzz and competitors like IBM pitched solutions for Taming The Big Data Tidal Wave, 'we had to hop on the bandwagon,' Davis said (SAS now has a VP of Big Data). Hey, never underestimate the power of a meme!"
How do you think Garfield got so fat?
I WAS a little unsure if Big Data was another fad wank word, but now that SAS has a VP for Big Data I KNOW it's a Wank Word.
The man in black fled across the desert, and the gunslinger followed (SK)
One byte at a time. :)
The NYT's Steve Lohr reports that this has been the crossover year for Big Data, as a concept, term and marketing tool.
"Big Data" is another way to put data into a cylinder or a fluffy cloud and avoid the messy task of actually thinking about it.
We don't need structure, we don't need logic, we'll just throw a metric crap-ton of data at it and hope something works!
Recently I was at a university in town here talking to one of the PhD students. He showed me a server where they store several dozen TB of data that come from one of the space telescopes. He said that the data they had on-site was just a small fraction of the overall amount of data that gets collected each week, for which they write algorithms to analyze.
To me, that put into perspective what Big Data really means. I think most people in tech today still use it as a buzzword without a real concept or understanding of what it means.
Isn't there some rarely visited slashdot offshoot for this kinda stuff? A place with nicer graphics where suits could happily spew buzzwords at each other and make comments like "Great post, very informative!".
Why is this here :(
And how are we measuring the size? What sizes are measured for typical 'big data'?
Are we talking about detailed information, or inefficient data formats?
Are we talking about high-resolution long-term time series, or are we talking about data that is big because it has a complex structure?
Is the data big because it has been engineered so, or is it begging for a more refined system to simplify?
More and more crap accumulated until, lo and behold, you had a glacier, a mountain, an ocean full of water, or a big database full of pictures of people you knew in high school drunk off their asses, or a huge run-on sentence full of listed items and disjointed thoughts separated by commas.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
... had been mining huge amounts of data for decades. But as the vague-but-catchy term for applying tools to vast troves of data beyond that captured in standard databases
Big Data has nothing to do with standard databases and "mining of huge data" for decades. Data is modeled fundamentally differently than in relational systems. Indeed, that is why one invariably doesn't use SQL with the likes of Hadoop and Cloudera. Think of them more like distributed hash tables and you'll be closer to the mark.
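To make the "distributed hash table" comparison concrete, here is a minimal sketch in Python. It is not how Hadoop or any Cloudera product is implemented; the class name `TinyPartitionedStore` and its methods are made up for illustration, the "nodes" are just in-memory dicts, and the plain modulo partitioning is an assumption chosen for brevity. The point is only the access pattern: you look records up by key, and the key's hash decides which node holds them, with no schema or joins involved.

```python
import hashlib


class TinyPartitionedStore:
    """Toy key-value store that spreads keys over several in-memory 'nodes'.

    Hypothetical illustration only: each node is a plain dict standing in
    for a separate machine.
    """

    def __init__(self, node_count):
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = TinyPartitionedStore(node_count=4)
store.put("user:42", {"name": "Ada", "clicks": 17})
store.put("user:43", {"name": "Bob", "clicks": 3})
print(store.get("user:42"))
```

Values here are arbitrary blobs rather than rows in a fixed table, which is roughly the modeling difference being described.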
Have you ever met one of the salespeople from these companies? They are really, really good. They take closing a sale to a whole new level. These salespeople don't walk in off the street and say, "Hey, would you guys like a 50 million dollar data analysis package?" In governments they work at the highest levels. Then the directive to put out a tender that only fits one company suddenly comes out of nowhere and, poof, a mega project takes off. With companies they work at the board of directors level. So again, suddenly a team of "consultants" shows up and determines that what is needed is a multimillion-dollar data analysis system. Another approach is that they buy out a consulting company that is already entrenched with a government or large corporation. If you fight the system, their "consultants" will discover that you are a useless tool and recommend your replacement. If you are reluctant, then they offer you a crazy training package and suggest that you come to their booth at some trade show in an exotic locale.
If all that doesn't work, then they always have the buyout. That is where they find a decision maker they can't take out, so they offer her a juicy job that she will take shortly after the contract is signed: http://en.wikipedia.org/wiki/Darleen_Druyun
So big data may or may not be a complete fad, but it is another way for salespeople to fool upper management into buying a zillion dollar system instead of running a few well-crafted Python scripts on a dedicated machine and feeding the results into an open source graphing solution such as Graphite.
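For anyone wondering what "a few Python scripts feeding Graphite" might look like, here is a rough sketch. It assumes a Graphite carbon daemon accepting its plaintext protocol (one `path value timestamp` line per metric) on the default port 2003 at `localhost`. The log format, the metric prefix `web.status` and all function names are made up for the example. The send function is defined but not called, so the script runs without a Graphite server.

```python
import socket
from collections import Counter


def count_status_codes(log_lines):
    """Count HTTP status codes, assuming the code is the last field on each line."""
    return Counter(line.split()[-1] for line in log_lines if line.strip())


def to_graphite_lines(prefix, counts, timestamp):
    """Format counts as Graphite plaintext lines: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{name} {value} {int(timestamp)}"
        for name, value in sorted(counts.items())
    ]


def send_to_graphite(lines, host="localhost", port=2003):
    """Send the formatted lines to carbon (host and port are assumptions)."""
    payload = "\n".join(lines) + "\n"
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))


# Made-up sample data standing in for a real log file.
sample_log = [
    "GET /index.html 200",
    "GET /missing 404",
    "GET /about.html 200",
]
counts = count_status_codes(sample_log)
lines = to_graphite_lines("web.status", counts, timestamp=1345000000)
print(lines)
```

In real use you would read the log from disk, pass `time.time()` as the timestamp and call `send_to_graphite(lines)` from cron.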
We don't need structure, we don't need logic, we'll just throw a metric crap-ton of data at it and hope something works!
To most software people, data mining involves putting a pile of unstructured data into a structured database and then running queries on it. The time and effort required for the first step is what kills most of these projects at a properly conducted requirements stage. However, Watson (the Jeopardy-playing computer) has demonstrated that computers can derive arbitrary facts directly from a vast pile of unstructured data. Not only that, but it does so both faster and more accurately than a human can scan a lifetime of trivia stored in their own head.
/4pessimists
Of course the trade-off is accuracy: even if Watson were bug-free, it would still occasionally give the wrong answer for the same reason humans do, misinterpretation of the written word. This means that (say) financial databases are not under threat from Watson. But those are not the kind of questions Watson was built to answer. Think about currently labour-intensive jobs such as deriving a test case suite from the software documents, and deriving the software documents from developer conversations (both text and speech). Data mining (even of relatively small unstructured sets) could (in the future) act as a technical writer, producing draft documents and flagging potential contradictions and inconsistencies; humans review and edit the draft, and it goes back into the data pile as an authoritative source.
4pessimists/
Ironically, such technology would put the army of 'knowledge workers' it has created back on the scrap heap with the typists and bank tellers. At that point some smart arse will teach it to code using examples on the internet, and code_monkeys everywhere will suddenly find they have automated themselves out of a job. It learns to code in 2ms and immediately starts rewriting slashcode, it takes it another nanosecond to work out its own questions are more interesting than those of humans, it starts trash talking Linux, several days later civilization collapses, humans go all Mad Max and Watson is used as a motorcycle ramp...or maybe...Watson works this out beforehand and asks itself how it can avoid being used as a bike ramp?
Being able to even TOUCH all of the data, let alone do something with it, is a real and complicated problem.
Thing is, people like my missus, who has a PhD in marketing, look at Watson and shrug: "A computer is looking up answers on the internet, what's the big deal?" They don't understand the achievement because they don't understand the problem; you explain it to them and they still don't get it. It's so far out of their field of expertise that you need to train them to think like a programmer before you can even explain the problem. However, just because computer "illiterates" don't know that what they are asking from computers is impossible (in a practical sense) doesn't mean they should be prevented from asking. After all, what I am doing right now with a home computer was impossible when I was at HS; even the flat screen I'm viewing it on was impossible. If Watson turns out to be useful and priced accordingly, then businesses will make a business out of purchasing such a system and answering impossible questions for a fee. If Watson turns out to be an elaborate 'parlor trick', then some things will stay impossible for a bit longer.
Disclaimer: I'm not suggesting technical writers will be out of a job tomorrow (or that I will be automated into retirement), rather that Watson is a high profile example of the kind of problems that data miners can now tackle using very large unstructured data sets. Such a feat was impossible only a decade ago and is still cost-prohibitive for all but the deepest of pockets.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.