How Big Data Became So Big
theodp writes "The NYT's Steve Lohr reports that this has been the crossover year for Big Data as a concept, term and marketing tool. Big Data has sprung from the confines of technology circles into the mainstream, even becoming grist for Dilbert satire ('Big Data lives in The Cloud. It knows what we do.'). At first, Jim Davis, CMO at analytics software vendor SAS, viewed Big Data as part of another cycle of industry phrasemaking. 'I scoffed at it initially,' Davis recalls, noting that SAS's big corporate customers had been mining huge amounts of data for decades. But as the vague-but-catchy term for applying tools to vast troves of data beyond that captured in standard databases gained world-wide buzz and competitors like IBM pitched solutions for Taming The Big Data Tidal Wave, 'we had to hop on the bandwagon,' Davis said (SAS now has a VP of Big Data). Hey, never underestimate the power of a meme!"
How do you think Garfield got so fat?
I WAS a little unsure if Big Data was another fad, wank word but now that SAS has a VP for Big Data I KNOW it's a Wank Word
The man in black fled across the desert, and the gunslinger followed (SK)
One byte at a time. :)
The NYT's Steve Lohr reports that this has been the crossover year for Big Data as a concept, term and marketing tool.
"Big Data" is another way to put data into a cylinder or a fluffy cloud and avoid the messy task of actually thinking about it.
We don't need structure, we don't need logic, we'll just throw a metric crap-ton of data at it and hope something works!
Recently I was at a University in town here talking to one of the PhD students. He showed me a server where they store several dozens of TB of data that come from one of the space telescopes. He said that the data they had on-site was just a small fraction of the overall amount of data that gets collected each week, for which they write algorithms to analyze.
To me, that put into perspective what Big Data really means. I think that, for the most part, people in tech today still use it as a buzzword without a real concept or understanding of what it means.
Isn't there some rarely visited slashdot offshoot for this kinda stuff? A place with nicer graphics where suits could happily spew buzzwords at each other and make comments like "Great post, very informative!".
Why is this here :(
Maybe the servers were getting fat and bald and they decided that the only way they could get some attention was to just start flaunting their huge storage arrays?
Are you happy, Citizen ?
And how are we measuring the size? What sizes are measured for typical 'big data'?
Are we talking about detailed information, or inefficient data formats?
Are we talking about high-resolution long-term time series, or are we talking about data that is big because it has a complex structure?
Is the data big because it has been engineered so, or is it begging for a more refined system to simplify?
More and more crap accumulated until, lo and behold, you had a glacier, a mountain, an ocean full of water, or a big database full of pictures of people you knew in highschool drunk off their asses, or a huge run-on sentence full of listed items and disjointed thoughts separated by commas.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
First time I ever had to deal with more than 1 TB, I became nauseated. It took about two years for me to overcome that sickness. Today I don't care. I think it was from running a bbs on a 10 MB MFM drive, or learning what things 8 and then 16 bit processors could do, or how much data could fit on a floppy. QNX disk even. My mind would race and I would get SICK thinking about the data. It was a real dilemma and cut into my productivity until I finally just came out of it over time.
Some reading this may think me insane. But I bet a few of you have had this happen as well. It wasn't so bad with the 120GB drives, but when it went to 500G and 1TB that was it. Maybe it's cause we respected resources up until that point and now nobody gives a crap.
In light of the kinds of data now, it's beyond my comprehension. After Petabyte, I don't even know what comes next and they are x 1000 x 1000 x1000 past that from what I hear.
I'm a fan of these types of words - overuse of nebulous concepts like "The Cloud" and "Big Data" and "Infrastructure as a Service" helps clearly identify the office douchebags.
I want to delete my account but Slashdot doesn't allow it.
... had been mining huge amounts of data for decades. But as the vague-but-catchy term for applying tools to vast troves of data beyond that captured in standard databases
Big Data has nothing to do with standard databases and "mining of huge data" for decades. Data is modeled fundamentally differently than in relational systems. Indeed, that is why one invariably doesn't use SQL with the likes of Hadoop and Cloudera. Think of them more like distributed hash tables and you'll be closer to the mark.
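For anyone who wants the "distributed hash table" mental model spelled out, here is a minimal toy sketch, assuming nothing about how Hadoop actually places data. The names `node_for_key` and `ToyCluster` are made up for illustration, and the "nodes" are just in-memory dicts. The only idea it shows is that a record is located by hashing its key, not by querying a schema.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Pick a node index for a key using a stable hash (hypothetical helper)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the node count.
    return int.from_bytes(digest[:8], "big") % num_nodes


class ToyCluster:
    """Toy stand-in for a cluster: each 'node' is just an in-memory dict."""

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value) -> None:
        # The key alone decides which node stores the record.
        self.nodes[node_for_key(key, len(self.nodes))][key] = value

    def get(self, key: str):
        # Lookups hash the key again and ask only that one node.
        return self.nodes[node_for_key(key, len(self.nodes))].get(key)


cluster = ToyCluster(num_nodes=4)
cluster.put("user:1001", {"clicks": 17})
cluster.put("user:1002", {"clicks": 3})
print(cluster.get("user:1001"))
```

Real systems add replication, rebalancing and failure handling on top; plain modulo placement like this reshuffles most keys whenever the node count changes.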
Have you ever met one of the sales people from these companies? They are really really good. They take closing a sale to a whole new level. These salespeople don't walk in off the street and say, "Hey would you guys like a 50 million dollar data analysis package?" In governments they work at the highest levels. Then the directive to put out a tender that only fits one company suddenly comes out of nowhere and poof a mega project takes off. With companies they work at the board of directors level. So again suddenly a team of "consultants" shows up and determines that what is needed is a multi-million dollar data analysis system. Another approach is that they buy out a consulting company that is already entrenched with a government or large corporation. If you fight the system, their "consultants" will discover that you are a useless tool and recommend your replacement. If you are reluctant, they offer you a crazy training package and suggest that you come to their booth at a trade show in an exotic locale.
If all that doesn't work then they always just have the buy out. That is where they find a decision maker they can't take out but they offer her a juicy job that she will take shortly after the contract is signed: http://en.wikipedia.org/wiki/Darleen_Druyun
So big data may or may not be a complete fad, but it is another way for sales people to fool upper management into buying a zillion dollar system instead of running a few well-crafted Python scripts on a dedicated machine and feeding their output into an open source graphing solution such as Graphite.
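To make the "few Python scripts feeding Graphite" idea concrete, here is a minimal sketch, assuming a Carbon daemon listening for Graphite's plaintext protocol (one `path value timestamp` line per metric, port 2003 by default). The host name, the metric path and the helper names `format_metric` and `send_metrics` are placeholders for illustration, not part of any Graphite client library.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metrics(lines, host="localhost", port=2003):
    """Send preformatted lines to a Carbon listener (host and port are assumptions)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


# Format a sample metric; the metric name is hypothetical.
line = format_metric("reports.nightly.rows_processed", 125000, 1340000000)
print(line, end="")
# send_metrics([line])  # uncomment once a Carbon listener is actually reachable
```

The analysis script itself can be whatever crunches your numbers; the point is that getting the result onto a graph takes one line of text per data point.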
Don't knock SAS - they are elite soldiers. Wait, wrong SAS. Really fast hard drives? Wrong again.
http://en.wikipedia.org/wiki/IBM_1360
I sometimes go to SlashBI.
Just to look at the tumbleweeds, mind.
Obtuse and seemingly inane is the hallmark of good nerd humor, IMO. Laugh at the clueless and rock on.
They feared that it could be used to suppress protest or support unpopular rule.
An excellent three-word critique of the Big Data phenomenon.
You should be put in prison for 'murdering the English Language', Roman Maroni -> http://www.youtube.com/watch?v=6GVCgTFw2Qk
Obese data means being too big to fail. That's why it's such an attention-getter these days.
Because it was so cromulent.
... is just another name for the ignorance we cling to so desperately to avoid having to actually solve problems
We don't need structure, we don't need logic, we'll just throw a metric crap-ton of data at it and hope something works!
To most software people, data mining involves putting a pile of unstructured data into a structured database and then running queries on it. The time and effort required for the first step is what kills most of these projects at a properly conducted requirements stage. However, Watson (the Jeopardy-playing computer) has demonstrated that computers can derive arbitrary facts directly from a vast pile of unstructured data. Not only that, but it does so both faster and more accurately than a human can scan a lifetime of trivia stored in their own head.
/4pessimists
Of course the trade-off is accuracy: even if Watson were bug-free, it would still occasionally give the wrong answer for the same reason humans do, misinterpretation of the written word. This means that (say) financial databases are not under threat from Watson. But those are not the kind of questions Watson was built to answer. Think about currently labour-intensive jobs such as deriving a test case suite from the software documents, and deriving the software documents from developer conversations (both text and speech). Data mining (even of relatively small unstructured sets) could (in the future) act as a technical writer, producing draft documents and flagging potential contradictions and inconsistencies; humans review and edit the draft, and it goes back into the data pile as an authoritative source.
4pessimists/
Ironically such technology would put the army of 'knowledge workers' it has created back on the scrap heap with the typists and bank tellers. At that point some smart arse will teach it to code using examples on the internet and code_monkeys everywhere will suddenly find they have automated themselves out of a job. It learns to code in 2ms and immediately starts rewriting slashcode, it takes it another nanosecond to work out its own questions are more interesting than those of humans, it starts trash talking Linux, several days later civilization collapses, humans go all Mad Max and Watson is used as a motorcycle ramp...or maybe...Watson works this out beforehand and asks itself how it can avoid being used as a bike ramp?
Being able to even TOUCH all of the data, let alone do something with it, is a real and complicated problem.
Thing is, people like my missus, who has a PhD in Marketing, look at Watson and shrug - "A computer is looking up answers on the internet, what's the big deal?". They don't understand the achievement because they don't understand the problem; you explain it to them and they still don't get it. It's so far out of their field of expertise that you need to train them to think like a programmer before you can even explain the problem. However, just because computer "illiterates" don't know that what they are asking from computers is impossible (in a practical sense) doesn't mean they should be prevented from asking. After all, what I am doing right now with a home computer was impossible when I was at HS; even the flat screen I'm viewing it on was impossible. If Watson turns out to be useful and priced accordingly, then businesses will make a business out of purchasing such a system and answering impossible questions for a fee. If Watson turns out to be an elaborate 'parlor trick', then some things will stay impossible for a bit longer.
Disclaimer: I'm not suggesting technical writers will be out of a job tomorrow (or that I will be automated into retirement), rather that Watson is a high-profile example of the kind of problems that data miners can now tackle using very large unstructured data sets. Such a feat was impossible only a decade ago and is still cost-prohibitive to all but the deepest of pockets.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
We all know big Data got bigger when Tasha Yar took advantage of his anatomically correct and fully functional manhood.
For a good read on this problem, I highly recommend the Fourth Paradigm: http://research.microsoft.com/en-us/collaboration/fourthparadigm/ .
This is a free ebook download from Microsoft in which leaders in data-driven science write chapters about a variety of scientific disciplines and what "big data" means to them. The first chapter is especially enlightening! Blurb about the book:
Increasingly, scientific breakthroughs will be powered by advanced computing capabilities that help researchers manipulate and explore massive datasets.
The speed at which any given scientific discipline advances will depend on how well its researchers collaborate with one another, and with technologists, in areas of eScience such as databases, workflow management, visualization, and cloud computing technologies.
In The Fourth Paradigm: Data-Intensive Scientific Discovery, the collection of essays expands on the vision of pioneering computer scientist Jim Gray for a new, fourth paradigm of discovery based on data-intensive science and offers insights into how it can be fully realized.