'Data Science' Is Dead
Nerval's Lobster writes "If you're going to make up a cool-sounding job title for yourself, 'Data Scientist' seems to fit the bill. When you put 'Data Scientist' on your resume, recruiters perk up, don't they? Go to the Strata conference and look on the jobs board — every company wants to hire Data Scientists. Time to jump aboard that bandwagon, right? Wrong, argues Miko Matsumura in a new column. 'Not only is Data Science not a science, it's not even a good job prospect,' he writes. 'Companies continue to burn millions of dollars to collect and gamely pick through the data under respective roofs. What's the time-to-value of the average "Big Data" project? How about "Never?"' After the 'Big Data' buzz cools a bit, he argues, it will be clear to everyone that 'Data Science' is dead and the job function of 'Data Scientist' will have jumped the shark."
Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving. But surfacing insight from a rotting pile of enterprise data is a ghastly process—at best.
Sounds like this Miko Matsumura has no idea how successful Big Data projects actually work.
To refine his analogy, unstructured data is much like processing recyclables. Everything that might possibly be good gets thrown into a large bin, and several sorting processes run to extract individual relevant (though messy) pieces. While those pieces alone aren't pure enough to be useful, there's enough meaningful information in them that statistical analysis can separate the good from the bad, and that's where the insight comes from.
With a typical RDBMS, insight is readily apparent. A hypothesis that 75% of a user's purchases were widgets is simple to verify. In a non-relational database, as is often used in Big Data projects, that would be an inefficient computation (though it can be done). Rather, those databases are more aligned to produce a whole list of correlations between user demographics and purchasing habits, showing for example that users who buy widgets have often already bought foo bars. The "Data Scientist" didn't have to ever look specifically at statistics for widgets or foo bars, but the correlation is presented in a nice and accessible form, gleaned from millions or billions of independent data points.
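To make the contrast concrete, here is a rough sketch in plain Python. It is not how any particular database does it. The purchase log, the user names, and the helpers `share_of_item` and `co_purchase_counts` are all made up for illustration. The first function checks a single targeted hypothesis (what share of one user's purchases were widgets); the second counts co-purchases across every pair of items, without anyone asking about widgets or foo bars specifically.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase log: (user, item) pairs invented for illustration.
purchases = [
    ("alice", "widget"), ("alice", "widget"),
    ("alice", "widget"), ("alice", "foo bar"),
    ("bob", "foo bar"), ("bob", "widget"),
    ("carol", "gadget"),
]

def share_of_item(purchases, user, item):
    """Targeted question: fraction of one user's purchases that were `item`."""
    items = [i for u, i in purchases if u == user]
    return items.count(item) / len(items) if items else 0.0

def co_purchase_counts(purchases):
    """Untargeted question: for every pair of items, how many users bought both."""
    baskets = {}
    for user, item in purchases:
        baskets.setdefault(user, set()).add(item)
    pairs = Counter()
    for items in baskets.values():
        # Sort so each pair is always counted under the same key.
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return pairs

if __name__ == "__main__":
    print(share_of_item(purchases, "alice", "widget"))
    print(co_purchase_counts(purchases).most_common(3))
```

On real data the second function would be a distributed job rather than a loop in memory, but the shape of the question is the same.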
Miko Matsumura is a Vice President at Hazelcast, an open source in-memory data grid company.
This is a SlashBI article written by executives for executives, with little basis in fact. Lovely.
90% of what a data science expert does is what people like to call data jujitsu (data reconfiguration). That basically means getting data out of your RDBMSs, SAP, Twitter, Facebook, random text file dumps (.csv, etc.), random Excel/Word files, and legacy databases, and into some place you can actually generate conclusions from (such as HDFS on a Hadoop cluster). During this process you also need to normalize all your data so you can apply the same algorithm no matter where the data came from.
All this means is that you will spend countless hours trying to connect to the client's legacy stuff and then countless hours trying to get the data out (without impacting production systems!), so you can then spend countless hours reformatting this data to be able to spend countless hours trying to get it into your Big Data(tm) solution so you can finally run some algorithms and create results. Now multiply all that by the number of different kinds of databases the client has and you get the idea.
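For a feel of what the normalization step looks like at toy scale, here is a minimal sketch in Python using only the standard library. Everything specific in it is an assumption for illustration: the two sample exports, their field names (`Customer`, `Item`, `Amount`, `product`, `price_cents`), and the target schema of `user`, `item`, `amount`. Each source gets its own small adapter that emits records in one common shape, so downstream code does not care where a record came from.

```python
import csv
import io
import json

# Hypothetical raw exports; the field names are invented for illustration.
CSV_DUMP = "Customer,Item,Amount\nAlice ,Widget,9.99\n"
JSON_DUMP = '[{"user": "bob", "product": "FOO BAR", "price_cents": 1250}]'

def from_csv(text):
    """Adapter for the CSV export: trims whitespace, lowercases names."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "user": row["Customer"].strip().lower(),
            "item": row["Item"].strip().lower(),
            "amount": round(float(row["Amount"]), 2),
        }

def from_json(text):
    """Adapter for the JSON export: converts cents to currency units."""
    for rec in json.loads(text):
        yield {
            "user": rec["user"].strip().lower(),
            "item": rec["product"].strip().lower(),
            "amount": rec["price_cents"] / 100,
        }

def normalize_all():
    """Merge every source into one list of records with the same keys."""
    return list(from_csv(CSV_DUMP)) + list(from_json(JSON_DUMP))

if __name__ == "__main__":
    for record in normalize_all():
        print(record)
```

Two adapters are easy. The pain described above comes from needing one per system, each with its own quirks, plus the work of extracting the data in the first place.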
As an IT professional you really do not want to work in this field. No organization keeps its data in a clean, uniform way, so a data scientist ends up being something like an IT janitor.