'Data Science' Is Dead
Nerval's Lobster writes "If you're going to make up a cool-sounding job title for yourself, 'Data Scientist' seems to fit the bill. When you put 'Data Scientist' on your resume, recruiters perk up, don't they? Go to the Strata conference and look on the jobs board — every company wants to hire Data Scientists. Time to jump aboard that bandwagon, right? Wrong, argues Miko Matsumura in a new column. 'Not only is Data Science not a science, it's not even a good job prospect,' he writes. 'Companies continue to burn millions of dollars to collect and gamely pick through the data under respective roofs. What's the time-to-value of the average "Big Data" project? How about "Never?"' After the 'Big Data' buzz cools a bit, he argues, it will be clear to everyone that 'Data Science' is dead and the job function of 'Data Scientist' will have jumped the shark."
Call yourself a statistician or database engineer and I promise there are still jobs around. And contrary to the summary, they are highly valuable jobs.
Some people die at 25 and aren't buried until 75. -Benjamin Franklin
But this general domain in the realm of contemporary giant data sets is the basic science research of our times. To say that 'data scientist' roles are dead in the near future based on a ROI analysis is to suggest that all these huge data sets aren't likely to pay off for a corp in the near future. And that doesn't sound right at all.
That's a very strong claim, I'll need to consult my Data Scientist to see if it actually fits the data.
No data has been cited during the creation of that blog post.
Opinion is fine, but when the observations are so sweeping, just a little bit of substantiation is nice to have.
How to prevent more people from flocking into your field:
1) Write a Slashdot article
2) ???
3) Profit!
In soviet russia the government regulates the companies.
The author of this piece has clearly never done actual science, as confirmed by his resume, and his opinions on what science is, and that somehow some observational sciences are "soft", are very questionable at best.
Shouldn't there be a link to an article or a more in-depth argument presented than just "b/c i think so"? Perhaps, say, explain who the hell Miko Matsumura is, or provide greater context?
I get it though, nobody reads the articles on slashdot... :/
In my career I have worked for boring banks and boring monolithic enterprise software giants.
If there is one thing I know for certain it is that big enterprise will ALWAYS have a huge appetite for quantification of data. It almost doesn't matter if it actually does anything for you; executives at giant corporations have to DO SOMETHING, have to REVIEW SOMETHING. Large scale data aggregation and reporting (one of the many things that go by "big data") might not have sciency uses, but any time a V level can provide a C level with "something" that says "We are doing stuff" there will be a huge market for it.
Basically what I am saying is, even if "Big Data" is nothing but a placebo, like say "HR Training", "Wellness programs", "performance reviews" or "teambuilding" it is a permanent fixture in the big, boring, high paying, stable job providing corporate world.
You can't do that, unless you can figure out how to make and file TWO resumes. Different ones, I mean.
Man, these data scientists are all pipe dreams.
Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving. But surfacing insight from a rotting pile of enterprise data is a ghastly process—at best.
Sounds like this Miko Matsumura has no idea how successful Big Data projects actually work.
To refine his analogy, unstructured data is much like processing recyclables. Everything that might possibly be good gets thrown into a large bin, and several sorting processes run to extract individual relevant (though messy) pieces. While those pieces alone aren't pure enough to be useful, there's enough meaningful information in them that statistical analysis can separate the good from the bad, and that's where the insight comes from.
With a typical RDBMS, insight is readily apparent. A hypothesis that 75% of a user's purchases were widgets is simple to verify. In a non-relational database, as is often used in Big Data projects, that would be an inefficient computation (though it can be done). Rather, those databases are more aligned to produce a whole list of correlations between user demographics and purchasing habits, showing for example that users who buy widgets have often already bought foo bars. The "Data Scientist" didn't have to ever look specifically at statistics for widgets or foo bars, but the correlation is presented in a nice and accessible form, gleaned from millions or billions of independent data points.
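A rough sketch of the two styles of question described above, in Python. Everything here is an assumption for illustration: the purchase log is invented, `widget_share` and `co_purchase_counts` are hypothetical helper names, and sqlite3 simply stands in for "a typical RDBMS". The co-occurrence count also ignores purchase order, so it only approximates "have often already bought".

```python
import sqlite3
from collections import Counter
from itertools import combinations

# Hypothetical purchase log of (user, item) pairs, invented for illustration.
PURCHASES = [
    ("alice", "widget"), ("alice", "widget"), ("alice", "widget"),
    ("alice", "foo bar"),
    ("bob", "foo bar"), ("bob", "widget"),
    ("carol", "gizmo"),
]


def widget_share(user):
    """RDBMS-style check: fraction of one user's purchases that were widgets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user TEXT, item TEXT)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", PURCHASES)
    # In SQLite a comparison evaluates to 0 or 1, so AVG gives the fraction.
    row = conn.execute(
        "SELECT AVG(item = 'widget') FROM purchases WHERE user = ?", (user,)
    ).fetchone()
    conn.close()
    return row[0]  # None if the user has no purchases


def co_purchase_counts(purchases):
    """Exploratory style: for every pair of items, count users who bought both."""
    baskets = {}
    for user, item in purchases:
        baskets.setdefault(user, set()).add(item)
    counts = Counter()
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    print(widget_share("alice"))
    print(co_purchase_counts(PURCHASES).most_common(3))
```

The first function answers one hypothesis somebody already had in mind; the second surfaces every pair at once, which is closer to the "whole list of correlations" idea, just at toy scale.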
Miko Matsumura is a Vice President at Hazelcast, an open source in-memory data grid company.
This is a SlashBI article written by executives for executives, with little basis in fact. Lovely.
You do not have a moral or legal right to do absolutely anything you want.
Since "Data Science" is dead, do we go back to using the old buzzwords? Or do we have to wait until some marketing MBA whiz-kid comes up with a sexy new word for "Analyst"?
The term reminds me of "Computer Scientist". I remember a TV commercial from the 80s for a digital watch that mimicked analog watches. The announcer would declare that the watch had been designed by "computer scientists" while an actor was displayed wearing a lab coat and looking at the watch under a microscope. The first time I saw it I was afflicted with fits of laughter.
Proverbs 21:19
Sometimes I think slashdot should really do a better job of filtering these types of things, or at least highlight that this is an opinion and the person writing it has no clue what they are really talking about. I work in the BI space and do everything from Analysis, Architecture, Dashboards, Reporting, ETL, and any other job that fits into that space. We do have a data scientist here and he does nothing close to what this article talks about. In fact, I would argue that if you do the types of jobs this article talks about you're not actually a data scientist, you're a DBA, BA, or something of the sort. A Data Scientist is something very different, and typically they don't have the IT background to create SQL or do anything in the back-end. They do know stats, various algorithms, and can actually take meaningful numbers and explain what they mean, find new trends, and even identify correlations between attributes that the business never thought to look at. They are used to determine what's going on and maybe even why, and not necessarily used to answer a specific business question.
90% of what a data science expert does is what people like to call data jujitsu (data reconfiguration). Which basically means getting data out of your RDBMSs, SAP, Twitter, Facebook, random text (.csv, etc) file dumps, random Excel/Word files and legacy databases and into some place you can actually generate conclusions from (like inside an HDFS Hadoop cluster). Plus during this process you need to normalize all your data so you can apply the same algorithm no matter where the data came from.
All this means is that you will spend countless hours trying to connect to the client legacy stuff and then countless hours trying to get the data out (without impacting production systems!), so you can then spend countless hours formatting this data around to be able to spend countless hours trying to get this data into your Big Data(tm) solution so you can finally run some algorithms and create results. Now multiply all that by the number of different kinds of databases the client has and you get the idea.
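To make the "normalize everything so one algorithm works on all of it" step concrete, here is a minimal Python sketch. The two source formats (a legacy CSV dump with DD/MM/YYYY dates and an API feed with epoch timestamps), all field names, and the common schema are assumptions made up for the example, not anything a real client system is known to use.

```python
import csv
import io
from datetime import datetime, timezone


def from_csv_dump(text):
    """Hypothetical legacy export: 'Cust Name,Amt,Date' with DD/MM/YYYY dates."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "customer": row["Cust Name"].strip().lower(),
            "amount_cents": round(float(row["Amt"]) * 100),
            "date": datetime.strptime(row["Date"], "%d/%m/%Y").date().isoformat(),
        }


def from_api_records(records):
    """Hypothetical web API: dicts with integer cents and epoch-second timestamps."""
    for rec in records:
        yield {
            "customer": rec["user"].strip().lower(),
            "amount_cents": int(rec["cents"]),
            "date": datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
            .date()
            .isoformat(),
        }


def total_by_customer(rows):
    """One 'algorithm' that runs unchanged on rows from any source."""
    totals = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["amount_cents"]
    return totals


if __name__ == "__main__":
    csv_text = "Cust Name,Amt,Date\n Alice ,12.50,03/02/2014\n"
    api = [{"user": "ALICE", "cents": 250, "ts": 0}]
    rows = list(from_csv_dump(csv_text)) + list(from_api_records(api))
    print(total_by_customer(rows))
```

Each new source needs its own adapter like the two above, which is where the countless hours go; the analysis at the end is the short part.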
As an IT professional you really do not want to work in this field. No organization keeps its data in a clean, uniform way; a data scientist is like an IT janitor.
"Science" lacks a robust definition, but clearly the OP's definition is overly simplistic and narrow. Stephen Hawking has a lecture somewhere (found it: http://www.hawking.org.uk/the-origin-of-the-universe.html) where he talks about the idea of the "positivist" approach defined on the ability to predict outcomes, and I like to apply that definition to Science (Hawking doesn't, directly, but it's sort of an underlying theme). That is, Science becomes the observation and experimentation required to form predictions or cause changes in predicted outcomes.
So Social Science can be a science in so far as it actually informs usefully on how people will behave or provides useful ways to affect and improve the behavior or state of society's future. Computer science is a science insofar as it is required to make computers function as expected (as predicted) -- if you want something to perform faster, you must do the research and experimentation to cause the outcome to be faster. Even archaeology can be a science by this definition in that discoveries are added to a general model of the past that predicted all sorts of things -- ancient society's behavior, glaciation, geological events... "predict" may be a stretch there (except when archaeological finds help predict the future), but in this case the method of building a model of how the world worked based on observation to describe and generalize behavior (of the earth, of ancient religions, or what have you) is a form of prediction; it's just after the fact.
Data Science is very much science in this form; the job of a data scientist is almost universally to predict what the data will say about the future given what it has said in the past. This is invaluable to businesses, and while the name may fall into disfavor, in the same way "actuary", which means something very similar, already has, the abuse in this article is unwarranted, unfounded, and inaccurate. I will only agree that many who sport the "Data Science" moniker may not actually be doing science by any definition, but that's the individual's fault, not the concept's.
I don't know about you but I am sick and tired of DICE's attempts to
channel and steer the employment market through astroturf postings
to Slashdot, which they also happen to own. Most of what the talking-heads
at DICE churn out regarding employment is simply untrue. Not 'not-the-truth'
as in they don't know any better, but lies, as in spreading deliberately
misleading information, as in telling a mean-spirited lie.
DICE is not a platform for you and me to find lucrative jobs. Instead it is the
other way around, DICE is a platform for employers to find cheap labor. The
people who in THE END PAY DICE (that is those who use their system to
recruit and those who advertise on DICE.COM sites), they are not interested
in hooking you up with a $150,000 job when you could also be working for
$80,000.
I'm not a Data Scientist myself, but I work with a bunch of them and from what
I see they are working on, I know I'd have to go back to school for that. It also
explains why they are worth so much and hard to get.
You can't do that, unless you can figure out how to make and file TWO resumes. Different ones, I mean.
Man, these data scientists are all pipe dreams.
Well, it is not rocket science to have more than one resume. You have one work history, but you will use more than one resume format to present it in different (but truthful) ways according to the situation.
See, you are supposed to have multiple versions of your resume (which are true and accurate of course) according to job postings or fields of concentration. If you have varied work experience, or you are contemplating lateral moves, this is a must.
Consider the following situation I had to deal with recently. After doing some C++ (and other programming bestialities), I switched to Java/JEE in the commercial sector. I did that for about 11 years at several small and large firms (Sony, Citicorp, Motorola, etc.).
Then switched back to C/C++, for embedded systems and communication technology (and a bit of hypervisor research) with a defense contractor. The opportunity was there, after doing e-commerce/enterprisey stuff for so long, this looked very interesting (and more engaging of my CS background) and the money was good, so why not I said.
Then just recently I tried to go back to Java, and all of a sudden my resume was being sent to the garbage can and job agencies were not submitting me to Java openings I was well qualified for.
Why is that? Well, apparently since I did C/C++ for nearly 4 years (ZOMG! no Java in 4 years!) somehow I became a retard who wouldn't know how to code EJBs, access a database, run an ant or maven build script or put a fucking dynamic web page together. 11 years of Java experience (and 18 years of software engineering) meant shit. I mean seriously?
But such is the world of HR drones and employment middlemen. You can't live outside of it, and you have to work with it (or cut through it) in any way possible (otherwise you end up with a shitty job as a neophyte.)
So what I did is that I kept multiple versions of my resume. For a C++ job, I highlighted my recent work, describing it in appropriate detail right off the bat, with all the different projects and positions on the first page. This would be my "default" resume.
For a Java job, I would reduce all my C++ work to two lines and bring as much past Java work experience as possible on the first page. Why is that? To ensure the HR drones and staffing middle men would see all the right Java buzzwords on the first page.
There was/is no false information at all on my resumes. I simply omitted work I already did to stress another one. How fucked up is that, that you have to remove some of your recent work history just to get contemplated by human buzzword scanners?
In the end, it worked (sort of, since I was able to get a Java position via personal reference and passing the necessary technical interviews.)
But regardless. One should always try to make her case directly to the technical people in charge of hiring. But this is a very rare (and blissful) event. More often than not, you will go through HR or a staffing agency.
That is the general case. And for that general case, you'd better have your work history in more than one resume format, stressing items according to the desired job position (without lying or claiming that you have done shit you have not, of course.)
Companies might be desiring software engineers. But in practice, by accident and plain stupidity, they don't hire for software engineers. They hire for savantism, for autonomous, one-trick-pony drones that operate precisely along the lines of magically selected buzzwords. 10+ years of X, 5+ years of Y and 8+ years of Z. Mix and shake.
Imagine if we were to hire carpenters like that: We seek a master carpenter with 10+ years of experience using a Husky hammer, 8+ years using a HDX philips screwdriver, and 12+ years using a Black & Decker circular saw. Oh, and the br
I've been working with big data since before it was a term and currently run a scientific software company that touches on many aspects of "data science". Many of my colleagues also work in the field. I've seen many fads come and go. Data Science as a profession is one of those.
Most people who call themselves data scientists are really just doing "big data" processing using tools such as Hadoop. They are delivering results to managers who have jumped on the big data bandwagon and, not knowing any better, have asked for these skills. In 99% of the cases, the processing is simply haphazardly looking for patterns or running basic statistics on data that really isn't that big. However, there is a lot of low-hanging fruit in data that hasn't been analyzed before, and most practitioners who've suddenly become data analysis experts are rewarded for trivial findings. A tiny bit of statistics, programming, and data presentation skills go a long way.
Compare this to the Web Masters of the late 1990s. The Web was new and managers knew that they needed Web sites. HTML and CGI were techie things but also fairly easy to learn. A group of people quickly figured out that they could be very important to a company by doing very little work and created the position of Web Master. A tiny bit of programming, sys admin, and design skills went a long way.
Web Masters disappeared when IT departments realized that you actually needed real software developers, real designers, and real sys admins to run a corporate Web site. Sure, the bar is still low, but expertise beyond a 'For Dummies' book is still needed. And, few people can be experts in each area, hence the need for teams.
Real data science has actually been around for a long time. Statisticians and data analysts have been performing this role for decades and have built up a lot of rigor around it. It's a tough skill set to develop, but a very useful one to have. "Big Data" distracted people a bit and let the current generation of data scientists jump in and pretend everything was new and we could throw out the old methods. As the field evolves, data science will necessarily transition back to the experts (statisticians) and become a team effort that includes people skilled in programming, IT, and the target domain (analysts).
That said, there's good money to be made right now, so if you have Web Master on your resume, you might as well be a data scientist while you can. ;)
-Chris
From TFA (emphasis added):
Yes, by this standard, Astronomy and Social Sciences are also not sciences. I have no idea what Computer Science is, but no, it’s not a science either.
*sigh* RTFA was a waste of time.
I have a BSc in computer science and operations research (logistics). That bestows the title of "computer scientist" on me, but I prefer the term "software developer". I currently do software development at a place that calls itself a "systems engineering research centre". There are lots of stats and data involved in the job, but these things do not add up to a scientist doing science.
A PhD is valuable in that it demonstrates that you can research a given topic in an academic setting and formally communicate your findings to others; it's practically mandatory if you want to get someone to pay you to ponder the universe full time. Having said that, Science itself is a philosophy, not a vocation: if you live by that philosophy then you are a "Scientist", if you are incurious and just enjoy the fruits of science then you're probably a "Utilitarian".
As to TFA, as an Aussie I've never heard the term "data scientist"; I figure it must be American MBAs looking for the word "statistician". They should know, however, that data mining the internet has been "solved": IBM are starting to make instances of "Watson" available to commerce. And speaking of doctors, Watson is also expected to pass the standard exam for a US medical license.
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.