Slashdot Mirror


What Does It Mean To Be a Data Scientist?

Nerval's Lobster writes What is a data scientist? "To be honest, I often don't tell people I am a data scientist," writes Simon Hughes, chief data scientist of the Dice Data Science Team. "It's not that I don't enjoy my job (I do!) nor that I'm not proud of what we've achieved (I am); it's just that most people don't really understand what you mean when you say you're a data scientist, or they assume it's some fancy jargon for something else." So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive." His points are good to keep in mind right now, with everybody throwing around buzzwords like "Big Data" without fully realizing what they mean.

94 comments

  1. It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 5, Insightful

    Just like how 10 years ago, suddenly everyone was an "Architect" and before that you were a "Developer".

    1. Re:It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 2, Insightful

      And it means you're unemployed within 5 years.

    2. Re:It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 0

      The "data science" moniker goes hand-in-hand with the "big data" buzzword.

      Marketers sure seem to eat it up, but really, how far can big data take us?

      Maybe there is a reason why Facebook isn't talking about social graphs anymore. They're talking about releasing their own switches for data centers. They got as far as they could with data science, which wasn't really that far.

    3. Re: It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 1

      I am a data scientist. It says so on my business card, which also bears the name of a large corporation. I have a hard science Ph.D. "Data Scientist" means...that I am a statistician. But that's OK, because most people with "Statistician" on their business cards are anything but.

    4. Re:It means you jumped on the latest bandwagon by davester666 · · Score: 3, Insightful

      No, you jump onto the next buzzword.

      Lather. Rinse. Repeat.

      --
      Sleep your way to a whiter smile...date a dentist!
    5. Re:It means you jumped on the latest bandwagon by gweihir · · Score: 5, Insightful

      Indeed. The actual name for this is "Computer Scientist" or in some cases "Statistician". The nonsensical "Data Scientist" is just a marketing term, created solely to inflate the perceived worth of a product, that, by all credible accounts, is not very good.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    6. Re:It means you jumped on the latest bandwagon by Tablizer · · Score: 4, Funny

      I'm not a "troll", I'm an Agitation Engineer.

    7. Re:It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 0

      Just like how 10 years ago, suddenly everyone was an "Architect" and before that you were a "Developer".

      He's actually an analyst that does data modelling with a fancy new 2015 buzzword compliant title. He codes to model things. He isn't a developer nor an architect.

    8. Re:It means you jumped on the latest bandwagon by chthon · · Score: 1

      Should be modded +1 Funny

    9. Re: It means you jumped on the latest bandwagon by AssetYoYo · · Score: 1

      Agree. My card reads, "Data Specialist," but I tell most folks who ask what I do that I'm a "computer yo-yo" and they're good with that.

    10. Re:It means you jumped on the latest bandwagon by Xest · · Score: 0

      But isn't that the same for many other sciences especially relating to medicine for example whereby nearly all of their work is based on statistical analysis of data? I suspect given increased complexity of data sets that we could apply the same logic to many professions. Hell, even the folks at CERN are using wholly statistical methods to determine the likelihood whether their findings really were the Higgs or not, does this mean those physicists are actually just statisticians too?

      I think it's naive to think that as humans progress, that new jobs, and hence new titles aren't created. Sure some people are wholly undeserving of such titles and simply use them to over-inflate their egos, but I don't think such a title is invalid. If someone is doing genuine research into large data sets using the scientific method then what exactly is wrong with the description of Data Scientist?

      Calling him a Computer Scientist ignores his use of statistics, and calling him a Statistician ignores his knowledge of computing. If we're going to dumb down job titles to be less descriptive we in the technology sector might as well all just be typists. That's a lot of what we do right?

      I'm completely against overinflated job titles (like renaming bin men to Waste Disposal Technicians), but in this particular case the whining seems to be wholly unfounded as the job title minimally describes the actual role. It's the simplest yet most descriptive title for the role in question, so what's the problem?

      I don't think it's fair to instantly jump to the conclusion that any new technology term or job title is instantly bullshit. This is one of those circumstances where it's a perfectly sensible title describing an increasingly common role in a world where large data sets and analysis of that data has become ever more important to companies in growing their bottom line.

    11. Re:It means you jumped on the latest bandwagon by gweihir · · Score: 1

      I did not "instantly" jump to any conclusion. I had the first discussion about the name "Data Scientist" with a Statistician about 8 years ago. Also remember that "Statistician" is already specialized, a Statistician is a Mathematician specializing in statistics. Most of them can program these days. On the CS side, Computer Science has not yet started to specialize this strongly. What you would say is "I am a CS specializing in statistical analysis" or something like it.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    12. Re:It means you jumped on the latest bandwagon by Xest · · Score: 1

      "On the CS side, Computer Science has not yet started to specialize this strongly."

      So where is your arbitrary line drawn out of interest? What would be required for a data scientist to be a data scientist? That's assuming all data involved even has any relevance to comp. sci. What if they're using data collected non-computationally also?

      The problem is you obviously like incredibly generic names, and that's great for you, but it makes it much harder for people wanting to advertise for specific roles, or to pay specific salaries.

      If someone advertises for a "Developer", I've no idea if they're paying £10k for a minimum wage intern, or £200k for a top of their field specialist. At least with Software Architect or similar I know they're after someone with strong architecture skills and the salary is going to be in the £60k+ range. Sometimes there just isn't room (and most definitely isn't a need) to write out a whole sentence for a job title or skill requirement.

      Descriptive job titles are useful, I really just don't see what's wrong with them unless they're so over inflated as to be useless, which again, Data Scientist isn't because it's minimal and wholly descriptive of the skills involved.

      I really don't see what problem is being solved in trying to kill off a perfectly correct and perfectly useful job title. The job involves doing science with data, why muddy the waters with terms that are not wholly relevant and describe other things as well?

      I'm just struggling to see what the benefit of losing information is by pushing jobs into overly generic undescriptive or only partially descriptive boxes. What's gained by this? Why is it a good thing?

    13. Re:It means you jumped on the latest bandwagon by gweihir · · Score: 1

      There are no "Data Scientists" because there is no "Data Science" other than Statistics and it already has a name. Marketing BS is not a sound reason to mess with language.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    14. Re:It means you jumped on the latest bandwagon by the+gnat · · Score: 1

      I translate "data scientist" as "PhD in hard sciences who couldn't get a job in his or her field because we've been massively over-training PhDs for the last couple of decades, so he/she took a course in statistics and learned to write simple Python scripts and use scikit-learn and Hadoop." That seems to cover most of the ones I know, anyway. (Although to be fair, some of them knew Python already.)

    15. Re:It means you jumped on the latest bandwagon by Xest · · Score: 1

      Right, except there's a problem, everywhere and everyone that matters in the world of technology disagrees with you from IBM to Apple, from Facebook to Google, from Microsoft to Oracle, from MIT to Cambridge, from Harvard to Berkeley, from Tim Berners-Lee to Mark Zuckerberg, from Sandy Pentland to Bill Gates, from Peter Norvig to Larry Page.

      So on one hand we have some random guy on Slashdot claiming it doesn't exist, and on the other we have the who's who of technology companies, universities, technologists, professors saying it does.

      You'll have to excuse me therefore if I can't help but think that what you're actually saying is "I've no idea what the fuck data science is, so I'm going to pretend it doesn't exist and that it's stupid". Your argument doesn't even make sense, you recognise statistics is a specialisation of mathematics and claim that's okay, but specialisations of other subjects are apparently not, yet you can't elaborate why, or even where or how you draw the line. You claim it's just statistics, but then statistics doesn't tell us how to gather, store, manage, and work with massive data sets, data sets so large we're on the cutting edge of figuring out how to deal with them, something that requires research, you know, like science.

      Well, good luck with that but if you hadn't noticed there's a lot of people making use of it in both positive and negative ways, from large scale healthcare research, to selling us as products, to the NSA profiling us on our data. It's probably not something that you should pretend just doesn't exist, because it kind of has profound implications for our lives today, and going forward.

    16. Re:It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 0

      Check yourself. From the Wikipedia page on Data Science:

      In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled "Statistics = Data Science?" for his appointment to the H. C. Carver Professorship at the University of Michigan.

      ...

      In conclusion, he coined the term "data science" and advocated that statistics be renamed data science and statisticians data scientists.

      And a lot of prominent people seem to agree with gweihir's point:

      Although use of the term "data science" has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician....Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”

    17. Re:It means you jumped on the latest bandwagon by gweihir · · Score: 1

      Well, you have insulted me, insinuated that I do not know what Statistics and Big Data are (both wrong), but do you actually have some _arguments_? Because up to now I find none at all.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    18. Re:It means you jumped on the latest bandwagon by Xest · · Score: 1

      Well that's one of the unfortunate things about being the sort of person who thinks they know better than just about everyone that matters in the industry, you generally won't find arguments in anything you read because you've already decided that you're right and the whole world is wrong. You can't see what's right in front of your eyes because you don't want to.

      Instead you now play the victim, and keep deflecting away from the inconvenient fact that you seem unable to expand on why you arbitrarily think sometimes it's okay to call a specialisation a new role, but not other times. I'm willing to accept that you may be right, that maybe you have a good argument, but when you're not willing to explain the conflicts your own comments create then what am I to think other than that you're avoiding doing so simply because you can't do so?

      If you have a good justification as to why it's okay to say, separate statisticians from mathematicians, but not data scientists from computer scientists, then I'd genuinely really love to hear it. Similarly I'd really love to hear what you feel the benefits are in going for generic and non-descriptive job titles over job titles that better describe a role, I'd like to know what the benefits are, so please, if you really think I've been unfair on you then go ahead and explain your points further so we can iron out those inconsistencies in your original arguments.

    19. Re:It means you jumped on the latest bandwagon by Xest · · Score: 1

      Those "prominent" people are also people who have no relation to the field of technology which is where data science has its focus (precisely because the volumes of data that require new scientific effort can only be handled by computers).

      Most journalists couldn't tell you the difference between a neurologist and neurosurgeon either, but it doesn't mean that they're not distinct roles.

      A handful of journalists and an old school medical statistician still doesn't exactly provide a compelling list of weight to counter the who's who of technology business and academia. We're talking literally thousands of the best minds in the businesses against a bunch of people in a wholly different business and a tiny handful of dissenters.

      Data science is multidisciplinary, it requires you to be a polymath. Any statistician who believes they're a data scientist needs to show they have the prerequisite knowledge outside of statistics coming from computer science and non-statistical mathematics (e.g. graph theory). Statistics is obviously a key discipline in data science, but it's most definitely not the only discipline (even gweihir recognised this with his mention of CS).

      A statistician can analyse a dataset and pull information from it, but they cannot deal with a dataset so large that anything other than bespoke hardware and software setups can handle it (e.g. the petabytes of data CERN produces), to do that, you need data scientists. You may find that data scientists then pass on subsets of that data, or data they have resolved from that data to statisticians to work on, but the statisticians themselves won't have that knowledge to handle the data set, and if they do then they can start calling themselves data scientists because they know more than just statistics, they know statistics and a bunch of other disciplines in enough depth to be actual data scientists.

      Long story short, you can be a statistician without being a data scientist, but a data scientist will need statistics and a whole bunch of other things, at that point why is a data scientist just a statistician rather than just a computer scientist, or just a mathematician, or just a low end physicist? You can't just pick one of these fields arbitrarily, they're all as important to the role hence why you need a new term to encompass the required knowledge.

    20. Re:It means you jumped on the latest bandwagon by gweihir · · Score: 1

      And more posturing. And dyslexia. Pathetic.

      --
      Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
    21. Re:It means you jumped on the latest bandwagon by Anonymous Coward · · Score: 0

      AKA gweihir is wrong but he can't admit it. As usual. Pathetic.

    22. Re:It means you jumped on the latest bandwagon by Xest · · Score: 1

      So you make a comment, you completely fail to back it up, and you call someone else pathetic?

      You know it's probably easier to just admit you made a comment you didn't think through and that was wrong rather than to continuously try and avoid what's obvious to anyone reading - that you can't back up your point - by playing the victim and throwing random and seemingly arbitrary insults (do you actually know what dyslexia is? it would appear not).

      I really pity you.

    23. Re:It means you jumped on the latest bandwagon by majid_aldo · · Score: 1

      disagree. statistics has not traditionally solved the problems that data scientists are working on these days.

      --
      --- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme, ..etc.
  2. What Does It Mean To Be a Data Scientist? by Anonymous Coward · · Score: 5, Funny

    It means you get no women.

    1. Re:What Does It Mean To Be a Data Scientist? by seededfury · · Score: 2

      It means you get to be the second officer on the enterprise!

    2. Re:What Does It Mean To Be a Data Scientist? by hilather · · Score: 2

      It means you get no women.

      I think I really misunderstood the job posting that said "works with models" then.

    3. Re:What Does It Mean To Be a Data Scientist? by TeknoHog · · Score: 1

      I think I really misunderstood the job posting that said "works with models" then.

      I used to do modelling on a supercomputer. That's like supermodelling, right?

      --
      Escher was the first MC and Giger invented the HR department.
  3. Score!!! by DoofusOfDeath · · Score: 5, Funny

    I can't believe Slashdot managed to land an interview with someone from Dice! Time to make some popcorn, sit back, and enjoy the fireworks!

    1. Re:Score!!! by NoNonAlphaCharsHere · · Score: 1

      I know, right? I wonder how Slashdot even got anyone from Dice to even notice them, much less do a full, informative, in-depth interview about cutting-edge technology!

    2. Re:Score!!! by Anonymous Coward · · Score: 0

      He is soo good, even dummy mode went on...

  4. It means the same thing as by Anonymous Coward · · Score: 0

    being a "behavioral economist". No one knows what it means, but it sounds like you're making money for someone. And that's what counts.

  5. What does it mean to read Dicedot? by Anonymous Coward · · Score: 0

    What does anything mean anymore?

  6. What a data scientist is by Anonymous Coward · · Score: 1

    Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways.

    If you're the one reciting stats like that with wide open eyes, you're a Data Scientist.

    If you just shrug and say, "Yeah. So?" like everyone else, you're not.

    1. Re:What a data scientist is by bouldin · · Score: 1

      Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways. If you're the one reciting stats like that with wide open eyes, you're a Data Scientist. If you just shrug and say, "Yeah. So?" like everyone else, you're not.

      I agree, and playing with that kind of data actually sounds fun.

      The big question is, though, what can you do with that information? You could study commute patterns (interesting to a scientist but low-value to a telecom, and more easily found with GPS tracking on a sample, anyway) or you can use this for capacity planning (but the statistics are so trivial you don't really need a data scientist).

      I think people (especially marketers) tend to have inflated expectations of what you can actually accomplish with data science.

      For example, despite all Facebook's claims to having a treasure trove of profile data, their ad placement does not seem to be any better than Google's keyword-driven ad placement.

    2. Re:What a data scientist is by HornWumpus · · Score: 1

      I knew a guy who claimed to 'frolic in the database'; weirdo.

      I on the other hand occasionally 'wallow in the data' or 'root around in it'.

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
  7. A scientist??? by Anonymous Coward · · Score: 2, Funny

    It means you get to play with beakers and such. No self respecting scientist doesn't have a lot of beakers, test tubes, and strange lab setups with tubes going in all directions.

    1. Re:A scientist??? by Anonymous Coward · · Score: 1

      God yes. And tesla coils. And one-sneeze-from-exploding alkalines. AND A WIND TUNNEL! And industrial, dangerous, heavy duty [any large machine]s.

      And a lab coat and a clipboard and a particle accelerator and lasers and goggles and oh shit I have a boner.

  8. What does it mean? by Anonymous Coward · · Score: 0

    So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."

    So it is a scientist who has strong math skills, can program computers, and can use tools. Thanks for that awesome summary. No wonder he mentions it to people. At least he gives a couple of examples on the actual page of things he has worked on.

    1. Re:What does it mean? by Anonymous Coward · · Score: 0

      Back off, man. I'm a data scientist.

  9. It means... by thegarbz · · Score: 5, Insightful

    You can't spell statistician and anyway were too embarrassed to put it on your business card.

    I think we should submit an Ask Slashdot where we ask data scientists precisely how they apply the scientific method in their day to day work. Or does having a "scientific mind" now qualify as being a scientist?

    I have a scientific mindset, will I be a pornography scientist later tonight, am I a trolling scientist now?

    1. Re:It means... by Anonymous Coward · · Score: 0

      This exactly, except "data scientist" is actually a better way to describe what statisticians actually do. But thinking it's something new and cool is just flatly not true. The sad and somewhat frustrating thing is that too many people who think they're "data scientists" don't know how to think about either data or science nearly as well as a qualified statistician does.

    2. Re:It means... by Anonymous Coward · · Score: 0

      pornography scientist

      I take it you'll be looking at a lot of empirical evidence

    3. Re:It means... by gnupun · · Score: 3, Interesting

      This exactly, except "data scientist" is actually a better way to describe what statisticians actually do.

      But there's a big difference between a scientist and a statistician. A scientist pokes around and discovers new theories or mathematical models (often out of thin air). A statistician OTOH, like an engineer, simply applies the theories of scientists to accomplish real world usable things like pie charts and bar graphs.

      So unless this guy is discovering or testing new theories, he's not a scientist. He's just a statistician.

    4. Re:It means... by Anonymous Coward · · Score: 0

      But ... computer!

    5. Re:It means... by Anonymous Coward · · Score: 0

      Sounds like you've never met a statistician. No respectable one would ever use a pie chart, and only rarely would use a bar graph.

    6. Re:It means... by Anonymous Coward · · Score: 0

      I think the rest of the world has these totally switched then. Fisher was just a statistician, the dumbass misexplaining confidence intervals is a data scientist. I get it now.

  10. Give me a break... by Anonymous Coward · · Score: 0

    Data scientist, econometrician, quant... just a fancy way to say I'm a statistician who knows how to program.

  11. It means... by msobkow · · Score: 1

    It means you opted for the Blue shirt instead of the Gold. :D

    --
    I do not fail; I succeed at finding out what does not work.
  12. WHAT is a data scientist? by Anonymous Coward · · Score: 0

    What is important is for data scientists to fully understand the theories that they base their work upon, and knowing the risks involved. Not doing so is irresponsible, and can lead to misinformation and confusion, data corruption. We may never fully understand the nature of our universe, and almost certainly will never understand it in our lifetimes. But the question raised in the topic is actually a fundamental one that spans far beyond dark matter to all forms of theoretical science. Many theories are based heavily upon other theories. The "root" theories (with any luck) will eventually be proven or disproven, affecting all research and theories which follow that data "root".

    Regular NON-DATA scientists, on the other hand, have a driving desire to learn. This has nothing to do with "anti-religion" or a desire to prove there is no God. In fact, you may find that quite a few scientists do believe in God or a "creator" or what have you. They just don't try to use this "God" concept to explain away the unexplainable. They have been issued a challenge by the universe and they have chosen to rise to the occasion. My guess is because there is precious little left to explain, as most of our daily life has been easily described by data scientist machine learned science.

    1. Re:WHAT is a data scientist? by Anonymous Coward · · Score: 0

      You must be a Word Scientist.

  13. What does it mean? by Pete+Venkman · · Score: 4, Informative

    Absolutely nothing.

  14. Missing aspect: sociology by michaelmalak · · Score: 3, Interesting

    Without sociology skills (my blog) on a data science team, hypothesis formation and ability to model clients will suffer. It would seem particularly important for a people-focused company like Dice.com.

    1. Re:Missing aspect: sociology by drinkypoo · · Score: 1

      It would seem particularly important for a people-focused company like Dice.com.

      They're not people-focused, they're employer-focused. It costs money to post jobs on Dice, but it's free to look at them. You are the product. Dice has commoditized you. Dice is for employers to buy you, it's not for your use. If it were, then employers would post jobs for free, and you would pay for access (or it would be free.)

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    2. Re:Missing aspect: sociology by Uzuri · · Score: 1

      " If it were, then employers would post jobs for free, and you would pay for access"

      For the love of God, don't give them any ideas!

      --
      I'm a she-slashdotter... but I make up for it by living with my folks.
  15. Good and bad for everything by HalAtWork · · Score: 1

    I'm sure there are good reasons to datamine and bad reasons as well. Some goals yield benefits to many while others are more selfish. The question is if there can be more good done or more bad, and if the benefits outweigh the pitfalls. What are we willing to sacrifice? Are our desires important enough to risk the pitfalls? Do we think we can account for the pitfalls and protect ourselves against them, or are we just being arrogant and blindsiding ourselves?

    Why am I asking you?

  16. Good Scotsman Fallacy by Demena · · Score: 2, Interesting

    Errr... You claim to be a scientist and yet you say "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," .

    Circular definition, circular argument. Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory. Next time start that sentence with "In my opinion" and you get away with it. You didn't and you don't.

    Reading your article, it says nothing. I would not hire you on the basis of what you have written here.

    Pardon me if that seems rude but it was, in my opinion, too superficial to ignore.

    Oh! By the way, what you do has had a title for a generation. You are an analyst doing what analysts do. Analyse data.

    1. Re:Good Scotsman Fallacy by Anonymous Coward · · Score: 0

      Errr... You claim to be a scientist and yet you say "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory,"

      Circular definition, circular argument.

      Erm, logical parsing error. "All A are B" is not a definition of A. Also, you fail to show circularity.

      Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory.

      Look, sonny, I don't know where you got your science education from (assuming you even did) but you might want to read up on the scientific method. You have no idea what you're talking about here. Also, FWIW, Darwin started formulating his Origin of Species theory after he gathered observations during his Beagle voyage.

      Reading your article, it says nothing. I would not hire you on the basis of what you have written here.

      Reading your post, it says you're an idiot who doesn't even question whether (s)he has the foggiest idea what (s)he's talking about, or a shred of evidence to back up his/her statements. With your complete lack of a scientific mindset (as per your second paragraph). As the saying goes, you opened your mouth and removed all doubt. I would bet good money this Dice guy, regardless of how competent he is or isn't, would choose not to work for you even if you wanted him to. In fact, were you in the position to actually hire anyone, you might find yourself with either a chronic shortage of willing candidates or, if you're lucky, an abysmal retention rate for your hires.

      The one thing that I would almost agree with you on is that TFA is a fluff piece with a HR angle. This becomes clear when the author answers the title question with what a data scientist should know as tools, as opposed to what one should do with said tools.

      Oh! By the way, what you do has had a title for a generation. You are an analyst doing what analysts do. Analyse data.

      Right. And by the same token there's really no difference between the earth digging done by a miner, a construction worker or an undertaker. Earth is earth, and pick and shovel use is all the same skill, really, so why do they have different titles? While we're at it, medical specializations are superfluous and all engineers should just be that, engineers. My fellow men, we have a language revolution to undertake, let's get to it!

    2. Re:Good Scotsman Fallacy by Anonymous Coward · · Score: 0

      Looks like GP touched a nerve. You must be a "Data Scientist".

    3. Re:Good Scotsman Fallacy by Anonymous Coward · · Score: 0

      Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory.

      Look, sonny, I don't know where you got your science education from (assuming you even did) but you might want to read up on the scientific method. You have no idea what you're talking about here. Also, FWIW, Darwin started formulating his Origin of Species theory after he gathered observations during his Beagle voyage.

      Uh.. what? The poster just summarized the scientific method we all learned in grade school. Form a theory, make predictions, test the theory.

      Also, you criticize heavily in your post, but make few good points. Instead, you seem to say "you're wrong because you're an idiot."

    4. Re:Good Scotsman Fallacy by penguinoid · · Score: 1

      There's a huge difference between having a theory and being convinced about a theory.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    5. Re:Good Scotsman Fallacy by Anonymous Coward · · Score: 0

      Uh.. what? The poster just summarized the scientific method we all learned in grade school. Form a theory, make predictions, test the theory.

      No, the scientific method starts from data. One does not come up with a theory in the absence of data; that Platonic nonsense is dead and buried in all natural sciences (note the emphasis, though; that excludes humanities and to a large extent Economics :). One comes up with a theory to explain regularities observed in the data, then tries to see the limits of applicability. So it's data -> theory -> counter-examples, as opposed to the OP's amusing but misguided theory -> supporting data idea.

      And no, it's you're wrong because those assertions of yours are stupid (as in, have no connection with logic or reality, and here's why) and by the way, claiming those stupid things makes you look like an idiot.

    6. Re:Good Scotsman Fallacy by HornWumpus · · Score: 1

      Dinosaurs are narrow at one end, much much thicker in the middle and narrow again at the other end...

      --
      John McAfee 'It was like that time I hired that Bangkok prostitute; to do my taxes, while I fucked my accountant'
    7. Re:Good Scotsman Fallacy by mcswell · · Score: 1

      I generally agree with you, but I think it goes both ways. One starts with some data, formulates a theory to explain that data. But often the original data is not sufficient to test the theory, so you (or other scientists) go looking for more data. Darwin did continue looking at data, and certainly physicists do that: the various colliders that have been built over the last several decades were built not just to collect data so someone could come up with a new theory, but--IIUC--to test specific theories that had already been proposed.

    8. Re:Good Scotsman Fallacy by Anonymous Coward · · Score: 0

      Darwin did continue looking at data, and certainly physicists do that: the various colliders that have been built over the last several decades were built not just to collect data so someone could come up with a new theory, but--IIUC--to test specific theories that had already been proposed.

      Ah, this is something of an expansion of the original topic, but here we go :) Most if not all experiments in HEP are done looking for 'new Physics' (the accepted term, never really liked it personally but there you go). What this means is, cases where the prevailing theory (QED, QCD, ...) breaks down. So counter-examples. Many theorists do try to come up with alternative models, but so far the efficacy has proven to be ... an elusive beast. See for the most ponderous examples the menagerie of string theories, SUSY and quantum gravity models[*]. However, even those start from elements of current theories, which are traced back to data. Simply because you have to at least provide a way for the current (exceedingly well working) approximations to fall out of your model at the appropriate parameter scales. The main problem everyone faces outside those parameter scales is that you don't exactly have a finite number of possible models, and selecting among them will require data that is not available.

      So yes, there exist theorists that are model builders in search of data. So far, they have met with remarkably little success, so no, science really didn't evolve that way up until now. Also, it does not validate the OP's 'many scientists do this' statement (which, funnily enough, would be a better target for his No True Scotsman rant than the sentence he picked on, although neither he nor the article author did the goalpost moving needed for the actual fallacy) and it certainly isn't the case for Darwin.

      [*] Note that I'm not calling them useless. They are interesting Mathematical problems and, as applied Math, have had unexpected applications. Math is this way, and it's its one-of-a-kind charm: one creates models and they sometimes apply beautifully to something, see non-Euclidean geometries and Relativity or network topology. But that's not the empirical scientific method, as Math is art, not a natural science ;)

  17. Too much accuracy by Okian+Warrior · · Score: 1

    "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."

    I know just how you feel!

    One way around this problem is to round down to the next significance level and reduce it to a yes/no assessment.

    For example, instead of reporting the actual significance, say "p<.05" and instead of citing the correlation as a number, say "we therefore reject the null hypothesis".

    Works a peach, required in most journals, and reduces the workload of the reviewers.

  18. Goodbye to any ethics Slashdot had by Anonymous Coward · · Score: 1

    I guess whatever journalistic ethics Slashdot used to have are out the window. No indication in the OP that Dice owns Slashdot. (I mean, sure most people know that, but when OSDN owned Slashdot at least all relationships were disclosed up front.)

    1. Re:Goodbye to any ethics Slashdot had by Anonymous Coward · · Score: 0

      I guess whatever journalistic ethics Slashdot used to have are out the window.

      Way to be a misogynist.

  19. You Call Yourself a Data Scientist by brian.stinar · · Score: 1

    What does it mean "to be a data scientist"? It means that you call yourself a data scientist, and that someone pays you to do things that either you, they, or both of you, agree are "data scientist" types of things. If you're not getting paid, then I think it makes you an "amateur data scientist", "data scientist in training", "intern data scientist" or, my favorite, an "indentured data scientist." There may be other amazing terms to describe this phenomenon (unpaid data scientist) but I believe I am missing them.

    I could be a "data scientist", "programmer", "technical manager", "software engineer", "software architect", "pimp" or "software gangster." I prefer to call myself a "contractor" or sometimes "consultant" though. The last two tend to have the type of tax benefits I like, and don't really result in a customer specifying the time, place, and manner of my work to the same degree as if I used the term "employee."

    The only person I've met whom I didn't feel like punching in the face for calling themselves a "data scientist" had a master's degree in statistics, was super good with relational databases, and all right at programming (but not awesome.) I do live in New Mexico, and we aren't exactly trendy, so I can imagine a lot of people that might be legitimate (not amateur) data scientists that live here call themselves database administrators, or programmers, since they aren't concerned with what Dice says they should be making as a "data scientist."

    To me, this distinction has no use. That may be because I don't want to be a "data scientist" or spend time with them, despite working on analyzing large data sets and doing "data science" for paying customers.

  20. It means you're lazy. by Karmashock · · Score: 2, Interesting

    I'm sure there are some good data scientists but most of the papers I've seen lately that are based on statistics or various data sets are extremely lazy. You have someone that just combs through data and then tries to make a novel association. Nearly always they just show correlation and never causation.

    I think that is one of the bigger problems. Because you're not collecting the data or structuring the experiments that collect the data, you can't isolate anything from the data. All you can do is say "well, this might be happening"... which is often completely useless. A more useful thing they could do is find that correlation and then see if they actually have causation by doing a follow-up experiment or study that isolates a specific variable under controlled conditions.

    That is, I think data scientists would be more useful if they used the study as a jumping off point to doing an actual study. And I'm not especially interested in reading or even hearing about anything they've done until they've concluded that secondary study.

    Absent that... it is lazy, boring, not interesting, and who cares.

    --
    I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
    1. Re:It means you're lazy. by umafuckit · · Score: 1

      That is, I think data scientists would be more useful if they used the study as a jumping off point to doing an actual study.

      At which point they'd be "scientists" not "data scientists"

    2. Re:It means you're lazy. by Karmashock · · Score: 1

      Yes... actually useful.

      --
      I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
  21. Having worked as a "data scientist" for 3 months.. by Anonymous Coward · · Score: 0

    ...I'm still a little vague. Maybe it's just the place I worked, but it seemed like a grab-bag of random bits that didn't really fit elsewhere. In any case I didn't enjoy it much so I went elsewhere.

  22. I'll take the bait by Anonymous Coward · · Score: 0

    When I hear "data scientist" my mind immediately wanders to the billions currently being spent invading everyone's privacy en masse.

    What is it that you actually do that means anything to anyone other than your boss? Are you just a tool of the marketing group? Is this where "business acumen" comes in?

    I might seem a little harsh or unfair, yet seeing as you have made no mention of "what" you actually do in your day job, other than name-dropping technologies and blabbering quotes from trade rags while basically bragging about how "elite" one must be to do your job, what am I supposed to think? Please be more specific.

  23. alternative to "data scientist" by ihtoit · · Score: 1

    How about "blind-input technical author"?
    Consider that a good scientist goes into a sea of data with no expectations (hence no bias) about what that data is going to reveal, and so has no incentive to cherry-pick. Even anomalies are data. Why are those anomalies there? Are they actually anomalies? Or are they indicators that the original hypothesis or the gathering method itself is flawed?

    Me? I'm into highly technical writing, but not from a mechanical or electronic or programming field. I analyse human data. That which directly affects people both on an individual and collective basis. This means I can scale my research from a single person to a cast of tens of thousands. The nature of that data varies, as does the purpose of the writing. And no, the pay isn't very good but I enjoy what I do.

    --
    Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
  24. Re:Fellow data scientist doesn't know what he is, by ihtoit · · Score: 2

    it wasn't funny the first time this hoax was put out, it's not fucking funny today. Fuck off and die in a fire.

    --
    Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
  25. LMAO funny by Anonymous Coward · · Score: 0

    In the article the guy comments that 'anything with science in its title is not a science', disavows that 'data science' falls into that category, then proceeds to demonstrate why it's actually true about 'data science'....blah...nothing wrong with being a statistician & wanting to 'coax knowledge' out of data sets. I love numbers & the potential patterns in them that can lead to 'observable facts' (within bounds of 'statistical fact' that is) but no hell no way would I go around advertising myself as a 'data scientist'...besides I have a Masters in Physics, and at one time was a card carrying 'scientist' though life took me in another direction.

  26. More like data zombie by Anonymous Coward · · Score: 0

    The institute next door is spilling out new "data scientists" at an alarming rate. For me they are just data-addicted, data-dredging zombies that blindly apply every machine learning algorithm to every data set they can get hold of, validating their results using 10-fold cross-validation, no matter whether it is appropriate or not. Unfortunately they rarely appear to think critically about what they are doing, never questioning whether a particular model makes sense – and more importantly – never wondering whether the task they are trying to achieve makes sense.

    Now I exaggerated a bit, but you get the point...

  27. Sounds like Engineering by Anonymous Coward · · Score: 0

    What is being described here is, on average, just normal university-level engineering.

  28. Data scientist? by Anonymous Coward · · Score: 0

    Hmmm, data scientists. Are not those the people employed by places like the beltway think tanks that look at a bunch of non-relevant data and use statistical and mathematical tricks to twist that data to where it fits the result the think tanks have been paid to produce?

  29. I'm not a data scientist by OakDragon · · Score: 1

    I rock the house and sign the tits, and that's it!

  30. As a statistician by ichabod801 · · Score: 1

    I've always said that data scientist is just a buzzword for statistician. Another statistician called me on that one day, and said "No, a data scientist is a programmer." I'm sorry, but in this day and age, if you are a statistician who can't program, you're not a very good statistician.

    1. Re:As a statistician by majid_aldo · · Score: 1

      Agree, but statisticians have not solved the kind of problems data scientists work on these days. Neural networks did not come from statisticians, for example. Data science involves /some/ statistics and /some/ computer science and that scientific mindset.

      --
      --- widget evolution: enhanced, plus, super, ultra, extreme, exxxtreme, ultra-extreme, ..etc.
  31. Now all those starving PhDs can get employment? by Anonymous Coward · · Score: 0

    Now all those starving PhDs who specialised in physics, maths, biology etc can get employment?

  32. Reductionism Required by Anonymous Coward · · Score: 0

    The best Data Scientist I know once told me that "we should stop calling it data science because people just interpret what that means, and they are usually wrong. We should just call it 'counting' or maybe 'fancy counting' if you really need to call it something."

    Truth be told, the job of a good data scientist is that of an analyst, just one that knows how to execute slightly larger scale analyses.

  33. Full stack by thisisauniqueid · · Score: 1

    Data science is to machine learning as "full stack" is to web development. i.e. a horrible buzzword.