What Does It Mean To Be a Data Scientist?
Nerval's Lobster writes: What is a data scientist? "To be honest, I often don't tell people I am a data scientist," writes Simon Hughes, chief data scientist of the Dice Data Science Team. "It's not that I don't enjoy my job (I do!) nor that I'm not proud of what we've achieved (I am); it's just that most people don't really understand what you mean when you say you're a data scientist, or they assume it's some fancy jargon for something else." So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive." His points are good to keep in mind right now, with everybody throwing around buzzwords like "Big Data" without fully realizing what they mean.
Just like how 10 years ago, suddenly everyone was an "Architect" and before that you were a "Developer".
It means you get no women.
I can't believe Slashdot managed to land an interview with someone from Dice! Time to make some popcorn, sit back, and enjoy the fireworks!
being a "behavioral economist". No one knows what it means, but it sounds like you're making money for someone. And that's what counts.
What does anything mean anymore?
Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways.
If you're the one reciting stats like that with wide open eyes, you're a Data Scientist.
If you just shrug and say, "Yeah. So?" like everyone else, you're not.
It means you get to play with beakers and such. No self-respecting scientist doesn't have a lot of beakers, test tubes, and strange lab setups with tubes going in all directions.
So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."
So it is a scientist who has strong math skills, can program computers, and can use tools. Thanks for that awesome summary. No wonder he mentions it to people. At least he gives a couple of examples on the actual page of things he has worked on.
You can't spell statistician, and anyway were too embarrassed to put it on your business card.
I think we should submit an Ask Slashdot where we ask data scientists precisely how they apply the scientific method in their day-to-day work. Or does having a "scientific mind" now qualify as being a scientist?
I have a scientific mindset. Will I be a pornography scientist later tonight? Am I a trolling scientist now?
Data scientist, econometrician, quant... just a fancy way to say I'm a statistician who knows how to program.
It means you opted for the Blue shirt instead of the Gold. :D
I do not fail; I succeed at finding out what does not work.
What is important is for data scientists to fully understand the theories that they base their work upon, and to know the risks involved. Not doing so is irresponsible, and can lead to misinformation, confusion, and data corruption. We may never fully understand the nature of our universe, and almost certainly will never understand it in our lifetimes. But the question raised in the topic is actually a fundamental one that spans far beyond dark matter to all forms of theoretical science. Many theories are based heavily upon other theories. The "root" theories (with any luck) will eventually be proven or disproven, affecting all research and theories which follow that data "root".
Regular NON-DATA scientists, on the other hand, have a driving desire to learn. This has nothing to do with "anti-religion" or a desire to prove there is no God. In fact, you may find that quite a few scientists do believe in God or a "creator" or what have you. They just don't try to use this "God" concept to explain away the unexplainable. They have been issued a challenge by the universe and they have chosen to rise to the occasion. My guess is because there is precious little left to explain, as most of our daily life has been easily described by data scientist machine learned science.
Absolutely nothing.
Without sociology skills (my blog) on a data science team, hypothesis formation and ability to model clients will suffer. It would seem particularly important for a people-focused company like Dice.com.
I'm sure there are good reasons to datamine and bad reasons as well. Some goals yield benefits to many while others are more selfish. The question is if there can be more good done or more bad, and if the benefits outweigh the pitfalls. What are we willing to sacrifice? Are our desires important enough to risk the pitfalls? Do we think we can account for the pitfalls and protect ourselves against them, or are we just being arrogant and blindsiding ourselves?
Why am I asking you?
Twinstiq, game news
Errr... You claim to be a scientist and yet you say "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory."
Circular definition, circular argument. Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory. Next time start that sentence with "In my opinion" and you get away with it. You didn't and you don't.
Reading your article, it says nothing. I would not hire you on the basis of what you have written here.
Pardon me if that seems rude, but it was, in my opinion, too superficial to ignore.
Oh! By the way, what you do has had a title for a generation. You are an analyst doing what analysts do. Analyse data.
"Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."
I know just how you feel!
One way around this problem is to round down to the next significance level and reduce it to a yes/no assessment.
For example, instead of reporting the actual significance, say "p<.05" and instead of citing the correlation as a number, say "we therefore reject the null hypothesis".
Works a peach, required in most journals, and reduces the workload of the reviewers.
I guess whatever journalistic ethics Slashdot used to have are out the window. No indication in the OP that Dice owns Slashdot. (I mean, sure most people know that, but when OSDN owned Slashdot at least all relationships were disclosed up front.)
What does it mean "to be a data scientist"? It means that you call yourself a data scientist, and that someone pays you to do things that either you, they, or both of you agree are "data scientist" types of things. If you're not getting paid, then I think it makes you an "amateur data scientist", "data scientist in training", "intern data scientist" or, my favorite, an "indentured data scientist." There may be other amazing terms to describe this phenomenon (unpaid data scientist) but I believe I am missing them.
I could be a "data scientist", "programmer", "technical manager", "software engineer", "software architect", "pimp" or "software gangster." I prefer to call myself a "contractor" or sometimes "consultant" though. The last two tend to have the type of tax benefits I like, and don't really result in a customer specifying the time, place, and manner of my work to the same degree as if I used the term "employee."
The only person I've met whom I wouldn't feel like punching in the face for calling themselves a "data scientist" had a master's degree in statistics, was super good with relational databases, and all right at programming (but not awesome). I do live in New Mexico, and we aren't exactly trendy, so I can imagine that a lot of people who might be legitimate (not amateur) data scientists and live here call themselves database administrators, or programmers, since they aren't concerned with what Dice says they should be making as a "data scientist."
To me, this distinction has no use. That may be because I don't want to be a "data scientist" or spend time with them, despite working on analyzing large data sets and doing "data science" for paying customers.
I'm sure there are some good data scientists but most of the papers I've seen lately that are based on statistics or various data sets are extremely lazy. You have someone that just combs through data and then tries to make a novel association. Nearly always they just show correlation and never causation.
I think that is one of the bigger problems. Because you're not collecting the data or structuring the experiments that collect the data, you can't isolate anything from the data. All you can do is say "well, this might be happening"... which is often completely useless. A more useful thing they could do is find that correlation and then see if they actually have causation by doing a follow-up experiment or study that isolates a specific variable under controlled conditions.
That is, I think data scientists would be more useful if they used the study as a jumping off point to doing an actual study. And I'm not especially interested in reading or even hearing about anything they've done until they've concluded that secondary study.
Absent that... it is lazy, boring, not interesting, and who cares.
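To make the correlation-versus-causation point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: `z` is a hypothetical hidden factor that drives both `x` and `y`, which never influence each other. The raw data still shows a strong correlation, and it only disappears once `z` is accounted for, which is roughly what a controlled follow-up study would be trying to do.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing a least-squares linear fit on zs."""
    n = len(ys)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden factor: it drives both x and y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [2 * zi + rng.gauss(0, 1) for zi in z]  # x depends on z only
y = [2 * zi + rng.gauss(0, 1) for zi in z]  # y depends on z only, not on x

raw_corr = pearson(x, y)
adjusted_corr = pearson(residuals(x, z), residuals(y, z))

print(f"correlation of x and y:             {raw_corr:.2f}")
print(f"correlation after accounting for z: {adjusted_corr:.2f}")
```

In real data you usually don't get to observe `z`, which is why mining an existing data set can only tell you "this might be happening".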
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
...I'm still a little vague. Maybe it's just the place I worked, but it seemed like a grab-bag of random bits that didn't really fit elsewhere. In any case I didn't enjoy it much so I went elsewhere.
When I hear "data scientist" my mind immediately wanders to the billions currently being spent invading everyone's privacy en masse.
What is it that you actually do that means anything to anyone other than your boss? Are you just a tool of the marketing group? Is this where "business acumen" comes in?
I might seem a little harsh or unfair, yet seeing as you have made no mention of "what" you actually do in your day job, other than name-dropping technologies and blabbering quotes from trade rags while basically bragging about how "elite" one must be to do your job, what am I supposed to think? Please be more specific.
how about "blind-input technical author"?
Consider that a good scientist goes into a sea of data with no expectations (hence no bias) about what that data is going to reveal, and so has no incentive to cherry-pick. Even anomalies are data. Why are those anomalies there? Are they actually anomalies? Or are they indicators that the original hypothesis or the gathering method itself is flawed?
Me? I'm into highly technical writing, but not from a mechanical or electronic or programming field. I analyse human data. That which directly affects people both on an individual and collective basis. This means I can scale my research from a single person to a cast of tens of thousands. The nature of that data varies, as does the purpose of the writing. And no, the pay isn't very good but I enjoy what I do.
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
it wasn't funny the first time this hoax was put out, it's not fucking funny today. Fuck off and die in a fire.
In the article the guy comments that 'anything with "science" in its title is not a science', disavows that 'data science' falls into that category, then proceeds to demonstrate why it's actually true about 'data science'...blah...nothing wrong with being a statistician & wanting to 'coax knowledge' out of data sets. I love numbers & the potential patterns in them that can lead to 'observable facts' (within bounds of 'statistical fact' that is) but no hell no way would I go around advertising myself as a 'data scientist'...besides I have a Masters in Physics, and at one time was a card carrying 'scientist' though life took me in another direction.
Now I exaggerated a bit, but you get the point...
What is being described here is, on average, just normal university-level engineering.
Hmmm, data scientists. Are those not the people employed by places like the beltway think tanks that look at a bunch of irrelevant data and use statistical and mathematical tricks to twist that data to where it fits the result the think tanks have been paid to produce?
I rock the house and sign the tits, and that's it!
Dark Reflection
I've always said that data scientist is just a buzzword for statistician. Another statistician called me on that one day, and said "No, a data scientist is a programmer." I'm sorry, but in this day and age, if you are a statistician who can't program, you're not a very good statistician.
Now all those starving PhDs who specialised in physics, maths, biology etc. can get employment?
The best Data Scientist I know once told me that "we should stop calling it data science because people just interpret what that means, and they are usually wrong. We should just call it 'counting' or maybe 'fancy counting' if you really need to call it something."
Truth be told, the job of a good data scientist is that of an analyst, just one that knows how to execute slightly larger scale analyses.
Data science is to machine learning as "full stack" is to web development. i.e. a horrible buzzword.