What Does It Mean To Be a Data Scientist?
Nerval's Lobster writes What is a data scientist? "To be honest, I often don't tell people I am a data scientist," writes Simon Hughes, chief data scientist of the Dice Data Science Team. "It's not that I don't enjoy my job (I do!) nor that I'm not proud of what we've achieved (I am); it's just that most people don't really understand what you mean when you say you're a data scientist, or they assume it's some fancy jargon for something else." So how do Simon and his team define "data scientist"? In this blog posting, he breaks it down along several lines: solid programming skills, a scientific mindset, and the ability to use tools are just for starters. A data scientist also needs to be a polymath with strong math skills. "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory," he writes. "Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive." His points are good to keep in mind right now, with everybody throwing around buzzwords like "Big Data" without fully realizing what they mean.
Just like how 10 years ago, suddenly everyone was an "Architect" and before that you were a "Developer".
It means you get no women.
I can't believe Slashdot managed to land an interview with someone from Dice! Time to make some popcorn, sit back, and enjoy the fireworks!
Just think - telecoms are accumulating petabytes of data from call setup and cellular handoffs EVERY FEW MONTHS. And this data can be cross referenced with subscriber data and sliced and diced in almost infinitely many different ways.
If you're the one reciting stats like that with wide open eyes, you're a Data Scientist.
If you just shrug and say, "Yeah. So?" like everyone else, you're not.
It means you get to play with beakers and such. No self-respecting scientist doesn't have a lot of beakers, test tubes, and strange lab setups with tubes going in all directions.
You can't spell statistician, and anyway were too embarrassed to put it on your business card.
I think we should submit an Ask Slashdot asking data scientists precisely how they apply the scientific method in their day-to-day work. Or does having a "scientific mind" now qualify as being a scientist?
I have a scientific mindset. Will I be a pornography scientist later tonight? Am I a trolling scientist now?
It means you opted for the Blue shirt instead of the Gold. :D
I do not fail; I succeed at finding out what does not work.
Absolutely nothing.
Without sociology skills (my blog) on a data science team, hypothesis formation and ability to model clients will suffer. It would seem particularly important for a people-focused company like Dice.com.
I'm sure there are good reasons to datamine and bad reasons as well. Some goals yield benefits to many while others are more selfish. The question is whether more good or more bad can be done, and whether the benefits outweigh the pitfalls. What are we willing to sacrifice? Are our desires important enough to risk the pitfalls? Do we think we can account for the pitfalls and protect ourselves against them, or are we just being arrogant and blindsiding ourselves?
Why am I asking you?
Twinstiq, game news
Errr... You claim to be a scientist and yet you say, "All good scientists are skeptics at heart; they require strong empirical evidence to be convinced about a theory."
Circular definition, circular argument. Also, false. Many scientists (like Darwin for example) form a theory and then look for empirical evidence to test that theory. Next time start that sentence with "In my opinion" and you get away with it. You didn't and you don't.
Reading your article, it says nothing. I would not hire you on the basis of what you have written here.
Pardon me if that seems rude, but it was, in my opinion, too superficial to ignore.
Oh! By the way, what you do has had a title for a generation. You are an analyst doing what analysts do. Analyse data.
"Likewise, as a data scientist, I've learned to be suspicious of models that are too accurate, or individual variables that are too predictive."
I know just how you feel!
One way around this problem is to round down to the next significance level and reduce it to a yes/no assessment.
For example, instead of reporting the actual significance, say "p<.05" and instead of citing the correlation as a number, say "we therefore reject the null hypothesis".
Works a peach, required in most journals, and reduces the workload of the reviewers.
I guess whatever journalistic ethics Slashdot used to have are out the window. No indication in the OP that Dice owns Slashdot. (I mean, sure most people know that, but when OSDN owned Slashdot at least all relationships were disclosed up front.)
What does it mean "to be a data scientist"? It means that you call yourself a data scientist, and that someone pays you to do things that either you, they, or both of you agree are "data scientist" types of things. If you're not getting paid, then I think it makes you an "amateur data scientist", a "data scientist in training", an "intern data scientist" or, my favorite, an "indentured data scientist." There may be other amazing terms to describe this phenomenon (the unpaid data scientist), but I believe I am missing them.
I could be a "data scientist", "programmer", "technical manager", "software engineer", "software architect", "pimp" or "software gangster." I prefer to call myself a "contractor" or sometimes "consultant" though. The last two tend to have the type of tax benefits I like, and don't really result in a customer specifying the time, place, and manner of my work to the same degree as if I used the term "employee."
The only person I've met whom I didn't feel like punching in the face for calling themselves a "data scientist" had a master's degree in statistics, was super good with relational databases, and was all right at programming (but not awesome). I do live in New Mexico, and we aren't exactly trendy, so I can imagine that a lot of people here who might be legitimate (not amateur) data scientists call themselves database administrators or programmers, since they aren't concerned with what Dice says they should be making as a "data scientist."
To me, this distinction has no use. That may be because I don't want to be a "data scientist" or spend time with them, despite working on analyzing large data sets and doing "data science" for paying customers.
I'm sure there are some good data scientists but most of the papers I've seen lately that are based on statistics or various data sets are extremely lazy. You have someone that just combs through data and then tries to make a novel association. Nearly always they just show correlation and never causation.
I think that is one of the bigger problems. Because you're not collecting the data or structuring the experiments that collect the data, you can't isolate anything from the data. All you can do is say "well, this might be happening"... which is often completely useless. A more useful thing they could do is find that correlation and then see if they actually have causation by doing a follow-up experiment or study that isolates a specific variable under controlled conditions.
That is, I think data scientists would be more useful if they used the initial analysis as a jumping-off point for doing an actual study. And I'm not especially interested in reading or even hearing about anything they've done until they've concluded that secondary study.
Absent that... it is lazy, boring, not interesting, and who cares.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... log in.
how about "blind-input technical author"?
Consider that a good scientist goes into a sea of data with no expectations (hence no bias) about what that data is going to reveal, and so has no incentive to cherry-pick. Even anomalies are data. Why are those anomalies there? Are they actually anomalies? Or are they indicators that the original hypothesis or the gathering method itself is flawed?
Me? I'm into highly technical writing, but not from a mechanical or electronic or programming field. I analyse human data: that which directly affects people on both an individual and a collective basis. This means I can scale my research from a single person to a cast of tens of thousands. The nature of that data varies, as does the purpose of the writing. And no, the pay isn't very good, but I enjoy what I do.
Political debates have me rolling my eyes so much I think I got optical whiplash. I should sue. - Foamy The Squirrel
It wasn't funny the first time this hoax was put out, and it's not fucking funny today. Fuck off and die in a fire.
I rock the house and sign the tits, and that's it!
Dark Reflection
I've always said that data scientist is just a buzzword for statistician. Another statistician called me on that one day, and said "No, a data scientist is a programmer." I'm sorry, but in this day and age, if you are a statistician who can't program, you're not a very good statistician.
Data science is to machine learning as "full stack" is to web development. i.e. a horrible buzzword.