Statistics Losing Ground To CS, Losing Image Among Students
theodp (442580) writes Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."
Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."
My margin of error is pretty high so things never really seem to turn out how I expect them to turn out.
As a statistician, you should know better than to make your point with a succession of anecdotes such as:
- A few years ago, for instance, I attended a talk by a machine learning specialist who had just earned her PhD at one of the very top CS Departments in the world. She had taken a Bayesian approach to the problem she worked on, and I asked her why she had chosen that specific prior distribution. She couldn’t answer – she had just blindly used what her thesis adviser had given her – and moreover, she was baffled as to why anyone would want to know why that prior was chosen.
- But there is no substitute for precise thinking, and in my experience, many (nominally) successful CS researchers in Stat do not have a solid understanding of the fundamentals underlying the problems they work on. For example, a recent paper in a top CS conference incorrectly stated that the logistic classification model cannot handle non-monotonic relations
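The logistic-model claim quoted above can be checked with a small sketch. It assumes the claim is about the relation between a single predictor x and the probability of the positive class, and it uses made-up coefficients (B0, B1, B2 are hypothetical, chosen only for illustration). Adding x squared as a second predictor keeps the model a plain logistic model, linear in its parameters, yet the fitted probability rises and then falls in x.

```python
import math


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, not estimated from any data set.
B0, B1, B2 = -1.0, 4.0, -1.0


def prob(x: float) -> float:
    """P(Y = 1 | x) under a logistic model with predictors x and x**2."""
    return logistic(B0 + B1 * x + B2 * x * x)


# The probability increases up to x = 2 and decreases afterwards,
# so the relation between x and the response is non-monotonic.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x = {x:.0f}  P(Y=1|x) = {prob(x):.3f}")
```

In practice the coefficients would be estimated from data (for example with R's glm, which the summary mentions as a teaching vehicle); the sketch only shows that the model class itself permits a non-monotonic curve.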
But there's only a 25% chance of that.
SJW's don't eliminate discrimination. They just expropriate it for themselves.
Statistical analysis is now more complex, and statistics is better understood in science than it was a decade ago. There are a number of software packages and libraries that simplify and standardize techniques.
Correctly applying all of these requires subject matter expertise. You need to understand what you are analyzing. As a result, a pure statistician is not very useful: generic analysis can be performed by software, while in-depth analysis requires specific knowledge.
This is not unlike complaining that assembly coding is dying. Well, yes, we now have less need to code everything that way because we have better tools.
Most notably psychology, economics, mathematics and beer brewing. In fact, most of the developments in stats have come about as a result of a need arising in a different discipline. Stats is inherently an applied discipline, so this is not unusual.
What is concerning is how many statistical tools, each with its own set of assumptions, have sprung up within the past few decades. There are so many stats now that stats can no longer be an ancillary to other disciplines; it needs to be given its own space and statisticians need to be given respect for their unique expertise. There is simply too much knowledge in that domain for those in more theory-driven fields to be able to claim expertise in both the conceptual models of their fields and statistics.
I am a researcher in medical informatics, and statistics is a huge part of my job, though I am not a classically-trained statistician.
First, I would like to offer a stark contrast between two types of statistician: 1) statisticians of the old mold who are wedded to SAS and related tools and 2) research statisticians who employ modern methods such as Bayesian statistics and rather advanced calculus. The former tend to mold all problems into what is available in the canon of SAS routines, while the latter are capable of creating custom models that suit the problem at hand.
Then, there is a new breed of scientist -- the data scientist -- who tends to use black-box machine learning methods and the classical techniques, as programs such as SAS and R have "democratized" the field. I agree with the common gripe of many traditionally-trained statisticians who object that these "data scientists" tend not to understand the statistical background of these computer codes. In fact, it is easy to download R onto one's computer and start firing data through, with little regard for the merits of the model or its results. (Not all data scientists are like this, but I'm simply stating a general observation.)
Another problem with statistics is that it can be very confusing to understand just what things like p-values mean. A first course in statistics leaves many with a bad taste, finding it either terribly confusing or rather boring. In my opinion, this is because of traditional (frequentist) statistics, which originated with luminaries such as Fisher and Pearson.
The "action" today is in Bayesian statistics. This formulation allows for statistical concepts to be expressed in ways that (I believe) most people can understand. But executing Bayesian statistics mandates that one understand the underlying formulation of models; in general, they are not black-box methods. Furthermore, they can be quite computationally expensive for large data.
Statistics is suffering from perceptions of being a button-pushing, boring profession. As has happened in many other fields (e.g. computational chemistry and CFD), computer programs have democratized the field so that those who have not had years of dedicated study and training can execute statistical models. In my experience, this can be a good thing, or a very bad thing. Another issue is that there is a significant build-up of half a century of code and protocols in both industry (think big business analysis) and government agencies (think FDA).
But modern statistics is actually a hot field. Provided that one understands the background, and is willing to go the extra mile to write custom code, the rewards are endless.
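The point above about Bayesian models not being black boxes (and the earlier anecdote about an unexplained prior) can be made concrete with a minimal sketch. It assumes the simplest conjugate setup, a Beta(a, b) prior on a success probability with binomial data, where the posterior mean has the closed form (a + successes) / (a + b + trials). The data (3 successes in 10 trials) and both priors are hypothetical.

```python
def posterior_mean(a: float, b: float, successes: int, trials: int) -> float:
    """Posterior mean of a success probability under a Beta(a, b) prior
    after observing `successes` out of `trials` binomial trials."""
    return (a + successes) / (a + b + trials)


# Hypothetical data: 3 successes in 10 trials.
successes, trials = 3, 10

# Two hypothetical priors: a flat one and a strongly optimistic one.
priors = {"flat Beta(1, 1)": (1.0, 1.0), "optimistic Beta(20, 2)": (20.0, 2.0)}

for name, (a, b) in priors.items():
    mean = posterior_mean(a, b, successes, trials)
    print(f"{name}: posterior mean = {mean:.3f}")
```

With this little data the two priors lead to very different conclusions, which is why "why that prior?" is a fair question to ask of any Bayesian analysis.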
What the fuck does "AP" mean?
I'm dabbling on the "AP Central" website and others, but they all talk about AP courses, how to get a course labelled AP, "AP is your time well spent" but never give a definition of what AP is. It's ridiculous to use such a two-letter initialism and hide its meaning like it's a secret thing for "consumers" who buy higher end education in the US.
But what about Q?
[John]
Shit better not happen!
I think the problem is that statisticians have small, unconnected habitats and overly complex mating rituals.
-- Make America hate again!
Not true, there will ALWAYS be a need for statistics. What will politicians do if they can't lie about numbers? What will happen to all the global warming research if they can't use statistics to lie?
Efforts to make the field attractive to students have largely been unsuccessful."
You would think they would know which efforts work and which don't. I'm only being a bit sarcastic with the Subject line, but seriously they should be able to figure out what does and doesn't work.
Just another day in Paradise
I don't dare publish this unless I am anonymous, but I must state this observation:
We are always on the lookout for new statisticians in our medical group. About 95% of our applicants are Chinese females! I had asked one of our (Chinese) scientists about this, and he said that this is because of the proliferation of MS in statistics programs that are amenable to spouses who were interested in a profession that could be attained (and makes good bacon) while their husbands were working on advanced degrees in other fields.
There are eternal complaints about how some fields are full of old white men, and this tends to squeeze out others who do not fit this profile. As a result, those fields tend to lack innovation in thought and culture.
I believe it's possible that statistics is suffering a similar issue. Perhaps the field has become homogenized, and people just aren't interested in overcoming these cultural barriers. No one wants to be the minority, e.g. no one wants to be the lone woman in the field of a bunch of IT dudes. The opposite side of the same coin is that no one wants to be the lone white man in a room full of Chinese women. (Let's keep it clean.)
Every field needs diversity, which in turn fosters content and innovation. Statistics is no exception, and the field really needs to do something about this. I think their hands are tied, as fixing this problem would be so politically incorrect, that no one would dare.
Statistics is a dirty word today, even though modern science depends upon it. The public most commonly encounters statistics when they are being lied about.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
As far as the general public is concerned:
When it's convenient, people use numbers, real or made up, in order to disprove the other side's point and prove their own...
When it's not convenient, all statistics become questionable ("ya, but most statistics are made up") in order to disprove the other side's point and prove their own...
The reality of the numbers doesn't matter. People just don't care about actual objective facts, they just want to back up their preconceived notions to spread their stupidity. It's just like how Americans approach science in general really.
The guy who said the election was rigged won the presidency with the second-most votes.
I'm not very trained in statistics, but I've read more than my fair share of academic computer science papers over the years.
Even with my limited training in statistics, I've known enough to be appalled by the errant statistical reasoning used. Or even not used. I.e., "We don't know how many times to run a program to get a 'valid' average running time, so we ran it three times. Here's the average: ..." The authors seemingly aren't just ignorant of how to get the answer; they often seem to have not thought through what questions they're trying to answer in the first place with their measurements and resulting statistics.
I think a few problems come into play here:
Despite CS majors thinking we're so smart about mathematical issues, I think this might be one area where that confidence is delusional. I suspect most psychology majors who paid attention in their Experimental Design courses are more capable in the appropriate mathematics than are most CS majors.
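The "we ran it three times" complaint above can be illustrated with a short sketch of what an average running time should come with: an interval that says how uncertain the average is. It assumes the runs are independent and roughly normally distributed, uses standard two-sided 95% critical values of Student's t, and the running times themselves are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_975 = {2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}


def mean_with_ci(samples):
    """Return (mean, half-width) of a 95% confidence interval for the mean.

    Only sample sizes whose degrees of freedom appear in T_975 are supported.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = T_975[n - 1] * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Hypothetical running times in seconds from three runs.
runs = [1.0, 1.2, 1.1]
mean, half_width = mean_with_ci(runs)
print(f"mean = {mean:.2f} s, 95% CI half-width = {half_width:.2f} s")
```

With only three runs the critical value is 4.303, so the interval is wide relative to the spread of the data. Reporting the bare average hides that, and it leaves open the prior question of what the measurement is supposed to answer.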
There's an app for that!
We'll have to ask M.
Hmm ... John de Lancie in the next Bond film as the gadgeteer for the CIA, with whom MI6 partners on a mission of importance to both agencies? I'd see that.
"There are three types of lies: lies, damn lies, and statistics." -attribution disputed
http://en.wikipedia.org/wiki/L...
10: PRINT "Everything old is new again."
20: GOTO 10
Quite the opposite is the case. Unless we are talking about experiments with terabytes of data, most software packages are complete overkill anyway; you could do your statistics with a pocket calculator instead. The problem is the conceptual work. Most institutes and individual scientists would be much better off if they employed a well-trained full-time statistician. Provided they were interested in correct and robust results rather than getting one more pilot study published as soon as possible (which will in turn be based on an insignificantly small non-random sample using an inadequate model).
Is this very poorly written article about: 1) students choosing to pursue a career path in computer science rather than statistics... or... 2) CS people doing poor-quality statistics work... or... 3) banning the Advanced Placement "Statistics" class because students are relying too much on their "pocket calculators." We get three-articles-in-one to talk about here. At least they are all loosely related to something called "statistics."
You missed the point of the lesson. The point was that you didn't have enough data to demonstrate that your model was valid. That's all.
[FUCK BETA]
Take one set of data and produce two diametrically opposed answers and have them both correct? Sounds like rumor, gossip, and BS to me, not science.
No wonder there are lies, damn lies, and statistics!
Somebody missed the lecture on assumptions.
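How one data set can appear to support two opposite answers is the classic Simpson's paradox, and the difference lies in the assumptions behind each analysis. A minimal sketch follows. The counts follow the frequently quoted kidney-stone treatment example and should be treated as illustrative; the names `counts`, `rate`, and `pooled_rate` are introduced here only for the sketch.

```python
# (successes, patients) per treatment and stone size; illustrative counts.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(treatment: str, stratum: str) -> float:
    """Success rate of one treatment within one stratum."""
    successes, total = counts[treatment][stratum]
    return successes / total


def pooled_rate(treatment: str) -> float:
    """Success rate of one treatment with all strata lumped together."""
    successes = sum(s for s, _ in counts[treatment].values())
    total = sum(t for _, t in counts[treatment].values())
    return successes / total


for stratum in ("small", "large"):
    print(f"{stratum}: A = {rate('A', stratum):.3f}, B = {rate('B', stratum):.3f}")
print(f"pooled: A = {pooled_rate('A'):.3f}, B = {pooled_rate('B'):.3f}")
```

Treatment A has the higher success rate within each stone size, yet B looks better once the sizes are pooled, because A was given far more often in the harder cases. Both calculations are arithmetically correct; which one answers the question depends on whether stone size is assumed to influence both the choice of treatment and the outcome.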
Faster! Faster! Faster would be better!
Statistics done right is hard and boring. People prefer hacking to doing hard and boring stuff.
"Long run is a misleading guide to current affairs. In the long run we are all dead." (John Maynard Keynes)
I work with a couple of very good statisticians. What they do is a mystery to me, but one thing I can say for sure - a good programmer or DBA will find work much more easily than a good statistician. In large part because PHBs have no clue why they need someone with more than two semesters of probability in almost every application.
Another problem with students going into statistics in the US is that virtually all of the instructors don't speak very good English. To this day I want to say things like "probabirity", "rotatation about the ashes", and the one that confused everyone in the class - "ashama" (eventually translated to axiom).
Statistics as a subset of CS isn't unreasonable given that nearly all statistics will be calculated by software.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
It's a shame it has such a reputation for being boring, and it is a shame that it seems to be rarely taught in an engaging way.
Statistics is the first artificial intelligence. It formalises what we know when we 'know'. It is fundamental.
It's also fairly hard to do right. But many worthwhile things are hard.
I know a lot of people who get CS gigs after school. It pays their bills. They do well.
The stats people I know are really, really rich. And there are a lot of them.
That's in Raleigh.
If you like the field of statistics it seems a better long-term bet than IT. The "laws" of math are not going to change in 40 years, whereas in IT the languages, GUIs, frameworks, and Paradigm Fad of the Day will change...several times. Plus it won't give you carpal tunnel (unless you can't trick a grunt into data entry). You are expected to know the domain (industry), so outsourcing is not as likely either.
Software may pay more in the short term, but career-wise, stats seems more stable.
Table-ized A.I.
How can you compare a kid running a program three times to obtain a mean to the calculus required for even the most trivial statistical problems?
That was his whole point. The fact that many people think that the one is a substitute for the other indicates that there is a big problem in CS.
That said, trivial statistical problems usually don't require calculus to solve. I fully appreciate that commonly used statistical functions are rooted in calculus, and you need to understand it at some level to apply them properly. However, the mechanics of solving the problems usually do not require solving integrals/etc. To use a car analogy - I can use a speedometer without knowing what a derivative is.
How to lie with data structures
I read on Slashdot a long time ago that there is a comic book in Japan that teaches Statistics. I really would like to buy a copy in Spanish or at least in English.
Any link on where to buy it is welcome.