Statistics Losing Ground To CS, Losing Image Among Students
theodp (442580) writes Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."
Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."
My margin of error is pretty high so things never really seem to turn out how I expect them to turn out.
As a statistician, you should know better than to make your point with a succession of anecdotes such as:
- A few years ago, for instance, I attended a talk by a machine learning specialist who had just earned her PhD at one of the very top CS Departments in the world. She had taken a Bayesian approach to the problem she worked on, and I asked her why she had chosen that specific prior distribution. She couldn’t answer – she had just blindly used what her thesis adviser had given her – and moreover, she was baffled as to why anyone would want to know why that prior was chosen.
- But there is no substitute for precise thinking, and in my experience, many (nominally) successful CS researchers in Stat do not have a solid understanding of the fundamentals underlying the problems they work on. For example, a recent paper in a top CS conference incorrectly stated that the logistic classification model cannot handle non-monotonic relations
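On the logistic example in the quoted passage: a logistic model is linear in its coefficients, not necessarily in the raw predictor, so adding a squared term is enough to express a non-monotonic relation. A minimal sketch in Python, where the function name and the coefficients b0, b1, b2 are hand-picked assumptions for illustration, not values fitted to any data:

```python
import math


def logistic_prob(x, b0=2.0, b1=0.0, b2=-1.0):
    """P(y = 1 | x) for a logistic model with a quadratic term.

    The default coefficients are illustrative assumptions, not estimates.
    """
    z = b0 + b1 * x + b2 * x * x
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # The probability rises toward x = 0 and falls again on the other side.
    for x in (-3, -1, 0, 1, 3):
        print(x, round(logistic_prob(x), 3))
```

With these coefficients the predicted probability peaks at x = 0 and drops off in both directions, which a model with only a linear term in x could not do.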
But there's only a 25% chance of that.
SJW's don't eliminate discrimination. They just expropriate it for themselves.
Yeah, I'd definitely switch AP Stat's computational vehicle to R. I might even take it straight to S or T if it could be done at reasonable cost.
Statistical analysis is now more complex, and statistics are better understood in science than a decade ago. There are a number of software packages and libraries that simplify and standardize techniques.
Correctly applying all of these requires subject matter expertise. You need to understand what you are analyzing. As a result, a pure statistician is not very useful: generic analysis can be performed by software, while in-depth analysis requires specific knowledge.
This is not unlike complaining that assembly coding is dying. Well, yes, we now have less need to code everything that way because we have better tools.
If there is a decline in quality it is probably because the quality wasn't needed and people don't want to pay for things they don't need.
Students don't want to waste their time learning something they aren't going to use when they go to work and in most cases bad or even incorrect statistics will be just as useful.
In the same way, most engineering doesn't require much math but rather just an understanding of how things work. When you have that understanding, the math doesn't get much more complicated than division.
It would be great if the students knew things better, but they are limited by time and money, so they will focus on the things that give the most reward for the time put in. If Prof. Matloff is worried he might want to consider teaching the students statistics for free.
Most notably psychology, economics, mathematics and beer brewing. In fact, most of the developments in stats have come about as a result of a need arising in a different discipline. Stats is inherently an applied discipline, so this is not unusual.
What is concerning is how many statistical tools, each with their own set of assumptions, have blossomed within the past few decades. There are so many stats now that stats can no longer be ancillary to other disciplines; it needs to be given its own space and statisticians need to be given respect for their unique expertise. There is simply too much knowledge in that domain for those in more theory-driven fields to be able to claim expertise in both the conceptual models of their fields and statistics.
Any time you see a summary on Slashdot with the word "statistics" repeated a few times, that observation has to apply somewhere.
I am a researcher in medical informatics, and statistics is a huge part of my job, though I am not a classically-trained statistician.
First, I would like to offer a stark contrast between two types of statistician: 1) statisticians of the old mold who are wedded to SAS and related tools and 2) research statisticians who employ modern methods such as Bayesian statistics and rather advanced calculus. The former tend to mold all problems into what is available in the canon of SAS routines, while the latter are capable of creating custom models that suit the problem at hand.
Then, there is a new breed of scientist -- the data scientist -- who tends to use black-box machine learning methods and the classical techniques, as programs such as SAS and R have "democratized" the field. I agree with the common gripe of many traditionally-trained statisticians who object that these "data scientists" tend not to understand the statistical background of these computer codes. In fact, it is easy to download R onto one's computer and start firing data through, with little regard for the merits of the model or its results. (Not all data scientists are like this, but I'm simply stating a general observation.)
Another problem with statistics is that it can be very confusing to understand just what things like p-values mean. A first course in statistics leaves many with a bad taste -- it is either terribly confusing or rather boring. In my opinion, this is because of traditional (frequentist) statistics, which originated with luminaries such as Fisher and Pearson.
The "action" today is in Bayesian statistics. This formulation allows for statistical concepts to be expressed in ways that (I believe) most people can understand. But executing Bayesian statistics mandates that one understand the underlying formulation of models; in general, they are not black-box methods. Furthermore, they can be quite computationally expensive for large data.
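To make the point about understanding the model concrete, here is a minimal sketch of the simplest Bayesian update I can think of, a Beta prior with binomial data, written in Python. The function names, the data (7 successes in 10 trials), and both priors are assumptions chosen purely for illustration; the only thing the sketch is meant to show is that the choice of prior visibly moves the answer, which is why one should be able to justify it.

```python
def beta_binomial_posterior(prior_a, prior_b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a Beta posterior."""
    post_a = prior_a + successes
    post_b = prior_b + (trials - successes)
    return post_a, post_b


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    successes, trials = 7, 10  # hypothetical data
    priors = {
        "flat Beta(1, 1)": (1, 1),
        "informative Beta(20, 20)": (20, 20),  # strong belief in a rate near 0.5
    }
    for name, (a, b) in priors.items():
        post_a, post_b = beta_binomial_posterior(a, b, successes, trials)
        print(name, "-> posterior mean", round(beta_mean(post_a, post_b), 3))
```

Same data, two priors, two different posterior means: the flat prior stays close to the observed rate, while the informative one pulls the estimate toward 0.5.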
Statistics is suffering from perceptions of being a button-pushing, boring profession. As has happened in many other fields (e.g. computational chemistry and CFD), computer programs have democratized the field so that those who have not had years of dedicated study and training can execute statistical models. In my experience, this can be a good thing, or a very bad thing. Another issue is that there is a significant build-up of half a century of code and protocols in both industry (think big business analysis) and government agencies (think FDA).
But modern statistics is actually a hot field. Provided that one understands the background, and is willing to go the extra mile to write custom code, the rewards are endless.
What the fuck does "AP" mean?
I'm dabbling on the "AP Central" website and others, but they all talk about AP courses, how to get a course labelled AP, "AP is your time well spent", but never give a definition of what AP is. It's ridiculous to use such a two-letter initialism and hide its meaning like it's a secret thing for "consumers" who buy higher end education in the US.
This is not a new phenomenon. Most stats grads get big checks from banks...
Most notably psychology, economics, mathematics and beer brewing. In fact, most of the developments in stats have come about as a result of a need arising in a different discipline. Stats is inherently an applied discipline, so this is not unusual.
That can be said for most math.
Pure mathematicians tend to only deal with what is already known. If you start to look at the greatest advances in math and the mathematicians behind them, you'll see that most of the time they came from another field and needed new math to solve their problem. Necessity is the mother of invention.
You can make money from writing your own statistical software, but it's not as easy to make money as an analyst I think.
In the end, we can all get jobs sweeping the floors of the AI farms.
If you get a degree in law, business management, or marketing you can now put statistics as an academic minor on your resume.
I think the problem is that statisticians have small, unconnected habitats and overly complex mating rituals.
-- Make America hate again!
Not true, there will ALWAYS be a need for statistics. What will politicians do if they can't lie about numbers? What will happen to all the global warming research if they can't use statistics to lie?
Efforts to make the field attractive to students have largely been unsuccessful."
You would think they would know which efforts work and which don't. I'm only being a bit sarcastic with the Subject line, but seriously they should be able to figure out what does and doesn't work.
Just another day in Paradise
While statisticians complain, I have to face the quite opposite situation of having my research field swamped by people doing statistical methods for everything. And in some cases this is not reasonable. Of course, many system designers dream of just throwing pieces of code together and still being able to prove that it works (with a certain probability), but getting meaningful probabilities still requires that certain independence hypotheses are respected, and in computers true stochastic independence is difficult to obtain...
I don't dare publish this unless I am anonymous, but I must state this observation:
We are always on the lookout for new statisticians in our medical group. About 95% of our applicants are Chinese females! I had asked one of our (Chinese) scientists about this, and he said that this is because of the proliferation of MS in statistics programs that are amenable to spouses who were interested in a profession that could be attained (and makes good bacon) while their husbands were working on advanced degrees in other fields.
There are eternal complaints about how some fields are full of old white men, and this tends to squeeze out others who do not fit this profile. As a result, those fields tend to lack innovation in thought and culture.
I believe it's possible that statistics is suffering a similar issue. Perhaps the field has become homogenized, and people just aren't interested in overcoming these cultural barriers. No one wants to be the minority, e.g. no one wants to be the lone woman in the field of a bunch of IT dudes. The opposite side of the same coin is that no one wants to be the lone white man in a room full of Chinese women. (Let's keep it clean.)
Every field needs diversity, which in turn fosters content and innovation. Statistics is no exception, and the field really needs to do something about this. I think their hands are tied, as fixing this problem would be so politically incorrect, that no one would dare.
True, politicians will always need statisticians to lie for them. Alternate solution to getting rid of stats, create a Hippocratic oath for statisticians. That way, if harm is done they have to commit ritual suicide or testify against their political sponsor.
Statistics is a dirty word today, even though modern science depends upon it. The public most commonly encounters them when they are lied about.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
As far as the general public is concerned:
When it's convenient, people use numbers, real or made up, in order to disprove the other side's point and prove their own...
When it's not convenient, all statistics become questionable ("ya, but most statistics are made up") in order to disprove the other side's point and prove their own...
The reality of the numbers doesn't matter. People just don't care about actual objective facts, they just want to back up their preconceived notions to spread their stupidity. It's just like how Americans approach science in general really.
The guy who said the election was rigged won the presidency with the second-most votes.
I'm not very trained in statistics, but I've read more than my fair share of academic computer science papers over the years.
Even with my limited training in statistics, I've known enough to be appalled by the errant statistical reasoning used. Or even not used. I.e., "We don't know how many times to run a program to get a 'valid' average running time, so we ran it three times. Here's the average: ..." The authors seemingly aren't just ignorant of how to get the answer; they often seem to have not thought through what questions they're trying to answer in the first place with their measurements and resulting statistics.
I think a few problems come into play here:
Despite CS majors thinking we're so smart about mathematical issues, I think this might be one area where that confidence is delusional. I suspect most psychology majors who paid attention in their Experimental Design courses are more capable in the appropriate mathematics than are most CS majors.
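To put a number on the "we ran it three times" habit, here is a minimal Python sketch that reports a 95% Student-t confidence interval alongside the mean. The function name and the three running times are hypothetical, the critical values are the standard two-sided 95% t-table entries, and the interval assumes independent, roughly normal runs, which real timing data often is not.

```python
import math
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_with_ci(samples):
    """Return (mean, half_width) of a 95% t-interval for the mean.

    Assumes independent, roughly normal measurements; a sketch only.
    """
    n = len(samples)
    if not 2 <= n <= 10:
        raise ValueError("this sketch only tabulates 2 to 10 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


if __name__ == "__main__":
    runs = [1.0, 2.0, 3.0]  # hypothetical running times in seconds
    mean, half = mean_with_ci(runs)
    print(f"mean = {mean:.2f} s, 95% CI = +/- {half:.2f} s")
```

For these made-up numbers the half-width of the interval is larger than the mean itself, which is the kind of thing a bare three-run average hides.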
There's an app for that!
Statistics as taught in its current form (just like economics) came about in the 50's; both were designed by out-of-work mathematicians. Now, if a Math PhD was unemployed in the 50's, with the start of the space program and the massive military research programs, how good were these guys? When I took Stat in college I was amazed. Take one set of data and produce two diametrically opposed answers and have them both correct? Sounds like rumor, gossip, and BS to me, not science.
No wonder there are lies, damn lies, and statistics!
Professional Politicians are not the solution, they ARE the problem.
IIRC isn't that how calculus came about?
"There are three types of lies: lies, damn lies, and statistics." -attribution disputed
http://en.wikipedia.org/wiki/L...
10: PRINT "Everything old is new again."
20: GOTO 10
Quite the opposite is the case. Unless we are talking about experiments with terabytes of data, most software packages are complete overkill anyway; you could do your statistics with a pocket calculator instead. The problem is the conceptual work. Most institutes and individual scientists would be much better off if they employed a well-trained full-time statistician. Provided they were interested in correct and robust results rather than getting one more pilot study published as soon as possible (which will in turn be based on an insignificantly small non-random sample using an inadequate model).
Is this very poorly written article about: 1) students choosing to pursue a career path in computer science rather than statistics... or... 2) CS people doing poor-quality statistics work... or... 3) banning the Advanced Placement "Statistics" class because students are relying too much on their "pocket calculators"? We get three-articles-in-one to talk about here. At least they are all loosely related to something called "statistics."
So who cares?
When the number of people doing statistics gets low enough, salaries will go up and more people will start studying them.
There's also H1Bs to solve the problem.
The number of Liberal Arts majors is declining as well; if that happens where will Starbucks get their baristas?
There are four types of lies:
Lies, Damned Lies, Statistics and CS Simulations.
Participate in the C++ meetings, code up STL stats facilities, and make sure they get into the next C++ standard.
How can you compare a kid running a program three times to obtain a mean to the calculus required for even the most trivial statistical problems?
In rank order:
lies damn lies statistics benchmarks {hated corporation} benchmarks
Statistics done right is hard and boring. People prefer hacking to doing hard and boring stuff.
"Long run is a misleading guide to current affairs. In the long run we are all dead." (John Maynard Keynes)
I work with a couple of very good statisticians. What they do is a mystery to me, but one thing I can say for sure - a good programmer or DBA will find work much more easily than a good statistician. In large part because PHBs have no clue why they need someone with more than two semesters of probability in almost every application.
Another problem with students going into statistics in the US is that virtually all of the instructors don't speak very good English. To this day I want to say things like "probabirity", "rotatation about the ashes", and the one that confused everyone in the class - "ashama" (eventually translated to axiom).
No, neither have I. They are always dreary and depressing. Nobody wants to be a statistician.
Statistics as a subset of CS isn't unreasonable given that nearly all statistics will be calculated by software.
I've decided to stop wasting my time responding to AC trolls/sockpuppets... so if you want a response from me... login.
Looking at his publications page, he hasn't published anything of note in years: http://heather.cs.ucdavis.edu/matloff/public_html/vita.html
Reading the blog post, it seems that he is woefully ignorant of the state of modern machine learning research.
Never mind the actual arguments he makes (which others addressed), I'm not really sure this guy has much authority on this matter, either in CS *or* in statistics.
The only stats I need to understand and apply are: what is the probability of me winning the Powerball lottery? How many tickets can I afford to buy per attempt? And knowing that there is a high probability that the gov will show up to collect a large share of the winnings...
It's a shame it has such a reputation for being boring, and it is a shame that it seems to be rarely taught in an engaging way.
Statistics is the first artificial intelligence. It formalises what we know when we 'know'. It is fundamental.
It's also fairly hard to do right. But many worthwhile things are hard.
I know a lot of people who get CS gigs after school. It pays their bills. They do well.
The stats people I know are really, really rich. And there are a lot of them.
That's in Raleigh.
If you like the field of statistics it seems a better long-term bet than IT. The "laws" of math are not going to change in 40 years, whereas in IT the languages, GUIs, frameworks, and Paradigm Fad of the Day will change...several times. Plus it won't give you carpal tunnel (unless you can't trick a grunt into data entry). You are expected to know the domain (industry), so outsourcing is not as likely either.
Software may pay more in the short term, but career-wise, stats seems more stable.
Table-ized A.I.
How can you compare a kid running a program three times to obtain a mean to the calculus required for even the most trivial statistical problems?
That was his whole point. The fact that many people think that the one is a substitute for the other indicates that there is a big problem in CS.
That said, trivial statistical problems usually don't require calculus to solve. I fully appreciate that commonly used statistical functions are rooted in calculus, and you need to understand it at some level to apply them properly. However, the mechanics of solving the problems usually do not require solving integrals/etc. To use a car analogy - I can use a speedometer without knowing what a derivative is.
How to lie with data structures
Statistics is the discipline that studies the collection, organization, analysis and interpretation of data, and it includes the design of experiments and planning of research. It uses Mathematics and Probability Theory to deal with randomness and measurements under uncertainty, that is, with observations, so as to test or inspire scientific models. As such, it is an integral part of the Scientific Method.
Computer Science, on the other hand, relates to the acquisition, representation, processing, storing, communicating and accessing of data and information. It has to do with Mathematics, problem-solving and Technology. As such, it is an enormous aid to the achievement of scientific endeavors, perhaps even an indispensable one, but not a part of Science per se. Indeed, one might say that Science happens before the use of CS (hypothesizing, definition of variables, design of experiments, etc.) and after it (interpretation of results, model building, theorizing). The only portion of the scientific activity that involves CS more directly is data analysis, but the role here is basically one of mechanization and automation, not one of knowledge production. Indeed, most CS experts have little or no understanding of the scientific method.
Data Mining, Neural Networks, Genetic Algorithms, Machine Learning and Intelligent Computing in general are excellent methods and techniques for problem-solving in specific and well-defined areas where one already has a significant overall understanding of the phenomena involved, but are a poor substitute for scientific modeling. Data is NOT fact, but rather the result of a series of arbitrary decisions as to what and how to observe, as well as how to record such observations, with such decisions having enormous consequences for the information, knowledge, and meaning one can extract from it. Given some randomness and enough data of the 'right' kind, one can 'prove' basically anything. This is a significant enough problem when it comes to generating predictions, but becomes a devastating one when the goal is to guide interventions, planning and decision-making in general.
Much of the hype around 'Big Data' and related issues, as well as their failure to deliver the promised results in many cases, is the direct consequence of an ignorance of the aforementioned facts. Most Big Data solutions are based on the false notion that one can forget scientific models and just find any and all solutions by simply processing the right amount of data in the right way, so that one can build a bridge from the ICT all the way up to organization success solely by the judicious use of Intelligent Computing. Just to be clear: one can't. It is logically impossible. Period.
This fallacy is the most likely cause for the "loss of ground" the OP mentions. Big Data is strongly advertised to be the answer to everything, and many individuals and institutions are falling for this, with billions of dollars following them.
The solution to all of this? Ample dissemination of knowledge and education regarding science would be best, but it might take a very costly failure of the promises of Big Data to actually do the trick.
P.S.
R is great, with all the advantages of open-source development (flexibility, free use, crowdsourcing), but it has the very fatal flaw of being a command line approach. CS students and professionals may embrace it easily, but the vast majority of us who really make use of Statistics to produce knowledge need something with a 100% graphical user interface through which we can promptly forget the informatics and focus on the actual knowledge issues. Producing a version of R that frees users from any and all aspects of programming and command lines (and I mean ANY and ALL aspects) would be a very worthy contribution of CS to Science in general.
I read on Slashdot a long time ago that there is a comic book in Japan that teaches Statistics. I really would like to buy a copy in Spanish or at least in English.
Any link on where to buy it is welcome.