Statistics Losing Ground To CS, Losing Image Among Students
theodp (442580) writes Unless some things change, UC Davis Prof. Norman Matloff worries that the Statistician could be added to the endangered species list. "The American Statistical Association (ASA) leadership, and many in Statistics academia," writes Matloff, "have been undergoing a period of angst the last few years. They worry that the field of Statistics is headed for a future of reduced national influence and importance, with the feeling that: [1] The field is to a large extent being usurped by other disciplines, notably Computer Science (CS). [2] Efforts to make the field attractive to students have largely been unsuccessful."
Matloff, who has a foot in both the Statistics and CS camps, says, "The problem is not that CS people are doing Statistics, but rather that they are doing it poorly. Generally the quality of CS work in Stat is weak. It is not a problem of quality of the researchers themselves; indeed, many of them are very highly talented. Instead, there are a number of systemic reasons for this, structural problems with the CS research 'business model'." So, can Statistics be made more attractive to students? "Here is something that actually can be fixed reasonably simply," suggests no-fan-of-TI-83-pocket-calculators-as-a-computational-vehicle Matloff. "If I had my druthers, I would simply ban AP Stat, and actually, I am one of those people who would do away with the entire AP program. Obviously, there are too many deeply entrenched interests for this to happen, but one thing that can be done for AP Stat is to switch its computational vehicle to R."
As a statistician, you should know better than to make your point with a succession of anecdotes such as:
- A few years ago, for instance, I attended a talk by a machine learning specialist who had just earned her PhD at one of the very top CS Departments in the world. She had taken a Bayesian approach to the problem she worked on, and I asked her why she had chosen that specific prior distribution. She couldn’t answer – she had just blindly used what her thesis adviser had given her – and moreover, she was baffled as to why anyone would want to know why that prior was chosen.
- But there is no substitute for precise thinking, and in my experience, many (nominally) successful CS researchers in Stat do not have a solid understanding of the fundamentals underlying the problems they work on. For example, a recent paper in a top CS conference incorrectly stated that the logistic classification model cannot handle non-monotonic relations.
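On the logistic claim in the second quoted anecdote: here is a minimal sketch (Python, standard library only) of how a logistic model can capture a non-monotonic relation once a squared term is added to the features. The toy data, the hand-rolled gradient-ascent fitter, and the step count and learning rate are all assumptions made up for illustration; none of it comes from the paper or from Matloff's post.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, feats):
    # Predicted probability that y = 1 for one feature vector.
    return sigmoid(sum(wi * fi for wi, fi in zip(w, feats)))

def fit_logistic(rows, labels, steps=5000, lr=0.1):
    """Fit logistic regression by batch gradient ascent on the mean log-likelihood."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for feats, y in zip(rows, labels):
            err = y - predict(w, feats)
            for j, fj in enumerate(feats):
                grad[j] += err * fj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w

# Toy data (made up): y = 1 only when x is near zero, so P(y = 1) rises
# and then falls as x increases, i.e. it is non-monotonic in x.
xs = [i / 10.0 for i in range(-30, 31)]
ys = [1 if abs(x) < 1.0 else 0 for x in xs]

# Same model family, two feature sets: (1, x) versus (1, x, x^2).
linear = fit_logistic([[1.0, x] for x in xs], ys)
quadratic = fit_logistic([[1.0, x, x * x] for x in xs], ys)

def accuracy(w, make_feats):
    hits = sum((predict(w, make_feats(x)) > 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)

acc_linear = accuracy(linear, lambda x: [1.0, x])
acc_quadratic = accuracy(quadratic, lambda x: [1.0, x, x * x])
print(f"accuracy with (1, x):      {acc_linear:.3f}")
print(f"accuracy with (1, x, x^2): {acc_quadratic:.3f}")
```

The model is still linear in its coefficients; only the feature set changed. This is the usual polynomial-term trick, not a different classifier.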
But there's only a 25% chance of that.
SJW's don't eliminate discrimination. They just expropriate it for themselves.
Statistical analysis is now more complex, and statistics are better understood in science than a decade ago. There are a number of software packages and libraries that simplify and standardize techniques.
Correctly applying all of these requires subject-matter expertise. You need to understand what you are analyzing. As a result, a pure statistician is not very useful: generic analysis can be performed by software, while in-depth analysis requires specific knowledge.
This is not unlike complaining that assembly coding is dying. Well, yes, we now have less need to code everything that way because we have better tools.
Most notably psychology, economics, mathematics and beer brewing. In fact, most of the developments in stats have come about as a result of a need arising in a different discipline. Stats is inherently an applied discipline, so this is not unusual.
What is concerning is how many statistical tools, each with its own set of assumptions, have sprung up within the past few decades. There are so many stats now that stats can no longer be an ancillary to other disciplines; it needs to be given its own space, and statisticians need to be given respect for their unique expertise. There is simply too much knowledge in that domain for those in more theory-driven fields to be able to claim expertise in both the conceptual models of their fields and statistics.
I am a researcher in medical informatics, and statistics is a huge part of my job, though I am not a classically-trained statistician.
First, I would like to offer a stark contrast between two types of statistician: 1) statisticians of the old mold who are wedded to SAS and related tools and 2) research statisticians who employ modern methods such as Bayesian statistics and rather advanced calculus. The former tend to mold all problems into what is available in the canon of SAS routines, while the latter are capable of creating custom models that suit the problem at hand.
Then, there is a new breed of scientist -- the data scientist -- who tends to use black-box machine learning methods and the classical techniques, as programs such as SAS and R have "democratized" the field. I agree with the common gripe of many traditionally-trained statisticians who object that these "data scientists" tend not to understand the statistical background of these computer codes. In fact, it is easy to download R onto one's computer and start firing data through, with little regard for the merits of the model or its results. (Not all data scientists are like this, but I'm simply stating a general observation.)
Another problem with statistics is that it can be very confusing to understand just what things like p-values mean. A first course in statistics leaves many with a bad taste, finding it either terribly confusing or rather boring. In my opinion, this is because of traditional (frequentist) statistics, which originated with luminaries such as Fisher and Pearson.
The "action" today is in Bayesian statistics. This formulation allows for statistical concepts to be expressed in ways that (I believe) most people can understand. But executing Bayesian statistics mandates that one understand the underlying formulation of models; in general, they are not black-box methods. Furthermore, they can be quite computationally expensive for large data.
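To make the "understand the underlying formulation" point concrete, here is a small sketch of the simplest Bayesian model I know of, a Beta prior with binomial data, showing how much the choice of prior can move the answer when the data set is small. The data (7 successes in 10 trials) and the three priors are hypothetical, picked only for illustration.

```python
def beta_binomial_posterior(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    # Mean of a Beta(alpha, beta) distribution.
    return alpha / (alpha + beta)

# Hypothetical data: 7 successes and 3 failures in 10 trials.
successes, failures = 7, 3

# Three hypothetical priors on the success probability.
priors = {
    "flat Beta(1, 1)": (1.0, 1.0),
    "skeptical Beta(2, 8)": (2.0, 8.0),
    "optimistic Beta(8, 2)": (8.0, 2.0),
}

posterior_means = {}
for name, (a, b) in priors.items():
    post_a, post_b = beta_binomial_posterior(a, b, successes, failures)
    posterior_means[name] = beta_mean(post_a, post_b)
    print(f"{name}: posterior mean = {posterior_means[name]:.3f}")
```

Same data, three different posterior means, which is why "why that prior?" is a fair question to ask of any Bayesian analysis. With more data the three answers would move closer together.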
Statistics is suffering from perceptions of being a button-pushing, boring profession. As has happened in many other fields (e.g. computational chemistry and CFD), computer programs have democratized the field so that those who have not had years of dedicated study and training can execute statistical models. In my experience, this can be a good thing, or a very bad thing. Another issue is that there is a significant build-up of half a century of code and protocols in both industry (think big business analysis) and government agencies (think FDA).
But modern statistics is actually a hot field. Provided that one understands the background, and is willing to go the extra mile to write custom code, the rewards are endless.
But what about Q?
[John]
Shit better not happen!
I think the problem is that statisticians have small, unconnected habitats and overly complex mating rituals.
-- Make America hate again!
It stands for "Advanced Placement." They're college-level high school courses. At the end of the year, you take the advanced placement exam, and depending on your scores and the college you attend, you can get college credits for them.
I think getting rid of AP is a stupendously short-sighted idea. Having students take more advanced courses earlier is a great idea. If there's reason to believe the courses aren't actually as demanding as their college equivalents (and I don't think there is, based on my experience taking AP Calculus in high school and looking at what people taking Calculus in college were seeing; we covered the same material, and if anything my high school class covered more), then you can make an argument for making the tests more challenging or adding to the requirements of those courses. Getting rid of it is just an attempt to waste students' time and extract more money from them by forcing them to take more university courses.
As far as the general public is concerned:
When it's convenient, people use numbers, real or made up, in order to disprove the other side's point and prove their own...
When it's not convenient, all statistics become questionable ("ya, but most statistics are made up") in order to disprove the other side's point and prove their own...
The reality of the numbers doesn't matter. People just don't care about actual objective facts, they just want to back up their preconceived notions to spread their stupidity. It's just like how Americans approach science in general really.
The guy who said the election was rigged won the presidency with the second-most votes.
I'm not very trained in statistics, but I've read more than my fair share of academic computer science papers over the years.
Even with my limited training in statistics, I've known enough to be appalled by the errant statistical reasoning used. Or even not used. For example, "We don't know how many times to run a program to get a 'valid' average running time, so we ran it three times. Here's the average: ..." The authors seemingly aren't just ignorant of how to get the answer; they often seem to have not thought through what questions they're trying to answer in the first place with their measurements and resulting statistics.
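For what it's worth, here is a rough sketch of what reporting more than a bare three-run average could look like: a t-based 95% confidence interval for the mean running time. The timings are made up, and the interval assumes the runs are independent and roughly normally distributed, which real benchmark timings often are not, so treat it as an illustration rather than a recipe.

```python
import math
import statistics

def mean_with_ci(samples, t_crit):
    """Return (mean, half_width) of a t-based confidence interval for the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, t_crit * sem

# Hypothetical running times in seconds (made up for illustration).
three_runs = [1.02, 1.10, 0.95]
ten_runs = [1.02, 1.10, 0.95, 1.05, 0.99, 1.08, 1.01, 0.97, 1.04, 1.03]

# Two-sided 95% critical values of Student's t:
# 4.303 for 2 degrees of freedom, 2.262 for 9 degrees of freedom.
mean3, half3 = mean_with_ci(three_runs, 4.303)
mean10, half10 = mean_with_ci(ten_runs, 2.262)

print(f"3 runs:  {mean3:.3f} s +/- {half3:.3f} s")
print(f"10 runs: {mean10:.3f} s +/- {half10:.3f} s")
```

With only three runs the critical value alone is 4.303, so the interval is wide. Whether that width is acceptable depends on the question the measurement is supposed to answer, which is exactly what those papers tend to skip.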
I think a few problems come into play here:
Despite CS majors thinking we're so smart about mathematical issues, I think this might be one area where that confidence is delusional. I suspect most psychology majors who paid attention in their Experimental Design courses are more capable in the appropriate mathematics than are most CS majors.
We'll have to ask M.
Quite the opposite is the case. Unless we are talking about experiments with terabytes of data, most software packages are complete overkill anyway; you could do your statistics with a pocket calculator instead. The problem is the conceptual work. Most institutes and individual scientists would be much better off if they employed a well-trained full-time statistician. Provided they were interested in correct and robust results rather than getting one more pilot study published as soon as possible (which will in turn be based on an insignificantly small non-random sample using an inadequate model).
You missed the point of the lesson. The point was that you didn't have enough data to demonstrate that your model was valid. That's all.
[FUCK BETA]
Take one set of data and produce two diametrically opposed answers and have them both correct? Sounds like rumor, gossip, and BS to me, not science.
No wonder there are lies, damn lies, and statistics!
Somebody missed the lecture on assumptions.
Faster! Faster! Faster would be better!
Getting rid of it is just an attempt to waste students' time and extract more money from them by forcing them to take more university courses.
I suspect his complaint is that in high school, AP Statistics is taught by math teachers. In college, classes are taught by professors who specialize in statistics. This goes along with his general complaint that people in other disciplines don't take the time to really understand how statistics works. Of course, the same problem exists in college statistics courses. You can take a one-semester survey course or the two-semester theory course. He'd prefer that everyone took the two-semester course and that it was rigorously graded.
He may be right about AP Statistics though. Taking statistics in high school means that most people will have forgotten it by the time they get to advanced courses that use statistical methods. This leads to students learning statistics from the professors in those advanced courses (who are not focused on statistical rigor). Statistics is a sophomore/junior level class, where most other AP classes substitute for freshman classes.
I would tend to agree with you about the other AP classes though. There's no such thing as a "calculus professor" -- calculus is taught by a mathematics professor who is likely interested in something very different. It doesn't make much difference whether it is taught in a small high school class or a large college lecture.