How Good Are Robo-Graders?
stoolpigeon writes "With a large study showing software grades essays as well as humans, but much faster, it might seem that soon humans will be completely out of the loop when it comes to evaluating standardized tests. But Les Perelman, a writing teacher at MIT, has shown the limits of algorithms used for grading with an essay that got a top score from an automated system but contained no relevant information and many inaccuracies. Mr. Perelman outlined his approach for the NY Times after he was given a month to analyze E-Rater, one of the software packages that grades essays."
How quickly will students learn to game the system to get perfect scores with perfect gibberish?
What political party do you join when you don't like Bible-thumpers *or* hippies?
I don't think auto-graders are a good idea. Where is the information exchange between student and teacher? Teachers need to read student essays not just to assign a grade, but to exchange knowledge with their students. Opinions and comments should be two-sided exchanges; if students are writing things that aren't going to be read, how does that work?
While it is true that you can engineer essays to be 'bad' and still score 'good' - the question is - are there natural essays that score good but are actually bad; and good essays that score bad but are actually good.
Every analysis I've seen suggests that these algorithms do have problems with good essays that are highly creative. Essay graders also have difficulties with this kind of essay - giving drastically varied scores.
However, there doesn't seem to be much evidence of other issues except when an extremely knowledgeable person deliberately tries to make the algorithm fail. Any student or other individual who can do this probably knows the material well enough to 'get an A' if they were to properly apply what they know, so this seems like a non-issue.
Read "Making the grades" by Todd Farley. Robotic graders just make the tests even more farcical.
After thorough consideration of this first post and its contents, I find that I must respond in the most considerate and thoughtful way possible. This first post was clearly written before the second post and well in advance of this reply. Based on this, it is only logical to assume that this first post was written before any other posts. This leads me to think that crazyjj has quicker reflexes and reading skills than his compatriots.
My research has shown that people with quick reflexes make 80% more in real dollar terms than others[1] and are more likely to lead a longer life than their slower reading friends [2]. Clearly crazyjj is at an extreme advantage compared to the rest of slashdot.
Can America survive with this type of inequality? I think not. We must institute some type of equalizer. Perhaps crazyjj should be given a keyboard with several broken keys. Or perhaps we should simply bash his fingers a few times. In the words of Abraham Lincoln, "A man who types too fast can't be trusted."[3] Abraham Lincoln saw the danger that crazyjj represents and warned us. Will we listen?
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
Considering the fake generated paper that was published in a peer reviewed journal, I'd say that means the robo-graders are on par with human proof readers.
Even people that believe in pre-destiny look both ways before crossing the street.
So you're telling me we've not only solved the natural language problem, we're also wasting it on grading essays?
We're not even close. Robo-grading essays is not only cheating, it's probably the worst disservice a school could do to its students. When you grade an essay you're looking at far more than technical accuracy (spelling, word count, formatting, valid citations). You're looking for meaning, articulation and interesting points of view. Robots can't teach critical analysis, can't offer helpful critiques of writing style, and certainly can't make judgement calls on how "good" an essay is.
There may be situations in which simply getting a grade is of use, but, in most cases, I'd have thought that getting feedback was as important as getting the grade — knowing I have a good essay is one thing, but knowing where I went wrong, with guidance from someone skilled in the area, is the most important thing, since, otherwise, I have to guess as to where I need to improve.
Our "corporate firewall" frequently gets things wrong. A site on "Sharp calculators" was classified as a weapons site, though I would imagine that stabbing anyone with one would be difficult. A "security software slap-down" was classed as "tasteless and violence", though no security software was injured. In short, robo-graders are probably only any good for politicians, where the content doesn't matter as long as it's delivered right.
According to the MIT website, the "Mr. Perelman" the NYT article keeps mentioning is actually "Dr. Perelman". Does the NYT not believe in honorifics? Or do they just think that only MDs should be called "Dr."?
If a teacher is going to phone it in, what does that tell the class? Why should a student even bother to write a paper? Maybe students should have auto-generation software.
For all of the things we screw up in the US one thing we've done (mostly) right is college education. People travel from all over the world to go to school in the US.
It's shit like this that will ruin it.
When will something be done about the exorbitant pay that teachers' assistants receive?
The robo-grader can check spelling, grammar, and structural style. The human grader can check for content accuracy, essay quality, and creativity.
I don't worry too much about gaming the system. To "fool" the grader you'll have to learn spelling, grammar and structural style - exactly what the test-makers want.
Did you actually read the paper that was used as an example? It is hilarious. And unfortunately, not that different from a lot of ramblings I've come across on the internet. If these eGraders are ONLY used to evaluate spelling and grammar and a human then evaluates the content to make sure it is not just random gibberish, then fine. But of course that is not how they will be used...
News good. Paywall bad. A Google News search for the first couple of paragraphs should bring up either the NYT article or another copy of it.
Note that "em-dashes" have been changed to hyphens, and "curly" apostrophes and quotation marks have been changed to "straight" versions, to accommodate /. as viewed in my browser. Please avoid blocks of text that have -, ', or " when selecting text for search engines.
--cut here--
Testing Absurdities, Reading Worries and Robo-Grading
April 23, 2012, 8:19 a.m.
By Mary Ann Giordano
Week 2 of standardized testing begins in the New York City public schools - and so, it seems, does another week of testing wackiness.
The English Language Arts exam week ended on Friday with the decision by the state education commissioner, John B. King Jr., to scrap the answers to an absurd question - literally and otherwise - about a pineapple and a hare that had stymied eighth-grade test takers.
--cut here--
Further down we get to the relevant part:
--cut here--
Mr. Perelman tested the e-Rater and found that "the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction."
You have to read the column to find out the many ways that the e-Rater misreads good writing. The examples are delicious - and pitiful. But to reveal one issue identified by Mr. Perelman:
The e-Rater's biggest problem, he says, is that it can't identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. "E-Rater doesn't care if you say the War of 1812 started in 1945," he said.
Give E.T.S. credit for allowing Mr. Perelman to conduct his testing. Two other major testing services, Vantage Learning and Pearson - developer of the offending English Language Arts exam - said no.
--cut here--
The article linked in this /. article's summary refers to another article:
https://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html
Here are some snippets from it, in case you need them for your search engine:
--cut here--
Facing a Robo-Grader? Just Keep Obfuscating Mellifluously
By MICHAEL WINERIP
Published: April 22, 2012
A recently released study has concluded that computers are capable of scoring essays on standardized tests as well as human beings do.
Mark Shermis, dean of the College of Education at the University of Akron, collected more than 16,000 middle school and high school test essays from six states that had been graded by humans. He then used automated systems developed by nine companies to score those essays.
--cut here--
This article in turn links to:
www.documentcloud.org/documents/346138-essay-awarded-a-top-grade-by-e-rater.html
which is also linked in this /. article's summary.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
And even if it isn't delivered right, you just shake the etch-a-sketch and start over.
When I saw the title I thought it was referring to those robot graders that they use to level the road substrate whilst making roads (a new bridge is being built near my work). They are quite fascinating to watch work, but I wouldn't want to get in the way of one of them.
My ism, it's full of beliefs.
Grades rob you!
We now return you to normal discussion.
My ism, it's full of beliefs.
Unless what you teach the students is worthless as well. If it is just conformance to secondary things like spelling, basic grammar, sentence-length, superficial structure, etc. then robo-grading will do fine. Of course, none of the students being taught this way will learn to write anything of worth, ever. For that you need a competent and intelligent human being (or at least an equivalent intelligence) that understands what the student was trying to say and whether he/she succeeded or not, and why precisely. Grading involves as its most important component the feedback to the student, the actual grade is secondary and does not help the student improve his/her writing at all.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Have gnu, will travel.
well we need more tech / vocational schools!
Also, more jobs that don't require a BA, for work that never used to need one.
As someone who is working in linguistics close to AI research I can attest that the whole idea of automated grading of essays is completely ridiculous and if it is indeed used as the post suggests will likely ruin generations of students. Apart from not working, it is also wrong in various other respects such as sending the wrong signals to young students, implicitly ridiculing the hard work that writing actually is, saving money in the wrong place, and so forth.
I mean, c'mon ... all of the above is so obvious that it shouldn't even have to be mentioned. What kind of imbecile illiterate would allow grading of essays by a statistical text-mining program anyway?
This problem is not specific to robo-graders. I made a solid rule of finding topics that I found interesting -and- were highly unlikely to be areas of specialty for the teacher/professor/TA grading the paper. It took slightly more effort to find the "right" topics, but it more than paid off in the long run, since the likelihood of the average test grader spending days researching every 10+ page paper they are grading is pretty low.
Obviously as your volume of large papers and required topics narrows this becomes less effective, but it's quite a good system in high school through most of undergrad studies. I guess I assumed most people did this. FWIW, I did write pretty good papers, they weren't full of B.S. (well, just average volumes of B.S.) but by getting the topic as far "out" as possible, it helped minimize criticism outside of the basic structure, citation, etc.
Just another ignorant American.
This program could be abused just like some websites manage to fool Google's PageRank algorithm.
People say E-Rater checks for proper grammar and spelling, but the spam pages in Google's results also have proper grammar and all. The actual content is what is not wanted. If someone manages to write a completely unrelated essay, but complete with proper spelling and grammar, he might be able to fool the software.
A better approach would be software that requires you to input the essay topic, for which it would then scour the internet for related keywords and all. Something just like Siri does. AND THEN it checks your essay for proper content + grammar + spelling. If any weird exceptions are encountered, they are flagged for manual checking.
This should give a foolproof E-Rater.
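For what it's worth, the flag-for-manual-checking part of that idea can be sketched in a few lines of Python. This is only a rough sketch under loud assumptions: the topic keywords and the dictionary are handed in as plain sets rather than scraped from the internet, grammar checking is left out entirely, and the names `check_essay`, `min_relevance` and `STOPWORDS` are made up for illustration.

```python
import re

# Tiny hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "it", "that"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def check_essay(essay, topic_keywords, dictionary, min_relevance=0.2):
    """Score topical relevance and spelling; flag off-topic essays for a human."""
    words = [w for w in tokenize(essay) if w not in STOPWORDS]
    if not words:
        # Nothing to judge, so hand it to a person.
        return {"relevance": 0.0, "misspelled": 0.0, "flag": True}
    relevance = sum(w in topic_keywords for w in words) / len(words)
    misspelled = sum(w not in dictionary for w in words) / len(words)
    # Clean spelling does not rescue an essay that never touches the topic.
    return {"relevance": relevance, "misspelled": misspelled,
            "flag": relevance < min_relevance}
```

Note what this does not catch: an essay saying the War of 1812 started in 1945 is on topic and spelled fine, so it sails through. Keyword overlap is not a truth check.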
I took the SAT a few years after they added the essay section. We were always told that it's important to back up any argument you make with facts, but the accuracy of these facts would not be checked. If you wanted to support your argument with events from a war but you weren't sure what year it was? Just guess. This is when the essays were scored by humans (maybe they still are?) according to a rubric.
Here's an interesting blog post on the subject.
Are robo-graders as good as or better than human graders because the quality of the human graders is so low? When you have literally millions of SAT essays to grade, you can't afford to be choosy with your staff and as a result the quality of the work is depressed.
I read the internet for the articles.
The article reveals frightening things about how colleges are structured:
They talk about how accurate the robo-graders are:
Computer scoring produced “virtually identical levels of accuracy, with the software in some cases proving to be more reliable,” according to a University of Akron news release.
That's amazing! So let us see why they are so good:
Graders working as quickly as they can — the Pearson education company expects readers to spend no more than two to three minutes per essay— might be capable of scoring 30 writing samples in an hour.
Aha! So it isn't that the robo-graders are as good as human graders. The robo-graders are as accurate as a person who is not given enough time to read the actual essay. So if I create a robo-surgeon that is as good as a surgeon who has only 5 minutes to perform open-heart surgery, can I then say that my robo-surgeon is as good as a real surgeon? Of course not - they gamed the metrics to make the robo-graders look good. Is anyone else concerned that the dean of University of Akron only cares about how fast the tests are graded?
Later on in the article:
They [E.T.S] say Mr. Perelman is setting a false premise when he treats e-Rater as if it is supposed to substitute for human scorers.
So the robo-graders are not a substitute for human scorers. That isn't what the schools seem to think.
This is great: students will be graded by robots, so they will get degrees with no real writing skills. Then those students become the teachers, who cannot grade essays, compounding the problem with each generation. I fear this is how Idiocracy will come to pass: Everyone will be trained and educated in professional nonsense.
I think computers have the ability to automate huge areas people think require 'judgment'. Will they be perfect or catch odd cases? Probably not. Yet that must be weighed against the ability to provide the service en masse.
For example, radiographers are currently some of the highest paid medical professionals. Today, automated detection is already quite high in terms of accuracy (80%+), about the same as human radiographers. For example:
http://www.breastcancer.org/symptoms/testing/new_research/20081001b.jsp
Is it possible a human radiographer could detect weird anomalies or something? Of course. But as a mass-provided service, the computer would be way cheaper and provide affordable healthcare. Obviously before surgery, a human should probably double check :P
While I doubt the technology is there yet, I certainly don't think it impossible to have robo-grading for the evaluation of mass essays. Again, we have to compare it to the real world with people. Sure, a human grader going through every essay in detail might be better. But on average how thorough are graders? How thorough are patent examiners in examining patents on a mass scale?
Could we not imagine a system where the professor lists points they 'expect' to see in the essay, and natural language processing somehow checks for these points?
I could certainly imagine that working for essays you might write in high school for Shakespeare or an analysis of a book.
If I remember my high school, there was always a limited set of themes and points discussed.
Of course professors can always recheck for really creative work that the program mucks up.
But I think people overestimate the creativity of people in the school environment when applied to a large user set.
If you loo
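To make the "professor lists expected points" idea concrete, here is a rough Python sketch. Big assumption: plain keyword matching stands in for the natural language processing, which is far cruder than anything a real system would need. The names `covered_points` and `review`, and the 50% threshold, are invented for illustration.

```python
import re

def covered_points(essay, expected_points):
    """Return the expected points whose keywords all appear in the essay."""
    words = set(re.findall(r"[a-z]+", essay.lower()))
    return [name for name, keywords in expected_points.items()
            if all(k in words for k in keywords)]

def review(essay, expected_points, min_fraction=0.5):
    """Report coverage and ask for a human recheck when coverage is low."""
    if not expected_points:
        raise ValueError("the professor must list at least one expected point")
    hits = covered_points(essay, expected_points)
    fraction = len(hits) / len(expected_points)
    # Low coverage may mean a bad essay or a creative one; a human decides.
    return {"covered": hits, "fraction": fraction,
            "needs_human": fraction < min_fraction}

# Hypothetical checklist for a high school Macbeth essay.
macbeth_points = {
    "ambition": ["ambition"],
    "guilt": ["guilt", "blood"],
    "prophecy": ["witches", "prophecy"],
}
```

Calling `review(essay_text, macbeth_points)` returns which points were hit and whether the essay should go back to the professor, which is where the really creative work would end up.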
The GRE, which is like the SAT for grad students, uses robotic graders to identify essays, and as it turns out, the robot is looking for three things: grammar, essay, and the traditional "5-paragraph" format. It's not interested in your ability to compose thought, and it's completely inept at judging whether or not a given student's writing is on par with what is expected of graduate technical writing.
I'll reply to you.
To me, that's at least part of the "educational game". If you were really given carte blanche on topics, then chops to you for writing about the role of malnutrition in Ancient Egypt or something. No matter how exhausted, a Teacher-person looked at it, used their gut guess to decide it wasn't total spam, and gave it a grade.
Being graded by Robo-Graders just thunders "Belly of the Beast" and is so dehumanizing that it begs the smarter students to play Beat the System with the funniest paper to win. Mimsy were the Borogoves, or that Isaac Asimov Thiotimoline joke story-paper 50 years ago. That's if the student even bothers. Or, in the Business School (In the dream land if I had a Rich Dad) I'd purposely use one of the Essay Generator programs, submit that, wait for it to be kicked, then write a mock paper on the "Corporate CEO approach" and about how to "outsource" the paper. Then a third one about "Litigation as a Business Tool" with an attached lawsuit. "Isn't this how it's done in the real world?" "Uh.... yes?" "Good. Now Sudo give me my A."
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine
Why not build it into office suites and see if students use it to write better papers and turn them in when they are happy with the grade? My guess is that the quality of the paper will go down after the second proofreading/editing.
It is not surprising that statistical analysis can distinguish between good or bad essays.
Just take a bunch of human-graded essays and try to find correlations on things like the presence of some words, total length, etc... Smarter algorithms may analyze things like proper spelling and syntax.
The idea here is that the robo-grader does not really grade the essay; it tries to mimic the most superficial aspects of human grading. For example, most of the time the word "wether" is a spelling mistake, and the robo-grader may simply lower the score each time the word is present. The rare cases where the word is properly used are statistically insignificant, so it doesn't matter... 99% of the time.
This approach may work well but it is totally unfair. It is like grading using race or gender as a criterion. It will certainly improve correlation, but I hope that no one considers it seriously.
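Here is roughly what that kind of superficial grader looks like in Python. To be clear, this is a toy sketch and not how E-Rater or any real product works: it fits a straight line from essay length to human scores with ordinary least squares, then knocks off a fixed penalty per "suspect" word. The word list, the penalty value, and the function names are all assumptions made up for this example.

```python
import re

# Hypothetical list of words that are usually misspellings.
SUSPECT_WORDS = {"wether", "alot", "definately"}

def surface_features(essay):
    """Return (word count, suspect-word count), both purely superficial."""
    words = re.findall(r"[a-z']+", essay.lower())
    suspects = sum(1 for w in words if w in SUSPECT_WORDS)
    return len(words), suspects

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x  # assumes the essays are not all the same length
    return mean_y - slope * mean_x, slope

def make_grader(graded_essays, penalty=0.5):
    """Learn score ~ length from (text, human_score) pairs; return a grader."""
    lengths = [surface_features(text)[0] for text, _ in graded_essays]
    scores = [score for _, score in graded_essays]
    intercept, slope = fit_line(lengths, scores)

    def grade(essay):
        n_words, n_suspects = surface_features(essay)
        # Never reads for meaning: longer is better, suspect words cost points.
        return intercept + slope * n_words - penalty * n_suspects

    return grade
```

The unfairness shows up immediately: an essay about sheep farming that correctly uses "wether" (a castrated ram) loses the same points as one that misspells "whether", because the grader only counts the token.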
Robo-graders are spiffy, provided you don't have students capable of original thought or creative problem solving.
Come to think of it, there are plenty of so called wonderful teachers who can't deal with it either.
They just want regurgitation; that's why TN has "teach the controversy" and ORU actually has students.
Actually, the results of the essay evaluation - that form is valued over content, that eloquence is valued over truth - strongly mirrors my own experiences in academia. So many of the "soft" arts are either teaching how to put a shiny veneer over a turd, or simply an evaluation of how closely the student's expressed beliefs match their professor's. Form exceeds function; indoctrination exceeds learning. We're coming full circle, aren't we?
Just try expressing libertarian or conservative views on campus these days. See what it does to your grades.
For what it's worth, when I took the required W.E.S.T. (Writing Effectiveness Screening Test) in my junior year at a California State University campus, my percentile ranking and evaluation placed me second in the state that year (as in, only one person scored higher). I did it with a combination of "this is probably what you want to hear," and "the entire question is full of shit". I did it grammatically correctly, spelled correctly with the flowery words, purple prose and the kind of empty turns of phrase that make liberal arts professors titter like Japanese schoolgirls in a hentai video. Shows you what evaluation boards know....
Everybody gets what the majority deserves.
So what exactly is going to motivate students to write something that no human will read?
Even a graffiti artist cares that his writing finds an audience.
Do what thou wilt shall be the whole of the Law
For some people like me, cannabis really is a miracle cure. Maybe not the cure-all that some claim, but for migraine management it has no peer.
It takes 5 minutes to get complete symptomatic relief by using indica-dominant cannabis: no auras, no nausea, no pain, no light-sound-odour sensitivity. And it works all the time.
Compare that to triptans, "modern medicine" for migraines that manipulate serotonin levels. They take 30-45 agonizing minutes to be absorbed by the stomach. They only work 40-50% of the time, giving you no relief at all the rest of the time. And when combined with SSRIs by accident, they cause a severely dangerous condition called "serotonin syndrome" which can not only cause brain damage and bi-polar or full scale schizophrenia permanently, serotonin syndrome can even be fatal.
As someone who has suffered serotonin syndrome damage, I say unequivocally and with no room for debate:
Cannabis is the superior tool for managing migraines.
I do not fail; I succeed at finding out what does not work.