How Good Are Robo-Graders?
stoolpigeon writes "With a large study showing software grades essays as well as humans, but much faster, it might seem that soon humans will be completely out of the loop when it comes to evaluating standardized tests. But Les Perelman, a writing teacher at MIT, has shown the limits of algorithms used for grading with an essay that got a top score from an automated system but contained no relevant information and many inaccuracies. Mr. Perelman outlined his approach for the NY Times after he was given a month to analyze E-Rater, one of the software packages that grades essays."
How quickly will students learn to game the system to get perfect scores with perfect gibberish?
What political party do you join when you don't like Bible-thumpers *or* hippies?
I don't think auto-graders are a good idea. Where is the information exchange between students and teachers? Teachers need to read student essays not just to assign the grade, but to exchange knowledge with their students. Opinions and comments should be two-sided exchanges; if students are writing things that aren't going to be read, how does that work?
While it is true that you can engineer essays to be 'bad' and still score 'good', the question is: are there natural essays that score well but are actually bad, and good essays that score badly but are actually good?
Every analysis I've seen suggests that these algorithms do have problems with good essays that are highly creative. Essay graders also have difficulties with this kind of essay - giving drastically varied scores.
However, there doesn't seem to be much evidence of other issues except when an extremely knowledgeable person deliberately tries to make the algorithm fail. Any student or other individual who can do this probably knows the material well enough to 'get an A' if they were to properly apply what they know, so this seems like a non-issue.
After thorough consideration of this first post and its contents, I find that I must respond in the most considerate and thoughtful way possible. This first post was clearly written before the second post and well in advance of this reply. Based on this, it is only logical to assume that this first post was written before any other posts. This leads me to think that crazyjj has quicker reflexes and reading skills than his compatriots.
My research has shown that people with quick reflexes make 80% more in real dollar terms than others[1] and are more likely to lead a longer life than their slower reading friends [2]. Clearly crazyjj is at an extreme advantage compared to the rest of slashdot.
Can America survive with this type of inequality? I think not. We must institute some type of equalizer. Perhaps crazyjj should be given a keyboard with several broken keys. Or perhaps we should simply bash his fingers a few times. In the words of Abraham Lincoln, "A man who types too fast can't be trusted."[3] Abraham Lincoln saw the danger that crazyjj represents and warned us. Will we listen?
See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
Considering the fake generated paper that was published in a peer-reviewed journal, I'd say that means the robo-graders are on par with human proofreaders.
Even people that believe in pre-destiny look both ways before crossing the street.
So you're telling me we've not only solved the natural language problem, we're also wasting it on grading essays?
We're not even close. Robo-grading essays is not only cheating, it's probably the worst disservice a school could do to its students. When you grade an essay you're looking at far more than technical accuracy (spelling, word count, formatting, valid citations). You're looking for meaning, articulation and interesting points of view. Robots can't teach critical analysis, can't offer helpful critiques of writing style, and certainly can't make judgement calls on how "good" an essay is.
Our "corporate firewall" frequently gets things wrong. A site on "Sharp calculators" was classified as a weapons site, though I would imagine that stabbing anyone with one would be difficult. A "security software slap-down" was classed as "tasteless and violence", though no security software was injured. In short, robo-graders are probably only any good for politicians, where the content doesn't matter as long as it's delivered right.
The robo-grader can check spelling, grammar, structural style. The human grader can check for content accuracy and essay quality and creativity.
I don't worry too much about gaming the system. To "fool" the grader you'll have to learn spelling, grammar and structural style - exactly what the test-makers want.
News good. Paywall bad. A Google News search for the first couple of paragraphs should bring up either the NYT article or another copy of it.
Note that "em-dashes" have been changed to hyphens and "curly" apostrophes and quotation marks have been changed to "straight" versions to accommodate /. as viewed in my browser. Please avoid blocks of text that have -, ', or " when selecting text for search engines.
--cut here--
Testing Absurdities, Reading Worries and Robo-Grading
April 23, 2012, 8:19 a.m.
By Mary Ann Giordano
Week 2 of standardized testing begins in the New York City public schools - and so, it seems, does another week of testing wackiness.
The English Language Arts exam week ended on Friday with the decision by the state education commissioner, John B. King Jr., to scrap the answers to an absurd question - literally and otherwise - about a pineapple and a hare that had stymied eighth-grade test takers.
--cut here--
Further down we get to the relevant part:
--cut here--
Mr. Perelman tested the e-Rater and found that "the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction."
You have to read the column to find out the many ways that the e-Rater misreads good writing. The examples are delicious - and pitiful. But to reveal one issue identified by Mr. Perelman:
The e-Rater's biggest problem, he says, is that it can't identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. "E-Rater doesn't care if you say the War of 1812 started in 1945," he said.
Give E.T.S. credit for allowing Mr. Perelman to conduct his testing. Two other major testing services, Vantage Learning and Pearson - developer of the offending English Language Arts exam - said no.
--cut here--
The article linked in this /. article's summary refers to another article:
https://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html
Here are some snippets from it, in case you need them for your search engine:
--cut here--
Facing a Robo-Grader? Just Keep Obfuscating Mellifluously
By MICHAEL WINERIP
Published: April 22, 2012
A recently released study has concluded that computers are capable of scoring essays on standardized tests as well as human beings do.
Mark Shermis, dean of the College of Education at the University of Akron, collected more than 16,000 middle school and high school test essays from six states that had been graded by humans. He then used automated systems developed by nine companies to score those essays.
--cut here--
This article in turn links to:
www.documentcloud.org/documents/346138-essay-awarded-a-top-grade-by-e-rater.html
which is also linked in this /. article's summary.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
The difference between teaching the test and not doing standardized testing is that now we teach the test, instead of nothing at all. If the students game the robo-grader, they've learned *something*. Standardized testing is a bad answer to a problem that's so bad that every other approach we've tried has failed. The real solution is to make parents care. However, punishment is highly unlikely to work, and we really, really shouldn't have the government trying any other approaches (propaganda is bad, government propaganda is worse).
Give me a better solution. I reject your "more money" approach; it's been demonstrated over the last 50 years to be a national-scale disaster.
The solution is simple: remove the kids that don't care. It seems harsh, but they are the reason classrooms get stuck in a quagmire. Offer an education to everyone, but do not force it on people that don't want it and will waste the time of people that do want it. The true secret of private schools is that everyone there has parents that value education, and for the most part they do too. Once disruptive and unmotivated students are removed from the class, the teachers can be held accountable for their classrooms, and are typically motivated because the students genuinely care.
Knowledge = Power
P= W/t
t=Money
Money = Work/Knowledge so the less you know the more you make
Unless what you teach the students is worthless as well. If it is just conformance to secondary things like spelling, basic grammar, sentence-length, superficial structure, etc., then robo-grading will do fine. Of course, none of the students being taught this way will learn to write anything of worth, ever. For that you need a competent and intelligent human being (or at least an equivalent intelligence) that understands what the student was trying to say and whether he/she succeeded or not, and why precisely. Grading involves as its most important component the feedback to the student; the actual grade is secondary and does not help the student improve his/her writing at all.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
Since when are teachers with any relationship to the students involved in standardized tests, except as proctors?
This problem is not specific to robo-graders. I made a solid rule of finding topics that I found interesting -and- were highly unlikely to be areas of specialty for the teacher/professor/TA grading the paper. It took slightly more effort to find the "right" topics, but it more than paid off in the long run, since the likelihood of the average test grader spending days researching every 10+ page paper they are grading is pretty low.
Obviously as your volume of large papers and required topics narrows this becomes less effective, but it's quite a good system in high school through most of undergrad studies. I guess I assumed most people did this. FWIW, I did write pretty good papers, they weren't full of B.S. (well, just average volumes of B.S.) but by getting the topic as far "out" as possible, it helped minimize criticism outside of the basic structure, citation, etc.
Just another ignorant American.
I'll reply to you.
To me, that's at least part of the "educational game". If you were really given carte blanche on topics, then chops to you for writing about the role of malnutrition in Ancient Egypt or something. No matter how exhausted, a Teacher-person looked at it, used their gut guess to decide it wasn't total spam, and gave it a grade.
Being graded by Robo-Graders just thunders "Belly of the Beast" and is so dehumanizing that it begs the smarter students to play Beat the System with the funniest paper to win. Mimsy were the Borogoves, or that Isaac Asimov Thiotimoline joke story-paper 50 years ago. That's if the student even bothers. Or, in the Business School (In the dream land if I had a Rich Dad) I'd purposely use one of the Essay Generator programs, submit that, wait for it to be kicked, then write a mock paper on the "Corporate CEO approach" and about how to "outsource" the paper. Then a third one about "Litigation as a Business Tool" with an attached lawsuit. "Isn't this how it's done in the real world?" "Uh.... yes?" "Good. Now Sudo give me my A."
My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine