Slashdot Mirror


How Good Are Robo-Graders?

stoolpigeon writes "With a large study showing software grades essays as well as humans, but much faster, it might seem that soon humans will be completely out of the loop when it comes to evaluating standardized tests. But Les Perelman, a writing teacher at MIT, has shown the limits of algorithms used for grading with an essay that got a top score from an automated system but contained no relevant information and many inaccuracies. Mr. Perelman outlined his approach for the NY Times after he was given a month to analyze E-Rater, one of the software packages that grades essays."

40 of 157 comments

  1. More importantly by crazyjj · · Score: 5, Insightful

    How quickly will students learn to game the system to get perfect scores with perfect gibberish?

    --
    What political party do you join when you don't like Bible-thumpers *or* hippies?
    1. Re:More importantly by sglewis100 · · Score: 3, Insightful

      How quickly will students learn to game the system to get perfect scores with perfect gibberish?

      Spammers with poor spelling and grammar figured out combinations of gibberish to get around Bayesian spam filtering, I can only imagine relatively smart students will figure out ways to beat the software in time. But hopefully, if people implement systems like this, there will be some checks and balances. Fear of receiving a '0' for a test coupled with having essays randomly graded (smaller numbers) and reviewed / skimmed quickly (larger numbers) ought to be a good start.

    2. Re:More importantly by Anonymous Coward · · Score: 5, Funny

      How quickly will students learn to game the system to get perfect scores with perfect gibberish?

      Noooooooooo.

      I had to deal with a Robo grader once during an exam. Time was up and I was still writing. Several large automatic weapons appeared and in a robotic voice it said, "Drop your pen!"

      I did immediately and it said, "Thank you for your cooperation."

      Or that might have been when I was taking an art class taught by Peter Weller .... I don't remember now.

    3. Re:More importantly by alen · · Score: 2

      yes you can

      most of the skill of a good teacher is knowing child psychology and how to handle kids with different issues and different stages of development

      memorizing a few facts is fairly easy

    4. Re:More importantly by BravoZuluM · · Score: 3, Insightful

      What does it mean to game the system? The gamed paper, while not pertaining to the subject, is a well-written paper. It is not gibberish. It would take some talent to produce the gamed paper and probably more time. Given that, why wouldn't the student just write an on-topic paper?

      Given the bigger picture, writing is an art form. An essay is an art form. Even a human grading the paper might miss the nuances of what is being written. Who can truly say what the author has written is incorrect when, in writing, there is no incorrect or correct? There is just a continuum from bad to good writing.

    5. Re:More importantly by TheRaven64 · · Score: 3, Insightful

      The tube drivers in London were recently on strike over pay. Their salaries are around £40k (about $65K), but for a decade or so most of the train control has been completely automated: they're just there to press the emergency stop button if there is something wrong with the automated system (which a human will notice but another automated system won't and, for example, cut power to that segment of track). So, judging by the past, teachers that did nothing but press play on a video machine would be better paid than ones that actually taught...

      --
      I am TheRaven on Soylent News
    6. Re:More importantly by gnick · · Score: 3, Funny

      Several large automatic weapons appeared and in a robotic voice it said, "Drop your pen!"

      I did immediately and it said, "Thank you for your cooperation."

      You were lucky. You should see what happened to the guy in this documentary when the robo-grader didn't hear the pen hit the floor.

      --
      He's getting rather old, but he's a good mouse.
    7. Re:More importantly by NReitzel · · Score: 5, Insightful

      Well, yes.

      E-Rater (a product with which I have some familiarity) is specifically sold to improve form and grammar, and the product explicitly states that it does not grade content.

      So, what you are saying is that the students will figure out how to write with excellent grammar and form, in order to get good grades.

      Well, yeah.

      That's the whole point. That, and the fact that you can have a student write a short essay in 30 minutes, and give them immediate feedback on what they have done wrong, as far as sentence form and grammar are concerned.

      Generally, a student may know what they want to say, and have difficulty putting it into English prose in a way that might convince the reader that they have a clue about that of which they speak.

      Don't think it matters? What kind of result do you think Mr. Churchill might have received if he had stated, "Them Nazis is bad, we gots to beat em."

      Mr. Perelman spent a month of effort carefully crafting an essay that said nothing, eloquently. If our students can do that, more power to them.

      --

      Don't take life too seriously; it isn't permanent.

    8. Re:More importantly by masternerdguy · · Score: 5, Insightful

      No. This education degree stuff is crap. A teacher should have at least a masters degree in the topic they intend to teach.

      --
      To offset political mods, replace Flamebait with Insightful.
    9. Re:More importantly by anyGould · · Score: 4, Insightful

      No. This education degree stuff is crap. A teacher should have at least a masters degree in the topic they intend to teach.

      Problem 1: Teachers don't get to choose what classes they get - I knew an English teacher who ended up teaching Intro Computing because.. they needed a computing teacher and he was available. Especially for newer teachers - you teach what they tell you to teach.

      Problem 2: Are you intending to pay all those teachers in accordance with the extra 2+ years of education you're requiring?

      Problem 3: At lower levels, you have A Teacher, not A Math Teacher and An English Teacher. Do you expect your kid's grade 1 teacher to hold multiple degrees? (And see problem #2, expanded to pay for a teacher holding half a dozen post-grad degrees so you feel comfortable letting them teach your kid ABCs.)

    10. Re:More importantly by sourcerror · · Score: 2

      Even in Europe they only require this from highschool teachers.

    11. Re:More importantly by anyGould · · Score: 3, Insightful

      So, what you are saying is that the students will figure out how to write with excellent grammar and form, in order to get good grades.

      I think that's naive. I think one kid will figure out how to get the computer to kick out excellent grammar and form (a lot easier when you don't actually care about the content), and in short order most of the smart/cunning kids will be using that (the cunning ones because it's a cheap A; the smart ones because they'll want to concentrate on subjects where knowledge matters, as opposed to something that can be outsourced to small shell scripts).

    12. Re:More importantly by Half-pint+HAL · · Score: 3, Insightful

      I don't see why this would be different from current auditing practices. If an external examiner finds that your students have been incorrectly marked, it's either an automatic scaling of grades for everyone, or back to the red pen and regrade everything.

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    13. Re:More importantly by bhlowe · · Score: 3, Insightful

      A student can game the system by writing their paper, running it through one or more "grading" systems... and making changes until it comes out an "A". Obviously, you would want to do this in a way that it does this while retaining the content and expected "readability" desired.

      The fact is most "jobs" that humans do will be able to be done by a robot or computer. I can easily envision a future where kids get the best personalized teaching experience from a computer "coach"... who can tailor each kid's lesson much more skillfully than the average teacher trying to teach to 120 kids of a multitude of abilities. Teachers will be left to enforce discipline, dry tears, lead group exercises (as determined by the computer) and smile and wave at the kids as they come and go.

    14. Re:More importantly by s0nicfreak · · Score: 3, Informative

      Did you know that often schools only teach students what is required to pass the tests, and much of that is forgotten during school vacations, not to mention after several years of being out of school?

      Just the day before yesterday I was behind someone in a checkout line that didn't have enough to pay their bill on their debit card. So the cashier and the lady were trying to work out how much would be remaining after the amount on the debit card was used. After several minutes of both of them failing to figure it out, and the customer just handing the cashier some money (though not enough to cover the whole bill) they called over a manager, who showed the cashier that if she charged the debit card first it would show her the remaining amount. So then they counted how much money the customer had handed the cashier... and both tried to work out how much more was needed. After a minute the manager figured out how to type the amount into the register and be told the remaining bill.

      I'm not saying cashiers don't know basic math, but quite a few of them would not be able to do their job without a register or at least a calculator.

    15. Re:More importantly by EmperorOfCanada · · Score: 2

      Nope. I would be willing to say every cashier that I have ever seen manually do math has failed. If I pull a stunt like handing them a 20 and then a dime for something that cost 19.01 they are often lost calculating the 1.09 change if they had entered 20 into the till. Another store's till broke and the cashier was nearly in tears trying to work out tax with a calculator, and this was a single item sort of store. She was taking say a 25 dollar purchase and applying 15% tax and coming up with a total purchase price of 8 dollars. Car salesmen take advantage of this every day. They will sell you a car and tell you that it is one price and you are getting it at a certain interest rate and your monthly payments will be another price. But if you do the math it will usually turn out you are paying a grand or so more. They know that 99% of people can't work out loan payments.

      I don't know how exactly the schools are failing but almost regardless of the level of grade school math education people are usually unable to apply math to real life. Tell them that half the population is below any average and they will tell you that you are below average. Show them that the fees in mutual funds work against the whole idea of compound interest and they stare at you like you are speaking Greek.

    16. Re:More importantly by jc42 · · Score: 3, Funny

      What kind of result do you think Mr. Churchill might have received if he had stated, "Them Nazis is bad, we gots to beat em."

      Here in the US, we'd just elect him president.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    17. Re:More importantly by Half-pint+HAL · · Score: 2

      Mr. Perelman spent a month of effort carefully crafting an essay that said nothing, eloquently. If our students can do that, more power to them.

      But if you read TFA to the end, you'd see this quote:

      "Two former students who are computer science majors told him that they could design an Android app to generate essays that would receive 6’s from e-Rater."

      ...which kind of defeats the purpose of the exercise. Why would I spend a day trying to craft independent thought if I could get a guaranteed pass for a $0.99 download?

      The marker bot doesn't reward "good writing", it rewards the employ of a few very superficial metrics. Which is like the language exams I've done.

      --
      Got them moderator blues I blieve I walk out the do', With these mod-points I been gettin', I 'most never post no mo'
    18. Re:More importantly by anyGould · · Score: 2

      Maybe you're misunderstanding? It is often a requirement in the United States (I know it is in New York) to have a master's degree in education. So you spend two years learning God knows what (I know plenty of people with the degree, haven't been wow'ed by their responses as to what they did to earn it). However I'm unqualified to teach in public school because I have a master's in electrical engineering.

      You won't get any argument from me on the curriculum of an education degree (and I know quite a few teachers who won't either) - but the crux is this: an education degree is supposed to teach you to teach. It's all well and good for you to be an expert in the field, but if you can't get the concepts across to your students you're no better than the textbook. By contrast, I know a couple people who are excellent instructors, regardless of how much they personally know about the topic (they're also the first to admit when they've hit the edge of their knowledge), but with a week or two crash course, they can get a room of people to learn whatever the topic of the day is.

    19. Re:More importantly by bryan1945 · · Score: 2

      So that boring reading of essays to see what the students' thought processes are is better shoveled off to a machine that has no concept of what is being written. Go read TFA, it's a pathetic essay. Somehow the last 3 generations of my family's teachers managed to teach and grade and get us all into college. They even spent after hours with the kids who were slower learners. Amazing how standards have fallen.

      --
      Vote monkeys into Congress. They are cheaper and more trustworthy.
  2. Sorry, human intervention required by LostCluster · · Score: 4, Insightful

    I don't think auto-graders are a good idea. Where is the information exchange between student and teachers? Teachers need to read student essays not just to assign the grade, but to exchange knowledge with their students. Opinions and comments should be two-sided exchanges; if students are writing things that aren't going to be read, how does that work?

    1. Re:Sorry, human intervention required by Dyinobal · · Score: 4, Insightful

      Yep, any essay should come back with feedback written on it in the margins/space between lines. Plus I doubt auto graders will mean anything except for kids learning to write a specific way that the auto grader is programmed to grade well.

    2. Re:Sorry, human intervention required by dkleinsc · · Score: 2

      At the same time, I've seen significant flaws in the grading practices of human graders. For instance, I distinctly remember the paper I got back in my college years that said something along the lines of "Really interesting, well written, and insightful. B-". I also remember some essays that were pure unadulterated nonsense that got very high grades (including a 4-week project that I started on during school the day it was due and received an A).

      --
      I am officially gone from /. Long live http://www.soylentnews.com/
    3. Re:Sorry, human intervention required by Zordak · · Score: 3, Funny

      When I was in high school, we read A Portrait of the Artist as a Young Man. This is literally the worst alleged novel I have ever read. I actively despised it with my entire soul. So I skipped huge chunks of it wherever I figured I could get away with doing so and still pick up the threads of the mostly nonexistent plot.

      When we (finally) finished the thing, we had to write a series of short essays responsive to several prompts. One of the prompts told us to describe the symbolism and significance of the "rose."

      Having skipped huge portions of the book, I had never encountered this purported rose. And I certainly wasn't going to go back and pick through the dense, sophomoric prose to find it. Instead, I figured I could probably pick up some partial credit by saying some random insightful-sounding thing. So I started spewing what English teachers love. I used words like "juxtaposition" and "antithesis" and compared the rose to some other random symbolic object in the book. It was pure, unadulterated, Grade A, premium All-American BS.

      I got an A on the paper. The teacher was particularly profuse in her praise of my short essay on the "rose," commented that I had captured the symbolism of the "rose" perfectly. I couldn't have agreed more.

      --

      Today's Sesame Street was brought to you by the number e.
    4. Re:Sorry, human intervention required by dkleinsc · · Score: 2

      Zordak explained it: If my paper had sucked, I would have been fine with a bad grade and ideally some information on why it sucked so I could do better the next time. But instead what I got was "good work, I'm still giving you a bad grade for reasons I won't explain to you".

      --
      I am officially gone from /. Long live http://www.soylentnews.com/
  3. but how well does it work in the real world by LetterRip · · Score: 2

    While it is true that you can engineer essays to be 'bad' and still score 'good' - the question is - are there natural essays that score good but are actually bad; and good essays that score bad but are actually good.

    Every analysis I've seen suggests that these algorithms do have problems with good essays that are highly creative. Essay graders also have difficulties with this kind of essay - giving drastically varied scores.

    However, there doesn't seem to be much evidence of other issues except when an extremely knowledgeable individual deliberately tries to make the algorithm fail. Any student or other individual who can do this probably knows the material well enough to 'get an A' if they were to properly apply what they know, so this seems like a non-issue.

  4. 100% A+ Perfect Reply by MyLongNickName · · Score: 4, Insightful

    After thorough consideration of this first post and its contents, I find that I must respond in the most considerate and thoughtful way possible. This first post was clearly written before the second post and well in advance of this reply. Based on this, it is only logical to assume that this first post was written before any other posts. This leads me to think that crazyjj has quicker reflexes and reading skills than his compatriots.

    My research has shown that people with quick reflexes make 80% more in real dollar terms than others[1] and are more likely to lead a longer life than their slower reading friends [2]. Clearly crazyjj is at an extreme advantage compared to the rest of slashdot.

    Can America survive with this type of inequality? I think not. We must institute some type of equalizer. Perhaps crazyjj should be given a keyboard with several broken keys. Or perhaps we should simply bash his fingers a few times. In the words of Abraham Lincoln, "A man who types too fast can't be trusted."[3] Abraham Lincoln saw the danger that crazyjj represents and warned us. Will we listen?

    --
    See my journal for slashdot ID's by year. Mine created in 2005. http://slashdot.org/journal/289875/slashdot-ids-by-year
    1. Re:100% A+ Perfect Reply by roman_mir · · Score: 2

      In about 12 years of being registered here my /. 'friends' list has grown very very slowly.

  5. Human vs. Software by Anti_Climax · · Score: 4, Insightful

    But Les Perelman, a writing teacher at MIT, has shown the limits of algorithms used for grading with an essay that got a top score from an automated system but contained no relevant information and many inaccuracies.

    Considering the fake generated paper that was published in a peer reviewed journal, I'd say that means the robo-graders are on par with human proof readers.

    --
    Even people that believe in pre-destiny look both ways before crossing the street.
  6. Robo-graders? by Anonymous Coward · · Score: 5, Insightful

    So you're telling me we've not only solved the natural language problem, we're also wasting it on grading essays?

    We're not even close. Robo-grading essays is not only cheating, it's probably the worst disservice a school could do to its students. When you grade an essay you're looking at far more than technical accuracy (spelling, word count, formatting, valid citations). You're looking for meaning, articulation and interesting points of view. Robots can't teach critical analysis, can't offer helpful critiques of writing style, and certainly can't make judgement calls on how "good" an essay is.

    1. Re:Robo-graders? by masternerdguy · · Score: 2

      The problem is human graders at the high school level only look at the things this program looks at. I've read and graded the kinds of 5 paragraph theme essays they are talking about and we don't look at content. It's sad you can replace SUBJECT in those essays with any noun (frogs, cars, china, hellcats) and the essay makes the same amount of sense.

      --
      To offset political mods, replace Flamebait with Insightful.
  7. It's like any auto-text parsing - it gets it wrong. by Chrisq · · Score: 2

    Our "corporate firewall" frequently gets things wrong. A site on "Sharp calculators" was classified as a weapons site, though I would imagine that stabbing anyone with one would be difficult. A "security software slap-down" was classed as "tasteless and violence", though no security software was injured. In short, robo-graders are probably only any good for politicians, where the content doesn't matter as long as it's delivered right.

  8. free graders to judge content by peter303 · · Score: 3, Insightful

    The robo-grader can check spelling, grammar, structural style. The human grader can check for content accuracy and essay quality and creativity.

  9. best way to fool robot is to learn how to write by peter303 · · Score: 2

    I don't worry too much about gaming the system. To "fool" the grader you'll have to learn spelling, grammar and structural style - exactly what the test-makers want.

  10. New York Times article snippet and more by davidwr · · Score: 2

    News good. Paywall bad. A Google News search for the first couple of paragraphs should bring up either the NYT article or another copy of it.

    Note that "em-dashes" have been changed to hyphens and "curly" apostrophes and quotation marks have been changed to "straight" versions to accommodate /. as viewed in my browser. Please avoid blocks of text that have -, ', or " when selecting text for search engines.

    --cut here--
    Testing Absurdities, Reading Worries and Robo-Grading
    April 23, 2012, 8:19 a.m.
    By Mary Ann Giordano

    Week 2 of standardized testing begins in the New York City public schools - and so, it seems, does another week of testing wackiness.

    The English Language Arts exam week ended on Friday with the decision by the state education commissioner, John B. King Jr., to scrap the answers to an absurd question - literally and otherwise - about a pineapple and a hare that had stymied eighth-grade test takers.

    --cut here--

    Further down we get to the relevant part:

    --cut here--
    Mr. Perelman tested the e-Rater and found that "the automated reader can be easily gamed, is vulnerable to test prep, sets a very limited and rigid standard for what good writing is, and will pressure teachers to dumb down writing instruction."

    You have to read the column to find out the many ways that the e-Rater misreads good writing. The examples are delicious - and pitiful. But to reveal one issue identified by Mr. Perelman:

      The e-Rater's biggest problem, he says, is that it can't identify truth. He tells students not to waste time worrying about whether their facts are accurate, since pretty much any fact will do as long as it is incorporated into a well-structured sentence. "E-Rater doesn't care if you say the War of 1812 started in 1945," he said.

    Give E.T.S. credit for allowing Mr. Perelman to conduct his testing. Two other major testing services, Vantage Learning and Pearson - developer of the offending English Language Arts exam - said no.
    --cut here--

    The article linked in this /. article's summary refers to another article:

    https://www.nytimes.com/2012/04/23/education/robo-readers-used-to-grade-test-essays.html

    Here are some snippets from it, in case you need them for your search engine:

    --cut here--
    Facing a Robo-Grader? Just Keep Obfuscating Mellifluously
    By MICHAEL WINERIP
    Published: April 22, 2012

    A recently released study has concluded that computers are capable of scoring essays on standardized tests as well as human beings do.

    Mark Shermis, dean of the College of Education at the University of Akron, collected more than 16,000 middle school and high school test essays from six states that had been graded by humans. He then used automated systems developed by nine companies to score those essays.
    --cut here--

    This article in turn links to:
    www.documentcloud.org/documents/346138-essay-awarded-a-top-grade-by-e-rater.html
    which is also linked in this /. article's summary.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  11. Re:I don't care; standardized tests are corrupt by jimbolauski · · Score: 2

    The difference between teaching the test and not doing standardized testing is that now we teach the test, instead of nothing at all. If the students game the robo-grader, they've learned *something*. Standardized testing is a bad answer to a problem that's so bad that every other approach we've tried has failed. The real solution is to make parents care. However, punishment is highly unlikely to work, and we really, really shouldn't have the government trying any other approaches (propaganda is bad, government propaganda is worse).

    Give me a better solution. I reject your "more money" approach; it's been demonstrated over the last 50 years to be a national scale disaster.

    The solution is simple: remove the kids that don't care. It seems harsh, but they are the reason classrooms get stuck in a quagmire. Offer an education to everyone but do not force it on people that don't want it and will waste people's time that do want it. The true secret of private schools is that everyone there has parents that value education and for the most part they do too. Once disruptive and unmotivated students are removed from the class the teachers can be held accountable for their classrooms, and are typically motivated because the students genuinely care.

    --
    Knowledge = Power
    P= W/t
    t=Money
    Money = Work/Knowledge so the less you know the more you make
  12. Without strong AI, robo-graders are worthless by gweihir · · Score: 2

    Unless what you teach the students is worthless as well. If it is just conformance to secondary things like spelling, basic grammar, sentence-length, superficial structure, etc. then robo-grading will do fine. Of course, none of the students being taught this way will learn to write anything of worth, ever. For that you need a competent and intelligent human being (or at least an equivalent intelligence) that understands what the student was trying to say and whether he/she succeeded or not, and why precisely. Grading involves as its most important component the feedback to the student, the actual grade is secondary and does not help the student improve his/her writing at all.

    --
    Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
  13. Re:Then why should students bother? by DragonWriter · · Score: 2

    If a teacher is going to phone it in, what does that tell the class?

    Since when are teachers with any relationship to the students involved in standardized tests, except as proctors?

  14. This isn't robo-grader specific... by digitalsolo · · Score: 3, Interesting

    This problem is not specific to robo-graders. I made a solid rule of finding topics that I found interesting -and- were highly unlikely to be areas of specialty for the teacher/professor/TA grading the paper. It took slightly more effort to find the "right" topics, but it more than paid off in the long run, since the likelihood of the average test grader spending days researching every 10+ page paper they are grading is pretty low.

    Obviously as your volume of large papers and required topics narrows this becomes less effective, but it's quite a good system in high school through most of undergrad studies. I guess I assumed most people did this. FWIW, I did write pretty good papers, they weren't full of B.S. (well, just average volumes of B.S.) but by getting the topic as far "out" as possible, it helped minimize criticism outside of the basic structure, citation, etc.

    --
    Just another ignorant American.
  15. Re:unlikely to be areas of specialty by TaoPhoenix · · Score: 2

    I'll reply to you.

    To me, that's at least part of the "educational game". If you were really given carte blanche on topics, then chops to you for writing about the role of malnutrition in Ancient Egypt or something. No matter how exhausted, a Teacher-person looked at it, used their gut guess to decide it wasn't total spam, and gave it a grade.

    Being graded by Robo-Graders just thunders "Belly of the Beast" and is so dehumanizing that it begs the smarter students to play Beat the System with the funniest paper to win. Mimsy were the Borogoves, or that Isaac Asimov Thiotimoline joke story-paper 50 years ago. That's if the student even bothers. Or, in the Business School (In the dream land if I had a Rich Dad) I'd purposely use one of the Essay Generator programs, submit that, wait for it to be kicked, then write a mock paper on the "Corporate CEO approach" and about how to "outsource" the paper. Then a third one about "Litigation as a Business Tool" with an attached lawsuit. "Isn't this how it's done in the real world?" "Uh.... yes?" "Good. Now Sudo give me my A."

    --
    My first Journal Entry ever, in 8 years! http://slashdot.org/journal/365947/aphelion-scifi-fantasy-horror-poetry-webzine