Slashdot Mirror


Grading Software Fooled By Nonsense Essay Generator

An anonymous reader writes "A former MIT instructor and students have come up with software that can write an entire essay in less than one second; just feed it up to three keywords. The essays, though grammatically correct and structurally sound, have no coherent meaning and have proved to be graded highly by automated essay-grading software. From The Chronicle of Higher Education article: 'Critics of automated essay scoring are a small but lively band, and Mr. Perelman is perhaps the most theatrical. He has claimed to be able to guess, from across a room, the scores awarded to SAT essays, judging solely on the basis of length. (It’s a skill he happily demonstrated to a New York Times reporter in 2005.) In presentations, he likes to show how the Gettysburg Address would have scored poorly on the SAT writing test. (That test is graded by human readers, but Mr. Perelman says the rubric is so rigid, and time so short, that they may as well be robots.)'"

33 of 187 comments

  1. most schools ignore sat essay by litehacksaur111 · · Score: 3, Insightful

    I though most schools don't even care about the essay. Also the elite schools nowadays prefer the ACT and SAT II subject tests to demonstrate real knowledge. The SAT is really a dumb test, especially with all the coaching resources available now.

    1. Re:most schools ignore sat essay by Anonymous Coward · · Score: 3, Funny

      Your post tells me that you didn't score all that well on the SAT. Bad grammar, incoherent thoughts.

    2. Re:most schools ignore sat essay by ceoyoyo · · Score: 2, Insightful

      Odd you choose math as an example, a subject where your grammar must be perfect or what you've written is nonsense.

    3. Re:most schools ignore sat essay by lgw · · Score: 2

      basic math = rote memorization

      Yup, it sure is, and sadly this is contentious. Basic numeracy is impossible without memorizing tables for addition, and multiplication. Seen a modern math textbook that shows what buttons to press on the calculator? Seen the recent "common core" controversy about quite crazy approaches to basic math that seem motivated by avoidance of memorization (it's the revenge of new math!). Sigh. But then, do they allow calculators on the SAT?

      --
      Socialism: a lie told by totalitarians and believed by fools.
    4. Re:most schools ignore sat essay by AK+Marc · · Score: 2, Insightful

      I got an 800 on my math (old SAT, back in 1990), and I still count on my fingers. They allowed calculators (only a few approved ones), and, of course, I used it. The SAT math was about speed, efficiency, and answering the right question. Most people had a problem with the latter. I don't know all of my multiplication tables, it wasn't my thing, but seeing the question and figuring the best way to word it to find the answer was my thing. Didn't miss a question.

    5. Re:most schools ignore sat essay by ultranova · · Score: 2

      Basic numeracy is impossible without memorizing tables for addition, and multiplication.

      This is, of course, rubbish. Memorising results for common operations saves time when performing them, that's all.

      Seen the recent "common core" controversy about quite crazy approaches to basic math that seem motivated by avoidance of memorization (it's the revenge of new math!).

      In Real Life, if you need numerical answers often, you use a calculator. It's faster and less error-prone. And if you don't need numerical answers often, you won't remember the relevant tables, since you aren't actually using them.

      The sad fact is that most of the time memorization is a complete waste of time. Either you use some data, in which case you'll learn it as a natural side effect, or you don't, in which case it won't stick, no matter what you do. This is especially true if your reason to try to memorize is that you're being forced to, as is the case with math tables.

      But then, do they allow calculators on the SAT?

      Do they allow English, or do you have to answer in Latin? The world changes. Pretending it hasn't simply makes a test irrelevant or perverts it into an outright hindrance to learning.

      --

      Forget magic. Any technology distinguishable from divine power is insufficiently advanced.

  2. Irrelevant by Ol+Olsoc · · Score: 2, Insightful

    As long as Precious gets an "A", Helicopter Daddy and Blackhawk Mommy won't try to have the school president fired for ruining Precious's permanent record.

    --
    The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
    1. Re:Irrelevant by smittyoneeach · · Score: 3, Funny

      Hey, Helicopter Daddy and Blackhawk Mommy dropped good boodle for that 'A', mister!
      You can just stand down from all that meritocratic whinging right now, mister.

      --
      Get thee glass eyes, and, like a scurvy politician, seem to see things thou dost not.--King Lear
    2. Re:Irrelevant by TheMeuge · · Score: 3, Interesting

      At least helicopter daddy and blackhawk mommy give a shit about the Precious. Or do you prefer the absent daddy and welfare mommy? People DO go overboard... but I feel like the pendulum is starting to swing entirely too far the other way.

  3. You don't need software by Anonymous Coward · · Score: 4, Insightful

    ... because Slashdot shows that humans already make evaluations about articles without reading them.

  4. Quid pro quo by Opportunist · · Score: 3, Insightful

    When you're too lazy to read my essay to grade me and let software do it, I don't really see any moral problem with doing the same to write the essay.

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
    1. Re:Quid pro quo by Anubis+IV · · Score: 4, Insightful

      As someone who graded hundreds of essays while serving as a teaching assistant for a senior-level engineering ethics course, I have to say that I find your lack of integrity rather appalling. Your moral obligation to write the essay yourself is independent of the method they use for grading it. Just because someone else is doing a lousy job does not mean that you suddenly have a license to short-change them for what you're obligated to do.

      I would guess that I graded around 300-400 essays during the three semesters I served as a TA, and that I probably averaged around 20 minutes per essay, since I was a strong believer in providing useful feedback on things the students could improve, even if they weren't necessarily incorrect. That said, other TAs spent as little as a minute or two per essay, and barely provided any feedback at all. Regardless of how much time the TAs did or didn't spend on the essays, however, the students had the same obligations, and rightfully so.

    2. Re:Quid pro quo by number17 · · Score: 3, Insightful

      Your moral obligation to write the essay yourself is independent of the method they use for grading it.

      Students pay big bucks and expect to have experts in the field teach them and grade their work. It sounds like these schools are off-shoring their marking so that they can do other work (i.e. research). If the school were upfront, before I paid tuition, that they were going to just send my essay to Bangladesh for marking, then I would be OK with having a moral obligation to write the essay myself.

    3. Re:Quid pro quo by clifyt · · Score: 3, Interesting

      As someone that wrote software like this -- and disagreed with the subject of the story a decade ago when he tried to get us with both the Gettysburg Address as well as Kennedy's inaugural address (both of which are GREAT speeches with historical value, but shitty college entrance exams) -- you are looking at this entirely wrong.

      I can give you background on how these things are generally graded. 3 people get an essay, look at it for 30 to 45 seconds, throw a score at it, and if they are all within a margin of error, they move on. If not, a senior rater comes in, and if they can replace one other person so that it is now within the margin of error, they move on as well. If not, it is workshopped for 5 minutes.

      In 99% of the cases, you have less than 2 minutes of viewing on your essay between 3 people.
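
The adjudication flow described above could be sketched roughly as follows. This is only an illustration: the function name `adjudicate`, the `margin` of one point, and the use of the mean as the final score are assumptions, not details given in the comment.

```python
from statistics import mean


def adjudicate(scores, senior_score=None, margin=1):
    """Return (final_score, status) for one essay.

    scores       -- list of the three first-pass ratings
    senior_score -- optional rating from a senior rater
    margin       -- hypothetical tolerance; the comment gives no number
    """
    def agree(values):
        # Ratings "agree" when their spread fits inside the margin.
        return max(values) - min(values) <= margin

    if agree(scores):
        return mean(scores), "accepted"

    if senior_score is not None:
        # Try swapping the senior rater in for each original rater in turn.
        for i in range(len(scores)):
            trial = scores[:i] + [senior_score] + scores[i + 1:]
            if agree(trial):
                return mean(trial), "accepted_with_senior"

    # No combination agrees, so the essay goes to a workshop.
    return None, "workshop"
```

With these assumptions, `adjudicate([4, 4, 1], senior_score=4)` is accepted once the senior rater replaces the dissenting score, while `adjudicate([1, 3, 6], senior_score=3)` falls through to the workshop step.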

      Enter the computer...the raters are told they are going to be rated themselves. We can throw a lot more prerated essays that had been normed by a large group of raters, and train the rater. They know they are being measured and the average rater spends two or more minutes reading through these. You actually have MORE time with eyes on your essay with a computer rater involved than you do without. Having a computer rater doesn't remove humans -- it adds a safeguard. It means one person spends more time and is verified with something that is unbiased (within reason...it actually was able to figure out subtle racism and otherwise that wouldn't have been detected with purely human raters...'black' or 'hispanic' names and scores go down...'asian' names and the scores go up...give the same essay with the names switched and the humans change ratings...the computer was actually more objective).

      I haven't been involved with this sort of thing in a decade, and I can only assume it is much better than when I left my project...but lazy isn't the right word. Underpaid and overworked? Yeah...but not lazy.

    4. Re:Quid pro quo by KingOfBLASH · · Score: 2

      The problem is that technology allows universities to take short cuts in education, and not to the students' advantage. Add to that some of the current goings on in the university system, and the future of the education system is a little worrisome (then again the future has always been worrisome and somehow we've muddled through).

      But, while before you might have a few bad apples not providing sufficient feedback to students (or not doing it in a useful way), now you have short cuts as a matter of policy.

      Why pay any attention at all to your students' work when:

      a) You can outsource checking for plagiarism to Turnitin or another similar service
      b) A computer can do grading for you automatically. Never mind that it can't tell the difference between a right answer written a different way than in the answer key and a wrong answer.
      c) An adjunct professor paid less than minimum wage can handle the actual teaching duties so the university can keep more of the students' tuition.

  5. Re:To generate the keywords takes knowledge by EmagGeek · · Score: 2

    Did you happen to read TFA? In the TFA, it is said that the College Board does not take points off for factual errors. In fact, it says that it cares not for factual errors, because errors in fact seldom subtract from the quality of the essay being graded.

    WTF, right?

  6. Re:Architecture School! by Opportunist · · Score: 2

    It sounds like the software would be perfect for writing audit reports. You hand in a phone book sized report, but all they ever read is the management summary.

    But DARE to hand over just the relevant pages that you know will get read. Did you work at all, if THAT is your whole report?

    --
    We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
  7. Re:To generate the keywords takes knowledge by MrBigInThePants · · Score: 5, Insightful

    Not being from the USA, every article I ever read about your education system just leaves me scratching my head.

    How on earth did you guys let it get so ridiculous??

  8. student athlete need some like this with 60 hours by Joe_Dragon · · Score: 2

    Student athletes need something like this. With 60 hours a week playing football, they don't have time for class.

  9. Re:Architecture School! by ewibble · · Score: 2

    You're right, you are encouraged to write long documents, but it should really be the opposite. Writing is about communicating; if your document is so long that people don't bother reading it, the document has failed in its main purpose.

    This standard should be applied to legal documents, such as license agreements and insurance agreements. What, your EULA is more than 100 words long? You don't expect anyone to read this, do you? Agreement invalid. If you need longer, you should have to ensure that people understand what they are agreeing to, maybe run a 1-year course or something.

    100 words yes!

  10. The answer: essay grader graders by TsuruchiBrian · · Score: 2

    I don't see a problem with automated essay graders in principle. It's just that the current essay graders are no good. Once we are able to make computer software that can actually understand essays as well as a human, it should be perfectly competent to grade an essay.

    I certainly see the motivation to have a computer grade essays. Who wants to read multitudes of mediocre essays? I might rather be put in solitary confinement. I am all for the automated essay graders, but only after they can be proven to be as competent as a human.

    I have no idea how to make such a competent essay grader, but I do know how to grade an essay grader. You have a bunch of computer graders and human graders grading the same essays. If the computer graders show a more consistent performance than the humans (i.e. are the outlier less frequently), then the computer grader is better.

    If a paper is scored by 4 human judges and a computer, and the humans score the paper 1, 2, 3, 4, and the computer scores the paper as a 9, then it means that according to most of the human graders, the computer was way off. Essays are inherently subjective. Are the humans right or is the computer right? Who cares? It doesn't matter.

    If a paper is scored by 4 human judges and a computer, and the humans score the paper 4, 5, 7, 9, and the computer scores the paper as a 6, then it means that according to every human grader, the computer did better than half the humans.

    If a computer can do better than the humans even by human standards, then I think it's fair to say that a computer is good enough.
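
The "grade the grader" idea above might be sketched like this. It is one possible reading, not the poster's method: here the "outlier" on an essay is assumed to be whichever grader sits farthest from the median of all scores, and the names `outlier_index` and `outlier_rate` are made up for the illustration.

```python
from statistics import median


def outlier_index(scores):
    """Index of the score farthest from the median of all scores.

    Ties go to the earliest such score (an arbitrary choice).
    """
    mid = median(scores)
    return max(range(len(scores)), key=lambda i: abs(scores[i] - mid))


def outlier_rate(essays, grader):
    """Fraction of essays on which `grader` is the outlier.

    essays -- list of dicts mapping grader name -> score for one essay
    """
    hits = 0
    for ratings in essays:
        names = list(ratings)
        worst = names[outlier_index([ratings[n] for n in names])]
        hits += worst == grader
    return hits / len(essays)


# The two example papers from the comment above.
essays = [
    {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "computer": 9},
    {"h1": 4, "h2": 5, "h3": 7, "h4": 9, "computer": 6},
]
print(outlier_rate(essays, "computer"))
```

On the first paper the computer is the outlier; on the second it is the human who gave a 9. Under this reading, a grader with a lower outlier rate over many essays would count as the more consistent one.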

    1. Re:The answer: essay grader graders by clifyt · · Score: 2

      I helped design one of these essay graders a decade+ ago with Dr. Ellis 'Bo' Page (Duke and MIT).

      Even then, we were as good as humans in solely grammar and mechanics and all that sorta stuff. We were rating on a 6 point scale and something like 70% of the scores were a perfect match, and 85% were within 1 point.

      Given that we were using professional human raters that were trained on a weekly basis and had round tables to go over controversial papers, and these were considered some of the best in the US at their job...and that if you had 3 people rate the essay, take the mean score and ask the single human to rate it...they were at around 60% a perfect match.
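
For readers unfamiliar with the "perfect match" and "within 1 point" figures quoted above, here is a minimal sketch of how such agreement rates are typically computed. The function name `agreement` and the sample scores are invented for illustration; they are not data from the project described.

```python
def agreement(machine, human, tolerance=0):
    """Fraction of essays where the two scores differ by at most `tolerance`.

    tolerance=0 gives the exact-match rate; tolerance=1 gives the
    "within 1 point" (adjacent agreement) rate.
    """
    if len(machine) != len(human) or not machine:
        raise ValueError("need two non-empty score lists of equal length")
    hits = sum(abs(m - h) <= tolerance for m, h in zip(machine, human))
    return hits / len(machine)


# Made-up scores on a 6-point scale, purely for illustration.
machine_scores = [3, 4, 5, 2, 6]
human_scores = [3, 4, 4, 4, 6]

exact = agreement(machine_scores, human_scores)
adjacent = agreement(machine_scores, human_scores, tolerance=1)
```

The same function can compare a single human rater against the mean of a three-rater panel, which is the comparison the comment uses for its roughly 60% figure.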

      Again, this was not for content...most college entrance exams are looking for your writing style and nothing else. If you can write well (and my writing on this site is not representative of my professional writing), you can research your material when you aren't writing content off the cuff and actually do well.

    2. Re:The answer: essay grader graders by TsuruchiBrian · · Score: 2

      If it becomes the case that writing style is able to be analyzed and produced by a computer algorithm, it seems to me that having a good writing style will become like having good arithmetic skills (i.e. less importance is placed on these skills as they become trivial for machines to replicate), and ironically this ability to automatically test and reproduce skills drives those very skills into obscurity.

      It seems like the skills that computers can't do yet are the only ones that it is worthwhile for humans to do.

  11. Works on Slashdot posts, too! by sootman · · Score: 5, Insightful

    Artificial intelligence, while seemingly tasty on the surface, tends to be underwhelmed by insufficient fish, with regard to warrantless searches.

    --
    Dear Slashdot: next time you want to mess with the site, add a rich-text editor for comments.
    1. Re:Works on Slashdot posts, too! by mpe · · Score: 2

      Artificial intelligence, while seemingly tasty on the surface, tends to be underwhelmed by insufficient fish, with regard to warrantless searches.

      AI is HARD. Plenty of tasks which people can do easily are difficult to get machines to do, even throwing lots of processing resources at the problem.
      Natural Language Processing is one of these difficult problems. With "grading essays" also being nowhere near beginner level NLP.
      Quite possibly actual NLP experts would not attempt to write such software, because they understand exactly how difficult a task it is. (Similar issues apply to "Internet filtering software".)

  12. Let me put it in Engineering terms... by Anonymous Coward · · Score: 2, Insightful

    If I've been hired to build a Potemkin village, then it would be unethical of me to spend time constructing interiors for the buildings.

    The English department has some nice courses on compositional writing where I can get real feedback on my progress on those skills. As far as the machine-graded essays for any other Department -- either I understood the topic before writing the essay or I didn't and if I didn't then a no-feedback essay isn't going to fix the problem.

  13. Re:Can you blame them? by AK+Marc · · Score: 2

    What are you talking about? I've gotten lots of good paying jobs, and nobody once looked at my grades. Except when applying for further education, and even then, they aren't important if you test well. Where have you seen where a transcript is required for a job application? Never, that's why so many CEOs get caught lying on resumes (until they post to LinkedIn and someone recognizes them and knows they didn't get what they claim and turn them in). Even the $10,000,000 a year jobs don't look at actual grades. But no, some AC claims that grades matter. So they must, even if they don't.

  14. Re:Babel? by DoofusOfDeath · · Score: 2

    Reference to (Babel, Tower Of).

    The story is a biblical "explanation" of why humanity, despite ostensibly originating as a single tribe, uses multiple languages.

    I could be wrong, but I think it's understood primarily as an allegory regarding man sinning(?) by aspiring to accomplish what only God can.

  15. Re:Is the essay generation software available? by bobjr94 · · Score: 2

    I have checked a bunch of websites and done some searching, and found no link to this babel generator or even a small excerpt from the submitted paper. I would have expected at least one, if not both, to be easily found.

  16. Re:To generate the keywords takes knowledge by MrBigInThePants · · Score: 2

    Teachers are in strong unions also here in NZ. (despite anti union legislation decimating them in the last decade)

    The right wingers here (and their ex-currency trader, cheesy smiled leader) have been trying desperately to beat on them but NZ has one of the best bang for buck education systems in the world. (i.e. Our teachers are not paid that high but the performance indicators are in the top grouping.)

    Just wanted to mention that for the inevitable people who will read your comments and think "unions baaaad" like some ideological zombie.

  17. Re:To generate the keywords takes knowledge by AK+Marc · · Score: 2

    Do you actually think either party has a goal other than the best schools?

    Yes. I honestly believe that the Republicans want to disband public education and have a merit-based entry to private schools (parent's merit, not children's), paid for with taxpayer dollars. It's "revenge" for having forced them to educate the poor for so many years.

    I've never heard anyone arguing for no public schools,

    I have. Charter-schools and for-profit private schools only, and they would be banned by law from having unions and could reject children from admission for arbitrary reasons (including race and religion).

    I've gone to plenty of party meetings for Libertarians and Republicans, and I've seen what some people have advocated.

  18. Anonymous exams by emilv · · Score: 2

    Racism, sexism and other discrimination are quite effectively countered with anonymous grading. My university gave you a unique number before each exam and you put only that number on the sheets. Only afterwards did the administrators (not anyone involved in the course) look up and file the exam under your name. I found this helpful as a TA too because we really wanted to be fair both in grades and comments.

    You can still be biased by the handwriting but we tried to counter that ourselves. If someone in my TA group recognized the handwriting of someone they knew we made sure to let someone else in the group grade that exam.
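
The numbering scheme described in this comment could look something like the sketch below. It is a guess at the general shape, not the university's actual system: the function names, the 8-character hex codes, and the assumption that student names are unique are all invented for the example.

```python
import secrets


def assign_codes(student_names):
    """Give each student a random exam code; return {code: name}.

    Only the administrators would hold this mapping. Graders see codes only.
    """
    codes = {}
    for name in student_names:
        code = secrets.token_hex(4)  # 8 hex characters
        while code in codes:         # re-draw on the (unlikely) collision
            code = secrets.token_hex(4)
        codes[code] = name
    return codes


def file_grades(grades_by_code, codes):
    """After grading, administrators map codes back to names.

    Assumes student names are unique within one exam.
    """
    return {codes[code]: grade for code, grade in grades_by_code.items()}
```

Graders would work only with `grades_by_code`; the lookup in `file_grades` happens after grading, by someone not involved in the course.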