Slashdot Mirror


Computers To Mark English Essays

digitig writes "According to The Guardian, computers are to be used in the UK to mark English examination essays. 'Pearson, the American-based parent company of Edexcel, is to use computers to "read" and assess essays for international English tests in a move that has fueled speculation that GCSEs and A-levels will be next. ... Pearson claims this will be more accurate than human marking.' Can computers now understand all the subtle nuances of language, or are people going to have to learn an especially bland form of English to pass exams?"

7 of 243 comments

  1. Graduate Record Exam by ub3r+n3u7r4l1st · · Score: 5, Informative

    The GRE Writing portion is already using it.

    From http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=ebd42d3631df4010VgnVCM10000022f95190RCRD&vgnextchannel=54c846f1674f4010VgnVCM10000022f95190RCRD

    "For the computer-based Analytical Writing section, each essay receives a score from at least one trained reader, using a six-point holistic scale. In holistic scoring, readers are trained to assign scores on the basis of the overall quality of an essay in response to the assigned task. The essay score is then reviewed by e-rater, a computerized program developed by ETS, which is being used to monitor the human reader. If the e-rater evaluation and the human score agree, the human score is used as the final score. If they disagree by a certain amount, a second human score is obtained, and the final score is the average of the two human scores."

    If you can figure out what the algorithm looks for, even a software-generated essay can get 6's.
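    The score-resolution rule in the ETS passage quoted above can be sketched roughly as follows. This is only an illustration of the quoted text, not ETS's actual implementation: the function name, the `get_second_human` callback, and the threshold value are all hypothetical, and the quote says only that a second reader is called in when the scores disagree "by a certain amount".

```python
from statistics import mean
from typing import Callable

# Hypothetical value: the quoted text never states the actual cutoff.
DISAGREEMENT_THRESHOLD = 1.0


def final_essay_score(
    first_human: float,
    e_rater: float,
    get_second_human: Callable[[], float],
    threshold: float = DISAGREEMENT_THRESHOLD,
) -> float:
    """Resolve a final essay score following the quoted procedure.

    The e-rater score is used only to check the human reader; it never
    contributes to the final score itself.
    """
    # Assumption: "agree" means the two scores differ by less than the threshold.
    if abs(first_human - e_rater) < threshold:
        return first_human

    # Disagreement: obtain a second human score and average the two human scores.
    second_human = get_second_human()
    return mean([first_human, second_human])


# Human says 4, e-rater says 6, so a second reader (who gives 5) is consulted.
print(final_essay_score(4, 6, lambda: 5))  # 4.5
```

    Passing the second reader in as a callback reflects that, per the quote, a second human score is obtained only when the first one is flagged.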

  2. Re:Context... by Anonymous Coward · · Score: 5, Funny

    Yeah because when it's written we can see the spelling difference between 'flies' and 'flies' and that ruins the joke.

  3. Re:Don't they already do this? by Anonymous Coward · · Score: 5, Insightful

    As a writing instructor, let me put it this way: I very, very seldom see a paper with misspellings and grammar mistakes that is nonetheless a well-written paper. It happens, but not often. Grammar and spelling mistakes are a symptom of sloppiness, as are poor reasoning, lack of organization, and lack of adequate support. If you can't be bothered to remember primary-school English, it is not likely that you are willing to master rhetorical structure.

    When we read a paper, we actually don't care what you're saying. There usually isn't an "interesting" score. In my case, I evaluate on three, ten-point, holistic scales: Content (which basically refers to amount and quality of support), Organization (rhetorical structure), and Mechanics (yes, grammar, vocabulary, adhering to the style guide, etc.). I do this so I don't have people claiming that their hopeless muddle of a paper got marked down for "obscure grammar errors (which no one in the real world even knows about) and simple typos".

    Guess what? Writing is not speaking. Those "obscure rules" are, indeed, usually only applied in writing. I ramble, swear, and disregard the conventions of "proper" English when speaking. But that is because those rules do not really apply in that sociocultural setting. In formal writing--you know, what you're being taught in writing class--they matter a great deal. If you don't follow them, you sound like an idiot, and no one will listen to you.

    Why are these "obscure" rules used as a "canary test" of your intelligence and noteworthiness?

    Because of what I wrote in my first paragraph. Intelligent, methodical, and rational people care enough to follow them.

    I'm sorry, but that's how it works in the "real world".

  4. Re:I doubt it! by kklein · · Score: 5, Interesting

    As an English prof myself, I'd like to confirm that we spend a lot of time on students' papers. Good papers are easy to breeze through, but the worse the paper, the more time it takes.

    As far as machine grading goes, people have been working on that for 30 years. I have no doubt that, statistically, it can provide useful results.

    The problem I'm seeing in these comments, however, is a common confusion of testing for assessment and standardized testing. I can't imagine using software to grade a student's paper in class. The student-teacher relationship is a personal one. That person is paying me to help them get better at writing, for example. It is my job to pore over that paper and show them where and how they can improve.

    I am also a tester (I actually mostly work with multiple-choice data, but I've also worked on performance rating--speaking and writing). The relationship between a rater and an examinee is very different from that of a teacher and student. The examinee is paying the rater to put them on a scale with other people. This is not a fine-grained assessment; it is always done at extremely "low resolution." When rating a paper for something like the GRE or other standardized test, it is the rater's job to compare the paper to scoring rubrics and make a call on which box of text best describes the paper, and then make note of the number in that box. That's it. It can't really go any more in-depth than that.

    For this reason, your comment about "five-paragraph themes" is an important one: Test task design always needs to be clear about what kind of performance is expected, because it is nigh impossible to write rubrics that can be applied to any performance (believe me on this, I beg of you). However, this is actually a question of test specification, not of the software or raters in question. Personally, as someone who works in EFL, I am actually in favor of retaining the "five-paragraph" formula, at least for timed essay tasks. That format is at the heart of all good rhetoric. Yes, it's stilted and silly, but if you can do it, it means that you know basically how information is expected to be organized in Western, especially Anglophone, societies. No good writer would actually use it, but any good writer could.

    Again, this is about putting people in boxes, not reading their essays. I can rate a 1-page essay in about 2 minutes, with excellent model fit (I have always used many-facet Rasch modeling for my multi-rater performance testing). I have no doubt that software could be employed whose ratings would be highly predictive of those of human raters.

  5. Re:Don't they already do this? by jonadab · · Score: 5, Informative

    > As a writing instructor, let me put it this way: I very,
    > very seldom see a paper with misspellings and grammar
    > mistakes that is nonetheless a well-written paper. It
    > happens, but not often.

    It happens most often when the writer is not a native speaker of the language. They'll write an essentially sound paper but make weird and obvious mistakes, like using the wrong preposition or spelling ph words with f. Depending on their native language they may also make other kinds of mistakes, e.g., Japanese people will frequently mess up grammatical number.

    But the other poster may have been talking about grammatical structures that are actually a regular part of English grammar but are nonetheless consistently marked down by many English teachers, for obscure reasons. Examples of this kind of thing include split infinitives, the second-person imperative, the use of the second person pronoun to refer to anyone in general, and the use of objective-case pronoun forms in the predicate after certain verbs (particularly being verbs). Linguistically speaking these aren't actually mistakes as such, and in fact some of the contortions used to avoid them actively impede clarity, but they frequently get marked as "mistakes" nonetheless.

    --
    Cut that out, or I will ship you to Norilsk in a box.

  6. Re:Depressing by psnyder · · Score: 5, Insightful

    I had seen a student who knew very little about biology do her homework by scanning in her book for specific phrases mentioned in the questions and looking for some semblance of an answer once she'd found the phrases. By the time she was done, she hadn't even read the chapter, but her answers would probably get her a "C".

    This is the way I always did it, and it got me A's. In fact I was taught to do this in a 6th grade "Study Skills" class. Ironically, it's a very good skill to have in the "real world" as it's a way of quickly obtaining the information you need. You could even draw a parallel between this and Googling something or any kind of computer "find" or "search".

    The ability to skim for an answer is not a problem. It's one of the solutions that children employ to deal with a school system that puts more emphasis on grades than on inspiring them to actually learn a subject. The "inspiration" to get good grades works for some (especially with parental support), but with "average" being a 'C' (often a very shallow understanding), it can be argued that it's not working for most.

    As you said, "It took a college education and many years of reading to undo these "lessons" and really discover the joy of writing essays."

    Skimming is a skill. Learning a system and figuring out how to survive in it is also a skill. The emphasis on that 'joy' is what's usually lacking. Get a student inspired and the rest usually takes care of itself.

  7. No and no by grikdog · · Score: 5, Interesting

    I've scored English essays for professional testing services, and I've seen the results of robot scoring. It's pretty shoddy. No, computers are not able to distinguish between a paragraph of As I Lay Dying (William Faulkner) and a gallon of sophomoric babble by, say, yours truly. However, within the confines of a particular exam, where the topic is known, responses are predictable, and all the supplicants hew to the general line, the 'bots can detect subpar, adequate, above-average and (sometimes!) abnormally brilliant expository prose, thereby ranking papers reasonably well on the usual six-point scale.

    It's worth pointing out that certain types of exams are designed to elicit extraordinary prose from respondents, that which yields a sense of competence or even brilliance, say. In these cases, the idea is not so much to detect the high end of the bell curve, but to identify the tiny pool of applicants who may be capable of Nobel Prize work in future realms of science or service. No 'bot can do that job, just as no 'bot except Deep Blue can beat Garry Kasparov, and no 'bot at all deserves the monicker Fujiwara no Sai (although Go-playing 'bots are approaching the mid-levels of highly ranked amateur players).

    That's the objective part. My personal opinion is that using robots to sort the hopes and aspirations of college-bound men and women is just begging for lawsuits. It's an approach in which differences of opinion quickly escalate to class action against universities as well as test administrators, and would not be an approach I could comfortably recommend.

    --
    ``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_