Slashdot Mirror


Computers To Mark English Essays

digitig writes "According to The Guardian, computers are to be used in the UK to mark English examination essays. 'Pearson, the American-based parent company of Edexcel, is to use computers to "read" and assess essays for international English tests in a move that has fueled speculation that GCSEs and A-levels will be next. ... Pearson claims this will be more accurate than human marking.' Can computers now understand all the subtle nuances of language, or are people going to have to learn an especially bland form of English to pass exams?"

4 of 243 comments

  1. Re:I doubt it! by kklein · · Score: 5, Interesting

    As an English prof myself, I'd like to confirm that we spend a lot of time on students' papers. Good papers are easy to breeze through, but the worse the paper, the more time it takes.

    As far as machine grading goes, people have been working on that for 30 years. I have no doubt that, statistically, it can provide useful results.

    The problem I'm seeing in these comments, however, is a common confusion between classroom assessment and standardized testing. I can't imagine using software to grade a student's paper in class. The student-teacher relationship is a personal one. That person is paying me to help them get better at writing, for example. It is my job to pore over that paper and show them where and how they can improve.

    I am also a tester (I actually mostly work with multiple-choice data, but I've also worked on performance rating--speaking and writing). The relationship between a rater and an examinee is very different from that of a teacher and student. The examinee is paying the rater to put them on a scale with other people. This is not a fine-grained assessment; it is always done at extremely "low resolution." When rating a paper for something like the GRE or other standardized test, it is the rater's job to compare the paper to scoring rubrics and make a call on which box of text best describes the paper, and then make note of the number in that box. That's it. It can't really go any more in-depth than that.

    For this reason, your comment about "five-paragraph themes" is an important one: Test task design always needs to be clear about what kind of performance is expected, because it is nigh impossible to write rubrics that can be applied to any performance (believe me on this, I beg of you). However, this is actually a question of test specification, not of the software or raters in question. Personally, as someone who works in EFL, I am actually in favor of retaining the "five-paragraph" formula, at least for timed essay tasks. That format is at the heart of all good rhetoric. Yes, it's stilted and silly, but if you can do it, it means that you know basically how information is expected to be organized in Western, especially Anglophone, societies. No good writer would actually use it, but any good writer could.

    Again, this is about putting people in boxes, not reading their essays. I can rate a 1-page essay in about 2 minutes, with excellent model fit (I have always used many-facet Rasch modeling for my multi-rater performance testing). I have no doubt that software could be employed whose ratings would be highly predictive of those of human raters.
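    For readers who haven't met it, the many-facet Rasch model (Linacre's extension of the Rasch model) puts examinees, tasks and raters on one logit scale, so that each rater's severity is estimated as its own parameter rather than silently folded into the examinee's score. One standard form of the rating-scale version, with the symbol names chosen here for illustration, is:

    $$\ln\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k$$

    where P_nijk is the probability that examinee n, on task i, is given category k by rater j; B_n is the examinee's ability, D_i the task's difficulty, C_j the rater's severity, and F_k the difficulty of moving from category k-1 to category k on the rating scale.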

  2. No and no by grikdog · · Score: 5, Interesting

    I've scored English essays for professional testing services, and I've seen the results of robot scoring. It's pretty shoddy. No, computers are not able to distinguish between a paragraph of As I Lay Dying (William Faulkner) and a gallon of sophomoric babble by, say, yours truly. However, within the confines of a particular exam, where the topic is known, responses are predictable, and all the supplicants hew to the general line, the 'bots can detect subpar, adequate, above-average and (sometimes!) abnormally brilliant expository prose, thereby ranking papers reasonably well on the usual six-point scale.

    It's worth pointing out that certain types of exams are designed to elicit extraordinary prose from respondents, prose that conveys a sense of competence or even brilliance. In these cases, the idea is not so much to detect the high end of the bell curve as to identify the tiny pool of applicants who may be capable of Nobel Prize work in future realms of science or service. No 'bot can do that job, just as no 'bot except Deep Blue can beat Garry Kasparov, and no 'bot at all deserves the moniker Fujiwara no Sai (although Go-playing 'bots are approaching the mid-levels of highly ranked amateur players).

    That's the objective part. My personal opinion is that using robots to sort the hopes and aspirations of college-bound men and women is just begging for lawsuits. It's an approach in which differences of opinion quickly escalate to class action against universities as well as test administrators, and would not be an approach I could comfortably recommend.

    --
    ``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_

  3. How will it mark this poem? by Alain Williams · · Score: 4, Interesting

    Will it decide whether the following is well spelled? If it doesn't like the spelling, will it give it marks for irony?

    My New Spell Checker

    Eye halve a spelling chequer
    It came with my pea sea
    It plainly Marx four my revue
    Miss steaks eye kin knot sea

    Eye strike a key and type a word
    And weight four it two say
    Weather eye am wrong oar write
    It shows me strait a weigh

    As soon as a mist ache is maid
    It nose bee fore two long
    And eye can put the error rite
    Its rare lea ever wrong

    Eye have run this poem threw it
    I am shore your pleased two no
    Its letter perfect awl the weigh
    My chequer tolled me sew

    (Sauce unknown)
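    A minimal sketch of why the poem gets through: a word-list spell checker only asks whether each token is a dictionary word, and every token in the first stanza is a genuine English word (Marx as a proper noun). The WORDS set below is a hypothetical stand-in for a real dictionary, trimmed to just those words; catching homophone errors would need context, which this kind of checker deliberately lacks.

```python
import re

# Hypothetical stand-in for a real dictionary: each entry is a genuine English
# word, so a full dictionary would contain it too.
WORDS = {"eye", "halve", "a", "spelling", "chequer", "it", "came", "with", "my",
         "pea", "sea", "plainly", "marx", "four", "revue", "miss", "steaks",
         "kin", "knot"}


def misspelled(text):
    # Flag only tokens missing from the word list; homophone errors pass
    # because each wrong word is still a real word.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in WORDS]


stanza = """Eye halve a spelling chequer
It came with my pea sea
It plainly Marx four my revue
Miss steaks eye kin knot sea"""

print(misspelled(stanza))                         # []
print(misspelled("Eye halve a speling chequer"))  # ['speling']
```

    Only a non-word like "speling" is flagged; the checker has no way to notice that "pea sea" should have been "PC".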

  4. Re:Graduate Record Exam by markov23 · · Score: 4, Interesting

    The paper-scoring technology I am familiar with (used by the GRE and some high school English classes) can't be fed a random paper; it needs to be trained on a particular assignment. Then it can score papers for that assignment. The success they get with these is pretty surprising, but the application is limited to these types of tests, or to curricula designed around the assignments it has been trained on.

    The more interesting effect, reported by students (not GRE takers), is that it lets them write a paper, get it scored, make changes, and see whether they are getting better. When I was writing papers in high school, you wrote it, handed it in, got a grade a week later, and never thought about it again. This type of technology actually lets you learn a lot more from one paper by iterating through several versions and getting direct, specific feedback on how to improve.
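    A toy sketch of what "trained on a particular assignment" can mean, assuming the simplest possible approach: fit a least-squares line from one crude surface feature (number of distinct words) to human scores on essays written for a single prompt, then use that line to score new essays for the same prompt. The feature, the function names and the sample data are all hypothetical; this is not a description of the GRE's scorer or any other real system.

```python
from statistics import mean


def distinct_words(text: str) -> int:
    # Crude lexical-variety feature: count of distinct lowercase tokens,
    # with surrounding punctuation stripped.
    return len({w.strip(".,;:!?\"'").lower() for w in text.split()} - {""})


def train_scorer(essays, human_scores):
    """Fit score = a + b * feature by ordinary least squares on ONE assignment."""
    xs = [distinct_words(e) for e in essays]
    ys = list(human_scores)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("training essays need varying feature values")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar

    def score(essay: str) -> int:
        # Predict, then round and clamp onto a 1-6 scale.
        raw = a + b * distinct_words(essay)
        return max(1, min(6, round(raw)))

    return score


# Hypothetical human-scored training essays for one prompt.
score = train_scorer(
    ["Dogs bark.", "Dogs bark at night.", "Dogs bark loudly at night outside."],
    [1, 3, 5],
)
print(score("Cats sleep all day."))  # 3
```

    In this toy version the fitted a and b mean nothing outside the prompt they were fitted on, which mirrors why such a scorer can't simply be fed a random paper. The revise-and-rescore loop described above amounts to calling score() again on each new draft.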