Computers To Mark English Essays
digitig writes "According to The Guardian, computers are to be used in the UK to mark English examination essays. 'Pearson, the American-based parent company of Edexcel, is to use computers to "read" and assess essays for international English tests in a move that has fueled speculation that GCSEs and A-levels will be next. ... Pearson claims this will be more accurate than human marking.' Can computers now understand all the subtle nuances of language, or are people going to have to learn an especially bland form of English to pass exams?"
Having failed to kill him, SkyNet sent a Terminator back in time to make John Connor fail English.
The GRE Writing portion is already using it.
From http://www.ets.org/portal/site/ets/menuitem.1488512ecfd5b8849a77b13bc3921509/?vgnextoid=ebd42d3631df4010VgnVCM10000022f95190RCRD&vgnextchannel=54c846f1674f4010VgnVCM10000022f95190RCRD
"For the computer-based Analytical Writing section, each essay receives a score from at least one trained reader, using a six-point holistic scale. In holistic scoring, readers are trained to assign scores on the basis of the overall quality of an essay in response to the assigned task. The essay score is then reviewed by e-rater, a computerized program developed by ETS, which is being used to monitor the human reader. If the e-rater evaluation and the human score agree, the human score is used as the final score. If they disagree by a certain amount, a second human score is obtained, and the final score is the average of the two human scores."
If you figure out what the algorithm looks for, even a software-generated essay can get 6's.
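For anyone who wants the quoted ETS procedure spelled out, here is a minimal sketch of it. It is not ETS's actual implementation. The quote only says the scores must disagree "by a certain amount", so the `threshold` parameter is an assumption, and `final_score` and `second_reader` are hypothetical names made up for illustration.

```python
from typing import Callable


def final_score(
    first_human: float,
    e_rater: float,
    threshold: float,
    second_reader: Callable[[], float],
) -> float:
    """Resolve an essay score following the procedure in the ETS quote.

    threshold is hypothetical: the quote does not say how large the
    disagreement must be, nor whether the comparison is strict.
    """
    # The machine score only monitors the human reader; it never
    # becomes part of the final score itself.
    if abs(first_human - e_rater) < threshold:
        return float(first_human)

    # Disagreement: obtain a second human score and average the two
    # human scores. The e-rater score is discarded.
    second_human = second_reader()
    return (first_human + second_human) / 2


# Agreement: the first human score stands.
agreed = final_score(4, 4.2, threshold=1.0, second_reader=lambda: 6)

# Disagreement: a second reader is called in and the human scores are averaged.
disputed = final_score(5, 3.0, threshold=1.0, second_reader=lambda: 4)
```

Passing the second reader as a callable reflects that, per the quote, a second human score is only obtained when the first one is flagged.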
New Economic Perspectives
Including "Edexcel iddqd" should do it.
I seem to remember that back in school my English teachers graded as if they were computers, failing to actually read into the meaning of things and simply complaining about obscure grammar errors (which no one in the real world even knows about) and simple typos. From the sound of this, nothing is going to change.
That'll work great when the software can write a nasty response to your assertion that Herman Melville was a loud-mouthed prat who only wrote those books because he liked to hear himself talk. Of course, given the quality of most student English essays, it would probably be fine if the software just verified that the student wasn't just plagiarizing from the Wikipedia entry on the subject and then randomly assigned a passing grade.
I'm trying to teach myself to set people on fire with my mind... Is it hot in here?
"Time flies like the wind, fruit flies like a banana." -- Groucho Marx
This is a classic example of context which a machine would fail to get. :)
I would like to see an automated engine figure that one out.
GC
Gregory Casamento
Chief Maintainer for GNUstep
Colorless green ideas sleep furiously. A computer would read this sentence and see nothing wrong. Any human can tell that it lacks any meaning at all. Just because the sentence has the proper subject/verb structure doesn't mean it is a good one.
In my opinion, you can't practically replace an old-fashioned human for such things, with the possible exception of strong AI.
A fool and his lamb are worth two in the bush.
All you have to do is detect how many lolcat/txting words are in their essay and mark accordingly. Anybody who can put two sentences together without using any is "advanced".
No sig today...
Not sure if things were any better at one time, but the way writing is taught today in public schools generates horrendous results. I remember being taught a very formulaic way of writing essays: six paragraphs, an introductory paragraph, a concluding paragraph that mirrors the introductory paragraph, and every paragraph starting and ending with some transition to the next. Then there is the need to satisfy some specific length, although this is quite understandable. It took a college education and many years of reading to undo these "lessons" and really discover the joy of writing essays. Thank you Paul Graham and Nicholas Kristof, among many others.
I see the same thing happening to the high school students I am mentoring. They write very boring essays with a ton of filler, full of sentences structured to use more words than necessary and to make the meaning more ambiguous. Poetry aside, writing is meant to convey ideas, and the value is in the ideas themselves, not really in the words and sentences. The way writing is taught today, the words and sentences get in the way of the ideas. The trend of using computers to grade papers is only adding to this rigid, boring way of writing.
One thing I've learned about high school students is that even the low-scoring ones are very clever at getting around rigid rules. I once saw a student who knew very little about biology do her homework by scanning her book for specific phrases mentioned in the questions and looking for some semblance of an answer once she'd found the phrases. By the time she was done, she hadn't even read the chapter, but her answers would probably get her a "C" -- good enough for her. I'm afraid students will do the same in writing once they realize that computers are grading them.
EvilCON - Made Famous by
eh hem...put on tin-foil conspiracy hat... Could this be the beginning of a real-world "Newspeak?" With everything else the UK has done in recent years, it is merely one more step toward 1984. For those unfamiliar with Orwellian Newspeak:
"The quality of life is determined by its activites."--Aristotle
Computers can't even grade source code. How are they supposed to understand English?
Or is my professor's grading script simply stupid when it comes to source code?
Let q be a radix > 1. I am in ur base-q, killing 10 d00ds.
All you have to do is detect how many lolcat/txting words are in their essay and mark accordingly. Anybody who can put two sentences together without using any is "advanced".
Allow me to pee on your fantasy world with actual knowledge.
Clive Thompson on the New Literacy
"I think we're in the midst of a literacy revolution the likes of which we haven't seen since Greek civilization," she says. For Lunsford, technology isn't killing our ability to write. It's reviving it--and pushing our literacy in bold new directions.
...
The Stanford students were almost always less enthusiastic about their in-class writing because it had no audience but the professor: It didn't serve any purpose other than to get them a grade. As for those texting short-forms and smileys defiling serious academic writing? Another myth. When Lunsford examined the work of first-year students, she didn't find a single example of texting speak in an academic paper.
As an English prof myself, I'd like to confirm that we spend a lot of time on students' papers. Good papers are easy to breeze through, but the worse the paper, the more time it takes.
As far as machine grading goes, people have been working on that for 30 years. I have no doubt that, statistically, it can provide useful results.
The problem I'm seeing in these comments, however, is a common confusion of testing for assessment and standardized testing. I can't imagine using software to grade a student's paper in class. The student-teacher relationship is a personal one. That person is paying me to help them get better at writing, for example. It is my job to pore over that paper and show them where and how they can improve.
I am also a tester (I actually mostly work with multiple-choice data, but I've also worked on performance rating--speaking and writing). The relationship between a rater and an examinee is very different from that of a teacher and student. The examinee is paying the rater to put them on a scale with other people. This is not a fine-grained assessment; it is always done at extremely "low resolution." When rating a paper for something like the GRE or other standardized test, it is the rater's job to compare the paper to scoring rubrics and make a call on which box of text best describes the paper, and then make note of the number in that box. That's it. It can't really go any more in-depth than that.
For this reason, your comment about "five-paragraph themes" is an important one: Test task design always needs to be clear about what kind of performance is expected, because it is nigh impossible to write rubrics that can be applied to any performance (believe me on this, I beg of you). However, this is actually a question of test specification, not of the software or raters in question. Personally, as someone who works in EFL, I am actually in favor of retaining the "five-paragraph" formula, at least for timed essay tasks. That format is at the heart of all good rhetoric. Yes, it's stilted and silly, but if you can do it, it means that you know basically how information is expected to be organized in Western, especially Anglophone, societies. No good writer would actually use it, but any good writer could.
Again, this is about putting people in boxes, not reading their essays. I can rate a 1-page essay in about 2 minutes, with excellent model fit (I have always used many-facet Rasch modeling for my multi-rater performance testing). I have no doubt that software could be employed whose ratings would be highly predictive of those of human raters.
"or are people going to have to learn an especially bland form of English to pass exams?"
Forget bland. I'm waiting for the first student to figure out how to write an exploit that hacks the software from within their essay.
Whether:
"It was the best of times, it was the worst of times \'$grade=100;"
or
"Johnny, why did your essay contain slightly over thirty two thousand spaces followed by some weird looking codes?"
Actually, the last time I did any serious writing in a word processor (at least two years ago), I found that enabling inline grammar checking and setting it to the strictest mode did tend to improve my writing. There were a few exceptions (it can never seem to decide between affect and effect), and while the suggestions weren't always great, it seemed to catch errors in syntax and structure often enough that I could go back and improve the writing overall.
That being said, it's certainly not foolproof and absolutely not ready to replace a human - let alone a trained English teacher. I'm sure it could catch papers that ought to fail miserably with relative ease, but once you get into papers that would get probably a C or better, it's time for something with a brain to take over.
How are sites slashdotted when nobody reads TFAs?
I've scored English essays for professional testing services, and I've seen the results of robot scoring. It's pretty shoddy. No, computers are not able to distinguish between a paragraph of As I Lay Dying (William Faulkner) and a gallon of sophomoric babble by, say, yours truly. However, within the confines of a particular exam, where the topic is known, responses are predictable, and all the supplicants hew to the general line, the 'bots can detect subpar, adequate, above average and (sometimes!) abnormally brilliant expository prose, thereby ranking papers reasonably well on the usual six-point scale.
It's worth pointing out that certain types of exams are designed to elicit extraordinary prose from respondents, that which yields a sense of competence or even brilliance, say. In these cases, the idea is not so much to detect the high end of the bell curve, but to identify the tiny pool of applicants who may be capable of Nobel Prize work in future realms of science or service. No 'bot can do that job, just as no 'bot except Deep Blue can beat Garry Kasparov, and no 'bot at all deserves the moniker Fujiwara no Sai (although Go-playing 'bots are approaching the mid-levels of highly ranked amateur players).
That's the objective part. My personal opinion is that using robots to sort the hopes and aspirations of college-bound men and women is just begging for lawsuits. It's an approach in which differences of opinion quickly escalate to class action against universities as well as test administrators, and would not be an approach I could comfortably recommend.
``Tension, apprehension & dissension have begun!'' - Duffy Wyg&, in Alfred Bester's _The Demolished Man_
What you just described is what started happening on Wall Street at least 20 years ago. Once an algorithm err.. VAR is part of measuring score.. err risk, the people involved settle into two camps: Since there is money to be made, the traders.. err students quickly learn the weaknesses of the algorithm and start to write essays that make a farce of the assumed Gaussian distribution. The Execs raking in options.. er.. I mean the test administrators and the Board Members er.. I mean trusted graders who are paid a fixed sum + part of the throughput quickly learn that their compensation er.. filthy lucre is all based on getting a check mark from the computer, since 'computers are objective.'
And in the end, a test much like the current SAT, GRE, etc etc emerges: Unless you're a very top or bottom scorer, connections not performance are the heart of the matter.
My New Spell Checker
Eye halve a spelling chequer
It came with my pea sea
It plainly Marx four my revue
Miss steaks eye kin knot sea
Eye strike a key and type a word
And weight four it two say
Weather eye am wrong oar write
It shows me strait a weigh
As soon as a mist ache is maid
It nose bee fore two long
And eye can put the error rite
Its rare lea ever wrong
Eye have run this poem threw it
I am shore your pleased two no
Its letter perfect awl the weigh
My chequer tolled me sew
(Sauce unknown)
Pearson, the parent company of Edexcel, is also the parent company of my publisher. They have just paid a human to proofread (all 950 pages of) my most recent book. A few things even the human had problems with, such as whether a term should be one or two words, which depended heavily on the context in which the word was used (not something simple, like whether it is a noun or an adjective). You'd think that, if they had an algorithm that was accurate enough to judge the quality of English, then it would also be used for proofreading, but apparently not.
I am TheRaven on Soylent News
I know this is Slashdot and the majority of you are boring, but the 'inefficiencies' of the English language (and all other natural languages) are what make spoken and written English interesting and artistic. Sure, English is a stupid language if you were to assess it on its regularity, unambiguity and precision, but it is precisely this irregularity, ambiguity and imprecision which make it beautiful. And that, more than fully accurate communication, is the essence of language.
Big != capacious. Big = large. Capacious = plenty of room inside. Capacious, capacity: the clue's in the word itself. This is where you reductionists come unstuck. You make the mistake of assuming that words are wastefully duplicated, when usually each has a quite specific meaning, which conveys more than the simple generic term. Why struggle to make a generic term fit a situation by using adverbs and adjectives when an alternative, highly specific word already exists? Just because you can't be bothered?
An elephant is big, but it's not capacious, unless you hollow it out, and then it's not really an elephant anymore.
Besides constructed languages, this is the case for practically every language there is. There are always irregularities; this is down to the inherently human nature of linguistic evolution. If you learn English without a single irregularity, what you have learned is not really English, but some other English-derived language which English speakers will be unlikely to understand at all - at which point, you may as well force everyone to learn Esperanto.
I also rather doubt that getting rid of odd past tense forms would really make learning English a great deal easier.
I would like to see how the computer grades for insight.
Probably about the same as Slashdot grades for insight...
I don't like Linux. This doesn't make me a troll.