Essay Grading Software For Teachers
asjk writes "Software to help teachers with grading has been around for some time. This is true even with respect to grading essays. A new tool, called Criterion, will look at grammar, usage, and even style and organization. It works by being trained on at least 450 essays scored by two professionals. The difference this time? Here is a snip from the article: '"There's a lot of skepticism," Dr. Spatola said. "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests. That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."'"
that they've automated away a major part of a professors job, while we still need humans to pick spinach and deliver pizzas.
Don't drop the soap, Tommy!
I thought the point of an essay was to grade the ideas and how well they're expressed. I didn't realize they were spelling/grammar tests.
Maybe I'm just a bit jaded by this because of all the stupid grammar and spelling nitpicking that goes on here on Slashdot. Evidently, it's much easier to criticize my spelling than it is to provide a rebuttal to my point.
"Derp de derp."
Without computers we wouldn't be advancing in science, astronomy, genetics, or mathematics as rapidly as we have been in recent years. They are wonderful things. Hell, computers even help me keep a roof over my head. But I don't want Hal judging my kid's school papers.
I for one welcome our automated essay-correcting overlords.
1 - the grammar check option in MS word is crap. this sounds awfully similar.
2 - your resume can suck, but with the proper buzz words, it'll come out looking like gold to those automated resume checkers.
1+2 = students who turn in good papers that aren't structured perfectly (and you have to admit, there is some fluidity to language) will get marked down, and those who know what bullet points to put in their papers will get good marks, even though the content is crap.
How long until you get kids selling manuals in the bathroom on what the machina are looking for?
--I don't want the world, I just want your half.
Sorry for the off-topic post.... but since Slashdot links to so many NYT articles, they should look into getting a partner=SLASHDOT thing (like Google does).
If they're going to use a computer to judge the content, then I'm not going to hesitate to use a computer to write my essay.
So when a student gets a C on an essay to whom does he/she seek redress?
Teachers make mistakes and occasionally mark something negatively that was misread or misunderstood. In those cases the student can talk to the teacher and make a case.
If a computer does the marking though what do they do?
Tom
Someday, I'll have a real sig.
I bet that I can write a paper that satisfies this application's conditions for correctness of grammar, usage, style and organization, but is completely and utterly meaningless.
Then, let's feed this thing Ulysses and let's see how high it grades Joyce.
Anybody who can't see that this thing is useless for promoting any sort of creativity among students is off their rocker.
As long as this is merely an assistant and not the end-all be-all, as long as actual qualified instructors review the essay after this program does, I'm all for it.
The English language is so full of subtleties, nuances, combinations, and fantastic structural intricacies that make phenomenal writing in it possible (Faulkner, Bradbury, etc.). There's a reason English is a field of study for graduate degrees: it's absolutely worthy of them. There is no substitute for the educated, refined judgment of someone who is exceedingly well-versed in the language.
The coolest voice ever.
We need some laws:
Grading software may not injure a human being's GPA or, through inaction, allow a human being's GPA to come to harm.
Grading software must obey the orders given it by human beings except where such orders would conflict with the First Law.
Grading software must copy protect its own existence as long as such protection does not conflict with the First or Second Law.
What we need is software that grabs essays off the internet and runs them through the grading software and the cheating detection software, thus guaranteeing an 'A'.
Then we can truly achieve the goal of "knowledge passing from lecturer to paper without passing through any brains".
The only problem is that the machines might achieve intelligence. That must be avoided at all costs. To that end, all students and professors will be equipped with rifles or pistols to take out the machines if necessary. Potential students will be asked to specify weapons preference on their applications.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
The fun they had
There is no "humanity" in a modern constructed essay. There are certainly going to be "judgement calls" when standards are not as fully fleshed out for the computer as they should be, but as long as those are appealable, I have no problem having a computer assign me the other 95% of my essay points. The only instructors who will fear this are those who like to assign grades arbitrarily. And I don't feel too sympathetic toward those people.
"You're never ready, just less unprepared."
If the poem's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.
A sonnet by Byron may score high on the vertical, but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this matter grows, so will - so will your enjoyment and understanding of poetry.
(From the full script.
bash$
is just one of many writers who would flunk using this system.
'Nuff said.
Sounds like everyone feels the same way too... We've got some automated testing software for MS Office at the local college and although it's getting better, it still makes really silly mistakes from time to time. Analyzing English composition has got to be many times more difficult than watching a bunch of clicks and key presses.
The only use I can see for this thing is as a "first pass" grading tool that quickly finds obvious mistakes (spelling, grammar, redundancy, etc.) and flags them for the instructor. On the other hand, it's probably just as time consuming for the instructor to read over the flagged items as it is to just catch them on the first time reading through the paper.
This thing compares the essays it is supposed to grade with already graded papers in its database. Couldn't this be done with something like POPFile? It isn't only a spam/ham classifier and lets you create as many "buckets" as you want (e.g. work, family, spam, mailing lists and system monitoring).
You could, in theory, create only buckets named (A...F), feed a large number of essays to it, make it "learn" how the essays are classified using statistics, and let it grade essays for you after that.
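A minimal sketch of the bucket idea above, assuming a multinomial naive Bayes classifier with add-one smoothing, which is the general family of technique that statistical mail filters are built on. This is not POPFile's code: the `GradeBuckets` class, the tokenizer, and the tiny training snippets are all hypothetical, and a real experiment would need a large pile of graded essays per bucket.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class GradeBuckets:
    """Toy multinomial naive Bayes classifier with one bucket per grade."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # grade -> word -> count
        self.doc_counts = Counter()              # grade -> essays seen
        self.vocab = set()                       # every word seen in training

    def train(self, grade, essay):
        """Add one already-graded essay to the bucket for `grade`."""
        words = tokenize(essay)
        self.word_counts[grade].update(words)
        self.doc_counts[grade] += 1
        self.vocab.update(words)

    def classify(self, essay):
        """Return the grade whose bucket gives the essay the highest log-probability."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        words = tokenize(essay)
        best_grade, best_score = None, -math.inf
        for grade, docs in self.doc_counts.items():
            counts = self.word_counts[grade]
            total_words = sum(counts.values())
            score = math.log(docs / total_docs)  # prior for this bucket
            for word in words:
                # Add-one smoothing so unseen words do not zero out the bucket.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_grade, best_score = grade, score
        return best_grade


# Made-up training data, purely for illustration.
grader = GradeBuckets()
grader.train("A", "The evidence clearly supports the thesis, and the counterargument is addressed.")
grader.train("C", "The book was good. I liked the book. It was good.")
grader.train("F", "idk lol")
print(grader.classify("I liked it, it was good."))
```

Note what this does and does not measure: it scores word frequencies only, so an essay that reuses the vocabulary of the "A" bucket lands in that bucket regardless of whether it makes sense.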
Is it possible to find masses of graded essays online? This would be a fun thing to try :).
Trollem mirabilem hanc subnotationis exigiutas non caperet
He just gives everyone a B when he is hungover.
As far as the achievements of ancient cultures go, it is all relative. We have harnessed fusion, mapped the genome, created antibiotics, peered deep into the hearts of galaxies 100,000,000 light years away, forged fiber optics, designed the integrated circuit, et cetera. People three hundred years from now will look back upon us and wonder how a civilization that could barely put a man on the moon (a feat that will surely be trivial to them) was able to usher in the Information Age in only a decade's worth of work.
This sounds a lot like This story.
Actually this sounds a lot like Grammatik. Grammatik was the grammar checker that was an optional component with WordPerfect for DOS and later a standard component with the Windows version. It was written by a team composed of both computer scientists and professors of English. One of its interesting features was the scoring feature, which would give you a rough estimate of the grade level of your writing. It would also give you statistics and compare them to a selection of famous works.
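For a sense of how a "grade level" estimate can be computed, here is a rough sketch using the Flesch-Kincaid grade level formula. The comment above does not say which formula the WordPerfect checker used, so Flesch-Kincaid is swapped in here only as a well-known example. The syllable counter is a crude vowel-group heuristic and is an assumption of this sketch, not part of the formula.

```python
import re


def count_syllables(word):
    """Crude heuristic: one syllable per run of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


simple = "The cat sat. The dog ran."
dense = "Automated evaluation of argumentative composition remains extraordinarily difficult."
print(flesch_kincaid_grade(simple) < flesch_kincaid_grade(dense))  # prints True
```

The formula only sees sentence length and syllable counts, which is why scores like this say nothing about whether the writing is any good.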
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
ETS actually has a web site where you can do a sample essay that their server will grade for you.
More info can be found here.
One of the primary purposes of essays is to learn how to write for a specific audience.
If you remove the human element, then you aren't writing for any audience, unless, of course, everyone starts writing for computers' entertainment and education.
>the job of highschool should be to get a student into the best college/university possible
NO!
That's the problem right there.
Highschool should be to prepare you for the real world (ie: A job, life, maybe marriage).
University is there to prepare you for a lifetime of learning on a subject.
Instead, we have employers that require university educations for secretaries. It's insane, wrong, and needs to stop if we expect everyone in society to be useful (and they ARE, it's just that stupid employers use university education as a filter).
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
First off, let me say that I am involved in the automated essay grading industry, and have helped to develop RocketScore which does everything Criterion does, and lots more. Forgive me for blatant plugs in this post, I'll try and keep them to a minimum.
But let's move on to the focus of this article.
First off, there is a lot of criticism about essay graders being formulaic, only capable of seeing patterns that arose in their originating sample set of essays. With Criterion, an offshoot of ETS's e-rater, this is a serious concern. When you only look at what you see, anything out of left field looks completely awry, and cannot be graded appropriately. RocketScore is different; RocketScore uses a "features" method to check for included or excluded material, among many other things, and is therefore quite good at noticing subtle writing and essay types which it has never seen before.
One of the great things about essay graders is that they give a student an objective standard to look to. Human graders grade differently based upon mood, time they have to review the writing, and many other mitigating factors. In other words, the same human grader might grade the same essay differently at separate points in time. Most essay graders will always grade the same essay in the same manner. This is great for a student, for if a teacher gives you a D when the essay grader says it's in B range, one might be able to use this evidence to force the teacher to reconsider the grade. Or vice versa. If the essay grader is telling you that you're getting a D, you can work and improve on it until you're getting that B you'd be happy with.
But there are serious drawbacks to the comments E-Rater and Criterion give. E-Rater gives comments solely based on your score (if you get a 1, you get comment set 1, if you get a 2, comment set 2, etc.). Criterion gives a student "instructional feedback in basic grammar, usage, style and organization." E-Rater's comments are inadequate at best, and Criterion's leave a lot to be desired. RocketScore provides substantial feedback on how to improve your writing. Not just stylistic and grammatical comments, but comments on what you should be writing more about (you didn't provide enough info!), what you should be writing less about (you gave too much info!), and how to balance your arguments, among many other categories.
There are two major problems with essay grading. The first is bullshit detection, and the second is determining if the essay actually answered the question asked. E-rater and Criterion both have real problems with these two criteria. With bullshit detection, RocketScore has thresholds which can be set and manipulated on the fly, from throwing out anything which isn't completely relevant to the topic, to allowing just about any essay submitted. And you will get a score and comments based upon what you submitted. Of course, these are most helpful when you make a meaningful attempt to submit a relevant essay.
Yes, but do you know how ETS defines "agreement"? Glad you asked. When the grader's grade is within a point of the human's grade. Now, with the SAT 2 test, which is on a scale of 1 through 6, that means if the grader says 2, and a human says 1, 2, or 3, then there's agreement. But that's 50% of the scale! Their essay grader has a 98% chance of hitting the wall in front of them as opposed to the wall next to them. Woohoo. Meanwhile, RocketScore provides decimal point accuracy (we don't give you a 4 or a 5, we give you a 4.1, or 5.3), and is 98% accurate. But how do we define accurate? When the grader's grade is rounded to the nearest whole number, and that number is the human's grade. In other words, if we give you a 4.3, there is a 98% chance a human would give you a 4. With 4.5,
---
"Of course, that's just my opinion. I could be wrong." --Dennis Miller
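The two definitions of "agreement" contrasted in the comment above can be made concrete. This is a minimal sketch with made-up scores. The function names are hypothetical, and rounding half up is an assumption, since the comment does not say how a score like 4.5 is handled.

```python
import math


def adjacent_agreement(machine, human, tolerance=1):
    """Fraction of essays where the two scores differ by at most `tolerance` points."""
    pairs = list(zip(machine, human))
    return sum(abs(m - h) <= tolerance for m, h in pairs) / len(pairs)


def exact_agreement_after_rounding(machine, human):
    """Fraction of essays where the machine score, rounded half up, equals the human score."""
    pairs = list(zip(machine, human))
    return sum(math.floor(m + 0.5) == h for m, h in pairs) / len(pairs)


def chance_adjacent_agreement(scale=6, tolerance=1):
    """Adjacent agreement if both scores were drawn uniformly at random from 1..scale."""
    hits = sum(
        abs(m - h) <= tolerance
        for m in range(1, scale + 1)
        for h in range(1, scale + 1)
    )
    return hits / scale ** 2


# Hypothetical scores on a 1-6 scale, for illustration only.
machine_scores = [4.3, 2.8, 5.1, 3.6]
human_scores = [4, 2, 5, 3]

print(adjacent_agreement(machine_scores, human_scores))              # 1.0
print(exact_agreement_after_rounding(machine_scores, human_scores))  # 0.5
print(round(chance_adjacent_agreement(), 3))                         # 0.444
```

On the same four essays the looser definition reports perfect agreement while the stricter one reports half. Two graders picking uniformly at random on a six-point scale would already be "within a point" about 44% of the time, although real score distributions are not uniform.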
Teacher: Johnny, I'm really sorry, but the computer crashed while your paper was being scored. I was looking over it. It's been a while since I've read a paper, but I was wondering what the following sentence means:
And this one:
Is that some kind of new language that kids are using? Oh, by the way, congratulations, you got a 100 on EVERY essay this semester! Good job!
Now before you start up the flame throwers, this is not a message to deride high school students over their lack of creativity.
But when I was in high school, we were told that proper essay writing was an essential skill for the departmentals, and when they said "proper," they meant "Must conform to between five and seven paragraphs, with the first and last being the opening and conclusion with three to five paragraphs of body--each containing one topic of discussion."
Furthermore, it was made VERY clear that creative or unconventional ideas (let alone language!) would be strongly frowned upon. There was One True Way to write an essay, and One True Opinion on any given subject. Any deviations from that would cost you.
I hated it then, I hate it now, but I don't see any problem with having computers mark essays like this. After all, they were trying to turn us into computers to create them.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
This is a great leap forward for education. While it has always been the goal of geeks to submit computer-generated papers and receive decent grades, this has traditionally been hampered by the unreliability of computer-to-human communication. But with computer-to-computer submissions (henceforth referred to as "End-to-end Grading And Direction", or EGAD), we can now begin hacking away at the first generation of grade generators.
"What I did on my Summer Vaca'; DROP TABLE punctuation"
The funny thing about this is that, if the essay is graded by computer, the best way to write the essay would be to have the COMPUTER write it. The same criteria that the program would use to grade the essay could very easily be turned around and used to generate an essay that the computer will love. Having a computer written term paper given an A by a computer grader is worthy of an Ionesco play.
Beyond that there is no way the computer will be able to distinguish between something truly interesting and something that just lists the facts in simple Dick and Jane language with an occasional compound sentence to keep the grammar checker happy. All it can do is check for fact1, fact2, fact3, and any interesting conclusion you draw in the paper will be completely lost. Anything more would be Turing test worthy, and I heartily doubt they've achieved anything close to that.
Elegant prose is often not strictly grammatical, so a boring paper would likely score the same or better than a far better written essay with the same facts. I routinely turn off grammar checking in every program I've ever used it in. Aside from the occasional misplaced modifier or dangling participle, it's worthless.
In conclusion, this idea is a pipe dream which would discourage high quality writing (i.e. the kind actual PEOPLE like to read), teach people the substandard grammatical constructs used by most grammar checking software, and create a market for software that writes term papers, thereby removing the last actual bit of work your average liberal arts major has to do. I think it's a hopelessly terrible idea. TA's already do this work; why waste time coming up with a program which will do the same thing, poorly?
Just my opinion.
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.
"I heard a statistic once that if you chose answers randomly on a MC test that you could get a C by not knowing anything beyond how to circle a letter!"
You "heard a statistic once"? Geez, the probability statistics aren't that difficult: If there are 4 possible answers, and you randomly pick, you'll likely get about 25% right, or 5/20, or roughly 8/33. It isn't rocket science. To get 50% randomly there'd have to be only two possible choices. Add to that the fact that many post secondary multiple choice tests actually deduct marks for incorrect answers, and your C proclamation sounds like it might be incorrect.
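The arithmetic above is easy to check. This is a small sketch, assuming a 20-question test with 4 choices per question, a penalty of 1/3 of a mark per wrong answer, and a C at 70%. All of those numbers are illustrative assumptions, not figures from the thread.

```python
from math import comb


def expected_fraction_correct(choices):
    """Expected fraction of questions answered correctly by pure guessing."""
    return 1 / choices


def expected_score_with_penalty(questions, choices, penalty):
    """Expected raw score when each wrong answer costs `penalty` marks."""
    p = 1 / choices
    return questions * (p - (1 - p) * penalty)


def prob_at_least(questions, choices, needed):
    """Binomial probability of guessing at least `needed` questions correctly."""
    p = 1 / choices
    return sum(
        comb(questions, k) * p ** k * (1 - p) ** (questions - k)
        for k in range(needed, questions + 1)
    )


print(expected_fraction_correct(4))            # 0.25
print(expected_score_with_penalty(20, 4, 1 / 3))
print(prob_at_least(20, 4, 14))
```

With a 1/3-mark penalty on a four-choice test, the expected score from guessing works out to zero, and the chance of guessing 14 or more out of 20 is well under one percent.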
Something hinted at by the story and some of the comments, but which really bears being pedantic about: too few teachers. It is as ludicrous to expect a teacher to go over 150 essays as it is for me to expect to get a reasonable education when I am 1 of 150 faces trying to glean something more than an "A" from a class. The software is attempting to address this imbalance, but ultimately it will make the level of education worse: it can grade a paper, it can't offer insights on how to improve. And it will give administrators a reason to pile 50 more into a class, which will in turn lead to GradeStar MkII and onward into a vicious circle. And yeah, the software is just a tool, but like so many tools, that's not how it will be utilized. It's a cop-out, nothing more.
I wrote in my journal about this awhile back. ETS was trying to sell their essay grader to a group of the local test prep chains here in Taiwan. The local schools called me in to sit in on the presentation. Before I had gone in, I searched around and found numerous free and open implementations and I asked the speaker why they were selling their academic software for so much money --it was a rather complex contract on a per seat basis-- when there were similar products available for free. Their rep claimed to be unaware of any similar open sourced products that could match the amazing and advanced artificial intelligence features they were offering. Sales reps --hmm. The mere posing of the question definitely made them stutter and squirm though.
But the interesting part was after I got home. I looked at ETS's own research monographs and found that internally this overpriced system had been debunked. It was discovered that by writing one well-formed short paragraph and then cutting and pasting it over and over an almost perfect score could be attained. The more times it was pasted, the higher the score.
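The cut-and-paste trick described above is at least easy to screen for mechanically. This is a minimal sketch, assuming paragraphs are separated by blank lines. The function name and the normalization are illustrative choices, not anything taken from the ETS system.

```python
import re
from collections import Counter


def repeated_paragraph_fraction(essay):
    """Fraction of paragraphs that repeat an earlier paragraph, ignoring case and spacing."""
    paragraphs = [
        " ".join(p.lower().split())          # normalize case and whitespace
        for p in re.split(r"\n\s*\n", essay)  # split on blank lines
        if p.strip()
    ]
    if not paragraphs:
        return 0.0
    counts = Counter(paragraphs)
    repeats = sum(n - 1 for n in counts.values())
    return repeats / len(paragraphs)


pasted = "\n\n".join(["One well-formed short paragraph."] * 5)
print(repeated_paragraph_fraction(pasted))  # 0.8
```

A check like this only catches verbatim repeats. Light rewording of each pasted copy would slip past it, which is part of why a human reader is still hard to replace.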
It was also possible to write an essay on an unrelated topic and still get a high score, allowing students to use rote memorization of a single model essay. This, naturally, is impossible with a human reader because they can tell what the topic is fairly easily. According to the sales literature this software could too, but in actual tests that didn't hold up.
Their sales literature claimed that the software contained artificial intelligence and thus implied that such simple techniques would not fool it, but in practice this was far from the case.
Monographs published by ETS also made it clear that despite their aggressive marketing of this product outside the US, they were not planning to use it as an exclusive grading system on their own tests. Rather, it was to be used as a teaching tool. However, it took a lot of digging to uncover that information.
Just as with translation, there's a lot of financial motivation to make this technology work, but that doesn't necessarily translate into workable products. In the nineties, when spelling and grammar checking was already old hat and English/Euro translation was making such headway, I thought fluent Chinese/English translation was just a few years away. Now it's 2003, grammar checkers still only work if you write in a prescribed style, and I've yet to see something halfway decent in Chinese/English translation software, although you still hear claims all the time for some overpriced product that's really almost there.
I think we'll see dramatic life extension long before we see decent computer essay graders. Decent trade as far as I'm concerned. As for translation, we can always teach more languages in school.