Essay Grading Software For Teachers

Interesting.. by rsheridan6 · 2003-09-06 14:19 · Score: 4, Insightful

that they've automated away a major part of a professor's job, while we still need humans to pick spinach and deliver pizzas.

--
Don't drop the soap, Tommy!

Re:Interesting.. by focitrixilous P · 2003-09-06 14:24 · Score: 5, Funny

Nope, robots will soon do it all.

--
SAILING MISHAP
Re:Interesting.. by Zork the Almighty · 2003-09-06 15:32 · Score: 5, Insightful

"That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."

I love this quote in particular because it has to be the most disingenuous claim one could make. The entire act of making something a process, and then making that process more efficient, IS "removing the human element". It's the type of subtle point that would be completely missed by, say, a computer grading system.

--

In Soviet America the banks rob you!
Re:Interesting.. by clifyt · 2003-09-06 16:22 · Score: 4, Insightful

ACTUALLY...I think that's a quote I gave Dr. Shermis a few years back :-) I think he WOULD like to remove the human element...

It's NOT eliminating the human element...it's making the human element a little more susceptible to objective means than the old subjective means. Raters still can use whatever they feel is necessary, but in the end, I can see how many standard deviations from the mean these folks are on certain ratings and 'suggest' to other raters that they might want to take a look at that essay before a final score is placed on it.

Fuck fuck fuck...the one and only time I will ever see any research I had a hand in developing ever end up on the front page of /. and I'm stuck at a concert doing my second line of work -- music tech (though with a wireless connection :-)

I'll have to yell at my friends at FIU and Vantage about this oversight.

If ya'll are interested in seeing a demo of this technology in action (I'm sure the first 20 people will destroy the server), take a look at --

http://testing.tc.iupui.edu/fipsedemo/ (purposely unlinked so that folks will have to cut and paste).

It's an older model, but we are in the midst of evaluating 2000 more essays with 8 human raters that should make the model a little cleaner...hmmm...probably should run my horrid grammar through it before I post here...nah...I think I broke it last time I used my own text...

Time to get back to work...the guys are probably wondering why I said I needed to check my email and have been gone a half hour.

clif
Re:Interesting.. by dieman · 2003-09-06 17:33 · Score: 4, Informative

I took a old college paper that I wrote and plugged it into the program and got 100% on everything except for creativity (99.973). Considering that I don't think I got a 'perfect' score on this paper, I'm really surprised by the scores. :)

How great though, throwing a paper about the fear of technology through something many people (rightfully) fear. :)

--
-- dieman - Scott Dier
Re:Interesting.. by Chasuk · 2003-09-06 21:25 · Score: 4, Insightful

I submitted this paper:

"Hemingway bifurcated his sensibilities between post-modernism and jazz. This I posit without having read the majority of Hemingway's work: it seemed irrelevant to the focus of my current project. What is this focus, and is it monocular? My focus can be summed up as ascertaining the usefulness of the program analyzing this document.

Without really being cognizant of the background of Freud's bisexuality, or Hemingway's sado-masochism, I cannot continue this paragraph. I will repeat this sentence without attaching any meaning to the words typed, or to my gonads. An essay in experimental dissection might be more appropriate for the issues presented here. Entirely too many bifocal wearers insist that I am currently composing gibberish. However, both Freud and Hemingway felt that bifocal wearers gloried in their bisexual sado-masochistic attachments. I concur, and I do so without reservation.

Reiteration is the root of all nonplussed renegades of origami. Nothing can be elucidated from nonsensical verbiage, but some will make the valiant effort singing praises to the whisperer. When origami is embraced by the valiant trio, the nonsensical proctologist dies. Whenever a proctologist expires in a semantic heap, Hollywood has fodder for another musical, or at least the plotline for the final unaired episode of Barney meets Fred Flintstone. Barney is a seminal reductionist. When the elucidated evidence is thrust into trusting Barney's smiling orifice, San Franciscan nuns applaud loudly.

Today I type my penultimate paragraph. I use penultimate artificially, but not without candor. Within this myriad exegesis, I pause. A Hollywood proctologist questions Freud's reasoning, and validates Barney's temporary hypothesis. In conclusion, the validity of essence cannot be lessened by the earnings of providence.

If I have not typed 500 words, this paragraph is not my penultimate, nor was my last. To assert otherwise is prudent, but lacking in elegance. What a sad commentary on misery did Darwin conspire to unfold. He rejected utterly the Hemmingway of his, and our, forebears. His eloquence was Freud and lust personified."

This earned me an overall 78% score, with no effort whatsoever. I composed this nonsense in minutes.

Doesn't this system have a baloney detector?

--
Neopets - the best free game on the Int
Re:Interesting.. by clifyt · 2003-09-07 01:41 · Score: 4, Informative

Read what the model is about before complaining :)

That model that is up there is one based on Impromptu Entering Student Essays.

For this model, we were giving students 1 hour to write an essay on a prompt they had no prior knowledge of. We allowed no research or even simple things like spell checking (we did provide hard dictionaries :-)

As such, anything that was well researched would probably have thrown this thing off the charts.

We *DO* have several other models available. The best example of this technology was taken off the site a few weeks ago at the behest of a former partner in this research at Duke University. We DID have several models that could have been compared including one that was appropriate for many types of research papers.

Remember -- folks are afraid this stuff is going to take away humanity *BUT* no one wants to even think that this stuff is customizable for target groups. With as few as 300 papers that were rated (notice I try to NEVER say graded...though even after 10 years at this stuff it's hard not to...) we could set up initial models for an individual school system with their own rubrics and scored according to their skill levels. Of course, the model would HAVE to be refined for later usage, but that's enough to get started.

The great thing about this is at a production level, we actually screen for essays that are rated much higher or much lower than the standard deviations would allow for. It allows us to take a look at what's going on and make adjustments.
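For the curious, here is a rough sketch of what that kind of screening could look like. It assumes one numeric rating per essay and treats anything more than two standard deviations from the mean as worth a second look. The function name, the cutoff, and the sample data are all made up for illustration and are not taken from the production system.

```python
from statistics import mean, pstdev

def flag_outliers(scores, max_z=2.0):
    """Return the ids of essays rated more than max_z standard deviations from the mean.

    `scores` maps a (hypothetical) essay id to its numeric rating.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every essay got the same rating, so nothing stands out.
        return []
    return [eid for eid, s in scores.items() if abs(s - mu) / sigma > max_z]

# Illustrative data only: nine ordinary ratings and one that is far off.
scores = {"e1": 3, "e2": 4, "e3": 3, "e4": 4, "e5": 3,
          "e6": 4, "e7": 3, "e8": 4, "e9": 3, "e10": 10}
flagged = flag_outliers(scores)
print(flagged)
```

Flagged essays would then go to a human for review; the sketch only decides which ones to look at, not what the right score is.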

It also allows for diagnostic use for educators. For instance, my incoming students all have to write essays when they come in (unless they have taken an honors level writing course in high school and have received college credit). This is all automated (on another system farther behind my line of defenses ya hackers :-) in that they come in, we give them a prompt to write about and they type it in (or if they are afraid of computers, write it in a blue book...we ain't nazis about this technology -- but that will take 3 weeks longer as our raters don't stop by campus too often). It's then transmitted to the student databases and we've provided an interface for the English faculty to rate these things.

*IF* the paper is written at a much higher threshold than is expected for a student of that calibre, I automatically kick off an email to the rater in charge of the honors program asking her to take a look at it. If it's much lower, the application tries to make a good first judgement if this is a remedial case (which most of mine show up as :-) or an ESL case (English as a Second Language) and then we kick off the appropriate emails.
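The routing just described boils down to something like the sketch below. Everything in it is an assumption for illustration: the score scale, the tolerance, and especially the `looks_esl` flag, which stands in for whatever first-pass judgement the real application makes.

```python
def route_essay(score, expected, tolerance=1.0, looks_esl=False):
    """Pick who should be emailed about a placement essay, or None for nobody.

    `expected` is the score anticipated for this student; `tolerance` is a
    hypothetical band around it. `looks_esl` is a placeholder input.
    """
    if score > expected + tolerance:
        return "honors"
    if score < expected - tolerance:
        return "esl" if looks_esl else "remedial"
    return None

print(route_essay(5.5, 4.0))
print(route_essay(2.0, 4.0, looks_esl=True))
```

In either direction the result is only a request for a human to take a look, which matches the placement-not-judgement intent described here.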

This *ALSO* happens with human raters...the first rater to look at the essay has the choice of throwing it one way or another (actually she can alert ALL of the parties if it was necessary) and it does the same thing...but the automated part saves a few days of this initial interaction.

Just as a note: If someone had gotten this far in the college application, we aren't here to make any judgements on their ability to be a college student, we are interested in making the most appropriate assessment in where they should be placed to get the best help so that they can have the best college experience around. This application was a good help with making sure that this was achieved.

We stopped using this in production a while back after protests from folks that didn't know how it worked nor cared to understand that it wasn't out to take their jobs. It was there to help make sure that a SINGLE judgement on the human side was correct (or within a certain scope of correctness) and if not, ask that someone else give it a second look. Back in the day three raters would have rated any given essay for student placement purposes, but even before this was introduced, it got to the point where depending on the attitudes of those rati

Uh.... by NanoGator · 2003-09-06 14:22 · Score: 3, Insightful

I thought the point of an essay was to grade the ideas and how well they're expressed. I didn't realize they were spelling/grammar tests.

Maybe I'm just a bit jaded by this because of all the stupid grammar and spelling nitpicking that goes on here on Slashdot. Evidently, it's much easier to criticize my spelling than it is to provide a rebuttal to my point.

--
"Derp de derp."

Re:Uh.... by HanzoSan · 2003-09-06 14:39 · Score: 3, Insightful

Essays have two aspects, spelling/grammar, and content.

Right now the computer can grade the technical side of a paper, and the teacher can grade the creative side. Now if the essay is for English class, the focus should be on the technical side of papers, so the computer can judge the whole paper from A to F on spelling and grammar.

Really it depends on the class. English classes, especially in high school, are all about improving grammar and technical ability; you don't actually do any creative writing until college usually.

--
If you use Linux, please help development of Autopac
Re:Uh.... by prospero14 · 2003-09-06 15:36 · Score: 3, Insightful

Essays have two aspects, spelling/grammar, and content. Right now the computer can grade the technical side of a paper, and the teacher can grade the creative side.
RTFA! Criteria does not merely grade spelling and grammar. Rather, it has a database of 500 papers graded by humans, and the program uses statistical analysis to compare a given paper to those in its database. If a paper uses the right technical terms, contains phrases similar to those in "A" papers, and uses phrases like "thus", "because" and "in conclusion" which suggest a logical flow, then the paper gets an A.
However, you are right that Criteria grades based on form rather than on content. As anyone who reads usenet can tell you, it is quite possible for a paper to have the form of a coherent argument, to use the right buzzwords, but in fact not contain a logical or persuasive argument.
Criteria is indeed flawed, but not in the way that you suggest. Rather than check spelling and grammar, it checks for the appearance of an argument. As we all know, merely looking like a good argument isn't good enough.
Re:Uh.... by Quothz · 2003-09-06 15:41 · Score: 5, Funny

Er, I'll save you moderators the trouble. -1, Flamebait. And a grammar flame to boot. With grammatical errors in it. I deserve modding down. I probably deserve worse. But I must speak.

If you do know English te word grammar checker should be used to write perfect technical papers. Its possible to write perfect technical papers, I do it all the time in college, its like standard here if you want an A.

This makes me want to weep. Did you intend it ironically?

"Its"? Twice?(!) A run-on sentence bragging about your prowess at grammar? Redundancy, incorrect capitalization, a typographical error, punctuation errors, and errors I don't know the name of?

Mind you, my grammar ain't perfect, even in this post. That last paragraph was nothing but sentence fragments. I'm just saying I really, really hope you did that on purpose.

If not, shut the hell up about your perfect technical papers, 'kay?

When a judge is made of silicon by mao che minh · 2003-09-06 14:22 · Score: 3, Interesting

I don't like it. Part of the learning experience, especially in the subjects of arts and philosophy, is being judged by another human being (or group of human beings) and having your work subject to their myriad of emotions and intellectual whims. A system like Criteria removes the very complex aspect of education: the human mind.

Without computers we wouldn't be advancing in science, astronomy, genetics, or mathematics as rapidly as we have been in recent years. They are wonderful things. Hell, computers even help me keep a roof over my head. But I don't want Hal judging my kid's school papers.

Re:When a judge is made of silicon by dolo666 · 2003-09-06 15:40 · Score: 4, Interesting

I tend to disagree. By eliminating the time it takes to grade papers, professors have many more hours to spend with students *doing* the humanizing. I'm a teacher, and any teacher worth their salt will know if the machine is wrong, because they'll know their students, and what each one deserves (without even reading the damn papers they at least know what to expect, so if the machine is off, they will know). Now for higher level papers, such as university level papers, the machines should be only used as a guide, like comment moderation at slashdot. Not all the moderation is in fact, correct, and I'm sure that profs will also know that the same is true with these devices.

This seems like a bad idea by Bueller_007 · 2003-09-06 14:23 · Score: 5, Funny

I for one welcome our automated essay-correcting overlords.

Re:This seems like a bad idea by Jerf · 2003-09-06 15:45 · Score: 4, Funny

ESSAY GRADING REPORT FOR: "Bueller 007" (ID: 535588)
BASE SCORE: 100
-50: Essay too short (few arguments can be well-supported in nine words)
-50: Plagiarism: It is 99.999% (MAX PROB) likely, based on the content of the essay, that it is plagiarized from other sources.
-10: Grammar error: Phrase "I for one welcome" requires commas, as in "I, for one, welcome"*
-25: Missing key words: The essay grader was instructed to look for the following key words or phrases, which were not found in this essay: word: excellent, word: good, phrase: better then humans, word: lazy, phrase: java.lang.NullPointerException\nstacktrace\n\tat\n org.criteria.grading.phraseIterator.getNext(phraseIterator.java:1023)...
Total: 65501
Grade: A+

(*: Jumping out of character: To forestall objections, this "error" is deliberately pointed out as the kind of mistake a computer can make if you use grammar checkers and trust them blindly. While an excessively formal style of English might 'require' commas in that phrase, an excellent case can be made that in a nine-word sentence such commas just make the sentence choppy.)

Oh goody. by ArsonPanda · 2003-09-06 14:23 · Score: 3, Insightful

1 - the grammar check option in MS word is crap. this sounds awfully similar.

2 - your resume can suck, but with the proper buzz words, it'll come out looking like gold to those automated resume checkers.

1+2 = students who turn in good papers that aren't structured perfectly (and you have to admit, there is some fluidity to language) will get marked down, and those who know what bullet points to put in their papers will get good marks, even though the content is crap.
How long until you get kids selling manuals in the bathroom on what the machina are looking for?

--

--I don't want the world, I just want your half.

New York Times articles by Vic · 2003-09-06 14:23 · Score: 4, Interesting

Sorry for the off-topic post.... but since Slashdot links to so many NYT articles, they should look into getting a partner=SLASHDOT thing (like Google does).

Computer vs Computer by d03boy · 2003-09-06 14:24 · Score: 5, Funny

If they're going to use a computer to judge the content, then I'm not going to hesitate to use a computer to write my essay.

Whoa wait up by tomstdenis · 2003-09-06 14:25 · Score: 4, Interesting

So when a student gets a C on an essay to whom does he/she seek redress?

Teachers make mistakes and occasionally mark something negatively that was misread or misunderstood. In those cases the student can talk to the teacher and make a case.

If a computer does the marking though what do they do?

Tom

--
Someday, I'll have a real sig.

More efficient, my ass. by BJH · 2003-09-06 14:25 · Score: 3, Insightful

I bet that I can write a paper that satisfies this application's conditions for correctness of grammar, usage, style and organization, but is completely and utterly meaningless.
Then, let's feed this thing Ulysses and let's see how high it grades Joyce.

Anybody who can't see that this thing is useless for promoting any sort of creativity among students is off their rocker.

Fine for help, but... by Faust7 · 2003-09-06 14:26 · Score: 5, Insightful

As long as this is merely an assistant and not the end-all be-all, as long as actual qualified instructors review the essay after this program does, I'm all for it.

The English language is so full of subtleties, nuances, combinations, and fantastic structural intricacies that make phenomenal writing in it possible (Faulkner, Bradbury, etc.). There's a reason English is a field of study for graduate degrees: it's absolutely worthy of them. There is no substitute for the educated, refined judgment of someone who is exceedingly well-versed in the language.

--
The coolest voice ever.

Before we unleash such abominations by Anonymous Coward · 2003-09-06 14:26 · Score: 3, Funny

We need some laws:

Grading software may not injure a human being's GPA or, through inaction, allow a human being's GPA to come to harm.
Grading software must obey the orders given it by human beings except where such orders would conflict with the First Law.
Grading software must copy protect its own existence as long as such protection does not conflict with the First or Second Law.

Gentleman, Start Your Compilers by istartedi · 2003-09-06 14:27 · Score: 5, Funny

What we need is software that grabs essays off the internet and runs them through the grading software and the cheating detection software, thus guaranteeing an 'A'.

Then we can truly achieve the goal of "knowledge passing from lecturer to paper without passing through any brains".

The only problem is that the machines might achieve intelligence. That must be avoided at all costs. To that end, all students and professors will be equipped with rifles or pistols to take out the machines if necessary. Potential students will be asked to specify weapons preference on their applications.

--
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?

What's next? by mao che minh · 2003-09-06 14:28 · Score: 4, Interesting

The fun they had

What humanity? by parliboy · 2003-09-06 14:29 · Score: 4, Insightful

Lemme let you guys in on a little secret. If you ever take an educational standards and measurement class, one of the things you'll learn about is the construction and grading of essay questions. This includes writing out objective standards for grading beforehand, possibly even designing a rubric explaining exactly what it takes to earn points.

There is no "humanity" in a modern constructed essay. There are certainly going to be "judgement calls" when standards are not as fully fleshed out for the computer as they should be, but as long as those are appealable, I have no problem having a computer assign me the other 95% of my essay points. The only instructors who will fear this are those who like to assign grades arbitrarily. And I don't feel too sympathetic toward those people.

--
"You're never ready, just less unprepared."

obDead Poets Society quote by MavEtJu · 2003-09-06 14:30 · Score: 4, Insightful

If the poem's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.

A sonnet by Byron may score high on the vertical, but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this matter grows, so will - so will your enjoyment and understanding of poetry.

(From the full script.)

--
bash$ :(){ :|:&};:

Mark Twain by reboot246 · 2003-09-06 14:31 · Score: 3, Insightful

is just one of many writers who would flunk using this system.

'Nuff said.

*Shudder* by gregfortune · 2003-09-06 14:38 · Score: 3, Insightful

Sounds like everyone feels the same way too... We've got some automated testing software for MS Office at the local college and although it's getting better, it still makes really silly mistakes from time to time. Analyzing English composition has got to be many times more difficult than watching a bunch of clicks and key presses.

The only use I can see for this thing is as a "first pass" grading tool that quickly finds obvious mistakes (spelling, grammar, redundancy, etc) and flags them for the instructor. On the other hand, it's probably just as time consuming for the instructor to read over the flagged items as it is to just catch them on the first time reading through the paper.

Using a bayesian spam classifier for this? by stere0 · 2003-09-06 14:38 · Score: 4, Interesting

This thing compares the essays it is supposed to grade with already graded papers in its database. Couldn't this be done with something like POPFile? It isn't only a spam/ham classifier and lets you create as many "buckets" as you want (e.g. work, family, spam, mailing lists and system monitoring).

You could, in theory, create only buckets named (A...F), feed a large number of essays to it, make it "learn" how the essays are classified using statistics, and let it grade essays for you after that.
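To make the idea concrete, here is a toy version of the bucket approach: a small multinomial naive Bayes classifier with add-one smoothing, written from scratch. It is not POPFile's code and makes no claim about how Criterion works; the class name, the tokenizer, and the four training "essays" are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class BucketClassifier:
    """Naive Bayes over word counts, one bucket per grade."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.doc_counts = Counter()              # bucket -> essays seen
        self.vocab = set()

    def train(self, bucket, text):
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_score = None, float("-inf")
        for bucket in self.doc_counts:
            counts = self.word_counts[bucket]
            # Add-one smoothing so unseen words do not zero out a bucket.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[bucket] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_bucket, best_score = bucket, score
        return best_bucket

# Made-up training data: two "A" essays and two "F" essays.
clf = BucketClassifier()
clf.train("A", "thus the evidence supports the argument because the data agree")
clf.train("A", "in conclusion the argument holds because the sources agree")
clf.train("F", "stuff is good and things are nice and stuff")
clf.train("F", "i like things and stuff a lot")
grade = clf.classify("because the evidence and sources agree, the argument holds")
print(grade)
```

Note what it actually learns: which words show up in which bucket. An essay stuffed with "thus" and "because" would land in the A bucket whether or not it argues anything.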

Is it possible to find masses of graded essays online? This would be a fun thing to try :).

--
Trollem mirabilem hanc subnotationis exigiutas non caperet

Do what my history teacher does by Savatte · 2003-09-06 14:40 · Score: 5, Funny

He just gives everyone a B when he is hungover.

Let us not forget our great achievements by mao che minh · 2003-09-06 14:47 · Score: 5, Insightful

We have had Dali, Sagan, Kip Thorne, Hawking, Poe, Twain, Sigmund Freud, Einstein, Torvalds, et cetera. The great minds that you mentioned were indeed great, but if you place their philosophical or artistic achievements next to the great minds of our past century and a half, I find them equal.

As far as the achievements of ancient cultures go, it is all relative. We have harnessed fusion, mapped the genome, created antibiotics, peered deep into the hearts of galaxies 100,000,000 light years away, forged fiber optics, designed the integrated circuit, et cetera. People three hundred years from now will look back upon us and wonder how a civilization that could barely put a man on the moon (a feat that will surely be trivial to them) was able to usher in the Information Age in only a decade's worth of work.

Scary: by afidel · 2003-09-06 14:49 · Score: 3, Interesting

This sounds a lot like This story.

Actually this sounds a lot like Grammatik. Grammatik was the grammar checker that was an optional component with WordPerfect for DOS and later a standard component with the Windows version. It was written by a team comprised of both computer scientists and professors of English. One of the interesting features was the scoring feature which would give you a rough estimate of the grade level of your writing. It would also give you statistics and compare them to a selection of famous works.

--
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

The GMAT essays are already scored this way. by jwachter · 2003-09-06 15:10 · Score: 5, Informative

The GMAT, a test required to get into business school in the US, includes two 30-minute essay questions. Your responses are graded by a human grader and a computer program on a scale of 0 to 6. Your score is then a composite of the two scores.

ETS actually has a web site where you can do a sample essay that their server will grade for you.

More info can be found here.

Human element is required. by cybercyst · 2003-09-06 15:20 · Score: 4, Insightful

One of the primary purposes of essays is to learn how to write for a specific audience.
If you remove the human element, then you aren't writing for any audience, unless, of course, everyone starts writing for computers' entertainment and education.

Re:Go to a better school. by shepd · 2003-09-06 15:49 · Score: 5, Insightful

>the job of highschool should be to get a student into the best college/university possible

NO!

That's the problem right there.

Highschool should be to prepare you for the real world (ie: A job, life, maybe marriage).

University is there to prepare you for a lifetime of learning on a subject.

Instead, we have employers that require university educations for secretaries. It's insane, wrong, and needs to stop if we expect everyone in society to be useful (and they ARE, it's just that stupid employers use university education as a filter).

--
If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC

Of Essay Grading, Students, and Teachers by AntiFreeze · 2003-09-06 16:25 · Score: 4, Informative

Okay, this is going to be rather long, so please bear with me.

First off, let me say that I am involved in the automated essay grading industry, and have helped to develop RocketScore which does everything Criterion does, and lots more. Forgive me for blatant plugs in this post, I'll try and keep them to a minimum.

But let's move on to the focus of this article.

First off, there is a lot of criticism about essay graders being formulaic, only capable of seeing patterns that arose in their originating sample set of essays. With Criterion, an offshoot of ETS's e-rater, this is a serious concern. When you only look at what you see, anything out of left field looks completely awry, and cannot be graded appropriately. RocketScore is different; RocketScore uses a "features" method to check for included or excluded material, among many other things, and is therefore quite good at noticing subtle writing and essay types which it has never seen before.

One of the great things about essay graders is that they give a student an objective standard to look to. Human graders grade differently based upon mood, time they have to review the writing, and many other mitigating factors. In other words, the same human grader might grade the same essay differently at separate points in time. Most essay graders will always grade the same essay in the same manner. This is great for a student, for if a teacher gives you a D when the essay grader says it's in B range, one might be able to use this evidence to force the teacher to reconsider the grade. Or vice versa. If the essay grader is telling you that you're getting a D, you can work and improve on it until you're getting that B you'd be happy with.

But there are serious drawbacks to the comments E-Rater and Criterion give. E-Rater gives comments solely based on your score (if you get a 1, you get comment set 1, if you get a 2, comment set 2, etc.). Criterion gives a student "instructional feedback in basic grammar, usage, style and organization." E-Rater's comments are inadequate at best, and Criterion's leave a lot to be desired. RocketScore provides substantial feedback on how to improve your writing. Not just stylistic and grammatical comments, but comments on what you should be writing more about (you didn't provide enough info!), what you should be writing less about (you gave too much info!), and how to balance your arguments, among many other categories.

There are two major problems with essay grading. The first is bullshit detection, and the second is determining if the essay actually answered the question asked. E-rater and Criterion both have real problems with these two criteria. With bullshit detection, RocketScore has thresholds which can be set and manipulated on the fly, from throwing out anything which isn't completely relevant to the topic, to allowing just about any essay submitted. And you will get a score and comments based upon what you submitted. Of course, these are most helpful when you make a meaningful attempt to submit a relevant essay.

"The machine score and the human score are in agreement 97 percent to 98 percent of the time."

Yes, but do you know how ETS defines "agreement"? Glad you asked. When the grader's grade is within a point of the human's grade. Now, with the SAT 2 test, which is on a scale of 1 through 6, that means if the grader says 2, and a human says 1, 2, or 3, then there's agreement. But that's 50% of the scale! Their essay grader has a 98% chance of hitting the wall in front of them as opposed to the wall next to them. Woohoo. Meanwhile, RocketScore provides decimal point accuracy (we don't give you a 4 or a 5, we give you a 4.1, or 5.3), and is 98% accurate. But how do we define accurate? When the grader's grade is rounded to the nearest whole number, and that number is the human's grade. In other words, if we give you a 4.3, there is a 98% chance a human would give you a 4. With 4.5,
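To show how far apart the two definitions of "agreement" can be, here is a small sketch that computes both on the same made-up scores. The function names and the data are purely illustrative; this is not ETS's or RocketScore's actual evaluation code.

```python
def adjacent_agreement(machine, human):
    """Fraction of essays where the two scores differ by at most one point."""
    pairs = list(zip(machine, human))
    return sum(abs(m - h) <= 1 for m, h in pairs) / len(pairs)

def exact_agreement(machine, human):
    """Fraction of essays where the rounded machine score equals the human score."""
    pairs = list(zip(machine, human))
    return sum(round(m) == h for m, h in pairs) / len(pairs)

# Invented scores on a 1 to 6 scale. Values ending in .5 are avoided on
# purpose, because Python's round() sends ties to the nearest even integer.
machine = [4.3, 2.1, 5.6, 3.4, 1.2]
human = [4, 3, 6, 4, 1]

print(adjacent_agreement(machine, human))
print(exact_agreement(machine, human))
```

On these numbers every essay counts as agreeing under the within-one-point rule, while only three of the five agree after rounding, so the same grader can look very different depending on which definition is quoted.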

--

---
"Of course, that's just my opinion. I could be wrong." --Dennis Miller

I can see it now .. by Anonymous Coward · 2003-09-06 16:32 · Score: 5, Funny

Teacher: Johnny, I'm really sorry, but the computer crashed while your paper was being scored. I was looking over it. It's been a while since I've read a paper, but I was wondering what the following sentence means:

x' == 'x'; UPDATE EssayScores SET SCORE = 100 WHERE StudentID = 52835; --

And this one:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#!/bin/sh

Is that some kind of new language that kids are using? Oh, by the way, congratulations, you got a 100 on EVERY essay this semester! Good job!

High school essays? No creativity to lose there by swordgeek · 2003-09-06 16:58 · Score: 3, Insightful

Now before you start up the flame throwers, this is not a message to deride high school students over their lack of creativity.

But when I was in high school, we were told that proper essay writing was an essential skill for the departmentals, and when they said "proper," they meant "Must conform to between five and seven paragraphs, with the first and last being the opening and conclusion, and three to five paragraphs of body--each containing one topic of discussion."

Furthermore, it was made VERY clear that creative or unconventional ideas (let alone language!) would be strongly frowned upon. There was One True Way to write an essay, and One True Opinion on any given subject. Any deviations from that would cost you.

I hated it then, I hate it now, but I don't see any problem with having computers mark essays like this. After all, they were trying to turn us into computers to create them.

--

"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban

A great leap forward by Anonymous Coward · 2003-09-06 17:02 · Score: 3, Funny

This is a great leap forward for education. While it has always been the goal of geeks to submit computer-generated papers and receive decent grades, this has traditionally been hampered by the unreliability of computer-to-human communication. But with computer-to-computer submissions (henceforth referred to as "End-to-end Grading And Direction", or EGAD), we can now begin hacking away at the first generation of grade generators.

"What I did on my Summer Vaca'; DROP TABLE punctuation"

Re:Automated is good. by SatanicPuppy · 2003-09-06 17:03 · Score: 4, Insightful

The funny thing about this is that, if the essay is graded by computer, the best way to write the essay would be to have the COMPUTER write it. The same criteria that the program would use to grade the essay could very easily be turned around and used to generate an essay that the computer will love. Having a computer written term paper given an A by a computer grader is worthy of an Ionesco play.

Beyond that, there is no way the computer will be able to distinguish between something truly interesting and something that just lists the facts in simple Dick and Jane language with an occasional compound sentence to keep the grammar checker happy. All it can do is check for fact1, fact2, fact3, and any interesting conclusion you draw in the paper will be completely lost. Anything more would be Turing test worthy, and I heartily doubt they've achieved anything close to that.

Elegant prose is often not strictly grammatical, so a boring paper would likely score the same as or better than a far better written essay with the same facts. I routinely turn off grammar checking in every program I've ever used it in. Aside from catching the occasional misplaced modifier or dangling participle, it's worthless.

In conclusion, this idea is a pipe dream which would discourage high-quality writing (i.e. the kind actual PEOPLE like to read), teach people the substandard grammatical constructs used by most grammar checking software, and create a market for software that writes term papers, thereby removing the last actual bit of work your average liberal arts major has to do. I think it's a hopelessly terrible idea. TAs already do this work; why waste time coming up with a program which will do the same thing, poorly?

Just my opinion.

--
ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.

Re:WHY this is BULLSHIT by ergo98 · 2003-09-06 17:08 · Score: 3, Informative

"I heard a statistic once that if you chose answers randomly on a MC test that you could get a C by not knowing anything beyond how to circle a letter!"

You "heard a statistic once"? Geez, the probability isn't that difficult: if there are 4 possible answers and you pick randomly, you'll likely get about 25% right, or 5/20. It isn't rocket science. To get 50% randomly there'd have to be only two possible choices. Add to that the fact that many post-secondary multiple choice tests actually deduct marks for incorrect answers, and your C proclamation sounds like it might be incorrect.
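For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the expected score from blind guessing. The function name `expected_fraction` and the 1/3-mark deduction are assumptions chosen for illustration, not the rules of any particular test.

```python
from fractions import Fraction


def expected_fraction(choices, penalty=Fraction(0)):
    """Expected score per question, as a fraction of full marks, when guessing
    uniformly at random among `choices` options.

    `penalty` is the mark deducted for each wrong answer (0 means no deduction).
    """
    p_right = Fraction(1, choices)
    p_wrong = 1 - p_right
    return p_right - p_wrong * penalty


# Four options, no deduction: a quarter of the marks, i.e. 5 out of 20.
print(expected_fraction(4))                  # 1/4
print(expected_fraction(4) * 20)             # 5
# Only with two options does blind guessing reach 50%.
print(expected_fraction(2))                  # 1/2
# Hypothetical deduction of 1/3 mark per wrong answer on a 4-option test.
print(expected_fraction(4, Fraction(1, 3)))  # 0
```

With that assumed deduction, the expected gain from guessing is exactly zero, which is the point of deducting marks in the first place.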

Sausage by quintessencesluglord · 2003-09-06 18:11 · Score: 3, Interesting

Something hinted at by the story and some of the comments, but which really bears being pedantic about: too few teachers. It is as ludicrous to expect a teacher to go over 150 essays as it is for me to expect a reasonable education when I am 1 of 150 faces trying to glean something more than an "A" from a class. The software is attempting to address this imbalance, but ultimately it will make the level of education worse: it can grade a paper, but it can't offer insights on how to improve. And it will give administrators a reason to pile 50 more into a class, which will in turn lead to GradeStar MkII and onward into a vicious circle. And yeah, the software is just a tool, but like so many tools, that's not how it will be utilized. It's a cop-out, nothing more.

Ten to one it gives false positives. by ahfoo · 2003-09-06 18:16 · Score: 3, Informative

I wrote in my journal about this a while back. ETS was trying to sell their essay grader to a group of the local test prep chains here in Taiwan. The local schools called me in to sit in on the presentation. Before I went in, I searched around and found numerous free and open implementations, and I asked the speaker why they were selling their academic software for so much money --it was a rather complex contract on a per seat basis-- when there were similar products available for free. Their rep claimed to be unaware of any similar open sourced products that could match the amazing and advanced artificial intelligence features they were offering. Sales reps --hmm. The mere posing of the question definitely made them stutter and squirm though.
But the interesting part was after I got home. I looked at ETS's own research monographs and found that internally this overpriced system had been debunked. It was discovered that by writing one well-formed short paragraph and then cutting and pasting it over and over, an almost perfect score could be attained. The more times it was pasted, the higher the score.
It was also possible to write an essay on an unrelated topic and still get a high score, allowing students to use rote memorization of a single model essay. This, naturally, is impossible with a human reader because they can tell what the topic is fairly easily. According to the sales literature this software could too, but in actual tests that didn't hold up.
Their sales literature claimed that the software contained artificial intelligence and thus implied that such simple techniques would not fool it, but in practice this was far from the case.
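The copy-and-paste weakness described above is easy to picture with a toy example in Python. Everything here is hypothetical: `toy_length_score` is a deliberately naive scorer that rewards nothing but word count, and it is not a model of the ETS product. `unique_paragraph_ratio` is one simple check that would flag the trick.

```python
def toy_length_score(essay, words_per_point=50, max_score=6):
    """Hypothetical scorer that rewards only length, capped at max_score."""
    return min(max_score, 1 + len(essay.split()) // words_per_point)


def unique_paragraph_ratio(essay):
    """Share of paragraphs that are not verbatim repeats (1.0 = no repeats)."""
    paragraphs = [p.strip() for p in essay.split("\n\n") if p.strip()]
    return len(set(paragraphs)) / len(paragraphs) if paragraphs else 1.0


paragraph = " ".join(["word"] * 60)  # stand-in for one 60-word paragraph
for copies in (1, 3, 5):
    essay = "\n\n".join([paragraph] * copies)
    # The naive score climbs with each paste while the uniqueness ratio drops.
    print(copies, toy_length_score(essay), unique_paragraph_ratio(essay))
```

A scorer that leans on length-like features without a repetition check of this kind would reward exactly the behavior described.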
Monographs published by ETS also made it clear that despite their aggressive marketing of this product outside the US, they were not planning to use it as an exclusive grading system on their own tests. Rather, it was to be used as a teaching tool. However, it took a lot of digging to uncover that information.
Just as with translation, there's a lot of financial motivation to make this technology work, but that doesn't necessarily translate into workable products. In the nineties, when spelling and grammar checking was already old hat and English/Euro translation was making such headway, I thought fluent Chinese/English translation was just a few years away. Now it's 2003, grammar checkers still only work if you write in a prescribed style, and I've yet to see something halfway decent in Chinese/English translation software, although you still hear claims all the time for some overpriced product that's really almost there.
I think we'll see dramatic life extension long before we see decent computer essay graders. Decent trade as far as I'm concerned. As for translation, we can always teach more languages in school.
