Essay Grading Software For Teachers
asjk writes "Software to help teachers with grading has been around for some time. This is true even with respect to grading essays. A new tool, called Criterion, will look at grammar, usage, and even style and organization. It works by being trained on at least 450 essays scored by two professionals. The difference this time? Here is a snip from the article: '"There's a lot of skepticism," Dr. Spatola said. "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests. That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."'"
that they've automated away a major part of a professor's job, while we still need humans to pick spinach and deliver pizzas.
Don't drop the soap, Tommy!
I thought the point of an essay was to grade the ideas and how well they're expressed. I didn't realize they were spelling/grammar tests.
Maybe I'm just a bit jaded by this because of all the stupid grammar and spelling nitpicking that goes on here on Slashdot. Evidently, it's much easier to criticize my spelling than it is to provide a rebuttal to my point.
"Derp de derp."
Without computers we wouldn't be advancing in science, astronomy, genetics, or mathematics as rapidly as we have been in recent years. They are wonderful things. Hell, computers even help me keep a roof over my head. But I don't want HAL judging my kid's school papers.
I for one welcome our automated essay-correcting overlords.
1 - The grammar check option in MS Word is crap. This sounds awfully similar.
2 - Your resume can suck, but with the proper buzzwords, it'll come out looking like gold to those automated resume checkers.
1+2 = students who turn in good papers that aren't structured perfectly (and you have to admit, there is some fluidity to language) will get marked down, and those who know what bullet points to put in their papers will get good marks, even though the content is crap.
How long until you get kids selling manuals in the bathroom on what the machina are looking for?
--I don't want the world, I just want your half.
Sorry for the off-topic post.... but since Slashdot links to so many NYT articles, they should look into getting a partner=SLASHDOT thing (like Google does).
If they're going to use a computer to judge the content, then I'm not going to hesitate to use a computer to write my essay.
So when a student gets a C on an essay, from whom does he/she seek redress?
Teachers make mistakes and occasionally mark something negatively that was misread or misunderstood. In those cases the student can talk to the teacher and make a case.
If a computer does the marking, though, what do they do?
Tom
Someday, I'll have a real sig.
I bet that I can write a paper that satisfies this application's conditions for correctness of grammar, usage, style and organization, but is completely and utterly meaningless.
Then, let's feed this thing Ulysses and let's see how high it grades Joyce.
Anybody who can't see that this thing is useless for promoting any sort of creativity among students is off their rocker.
Then it is the students who are being cheated, by a teacher who uses the software and doesn't double-check the material on his own. They will go through the class without having their mistakes caught. While the erosion of standards that a flawed proofing program might bring isn't likely to be enormous, it's kind of strange to think that the future of the English language would be determined in part by a development team's piece of software.
Hope it works well, though, and gets used as a proper checking tool.
As long as this is merely an assistant and not the end-all be-all, as long as actual qualified instructors review the essay after this program does, I'm all for it.
The English language is so full of subtleties, nuances, combinations, and fantastic structural intricacies that make phenomenal writing in it possible (Faulkner, Bradbury, etc.). There's a reason English is a field of study for graduate degrees: it's absolutely worthy of them. There is no substitute for the educated, refined judgment of someone who is exceedingly well-versed in the language.
The coolest voice ever.
We need some laws:
Grading software may not injure a human being's GPA or, through inaction, allow a human being's GPA to come to harm.
Grading software must obey the orders given it by human beings except where such orders would conflict with the First Law.
Grading software must copy protect its own existence as long as such protection does not conflict with the First or Second Law.
What we need is software that grabs essays off the internet and runs them through the grading software and the cheating detection software, thus guaranteeing an 'A'.
Then we can truly achieve the goal of "knowledge passing from lecturer to paper without passing through any brains".
The only problem is that the machines might achieve intelligence. That must be avoided at all costs. To that end, all students and professors will be equipped with rifles or pistols to take out the machines if necessary. Potential students will be asked to specify weapons preference on their applications.
For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
The fun they had
There is no "humanity" in a modern constructed essay. There are certainly going to be "judgement calls" when standards are not as fully fleshed out for the computer as they should be, but as long as those are appealable, I have no problem having a computer assign me the other 95% of my essay points. The only instructors who will fear this are those who like to assign grades arbitrarily. And I don't feel too sympathetic toward those people.
"You're never ready, just less unprepared."
If the poem's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.
A sonnet by Byron may score high on the vertical, but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this matter grows, so will - so will your enjoyment and understanding of poetry.
(From the full script.)
bash$
"The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests.
Actually it's about time! I don't see the essays themselves being dehumanized, but what I do look forward to is the day a middle school student doesn't receive a bad grade just because his book report was on the "Theory of Relativity" and the teacher couldn't comprehend the subject. (This is from experience) What it will do is take the human factor out of the grading process and grade all reports equally regardless of subject matter.
iRepairIT - iPhone, Mac, & PC Repair
is just one of many writers who would flunk using this system.
'Nuff said.
We could definitely use this software to grade essays on technical merit and grammar, but what about creativity and content?
I think we still will need a teacher to read it, but I do think software should grade all exams.
If you use Linux, please help development of Autopac
Julie Cheville, an assistant professor of literacy education at Rutgers University and the local director for the National Writing Project, which promotes professional development for writing teachers, is among those skeptical of such an approach. "To be scored, writing needs to be formulaic, and formulaic writing has never been the trademark of effective writers," she said. "At the moment, what automated scoring technologies can do is scan, count and score. They orient students to errors, not to meaning. Vacuous student essays can receive high marks only because they are error-free."
I think this is something important to keep in mind. As a math teacher, I know there are plenty of tools that can help students find errors in their mathematics, but there's a line between doing correct mathematics and doing insightful/interesting/useful mathematics. This technology definitely has its place and can be useful, but I hope educators don't get the idea that they can simply rely on the tool. Wielded correctly, it could do great good, but it could also leave a lot of students with "vacuous" levels of understanding.
Matt Fahrenbacher
James Tiberius Kirk: "Spock, the women on your planet are logical. No other planet in the galaxy can make that claim."
This software would be perfect for students majoring in comp sci or engineering who have to take a composition / writing class...
Course:
College of Liberal Arts / Sci: Rhetoric 105
- or -
College of Engineering: Pattern Analysis 202
Objective:
To teach the principles of essay-writing skills. Liberal Arts students will be encouraged to follow boiler-plate styles and formats, while Engineering students will be graded on their ability to analyze and defeat pattern recognition software.
- rabs
Sounds like everyone feels the same way too... We've got some automated testing software for MS Office at the local college and although it's getting better, it still makes really silly mistakes from time to time. Analyzing English composition has got to be many times more difficult than watching a bunch of clicks and key presses.
The only use I can see for this thing is as a "first pass" grading tool that quickly finds obvious mistakes (spelling, grammar, redundancy, etc.) and flags them for the instructor. On the other hand, it's probably just as time-consuming for the instructor to read over the flagged items as it is to just catch them on the first read-through of the paper.
This thing compares the essays it is supposed to grade with already graded papers in its database. Couldn't this be done with something like POPFile? It isn't only a spam/ham classifier and lets you create as many "buckets" as you want (e.g. work, family, spam, mailing lists and system monitoring).
You could, in theory, create only buckets named (A...F), feed a large number of essays to it, make it "learn" how the essays are classified using statistics, and let it grade essays for you after that.
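For the curious, here is a minimal sketch of that bucket idea in Python, assuming a multinomial naive Bayes classifier (the kind of word statistics POPFile uses for mail) trained on essays that humans have already graded A through F. The `GradeBuckets` class, the tokenizer, and the two toy training essays are hypothetical illustrations, not POPFile's or Criterion's actual code:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase word tokens; deliberately crude."""
    return re.findall(r"[a-z']+", text.lower())


class GradeBuckets:
    """Multinomial naive Bayes with one 'bucket' per letter grade."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # grade -> word -> count
        self.doc_counts = Counter()              # grade -> number of training essays
        self.vocab = set()

    def train(self, essay, grade):
        words = tokenize(essay)
        self.word_counts[grade].update(words)
        self.doc_counts[grade] += 1
        self.vocab.update(words)

    def classify(self, essay):
        total_docs = sum(self.doc_counts.values())
        best_grade, best_score = None, -math.inf
        for grade, counts in self.word_counts.items():
            # Log prior: how common this grade was among the training essays.
            score = math.log(self.doc_counts[grade] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(essay):
                # Add-one smoothing so an unseen word doesn't zero out a bucket.
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_grade, best_score = grade, score
        return best_grade


if __name__ == "__main__":
    grader = GradeBuckets()
    grader.train("The thesis is supported by clear evidence and careful analysis.", "A")
    grader.train("i think it was good becuz it was good", "F")
    print(grader.classify("Clear evidence supports the thesis."))  # A
```

Two training essays is obviously a toy; with the 450 or so human-scored essays mentioned in the story, the word statistics would start to mean something, but the classifier would still be scoring word choice, not ideas.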
Is it possible to find masses of graded essays online? This would be a fun thing to try :).
This signature is too small to contain a marvelous troll.
I sometimes wonder how so many people, who are products of our education system, can be so painfully inadequate when it comes to the simple act of composing a sentence.
Now, I'm not one to go and say that machines don't know anything about essays. But it really doesn't seem that efficient a process, simply because whenever a teacher assigns an essay, they also assign certain criteria that the essay needs to follow. Through their teaching style and what they emphasize in class, they also color what a student might put into an essay, and they bring their own bias to the table as to how an essay should be constructed.
As for not dehumanizing: unless you're going to have the teacher go over the papers to compare the grade the computer gave with the grade she thought it deserved, it is dehumanizing. And if you are going to have the teacher double-check everything, then it isn't even remotely efficient. Whenever I write something in Word, you know what it gives me after I spell-check? A readability score that has its basis in how long the words are and not much else. It's an arbitrary construction for a computer to analyze based on certain bits of math (average word length, number of words per sentence, uses of the word "weasel"). As far as the grammar goes, I have yet to run into a word processor that can cope with any but the simplest grammatical rules; the machines can hardly tell how to conjugate their verbs or what the subject of a sentence is unless the sentence is in the very clear and very simple subject-verb construction.
This is just a colossal waste of time because at lower levels, when a teacher goes through an essay, they critique the style, point out the errors, and then tell the student what their problems were and how they could be fixed. The only way I could see this being useful is in a university setting where there are 400 students in a lecture and the professor really doesn't want to spend time grading papers from their survey course when they could be off doing research. But wait, correcting papers and doing grunt work: isn't that what TAs are for?
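To make those "certain bits of math" concrete, here is a minimal Python sketch of one published readability formula, the Automated Readability Index (ARI), which uses nothing but characters per word and words per sentence. It is a swapped-in example: Word's own readability statistics are the Flesch scores, which count syllables rather than characters, and the crude sentence and word splitting below is an assumption:

```python
import re


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Crude splitting: runs of . ! ? end a sentence; words are runs of letters, digits, apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not sentences or not words:
        raise ValueError("need at least one word and one sentence")
    # Count letters and digits only, so apostrophes don't lengthen a word.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    sensible = "The cat sat. The dog ran."
    nonsense = "Sat ran cat. Dog the the."
    print(automated_readability_index(sensible))
    print(automated_readability_index(nonsense))  # identical: same counts, no meaning
```

Shuffling the words of a sentence, or swapping in different words of the same length, leaves the score unchanged, which is exactly the complaint above: the number says nothing about whether the sentences make sense.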
He just gives everyone a B when he is hungover.
So where does style come in? There are many, MANY forms of style, which make writers unique. For instance, I've found that when I write, even the shortest essays, I tend to break up my thoughts into multipart sentences... like this one. They tend to be very long and drawn out. I also use "granted" and "don't forget". I also seem to create a lot of sentences that are self-contradicting: Though this, something else. It's part of my style.
My style isn't completely mine. I'm sure overuse would be bad. Granted this. Granted that. Where do those softer features of writing come in? Or are we all to be sterile and write with no tone or style?
--
"I'm not bright. Big words confuse me. But Wanda loves me and that should be enough for you." - Cosmo
As far as the achievements of ancient cultures go, it is all relative. We have harnessed fusion, mapped the genome, created antibiotics, peered deep into the hearts of galaxies 100,000,000 light-years away, forged fiber optics, designed the integrated circuit, et cetera. People three hundred years from now will look back upon us and wonder how a civilization that could barely put a man on the moon (a feat that will surely be trivial to them) was able to usher in the Information Age in only a decade's worth of work.
This sounds a lot like this story.
Actually, this sounds a lot like Grammatik. Grammatik was the grammar checker that was an optional component with WordPerfect for DOS and later a standard component with the Windows version. It was written by a team composed of both computer scientists and professors of English. One of its interesting features was a scoring feature that gave you a rough estimate of the grade level of your writing. It would also give you statistics and compare them to a selection of famous works.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
;)
Some "dehumaniSing" could be a good thing, especially when grading subjective material.
Objective material is factual; a simple example is "Most dogs have 2 eyes."
Subjective material is opinionated, e.g. "Australia should legalise heroin injecting rooms." Obviously this is controversial, and there are several positions on the matter.
Most teachers/lecturers/graders/tutors have their own (pre-existing) subjective opinions on certain topics. If you submit an essay that opposes their views, the chances are very high that you will get a lower grade, even if your essay is well formed/written/structured.
In high school, I always took this into account and wrote essays that agreed with the teacher's point of view, even if I didn't. Such software could lessen the need for writing what they 'want to read'.
You weren't taught English. I'm not trying to insult you, but that's one of the problems with our public schools: they don't do a good job teaching it.
When I went to high school 15 years ago, we didn't do any grammar in high school English class, it was all read-and-interpret (i.e. read-and-make-up-some-bullshit).
Yes, and that's why when you got to college you couldn't write a good research paper.
We were supposed to learn the technical stuff in middle school (and we did to some degree).
You are supposed to learn English through high school as well, if you want to get a 1500+ on your SATs. This is exactly why students get such low SAT scores in urban public schools: they don't get a focused education. When it's time to take tests, the test does not care how creative you are or even how intelligent you are; the only thing that matters to the SAT is your technical knowledge.
Teach technical English and later on let a person learn creativity.
It's true. I've had teachers take their questions and quizzes directly off of websites (the curious may want to enter a few key words from their latest homework on Google and see what turns up...). Now here's an ethical dilemma for you: if it's OK for them to get the questions off a website, is it alright for me to get the answers off that same website?
The good old fashioned teachers, on the other hand, would never do such a thing. No, they have been xeroxing the same handouts for the last two decades. You can tell by the fact that they have become half unreadable.
Homework is either trade it and grade it or credit/no credit. Major tests are all done on scantrons.
That's not to say I don't have good teachers.
But I've also had teachers who put in the bare minimum. At the end of the day, they're gone before the rest of us. They teach 5 periods (one's prep) for 180 days a year and gripe about having an average salary with mediocre benefits. On the other hand, the conservative holdouts who arrive at school before the janitors, who stay up all night meticulously grading essays, and who teach high schoolers purely because they want to (many have Ph.D.s and could easily be working at the local university if they so desired) never gripe.
There have been a lot of proposals about reforming teacher pay. I don't know what can be done to attract better teachers, but purely for the sake of fairness, I would like to see the good teachers taking a disproportionate share of the money. The trouble is, they would never ask for it.
If I have learned anything from my university career, it is this: as class sizes get larger, testing becomes more frequent and more automated. Of course, you say, if you have a class of one hundred or more people, it is simply not possible to mark that many essays. This usually means that essays don't need to be written at all! What do they do? Multiple choice! I heard a statistic once that if you chose answers randomly on an MC test, you could get a C by not knowing anything beyond how to circle a letter! ----- Discovering this, I made sure that I took all the obscure English classes that had no more than 30 people in them. An unexpected positive side effect of this system of choosing courses was that 90% of the other students in them were girls. Yeah, life was good. ;-)
Apparently, the system uses statistical analysis as well as grammar checks to determine the score for the essay. Basically, they've built up a database of essays that have been graded by a bunch of humans, and then used these algorithms to figure out which bucket the essay belongs in. Sounds kinda like SpamAssassin, actually. I'd be willing to bet that with sufficient resources (in terms of essays and human grading time), this wouldn't be all that tough to duplicate. After all, what are spam filters but content analyzers? (Shameless plug for a system that requires human judges rather than computer judges)
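As a rough illustration of "compare against a database of human-graded essays", here is a Python sketch of a k-nearest-neighbour scorer: each essay becomes a bag of word counts, the most similar graded essays are found by cosine similarity, and their human scores are averaged. This is a swapped-in technique for illustration only; neither the story nor the article says Criterion works this way, and `knn_score`, its `k=3` default, and the sample data are hypothetical:

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts for an essay, lowercased."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two word-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_score(essay, graded, k=3):
    """Average the human scores of the k graded essays most similar to `essay`.

    `graded` is a list of (essay_text, human_score) pairs.
    """
    if not graded:
        raise ValueError("need at least one human-graded essay")
    target = bag_of_words(essay)
    ranked = sorted(graded,
                    key=lambda pair: cosine(target, bag_of_words(pair[0])),
                    reverse=True)
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)


if __name__ == "__main__":
    graded = [
        ("clear thesis strong evidence", 6),
        ("clear thesis some evidence", 5),
        ("no thesis no evidence", 2),
        ("lol whatever", 1),
    ]
    print(knn_score("a clear thesis with strong evidence", graded, k=2))  # 5.5
```

Like the spam-filter comparison, this only measures how much an essay's vocabulary resembles essays that humans already scored; it has no notion of whether the argument holds together.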
--
Annotateit at Annotateit.com
ETS actually has a web site where you can do a sample essay that their server will grade for you.
More info can be found here.
Automated is good because there's less chance of error, and it's almost always fair.
The only way to get fair grades in university is to be smart enough to pick the right teachers, and drop the ones you don't get along with.
I heard a statistic once that if you chose answers randomly on an MC test, you could get a C by not knowing anything beyond how to circle a letter! ----- Discovering this, I made sure that I took all the obscure English classes that had no more than 30 people in them. An unexpected positive side effect of this system of choosing courses was that 90% of the other students in them were girls. Yeah, life was good.
Who wants a C? That's as good as an F in college; if you get a C, you can just drop the class and take it again!
I don't really like small classes myself; there is no real benefit. What I notice from smaller classes is that teachers are more critical of you, you get greater punishment for poor attendance or for being late to class, and you also get more focus from the teacher, which can be good or bad depending on whether the teacher likes you or not.
If the teacher likes you, getting this extra focus is a very good thing because a personal connection with a teacher who likes you is to your benefit, however if the teacher dislikes you and decides to personally focus on you, this is bad.
You speak as if the use of computers to judge intellectual works will somehow make our society exempt from "rich upper class morons buying and pandering their way through school". Such an aristocratic model is something that exists beyond the scope of one's grades in school, and will not be eliminated, in any sense, by such a thing.
One of the primary purposes of essays is to learn how to write for a specific audience.
If you remove the human element, then you aren't writing for any audience, unless, of course, everyone starts writing for computers' entertainment and education.
A typical middle or high school English teacher has six classes a day, each having over forty students. If the students have to learn to write, each of them should write a couple of pages of prose a week and some poor sod has to read it. The more they write, the better they get at writing, so it is generally a good idea, but really hard work to read it all.
There will generally be two papers in each class that are remotely readable. The rest will be a LOT OF WORK to grade. If a bot could do some of the work, it would be welcome.
Late at night your eyeballs feel like they're on fire and you are convinced that the entire system should be put out of its misery. The thought that a student actually has an IDEA seems fantastic.
PLEASE don't be a troll and tell me that YOUR teacher never appreciated your ideas.
Any preoccupation with ideas of what is right or wrong in conduct shows an arrested intellectual development. (Wilde)
As I'm sure anyone who has ever written an essay (especially at the high school level or above) knows, there is no point to the essay per se. The essay is not an end in itself, and the grade ultimately is not an end either.
At my university, Duke, our new curriculum has specially designated writing classes. Every student needs to take three over their four years. A biology lab can be a writing class. So can an English class, history, religion, etc. All W classes have certain requirements: there must be a certain amount of writing and, more importantly, REVISION.
I was fortunate enough to take a class from the author and professor Reynolds Price. We had a final essay for the class. Along with my grade (not an A ;) ), I received a page and a half of handwritten comments, as well as inline comments about points in the middle of the essay. Twenty years from now, I doubt I will remember a great deal of his course, but the comments that he left me have already changed my writing style and, I hope, improved it. (Note: Slashdot style not indicative of real style, hehe.)
A computer will NEVER be able to do this. Nor will a computer (at least in the foreseeable future) be able to comment on my theories about Milton's Paradise Lost.
Um...as opposed to English Comp classes in general?
"Hardly used" will not fetch you a better price for your brain.
At first glance, I thought that a tool used to analyze essays would be a nightmare: it would kill creativity, homogenize style, etc. However, I then remembered my experience grading freshman *college* essays last year. This was in a small-classroom course taught by an extremely good instructor who also offered office hours galore, all kinds of free tutoring on campus, free access to computers with MS Word, etc. In other words, there were *plenty* of chances for the students to improve. I was given three classes' worth of essays to grade, and traded off with the instructor on which half of each class we'd each tackle.
The papers were dreadful. While, as an English & Creative Writing grad, I am extremely pro-creativity, these students weren't ready for that. Some of the sentences I ran across were so awful I would IM my friends and have *them* in hysterics at how terrible they were. For example (and keep in mind, these are middle-class white kids, NOT English-as-a-second-language students): "There is not connection with what I know the same circles don't fit inside squares." "There is also numerous of shapes and designes which not to difined." "They get into the analization of the man..."
The simple truth is, while creativity is great, there is a baseline level of grammatical ability that students need in order for others to simply understand WTF the writer is saying. A lot of the time, it would take me five or six scans per sentence to figure out what my students were writing. The worst part is that, because of department rules, I wasn't allowed to give any kid below a C if he/she *tried* to follow the essay guidelines. As long as the right number of pages, subject, etc. were touched on, the person would pass. I think the best use of technology in cases like that would be to sit the person down at the computer with the grammar-analysis program and, rather than having them ignore classroom lessons, let them interactively edit their own papers.
Not the way that Word v.X does it, where the person just right-clicks to get a "correct" change, but the older method in which the program offers a series of alterations with explanations of *why* the original is out of whack. Doing that in lecture isn't realistic, unfortunately. Despite the number of grammatical nightmares in the course, the students really were at varying levels of ability, each with a unique misunderstanding of the rules. Each needed individualized attention, though as far as I could tell none of them were trying to obtain it. Also, the campus requirements for Freshman English didn't leave time for stuff they should have learned in elementary school.
It's like the Bayesian filter for mail classification in SpamBayes or Mozilla. In fact, that's probably where Criterion's programmers got their inspiration.
If you read the article, you'll discover they had to feed it four hundred or so "good" papers (the training set), and they justify its validity by noting that graders see that (paraphrased) "well-written papers [on the topic] contain certain key words or ideas, and avoid certain expressions [examples]", which the system picks up on. Since it agrees with grader scores over 95% of the time, I think those simple indicators are actually pretty useful.
Keep in mind, it can give a perfect score to unreadable garbage, which isn't even grammatically correct. (This is mentioned in the article)
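The "contains certain key words or ideas, and avoids certain expressions" description can be sketched as two simple features, and the same sketch shows how garbage can still max them out. This is a minimal Python illustration; the phrase lists and function names are invented, and the article does not publish Criterion's actual features:

```python
import re


def normalize(text):
    """Lowercase, turn anything but letters/apostrophes into spaces, pad with spaces."""
    return " " + " ".join(re.sub(r"[^a-z']+", " ", text.lower()).split()) + " "


def keyword_features(essay, expected, avoided):
    """Return (fraction of expected phrases present, count of avoided phrases used)."""
    text = normalize(essay)

    def uses(phrase):
        # Padding both sides with spaces gives whole-word matches only.
        return normalize(phrase) in text

    coverage = sum(uses(p) for p in expected) / len(expected)
    penalties = sum(uses(p) for p in avoided)
    return coverage, penalties


if __name__ == "__main__":
    expected = ["supply", "demand", "price"]  # hypothetical "key ideas" for the prompt
    avoided = ["a lot", "stuff"]              # hypothetical expressions to avoid
    print(keyword_features("When demand rises and supply is fixed, the price goes up.", expected, avoided))
    print(keyword_features("price supply demand price demand supply", expected, avoided))
    print(keyword_features("Prices change a lot because of stuff.", expected, avoided))
```

The keyword-stuffed second "essay" scores exactly as well as the real sentence, which is the weakness the article itself concedes.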
Nice +5 Insightful, though. But next time, read the article.
In fact, I'm ashamed no one mentioned that this is just like spam filter technology yet. Come on slashdot, is your technical insight on a weekend trip or what?
Fuck Beta. Fuck Dice
First off, let me say that I am involved in the automated essay grading industry, and have helped to develop RocketScore, which does everything Criterion does, and lots more. Forgive me for the blatant plugs in this post; I'll try to keep them to a minimum.
But let's move on to the focus of this article.
First off, there is a lot of criticism about essay graders being formulaic, only capable of seeing patterns that arose in their originating sample set of essays. With Criterion, an offshoot of ETS's e-rater, this is a serious concern. When you only look at what you've seen, anything out of left field looks completely awry, and cannot be graded appropriately. RocketScore is different; RocketScore uses a "features" method to check for included or excluded material, among many other things, and is therefore quite good at noticing subtle writing and essay types which it has never seen before.
One of the great things about essay graders is that they give a student an objective standard to look to. Human graders grade differently based upon mood, the time they have to review the writing, and many other mitigating factors. In other words, the same human grader might grade the same essay differently at separate points in time. Most essay graders will always grade the same essay in the same manner. This is great for a student, for if a teacher gives you a D when the essay grader says it's in the B range, one might be able to use this evidence to force the teacher to reconsider the grade. Or vice versa. If the essay grader is telling you that you're getting a D, you can work and improve on it until you're getting that B you'd be happy with.
But there are serious drawbacks to the comments E-Rater and Criterion give. E-Rater gives comments solely based on your score (if you get a 1, you get comment set 1, if you get a 2, comment set 2, etc.). Criterion gives a student "instructional feedback in basic grammar, usage, style and organization." E-Rater's comments are inadequate at best, and Criterion's leave a lot to be desired. RocketScore provides substantial feedback on how to improve your writing. Not just stylistic and grammatical comments, but comments on what you should be writing more about (you didn't provide enough info!), what you should be writing less about (you gave too much info!), and how to balance your arguments, among many other categories.
There are two major problems with essay grading. The first is bullshit detection, and the second is determining whether the essay actually answered the question asked. E-rater and Criterion both have real problems with these two criteria. For bullshit detection, RocketScore has thresholds which can be set and manipulated on the fly, from throwing out anything which isn't completely relevant to the topic, to allowing just about any essay submitted. And you will get a score and comments based upon what you submitted. Of course, these are most helpful when you make a meaningful attempt to submit a relevant essay.
Yes, but do you know how ETS defines "agreement"? Glad you asked. It's when the grader's grade is within a point of the human's grade. Now, with the SAT II test, which is on a scale of 1 through 6, that means if the grader says 2, and a human says 1, 2, or 3, then there's agreement. But that's 50% of the scale! Their essay grader has a 98% chance of hitting the wall in front of them as opposed to the wall next to them. Woohoo. Meanwhile, RocketScore provides decimal-point accuracy (we don't give you a 4 or a 5, we give you a 4.1 or a 5.3), and is 98% accurate. But how do we define accurate? It's when the grader's grade, rounded to the nearest whole number, equals the human's grade. In other words, if we give you a 4.3, there is a 98% chance a human would give you a 4. With 4.5,
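An adjustable relevance threshold of the kind described here could, in its simplest form, measure how many of the prompt's content words an essay mentions and reject essays below a cutoff. This Python sketch is a hypothetical illustration, not RocketScore's method; the stopword list, the coverage measure, and the function names are all assumptions:

```python
import re

# A tiny illustrative stopword list; a real system would use a far larger one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "on", "for"}


def content_words(text):
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def topic_coverage(essay, prompt):
    """Fraction of the prompt's content words that also appear in the essay (0.0 to 1.0)."""
    prompt_words = content_words(prompt)
    if not prompt_words:
        return 0.0
    return len(content_words(essay) & prompt_words) / len(prompt_words)


def passes_relevance(essay, prompt, threshold):
    """threshold=0.0 lets everything through; higher values reject off-topic essays."""
    return topic_coverage(essay, prompt) >= threshold


if __name__ == "__main__":
    prompt = "Should schools replace teachers with grading software?"
    on_topic = "Grading software cannot replace teachers in schools."
    off_topic = "My summer vacation was fun."
    for threshold in (0.0, 0.3):
        print(threshold, passes_relevance(on_topic, prompt, threshold),
              passes_relevance(off_topic, prompt, threshold))
```

Setting `threshold=0.0` accepts anything, while raising it throws out essays that never touch the prompt's vocabulary; exact word matching like this would also miss an on-topic essay that sticks to synonyms.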
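The two definitions of "agreement" are easy to compare side by side. This Python sketch computes adjacent agreement (machine and human scores within one point, as the comment describes ETS counting it) and exact agreement after rounding a decimal machine score to the nearest whole number (as the comment describes RocketScore counting it). The sample scores are made up, the lists are assumed to be the same length, and rounding halves upward is an assumption, since the comment is cut off before it says how a 4.5 is handled:

```python
import math


def round_half_up(x):
    """Round to the nearest whole number, with .5 going up.

    Python's built-in round() rounds halves to even (round(4.5) == 4).
    """
    return math.floor(x + 0.5)


def adjacent_window(machine_score, scale=range(1, 7)):
    """Human scores on the scale that count as 'agreeing' under the within-one-point rule."""
    return [h for h in scale if abs(h - machine_score) <= 1]


def adjacent_agreement(machine, human):
    """Fraction of essays where machine and human scores differ by at most one point."""
    return sum(abs(m - h) <= 1 for m, h in zip(machine, human)) / len(human)


def exact_agreement(machine, human):
    """Fraction of essays where the rounded machine score equals the human score."""
    return sum(round_half_up(m) == h for m, h in zip(machine, human)) / len(human)


if __name__ == "__main__":
    print(adjacent_window(2))                                   # [1, 2, 3]: half of a 1-6 scale
    print(adjacent_agreement([5, 1, 5, 4], [4, 3, 5, 2]))       # 0.5
    print(exact_agreement([4.3, 2.6, 4.5, 2.4], [4, 3, 5, 2]))  # 1.0
```

The first line is the "50% of the scale" point in miniature: for a machine score of 2 on a 1 to 6 scale, three of the six possible human scores count as agreement.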
---
"Of course, that's just my opinion. I could be wrong." --Dennis Miller
Teacher: Johnny, I'm really sorry, but the computer crashed while your paper was being scored. I was looking over it. It's been a while since I've read a paper, but I was wondering what the following sentence means:
And this one:
Is that some kind of new language that kids are using? Oh, by the way, congratulations, you got a 100 on EVERY essay this semester! Good job!
Now before you start up the flame throwers, this is not a message to deride high school students over their lack of creativity.
But when I was in high school, we were told that proper essay writing was an essential skill for the departmentals, and when they said "proper," they meant "Must conform to between five and seven paragraphs, the first and last being the opening and conclusion, with three to five paragraphs of body, each containing one topic of discussion."
Furthermore, it was made VERY clear that creative or unconventional ideas (let alone language!) would be strongly frowned upon. There was One True Way to write an essay, and One True Opinion on any given subject. Any deviations from that would cost you.
I hated it then and I hate it now, but I don't see any problem with having computers mark essays like this. After all, they were trying to turn us into computers to write them.
"People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
This is a great leap forward for education. While it has always been the goal of geeks to submit computer-generated papers and receive decent grades, this has traditionally been hampered by the unreliability of computer-to-human communication. But with computer-to-computer submissions (henceforth referred to as "End-to-end Grading And Direction", or EGAD), we can now begin hacking away at the first generation of grade generators.
"What I did on my Summer Vaca'; DROP TABLE punctuation"
Of course, ETS has yet to divulge the details of the technology they use.
Could this be it?
As I understand it, ETS poured tens of millions of dollars into the automated essay grading effort, in parallel with the development of the CBT format. For years after the CBT was introduced, the GRE essay was still done on paper.
Once the grading software finally worked on the database of sample essays, the GRE essay switched from paper to word-processor entry (something many test takers have difficulty with).
Still, the essays were graded by armies of grad students. Only when the automated grading matched the manual grading 90% of the time did the software "go live".
IIRC, it has been in use for only a few years, after more than a decade of development.
Something hinted at by the story and some of the comments, but that really bears being pedantic about: too few teachers. It is as ludicrous to expect a teacher to go over 150 essays as it is for me to expect a reasonable education when I am one of 150 faces trying to glean something more than an "A" from a class. The software attempts to address this imbalance, but ultimately it will make the level of education worse: it can grade a paper, but it can't offer insights on how to improve. And it will give administrators a reason to pile 50 more students into a class, which will in turn lead to GradeStar MkII and onward in a vicious circle. And yeah, the software is just a tool, but like so many tools, that's not how it will be used. It's a cop-out, nothing more.
If they want to use this in high schools and middle schools, that assumes all their students have computers with a word processor. What about the lower-income kids at schools, like public schools in NYC, that don't allow after-school activities such as using the school computers? It's a tool for the middle and upper classes that will give their teachers more time for their students and leave teachers in poorer areas still overworked and struggling to keep up. It won't be feasible where it is needed most.
I wrote in my journal about this a while back. ETS was trying to sell their essay grader to a group of the local test-prep chains here in Taiwan, and the local schools called me in to sit in on the presentation. Before I went in, I searched around and found numerous free and open implementations, so I asked the speaker why they were selling their academic software for so much money (it was a rather complex contract on a per-seat basis) when similar products were available for free. Their rep claimed to be unaware of any similar open-source products that could match the amazing and advanced artificial intelligence features they were offering. Sales reps. Hmm. Merely posing the question definitely made them stutter and squirm, though.
But the interesting part came after I got home. I looked at ETS's own research monographs and found that internally this overpriced system had been debunked. They showed that by writing one well-formed short paragraph and then pasting it over and over, a test taker could attain an almost perfect score. The more times it was pasted, the higher the score.
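The post doesn't describe how ETS's model works internally, so what follows is only a toy: a hypothetical Python scorer, invented here and not based on e-rater or any real product, whose two features (word count and a count of stock "discourse marker" phrases) both grow with length. It illustrates how a scorer that leans on length-sensitive counts rewards the paste-it-again trick. The marker list, weights, and sample paragraph are all made up.

```python
# HYPOTHETICAL toy scorer for illustration only; not ETS's e-rater or any
# real product. Both features grow with length, so padding raises the score.
DISCOURSE_MARKERS = ("first", "furthermore", "however", "in conclusion", "because")
MAX_SCORE = 6.0


def naive_score(essay: str) -> float:
    text = essay.lower()
    word_count = len(text.split())
    # str.count also matches substrings, e.g. "first" inside "firstly".
    marker_count = sum(text.count(m) for m in DISCOURSE_MARKERS)
    raw = 1.0 + word_count / 100 + marker_count / 2
    return min(MAX_SCORE, raw)  # cap at the top of a 1-6 scale


# A made-up "well-formed short paragraph".
PARAGRAPH = (
    "First, school uniforms save money because families buy fewer clothes. "
    "Furthermore, they reduce teasing. However, some students dislike them."
)

if __name__ == "__main__":
    for copies in range(1, 4):
        padded = " ".join([PARAGRAPH] * copies)
        print(copies, round(naive_score(padded), 2))
```

Because nothing in `naive_score` checks for repeated text or for relevance to the prompt, each extra copy of the paragraph raises the score until it hits the cap.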
It was also possible to write an essay on an unrelated topic and still get a high score, allowing students to rely on rote memorization of a single model essay. This, naturally, is impossible with a human reader, because a human can tell what the topic is fairly easily. According to the sales literature, the software could do so too, but in actual tests that claim didn't hold up.
Their sales literature claimed that the software contained artificial intelligence and thus implied that such simple techniques would not fool it, but in practice this was far from the case.
Monographs published by ETS also made it clear that despite their aggressive marketing of this product outside the US, they were not planning to use it as an exclusive grading system on their own tests. Rather, it was to be used as a teaching tool. However, it took a lot of digging to uncover that information.
Just as with translation, there's a lot of financial motivation to make this technology work, but that doesn't necessarily translate into workable products. In the nineties, when spelling and grammar checking was already old hat and English/European-language translation was making such headway, I thought fluent Chinese/English translation was just a few years away. Now it's 2003, grammar checkers still only work if you write in a prescribed style, and I've yet to see anything halfway decent in Chinese/English translation software, although you still hear claims all the time for some overpriced product that's really almost there.
I think we'll see dramatic life extension long before we see decent computer essay graders. A decent trade as far as I'm concerned. As for translation, we can always teach more languages in school.
Are we so lazy as a country that we don't even want to go through the process of teaching? An essay that is free of grammatical errors and uses phrases like "in summary" and "because" doesn't mean anything if the student cannot communicate their feelings on paper. High school is the time when students must develop the skill of putting their ideas and feelings on paper for a human to read. If they were to write just so a program could scan it for "errors," why would they ever bother to take a risk and write something meaningful?
Humans have been putting too much responsibility in the hands of computers. But making teaching, and especially writing, for God's sake, an objective process shows nothing but our society's indifference to educating and improving ourselves.
Those who see objectivity as something positive because it levels the playing field never had a teacher who would take a chance and go beyond their duty to help a student. This is something no program will ever be able to do.
Computers are objective, people are not. That is what makes us different and inherently better.
No one will read this, but I bet that these automated essay graders are able to mimic human graders closely because the human graders themselves graded the 450 "config" essays under computer-esque conditions, i.e., a time constraint that forced them to skim, think as little as possible and generally act like a machine.
Fact: Teachers would read only the introduction and conclusion paragraphs, and rely on the grading software to account for the quality of writing of the paper, and grade that way.
Brutal Truth: Teachers already read only the introduction and conclusion paragraphs, so use of this software actually would be an improvement.
1. Suppose you could put together a bunch of stats that have nothing to do with the content of the paper at hand and use them to predict the grade a human would have given the paper with 95% accuracy. E.g., you look at the writer's socio-economic level, which high school he went to, what his grades in high school were, how neatly he is dressed, how neat his handwriting is, etc. I am not saying you CAN achieve that kind of correlation (in fact, I suspect you surely can't, since there isn't that kind of correlation even between papers by the same person), but what if you could? Isn't it obvious this is not a good thing?
2. But here's a constructive use that would save us university faculty significant time. If the software really does grade spelling, grammar and syntax well, one could require that, before an essay gets handed in, it earn some high minimum score like 90% or even 98% (unless the student is dyslexic or something like that). Then we would not have to look at essays with poor spelling, grammar and syntax, would have to do less red-inking, and would have more time to grade for content. (Which is all I grade for anyway in philosophy; grammar, spelling, etc. get marked up but don't count unless they get in the way of my understanding.)
Yeah, well, you're Hide the Hamster! Unmasked, jackass.
how about you have the nuts to say that to me on K5