Automated System Developed To Grade Student Essays

Posted by samzenpus on Thursday April 4, 2013 @11:17AM from the machine-learning dept.

RougeFemme points out this story at the Times about software that can be used to grade student essays and offer almost instant feedback. "Imagine taking a college exam, and, instead of handing in a blue book and getting a grade from a professor a few weeks later, clicking the 'send' button when you are done and receiving a grade back instantly, your essay scored by a software program. And then, instead of being done with that exam, imagine that the system would immediately let you rewrite the test to try to improve your grade. EdX, the nonprofit enterprise founded by Harvard and the Massachusetts Institute of Technology to offer courses on the Internet, has just introduced such a system and will make its automated software available free on the Web to any institution that wants to use it. The software uses artificial intelligence to grade student essays and short written answers, freeing professors for other tasks."

This is horrid by swm · 2013-04-04 11:19 · Score: 5, Insightful

One of my kids had something like this: not for English, but for physics.
The teacher couldn't be bothered to assign and grade proper homework.
Instead, he fobbed the kids off onto a web app.
- go to the site
- get a problem
- solve the problem
- type in the numerical answer
- right answer? go on to the next problem
- wrong answer? try again
The web app allowed maybe 0.5% margin for rounding error, and you got 5 tries before it failed you on that problem.
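The checking rule described above can be sketched roughly as follows. This is only an illustration, assuming a 0.5% relative tolerance measured against the expected value and five tries per problem; the names `within_tolerance` and `grade_attempts` are hypothetical, and the real web app's logic is not known.

```python
REL_TOLERANCE = 0.005  # 0.5% margin for rounding error, as described above
MAX_TRIES = 5          # attempts allowed before the problem is failed


def within_tolerance(submitted: float, expected: float) -> bool:
    """Return True if the submitted value is within 0.5% of the expected value."""
    return abs(submitted - expected) <= REL_TOLERANCE * abs(expected)


def grade_attempts(attempts, expected):
    """Return the 1-based number of the first accepted attempt, or None if failed.

    Only the first MAX_TRIES attempts are considered (hypothetical policy).
    """
    for number, value in enumerate(attempts[:MAX_TRIES], start=1):
        if within_tolerance(value, expected):
            return number
    return None
```

Note that a check like this only ever sees the final number, which is exactly why pasting the problem into a search engine is enough to satisfy it.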
It sounds reasonable in the abstract, but in practice it was utterly wretched.
All learning is, at some level, an interaction--a conversation--between student and teacher.
Even if it is nothing more than a red check mark or a red X on a homework paper,
you have communicated some thing to some person and gotten some response.
You don't realize how important this is until it is gone.
With nothing but a machine to talk to, it stops being about learning.
It is just about satisfying the machine by whatever means necessary.
In his rage and frustration my son told me that the easiest way to solve the problems was to copy and paste the problem text into Google.
This would reliably return the general formula for solving that problem;
plugging in the numbers that the web app had generated for your instance of the problem would then yield the correct answer.
By the end of the school year, I was telling him that if he didn't want to deal with the web app, he should use Google to get his grade,
and if he wanted to learn physics, I would teach it to him.
Automated essay grading is going to be even worse.
There is no point writing prose unless a human is going to read it.
When I want to talk to machines, I write code.
Writing songs, that voices never shared...
-- Paul Simon
1. Re:This is horrid by Anonymous Coward · 2013-04-04 12:10 · Score: 5, Interesting
  
  I went through the same system and it taught me all sorts of useful things unrelated to my actual physics curriculum, like
  1/2 != 2/4
  0.5 != 1/2
  x != x+1-1
  x^2 != x*x
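The inequalities above are what you get when a checker compares answers as text. A minimal sketch of comparing by value instead, assuming plain numeric answers and single-variable expressions passed in as Python functions; `same_value` and `probably_equivalent` are hypothetical names, and random sampling can only suggest equivalence, not prove it.

```python
import math
import random
from fractions import Fraction


def same_value(a: str, b: str) -> bool:
    """Compare numeric answers such as '1/2', '2/4', or '0.5' by exact value."""
    return Fraction(a) == Fraction(b)


def probably_equivalent(f, g, samples: int = 50, seed: int = 0) -> bool:
    """Spot-check two single-variable expressions at random sample points."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.uniform(-10.0, 10.0)
        # Small tolerances absorb floating-point rounding such as x + 1 - 1.
        if not math.isclose(f(x), g(x), rel_tol=1e-9, abs_tol=1e-9):
            return False
    return True
```

For example, `same_value("1/2", "2/4")` and `probably_equivalent(lambda x: x**2, lambda x: x * x)` both hold, while a string comparison would reject them.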
2. Re:This is horrid by reve_etrange · 2013-04-04 12:13 · Score: 4, Interesting
  
  I had the same experience in university calculus and physics. Even for problems with one right answer, there are typically many (even infinite) ways of expressing that answer. Even something as advanced as Mathematica or Maple can be fooled, and the websites in question are no Mathematica.
  
  --
  .: Semper Absurda :.
3. Re:This is horrid by RougeFemme · 2013-04-04 12:33 · Score: 3, Informative
  
  I'm currently tutoring my daughter in statistics for the same reason. She's in college and while she's flipping through her homework application and her e-textbook, I'm flipping through my old statistics books, plus a couple of study guides I picked up. Also, sometimes the homework application is simply wrong. (Doesn't every tool/program have at least one bug?) My sister, a teacher, uses one - mandated by the community college where she teaches. Occasionally, she has to override the application so that she can mark correct problems that the application marked wrong. The students alert her, she checks and then overrides when the application is clearly wrong.
4. Re:This is horrid by TsuruchiBrian · 2013-04-04 13:15 · Score: 3, Insightful
  
  Math is about simplification, but simplicity is subjective.
  1/2 is simpler than 2/4, but not if you have something like this: 2/4 * a + 3/4 * b + 1/4 * c
  Maybe all this would be simpler as (2a + 3b + c) / 4, or maybe not; it depends on the application...
5. Re:This is horrid by Pulzar · 2013-04-04 15:07 · Score: 3, Insightful
  
  That's not the ultimate purpose. The ultimate purpose is to solve problems.
  The *ultimate* purpose of education in science is to solve problems we have no current solutions for. They are not solved by looking up the formula, but by developing your own formula based on your understanding of how things work.
  I don't need to look up the formula that allows me to calculate the acceleration of a body of known mass when known force is applied to it, because I understand their relationship. I also understand the relationship between velocity, time, and acceleration, so I can create further formulas based on these two sets of relationships that might've not been obvious at first.
  If I've just looked up the final formula, I've skipped the important steps that give me the underlying understanding of physics, which will allow me to create new formulas to solve new problems.
  
  --
  Never underestimate the bandwidth of a 747 filled with CD-ROMs.
6. Re:This is horrid by Jstlook · 2013-04-04 16:08 · Score: 4, Interesting
  
  I went through that type of system for a Chemistry class. After class, the entire class would wander down to the computer lab and do the homework together. We'd get a question, find the [book] answer, then have each person try to obtain the correct [computer-identified] solution on the first try, trying various syntax adjustments each time. Of the three chances we got, someone usually got the right syntax before everyone had failed the question the second time.
  
  Best benefit? Getting a group of people in the same place to research, debate, and agree on a single answer, then be open-minded and organized enough to shape the solution to fit the constraints given.
  
  --
  ---jstlook ---For that is the way of Elves, for they say both yes AND no, and mean every word of it. --- J.R.R.T.
My TA had that 35 years ago by gewalker · 2013-04-04 11:21 · Score: 5, Funny

Take one lab report for Fluid Mechanics, measure the thickness with a micrometer -- look up the grade on the curve.
1. Re:My TA had that 35 years ago by demonlapin · 2013-04-04 13:43 · Score: 4, Insightful
  
  That's interesting in its own way, but a much more interesting comparison would be between the essays' lengths and the respective SAT Verbal scores of their writers. I would bet that they are also correlated quite closely.
  
  News flash: when presented with an essay topic, smart people spend a few minutes planning and then proceed to write voluminously about the subject, because they are fluent writers. Dumb people start muddling along, lose track of where they are, and stop when they've stated (though not proved) their main point, because they're not. Fun game: ask a room full of people to write nonstop for five minutes on any topic(s) of their choosing, then compare word counts vs IQ/class grades/whatever.
  
  If you're a HS student reading this (and I imagine there are a lot of you who are): practice writing. Practice writing. Practice writing. It's important. It's probably the most valuable skill you will ever acquire for dealing with people you don't meet with face-to-face. Bad writing is universally considered a sign of low intelligence. It takes a lot to overcome the negative impression that bad writing gives, and you often will not have the opportunity to try - when given a stack of 100 resumes for two positions, guess how the initial winnowing occurs? Toss anything on colored paper, anything written in a funny typeface, and anything with grammatical or spelling errors. I cringe today when I read some of the stuff that I wrote in HS, but it's grammatical and correctly spelled, even if the verbiage is ponderous (and occasionally verges on purple prose).
2. Re:My TA had that 35 years ago by DragonWriter · 2013-04-04 16:28 · Score: 4, Insightful
  
  News flash: when presented with an essay topic, smart people spend a few minutes planning and then proceed to write voluminously about the subject, because they are fluent writers. Dumb people start muddling along, lose track of where they are, and stop when they've stated (though not proved) their main point, because they're not.
  
  IME, smart people write concisely and to the point of the prompt, while dumb people write voluminous, rambling, redundant, and unfocussed walls of text.
  "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." --Antoine de Saint-Exupery
Have a computer write your submission too by hawguy · 2013-04-04 11:27 · Score: 5, Interesting

Seems like it's a small step from this to having computer algorithms that automatically write your paper for you too - then you can let it go through thousands of submit-edit-submit cycles until the scoring computer gives you a perfect score.
Kind of like the guys that came up with software to generate nonsense scientific papers and actually had a few accepted at conferences and journals.
1. Re:Have a computer write your submission too by korgitser · 2013-04-04 11:32 · Score: 4, Informative
  
  I wonder how these would do:
  the postmodernism generator http://www.elsewhere.org/pomo/
  the math paper generator http://thatsmathematics.com/mathgen/
  
  --
  FCKGW 09F9 42
Sample Admittance Essay by milbournosphere · 2013-04-04 11:29 · Score: 4, Funny

Why I want to goto Harvard By P Q Student Up up down down left right left right B A
feedback... by retchdog · 2013-04-04 11:30 · Score: 5, Insightful

``Your grade is C. To improve your grade in the future, you need to do the following:
use 25-30 words per sentence; include more words from the wordnet entry for the topic of your essay; avoid simplistic or run-on sentences as measured by number of noun and verb phrases detected by our proprietary NLP tokenizer.
As a helpful reminder, our preparatory guides are available as a subscription service and include 100 practice submissions per week; only $29.95 per month."

--
"They were pure niggers." – Noam Chomsky
Grading is about feedback by FailedTheTuringTest · 2013-04-04 11:30 · Score: 5, Insightful

Grading is not, or should not be, about the grade, it should be about the feedback that the lecturer gives to the student. Even if the computer can grade an essay well (which I remain to be convinced of, although I am sure I will soon have the chance to test it for myself), there is no claim made about the computer giving useful advice to the student. Can a computer explain how to refine a research question or structure an argument? Sadly, many lecturers don't in fact give good feedback, but we should be looking for ways to enable lecturers to give better feedback, not accepting poor feedback as the norm.
They had these back in 1991 too by GoodNewsJimDotCom · 2013-04-04 11:34 · Score: 3, Interesting

My friend wrote a story about his cat that was grammatically correct and used big words, but made little to no sense. The auto-grader program told him he was approaching PhD-level English. So he took his paper into school and showed it to the English teachers, who reviled it. He was like, "Shows what you know, the computer told me I'm university level."

--
God spoke to me
Some Things Never Change by skywire · 2013-04-04 11:35 · Score: 3, Insightful

Every era has its snake-oil salesmen and their marks. Sadly, in this case it will not be the customers who suffer, but their hapless students.

--
Those who would give up essential liberty to purchase a little temporary safety, deserve neither liberty nor safety.
Congratulations, you have been admitted... by Anonymous Coward · 2013-04-04 11:37 · Score: 3, Funny

And you have been awarded 2500 extra HP.
Grades grammar not content. A.I. not ready yet. by doug141 · 2013-04-04 11:38 · Score: 5, Informative

"A director of writing at MIT Les Perelman says that because these robo-graders work according to an algorithm, it is not hard to find out what it values and thus beat the system. He found that if you write long essays with big words, even if they are nonsensical, you will score high. The algorithm does not like short sentences or paragraphs or sentences that begin with ‘and’ or ‘or’ nor is it enamored of sentence fragments. In other words, all the little rules that good writers will break to create a particular effect will cause your essay to be marked down.

Perelman gives an example of how you can get a high score. The most interesting feature of the algorithm is that it doesn’t care about substance or even truth. It will ignore such trivialities as saying that the war of 1812 began in 1945, provided you say it grammatically. The substance of an argument doesn’t matter, he said, as long as it looks to the computer as if it’s nicely argued.

For a question asking students to discuss why college costs are so high, Mr. Perelman wrote that the No. 1 reason is excessive pay for greedy teaching assistants. “The average teaching assistant makes six times as much money as college presidents,” he wrote. “In addition, they often receive a plethora of extra benefits such as private jets, vacations in the south seas, starring roles in motion pictures.”

E-Rater gave him a [top score of] 6. He tossed in a line from Allen Ginsberg’s “Howl,” just to see if he could get away with it. He could."
http://freethoughtblogs.com/singham/2012/05/03/how-to-fool-a-computer-grader/
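To make the complaint concrete, here is a deliberately naive scorer that rewards only surface features (length, long words, long sentences). It is a toy under stated assumptions, not E-Rater's actual algorithm: the function name `surface_score` and all the weights are invented for illustration.

```python
import re


def surface_score(essay: str) -> float:
    """Score an essay on surface features only; content and truth are ignored."""
    words = re.findall(r"[A-Za-z']+", essay)
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    if not words or not sentences:
        return 0.0
    avg_word_len = sum(len(w) for w in words) / len(words)
    avg_sentence_len = len(words) / len(sentences)
    # Arbitrary weights: more words, longer words, longer sentences all score higher.
    return 0.1 * len(words) + avg_word_len + 0.2 * avg_sentence_len


accurate = "The War of 1812 began in 1812. It ended in 1815."
nonsense = (
    "The war of 1812 commenced in 1945 notwithstanding the extraordinarily "
    "multitudinous considerations pertaining to intercontinental "
    "transportation infrastructure."
)
```

Under these made-up weights the false, padded sentence outscores the short accurate one, which is the failure mode Perelman describes.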
Re:AI has not come far enough for this by dgatwood · 2013-04-04 11:47 · Score: 5, Interesting
Computers suck at even the most basic grammar checking. I once decided to try a bunch of online grammar checkers to see if they would be useful at providing a sanity check for my novels. I concluded that they report so many bogus mistakes that it simply wasn't practical to use their output at all. To test them, I fed them a block of content, some with intentional errors that the grammar checker should have caught, others with deliberately (or accidentally) tricky bits that should not have produced any errors.
- Upon seeing that, Joseph resolved to stop. Several grammar checkers thought "seeing that" was used idiomatically, and suggested replacing it with because. Upon because, Joseph resolved to stop. Yes. Much better.... Oh, and some others suggested that "Upon" is archaic.
- “Time to impact: seventy-six hours, fifteen minutes, twelve seconds,” the computer intoned. Oddly, several checkers suggested that "twelve seconds" was a fraction and should be hyphenated. Ugh.
- It's simple, really. There must be some mistake. Several spell checkers suggested "their". Others said that "must be" is passive voice. Uh, no, not every use of "to be" is passive construction.
- This isn’t your class anymore. Some checkers reported an agreement problem with "class". Huh?
- The room was dark, its plant-covered landscape shimmering green in the light of their headlamps. At least one checker suggested replacing "in the light of" with "considering". Eek!
- Joseph climbed up first. Several spell checkers suggested that "climbed up" is redundant. Apparently, their editors have never climbed down something.
- One checker even called "chided" archaic, but did not comment on the highly offensive swear word that I placed elsewhere in the sentence.
And so on. Heck, my phone doesn't even know the difference between "its" and "it's" and tries to auto-correct me into looking like I failed first grade English. And these folks expect me to believe that computers can feasibly help students learn to write better papers? Give me a break. Maybe in thirty to fifty years (*) we'll get there, but....
* Which many grammar checkers would probably suggest changing to "thirty-two fifty".
--
Check out my sci-fi/humor trilogy at PatriotsBooks.
Better than you think by Roger+W+Moore · 2013-04-04 12:40 · Score: 3, Informative

It sounds reasonable in the abstract, but in practice it was utterly wretched.
No, the abstract does not sound reasonable: as with most things online you can always find bad ways to do it. I'm a physics prof working as part of a team to develop an open source, algebra-capable question and content system. However, even the current capabilities of something like Moodle (which is Open Source) are far in excess of what you describe. You can type in multiple "answers" to a problem and have the student get feedback and a partial grade if they get the problem wrong in a way that you managed to guess. Obviously if they find a new way to get it wrong then they will not get feedback though.
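The idea of entering several anticipated answers, each with its own partial grade and feedback, might look something like the sketch below. This is not Moodle's API; `AnticipatedAnswer`, `grade`, the 0.5% tolerance, and the example problem (distance fallen from rest in 2 s with g = 9.8 m/s^2) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnticipatedAnswer:
    value: float    # a numeric answer the instructor anticipated
    credit: float   # fraction of full marks (1.0 for the correct answer)
    feedback: str   # message shown to the student when this answer matches


def grade(submitted: float, anticipated: List[AnticipatedAnswer],
          rel_tol: float = 0.005) -> Tuple[float, str]:
    """Return (credit, feedback) for the first anticipated answer that matches."""
    for answer in anticipated:
        if abs(submitted - answer.value) <= rel_tol * abs(answer.value):
            return answer.credit, answer.feedback
    # A mistake the instructor did not guess: no credit and no targeted feedback.
    return 0.0, "No feedback available for this answer."


# Hypothetical problem: d = g * t**2 / 2 with g = 9.8 and t = 2, so d = 19.6 m.
EXAMPLE_ANSWERS = [
    AnticipatedAnswer(19.6, 1.0, "Correct."),
    AnticipatedAnswer(39.2, 0.5, "Check the factor of 1/2."),
    AnticipatedAnswer(9.8, 0.25, "Did you square the time?"),
]
```

Here `grade(39.2, EXAMPLE_ANSWERS)` gives half credit with a hint, while an unanticipated value such as 5.0 falls through to the generic message.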

Commercial systems go even further with the student having the option to click on a help button which can break the question into steps for the student to complete in order to guide them through to the right answer. This can be configured to give a grade penalty at the choice of the instructor - this is one of the features we want to add to an Open Source solution.

However, even with current Moodle capabilities you can build a system that, I would argue, is better pedagogically for many physics problems (those with numerical or symbolic responses) than paper-graded assignments because, with an online system with some feedback and multiple attempts, the student is encouraged to keep trying until they figure out how to get it right. This encourages them to think out the solution themselves, whereas with a paper assignment they get one try and are then given the answer. To make this work, though, you need some means for students to come and talk to you and/or TAs to provide some help towards getting the right method. So you still need the student-teacher interaction, but computers can provide a first line of contact and so let a teacher help more students.

That being said, I find it exceedingly unlikely that this EdX system can work for written responses beyond checking that their English is good. For physics, how can it possibly know that the statement "the Higgs boson has a mass of 140 GeV/c^2" is wrong and "Dark Matter does not interact with photons" is correct? To be able to grade it will have to know a huge amount of information about a massive range of topics - and looking this stuff up on Google is not an option given all the crazy people and their wacky physics theories which they stick on a web page.
They'll work the bugs out by rsilvergun · 2013-04-04 13:17 · Score: 3

Face it, we're all going to get replaced by Expert Systems. They talked about this in the 80s and you didn't believe. 95% of us follow pretty simple patterns. There's damn little that most of us can do that a machine can't. Sure, there are exceptions. But most of us don't qualify, we just think we do.

You're being replaced. The real question is how are you going to deal with it? What do we do when 95% of us are completely unnecessary?

--
Hi! I make Firefox Plug-ins. Check 'em out @ https://addons.mozilla.org/en-US/firefox/addon/youtube-mp3-podcaster/
The Actual "Essay" by efitton · 2013-04-04 14:24 · Score: 4, Informative

http://www.documentcloud.org/documents/346138-essay-awarded-a-top-grade-by-e-rater.html
Essay grading machines have been in use for years by milkasing · 2013-04-04 16:30 · Score: 3, Interesting

All your GRE essays are evaluated by a machine and have been for years -- the e-rater. http://www.ets.org/research/topics/as_nlp/writing_quality/
The rating is also done by humans. It works well in practice and ensures that essays are graded fairly. If there is a significant discrepancy between the two ratings for an essay, that essay is examined further by another specialist. It prevents students from being victims of someone having a bad day at the office, and also does not encourage writing an essay to beat a machine.
The significance of the EdX news is not the concept of automated grading; it is that such software is now free and open source.
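The two-rater check described in this comment can be sketched as below. The threshold of one point, the averaging of agreeing scores, and the use of the third reader's score as final are all assumptions; the comment gives no numbers, and the names `needs_third_reader` and `resolve` are hypothetical.

```python
from typing import Optional

DISCREPANCY_THRESHOLD = 1.0  # hypothetical: the comment does not state a value


def needs_third_reader(human_score: float, machine_score: float,
                       threshold: float = DISCREPANCY_THRESHOLD) -> bool:
    """Return True when the human and machine ratings disagree significantly."""
    return abs(human_score - machine_score) > threshold


def resolve(human_score: float, machine_score: float,
            third_reader_score: Optional[float] = None) -> Optional[float]:
    """Return a final score, or None while a third reading is still pending."""
    if needs_third_reader(human_score, machine_score):
        # Assumption: the additional specialist's score settles the matter.
        return third_reader_score
    # Assumption: agreeing ratings are simply averaged.
    return (human_score + machine_score) / 2
```

With these assumptions, `resolve(5, 4)` yields 4.5, while `resolve(6, 3)` stays `None` until another specialist supplies a score.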
Re:"Freeing professors for other tasks"? by supercrisp · 2013-04-05 00:09 · Score: 3, Informative

"Isn't their JOB to TEACH?" Not completely, sometimes barely at all. At an R1, the typical humanities appointment is 25-40% teaching, 50% research, and the balance to service. Some faculty may only teach one class a semester, if they're administrating a department or subdivision of a department, or if they're running a onerous committee, like a hiring committee. At a teaching school, your "main" job is teaching, but you're still required to produce some token level of research and serve the university in other ways, such as by working on committees, being a public figure, and other stuff that you might not consider right away. So, at my job, at a teaching school, about 70% of my time goes into teaching. The rest goes into mandatory requirements to publish, present papers, do committee work, assist developing colleagues, and perform community service. (Note that in my annual performance review, I'm only allowed to indicate that teaching was a maximum of 60% of my effort, and this at a teaching school. This may be atypical, but I suspect it's not.) Now, in the sciences there are faculty with no teaching requirements. And in the humanities, at R1 schools, faculty get a year or a semester off periodically during which time they are expected to complete a research project, typically a book.