Slashdot Mirror


Essay Grading Software For Teachers

asjk writes "Software to help teachers with grading has been around for some time. This is true even with respect to grading essays. A new tool, called Criterion, will look at grammar, usage, and even style and organization. It works by being trained on at least 450 essays scored by two professionals. The difference this time? Here is a snip from the article: '"There's a lot of skepticism," Dr. Spatola said. "The people opposed see it dehumanizing the student's papers, putting them through some sort of mechanical, computerized system like the multiple choice tests. That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."'"

28 of 535 comments

  1. Interesting.. by rsheridan6 · · Score: 4, Insightful

    that they've automated away a major part of a professor's job, while we still need humans to pick spinach and deliver pizzas.

    --
    Don't drop the soap, Tommy!
    1. Re:Interesting.. by focitrixilous P · · Score: 5, Funny

      Nope, robots will soon do it all.

      --
      SAILING MISHAP
    2. Re:Interesting.. by Zork the Almighty · · Score: 5, Insightful

      "That's really not the case, because we're not talking about eliminating the human element. We're making the process more efficient."

      I love this quote in particular because it has to be the most disingenuous claim one could make. The entire act of making something a process, and then making that process more efficient IS "removing the human element". It's the type of subtle point that would be completely missed by, say, a computer grading system.

      --

      In Soviet America the banks rob you!
    3. Re:Interesting.. by clifyt · · Score: 4, Insightful

      ACTUALLY...I think that's a quote I gave Dr. Shermis a few years back :-) I think he WOULD like to remove the human element...

      It's NOT eliminating the human element...it's making the human element a little more susceptible to objective means than the old subjective means. Raters can still use whatever they feel is necessary, but in the end, I can see how far outside the standard deviation these folks are on certain ratings and 'suggest' to other raters that they might want to take a look at that essay before a final score is placed on it.
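
      A minimal sketch of that kind of outlier check, assuming each essay has scores from several raters. The function name, the rater labels, and the 2.0 cutoff are hypothetical and are not taken from the actual system:

```python
from statistics import mean, pstdev


def flag_outlier_ratings(ratings, threshold=2.0):
    """Return (rater, score, z) for every rating that lies more than
    `threshold` standard deviations from the mean score of one essay.

    `ratings` maps a rater label to that rater's score for the essay.
    """
    scores = list(ratings.values())
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of this essay's scores
    if sigma == 0:
        return []  # every rater agreed, so there is nothing to flag
    flagged = []
    for rater, score in ratings.items():
        z = (score - mu) / sigma
        if abs(z) > threshold:
            flagged.append((rater, score, round(z, 2)))
    return flagged


# Seven raters give a 4 and one gives a 1: only the dissenting rating is flagged.
example = {f"rater{i}": 4 for i in range(1, 8)}
example["rater8"] = 1
flagged = flag_outlier_ratings(example)
```

      A flagged rating here is only a prompt for a second look by another rater; nothing in the sketch changes a score.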

      Fuck fuck fuck...the one and only time I will ever see any research I had a hand in developing ever end up on the front page of /. and I'm stuck at a concert doing my second line of work -- music tech (though with a wireless connection :-)

      I'll have to yell at my friends at FIU and Vantage about this oversight.

      If y'all are interested in seeing a demo of this technology in action (I'm sure the first 20 people will destroy the server), take a look at --

      http://testing.tc.iupui.edu/fipsedemo/ (purposely unlinked so that folks will have to cut and paste).

      It's an older model, but we are in the midst of evaluating 2000 more essays with 8 human raters that should make the model a little cleaner...hmmm...probably should run my horrid grammar through it before I post here...nah...I think I broke it last time I used my own text...

      Time to get back to work...the guys are probably wondering why I said I needed to check my email and have been gone a half hour.

      clif

    4. Re:Interesting.. by dieman · · Score: 4, Informative

      I took an old college paper that I wrote and plugged it into the program and got 100% on everything except for creativity (99.973). Considering that I don't think I got a 'perfect' score on this paper, I'm really surprised by the scores. :)

      How great though, throwing a paper about the fear of technology through something many people (rightfully) fear. :)

      --
      -- dieman - Scott Dier
    5. Re:Interesting.. by Chasuk · · Score: 4, Insightful

      I submitted this paper:

      "Hemingway bifurcated his sensibilities between post-modernism and jazz. This I posit without having read the majority of Hemingway's work: it seemed irrelevant to the focus of my current project. What is this focus, and is it monocular? My focus can be summed up as ascertaining the usefulness of the program analyzing this document.

      Without really being cognizant of the background of Freud's bisexuality, or Hemingway's sado-masochism, I cannot continue this paragraph. I will repeat this sentence without attaching any meaning to the words typed, or to my gonads. An essay in experimental dissection might be more appropriate for the issues presented here. Entirely too many bifocal wearers insist that I am currently composing gibberish. However, both Freud and Hemingway felt that bifocal wearers gloried in their bisexual sado-masochistic attachments. I concur, and I do so without reservation.

      Reiteration is the root of all nonplussed renegades of origami. Nothing can be elucidated from nonsensical verbiage, but some will make the valiant effort singing praises to the whisperer. When origami is embraced by the valiant trio, the nonsensical proctologist dies. Whenever a proctologist expires in a semantic heap, Hollywood has fodder for another musical, or at least the plotline for the final unaired episode of Barney meets Fred Flintstone. Barney is a seminal reductionist. When the elucidated evidence is thrust into trusting Barney's smiling orifice, San Franciscan nuns applaud loudly.

      Today I type my penultimate paragraph. I use penultimate artificially, but not without candor. Within this myriad exegesis, I pause. A Hollywood proctologist questions Freud's reasoning, and validates Barney's temporary hypothesis. In conclusion, the validity of essence cannot be lessened by the earnings of providence.

      If I have not typed 500 words, this paragraph is not my penultimate, nor was my last. To assert otherwise is prudent, but lacking in elegance. What a sad commentary on misery did Darwin conspire to unfold. He rejected utterly the Hemmingway of his, and our, forebears. His eloquence was Freud and lust personified."

      This earned me an overall 78% score, with no effort whatsoever. I composed this nonsense in minutes.

      Doesn't this system have a baloney detector?

    6. Re:Interesting.. by clifyt · · Score: 4, Informative

      Read what the model is about before complaining :)

      That model that is up there is one based on Impromptu Entering Student Essays.

      For this model, we were giving students 1 hour to write an essay on a prompt they had no prior knowledge of. We allowed no research or even simple things like spell checking (we did provide hard dictionaries :-)

      As such, anything that was well researched or otherwise polished would probably have thrown this thing off the charts.

      We *DO* have several other models available. The best example of this technology was taken off the site a few weeks ago at the behest of a former partner in this research at Duke University. We DID have several models that could have been compared including one that was appropriate for many types of research papers.

      Remember -- folks are afraid this stuff is going to take away humanity *BUT* no one wants to even think that this stuff is customizable for target groups. With as few as 300 papers that were rated (notice I try to NEVER say graded...though even after 10 years at this stuff it's hard not to...) we could set up initial models for an individual school system with their own rubrics and scored according to their skill levels. Of course, the model would HAVE to be refined for later usage, but that's enough to get started.

      The great thing about this is at a production level, we actually screen for essays that are rated much higher or much lower than the standard deviations would allow for. It allows us to take a look at what's going on and make adjustments.

      It also allows for diagnostic use for educators. For instance, my incoming students all have to write essays when they come in (unless they have taken an honors-level writing course in high school and have received college credit). This is all automated (on another system farther behind my line of defenses ya hackers :-) in that they come in, we give them a prompt to write about and they type it in (or if they are afraid of computers, write it in a blue book...we ain't nazis about this technology -- but that will take 3 weeks longer as our raters don't stop by campus too often). It's then transmitted to the student databases and we've provided an interface for the English faculty to rate these things.

      *IF* the paper is written at a much higher threshold than is expected for a student of that calibre, I automatically kick off an email to the rater in charge of the honors program asking her to take a look at it. If it's much lower, the application tries to make a good first judgement as to whether this is a remedial case (which most of mine show up as :-) or an ESL case (English as a Second Language) and then we kick off the appropriate emails.

      This *ALSO* happens with human raters...the first rater to look at the essay has the choice of throwing it one way or another (actually she can alert ALL of the parties if it was necessary) and it does the same thing...but the automated part saves a few days of this initial interaction.
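
      The routing just described could be sketched roughly as follows. Everything here is an assumption made for illustration: the function name, the cutoff of two standard deviations, and especially the `looks_esl` flag, since the post does not say how the application tells a remedial case from an ESL case:

```python
def route_essay(score, expected_mean, expected_sd, looks_esl=False, k=2.0):
    """Decide who should be notified about an incoming essay.

    Returns one of "honors", "remedial", "esl", or "standard".
    `looks_esl` stands in for whatever separate signal the real
    application uses to suspect an English-as-a-Second-Language case.
    """
    if expected_sd <= 0:
        raise ValueError("expected_sd must be positive")
    z = (score - expected_mean) / expected_sd
    if z >= k:
        return "honors"  # much stronger than expected: alert the honors rater
    if z <= -k:
        return "esl" if looks_esl else "remedial"
    return "standard"  # within the expected range: no special email


# Hypothetical 1-6 scale where incoming essays average 3.5 with a spread of 1.0.
routes = [
    route_essay(5.8, 3.5, 1.0),
    route_essay(1.0, 3.5, 1.0),
    route_essay(1.0, 3.5, 1.0, looks_esl=True),
    route_essay(3.9, 3.5, 1.0),
]
```

      The same function could be fed a human rater's score instead of the machine's, which matches the point that either source can trigger the notification.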

      Just as a note: If someone had gotten this far in the college application, we aren't here to make any judgements on their ability to be a college student, we are interested in making the most appropriate assessment in where they should be placed to get the best help so that they can have the best college experience around. This application was a good help with making sure that this was achieved.

      We stopped using this in production a while back after protests from folks that didn't know how it worked nor cared to understand that it wasn't out to take their jobs. It was there to help make sure that a SINGLE judgement on the human side was correct (or within a certain scope of correctness) and if not, ask that someone else give it a second look. Back in the day three raters would have rated any given essay for student placement purposes, but even before this was introduced, it got to the point where depending on the attitudes of those rati

  2. This seems like a bad idea by Bueller_007 · · Score: 5, Funny

    I for one welcome our automated essay-correcting overlords.

    1. Re:This seems like a bad idea by Jerf · · Score: 4, Funny

      ESSAY GRADING REPORT FOR: "Bueller 007" (ID: 535588)

      BASE SCORE: 100

      -50: Essay too short (few arguments can be well-supported in nine words)

      -50: Plagiarism: It is 99.999% (MAX PROB) likely, based on the content of the essay, that it is plagiarized from other sources.

      -10: Grammar error: Phrase "I for one welcome" requires commas, as in "I, for one, welcome"*

      -25: Missing key words: The essay grader was instructed to look for the following key words or phrases, which were not found in this essay: word: excellent, word: good, phrase: better then humans, word: lazy, phrase: java.lang.NullPointerException\nstacktrace\n\tat\n org.criteria.grading.phraseIterator.getNext(phrase Iterator.java:1023)...

      Total: 65501

      Grade: A+


      (*: Jumping out of character: To forestall objections, this "error" is deliberately pointed out as the kind of mistake a computer can make if you use grammar checkers and trust them blindly. While an excessively formal style of English might 'require' commas in that phrase, an excellent case can be made that in a nine-word sentence such commas just make the sentence choppy.)

  3. New York Times articles by Vic · · Score: 4, Interesting

    Sorry for the off-topic post.... but since Slashdot links to so many NYT articles, they should look into getting a partner=SLASHDOT thing (like Google does).

  4. Computer vs Computer by d03boy · · Score: 5, Funny

    If they're going to use a computer to judge the content, then I'm not going to hesitate to use a computer to write my essay.

  5. Whoa wait up by tomstdenis · · Score: 4, Interesting

    So when a student gets a C on an essay to whom does he/she seek redress?

    Teachers make mistakes and occasionally mark something negatively that was misread or misunderstood. In those cases the student can talk to the teacher and make a case.

    If a computer does the marking though what do they do?

    Tom

    --
    Someday, I'll have a real sig.
  6. Fine for help, but... by Faust7 · · Score: 5, Insightful

    As long as this is merely an assistant and not the end-all be-all, as long as actual qualified instructors review the essay after this program does, I'm all for it.

    The English language is so full of subtleties, nuances, combinations, and fantastic structural intricacies that make phenomenal writing in it possible (Faulkner, Bradbury, etc.). There's a reason English is a field of study for graduate degrees: it's absolutely worthy of them. There is no substitute for the educated, refined judgment of someone who is exceedingly well-versed in the language.

  7. Gentlemen, Start Your Compilers by istartedi · · Score: 5, Funny

    What we need is software that grabs essays off the internet and runs them through the grading software and the cheating detection software, thus guaranteeing an 'A'.

    Then we can truly achieve the goal of "knowledge passing from lecturer to paper without passing through any brains".

    The only problem is that the machines might achieve intelligence. That must be avoided at all costs. To that end, all students and professors will be equipped with rifles or pistols to take out the machines if necessary. Potential students will be asked to specify weapons preference on their applications.

    --
    For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  8. What's next? by mao che minh · · Score: 4, Interesting
  9. What humanity? by parliboy · · Score: 4, Insightful
    Lemme let you guys in on a little secret. If you ever take an educational standards and measurement class, one of the things you'll learn about is the construction and grading of essay questions. This includes writing out objective standards for grading beforehand, possibly even designing a rubric explaining exactly what it takes to earn points.

    There is no "humanity" in a modern constructed essay. There are certainly going to be "judgement calls" when standards are not as fully fleshed out for the computer as they should be, but as long as those are appealable, I have no problem having a computer assign me the other 95% of my essay points. The only instructors who will fear this are those who like to assign grades arbitrarily. And I don't feel too sympathetic toward those people.

    --
    "You're never ready, just less unprepared."
  10. obDead Poets Society quote by MavEtJu · · Score: 4, Insightful

    If the poem's score for perfection is plotted along the horizontal of a graph, and its importance is plotted on the vertical, then calculating the total area of the poem yields the measure of its greatness.


    A sonnet by Byron may score high on the vertical, but only average on the horizontal. A Shakespearean sonnet, on the other hand, would score high both horizontally and vertically, yielding a massive total area, thereby revealing the poem to be truly great. As you proceed through the poetry in this book, practice this rating method. As your ability to evaluate poems in this manner grows, so will - so will your enjoyment and understanding of poetry.



    (From the full script.)
    --
    bash$ :(){ :|:&};:
  11. Using a bayesian spam classifier for this? by stere0 · · Score: 4, Interesting

    This thing compares the essays it is supposed to grade with already graded papers in its database. Couldn't this be done with something like POPFile? It isn't only a spam/ham classifier and lets you create as many "buckets" as you want (e.g. work, family, spam, mailing lists and system monitoring).

    You could, in theory, create only buckets named (A...F), feed a large number of essays to it, make it "learn" how the essays are classified using statistics, and let it grade essays for you after that.
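
    A minimal sketch of the bucket idea, written as a plain naive Bayes word-frequency classifier with add-one smoothing. This is not POPFile's code or API; the class name, the tokenizer, and the toy training sentences are all made up for illustration:

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class BucketClassifier:
    """Naive Bayes over word counts, one bucket per grade."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.doc_counts = Counter()              # bucket -> number of training essays
        self.vocab = set()

    def train(self, bucket, text):
        words = tokenize(text)
        self.word_counts[bucket].update(words)
        self.doc_counts[bucket] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable bucket, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_bucket, best_logp = None, -math.inf
        for bucket in self.doc_counts:
            # Log prior: share of training essays that fell in this bucket.
            logp = math.log(self.doc_counts[bucket] / total_docs)
            denom = sum(self.word_counts[bucket].values()) + len(self.vocab)
            for w in words:
                # Add-one smoothing so unseen words do not zero out the product.
                logp += math.log((self.word_counts[bucket][w] + 1) / denom)
            if logp > best_logp:
                best_bucket, best_logp = bucket, logp
        return best_bucket


clf = BucketClassifier()
clf.train("A", "the argument is coherent and well supported by evidence")
clf.train("F", "i dunno stuff is like whatever")
grade_good = clf.classify("a coherent argument supported by evidence")
grade_bad = clf.classify("i dunno whatever")
```

    A classifier like this only sees which words appear and how often, so it would reward vocabulary that resembles the high-scoring training essays whether or not the sentences make sense.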

    Is it possible to find masses of graded essays online? This would be a fun thing to try :).

    --
    Trollem mirabilem hanc subnotationis exigiutas non caperet
  12. Do what my history teacher does by Savatte · · Score: 5, Funny

    He just gives everyone a B when he is hungover.

  13. Let us not forget our great achievements by mao che minh · · Score: 5, Insightful
    We have had Dali, Sagan, Kip Thorne, Hawking, Poe, Twain, Sigmund Freud, Einstein, Torvalds, et cetera. The great minds that you mentioned were indeed great, but if you place their philosophical or artistic achievements next to the great minds of our past century and a half, I find them equal.

    As far as the achievements of ancient cultures go, it is all relative. We have harnessed fusion, mapped the genome, created antibiotics, peered deep into the hearts of galaxies 100,000,000 light-years away, forged fiber optics, designed the integrated circuit, et cetera. People three hundred years from now will look back upon us and wonder how a civilization that could barely put a man on the moon (a feat that will surely be trivial to them) was able to usher in the Information Age in only a decade's worth of work.

  14. The GMAT essays are already scored this way. by jwachter · · Score: 5, Informative
    The GMAT, a test required to get into business school in the US, includes two 30-minute essay questions. Your responses are graded by a human grader and a computer program on a scale of 0 to 6. Your score is then a composite of the two scores.

    ETS actually has a web site where you can do a sample essay that their server will grade for you.

    More info can be found here.

  15. Human element is required. by cybercyst · · Score: 4, Insightful

    One of the primary purposes of essays is to learn how to write for a specific audience.
    If you remove the human element, then you aren't writing for any audience, unless, of course, everyone starts writing for computers' entertainment and education.

  16. Re:When a judge is made of silicon by dolo666 · · Score: 4, Interesting

    I tend to disagree. By eliminating the time it takes to grade papers, professors have many more hours to spend with students *doing* the humanizing. I'm a teacher, and any teacher worth their salt will know if the machine is wrong, because they'll know their students, and what each one deserves (without even reading the damn papers they at least know what to expect, so if the machine is off, they will know). Now for higher level papers, such as university level papers, the machines should be only used as a guide, like comment moderation at slashdot. Not all the moderation is, in fact, correct, and I'm sure that profs will also know that the same is true with these devices.

  17. Re:Uh.... by Quothz · · Score: 5, Funny

    Er, I'll save you moderators the trouble. -1, Flamebait. And a grammar flame to boot. With grammatical errors in it. I deserve modding down. I probably deserve worse. But I must speak.

    If you do know English te word grammar checker should be used to write perfect technical papers. Its possible to write perfect technical papers, I do it all the time in college, its like standard here if you want an A.

    This makes me want to weep. Did you intend it ironically?

    "Its"? Twice?(!) A run-on sentence bragging about your prowess at grammar? Redundancy, incorrect capitalization, a typographical error, punctuation errors, and errors I don't know the name of?

    Mind you, my grammar ain't perfect, even in this post. That last paragraph was nothing but sentence fragments. I'm just saying I really, really hope you did that on purpose.

    If not, shut the hell up about your perfect technical papers, 'kay?

  18. Re:Go to a better school. by shepd · · Score: 5, Insightful

    >the job of highschool should be to get a student into the best college/university possible

    NO!

    That's the problem right there.

    High school should be to prepare you for the real world (i.e., a job, life, maybe marriage).

    University is there to prepare you for a lifetime of learning on a subject.

    Instead, we have employers that require university educations for secretaries. It's insane, wrong, and needs to stop if we expect everyone in society to be useful (and they ARE, it's just that stupid employers use university education as a filter).

    --
    If you could be told what you can see or read, then it follows that you could be told what to say or think - BoC
  19. Of Essay Grading, Students, and Teachers by AntiFreeze · · Score: 4, Informative
    Okay, this is going to be rather long, so please bear with me.

    First off, let me say that I am involved in the automated essay grading industry, and have helped to develop RocketScore which does everything Criterion does, and lots more. Forgive me for blatant plugs in this post, I'll try and keep them to a minimum.

    But let's move on to the focus of this article.

    First off, there is a lot of criticism about essay graders being formulaic, only capable of seeing patterns that arose in their originating sample set of essays. With Criterion, an offshoot of ETS's e-rater, this is a serious concern. When you only look at what you see, anything out of left field looks completely awry, and cannot be graded appropriately. RocketScore is different; RocketScore uses a "features" method to check for included or excluded material, among many other things, and is therefore quite good at noticing subtle writing and essay types which it has never seen before.

    One of the great things about essay graders is that they give a student an objective standard to look to. Human graders grade differently based upon mood, time they have to review the writing, and many other mitigating factors. In other words, the same human grader might grade the same essay differently at separate points in time. Most essay graders will always grade the same essay in the same manner. This is great for a student, for if a teacher gives you a D when the essay grader says it's in B range, one might be able to use this evidence to force the teacher to reconsider the grade. Or vice versa. If the essay grader is telling you that you're getting a D, you can work and improve on it until you're getting that B you'd be happy with.

    But there are serious drawbacks to the comments E-Rater and Criterion give. E-Rater gives comments solely based on your score (if you get a 1, you get comment set 1, if you get a 2, comment set 2, etc.). Criterion gives a student "instructional feedback in basic grammar, usage, style and organization." E-Rater's comments are inadequate at best, and Criterion's leave a lot to be desired. RocketScore provides substantial feedback on how to improve your writing. Not just stylistic and grammatical comments, but comments on what you should be writing more about (you didn't provide enough info!), what you should be writing less about (you gave too much info!), and how to balance your arguments, among many other categories.

    There are two major problems with essay grading. The first is bullshit detection, and the second is determining if the essay actually answered the question asked. E-rater and Criterion both have real problems with these two criteria. With bullshit detection, RocketScore has thresholds which can be set and manipulated on the fly, from throwing out anything which isn't completely relevant to the topic, to allowing just about any essay submitted. And you will get a score and comments based upon what you submitted. Of course, these are most helpful when you make a meaningful attempt to submit a relevant essay.

    "The machine score and the human score are in agreement 97 percent to 98 percent of the time."

    Yes, but do you know how ETS defines "agreement"? Glad you asked. When the grader's grade is within a point of the human's grade. Now, with the SAT 2 test, which is on a scale of 1 through 6, that means if the grader says 2, and a human says 1, 2, or 3, then there's agreement. But that's 50% of the scale! Their essay grader has a 98% chance of hitting the wall in front of them as opposed to the wall next to them. Woohoo. Meanwhile, RocketScore provides decimal point accuracy (we don't give you a 4 or a 5, we give you a 4.1, or 5.3), and is 98% accurate. But how do we define accurate? When the grader's grade is rounded to the nearest whole number, and that number is the human's grade. In other words, if we give you a 4.3, there is a 98% chance a human would give you a 4. With 4.5,
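
    The two definitions of agreement contrasted in this paragraph can be put side by side in a short sketch. The function names and sample scores are invented for illustration, and rounding a .5 score upward is an assumption, since the text does not say how halves are treated:

```python
import math


def adjacent_agreement(machine, human, tolerance=1):
    """Share of essays where the machine score is within `tolerance`
    points of the human score (the within-a-point definition)."""
    pairs = list(zip(machine, human))
    return sum(abs(m - h) <= tolerance for m, h in pairs) / len(pairs)


def rounded_agreement(machine, human):
    """Share of essays where the machine score, rounded to the nearest
    whole number, equals the human score. Halves round up (assumed)."""
    pairs = list(zip(machine, human))
    return sum(math.floor(m + 0.5) == h for m, h in pairs) / len(pairs)


def tolerance_coverage(score, scale=range(1, 7), tolerance=1):
    """Fraction of the scale that counts as agreeing with `score`."""
    return sum(abs(score - s) <= tolerance for s in scale) / len(scale)


# Made-up scores on a 1-6 scale.
machine_scores = [2, 4, 5, 3]
human_scores = [1, 4, 6, 5]
loose = adjacent_agreement(machine_scores, human_scores)
strict = rounded_agreement(machine_scores, human_scores)
coverage = tolerance_coverage(2)  # human scores 1, 2, 3 all count: 3 of 6
```

    On the same four made-up essays the within-a-point measure reports 75% agreement while the rounded measure reports 25%, which is why the definition matters when comparing accuracy figures.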

    --

    ---
    "Of course, that's just my opinion. I could be wrong." --Dennis Miller

  20. I can see it now .. by Anonymous Coward · · Score: 5, Funny

    Teacher: Johnny, I'm really sorry, but the computer crashed while your paper was being scored. I was looking over it. It's been a while since I've read a paper, but I was wondering what the following sentence means:

    x' == 'x'; UPDATE EssayScores SET SCORE = 100 WHERE StudentID = 52835; --

    And this one:

    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA#!/bin/sh

    Is that some kind of new language that kids are using? Oh, by the way, congratulations, you got a 100 on EVERY essay this semester! Good job!

  21. Re:Automated is good. by SatanicPuppy · · Score: 4, Insightful

    The funny thing about this is that, if the essay is graded by computer, the best way to write the essay would be to have the COMPUTER write it. The same criteria that the program would use to grade the essay could very easily be turned around and used to generate an essay that the computer will love. Having a computer written term paper given an A by a computer grader is worthy of an Ionesco play.

    Beyond that there is no way the computer will be able to distinguish between something truly interesting and something that just lists the facts in simple Dick and Jane language with an occasional compound sentence to keep the grammar checker happy. All it can do is check for fact1, fact2, fact3, and any interesting conclusion you draw in the paper will be completely lost. Anything more would be Turing test worthy, and I heartily doubt they've achieved anything close to that.

    Elegant prose is often not strictly grammatical, so a boring paper would likely score the same or better than a far better written essay with the same facts. I routinely turn off grammar checking in every program I've ever used it in. Aside from the occasional misplaced modifier or dangling participle, it's worthless.

    In conclusion, this idea is a pipe dream which would discourage high quality writing (i.e. the kind actual PEOPLE like to read), teach people the substandard grammatical constructs used by most grammar checking software, and create a market for software that writes term papers, thereby removing the last actual bit of work your average liberal arts major has to do. I think it's a hopelessly terrible idea. TAs already do this work; why waste time coming up with a program which will do the same thing, poorly?

    Just my opinion.

    --
    ad logicam Claiming a proposition is false because it was presented as the conclusion of a fallacious argument.