Computers Could Grade Essay Tests Better Than Profs
An anonymous reader writes "Robot essay graders could be the answer to grade inflation. New software being tested turns over the task of grading to computers — this article has an interactive demo of the software. One professor says the computer is far fairer than human graders, who get tired and become inconsistent, or play favorites."
I once got an F on a paper from a TA who wrote in the margins "How dare you try to say what Shakespeare was thinking!" Um, that's what literary analysis IS, to some extent. You try to place someone's written works within the context of their culture and society at large and reconstruct their thought processes and views on the world. But that TA was an asshole and had it out for me, and many of us complained about him bitterly for years afterward. The only person who got an A in that entire section was one cute girl.
As long as the robo-grader also includes a plagiarism check, I'd be okay with it. My husband is a professor and most of his failed papers are a result of TurnItIn.com catching outright plagiarism.
Occasionally living proof of the Ballmer peak.
I had a prof in literature who only graded well if you made your critical essay about sexual imagery. At one point I gave up trying to "be me" and went whole hog, way overboard, almost parodying the oversexualized essay. And I scored an "A" for the first time. Lesson learned? Sometimes it's OK to tell the boss what he wants to hear and do it his way, as long as it doesn't cost you anything, and nobody gets hurt. And, of course, life's not fair.
I can't say if a computer is better than a human at marking, but in my engineering subjects, when my name was on the test papers I did not get very good grades (actually at least a grade lower than expected). But as soon as all the students were given anonymous numbers, the grades went up. Conclusion: the staff could no longer decide to give better grades to their pet students. So in theory, there could be many students who get better grades because there is no more favouritism.
Take Nobody's Word For It.
My essay grades in college humanities courses were terrible until I started trying to figure out the political slant of my professor (or TA if the TA is the grader) and wrote papers supporting those views (and to be fair, those views weren't always left-leaning ones). I went from a C paper student to a low-A paper student in the blink of an eye.
Consistency is a fair point, but playing favorites? Isn't this what anonymous marking codes/IDs are for? (Or at least, that's what happens in the majority of universities in the UK)
but it really needs to check for plagiarism. I saw a load of it up at Colorado State.
In addition, it would ideally be able to handle lab books. I remember grading micro-bio 201 lab books back in the 80's, and I was getting tired after the first 30. The second 30 was a pain. The last 30, well, we finished the grading at a pizza joint over beer. I suspect that was how grade inflation happens.
I prefer the "u" in honour as it seems to be missing these days.
The GRE has used software to grade the essay portion for quite a while, alongside a human grader. If the two scores differ by a point or more, the essay is forwarded to another human grader and the final score is the average of the three scores.
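The rule described above can be sketched in a few lines. This is only an illustration of the comment's description, not the testing service's actual procedure: the function name, the `threshold` parameter, and the choice to average the two scores when they agree are all assumptions.

```python
from statistics import mean
from typing import Callable


def final_essay_score(
    human: float,
    machine: float,
    second_human: Callable[[], float],
    threshold: float = 1.0,
) -> float:
    """Combine a human and a machine score as the comment describes.

    second_human is only called when the first two scores differ by
    `threshold` or more, so the extra grader is consulted only on
    disagreement.
    """
    if abs(human - machine) < threshold:
        # Assumption: when the scores agree, report their average.
        return mean([human, machine])
    # Disagreement: bring in another human and average all three scores.
    return mean([human, machine, second_human()])


# Scores agree closely, so the second human is never consulted.
print(final_essay_score(4.0, 4.5, second_human=lambda: 0.0))
# Scores differ by a full point, so a third score is averaged in.
print(final_essay_score(4.0, 5.0, second_human=lambda: 3.0))
```

Passing the second grader as a callable keeps the cost argument visible: the extra human is only invoked on the disagreement branch.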
That cuts the cost of running the exam, considering the cost of incurring an extra human grader.
It will soon pop up everywhere at university level, when the budget cuts are everywhere.
New Economic Perspectives
Not who, what.
Blank until
The best student. Duh.
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
What's up with the mass media headlines? Reading the summary actually makes me dumber. It talks about "computers" like they are sentient and grade the tests themselves. Having professors first strictly define the rules, enter them into software, and have a computer evaluate those rules is still "professors grading the essays". It's self-evident that the grading is better if it's more strictly defined.
Wow, I can build a house faster with this hammer. Headline: Hammers Could Build Houses Faster Than Construction Workers (In Cyberspace)
Unless they've made some impressive advances in natural-language interpretation in the past few years that haven't trickled out into other products, I'm a bit puzzled as to how this scheme is supposed to work.
Even the (comparatively much easier) tasks of spelling and grammar checking result in a fairly steady stream of mistakes from computer systems. I can't exactly summon much optimism for the likely outcome of such a system trying to distinguish between a paper with a well supported thesis and a paper that contains some declarative statements, a few quotations, and the word "therefore" at intervals.
On the plus side, it should be pretty trivial to get the machines to do the same lousy job without the slightest consideration of the student's name/status/cuteness/willingness to flatter the professor; but what use is purely objective execution of lousy work?
And the IBM system gives an F for saying Toronto is in Canada. I say let the computer help, but make it so there is no AUTO FAIL and have a real person review at least some flagged papers.
As someone who never affected the curve or caught the affection of a teacher, I welcome our new digital grader overlords.
Having to work for a living is the root of all evil.
I wasn't aware we now have access to AI this advanced. Spell check and (maybe) grammar check are reasonable, but how does a computer assess a student's understanding and mastery of a topic? How does the computer recognize originality, creativity, or intuitive leaps? Can the software recognize an effective argument, a convincing solution?
I'm a geology and earth science professor. When I give writing assignments, I'm usually more interested in the content than the mechanics. I'll tolerate a few spelling and grammar mistakes if the content of the essay or paper demonstrates that the student understands concepts presented in class and, even better, is THINKING about the implications.
For intro. writing classes, where grammar and structure are the point of the assignment, computerized grading is understandable; especially if your school has you teaching classes with more than 50 students (which is another issue entirely). But, in my experience at least, proficiency at writing is not always directly correlated with proficiency at class material.
But the objectivity of the grades has nothing to do with the problem of grade inflation. Professors intent on inflating grades will simply reduce the weight of tests as part of the overall grade and count class participation, homework, etc. more /or/ add a flat number of points across the board to the results of the computer scored tests.
Grade inflation, after all, isn't simple bias. We're not speaking of professors grading up people (or views) that they like and grading down people (or views) that they dislike. Rather we're speaking of professors that systematically give higher grades than they ought for one reason or the other. Some do this for ideological reasons. Others do it because they're tired of fighting students (or parents) that complain. The end result is an 'A' no longer means 'excellence in performance' but is pretty much the default grade for anyone that does a moderate amount of work.
I'm not arguing that this is a good or bad idea, but it won't do anything to change grade inflation. In my experience (as a TA for a number of different classes), college professors look at the point totals at the end of the semester and determine the letter grade cutoffs by hand so that they have the grade distribution they want. I'm not saying they're going through and making sure specific students get a particular grade, just that they want, say, 50% A's, 30% B's, and 20% C's, and they'll put the cutoffs where they need to be for that to happen. Just because the essays are graded tougher doesn't mean they can't still give half the class an A.
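To make the point above concrete, here is a rough sketch of setting cutoffs from a target distribution. The function names and the sample point totals are made up for illustration; the 50/30/20 split is the one from the comment. Because cutoffs follow the ranking, lowering every total by the same amount leaves the number of A's unchanged.

```python
def cutoffs_for_distribution(totals, targets):
    """Return {letter: minimum total} so each letter gets its target share.

    totals  -- list of end-of-semester point totals
    targets -- list of (letter, percent) pairs, best grade first
    """
    ranked = sorted(totals, reverse=True)
    n = len(ranked)
    cutoffs = {}
    cumulative = 0
    for letter, percent in targets:
        cumulative += percent
        # Position of the last student who should still get this letter.
        last = min(n, max(1, round(cumulative * n / 100))) - 1
        cutoffs[letter] = ranked[last]
    return cutoffs


def letter_for(total, cutoffs):
    """Map a point total to a letter; anything below every cutoff is an F."""
    for letter, minimum in cutoffs.items():
        if total >= minimum:
            return letter
    return "F"


targets = [("A", 50), ("B", 30), ("C", 20)]
totals = [95, 91, 88, 84, 80, 77, 73, 70, 66, 60]  # hypothetical class
tougher = [t - 10 for t in totals]                  # same class, graded harder

for scores in (totals, tougher):
    cutoffs = cutoffs_for_distribution(scores, targets)
    a_count = sum(letter_for(t, cutoffs) == "A" for t in scores)
    print(cutoffs, "A's:", a_count)
```

Tied totals at a cutoff would all receive the higher letter in this sketch, so the realized shares can drift slightly above the targets.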
AIs to grade the papers I would assume would result in some folks developing AIs to create the papers...
First, for those who didn't read TFA, computers play only a small role on a handful of essays. Most of the article is in reference to having a 3rd party grade anonymized tests, rather than leaving it to the professor or TA. During college, I had a job as one of those graders.
We worked for five hours a day in the evening, though we could leave early and get the full pay if we finished all our papers. Most of the tests would be on general topics, but occasionally we'd get tests that required specific knowledge. In those cases, only qualified graders could review them, and we were given cheat sheets to make sure we didn't make factual mistakes. Essays were generally graded on a 1-5 scale (or a 0 if the essay was a blank page or similar). Each essay would be graded by two people, with a third breaking the tie in the event of a disagreement. However, we were trained to be extremely consistent in the grading, so disagreements were rare and never more than a one point difference.
A few times a day, we would get fake essays intended to test our grading skills. For example, an essay that was supposed to be a perfect example of a 4 would be given to you with all the rest. If you gave it a 4, you get +1 point. Give it a 3 or 5, you get zero points. Give it a 2 or less, and you lose a point. If you accumulate a lot of points, you get a bonus up to 50% of your pay. If your total score goes too negative, you get fired.
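The calibration scheme just described amounts to a small scoring function. The sketch below generalizes the single example in the comment (a planted "4") to "exact match, off by one, off by two or more", which is an assumption; the bonus and firing thresholds are not given in the comment, so they are left out.

```python
def calibration_points(expected: int, given: int) -> int:
    """Points a grader earns on one planted essay with a known score."""
    diff = abs(expected - given)
    if diff == 0:
        return 1    # matched the reference score exactly
    if diff == 1:
        return 0    # one point off: no reward, no penalty
    return -1       # two or more points off: lose a point


def running_total(results):
    """Sum calibration points over (expected, given) pairs."""
    return sum(calibration_points(e, g) for e, g in results)


# A hypothetical day: two hits, one near miss, one bad miss.
print(running_total([(4, 4), (2, 2), (4, 5), (4, 2)]))
```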
It was a pretty good job, as crappy part-time "work your way through college" jobs go. The best part was whenever we got to grade essays by little kids. They were harder to score accurately -- it's hard to look past the abysmal handwriting and frequent misspellings. But they were frequently adorable and unintentionally hilarious.
I'm a grad student. My supervisor isn't allowed to ask me to grade work for him, or prepare lecture material for him. Some of them do that, but we have a union that allows us to push back against it. I am also paid as a teaching assistant (which is not guaranteed for all grad students); for that I am to work no more than 140 hours in a semester, which is supposed to be roughly 10 hours per week. Part of being an instructor (which I have also done) is setting assignments that can be graded in the hours you have available to you. If you have 1 TA, yourself, and 25 students, don't set 100 page papers. I taught a graduate/4th year computer science course, so the TA needed to be trained up (that counts against his 140 hours), we had meetings, he had office hours (all counts against his hours), and he took care of some stuff with IT (counts against his hours). In the end he had about 70 hours for marking. 5 assignments per student + an exam. So he had about half an hour per student per assignment to mark. That's neither good nor bad, it's just a matter of not setting material that cannot be graded that fast.
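The marking budget above is simple arithmetic, sketched here. The class size of 25 is borrowed from the comment's earlier hypothetical and is an assumption about the actual course; the function name is made up.

```python
def marking_minutes_per_item(marking_hours, students, items_per_student):
    """Minutes available to mark one piece of work for one student."""
    return marking_hours * 60 / (students * items_per_student)


# 70 marking hours, 25 students (assumed), 5 assignments plus 1 exam each.
print(marking_minutes_per_item(70, 25, 6))
# Same budget if only the 5 assignments fall to the TA.
print(marking_minutes_per_item(70, 25, 5))
```

Either way the result lands close to the "about half an hour" figure in the comment.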
Professors are usually 40/40/20. 40% teaching, 40% research, 20% administration. Sometimes they are a bit more teaching or research. Of that teaching they usually do 4 or 5 courses, which then means they are supposed to be spending about 10% of their time on the one course you see them in. In practice it's more like 60/20/20 or 70/20/10 but depends on the department/school/personal ability etc.
As a professor, I can attest that the diagnosis of the problem here is too simplistic and the proposed 'solution' here is unnecessarily complicated. While it is the case that TAs and insecure professors will often inflate grades as they are scared of student appeals, the solution is to employ more experienced professors. There are also relatively simple methods that can be used to prevent grades becoming skewed. For instance, it is easy to grade anonymously. Just ensure that identifying details only go on the first page and turn the work over and grade from the back. One can also compare class mean and median scores (and SDs) with the scores from other sections of the same class. Such methods can ensure fair and consistent grading, without grade inflation. I always use such methods to great effect.
Aren't they a culprit too in the grade inflation debacle?
I was a TA at a Far East university in an engineering department. Generally I consider myself a tough marker, as I expect students to arrive at answers through correct logical reasoning. Having said that, I usually turned a partial blind eye for students who had a genuine drive towards their studies (the post-grad research types), because their future shouldn't be eclipsed by one bad grade. I also tightly controlled the grade distribution, such that only 5-10% of the class would get an A.
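The section comparison mentioned above might look something like the following. It is a sketch only: the section names, the sample scores, and the 5-point tolerance are invented, and comparing each section against the median of the section means is one reasonable choice among several.

```python
from statistics import mean, median, stdev


def section_summary(scores):
    """Mean, median, and sample standard deviation for one section."""
    return {"mean": mean(scores), "median": median(scores), "sd": stdev(scores)}


def flag_outlier_sections(sections, tolerance=5.0):
    """Names of sections whose mean is far from the typical section mean.

    sections  -- {section name: list of scores}
    tolerance -- allowed gap in points (hypothetical default)
    """
    means = {name: mean(scores) for name, scores in sections.items()}
    typical = median(means.values())
    return sorted(n for n, m in means.items() if abs(m - typical) > tolerance)


sections = {
    "Section 1": [70, 75, 80],
    "Section 2": [72, 74, 79],
    "Section 3": [90, 92, 95],  # graded noticeably higher than the others
}
for name, scores in sections.items():
    print(name, section_summary(scores))
print("Worth a second look:", flag_outlier_sections(sections))
```

A flagged section is only a prompt to look closer, since one section may simply have stronger students.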
The first time I marked the maths assignments, the feedback was horrible. I was told off by the lecturer for marking strictly, and then he increased everybody's marks by some percentage. Then I was instructed "not to go through the workings" and to "give full marks if you see the answer". Since then, more than half the class gets an A.
The problem here is that lecturers are evaluated every semester through questionnaires handed out to students (in that university). Bad feedback can kill anything from a lecturer's Christmas bonus to a promotion in the department. So he (and many others) ends up pleasing students so as not to hurt his career as an academic.
On a separate note, most engineering coursework is now done in software. As a consequence, there are hardly any hardware experiments or written reports. The downside of all this is that it is impossible to catch plagiarism, as every run of an experiment in software produces more or less the same outcome. Unless all students get it wrong, everybody ends up getting an A.
In my time, all coursework (labs, assignments) had to be submitted as a report. The highest I ever got was 8/10... mostly 7/10. In one assignment I submitted, marks were slashed for not zooming in on a graph (even though it covered 90% of the page). In another report, a few marks were removed for not using a ruler to draw a circuit diagram. Having a few bad grades eventually cost me my first class, which became a major issue in my post-grad entry. Compared with those days, I think college kids have an easy time now. In a way, I can understand why people in the working world pay little to no attention to college performance.
I can see how in some cases the computer would do a better job than a professor. In particular, ones that could not care less about teaching. I'm in Physics, and in one grad course there was an essay on an exam that I got a zero on. When I looked at the solutions, it appeared that the essay on the key was actually my essay with a few slight modifications. Two sentences of the short paragraph were my words exactly. When I brought this to the professor (who was also my advisor), he (a) couldn't remember my name (b) wouldn't even look at the exam (c) wouldn't discuss the answer and deferred everything to his grader, who was another grad student. The grader had better things to do and just handed my exam back to me and said, "that's what you deserve." This same professor, it should be said, makes psychotic Wikipedia self-edits about how his work "reconciles quantum mechanics with the Christian faith", rarely talks to other groups about his research (once one of his students came to me to ask a question about a problem he'd been working on for months--within minutes I identified it as being identical to a well-known NP-hard problem), and frequently "dumps" RAs he doesn't like by simply ending all communication with them.
My point is, the professors and TAs that grade unfairly don't do so because they can't. They do because they don't care. When I graded essays, I had a list of things I wanted to see in a correct answer and how many points they were worth, and a list of things that I would always take off points for. Every essay had a column of numbers next to it and a copy of my rubric so that any student could see exactly what they got points for and what they may have been penalized for. Out of classes of over a hundred students, I rarely received any complaints except for students who were on the border of failing and were desperate for one or two points. While sometimes grading essays felt like a simple application of a regular expression, searching for the gems of knowledge, equally important was the logic that led to that conclusion. Correct answers obtained through incorrect application of concepts weren't worth any points at all, and it would be difficult for a program to match that with any regular expression.
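The rubric bookkeeping described above is easy to sketch, even though the judgment itself is not. In the sketch below the grader still decides which items are present, including whether the reasoning was sound; the code only adds up the column of numbers. The item names, the point values, and the floor at zero are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    description: str
    points: int  # positive for credit, negative for a standing penalty


def score_essay(rubric, observed):
    """Return (total, breakdown) for the rubric items the grader ticked.

    observed -- set of item descriptions the human grader marked as present
    """
    breakdown = [
        (item.description, item.points)
        for item in rubric
        if item.description in observed
    ]
    # Assumption: an essay cannot score below zero.
    total = max(0, sum(points for _, points in breakdown))
    return total, breakdown


rubric = [
    RubricItem("states the relevant conservation law", 3),
    RubricItem("derives the result from that law", 4),
    RubricItem("unit error", -1),
]
total, breakdown = score_essay(
    rubric, {"states the relevant conservation law", "unit error"}
)
print(total)
for description, points in breakdown:
    print(f"{points:+d}  {description}")
```

The breakdown is what would be handed back with the essay, so a student can see where each point came from.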
I guess experience with bad professors did teach me one thing--despite having no passion for teaching myself, I would always treat my students like people and do my best to ensure that they got the best education possible for their tuition.
I am currently getting my Masters in Information Systems with a specialty in security. I have 15 years experience in the field including senior executive and operations targeting bad actors. However, in one of my classes, the TA would give me poor grades on my essays whenever I would write about and cite from my professional experiences and research. I decided from then on to just regurgitate the material from the PowerPoints and reading material (much of which I disagreed with, or was outdated). Guess what? My grades improved drastically. The funny thing was that the TA was a lifetime student with barely any real life professional experience.
I wrote a response to this, but I think Slashdot ate it.
Has anyone done good empirical work on similarities and differences in perceptions of literature, according to cross-cultural, demographic, or other factors? The greatest weaknesses I have seen in the litcrit I've been exposed to have been the lack of empiricism, the lack of taste, and the lack of ability to write well (in fact, the propensity to write quite poorly, despite the use of jargon). But perhaps my exposure has not been broad enough.
-- IANAL, this isn't legal advice, and definitely isn't legal advice for you. Also, Squee!
The purpose of a test is to see if you know the material that is being presented to you. If the material you are being fed looks like bullshit that's still what is required to be put in the tests and assignments.
I'd say the sarcasm was probably noticed but didn't cost any marks because it was used in a way that showed you were paying attention.