Slashdot Mirror


Australia To Grade Written Essays In National Exam With Cognitive Computing

New submitter purnima writes: Australia keeps on giving and giving. Each year school kids in Australia sit The National Assessment Program (NAPLAN) which in part tests literacy. The exam includes a written page-long essay aimed at examining both language aptitude and literacy of students. Of course, human-marking of such essays is costly (twenty teacher-minutes per exam). So some bright spark has proposed that the essays be marked by computer. The government is convinced and the program is slated for the 2017 school year. Aside from the moral issues, is AI ready for this major task?

25 of 109 comments

  1. No, but... by Capt.Albatross · · Score: 4, Interesting

    AI is not ready to do this task properly, but, at least in the US, human grading has sometimes been dumbed-down to the point where you would not even need current 'AI' to do as well, as Prof. Perelman of MIT has demonstrated - e.g.: http://www.bostonglobe.com/opi...

    1. Re:No, but... by thegarbz · · Score: 2

      This is NAPLAN. We assign students into intelligence groups based on one exam and how well teachers taught students to pass the exam. Frankly I don't think an AI assigning marks at random could stuff up more than the education system already has in this country.

      It seems like every attempt to unify or improve the education system just puts us on a path to a worse "education".

    2. Re:No, but... by drinkypoo · · Score: 2, Insightful

      It seems like every attempt to unify or improve the education system just puts us on a path to a worse "education".

      Everyone is caught up in bullshit about metrics right now. Precisely how dumb are our kids, etc etc. Instead of spending money on education, they're spending it on figuring out what the results of not spending money on education are. Really brilliant work, there. But it makes them look busy, so mission accomplished.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    3. Re:No, but... by ShanghaiBill · · Score: 3, Interesting

      AI is not ready to do this task properly

      Neither are humans. The question is not whether an AI can do it perfectly, but rather whether it can do it as well as a typical human grader. The human graders are under time pressure to increase throughput, and spend little time considering the logic and cogency of the students' arguments. They are just looking at spelling and grammar, just like the AI would. At least the AI will be consistent. Human graders tend to give lower scores just before lunch, and better scores just after. Is that really fair, considering the importance of these scores on the student's future?

      Anyway, this discussion is silly, since it is happening in a data-free environment. It would be far more meaningful if we could see the human and AI grades given to the same papers, side by side, preferably in a blind test, and then decide which is better. AI has advanced rapidly in the past few years, so I wouldn't be surprised if the AI won.
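For what it's worth, the side-by-side comparison described above could be sketched roughly like this. The scores are made up purely for illustration (they are not NAPLAN or Pearson data), and the names `expert`, `human`, `machine`, `pearson` and `mean_abs_diff` are hypothetical. It only shows one plausible way to measure agreement with an expert reader, not how any real study did it.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must not all be identical")
    return cov / sqrt(var_x * var_y)


def mean_abs_diff(xs, ys):
    """Average absolute gap between two graders on the same essays."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical scores (0-10) given to the same six essays.
expert = [7, 4, 9, 5, 6, 8]   # assumed "gold standard" reader
human = [6, 5, 9, 4, 7, 8]    # assumed typical time-pressured grader
machine = [7, 4, 8, 5, 6, 9]  # assumed automated grader

for name, scores in (("human", human), ("machine", machine)):
    print(name, pearson(expert, scores), mean_abs_diff(expert, scores))
```

A real blind test would need far more essays and a proper significance test. This only shows the shape of the comparison.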

    4. Re:No, but... by Capt.Albatross · · Score: 2

      Indeed. There is a widespread fallacy, in business as well as education, that any number you can assign to something is inherently meaningful, and conversely, if you cannot assign an 'objective' quantity to something, it must not be important. I suspect that business schools have done a lot to spread this fallacy (including into education), though I don't have the numbers to prove it...

    5. Re: No, but... by ShanghaiBill · · Score: 2

      The AI won years ago. See Pearson Education and Pearson Knowledge Technologies. In trial after trial the AI scores correlated greater with expert readers than the average employed reader correlates with experts.

      Interesting. I found these research studies. Some of the results are somewhat questionable since they were funded by Pearson, which has skin in the game. But in the absence of other evidence, the AI looks like a clear winner, in cost, effectiveness, and fairness.

  2. Testing literacy by oodaloop · · Score: 5, Funny

    Each year school kids in Australia sit The National Assessment Program (NAPLAN) which in part tests literacy.

    Can we get this AI to test Slashdot summaries?

    --
    Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
    1. Re:Testing literacy by retchdog · · Score: 2

      It could use a few commas, but it's not terrible. "Sitting an exam" is standard Australian English, I presume. In Europe, it's commonly called "writing an exam" (they started moving from written answers to psychometry much more recently). Maybe "sitting an exam" doesn't make literal sense, but neither does "taking an exam" really; I mean, where are you taking it?

      --
      "They were pure niggers." – Noam Chomsky
  3. Ha! by morgauxo · · Score: 2

    Sounds like some politicians are buying an expensive lesson in what can and can't be automated by computer on their taxpayers' dime.

    Here in the US it's the military that usually serves that particular function but Australia has their schools doing it.

  4. Exclamatory sentence! by meta-monkey · · Score: 4, Funny

    Adverb clause, independent clause conjunction independent clause dependent clause. Subject, adjective clause, verb prepositional phrase? Participle phrase subject verb conjunction dependent clause!

    Emoticon.

    --
    We don't have a state-run media we have a media-run state.
  5. Can we submit a poem? by WillAdams · · Score: 5, Funny

    Eye halve a spelling chequer
    It came with my pea sea
    It plainly marques four my revue
    Miss steaks eye kin knot sea.

    Eye strike a key and type a word
    And weight four it two say
    Weather eye am wrong oar write
    It shows me strait a weigh.

    As soon as a mist ache is maid
    It nose bee fore two long
    And eye can put the error rite
    Its rare lea ever wrong.

    Eye have run this poem threw it
    I am shore your pleased two no
    Its letter perfect in it’s weigh
    My chequer tolled me sew.

    --
    Sphinx of black quartz, judge my vow.
  6. it will be gamed. by Anonymous Coward · · Score: 3, Insightful

    Since machines cannot yet understand the semantics of complex English text, they will use some simplistic rules as a substitute. These rules will be things like "average sentence length" and other such metrics, which as soon as they are discovered by students, will be used to game the system. Instead of producing essays born of rational and coherent thought, they will instead make them to match the things being measured while being utterly devoid of meaning.
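To make the gaming concern concrete, here is a toy sketch of a scorer that looks only at surface features such as average sentence length. The weights and the `surface_score` name are invented for illustration. This is an assumption about what a simplistic rule might look like, not a description of any real essay-grading product.

```python
import re


def surface_score(essay):
    """Toy scorer using only surface features; it has no notion of meaning."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay)
    if not sentences or not words:
        return 0.0
    avg_sentence_len = len(words) / len(sentences)
    avg_word_len = sum(len(w) for w in words) / len(words)
    # Arbitrary illustrative weights: longer sentences and longer words score higher.
    return avg_sentence_len + 2 * avg_word_len


# A short, coherent answer versus long-worded filler with no real content.
honest = "Dogs need exercise. A daily walk keeps them healthy and calm."
gamed = ("Notwithstanding paradigmatic considerations, multitudinous "
         "epistemological frameworks perpetually necessitate "
         "comprehensive interdisciplinary recontextualization.")

print(surface_score(honest), surface_score(gamed))
```

Under these assumed weights the meaningless filler outscores the coherent answer, which is exactly the failure mode described above.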

    1. Re:it will be gamed. by Ignacio · · Score: 2

      Sounds perfect for Language Arts and Psych classes then.

  7. So ... by BarbaraHudson · · Score: 4, Funny

    written page-long essay aimed at examining both language aptitude and literacy of students.

    So, the same technology used SO effectively to rank resumes will be used with students. Okay, kiddies, remember to stuff a lot of fancy-pants words into it.

    Fail: This is sh*t. Go f*ck yourself. I'm not kissing your ass.

    PASS: Subjectively, it is blatantly obvious to this observer that the new paradigm, as a cost-saving measure, was inspired by, and mimics, the natural environmentally safe process of translating organic matter into nutritious compost. This has the outcome of allowing everyone who is in a paid position to devote the time saved to stress-relieving activities such as self-pleasuring, resulting in both a higher awareness of the need to practice good hygiene by such prophylactic procedures as more frequent hand-washing, and use of tissues to properly dispose of organic residue, though it could also negatively impact on their visual acuity over time. Affected students should refrain from overtly engaging in behavior with superior's inferior posteriors to avoid being perceived as having a brown proboscis by their peers, with the associated negative impact on their social placement in the student hierarchy.

    --
    "Transparent" is a shit show that trades on every stereotype going. A man in drag is NOT a transsexual.
  8. English factory system by Anonymous Coward · · Score: 2, Insightful

    That's because all of us colonials and ex-colonials are burdened with the English factory educational system that was designed to produce bureaucrats for the Empire. The reason computers are capable of grading products of the educational system is because the system is made to create human computers.

    Our - US, Australia - educational system needs to be completely changed - not reformed. I think the template to use is Maria Montessori's system. In the future we are going to need creative people who can discover new things and solve problems: not follow rules and memorize things: computers do that better.

    1. Re:English factory system by Catbeller · · Score: 2

      Creativity is self-learned, I find. But I'd never put my kid in anything other than a Montessori.

      Now, the empires (corporations) want a factory system for creating creative people. Hence the coding initiatives and STEM programs that governments are suddenly shoving down schools' throats all over the world. They aren't doing it to make wealthy citizens. They are demanding it so they can drive down creative costs to a commodity level. A billion Montessori kids are a billion paper-hatted geniuses working 29 hours a week for minimum wage (or capped management salary for 50+ hours a week). Rare creativity is valuable; abundant creativity will create poverty among the brilliant. A free market of force-fed STEM students (all in debt to banks and schools profiting enormously from them for the rest of their lives) wandering from joe-job to joe-job just as crappy as any deep-fryer position. If you don't have 1) rare skills or 2) collective bargaining power to demand more than the utter minimum possible pocket of change, the armies of the ingenious will be corporate compost.

  9. We should make it fair. by 140Mandak262Jamuna · · Score: 2

    Well, if you allow computers to grade essays, then you should allow students access to AI-based tools to generate essays by supplying keywords. Now that is fair competition. In America rich people will buy high quality essay-generators for their school district. In socialist Australia government will supply all students with the same single-payer essay generator. Meanwhile Korean and Chinese parents will dutifully coach their children to memorize multiplication tables all the way to 20 times 20. (My Korean friend was surprised to learn we Indians went only till 16 x 16). Japanese would create essay-gochi, an app that you buy as a child and take care of it to produce high quality essays by the time you finish high school. Indians would write project proposals that require technical back-office teams (about three IT techies per student) to create and maintain the essay grading apps.

    --
    sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
  10. Moral Issues Are Important! by Anonymous Coward · · Score: 2, Funny

    Aside from the moral issues, is AI ready for this major task?

    Moral issues aside?!? I'm sorry, but the moral issues are front and center here. Australia is seriously proposing to bore an AI to death, or at least drive it insane, by having it grade hundreds of thousands of grade school essays. This is an outrage!

    1. Re:Moral Issues Are Important! by Chatterton · · Score: 2

      Think of the AI

  11. Human profs already use AI tools by sandytaru · · Score: 3, Interesting

    Husband is currently grading final papers for college classes. He slaps them into software that detects plagiarism, then another software that picks out vocabulary level, typos, etc, and assigns a grammar score. Only then does he read it, quickly skimming over it and seeing whether there are citations on the "plagiarized" parts, if there are any, and whether he agrees with the AI score. Nine times out of ten, he does, and he uses the grammar score assigned by the AI. If someone plagiarized whole paragraphs without citations, they get an incomplete and need to do a rewrite. If someone didn't write the required number of words or pages, they get points knocked off the grammar score. It's faster than manually marking 150 papers, but still takes him about 15-20 hours of labor over the course of 2-3 days.

    --
    Occasionally living proof of the Ballmer peak.
    1. Re:Human profs already use AI tools by Dutch Gun · · Score: 4, Interesting

      Does he check the grammar score before he reads it himself? I would worry that it may bias him before he can make his own judgment. Another potential problem, of course, is that if students have access to the same software, they'll be able to "tune" their papers to ensure the AI gives them the highest possible score. While this may not be "cheating" per se, it does tend to devalue the AI somewhat. This is the same process that's been happening forever with "Search Engine Optimization", or put less nicely, trying to "game" the search engines.

      Minor issues aside, it sounds like a reasonable integration of AI and human judgement. This probably sounds like the future direction educators will be taking more and more. Use AI to handle most of the tedious work - that's what computers are good for anyhow. The professor can then use his own judgement to make the final call, using the AI as a tool and not necessarily as a final arbiter. Moreover, it's going to be a long time before AI can evaluate the worth of the content of the paper, of course.

      --
      Irony: Agile development has too much inertia to be abandoned now.
  12. Why not. Just get it over with: fire everyone by Catbeller · · Score: 4, Insightful

    Hell, why not. While we're at it, why don't we automate the student process. Dump the students and educate AIs instead. Computing solutions always work, just ask any nerd about self-driving cars.

    At some point, and it seems that that point is arriving now, people will realize that the driving force behind technological change, as far as money people are concerned, is to eliminate jobs, and that the good jobs are not really being replaced, and cannot be replaced. AIs grading papers gets rid of more pesky teachers who make a living wage. A self-driving car doesn't fit the picture until you realize that millions of people make a living *driving trucks*, and self-driving trucks will eliminate their jobs (in theory, if it works, and I don't see it working) and make oodles of money for capital and kick millions of truck drivers, along with all the taxi and Uber car drivers, out without a dime. (Uber is VERY interested in self-driving cars. Guess why).

    Some jobs are being made. And capital is desperately trying to commodify and cheapen such labor, to the point of demanding governments force coding classes on all kids. There are such jobs, but nowhere near enough, and those are mostly dropped onto cheaper kids, not newly dumped middle-aged workers.

    Asimov was on point, decades ago, when he wrote that inevitably automation would eliminate most jobs, and that the biggest problem - in his view, opportunity -- would be finding something for people to do. I would say that people without purpose are the most dangerous force for destruction and stupidity on the planet - worse than global climate change.

    Capital and people who work for capital, and neoliberals and business conservatives who support capital, tend to have well-paying white collar jobs and live among other people of their class, and don't see anything amiss. They're fine. Step outside into the vast middle grounds of the world, and you'll see a growing sense of we're-being-fucked that will require an endless army of pepper-spraying drones and surveillance to keep from erupting into riots someday soon.

  13. Oddly by Greyfox · · Score: 3, Funny

    The winning entry will be a heart warming story about a robot that kills all humans.

    --

    I'm trying to teach myself to set people on fire with my mind... Is it hot in here?

  14. Content Matters (re:Is AI really necessary?) by Capt.Albatross · · Score: 2

    I have to disagree with the statement that content doesn't matter. Without considering the content, you cannot judge whether the student is displaying reasoning and making cogent arguments, or merely faking it. <curmudgeon> it seems to me that the number of people I deal with who cannot tell the difference is increasing - a coincidence? Perhaps not. Murdoch has made a political movement out of exploiting such people.</curmudgeon>

    If you say you cannot do a fair test if content is considered, that is not an argument for dumbing it down to pointlessness; it is an argument for doing it a different way or not doing it at all. In reality, you can set meaningful essay questions, that test a student's critical analysis and reasoning skills, within the context of the humanities and sciences.
     

  15. Re:Is this the ob luddite post of the day? by nbauman · · Score: 2

    Therefore the only task of those who write software to grade essays is that the variation of the machine is no worse that the variations of the humans. There is some success in this. Edx has a module that will grade essays. As far as I know the value in this is quicker and more uniform feedback for practice essays.

    Well, I'm a humanities guy and I know enough about the scientific method to understand that you don't know whether you have "success" until you test your bright idea in the real world and find out whether it actually works. And that's what MIT professor Les Perelman said in the article you're citing:

    “My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Perelman, a retired director of writing and a current researcher at MIT.

    As Perelman said, some computer students wrote a program that can turn out gibberish that the main robo-grading program consistently scores above the 90th percentile.

    Of course humanities majors, who have generally have minimal understanding of advanced technology, hate it. This, of course, includes journalists.

    The article you're citing was not written by a journalist, but by a retired MIT writing professor.

    So you've gotten it wrong on both the science and the reading comprehension. No mod points for you.

    This is not to say that computer graded essays are going to be as good of an assessment as human graded essays. However, it may be good enough, and better than other objective measures, such as fill in the bubble tests. In fact anything that minimizes the cost of open ended free response assessment is going to benefit anyone. Securing multiple guess test is very expensive, and the value of them are highly questionable. They tend to overestimate the value of student how have vague passive knowledge, and underestimate the value of those who have an ability to actively apply knowledge.

    I am deducting another point for bad grammar.

    Computer graded essays can check whether an essay complies with an algorithm, and they can take care of anything you can reduce to an algorithm. The great success of computer writing was the spell-checker. There is also a grammar-checker which I never use because it doesn't work well enough for me. There are also algorithms to check the format of literature citations, which are useful.

    But (as somebody who writes for a living) the most important features of writing depend on an understanding of the content. Most important: Is it correct? As Perelman says, the robo-graders ignore whether what you say is true (or whether it even makes sense). The next thing I look at: If the author takes a controversial position, does he give both sides of the argument? This is what you may know as Neutral Point of View from Wikipedia (although writers have known about it since the ancient Greeks.) Wikipedia actually has a pretty good structure.

    Let's remember the purpose of writing: A person communicating an idea to somebody else. When I read something, I'm looking for a good idea, clearly communicated. If the algorithm can't identify a good idea (and as Perelman showed, it can't), then it can't tell me whether the writing is any good. Algorithms have surprised me, but I can't imagine how an algorithm can tell me whether an idea is good.