Slashdot Mirror


Software Takes On School Science Tests In Search For Common Sense

holy_calamity writes: Making software take school tests designed for human kids can help the quest for machines with common sense, say researchers at the Allen Institute for Artificial Intelligence. They've made software called Aristo that scores 75 percent on the multiple-choice questions that make up most of New York State's 4th grade science exam. The researchers are urging other researchers to pit their best software against school tests, too, to provide a way to benchmark progress and spur competition.

33 of 44 comments

  1. Hopeless by pjbgravely · · Score: 2

    Good sense is no longer common.

    --
    Star Trek, there may be hope.
    1. Re:Hopeless by Archangel+Michael · · Score: 1

      I'm seeing the slow de-evolution into the blob people of Wall-E

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    2. Re:Hopeless by Anonymous Coward · · Score: 1

      'Common Sense' never was 'Common', or we would not have a particular phrase for it

      'Common Sense' as an idea arises when 'others' do not realize what 'we' know, therefore we are free to look down at them as 'substandard' because they lack common knowledge

      In the case of 4th grade science tests, I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average. It is much more likely that the author of the tests is expecting the taker to have particular knowledge that is not germane to the subject, but based on the idiosyncrasies of the teacher.

    3. Re:Hopeless by OzPeter · · Score: 1

      I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average.

      The point of TFA is not so much that the AI does badly but rather that the author of the AI (^w Slashvertisement) wants to use the tests as a benchmark. From TFA:

      Aristo is being developed by researchers at the Allen Institute for Artificial Intelligence in Seattle, who want to give machines a measure of common sense about the world. The institute’s CEO, Oren Etzioni, says the best way to benchmark the development of their digital offspring is to use tests designed for schoolchildren. He’s trying to convince other AI researchers to adopt standardized school tests as a way to measure progress in the field.

      --
      I am Slashdot. Are you Slashdot as well?
    4. Re:Hopeless by spacepimp · · Score: 1

      I'm not certain you understand what the term means. You might want to Google it. Aristotle most certainly had a different interpretation of the phrase.

    5. Re:Hopeless by LifesABeach · · Score: 1

      Maybe if the computer program went to a charter school? Where the answer 'C' is found more often.

    6. Re:Hopeless by mythosaz · · Score: 3, Informative

      Test-taking is a skill, and most test-givers include clues (and even answers) in their tests. Some test-givers, of course, mean to give these clues; many are oblivious to it. Here are some of the bigger lessons I remember from my test-taking classes.

      Multiple choice questions, for example (which is what this software uses) often have choices like:

      Stamen
      Pistil
      Filament
      Pistol

      While some test-givers might include the homophone pistol as a red herring, words like that are a clue that the answer isn't Stamen or Filament, but that you're expected to know how to spell "Pistil."

      Similarly, if you read page 2 of a test, you might find more detailed questions regarding the pistil, questions that might spell out exactly what that part of the flower does, solidifying the answer.

      Numbers in the middle of ranges are more likely correct, as are exact numbers near general numbers (e.g. Water boils at a. 10, b. 100, c. 200, d. 212, e. 2000)

      Long answers, when not absurd, are generally correct.

      Middle answers, when not randomized by test software, are more likely to be true.

      A pair of similar answers (see above, Pistil, Pistol) generally narrows you down to 50/50.

      "Absolutes" in true-false questions are almost always false, and true is more common than false.

      Continuity errors like using the wrong article (a/an) often narrow choices.

      Some test-writers who don't randomize also don't repeat answers, or never repeat beyond a limit. Patterns may emerge after simple processes reveal some of the clues.

      ----

      After practice in this test-taking class, we all took multiple choice exams on a variety of complex subjects and passed them.

    7. Re:Hopeless by Bite+The+Pillow · · Score: 1

      You posted your tagline from your blog here, which means

      1) You must be correct
      2) There's no point in continuing to do anything

      I guess that wraps it up. pjbgravely has spoken, there's nothing we can do. Let's just all commit mass extinction and get it over with. We're only killing time until the inevitable heat death of the universe, after all.

      Wait, wait a sec. When was it common?

    8. Re:Hopeless by ShanghaiBill · · Score: 1

      Tests are a terrible benchmark for AI. It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question, or using any AI.

      Winograd schemas are a better test. You can't get them correct with tricks. They test actual understanding.

    9. Re:Hopeless by Macdude · · Score: 1

      I wish I knew about this when I was in school, I wasted so much time learning the subject matter...

      --
      "Grab them by the pussy" -- President of the United States of America
    10. Re:Hopeless by narcc · · Score: 3, Insightful

      It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question,

      Pattern matching, without any understanding, is state-of-the-art AI.

    11. Re:Hopeless by penguinoid · · Score: 1

      Sufficiently advanced pattern matching is understanding.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    12. Re:Hopeless by narcc · · Score: 1

      That remains to be seen.

    13. Re:Hopeless by penguinoid · · Score: 1

      Nope, it's easy enough to prove. Suppose the AI can match the pattern of neural connections in a person who understands a concept -- then that pattern matching is at the very least understanding and perhaps more. Of course, I expect the pattern matching required to be considered understanding is much simpler than this.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    14. Re:Hopeless by narcc · · Score: 1

      It's still just idle speculation. Beliefs without evidence are still beliefs without evidence, no matter how reasonable you believe them to be.

    15. Re:Hopeless by penguinoid · · Score: 1

      There are ridiculous amounts of evidence that neurons are how we think.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    16. Re:Hopeless by narcc · · Score: 1

      Which is completely unrelated to your initial claims, which you'll quickly discover if you follow my advice here: Consider first what you've proposed, then ask what evidence exists that supports the specific model to which you've alluded.

      Then do some reading. A lot of reading, I suspect. You'll discover that what you believe is nothing more than idle speculation, with no evidence to support the claim you made in your earlier post. A lot of work has been done along those lines; none of it has yet proven fruitful (as it relates to your assertion).

      Just because your beliefs seem "scientific" does not mean that those beliefs are grounded in science. It's better if you learn that now, accept reality and the cold, hard fact that we don't know everything there is to know about the universe. To do anything else is to embrace pseudoscience.

      All-too-often I see pseudoscience, dressed up in the trappings of science, sold to "science fans" on the singular basis that no supernatural elements are present in their explanation. It's disgusting, but selling nonsense to under-educated skeptics is surprisingly lucrative. Don't be part of the problem you think you're solving.

  2. Myth! by Xamusk · · Score: 1

    There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

    And if you actually depend on people's common sense for things to work, you're doing it wrong.

    1. Re:Myth! by Ol+Olsoc · · Score: 1

      There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

      And if you actually depend on people's common sense for things to work, you're doing it wrong.

      And when "common sense" and "science" are paired together, it's usually codespeak for creationism.

      --
      The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
  3. 75% on a 4th grade test? by turkeydance · · Score: 1

    i'm in!

  4. Professor in New York by phantomfive · · Score: 2
    This quote from the article (by a professor in New York) is all that needs to be said:

    “What’s difficult for humans is very different from what’s difficult for machines,” says Davis, who also works on giving software common sense. “Standardized tests for humans don’t get very good coverage of the kinds of problems that are hard for computers.”

    --
    "First they came for the slanderers and i said nothing."
    1. Re:Professor in New York by tgv · · Score: 1

      The only relevant response.

  5. Teach the test by Otome · · Score: 1

    Actually using these as benchmarks would bring "teach the test" to a whole new level.

  6. Well, that's not so amazing.... by bobbied · · Score: 1, Insightful

    75%? Everybody knows that you get 25% when you just guess randomly... So being able to add another 50% isn't all that amazing.

    Understand how they do this though... They have taken the existing study guides and have constructed an algorithm that does basic word association. Multiple choice tests are written to have one right answer, one plausible answer and two answers which are distractors, designed to trick you. So the trick to multiple choice when you don't know the answer is to identify the distractors and pick from the remaining answers. So armed with their word association, they eliminate the distractors by finding the answers whose words are most closely associated with the question in the study guides and throwing out the rest. This will get you easily to a pretty good solution, and where it isn't conclusive it will easily get you to a 50/50 choice.
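
The word-association idea described above can be sketched in a few lines of Python. This is only a toy illustration of the commenter's description, not Aristo's actual method, which the thread does not detail. The function names (`tokenize`, `association_scores`, `pick_answer`) and the idea of scoring by shared words per study-guide sentence are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def association_scores(question, choices, study_guide_sentences):
    """Score each choice by word overlap between the question and
    every study-guide sentence that mentions the choice.

    Hypothetical scoring rule, chosen for simplicity: no stop-word
    removal, no weighting, no understanding of the question.
    """
    question_words = set(tokenize(question))
    scores = {}
    for choice in choices:
        choice_words = set(tokenize(choice))
        score = 0
        for sentence in study_guide_sentences:
            sentence_words = set(tokenize(sentence))
            # Only sentences that mention the choice contribute.
            if choice_words & sentence_words:
                score += len(question_words & sentence_words)
        scores[choice] = score
    return scores


def pick_answer(question, choices, study_guide_sentences):
    """Return the choice with the highest association score."""
    scores = association_scores(question, choices, study_guide_sentences)
    return max(choices, key=lambda choice: scores[choice])
```

Given a study guide that says the pistil receives pollen and the stamen produces it, a question about which part "receives pollen" would rank "Pistil" above "Stamen", and a choice that never appears in the guide scores zero. Nothing here knows what a flower is, which is the commenter's point.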

    Problem is, this isn't how people do this. They've not invented a "common sense" way to do this that works for humans, but a way that is more suited to machine "learning". It's about pattern matching and possible associations, not knowledge of 4th grade science. They've not taught the computer 4th grade science, far from it, they've only figured out a way to winnow down the more likely answers in a multiple choice test and the computer then just guesses based on probabilities. While this is interesting, it has nothing to do with human "common sense" and is basically pointless.

    Now, if they only would teach KIDS how to take multiple choice tests using similar techniques, THAT would be something worthwhile....

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    1. Re:Well, that's not so amazing.... by bigdavex · · Score: 1

      Now, if they only would teach KIDS how to take multiple choice tests using similar techniques, THAT would be something worthwhile....

      I don't think it would.

      --
      -Dave
  7. We have several million ... by CaptainDork · · Score: 1

    ... fail at this once a week while watching, Are You Smarter Than A 5th Grader.

    --
    It little behooves the best of us to comment on the rest of us.
  8. Outsourcing/Automation by Required+Snark · · Score: 1
    This is clearly a step in outsourcing childhood. Why have real children take a test when both the test taking and the test generation can be automated? The kids don't need to go to school, but can work at home, i.e. play video games and eat junk food.

    Standardized test scores will go up so the education establishment will look good. Plus they can fire all the teachers and replace them with hourly contract workers with H-1B visas who will work even more unpaid overtime than current teachers. English fluency not required.

    Naturally, there will be no cost savings to the public, because all the profit will go into the pockets of the outsourcing companies. That's what already happened with for-profit higher education, and the public and the students are left holding the bag. But not to worry. Only the taxpayers, students and general investors were screwed. The insiders already got out with their guaranteed profit. It's the American way.

    --
    Why is Snark Required?
    1. Re:Outsourcing/Automation by Joe_Dragon · · Score: 1

      don't forget the K-12 student loans

  9. Test taking skills by Tony+Isaac · · Score: 1

    Test taking skills do not equal common sense.

    Multiple choice tests aren't that hard to pass, even if you don't know the material. Typically, there are four choices. Two are usually so obviously NOT the answer, that they can be easily discounted. Then it's just a matter of guessing which of the remaining two is more likely to be correct.

    If I can use this technique to pass a test on a subject I know nothing about, then a machine certainly doesn't have to have common sense to duplicate the feat.

    1. Re:Test taking skills by moeinvt · · Score: 1

      I don't see how that technique enables you to pass a test on a subject you know nothing about.
      Even assuming you can correctly eliminate two of the choices from 100% of the questions, you're guessing between the remaining choices. Over a sufficient number of questions, that technique will therefore tend to result in a score of ~ '50' which is typically not a passing grade.

    2. Re:Test taking skills by Tony+Isaac · · Score: 1

      Even if you are taking a test on a subject in which you aren't an expert, you will generally know at least a few of the answers. Those questions raise your chances of getting a passing grade. No, it's not a foolproof method, but my point was that multiple choice tests are nearly always flawed, and often do more to test a person's test-taking ability than their actual knowledge.
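
The arithmetic in this exchange can be written out as a small sketch. It assumes a simplified model that neither commenter states exactly: the test-taker knows some fraction of the answers outright and guesses uniformly among the choices left after elimination on everything else. The function name `expected_score` and its parameters are made up for the example.

```python
def expected_score(known_fraction, remaining_choices=2):
    """Expected fraction correct when `known_fraction` of the questions
    are answered from knowledge and the rest are guessed uniformly
    among `remaining_choices` options left after elimination.

    Simplified model: assumes the right answer always survives elimination.
    """
    if not 0.0 <= known_fraction <= 1.0:
        raise ValueError("known_fraction must be between 0 and 1")
    if remaining_choices < 1:
        raise ValueError("remaining_choices must be at least 1")
    guessed_fraction = 1.0 - known_fraction
    return known_fraction + guessed_fraction / remaining_choices


# Knowing nothing and guessing among all four choices.
print(expected_score(0.0, remaining_choices=4))  # 0.25
# Knowing nothing but always narrowing to two choices.
print(expected_score(0.0))  # 0.5
# Knowing half the answers and narrowing the rest to two choices.
print(expected_score(0.5))  # 0.75
```

Under this model, pure elimination tops out around 50, as moeinvt says, and each question answered from real knowledge moves the expected score up from there, which is Tony Isaac's follow-up point.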

  10. Re:Not impressed by Coren22 · · Score: 1

    Doesn't that prove that Google's AI is actually sentient now?

    --
    APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
  11. If we can implement "common sense" in software... by TaleSpinner · · Score: 1

    ...I would suggest we make electing human beings to public office illegal.