Slashdot Mirror


Software Takes On School Science Tests In Search For Common Sense

holy_calamity writes: Making software take school tests designed for human kids can help the quest for machines with common sense, say researchers at the Allen Institute for Artificial Intelligence. They've made software called Aristo that scores 75 percent on the multiple-choice questions that make up most of New York State's 4th grade science exam. The researchers are urging other researchers to pit their best software against school tests, too, to provide a way to benchmark progress and spur competition.

44 comments

  1. Hopeless by pjbgravely · · Score: 2

    Good sense is no longer common.

    --
    Star Trek, there may be hope.
    1. Re:Hopeless by Archangel+Michael · · Score: 1

      I'm seeing the slow de-evolution into the blob people of Wall-E

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    2. Re:Hopeless by Anonymous Coward · · Score: 1

      'Common Sense' never was 'Common', or we would not have a particular phrase for it.

      'Common Sense' as an idea arises when 'others' do not realize what 'we' know, so we feel free to look down on them as 'substandard' because they lack common knowledge.

      In the case of 4th grade science tests, I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average. It is much more likely that the author of the tests expects the taker to have particular knowledge that is not germane to the subject, but based on the idiosyncrasies of the teacher.

    3. Re:Hopeless by OzPeter · · Score: 1

      I would be much more inclined to question the validity of the tests if they are so complex that a machine designed to answer tests cannot perform above a C average.

      The point of TFA is not so much that the AI does badly but rather that the author of the AI (^w Slashvertisement) wants to use the tests as a benchmark. From TFA:

      Aristo is being developed by researchers at the Allen Institute for Artificial Intelligence in Seattle, who want to give machines a measure of common sense about the world. The institute’s CEO, Oren Etzioni, says the best way to benchmark the development of their digital offspring is to use tests designed for schoolchildren. He’s trying to convince other AI researchers to adopt standardized school tests as a way to measure progress in the field.

      --
      I am Slashdot. Are you Slashdot as well?
    4. Re:Hopeless by Anonymous Coward · · Score: 0

      I'm seeing the slow de-evolution into the blob people of Wall-E

      Where are you, that is it happening slowly? Sounds nice.

    5. Re:Hopeless by spacepimp · · Score: 1

      I'm not certain you understand what the term means. You might want to Google it. Aristotle most certainly had a different interpretation of the phrase.

    6. Re:Hopeless by LifesABeach · · Score: 1

      Maybe if the computer program went to a charter school? Where the answer 'C' is found more often.

    7. Re:Hopeless by mythosaz · · Score: 3, Informative

      Test-taking is a skill, and most test-givers include clues (and even answers) in their tests. Some test-givers, of course, mean to give these clues; many are oblivious to them. Here are some of the bigger lessons I remember from my test-taking classes.

      Multiple-choice questions, for example (which is what this software uses), often have choices like:

      Stamen
      Pistil
      Filament
      Pistol

      While some test-givers might include the homophone pistol as a red herring, words like that are a clue that the answer isn't Stamen or Filament, but that you're expected to know how to spell "Pistil."

      Similarly, if you read page 2 of a test, you might find more detailed questions regarding the pistil, questions that might spell out exactly what that part of the flower does, solidifying the answer.

      Numbers in the middle of ranges are more likely correct, as are exact numbers near general numbers (e.g. Water boils at a. 10, b. 100, c. 200, d. 212, e. 2000)

      Long answers, when not absurd, are generally correct.

      Middle answers, when not randomized by test software, are more likely to be true.

      A pair of similar answers (see above, Pistil, Pistol) generally narrows you down to 50/50.

      "Absolutes" in true-false questions are almost always false, and true is more common than false.

      Continuity errors like using the wrong article (a/an) often narrow choices.

      Some test-writers who don't randomize also don't repeat answers, or never repeat beyond a limit. Patterns may emerge after simple processes reveal some of the clues.

      ----

      After practice in this test-taking class, we all took multiple choice exams on a variety of complex subjects and passed them.
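      A minimal sketch of how a few of these heuristics (similar pair, long answer, middle position) could be mechanized, in Python. Using difflib.SequenceMatcher to detect near-duplicate answers and the 0.7 similarity threshold are assumptions made for illustration, not anything from the comment, and these rules will often be wrong on a real test.

```python
from difflib import SequenceMatcher

def similar_pair(choices, threshold=0.7):
    """Return the indices of the most similar pair of choices, or None if no pair reaches the threshold."""
    best, best_ratio = None, threshold
    for i in range(len(choices)):
        for j in range(i + 1, len(choices)):
            ratio = SequenceMatcher(None, choices[i].lower(), choices[j].lower()).ratio()
            if ratio >= best_ratio:
                best, best_ratio = (i, j), ratio
    return best

def heuristic_guess(choices):
    """Guess using test-taking cues only, with no subject knowledge."""
    # A near-identical pair (e.g. a homophone or misspelling) narrows the field to 50/50.
    pair = similar_pair(choices)
    candidates = list(pair) if pair else list(range(len(choices)))
    # Prefer the longer answer, then the one closer to the middle position.
    middle = (len(choices) - 1) / 2
    return max(candidates, key=lambda i: (len(choices[i]), -abs(i - middle)))

choices = ["Stamen", "Pistil", "Filament", "Pistol"]
print(choices[heuristic_guess(choices)])  # Pistil
```

      On the flower example, the similar-pair rule keeps Pistil and Pistol; both are six letters long, so the middle-position tie-break decides, and it lands on Pistil here largely by luck.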

    8. Re:Hopeless by Bite+The+Pillow · · Score: 1

      You posted your tagline from your blog here, which means

      1) You must be correct
      2) There's no point in continuing to do anything

      I guess that wraps it up. pjbgravely has spoken, there's nothing we can do. Let's just all commit mass extinction and get it over with. We're only killing time until the inevitable heat death of the universe, after all.

      Wait, wait a sec. When was it common?

    9. Re:Hopeless by ShanghaiBill · · Score: 1

      Tests are a terrible benchmark for AI. It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question, or using any AI.

      Winograd schemas are a better test. You can't get them right with tricks; they test actual understanding.
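      To illustrate why simple keyword overlap struggles here, a small Python sketch using a paraphrase of the well-known trophy/suitcase schema. The overlap scorer is a deliberately naive baseline chosen for illustration, not any particular published system. Since the two sentences differ only in "large" versus "small", each candidate gets the same score in both, so this baseline gives the same answer to both and gets exactly one of the pair right.

```python
import re

# A Winograd schema pair: the sentences differ by one word, and that word
# flips which noun the pronoun "it" refers to.
SCHEMA = [
    ("The trophy doesn't fit into the brown suitcase because it is too large.", "trophy"),
    ("The trophy doesn't fit into the brown suitcase because it is too small.", "suitcase"),
]
CANDIDATES = ["trophy", "suitcase"]

def words(text):
    return set(re.findall(r"[a-z']+", text.lower()))

def overlap_guess(sentence, candidates):
    """Naive baseline: pick the candidate sharing the most words with the sentence (first wins ties)."""
    sentence_words = words(sentence)
    return max(candidates, key=lambda c: len(words(c) & sentence_words))

guesses = [overlap_guess(s, CANDIDATES) for s, _ in SCHEMA]
correct = sum(g == answer for g, (_, answer) in zip(guesses, SCHEMA))
print(guesses, f"{correct}/{len(SCHEMA)} correct")  # ['trophy', 'trophy'] 1/2 correct
```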

    10. Re:Hopeless by Macdude · · Score: 1

      I wish I knew about this when I was in school, I wasted so much time learning the subject matter...

      --
      "Grab them by the pussy" -- President of the United States of America
    11. Re:Hopeless by narcc · · Score: 3, Insightful

      It would be easy to get ~75% correct just by looking for keywords and simple pattern matching, without actually understanding the question,

      Pattern matching, without any understanding, is state-of-the-art AI.

    12. Re:Hopeless by Anonymous Coward · · Score: 0

      Common sense isn't ubiquitous, but it is common. It's pretty uncommon to say that someone lacks common sense. That said, we also have phrases for ubiquitous traits.

    13. Re:Hopeless by Anonymous Coward · · Score: 0

      Water (at standard pressure) boils at 99.9839 °C or 373.1339 K.

      Your example is useless unless you give units: degrees Celsius, kelvins, degrees Rankine, or degrees Fahrenheit. And nobody would pick 211.971, which is the degrees Fahrenheit answer, or 671.641, which is the degrees Rankine answer.
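      The conversions behind those figures, as a quick Python check (K = °C + 273.15, °F = °C × 9/5 + 32, °R = °F + 459.67):

```python
def celsius_to_all(c):
    """Convert a Celsius temperature to kelvins, degrees Fahrenheit and degrees Rankine."""
    kelvin = c + 273.15
    fahrenheit = c * 9 / 5 + 32
    rankine = fahrenheit + 459.67  # Rankine is Fahrenheit shifted to absolute zero
    return kelvin, fahrenheit, rankine

k, f, r = celsius_to_all(99.9839)
print(f"{k:.4f} K, {f:.3f} °F, {r:.3f} °R")  # 373.1339 K, 211.971 °F, 671.641 °R
```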

    14. Re:Hopeless by Anonymous Coward · · Score: 0

      And this is one of the many problems with tests in school.
      None of this has to do with acquiring *knowledge* and understanding you'll use later after your school days. (Other than sociology of school culture, which should be a class of its own.) It has everything to do with learning how to play the game. And if your teacher/professor has a personal agenda, even more so. It's not about choosing the answer that is correct, it's about choosing the same answers the tester would have wanted you to choose. If software AI can get decent grades using mere pattern matching, then what that says about the test taking culture is that understanding is *discouraged*: you merely need to fumble your way through life *appearing* as if you know something (e.g. via guessing about facts based on context of the discussion, or spouting hollow manager-friendly generalities that help no one solve any problem).

      Hint:
      If your prof has strong political opinions, make sure you choose answers that agree with the *prof*, rather than agreeing with what the textbook says or what is widely accepted by more reputable sources than your prof. Students are often left with the conundrum that if they choose correct answers, they'll actually get *lower* grades. And yet, the purpose of education should be to make people competent, independent thinkers. Instead, it's little more than indoctrination.

    15. Re:Hopeless by Anonymous Coward · · Score: 0

      A lot of these "tips" assume that the person who developed the exam understands and uses proper English. I had a protracted grading dispute in college because the professor didn't understand how commas work. The question was something like, "If you have 10 lbs. of oranges and lemons, what is the total weight of your load?" My answer was 10 lbs. because you have 10 lbs. of oranges and lemons, not 10 lbs. of oranges, and lemons. No amount of reason could convince the professor that commas are required to separate list items, and that without a comma "oranges and lemons" is one item.

    16. Re:Hopeless by penguinoid · · Score: 1

      Sufficiently advanced pattern matching is understanding.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    17. Re:Hopeless by narcc · · Score: 1

      That remains to be seen.

    18. Re:Hopeless by penguinoid · · Score: 1

      Nope, it's easy enough to prove. Suppose the AI can match the pattern of neural connections in a person who understands a concept -- then that pattern matching is at the very least understanding and perhaps more. Of course, I expect the pattern matching required to be considered understanding is much simpler than this.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    19. Re:Hopeless by narcc · · Score: 1

      It's still just idle speculation. Beliefs without evidence are still beliefs without evidence, no matter how reasonable you believe them to be.

    20. Re:Hopeless by penguinoid · · Score: 1

      There are ridiculous amounts of evidence that neurons are how we think.

      --
      Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
    21. Re:Hopeless by narcc · · Score: 1

      Which is completely unrelated to your initial claims, which you'll quickly discover if you follow my advice here: Consider first what you've proposed, then ask what evidence exists that supports the specific model to which you've alluded.

      Then do some reading. A lot of reading, I suspect. You'll discover that what you believe is nothing more than idle speculation, with no evidence to support the claim you made in your earlier post. A lot of work has been done along those lines; none of it has yet proven fruitful (as it relates to your assertion).

      Just because your beliefs seem "scientific" does not mean that those beliefs are grounded in science. It's better if you learn that now and accept reality and the cold, hard fact that we don't know everything there is to know about the universe. To do anything else is to embrace pseudoscience.

      All-too-often I see pseudoscience, dressed up in the trappings of science, sold to "science fans" on the singular basis that no supernatural elements are present in their explanation. It's disgusting, but selling nonsense to under-educated skeptics is surprisingly lucrative. Don't be part of the problem you think you're solving.

  2. Myth! by Xamusk · · Score: 1

    There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

    And if you actually depend on people's common sense for things to work, you're doing it wrong.

    1. Re:Myth! by Ol+Olsoc · · Score: 1

      There's no such thing as "Common sense". It's just a myth to oppress those with different opinions.

      And if you actually depend on people's common sense for things to work, you're doing it wrong.

      And when "common sense" and "science" are paired together, it's usually codespeak for creationism.

      --
      The shepherds did so well protecting the flock that the sheep no longer believed that wolves existed.
    2. Re:Myth! by Anonymous Coward · · Score: 0

      It's just a myth to oppress those with different opinions.

      No, it's a term to describe "stuff you should know just by quickly assessing a situation, without requiring a deep explanation of all of the issues surrounding it", but it's much easier to just say "common sense".

      "The stove is hot. If you touch it while it's operating, you will get burnt" is something you shouldn't have to be told more than once, and even then, only when you're under the age of 3. After that, you shouldn't have to be told at all. It's common sense.

      You first gain knowledge. (Example: That is a speeding train.) You then gain understanding. (Example: The train will hit me if I stand on these tracks.) Then you display wisdom. (Example: You move out of the way of the train!) That's how common sense works. Some people don't manifest it quickly enough. Some people don't manifest it at all. These people die when they get hit by a train. For example.

  3. 75% on a 4th grade test? by turkeydance · · Score: 1

    i'm in!

  4. Professor in New York by phantomfive · · Score: 2
    This quote from the article (by a professor in New York) is all that needs to be said:

    “What’s difficult for humans is very different from what’s difficult for machines,” says Davis, who also works on giving software common sense. “Standardized tests for humans don’t get very good coverage of the kinds of problems that are hard for computers.”

    --
    "First they came for the slanderers and i said nothing."
    1. Re:Professor in New York by tgv · · Score: 1

      The only relevant response.

  5. Not impressed by Anonymous Coward · · Score: 0

    Am I the only one who is not impressed with a 75% score on a 4th grade science multiple choice test? I'm not sure that AI is any better than:
    Load the answer choices into an array A.
    Load the question into a string B.
    Search Google for string B.
    Count how many times each element of A appears in the results (e.g., count how often choice a appears, choice b, choice c, etc.).
    Return the choice with the highest count.
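    A runnable version of that baseline in Python. No particular search API is assumed here, so the Google step is replaced by a hypothetical list of result snippets passed in by the caller; the snippets in the usage example are made up for illustration. Note that str.count matches substrings, so "pistil" would also be counted inside "pistils".

```python
from collections import Counter

def count_vote(choices, snippets):
    """Return the answer choice that appears most often in the search-result snippets."""
    text = " ".join(snippets).lower()
    counts = Counter({choice: text.count(choice.lower()) for choice in choices})
    # most_common keeps first-seen order for ties, so earlier choices win ties.
    best, _ = counts.most_common(1)[0]
    return best

# Hypothetical snippets standing in for search results for the question
# "Which part of a flower produces pollen?"
snippets = [
    "The stamen is the male part of a flower; its anther produces pollen.",
    "Pollen is made in the anther at the tip of each stamen.",
    "The pistil is where pollen lands during pollination.",
]
print(count_vote(["Stamen", "Pistil", "Filament", "Pistol"], snippets))  # Stamen
```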

    1. Re:Not impressed by Coren22 · · Score: 1

      Doesn't that prove that Google's AI is actually sentient now?

      --
      APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
  6. Teach the test by Otome · · Score: 1

    Actually using these as benchmarks would bring "teach the test" to a whole new level.

  7. Well, that's not so amazing.... by bobbied · · Score: 1, Insightful

    75%? Everybody knows that you get 25% when you just guess randomly... So being able to add another 50 percentage points isn't all that amazing.

    Understand how they do this, though... They have taken the existing study guides and constructed an algorithm that does basic word association. Multiple-choice tests are written to have one right answer, one plausible answer, and two answers that are distractors, designed to trick you. So the trick to multiple choice when you don't know the answer is to identify the distractors and pick from the remaining answers. Armed with their word association, they eliminate the distractors by keeping the answers whose words are most closely associated, in the study guides, with the words of the question, and throwing out the rest. This will easily get you to a pretty good solution, and where it isn't conclusive, it will easily get you to a 50/50 choice.
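    A rough Python sketch of the elimination strategy described above. The association measure (counting study-guide sentences that mention both the choice and a content word from the question), the tiny stopword list, and the study-guide sentences are all hypothetical stand-ins; the comment does not say how Aristo actually measures association.

```python
import random
import re

STOPWORDS = {"a", "an", "the", "of", "which", "what", "is", "does"}  # hypothetical, tiny

def tokens(text, drop_stopwords=False):
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words - STOPWORDS if drop_stopwords else words

def association(question, choice, guide):
    """Count guide sentences that mention the choice and a content word of the question."""
    q, c = tokens(question, drop_stopwords=True), tokens(choice)
    return sum(1 for sentence in guide if c & tokens(sentence) and q & tokens(sentence))

def eliminate_and_guess(question, choices, guide, keep=2, rng=random):
    """Throw out all but the `keep` best-associated choices, then guess among the survivors."""
    # sorted() is stable (even with reverse=True), so tied choices keep their original order.
    ranked = sorted(choices, key=lambda ch: association(question, ch, guide), reverse=True)
    survivors = ranked[:keep]
    return rng.choice(survivors), survivors

guide = [
    "The stamen makes pollen.",
    "The pistil receives pollen.",
    "The filament holds up the anther of the stamen.",
]
guess, survivors = eliminate_and_guess(
    "Which part of the flower makes pollen?",
    ["Stamen", "Pistil", "Filament", "Pistol"],
    guide,
    rng=random.Random(0),
)
print(survivors, guess)
```

    With this toy guide, Stamen and Pistil tie, so the final step is a coin flip between them: the 50/50 situation the comment describes.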

    Problem is, this isn't how people do this. They've not invented a "common sense" way to do this that works for humans, but a way that is more suited to machine "learning". It's about pattern matching and possible associations, not knowledge of 4th grade science. They've not taught the computer 4th grade science, far from it; they've only figured out a way to winnow down the more likely answers in a multiple-choice test, and the computer then just guesses based on probabilities. While this is interesting, it has nothing to do with human "common sense" and is basically pointless.

    Now, if they only would teach KIDS how to take multiple choice tests using similar techniques, THAT would be something worthwhile....

    --
    "File to fit, pound to insert, paint to match" - Aircraft Maintenance 101
    1. Re:Well, that's not so amazing.... by bigdavex · · Score: 1

      Now, if they only would teach KIDS how to take multiple choice tests using similar techniques, THAT would be something worthwhile....

      I don't think it would.

      --
      -Dave
  8. We have several million ... by CaptainDork · · Score: 1

    ... fail at this once a week while watching Are You Smarter Than a 5th Grader?

    --
    It little behooves the best of us to comment on the rest of us.
  9. Outsourcing/Automation by Required+Snark · · Score: 1
    This is clearly a step in outsourcing childhood. Why have real children take a test when both the test taking and the test generation can be automated? The kids don't need to go to school, but can work at home, i.e. play video games and eat junk food.

    Standardized test scores will go up, so the education establishment will look good. Plus they can fire all the teachers and replace them with hourly contract workers on H-1B visas who will work even more unpaid overtime than current teachers. English fluency not required.

    Naturally, there will be no cost savings to the public, because all the profit will go into the pockets of the outsourcing companies. That's what already happened with for-profit higher education, and the public and the students are left holding the bag. But not to worry. Only the taxpayers, students and general investors were screwed. The insiders already got out with their guaranteed profit. It's the American way.

    --
    Why is Snark Required?
    1. Re:Outsourcing/Automation by Joe_Dragon · · Score: 1

      don't forget the K-12 student loans

    2. Re:Outsourcing/Automation by Required+Snark · · Score: 0

      That will be a part of Jeb Bush's education reform.

      --
      Why is Snark Required?
  10. Common sense is dead by Anonymous Coward · · Score: 0

    Common sense is dead. As evidence of this claim, I submit:

    http://news.slashdot.org/story/15/09/09/2110218/researcher-the-us-owes-the-world-4-trillion-for-trashing-the-climate?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Slashdot%2Fslashdot%2Fto+((Title)Slashdot+(rdf))

  11. Test taking skills by Tony+Isaac · · Score: 1

    Test taking skills do not equal common sense.

    Multiple choice tests aren't that hard to pass, even if you don't know the material. Typically, there are four choices. Two are usually so obviously NOT the answer, that they can be easily discounted. Then it's just a matter of guessing which of the remaining two is more likely to be correct.

    If I can use this technique to pass a test on a subject I know nothing about, then a machine certainly doesn't have to have common sense to duplicate the feat.

    1. Re:Test taking skills by moeinvt · · Score: 1

      I don't see how that technique enables you to pass a test on a subject you know nothing about.
      Even assuming you can correctly eliminate two of the choices on 100% of the questions, you're guessing between the remaining two. Over a sufficient number of questions, that technique will therefore tend to result in a score of ~50%, which is typically not a passing grade.

    2. Re:Test taking skills by Tony+Isaac · · Score: 1

      Even if you are taking a test on a subject on which you aren't an expert, you will generally know at least a few of the answers. Those questions raise your chances of getting a passing grade. No, it's not a foolproof method, but my point was that multiple-choice tests are nearly always flawed, and often do more to test a person's test-taking ability than their actual knowledge.
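      The arithmetic behind this exchange, as a small Python sketch under simplifying assumptions: every question has the same number of choices c, eliminated answers are always true distractors, and the remaining guess is uniform. If a fraction k of the questions is known outright and the rest are guessed among c - e remaining choices, the expected score is k + (1 - k)/(c - e).

```python
def expected_score(known, choices=4, eliminated=0):
    """Expected fraction correct: known questions are right, the rest are uniform guesses
    among the choices left after eliminating `eliminated` distractors."""
    return known + (1 - known) / (choices - eliminated)

print(f"{expected_score(0.0):.0%}")                # 25%: pure guessing
print(f"{expected_score(0.0, eliminated=2):.0%}")  # 50%: always down to a 50/50
print(f"{expected_score(0.3, eliminated=2):.0%}")  # 65%: also knowing 30% outright
```

      With two distractors reliably eliminated, reaching a pass mark p requires knowing about 2p - 1 of the answers outright; for a hypothetical 65% pass mark, that is 30%.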

  12. Not very smart... by Anonymous Coward · · Score: 0

    My son's score was 95%. And he was just in grade 2 ;)

  13. If we can implement "common sense" in software... by TaleSpinner · · Score: 1

    ...I would suggest we make electing human beings to public office illegal.