Software Takes On School Science Tests In Search For Common Sense
holy_calamity writes: Making software take school tests designed for human kids can help the quest for machines with common sense, say researchers at the Allen Institute for Artificial Intelligence. They've made software called Aristo that scores 75 percent on the multiple choice questions that make up most of New York State's 4th grade science exam. The researchers are urging other researchers to pit their best software against school tests, too, to provide a way to benchmark progress and spur competition.
Test-taking is a skill, and most test-givers include clues (and even answers) in their tests. Some test-givers, of course, mean to give these clues; many are oblivious to it. Here are some of the bigger lessons from my test-taking classes, if I remember them correctly.
Multiple choice questions, for example (which is what this software answers), often have choices like:
Stamen
Pistil
Filament
Pistol
While some test-givers might include the homophone pistol as a red herring, words like that are a clue that the answer isn't Stamen or Filament, but that you're expected to know how to spell "Pistil."
Similarly, if you read page 2 of a test, you might find more detailed questions regarding the pistil, questions that might spell out exactly what that part of the flower does, solidifying the answer.
Numbers in the middle of ranges are more likely correct, as are exact numbers near general numbers (e.g. Water boils at a. 10, b. 100, c. 200, d. 212, e. 2000)
Long answers, when not absurd, are generally correct.
Middle answers, when not randomized by test software, are more likely to be true.
A pair of similar answers (see above, Pistil, Pistol) generally narrows you down to 50/50.
"Absolutes" in true-false questions are almost always false, and true is more common than false.
Continuity errors like using the wrong article (a/an) often narrow choices.
Some test-writers who don't randomize also don't repeat answers, or never repeat beyond a limit. Patterns may emerge after simple processes reveal some of the clues.
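For anyone who wants to see how mechanical a few of these tricks are, here is a rough Python sketch of three of them: the similar-pair rule, the a/an article check, and the numbers rule. Everything in it is my own guess at how you might code them up. The function names, the 0.8 similarity threshold, the "not divisible by 10 means exact" test, and the frog question are all made up for illustration, and none of it has anything to do with how Aristo works.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pair(choices, threshold=0.8):
    """Return the most similar pair of choices, or None if no pair clears
    the (assumed) similarity threshold."""
    best, best_ratio = None, threshold
    for a, b in combinations(choices, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= best_ratio:
            best, best_ratio = (a, b), ratio
    return best


def article_filter(stem, choices):
    """Keep choices whose first letter agrees with a trailing 'a' or 'an'.

    Crude: it looks at spelling, not sound, so 'hour' or 'unit' would fool it.
    """
    words = stem.rstrip(" _.:?").split()
    last = words[-1].lower() if words else ""
    if last not in ("a", "an"):
        return list(choices)  # no article clue in this question
    wants_vowel = last == "an"
    kept = [c for c in choices if (c[:1].lower() in "aeiou") == wants_vowel]
    return kept or list(choices)  # never eliminate every choice


def numeric_guess(values):
    """Drop the smallest and largest values, then prefer an 'exact' number
    (assumed here to mean not divisible by 10) over round ones."""
    ordered = sorted(values)
    middle = ordered[1:-1] or ordered
    exact = [v for v in middle if v % 10 != 0]
    return exact[0] if exact else middle[len(middle) // 2]


if __name__ == "__main__":
    print(similar_pair(["Stamen", "Pistil", "Filament", "Pistol"]))
    print(article_filter("A frog is an ___",
                         ["amphibian", "reptile", "mammal", "bird"]))
    print(numeric_guess([10, 100, 200, 212, 2000]))
```

On the examples from this comment, the first call singles out Pistil and Pistol as the 50/50 pair, and the last one picks 212 out of the boiling-point choices. None of this knows any botany or physics, which is the point.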
----
After practice in this test-taking class, we all took multiple choice exams on a variety of complex subjects and passed them.