Replacing the Turing Test
mikejuk writes: A plan is afoot to replace the Turing test as a measure of a computer's ability to think. The idea is to hold an annual or biannual Turing Championship consisting of three to five challenging tasks. A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. In his opinion, the Turing Test has reached its expiry date and has become "an exercise in deception and evasion." Marcus points out that the real value of the Turing Test comes from the sense of competition it sparks among programmers and engineers, and it is this that has motivated the new initiative for a multi-task competition. One of the tasks is based on Winograd Schemas. These require participants to grasp the meaning of sentences that humans find easy to understand through their knowledge of the world. One simple example is: "The trophy would not fit in the brown suitcase because it was too big. What was too big?" Another suggestion is for the program to answer questions about a TV show: No existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh. Another is called the "Ikea" challenge and asks robots to cooperate with humans to build flat-pack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate. This, at least, is a useful skill that might encourage us to welcome machines into our homes.
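The trophy example is one half of a Winograd schema pair: change a single "special word" (big to small) and the correct referent of "it" flips from the trophy to the suitcase. That pairing is what makes the task hard to game. The sketch below is hypothetical (the class, its field names and the `most_recent_noun` heuristic are illustrative assumptions, not part of any official challenge); it shows that a shallow heuristic which ignores the special word can only get one sentence of each pair right.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WinogradSchema:
    # Hypothetical representation: a sentence with a {word} slot for the special word
    template: str
    candidates: tuple[str, str]
    # Maps each alternate special word to the correct referent of "it"
    answers: dict[str, str]


SCHEMA = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    candidates=("trophy", "suitcase"),
    answers={"big": "trophy", "small": "suitcase"},
)


def most_recent_noun(sentence: str, candidates: tuple[str, str]) -> str:
    """Naive heuristic: pick the candidate mentioned last before ' it '."""
    before_it = sentence.split(" it ")[0]
    return max(candidates, key=before_it.rfind)


def score(resolver, schema: WinogradSchema) -> float:
    """Fraction of the schema's twin sentences the resolver gets right."""
    correct = 0
    for word, answer in schema.answers.items():
        sentence = schema.template.format(word=word)
        if resolver(sentence, schema.candidates) == answer:
            correct += 1
    return correct / len(schema.answers)


if __name__ == "__main__":
    # The heuristic never looks past "it", so it answers "suitcase" both times
    print(score(most_recent_noun, SCHEMA))
```

Because the two twins have different answers, any resolver that gives the same answer to both sentences scores exactly one half on the pair, no better than a coin flip.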
The problem with the Turing Test is that it's so often done wrong. The test is supposed to be adversarial, with two humans and a computer. One human (the investigator) has two terminals and can communicate with the other human and the computer, but doesn't know which is which. The goal of the computer is to convince the investigator that it is the human. The goal of the second human (the foil) is to convince the investigator that he or she is the human. This is then supposed to be repeated with different investigators and foils, and only when a statistically significant portion of the investigators fail is the test passed by the computer.
Investigators should be trying to find out which one is human, not simply chatting with the computer. Too often, people are simply connected to a chatbot, are not told until after the fact that it might be a computer, and have no foil to compare it against. The test is also often declared passed if even a single investigator fails.
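The gap between "a statistically significant portion of the investigators" and "even a single investigator" can be made concrete. This is a minimal sketch under stated assumptions: each investigator makes one guess, an investigator who is merely guessing picks the human half the time, and the significance level is 5%. The function names and the pass rule are hypothetical illustrations, not a standard Turing Test protocol.

```python
from math import comb

ALPHA = 0.05  # assumed significance level


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that k or more of
    n investigators pick out the human if each one is merely guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def passes_if_any_fail(correct: int, trials: int) -> bool:
    """The sloppy rule: one fooled investigator is enough for a 'pass'."""
    return correct < trials


def passes_statistically(correct: int, trials: int, alpha: float = ALPHA) -> bool:
    """Hypothetical rule: the computer passes only if the investigators,
    taken together, do not identify the human significantly better than chance."""
    return p_at_least(correct, trials) >= alpha


if __name__ == "__main__":
    correct, trials = 15, 20
    print(passes_if_any_fail(correct, trials))    # True
    print(round(p_at_least(correct, trials), 4))  # 0.0207
    print(passes_statistically(correct, trials))  # False
```

With 20 investigators of whom 15 correctly pick out the human, the single-failure rule declares a pass, yet the chance of 15 or more correct guesses by luck alone is about 2%, so the computer was in fact distinguished. A non-significant result only means the investigators did not detectably beat chance; a careful trial would also need enough investigators for that result to mean much.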
Not a sentence!
TFA repeats a common misconception about the Turing Test. It is not a test of whether an AI can fool an average person, but of whether it can fool an expert. ELIZA would never fool an AI expert, because that expert would be well aware that even a simple algorithm can be quite good at generating vacuous chit-chat. Pronoun disambiguation is a good test, because AI does it poorly and humans do it well. But that is not a replacement for the Turing Test; that IS the Turing Test. Using humor is a good way to distinguish AI from humans. As anyone who has learned a foreign language, or raised children, knows, "getting jokes" is one of the last skills mastered. Humor often requires not only knowledge about the physical world but also a deep understanding of cultural nuances. But I am not sure how useful that is, since no current AI would come close to passing it, and understanding jokes is probably not the most economically useful target for current AI research.