Slashdot Mirror


Replacing the Turing Test

mikejuk writes: A plan is afoot to replace the Turing test as a measure of a computer's ability to think. The idea is for an annual or bi-annual Turing Championship consisting of three to five different challenging tasks. A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. His opinion is that the Turing Test has reached its expiry date and has become "an exercise in deception and evasion." Marcus points out that the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers, which has motivated the new initiative for a multi-task competition. One of the tasks is based on Winograd Schemas. This requires participants to grasp the meaning of sentences that are easy for humans to understand through their knowledge of the world. One simple example is: "The trophy would not fit in the brown suitcase because it was too big. What was too big?" Another suggestion is for the program to answer questions about a TV program: No existing program -- not Watson, not Goostman, not Siri -- can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh. Another is called the "Ikea" challenge and asks for robots to co-operate with humans to build flat-pack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate. This at least is a useful skill that might encourage us to welcome machines into our homes.
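As a rough illustration of the Winograd Schema task described above, here is a minimal Python sketch of how such items might be stored and scored. The class and function names are hypothetical, and the "too small" twin of the trophy sentence is an addition that does not appear in the summary; it is included because these schemas are usually written in pairs where changing one word flips the answer.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradItem:
    """One pronoun-disambiguation question (hypothetical structure)."""
    sentence: str
    question: str
    candidates: Tuple[str, str]
    answer: str


# A resolver is any function that picks one candidate for an item.
Resolver = Callable[[WinogradItem], str]


def accuracy(items: Sequence[WinogradItem], resolve: Resolver) -> float:
    """Fraction of items for which the resolver returns the right referent."""
    correct = sum(1 for item in items if resolve(item) == item.answer)
    return correct / len(items)


ITEMS = [
    WinogradItem(
        sentence="The trophy would not fit in the brown suitcase because it was too big.",
        question="What was too big?",
        candidates=("the trophy", "the suitcase"),
        answer="the trophy",
    ),
    # Assumed twin sentence: one word changes and the referent flips.
    WinogradItem(
        sentence="The trophy would not fit in the brown suitcase because it was too small.",
        question="What was too small?",
        candidates=("the trophy", "the suitcase"),
        answer="the suitcase",
    ),
]


def first_candidate(item: WinogradItem) -> str:
    """Surface heuristic: always pick the first noun phrase mentioned."""
    return item.candidates[0]


print(accuracy(ITEMS, first_candidate))  # 0.5
```

Because each pair flips its answer on a single word, a resolver that ignores meaning lands at chance, which is the property that makes the task a test of world knowledge rather than of word order.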

8 of 129 comments

  1. Re:TL;DR People doesn't understand the Turing test by DMUTPeregrine · · Score: 5, Insightful

    The problem with the Turing Test is that it's so often done wrong. The test is supposed to be adversarial, with two humans and a computer. One human (the investigator) has two terminals and can communicate with the other human and the computer, but doesn't know which is which. The goal of the computer is to convince the investigator that it is the human. The goal of the second human (the foil) is to convince the investigator that he or she is the human. This is then supposed to be repeated with different investigators and foils, and only when a statistically significant portion of the investigators fail is the test passed by the computer.

    Investigators should be trying to find which one is human, not simply chatting with the computer. Too often people are simply connected to a chatbot and not told that it might be a computer until after the fact, no foil is involved, etc. The test is also often declared to be passed if even a single investigator fails.
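One way to make "statistically significant" concrete is an exact one-sided binomial test against coin-flip guessing. The sketch below is only one possible reading of the protocol described above: the function names, the 0.05 threshold, and the session counts are all assumptions chosen for illustration, not anything specified in Turing's paper or in this comment.

```python
from math import comb


def binomial_tail(trials: int, successes: int, p: float = 0.5) -> float:
    """Probability of at least `successes` hits in `trials` tries, each with chance p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def investigators_beat_chance(correct: int, trials: int, alpha: float = 0.05) -> bool:
    """True if investigators identify the human more often than guessing would explain."""
    return binomial_tail(trials, correct) < alpha


# Hypothetical results: 30 sessions, each with a fresh investigator and foil.
print(investigators_beat_chance(correct=27, trials=30))  # True
print(investigators_beat_chance(correct=16, trials=30))  # False

# A single session cannot reach significance, whatever its outcome.
print(investigators_beat_chance(correct=1, trials=1))  # False
```

Under these assumptions, 27 correct identifications out of 30 means the machine is reliably spotted, while 16 out of 30 is indistinguishable from guessing. The last line shows why one fooled (or unfooled) investigator says almost nothing on its own.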

    --
    Not a sentence!
  2. Re:TL;DR People doesn't understand the Turing test by ShanghaiBill · · Score: 5, Insightful

    TFA repeats a common misconception about the Turing Test. It is not a test of whether an AI can fool an average person, but whether it can fool an expert. ELIZA would never fool an AI expert, because that expert would be well aware that even a simple algorithm can be quite good at generating vacuous chit-chat. The pronoun disambiguation is a good test, because AI does that poorly, and humans do it well. But that is not a replacement for the Turing Test, that IS the Turing Test. Using humor is a good way to distinguish AI from humans. As anyone who has learned a foreign language, or raised children, knows, "getting jokes" is one of the last skills mastered. Humor often requires not only knowledge about the physical world, but deep understanding of cultural nuances. But I am not sure how useful that is, since no current AI would come close to passing it, and understanding jokes is probably not the most economically useful target for current AI research.

  3. Give it a pile of junk by jimmydevice · · Score: 2

    And tell it to make something useful.
    Virtual junk is okay and any virtual tools can be used.

  4. Of course there's a movement to replace it. by hey! · · Score: 2

    And of course there should be. But that doesn't diminish the importance of the Turing test.

    The Turing test has two huge and closely related advantages: (1) it is conceptually simple and (2) it takes no philosophical position on the fundamental nature of "intelligence". That such huge advantages necessarily entail certain disadvantages should come as no surprise to anyone.

    The Turing test treats intelligence as a black box, but once you've contrived to pass the test the next logical step is to open up that black box and ask whether it's reasonable to consider what's inside "intelligent" or a tricky gimmick. That's a messy question, and that's *why* something like the Turing test is valuable. It is what social scientists call an "operational definition"; it allows us to explore an idea we haven't quite nailed down yet, which is a reasonable first step toward creating a useful *conceptual* definition. Science builds theories inductively from observations, after all.

    If the Turing test were a suitable *conceptual* definition of intelligence, then an intelligent agent would never fail it, but we know that can and does happen. We have to assume as well that people can be fooled by something nobody would really call "intelligence". Stage magicians do this all the time by manipulating audience expectations and assumptions.

    Think of the Turing test as a screening test. Science is full of such procedures -- simple, cheap tests that aren't definitive but allow us to discard events that are probably not worth putting a lot of effort into.

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
  5. Re:Why a human in the IKEA challenge? by itzly · · Score: 2

    The extra human is to see whether the computer will end up in an emotional argument with the human about the best way to interpret the instructions.

  6. How about replacing it with the ORIGINAL Test by aix+tom · · Score: 2

    The original Turing Test, published in "Computing Machinery and Intelligence" as the "Imitation Game," was not about whether a machine could successfully pretend to be a human.

    He proposed a test in which a computer and men both pretend to be women, and the test is passed if the computer is more successful at lying about being a woman than the men are.

    http://en.wikipedia.org/wiki/T...

  7. Re:When it's funny? by Sarten-X · · Score: 4, Funny

    The most common underlying basis of humor is subverted expectations. We expect people to behave according to the norms of society, we expect people to act to the best of their intelligence, we expect misfortune to be avoided, and we expect that words will be used according to their common meanings.

    Subvert any of those expectations, and you have various kinds of humor. How funny a particular joke is perceived to be is related to how strongly the viewer is attached to their expectations. Since a computer is only an expert in the things it has been explicitly exposed to, it's very difficult to subvert its expectations. Watson would be familiar with all of the meanings of each word in a script, for example, so it would have a difficult time identifying the usual meaning that a human would expect from a situation, and would therefore likely fail to notice that when a different meaning was used, it was an attempt at humor.

    As another example, consider a military comedy, like Good Morning, Vietnam. Much of the humor is derived from Robin Williams' fast-paced ad-lib radio show contrasting with the rigid military structure, and from the inversion where a superior at the radio station is practically inferior in every way. A computer, properly educated in the norms of military behavior, might recognize that the characters' behaviors are contrary to expectations, but then to really understand the jokes, the computer must also have an encyclopedic knowledge of pop culture from the period to understand why Williams' antics were more than just absurd drivel.

    Finally, a computer must also understand that humor is also based largely on the history of humor. Age-old jokes can become funny again simply because they aren't funny in their original context any more, so their use in a new context is a subverted expectation in itself. Common joke patterns have also become fixed in human culture, such that merely following a pattern (like the Russian Reversal) is a joke in itself.

    Algorithms simply haven't combined all of the relevant factors yet to recognize humor. Here on Slashdot, for instance, a computer would need to recognize the intellectual context, the pacing of a comment, the pattern of speech, and even how much class a commenter maintains, in order to realize when someone is trying to be funny.

    Poop.

    --
    You do not have a moral or legal right to do absolutely anything you want.
  8. Re:TL;DR People doesn't understand the Turing test by AthanasiusKircher · · Score: 4, Insightful

    The pronoun disambiguation is a good test, because AI does that poorly, and humans do it well. But that is not a replacement for the Turing Test, that IS the Turing Test.

    Indeed. Here's an excerpt from Turing's original paper describing the "imitation game," in which he replies to a possible objection that his test could not gauge true understanding of the kind a human might have:

    Probably [the objector to the test] would be quite willing to accept the imitation game as a test. The game (with the player B omitted) is frequently used in practice under the name of viva voce to discover whether some one really understands something or has "learnt it parrot fashion." Let us listen in to a part of such a viva voce:

    Interrogator: In the first line of your sonnet which reads "Shall I compare thee to a summer's day," would not "a spring day" do as well or better?

    Witness: It wouldn't scan.

    Interrogator: How about "a winter's day," That would scan all right.

    Witness: Yes, but nobody wants to be compared to a winter's day.

    Interrogator: Would you say Mr. Pickwick reminded you of Christmas?

    Witness: In a way.

    Interrogator: Yet Christmas is a winter's day, and I do not think Mr. Pickwick would mind the comparison.

    Witness: I don't think you're serious. By a winter's day one means a typical winter's day, rather than a special one like Christmas.

    And so on. What would Professor Jefferson say if the sonnet-writing machine was able to answer like this in the viva voce? I do not know whether he would regard the machine as "merely artificially signalling" these answers, but if the answers were as satisfactory and sustained as in the above passage I do not think he would describe it as "an easy contrivance."

    THAT is the sort of standard of AI that Turing was envisioning could be passed in his "test." It isn't a computer pretending to be a non-responsive teenager with an attitude problem who doesn't really speak the same language as the interrogator (as some chatbots might claim).

    It's an idea of AI as something that could debate word replacement in a Shakespearean sonnet, would understand and be able to process poetic scansion, understand the subtle word meanings and connotations in language, and be able to synthesize these various things together while applying such concepts to evaluations of classic literary references.

    Turing's test then assumes an AI competent enough to have a flawless conversation on the level of a bright university student or even a colleague of Turing's. Now, granted, we might find the literature quiz a little unnecessary, but in a more general sense this example gets at the idea of probing the AI's understanding of concepts, connecting disparate uses of things together (like a literary character to an abstract concept to a matter of style or poetic form), and in general a fluent and adaptive recognition of linguistic meaning.

    I think we would all agree that the various chatbots that have claimed in recent years to have "passed the Turing test" are NOWHERE near this level.

    This is the kind of standard Turing himself explicitly mentioned in his original article on the test. And frankly, if I encountered an AI that could have a conversation this fluid and wide-ranging (even if not on literature specifically) in flawless English, I'd be happy to declare it "intelligent." But we don't have anything close to that -- and pretending the "Turing test" is obsolete and needs to be more strict is misunderstanding the ridiculously high expectations Turing himself set out many decades ago.