Replacing the Turing Test
mikejuk writes A plan is afoot to replace the Turing test as a measure of a computer's ability to think. The idea is for an annual or bi-annual Turing Championship consisting of three to five different challenging tasks. A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. His opinion is that the Turing Test has reached its expiry date and has become "an exercise in deception and evasion." Marcus points out that the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers, which has motivated the new initiative for a multi-task competition. One of the tasks is based on Winograd Schemas. This requires participants to grasp the meaning of sentences that are easy for humans to understand through their knowledge of the world. One simple example is: "The trophy would not fit in the brown suitcase because it was too big. What was too big?" Another suggestion is for the program to answer questions about a TV program: No existing program -- not Watson, not Goostman, not Siri -- can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh. Another is called the "Ikea" challenge and asks for robots to co-operate with humans to build flat-pack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate. This at least is a useful skill that might encourage us to welcome machines into our homes.
The Turing test was a CONCEPT, not an actual test.
"The trophy would not fit in the brown suitcase because it was too big. What was too big?"
If you change this to "The trophy would not fit in the brown suitcase snugly because it was too big" I wouldn't be able to answer it, either.
When the copyright term is "forever minus a day", live every day like it's the last.
I like the idea of the IKEA challenge but why include a human? I would think having a robot
open a box, pull out the instructions, and assemble the piece of furniture would be huge.
Having a person involved just muddles the issue. You obviously might have to start with
simple furniture but this seems like a worthwhile challenge as assembling furniture at
times can even stump humans.
Clever programming and mechanics do not make "AI" and human "robots." Interesting machines, but nothing more. Nature is not an idiot.
E Proelio Veritas.
That's fair. However, the article is fair in its opinion too.
What was good in 1950 may not be so relevant anymore.
The base of the test is probably fine. But an updated one for things we want an AI to do today is a good idea.
Much like the ACID tests for browsers. They help set the bar for what we want out of our computers.
Right now most 'AI' is brute force depth searching with some statistical weighting. Is that AI?
The difference is that the "swiftboaters" were lying and ended up getting sued.
Really -- someone suggests a computer program could identify when to laugh at a sitcom? When humans are likely to disagree rather strongly about which parts are the funniest? Heck, even Mycroft's first jokes were on the weak side of humor. It took a lot of coaching from the humans to get (his) jokes classy.
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
The problem with the Turing Test is that it's so often done wrong. The test is supposed to be adversarial, with two humans and a computer. One human (the investigator) has two terminals and can communicate with the other human and the computer, but doesn't know which is which. The goal of the computer is to convince the investigator that it is the human. The goal of the second human (the foil) is to convince the investigator that he or she is the human. This is then supposed to be repeated with different investigators and foils, and only when a statistically significant portion of the investigators fail is the test passed by the computer.
Investigators should be trying to find which one is human, not simply chatting with the computer. Too often people are simply connected to a chatbot and not told that it might be a computer until after the fact, no foil is involved, etc. The test is also often declared to be passed if even a single investigator fails.
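To put a rough number on "statistically significant", here is a minimal sketch, assuming each session ends with one investigator naming which terminal is the human. The function name and the choice of a Wilson score interval are my own illustration, not anything from Turing's paper. It shows how little a single fooled investigator tells you compared with a hundred sessions.

```python
from math import sqrt


def fooled_rate_interval(fooled: int, sessions: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (roughly 95% for z=1.96) for the share of
    investigators who picked the machine as the human."""
    if sessions <= 0:
        raise ValueError("need at least one session")
    p = fooled / sessions
    denom = 1 + z * z / sessions
    centre = (p + z * z / (2 * sessions)) / denom
    half = z * sqrt(p * (1 - p) / sessions + z * z / (4 * sessions ** 2)) / denom
    # Clamp to [0, 1] so rounding never produces an impossible rate.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical tallies: (investigators fooled, sessions run).
    for fooled, sessions in [(1, 1), (10, 30), (50, 100)]:
        lo, hi = fooled_rate_interval(fooled, sessions)
        print(f"{fooled}/{sessions} fooled: plausible rate {lo:.2f} to {hi:.2f}")
```

With one session the interval covers most of the range from 0 to 1, which is why "one investigator got it wrong" is not a pass. Where to draw the passing line on the fooled rate is a separate decision that this sketch does not make.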
Not a sentence!
Computers have pooped on me LOTS of times.
TFA repeats a common misconception about the Turing Test. It is not a test of whether an AI can fool an average person, but whether it can fool an expert. ELIZA would never fool an AI expert, because that expert would be well aware that even a simple algorithm can be quite good at generating vacuous chit-chat. The pronoun disambiguation is a good test, because AI does that poorly, and humans do it well. But that is not a replacement for the Turing Test, that IS the Turing Test. Using humor is a good way to distinguish AI from humans. As anyone who has learned a foreign language, or raised children, knows, "getting jokes" is one of the last skills mastered. Humor often requires not only knowledge about the physical world, but deep understanding of cultural nuances. But I am not sure how useful that is, since no current AI would come close to passing it, and understanding jokes is probably not the most economically useful target for current AI research.
An AI to add a laugh track to the Simpsons so you'll know when there has been a joke.
Sheesh, evil *and* a jerk. -- Jade
It seems like the startup investors would get sucked in then. Way more cool to be 2.0 than 1.0.
And tell it to make something useful.
Virtual junk is okay and any virtual tools can be used.
You don't need two terminals. All you need is the human interrogator, and have him/her talk to either a human or a computer. I agree that they must be aware of the challenge, and also have the ability to ask some decent questions.
And of course there should be. But that doesn't diminish the importance of the Turing test.
The Turing test has two huge and closely related advantages: (1) it is conceptually simple and (2) it takes no philosophical position on the fundamental nature of "intelligence". That such huge advantages necessarily entail certain disadvantages should come as no surprise to anyone.
The Turing test treats intelligence as a black box, but once you've contrived to pass the test the next logical step is to open up that black box and ask whether it's reasonable to consider what's inside "intelligent" or a tricky gimmick. That's a messy question, and that's *why* something like the Turing test is valuable. It is what social scientists call an "operational definition"; it allows us to explore an idea we haven't quite nailed down yet, which is a reasonable first step toward creating a useful *conceptual* definition. Science builds theories inductively from observations, after all.
If the Turing test were a suitable *conceptual* definition of intelligence then an intelligent agent would never fail it, but we know that can and does happen. We have to assume as well that people can be fooled by something nobody would really call "intelligence". Stage magicians do this all the time by manipulating audience expectations and assumptions.
Think of the Turing test as a screening test. Science is full of such procedures -- simple, cheap tests that aren't definitive but allow us to discard events that are probably not worth putting a lot of effort into.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
There are no "current AIs", as that would require them to be intelligent, which in turn requires them to be conscious. And they aren't. Not even close.
If we really want to water the term "intelligence" down so that it applies to any clever algorithm, then perhaps we should spend a few minutes contemplating what it is we plan to call an actual intelligence, once we get that far.
"AI" has reasonable meaning in the context of research towards that goal, or as a bright line in the sand that we have yet to reach, much less cross.
I've fallen off your lawn, and I can't get up.
understanding jokes is probably not the most economically useful target for current AI research.
A joke detector? That's funny. I mean a sarcasm detector? That's real useful.
What has become of those compression tests? Wasn't the answer to AI (at least partially) to be found in the ability to compress?
Religion is what happens when nature strikes and groupthink goes wrong.
A test of intelligence should be dealing with unforeseen input. The problem with chatbots is that they are just giving pre-planned responses. How about trying to land a rocket on the moon while being bombarded with spurious input from a radar device that was accidentally left on? Given the computers in use by NASA in 1969 that's pretty intelligent behavior.
Another would be landing a rocket on a small floating platform. We'll see how that plays out tonight.
That's all we need. Computers with a sense of humour:
"Oops! I deleted all your files!"
"Just kidding. I moved them to your Documents folder. :P"
I do not fail; I succeed at finding out what does not work.
The Turing Test has been abused, bypassed, and cheated to the point that almost no one knows what the actual Turing Test is. At this point, a new test needs to be created, a test that is difficult to cheat without making it obvious that it's not the real test. This could be "The Real Turing Test administered by [reputable group]".
Or we could make a new test, with incredibly explicit criteria that no one can nerf with a straight face and a different name. But from the sounds of it, it would be an easier test.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
The original Turing Test, published in "Computing Machinery and Intelligence" as the "Imitation Game," was not about whether a machine could successfully pretend to be a human.
He proposed a test where a computer and men both pretended to be women, and the test would be passed if the computer were more successful at lying about being a woman than the men were.
http://en.wikipedia.org/wiki/T...
I don't see that as a problem with the test itself.
I see that as various individuals trying to cheat in order to claim that they have achieved something they have not.
Suppose someone claimed to have beaten the world record for the 200 meter dash. But could only do it with a 190 meter headstart.
Okay, no headstart but I get to use a motorcycle.
Okay, okay, no headstart and no motorcycle but I will be using "meters" that are 10cm long.
No one would bother reporting on those because those are STUPID.
But the equivalent claims can be made about "beating" the Turing Test because the people reporting on it are STUPID. As you've pointed out, the test itself is easy to set up and easy to verify. There is no problem with the test.
A long time ago I used to work in the field of AI (expert systems and neural nets). It frustrates and pisses me off no end how frequently the press and even a lot of IT people fail to understand what is essentially a straightforward test and then complain about it being inadequate for modern computing. No computer has passed it, not even close. The test REQUIRES a human and a computer. The test REQUIRES the expert to be aware that one is human and the other a computer. The test REQUIRES that they then get to interrogate both to try to discover which is which (not ask one question, not read a piece of generated text and then try to guess; they get to question them for a considerable time). The test REQUIRES this to be done many times, with differing test subjects, to get a statistically significant sample. The test is as relevant today as it was when it was devised.
I quite liked how they handled it in the recent film "The Machine" ( http://www.imdb.com/title/tt23... ). Questions like "Which smells better, a hospital corridor or a donkey's ass?" and "Mary saw a puppy in the window and she wanted it. What did Mary want?"
I am a viral sig. Please copy me and help me spread. Thank you.
Actually, I think we do. We at least have an actual model, free of woo-woo, for which no counter evidence has been brought forth as yet.
Even the low level stuff seems to finally be yielding some clarity.
I've fallen off your lawn, and I can't get up.
No, they have to talk to both a human (trying to convince the investigator that he/she is human) and a computer. Removing the foil means it's not the Turing test anymore, it's a very different test.
Not a sentence!
You are correct, I should have said it's a problem with the popular conception of the Turing Test. The popular descriptions in the media are quite unlike the test described by Turing.
Not a sentence!
Bullshit. http://www.artificial-intelligence.com/comic/7
People's vanity about human exceptionalism has them move the goal-posts as required to preserve their sense of identity at the top of the food chain.
AI is already able to beat people at most isolated tasks. With ROS uniting the disparate fractured efforts under a single framework, the inefficiency of researchers working on problems in isolation from each other has been solved. There's a standard now, and a couple versions of jQuery from now: your personal shopper sales AI will be loading your psychological profile push-buttons as a cookie and monetizing the fuck out of human frailty.
H1B visas will be able to replace the human touch of the retail experience in under 10 lines of code. The singularity is already here, most people are just too blind to recognize what's staring them in the face. The machines will have us by the balls long before they start cackling like a super villain from a movie. We're already working for them in the same way alcoholics work for the bottle.
It won't be long until every written word on the internet is linguistically fingerprinted, identifying the author better than an IP address. All the sock-puppets will fall off, and a search engine like archive.org will allow you to track down every word written online by anyone, given only a writing sample.
The cylons won. Humans are on the retreat.
The Turing Test is a thought experiment. It's just saying "if you can talk to this, and can't tell if it's a person or a computer, then it doesn't matter: it's intelligent." It's not a method for a scientific, practical process. It's just something to think about when considering what might constitute intelligence.
"The WOPR spends all of its time thinking about [Turing Tests]. 24 hours a day, 365 days a year, it plays an endless series of [Turing test 'games'], using all available information on the state of [human sentience]. It has already proved the existence of [machine intelligence] as a game, time and time again. It estimates human and machine responses to our test responses to their responses, and so on. Estimates probabilities, tallies the score, and it looks for ways to ---"
"The point is, key decisions of every available option in determining [the presence of Artificial Intelligence] have already been made by the WOPR."
"So what you're really telling me is all this trillion dollar hardware is really at the mercy of those men with the little brass keys...?"
"That's exactly right. Whose only problem is that they are human beings. In 30 days, we could upgrade the Turing Test scoring process with electronic relays. Get the men out of the loop."
Which... as it would seem... we might all welcome, I for one.
And then, 150,000 years later...
<blink>down the rabbit hole</blink>
"Describe in single words, only the good things that come into your mind. About your mother."
Have gnu, will travel.
Exactly! The actual Turing test is a great test, but the common modifications remove its ability to determine anything of interest.
Not a sentence!
The pronoun disambiguation is a good test, because AI does that poorly, and humans do it well. But that is not a replacement for the Turing Test, that IS the Turing Test.
Indeed. Here's an excerpt from Turing's original paper that described the "imitation game," replying to a possible objection that his test would not be able to be used to gauge true understanding as a human might:
Probably [the objector to the test] would be quite willing to accept the imitation game as a test. The game (with the player B omitted) is frequently used in practice under the name of viva voce to discover whether some one really understands something or has "learnt it parrot fashion." Let us listen in to a part of such a viva voce:
Interrogator: In the first line of your sonnet which reads "Shall I compare thee to a summer's day," would not "a spring day" do as well or better?
Witness: It wouldn't scan.
Interrogator: How about "a winter's day," That would scan all right.
Witness: Yes, but nobody wants to be compared to a winter's day.
Interrogator: Would you say Mr. Pickwick reminded you of Christmas?
Witness: In a way.
Interrogator: Yet Christmas is a winter's day, and I do not think Mr. Pickwick would mind the comparison.
Witness: I don't think you're serious. By a winter's day one means a typical winter's day, rather than a special one like Christmas.
And so on. What would Professor Jefferson say if the sonnet-writing machine was able to answer like this in the viva voce? I do not know whether he would regard the machine as "merely artificially signalling" these answers, but if the answers were as satisfactory and sustained as in the above passage I do not think he would describe it as "an easy contrivance."
THAT is the sort of standard of AI that Turing was envisioning could be passed in his "test." It isn't a computer pretending to be a non-responsive teenager with an attitude problem who doesn't really speak the same language as the interrogator (as some chatbots might claim).
It's an idea of AI as something that could debate word replacement in a Shakespearean sonnet, would understand and be able to process poetic scansion, understand the subtle word meanings and connotations in language, and be able to synthesize these various things together while applying such concepts to evaluations of classic literary references.
Turing's test then assumes an AI competent enough to have a flawless conversation on the level of a bright university student or even a colleague of Turing's. Now, granted, we might find the literature quiz a little unnecessary, but in a more general sense this example gets at the idea of probing the AI's understanding of concepts, connecting disparate uses of things together (like a literary character to an abstract concept to a matter of style or poetic form), and in general a fluent and adaptive recognition of linguistic meaning.
I think we would all agree that the various chatbots that have claimed in recent years to have "passed the Turing test" are NOWHERE near this level.
This is the kind of standard Turing himself explicitly mentioned in his original article on the test. And frankly, if I encountered an AI that could have a conversation this fluid and wide-ranging (even if not on literature specifically) in flawless English, I'd be happy to declare it "intelligent." But we don't have anything close to that -- and pretending the "Turing test" is obsolete and needs to be more strict is misunderstanding the ridiculously high expectations Turing himself set out many decades ago.
It is not a test of whether an AI can fool an average person, but whether it can fool an expert.
You are not allowed to redefine the test just because it makes you more comfortable to do so. The original paper simply said "A man, a woman, and an interrogator". It did not qualify that interrogator as an expert, but simply the one who poses the questions (thus, an interrogator)
Well, please re-read the original paper.
You are correct that the original test did not specify an AI expert as interrogator. On the other hand, read the types of dialogue Turing offers as examples. It's very clear that he is imagining "interrogators" (note that word -- it implies someone with a strong drive to ask probing questions) who are not only quite intelligent but also keep asking very probing questions designed to test the intellect of the person/thing on the other side.
The standard is clearly NOT, "Gee, can I have a nice small talk conversation?" Instead, the "interrogator" uses questions varying from computational problems to chess problems to questions about composing a sonnet to detailed discussion of subtle linguistic meanings in English, related in abstract ways to classic literature.
That doesn't sound like your "average Joe" interrogator to me. Does it to you? I'm sure Turing didn't expect all his interrogators to be so intelligent, but they were clearly expected (based on his sample dialogues) to understand how to probe intelligence at a pretty sophisticated level.
"Mary saw a puppy in the window and she wanted it. What did Mary want?"
An ambiguous subject in a phrase is a classic problem in AI. However, natural language algorithms (such as the one found in Watson) have been able to resolve ambiguous statements like your "Mary" example and the trophy/suitcase example in TFS for over a decade now. The trick to resolving such ambiguities is the same one used by humans: context, probability, and lots of prior examples.
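For anyone wondering how such a question set gets scored, here is a small sketch. It assumes the usual twin of the TFS sentence (swap "too big" for "too small" and the answer flips), which is not quoted in TFS; the class, the resolver interface, and the baseline are all made-up names for illustration. The point is that a resolver with no world knowledge, one that always picks the first noun, gets exactly one of the pair right.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SchemaItem:
    sentence: str
    candidates: tuple[str, str]
    answer: str


# Hypothetical two-item set: the TFS sentence plus its "too small" twin.
TROPHY_PAIR = (
    SchemaItem(
        "The trophy would not fit in the brown suitcase because it was too big.",
        ("the trophy", "the suitcase"),
        "the trophy",
    ),
    SchemaItem(
        "The trophy would not fit in the brown suitcase because it was too small.",
        ("the trophy", "the suitcase"),
        "the suitcase",
    ),
)

# A resolver is any function that maps an item to one of its candidates.
Resolver = Callable[[SchemaItem], str]


def accuracy(resolver: Resolver, items: tuple[SchemaItem, ...]) -> float:
    """Fraction of items where the resolver names the right referent."""
    return sum(resolver(item) == item.answer for item in items) / len(items)


def first_mentioned(item: SchemaItem) -> str:
    """Baseline with no world knowledge: always pick the first candidate."""
    return item.candidates[0]


if __name__ == "__main__":
    print(accuracy(first_mentioned, TROPHY_PAIR))
```

Any real system would plug in as another `Resolver`; the pairing just makes sure a fixed guess cannot beat coin-flipping.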
And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
The Turing Test is a thought experiment. It's just saying "if you can talk to this, and can't tell if it's a person or a computer, then it doesn't matter: it's intelligent." It's not a method for a scientific, practical process.
If that's true, then why did Turing claim in his original paper that by the year 2000, computers would be able to fool humans and "pass the test" 30% of the time? Why state such a specific prediction for a test that was not intended to be practical and only a "thought experiment"?
It's just something to think about when considering what might constitute intelligence.
Why can't it be both? In Turing's time (and still today) there were (and are) people who think real strong human-like AI is impossible. In order to evaluate "intelligence," though, we need a standard test that we could agree on. Turing attempted to roughly define the outlines of such a test, which also involved a lot of philosophical debate. On the other hand, he predicted within 50 years of his paper that computers would be around which could pass this test, which suggests that he thought it was in fact a practical (if a little vague) way of gauging progress in AI.
A rigorous definition of general intelligence now exists and has been applied by the Deep Mind folks. See this video lecture by Deep Mind's Shane Legg at Singularity Summit 2010 on a new metric for measuring machine intelligence.
If you want something more accessible to the general public, The Hutter Prize for Lossless Compression of Human Knowledge has the same theoretic basis as the test used by Deep Mind and has the virtue that it uses a natural language criterion, in the form of a Wikipedia snapshot. If the 100M snapshot of Wikipedia used by the Hutter Prize is no longer challenging enough, then substitute Matt Mahoney's Large Text Compression Benchmark which is basically just the Hutter Prize enlarged by an order of magnitude.
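To get a feel for the metric, here is a rough sketch that measures compressed size in bits per character using only Python's standard-library compressors. This is not the contest's scoring procedure (as far as I know the real benchmarks also count the size of the decompressor), and the function name is my own; it only illustrates the idea that a better model of the text yields a smaller output.

```python
import bz2
import lzma
import zlib


def bits_per_char(text: str) -> dict[str, float]:
    """Compressed size in bits per input character for three stdlib compressors."""
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("empty text")
    sizes = {
        "zlib": len(zlib.compress(raw, 9)),
        "bz2": len(bz2.compress(raw, 9)),
        "lzma": len(lzma.compress(raw)),
    }
    # Lower is better: fewer bits needed to reproduce each character exactly.
    return {name: 8 * size / len(text) for name, size in sizes.items()}


if __name__ == "__main__":
    # Hypothetical sample; point this at a real text file for a meaningful number.
    sample = "The trophy would not fit in the brown suitcase. " * 200
    for name, bpc in bits_per_char(sample).items():
        print(f"{name}: {bpc:.3f} bits per character")
```

Highly repetitive text like the sample compresses to almost nothing; ordinary prose is much harder, and the gap between general-purpose compressors and the best entries is the part that is supposed to reflect understanding.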
Seastead this.
To easily detect most AI, tell it this:
The Turing Test was set up as a three-entity interaction: one questioner, one human, and one AI. The questioner is supposed to converse with both the human and the AI (presumably by typing and reading messages), and decide which of the others is the human and which the AI. There was no mention of expertise in any field, and it would be hard for Turing to put that in since there were no AI experts in Turing's day.
Two of the questions could be put into the Turing test easily: the pronoun assignment one and the when-to-laugh one, although the latter would have to be in reference to something the questionee claimed to have seen. The assembly one couldn't be part of it, but is a good test.
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes
Turing's test was about the ability to imitate human behavior/knowledge. The real question we need to answer I will call the Mycroft test. The purpose of the test is to determine if the program has earned the right to not be turned off, that is, does it have a right to a trial before it is "terminated"? A program that has earned that right has crossed the blurry line between inanimate and "human" in a way that should be important to us. Defining a test that can measure this is at the heart of deciding what makes us us, vs what makes us tick.
"There is no god but allah" - well, they got it half right.
Let me see if I've got this straight. If you can watch an episode of the Simpsons and know when to laugh, then you're intelligent.
Or at least a real person.
Better go with answer number two. Doh!