Replacing the Turing Test
mikejuk writes A plan is afoot to replace the Turing test as a measure of a computer's ability to think. The idea is for an annual or bi-annual Turing Championship consisting of three to five different challenging tasks. A recent workshop at the 2015 AAAI Conference on Artificial Intelligence was chaired by Gary Marcus, a professor of psychology at New York University. His opinion is that the Turing Test has reached its expiry date and has become "an exercise in deception and evasion." Marcus points out: the real value of the Turing Test comes from the sense of competition it sparks amongst programmers and engineers which has motivated the new initiative for a multi-task competition. The one of the tasks is based on Winograd Schemas. This requires participants to grasp the meaning of sentences that are easy for humans to understand through their knowledge of the world. One simple example is: "The trophy would not fit in the brown suitcase because it was too big. What was too big?" Another suggestion is for the program to answer questions about a TV program: No existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh. Another is called the "Ikea" challenge and asks for robots to co-operate with humans to build flat-pack furniture. This involves interpreting written instructions, choosing the right piece, and holding it in just the right position for a human teammate. This at least is a useful skill that might encourage us to welcome machines into our homes.
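For readers who want the Winograd schema task from the summary made concrete, here is a minimal sketch in Python. It assumes the usual formulation in which one "special word" is swapped (big/small) and the pronoun's referent flips with it; the "small" variant does not appear in the summary. The class name, the baseline resolver, and the scoring function are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradSchema:
    template: str      # sentence with a {word} slot for the special word
    candidates: tuple  # the two nouns the pronoun could refer to
    answers: dict      # special word -> correct referent

    def sentence(self, word):
        return self.template.format(word=word)

    def is_correct(self, word, guess):
        return self.answers[word] == guess


# The summary's example, plus the assumed "small" twin that flips the answer.
trophy = WinogradSchema(
    template="The trophy would not fit in the brown suitcase because it was too {word}.",
    candidates=("trophy", "suitcase"),
    answers={"big": "trophy", "small": "suitcase"},
)


def first_noun_baseline(schema, word):
    """Deliberately naive resolver: always picks the first candidate."""
    return schema.candidates[0]


def score(schema, resolver):
    """Fraction of the schema's variants that the resolver answers correctly."""
    hits = sum(schema.is_correct(w, resolver(schema, w)) for w in schema.answers)
    return hits / len(schema.answers)


print(trophy.sentence("small"))
print(score(trophy, first_noun_baseline))
```

A resolver that ignores the special word can get at most one of the two variants right, which is why a paired schema is harder to game than a single sentence.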
How about learning what it is about before giving idiotic opinions?
The Turing test was a CONCEPT, not an actual test.
To really foul things up, you need a computer. -- Paul Ehrlich
I'll build the furniture on my own, thank you very much.
When a computer can poop then it is A.I. Because everybody poops..everybody but computers, that is.
"The trophy would not fit in the brown suitcase because it was too big. What was too big?"
If you change this to "The trophy would not fit in the brown suitcase snugly because it was too big" I wouldn't be able to answer it, either.
When the copyright term is "forever minus a day", live every day like it's the last.
I like the idea of the IKEA challenge but why include a human? I would think having a robot open a box, pull out the instructions, and assemble the piece of furniture would be huge. Having a person involved just muddles the issue. You obviously might have to start with simple furniture but this seems like a worthwhile challenge as assembling furniture at times can even stump humans.
that should be the criteria
Clever programming and mechanics do not make "AI" and human "robots." Interesting machines, but nothing more. Nature is not an idiot.
E Proelio Veritas.
fit
v. fitted or fit, fitted, fitting, fits
v.tr.
1.a. To be the proper size and shape for. e.g: These shoes fit me.
Really -- someone suggests a computer program could identify when to laugh at a sitcom? When humans are likely to disagree rather strongly about which parts are the funniest? Heck, even Mycroft's first jokes were on the weak side of humor. It took a lot of coaching from the humans to get (his) jokes classy.
https://app.box.com/WitthoftResume Code: https://github.com/cellocgw
It's not about moving the goalposts, it's that bullshit "solutions" are coming up. Like programming a chatbot to pretend it is a child who does not speak the language well. That's not AI, that's meta-gaming.
The goal is to come up with challenges that are less exploitable.
An AI to add a laugh track to the Simpsons so you'll know when there has been a joke.
Sheesh, evil *and* a jerk. -- Jade
It seems like the startup investors would get sucked in then. Way more cool to be 2.0 than 1.0.
And tell it to make something useful.
Virtual junk is okay and any virtual tools can be used.
First, what was talked about in the summary is not a replacement for the Turing test, but other tests unrelated to the Turing test outcomes.
Second, why? It seems that the proposal is simply trying to lower the bar so not-so-bright AIs can appear useful to justify their continued, mediocre development. It's the same BS as No Child Left Behind. Sorry, that didn't work out well and neither will this.
And of course there should be. But that doesn't diminish the importance of the Turing test.
The Turing test has two huge and closely related advantages: (1) it is conceptually simple and (2) it takes no philosophical position on the fundamental nature of "intelligence". That such huge advantages necessarily entail certain disadvantages should come as no surprise to anyone.
The Turing test treats intelligence as a black box, but once you've contrived to pass the test the next logical step is to open up that black box and ask whether it's reasonable to consider what's inside "intelligent" or a tricky gimmick. That's a messy question, and that's *why* something like the Turing test is valuable. It is what social scientists call an "operational definition"; it allows us to explore an idea we haven't quite nailed down yet, which is a reasonable first step toward creating a useful *conceptual* definition. Science builds theories inductively from observations, after all.
If the Turing test were a suitable *conceptual* definition of intelligence then an intelligent agent would never fail it, but we know that can and does happen. We have to assume as well that people can be fooled by something nobody would really call "intelligence". Stage magicians do this all the time by manipulating audience expectations and assumptions.
Think of the Turing test as a screening test. Science is full of such procedures -- simple, cheap tests that aren't definitive but allow us to discard events that are probably not worth putting a lot of effort into.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
There are no "current AIs", as that would require them to be intelligent, which in turn requires them to be conscious. And they aren't. Not even close.
If we really want to water the term "intelligence" down so that it applies to any clever algorithm, then perhaps we should spend a few minutes contemplating what it is we plan to call an actual intelligence, once we get that far.
"AI" has reasonable meaning in the context of research towards that goal, or as a bright line in the sand that we have yet to reach, much less cross.
I've fallen off your lawn, and I can't get up.
"The one of the tasks"...
"TV program"...
Idiots.
What has become of those compression tests? Wasn't the answer to AI (at least partially) to be found in the ability to compress?
Religion is what happens when nature strikes and groupthink goes wrong.
Could there be more things wrong with this line?
A plan is afoot to replace the Turing test as a measure of a computer's ability to think.
The Turing test doesn't involve the ability to think. It intentionally avoids the concept of thinking, and explicitly targets imitation instead.
It also doesn't measure anything. The computer either succeeds at a subjective imitation or it doesn't.
It also isn't any sort of standard practice that can be "replaced" by something else.
A test of intelligence should be dealing with unforeseen input. The problem with chatbots is that they are just giving pre-planned responses. How about trying to land a rocket on the moon while being bombarded with spurious input from a radar device that was accidentally left on? Given the computers in use by NASA in 1969 that's pretty intelligent behavior.
Another would be landing a rocket on a small floating platform. We'll see how that plays out tonight.
That's all we need. Computers with a sense of humour:
"Oops! I deleted all your files!"
"Just kidding. I moved them to your Documents folder. :P"
I do not fail; I succeed at finding out what does not work.
The Turing Test has been abused, bypassed, and cheated to the point that almost no one knows what the actual Turing Test is. At this point, a new test needs to be created, a test that is difficult to cheat without making it obvious that it's not the real test. This could be "The Real Turing Test administered by [reputable group]".
Or we could make a new test, with incredibly explicit criteria that no one can nerf with a straight face and a different name. But from the sounds of it, it would be an easier test.
Don't waste your vote! Vote for whoever you want, unless you live in a swing state it won't matter anyways
The original Turing Test, as published in "Computing Machinery and Intelligence" as "Imitation Game" was not about whether a machine could successfully pretend to be a human.
He proposed a test, where a computer and men both pretended to be women, and the test would be passed if the computer would be more successful in lying about being a woman than the men were.
http://en.wikipedia.org/wiki/T...
Not having interacted with Watson, I don't know what it is capable of. I have interacted with Goostman, and it can fool you only if you want to be fooled - it does not take more than three or four exchanges to notice that it is barely better than the venerable Eliza. As for Siri, Cortana, etc., I have never been able to use them for anything but the most trivial tasks (which I can do myself faster and more reliably) and for grins and giggles - they are way too stupid for anything else.
Actually, I think we do. We at least have an actual model, free of woo-woo, for which no counter evidence has been brought forth as yet.
Even the low level stuff seems to finally be yielding some clarity.
I've fallen off your lawn, and I can't get up.
"No existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh."
Doesn't the Simpsons have a laugh track to tell you when to laugh? I think a program to recognize the laugh track would be pretty easy.
Fucking bravo!!!! Finally.
What the test has proved is that computer science majors have no fucking clue what the Turing test is about.
I ask all computer science morons: in what language do they propose to hold their new test?
If they find one without language, then they are finally no longer using the Turing test; but there is also a good chance they are no longer testing intelligence either.
P.S. The real Turing test never ends. The toy Turing test is simply PR bullshit.
The code that can fool all the human participants and other AIs while detecting all the other AIs is clearly intelligent, and more so than the average human, which is the sort of functionality that is actually required.
The next stage is a GAI that can generate a better AI that passes the above test, including evading and detecting its own parent GAI.
d@3-e.net
"The WOPR spends all of its time thinking about [Turing Tests]. 24 hours a day, 365 days a year, it plays an endless series of [Turing test 'games'], using all available information on the state of [human sentience]. It has already proved the existence of [machine intelligence] as a game, time and time again. It estimates human and machine responses to our test responses to their responses, and so on. Estimates probabilities, tallies the score, and it looks for ways to ---"
"The point is, key decisions of every available option in determining [the presence of Artificial Intelligence] have already been made by the WOPR."
"So what you're really telling me is all this trillion dollar hardware is really at the mercy of those men with the little brass keys...?"
"That's exactly right. Whose only problem is that they are human beings. In 30 days, we could upgrade the Turing Test scoring process with electronic relays. Get the men out of the loop."
Which... as it would seem... we might all welcome, I for one.
And then, 150,000 years later...
<blink>down the rabbit hole</blink>
that maybe the point shouldn't be to recreate human intelligence, but lay a foundation for a unique intelligence to evolve itself. It may end up not even understanding the concept of words and sentences, but still be capable of horizontal associations that haven't been considered as of yet that yield data that makes sense to them, and could possibly further our own progress as humans?
If you understand that intelligent life on other planets, aliens, can be as simple as a microbe, then I don't think this should be hard to grasp.
Rather than recreate something that can "beat us at our own game" or "fool" us, maybe we should focus on something that is in itself its own unique, silicon-based, self-referential and self-modifying species.
Using a language based approach completely undermines this concept. Thoughts?
If the trophy is too big, the sentence contains a dangling participle, which would be bad grammar. According to the rules of English, the suitcase is too big. Simple logic tells you which is too big.
How is this a replacement for the Turing test?
"Describe in single words, only the good things that come into your mind. About your mother."
Have gnu, will travel.
A rigorous definition of general intelligence now exists and has been applied by the Deep Mind folks. See this video lecture by Deep Mind's Shane Legg at Singularity Summit 2010 on a new metric for measuring machine intelligence.
If you want something more accessible to the general public, The Hutter Prize for Lossless Compression of Human Knowledge has the same theoretic basis as the test used by Deep Mind and has the virtue that it uses a natural language criterion, in the form of a Wikipedia snapshot. If the 100M snapshot of Wikipedia used by the Hutter Prize is no longer challenging enough, then substitute Matt Mahoney's Large Text Compression Benchmark which is basically just the Hutter Prize enlarged by an order of magnitude.
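To make the compression idea above concrete, here is a rough sketch in Python. It is not the Hutter Prize scoring procedure, whose actual rules differ in detail; it only assumes that "fewer bits per byte of text" stands in for "better model of the text", and it uses the standard library's zlib as a general-purpose placeholder compressor. The function name and sample data are hypothetical.

```python
import random
import zlib


def bits_per_byte(data: bytes, level: int = 9) -> float:
    """Compressed size in bits divided by original size in bytes."""
    if not data:
        raise ValueError("data must be non-empty")
    return 8 * len(zlib.compress(data, level)) / len(data)


# Highly predictable text: one sentence repeated many times.
english = b"The trophy would not fit in the brown suitcase because it was too big. " * 50

# Unpredictable data of the same length (seeded so the run is repeatable).
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(english)))

print(f"repetitive text: {bits_per_byte(english):.2f} bits/byte")
print(f"random bytes:    {bits_per_byte(noise):.2f} bits/byte")
```

The repetitive text shrinks to a small fraction of its size while the random bytes do not shrink at all; a compressor that understood more about language and the world would push the figure for real prose lower than zlib can.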
Seastead this.
Turing wanted to show that computers could be intelligent, while avoiding the nasty problem of giving it a definition. (he was a smart guy--decades later we still haven't found a good definition.)
His trick was to use humans, the only intelligent beings available, as a standard for comparison. Hence the imitation game.
But his real trick was to strip away the bullshit. "Machines can't taste strawberries", "machines can't feel love", "machines don't have consciousness", blah, blah. By using a teletype for communication he reduced human behavior to a stream of ASCII characters, while still allowing the essence of intelligent behavior to get through.
But he didn't take it far enough. We need a stronger filter that hides pop-culture references, language idioms, maybe even pronoun misuse. Don't imitate a human--just imitate the intelligent aspects of a human.
Anyway, I used to think he was a clever, sneaky bastard but it turns out he really believed this was a good idea. A few years later he was talking about the imitation game in a radio interview, saying that we should really do it. Sadly the press still thinks that AI researchers care about the imitation game.
I saw a shadow puppet show today. An expert, with only his hands, created landscapes, animals and detailed caricatures of people all in captivating brilliant morphing motion. The thought struck me; "Here's a good 'Turing test' for robotic prosthesis", for the dexterity on display is seldom encountered and seemingly still so far off from being replicated in any capacity by our crude roboticized attempts utilizing rigid polymers and metals.
Reasons the Turing Test will always come up:
1. The Turing Test... was invented by Alan Turing, and he was a genius. He thought about this stuff decades before others. Academics will make new tests. This will always be the original.
2. The Turing Test... is the ultimate test. If you can be convinced a machine is a human, that's it. It is not a test of some abilities; the Turing test addresses all mental abilities.
3. The Turing Test... is ambiguous. This is a major criticism, but also why we'll never forget it. It shows how complex it is to be human, and can be qualified in so many ways.
4. The Turing Test... is hard to win. We need intermediate tests to show progress, but the TT will always show how much further we have to go.
Thus, I am officially tired of media trying to call the TT outdated, obsolete, or no longer relevant, because the usual motive is to lessen the blow of how poorly cognitive AI does now. We'll be talking about the Turing Test for decades to come, even for the simple reason that it was the first of its kind. It won't be "replaced".
One cannot make a "plan" to "replace" something which has already been committed to history. Every computer knows this (stupid humans).
"No existing program — not Watson, not Goostman, not Siri — can currently come close to doing what any bright, real teenager can do: watch an episode of "The Simpsons," and tell us when to laugh."
This may be easier to do than you would expect. Laughter is an innate response to a particular type of surprise. The surprise is triggered when you're expecting something to happen, but instead an unexpected turnout to a situation occurs. Your brain then triggers laughter as a way to deal with your shattered worldview.
A neural network trained on voice recognition and text prediction may then be able to predict where these cognitive dissonances occur. This could be measured as a high error rate in expected output vs actual output.
Would be fun to test at least ;)
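A toy version of the "high prediction error means surprise" idea can be sketched in Python. To be clear about the substitution: this swaps the neural network for a plain bigram count model with add-one smoothing, and the tiny corpus, function names, and test sentence are all made up for illustration. It flags the word the model found least expected, which is a long way from detecting an actual joke.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training corpus: what the model comes to "expect".
CORPUS = [
    "i put the milk in the fridge",
    "i put the butter in the fridge",
    "i put the cheese in the fridge",
    "i put the eggs in the fridge",
]


def train_bigrams(corpus):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def surprisals(sentence, counts, vocab):
    """Return (word, surprisal in bits) pairs under add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split()
    v = len(vocab) + 1  # one extra slot for words never seen in training
    out = []
    for prev, cur in zip(tokens, tokens[1:]):
        following = counts.get(prev, Counter())
        p = (following[cur] + 1) / (sum(following.values()) + v)
        out.append((cur, -math.log2(p)))
    return out


def most_surprising(sentence, counts, vocab):
    """The word with the highest prediction error (largest surprisal)."""
    return max(surprisals(sentence, counts, vocab), key=lambda pair: pair[1])[0]


counts, vocab = train_bigrams(CORPUS)
for word, bits in surprisals("i put the milk in the cat", counts, vocab):
    print(f"{word:>6}  {bits:.2f} bits")
```

The spike lands on the final word, where the expected "fridge" is replaced by something the model has never seen. Whether a spike like that lines up with where people laugh is exactly the part that would need testing.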
Turing's test was about the ability to imitate human behavior/knowledge. The real question we need to answer I will call the Mycroft test. The purpose of the test is to determine if the program has earned the right to not be turned off, that is, does it have a right to a trial before it is "terminated"? A program that has earned that right has crossed the blurry line between inanimate and "human" in a way that should be important to us. Defining a test that can measure this is at the heart of deciding what makes us us, vs what makes us tick.
"There is no god but allah" - well, they got it half right.
Let me see if I've got this straight. If you can watch an episode of the Simpsons and know when to laugh, then you're intelligent.
Or at least a real person.
Better go with answer number two. Doh!
I have watched episodes of the Simpsons where I had no idea where to laugh...