Machine Learning Confronts the Elephant in the Room (quantamagazine.org)
A visual prank exposes an Achilles' heel of computer vision systems: Unlike humans, they can't do a double take. From a report: In a new study [PDF], computer scientists found that artificial intelligence systems fail a vision test a child could accomplish with ease. "It's a clever and important study that reminds us that 'deep learning' isn't really that deep," said Gary Marcus, a neuroscientist at New York University who was not affiliated with the work. The result takes place in the field of computer vision, where artificial intelligence systems attempt to detect and categorize objects. They might try to find all the pedestrians in a street scene, or just distinguish a bird from a bicycle (which is a notoriously difficult task). The stakes are high: As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing.
It won't be easy. The new work accentuates the sophistication of human vision -- and the challenge of building systems that mimic it. In the study, the researchers presented a computer vision system with a living room scene. The system processed it well. It correctly identified a chair, a person, books on a shelf. Then the researchers introduced an anomalous object into the scene -- an image of an elephant. The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.
"There are all sorts of weird things happening that show how brittle current object detection systems are," said Amir Rosenfeld, a researcher at York University in Toronto and co-author of the study along with his York colleague John Tsotsos and Richard Zemel of the University of Toronto. Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.
If an elephant suddenly appeared in my room I'd lose my shit too.
Sig. Sig. Sputnik
I'm not as bullish on "artificial intelligence" as a lot of Slashdotters, but the fact that they can't do a double take is a silly argument.
You can have multiple AI systems approach the same problem. Sort of like you may go to 3 or 4 mechanics to diagnose a problem and see if there is a consensus or not, you can have multiple AI systems with different biases and tunings approach the same problem and see what the results are.
And the beginning of the beginning of a new AI winter?
Deep Learning isn't deep. And "Neural Networks" work nothing like a real neural network (a.k.a brain) does. They are all terms that "AI researchers" use to inflate their importance and to obtain funding for their work. The entire AI field is a massive joke, but now we have dropped some major taxpayer money on it so it isn't going away anytime soon.
We shouldn't go for automated surveillance?
Might, in the future, be doing something really useful.
The other night a machine learning system correctly identified an elephant in my pajamas... but how the machine learning system got into my pajamas, I'll never know!
When you can't realize the Laughing Man is a hack, you can't realize reality, or your perception of it, is being hacked.
-- Tigger warning: This post may contain tiggers! --
I had to read the text below the image, and then look again, to see an elephant.
Btw, in case you are wondering what kind of chair I mean, it’s this kind of comfy chair (of which, unfortunately I only know this way of finding it. I’m terribly sorry).
These problems have been well known in AI circles for decades. The crappy tech media are finally catching on that marketing departments selling AI solutions maybe exaggerate the capabilities of their tech a twinge.
My Other Computer Is A Data General Nova III.
I'll just leave this right here.
If it weren't for deadlines, nothing would be late.
Will the future be fun?
You can't go around trying to undermine this new technology right now, it is too soon! We have at least 2-3 more years before anyone is allowed to be disappointed by the lack of progress in autonomous cars / artificial intelligence / deep learning. Think of the jobs!
ML and DL make no predictions about the real world, nor do they offer any explanations. There is no Origin of Species, Principia Mathematica or theory of relativity for intelligence. You need a true mathematical framework to articulate what is learned in a non-stationary data set.
AI vision can do some things that no human can do. Quickly and accurately identify handwritten postcodes on envelopes was an early win. Matching colours happens at every paint shop.
It is certainly not human capable, yet. But it has improved dramatically over the last decade, and is likely to keep doing so. And tricks such as stereo vision, wider colour sense, and possibly Lidar help a lot.
The one elephant example seems to be a shitty AI. There is a modern tendency to leave everything to a simplistic Artificial Neural Network, and then wonder why weird things can happen. Some symbolic reasoning is also required, ultimately.
When AI approaches human capability, it will not lose its other abilities. So it will be far better than human vision, eventually.
Ask yourself, when the computers can eventually program themselves, why would they want us around?
https://en.wikipedia.org/wiki/Recurrent_neural_network
The elephant's mere presence caused the system to forget itself: Suddenly it started calling a chair a couch and the elephant a chair, while turning completely blind to other objects it had previously seen.
Researchers are still trying to understand exactly why computer vision systems get tripped up so easily, but they have a good guess. It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance.
An elephant may cause me to take a second glance, of course, but in that first glance I most definitely did not mistake that elephant for a chair.
The problem is not needing a double-take. The problem is the nature of today's AI: it's based on matching data with databases. Databases that contain information such as "chair in room" and "couch in room" and not "elephant in room" or "chair in zoo". Confounding information like thinking it's identified an elephant in a room will naturally throw it off because now the identification is wrong or the database isn't accurate - or both. But a 3-year-old would be able to recognize a chair or an elephant regardless of whether it is in a room or at a zoo.
You have to remember what the word "artificial" as in AI truly means. It does not mean that the system exhibiting the intelligence is artificial, like a computer. What it means is that the intelligence itself is artificial. Something that does not actually perform cognitive reasoning but acts in a way to make it seem like it does. Which means no, "SELECT object FROM room WHERE color = 'gray' AND number_legs = 4" is not intelligence.
Shouldn't this system be tagging objects to go back to and have some sort of memory retention (database)?
If you take a look at the two pictures in the article, it kind of goes against what the article was trying to claim,
In fact nothing at all on the right side of the right image was altered from the left version with no elephant. Even the confidence numbers were identical.
The only descriptions and confidence factors affected were things that visually were congruent to the elephant, in a way that they could have been related. In fact I couldn't even make out an elephant the way they put it in without looking hard, and was impressed the system still had such a strong lock on the person.
If you have ever built one of those classifiers you can understand that an item in the image would not have the global effect they are describing...
On top of that it turns out humans are pretty easily fooled anyway, a computer will probably end up much better in that regard. It's just that it's a bit hard for us to understand conceptually how it is "seeing" and so a bit hard to guard against error.
"There is more worth loving than we have strength to love." - Brian Jay Stanley
AI is not as complex as human intelligence but it operates more on those principles than it does on if statements and algorithms. Maybe you should actually look at research from the last 20 years before drawing an extremely outdated conclusion. Neural networks and machine learning are able to effectively build their own pattern recognition by an iterative process.
An "AI" algorithm matches something in a picture to something it has been previously trained/programmed to match.
Trained, yes. Programmed, not so much. You're really going to have to educate yourself here because it's way too much tl;dr.
Humans learn WHY a class of somethings behaves a certain way. Humans are then able to quickly and accurately apply the WHY to new situations. Algos are not there. Yet.
That's just better pattern recognition. A why is just a metapattern but not a fundamentally different concept. Today's AI isn't there yet because the neural networks are vastly simplified compared to the human brain, not because the approach is totally wrong.
Either way, little of this relates to if statements specific to the task at hand.
it has probably seen an elephant, but probably not in a living room.
and the net probably has a limited concept of the context.
(the big gray blob with a leathery texture in the middle of a living room is usually a sofa)
cue the recently published research about machine vision and sheep
(whenever the system sees white dots spread on a green scenery background, it says "sheep", even if it is white rocks sprinkled around the grass.
this prompted the researcher to crowd-mine pictures of goats and sheep doing unusual stuff. and whenever the CV net saw a fluffy texture, it assumed the most frequent word in that context, calling "dog" any fluffy texture carried by a human in their arms, and "cat" any fluffy texture on a kitchen table, even in the case of a shepherdess carrying a lamb, or a mischievous goat invading a kitchen)
the thing is: CV nets are basically only good at what they were trained for. if you give them something completely weird and unusual, they might react weirdly.
"Sufficiently advanced satire is indistinguishable from reality." - [Tips: 1DrYakQDKCQ6y52z6QbnkxHXAocMZJE61o ]
We're probably only imagining our own intelligence anyway.
I got to attend a seminar at MIT on AI. It was pretty cool, especially the ending... "We've only got one problem left to solve in AI... We've no friggin' clue about how the brain works!"
I spoke to him later and asked him what he meant. He said, "Essentially we're at best scratching the surface of what the brain does and how the brain does most of what we think it does. And we've not made a lot of progress since the heady days of the 1980's."
I keep getting the impression that these computer vision systems rely on a single vision system to get it right in one take. Why not have three independently trained systems watch simultaneously and vote on what they're seeing?
I remember reading ages ago that the F-16's fly-by-wire system has three computers voting on what to do, and that's 1970s technology. Why would we not use something similar for cars? Three systems are much harder to fool than one.
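For what it's worth, the voting idea is easy to sketch. The snippet below is only an illustration: `majority_vote` and its `quorum` parameter are made-up names, and it assumes each of the three detectors has already produced a single label for the same image region. It also only helps if the detectors fail independently, which is an assumption and not something the study tested.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(labels: Sequence[str], quorum: int = 2) -> Optional[str]:
    """Return the label that at least `quorum` detectors agree on.

    Returns None when there is no agreement, which a caller could
    treat as "scene is confusing, look again".
    """
    if not labels:
        return None
    # most_common(1) gives the single most frequent label and its count
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= quorum else None


# Hypothetical outputs from three independently trained detectors
print(majority_vote(["chair", "chair", "couch"]))
print(majority_vote(["chair", "couch", "elephant"]))
```

The `None` case is arguably the interesting one: disagreement between the voters is a cheap signal that a second look is needed.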
"The elephant's mere presence caused the system to forget itself"
Also known as "shitting the bed."
They made the system recognize objects in a china shop, then added a bull. They say with that they covered all the cases.
I think computers may have intelligence without necessarily having a conscience, if we agree on these terms. If computers and robots will become able to do everyday tasks like or better than humans, that's ok even if they don't "feel" like humans. In fact I think the latter would be counterproductive.
The word "deep" was never intended to mean we solved the whole problem all at once.
Nor is human-equivalent vision anywhere close to requisite for 90% of the initial applications.
We've barely scratched the surface on this recent breakthrough.
Many of these problems are fixable within the current regime.
Capabilities will evolve as relentlessly as chess engines.
But, let's all pause to remember "this isn't deep". That's the key lesson to take home, here, as this technology rapidly reshapes the entire global economy: this isn't deep.
elephant in the room? wow the development of AI is amazing.
I don't think this has anything to do with the lack of reasoning or putting things in context and much more with a statistical glitch.
The state of the art in object detection is around 50% mAP, which is not that great. Even on untailored images, you get many false alarms and misdetections, so it is no surprise that modifying images in a way that separates them completely from the training data leads to some strange false alarms.
I think the authors could just have looked at the validation set, exhibited the images that perform the worst, and drawn the same conclusion that computer vision sucks, if that's what they really wanted to do.
The fact is, although computer vision systems are nowhere near perfect, they are improving at an impressive rate. Even without ideological statements about the necessity of reasoning or whatnot.
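For anyone unsure what the mAP figure above measures, here is a rough sketch. It assumes the detections for one class are already sorted by descending confidence and already marked as hits or misses against the ground truth. The function names are made up for illustration, and real benchmarks differ in details such as interpolation and overlap thresholds, so treat this as the simple non-interpolated form only.

```python
from typing import Sequence


def average_precision(is_hit: Sequence[bool], num_ground_truth: int) -> float:
    """Non-interpolated AP for one class.

    `is_hit[i]` says whether the i-th most confident detection matched
    a ground-truth object. AP is the sum of the precision values at
    each hit, divided by the number of ground-truth objects.
    """
    if num_ground_truth <= 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, hit in enumerate(is_hit, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / num_ground_truth


def mean_average_precision(per_class_ap: Sequence[float]) -> float:
    """mAP is the plain mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


# Hypothetical class with 2 real objects: hit, false alarm, hit
ap = average_precision([True, False, True], num_ground_truth=2)
print(round(ap, 4))  # (1/1 + 2/3) / 2 = 0.8333
```

A single false alarm ranked above a real object already pulls the score down, so a 50% mAP system makes plenty of mistakes even on ordinary images.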
Video of some good progressive thrash music
What is "deep" in deep learning is the neural network used, and you only need that if you have no clue how your data is structured. The thing about deep learning is that it is a bit worse or not better than normal learning, but you also learn the network structure from the data. That makes it cheaper in general. It is _not_ better except for that.
Most ACs are not even worth the keystrokes to insult them. Be generically insulted by this and ignored otherwise.
"As computers take over critical tasks like automated surveillance and autonomous driving, we'll want their visual processing to be at least as good as the human eyes they're replacing"
I for one don't want "their visual processing" to be any good. Especially not in the case of surveillance.
I've Got Things to Hide(TM).
Give it 60 years. Computers used to take up a warehouse and now you can have a watch with more processing power.
I think of A.I. as it is today as the giant vacuum tube obnoxious computer. It functions, does some neat things, but only after a generation or two will it really start to show promise in everyone's day to day lives.
It takes intelligence to designate the task. Humans do this easily - they want a particular outcome, so they determine what needs to be done to achieve that outcome and satisfy the want. For example, I want to get my son a birthday present. In order to achieve that outcome, I need to acquire cash, determine what my son would like for a present, find somewhere that has the present, get the present, wrap it, and give it to my son.
AI doesn't want anything humans haven't told them to want, nor is it capable of identifying wants of others unless specifically told what they are. It has no need to give my son a birthday present, so it won't accomplish the task - even given complete capability to do so - unless it is specifically told to.
And here I thought this was going to have something to do with the seven blind men encountering an elephant and each getting the wrong or incomplete idea of what they'd found.
"It has to do with an ability humans have that AI lacks: the ability to understand when a scene is confusing and thus go back for a second glance."
Utter nonsense. What it has to do with is that neural nets divide up very high dimensional spaces by training, and no one knows what those divisions actually are, nor what they actually mean, nor why they actually got divided that way. So, when the scene is different, either because you went into another room, got teleported to a Klingon spaceship, or because someone put an elephant into this room when you weren't looking, there is no telling how the neural net will now name objects, because all you knew before was that in the tests you previously tried it performed well, but you don't know why.
All the shit they keep calling 'AI' has no ability whatsoever to 'think' which is why it can't handle even simple things we take for granted.
I've said it before a thousand times: The entire approach being used is wrong; until we can understand how our own brains produce the phenomenon of conscious thought, we will not be able to build machines that can do the same thing. All the 'deep learning algorithms' won't do it. Throwing more and more hardware at it won't do it. We don't even have the instrumentation to really understand how a living brain, as a complete system, does what it does, and once you kill it and cut it up, what you can learn is so severely limited as to be useless. You want REAL AI? Focus on building better instrumentation that can scan a living, working brain, and really, truly map out how it operates. Before anyone says it: fMRI won't cut it, if it could we'd already have the answers.