Recognizing Scenes Like the Brain Does
Roland Piquepaille writes "Researchers at the MIT McGovern Institute for Brain Research have used a biological model to train a computer model to recognize objects, such as cars or people, in busy street scenes. Their innovative approach, which combines neuroscience and artificial intelligence with computer science, mimics how the brain functions to recognize objects in the real world. This versatile model could one day be used for automobile driver's assistance, visual search engines, biomedical imaging analysis, or robots with realistic vision. Here is the researchers' paper in PDF format."
If my computer could "see me" I think that it would BSOD itself to sleep. Long, long, sweet slumber.
I understand the reasoning behind modeling these systems on our own highly-evolved (ok, maybe not in some people) biological systems. What I want to see, however, is something capable of learning and improving its own ability to learn. If our intelligent systems are always evolution-limited by the progress of our own biological systems then I can't see how A.I. smarter than a human will ever be achieved. But if we are able to give these systems our own abilities as a starting point and then watch them somehow create something more intelligent than we are... then we really have something. Whether or not what we have is good at that point I can't say, though there are many people and communities in the world who are working on making sure this post-human intelligence doesn't essentially destroy us. Foresight for example.
I'm not knocking the MIT research, I think it's amazing. It just seems to me like imitation rather than imagination. Granted, highly evolved and complicated imitation. But does it even have the abilities of a parrot?
Having scanned this paper, I don't see that their model extends the state of the art in cognitive modeling at all. Others have produced much more comprehensive and much more biologically accurate models. There's no retinal ganglion contrast enhancement, no opponent color in the LGN (or color at all), no complex cells, no magnocellular/parvocellular pathways, no cortical magnification, and no addressing of the aperture problem (they seem to treat the scene as a sequence of snapshots, while the brain... does not). The object recognition is not biologically inspired. Some visual system processes can be explained with feedforward-only mechanisms, but not all of them can.
Gabor wavelets, neural networks, hierarchical classifiers in some semi-new combination - there are dozens of image recognition papers like this every month. Why is this particular paper special?
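For anyone who hasn't met the term, here is a rough sketch of a Gabor filter: a sinusoid under a Gaussian envelope, commonly used as a stand-in for orientation-tuned cells in early visual cortex. This is not the paper's code. The function names, kernel size, and parameter values are all made up for illustration.

```python
import math

def gabor_kernel(size=7, wavelength=4.0, theta=0.0, sigma=2.0, gamma=0.5):
    """Return a size x size Gabor kernel (cosine phase) as a list of rows.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope, squashed along yr by the aspect ratio gamma.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def filter_response(patch, kernel):
    """Absolute dot product between an image patch and a kernel of the same size."""
    return abs(sum(p * k
                   for patch_row, kernel_row in zip(patch, kernel)
                   for p, k in zip(patch_row, kernel_row)))

# Toy input: vertical stripes whose period matches the filter wavelength.
half = 3
stripes = [[math.cos(2 * math.pi * x / 4.0) for x in range(-half, half + 1)]
           for _ in range(-half, half + 1)]

aligned = filter_response(stripes, gabor_kernel(theta=0.0))
orthogonal = filter_response(stripes, gabor_kernel(theta=math.pi / 2))
```

On this toy input the filter tuned to the stripe orientation responds more strongly than the one rotated by 90 degrees. That orientation selectivity is the whole point of using these filters as a first layer.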
Researchers at the MIT McGovern Institute for Brain Research have used a biological model to train a computer model to recognize objects, such as cars or people, in busy street scenes.
this is, of course, the first step in finding Sarah Connor.
There was. You didn't recognize it.
They do discuss the lack of feedback projections, but I also think it's fair to ignore those for the present purposes, because feedback makes things a lot more complicated, modeling-wise.
Finally, I don't have time to go back and check this, but it seemed like the SVM was used to classify the output of the network. That is, it struck me as a test to see how well the highest layer in the network ended up representing the input (after all, you need *some* way to see how well it's doing, and that's a straightforward way). Could be wrong, though.
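To make that reading concrete, here is a minimal sketch of the idea: treat the top layer's outputs as feature vectors, train a linear classifier on them, and use held-out accuracy as a measure of how good the representation is. Everything here is an assumption for illustration. The feature vectors are invented, and the hand-rolled linear SVM (subgradient descent on the hinge loss) is a simplified stand-in, not whatever SVM implementation the authors used.

```python
import random

def train_linear_svm(features, labels, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit weights w and bias b by subgradient descent on the regularized hinge loss.

    features: list of equal-length float lists (stand-ins for top-layer outputs).
    labels: +1 or -1 for each feature vector.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # L2 regularization: shrink the weights a little on every step.
            w = [wj * (1 - lr * reg) for wj in w]
            if margin < 1:
                # Inside the margin or misclassified: move the boundary away.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def accuracy(w, b, features, labels):
    """Fraction of feature vectors assigned their correct label."""
    correct = sum(predict(w, b, x) == y for x, y in zip(features, labels))
    return correct / len(labels)

# Hypothetical top-layer feature vectors for two object classes.
train_x = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0], [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)

# Held-out vectors: accuracy here is the "how well does the layer represent
# the input" score described above.
held_out_x = [[0.85, 0.15], [0.15, 0.85]]
held_out_y = [1, -1]
score = accuracy(w, b, held_out_x, held_out_y)
```

If the network's top layer separates the classes well, even a simple linear classifier like this scores highly on held-out data. If it doesn't, no classifier stacked on top will rescue it, which is what makes this a reasonable test of the representation.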
I've written here before about epileptic seizures I have that start somewhere in the right occipital lobe possibly near V1, based on the nature of the aura and a recent video EEG last month. These things started for no reason when I was a teenager and now involve these interesting post-ictal fugue states where only chunks of my brain seem to be working but I'm still able to run around and get in trouble. I've developed a talent over the years for coping with brain trauma and sort of bullshitting my way through it.
Usually I'm not forming long term memories during fugue states, but when I do, I remember some pretty interesting stuff. One thing that is typically impaired is object recognition, since this mostly seems to be handled by the right occipital lobe. I can see things but can't immediately recognize what they are, unless I use these left-brain techniques. The left occipital lobe can recognize objects too, but the approach it takes is different and more of a pain in the ass to have to rely on. It's more of a thunky symbolic recognition, as opposed to an ability to examine subtle forms, shapes, and colors. I have to basically establish a set of criteria that define what I'm looking for and then examine things in the visual field to see if they match those criteria. I'll look for a bed by trying to find things that appear flat and soft; I'll look for a door by looking for things with attributes of a doorknob such as being round and turnable; I'll find water to drink by looking closely at wet things. My wife says I make some interesting mistakes, like once confusing her desk chair for a toilet (forgetting for a moment that part of a toilet has to be wet, but at that point memory formation and retrieval is disrupted to the point where I could imagine forgetting that it's not enough to just be able to be sat on, toilets have to have water in them too). I have trouble recognizing faces, and she says I'm sometimes obviously pretending to recognize her. Recognizing a face using cold logic can be tricky even when you're not impaired. Recognizing familiar scenes and places becomes difficult. I drove home in a fugue state once, back in my twenties, and while I didn't crash into anybody or have any sort of accident, I did get lost on the way home from work. I ended up driving miles past where I lived. Even as a pedestrian, getting lost in familiar areas is still a problem.
People have been trying to come up with image processing algorithms that mimic cortical signal analysis for decades. I remember reading papers like this ten years ago. It's amazing to see they're still mistaking road signs for pedestrians. I don't think even I could make an error like that. The state of the art was totally miserable back then, too. Neuroscience has got to be one of the sciences most poorly understood by humans.
As someone in AI research myself, I'd say the more common reasons are:
1. The code is in a horrible hacked-together state and so not really fit for release, and nobody wants to put in the effort that would be needed to clean it up; or
2. The researchers don't want to release their code because keeping it secret creates a "research moat" that guarantees that they'll get to publish all the follow-up papers themselves, since anyone else who wanted to extend the work would have to first invest the time to reimplement it from scratch (this is more common in implementation-intensive areas like graphics)
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
1. WPJ Mackeown (1994), A Labelled Image Database, unpublished PhD Thesis, Bristol University.
2. WPJ Mackeown, P Greenway, BT Thomas, WA Wright (1994). Road recognition with a neural network, Engineering Applications of Artificial Intelligence, 7(2):169-176.
3. NW Campbell, WPJ Mackeown, BT Thomas, T Troscianko (1997). Interpreting image databases by region classification, Pattern Recognition, 30(4):555-563.
There has been various follow-up research since then.
The paper claims the source code is (or will be) here. Next time, ask the paper.
Creating "biologically inspired" models of AI is by no means a new topic of research. From what I can tell, most of these algorithms work by stringing together specialized algorithms and mathematical functions that are, at best, loosely related to the way the brain works (at a high level). By contrast, the brain is a huge, complicated, connectionist network (neurons connected together).
That isn't my real problem with this algorithm and the hundreds of similar ones that have come before it. What bothers me is that they don't really get at the *way* the brain works. It's a top-down approach, which looks at the *behavior* of the brain and then tries to emulate it. The problem with this technique is that it may miss important details by glossing over anything that isn't immediately obvious in the specific problem being tackled (in this case vision). This system can analyze images, but can it also do sound? In a real brain, research indicates that you can remap sensory inputs to different parts of the brain and have the brain learn it.
I'm still interested in this algorithm and would like to play around with the code (if it's available), but I am skeptical of the approach in general.
It's going to change everything.
Robotic vision is a tipping point.
A large number of humans become unemployable shortly after this becomes a reality.
Any job that a human holds only because they can see is finished in the 1st world.
Why should you pay $7.25 an hour (really $9.25 w/benefits & overhead for workers comp, unemployment tax, etc.) when you can buy a $12,000 machine to do the same job (stocking grocery shelves, cleaning, painting, etc.)?
The leading edge is already here with things like Roombas.
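A back-of-the-envelope check of those numbers, using only the $12,000 and $9.25 figures quoted above. The 40-hour week is an assumption, and maintenance, power, financing, and machine lifetime are ignored because the comment gives no figures for them.

```python
def break_even_hours(machine_cost, hourly_labor_cost):
    """Hours of labor whose total cost equals the machine's purchase price."""
    return machine_cost / hourly_labor_cost

# Figures quoted in the comment above.
hours = break_even_hours(12_000, 9.25)

# Assumption: one full-time worker at 40 hours per week.
weeks_full_time = hours / 40
```

That works out to roughly 1,300 hours, or about 32 weeks of one full-time worker, before the machine has paid for itself on purchase price alone.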
We've got an overstock of these in California, Texas, Nevada, Arizona and New Mexico. We'll be glad to ship 'em either north _or_ south if y'all will pay the freight or, at the very least, provide a destination address.
Come on, you all want this! A near perfect pr0n search engine.
Interested readers can browse the contents of current and back issues of PAMI and either go to their local scientific library (PAMI is recognisable from afar by its bright yellow cover) or search on the web for interesting articles. Researchers often put their own papers on their home pages. For example, here is the publication page of one of the authors (I'm not him).
For the record, I think justifying various ad-hoc vision/image analysis techniques using approximations of biological underpinnings is of limited interest. When asked whether computers would think one day, Edsger Dijkstra famously answered with "can submarines swim?". In the same manner, it has been observed that (for example) most neural network architectures make worse classifiers than standard logistic regression, not to mention Support Vector Machines, which is what this article uses, BTW.
The summary by our friend Roland P. is not very good.
I could go on with lists and links but the future is already here, generally inconspicuously. Read about it.