Recognizing Scenes Like the Brain Does

adverse effects by prelelat · 2007-02-11 09:32 · Score: 5, Funny

If my computer could "see me", I think it would BSOD itself to sleep. Long, long, sweet slumber.

Interesting, but what comes next? by The+Living+Fractal · 2007-02-11 09:46 · Score: 4, Insightful

I understand the reasoning behind modeling these systems on our own highly evolved (OK, maybe not in some people) biological systems. What I want to see, however, is something capable of learning and improving its own ability to learn. If our intelligent systems are always evolution-limited by the progress of our own biological systems, then I can't see how A.I. smarter than a human will ever be achieved. But if we are able to give these systems our own abilities as a starting point and then watch it somehow create something more intelligent than we are... then we really have something. Whether what we have at that point is good, I can't say, though there are many people and communities in the world working to make sure this post-human intelligence doesn't essentially destroy us. Foresight, for example.

I'm not knocking the MIT research; I think it's amazing. It just seems to me like imitation rather than imagination. Granted, highly evolved and complicated imitation. But does it even have the abilities of a parrot?

TLF

--
I do not respond to cowards. Especially anonymous ones.

Re:Interesting, but what comes next? by the+grace+of+R'hllor · 2007-02-11 10:05 · Score: 2, Insightful

Of course it's imitation. So are machine learning and machine procreation. What makes you think we're currently limited by our biological capabilities? We're biologically almost identical to cavemen, but where they smeared charcoal and spit animal paintings on walls, we now land probes on Mars. We're on a roll.

Give machines our own capabilities? We can't even make them move about reliably, so what makes you think we're even *close* to endowing machinery with creativity and abstract thought at human levels? Or even parrot levels, since you mention it? There are many hurdles to be cleared before we can consider creating an AI that has a practical chance of surviving to do anything useful, and machine vision (and the processes involved in making it robust) is critically important.

Re:Interesting, but what comes next? by zappepcs · 2007-02-11 10:09 · Score: 5, Interesting

It is interesting to consider the problem AI researchers face: how to create intelligence when intelligence itself is not really understood. Until we do understand it, we'll have to develop systems using logic and software that approximate our current understanding of it. A simple example: ask yourself how many times you had to learn that fire is hot. An AI system may have to learn this every time you turn it on.

There are software systems that can approximate the size of, and distance between, objects in a picture with reasonable accuracy, and if the scope of the scenery presented to the system is limited, that ability combined with sensing the motion of objects is enough to determine a large percentage of what is desired. This is not the hard part. The hard part is determining an object's classification and purpose in those cases where it is not simple.
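Under the pinhole camera model, single-image distance estimation comes down to similar triangles: an object of known real height H that spans h pixels, seen by a camera with focal length f (in pixels), lies at distance Z = f·H/h. A minimal sketch, assuming a calibrated camera and a known object height; the function name and the numbers are illustrative, not taken from any system mentioned in this thread.

```python
def estimate_distance_m(real_height_m: float, focal_length_px: float, image_height_px: float) -> float:
    """Pinhole camera: image_height / focal_length == real_height / distance."""
    if image_height_px <= 0:
        raise ValueError("object must span at least one pixel")
    return real_height_m * focal_length_px / image_height_px

# Hypothetical example: a 1.7 m pedestrian, 800 px focal length, 68 px tall in the frame.
print(estimate_distance_m(1.7, 800.0, 68.0))  # ~20.0 (meters)
```

The catch, in line with the comment, is the assumed known size: the formula only helps once the system already knows what kind of object it is looking at.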

Each of us can almost always look at a scene and determine the difference between a jogger and a purse thief on the run or a businessman late for an appointment. For computers to do so takes a great deal more work. It is only a subtle difference, and one where the figures share similar basic characteristics.

The point? Even mimicking human skills is not easy, and it fails at many points without the overwhelming store of knowledge that humans have inside their heads. This would suggest that if more memory were available, AI would be easier, but that is not true either. Humans can recognize a particular model of car no matter what color it is, and usually even if it has been in an accident. The kind of abstract thinking we use to extract reality from a scene is not going to happen for computers for quite some time.

The danger comes when such ill-prepared systems are put in charge of important things. This is always something to be wary of, especially when they are used to define or monitor criminal acts and identify those who are guilty, whether on cameras at intersections, in security systems, or in government surveillance systems.

--
Support NYCountryLawyer RIAA vs People

Re:Interesting, but what comes next? by cyphercell · 2007-02-11 10:23 · Score: 2, Interesting

we are able to give these systems our own abilities as a starting point and then watch it somehow create something more intelligent than we are... then we really have something.
This technology is a prerequisite for providing an AI system with a starting point. It offers, for instance, the power of perception as input for a learning system. A baby, for example, opens its eyes and simply sees; this is only part of the baby's starting point. Other aspects of your "starting point" include predetermined goals such as eating, and points of failure such as starving. Many avenues of input are required for effective learning at different capacities. Helen Keller, for instance, learned the value of eating very early; formal communication, however, was a remarkable accomplishment, to say the least.

I agree with you that I would love to see a true A.I. system fully capable of learning, but discounting research that provides an AI system with the ability to see seems rather counter-productive.

If our intelligent systems are always evolution-limited by the progress of our own biological systems, then I can't see how A.I. smarter than a human will ever be achieved.
This will be achieved through more input streams, a more sophisticated "starting point", well-thought-out points of success and failure, and finally the fact that we can make cooperation between artificial "minds" mandatory. This is, of course, the point at which humans become lost, try to pull the plug, and Skynet launches the nukes in retaliation.

--
Under the influence of Post-Cyberpunk Gonzo Journalism

Re:Interesting, but what comes next? by suv4x4 · 2007-02-11 10:42 · Score: 3, Insightful

If our intelligent systems are always evolution-limited by the progress of our own biological systems, then I can't see how A.I. smarter than a human will ever be achieved.

You know, this idea is pretty misleading, so you can't be blamed for thinking so. Lots of people also think we're "a hundred years smarter" than those living in the 1900s, just because we were lucky enough to be born into a higher culture.

But think about it: what are our entire culture and science, if not ultra-sped-up evolution? We make mistakes, tons of mistakes, as human beings, but compared to completely random mutation, we have a supreme advantage over evolution in the signal-to-noise ratio of the resulting product.

Can we ever surpass our own complexity in what we create? Of course. Take a look at any moderately complex software product. I won't argue it's more complex than our brain, but consider something else: can you grasp and assess the scope of effort and complexity in, say, Windows running on a PC (something trivial to us), as one single product? Not just what's on the surface, but every little detail at once, from applications, dialogs, controls, drivers, and the kernel down to the processor microcode.

I'll tell you what: even the programmers of Windows and the engineers at Intel can't.

Our brain works in an "OOP" fashion, simplifying huge chunks of complexity into a higher-level "overview" so we can think about it at a different scale. In fact, many mental disorders, like autism or obsessive-compulsive disorder, revolve around the loss of the ability to "see the big picture", or to concentrate on a detail of it, at will.

Together, we break immensely complex tasks into much smaller, manageable tasks, and build upon the discoveries and effort we made yesterday. This way, although we still work on tiny pieces of a huge mind-bogglingly complex puzzle, our brain can cope with the task properly. There aren't any limits.

While I'm sure we'll see completely self-evolving AI in the next 100 years, I know that developing highly complex sentient AI with only elements of self-learning is well within the ability of our scientists. Small step by small step.

Re:Interesting, but what comes next? by Xemu · 2007-02-11 11:00 · Score: 3, Insightful

Each of us can almost always look at a scene and determine the difference between a jogger and a purse thief on the run or a businessman late for an appointment.

Actually, we can't; we just base this recognition on stereotypes. A well-known Swedish criminal called "the Laser Man" exploited this in the early '90s when robbing banks. He would rob the bank, change into the clothes of a businessman or a jogger, and then escape the scene. The police would more often than not let him pass because they were looking for an "escaping robber", not for a "businessman taking a slow-paced walk".

The police eventually caught on and caught the guy. Computers would, of course, have even greater difficulty thinking "outside the box".

--
Tell your friends about xenu.net

Re:Interesting, but what comes next? by kurzweilfreak · 2007-02-11 15:30 · Score: 2, Insightful

I think you overestimate the human ego. I don't know about you, but I'd be perfectly happy to give up the mundane task of driving to an intelligent machine if it can do it better than I can. That frees me up to read the paper on the drive to work, or countless other more useful things I could be doing if I didn't have to constantly keep my eyes on the road.
I do agree with you on one point, but not for the reason you do: the problem of control. If there's any reason an intelligent driving system wouldn't take off, it would be that there isn't a human in control, so who gets blamed when something does go wrong? How would insurance companies handle this? Do our rates go down because we now have a machine in control that does a better job than we do? Do our rates go up if there is an accident, even though it wasn't due to human error? Will people even accept an artificially intelligent driving machine if its record is anything less than 100% reliable and error-free?
My gut reaction tells me probably not, because when something goes wrong, people look for someone to blame. If you can't blame the driver, do we blame the company that makes the intelligent driving system (IDS)? If someone dies in an accident involving one of these systems, do we hold the company liable, even if the system reduces overall auto fatalities by, say, 90%? 95%? What level of imperfection are people prepared to accept? Is there ANY level that would be acceptable once you take control out of the hands of humans, whom we know and accept to be imperfect and therefore don't expect to be perfect?

--
kurzweil_freak
5th Kyu Genbukan Ninpo/KJJR student
Be the darkness that allows the light to shine.

nothing new by Anonymous Coward · 2007-02-11 10:01 · Score: 4, Insightful

Having scanned this paper, I'd say their model adds nothing to the state of the art in cognitive modeling. Others have produced much more comprehensive and much more biologically accurate models. There's no retinal ganglion contrast enhancement, no opponent color in the LGN (or color at all), no complex cells, no magnocellular/parvocellular pathways, no cortical magnification, and no handling of the aperture problem (they seem to treat the scene as a sequence of snapshots, while the brain... does not), and the object recognition is not biologically inspired. Some visual system processes can be explained with feedforward-only mechanisms, but not all of them can.
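The first item on that list, retinal contrast enhancement, is classically modeled with center-surround receptive fields approximated by a difference of Gaussians (DoG). A minimal NumPy sketch, assuming a grayscale image stored as a 2-D array; the sigma values and kernel radius are illustrative choices, not fitted to physiology:

```python
import numpy as np

def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """Sampled 2-D Gaussian on a (2*radius+1)^2 grid, normalized to sum to 1."""
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def dog_kernel(sigma_center: float = 1.0, sigma_surround: float = 3.0, radius: int = 9) -> np.ndarray:
    """ON-center cell: narrow excitatory center minus broad inhibitory surround.
    Both Gaussians sum to 1, so the kernel sums to ~0 and ignores uniform light."""
    return gaussian_kernel(sigma_center, radius) - gaussian_kernel(sigma_surround, radius)

def filter2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Same-size filtering with edge padding (the kernel is symmetric, so correlation == convolution)."""
    r = kernel.shape[0] // 2
    padded = np.pad(image, r, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    for dy in range(kernel.shape[0]):
        for dx in range(kernel.shape[1]):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

flat = np.full((20, 20), 0.5)                  # uniform brightness
step = np.zeros((20, 20)); step[:, 10:] = 1.0  # vertical dark/light edge
print(np.abs(filter2d(flat, dog_kernel())).max())  # essentially 0: no contrast, no response
print(np.abs(filter2d(step, dog_kernel())).max())  # clearly nonzero near the edge
```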

Re:nothing new by kripkenstein · 2007-02-11 19:03 · Score: 3, Informative

I agree that the paper isn't revolutionary. In addition, it turns out that, after the 4-layer biologically motivated system, they feed everything into a linear SVM (Support Vector Machine) or gentleBoost. For those who don't know, SVMs and boosting are the 'hottest' topics in machine learning these days; they are considered the state of the art in that field. So basically what they are doing is providing some preprocessing before applying standard, known methods. (And if you say "but it's a linear SVM", well, it is linear because the training data is already separable.)
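To make the two-stage pipeline concrete: the classifier only ever sees a fixed-length feature vector per image. The sketch below trains a linear SVM with a Pegasos-style stochastic subgradient method on the hinge loss; it stands in for whatever SVM package the authors actually used, and the two toy Gaussian blobs are a hypothetical stand-in for the model's real features.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x))).
    Labels y must be -1 or +1. A constant 1 is appended to each x so the
    last weight acts as a (regularized) bias term."""
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)                # decreasing step size
            violated = y[i] * (Xa[i] @ w) < 1.0  # is the hinge loss active?
            w *= 1.0 - eta * lam                 # shrink from the regularizer
            if violated:
                w += eta * y[i] * Xa[i]          # push toward the correct side
    return w

def predict(w, X):
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xa @ w >= 0, 1, -1)

# Toy stand-in for "feature vectors from the preprocessing layers": two blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)
w = train_linear_svm(X, y)
print("training accuracy:", (predict(w, X) == y).mean())
```

If the preprocessing already makes the classes (nearly) linearly separable, as the comment argues, a linear decision rule like this is all the final stage needs.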

That said, they do present a simple, biologically motivated preprocessing layer that appears to be useful, which in turn says something about the brain. In summary, I would say this paper does more to help us understand how the brain functions than to develop machines that can achieve human-like vision capabilities. So, very nice, but let's not over-hype it.

I'm not getting it: why is it significant? by S3D · 2007-02-11 10:08 · Score: 3, Insightful

Gabor wavelets, neural networks, hierarchical classifiers in some semi-new combination: there are dozens of image recognition papers like this every month. Why is this particular paper special?

research done at cyberdyne by macadamia_harold · 2007-02-11 10:23 · Score: 4, Funny

Researchers at the MIT McGovern Institute for Brain Research have used a biological model to train a computer model to recognize objects, such as cars or people, in busy street scenes.

This is, of course, the first step in finding Sarah Connor.

--
Push Button, Receive Bacon

Re:Does anybody know where to find the actual pape by gardyloo · 2007-02-11 10:34 · Score: 4, Funny

There was. You didn't recognize it.

Re:not like the brain does. by dfedfe · 2007-02-11 10:49 · Score: 2, Interesting

I admit I only gave the paper a quick read, so I can't say for sure. But my impression was that spatial information was only discarded when passing information to the next layer in the model. That strikes me as reasonable. For one, they're simulating the ventral stream, which, in my understanding, is basically specific to the attended object, so it seems proper to discard the relationship between the attended object and the rest of the scene. As for discarding spatial relationships between two features of the same object, that also strikes me as roughly reasonable. In real brains there isn't a strict tree-like hierarchy: projections from one region go to the next higher region but also skip past it to yet higher regions. Thus if we have projections A->B->C, B can discard the spatial relationship of two units in A, as long as A also projects to C, which would then still get the spatial information from A as well as the combined information from B (hope that makes sense). It's true that they didn't include such connections in this model, though. I still think it's fair, at least as a starting point for more complex models.
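The A->B->C argument can be illustrated with toy arrays. In this sketch, which illustrates the commenter's reasoning rather than the paper's architecture (the model has no such bypass), layer B keeps only the maximum response of each feature channel over all positions, so it cannot tell two arrangements of the same features apart; a skip projection that also hands A's full map to C preserves the difference. The function names and array shapes are hypothetical.

```python
import numpy as np

def layer_b(a_map):
    """Pool each feature channel over all positions: spatial layout is discarded."""
    return a_map.max(axis=(1, 2))          # shape (channels,)

def layer_c(b_out, a_map=None):
    """C sees B's pooled summary, and optionally A's full map via a skip projection."""
    parts = [b_out]
    if a_map is not None:
        parts.append(a_map.ravel())        # the bypass keeps positions intact
    return np.concatenate(parts)

# A's output: 2 feature channels on a 1x4 strip (channels, height, width).
# Same two features in both cases, but their left/right order is swapped.
left_right = np.array([[[1, 0, 0, 0]], [[0, 0, 0, 1]]], dtype=float)
right_left = np.array([[[0, 0, 0, 1]], [[1, 0, 0, 0]]], dtype=float)

# Without the skip connection, C cannot distinguish the two arrangements.
print(np.array_equal(layer_c(layer_b(left_right)), layer_c(layer_b(right_left))))  # True
# With A also projecting to C, the spatial difference survives.
print(np.array_equal(layer_c(layer_b(left_right), left_right),
                     layer_c(layer_b(right_left), right_left)))                    # False
```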

They do discuss the lack of feedback projections, but I also think it's fair to ignore those for the present purposes, because feedback makes things a lot more complicated, modeling-wise.

Finally, I don't have time to go back and check this, but it seemed like the SVM was used to classify the output of the network. That is, it struck me as a test to see how well the highest layer in the network ended up representing the input (after all, you need *some* way to see how well it's doing, and that's a straightforward way). Could be wrong, though.

My own two cents by MillionthMonkey · 2007-02-11 11:02 · Score: 5, Interesting

I've written here before about epileptic seizures I have that start somewhere in the right occipital lobe, possibly near V1, based on the nature of the aura and a video EEG last month. These things started for no reason when I was a teenager and now involve these interesting post-ictal fugue states where only chunks of my brain seem to be working but I'm still able to run around and get in trouble. I've developed a talent over the years for coping with brain trauma and sort of bullshitting my way through it.

Usually I'm not forming long-term memories during fugue states, but when I do, I remember some pretty interesting stuff. One thing that is typically impaired is object recognition, since this seems to be handled mostly by the right occipital lobe. I can see things but can't immediately recognize what they are unless I use these left-brain techniques. The left occipital lobe can recognize objects too, but the approach it takes is different and more of a pain in the ass to have to rely on. It's more of a thunky symbolic recognition, as opposed to an ability to examine subtle forms, shapes, and colors. I basically have to establish a set of criteria that define what I'm looking for and then examine things in the visual field to see if they match those criteria. I'll look for a bed by trying to find things that appear flat and soft; I'll look for a door by looking for things with the attributes of a doorknob, such as being round and turnable; I'll find water to drink by looking closely at wet things. My wife says I make some interesting mistakes, like once confusing her desk chair for a toilet (I forgot for a moment that part of a toilet has to be wet; at that point memory formation and retrieval are so disrupted that I could forget that being able to sit on something isn't enough, and toilets have to have water in them too). I have trouble recognizing faces, and she says I'm sometimes obviously pretending to recognize her. Recognizing a face using cold logic can be tricky even when you're not impaired. Recognizing familiar scenes and places becomes difficult. I drove home in a fugue state once, back in my twenties, and while I didn't crash into anybody or have any sort of accident, I did get lost on the way home from work. I ended up driving miles past where I lived. Even as a pedestrian, getting lost in familiar areas is still a problem.

People have been trying to come up with image processing algorithms that mimic cortical signal analysis for decades. I remember reading papers like this ten years ago. It's amazing to see they're still mistaking road signs for pedestrians; I don't think even I could make an error like that. The state of the art was totally miserable back then, too. Neuroscience has got to be one of the sciences most poorly understood by humans.

that's a generous view of it by Trepidity · 2007-02-11 11:06 · Score: 2, Informative

As someone in AI research myself, I'd say the more common reasons are:

1. The code is in a horrible hacked-together state and so not really fit for release, and nobody wants to put in the effort that would be needed to clean it up; or

2. The researchers don't want to release their code because keeping it secret creates a "research moat" that guarantees they'll get to publish all the follow-up papers themselves, since anyone else who wanted to extend the work would first have to invest the time to reimplement it from scratch (this is more common in implementation-intensive areas like graphics).

--
10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Earlier work 1989-1997 on street scene analysis by Wills · 2007-02-11 11:14 · Score: 4, Informative

Apologies for blowing my own trumpet here, but there was much earlier work in the 1980s and 1990s on recognizing objects in images of outdoor scenes using neural networks, and it achieved accuracy similar to that of the system mentioned in this article:

1. WPJ Mackeown (1994). A Labelled Image Database. Unpublished PhD thesis, Bristol University.

Design of a database of colorimetrically calibrated, high-quality images of street scenes and rural scenes, with highly accurate near-pixel ground-truth labelling based on a hierarchy of object categories.
Design of a neural network system that recognized categories of objects by labelling regions in random test images from the database achieving 86% accuracy
The database is now known as the Sowerby Image Database and is available from the Advanced Technology Centre, British Aerospace PLC, Bristol, UK. If you use it, please cite: WPJ Mackeown (1994), A Labelled Image Database, PhD Thesis, Bristol University.

2. WPJ Mackeown, P Greenway, BT Thomas, WA Wright (1994). Road recognition with a neural network. Engineering Applications of Artificial Intelligence, 7(2):169-176.

A neural network system that recognized categories of objects by labelling regions in random test images of street scenes and rural scenes achieving 86% accuracy

3. NW Campbell, WPJ Mackeown, BT Thomas, T Troscianko (1997). Interpreting image databases by region classification. Pattern Recognition, 30(4):555-563.

A neural network system that recognized categories of objects by labelling regions in random test images of street scenes and rural scenes achieving 92% accuracy

There has been various follow-up research since then.

--
Scroogle

Re:More importantly, where is the source code? by NTiOzymandias · 2007-02-11 11:18 · Score: 2, Informative

The paper claims the source code is (or will be) here. Next time, ask the paper.

Revolutionary? Probably not... by rm999 · 2007-02-11 11:24 · Score: 2, Insightful

Creating "biologically inspired" models of AI is by no means a new topic of research. From what I can tell, most of these algorithms work by stringing together specialized algorithms and mathematical functions that are, at best, loosely related to the way the brain works (at a high level). By contrast, the brain is a huge, complicated, connectionist network (neurons connected together).

That isn't my real problem with this algorithm and the hundreds of similar ones that have come before it. What bothers me is that they don't really get at the *way* the brain works. It's a top-down approach, which looks at the *behavior* of the brain and then tries to emulate it. The problem with this technique is that it may miss important details by glossing over anything that isn't immediately obvious in the specific problem being tackled (in this case, vision). This system can analyze images, but can it also do sound? In a real brain, research indicates that you can remap sensory inputs to different parts of the brain and the brain will learn to process them.

I'm still interested in this algorithm and would like to play around with the code (if it's available), but I am skeptical of the approach in general.

Hope most folks realize, once they get down vision by Maxo-Texas · 2007-02-11 11:42 · Score: 4, Insightful

It's going to change everything.

Robotic vision is a tipping point.

A large number of humans become unemployable shortly after this becomes a reality.

Any job where the only reason a human holds it is that they can see is finished in the First World.

Why should you pay $7.25 an hour (really $9.25 with benefits and overhead for workers' comp, unemployment tax, etc.) when you can buy a $12,000 machine to do the same job (stocking grocery shelves, cleaning, painting, etc.)?
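The figures quoted above imply a simple payback period. A back-of-the-envelope sketch using only those numbers; it ignores the machine's maintenance, energy, and financing costs, all of which would lengthen the payback:

```python
HOURLY_COST = 9.25      # $/hour, wage plus benefits and overhead, per the comment
MACHINE_PRICE = 12_000  # $ up front, per the comment

breakeven_hours = MACHINE_PRICE / HOURLY_COST
breakeven_weeks = breakeven_hours / 40   # assumption: one 40-hour shift per week

print(f"{breakeven_hours:.0f} hours, about {breakeven_weeks:.0f} weeks of one full-time shift")
# 1297 hours, about 32 weeks of one full-time shift
```

A machine that covers two or three shifts a day would pay for itself proportionally faster.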

The leading edge is here with things like Roombas.

--
She was like chocolate when she drank... semi-sweet at first and then increasingly bitter.

That Would Be An Illegal Immigrant... by littlewink · 2007-02-11 15:32 · Score: 2, Funny

Now, of course, if someone was to design and build a [$12,000] robot, completely for their own interest, that could build copies of itself, *and* do useful work like stocking shelves...

We've got an overstock of these in California, Texas, Nevada, Arizona and New Mexico. We'll be glad to ship 'em either north _or_ south if y'all will pay the freight or, at the very least, provide a destination address.

What it will be used for by rossz · 2007-02-11 16:24 · Score: 2, Funny

Come on, you all want this! A near-perfect pr0n search engine.

--
-- Will program for bandwidth

Re:not like the brain does. by odyaws · 2007-02-11 16:26 · Score: 3, Informative

Disclaimer: I work with the MIT algorithms daily and know several of the authors of this work (though I'm not at MIT).

This paper's claim to recognize scenes like the brain does is overdrawn. As far as I can tell from their paper (it is a journal version of their CVPR paper), only their low-level Gabor features are similar to what the brain does.

Their low-level Gabor filters are indeed similar to V1 simple cells. The similarity between their model and the brain goes a lot further, though. The processing goes through alternating stages of enhanced feature selectivity with roughly Gaussian tuning (the S layers) and pooling over spatial location and scale via a max operation (the C layers). If you read more papers from their lab, you'll see a significant amount of biological plausibility in both of these operations, and a great deal of effort has gone into tuning the various layers to behave in accordance with physiological data.
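The alternating S/C scheme described here can be sketched in a few lines. This is a simplified illustration, assuming NumPy and toy sizes: an S unit responds with Gaussian (radial basis) tuning to how close each local patch is to a stored prototype, and a C unit takes the max of those responses over positions. The hand-made prototype, sigma, and pooling over the whole map are hypothetical simplifications; in the actual model the prototypes are sampled from natural images and the pooling ranges are tuned to physiological data.

```python
import numpy as np

def s_layer(image, prototype, sigma=1.0):
    """Gaussian tuning at every position: exp(-||patch - prototype||^2 / (2 sigma^2))."""
    ph, pw = prototype.shape
    h, w = image.shape
    out = np.empty((h - ph + 1, w - pw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + ph, j:j + pw]
            out[i, j] = np.exp(-np.sum((patch - prototype) ** 2) / (2 * sigma ** 2))
    return out

def c_layer(s_map):
    """Max pooling over positions: the unit fires if the feature appears anywhere."""
    return s_map.max()

prototype = np.array([[1.0, 0.0], [0.0, 1.0]])         # a tiny hypothetical "diagonal" feature
img_a = np.zeros((6, 6)); img_a[1:3, 1:3] = prototype
img_b = np.zeros((6, 6)); img_b[3:5, 3:5] = prototype  # same feature, shifted
print(c_layer(s_layer(img_a, prototype)), c_layer(s_layer(img_b, prototype)))  # 1.0 1.0
```

The C response is identical for both images: the max operation buys tolerance to position at the cost of knowing exactly where the feature was.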

The rest of the paper uses the currently popular bag-of-features model, which discards all spatial relationships between image features, something I don't think the brain does.

The model is roughly equivalent to a bag-of-features, but with the nice feature (from a biologist's perspective) that it builds the bag in a biologically plausible way. The features themselves are picked randomly from natural images in a training stage that takes the place of human development. Discarding spatial information makes the model a lot more tractable, and it isn't clear what role spatial information plays in the processing of the ventral visual system, which is what their algorithm models.
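For comparison, here is the generic "bag of visual words" construction the quoted comment refers to, not the MIT model's biologically motivated variant: each local descriptor is assigned to its nearest entry in a codebook, and the image becomes a histogram of codeword counts. Because only counts survive, shuffling where the descriptors came from leaves the representation unchanged. The codebook and descriptors below are made-up toy values.

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Assign each local descriptor to its nearest codeword and count occurrences."""
    # Pairwise squared distances, shape (n_descriptors, n_codewords)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    return np.bincount(words, minlength=len(codebook))

codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # 3 hypothetical "visual words"
descs = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.1, 0.0]])
shuffled = descs[[3, 2, 0, 1]]   # same local features, different spatial order
print(bag_of_words(descs, codebook), bag_of_words(shuffled, codebook))  # [1 2 1] [1 2 1]
```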

Furthermore, for classification they consider a Support Vector Machine and Boosting. Neither of these classifiers is comparable to what the brain does. Why not use a neural network if their aim is to mimic the brain?

They use these classifiers on top of their algorithm simply to determine how good the model was at extracting relevant feature information. Since they want to quantify how much information is there, it is wise to choose the best method they can to locate the information.

Furthermore, they only consider feedforward information, whereas research shows that there is at least as much information going back as there is going forward.

Feedback is definitely very important (this is what my own research is about), but feedforward accomplishes a lot with a vastly simpler computational model.

Don't get me wrong, it is still a nice paper with good results (however, all Caltech datasets are highly artificial, with objects artificially rotated in one direction). So, nice paper, but comparing it with the workings of the human brain is too much.

Here are the Caltech datasets they used: vision.caltech.edu. I think the "artificial" datasets you refer to are the "3D objects on turntable," which are a bit artificial. However, the images they refer to in the paper discussed here are from the Caltech-101 dataset, which consists of real-world images of objects from 101 different categories - most of the images are not at all artificial.

--
Still trying to think of a clever sig...

Fine paper, but why not quote all of PAMI? by HuguesT · 2007-02-12 01:28 · Score: 4, Informative

This is a nice paper by respected researchers in AI and vision; however, pretty much the entire content of the journal it was published in (IEEE Transactions on Pattern Analysis and Machine Intelligence) is up to that level. Why single out this particular paper?

Interested readers can browse current and back issues of PAMI and either go to their local scientific library (PAMI is recognisable from afar by its bright yellow cover) or search the web for interesting articles. Researchers often put their own papers on their home pages. For example, here is the publication page of one of the authors (I'm not him).

For the record, I think justifying various ad hoc vision/image analysis techniques with approximations of their biological underpinnings is of limited interest. When asked whether computers would one day think, Edsger Dijkstra famously answered with "can submarines swim?". In the same vein, it has been observed that (for example) most neural network architectures make worse classifiers than standard logistic regression, not to mention Support Vector Machines, which is what this article uses, by the way.

The summary by our friend Roland P. is not very good:

This versatile model could one day be used for automobile driver's assistance, visual search engines, biomedical imaging analysis, or robots with realistic vision

Working automated driving software already exists. The December 2006 issue of IEEE Computer magazine covered it last month. Read about the car that drove a thousand miles on Italy's roads thanks to Linux, no less.
Visual search engines exist at the research level. The whole field is called "Content Based Retrieval", and the main issue is not so much searching as formulating the question.
Biomedical image analysis has been going strong for decades and is used every day in your local hospital. Ask your doctor !
Robotic vision is pretty much as old as computers themselves. There are even fun robot competitions like robocup.
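As a concrete illustration of content-based retrieval, one classic technique (Swain and Ballard's color indexing) represents each image by a normalized color histogram and ranks the collection by histogram intersection with the query histogram. A toy sketch with made-up 3-bin histograms; real systems use far richer features, and, as the comment notes, turning a user's question into such a query is the hard part:

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two histograms that each sum to 1."""
    return np.minimum(h1, h2).sum()

def rank(query, database):
    """Return database keys ordered from most to least similar to the query."""
    scores = {name: histogram_intersection(query, h) for name, h in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical 3-bin color histograms (red, green, blue fractions).
db = {
    "beach":  np.array([0.2, 0.1, 0.7]),
    "forest": np.array([0.1, 0.8, 0.1]),
    "sunset": np.array([0.7, 0.1, 0.2]),
}
print(rank(np.array([0.15, 0.15, 0.7]), db))  # ['beach', 'sunset', 'forest']
```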

I could go on with lists and links, but the future is already here, generally inconspicuously. Read about it.
