Hitachi Develops New Visual Search
Tech.Luver writes to tell us that Hitachi has developed a new visual search engine that can supposedly find similar images from within millions of video and picture data entries in around 1 second. "The technology assesses the similarity of images based on image characteristics presented as high-dimensional numeric information. The information is acquired by automatically detecting information regarding the images, such as color distribution and shapes."
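For anyone wondering what "image characteristics presented as high-dimensional numeric information" could look like in practice, here is a minimal sketch. It assumes the only feature is a coarse color histogram and that similarity is plain Euclidean distance; the function names and toy "images" (lists of RGB tuples) are made up for illustration and say nothing about what Hitachi actually does.

```python
from math import sqrt

def color_histogram(pixels, bins_per_channel=4):
    """Turn a list of (r, g, b) pixels (0-255 each) into a normalized histogram vector."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // step) * bins_per_channel ** 2
               + (g // step) * bins_per_channel
               + (b // step))
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def most_similar(query_vec, database, k=3):
    """database maps image name -> feature vector; returns the k closest names."""
    return sorted(database, key=lambda name: distance(query_vec, database[name]))[:k]

# Hypothetical toy images: each is just a bag of pixels.
sunset_query = [(220, 80, 30)] * 80 + [(40, 20, 60)] * 20
other_sunset = [(230, 90, 40)] * 75 + [(50, 30, 50)] * 25
lake = [(30, 90, 200)] * 80 + [(200, 220, 240)] * 20

database = {
    "other_sunset": color_histogram(other_sunset),
    "lake": color_histogram(lake),
}
ranking = most_similar(color_histogram(sunset_query), database, k=2)
print(ranking)  # ['other_sunset', 'lake']
```

A linear scan like this would never reach millions of images in a second, so presumably the real system puts some kind of index on top of the vectors.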
This is interesting to me - if it performs well - because this is one of the key missing elements for robotics; robots have a lot of trouble trying to match the environment around them to stored records of objects unless the environment is severely constrained. I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries, navigate in a burning building, walk your dog, tend your lawn. If they can classify images against stored images well, we're that much closer to generally useful and at least semi-autonomous robot devices.
Training might be a little annoying the first few times, but once you had a good database, you could replicate - or share via RF, that'd be freaky... neighbor's robot learns what a ferret looks like, now yours knows too - so that newer models were more and more informed right out of the box. Crate. Coffin. Whatever.
Add an associative database so that images normally found near other images which have just been found are searched first, and perhaps you could get the general search time down from the quoted 1 second, I'm thinking. One second is kind of pokey for a lot of robotic applications. But if the thing is in a kitchen, why would it need to be looking to recognize images that are found in a shipyard?
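A rough sketch of that associative idea, under my own assumptions: objects are stored as small feature vectors, the "association" is just a count of how often two labels were seen together, and the search stops at the first stored object within a distance threshold. All names here (AssociativeIndex, find_match, the toy vectors) are hypothetical.

```python
from collections import defaultdict
from math import sqrt

class AssociativeIndex:
    """Counts how often pairs of object labels have been seen together."""

    def __init__(self):
        self.cooccur = defaultdict(lambda: defaultdict(int))

    def observe_together(self, labels):
        for a in labels:
            for b in labels:
                if a != b:
                    self.cooccur[a][b] += 1

    def search_order(self, all_labels, recently_seen):
        """Put labels that often appear near recently seen objects first."""
        def score(label):
            return sum(self.cooccur[seen][label] for seen in recently_seen)
        # sorted() is stable, so ties keep their original order.
        return sorted(all_labels, key=lambda label: -score(label))

def distance(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def find_match(query_vec, stored, order, threshold):
    """Scan stored vectors in the given order; return (label, comparisons made)."""
    comparisons = 0
    for label in order:
        comparisons += 1
        if distance(query_vec, stored[label]) <= threshold:
            return label, comparisons
    return None, comparisons

# Toy database: two shipyard objects, two kitchen objects.
stored = {
    "crane": [9.0, 9.0],
    "anchor": [8.0, 1.0],
    "toaster": [1.0, 1.0],
    "kettle": [1.0, 2.0],
}
index = AssociativeIndex()
index.observe_together(["toaster", "kettle"])
index.observe_together(["crane", "anchor"])

query = [1.1, 2.0]  # something kettle-like
order = index.search_order(list(stored), recently_seen=["toaster"])
print(find_match(query, stored, order, threshold=0.5))         # ('kettle', 1)
print(find_match(query, stored, list(stored), threshold=0.5))  # ('kettle', 4)
```

Having just seen the toaster, the robot checks the kettle first and is done in one comparison instead of four. Nothing is thrown away, so the shipyard entries still get searched if the kitchen ones fail.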
And I, for one, would welcome our semi-autonomous, environment recognizing, floor cleaning robot underlings.
I've fallen off your lawn, and I can't get up.
great! new way to find even more porn.
Calling this a search engine is a stretch. The article even calls it "search technology".
Super Vista Forum
and life ain't nuthin but bitchez n money.
I am very small, utmostly microscopic.
Sounds like an on-disk format involving hash tables. They'd probably win a patent on it, too.
Usually, HDD = Hard Disk Drive
I would think this would be a big and useful upgrade for http://images.google.com/
Always be polite.
"Kill multi button gadgets! Steve Jobs robot army angry!"
Novel theory: Modern Man evolved from psychopath
On what hardware? A 16MHz 386?
Using these words, search engine style indices and techniques can be used to make searching -- by supplying an example image area which can have its words computed -- quite fast.
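To make the search-engine analogy concrete, here is a small sketch of an inverted index. It assumes the "words" are integer IDs of quantized image features (often called visual words), which is my reading of the comment rather than something it spells out. The frame names and word IDs are invented.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """image_words maps image id -> list of word ids; returns word id -> set of image ids."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index

def search(index, query_words, top_k=3):
    """Rank images by how many distinct query words they contain."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common(top_k)

# Hypothetical frames, each already reduced to a handful of word ids.
image_words = {
    "frame_001": [3, 17, 42, 42, 8],
    "frame_002": [5, 17, 99],
    "frame_003": [1, 2, 4],
}
index = build_inverted_index(image_words)

# Words computed from the example image area supplied by the user.
results = search(index, [17, 42, 8])
print(results)  # [('frame_001', 3), ('frame_002', 1)]
```

Only images sharing at least one word with the query are ever touched, which is where the speed comes from. A real system would likely weight words (tf-idf style) instead of counting raw matches.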
The key bottleneck here is the clustering stage: reducing the original input of typically hundreds of features per frame -- multiplied by 25 frames per second by minutes, or hours, of video -- to a much smaller set of clusters. It looks like the work in the linked article is using a modified clustering algorithm which does not require all of the data to be in memory at once.
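One well-known way to cluster without holding everything in memory is sequential (online) k-means, where each centroid is a running mean updated one point at a time. The sketch below shows that technique only as an illustration. The article does not say which algorithm Hitachi modified, and the feature_stream generator is a made-up stand-in for features read from video.

```python
def nearest(centroids, point):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(centroids):
        d = sum((a - b) ** 2 for a, b in zip(c, point))
        if d < best_d:
            best, best_d = i, d
    return best

def online_kmeans(stream, k):
    """Cluster points from an iterator, keeping only k centroids and k counts."""
    centroids, counts = [], []
    for point in stream:
        if len(centroids) < k:
            # Seed the first k centroids with the first k points.
            centroids.append(list(point))
            counts.append(1)
            continue
        i = nearest(centroids, point)
        counts[i] += 1
        rate = 1.0 / counts[i]
        # Move the winning centroid toward the point: a running mean.
        centroids[i] = [c + rate * (x - c) for c, x in zip(centroids[i], point)]
    return centroids, counts

def feature_stream():
    """Hypothetical feature source: two well-separated groups, yielded one at a time."""
    for n in range(1000):
        jitter = (n % 10) * 0.01
        if n % 2 == 0:
            yield (0.0 + jitter, 0.0)
        else:
            yield (10.0 + jitter, 10.0)

centroids, counts = online_kmeans(feature_stream(), k=2)
print(counts)  # [500, 500]
```

Memory use depends on k and the feature dimension, not on how many hours of video go through. The price is that results depend on the order of the stream and on how the centroids are seeded.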
The TRECVID project is a challenge style exercise where groups compete to provide the best search results for a given set of queries where the search material is hours of video.
I frequently have to create large collections of images from all sorts of file types -- some text-based, some graphics -- that get housed in a collection of images for easy, standardized review. If there were something that could avoid the step of extracting text from them, or later OCRing them and still end up with a searchable image collection, well, that would be exceedingly cool. It would cut the initial time outlay I have to devote to virtually any given project I have to deal with by 25 to 50%.
If you never make mistakes, it's probably because you're not doing anything.
...of 48-year-old, 365 lb. guys who steal pictures of girls from MySpace and pretend to be 14/f/kali.
The technology can't determine what aspects of the image you're looking for.
For example: I want to find more cat images. I feed it a picture of a white cat. I am more likely to be returned results of white dogs than, say, tabby or black cats.
Unless I'm misunderstanding something?
Caffeine is my anti-drug!
Duranin - A NWN2 Roleplaying Persistent World
"What is this HDD they speak of repeatedly?"
Heavy Dump Diapers.
I could see the FBI paying some millions of dollars for a dedicated system like this... I mean, since they have that known terrorist photo database or whatever, they might want to improve performance... Of course, I would hope that the FBI would properly configure the servers if they were to buy this. They accidentally forget to change the server from images.google.com (or something similar) to terrorists.fbi.gov, and all of a sudden, your granny is a known terrorist. Oh no!
"You teach a child to read and he or her will be able to pass a literacy test." - President George W. Bush
If we deployed a large solar array in space and then used it to generate energon cubes then we could just use the shuttle to collect the energon cubes. Later after we gathered enough we could build large transforming robots to collect the cubes. These robots would be powered off the cubes as well.
This signature would be better if I was creative.
... long since forgotten, be responsible for such innovative technology?
Prof. Farnsworth - "Oh a lesson in not changing history from Mr I'm-My-Own-Grandpa!"
Consider the scope of comparison - that's a lot of data to crunch, regardless of tableture.
If you have a PERFECT match, I'd think it would be straightforward, performance notwithstanding.
What happens if the image got resized, cropped, pixelated, or had its format or attributes changed? Fast, but robust?
If you look for a license plate, (as they do now) the range is very specific.
If you want to find the guy with the fake beard and a limp, it gets tricky fast.
Time to update my disguises so the robots don't know who they're crushing with their giant metal claws.
B*n.Lee, I hope you witness the death of privacy and are appropriately wistful. Cheers dude.
... and the theory behind what I was doing is up at my blog. Or at least most of it is (all of it modulo time constraints. It'll all get there eventually).
Back in the day (almost 2 decades ago), I was using video rather than still images (which allowed me temporal information as well as spatial information) but I recently wrote a simple application to just use the spatial information to find me images "most-like" a source one. The original goal was to train the system and then try to leverage a semantic processor from the trained system. It worked reasonably well (sometimes astoundingly well) on the database I had (some 300,000 images downloaded from keyed-searches on google images).
As Hitachi said, the key is to develop a matching system within a higher numerical dimension. One of the missing pages on the blog (I'll get there!) is how to evaluate the usefulness of any given feature (=dimension) of a region of an image. With this, one can approximate a numerical value for the information being relayed to the recognition-system using that feature, and therefore establish its worth as a feature.
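Since that blog page is not written yet, here is one standard way to put a number on a feature's usefulness: information gain, meaning how many bits of uncertainty about the target class disappear once the feature's value is known. This is a stand-in chosen for illustration and may not be the measure Simon has in mind. The feature names and labels are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Bits of label uncertainty removed by knowing the (discrete) feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical image regions with two candidate features each.
labels = ["boat", "boat", "grass", "grass"]
dominant_color = ["blue", "blue", "green", "green"]  # separates the classes perfectly
brightness = ["high", "low", "high", "low"]          # tells us nothing about the class

gain_color = information_gain(dominant_color, labels)
gain_brightness = information_gain(brightness, labels)
print(gain_color, gain_brightness)
```

On this toy data the color feature is worth a full bit and brightness is worth nothing, so a system ranking features (dimensions) by this score would keep the first and drop the second.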
When you know what you're looking for (your feature set) *and* the value of each of those features to your recognition system targets {man,boat,grass,house,...} you can create reasonably useful discriminators and rule-systems based on those discriminators. Note that the discriminators and the rule-system can be given to the system as a-priori information, but most of them are created and destroyed automatically *by* the recognition system as it evolves. It sounds complex, but really it's a bunch of simple ideas applied one after the other.
Simon.
Physicists get Hadrons!
How long before you'll be able to search through pictures or video and have the computer do the image pattern recognition? You could type the word "beach" or "jogging" and it would show you all the pictures showing scenery of a beach (or jogging .. or err jogging on the beach). Camera makers dropped the ball and don't have easy, intuitive image tagging capability built into the camera. Ideally a camera would by now have voice recognition or recording so that you can tag a photo like "me in front of eiffel tower in paris" prior to taking it. Or at the very least a touchscreen system with common options. So now it's up to the AI of the computer to figure out the content of images using mad ocr level image analysis technology. Hmm maybe I should patent something like that.
Example - my brother burnt me a CD a while back with an irish instrumental I just love. No idea who it is, haven't heard from him yet about it. I was thinking it'd be neat to be able to search for say, a match to maybe 10 seconds of the chorus.
I have been sitting on a proof that there exists a set of laws of physics for which the Turing Machine halting problem can be solved. It is also possible in that set of laws for the halting problem to be solved for itself.
I wonder what that does to Godel's theorem.
Duh, because like what if your kitchen window overlooks a shipyard? The robot might look out the window and be confused if it didn't have a database of things found in a shipyard!!
However, on a more practical note, suppose as before your kitchen overlooks a shipyard. Did you ever stop to think that maybe a runaway freighter might crash into your kitchen? I betcha didn't think about that, Mr. Smarty pants. If you don't program the robot to recognize a ship crashing into a building, then it won't know it needs to try to unplug the toaster and salvage your pop-tarts before they get wet! Sheeesh.
Kids these days. Next you'll be asking me why a robot would need to know what sharks look like...
Some time between 1992 and 1994 IIRC when I was working at the photo/press agency Pacific Press Service in Tokyo, I saw a demo of a system created IIRC by NEC which searched 90,000 photos in under one second, based on a color freehand drawing you would draw on the screen of the EWS unix workstation on which it ran. Basically if you drew a horizontal blue mass at the bottom of the screen you would get a lake, etc. In other words you could search by rough photographic composition. I am less impressed that after over 10 years Hitachi was able to do something along the same lines.
You may tell me that a turing system could emulate the brain, and I could tell you that a lighter could melt a bridge. A turing machine is meant to compute in distinct sequences; no matter how many cores your cpu is running, some basic algorithms will always have a sequence of steps.
/"Free-will" sounds like a very continuous concept, doesn't it? //It might look like intelligence is highly discontinuous if you examine humans too closely :)
The answer to real intelligence has zero-zip to do with sequential calculation. Here's a (ahem) parallel:
an analog delay pedal for music is a few dollars worth of tape and circuits that anyone can build in their living room. it has no latency of calculation, actually, the exact algorithm was never expressed in mathematical form at all!
The reason why turing machines will never emulate neurological systems effectively (even if you make a silicon-laser based neural network, let's say) is that they do not calculate their result as the SIMULTANEOUS calculation that a physical property (the addition of light, the addition of electricity) performs.
And we know that reality is discontinuous at um... Planck size.. but computers are discontinuous by whole numbers (it's all binary in computers, whole binary numbers) and to *calculate* the number that represents the summing of say, 15000 other nodes is going to exorbitantly increase and increase.. and be a rat race to nowhere.
Moral of the story,
- DIGITAL aka sequential discrete algorithms will always be out of reach:
the length of time it takes to solve a problem increases exponentially
faster with problem difficulty than it does with the ANALOG aka physical
continuous summation
- ANALOG aka instantaneous continuous algorithms based on real-world phenomena
scale linearly with the difficulty of the problem. I can't really say that -
no one really knows how the brain scales in that regard, but I can certainly
say that a whole class of problems exists that will forever be out of the reach
of "classical" sequential machines.
What some people are thinking is right though: digital computers can do things that brains can't. We've spent a long time with digital logic and found a lot of its basic theorems and limits.
To close: It's the other side, the continuous field of analog electronics - not turing! - that has the promise of creating the type of intelligence we equate with "intelligence" and "free will".
CS majors know the time/space tradeoff, but they never get taught the 3rd, crucial, tradeoff of the set: comprehension!
took too long to write and i didn't read it again, sorry about the typos at the bottom and the forgotten parallel about what a digital delay pedal entails. The difference between iron-tape and RAM. It's not the best example. ah well.
mod the godel gotchas gabbed garishly up
Mother: Go find your little brother
Older Brother: Found him! He's behind the sofa.
*RING!*
Mother: Hello?
Voice On The Other End Of The Line: Ma'am, this is Pubert Skewya. I'm a lawyer for Duey, Cheatham, and Howe. We represent Hitachi.
Mother: Uhm. Yeah? So what?
VOTOEOTL: Ma'am, we have a record that you just encouraged your son to violate our client's patent on visual searches. Naturally, we'll settle out of court for one billion dollars, American. If you refuse, with the state of the economy as it is, we'll go after you in court, but we'll go after you for one billion dollars, Canadian. If you act now, and concede to our extor*COUGH*rightful demands, you'll save yourself money in the long run.
Mother: Uhm. Yeah. Who is this really?
VOTOEOTL: Ma'am, this is serious. Our client has a patent on visual searches. Every time you tell your son to go look for something, you're contributing to the violation of our client's patents.
Mother: And I know ONE young man who's going to get his ass beaten for putting one of his idiot friends up to this stupid little prank...
*CLICK*
VOTOEOTL to the rest of his call center: SHIT! That's like the millionth one!
Chas - The one, the only.
THANK GOD!!!
http://web.engr.oregonstate.edu/~hess/index.html#%5B%5BSIFT%20Feature%20Detector%5D%5D
These techniques allow you to preprocess the image into a set of feature vectors which can be organized into a database and indexed with some effectiveness.
THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
http://forums.somethingawful.com/showthread.php?threadid=2551167
7 4770c2c2a42433b4636f3c9621942c7c3.png
NWS http://ft.mirror.waffleimages.com/files/e1/e16518
I agree that a serial system can emulate a parallel one. You can even determine how much longer a serial computer will take to execute a massively parallel program. Problem is, the time it takes to solve a problem scales very very badly with the complexity of the problem. A true parallel machine (as is possible using physical phenomena, such as analog systems use in one form or another) is not just a system that computes in parallel, but one that computes across thousands or hundreds of thousands of nodes in just a few "steps" - and you can't really always define the processing of a real neural network as "step based".
That's my argument, that there is an entirely different way of solving problems in a discrete "digital" system than in a continuous "analog" system. They have their strengths and *complement* each other, not *equal* each other, in terms of their strengths.
Trying to model certain aspects of intelligence in silicon is like trying to make a rocket out of some number of aerosol deodorant cans - theoretically you can fly if you just have enough but in practice you'll always need more aerosol than space available on the thing to be lifted! And it never gets better using aerosol - lifting a person is just as hard as lifting a space-shuttle.
On certain intelligence related problems, digital will *never* enjoy the economies of scale that analog neural network design provides. This is a deep intuition after much study. Bet on it.
The approach is decades old... but no doubt, it's newly patented.
I wonder if they've managed to solve the slowness of this sort of search any way other than just throwing a lot of boxes at it?
My own system does 13 million images in about a minute, but with enough RAM to fit the dataset in memory I can do 10-20 seconds.
I hope they're not just using a cluster to speed up access, that's a workable solution but it doesn't really help those of us who can't afford a dozen boxes to power their searcher.
I respond to your sigs
So it can recognize shapes and colour distribution? ;-)))
Then it can auto-categorize all my pr0n
Comparing a million images in a second on a commodity PC has been possible since 2004/2005; see http://www.immem.com/en/ for an example. By now the state of the art is to compare a video stream against a 24h video pool in realtime, using this technology.
What's in a sig?
So now maybe I won't have to go to 4chan to find moar of some chick whose name I don't know.
The human brain does one thing only: pattern matching.
More specifically, the body sensors ask questions to the brain, and the brain searches its database of experiences to find the experience which maximizes survival in the current situation. Once the experience is found, it is activated and answers are sent to the sensors.
The above mechanism has been developed because mathematical logic can not prove that a situation is dangerous for an animal or not. For example, it can not be proved that facing a lion is dangerous, because not all the facts about the environment and the lion's status are known. But pattern matching can 'prove' that something is dangerous by recalling past experiences or knowledge.
That is the reason we have religions: we could not understand certain physical phenomena around us, so we had to invent a reason for them...and since we were not able to reproduce those phenomena, someone with higher capabilities than us must have been responsible for them. For example, when we heard thunder, we did not understand how it was produced, and since we did not produce it, we had to believe that someone else did, someone with higher powers than us.
Needing to comprehend those phenomena was crucial to our survival: by "understanding" that a god did not make thunder unless we disobeyed his rules, we could keep our dopamine levels down, and thus stay calm and be able to assess the various dangers better.
In conclusion, it's all about maximizing survival. True AI will come only when the above mechanism is transferred to mechanical devices. By AI, I do not mean that machines will suddenly write poems, but that machines would not need to be programmed, only taught.
Yes, I'm an idiot. Spoke before I thought.
But come on now, flamebait? Hardly. Offtopic, overrated, wrong, whatever. But hardly flamebait.
Lame.
if ($search_request =~ /.*/){
return images_matching_query
("select img from images where lower(name) like '%paris%hilton%'");
}
Oh, I do. What I don't count on is that NN's, or more generally, vague models of animal function, are the only, or even the, answer. I have said all along that the general purpose computer can model anything at little or no extra hardware cost; once - if - we find what works, by all means, hand it to the engineers and let them take the most active elements and create hardware specialized to do those operations. I can just about guarantee that this is how it'll go. The problem with (for instance) hardware neural net development is that creating non-NN elements requires painstaking hardware design; in a computer, you just model the inputs, engine, and outputs, and go on with your day. This is more generally true for any hardware-based approach. Even if it is the right path, it's a slow path. Simulate, identify, then go to hardware as and if required. That's what makes the most sense.
This looks similar to how Photosynth stores and correlates images. And I'd say, it's one of the more impressive things I've seen Microsoft do, but I think they bought this technology.
http://www.youtube.com/watch?v=s-DqZ8jAmv0
No sig for you! Come back one year!
I wouldn't say camera makers have dropped the ball... I'd say you're just looking at the wrong ball. Canon, Nikon et al are coming out with fantastic low-end DSLRs with "easy intuitive" controls for ISO, aperture, shutter, white balance... things that are important for taking pictures. Adding tagging directly to the camera would either mean a clunky typing interface (not enough buttons), an expensive and fragile touchscreen, or an expensive (both $$$ and electricity) speech recognition chip.
I'm happy with a camera that does a really good job at taking pictures. Leave the tagging for some software "image productivity" suite.
I got a chance to see this software in Japan the last time I visited Hitachi's Central Research Labs. It was impressive. Unfortunately, I couldn't tell anyone about it because it was still under wraps. Now that it's out in the open, here's a post with some details. Briefly, it does rely on pre-indexing of the images, it doesn't rely on any text tagging, and it's not intended to compete with Google Image Search et al. It is intended as an Enterprise application. It is remarkably good at finding faces, even when you don't tell it you're looking for faces. And it even works on video clips. Unfortunately, they didn't give me a copy to take home.