Slashdot Mirror


Hitachi Develops New Visual Search

Tech.Luver writes to tell us that Hitachi has developed a new visual search engine that can supposedly find similar images from within millions of video and picture data entries in around 1 second. "The technology assesses the similarity of images based on image characteristics presented as high-dimensional numeric information. The information is acquired by automatically detecting information regarding the images, such as color distribution and shapes."

5 of 166 comments

  1. Hmmm.... robotics? by fyngyrz · · Score: 5, Interesting

    This is interesting to me - if it performs well - because this is one of the key missing elements for robotics; robots have a lot of trouble trying to match the environment around them to stored records of objects unless the environment is severely constrained. I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries, navigate in a burning building, walk your dog, tend your lawn. If they can classify images against stored images well, we're that much closer to generally useful and at least semi-autonomous robot devices.

    Training might be a little annoying the first few times, but once you had a good database, you could replicate - or share via RF, that'd be freaky... neighbor's robot learns what a ferret looks like, now yours knows too - so that newer models were more and more informed right out of the box. Crate. Coffin. Whatever.

    Add an associative database so that images normally found near other images which have just been found are searched first, and perhaps you could get the general search time down from the quoted 1 second, I'm thinking. One second is kind of pokey for a lot of robotic applications. But if the thing is in a kitchen, why would it need to be looking to recognize images that are found in a shipyard?
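
    The "search the likely context first" idea can be sketched roughly as below. This is only a toy illustration: the ContextualIndex class, the context names, the feature vectors and the 0.1 threshold are all made up for the example and say nothing about how Hitachi's engine works.

```python
from collections import defaultdict


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class ContextualIndex:
    """Toy image index that searches the robot's current context first."""

    def __init__(self):
        # context name -> list of (label, feature vector)
        self.by_context = defaultdict(list)

    def add(self, context, label, features):
        self.by_context[context].append((label, features))

    def search(self, query, current_context, threshold=0.1):
        """Return (label, comparisons made) for the first close-enough match."""
        # Current context first, then every other context in insertion order.
        order = [current_context] + [c for c in self.by_context if c != current_context]
        comparisons = 0
        for context in order:
            for label, features in self.by_context.get(context, []):
                comparisons += 1
                if distance(query, features) <= threshold:
                    return label, comparisons
        return None, comparisons


# Hypothetical stored records (three made-up features per object).
index = ContextualIndex()
index.add("shipyard", "crane", [0.9, 0.1, 0.3])
index.add("shipyard", "anchor", [0.8, 0.2, 0.7])
index.add("kitchen", "kettle", [0.2, 0.6, 0.4])
index.add("kitchen", "toaster", [0.1, 0.9, 0.5])

query = [0.1, 0.9, 0.5]  # what the camera currently sees
print(index.search(query, "kitchen"))   # fewer comparisons
print(index.search(query, "shipyard"))  # has to wade through shipyard objects first
```

    With the robot's context set to the kitchen, the toaster is found after two comparisons instead of four, which is the whole point of ordering the search.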

    And I, for one, would welcome our semi-autonomous, environment recognizing, floor cleaning robot underlings.

    --
    I've fallen off your lawn, and I can't get up.
    1. Re:Hmmm.... robotics? by lawpoop · · Score: 4, Interesting

      "I'm not speaking of AI here (or at least, not yet) but just robots that would be able to clean your floor, carry your groceries..."

      Well, you are talking about AI here. It turns out that it's relatively easy to make a computer that can beat humans at chess or solve complex math equations, but something as simple as walking on six, four, or two legs, which a lot of really stupid organisms do, is really difficult. Something like distinguishing 'indoors' from 'outdoors', or a cloud bank from the bushes, seems way in the future.

      My pet theory is that we don't have the right kind of device yet. A mind, the 'function' of an organic nervous system, is not a Turing machine. I don't really understand the math behind it, but Goedel's incompleteness theorem seems to show that a human mathematician can understand certain mathematical proofs that a Turing machine can never prove. Since all computers are essentially Turing machines, no matter how fast or parallelized they are, or how much memory they have, they will never be able to do what a human mind can do. So maybe someday we will have artificial intelligence, or a floor-washing robot, but we currently don't have the right kind of device to do it.
      --
      Computers are useless. They can only give you answers.
      -- Pablo Picasso
    2. Re:Hmmm.... robotics? by mikael · · Score: 4, Interesting

      To implement a visual search engine you need to be able to perform the following:

      texture segmentation - splitting up a picture into segments of distinct objects. In a panoramic scene, you want to split the picture up into objects such as sky, ocean, waves, beach, boats, pier, wall, people, animals. As a psychological experiment, you can show someone a picture, point to a particular spot and ask them for the first word that they associate with that spot. Then you will see how every scene becomes segmented by our own vision systems.

      Basic image segmentation is implemented using edge detection by Fourier Transforms (FFT, IFFT, DFT). This is a very computation-intensive stage that is typically implemented using DSPs, GPUs or even dedicated ASICs. Data used by the FFT can be in any dimension: 1D (audio/radar), 2D (images) or 3D (volume visualisation). But to match the resolution of a human eye, you would need a 100 Megapixel floating point framebuffer.
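
      As a rough illustration of frequency-domain edge detection, here is a 1D toy: a single scanline is transformed, the low-frequency bins are zeroed (a crude high-pass filter), and the inverse transform is largest where the brightness jumps. The scanline values and the cutoff of 2 are assumptions for the example, and the naive O(n^2) DFT is used only to stay dependency-free; a real system would use an FFT library on 2D data.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def high_pass(signal, cutoff):
    """Zero every bin whose frequency is below `cutoff`, then transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        frequency = min(k, n - k)  # bins k and n-k hold the same frequency
        if frequency < cutoff:
            spectrum[k] = 0
    return [value.real for value in idft(spectrum)]


# Made-up scanline: dark region (10) followed by a bright region (200).
scanline = [10] * 8 + [200] * 8
response = high_pass(scanline, cutoff=2)

# The four strongest responses sit next to the brightness jumps. The DFT
# treats the signal as periodic, so the wrap-around from index 15 back to 0
# counts as an edge too.
strongest = sorted(sorted(range(len(response)), key=lambda t: -abs(response[t]))[:4])
print(strongest)
```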

      texture classification - having identified the silhouette of an object, now attempt to match the contents to a particular object. Simple ways include colour histograms and silhouette matching. More advanced methods attempt to simulate the first few layers of the human retina using Gabor filters, Ring filters and Wedge filters.
      But just to model a single type of retinal cell requires one or more FFT operations for an entire image. And there are at least twelve different types of such cells. For efficiency, precalculated results for sample images are generated (these are referred to as feature vectors) and then compared against the results of any new image.
      For a really technical explanation of how human vision works, have a look at "The organisation of the retina and visual system".
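
      The simplest of these, colour histograms used as precalculated feature vectors, can be sketched as follows. The pixel data, the two-bins-per-channel quantisation and the helper names are assumptions made for the example; histogram intersection is one common similarity measure, not necessarily the one any particular product uses.

```python
def colour_histogram(pixels, bins_per_channel=2):
    """Normalised RGB histogram of an image; serves as its feature vector."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def most_similar(query_pixels, database):
    """Name of the stored image whose histogram best overlaps the query's."""
    query = colour_histogram(query_pixels)
    return max(database, key=lambda name: intersection(query, database[name]))


# Made-up sample images, each a flat list of (R, G, B) pixels.
sky = [(100, 150, 250)] * 6 + [(250, 250, 250)] * 2
grass = [(30, 200, 40)] * 7 + [(120, 80, 20)] * 1

# Feature vectors are computed once, ahead of time.
database = {"sky": colour_histogram(sky), "grass": colour_histogram(grass)}

# A new image is reduced to the same kind of vector and compared.
new_image = [(90, 140, 240)] * 5 + [(240, 240, 240)] * 3
print(most_similar(new_image, database))
```

      Only the new image needs processing at query time; everything in the database is compared as a short list of numbers.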

      texture retrieval - the actual design of the search engine to retrieve images through content rather than just keyword:

      QBIC - Query By Image Content. IBM's image retrieval database system

      All of this has to be performed for a single image. An entire movie requires the processing of hundreds of thousands of images.
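
      As a back-of-the-envelope check on that figure, assuming a two-hour film at 24 frames per second (both numbers are assumptions for the example, not taken from the article):

```python
def frame_count(minutes, frames_per_second=24):
    """Number of individual frames in a film of the given length."""
    return minutes * 60 * frames_per_second


frames = frame_count(120)  # assumed: 120-minute film at 24 fps
print(frames)
```

      That works out to 172,800 frames, each of which would need the segmentation and classification steps above.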

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
  2. pr0n by tronicum · · Score: 4, Insightful

    great! new way to find even more porn.

  3. Since when can an ancient indian tribe ... by SengirV · · Score: 4, Funny

    ... long since forgotten, be responsible for such innovative technology?

    --

    Prof. Farnsworth - "Oh a lesson in not changing history from Mr I'm-My-Own-Grandpa!"