Slashdot Mirror


Full-Text Audio Search

Captain Chad writes "The latest print edition (12/16/2002) of InfoWorld has an interesting article about an audio search program by Fast-Talk Communications. (The article is not yet available on the InfoWorld web site, but the Fast-Talk site has some good info, including a downloadable trial version.) The product works by breaking the audio stream into phonemes, which are the 'basic units of sound in a language.' The search is then performed for a specific sequence of phonemes. This method is faster and far superior to traditional audio searches, which convert to text and then perform a normal text search. The author of the InfoWorld article, Jon Udell, tried a variety of searches that were surprisingly successful. If this technology is as good as he claims, there is a reasonable chance it will revolutionize the way we store data. Maybe there will even be an 'Audio' tab on Google." Here's the InfoWorld article.

10 of 135 comments

  1. Just one more step on the road to TIA by Grapes4Buddha · · Score: 5, Insightful

    How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?

    Yay!

    1. Re:Just one more step on the road to TIA by m.lemur · · Score: 5, Insightful

      You think it's not happening already?

  2. Converting to Text by TrekkieGod · · Score: 1, Insightful

    I don't know much about the subject, but isn't this the method used to convert speech to text? Sounds to me like it's the only way to do it: comparing one sequence of phonemes to another, except that each word in the dictionary is associated with a sequence of phonemes. And that's why you're required to "train" the software with your own voice/accent.

    Somebody who knows about the subject, please post and explain the process.
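
    A minimal sketch of the dictionary comparison described above, not how any real recognizer or the Fast-Talk product works: each word maps to a phoneme sequence, and a "heard" sequence is matched to the nearest entry by edit distance. The dictionary entries, the phoneme symbols, and the names `PRONUNCIATIONS`, `edit_distance`, and `closest_word` are all assumptions made for illustration.

```python
# Toy pronunciation dictionary: word -> phoneme sequence (ARPAbet-like symbols).
# The entries are illustrative assumptions, not taken from any real lexicon.
PRONUNCIATIONS = {
    "bomb": ["B", "AA", "M"],
    "balm": ["B", "AA", "L", "M"],
    "search": ["S", "ER", "CH"],
    "church": ["CH", "ER", "CH"],
}


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def closest_word(heard):
    """Return the dictionary word whose phonemes are nearest to `heard`."""
    return min(PRONUNCIATIONS,
               key=lambda w: edit_distance(heard, PRONUNCIATIONS[w]))


# A slightly misheard final phoneme still lands on the nearest entry.
print(closest_word(["B", "AA", "N"]))
```

    A real system would score acoustic likelihoods rather than count symbol edits, which is presumably where the per-speaker training comes in.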

    --

    Warning: Opinions known to be heavily biased.

  3. Imagine... by tjamme · · Score: 4, Insightful

    ... Or imagine Google recording all possible audio streams (TV, radio, ... streets?) and allowing us to search those? All it takes is enough processors, a bit of wiring...

    Now if you record street conversations or all types of public conversations... Do a search on 'bomb'... How appealing is that to Big Brother?

    All right... I'm learning sign language. Now.

    1. Re:Imagine... by Idarubicin · · Score: 3, Insightful
      Just googled for "bomb". Got 5,430,000 results.

      I just did the same. Got 5,580,000 results, only three hours later.

      At that rate of growth (50,000 bombs per hour, or about 14 bombs per second), there's going to be an awful lot of poor bastards at the FBI/CIA/NSA chasing noise...
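
      The arithmetic behind those figures, as a quick check using only the two result counts and the three-hour gap quoted above (the variable names are illustrative):

```python
first_count = 5_430_000   # results quoted from the parent comment
second_count = 5_580_000  # results three hours later
hours = 3

per_hour = (second_count - first_count) / hours
per_second = per_hour / 3600  # 3600 seconds in an hour

print(f"{per_hour:,.0f} per hour, {per_second:.1f} per second")
```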

      --
      ~Idarubicin
  4. Exciting Implications by flopsy mopsalon · · Score: 3, Insightful
    By focusing on phonemes rather than syllables or whole words, this software can operate independently of any one language. That has exciting implications not just for audio searching but also for voice recognition and even speech translation software.

    I just hope one of those nuisance lawsuits from Tzsvestaeya Zolskovova, the eccentric widow of Sergei Zolskovova (the Russian linguist who coined the word phoneme), over the use of the term "phoneme" doesn't hobble progress in this fascinating area.

  5. Wow, cool idea by Drakonian · · Score: 3, Insightful
    I've always thought that audio/video is one huge information bank that has never been easily accessed. If you know of something textual, you go to Google to find it. But what if you wanted to read a Steve Jobs keynote from a couple years back? It's not particularly likely that anyone transcribed it for you. The video stream is probably long gone. But with this technology, you can have a searchable record of that fairly easily. Brilliant stuff.

    Someone mentioned it can be used by the government for TIA stuff - agreed, but same with any technology. It has its positive and negative uses. I don't think we are all going to revert to cavemen to get away from it.

    --
    Random is the New Order.
  6. Re:Point? by lux55 · · Score: 2, Insightful

    Aside from searching for music, I can see this being really useful in web conferencing software. Consider this:

    You hold a meeting where each person's channel is recorded and stored as part of the meeting info. Upon saving the meeting minutes, the software builds a phonetic index of the entire conversation.

    Searches later on would be no more taxing on the server than a fulltext search in MySQL is now.

    Useful? Definitely. And that's just one possibility.
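
    One way such a phonetic index could look, as a rough sketch only: an inverted index from every run of three phonemes to the positions where it starts, so a query checks a handful of candidate positions instead of scanning the whole recording. The trigram choice, the exact-match lookup, and the names `build_index` and `search` are assumptions, not anything Fast-Talk has described.

```python
from collections import defaultdict


def build_index(phonemes, n=3):
    """Map every run of n consecutive phonemes to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(phonemes) - n + 1):
        index[tuple(phonemes[i:i + n])].append(i)
    return index


def search(index, phonemes, query, n=3):
    """Return start positions where `query` occurs exactly in `phonemes`."""
    if len(query) < n:
        raise ValueError("query must contain at least n phonemes")
    # Only positions sharing the query's first n phonemes need a full check.
    candidates = index.get(tuple(query[:n]), [])
    return [i for i in candidates if phonemes[i:i + len(query)] == query]


# Hypothetical phoneme stream for "the budget is due ... budget".
meeting = ["DH", "AH", "B", "AH", "JH", "AH", "T", "IH", "Z",
           "D", "UW", "B", "AH", "JH", "AH", "T"]
index = build_index(meeting)
print(search(index, meeting, ["B", "AH", "JH", "AH", "T"]))
```

    Positions would map back to timestamps in the recording; tolerating misrecognized phonemes would need fuzzy matching on top of this.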

  7. wordspotting by GoBears · · Score: 2, Insightful

    The basic idea of using audio similarity to "grep" short sounds out of audio streams (as opposed to using ASR and text-matching) is quite old - some classic papers based on dynamic time warping date back to 1977, and HMMs became popular for this application about ten years after that. Papers on this kind of thing appear in conferences like ICASSP - look for keywords like "keyword spotting" or "wordspotting." The phone company wanted to do this for obvious reasons.

    Note that I'm not saying the GATech technology used by this company is derivative - I haven't looked at the specifics of this approach.
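
    For anyone unfamiliar with dynamic time warping, here is a minimal textbook sketch. It aligns two sequences that may be stretched in time and returns the cheapest total mismatch. It works on plain numbers for brevity, whereas real wordspotting compares frames of acoustic features; the function name `dtw_distance` and the absolute-difference cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


# The same "word" spoken more slowly (one frame repeated) still matches exactly.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

    Sliding a template like this along a long recording and flagging low-cost spans is the "grep" idea in its simplest form.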

  8. Re:Point? by Chasing Amy · · Score: 3, Insightful

    Well, I'd personally love to have an audio search tool to comb through all the mp3 files of talk radio programs such as *Loveline*, *Opie & Anthony*, and *The Greaseman*, which I have. Sometimes I think, "Now which show had that cool bit about..." and I have no hope of finding it.

    For a professional rather than personal use, imagine how useful this could be to radio stations if they keep digital archives of their programs--if someone wanted to look up a particular program based on a vague memory of some of the text, a tool like this would be invaluable.

    --

    Chasing Amy
    (We all chase Amy...)
    "The more corrupt the state, the more numerous the laws"-Tacitus