Slashdot Mirror


Full-Text Audio Search

Captain Chad writes "The latest print edition (12/16/2002) of InfoWorld has an interesting article about an audio search program by Fast-Talk Communications. (The article is not yet available on the InfoWorld web site, but the Fast-Talk site has some good info, including a downloadable trial version.) The product works by breaking the audio stream into phonemes, which are the 'basic units of sound in a language.' The search is then performed for a specific sequence of phonemes. This method is faster than, and far superior to, traditional audio searches, which convert the audio to text and then perform a normal text search. The author of the InfoWorld article, Jon Udell, tried a variety of searches that were surprisingly successful. If this technology is as good as he claims, there is a reasonable chance it will revolutionize the way we store data. Maybe there will even be an 'Audio' tab on Google." Here's the InfoWorld article.
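
To make the "search for a specific sequence of phonemes" idea concrete, here is a minimal Python sketch. It is not Fast-Talk's algorithm, which the summary does not describe. It assumes the audio has already been turned into a list of (phoneme, start time) pairs, and the tiny pronunciation table and its ARPAbet-style symbols are hypothetical stand-ins for a real phonetic dictionary. A real system would also score near matches rather than demand exact ones.

```python
# Hypothetical pronunciation table (ARPAbet-style symbols), for illustration only.
PRONUNCIATIONS = {
    "audio": ["AO", "D", "IY", "OW"],
    "search": ["S", "ER", "CH"],
}


def phrase_to_phonemes(phrase):
    """Turn a text query into one flat phoneme sequence."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def find_phrase(stream, phrase):
    """Return the start time of every exact occurrence of the phrase.

    stream is a list of (phoneme, start_seconds) tuples, assumed to come
    from some earlier audio-to-phoneme step that is not shown here.
    """
    query = phrase_to_phonemes(phrase)
    if not query:
        return []
    symbols = [phoneme for phoneme, _ in stream]
    hits = []
    for i in range(len(symbols) - len(query) + 1):
        if symbols[i:i + len(query)] == query:
            hits.append(stream[i][1])
    return hits


# Made-up phoneme stream for the words "the audio search".
stream = [
    ("DH", 0.0), ("AH", 0.1),
    ("AO", 0.3), ("D", 0.4), ("IY", 0.5), ("OW", 0.6),
    ("S", 0.8), ("ER", 0.9), ("CH", 1.0),
]

print(find_phrase(stream, "audio search"))
```

Because the query is converted to phonemes rather than the audio to words, nothing here depends on a word-level transcript of the recording.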

13 of 135 comments

  1. How long... by slashuzer · · Score: 2, Interesting
    How long before this technology is actually banned under the DMCA? After all, if you *need* to search music by electronic means then you are obviously a *thief*.....

    /sarcasm

  2. How long... by ejdmoo · · Score: 2, Interesting

    before we have a "video" tab on google? :)

  3. Point? by Anonymous Coward · · Score: 1, Interesting

    I can't help but wonder what the point of a "Full text audio search" would be. Most songs have their lyrics online, most speeches are already in document form, etc. Plus, wouldn't searching through audio files be incredibly computer-taxing? What about the services like AOL that cache almost everything? Would they start having to cache 300meg wav files?

    Now, if I could hum a tune into my computer and have it find what song I was humming (for those songs you just can't remember the lyrics to), I would be much happier.

  4. Conversion? by The-Perl-CD-Bookshel · · Score: 2, Interesting

    Do the songs need to be converted to this new phonetic format or can you just search the audio? Wouldn't this use a tremendous amount of computing power?

    --
    I don't keep a lid on my coffee so when I walk around I look busy -me
  5. Dialects and Foreign Language application? by beanerspace · · Score: 3, Interesting

    So does anyone out there know how well this technology deals with accents and dialects? If it copes well, perhaps we could finally see that 'Star Trek'-like universal translator, or at least translate media works from the past century into other languages on a large scale.

    Of course, noble thoughts aside, I keep thinking how useful it would have been to have such technology in college when I had to transcribe long lectures from my chicken scratch.

    Hmmm ... does this spell the end of stenography as we know it?

  6. Phonetic search engines by Moonshadow · · Score: 3, Interesting

    I once wrote a phonetic search engine for a site that took keywords and broke them down into their Soundex phonetic codes, then stored those. Then, when a search was executed, it would convert the text words into phonetic pieces and search the database for matches. It was quite accurate, actually. For example, one could search for "olif ghardin" and one of the returned results would be "olive garden".

    I guess this is a similar idea. Pretty cool tech.
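
    A rough Python sketch of that kind of lookup, assuming the classic American Soundex rules (first letter plus three digits). The in-memory dictionary stands in for the database, and the stored phrases and function names are made up for illustration. Soundex codes are a coarse spelling-based approximation of sound, not true phonemes.

```python
from collections import defaultdict

# Classic American Soundex digit groups.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ("" if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = CODES.get(ch, "")  # vowels (and y) map to "" and reset prev
        if code and code != prev:
            digits.append(code)
        prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def phrase_key(phrase):
    """One Soundex code per word, so whole phrases can be matched."""
    return tuple(soundex(word) for word in phrase.split())


# Hypothetical stored keywords, indexed by their phonetic key.
index = defaultdict(list)
for name in ["olive garden", "red lobster"]:
    index[phrase_key(name)].append(name)


def search(query):
    return index.get(phrase_key(query), [])


print(search("olif ghardin"))
```

    Both "olif" and "olive" reduce to O410, and both "ghardin" and "garden" reduce to G635, which is why the misspelled query still lands on the stored phrase.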

  7. I've often wondered about using a tech like this.. by asparagus · · Score: 3, Interesting

    For editing films. For documentaries in particular, this would be a godsend. Imagine if, in addition to video/audio tracks, you had a simple 'text' track with which you could easily assemble your cuts.

    If nothing else, putting the computer to work on the 'condense 100 hours of footage into pieces of paper' stage would be a nice step.

    If it prevented just one assistant editor from going insane, it'd be worth it. Do it for the children.

    -Brett

  8. That's called query-by-humming by V.P. · · Score: 2, Interesting

    and there are several research prototypes that can do it (check out a paper from Cornell, or just google for query-by-humming)

  9. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 3, Interesting
    You think it's not happening already?

    Yes and no; more no than yes:

    Until about two years ago, speaker-independent telephone speech dictation for accurate word-spotting wasn't good enough to run on mass volumes of calls.

    So, the story goes that it finally got tested in 2000. Even with fairly high accuracy rates, the utility of the extracted data was near or below zero, in that the amount of time a human agent would spend reviewing the tagged calls (international calls can be eavesdropped without a warrant) was less effective by some measure than if the same agent had been following other readily available leads.

    The whole problem is that most of the people who talk about {drugs, bombs, nerve gas, etc.} are not the people engaged in the manufacture and smuggling of that contraband; usually such people use code words. A code book with coordinates, maps, and timetables sent by FAX between anonymous hotel business centers can completely confound even the most concerted traditional eavesdropping scheme, let alone an automated word-spotting system. So, the agents ended up reviewing hundreds of calls between people talking about cocaine, but not one call between people talking about shipping or producing cocaine.

    Speech recognition has a place in law enforcement by mass-eavesdropping, but I don't think that place has been found yet. I predict it will probably end up being used more to ferret gays out of the military than anything else.

  10. Re:Just one more step on the road to TIA by RDPIII · · Score: 4, Interesting
    How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?

    Let's see, given 5000 billion dial equipment minutes in 2001, we'd have around 150 trillion seconds of conversations. Assuming you could code everything at a bitrate of 8kbps, this would mean roughly 150 petabytes of compressed data for 2001 alone. Presumably the storage would be distributed at the switches where you record the conversations. So the problem is now to compress, transcribe, index, search, decompress, and access 150 petabytes of distributed storage.
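
    The arithmetic can be checked with a few lines of Python. One assumption here is not stated above: each conversation minute is taken to be counted twice in the dial-equipment figure (once at each end of the call), which is what turns 5000 billion minutes into 150 trillion seconds rather than 300 trillion. At 8 kbps, which is 1000 bytes per second, that comes to about 1.5 x 10^17 bytes, or roughly 150 petabytes in decimal units.

```python
# Figures from the comment; the variable names are only illustrative.
DIAL_EQUIPMENT_MINUTES = 5000 * 10**9   # "5000 billion" minutes in 2001
ENDS_PER_CALL = 2                       # assumption: each minute is counted at both ends
BITRATE_BITS_PER_SECOND = 8000          # 8 kbps speech codec

conversation_seconds = DIAL_EQUIPMENT_MINUTES // ENDS_PER_CALL * 60
total_bytes = conversation_seconds * BITRATE_BITS_PER_SECOND // 8

print(f"{conversation_seconds / 1e12:.0f} trillion seconds of conversation")
print(f"{total_bytes / 1e15:.0f} petabytes at 8 kbps")
```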

    And keep in mind that doing a phoneme transcription rather than full-blown speech-to-text is likely to generate a whole lot of nonsense transcriptions, precisely because you don't have any guiding information from the words in the conversation.

    While I enjoy Popular Paranoia as much as the next guy, the whole TIA thing does not really get to me. My reaction is mostly: bring it on, if you really think you can convince yourself that it can be done.

    --
    Marklar: marklar
  11. potential for medical applications is exciting by ColMstrd · · Score: 3, Interesting

    As 90% of the data for diagnosis comes from taking the patient's history (the interview), the potential for automating/supporting diagnosis is exciting.

    Imagine a system that listens to a consultation in real time, making helpful suggestions for diagnosis based on analysis of the patient and the doctor's phoneme streams! And no tedious data entry, just an unobtrusive microphone.

    I've been waiting for this.

    --
    You can never eat too much, only cycle too little.
  12. The trench oily guy feels great by xixax · · Score: 3, Interesting

    Italys nose probes them et al. fingering our hose
    the fee Cult to longer stained syrups and Hussein marmot pervert sucks eggs rat. Intact, eye amusing into dick tape his pest of flash snot.

    - - -

    It has no problems at all figuring out those difficult to understand lyrics and has an almost perfect success rate. In fact, I am using it to dictate this post to Slashdot.

    --
    "Everything is adjustable, provided you have the right tools"
  13. Old News by annodomini · · Score: 2, Interesting

    I saw a demonstration of essentially the same technology at Compaq's CRL about two and a half years ago (formerly DEC CRL, or Cambridge Research Lab, the guys who did research for AltaVista). It did exactly the same thing. It broke sound files down into phonemes, then searched based on those phonemes. It was mostly used for finding a clip on the web rather than a specific place in a long file, but it was the same idea. The nice thing was that it was OK for its application if it missed once or twice. If the audio file was relevant, the word or phrase was probably used multiple times in the clip. It was pretty good at finding NPR stories about certain events. In fact, you can try it out for yourself at an online demo.
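
    The "OK if it missed once or twice" point can be illustrated with a small Python sketch. This is not the CRL system, whose internals are not described here. It assumes each clip is already a list of phoneme symbols, lets a match differ from the query by at most one symbol (an arbitrary tolerance), and ranks clips by how many times the query turns up. The clip names and phoneme lists are invented.

```python
def count_hits(clip_phonemes, query, max_mismatches=1):
    """Count windows of the clip that match the query with few substitutions."""
    n = len(query)
    if n == 0:
        return 0
    hits = 0
    for i in range(len(clip_phonemes) - n + 1):
        window = clip_phonemes[i:i + n]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def rank_clips(clips, query, max_mismatches=1):
    """Return names of clips with at least one hit, most hits first."""
    scored = [(count_hits(phonemes, query, max_mismatches), name)
              for name, phonemes in clips.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for hits, name in ranked if hits > 0]


# Invented clips: the query "search" (S ER CH) appears twice in "news"
# (once misrecognized as S ER SH), once in "weather", never in "music".
query = ["S", "ER", "CH"]
clips = {
    "news": ["S", "ER", "CH", "AH", "S", "ER", "SH"],
    "weather": ["S", "ER", "CH"],
    "music": ["L", "AA", "L", "AA"],
}

print(rank_clips(clips, query))
```

    A clip that really is about the topic tends to contain the phrase several times, so a single bad recognition lowers its hit count without dropping it from the results.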