Full-Text Audio Search

Posted by timothy on Sunday December 22, 2002 @09:43AM from the information-awareness dept.

Captain Chad writes "The latest print edition (12/16/2002) of InfoWorld has an interesting article about an audio search program by Fast-Talk Communications. (The article is not yet available on the InfoWorld web site, but the Fast-Talk site has some good info, including a downloadable trial version.) The product works by breaking the audio stream into phonemes, which are the 'basic units of sound in a language.' The search is then performed for a specific sequence of phonemes. This method is faster than, and far superior to, traditional audio searches, which convert to text and then perform a normal text search. The author of the InfoWorld article, Jon Udell, tried a variety of searches that were surprisingly successful. If this technology is as good as he claims, there is a reasonable chance it will revolutionize the way we store data. Maybe there will even be an 'Audio' tab on Google." Here's the InfoWorld article.

23 of 135 comments

Just one more step on the road to TIA by Grapes4Buddha · 2002-12-22 09:48 · Score: 5, Insightful

How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?

Yay!
1. Re:Just one more step on the road to TIA by m.lemur · 2002-12-22 09:56 · Score: 5, Insightful
  
  You think it's not happening already?
2. Re:Just one more step on the road to TIA by Anonymous Coward · 2002-12-22 11:09 · Score: 3, Interesting
  
  You think it's not happening already?
  
  Yes and no; more no than yes:
  Until about two years ago, speaker-independent telephone speech dictation for accurate word-spotting wasn't good enough to run on mass volumes of calls.
  So, the story goes that it finally got tested in 2000. Even with fairly high accuracy rates, the utility of the extracted data was near or below zero: the time a human agent spent reviewing the tagged calls (international calls can be eavesdropped without a warrant) was, by some measure, less productive than if the same agent had been following other readily available leads.
  The whole problem is that most of the people who talk about {drugs, bombs, nerve gas, etc.} are not the people engaged in the manufacture and smuggling of such contraband; those people usually use code words. A code book with coordinates, maps, and timetables sent by FAX between anonymous hotel business centers can completely confound even the most concerted traditional eavesdropping scheme, let alone an automated word-spotting system. So, the agents ended up reviewing hundreds of calls between people talking about cocaine, but not one call between people talking about shipping or producing cocaine.
  Speech recognition has a place in law enforcement by mass-eavesdropping, but I don't think that place has been found yet. I predict it will probably end up being used more to ferret gays out of the military than anything else.
3. Re:Just one more step on the road to TIA by RDPIII · 2002-12-22 13:17 · Score: 4, Interesting
  
  How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?
  
  Let's see, given 5000 billion dial equipment minutes in 2001, we'd have around 150 trillion seconds of conversations. Assuming you could code everything at a bitrate of 8kbps, this would mean roughly 150 petabytes of compressed data for 2001 alone. Presumably the storage would be distributed at the switches where you record the conversations. So the problem is now to compress, transcribe, index, search, decompress, and access 150 petabytes of distributed storage.
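
  A rough sketch of that arithmetic follows. It rests on two assumptions that the post does not spell out: that dial equipment minutes count each conversation minute at both ends of the call (which would explain the halving to 150 trillion seconds), and that 8 kbps means 1000 bytes per second. The variable names are made up for illustration.

```python
# Back-of-envelope storage estimate; every input here is an assumption.
DEM_MINUTES = 5_000 * 10**9   # 5000 billion dial equipment minutes (figure from the post)
ENDS_PER_CALL = 2             # assumption: each conversation minute is counted at both ends
BITRATE_BPS = 8_000           # 8 kbps speech codec, as in the post

# Seconds of actual conversation, then bytes needed to store them.
conversation_seconds = DEM_MINUTES // ENDS_PER_CALL * 60
total_bytes = conversation_seconds * BITRATE_BPS // 8

print(f"{conversation_seconds / 1e12:.0f} trillion seconds of conversation")
print(f"{total_bytes / 1e15:.0f} petabytes at 8 kbps")
```

  Changing `ENDS_PER_CALL` or `BITRATE_BPS` shows how sensitive the total is to those two guesses.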
  
  And keep in mind that doing a phoneme transcription rather than full-blown speech-to-text is likely to generate a whole lot of nonsense transcriptions, precisely because you don't have any guiding information from the words in the conversation.
  
  While I enjoy Popular Paranoia as much as the next guy, the whole TIA thing does not really get to me. My reaction is mostly: bring it on, if you really think you can convince yourself that it can be done.
  
  --
  Marklar: marklar
Yes, but... by Erpo · 2002-12-22 09:55 · Score: 5, Funny

...can it decode rap and/or reggae? I swear I can't understand 3/4 of those lyrics. Songs could start with

-----BEGIN PGP MESSAGE-----

and I wouldn't be able to tell the difference.
Re:Google's Voice Searc by The-Perl-CD-Bookshel · 2002-12-22 09:55 · Score: 5, Informative

The Google voice search is used to search Google by telephone rather than online. This doesn't search through voice/audio records for matches.

--
I don't keep a lid on my coffee so when I walk around I look busy -me
Patches... by Rhubarb+Crumble · 2002-12-22 09:57 · Score: 4, Funny

Combine this with the Streamed Audio Kernel source, and it's only a matter of time before people leave patches on Linus' voicemail! The great thing is that to patch an audio kernel, you only need a tape recorder.... :P
1. Re:Patches... by orangesquid · 2002-12-22 10:12 · Score: 3, Funny
  
  Now what would be cool is a C virtual machine running in a tape recorder, so you could just get one of those larger-capacity audio tapes, record the linux kernel broadcast for the next 600-something days, and then boot up linux on a tape recorder! Hah! NetBSD, beat that!
  
  --
  --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
Dialects and Foreign Language application? by beanerspace · 2002-12-22 10:04 · Score: 3, Interesting

So does anyone out there know how well this technology deals with accents and dialects? If it copes well, perhaps we could finally see that 'Star Trek'-like universal translator, or at least translate media works from the past century into other languages on a large scale.

Of course, noble thoughts aside, I keep thinking how useful it would have been to have such technology in college when I had to transpose long lectures from my chicken scratch.

Hmmm ... does this spell the end to stenography as we know it?

--

healyourchurchwebsite.com - WWJB?
Imagine... by tjamme · 2002-12-22 10:09 · Score: 4, Insightful

... Or imagine Google recording all possible audio streams (TV, radio, ... streets?) and allowing us to search those? All it takes is enough processors, a bit of wiring...

Now if you record street conversations or all types of public conversations... Do a search on 'bomb'... How appealing is that to big brother?

All right... I'm learning sign language. Now.
1. Re:Imagine... by Idarubicin · 2002-12-22 18:20 · Score: 3, Insightful
  
  Just googled for "bomb". Got 5,430,000 results.
  I just did the same. Got 5,580,000 results, only three hours later.
  At that rate of growth, (50,000 bombs per hour, or about 14 bombs per second) there's going to be an awful lot of poor bastards at the FBI/CIA/NSA chasing noise...
  
  --
  ~Idarubicin
Exciting Implications by flopsy+mopsalon · 2002-12-22 10:09 · Score: 3, Insightful

By focusing on phonemes rather than syllables or whole words, this software can operate independently of any one language. This has exciting implications not just for audio searching, but also for voice-recognition and even speech translation software.
I just hope one of those nuisance lawsuits from Tzsvestaeya Zolskovova, the eccentric widow of Sergei Zolskovova (the Russian linguist who coined the word phoneme), over the use of the term "phoneme" doesn't hobble progress in this fascinating area.
Phonetic search engines by Moonshadow · 2002-12-22 10:09 · Score: 3, Interesting

I once wrote a phonetic search engine for a site that took keywords and broke them down into their Soundex phonemes, then stored those. Then, when a search was executed, it would convert the text words into phonetic pieces and search the database for matches. It was quite accurate, actually. For example, one could search for "olif ghardin" and one of the returned results would be "olive garden".
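
A minimal sketch of that kind of lookup, assuming standard American Soundex (first letter plus three digits) and a plain in-memory dictionary standing in for the database. The helper `phrase_key` and the sample entries are hypothetical, not taken from the original site.

```python
def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for a single word."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants that share a code
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # vowels reset prev, so a repeated code after a vowel counts
    return (result + "000")[:4]


def phrase_key(text: str) -> tuple:
    """Hypothetical index key: the Soundex code of each word in the phrase."""
    return tuple(soundex(w) for w in text.split())


# Store phrases under their phonetic keys, then look up a misspelled query.
index = {phrase_key(name): name for name in ["olive garden", "red lobster"]}
print(phrase_key("olif ghardin"))
print(index.get(phrase_key("olif ghardin")))
```

Soundex only approximates how English spellings sound, so unrelated words can share a code; a real engine would likely rank or filter the candidates.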

I guess this is a similar idea. Pretty cool tech.
I've often wondered about using a tech like this.. by asparagus · 2002-12-22 10:10 · Score: 3, Interesting

For editing films. For documentaries in particular, this would be a godsend. Imagine if, in addition to video/audio tracks, you had a simple 'text' track with which you could easily assemble your cuts.

If nothing else, putting the computer to work on the 'condense 100 hours of footage into pieces of paper' stage would be a nice step.

If it prevented just one assistant editor from going insane, it'd be worth it. Do it for the children.

-Brett
What we need now is by Mr.+Shiny+And+New · 2002-12-22 10:14 · Score: 3, Funny

A search engine that lets you hum a song and it figures out which one it is.
Wow, cool idea by Drakonian · 2002-12-22 10:15 · Score: 3, Insightful

I've always thought that audio/video is one huge information bank that has never been easily accessed. If you know of something textual, you go to Google to find it. But what if you wanted to read a Steve Jobs keynote from a couple years back? It's not particularly likely that anyone transcribed it for you. The video stream is probably long gone. But with this technology, you can have a searchable record of that fairly easily. Brilliant stuff.
Someone mentioned it can be used by the government for TIA stuff - agreed, but same with any technology. It has its positive and negative uses. I don't think we are all going to revert to cavemen to get away from it.

--
Random is the New Order.
Refined twist on an old idea by pongo000 · 2002-12-22 10:41 · Score: 3, Informative

Soundex, which uses the way words sound rather than the way they are spelled, has been widely used by the government and genealogy researchers for the past 60 years. This isn't exactly "new" technology.

Why are more and more /. articles starting to sound like corporate press releases?
So... by dacarr · 2002-12-22 10:46 · Score: 5, Funny

Does it recognize speech, or does it wreck a nice peach?

--
This sig no verb.
InfoWorld article by prostoalex · 2002-12-22 10:58 · Score: 3, Informative

The article is not yet available on the InfoWorld web site
Actually it is. InfoWorld: The Power of Voice.
I looked into this recently by RebornData · 2002-12-22 11:17 · Score: 4, Informative

There are a few papers available for download from their website, but you have to register. Basically, traditional voice recognition parses the audio stream into some meta-form, usually representing phonemes (the low-level "atomic" sounds that your speech consists of). These phonemes are then matched against a dictionary of known words (and the phonemes they consist of) and text is produced.

Because phoneme recognition is not particularly accurate (for example, it's hard to tell the difference between "hard d" as in "Dan" and "hard b" as in "Ban" over a noisy phone line), traditional speech-to-text systems use several approaches to improve accuracy. One is to improve the accuracy of the basic phoneme recognition by "training" it for a specific voice. Another is to use all sorts of hairy language-specific grammar / syntax algorithms.

Computationally, it's the matching of the phonemes against the dictionary that's the most difficult, and the larger the dictionary, the less accurate and more CPU-chomping it becomes. In addition, searching the resulting text for specific matches grows less accurate as the search string increases in length, due to the likelihood of transcription errors.
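
One way to see why longer text queries fare worse is a toy model. Suppose, purely as an assumption, that each word is transcribed correctly with the same probability and independently of its neighbors; then an exact phrase match needs every word to be right. The 0.9 accuracy figure below is illustrative, not a measurement.

```python
def exact_match_probability(word_accuracy: float, phrase_length: int) -> float:
    """Chance that every word of the phrase was transcribed correctly,
    assuming independent errors with the same per-word accuracy."""
    return word_accuracy ** phrase_length


# Illustrative only: 90% per-word accuracy, phrases of increasing length.
for n in (1, 3, 5, 10):
    print(n, round(exact_match_probability(0.9, n), 3))
```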

The cool thing that Fast Talk has done is to store and index the phoneme meta-data, rather than complete the recognition to text. When you enter search words, they break the search string into phonemes and look for matches that way. This has several positive benefits:

1. Computational resources are dramatically lessened, since the "phoneme recognition algorithms" are fast and there's no dictionary matching.
2. The matching doesn't depend on having the right words in the dictionary at input time. It works just as well for unusual proper names and technical jargon as it does for common words, since they're all formed from the same basic phonemes.
3. The longer the search string, the greater probability of an accurate match.
4. No need for accurate search string spelling. It doesn't matter if you know how to spell a word, as long as you can write it down phonetically.
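
The idea in the list above can be sketched roughly as follows. This is not Fast Talk's algorithm, which the papers would have to confirm; it is a toy that slides a query phoneme sequence over a stored phoneme track and scores each position by the fraction of phonemes that agree. The ARPAbet-style symbols, the `search_phonemes` function, and the sample track are all assumptions made for illustration.

```python
from typing import List, Tuple


def search_phonemes(track: List[str], query: List[str],
                    min_score: float = 0.8) -> List[Tuple[int, float]]:
    """Return (start_index, score) for each window of the track whose
    position-by-position agreement with the query is at least min_score."""
    hits = []
    n = len(query)
    for start in range(len(track) - n + 1):
        window = track[start:start + n]
        score = sum(a == b for a, b in zip(window, query)) / n
        if score >= min_score:
            hits.append((start, score))
    return hits


# Hypothetical recognizer output around "olive garden", with the D misheard as B.
track = ["DH", "AH", "AA", "L", "IH", "V", "G", "AA", "R", "B",
         "AH", "N", "T", "AH", "D", "EY"]
# Query typed phonetically by the user and converted to the same symbols.
query = ["AA", "L", "IH", "V", "G", "AA", "R", "D", "AH", "N"]

print(search_phonemes(track, query))
```

With ten phonemes in the query, one misheard sound still leaves nine in agreement, which is the intuition behind longer search strings matching more reliably. Position-by-position scoring cannot cope with inserted or dropped phonemes; an edit-distance or probabilistic score would be the natural next step.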

In theory, the system should work for any language, but the reality is that different languages do have different sets of phonemes, and I think Fast Talk has only really worked on English. So languages like Spanish that are fairly similar phonetically to English would probably work pretty well, but tonal languages like Mandarin Chinese or those with non-vocal sounds like the clicks and pops of the African Bushmen would require a rework of the phoneme recognition code.

The main downside of their system is that it doesn't actually produce text... which means that you'd need another speech-to-text system if you wanted transcripts, or want the data to be searchable with whatever standard text-based search engine you are using on your intranet. But they appear to be aiming at applications where that's not necessary. One of my favorite ideas is integrating it with a video editing suite and being able to jump to different cues in your video clip library simply by stating the dialogue that's found there.

Of course, one of the most obvious applications is for intelligence and security. So far it doesn't appear that the company is pushing too hard in that direction -- it was founded by an academic group that originally developed the technology for a library project at Georgia Tech. However, I'm betting that's where the real money is, and it's only a matter of time before their ideas are found in your favorite national department of big-brotherhood.

-R
potential for medical applications is exciting by ColMstrd · 2002-12-22 13:42 · Score: 3, Interesting

As 90% of the data for diagnosis comes from history-taking (interviewing the patient), the potential for automating/supporting diagnosis is exciting.

Imagine a system that listens to a consultation in real time, making helpful suggestions for diagnosis based on analysis of the patient and the doctor's phoneme streams! And no tedious data entry, just an unobtrusive microphone.

I've been waiting for this.

--
You can never eat too much, only cycle too little.
The trench oily guy feels great by xixax · 2002-12-22 14:58 · Score: 3, Interesting

Italys nose probes them et al. fingering our hose
the fee Cult to longer stained syrups and Hussein marmot pervert sucks eggs rat. Intact, eye amusing into dick tape his pest of flash snot.

- - -

It has no problems at all figuring out those difficult to understand lyrics and has an almost perfect success rate. In fact, I am using it to dictate this post to Slashdot.

--
"Everything is adjustable, provided you have the right tools"
Re:Point? by Chasing+Amy · 2002-12-22 17:19 · Score: 3, Insightful

Well, I'd personally love to have an audio search tool to comb through all the mp3 files of talk radio programs such as *Loveline*, *Opie & Anthony*, and *The Greaseman*, which I have. Sometimes I think, "Now which show had that cool bit about..." and I have no hope of finding it.

For a professional rather than personal use, imagine how useful this could be to radio stations if they keep digital archives of their programs--if someone wanted to look up a particular program based on a vague memory of some of the text, a tool like this would be invaluable.

--

Chasing Amy
(We all chase Amy...)
"The more corrupt the state, the more numerous the laws"-Tacitus