Slashdot Mirror


Full-Text Audio Search

Captain Chad writes "The latest print edition (12/16/2002) of InfoWorld has an interesting article about an audio search program by Fast-Talk Communications. (The article is not yet available on the InfoWorld web site, but the Fast-Talk site has some good info, including a downloadable trial version.) The product works by breaking the audio stream into phonemes, which are the 'basic units of sound in a language.' The search is then performed for a specific sequence of phonemes. This method is faster and far superior to traditional audio searches which convert to text and then perform a normal text search. The author of the Infoworld article, Jon Udell, tried a variety of searches that were surprisingly successful. If this technology is as good as he claims, there is a reasonable chance it will revolutionize the way we store data. Maybe there will even be an 'Audio' tab on Google." Here's the Infoworld article.

135 comments

  1. How long... by slashuzer · · Score: 2, Interesting
    How long before this technology is actually banned under DMCA? After all, if you *need* to search music by electronic means then you are obviously a *thief*.....

    /sarcasm

    1. Re:How long... by slashuzer · · Score: 0
      We can already search today, though it doesn't work as well because the search depends on the filename. Most of the videos stored on the net (porn?) have rather cryptic names like vid01.mpg

      How do you know what's in that file? What if there is more than one version of the same file? Finally, if we use this process for videos, I wonder what the processing overhead will be....

      Even more so, how long before people start listing false "video" signatures to improve their ratings...

    2. Re:How long... by Anonymous Coward · · Score: 1, Informative

      Actually, there is already an equivalent to the proposed Audio tab on Google. It is an HP Research project, SpeechBot

      Another similar product is already on the market from Scansoft (formerly Dragon Systems) but it uses a completely different approach than the Fast-Talk product. We are actually using the Scansoft product where I work to index all of the media (audio and video) files on the corporate LAN.

    3. Re:How long... by sryx · · Score: 1

      before we have a "video" tab on google? :)
      With the way Bush is going how long before he gets a "homeland" tab on his own version of Google :P

      -Jason

  2. Just one more step on the road to TIA by Grapes4Buddha · · Score: 5, Insightful

    How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?

    Yay!

    1. Re:Just one more step on the road to TIA by m.lemur · · Score: 5, Insightful

      You think it's not happening already?

    2. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 3, Interesting
      You think it's not happening already?

      Yes and no; more no than yes:

      Until about two years ago, speaker-independent telephone speech dictation for accurate word-spotting wasn't good enough to run on mass volumes of calls.

      So, the story goes that it finally got tested in 2000. Even with fairly high accuracy rates, the utility of the extracted data was near or below zero, in that the amount of time a human agent would spend reviewing the tagged calls (international calls can be eavesdropped without a warrant) was less effective by some measure than if the same agent had been following other readily available leads.

      The whole problem is that most of the people who talk about {drugs, bombs, nerve gas, etc.} are not the people engaged in the manufacture and smuggling of those contraband; usually such people use code words. A code book with coordinates, maps, and timetables sent by FAX between anonymous hotel business centers can completely confound even the most concerted traditional eavesdropping scheme, let alone an automated word-spotting system. So, the agents ended up reviewing hundreds of calls between people talking about cocaine, but not one call between people talking about shipping or producing cocaine.

      Speech recognition has a place in law enforcement by mass-eavesdropping, but I don't think that place is found, yet. I predict it will probably end up being used more to ferret gays out of the military than anything else.

    3. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 0

      Speech recognition has a place in law enforcement by mass-eavesdropping, but I don't think that place is found, yet. I predict it will probably end up being used more to ferret gays out of the military than anything else.
      Well, they'd just have to check the colour of their underwear, wouldn't they?

    4. Re:Just one more step on the road to TIA by RDPIII · · Score: 4, Interesting
      How long before the feds start digitizing all of our telephone conversations and using this technology to google our private conversations?

      Let's see, given 5000 billion dial equipment minutes in 2001, we'd have around 150 trillion seconds of conversations. Assuming you could code everything at a bitrate of 8kbps, this would mean roughly 150 terabytes of compressed data for 2001 alone. Presumably the storage would be distributed at the switches where you record the conversations. So the problem is now to compress, transcribe, index, search, decompress, and access 150 terabytes of distributed storage.
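      The arithmetic can be checked with a short Python sketch. It assumes, as the figures above imply, that each conversation minute is counted twice in dial equipment minutes (once at each end) and that 8kbps means 8,000 bits per second. Under those assumptions the total comes out at roughly 150 petabytes rather than 150 terabytes, so the storage problem is, if anything, larger than stated.

```python
# Figure quoted in the post: 5000 billion dial equipment minutes (DEMs) in 2001.
DIAL_EQUIPMENT_MINUTES = 5000 * 10**9

# Assumption: every conversation minute is counted at both ends of the call,
# so conversation time is half the DEM total (this matches the 150 trillion
# seconds given in the post).
conversation_minutes = DIAL_EQUIPMENT_MINUTES // 2
conversation_seconds = conversation_minutes * 60

# Assumption: 8 kbps = 8,000 bits per second = 1,000 bytes per second.
BITS_PER_SECOND = 8000
total_bytes = conversation_seconds * BITS_PER_SECOND // 8

total_petabytes = total_bytes / 10**15  # decimal petabytes

print(f"{conversation_seconds:.2e} seconds of speech")
print(f"about {total_petabytes:.0f} PB at {BITS_PER_SECOND} bits per second")
```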

      And keep in mind that doing a phoneme transcription rather than full-blown speech-to-text is likely to generate a whole lot of nonsense transcriptions, precisely because you don't have any guiding information from the words in the conversation.

      While I enjoy Popular Paranoia as much as the next guy, the whole TIA thing does not really get to me. My reaction is mostly: bring it on, if you really think you can convince yourself that it can be done.

      --
      Marklar: marklar
    5. Re:Just one more step on the road to TIA by Grapes4Buddha · · Score: 1

      150 terabytes to manage is not really all that much if you consider the budget that this program will have. Even if it was, the ability to store and process data will increase at a much faster rate than the amount of data generated by telephone conversations, so your argument about it not being entirely feasible from a technical standpoint would only hold water for a few years at most.

    6. Re:Just one more step on the road to TIA by RDPIII · · Score: 1

      Sure, you can easily store 150 TB of data per year. But the problem is: how are you going to get anything out of it? You'll need an index, but what would you put in the index if all you have is phonemes? Similarly, you can search for suspicious words, but you can't easily build word-based concordances from just phonemes. So the phoneme-based word spotting technology (which, by the way, is not all that new) is definitely not something that any form of TIA should be based on, if it's supposed to have any practical utility.

      --
      Marklar: marklar
    7. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 0

      You do realize that most long distance calls have been monitored in just such a fashion, by automated tools looking for key names and phrases, since the fifties?

      It's called Echelon and you can read about it at FAS.org's Echelon writeup. Note that there are references to recognizing phrases out of electronic communications as well as to technologies that can recognize phrases in audio streams.

    8. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 0
      I predict it will probably end up being used more to ferret gays out of the military than anything else.


      I can't understand why people don't want queers in the military. a) Cannon fodder b) Removes unwanted genes from the gene pool c) They could knit their own uniforms.
    9. Re:Just one more step on the road to TIA by Anonymous Coward · · Score: 0

      Removes unwanted genes from the gene pool

      I hate to break this to you, but the chances of a gay couple conceiving a child aren't very good in the first place :o)

    10. Re:Just one more step on the road to TIA by smokin_juan · · Score: 1

      "bring it on, if you really think you can convince yourself that it can be done."

      Maybe the idea that they CAN'T do it right is the exact idea that should worry you. Think about this: They can't execute the war on (some) drugs correctly because it's a bullshit idea, same as TIA. Now we have jack-boot thugs busting in on completely innocent 70 year old couples and scaring them (literally) to death... And we have agents (of Matrix quality) imprisoning, denying medication, and consequently killing medical MJ patients who are using the drug legitimately (Peter McWilliams).

      In short, things the government does that it is not supposed to do or cannot do correctly are the exact things you should worry about.

    11. Re:Just one more step on the road to TIA by peter · · Score: 2

      Is that supposed to be funny? Are we supposed to be laughing at you for being so prejudiced?

      --
      #define X(x,y) x##y
      Peter Cordes ; e-mail: X(peter@cordes , .ca)
  3. Most Likely New Application by Anonymous Coward · · Score: 0

    Mass searches of phone calls for "terrorist," "bomb," "Allah," and "Slashdot."

    1. Re:Most Likely New Application by Maggot75 · · Score: 1

      don't forget: goatse.cx

  4. How long... by ejdmoo · · Score: 2, Interesting

    before we have a "video" tab on google? :)

  5. Point? by Anonymous Coward · · Score: 1, Interesting

    I can't help but wonder what the point of a "Full text audio search" would be. Most songs have their lyrics online, most speeches are already in document form, etc etc. Plus, wouldn't searching through audio files be incredibly computer-taxing? What about the services like AOL that cache almost everything? Would they start having to cache 300meg wav files?

    Now, if I could hum a tune into my computer and have it find what song I was humming (for those songs you just can't remember the lyrics to), I would be much happier.

    1. Re:Point? by Blacklist+Blacklist · · Score: 0

      Why the hell not?

      --

      Fight the Troll Blacklist
    2. Re:Point? by Eimi+Metamorphoumai · · Score: 2

      I can see a good use for it, and that's taking notes. Imagine carrying around a microphone with you 24/7 and recording everything you hear. We've got the space to store it all, after all. Then you can go back and check "When did he say that meeting was?" or whatever. And those with significant others know how often you'll end up arguing over who said what.

      --

      Visit me on #weirdness on the Galaxynet.

    3. Re:Point? by lux55 · · Score: 2, Insightful

      Aside from searching for music, I can see this being really useful in web conferencing software. Consider this:

      You hold a meeting where each person's channel was recorded and stored as part of the meeting info. Upon saving the meeting minutes, the software builds a phonetic index of the entire conversation.

      Searches later on would be no more taxing on the server than a fulltext search in MySQL is now.

      Useful? Definitely. And that's just one possibility.

    4. Re:Point? by Chasing+Amy · · Score: 3, Insightful

      Well, I'd personally love to have an audio search tool to comb through all the mp3 files of talk radio programs such as *Loveline*, *Opie & Anthony*, and *The Greaseman*, which I have. Sometimes I think, "Now which show had that cool bit about..." and I have no hope of finding it.

      For a professional rather than personal use, imagine how useful this could be to radio stations if they keep digital archives of their programs--if someone wanted to look up a particular program based on a vague memory of some of the text, a tool like this would be invaluable.

      --

      Chasing Amy
      (We all chase Amy...)
      "The more corrupt the state, the more numerous the laws"-Tacitus
    5. Re:Point? by binaryDigit · · Score: 2

      Well if you're the govt, then the implications are huge. If they had a way of efficiently "keyword" scanning spoken conversations (esp. phone), this would help intelligence gathering tremendously. If this company can make their stuff work as advertised, they have huge upside potential with the likes of the NSA, DOD, CIA, etc.

  6. Conversion? by The-Perl-CD-Bookshel · · Score: 2, Interesting

    Do the songs need to be converted to this new phonetic format or can you just search the audio? Wouldn't this use a tremendous amount of computing power?

    --
    I don't keep a lid on my coffee so when I walk around I look busy -me
    1. Re:Conversion? by brain159 · · Score: 1

      'cmon, would it kill you to RTFA?

      oh, what am I saying, of course it would!

      and in answer to your question, you index it once which can be done in pretty much 1:1 time, then you save the index files and just search *those* to find things - the index files tell you which recording and timecode the result is found at, then you playback from there.

  7. Converting to Text by TrekkieGod · · Score: 1, Insightful

    I don't know much about the subject, but isn't this the method used to convert speech to text? Sounds to me like it's the only way to do it...comparison of one sequence of phonemes to another, except that each word in the dictionary is associated with a sequence of phonemes. And that's why you're required to "train" the software with your own voice/accent.

    Somebody who knows about the subject, please post and explain the process.

    --

    Warning: Opinions known to be heavily biased.

    1. Re:Converting to Text by 91degrees · · Score: 1

      Looking at it, it seems essentially the same, except that this will remove the most error-prone part. Conversion of phonemes to proper sentences is not easy. There are too many homonyms. If you were to talk about a hare, it could easily be translated as a hair. Therefore a search for hare will not produce a match.

      If we convert the text to phonemes instead, hare and hair resolve to the same result. So a search for either of those words will produce a match.
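      A minimal sketch of that point, in Python. The three-entry pronunciation table is a hypothetical stand-in written in CMU-dictionary-style symbols with stress marks dropped; it is not Fast-Talk's data or API.

```python
# Hypothetical mini pronunciation table (CMU-dictionary-style symbols,
# stress marks omitted). A real system would use a full dictionary.
PRONUNCIATIONS = {
    "hare": ("HH", "EH", "R"),
    "hair": ("HH", "EH", "R"),
    "here": ("HH", "IY", "R"),
}


def to_phonemes(word):
    """Look up the phoneme sequence for a single word."""
    return PRONUNCIATIONS[word.lower()]


def sounds_alike(a, b):
    """True when two spellings resolve to the same phoneme sequence."""
    return to_phonemes(a) == to_phonemes(b)


# A text search treats "hare" and "hair" as different strings;
# a phoneme comparison does not.
print("hare" == "hair")
print(sounds_alike("hare", "hair"))
```

      With this table, a query for either spelling is compared on the same three symbols, so the spelling chosen by the searcher no longer matters.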

      In hindsight, this is an obvious idea. Like many obvious ideas, the person who spotted it was a genius.

    2. Re:Converting to Text by 25PercentBySmoot · · Score: 1

      Most speech to text conversion software has to initially bind the word it thinks it hears to a word in the dictionary. And it could be the wrong word - therefore you will never find the word you are looking for (also you would have to re-index if you are even able to change the word to the correct word)... This Fast-Talk technology does not do that; all it tries to do is match the phonemes it hears initially to the phonemes from the query screen, and as long as the words sound similar, it will find it. Also, the dictionaries that speech to text software uses are not good with proper names, acronyms, slang words, etc.

    3. Re:Converting to Text by Fast-Talk · · Score: 1

      The Fast-Talk technology doesn't convert Speech to Text. It indexes audio (that's where we apply the secret sauce to phonemes) and then provides the capability for you to search for words, terms, or phrases by applying that secret sauce to your search query and matching that to the indexed audio. The audio that is indexed is already trained to deal with various accents and dialects so it is not necessary to train it for your own voice or accent.

  8. Google's Voice Searc by greenreaper · · Score: 1, Informative

    Actually, Google already has a voice search, albeit in beta form.

    1. Re:Google's Voice Searc by The-Perl-CD-Bookshel · · Score: 5, Informative

      The Google voice search is used to search Google by telephone rather than online. This doesn't search through voice/audio records for matches.

      --
      I don't keep a lid on my coffee so when I walk around I look busy -me
  9. Woohoo. by Nordberg · · Score: 2, Funny

    Now I can finally search for the Free Radio Linux kernel reading by phonemes!

    --
    *Splort*
  10. Flux that snip... by BSOD+from+above · · Score: 1
    eat wheel nibble fork,

    linx ear

    --
    Karma: Censored (mostly affected by decency laws)
  11. Yes, but... by Erpo · · Score: 5, Funny

    ...can it decode rap and/or reggae? I swear I can't understand 3/4 of those lyrics. Songs could start with

    -----BEGIN PGP MESSAGE-----

    and I wouldn't be able to tell the difference.

  12. Re:Google's Voice Search by greenreaper · · Score: 1

    Of course, if I'd read the post properly I'd have known they were talking about searching on audio, not by audio . . . *sigh*

  13. The text of the Article by slashuzer · · Score: 0

    ANALYSIS by John_udell A T infoworld d o t com

    The power of voice

    By Jon Udell
    December 13, 2002

    CHEAP STORAGE MAKES it feasible to save voice recordings of many of our meetings, teleconferences, interviews, and other conversations. In some environments -- call centers and certain sectors of finance and government -- that already happens. But audio surveillance isn't yet routine, and the thorny legal, social, and cultural issues it raises haven't yet been widely debated. That's because, until now, there was no practical way to mine voice data.

    As with other forms of practical obscurity, this artificial barrier was bound to topple, and now it has. Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks. If you consider the way in which Google has already become everyone's indispensable "outboard brain," and extrapolate that to all the voice data that exists -- and to the vast quantities that soon will exist -- it's hard to avoid the conclusion that Fast-Talk is one of the most disruptive technologies in the pipeline.

    A phonetic search engine

    What Fast-Talk sells is an engine and a software development kit, not an end-user product. The kit includes a "technology demo," however, which is a fully functional tool that has changed how I work in a dramatic way. Though I've been a journalist on and off for many years, I had never integrated audio recording into my routine. Finding quotes in those recordings was a painful process, and sending them out for transcription (as my InfoWorld colleagues routinely do) incurred delay and expense. So, being a fast typist, I just captured what I needed live. That technique was stressful, not always accurate, and obviously not appropriate for most people. So when I interviewed Antarctica Systems CTO Tim Bray recently for InfoWorld's CTO Zone (see "Mapping the future"), I used Fast-Talk to record, index, and then search the conversation.

    The Fast-Talk engine can work with multiple audio formats, using pluggable "media accessors" to encapsulate them. The technology demo supports only WAV files, which it indexes to create PAT (phonetic audio track) indexes. If you want to search video, Fast-Talk recommends using VirtualDub, an open-source program, to extract the audio track as a WAV file. You can use Fast-Talk's demo to index pre-existing WAV files or, as I did, to index a WAV file while recording. This near-real-time indexing meant I was able to begin searching the index as soon as the 45-minute conversation ended. That was true because Fast-Talk's phonetic technology is orders of magnitude faster than the conventional alternative: speech-to-text translation followed by text indexing.

    Like many great innovations, Fast-Talk is simple to describe. Phonemes are the basic units of sound in a language, and North American English has 39 of them. You can look up a word's phonetic spelling in the Carnegie Mellon dictionary (see Kevin Lenzo's Web site at www.speech.cs.cmu.edu/cgi-bin/cmudict). "Dictionary," for example, works out to "D IH K SH AH N EH R IY." Fast-Talk's indexer recognizes phonemes and notes the time of their occurrence. The searcher converts text input to phoneme strings, looks for them, and returns their time-codes. It's as simple -- and brilliant -- as that.
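    As a rough illustration of the index-then-search idea just described, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the `(time, phoneme)` index layout, and the use of exact matching (a real engine presumably scores approximate matches). It is not Fast-Talk's PAT format or API.

```python
# Hypothetical pronunciation table (CMU-dictionary-style symbols,
# stress marks omitted).
LEXICON = {
    "four": ["F", "AO", "R"],
    "fore": ["F", "AO", "R"],
    "hours": ["AW", "ER", "Z"],
    "ours": ["AW", "ER", "Z"],
}

# Hand-written stand-in for an indexer's output: each recognized phoneme
# paired with the time (in seconds) at which it occurs in the recording.
INDEX = [
    (12.0, "F"), (12.1, "AO"), (12.2, "R"),
    (12.3, "AW"), (12.4, "ER"), (12.5, "Z"),
]


def text_to_phonemes(query):
    """Convert a text query into one flat list of phonemes."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(LEXICON[word])
    return phonemes


def search(index, query):
    """Return the start time of every exact occurrence of the query's phonemes."""
    target = text_to_phonemes(query)
    symbols = [phoneme for _, phoneme in index]
    hits = []
    for i in range(len(symbols) - len(target) + 1):
        if symbols[i:i + len(target)] == target:
            hits.append(index[i][0])
    return hits


print(search(INDEX, "four hours"))
print(search(INDEX, "fore ours"))
```

    Because the query is reduced to phonemes before matching, both spellings land on the same time-code in this toy index.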

    Fast-Talk in action

    When my interview with Tim Bray was done, the first segment I looked for was the one where Bray said, "Jean Paoli spent four hours showing me XDocs." The name "Jean Paoli" was, not surprisingly, ineffective as a search term. But "four hours" found the segment instantly, as did "fore ours" -- which of course resolves to the same string of phonemes. "Zhawn Powli" also worked, illustrating what will soon become a new strategy for users of voice-aware search engines: When in doubt, spell it out phonetically. In practice, I find myself resorting to this strategy less often than I'd have expected. And it was fairly obvious when to do so. I guessed correctly that "MySQL" would not work, for example, but that "my sequel" would.

    The query language is dead simple, but there's an interesting twist on proximity. In a conventional search engine, proximity means "find a word within so many words of another word." In Fast-Talk's engine, it means "find a string of phonemes within so many seconds of another string of phonemes."
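    A time-based proximity test of this kind could be sketched in Python as follows. The function name, its parameters, and the sample time-codes are hypothetical; the article does not show Fast-Talk's actual query syntax.

```python
def within_seconds(hits_a, hits_b, window):
    """Pair up time-codes (in seconds) that lie at most `window` seconds apart.

    hits_a and hits_b are lists of start times at which two different
    phoneme strings were found in the same recording.
    """
    return [(a, b) for a in hits_a for b in hits_b if abs(a - b) <= window]


# Made-up time-codes for two search terms in one recording.
first_term_hits = [10.0, 95.0]
second_term_hits = [12.5, 300.0]

print(within_seconds(first_term_hits, second_term_hits, 5.0))
```

    The comparison is on elapsed time rather than word count, so it works even though the index holds no word boundaries.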

    I was unable to find any variant of "XDocs," but I chalk that up to the recording's poor quality -- I was testing an IP phone at the time. There were some dropouts, and "XDocs" came during one of them. The marginal recording quality was, in fact, an excellent test. Like most people, I have no special audio engineering skill and no special recording equipment. To succeed in the real world, Fast-Talk will have to work well with whatever raw material it can get -- and it does. Although it is tuned for North American English, the international nature of our industry made it inevitable that I would push those limits. Sure enough, the accents I threw at it included Ximian CTO Miguel de Icaza's (Mexican), OpenLink Software CEO Kingsley Idehen's (Nigerian/British), and Systinet CEO Roman Stanek's (Czech), with usable results in each case. It's preferable, of course, to have a high-quality recording of a native speaker of North American English. When I indexed a well-modulated phone conversation that Test Center Director Steve Gillmor had with Microsoft's Mark Lucovsky, the results were simply uncanny.

    Developers will find Fast-Talk to be a clean, well-documented toolkit. The engine is packaged as a static link library for use in Microsoft's C++ environment, and from other languages by way of a COM (Component Object Model) wrapper. (There's not yet a managed interface for .Net, but C# or Visual Basic .Net programmers can use the COM API.) The API supports multithreading so that indexing and search tasks can be parceled out to a set of processors. Non-Windows packaging of the engine, when needed, will be straightforward to produce.

    Call centers are obvious first candidates for the Fast-Talk treatment. "Think about running a support center," says Patrick Taylor, Atlanta-based Fast-Talk's vice president of sales and marketing. In theory, answers to hard questions are written down in a knowledge base. In practice, that rarely happens. "It's compelling to just index everything that's said by the best experts," suggests Taylor, "so you can instantly find where they mention, say, NT kernel error 304."

    Clearly, that's just the tip of the iceberg. The implications are both exhilarating and frightening. "This business of recording everything scares the bejesus out of me," says Ray Ozzie, CEO of Groove Networks in Beverly, Mass. With entry-level deployment of Fast-Talk starting at $10,000, routine meetings and phone calls won't be indexed anytime soon. But it's coming, and it is scary. As always, great power brings great responsibility. The genie's out of the lamp, though, so we'll just have to learn to use this new power well.

    BOTTOM LINE
    Fast-Talk's phonetic searching
    EXECUTIVE SUMMARY
    With Fast-Talk Communications' revolutionary phonetic indexing and search engine, you can instantly find words and phrases buried in many hours of spoken recordings. It's a major breakthrough that will forever transform voice data.

    TEST CENTER PERSPECTIVE
    Google has become the "outboard brain" that we increasingly cannot function without. However, while Google is a voracious reader, it can't hear a thing. Fast-Talk's technology promises to remedy that handicap someday soon. It's a dizzying, if sobering, prospect.

  14. Patches... by Rhubarb+Crumble · · Score: 4, Funny

    Combine this with the Streamed Audio Kernel source, and it's only a matter of time before people leave patches on Linus' voicemail! The great thing is that to patch an audio kernel, you only need a tape recorder.... :P

    1. Re:Patches... by orangesquid · · Score: 3, Funny

      Now what would be cool is a C virtual machine running in a tape recorder, so you could just get one of those larger-capacity audio tapes, record the linux kernel broadcast for the next 600-something days, and then boot up linux on a tape recorder! Hah! NetBSD, beat that!

      --
      --TheOrangeSquid Is it any wonder things seem so awry? We swim in a sea of confusion and don't have to think to survive
    2. Re:Patches... by Rhubarb+Crumble · · Score: 1
      and then boot up linux on a tape recorder!

      And party like it's 1986! Don't go there! (nightmare visions of a zx spectrum enter my mind...."look, it's the yellow and blue pattern and the random static noise! that means it's loading the program!") ;-)

    3. Re:Patches... by Anonymous Coward · · Score: 0

      "you only need a tape recorder"

      What is that? Is that compatible with CDRs?

    4. Re:Patches... by Ed+Avis · · Score: 1

      I keep trying to sample my collection of BBC Micro software on tape and find a program to convert it to files. Apparently it is possible to read the sound in software and write a dump suitable for an emulator. I just need to figure out how to record sound under Linux on my hardware (builtin sound on A7V333).

      --
      -- Ed Avis ed@membled.com
  15. w00t NPR, and NLP feasability? by Flamesplash · · Score: 2

    First I'd like to say that this would be wonderful for NPR to use. *drool*

    On a serious note, I really didn't think NLP software was at the point to make this plausible. I've never actually used NLP tools, but what I've heard in the mainstream is that while they work, they aren't perfect. This is fine for someone staring at a screen while talking or someone who is going to review the transcription, but it seems like it would break any automated system when there is no system of checks in place, since this involves a human.

    --
    "Not knowing when the dawn will come, I open every door." - Emily Dickinson
    1. Re:w00t NPR, and NLP feasability? by xyzzy · · Score: 2

      The nice thing about using speech recognition for an application is that the reco rate doesn't HAVE to be 100% accurate. After all, YOUR speech recognition system (the one in wetware, between your ears) isn't 100% accurate, and that doesn't stop you from understanding people, right?

      Dictation (the only application where you need/want 100% accuracy) is only one small application for speech recognition.

    2. Re:w00t NPR, and NLP feasability? by Flamesplash · · Score: 2

      How can you reliably search for something if the automatic translation can be wrong?

      the one in wetware, between your ears) isn't 100% accurate, and that doesn't stop you from understanding people, right?

      The wetware has knowledge of context which helps reduce the choices of a potential vocal sound. This is one of the hardest things to build into an NLP system and afaik it currently isn't anywhere near perfect, otherwise we'd have voice interfaces to a lot more things in the common world

      --
      "Not knowing when the dawn will come, I open every door." - Emily Dickinson
    3. Re:w00t NPR, and NLP feasability? by xyzzy · · Score: 2

      You can search reliably because of the redundancy of language. You don't need 100% accuracy because people repeat themselves, names are more distinct than common words, etc.

      It really depends on the problem. If you want to guarantee that if a person says "frobnitz" that you're going to find it, yes, you need 100% accuracy. But even human listeners aren't going to give that to you. If you want to find people mentioning words associated with terrorism, your accuracy rate can drop somewhat. If you want to find people talking about the *topic* of terrorism, your accuracy can drop even more.

      Your comment about context is very perceptive, however -- I would say that NO current ASR system has essentially ANY real-world context, and as you said this is a tremendous boost in how humans interpret speech. Once that breakthrough is made, NLP in general will take a quantum step forward.

    4. Re:w00t NPR, and NLP feasability? by Flamesplash · · Score: 2

      Here's a real world problem. I was listening to NPR last weekend and only caught part of a program. I was able to track it down rather easily from the station's schedule of programs, but what if I couldn't? What if all I remember from the show is the line "and then she turned away from me and walked away. I was destroyed"? How do you search for such a thing in the essentially fuzzy system you have described?

      I see how your description would work in a key-word environment, but what about a phrasal aspect? Or is our best bet ever going to be a fuzzy system?

      --
      "Not knowing when the dawn will come, I open every door." - Emily Dickinson
    5. Re:w00t NPR, and NLP feasability? by xyzzy · · Score: 2

      Well, that's a bit of a contrived problem. It's unlikely you'd ever only remember that about the show. When was the last time you searched for a web page only knowing a single phrase with almost no content words or names, out of context, without any idea of the context?

      In the case of the NPR show, it's likely you'd know the general topic, or a speaker, or some proper names that were mentioned. All of these can be used to augment your search, and all of them can contribute to the accuracy of your results.

    6. Re:w00t NPR, and NLP feasability? by Flamesplash · · Score: 2

      In that case you'd be correct; most of the time you have enough context.

      However, I've seen other threads where they have, I guess, assumed that this technology would be used for music. I think that remembering a single line would be a more apt example in such a case.

      --
      "Not knowing when the dawn will come, I open every door." - Emily Dickinson
    7. Re:w00t NPR, and NLP feasability? by Fast-Talk · · Score: 1

      Actually you're both right. This technology has the capability of searching for an individual term or for a phrase or line. When speed is important you can use both the time/channel to limit the search and then apply the quote to go directly to that part of the program. When you use phrases or lines of speech the search accuracy actually improves.

  16. Search the Geeks in Space archive! by The-Perl-CD-Bookshel · · Score: 1

    Now you can pretend you care about what they are saying and not just listening for the sweet intro music.

    --
    I don't keep a lid on my coffee so when I walk around I look busy -me
  17. Dialects and Foreign Language application? by beanerspace · · Score: 3, Interesting

    So does anyone out there know how well this technology deals with accents and dialects? If so, perhaps we could finally see that 'Star Trek' like universal translator - or at least translate on a large scale media works from the past century into other languages.

    Of course, noble thoughts aside, I keep thinking how useful it would have been to have such technology in college when I had to transpose long lectures from my chicken scratch.

    Hmmm ... does this spell the end to stenography as we know it?

    1. Re:Dialects and Foreign Language application? by js7a · · Score: 2
      perhaps we could finally see that 'Star Trek' like universal translator

      It's been done already.

    2. Re:Dialects and Foreign Language application? by mberman · · Score: 2

      These are totally separate issues. All this does is pattern match on a string of phonemes; it makes no claim to understand the meaning behind the sounds, which is what's necessary for a universal translator. In fact, this is actually farther from a translator than the old-style, convert-to-words-then-string-match methods, since those at least care about which words the sounds make up. This is, however, more efficient, and more versatile, since it needs no dictionary, for this one particular application.

      Phonemes are pretty much language independent. One particular sound sounds the same in different languages. It might be spelled differently, and it certainly falls into different places in different words, but it's made the same way in the mouth, and it has the same acoustic pattern (there are some variations, since some languages make distinctions between different sounds, and others don't, like [l] and [r] are the same phoneme in Japanese, but not in English, and [p] and [p^h] (aspirated [p]) are the same in English, but not in Hindi, but this mostly doesn't matter, since in a given language each form tends to be used in the same words regardless of the speaker). Converting an audio stream to a sequence of phonemes is basically a solved problem (given lack of inflection/emphasis, no background noise, etc.), this is just a new, useful application of an established technique. The problem of translation lies in finding the meaning behind the phonemic sequences.

      --

      This is a self-referential sig

    3. Re:Dialects and Foreign Language application? by npendleton · · Score: 1

      This speech search tool works because phonemes are simple.
      Cognition and translation are VERY complex.

      Phonemes are a unifying constant of speech, not cognition of language. By definition, words are converted to audible phonemes, not spellings with etymological, grammatical and syntactical meanings.

      Any words used to search are converted to phonemes and then searched against a phoneme transcript of human speech, simplifying and broadening the chance of a match.

      This is MUCH easier than the reverse, of converting phonemes into words, particularly homonyms, e.g. "pair" and "pear", "two", "to", and "too". The complexity of extracting correct words, grammar and syntax makes understanding the original spoken message VERY hard. We have yet to reliably solve computer cognition of (machine readable) written or spoken language.

      We often use proper nouns such as people and place names. Is "Victor" some guy or the battle winner?
      What would a universal translator think of a spoken question "Do you listen to Phish?"

      How about the amazing French subtitles for "Pulp Fiction"? To paraphrase Travolta's pun joke... A family of tomatoes is walking down the street, and the baby tomato keeps dawdling and getting distracted. After two warnings from the Daddy tomato to keep up with the family, the Daddy finally just loses his composure the third time, pounds the pulp out of the baby tomato and yells 'Ketchup!'. A pun on catch-up and tomato ketchup.
      The French subtitles change the joke from tomatoes to lemons, err... rather "Citron", and on doling out punishment the Daddy yells the pun "Citron presse!" meaning "Lemon hurry up!" and "lemonade!"

      Universal translators are years off. High level translations will require humans for the foreseeable future.

      Mac Refugee, Paper MCSE, Linux wanna be

  18. Imagine... by tjamme · · Score: 4, Insightful

    ... Or imagine Google recording all possible audio streams (TV, radio, ... streets?) and allowing us to search those? All it takes is enough processors, a bit of wiring...

    Now if you record street conversations or all types of public conversations... Do a search on 'bomb'... How appealing is that to big brother.

    All right... I'm learning sign language. Now.

    1. Re:Imagine... by Bitsy+Boffin · · Score: 2

      Vendor : Look at its sleek lines, it's a real goer this one. Client : Looks like a bit of an old bomb to me. CIA : Down on the ground everybody !

      --
      NZ Electronics Enthusiasts: Check out my Trade Me Listings
    2. Re:Imagine... by Anonymous Coward · · Score: 0

      I know i'm going to learn telepathy. That way they wouldn't even know i was communicating!

      *rushes off*

    3. Re:Imagine... by Trusty+Penfold · · Score: 1

      All right... I'm learning sign language. Now.

      Too bad ... computers already know sign language

    4. Re:Imagine... by xyzzy · · Score: 2

      I dunno, why don't you try it. Go to Google and do a search on 'bomb'. Was that very useful or appealing to you? I imagine not! :-)

      Now, try that same tactic on every conversation in America. The utility would be some order of magnitude LESS than the crap you got back from google! (if you can have utility less than zero, that is!)

    5. Re:Imagine... by Esion+Modnar · · Score: 1

      Just googled for "bomb". Got 5,430,000 results. Imagine being Big Brother and having to sort through THAT.

      --

      They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
    6. Re:Imagine... by Idarubicin · · Score: 3, Insightful
      Just googled for "bomb". Got 5,430,000 results.

      I just did the same. Got 5,580,000 results, only three hours later.

      At that rate of growth, (50,000 bombs per hour, or about 14 bombs per second) there's going to be an awful lot of poor bastards at the FBI/CIA/NSA chasing noise...

      --
      ~Idarubicin
    7. Re:Imagine... by Anonymous Coward · · Score: 0

      All right... I'm learning sign language. Now.

      We'd prefer it if you stopped communicating with us altogether.

    8. Re:Imagine... by Anonymous Coward · · Score: 0
  19. Exciting Implications by flopsy+mopsalon · · Score: 3, Insightful
    By focusing on phonemes rather than syllables or whole words, this software can operate independent of any one language. This has exciting implications not just for audio searching, but implies a strong beginning for voice-recognition and even speech translation software.

    I just hope one of those nuisance lawsuits from Tzsvestaeya Zolskovova, the eccentric widow of Sergei Zolskovova, (Russian linguist who coined the word phoneme) over the use of the term "phoneme" doesn't hobble progress in this fascinating area.

  20. Phoenetic search engines by Moonshadow · · Score: 3, Interesting

    I once wrote a phonetic search engine for a site that took keywords and broke them down into their soundex phonemes, then stored those. Then, when a search was executed, it would convert the text words into phonetic pieces and search the database for matches. It was quite accurate, actually. For example, one could search for "olif ghardin" and one of the returned results would be "olive garden".

    I guess this is a similar idea. Pretty cool tech.
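
    A rough sketch of that kind of lookup, assuming the standard American Soundex rules and a made-up three-entry index (the names phrase_key and index are hypothetical, not from the original engine):

```python
# Letter groups for American Soundex; vowels, y, h and w carry no code.
CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return the 4-character Soundex code for one word ("" if no letters)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = []
    prev = CODES.get(letters[0], "")  # the first letter's code counts for adjacency
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (letters[0].upper() + "".join(out) + "000")[:4]


def phrase_key(phrase):
    """Key a phrase by the Soundex code of each of its words."""
    return tuple(soundex(w) for w in phrase.split())


# Hypothetical index: Soundex key -> original phrases.
index = {}
for name in ["Olive Garden", "Old Gardener", "Live Garden"]:
    index.setdefault(phrase_key(name), []).append(name)

matches = index.get(phrase_key("olif ghardin"), [])
print(matches)
```

    Both "olif ghardin" and "olive garden" reduce to the same pair of codes, so the misspelled query lands on the right entry. Note that Soundex works from spelling, not from recorded sound.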

    1. Re:Phoenetic search engines by AnyoneEB · · Score: 1
      For example, one could search for "olif ghardin" and one of the returned results would be "olive garden".
      In other words: this is a great tech for people who can't spell... /.er's will love it :).
      --
      Centralization breaks the internet.
    2. Re:Phoenetic search engines by Danny+Rathjens · · Score: 2

      MySQL has a built-in SOUNDEX() function, so you can build indices on soundex values and run SQL queries against them.
      Perl also has the nifty String::Approx module, which is great for catching typos rather than homonyms.

  21. I've often wondered about using a tech like this.. by asparagus · · Score: 3, Interesting

    For editing films. For documentaries in particular, this would be a godsend. Imagine if, in addition to video/audio tracks, you had a simple 'text' track with which you could easily assemble your cuts.

    If nothing else, putting the computer to work on the 'condense 100 hours of footage into pieces of paper' stage would be a nice step.

    If it prevented just one assistant editor from going insane, it'd be worth it. Do it for the children.

    -Brett

  22. What we need now is by Mr.+Shiny+And+New · · Score: 3, Funny

    A search engine that lets you hum a song and it figures out which one it is.

    1. Re:What we need now is by pandemonia · · Score: 1

      Fraunhofer Labs in Germany, the same laboratory that invented the MP3 compression a few years ago, has been working on such a technology for quite a while.

      Google for it and i'm sure you'll find some interesting articles.

      --
      -mz
    2. Re:What we need now is by akuma(x86) · · Score: 1

      Check out Shazam. They offer a unique service that does something similar to this.

    3. Re:What we need now is by Kashif+Shaikh · · Score: 2

      Al Bundy tried this with a record-store jockey and failed. You think a computer is going to be better?

  23. I can see the day by Anonymous Coward · · Score: 0

    I can see it now...

    meta HTTP-EQUIV="Keywords" CONTENT="slashdot"
    meta NAME="Description" CONTENT="News for Nerds. Stuff that matters"
    meta NAME="Voice" CONTENT="/slashdot.wav"

    Seriously though, this would be great, except for when the web developer is from the deep south and the .wav to index the page is encrypted in 'southern talk'

  24. Wow, cool idea by Drakonian · · Score: 3, Insightful
    I've always thought that audio/video is one huge information bank that has never been easily accessed. If you know of something textual, you go to Google to find it. But what if you wanted to read a Steve Jobs keynote from a couple years back? It's not particularly likely that anyone transcribed it for you. The video stream is probably long gone. But with this technology, you can have a searchable record of that fairly easily. Brilliant stuff.

    Someone mentioned it can be used by the government for TIA stuff - agreed, but same with any technology. It has its positive and negative uses. I don't think we are all going to revert to cavemen to get away from it.

    --
    Random is the New Order.
    1. Re:Wow, cool idea by xyzzy · · Score: 2

      One thing not obvious from the InfoWorld article is that the FastTalk system does NOT leave you with a transcribed version of the speech. It is *SEARCH ONLY* -- meaning, it can tell you that the word "frobnitz" was spoken at 1'30 into a particular article, and then you have to go and listen.

      This is of somewhat limited utility. Perhaps if you had 2000 hours of John Gotti talking about lasagne, and one minute where he was talking about rubbing out Sammy the Bull, you could speed forward to find that bit. But if he talks about Sammy a whole lot, it's going to be easier to *read* and skim than listen.

    2. Re:Wow, cool idea by Kashif+Shaikh · · Score: 2

      I've always thought that audio/video is one huge information bank that has never been easily accessed.

      I consider audio/video data "fuzzy" -- there is no clear-cut method of interpreting such data. Here's a real world example: tell a computer to determine if a movie is pornographic or better still find out if a picture is showing a vagina, penis, etc.

  25. The simpsons by DonniKatz · · Score: 1

    If you had a database big enough, you could almost make your own audio episode of any tv/radio show there is out there. With the right equipment, you could have a celebrity say anything, anything at all you want. This sort of program was coming, but with it brings cool and at the same time terrifying feelings from me...

  26. Worse yet by oO0OoO0Oo · · Score: 1

    Forget rap and reggae lyrics. How about technical terms using letters and numbers, especially since they often are only intended to have one pronunciation of several possibilities (See Roman Numerals, MAC OSX(="ten"), etc.). I'm wondering how it would deal with those.

    --
    We Are Familiar With Elephants By Virtue Of Their Size.
    1. Re:Worse yet by Henry+V+.009 · · Score: 2

      You search by pronunciation as well as store by it. The Census Office has stored names by this method for more than a century.

    2. Re:Worse yet by Anonymous Coward · · Score: 0

      You gave an example: See Roman Numerals, MAC OSX(="ten"), etc.)
      I can see the consequences of one of these misinterpreted conversations;
      What you said: "Yo! I've got MAC OSX. My machine is LOADED. Fsk that blue screen of death!!"
      They interpret it as: "Yo! I've got a Mac-10. My machine gun is loaded! F$#k the blues'll be screaming for death!!"

      Yikes!

  27. Audio search by PurpleBob · · Score: 2

    There's one kind of "audio search" I'd really like to see: searches for a song by tune.

    I've seen a couple of web sites which offer tune searches, but they all work on the index system used in fake-books: start from the first note, and then from there, say whether the next note is higher, lower, or the same. But this system has problems: a reasonably short search will match a whole lot of songs; it's often hard to tell whether certain extra notes are considered part of the tune; and some songs have an obscure beginning and an easily recognizable theme farther in, and you don't know which one is indexed.

    These sites have also tended to only index very well-known tunes - usually, folk songs, show tunes, and a few jazz standards.

    One site allows you to send them a recording of you whistling the tune, which seems like an improvement, but it actually just translates it into the up-down-repeat notation.

    My ideal music search would be something that would take large quantities of music (let's ignore for the moment where it gets the large quantities of music without pissing off the RIAA) and scan each song for prominent tunes. You could then search these with perhaps the up-down-repeat notation, but also by inputting music notation, for people who know it. The search would have to be key-insensitive, and allow fairly fuzzy matches.
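
    For what it's worth, the up-down-repeat matching is easy to sketch. This is a toy, assuming melodies are given as MIDI note numbers; the two-tune LIBRARY and the function names are made up for illustration:

```python
def contour(pitches):
    """Encode a melody as U (up), D (down), R (repeat) between successive notes."""
    steps = []
    for prev, cur in zip(pitches, pitches[1:]):
        steps.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(steps)


def find_tune(query, library):
    """Return titles whose contour contains the query's contour anywhere."""
    q = contour(query)
    return [title for title, notes in library.items() if q in contour(notes)]


# Hypothetical library: opening notes typed in by hand as MIDI numbers.
LIBRARY = {
    "Twinkle, Twinkle": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# The same opening hummed a fifth higher still matches: only direction is kept.
print(find_tune([67, 67, 74, 74, 76, 76], LIBRARY))

# A three-note query is too short to tell the tunes apart.
print(find_tune([60, 60, 62], LIBRARY))
```

    Dropping everything but direction gives key-insensitivity for free, and substring matching means the query need not start at the first note. The second query shows the weakness mentioned above: a short search matches a whole lot of songs.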

    If it could give me the name of that pop song/jazz tune/classical piece I just heard on the radio, it'd be pretty good.

    But if it works really well, it'd be a blessing for music composers - they could just search for that tune that just popped into their head, instead of worrying over whether they're subconsciously ripping off another song.

    --
    Win dain a lotica, en vai tu ri silota
    1. Re:Audio search by Trusty+Penfold · · Score: 1

      If it could give me the name of that pop song/jazz tune/classical piece I just heard on the radio

      It would be better if the radio station just transmitted the name and other relevant information along with the tune itself. Digital radio must be able to do this, surely? And if not, why not?

    2. Re:Audio search by teamhasnoi · · Score: 2
      I think that would be a nightmare for composers - too many are worried that they are subconsciously ripping off a song.

      I can just see composers checking the database every time they 'hear' something familiar, and never ignoring the fact that 'I just ripped off an Elvis Costello chorus. Oh, well. It worked for him.' Just write the damn song; if it is good it will stand on its own merits. Everything old is new again! The public has an ever-shorter memory, and when you've got just 12 notes, you like it that way.

      Besides, every musician has a friend who will pipe up and say, "That sounds just like that Archies song!" Don't be afraid to rip off a couple of notes, otherwise Dylan and the Beatles own the world.

      Bad artists borrow, good artists steal. (Thanks to whoever this sig belongs to..;)

  28. Where have you been? by uberstool · · Score: 2

    We have three > 300,000 sqft underground facilities loaded with rows of 2U rack systems with eight 120gig hard drives in each. Every phone call you have had since 1994 is now stored in this massive datasystem. Transferring all the old calls from tapes to the new system was probably the most tedious job I have ever done.

    1. Re:Where have you been? by corsec67 · · Score: 1

      Can we access these archives? I think that it would be pretty cool to hear some of my conversations from 1994. Or is this archive just for the government to search when it wants to?

      --
      If I have nothing to hide, don't search me
    2. Re:Where have you been? by Anonymous Coward · · Score: 0

      Maybe you didn't notice that the clown is just trolling.

      The hints were: For one there are much more efficient ways to store bulk audio data (ones that would require much less space, money, and new technologies such as 120GB disks, and ones that would actually work when 'loaded with rows' of racks in underground facilities without overheating the whole facility itself)...

  29. Refined twist on an old idea by pongo000 · · Score: 3, Informative

    Soundex, which uses the way words sound rather than the way they are spelled, has been widely used by the government and genealogy researchers for the past 60 years. This isn't exactly "new" technology.

    Why are more and more /. articles starting to sound like corporate press releases?

    1. Re:Refined twist on an old idea by Anonymous Coward · · Score: 0

      probably because in a capitalist society, innovation usually comes from those who can make money off it. i don't care who does this stuff, i'm just glad it's being done.

  30. So... by dacarr · · Score: 5, Funny

    Does it recognize speech, or does it wreck a nice peach?

    --
    This sig no verb.
  31. That's called query-by-humming by V.P. · · Score: 2, Interesting

    and there are several research prototypes that can do it (check out a paper from Cornell, or just google for query-by-humming)

  32. old technique by g4dget · · Score: 2

    Direct search using various phonetic representations has been around for many years. All things being equal, it's known to be somewhat better than searching the output of a speech recognizer using approximate string matching. But you have to weigh against that the fact that both speech recognition and approximate string matching are being pushed much harder than this kind of search, so you may end up getting better performance using speech recognition and string searching anyway.

  33. Streamsage has this technology by kornack · · Score: 1

    The folks down at Streamsage have been working on this for a while now. They are working on an index of NPR, last I heard. They do video and audio; a search retrieves the relevant clips of video. It works really well, apparently. This will be a fantastic boon for universities who have all kinds of lectures on video with no way of knowing where to find the information a student needs.

  34. InfoWorld article by prostoalex · · Score: 3, Informative
    The article is not yet available on the InfoWorld web site

    Actually it is. InfoWorld: The Power of Voice.

  35. It's raining in Prince Rupert by Anonymous Coward · · Score: 0, Offtopic

    Mod me up boys, it's raining in Prince Rupert.

    Fuck now what are we going to do?

  36. Physical privacy? by Anonymous Coward · · Score: 0

    Could you physically protect conversations by some means? Maybe talking with a little encoder around your mouth? /Half-sarcasm

  37. Why not use... by B.J.+Blazkowicz · · Score: 0

    A speech synthesizer (http://slashdot.org/article.pl?sid=02/12/22/1434240&mode=thread&tid=106) in order to do a Full-Audio text search?

    I'm sure someone could do it IN SOVIET RUSSIA

  38. I looked into this recently by RebornData · · Score: 4, Informative

    There are a few papers available for download from their website, but you have to register. Basically, traditional voice recognition parses the audio stream into some meta-form, usually representing phonemes (the low-level "atomic" sounds that your speech consists of). These phonemes are then matched against a dictionary of known words (and the phonemes they consist of) and text is produced.

    Because phoneme recognition is not particularly accurate (for example, it's hard to tell the difference between "hard d" as in "Dan" and "hard b" as in "Ban" over a noisy phone line), traditional speech to text systems use several approaches to improve accuracy. One is to improve the accuracy of the basic phoneme recognition by "training" it for a specific voice. Another is to use all sorts of hairy language-specific grammar / syntax algorithms.

    Computationally, it's the matching of the phonemes against the dictionary that's the most difficult, and the larger the dictionary, the less accurate and more CPU-chomping it becomes. In addition, searching the resulting text for specific matches grows less accurate as the search string increases in length, due to the likelihood of transcription errors.

    The cool thing that Fast Talk has done is to store and index the phoneme meta-data, rather than complete the recognition to text. When you enter search words, they break the search string into phonemes and look for matches that way. This has several positive benefits:

    1. Computational resources are dramatically lessened, since the "phoneme recognition algorithms" are fast and there's no dictionary matching.
    2. The matching doesn't depend on having the right words in the dictionary at input time. It works just as well for unusual proper names and technical jargon as it does for common words, since they're all formed from the same basic phonemes.
    3. The longer the search string, the greater probability of an accurate match.
    4. No need for accurate search string spelling. It doesn't matter if you know how to spell a word, as long as you can write it down phonetically.
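
    To make the idea concrete, here is a toy sketch of searching a stored phoneme stream. It is not Fast Talk's actual algorithm: LEXICON, INDEX and the phoneme symbols are invented for illustration, and a real system would score approximate matches rather than require exact ones.

```python
# Hypothetical toy pronunciations; a real system derives these automatically.
LEXICON = {
    "olive": ["AA", "L", "IH", "V"],
    "garden": ["G", "AA", "R", "D", "AH", "N"],
    "pear": ["P", "EH", "R"],
    "pair": ["P", "EH", "R"],
}

# Hypothetical index for one recording: (offset in milliseconds, phoneme).
INDEX = [
    (0, "DH"), (100, "AH"),
    (200, "AA"), (300, "L"), (400, "IH"), (500, "V"),
    (600, "G"), (700, "AA"), (800, "R"), (900, "D"), (1000, "AH"), (1100, "N"),
    (1200, "P"), (1300, "EH"), (1400, "R"),
]


def to_phonemes(query):
    """Turn a typed query into one flat phoneme sequence."""
    phones = []
    for word in query.lower().split():
        phones.extend(LEXICON[word])
    return phones


def search(index, query):
    """Return the offsets at which the query's phoneme sequence occurs."""
    target = to_phonemes(query)
    if not target:
        return []
    stream = [phone for _, phone in index]
    n = len(target)
    return [index[i][0] for i in range(len(stream) - n + 1)
            if stream[i:i + n] == target]


print(search(INDEX, "olive garden"))
print(search(INDEX, "pear") == search(INDEX, "pair"))
```

    The search yields offsets into the recording rather than text, and homonyms such as "pear" and "pair" hit the same spot, since the index never decides which word was meant.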

    In theory, the system should work for any language, but reality is that different languages do have different sets of phonemes, and I think Fast Talk has only really worked on English. So languages like Spanish that are fairly similar phonetically to English would probably work pretty well, but tonal languages like Mandarin Chinese or those with non-vocal sounds like the clicks and pops of the African Bushmen would require a rework of the phoneme recognition code.

    The main downside of their system is that it doesn't actually produce text... which means that you'd need another speech-to-text system if you wanted transcripts, or want the data to be searchable with whatever standard text-based search engine you are using on your intranet. But they appear to be aiming at applications where that's not necessary. One of my favorite ideas is integrating it with a video editing suite and being able to jump to different cues in your video clip library simply by stating the dialogue that's found there.

    Of course, one of the most obvious applications is for intelligence and security. So far it doesn't appear that the company is pushing too hard in that direction -- it was founded by an academic group that originally developed the technology for a library project at Georgia Tech. However, I'm betting that's where the real money is, and it's only a matter of time before their ideas are found in your favorite national department of big-brotherhood.

    -R

  39. Actually, by voodoo1man · · Score: 2, Informative
    phonemes are abstract groupings of phones (the most basic units of human-made sound) that are differentiated from each other by the environment they occur in. Allophone (grouping of phones in a phoneme) distribution and even the basic phone set differ from language to language and even dialect to dialect rather significantly, and so this system would actually be pretty inefficient as a replacement for regular text searches (where, for the most part, spelling is pretty standardized across dialects). It is faster and more accurate because it bypasses text recognition, but existing text would have to be converted to phone representation with several dialects/pronunciations stored, or existing audio is going to have to undergo text/word recognition and the different pronunciations generated, for it to be used in a general purpose search engine.

    And this doesn't even begin to deal with "Engrish" speakers =]

    --

    In the great CONS chain of life, you can either be the CAR or be in the CDR.

  40. Digitised phone convesations by jbanana · · Score: 1

    My understanding is that most phone calls are digitised already. The connections between exchanges are (mostly) fibre-optic, so the traffic is digital. I can't comment on who's listening in. Surely that doesn't happen without due cause and the appropriate courtroom approvals?

    </wide-eyed innocence>

    Ironically, I'm posting this over a dial-up which modulates my digital data to analogue. The signal is then digitally encoded at the telephone exchange, with the whole process being reversed as the signal reaches my ISP.

    1. Re:Digitised phone convesations by Grapes4Buddha · · Score: 1
      Perhaps I should have been more explicit for the overly analytical slashdot crowd. I meant digitized and archived.

      Sheesh.

  41. there would be many uses for this technology...... by maharg · · Score: 1

    1 - real-time key-word alerting - i.e. having software listen out for key-words on an audio/visual source, and alert someone appropriately when they are detected. TIA, anyone ?

    2 - data retrieval - a phonetic query language - cool !!!

    3 - pr0n video spider - just listen out for lots of Ooohs, Aaahs, and the like (sorry, could not resist ;o))

    --

    $ strings FTP.EXE | grep Copyright
    @(#) Copyright (c) 1983 The Regents of the University of California.
  42. Sound search? by AyeRoxor! · · Score: 2

    How is this different from soundex? For decades, databases of names have been stored in soundex. If your driver's license number begins with letter-number-number-number, it is probably soundex. If you have done any ancestry searching, as I have, you have encountered soundex; this way, if you search for John Smith in 1732, you will find records for Juan Smyth, Jon Smythe, John Smitt, etc.

    The benefits of having actual sound? If it's just going to use a soundex-type formula in the core functioning, the sound would just be a gimmick, and a storage-taking one at that. Sure, compression has gotten amazing, but will the sound of Smith really take anything near the same 4 bytes as "S530" ??

  43. CallMiner by NoSlack913 · · Score: 1

    Fast-Talk assumes you know the word you are looking for before you search, which is not super useful except for Google-style searches. It is also extraordinarily slow (it only searches about 10 hours of audio a second; think about how slow that is for, say, a year of NPR). Check out CallMiner for a much cooler use of speech-to-text technology. CallMiner uses trending to find trends in large volumes of calls: a real business use.

  44. Re:My F FP by Anonymous Coward · · Score: 0

    Congratulations on your first post. I am former Nigerian dictator Mugabe, and I am happy to finally have met such a dedicated (and obviously quite successful) individual as you are.

    My need is as follows - I am currently seeking to transfer $50,000,000.00 (FIFTY MILLION US DOLLARS AND ZERO US CENTS) to a dedicated account in the US. My technical advisor pointed to Slashdot first posters being an extremely capable and trustworthy group of individuals capable of coping with such task for a measly 25% commission. After all, the advisor said, how many people DO YOU know that can boast a successful first post? Not too many.

    Please leave your coordinates, pictures and bank account numbers as a reply to my post. The money is already waiting on this side.

  45. Re:FP by Anonymous Coward · · Score: 0

    I am former Nigerian dictator Mugabe and I am currently seeking to transfer $50,000,000.00 (FIFTY MILLION US DOLLARS AND ZERO US CENTS) to a dedicated account in the US.

    But since you are not the first poster, NO MONEY FOR YOU. Keep trying.

  46. compaq has such a tool by Anonymous Coward · · Score: 0

    www.speechbot.compaq.com

  47. potential for medical applications is exciting by ColMstrd · · Score: 3, Interesting

    As 90% of the data for diagnosis comes from the history-taking (interviewing) the patient, the potential for automating/supporting diagnosis is exciting.

    Imagine a system that listens to a consultation in real time, making helpful suggestions for diagnosis based on analysis of the patient and the doctor's phoneme streams! And no tedious data entry, just an unobtrusive microphone.

    I've been waiting for this.

    --
    You can never eat too much, only cycle too little.
  48. wordspotting by GoBears · · Score: 2, Insightful

    The basic idea of using audio similarity to "grep" short sounds out of audio streams (as opposed to using ASR and text-matching) is quite old - some classic papers based on dynamic timewarping date back to 1977, and HMMs became popular for this application about ten years after that. Papers on this kind of thing appear in conferences like ICASSP - look for keywords like "keyword spotting" or "wordspotting." The phone company wanted to do this for obvious reasons.

    Note that I'm not saying the GATech technology used by this company is derivative - I haven't looked at the specifics of this approach.
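
    For anyone who hasn't met dynamic time warping, a minimal sketch follows. It assumes each audio frame has been boiled down to a single number, where real wordspotters compare vectors of acoustic features per frame; the sequences are invented.

```python
def dtw_distance(a, b):
    """Cost of the best monotonic alignment between sequences a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a[i-1] also covers an earlier frame of b
                                    cost[i][j - 1],      # b[j-1] also covers an earlier frame of a
                                    cost[i - 1][j - 1])  # both sequences advance together
    return cost[n][m]


template = [1, 3, 4, 3, 1]               # the keyword as recorded once
slow = [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]    # the same shape spoken at half speed
other = [4, 1, 4, 1, 4]                  # a different sound

print(dtw_distance(template, slow))
print(dtw_distance(template, other) > dtw_distance(template, slow))
```

    Because the alignment may stretch either sequence, the half-speed version scores as a perfect match while the unrelated one does not. A wordspotter would run this kind of comparison against successive windows of the audio stream and report the windows that score below some threshold.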

  49. Echelon by Esion+Modnar · · Score: 1

    Is probably already using something like it.

    --

    They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
  50. No you didn't! by Anonymous Coward · · Score: 0

    That is the most ridiculous claim I have ever heard. It is impossible to do.

    1. Re:No you didn't! by Moonshadow · · Score: 2
      That is the most ridiculous claim I have ever heard. It is impossible to do.

      Wanna bet?

  51. Linux Kernel Radio Broadcast by Esion+Modnar · · Score: 0, Redundant

    Now I'll be able to search for particular lines of code.

    --

    They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
  52. google sound search hypothetical example: by Anonymous Coward · · Score: 0

    "beep +techno" .... Results 1-30 of 27596432189831415926535 displayed, search took 0.91 seconds.

  53. Terrorist by Esion+Modnar · · Score: 2, Funny

    Really, when would a terrorist's conversation actually include the word "terrorist" ? Maybe he would say something like, "Hey Abdul, we need another terrorist in on this bombing." Or maybe: "Terrorist Jim, meet terrorist Mike."

    --

    They say the first thing to go is your penis. Well, it's either that or your brain. I forget which...
  54. The trench oily guy feels great by xixax · · Score: 3, Interesting

    Italys nose probes them et al. fingering our hose
    the fee Cult to longer stained syrups and Hussein marmot pervert sucks eggs rat. Intact, eye amusing into dick tape his pest of flash snot.

    - - -

    It has no problems at all figuring out those difficult to understand lyrics and has an almost perfect success rate. In fact, I am using it to dictate this post to Slashdot.

    --
    "Everything is adjustable, provided you have the right tools"
    1. Re:The trench oily guy feels great by Anonymous Coward · · Score: 0

      This is the best post I've seen in a long time.

    2. Re:The trench oily guy feels great by Anonymous Coward · · Score: 0

      What was that?? Seriously who is the rapper?

  55. Shhhhhh, don't bother me. I'm "grepping". . . . by kfg · · Score: 1

    Abbey Road backwards.

    KFG

  56. reading, writing... audio what? by ubiquitin · · Score: 2

    After reading a posting that someone had probably typed in silence and submitted to slashdot, I posted this reply silently and now you're reading it. Chances are you aren't reading this out loud. Nobody said a word, or even had to hear one. Reading can be an astoundingly efficient way to transfer information.

    I have yet to meet anyone in good health who prefers getting ten voice mails over ten emails.

    What the world needs is fewer karma whores and more good friends.
    Go ahead, friend. :) Click that white button and turn it green.

    --
    http://tinyurl.com/4ny52
  57. Just looking at kazaa right now by danny256 · · Score: 2

    there are 4 petabytes of files being shared; a search through 150 terabytes doesn't seem so big.

    1. Re:Just looking at kazaa right now by restauff · · Score: 1

      Granted, you can go on KaZaA and search through 4 petabytes of data fairly quickly, but keep in mind that a large portion of that data is redundant, and the data itself is not what you are searching. In the case of TIA, the software would have to analyze every portion of the 150 TB of data to perform a search, whereas KaZaA has a set of meta information for each file. For example, for a 1 GB copy of The Matrix (not that anyone would have an illegal copy of a movie on KaZaA), there is probably 1 KB of data to search at most.

  58. Old News by annodomini · · Score: 2, Interesting

    I saw a demonstration of essentially the same technology at Compaq's CRL about two and a half years ago (formerly DEC CRL, or Cambridge Research Lab, the guys who did research for AltaVista). It did exactly the same thing: it broke sound files down into phonemes, then searched based on the phonemes. It was mostly used for finding a clip on the web rather than a specific place in a long file, but it was the same idea. The nice thing was that it was OK for its application if it missed once or twice. If the audio file was relevant, the word or phrase was probably used multiple times in the clip. It was pretty good at finding NPR stories about certain events. In fact, you can try it out for yourself at an online demo.
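
    For anyone wondering what "searching based on phonemes" might look like, here is a toy sketch. It is not how the CRL demo or Fast-Talk actually works; it only assumes the audio has already been turned into a list of phoneme symbols. The function name, the ARPAbet-style symbols, and the max_mismatches parameter are all made up for illustration.

```python
from typing import List, Sequence


def find_phoneme_matches(stream: Sequence[str],
                         query: Sequence[str],
                         max_mismatches: int = 0) -> List[int]:
    """Return every start index where `query` occurs in `stream`,
    allowing up to `max_mismatches` differing phonemes per window."""
    hits: List[int] = []
    n, m = len(stream), len(query)
    if m == 0 or m > n:
        return hits
    # Slide a window of the query's length over the phoneme stream.
    for start in range(n - m + 1):
        window = stream[start:start + m]
        mismatches = sum(1 for a, b in zip(window, query) if a != b)
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical phoneme stream produced by some recognizer.
stream = ["DH", "AH", "T", "AW", "AH", "V", "P", "R", "OW",
          "G", "R", "AE", "M", "IH", "NG"]

exact = find_phoneme_matches(stream, ["T", "AW"])
fuzzy = find_phoneme_matches(stream, ["T", "AO"], max_mismatches=1)
```

    Allowing a mismatch or two is one simple way to tolerate a recognizer that gets a phoneme wrong now and then; a real system would presumably score candidates rather than just count differences.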

  59. Melody search by axxackall · · Score: 2
    Maybe there will even be an 'Audio' tab on Google.

    I think there should be three tabs instead of one 'Audio' one:

    • Speech - to search for a text pattern in recognized speech;
    • Melody - to search among music recordings, querying by chords or by notes;
    • Noise - to search by comparing against an audio fragment/pattern.
    --

    Less is more !
  60. Binladen by edox. · · Score: 0

    http://audio.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=binladen+ahh+pains

    Buffer overflow!!
    [hmm!]

    --
    quote:port 17 udp
  61. Soundex by gr8_phk · · Score: 1
    Possibly more effective would be to convert to Soundex. This is an old technique where similar-sounding consonants are assigned the same "soundex" code. It can be used for comparing strings where you may not know the correct spelling. I suspect it could help in cases where the audio-to-phoneme conversion isn't quite perfect too.
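
    A minimal sketch of classic American Soundex, for reference. Note that it works on spelled letters, not on phonemes from audio, so applying it to a phoneme stream would need a different grouping table; that part is left as an assumption. The function name is just illustrative.

```python
def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for `name`."""
    # Consonant groups that share a digit; vowels, Y, H and W have no code.
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]               # the first letter is kept as-is
    prev = codes.get(letters[0], "")  # its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal codes
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit                  # a vowel resets prev to ""
        if len(result) == 4:
            break
    return (result + "000")[:4]       # pad short codes with zeros


same_code = soundex("Robert") == soundex("Rupert")
```

    Names that sound alike but are spelled differently, such as Robert and Rupert, end up with the same code, which is the property that makes it usable for fuzzy matching.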

    Paul

  62. google wouldn't index conversations well by Anonymous Coward · · Score: 0

    there isn't any linking structure

  63. Last Post! by alpg · · Score: 1

    The wise programmer is told about the Tao and follows it. The average
    programmer is told about the Tao and searches for it. The foolish programmer
    is told about the Tao and laughs at it. If it were not for laughter, there
    would be no Tao.
    The highest sounds are the hardest to hear. Going forward is a way to
    retreat. Greater talent shows itself late in life. Even a perfect program
    still has bugs.
    -- Geoffrey James, "The Tao of Programming"

    - this post brought to you by the Automated Last Post Generator...