Slashdot Mirror


Online Speech Indexing

Thomas Edwards from The Sync (where we host Geeks in Space) sent us an interesting site: "Speechbot" is a Compaq Research project that is indexing online radio shows. Apparently it found terms like 'Red Hat' and 'Yahoo' in past episodes of GiS. Interesting technology. Imagine when it lets me ask my TV to find me every show that mentions Sarah Michelle Gellar.

28 of 87 comments

  1. Re:Impressive...? by xyzzy · · Score: 3

    Ohoh, I've stirred up a firestorm here :-)

    Re: SQL -- it can be any SQL server, really. However, I will add that we are somewhat in bed with Microsoft on the visualization end, simply because IE5 does XML quite well (note to Mozilla people: get with the program).

    Re: Open source. Unfortunately, not up to me. Much of the technology is "open source" in the sense that papers have been published about it (not what you were looking for, I know), but we've already licensed some of the core technology to another company, and being a phone company (GTE) we consider the speech rec somewhat of a competitive advantage (wipe those Echelon thoughts out of your mind! We use it for call center and directory assistance automation! Sheesh :-)

    As I posted probably about 6 months ago in a thread about speech recognition, there are some significant issues with open-sourcing beyond the recognizer code. The learning processes behind the recognition are based on a considerable amount of data for which licensing is an issue, such as CNN broadcasts. In fact, we use over 100 hours of broadcast news audio to train the system, and several million words of text for the language model. This comes to us through the Linguistic Data Consortium at the University of Pennsylvania (http://www.ldc.upenn.edu). This is an academic group set up to maintain these common train-and-test databases for researchers, and there's a fairly sizeable fee to join. They handle the intellectual property issues with the training data.

    And, unfortunately, without the training data, it's kind of hard to use the system. At least, if you want to use it on something it's not already trained on (in our case: North American broadcast news).
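
    To make the role of the text corpus concrete, here is a minimal sketch of a bigram language model, the simplest kind of model that "several million words of text" could be used to estimate. It is not the recognizer described above: the toy corpus and the function names are made up for illustration, and a real system would add smoothing and a far larger vocabulary.

```python
from collections import Counter, defaultdict

def train_bigram_counts(sentences):
    """Count how often each word follows another in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, word in zip(words, words[1:]):
            counts[prev][word] += 1
    return counts

def bigram_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev was never seen."""
    total = sum(counts[prev].values()) if prev in counts else 0
    return counts[prev][word] / total if total else 0.0

# Hypothetical stand-in for a large broadcast-news text corpus.
corpus = [
    "the next line of the report",
    "the line next to the flashlight",
    "the next story",
]
counts = train_bigram_counts(corpus)
p_line_next = bigram_probability(counts, "line", "next")    # seen in training
p_line_linux = bigram_probability(counts, "line", "linux")  # never seen
```

    A word pair that never occurs in the training text gets probability zero in this unsmoothed sketch, which is one way to picture why a system is hard to use on material it was not trained on.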

  2. The Remarkable Media Search by GoRK · · Score: 3

    What is particularly interesting to note is that the quality of these Internet Radio shows is generally fairly poor. The voice recognition and dictation software that I have toyed with before has always suggested using better microphones and higher sampling rates to achieve decent results. Some even claimed that low quality audio results in a severe accuracy penalty.

    It is very remarkable that this thing can index these low quality streams with the accuracy that it does! I hope that searchable media (other than text) continues to get better like this. Companies like Virage and Compaq definitely deserve our support. I hope that standard interfaces appear soon.

    ~GoRK

  3. No linux? ESR is a duck? by Otto · · Score: 3

    It found no instances of the word Linux, which I found humorous.

    However, with a little brain usage, search for "line" and you get this:

    ... there an a to think you're doing is making good news slash my next monday's announcement makes it you can use less leonard still want to which it's tilman of the of the open sores movement I have not part of the open sewers that but why in part of the priests out their foundations giving his last line next to the flashlight next with you a while and we can end of the various duckling and in the he's serving snacks that promptly opening the top of that there is god who will bomb and the crowd is bernie this is definitely the most exciting play a thing would have to have one of I mean you for a column about how ...

    The words "end of the various duckling" and around there are in fact "Eric Raymond" in the clip, which I thought utterly hysterical. You can tell because they say "it's Eric Raymond, and he's serving snacks," which partway comes out correct.

    Linux seems to have come out as "line next" a lot, and "line of" in some clips I've found.

    Obviously, the technology is not quite there yet. :-)
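
    The workaround above (searching for what the recognizer tends to emit instead of the word itself) can be sketched as a small query-expansion step. This is only an illustration: the lookup table holds just the two mis-recognitions reported in this comment, the function names are hypothetical, and nothing here reflects how Speechbot itself handles queries.

```python
# Hypothetical table of mis-recognitions, taken from the examples in this comment.
MISRECOGNITIONS = {
    "linux": ["line next", "line of"],
}

def expand_query(term):
    """Return the term plus any phrases the recognizer is known to emit for it."""
    key = term.lower()
    return [key] + MISRECOGNITIONS.get(key, [])

def find_hits(transcript, term):
    """Return the query variants that occur as substrings of the transcript."""
    text = transcript.lower()
    return [variant for variant in expand_query(term) if variant in text]

# A fragment of the recognizer output quoted above.
snippet = "giving his last line next to the flashlight next with you a while"
hits = find_hits(snippet, "Linux")
```

    The obvious cost is false positives: "line next" will also match clips where someone really did say "line next".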

    ---

    --
    - Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
  4. The possibilities by TheFitz · · Score: 2

    Just think about it: add a regexp to this and you can take Bill Gates saying things like "Microsoft Windows dominates the market due to our huge innovation." and apply a couple of quick regexps (excuse me if the code is long; I'm not really trying):

    my $gatespeak = 'Microsoft Windows dominates the market due to our huge innovation.';
    $gatespeak =~ s/Microsoft/Micro\$oft/g;
    $gatespeak =~ s/Windows/Winbloze/g;
    $gatespeak =~ s/dominates the market/controls your lives/g;
    $gatespeak =~ s/innovation/stealing and strongarm tactics/g;

    To get this: "Micro$oft Winbloze controls your lives due to our huge stealing and strongarm tactics." Wow, you can actually get Billy boy to speak the TRUTH!

    --
    "Out, OUT! You demons of STUPIDITY!" - Dogbert
  5. Re:Processing power and time? by anl · · Score: 3

    The press release has a little more information. We use workstations running NT to spider the sites; processing is done on a farm of Linux servers, and the UI runs on AlphaServer DS20 machines.

  6. Speech recognition worries by Anonymous Coward · · Score: 2

    "Note: Indexed text does not match audio exactly." And you wonder what kind of technology the NSA has listening to us all right now?

  7. Echelon, anyone? by kevin805 · · Score: 2

    The cryptography community usually believes they are a couple years behind the NSA, given that the NSA reads all their papers, but doesn't publish its own work.

    I had been skeptical of Echelon being able to do word recognition on phone conversations, but I expect that the NSA is ahead of private industry in this area too, so Echelon looks plausible.

    --Kevin

  8. Echelon!!! by um...+Lucas · · Score: 2

    Based on everyone's assumptions around here, this would peg the NSA as having that capability since 1990 or so (just to pick a round number)... And it only came to light this year.

    oh, and first post too... maybe

    1. Re:Echelon!!! by jd · · Score: 2
      At the very latest, I'd say. The late 1980's sound more likely, but (as you say), that's a convenient round number.

      This would also be about the time UK politicians were banned from Menwith Hill, an NSA listening post in the UK, which would have been a likely candidate for early deployment of such technology.

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  9. Re:searches the whole transcript by Myddrin · · Score: 2

    However, it can do "exact phrase" searches which is almost as good.

    For example, searching on Black 47 returns 2,000+ hits when using the default search, but 0 when using an exact phrase....
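
    A rough sketch of why the two modes can differ this much, assuming the default search counts a hit whenever any query word appears (a guess; the site does not say how its default mode works). The transcript string and function names are invented for illustration.

```python
def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()

def any_word_match(transcript, query):
    """Assumed default mode: a hit if any single query word appears."""
    words = set(tokenize(transcript))
    return any(q in words for q in tokenize(query))

def exact_phrase_match(transcript, query):
    """A hit only if the query words appear consecutively and in order."""
    words, phrase = tokenize(transcript), tokenize(query)
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))

# Made-up recognizer output: "black" occurs, but never next to "47".
transcript = "the black box was found after forty seven days"
default_hit = any_word_match(transcript, "Black 47")
phrase_hit = exact_phrase_match(transcript, "Black 47")
```

    Under that assumption, every clip containing the common word "black" counts toward the default total, while the phrase search needs the recognizer to have produced both tokens side by side.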

    Linux OTOH returns only two matches... sigh. Actually I wonder how much that has to do with the confusion over the pronunciation(sp?), considering I've never met two people who say it with the same exact phonetics....

    Oh well,
    RobK

    --
    Myddrin
  10. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  11. Dragon Systems by Col.+Klink+(retired) · · Score: 3

    Dragon Systems (makers of NaturallySpeaking continuous VR) announced a similar product at Comdex. They call it audiomining.

    --

    -- Don't Tase me, bro!

  12. Comment removed by account_deleted · · Score: 2

    Comment removed based on user account deletion

  13. Yummy by freakho · · Score: 2

    Not only for the SMG thing, but also imagine the possibilities when applied to C-SPAN. Now you don't have to listen to hours of mind-numbing, coma-inducing boredom or hope and pray that the media will deign to bring a certain issue out of the Washington black hole in order to find out about your favorite target of litigation. Like, say, the one you're reading.

  14. speechbot transcript by technos · · Score: 2

    a hundred twenty gates at a note let me give you every moment in your life right we've got we've got that word the key guess like thirty five to defend the trees now we don't like pancakes free

    Methinks a lexical generator produces better speech than the triumvirate of /.

    --
    .sig: Now legally binding!
  15. Another audio indexing system by xyzzy · · Score: 3

    Might as well use this as a chance to plug my project:

    http://www.gte.com/AboutGTE/gto/bbnt/speech/research/extraction/roughn_ready/index.html

    ...which not only tells you what words were said, but who said them, and what topics were being talked about...

  16. Hmm... Mars murder ritual rides? by Croaker · · Score: 3

    Did a search for "Mars Probe" in the Science Friday show, and got this snippet:

    .. of deep space walk which show the first I am to arrive and interplanetary space another mars or murder ritual rides the september twenty third of mars lander which lands on...

    Err... yeah. That would explain a great many things about space probes. Actually, I'm sure the textified show would be a lot more interesting than the real show. And then, we could shove it through Babelfish for added enjoyment...

    I recently installed the ViaVoice beta for Linux, and found its recognition not quite ready for prime time... at least for my needs. I'd be surprised if radio shows, which often have people on fairly crummy phone connections, would be an ideal candidate for automated indexing.

  17. check out this logic by / · · Score: 3

    "I want to die" turns up 6
    "Grits" turns up 12
    "Sex with animals" turns up 5.
    "Your mother" turns up 200.

    My conclusion: "Your mother is still almost ten times as important as suicide, sex with animals, and grits combined."

    Remember that, always.

    --
    "If one is really a superior person, the fact is likely to leak out without too much assistance" -- John Andrew Holmes
  18. Reads like bad poetry. by Dast · · Score: 2

    Check out the results of "Show me more". The transcript reads like bad poetry:

    settling rata at and legs
    to the team network concept
    fighting ends
    the single monolithic entity
    of those are all things
    that the challenger are undermined
    neither side I think
    what I look at the to the the marcie katz
    the respective committees

    (I didn't change anything except to add line breaks.)

    Good for a chuckle. :)

    --

    This sig is false.

    1. Re:Reads like bad poetry. by dr · · Score: 2
      The transcript reads like bad poetry:

      The site's FAQ admits to that (in not so many words)...

      Warning: The "transcript" that is output by the speech recognition software (and shown in small extracts on the Results and Details pages) rarely matches what was spoken exactly, and often does not read very well. Because different people speak at different rates and with different degrees of clarity, speech recognition software does not correctly interpret every word. However, research has shown that meaningful words are recognized with a high degree of accuracy, and that even when a word is missed, it will most likely be recognized when it is spoken somewhere else in the program.

      And in all fairness, they are not claiming to be a "transcript service" per se, though I can certainly see a lot of transcript writers losing their jobs in the future as the technology advances.
      -dr

    2. Re:Reads like bad poetry. by gorilla · · Score: 2

      What would be interesting would be to link up to a machine translation (such as babelfish), and then finally text to speech.

  19. Buffy by quadong · · Score: 2

    The Vampire Slayer.

    Mind you, I didn't know this until I did a web search for her.

  20. babelfish by quadong · · Score: 3

    Someone should take these pseudo-transcripts and run them through Babelfish. Think of the gibberish level we could achieve!

  21. spooky by ubermuffin · · Score: 2

    Speech recognition is really neat and stands to greatly improve indexing and organization of non-text media. It looks like this is a pretty cool application of it, too.

    That being said, let me say that something like this scares the crap out of me. This sort of technology is exactly what the FBI had in mind when it began to pressure telecommunications companies to make their phone lines more tappable. Now I don't remember the exact figure, but they wanted something like 1% of the phones in any metropolitan area tappable at once. 1% of the phones in New York City is something on the order of 50,000 phones. Tell me how you're going to keep track of all of that without a computer monitoring 50,000 conversations and looking for key words. You can't.

    Monitor 1% of the population's conversations for some suspect keywords like 'bomb', 'assassinate', 'cocaine' or perhaps 'open source' and you've got one scary computer-assisted big brother watching over everything. If you don't hear anything juicy, shift to another 1%. I suppose people have had the technology to do realtime speech recognition and filtering for some time now, but the idea of maintaining searchable archives of phone conversations (enter Speechbot) is a genuinely spooky privacy violation.

    Now, any technology is only as good or as evil as the people who use it. I will be cautiously interested to watch what Speechbot evolves into.

  22. Processing power and time? by dr · · Score: 2
    The FAQ is incredibly vague and the About page doesn't say much either in terms of the actual technology used. It says that they index 20 shows and index daily. Does anyone know what the time to actually do an index is and what kind of processing power these guys are using?

    On an unrelated note, the About page says that Compaq has a research lab in Australia... sweet.
    -dr

  23. Nice... by Goner · · Score: 2

    \rant{But where is RealAudio at? The company itself is worse (in its smaller domain) than Microsoft. I mean, their version numbering (5, G2, 7) is absurd, their website pushes you to download the Plus version of their player (i.e. the one you have to pay for), and their monopoly on video (and most of the sound) on the web lets them get away with it. I believe they have an OK product, but their marketing schemes are stuck in mid-nineties "pay-for-this-better-version-now-even-though-a-better-free-one-will-be-out-in-three-months."

    Things like PointCast have died due to this type of scheme, but Real is still staying strong. The Linux (Unix) install scenario and HTML documentation are absurd as well, and to be honest are the reason for this rant. We'll just have to wait until some sort of disruptive technology forces Real to compete, instead of stagnate.}

    As far as the implications of this technology, Echelon, etc. go, I just can't wait until I can do boolean searches through my old phone calls. Not like they're listening anyway...

    -Rich

  24. Some interesting bytes. by kaniff · · Score: 3

    When the guys introduce themselves, the translator has a fun time with their names and nicknames.

    Rob "CmdrTaco" Malda -- rob commander topple mall
    Jeff "Hemos" Bates -- jeff in both states
    Nate Ostendorf -- the husband or the smoke

    I also searched for Linux, and I'll bet that it can't find any instances because it doesn't translate it right, what with all the different pronunciation possibilities.

    It's a cool idea, but has a ways to go. Go Compaq.
    yay.

  25. filtering out celine dion by Travoltus · · Score: 2


    I'm gonna patent that.

    That'll make me rich as f**k!

    --
    --- Grow a pair, liberals... stop letting the Republicans bully you!