Online Speech Indexing
Thomas Edwards from The Sync (where we host Geeks in Space) sent us an interesting site:
"Speechbot" is a Compaq Research project that is indexing online radio shows. Apparently it found terms like 'Red Hat' and 'Yahoo' in past episodes of GiS. Interesting technology. Imagine when it lets me ask my TV to find me every show that mentions Sarah Michelle Gellar.
Ohoh, I've stirred up a firestorm here :-)
:-)
Re: SQL -- it can be any SQL server, really. However, I will add that we are somewhat in bed with Microsoft on the visualization end, simply because IE5 does XML quite well (note to Mozilla people: get with the program).
Re: Open source. Unfortunately, not up to me. Much of the technology is "open source" in the sense that papers have been published about it (not what you were looking for, I know), but we've already licensed some of the core technology to another company, and being a phone company (GTE) we consider the speech rec somewhat of a competitive advantage (wipe those Echelon thoughts out of your mind! We use it for call center and directory assistance automation! Sheesh!).
As I posted probably about 6 months ago in a thread about speech recognition, there are some significant issues with open-sourcing beyond the recognizer code. The learning processes behind the recognition are based on a considerable amount of data for which licensing is an issue, such as CNN broadcasts. In fact, we use over 100 hours of broadcast news audio to train the system, and several million words of text for the language model. This comes to us through the Linguistic Data Consortium at the University of Pennsylvania (http://www.ldc.upenn.edu). This is an academic group set up to maintain these common train-and-test databases for researchers, and there's a fairly sizeable fee to join. They handle the intellectual property issues with the training data.
And, unfortunately, without the training data, it's kind of hard to use the system. At least, if you want to use it on something it's not already trained on (in our case: North American broadcast news).
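To make the "language model" part concrete, here is a minimal sketch of the general idea: count which words follow which in a pile of text, then use those counts as probabilities. This is a toy bigram model and not the recognizer described above; the function names and the one-sentence corpus are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count how often each word follows another in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def bigram_probability(model, prev, nxt):
    """Estimate P(nxt | prev) by relative frequency (no smoothing)."""
    total = sum(model[prev].values())
    return model[prev][nxt] / total if total else 0.0

# Toy stand-in for the licensed news text (hypothetical sentence).
corpus = "the senate passed the bill and the house passed the budget"
model = train_bigram_model(corpus)
```

A word pair that never shows up in the training text gets probability zero here, which is one way to see why a system trained only on broadcast news struggles with vocabulary from outside that domain.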
What is particularly interesting to note is that the quality of these Internet Radio shows is generally fairly poor. The voice recognition and dictation software that I have toyed with before has always suggested using better microphones and higher sampling rates to achieve decent results. Some even claimed that low quality audio results in a severe accuracy penalty.
It is very remarkable that this thing can index these low quality streams with the accuracy that it does! I hope that searchable media (other than text) continues to get better like this. Companies like Virage and Compaq definitely deserve our support. I hope that standard interfaces appear soon.
~GoRK
It found no instances of the word Linux, which I found humorous.
... there an a to think you're doing is making good news slash my next monday's announcement makes it you can use less leonard still want to which it's tilman of the of the open sores movement I have not part of the open sewers that but why in part of the priests out their foundations giving his last line next to the flashlight next with you a while and we can end of the various duckling and in the he's serving snacks that promptly opening the top of that there is god who will bomb and the crowd is bernie this is definitely the most exciting play a thing would have to have one of I mean you for a column about how ...
:-)
However, with a little brain usage, search for "line" and you get this:
The words "end of the various duckling" and around there are in fact "Eric Raymond" in the clip, which I thought utterly hysterical. You can tell because they say "it's Eric Raymond, and he's serving snacks," which partway comes out correct.
Linux seems to have come out as "line next" a lot, and "line of" in some clips I've found.
Obviously, the technology is not quite there yet.
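The workaround of searching for what the recognizer tends to hear can be sketched in a few lines. This is only an illustration: the `MISHEARD` table and the function names are hypothetical, seeded with the "line next" and "line of" variants mentioned above, and it does naive substring matching on the transcript text.

```python
# Hypothetical table of mis-recognitions, seeded from the ones noticed in this thread.
MISHEARD = {
    "linux": ["line next", "line of"],
}

def expand_query(term):
    """Return the search term plus any known mis-transcriptions of it."""
    term = term.lower()
    return [term] + MISHEARD.get(term, [])

def find_hits(transcript, term):
    """List which variants of the term occur in the transcript (substring match)."""
    text = transcript.lower()
    return [variant for variant in expand_query(term) if variant in text]

transcript = "giving his last line next to the flashlight next with you a while"
hits = find_hits(transcript, "Linux")
```

Substring matching is crude (a variant like "line of" would also match inside "outline of"), so treat this as a sketch of the idea and not a real search feature.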
---
- Give a man a fire and he's warm for a day, but set him on fire and he's warm for the rest of his life.
Just think about it: add a regexp to this and you can get Bill Gates saying things like "Microsoft Windows dominates the market due to our huge innovation." Then apply a quick couple of regexps (excuse me if the code is long-winded, I'm not really trying):
$gatespeak =~ s/Microsoft/Micro\$oft/g;
$gatespeak =~ s/Windows/Winbloze/g;
$gatespeak =~ s/dominates the market/controls your lives/g;
$gatespeak =~ s/innovation/stealing and strongarm tactics/g;
To get this: "Micro$oft Winbloze controls your lives due to our huge stealing and strongarm tactics". Wow, you can actually get Billy boy to speak the TRUTH!
"Out, OUT! You demons of STUPIDITY!" - Dogbert
The press release has a little more information. We use workstations running NT to spider the sites; processing is done on a farm of Linux servers, and the UI runs on AlphaServer DS20 machines.
"Note: Indexed text does not match audio exactly." And you wonder what kind of technology the NSA has listening to us all right now?
The cryptography community usually believes they are a couple years behind the NSA, given that the NSA reads all their papers, but doesn't publish its own work.
I had been skeptical of Echelon being able to do word recognition on phone conversations, but I expect that the NSA is ahead of private industry in this area too, so Echelon looks plausible.
--Kevin
Based on everyone's assumptions around here, this would peg the NSA as having that capability since 1990 or so (just to pick a round number)... And it only came to light this year.
oh, and first post too... maybe
However, it can do "exact phrase" searches which is almost as good.
For example, searching on Black 47 returns 2,000+ hits when using the default search, but 0 when using an exact phrase....
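A gap like that is what you would expect if the default search counts a hit whenever any one of the query words appears, while the exact-phrase search needs the words side by side. How Speechbot actually implements either mode is an assumption here; the two functions below are just a minimal sketch of that difference.

```python
def any_word_match(transcript, query):
    """Default-style search (assumed): hit if any query word occurs in the transcript."""
    words = set(transcript.lower().split())
    return any(q in words for q in query.lower().split())

def exact_phrase_match(transcript, query):
    """Exact-phrase search: the query words must appear consecutively, in order."""
    words = transcript.lower().split()
    target = query.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Made-up transcript containing both words, but not next to each other.
transcript = "the black cat sat for 47 minutes"
loose = any_word_match(transcript, "Black 47")
strict = exact_phrase_match(transcript, "Black 47")
```

With this transcript the loose search reports a hit and the strict one does not, which is the same pattern as 2,000+ hits versus 0.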
Linux OTOH returns only two matches... sigh. Actually I wonder how much that has to do with the confusion over the pronunciation(sp?), considering I've never met two people who say it with the same exact phonetics....
Oh well,
RobK
Myddrin
Dragon Systems (makers of Naturally Speaking continuous VR) announced a similar product at Comdex. They call it audiomining.
-- Don't Tase me, bro!
Not only for the SMG thing, but also imagine the possibilities when applied to C-SPAN. Now you don't have to listen to hours of mind-numbing, coma-inducing boredom or hope and pray that the media will deign to bring a certain issue out of the Washington black hole in order to find out about your favorite target of litigation. Like, say, the one you're reading.
a hundred twenty gates at a note let me give you every moment in your life right we've got we've got that word the key guess like thirty five to defend the trees now we don't like pancakes free
./
Methinks a lexical generator produces better speech than the triumvirate of
.sig: Now legally binding!
Might as well use this as a chance to plug my project:
http://www.gte.com/AboutGTE/gto/bbnt/speech/research/extraction/roughn_ready/index.html
...which not only tells you what words were said, but who said them, and what topics were being talked about...
Did a search for "Mars Probe" in the Science Friday show, and got this snippet:
Err... yeah. That would explain a great many things about space probes. Actually, I'm sure the textified show would be a lot more interesting than the real show. And then, we could shove it through Babelfish for added enjoyment...
I recently installed the ViaVoice beta for Linux, and found its recognition not quite ready for prime time... at least for my needs. I'd be surprised if radio shows, which often have people on fairly crummy phone connections, would be an ideal candidate for automated indexing.
"I want to die" turns up 6
"Grits" turns up 12
"Sex with animals" turns up 5.
"Your mother" turns up 200.
My conclusion: "Your mother is still almost ten times as important as suicide, sex with animals, and grits combined."
Remember that, always.
"If one is really a superior person, the fact is likely to leak out without too much assistance" -- John Andrew Holmes
Check out the results of "Show me more". The transcript reads like bad poetry:
:)
settling rata at and legs
to the team network concept
fighting ends
the single monolithic entity
of those are all things
that the challenger are undermined
neither side I think
what I look at the to the the marcie katz
the respective committees
(I didn't change anything but adding line breaks.)
Good for a chuckle.
This sig is false.
The vampire Slayer.
Mind you, I didn't know this until I did a web search for her.
Someone should take these pseudo-transcripts and run them thru babelfish. Think of the gibberish level we could achieve!
Speech recognition is really neat and stands to greatly improve indexing and organization of non-text media. It looks like this is a pretty cool application of it, too.
That being said, let me say that something like this scares the crap out of me. This sort of technology is exactly what the FBI had in mind when it began to pressure telecommunications companies to make their phone lines more tappable. Now I don't remember the exact figure, but they wanted something like 1% of the phones in any metropolitan area tappable at once. 1% of the phones in New York City is something on the order of 50,000 phones. Tell me how you're going to keep track of all of that without a computer monitoring 50,000 conversations and looking for key words. You can't.
Monitor 1% of the population's conversations for some suspect keywords like 'bomb', 'assassinate', 'cocaine' or perhaps 'open source' and you've got one scary computer-assisted big brother watching over everything. If you don't hear anything juicy, shift to another 1%. I suppose people have had the technology to do realtime speech recognition and filtering for some time now, but the idea of maintaining searchable archives of phone conversations (enter Speechbot) is a genuinely spooky privacy violation.
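Once the audio has been turned into text, the keyword-filtering step itself is trivial, and that is part of what makes this worrying. The sketch below is purely illustrative: the watch list, call IDs and transcripts are invented, and nothing here reflects how any real system works.

```python
# Hypothetical watch list, using two of the example words from above.
WATCH_WORDS = {"bomb", "cocaine"}

def flag_transcripts(transcripts, watch_words=WATCH_WORDS):
    """Return {call_id: [matched words]} for transcripts containing a watched word."""
    flagged = {}
    for call_id, text in transcripts.items():
        # Strip common punctuation so "bomb." still matches "bomb".
        words = {w.strip(".,!?\"'").lower() for w in text.split()}
        hits = sorted(words & watch_words)
        if hits:
            flagged[call_id] = hits
    return flagged

# Invented transcripts for illustration.
calls = {
    "call-1": "Did you see that movie? It was a total bomb.",
    "call-2": "Pick up some grits on the way home.",
}
flagged = flag_transcripts(calls)
```

Note that the only call flagged is someone panning a movie. Plain keyword matching has no idea about context, and the recognition errors seen elsewhere in this thread would only add to the false hits.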
Now, any technology is only as good or as evil as the people who use it. I will be cautiously interested to watch what Speechbot evolves into.
On an unrelated note, the about page says that Compaq has a research lab in Australia... sweet.
-dr
\rant{But where is RealAudio at? The company itself is worse (in its smaller domain) than Microsoft. I mean, their version numbering (5, G2, 7) is absurd, their website pushes you to download the Plus version of their player (i.e. the one you have to pay for), and their monopoly on video (and most of the sound) on the web lets them get away with it. I believe they have an OK product, but their marketing schemes are stuck in mid-nineties "pay-for-this-better-version-now-even-though-a-better-free-one-will-be-out-in-three-months."
Things like PointCast have died due to this type of scheme, but Real is still staying strong. The Linux (Unix) install scenario and HTML documentation are absurd as well and, to be honest, the reason for this rant. We'll just have to wait until some sort of disruptive technology forces Real to compete instead of stagnate.}
As far as the implications of this technology, echelon, etc. I just can't wait until I can do boolean searches through my old phone calls. Not like they're listening anyway...
-Rich
When the guys introduce themselves, the translator has a fun time with their names and nicknames.
Rob "CmdrTaco" Malda -- rob commander topple mall
Jeff "Hemos" Bates -- jeff in both states
Nate Ostendorf -- the husband or the smoke
I also searched for linux and I'll bet that it can't find any instances because it doesn't translate it right, what with all the different pronunciation possibilities.
It's a cool idea, but has a ways to go. Go Compaq.
yay.
I'm gonna patent that.
That'll make me rich as f**k!
--- Grow a pair, liberals... stop letting the Republicans bully you!