They compare a lot like apples and oranges. ViaVoice is designed for dictation-type applications. No restricted grammar, but you have to train it on your voice.
Sphinx is designed for "directed dialogs". The computer asks you a question. The computer has a set of expected responses. The thing you say had better be one of those.
This is not as useless as it sounds. Directed dialog is used quite widely in the call center business, because "what city would you like to fly to?" is a lot better than "press 1 for boston, press 2 for new york, press 3 for los angeles, press 4 for..."
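For the curious, here is a rough sketch of what "a set of expected responses" boils down to. It works on text transcripts instead of audio, and the grammar entries, filler phrases, and return codes are all made up for illustration. A real recognizer compiles a grammar like this into its search, so treat it as a toy and not as how Sphinx does it.

```python
import re

# Hypothetical grammar: every phrase the system is prepared to hear,
# mapped to the value the application actually wants.
CITY_GRAMMAR = {
    "boston": "BOS",
    "new york": "NYC",
    "new york city": "NYC",
    "los angeles": "LAX",
    "l a": "LAX",
}

# Hypothetical carrier phrases callers tend to wrap around the answer.
FILLERS = ("i would like to fly to ", "i want to go to ", "to ")


def normalize(utterance):
    """Lowercase the text and keep only runs of letters, space-separated."""
    return " ".join(re.findall(r"[a-z]+", utterance.lower()))


def interpret(utterance, grammar=CITY_GRAMMAR):
    """Return the grammar value for the utterance, or None if out of grammar."""
    text = normalize(utterance)
    for filler in FILLERS:
        if text.startswith(filler):
            text = text[len(filler):]
            break
    return grammar.get(text)


in_grammar = interpret("I want to go to Los Angeles.")
out_of_grammar = interpret("Chicago")
```

Anything outside the grammar comes back as None, which is the point where a real system re-prompts the caller. It also shows why grammar writing is the hard part: every phrasing you did not anticipate is a failed recognition.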
The stuff they're open sourcing is all pretty minor. They're not releasing their speech recognition engine. Just some tools for creating voicexml content that _runs_ on the engine. You still get charged a per-port license when deploying the application, as well as professional services involved in helping you create a good app.
And by "you", I mean large enterprise call center customers. This software has just about zero usefulness for average joe hacker (though it is fun to play with).
Oh, and the tools they _are_ open sourcing represent their state of the art... about 3 years ago.
Calm down, dude. They're not open sourcing their engine, just some developer tools.
Oh, and this is purely stuff for telephony-based speech recognition. As in over-the-phone systems to replace "press 1 for this, press 2 for that" with "say the name of the movie you'd like to see".
The deployed systems are still subject to licensing fees, though.
With IBM's new donation, you could build a piece of consumer hardware that plugs into a wall socket & a phone line and runs your voice applications over the phone.
You could build 10,000 boxes and sell them around the world without any licensing fees.
That is somewhat different from a solution developed with Microsoft Speech Server 2004.
Afraid not. IBM is open sourcing 2 things, neither of which is their speech recognition engine. One is just a JSP library, with some tags for generating voicexml for dates, times, currency grammars, etc. The other is some tools code for eclipse. Modules for editing voicexml, ccxml, grxml, etc. A fancy XML editor. Ho hum.
Wow, this is a step up from what we had at my school. We actually had to make the call ourselves, though the system did have its own way of helping out:
Device and Network Interfaces                                  PIZZA(7)

NAME
     pizza, chinese, food - order take-out food in the WashU area.

SYNOPSIS
     569-3463  Dining Express (Delivers food from various local restaurants)
     962-0898  Hunan Wok Chinese Restaurant
     862-0009  Cicero's Restaurant (Pizza and other Italian)
     726-3030  Domino's Pizza
     862-4667  Imo's Pizza
     367-7272  Papa John's Pizza
     647-4434  Pizza Hut
     644-2000  Pointer's Pizza
     727-6699  Sub Express

EXAMPLES
     To order a Domino's Pizza, dial "726-3030" and answer all questions the phone asks of you.

BUGS
     A few of the places are often lazy and late. Don't forget to ask about delivery charges and if they _really_ know how to get here.

     The list is incomplete. Send updates to requests@cts.wustl.edu.
I would add to your definition of "class" the concepts of polymorphism and dynamic binding. A class doesn't merely add methods to a struct, but usually implies the ability to override those methods in subclasses, and refer to any number of subclass types with a reference typed to the superclass.
In C++ lingo, a class ain't a class without a vptr.
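A minimal sketch of what I mean, in Python only because it is short. Python has no vptr as such (every method call is looked up at run time), so this shows the overriding and dynamic binding, not the C++ mechanism. The Shape, Square, and Circle names are just illustrative.

```python
import math
from typing import Iterable


class Shape:
    """Superclass: declares the interface that subclasses override."""

    def area(self) -> float:
        raise NotImplementedError("subclasses must override area()")

    def describe(self) -> str:
        # self.area() is resolved at run time against the object's actual
        # class, not against Shape. That lookup is the dynamic binding.
        return f"{type(self).__name__} with area {self.area():.2f}"


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def describe_all(shapes: Iterable[Shape]) -> list:
    """The caller only knows about Shape; each object picks its own area()."""
    return [shape.describe() for shape in shapes]


descriptions = describe_all([Square(2), Circle(1)])
```

describe_all() never mentions Square or Circle, yet each element runs its own area(). A struct with functions bolted on can't do that.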
I'm all in favor of leaving MDI as a selectable option, like it is in NetBeans IDE, for instance.
The code necessary to maintain the ability to switch between MDI and SDI in NetBeans is a beast, and the main reason it "feels slow". The next version of NetBeans has a complete rewrite of the windowing system, and it's MDI-only.
You laugh, but I never realized that my IT-installed NAV was sending out these ridiculous bounces until some kindly gentleman replied to one and informed me of it (in a polite way, I might add). I immediately figured out how to configure NAV not to do that anymore, and notified IT, who proceeded to disable the bounces company-wide.
i'm not allowed to play with razor blades...
I must've read the topic wrong. Razor blade games? Reminds me of that old SNL bit with Dan Aykroyd and Jane Curtin. Aykroyd is a sleazy toy manufacturer pushing Christmas toys like "Bag O' Glass" and "Bag O' Sulfuric Acid".
I don't know where the parent poster got the impression that Sun doesn't accept contributions to NetBeans. I'm a non-Sun contributor.
You wouldn't use Apache, you'd use one of the many OSS Java app servers (GlassFish, JBoss, Resin, Geronimo).
They had to leave something for the huge aftermarket accessory market!
88 mph
Great, now he's got 3 years to go travel the world! Now _that_ would be a well-rounded education.
...and I know that the single biggest problem with these systems is not the accuracy of the speech rec software itself. It's actually very, very good. There are two things preventing the resulting systems from being just as good:
1. The integrator: Making speech recognition grammars is hard. You have to design the dialog in a way that makes sense to people, and you have to write grammars that interpret every possible response. People who have absolutely no qualifications are doing it. The technology companies that produce this stuff put linguists and psychologists on it (people with PhDs). The systems need to be tuned over time, and the client/integrator is usually too cheap/ignorant to do it right.
2. Cell phones: Nine out of ten cell phone calls I have with other humans involve this conversational snippet... "What?! I can't hear you!" How the hell a computer's supposed to understand you is beyond me. And the cell phone network providers make it even harder on the computer, because of endpointing. "Endpointing" is the process of listening at the beginning of a call, before the conversation begins, to figure out the background noise level. Cell phone carriers do it all the time. Have you ever been on a cell phone call where the line goes silent, but the call hasn't dropped? That's the cell carrier being conservative with its bandwidth. It figures out that no one's speaking, and it transmits nothing until someone speaks up again. Originally, this made people uncomfortable, thinking their calls were dropped. So, cell phone manufacturers started inserting fake noise on the local loop to mask the endpointing. It's called "comfort noise" (I'm not kidding).
So, how does this bugger speech rec systems? Well, they do their own endpointing, so that they don't attempt to recognize background noise as speech. They hear silence (because of the cell network's endpointing), and they figure it's a really clean line with no background noise. So, when the caller really says something, the speech rec system gets it all at once, noise and all.
Often, if you're having trouble with a speech rec system, and you're on a cell phone, redialing and getting a fresh endpointing will help. And, of course, you're always better off with a land line.
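To make the failure concrete, here is a toy energy-based endpointer. It is only a sketch under my own assumptions: the frame values, the 3x margin, and the function names are invented, and real engines use far more sophisticated detectors. It does show why calibrating against a muted line goes wrong.

```python
import math


def rms(frame):
    """Root-mean-square energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def calibrate_noise_floor(frames):
    """Estimate background noise as the mean RMS of the leading frames."""
    return sum(rms(f) for f in frames) / len(frames)


def detect_speech(frames, noise_floor, margin=3.0, min_floor=1.0):
    """Flag every frame whose energy exceeds margin times the noise floor."""
    threshold = margin * max(noise_floor, min_floor)
    return [rms(f) > threshold for f in frames]


# Hypothetical audio frames (80 samples each).
silence = [[0] * 80] * 5         # what arrives while the carrier sends nothing
noise = [[40, -40] * 40] * 5     # steady background noise, RMS 40
speech = [[400, -400] * 40] * 3  # the caller talking, RMS 400

# Land-line style: calibration hears the real background noise.
landline = detect_speech(noise + speech, calibrate_noise_floor(noise))

# Cell style: calibration hears the carrier's silence, so the threshold
# ends up far below the real noise level.
cell = detect_speech(noise + speech, calibrate_noise_floor(silence))
```

With the honest calibration, only the three speech frames are flagged. With the muted calibration, all eight frames are flagged, so the recognizer is handed the background noise as if it were speech.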
Every time this happens, it's because the system you're talking to is a bridge between two older systems (in place from a merger, acquisition, etc), and they can't get the old systems to talk to each other very well.
I know, I've worked on such systems.
JFluid (a.k.a the NetBeans Profiler)
1. Free
2. Trivial to configure/run (as easy as debugging in NB)
3. You can specify a subset of the app to profile, and the rest of the app runs at full speed (profilers like JProbe force you to slow down the whole VM). This also makes the output much more readable, since you're not seeing all kinds of statistics from code you're not interested in.
4. Attach/detach to a running application, which requires no special cmd line options when launching.
5. Analysis tools are good at finding memory leaks, based on discerning patterns in allocation/deallocation
Check it out at http://profiler.netbeans.org/
No OSS ORBs? What about ACE/TAO, MICO, JacORB, and ORBacus? Most, if not all, were in existence back in '97, IIRC. Certainly ACE/TAO was.
Except that, according to that article, the actor was listed as an "electrician".
Columbus? What was he "correct" about? The earth being round? Everyone knew that then. The myth that there was any common perception in 1492 that the earth was flat was created by Washington Irving in his biography of Columbus, written centuries later. (source: James Loewen, "Lies My Teacher Told Me: Everything Your American History Textbook Got Wrong" -- great book, BTW)
From Paula Poundstone's standup routine:
"He's against abortion, but for capital punishment. Spoken like a true fisherman. Throw 'em back, kill 'em when they're bigger!"
Nah, they're not open-sourcing the engine. Just some tools used to develop voicexml applications.
Really not much to see here.
Hungarian soldier: "Got any cigarettes?"
UK soldier 1: "What is he writing?"
UK soldier 2: "I will not buy this record. It is scratched."
I don't know about you guys, but I've got a few more years until I go red, then I'm gonna try like hell for renewal!
There is no sanctuary!
Because there's a cleanroom gpl replacement driver in there (forcedeth). You no longer need nvnet at all.