Open Source Speech Recognition - With Source
Paul Lamere writes "This story on ZD-Net and this recent story on Slashdot
describe the recent open sourcing of IBM's voice
recognition software. This release, unfortunately, doesn't include
any source for the actual speech recognition engine. Olaf Schmidt, a
developer on the KDE Accessibility Project,
is quoted as saying 'There is no speech-recognition system available
for Linux, which is a big gap.' In an attempt to close this gap, we
have just released Sphinx-4,
a state-of-the-art, speaker-independent, continuous
speech recognition system written entirely in the Java programming
language. It was created by researchers and engineers from Sun, CMU,
MERL, HP, MIT and UCSC. Despite (or because of) being written in the
Java programming language, Sphinx-4 performs as well as similar
systems written in C. Here are the release notes and
some performance data."
That's for the IBM one, dummy. Let me guess - you saw that sentence and had an instant knee-jerk reaction without reading the rest of the summary to find out what it's talking about.
Karma: Segmentation fault (tried to dereference a null post)
When are we going to get GOOD text-to-speech that uses modeled parameters of human vocal tracts rather than stitching together a bunch of pre-recorded phonemes?
In OS/2. Really, it was just about a decade ago. It worked pretty well, especially when you take into account the computer power of the time.
Old and busted = voice recognition
New hotness = word spotting
When are we going to see software for Linux that allows us to search for keywords in audio or video files like Dragon MediaIndexer does?
--
Try Nuggets, the mobile search engine. We answer your questions via SMS, across the UK.
So how long before this is integrated with Asterisk for voice-activated Linux telephone apps?
Michael
Speech recognition is one of the worst means of input there is for a computer. Keyboards work so much better. Even for those who don't have full use of their hands, there are many other options for user input, all of which are better than speech recognition. The worst thing ever is someone trying to use speech input in a cubicle environment.
Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Does this incarnation of Sphinx or any of the other open-source speech recognition systems allow us to process acoustic scores and potential phonemes while the user is still talking?
In other words, can we access a time stream of phone-probabilities as it is being updated?
Thanks!
AC
Heh, I can just see a hacker running through the halls of a crowded office screaming commands through a megaphone.
The very small vocabulary needed for desktop control makes the speech recognition much more accurate and usable.
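A toy illustration of why a small command set helps, assuming the recognizer hands back a noisy text hypothesis: with only a handful of legal commands, even a badly misheard word can be snapped to the nearest one. This is not how Sphinx-4 constrains its search (a real recognizer would typically restrict the grammar during decoding); the class name `CommandMatcher` and the command list are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

class CommandMatcher {
    // Classic dynamic-programming Levenshtein edit distance between two strings.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the vocabulary entry closest to what was "heard" (first one wins on ties).
    static String closest(String heard, List<String> vocabulary) {
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String command : vocabulary) {
            int d = editDistance(heard, command);
            if (d < bestDistance) {
                bestDistance = d;
                best = command;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical desktop-control vocabulary, invented for this sketch.
        List<String> commands = Arrays.asList("open", "close", "save", "quit");
        System.out.println(closest("clothes", commands));
    }
}
```

The more entries the vocabulary has, the more near-misses compete for the same noisy input, which is the intuition behind the comment above.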
Argh. Why does everyone insist on using VMs? /Natively compile/ your Java (I used Excelsior JET) and it will in fact run at C++ speed but with a lot less hassle and security risk. I used to prototype in Java and port to C for optimisation, but with native compilation I no longer have to bother.
Yeah, we will all speak BASH. Seriously, the real problem is not speech recognition; it is speech understanding. A good example from an SR book from my college days: "Please plant some more tulips." Or was it "Please plan sum ore two lips"? It is not a trivial computer problem to resolve this. In fact, I would venture to say that once you have an algorithm to resolve the above, you probably also have a "sentient" computer that can pass the Turing test. That would be pretty sweet, as you will have solved many other problems in the world.
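For what it's worth, recognizers commonly sidestep "understanding" by leaning on a statistical language model: among transcriptions that sound alike, prefer the one whose word pairs have been seen before. Here is a minimal sketch of that idea, assuming a bigram lookup built from three sample sentences that are invented for this example (not real corpus data). The class name `BigramPicker` is hypothetical, and this does nothing to settle the deeper point about understanding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BigramPicker {
    // Collects every adjacent word pair ("bigram") from the sample sentences.
    static Set<String> bigrams(List<String> sentences) {
        Set<String> seen = new HashSet<>();
        for (String sentence : sentences) {
            String[] words = sentence.toLowerCase().split("\\s+");
            for (int i = 0; i + 1 < words.length; i++) {
                seen.add(words[i] + " " + words[i + 1]);
            }
        }
        return seen;
    }

    // Scores a candidate by how many of its bigrams appear in the known set.
    static int score(String candidate, Set<String> known) {
        String[] words = candidate.toLowerCase().split("\\s+");
        int hits = 0;
        for (int i = 0; i + 1 < words.length; i++) {
            if (known.contains(words[i] + " " + words[i + 1])) {
                hits++;
            }
        }
        return hits;
    }

    // Picks the highest-scoring candidate (first one wins on ties).
    static String pick(List<String> candidates, Set<String> known) {
        String best = null;
        int bestScore = -1;
        for (String candidate : candidates) {
            int s = score(candidate, known);
            if (s > bestScore) {
                bestScore = s;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy training sentences, made up for illustration only.
        Set<String> known = bigrams(Arrays.asList(
                "please plant some flowers",
                "plant some more tulips",
                "please water the tulips"));
        List<String> candidates = Arrays.asList(
                "please plan sum ore two lips",
                "please plant some more tulips");
        System.out.println(pick(candidates, known));
    }
}
```

A model like this only knows which word sequences are common; it would happily pick the wrong reading in a context where "two lips" really was meant.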
Insightful?
1) You don't have to run voice-recognition software as root, or at any other time.
2) You are allowed to wear headphones when you listen to music.
3) Your brain is a wonderful thing. So is the developer's brain. If you've thought of a possible problem, I'm pretty sure the developer has too. So - don't expect this to be a problem.
> It's amazing that the myth of Java being slow is so persistent
Before you mod me down as a Troll, I work on a virtual machine as a hobby.
The problems with Java being slow have little to do with the "execution of code" part. The parts that take a hit are the Garbage Collector and the Class Loader. The latter causes a HUGE hit at startup. The former is responsible for those strange Swing freezes I've been seeing when I switch into a Java app.
Unicode also brings its own set of junk. For example, "Hello World" in DotGNU's JIT does 7302 hashtable inserts and 6000+ StringBuffer operations to initialize the Unicode encoder/decoder. And that is the standard way of decoding Unicode (Mono uses the same code).
Lastly, C/C++ commonly uses a lot of fields, while Java brings in get/set methods for these. A method call for a get or set is a LOT more expensive than a pointer read. Design has a lot to do with why Java is slow.
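To make the field-versus-accessor point concrete, here is a small sketch of the two access styles being contrasted. The class and method names (`AccessStyles`, `Sample`, `sumViaField`, `sumViaGetter`) are hypothetical. The sketch only shows that both loops compute the same result; it makes no timing claim, since whether the getter costs more than the field read depends on whether the particular VM or JIT inlines it, which has to be measured.

```java
class AccessStyles {
    static final class Sample {
        public int value; // public field, read directly (C-struct style)

        Sample(int value) {
            this.value = value;
        }

        // Conventional Java accessor for the same data.
        public int getValue() {
            return value;
        }
    }

    // Builds n samples holding the values 0 .. n-1.
    static Sample[] make(int n) {
        Sample[] samples = new Sample[n];
        for (int i = 0; i < n; i++) {
            samples[i] = new Sample(i);
        }
        return samples;
    }

    // Sums by reading the field directly.
    static long sumViaField(Sample[] samples) {
        long total = 0;
        for (Sample s : samples) {
            total += s.value;
        }
        return total;
    }

    // Sums by going through the getter: one method call per element
    // unless the JIT inlines it.
    static long sumViaGetter(Sample[] samples) {
        long total = 0;
        for (Sample s : samples) {
            total += s.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        Sample[] data = make(1000);
        System.out.println(sumViaField(data) == sumViaGetter(data));
    }
}
```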
The enterprise apps where Java is popular are essentially backend applications that run for long periods of time (so all the classes have already been looked up and loaded) with a HUGE heap (256 MB or more), where an occasional GC freeze won't destroy the entire experience (as the interface is often JSP/Web based).
Java *is* fast, if you don't count the slow parts.
Quidquid latine dictum sit, altum videtur
Every year the Java naysayers get more and more frustrated and more desperate to find a reason that Java just won't do. For years it was that Java was too slow... that one was true for about 18 months in 1995. Well, maybe now that we can do crypto in Java, play DOOM in Java, and do speech recognition in Java we can finally put it to rest.
Next up - Java's footprint is too big and its startup is too slow... Take a look at what they're doing in Java 1.5 to memory map and share core classes and pre-bind read-only classes. Also think about the fact that all the work the HotSpot engine does to optimize things at runtime just gets thrown away every time the VM restarts, and ask - why?
Pat Niemeyer
Author of Learning Java, O'Reilly & Associates