Open Source Speech Recognition - With Source
Paul Lamere writes " This story
on ZD-Net and this recent story
on Slashdot
describes the recent open sourcing of IBM's voice
recognition software. This release, unfortunately, doesn't include
any source for the actual speech recognition engine. Olaf Schmidt, a
developer on the KDE Accessibility Project ,
is quoted as saying 'There is no speech-recognition system available
for Linux, which is a big gap.' In an attempt to close this gap, we
have just released Sphinx-4,
a state-of-the-art, speaker-independent, continuous
speech recognition system written entirely in the Java programming
language. It was created by researchers and engineers from Sun, CMU,
MERL, HP, MIT and UCSC. Despite (or because of) being written in the
Java programming language, Sphinx-4 performs as well as similar
systems written in C. Here are the release notes and
some performance data."
By we I assume you mean "the open source community" and the answer is "when you get off your ass and code it". If by "we" you mean the world at large then go and look at AT&T's Natural Voices project.
How we know is more important than what we know.
Title: I'm(Aim) using(You Sing) it(Ate) right(Write) now(How)
Body: It(Ate) works(lurks) very(barry) well(wall).
Your CPU is not doing anything else, at least do something.
"There is no speech-recognition system available for Linux, which is a big gap."
Um, Sphinx 2 (a predecessor of Sphinx 4) has been around for quite some time now. Like Sphinx 4, it's speaker-independent. Unlike Sphinx 4, it's a C library, and is thus easily interfaced with other languages (insert shameless plug for a simple Python interface for Sphinx 2 I wrote).
Hey moron, it's R2D2 that beep-booped. C3PO was fluent in over 6 million forms of communication. ;-)
An expecially odd statement considering much of speech recognition can be broken down into great big vector operations, which are perfect for hand coding in C. Bet I could quadruple the speed of it in a couple of hours with some hand coded SIMD ops in x86 assembler.
It's funny because Java is fantastic at JIT compiling code with lots of non-local behaviour (e.g complex UIs) because it can take into account global behaviour at runtime. But it sucks at tight, heavy computation loop. DSP is a fantastic example of something Java is going to get creamed at when pitched against non-virtual machines.
Of course, if you have some cross-platform standard API calls for those vector DSP ops, then it's a different argument...
December 2003 http://www.voip-info.org/wiki-Sphinx
Speech recognition is not really a solved problem. For some applications it works adequately, but if you take a look at the error rates for the Sphinx system to which the post links, you'll see that the Word Error Rate for large vocabulary is over 18%. Even for 5,000 words it is 7%. For many applications that is unacceptable.
A second factor is that these statistical speech recognition systems require extensive data for their language model. Building such a system requires recording real speech, segmenting it and creating a set of examples from which to compute the probabilities, which requires some knowledge of acoustic phonetics, and doing the computation for the model. This is time-consuming.
Speech recognition technology isn't a dark secret, but it isn't trivial to create a system with good performance either.
I find that startup/shutdown for a simple Java program takes about 200ms at 1GHz with the vanilla Sun JDK 1.5 JVM, or 150ms using gcj (gcc), and an equivalent C program takes about 2ms.
The overhead of starting a JVM should be incurred only once per browsing session.
Larry
Alma.
w w.memoire.com/guillaume-desnoix/alma/+&hl=en
It can read several high level languages and build an internal representation and the convert that to other high level languages.
It is a great tool to help port this software to C for example.
Unfortunately the site seems to have gone, although I have used this software in the past.
See the google cache though: http://66.102.9.104/search?q=cache:Dbw7OX6Tco4J:w
blog.sam.liddicott.com
Can't gcc compile java code directly to native binary code?
Does this mean that one could make a shared library out of the java code for C-programmers to use?