Open Source Speech Recognition - With Source
Paul Lamere writes "This story
on ZD-Net and this recent story
on Slashdot
describe the recent open sourcing of IBM's voice
recognition software. This release, unfortunately, doesn't include
any source for the actual speech recognition engine. Olaf Schmidt, a
developer on the KDE Accessibility Project,
is quoted as saying 'There is no speech-recognition system available
for Linux, which is a big gap.' In an attempt to close this gap, we
have just released Sphinx-4,
a state-of-the-art, speaker-independent, continuous
speech recognition system written entirely in the Java programming
language. It was created by researchers and engineers from Sun, CMU,
MERL, HP, MIT and UCSC. Despite (or because of) being written in the
Java programming language, Sphinx-4 performs as well as similar
systems written in C. Here are the release notes and
some performance data."
"There is no speech-recognition system available for Linux, which is a big gap."
Um, Sphinx 2 (a predecessor of Sphinx 4) has been around for quite some time now. Like Sphinx 4, it's speaker-independent. Unlike Sphinx 4, it's a C library, and is thus easily interfaced with other languages (insert shameless plug for a simple Python interface for Sphinx 2 I wrote).
Hey moron, it's R2D2 that beep-booped. C3PO was fluent in over 6 million forms of communication. ;-)
An especially odd statement considering much of speech recognition can be broken down into great big vector operations, which are perfect for hand coding in C. Bet I could quadruple the speed of it in a couple of hours with some hand-coded SIMD ops in x86 assembler.
It's funny because Java is fantastic at JIT compiling code with lots of non-local behaviour (e.g. complex UIs) because it can take into account global behaviour at runtime. But it sucks at tight, heavy computation loops. DSP is a fantastic example of something Java is going to get creamed at when pitched against non-virtual machines.
Of course, if you have some cross-platform standard API calls for those vector DSP ops, then it's a different argument...
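For anyone wondering what those "great big vector operations" look like in practice, here is a minimal Java sketch of two typical inner loops: a dot product, and a variance-weighted squared distance of the kind used when scoring a feature frame against a diagonal-covariance Gaussian. This is an illustration only, not code from Sphinx-4; the class name, method names, and sample values are all made up for the example.

```java
// Hypothetical illustration; none of these names come from Sphinx-4.
public final class VectorOpsSketch {
    private VectorOpsSketch() {}

    /** Dot product of two equal-length feature vectors. */
    public static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Sum over i of (x[i] - mean[i])^2 * invVar[i], where invVar holds
     * the reciprocal of each dimension's variance.
     */
    public static float weightedSquaredDistance(float[] x, float[] mean, float[] invVar) {
        if (x.length != mean.length || x.length != invVar.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < x.length; i++) {
            float d = x[i] - mean[i];
            sum += d * d * invVar[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sample values, chosen only to exercise the loops.
        float[] x = {1f, 2f, 3f};
        float[] mean = {0f, 2f, 1f};
        float[] invVar = {1f, 1f, 0.5f};
        System.out.println(dot(x, mean));
        System.out.println(weightedSquaredDistance(x, mean, invVar));
    }
}
```

Loops like these are exactly what hand-written SIMD targets in C or assembler. Whether a given JIT vectorizes them on its own depends on the JVM and the hardware, so treat any speed comparison as something to measure rather than assume.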