Slashdot Mirror


Open Source Speech Recognition - With Source

Paul Lamere writes "This story on ZD-Net and this recent story on Slashdot describe the recent open sourcing of IBM's voice recognition software. This release, unfortunately, doesn't include any source for the actual speech recognition engine. Olaf Schmidt, a developer on the KDE Accessibility Project, is quoted as saying 'There is no speech-recognition system available for Linux, which is a big gap.' In an attempt to close this gap, we have just released Sphinx-4, a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created by researchers and engineers from Sun, CMU, MERL, HP, MIT and UCSC. Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C. Here are the release notes and some performance data."

10 of 404 comments

  1. Re:But what about text to speech? by QuantumG · · Score: 4, Informative

    By we I assume you mean "the open source community" and the answer is "when you get off your ass and code it". If by "we" you mean the world at large then go and look at AT&T's Natural Voices project.

    --
    How we know is more important than what we know.
  2. Translation for those who still don't get it... by CaptainPinko · · Score: 4, Informative

    Title: I'm(Aim) using(You Sing) it(Ate) right(Write) now(How)
    Body: It(Ate) works(lurks) very(barry) well(wall).

    --
    Your CPU is not doing anything else, at least do something.
  3. Sphinx 2 by PiGuy · · Score: 5, Informative

    "There is no speech-recognition system available for Linux, which is a big gap."

    Um, Sphinx 2 (a predecessor of Sphinx 4) has been around for quite some time now. Like Sphinx 4, it's speaker-independent. Unlike Sphinx 4, it's a C library, and is thus easily interfaced with other languages (insert shameless plug for a simple Python interface for Sphinx 2 I wrote).

  4. OT Star Wars Nitpick by Anonymous Coward · · Score: 5, Informative

    Hey moron, it's R2D2 that beep-booped. C3PO was fluent in over 6 million forms of communication. ;-)

  5. Re:Virtual Machine Syndrome by pslam · · Score: 5, Informative
    It is most easily recognized in a release announcement, where for no reason whatsoever the afflicted developer suddenly interjects a statement like "and it's just as fast as C", to the bewilderment of the audience.

    An especially odd statement considering much of speech recognition can be broken down into great big vector operations, which are perfect for hand coding in C. Bet I could quadruple the speed of it in a couple of hours with some hand-coded SIMD ops in x86 assembler.

    It's funny because Java is fantastic at JIT compiling code with lots of non-local behaviour (e.g. complex UIs) because it can take into account global behaviour at runtime. But it sucks at tight, heavy computation loops. DSP is a fantastic example of something Java is going to get creamed at when pitched against non-virtual machines.

    Of course, if you have some cross-platform standard API calls for those vector DSP ops, then it's a different argument...
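
    To make "great big vector operations" concrete: one common example in HMM-based recognizers is scoring a feature vector against a Gaussian with diagonal covariance, repeated for many Gaussians on every frame. The sketch below shows that per-element loop in plain Python. Treating this as the hot loop in Sphinx-4 is an assumption, and the function name and arguments are made up for illustration; nothing here is taken from the Sphinx-4 source.

```python
import math


def diag_gaussian_log_likelihood(x, mean, var):
    """Log density of feature vector x under a diagonal-covariance Gaussian.

    x, mean and var are equal-length sequences of floats (hypothetical
    names, chosen for this sketch only).
    """
    if not (len(x) == len(mean) == len(var)):
        raise ValueError("x, mean and var must have the same length")
    total = 0.0
    # One multiply-add style step per dimension: this is the kind of loop
    # that maps directly onto SIMD instructions.
    for xi, mi, vi in zip(x, mean, var):
        total += math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi
    return -0.5 * total
```

    Every dimension is independent until the final sum, which is why this sort of loop is a natural candidate for hand vectorization.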

  6. Re:Telephony by dalabrat · · Score: 3, Informative

    December 2003 http://www.voip-info.org/wiki-Sphinx

  7. Rolling your own speech recognition isn't so easy by belmolis · · Score: 4, Informative

    Speech recognition is not really a solved problem. For some applications it works adequately, but if you take a look at the error rates for the Sphinx system to which the post links, you'll see that the Word Error Rate for large vocabulary is over 18%. Even for 5,000 words it is 7%. For many applications that is unacceptable.
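
    For anyone unsure what those percentages measure: Word Error Rate is usually defined as the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch of that calculation follows. It assumes plain whitespace tokenization and is not the scoring tool used to produce the Sphinx figures.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

    With the reference "it works very well" and the hypothesis "it lurks very well" there is one substitution in four words, so the function returns 0.25. Because insertions count as errors, WER can exceed 100%.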

    A second factor is that these statistical speech recognition systems require extensive data for their acoustic and language models. Building such a system requires recording real speech, segmenting it and creating a set of examples from which to compute the probabilities, which requires some knowledge of acoustic phonetics, and doing the computation for the model. This is time-consuming.

    Speech recognition technology isn't a dark secret, but it isn't trivial to create a system with good performance either.

  8. Re:There's more than one kind of overhead. by LarryRiedel · · Score: 3, Informative
    I can run inetd-style fork-exec-terminate servers in C on CPUs that a cellphone would spit on, and handle hundreds of connections a second. Bringing up a JVM on the same processor would take minutes.
    [...]
    if it takes 10s to start up a JVM your customer's already hit "back".

    I find that startup/shutdown for a simple Java program takes about 200ms at 1GHz with the vanilla Sun JDK 1.5 JVM, or 150ms using gcj (gcc), and an equivalent C program takes about 2ms.
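
    Numbers like these are easy to check on your own machine. The sketch below averages the wall-clock time to launch a command and wait for it to exit. It is one possible way to measure this, not the method used for the figures above, and the commands in the comments (`java Hello`, `./hello`) are hypothetical placeholders for whatever binaries you want to compare.

```python
import subprocess
import sys
import time


def mean_startup_ms(command, runs=5):
    """Average wall-clock time in ms to start `command` and wait for exit."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the child's output so terminal I/O does not skew the timing.
        subprocess.run(command, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        total += time.perf_counter() - start
    return 1000.0 * total / runs


if __name__ == "__main__":
    # Placeholder: time the running Python interpreter on an empty program.
    # Swap in e.g. ["java", "Hello"] or ["./hello"] (hypothetical names).
    print(f"{mean_startup_ms([sys.executable, '-c', 'pass']):.1f} ms")
```

    The result includes process creation and teardown as well as runtime initialization, so it is only meaningful as a comparison between programs measured the same way on the same machine.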

    Browser plugins? For content, yes, but not for navigation.

    The overhead of starting a JVM should be incurred only once per browsing session.

    Larry

  9. Convert to C easily with ALMA by samjam · · Score: 3, Informative

    Alma.

    It can read several high-level languages, build an internal representation, and then convert that to other high-level languages.

    It is a great tool to help port this software to C, for example.

    Unfortunately the site seems to have gone, although I have used this software in the past.

    See the Google cache though: http://66.102.9.104/search?q=cache:Dbw7OX6Tco4J:www.memoire.com/guillaume-desnoix/alma/+&hl=en

  10. Re:Java!?! by leinhos · · Score: 4, Informative

    Can't gcc compile Java code directly to native binary code?

    Does this mean that one could make a shared library out of the Java code for C programmers to use?