Slashdot Mirror


Open Source Speech Recognition - With Source

Paul Lamere writes "This story on ZD-Net and this recent story on Slashdot describe the recent open sourcing of IBM's voice recognition software. This release, unfortunately, doesn't include any source for the actual speech recognition engine. Olaf Schmidt, a developer on the KDE Accessibility Project, is quoted as saying 'There is no speech-recognition system available for Linux, which is a big gap.' In an attempt to close this gap, we have just released Sphinx-4, a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created by researchers and engineers from Sun, CMU, MERL, HP, MIT and UCSC. Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C. Here are the release notes and some performance data."

8 of 404 comments

  1. But what about text to speech? by Anonymous Coward · · Score: 5, Interesting

    When are we going to get GOOD text to speech, that uses modeled parameters of human vocal tracts rather than stitching together a bunch of pre-recorded phonemes?

    1. Re:But what about text to speech? by DAldredge · · Score: 3, Interesting

      It still doesn't sound natural; this text sounds like a female Kirk read it.

      We would like to know if something does not sound quite right. After entering some text and listening to it, please fill out a feedback form and tell us what was mispronounced. (And please note that no language translation is done, so, for example, if you choose a French voice you should submit French text.)

      (That text is from the same page.)

    2. Re:But what about text to speech? by cheezit · · Score: 3, Interesting

      I'm thinking it might be a bit more complicated than that...the human voice is unfortunately far too expressive.

      Have the same person read the same passage ten times the same way and you will get ten very different results. Ask them to change tones/emotions and the results will differ even more.

      --
      Premature optimization is the root of all evil
    3. Re:But what about text to speech? by mevans · · Score: 3, Interesting

      I was sitting in English class one day, working on a paper - a friend was editing, and I was looking to make a copy of the paper. Having no disks and a finicky network, we decided to run text-to-speech on my machine and speech-to-text on hers. Needless to say, my paper on the Medicare Reform Bill of last year became garbage. - Evidence of a lossless transfer!

  2. Free C++ alternative from Mississippi State Univ. by j.leidner · · Score: 4, Interesting
    Another open source system, but implemented in C++ (like all industrial systems I know of), can be found here (a vision statement is here).

    --
    Try Nuggets, the mobile search engine. We answer your questions via SMS, across the UK.

  3. Speech recognition by CastrTroy · · Score: 4, Interesting

    Speech recognition is one of the worst means of input there is for a computer. Keyboards work so much better. Even for those who don't have full use of their hands, there are many other options for user input, all of which are better than speech recognition. Worst thing ever is someone trying to use speech input in a cubicle environment.

    --

    Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
  4. nifty desktop control with sphinx and festival by Danny Rathjens · · Score: 4, Interesting
    http://perlbox.sourceforge.net/

    The very small vocabulary needed for desktop control makes the speech recognition much more accurate and usable.
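
    One way to picture why a small vocabulary helps: with only a handful of legal commands, even a badly misheard phrase still lands closer to the right command than to any other. The sketch below is a toy illustration of that idea only. It works on text rather than audio, the command list and the names CommandSnapper, editDistance and nearest are made up for the example, and it is not how Sphinx-4 or Perlbox actually decode speech.

```java
// Illustrative only: snaps a noisy transcript to the closest entry in a small,
// fixed command list. Real recognizers constrain the search inside the decoder;
// this just shows the effect of having few candidates to choose from.
class CommandSnapper {
    // Hypothetical desktop-control vocabulary (an assumption for this sketch).
    static final String[] COMMANDS = {
        "open browser", "close window", "next desktop", "lock screen"
    };

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the command with the smallest edit distance to what was "heard".
    static String nearest(String heard) {
        String best = COMMANDS[0];
        int bestDistance = Integer.MAX_VALUE;
        for (String command : COMMANDS) {
            int d = editDistance(heard, command);
            if (d < bestDistance) {
                bestDistance = d;
                best = command;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(nearest("clothes window")); // prints "close window"
    }
}
```

    With four commands, "clothes window" is three edits from "close window" and much further from everything else. Grow the list to a dictation-sized vocabulary and many near neighbours appear, which is where the errors come from.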

  5. Benchmarks are TIGHT LOOPS with no GC !! by Gopal.V · · Score: 3, Interesting

    > It's amazing that the myth of Java being slow is so persistent

    Before you mod me down as a Troll, I work on a virtual machine as a hobby.

    The problems with Java being slow have little to do with the "execution of code" part. The parts that take a hit are the Garbage Collector and the Class Loader. The latter causes a HUGE hit at startup. The former is responsible for those strange Swing freezes I've been seeing when I switch into a Java app.

    Unicode also brings its own set of junk. For example, "Hello World" in dotgnu's JIT does 7302 hashtable inserts and 6000+ StringBuffer operations to initialize the Unicode encoder/decoder. And that is the standard way of decoding Unicode (mono uses the same code).

    Lastly, C/C++ code commonly reads and writes fields directly, while Java brings in get/set methods for these. A method call for a get or set is a LOT more expensive than a pointer read. Design has a lot to do with why Java is slow.
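
    To make the comparison concrete, here is a minimal sketch of the two access styles side by side. The class names (AccessStyles, PointField, PointBean) are hypothetical, and this only shows the shape of the code, not a benchmark: how much the accessor call actually costs depends on whether the VM inlines it.

```java
// Illustrative only: the same sum computed through direct field reads and
// through accessor methods. All names here are made up for the example.
class AccessStyles {
    // Reads the field directly, the way C/C++ code would read a struct member.
    static long sumFields(PointField[] points) {
        long total = 0;
        for (PointField p : points) {
            total += p.x;
        }
        return total;
    }

    // Reads the same value through a get method, the usual Java bean style.
    static long sumGetters(PointBean[] points) {
        long total = 0;
        for (PointBean p : points) {
            total += p.getX();
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1000;
        PointField[] fields = new PointField[n];
        PointBean[] beans = new PointBean[n];
        for (int i = 0; i < n; i++) {
            fields[i] = new PointField(i);
            beans[i] = new PointBean(i);
        }
        // Both styles compute the sum 0 + 1 + ... + 999.
        System.out.println(sumFields(fields) + " " + sumGetters(beans)); // prints "499500 499500"
    }
}

class PointField {
    int x; // exposed field, read without a method call

    PointField(int x) {
        this.x = x;
    }
}

class PointBean {
    private final int x; // hidden field, reachable only through getX()

    PointBean(int x) {
        this.x = x;
    }

    int getX() {
        return x;
    }
}
```

    Both loops return the same total; the only difference is the extra call in sumGetters, which is the overhead being described on a VM that does not optimize it away.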

    The enterprise apps where Java is popular are essentially backend applications which run for long periods of time (so they have all the classes looked up and loaded) with a HUGE heap (256 MB or more), where an occasional GC freeze won't destroy the entire experience (as the interfaces are often JSP/Web based).

    Java *is* fast, if you don't count the slow parts.