Open Source Speech Recognition - With Source

Posted by timothy on Tuesday September 28, 2004 @11:18AM from the what-I-hear-you-saying-is dept.

Paul Lamere writes "This story on ZD-Net and this recent story on Slashdot describe the recent open sourcing of IBM's voice recognition software. This release, unfortunately, doesn't include any source for the actual speech recognition engine. Olaf Schmidt, a developer on the KDE Accessibility Project, is quoted as saying 'There is no speech-recognition system available for Linux, which is a big gap.' In an attempt to close this gap, we have just released Sphinx-4, a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. It was created by researchers and engineers from Sun, CMU, MERL, HP, MIT and UCSC. Despite (or because of) being written in the Java programming language, Sphinx-4 performs as well as similar systems written in C. Here are the release notes and some performance data."

Re:withOUT source surely? by sploo22 · 2004-09-28 11:24 · Score: 2, Interesting

That's for the IBM one, dummy. Let me guess - you saw that sentence and had an instant knee-jerk reaction without reading the rest of the summary to find out what it's talking about.

--
Karma: Segmentation fault (tried to dereference a null post)
But what about text to speech? by Anonymous Coward · 2004-09-28 11:26 · Score: 5, Interesting

When are we going to get GOOD text to speech, that uses modeled parameters of human vocal tracts rather than stitching together a bunch of pre-recorded phonemes?
1. Re:But what about text to speech? by DAldredge · 2004-09-28 11:31 · Score: 3, Interesting
  
  It still doesn't sound natural; this text sounds like a female Kirk read it:
  
  We would like to know if something does not sound quite right. After entering some text and listening to it, please fill out a feedback form and tell us what was mispronounced. And please note that no language translation is done, so, for example, if you choose a French voice you should submit French text.
  
  (That text is from the same page.)
2. Re:But what about text to speech? by cheezit · 2004-09-28 11:57 · Score: 3, Interesting
  
  I'm thinking it might be a bit more complicated than that...the human voice is unfortunately far too expressive.
  
  Have the same person read the same passage ten times the same way and you will get ten very different results. Ask them to change tone or emotion and the results will differ even more.
  
  --
  Premature optimization is the root of all evil
3. Re:But what about text to speech? by mevans · 2004-09-28 14:18 · Score: 3, Interesting
  
  I was sitting in English class one day, working on a paper. A friend was editing, and I was looking to make a copy of the paper. Having no disks and a finicky network, we decided to run text-to-speech on my machine and speech-to-text on hers. Needless to say, my paper on the Medicare Reform Bill of last year became garbage. Evidence of a lossless transfer!
IBM gave me speech recognition a decade ago by Anonymous Coward · 2004-09-28 11:27 · Score: 2, Interesting

In OS/2. Really, it was just about a decade ago. It worked pretty well, especially when you take into account the computer power of the time.
How about open source word spotting by Anonymous Coward · 2004-09-28 11:28 · Score: 2, Interesting

Old and busted = voice recognition

New hotness = word spotting

When are we going to see software for Linux that allows us to search for keywords in audio or video files like Dragon MediaIndexer does?
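
A rough sketch of the idea, for readers unfamiliar with the term. It assumes a recognizer has already produced a time-aligned transcript as (word, start time in seconds) pairs; that input format and the function name `spot_keywords` are hypothetical, and real word spotters typically search the audio directly rather than a finished transcript.

```python
def spot_keywords(timed_words, keywords):
    """Return {keyword: [start_seconds, ...]} for each requested keyword.

    timed_words is an assumed (word, start_seconds) list, not the output
    format of any particular recognizer.
    """
    wanted = {k.lower() for k in keywords}
    hits = {k: [] for k in wanted}
    for word, start in timed_words:
        w = word.lower()
        if w in wanted:
            # Record where in the recording the keyword was heard.
            hits[w].append(start)
    return hits


if __name__ == "__main__":
    transcript = [("the", 0.0), ("Medicare", 0.4), ("reform", 0.9),
                  ("bill", 1.3), ("medicare", 5.2)]
    print(spot_keywords(transcript, ["medicare", "sphinx"]))
```

The returned offsets are what a media player would seek to; keywords that never occur map to an empty list.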
Free C++ alternative from Mississippi State Univ. by j.leidner · 2004-09-28 11:34 · Score: 4, Interesting

Another open source system, but implemented in C++ (like all industrial systems I know of), can be found here (a vision statement is here).
--
Try Nuggets , the mobile search engine. We answer your questions via SMS, across the UK.
Telephony by Anonymous Coward · 2004-09-28 11:41 · Score: 2, Interesting

So how long before this is integrated with Asterisk for voice-activated Linux telephone apps?

Michael
Speech recognition by CastrTroy · 2004-09-28 11:44 · Score: 4, Interesting

Speech recognition is one of the worst means of input there is for a computer. Keyboards work so much better. Even for those who don't have full use of their hands, there are many other options for user input, all of which are better than speech recognition. Worst thing ever is someone trying to use speech input in a cubicle environment.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Serious Question: Real-Time Acoustic Scores? by Anonymous Coward · 2004-09-28 11:48 · Score: 1, Interesting

Does this incarnation of Sphinx or any of the other open-source speech recognition systems allow us to process acoustic scores and potential phonemes while the user is still talking?

In other words, can we access a time stream of phone-probabilities as it is being updated?

Thanks!
AC
Re:no more music by Anonymous Coward · 2004-09-28 12:16 · Score: 1, Interesting

Heh, I can just see a hacker running through the halls of a crowded office screaming commands through a megaphone.
nifty desktop control with sphinx and festival by Danny+Rathjens · 2004-09-28 13:07 · Score: 4, Interesting

http://perlbox.sourceforge.net/
The very small vocabulary needed for desktop control makes the speech recognition much more accurate and usable.
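
A minimal sketch of why a small vocabulary helps, assuming the recognizer hands back a possibly garbled text hypothesis. The command list and the name `match_command` are made up for illustration and are not taken from Perlbox or Sphinx, which would more likely constrain the search with a grammar than fix up text afterwards. With only a handful of valid commands, even a noisy hypothesis usually has one obvious nearest match.

```python
import difflib

# Hypothetical command vocabulary, invented for this example.
COMMANDS = ["open browser", "close window", "next desktop",
            "play music", "lock screen"]


def match_command(hypothesis, commands=COMMANDS, cutoff=0.6):
    """Snap a noisy transcript to the closest known command, or None."""
    matches = difflib.get_close_matches(
        hypothesis.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_command("clothes window"))
    print(match_command("xyzzy"))
```

The `cutoff` value is a judgment call: set it too low and unrelated speech triggers commands, too high and slightly misheard commands are rejected.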
Re:There's more than one kind of overhead. by Anonymous Coward · 2004-09-28 14:00 · Score: 1, Interesting

Argh. Why does everyone insist on using VMs? /Natively compile/ your Java (I used Excelsior JET) and it will in fact run at C++ speed but with a lot less hassle and security risk. I used to prototype in Java and port to C for optimisation, but with native compilation I no longer have to bother.
Re:Why speech recognition on Linux will kill Windo by bugeye1959 · 2004-09-28 14:18 · Score: 2, Interesting

Yea, we will all speak BASH. Seriously, the real problem is not speech recognition; it is speech understanding. A good example comes from an SR book from my college days: "Please plant some more tulips," or was it "Please plan sum ore two lips"? It is not a trivial computer problem to resolve this. In fact, I would venture to say that once you have an algorithm to resolve the above, you probably also have a "sentient" computer that can pass the Turing test. That would be pretty sweet, as you will have solved many other problems in the world.
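
For illustration only, here is a toy version of the statistical shortcut recognizers commonly lean on instead of understanding: score each candidate transcription by how often its adjacent word pairs have been seen. The bigram counts below are invented, and the names `score` and `best_transcription` are hypothetical; this ranks word sequences by co-occurrence and understands nothing about tulips.

```python
import math

# Hypothetical bigram counts, made up for this example.
BIGRAM_COUNTS = {
    ("please", "plant"): 20,
    ("plant", "some"): 30,
    ("some", "more"): 200,
    ("more", "tulips"): 5,
}


def score(sentence, counts=BIGRAM_COUNTS, floor=0.1):
    """Crude score: sum of log counts of adjacent word pairs.

    Unseen pairs get a small floor count so the log stays defined.
    """
    words = sentence.lower().split()
    return sum(math.log(counts.get(pair, floor))
               for pair in zip(words, words[1:]))


def best_transcription(candidates):
    """Pick the candidate whose word pairs look most familiar."""
    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_transcription([
        "please plan sum ore two lips",
        "please plant some more tulips",
    ]))
```

This only works when the intended sentence resembles text the counts were gathered from, which is why it falls well short of the understanding described above.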
Re:no more music by Anonymous Coward · 2004-09-28 15:43 · Score: 1, Interesting

Insightful?

1) You don't have to run voice-recog software as root, or at any time.

2) You are allowed to wear headphones when you listen to music.

3) Your brain is a wonderful thing. So is the developer's brain. If you've thought of a possible problem, I'm pretty sure the developer has too. So - don't expect this to be a problem.
Benchmarks are TIGHT LOOPS with no GC !! by Gopal.V · 2004-09-28 19:22 · Score: 3, Interesting

> It's amazing that the myth of Java being slow is so persistent

Before you mod me down as a Troll, I work on a virtual machine as a hobby.

The problems with Java being slow have little to do with the "execution of code" part. The parts that take a hit are the Garbage Collector and the Class Loader. The latter causes a HUGE hit at startup. The former is responsible for those strange Swing freezes I've been seeing when I switch into a Java app.

Unicode also brings its own set of junk; for example, "Hello World" in dotgnu's JIT does 7302 hashtable inserts and 6000+ StringBuffer operations to initialize the Unicode encoder/decoder. And that is the standard way of decoding Unicode (mono uses the same code).

Lastly, C/C++ commonly uses a lot of fields while Java brings in get/set methods for these. A method call for a get or set is a LOT more expensive than a pointer read. Design has a lot to do with why Java is slow.

The enterprise apps where Java is popular are essentially backend applications which run for long periods of time (so have all the classes looked up and loaded) with a HUGE heap (256 MB or more), where an occasional GC freeze won't destroy the entire experience (as the interfaces are often JSP/Web based).

Java *is* fast, if you don't count the slow parts.

--
Quidquid latine dictum sit, altum videtur
Frustrated Java detractors... by patniemeyer · 2004-09-29 00:26 · Score: 2, Interesting

Every year the Java naysayers get more and more frustrated and more desperate to find a reason that Java just won't do. For years it was that Java was too slow... that one was true for about 18 months in 1995. Well, maybe now that we can do crypto in Java, play DOOM in Java, and do speech recognition in Java we can finally put it to rest.

Next up - Java's footprint is too big and its startup is too slow... Take a look at what they're doing in Java 1.5 to memory map and share core classes and pre-bind read-only classes. Also think about the fact that all that work the HotSpot engine does to optimize things at runtime just gets thrown away every time the VM restarts and ask - why?

Pat Niemeyer
Author of Learning Java, O'Reilly & Associates