Open Source Speech Recognition
bedahr writes "The first version of the open source speech recognition suite simon was released.
It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model.
These components are united under an easy-to-use graphical user interface.
Simon can import dictionaries directly from Wiktionary (a subproject of Wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts.
It also provides means to train the language model with new samples and add new words."
I did use Julius for a small project utilizing voice recognition once. While not perfect, I was quite impressed by the results of the engine. Quite fun to control the light and TV with shout commands, though once or twice a movie actually triggered "lights off".
--
webmasters: personalized bookmarking [primadd.net] scripts for your site
wp and phpbb plugin available
In my experience, I have not found speech recognition engines/software that productive. Too many errors and a slow [and steep] "learning" curve for the engine. I will have to be convinced that this simon thing is any different for me to give it a spin.
So, when can I use it in my Ford Focus instead of Sync? :-)
Cue the obligatory lets set so double the killer delete select all. :)
As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
That's great and all, but which languages are supported? I hope it's more than just English.
If this is the first, what was Sphinx then?
Eye musing i trite now two poster slashed hot. It saw grate pro gram!
Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with neither being a sub-project of or more important than the other.
I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)
No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.
However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.
Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
This could be very useful in projects like FreeSWITCH which is an Open Source project for building telephony applications. More info at http://www.freeswitch.org/
Say, for remotely controlling the TV or something: instead of having to remember the channel number you could just say "TV (or other trigger word), Discovery channel". I guess it could be used combined with an LCD/OLED button remote. Also, on a phone, it should be possible to use speech to text for certain stuff like adding items to a shopping list.
The software has to be intelligent to know what to do when you press a button and say "shopping list, plums" etc.
I don't think speech recognition is good enough yet that it can take dictation, unless the dictation is in a very specific form.
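For what it's worth, the "trigger word, then argument" idea above ("TV, Discovery channel" or "shopping list, plums") can be sketched in a few lines once some recognizer has already produced a text transcript. This is only a rough sketch: the trigger words, the channel table and the function names are made up for illustration and are not part of simon or Julius.

```python
# Hypothetical vocabulary; none of these names come from simon or Julius.
CHANNELS = {"discovery channel": 42, "bbc one": 1}
TRIGGERS = ("tv", "shopping list")


def parse_command(transcript):
    """Split 'trigger, argument' into (trigger, argument); None if no trigger matches."""
    text = transcript.strip().lower()
    for trigger in TRIGGERS:
        rest = text[len(trigger):]
        # Require a word boundary after the trigger so "tvs ..." does not match "tv".
        if text.startswith(trigger) and (rest == "" or rest[0] in " ,"):
            return trigger, rest.lstrip(" ,")
    return None


def handle(transcript, shopping_list):
    """Act on a transcript: tune the TV or append to the shopping list."""
    parsed = parse_command(transcript)
    if parsed is None:
        return "ignored"
    trigger, argument = parsed
    if trigger == "tv":
        channel = CHANNELS.get(argument)
        return f"tune to {channel}" if channel is not None else "unknown channel"
    shopping_list.append(argument)
    return f"added {argument}"
```

Anything that does not start with a known trigger word is simply ignored, which is roughly what you want when a movie is playing in the background.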
Wouldn't it be a good idea to work with the (open source) speech recognition of IBM?
http://news.zdnet.com/2100-9593_22-5383536.html
or
http://developers.slashdot.org/article.pl?sid=04/09/13/1058241
Well, if your TV is being controlled by the same computer (e.g., MythTV), then shouldn't the voice command be able to mask out anything the microphone picks up that matches the output sound? Is there software to filter out of the audio input whatever is currently being played? I'm sure it's a bit tricky to get right, but it would be very useful for a range of applications including this topic and speakerphones.
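What is being described here is usually called acoustic echo cancellation. A toy sketch of one common approach, a normalized LMS (NLMS) adaptive filter, is below. It assumes the computer has the exact playback samples, time-aligned with the microphone samples, and that the room echo fits within a handful of filter taps; real rooms and sound cards are a lot messier. The function name and parameters are my own, not from any particular library.

```python
def nlms_cancel(mic, playback, taps=8, mu=0.5, eps=1e-8):
    """Return `mic` with the part that is predictable from `playback` removed.

    mic      -- microphone samples (echo of the playback plus anything else)
    playback -- the samples the computer itself sent to the speakers
    taps     -- how many past playback samples the echo model may use
    mu       -- adaptation step size (0 < mu < 2 for NLMS)
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent playback samples, newest first
    cleaned = []
    for m, p in zip(mic, playback):
        history = [p] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate   # what is left after removing the estimated echo
        energy = sum(x * x for x in history) + eps
        # Nudge the echo model towards whatever would have reduced the error.
        weights = [w + mu * error * x / energy for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned
```

The cleaned signal is what you would hand to the recognizer, so a movie shouting "lights off" through the speakers is mostly subtracted out once the filter has adapted.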
Trying to learn more about it, I followed the project's website link on the SourceForge page to simon-listens.org, but it's German only; I found no English (or other language) info. Does anyone have any advice?
Animoog.org
Offices full of people talking to their computers has been Bill Gates' wet dream for decades now. What will happen if open source gets there first?
Actually, the reason we're not there yet is because most people don't want it. Keyboards and mice are simply a better way to give instructions to your computer than speech recognition is. Could you imagine the clatter of a dozen or more people in close proximity chattering to their computers?
Tired of FB/Google censorship? Visit UNCENSORED!
[...]
you are not allowed to redistribute (parts of) HTK3
In other words, HTK - a critical part of the 'Simon' project - is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder if Simon is not in violation of the license.
Citation needed.
Furthermore, is this really of significant importance to be included in the Slashdot comment archive?
You seem rather noobish for a Wiki* pedant.
Basically it comes from a live voice recognition demo from Microsoft for their feature in Vista. Yes, I had to look this up myself.
Your CPU is not doing anything else, at least do something.
I was thinking last night about what could be used to auto translate the Met/BBC Shipping Forecast into lay speak (just 'cause). This project sounds promising.
MilkMiruku
simon is open source.
julius is open source.
htk is *NOT* open source.
The latter is a micro$oft by-product, as clearly shown by the license that you have to first agree with and then send your email to them in order to download the tarballs...
myself never done this since 1995.
wp and phpbb plugin available
Your comment was good, but I've got to say that your web site appearance (and low-contrast color scheme) leaves a lot to be desired. Might want to take a closer look at that, if using it to advertise a product.
You have to speak in the voice of Comic Book Guy.
in the chinese cars. Sadly, they are brighter than we are. They have been stealing everything under the sun from American, but we can not seem to get them to steal the source code that controls the reagan. I am guessing that they are afraid that they would lose track of things as well.
Many people think that "Speech recognition software" = "dictation software" - as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition - and a personal application at that - which is the only thing most people come across. Other applications are media transcription (closed captioning), media mining ("What did Obama say about the prime mortgage market this week?"), telephone call center controlling (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), aligning audio to human transcription (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings - think of any event that is transcribed like UN conferences), etc.
Don't you think CNN, BBC or any national film archive would be interested in searching through their millions of hours of recorded footage?
Now you tell me - do you think that the holy grail of speech recognition is "HAL - please close the hatch", "Dear Mom, we are having a lovely time here..." or hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language?
Dictation Software is about the only speech-reco application that can be sold to the masses - all the rest is still fairly much below the horizon...
Are these speech recognition engines designed for use with English in mind? What is the status of the technology for other languages? It seems that other languages with far fewer sounds, like Japanese, would change the problem substantially and make it easier to have a quality product. Does the difficulty of the speech recognition problem change with the language?
The reason we're not there yet is that standalone speech recognition software is stupid. We need KDE and gnome to have built-in speech recognition with a simple API so any application can just monitor the speech input. It should not come in as keystrokes though - must be separate. The speech engine should be a component so different ones can be used of course. If it was there, any app could use it easy enough.
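To make the idea concrete, here is a rough sketch of what such a desktop-level component might look like: one service, a swappable engine behind it, and applications subscribing to recognized text as its own event stream rather than as fake keystrokes. Everything here (SpeechService, fake_engine, the method names) is hypothetical and is not an existing KDE or GNOME API.

```python
from typing import Callable, List

Engine = Callable[[bytes], str]      # audio in, recognized text out
Listener = Callable[[str], None]     # an application callback receiving text


class SpeechService:
    """Hypothetical desktop-wide speech hub; not an existing KDE or GNOME API."""

    def __init__(self, engine: Engine) -> None:
        self.engine = engine                    # pluggable: any recognizer will do
        self.listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register an application that wants to monitor speech input."""
        self.listeners.append(listener)

    def feed(self, audio: bytes) -> str:
        """Run the engine on one audio chunk and notify every subscriber."""
        text = self.engine(audio)
        for listener in self.listeners:
            listener(text)
        return text


def fake_engine(audio: bytes) -> str:
    """Stand-in recognizer: pretends the audio bytes are already UTF-8 text."""
    return audio.decode("utf-8")
```

Swapping recognizers is then just a matter of assigning a different callable to `engine`; the subscribing applications never need to know which one is in use.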
I'm all out of mod points, but this is an important point to make about speech recognition.
Write 'rite' right.
Possibly incorrect grammatically, but it's the only obvious way to combine 3 homonyms into what passes for a sentence. Of course, someone saying that might be vehemently agreeing with you as well, "Right! Right! Right!". Sorting that out could be a mess. I've criticised the lack of progress on the speech recognition front for a decade. It's amazing how bad most speech recognition software is.
Here's a better test... Take a standard page of text (about 200 words). Scan it and run it through an OCR program. Then randomly grab people off the street and have them read the text out loud into a microphone. If the speech recognition outperforms the OCR'ed result then it's a success.
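If anyone wants to actually run that comparison, the usual yardstick is word error rate: the word-level edit distance between the reference text and the recognized text, divided by the number of reference words. A minimal sketch follows; the function names are mine, and the pass/fail rule is just the "speech beats OCR" criterion proposed above.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # reference word dropped
                               current[j - 1] + 1,       # extra word inserted
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Errors divided by the number of words in the reference text."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / ref_len


def speech_beats_ocr(reference, speech_text, ocr_text):
    """True if the speech transcript has a strictly lower error rate than the OCR text."""
    return word_error_rate(reference, speech_text) < word_error_rate(reference, ocr_text)
```

With a 200-word page you would average this over several readers, since a single person off the street is a pretty noisy sample.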
This is good news. I hope OSS speech recognition spurs some serious innovation. The field is still wide open for quality algorithms and software IMO.
I'm working on a home automation project and we've been looking for an OSS, linux-compatible speech rec system, but it seemed like every single Linux speech project died in the early 2000s when IBM sold their freeware ViaVoice system and the new company started charging for it. Seems like every single Linux project used it as the backend. The only other option was CMU's Sphinx work which looked impressive but almost impossible for non-speech-experts to use directly. This will be really cool to try out - kudos to everyone working on simon.
There is also CMU Sphinx, which is completely free (no HTK used) and very good quality.
http://cmusphinx.sourceforge.net/
http://en.wikipedia.org/wiki/CMU_Sphinx
I think the software is called "Enterprise Dictation System"; requires Internet Explorer, although there must be some component that's pushed out locally to the client since I can't imagine the sound data being sent over intranet to be interpreted. I dictate in chunks, and apparently the longer the chunk the more easily it can interpret what I say. For example, if I just dictate "to", then it may transcribe "to", "2", "two", or "too". If I say "to prevent this comma", then it knows that the first word should be spelled "to".
It's surprisingly accurate, and is more accurate for esoteric medical terms than for common short words since for medical terms there is a relatively limited number of possibilities relative to the number of syllables.
For some colleagues who speak with foreign accents --and even for certain colleagues who seem to speak with standard local accents-- recognition is quite poor, and they fall back on human transcription.
Anyway, just wanted to share this experience. I was quite amazed at how well the dictation worked.
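The "to" / "2" / "two" / "too" behaviour is what you would expect from a language model that scores candidate spellings against the neighbouring words. I have no idea how the product described above actually does it, so the following is only a toy illustration with invented bigram counts, not a description of that system.

```python
# Toy bigram counts, invented for illustration; a real system learns these from a corpus.
BIGRAM_COUNTS = {
    ("to", "prevent"): 50,
    ("two", "patients"): 30,
    ("too", "much"): 40,
}

# Spellings that sound alike, keyed by one canonical "heard" form.
HOMOPHONES = {"to": ["to", "two", "too", "2"]}


def pick_spelling(heard, next_word=None):
    """Return the plausible spellings of `heard`, narrowed by the following word if known."""
    candidates = HOMOPHONES.get(heard, [heard])
    if next_word is None:
        return candidates  # no context: every spelling is still possible
    # With context, keep the spelling seen most often before `next_word`.
    best = max(candidates, key=lambda w: BIGRAM_COUNTS.get((w, next_word), 0))
    return [best]
```

Dictated alone, the word stays ambiguous; followed by "prevent", only "to" survives, which matches the experience that longer chunks are transcribed more reliably.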
Here's hoping we can build up a good Open Source/Free database of voice recognition data. Or at least, perhaps an Open Source engine, and then different companies can market their voice data.
404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
[GPG key in journal]
I am not interested in teaching the computer to recognize my terrible pronunciation, but rather in having some program that expects to hear standard Chinese, which I could practice with.
One extremely useful program I have found which is able to decode and show the tones is Wavesurfer. For those of you that do not know, tones play a very important part in Chinese speech, and it is kind of difficult to learn as a foreigner.
Request: Can any of you with knowledge of this field please contribute a little to update http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html? It is a bit dated.
When you are sure of something, you probably are wrong (search for "Unskilled and Unaware of It").
What the hell is "open source (in the strict sense of you-can-look-at-the-code)"? Since when did anyone start to mean "open source" as code that was merely available but not modifiable? As this sibling comment points out (please mod him up, by the way), the term "Open Source" has a very specific meaning. This meaning was determined at the time this term was invented, so you can't even use the same excuse as "free software" and hide behind the excuse of "but for the past 600 centuries, Shakespeare has been using the term 'open source' for this other meaning!"
Microsoft has been muddying the water enough with terms like "Shared Source" and "Open" as in "OOXML". This thing about the "strict sense" of the term "open source" has got to be nipped in the bud.
IBM ViaVoice on an IBM P100 with 64MB and Windows 95, and it was brilliant. It only needed patience to train. Shame nobody else could be bothered and complained it was too hard to use.
"What'd you install that crappy voice recognition software on this computer for, Matt? See? Everything I say is coming up on the screen as syllables."
(computer voice) "For-Mat-See. Formatting in progress ..."
Eric Baird
About 15 years ago I worked for a company doing, amongst other things, VR for telephone use. These systems had localised dictionaries to handle accents. We struggled to get the stuff going properly and the only combination we got to work reliably was a Fijian Indian person talking to a British accent VR system. Go figure!
Engineering is the art of compromise.
So, it is a bit longer than initially thought ... but, if considering the last point, it's getting here way faster than using Law-abiding processes ...
--
Disclaimer, or "legal deathtrap" :
Definitions
Claims :