Open Source Speech Recognition

Posted by ryuzaki0 on Saturday January 19, 2008 @04:14AM from the hello-computer dept.

bedahr writes "The first version of the open source speech recognition suite simon has been released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides the means to train the language model with new samples and add new words."

Re:Wiktionary != Wikipedia by Anonymous Coward · 2008-01-19 04:59 · Score: 1, Informative

Although you are correct that Wikipedia and Wiktionary are equal in importance to Wikimedia, you must acknowledge that Wikipedia is the most well-known and talked-about project. Therefore, have a little grace with people who accidentally think or say that Wikipedia is the mother organization rather than Wikimedia. No need to be overly pedantic.
Pedant's Revolt by jrothwell97 · 2008-01-19 05:00 · Score: 4, Informative

Simon can import dictionaries directly from wiktionary (a subproject of wikipedia)

No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.

However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.

--
Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
Re:been playing with it by bedahr · 2008-01-19 05:07 · Score: 5, Informative

This is actually what simon does: the magic keyword is "simon". "simon Firefox", for example. -- bedahr
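
The keyword approach described above can be illustrated with a minimal sketch. This is not simon's actual code: the command table and the `parse_utterance` helper are hypothetical, and only the keyword "simon" and the "simon Firefox" example come from the comment.

```python
WAKE_WORD = "simon"  # the magic keyword named in the comment

# Hypothetical command table; simon's real configuration is not shown in this thread.
COMMANDS = {"firefox": "launch:firefox"}


def parse_utterance(text):
    """Return the action for a keyword-prefixed utterance, or None."""
    words = text.lower().split()
    # Ignore anything that does not start with the keyword, so ordinary
    # conversation near the microphone does not trigger commands.
    if len(words) < 2 or words[0] != WAKE_WORD:
        return None
    # Everything after the keyword is looked up as the command name.
    return COMMANDS.get(" ".join(words[1:]))


print(parse_utterance("simon Firefox"))
print(parse_utterance("Firefox"))
```

Requiring the keyword first trades a little convenience for fewer accidental activations.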
Re:Are they productive? by bedahr · 2008-01-19 05:19 · Score: 3, Informative

You might want to have a look at the voxforge project

And this doesn't require changes in the algorithm - just in the model.

-- bedahr
Re:Project's webpage in English? by bedahr · 2008-01-19 05:57 · Score: 5, Informative

We are sorry that there is no international homepage for this yet.

BUT: you are strongly encouraged to contact me with any questions: grasch < at > simon-listens.org

-- Peter
Re:Open Source, or Microsoft-Owned? by bedahr · 2008-01-19 06:12 · Score: 5, Informative

Simon is in no way connected to Microsoft.

Simon does NOT contain the HTK toolkit - it merely executes commands.

HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".

We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.

It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?

-- bedahr
For those not familiar with this meme by CaptainPinko · 2008-01-19 06:53 · Score: 3, Informative

Basically it comes from a live voice recognition demo from Microsoft for their feature in Vista. Yes, I had to look this up myself.

--
Your CPU is not doing anything else, at least do something.
filthy open-source by jumbolo · 2008-01-19 07:06 · Score: 4, Informative

simon is open source.
julius is open source.
htk is *NOT* open source.

The latter is a micro$oft by-product, as clearly shown by the license that you have to first agree with and then send your email to them in order to download the tarballs...

myself never done this since 1995.
Re:Which languages are supported? by R.Mo_Robert · 2008-01-19 08:05 · Score: 4, Informative

If you follow the link to the Sourceforge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.

--
R.Mo
CMU Sphinx, another free speech recognizer by TorKlingberg · 2008-01-19 10:59 · Score: 2, Informative

There is also CMU Sphinx, which is completely free (no HTK used) and very good quality.
http://cmusphinx.sourceforge.net/
http://en.wikipedia.org/wiki/CMU_Sphinx
I use only computer dictation for medical notes by KWTm · 2008-01-19 21:59 · Score: 2, Informative

At my office, we use a computer dictation system for medical notes. It is amazingly accurate for those who speak with accents within the norm. It works well for me, and I will typically dictate something like this:
"The patient presents today with three complaints comma as follows colon new paragraph For the past week comma he has had right shoulder pain period new paragraph He has noticed that when he sneezes comma there are streaks of blood in his mucus period new paragraph He has been experiencing diaphoresis and is concerned that it may be related to his systemic lupus erythematosus for which he has been taking prednisone twenty milligrams q h s."
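
To illustrate how spoken tokens such as "comma" and "new paragraph" could end up as formatted text, here is a rough sketch. It is not how the product mentioned in this comment works; the token table and the `render_dictation` function are assumptions made for the example.

```python
# Assumed mapping from spoken punctuation words to symbols.
PUNCTUATION = {"comma": ",", "period": ".", "colon": ":"}


def render_dictation(spoken):
    """Turn a dictated word sequence into punctuated text."""
    tokens = spoken.split()
    text = ""
    i = 0
    while i < len(tokens):
        if tokens[i:i + 2] == ["new", "paragraph"]:
            # Two-word command: end the current paragraph.
            text = text.rstrip() + "\n\n"
            i += 2
        elif tokens[i] in PUNCTUATION:
            # Attach punctuation directly to the previous word.
            text = text.rstrip() + PUNCTUATION[tokens[i]]
            i += 1
        else:
            # Ordinary word: separate with a space unless starting a paragraph.
            if text and not text.endswith("\n"):
                text += " "
            text += tokens[i]
            i += 1
    return text


print(render_dictation(
    "as follows colon new paragraph For the past week comma "
    "he has had right shoulder pain period"
))
```

A sketch this simple cannot tell the command "period" from the ordinary word "period"; a real system would need context to decide.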

I think the software is called "Enterprise Dictation System"; requires Internet Explorer, although there must be some component that's pushed out locally to the client since I can't imagine the sound data being sent over intranet to be interpreted. I dictate in chunks, and apparently the longer the chunk the more easily it can interpret what I say. For example, if I just dictate "to", then it may transcribe "to", "2", "two", or "too". If I say "to prevent this comma", then it knows that the first word should be spelled "to".
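
The "to" versus "too" effect can be sketched with a toy context score. The numbers below are invented purely for illustration, and `pick_spelling` is a hypothetical helper; a real recognizer would estimate such scores from a large text corpus and use far more context.

```python
HOMOPHONES = ["to", "2", "two", "too"]

# Made-up scores for how plausible each spelling is before a given next word.
NEXT_WORD_SCORE = {
    ("to", "prevent"): 0.9,
    ("two", "prevent"): 0.02,
    ("too", "prevent"): 0.01,
    ("2", "prevent"): 0.01,
}


def pick_spelling(candidates, next_word=None):
    """Return the spellings that remain plausible given the following word."""
    if next_word is None:
        # No context: every spelling of the sound is equally possible.
        return list(candidates)
    # With context, keep only the best-scoring spelling.
    best = max(candidates, key=lambda w: NEXT_WORD_SCORE.get((w, next_word), 0.0))
    return [best]


print(pick_spelling(HOMOPHONES))
print(pick_spelling(HOMOPHONES, "prevent"))
```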

It's surprisingly accurate, and is more accurate for esoteric medical terms than for common short words, since for medical terms there is a relatively limited number of possibilities relative to the number of syllables.

For some colleagues who speak with foreign accents --and even for certain colleagues who seem to speak with standard local accents-- recognition is quite poor, and they fall back on human transcription.

Anyway, just wanted to share this experience. I was quite amazed at how well the dictation worked.

Here's hoping we can build up a good Open Source/Free database of voice recognition data. Or at least, perhaps an Open Source engine, and then different companies can market their voice data.

--
404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
[GPG key in journal]
Re:Open Source, or Microsoft-Owned? by bedahr · 2008-01-19 23:07 · Score: 2, Informative

This software's license most obviously violates requirements 1, 2, and 3. These are perhaps the most important provisions of the definition and form the basis for the power of calling a license an open source license. By not adhering to this definition when calling licenses and software "Open Source" you dilute the power the terms carries. Simply calling something 'open source' because they allow you to look at the source code is something we should avoid because 'Open Source' requires freedom not just source.

Simon does not violate this description in ANY way.

HTK is not redistributed with simon so simon itself complies exactly with what you are writing.

Simon does not depend on the HTK toolkit. It simply uses it to compile / maintain the model. If you have compiled the model already (simon explicitly asks if you have done that already when starting the first time) you can specify the path to it.

Simon will then just use the model and can still start programs, type text, etc.

There is absolutely no need for the HTK toolkit. Simon is also useful without it.

Is e.g. X.org not open source because it has the means to put non-free software to use to make it even more powerful? (e.g. the nvidia driver)

Simon itself is licensed under the GPLv3.
-- bedahr