Open Source Speech Recognition

Posted by ryuzaki0 on Saturday January 19, 2008 @04:14AM from the hello-computer dept.

bedahr writes "The first version of the open source speech recognition suite simon has been released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from Wiktionary (a subproject of Wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."
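In the HTK dictionary format mentioned in the summary, each line holds one pronunciation: the word, an optional output symbol in square brackets, an optional pronunciation probability, and then the phone sequence. Below is a minimal Python sketch of reading such a file. It assumes whitespace-separated fields and ignores HTK's quoting rules. It is not simon's actual importer, and the function name `parse_htk_dict` and the phone symbols in the sample are hypothetical.

```python
import re
from collections import defaultdict

# A plain decimal number such as "0.8" or "1"; used to recognise the
# optional pronunciation-probability field.
_PROB = re.compile(r"^\d+(?:\.\d+)?$")


def parse_htk_dict(lines):
    """Parse HTK-style dictionary lines (hypothetical helper).

    Returns {word: [(outsym, prob, phones), ...]}; a word may have several
    pronunciations. Simplified: fields are whitespace-separated and HTK's
    quoting/escaping rules are not handled.
    """
    entries = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        word, rest = fields[0], fields[1:]
        outsym = None  # None means "no explicit output symbol"
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            outsym = rest[0][1:-1]
            rest = rest[1:]
        prob = 1.0  # default when no probability is given
        if rest and _PROB.match(rest[0]):
            prob = float(rest[0])
            rest = rest[1:]
        entries[word].append((outsym, prob, rest))
    return dict(entries)


sample = [
    "HELLO  [HELLO]  hh ah l ow",
    "HELLO  hh eh l ow",
    "COMPUTER  k ah m p y uw t er",
]
lexicon = parse_htk_dict(sample)
print(lexicon["HELLO"])
```

Keeping a list per word matters because a dictionary may give several pronunciations for the same word, as HELLO does in the sample.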

6 of 140 comments

been playing with it by primadd · 2008-01-19 04:20 · Score: 5, Interesting

I once used Julius for a small project involving voice recognition. While it was not perfect, I was quite impressed by the engine's results. It was quite fun to control the lights and TV with shouted commands, though once or twice a movie actually triggered "lights off".

1. Re:been playing with it by bedahr · 2008-01-19 05:10 · Score: 2, Interesting
  
  Actually, you don't need to get your hands dirty writing your own grammar. Simon includes a complete grammar module with ways to compile the grammar, edit the sentence structures, import them from written texts (by looking the words up in the dictionary), etc.
  
  -- bedahr
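bedahr mentions importing sentence structures from written texts by looking the words up in the dictionary. One way this could work is sketched below, assuming every dictionary word carries a grammar category. The category names and the `sentence_structures` helper are hypothetical, not simon's API. The sketch splits the text into sentences, replaces each word with its category, and keeps only sentences whose words are all known.

```python
import re


def sentence_structures(text, categories):
    """Derive grammar structures (category sequences) from free text.

    `categories` maps a lowercase word to its grammar category, a
    hypothetical stand-in for simon's dictionary lookup. Sentences with any
    word missing from `categories` are skipped, because no structure can be
    built for them. Duplicate structures are reported once, in order of
    first appearance.
    """
    structures = []
    # Naive sentence split on terminal punctuation.
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[\w']+", sentence.lower())
        if not words or not all(w in categories for w in words):
            continue
        structure = " ".join(categories[w] for w in words)
        if structure not in structures:
            structures.append(structure)
    return structures


vocab = {
    "switch": "Command",
    "lights": "Device",
    "tv": "Device",
    "on": "State",
    "off": "State",
}
print(sentence_structures("Switch lights off. Switch TV on! Make coffee.", vocab))
```

Both commands map to the same structure, so one grammar rule accepts either of them. "Make coffee" is dropped because neither word is in the vocabulary.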
Wiktionary != Wikipedia by Anonymous Coward · 2008-01-19 04:51 · Score: 4, Interesting

Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They are not only distinct but also, as far as their status within the foundation's hierarchy of projects is concerned, fully equal: neither is a sub-project of the other, and neither is more important.

I would've expected that kind of sloppiness from The Register, but not from Slashdot (yeah, I know, I must be new here...)
Uses in Telephony by Anonymous Coward · 2008-01-19 05:03 · Score: 2, Interesting

This could be very useful in projects like FreeSWITCH, an open source project for building telephony applications. More info at http://www.freeswitch.org/
Open Source, or Microsoft-Owned? by kripkenstein · 2008-01-19 06:00 · Score: 4, Interesting

Cue the obligatory "let's set so double the killer delete select all." :) Speaking of Microsoft, according to HTK's FAQ:
HTK was originally developed at the Cambridge University Engineering Department (CUED). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. [...] Microsoft retains the copyright to the existing HTK code
[...]
you are not allowed to redistribute (parts of) HTK3

In other words, HTK, a critical part of the 'Simon' project, is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder whether Simon is violating the license.
This is not about dictation software by idji · 2008-01-19 08:05 · Score: 5, Interesting

Many people think that "speech recognition software" = "dictation software", as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition, and a personal one at that; it is the only one most people ever come across. Other applications include media transcription (closed captioning), media mining ("What did Obama say about the prime mortgage market this week?"), call-center monitoring (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), and aligning audio to human transcriptions (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings; think of any event that is transcribed, like UN conferences). Don't you think CNN, the BBC or any national film archive would be interested in searching through their millions of hours of recorded footage?

Now you tell me: is the holy grail of speech recognition "HAL, please close the hatch", "Dear Mom, we are having a lovely time here...", hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language? Dictation software is about the only speech recognition application that can be sold to the masses; all the rest is still largely below the horizon...