Slashdot Mirror


Open Source Speech Recognition

bedahr writes "The first version of the open source speech recognition suite simon was released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

22 of 140 comments

  1. been playing with it by primadd · · Score: 5, Interesting

    I did use julius for a small project utilizing voice recognition once. While not perfect, I was quite impressed by the results of the engine. Quite fun to control the light and TV with shouted commands, though once or twice a movie actually triggered "lights off".

    --
    webmasters: personalized bookmarking [primadd.net] scripts for your site
    wp and phpbb plugin available

    1. Re:been playing with it by Anonymous Coward · · Score: 5, Insightful

      You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.

    2. Re:been playing with it by Anonymous Coward · · Score: 5, Funny

      Not perfect? Like, if you say "Open the pod bay doors, HAL," it'll say "I'm sorry, Dave, but I can't seem to do that," and try to kill you (even though your name is Steve)?

    3. Re:been playing with it by Woldry · · Score: 4, Funny

      "not watching star trek" --- wait, I don't follow.

      --
      How can a post be modded "overrated" or "underrated" when it hasn't been rated yet?
    4. Re:been playing with it by bedahr · · Score: 5, Informative

      This is actually what simon does: the magic keyword is "simon". "simon Firefox" for example. -- bedahr
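
      The keyword-prefix idea discussed in this subthread can be sketched in a few lines. This is only an illustration of the gating logic, not simon's actual code: the function name extract_command and the way punctuation is stripped are assumptions, and the only detail taken from the thread is the keyword "simon".

```python
from typing import Optional

WAKE_WORD = "simon"  # the keyword named in the thread; everything else is hypothetical


def extract_command(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the command that follows the wake word, or None if the gate rejects it."""
    words = utterance.strip().split()
    # Accept only utterances that start with the wake word and have something after it.
    if len(words) < 2 or words[0].lower().rstrip(":,") != wake_word:
        return None
    return " ".join(words[1:])


print(extract_command("simon Firefox"))  # accepted: the command is "Firefox"
print(extract_command("lights off"))     # rejected: no wake word, so None
```

      Anything the recognizer hears without the leading keyword, such as movie dialogue, is dropped before it reaches the command handler.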

  2. Are they productive? by bogaboga · · Score: 3, Insightful

    In my experience, I have not found speech recognition engines/software that productive. Too many errors and a slow [and steep] "learning" curve for the engine. I will have to be convinced that this simon thing is any different for me to give it a spin.

    1. Re:Are they productive? by Yvanhoe · · Score: 3, Insightful

      I doubt that speech recognition is ready to be used as an alternative to keyboards to type text, but I think it can become, after the keyboard and the mouse, a third input device that would boost the productivity of a computer user.

      --
      The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    2. Re:Are they productive? by Sanat · · Score: 4, Funny

      Dear aunt, let's set so double the killer delete select all

      --
      And in the end, the love you take is equal to the love you make
    3. Re:Are they productive? by Instine · · Score: 5, Insightful

      Nearly five years ago I used to help a guy who had no useful movement in his limbs. He could use a mouth stick to type and control the cursor. However, he also used Dragon Dictate. His machine was old 7 years ago, and here's the amazing bit (to me at least): his speech was pretty garbled from his condition. Most humans found it very hard to understand him, yet the dictation software did a pretty good job. He wrote an entire screenplay (later commissioned by the BBC) and was a lawyer with his own practice (it may sound like it, but I'm not making this up). His success with this tech was probably what got me into assistive tech (now my job).

      So how much it improves your productivity depends on who you are.

      --
      Because you can - or because you should?
    4. Re:Are they productive? by bedahr · · Score: 3, Informative

      You might want to have a look at the voxforge project.

      And this doesn't require changes in the algorithm - just in the model.

      -- bedahr

  3. Which languages are supported? by r_jensen11 · · Score: 3, Insightful

    That's great and all, but which languages are supported? I hope it's more than just English.

    1. Re:Which languages are supported? by R.Mo_Robert · · Score: 4, Informative

      If you follow the link to the SourceForge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.

      --
      R.Mo
  4. Aisle of it by ZeroFactorial · · Score: 5, Funny

    Eye musing i trite now two poster slashed hot. It saw grate pro gram!

  5. Wiktionary != Wikipedia by Anonymous Coward · · Score: 4, Interesting

    Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.

    I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)

    1. Re:Wiktionary != Wikipedia by kryten_nl · · Score: 4, Funny

      Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.[Citation needed]
      --
      For the perfect anti-Unix, write an OS that thinks it knows what you're doing better than you do and let it be wrong.
  6. Pedant's Revolt by jrothwell97 · · Score: 4, Informative

    Simon can import dictionaries directly from wiktionary (a subproject of wikipedia)

    No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.

    However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.

    --
    Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
  7. Re:Project's webpage in English? by bedahr · · Score: 5, Informative

    We are sorry that there is no international homepage for this yet.

    BUT: you are strongly encouraged to contact me with any questions: grasch < at > simon-listens.org

    -- Peter

  8. Open Source, or Microsoft-Owned? by kripkenstein · · Score: 4, Interesting

    Cue the obligatory "let's set so double the killer delete select all". :) Speaking of Microsoft, according to HTK's FAQ:

    HTK was originally developed at the Cambridge University Engineering Department (CUED). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. [...] Microsoft retains the copyright to the existing HTK code
    [...]
    you are not allowed to redistribute (parts of) HTK3

    In other words, HTK - a critical part of the 'Simon' project - is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder if Simon is not in violation of the license.
    1. Re:Open Source, or Microsoft-Owned? by bedahr · · Score: 5, Informative

      Simon is in no way connected to Microsoft.

      Simon does NOT contain the HTK toolkit - it merely executes commands.

      HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".

      We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.
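
      A rough sketch of what "not packaged, merely executed" can look like in practice: the program checks at run time whether the separately installed tools are on the PATH. This is not simon's code. The tool names below are real HTK executables, but which of them simon calls is an assumption, and find_missing_tools is a hypothetical helper.

```python
import shutil
from typing import Iterable, List

# Illustrative list only: these are HTK command-line tools, but which ones simon
# actually invokes is an assumption, not something stated in this thread.
ASSUMED_HTK_TOOLS = ["HCopy", "HCompV", "HERest", "HHEd", "HVite"]


def find_missing_tools(names: Iterable[str]) -> List[str]:
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = find_missing_tools(ASSUMED_HTK_TOOLS)
if missing:
    print("Model editing disabled; install HTK yourself. Missing:", ", ".join(missing))
else:
    print("All assumed HTK tools found on PATH.")
```

      Nothing from HTK is shipped this way; the user downloads it under its own license and the program only looks for it.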

      It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?

      -- bedahr

  9. For those not familiar with this meme by CaptainPinko · · Score: 3, Informative

    Basically it comes from a live voice recognition demo from Microsoft for their feature in Vista. Yes, I had to look this up myself.

    --
    Your CPU is not doing anything else, at least do something.
  10. filthy open-source by jumbolo · · Score: 4, Informative

    simon is open source.
    julius is open source.
    htk is *NOT* open source.

    The latter is a micro$oft by-product, as clearly shown by the license that you have to first agree with and then send your email to them in order to download the tarballs...

    myself never done this since 1995.

  11. This is not about dictation software by idji · · Score: 5, Interesting

    Many people think that "Speech recognition software" = "dictation software" - as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition - and a personal application at that - which is the only thing most people come across. Other applications are media transcription (closed captioning), media mining ("What did Obama say about the prime mortgage market this week?"), telephone call center controlling (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), aligning audio to human transcription (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings - think of any event that is transcribed, like UN conferences), etc. Don't you think CNN, BBC or any national film archive would be interested in searching through their millions of hours of recorded footage? Now you tell me - do you think that the holy grail of speech recognition is "HAL - please close the hatch", "Dear Mom, we are having a lovely time here..." or hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language? Dictation software is about the only speech-reco application that can be sold to the masses - all the rest is still fairly much below the horizon...
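
    The archive-indexing use case above can be illustrated with a toy keyword index. This is a minimal sketch under stated assumptions: the recognizer is assumed to have already produced timestamped text segments, the sample segments are made up, and build_keyword_index is a hypothetical name rather than part of any real product.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical recognizer output: (start time in seconds, recognized text).
Segment = Tuple[float, str]


def build_keyword_index(segments: Iterable[Segment]) -> Dict[str, List[float]]:
    """Map each lowercased word to the start times of the segments containing it."""
    index: Dict[str, List[float]] = defaultdict(list)
    for start, text in segments:
        # set() so a word repeated within one segment is recorded once.
        for word in set(re.findall(r"[a-z']+", text.lower())):
            index[word].append(start)
    return dict(index)


# Made-up sample data for illustration only.
segments = [
    (0.0, "Good evening, here is the news"),
    (12.5, "The mortgage market fell again this week"),
    (47.0, "Analysts expect the market to recover"),
]
index = build_keyword_index(segments)
print(index["market"])  # start times of the segments that mention "market"
```

    Searching the archive then becomes a dictionary lookup that returns the positions to jump to in the recording; recognition errors limit recall, but no dictation-grade accuracy is needed.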