Slashdot Mirror


Open Source Speech Recognition

bedahr writes "The first version of the open source speech recognition suite simon was released. It uses the Julius large vocabulary continuous speech recognition engine to do the actual recognition and the HTK toolkit to maintain the language model. These components are united under an easy-to-use graphical user interface. Simon can import dictionaries directly from wiktionary (a subproject of wikipedia) or from files formatted in the HADIFIX or HTK format, and grammar structures directly from personal texts. It also provides means to train the language model with new samples and add new words."

140 comments

  1. been playing with it by primadd · · Score: 5, Interesting

    I used Julius for a small project utilizing voice recognition once. While it was not perfect, I was quite impressed by the results of the engine. It was quite fun to control the light and TV with shouted commands, though once or twice a movie actually triggered "lights off".

    --
    webmasters: personalized bookmarking [primadd.net] scripts for your site
    wp and phpbb plugin available

    1. Re:been playing with it by Anonymous Coward · · Score: 5, Insightful

      You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.

    2. Re:been playing with it by Anonymous Coward · · Score: 2, Funny

      Only if you are not watching star trek!

    3. Re:been playing with it by hoppo · · Score: 1

      Interesting. Does it come with a pre-trained model? Are you able to train it more fully?

    4. Re:been playing with it by primadd · · Score: 1

      There is one premade for Japanese and one for English. Neither is that big, though the English one is lacking even more. You can write your own grammar (see http://julius.sourceforge.jp/en/grammar.html), and for a limited set of commands that is the best way to go.

      --
      webmasters: personalized bookmarking scripts for your site
      wp and phpbb plugin available
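
      As a rough illustration of why a hand-written grammar suits a limited command set: a small grammar only admits a handful of sentences, so the recognizer has very little to confuse. The sketch below is not Julius's grammar format; the category names, words, and the function allowed_sentences are all made up for illustration.

```python
from itertools import product

# Hypothetical word categories for a small home-control setup.
VOCABULARY = {
    "DEVICE": ["lights", "tv"],
    "STATE": ["on", "off"],
}

# Each sentence structure is a sequence of category names.
STRUCTURES = [("DEVICE", "STATE")]


def allowed_sentences(vocabulary, structures):
    """Enumerate every sentence the grammar accepts."""
    sentences = set()
    for structure in structures:
        # Take one word from each category, in order.
        for words in product(*(vocabulary[category] for category in structure)):
            sentences.add(" ".join(words))
    return sentences


if __name__ == "__main__":
    for sentence in sorted(allowed_sentences(VOCABULARY, STRUCTURES)):
        print(sentence)
```

      With two devices and two states the grammar accepts exactly four sentences; anything else is simply outside what the recognizer is asked to consider.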

    5. Re:been playing with it by Anonymous Coward · · Score: 5, Funny

      Not perfect? Like, if you say "Open the pod bay doors, HAL," it'll say "I'm sorry, Dave, but I can't seem to do that," and try to kill you (even though your name is Steve)?

    6. Re:been playing with it by Woldry · · Score: 4, Funny

      "not watching star trek" --- wait, I don't follow.

      --
      How can a post be modded "overrated" or "underrated" when it hasn't been rated yet?
    7. Re:been playing with it by bedahr · · Score: 5, Informative

      This is actually what the simon approach does: the magic keyword is "simon". "simon Firefox", for example. -- bedahr
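
      The keyword-prefix idea from this thread can be sketched as a filter over recognized text. This is only a minimal sketch, assuming the recognizer hands back a plain string; extract_command is a hypothetical helper, not part of simon or Julius.

```python
def extract_command(transcript, keyword="simon"):
    """Return the command that follows the keyword, or None if it is absent."""
    words = transcript.strip().split()
    if not words:
        return None
    # Tolerate punctuation after the keyword, as in "Computer: Lights off".
    first = words[0].rstrip(":,").lower()
    if first != keyword.lower():
        return None
    command = " ".join(words[1:])
    return command or None


if __name__ == "__main__":
    print(extract_command("simon Firefox"))
    print(extract_command("Computer: Lights off", keyword="computer"))
    print(extract_command("lights off"))
```

      Anything that does not start with the keyword is ignored, which is what keeps a stray "lights off" from a movie from doing anything.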

    8. Re:been playing with it by bedahr · · Score: 2, Interesting

      Actually you don't need to get your hands dirty for writing your own grammar. Simon includes a complete grammar module with ways to compile the grammar, edit the sentence structures, import them from written texts (by looking the words up in the dictionary), etc.

      -- bedahr

    9. Re:been playing with it by Anonymous Coward · · Score: 0

      In some of Larry Niven's books (e.g. The Integral Trees), voice recognition was triggered with the word "prikazyvat"; which is Russian for "command".

      Presumably, "prikazyvat" is a word that doesn't come up much in everyday English speech, so it makes a good delimiter.

    10. Re:been playing with it by cleatsupkeep · · Score: 1

      What if we don't know how to pronounce it? :-).

    11. Re:been playing with it by Marcos+Eliziario · · Score: 1

      Provided you mis-pronounce it always the same way, I think it's not much of a problem ;-)

      --
      Your ad could be here!
    12. Re:been playing with it by DMUTPeregrine · · Score: 2, Funny

      "Not watching star trek" N. Watching Babylon 5.

      --
      Not a sentence!
    13. Re:been playing with it by iapetus · · Score: 1

      Although, of course, "prikazyvat" actually means "to order", so wouldn't make much sense. I prefer the HK-47 approach, using the noun form "prikaz", meaning "order".

      Prikaz: Start Firefox.
      Prikaz: Open new tab.
      Zamyechaniye: It would be quicker to use a mouse.

      --
      ++ Say to Elrond "Hello.".
      Elrond says "No.". Elrond gives you some lunch.
    14. Re:been playing with it by owlstead · · Score: 1

      Of course, you should use "illuminate", in which case only one movie will trigger the light switch :)

    15. Re:been playing with it by samkass · · Score: 1

      Carnegie Mellon open-sourced Sphinx years ago: http://cmusphinx.sourceforge.net/html/cmusphinx.php

      It's a speaker-independent, continuous speech recognizer that can be configured to do everything from simple commands to full-text dictation. It's not Dragon's stuff, but it's pretty good.

      They even have a pure Java version of it: http://cmusphinx.sourceforge.net/sphinx4/

      --
      E pluribus unum
    16. Re:been playing with it by cp.tar · · Score: 1

      You might want to do what they do in Star Trek and put a word in front of every command. Something like "Computer: Lights off" will reduce the chance that some random sentence from the TV will trigger the command. Unless you're watching Star Trek, of course.

      My Mac does that as well, though the one time I tried it, I did not have much success. Maybe because I had a cold, maybe because I'm not a native speaker, so my fancy Mac hates my Slavic accent.

      But it's quite a nice thing anyway.

      --
      Ignore this signature. By order.
    17. Re:been playing with it by Anonymous Coward · · Score: 0

      So... could you theoretically teach it to run commands upon hearing phrases like, "This is the police!"?

    18. Re:been playing with it by irtza · · Score: 0

      Then what goes in the PIP?

      --
      When all else fails, try.
    19. Re:been playing with it by bedahr · · Score: 1

      Yes that would be the type of thing that can be done already quite reliably...

      -- bedahr

    20. Re:been playing with it by Antique+Geekmeister · · Score: 0, Offtopic

      Don't you mean "not watching Star Trek: Deep Space Nine" and "watching the original material they actually stole it from"? When I look at the pilots for Babylon 5 and the major plot lines the director had already included, and compare them to the later-produced but better-funded, and thus first to be seen, Deep Space Nine, the similarity is so striking that it implies theft by Paramount of the director's ideas.

      This sort of thing is also unfortunately common for new directors and authors when they pitch their ideas to a major studio. It also happens quite a deal in software, where a market niche is filled by a big company before a small company can get their ideas to market, despite the NDA's signed by both when discussing the concept.

    21. Re:been playing with it by deander2 · · Score: 1

      nitpick: it's "i'm sorry dave, i'm afraid i can't do that"

      yes, i'm a nerd. =P

    22. Re:been playing with it by Vspirit · · Score: 1

      exactly my sentiment.
      I have been exploring the voice computer control,
      and having a name called before a direct command,
      helps the command being directed properly.

      I was using computer at first,
      then next I evolved the process and
      I used the name of the computer,
      say "lisa, open do that"
      combining naming and semantics.

      I started using "dragon naturally speaking"
      back in the nineties, and language control,
      is not a substitute for keyboard and mouse
      input control, but an additional control,
      suitable in context.

      with an open project like mikeypedia..
      where people from all over the world
      put in small contributions, for various
      languages we may be able to actually
      build something useful.

      work is being done at Aalborg University,
      in Denmark, to enhance the quality of speech
      recognition, and I hope one day I can say:
      "lisa, tell to ".

      full stop.

    23. Re:been playing with it by Vspirit · · Score: 1

      ups, slashdot parser removed a few tidbits.
      I used greater than and lesser than symbols,
      but they and content in between was removed.

      to demonstrate. say is .&gt
      the command desirable would be:
      "lisa, &lt.task1.&gt &lt.device.&gt to &lt.task2.&gt &lt.identity.&gt";

      where
      task = tell,
      device = human||animal||computer,
      task2 = do_something,
      identity = me_registered||somebody_registered||computer_registered

      task management completed.

      Love to see it come true.

      C.
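
      The command template described above could be parsed along these lines once the recognizer has produced text. This is only a sketch under assumptions: the word lists, the identity "bob", and the parse function are hypothetical, and only "tell" and the three device kinds come from the comment itself.

```python
import re

# Hypothetical vocabularies; extend them as needed.
TASKS = {"tell"}
DEVICES = {"human", "animal", "computer"}
IDENTITIES = {"me", "bob"}  # stand-ins for registered identities

# name, task1 device to task2 identity
PATTERN = re.compile(
    r"^(?P<name>\w+),\s+(?P<task1>\w+)\s+(?P<device>\w+)\s+to\s+"
    r"(?P<task2>.+?)\s+(?P<identity>\w+)$"
)


def parse(utterance, name="lisa"):
    """Split an utterance into the template's slots, or return None."""
    match = PATTERN.match(utterance.strip().lower())
    if match is None or match.group("name") != name:
        return None
    slots = match.groupdict()
    if slots["task1"] not in TASKS:
        return None
    if slots["device"] not in DEVICES:
        return None
    if slots["identity"] not in IDENTITIES:
        return None
    return slots


if __name__ == "__main__":
    print(parse("lisa, tell computer to wake bob"))
```

      Utterances that do not start with the computer's name, or that use an unknown task, device, or identity, are rejected rather than guessed at.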

    24. Re:been playing with it by Vspirit · · Score: 1

      just to mention a project that has been
      working on this for a long time and deserves
      recognition:
      http://cmusphinx.sourceforge.net/html/cmusphinx.php

      open source speech recognition project, university grade.

    25. Re:been playing with it by Vspirit · · Score: 1

      this post is simply for the purpose of archiving in context.

      computerized lip reading:
      http://science.slashdot.org/article.pl?sid=08/01/20/0141203

      have a camera add to the technology base,
      improving accurate recognition of speech.

      lips can't lie, unless they tell lies.
      and you get what you give in return.

      C.

    26. Re:been playing with it by Wobble-U · · Score: 1

      I think it's just not a very good recognition engine or it's only made for the American accent. It can't figure out my New Zealand accent either. I keep wanting it to tell me a joke, but I can only sometimes get it to understand "Who's there?".

    27. Re:been playing with it by Anonymous Coward · · Score: 0

      "ups, slashdot parser removed a few tidbits."

      That's what the "Preview" button is for.
      Please use it to preview your posts before submitting them.
      Since you've been posting since at least 2002, you should have known this by now.
      You should also know by now how to use HTML in your posts to produce symbols such as "<".
      You should also know by now that text boxes automatically cause your text to wrap around; there is no need for you to do it manually, and, in fact, it's very annoying to readers when you do so.

      BTW, it's spelled "oops", not "ups", and sentences usually begin with a capital letter.

  2. Are they productive? by bogaboga · · Score: 3, Insightful

    In my experience, I have not found speech recognition engines/software that productive. Too many errors and a slow [and steep] "learning" curve for the engine. I will have to be convinced that this simon thing is any different for me to give it a spin.

    1. Re:Are they productive? by Anonymous Coward · · Score: 2, Insightful

      No doubt you're correct, but it's got to be a boost for anybody who cannot type effectively.

    2. Re:Are they productive? by Anonymous Coward · · Score: 2, Funny

      Hey! Let's leave slashdot out of this.

    3. Re:Are they productive? by Yvanhoe · · Score: 3, Insightful

      I doubt that speech recognition is ready to be used as an alternative to keyboards to type text, but I think it can become, after the keyboard and the mouse, a third input device that would boost the productivity of a computer user.

      --
      The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    4. Re:Are they productive? by b.emile · · Score: 1

      Exactly. I have played with some speech-to-text apps, and it always seems much slower and less accurate than if I were typing. They will have to be a lot more accurate for them to be useful for everyday use.

      --
      this space intentionally left blank
    5. Re:Are they productive? by deanlandolt · · Score: 1

      Sure, unconstrained writing, for now. But there are countless applications with controlled vocabularies that stand to benefit today.

    6. Re:Are they productive? by Sanat · · Score: 4, Funny

      Dear aunt,let's set so double the killer delete select all

      --
      And in the end, the love you take is equal to the love you make
    7. Re:Are they productive? by Instine · · Score: 5, Insightful

      Nearly five years ago I used to help a guy who had no useful movement in his limbs. He could use a mouth stick to type and control the cursor. However, he also used Dragon Dictate. His machine was old 7 years ago, and here's the amazing bit (to me at least): his speech was pretty garbled from his condition. Most humans found it very hard to understand him, yet the dictation software did a pretty good job. He wrote an entire screenplay (later commissioned by the BBC) and was a lawyer with his own practice (it may sound like it, but I'm not making this up). His success with this tech was probably what got me into assistive tech (now my job).

      So it depends on who you are how much it improves your productivity.

      --
      Because you can - or because you should?
    8. Re:Are they productive? by Instine · · Score: 1

      that should read nearly 7 years ago. And no I'm not using speech rec. Just typing quickly and badly ;P

      --
      Because you can - or because you should?
    9. Re:Are they productive? by blahplusplus · · Score: 1

      "So depends who you are on how much it improves you productivity."

      The biggest problem with speech-to-text is simply having to train the engine. I found Dragon NaturallySpeaking 9 not too bad; it's training it to recognize your own unique vocalizations that is the problem. I think speech-to-text and voice recognition is a project that demands Wikipedia-like sourcing of voices in different noisy environments, using millions of samples of people's voices to improve the algorithms. I'm surprised no one at Google has thought of this yet.

    10. Re:Are they productive? by bedahr · · Score: 1

      This is exactly what we are going for.

      Our training persons have spastic disabilities.

      -- bedahr

    11. Re:Are they productive? by bedahr · · Score: 3, Informative

      You might want to have a look at the voxforge project

      And this doesn't require changes in the algorithm - just in the model.

      -- bedahr

    12. Re:Are they productive? by MacarooMac · · Score: 1

      According to the Julius blurb on the acoustic models used, there are currently just two languages available: Japanese and English.

      "Since Julius itself is a language-independent decoding program, you can make a recognizer of a language if given an appropriate language model and acoustic model for the target language. The recognition accuracy largely depends on the models. "

      "We currently have a sample English acoustic model trained from the WSJ database. According to the license of the database, this model *cannot* be used to develop or test products for commercialization, nor can they use it in any commercial product or for any commercial purpose. Also, the performance is not so good"

      --Sounds like you'll be better off using the Japanese acoustic model for now ;)

      --
      "He Who Dares Wins" ...or gets twenty-to-life for totaling their Bimmer on a poodle parade
    13. Re:Are they productive? by Anonymous Coward · · Score: 1, Insightful

      In my experience, it mostly comes down to the quality of the microphone/sound input these days, with modern speech recognition software. Computer mics are atrocious. I got a high-end sound card (just an old but good emu10k1-based SB), a decent studio mike (Studio Projects) and a mic pre-amp, used the line-in input on the card, and get very, very good results. Of course, I paid more for my audio setup alone than many people pay for their PCs these days, especially after the noise-reduced fans, PSU and case for the PC itself. I think it was worth it though. (Posting anonymously because I'm not trying to boast or something; just saying if you want decent computer audio, you still have to pay a bit more (in my case, a bit under EUR1K on the audio side).)

    14. Re:Are they productive? by morgus+morphus · · Score: 1

      Google are working on this: http://blogoscoped.com/archive/2007-12-17-n30.html
      They're using Goog-411 to get voice samples to train their speech recognition engine.

    15. Re:Are they productive? by blahplusplus · · Score: 1

      That's not what I'm talking about; they're going about collecting voices entirely the wrong way. The setup is horrible and, most importantly, they are not set up for user feedback and ratings. In real time, they need something like a "hot or not" for people to input their own voices and recite words, etc., and then a rating system that plays it back for other users (i.e. play a sample of text or a sentence, while converting it to text, and vice versa) and a way for people to rate it. That would be infinitely faster.

    16. Re:Are they productive? by rohan972 · · Score: 1

      From the project website: "The project provides a ready-to-use interface for the julius CSR engine for a handicapped child which is not able to use the keyboard well. It integrates into X11 and Windows." (emphasis mine)

      I'm not that interested in speech-to-text either, but if you can't use a keyboard I imagine it would be a huge benefit.

    17. Re:Are they productive? by leenks · · Score: 1

      And people would do this because ... ? Their current scheme is good because people want to use that kind of service, and ultimately pay for it (which I guess allows them to pay someone to transcribe the text). This is much more valuable because the recordings will contain proper, off-the-cuff conversational speech (if a little contrived because of the circumstance). Models trained on call-home data etc invariably fail when given real world tasks, so hopefully this would work well.

    18. Re:Are they productive? by blahplusplus · · Score: 1

      Have you used Dragon NaturallySpeaking 9? I mean something like that, except one whose interface is opened up to the public, i.e. a dictation/word application, etc. You can 'get' conversation from dictation: when I'm speaking into DNS9 I'm speaking into it like I'm conversing with someone. I imagine you could get 90% of what you need out of something like that.

      Also, if they needed real conversation I'm certain they could do a lot better than what they're doing (e.g. partner with other call centers, etc.)

      The real problem with doing it in-house is that there is no feedback on where you're going wrong from the user's perspective, which is critically important IMHO. Ultimately the real test is when your speech recognition engine can recognize slurs of speech and also recognize the individual differences and subtleties in voices that throw off recognition, and for that you need a decent sample base as well, in many varying environments.

      I'd like to see speech recognition get to the point where it can pick out and decode the correct voice in a room of conversation.

  3. sync? by Anonymous Coward · · Score: 0

    So, when can I use it in my Ford Focus instead of Sync? :-)

    1. Re:sync? by ben(zen) · · Score: 1

      Probably once it's in beta. Seriously, I'm not sure if I want Microsoft running any part of my car.

  4. Double the Killer by rxmd · · Score: 1

    Cue the obligatory lets set so double the killer delete select all. :)

    --
    As a state gets corrupt, its laws multiply; the most corrupt states have the most numerous laws. (Tacitus, Annales 3:27)
    1. Re:Double the Killer by $RANDOMLUSER · · Score: 1

      Cue the obligatory lets set so double the killer delete select all.
      Hey! It's hard to wreck a nice beach!
      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
  5. Which languages are supported? by r_jensen11 · · Score: 3, Insightful

    That's great and all, but which languages are supported? I hope it's more than just English

    1. Re:Which languages are supported? by bedahr · · Score: 1

      The language has nothing to do with the software.

      But it has everything to do with the model. You'd just need, for example, an Italian language model. (Sure, the UI /should/ probably be translated as well, but that has nothing to do with the recognition.)

      Simon doesn't even include a language model - it does, however include the means to create one.

      -- Peter

    2. Re:Which languages are supported? by dotancohen · · Score: 1

      I hope it's more than just Lojban grammar with English words.

      --
      It is dangerous to be right when the government is wrong.
    3. Re:Which languages are supported? by R.Mo_Robert · · Score: 4, Informative

      If you follow the link to the Sourceforge project and look at any of the screenshots (including the one on the front page--at the time when I visited it, anyway), you'll see that they're actually training the software with German. So, it looks like the answer to your question is, yes, it supports more than English.

      --
      R.Mo
  6. Open Source? by kylegordon · · Score: 1, Insightful

    If this is the first, what was Sphinx then?

    1. Re:Open Source? by kylegordon · · Score: 1

      Ooops, I'll learn to read stories properly one day :-)

    2. Re:Open Source? by tg2k · · Score: 1

      Maybe we need to worry more about text-to-brain than speech-to-text...all it says is that it's the first version of simon. It doesn't say it's the first open source speech-to-text project.

    3. Re:Open Source? by bedahr · · Score: 1

      Sphinx is just an engine - isn't it?

      Simon takes the julius engine and uses the recognition results to do something useful.

      Please take a look at the screenshots at the sourceforge page (mentioned in the article).

      -- bedahr

    4. Re:Open Source? by kylegordon · · Score: 1

      Yep, see the above comment that I made just minutes after the previous one :-)

    5. Re:Open Source? by tg2k · · Score: 1

      I did...as soon as I posted my own. Yours was posted in the couple minutes it took to post my own (I think the time on mine was two minutes after your response to yourself).

  7. Aisle of it by ZeroFactorial · · Score: 5, Funny

    Eye musing i trite now two poster slashed hot. It saw grate pro gram!

    1. Re:Aisle of it by dotancohen · · Score: 2, Funny

      You know, I actually read that in Festival's voice!

      --
      It is dangerous to be right when the government is wrong.
    2. Re:Aisle of it by History's+Coming+To · · Score: 1

      I live in the flat above the office where Festival was developed.

      I have no other relevant information... it's just that that is the best bit of geek-name-dropping I have...

      --
      Please consider this account deleted, I just can't be bothered with the spam anymore.
    3. Re:Aisle of it by dotancohen · · Score: 1

      Unfortunately, I live in a flat above an everlasting festival of university students. I've learned to hate types of music that I had never before imagined exist.

      --
      It is dangerous to be right when the government is wrong.
  8. Wiktionary != Wikipedia by Anonymous Coward · · Score: 4, Interesting

    Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.

    I would've expected that kind of sloppiness on the Register, but not on Slashdot (yeah, I know, I must be new here...)

    1. Re:Wiktionary != Wikipedia by Anonymous Coward · · Score: 1, Informative

      Although you are correct that Wikipedia and Wiktionary are equal in importance to Wikimedia, you must acknowledge that Wikipedia is the most well-known and talked-about project. Therefore have a little grace with people who accidentally think or say that Wikipedia is the mother organization rather than Wikimedia. No need to be overly pedantic.

    2. Re:Wiktionary != Wikipedia by xtracto · · Score: 1

      No need to be overly pedantic.

      Hey this is slashdot... pedantry is the base of most of the discussions here...

      you must be new here uh?

      --
      Ubuntu is an African word meaning 'I can't configure Debian'
    3. Re:Wiktionary != Wikipedia by kryten_nl · · Score: 4, Funny

      Contrary to what the summary claims, Wiktionary is NOT a sub-project of Wikipedia; rather, both Wiktionary and Wikipedia are projects of the Wikimedia Foundation. They're not only distinct but also - as far as their status within the foundation's hierarchy of projects is concerned - totally equal, with none being a sub-project of or more important than the other.[Citation needed]
      --
      For the perfect anti-Unix, write an OS that thinks it knows what you're doing better than you do and let it be wrong.
    4. Re:Wiktionary != Wikipedia by Anonymous Coward · · Score: 0

      We need a +1 pedantic option.

    5. Re:Wiktionary != Wikipedia by Anonymous Coward · · Score: 0

      Anon is forever.

    6. Re:Wiktionary != Wikipedia by Blakey+Rat · · Score: 1

      On behalf of virtually everyone, I'd just like to say: Who the hell gives a crap?

      Geez, if you can't cope with the fact that somebody *slightly* and *accidentally* misrepresented which "project" wiktionary is, you really need to step back and examine your life. Get over yourself.

  9. Pedant's Revolt by jrothwell97 · · Score: 4, Informative

    Simon can import dictionaries directly from wiktionary (a subproject of wikipedia)

    No it's not - Wiktionary is a sister project of Wikipedia. Not a subproject.

    However, I must concur that in my experience speech recognition has been extremely patchy. While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc), dictation tends to be pretty rubbish. Especially when you're demonstrating the new speech recognition abilities in Windows Vista and just happen to work for Microsoft. And be in a loud, echoey expo hall. And using a dodgy mike.

    --
    Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
    1. Re:Pedant's Revolt by RAMMS+EIN · · Score: 1

      ``While using it to issue voice commands is OK (and can be a real time-saver as it avoids going into Start, /Applications, Programs menu etc)''

      What?

      Oh, ah, yeah. Sorry, I've been away from Windows for a _long_ time. This is yet another one of those things that work great when you only have a few items, but really not that well anymore when the lists get longer. Like the task bar...by the time you have more than a few windows open, there isn't space anymore for the text.

      Really, there are better ways. Speech commands are one, which solve the problem in basically the same way as the command line, which is what I use. It just puts everything into a single list. No hierarchy to navigate. Actually, nothing to navigate at all; you aren't selecting an item from a list you are offered, you are just telling the computer what you want. The downside of this is, of course, lack of discoverability. How do you find out there is a program called Ekiga in the first place? (And, when using speech recognition, how should you pronounce it?)

      Apple's (NEXTSTEP's, really) dock is another solution. Single, flat list, everything immediately accessible. This system will eventually run into problems when the list gets too large, but the fact that items are distinguished by their (large enough to actually be distinguishable) icons makes it a bit better than a system based on text (which is long in one dimension). Compared to commands, it has the advantage that it's discoverable, and the disadvantage that it doesn't scale. I can fit about 40 icons along the bottom of my screen at usable size, whereas the shell lets me easily choose from over 2400 commands. But then again, most of the shell commands can't really sensibly be put in icons, and of those that can, I wouldn't be surprised if the number I used regularly were less than 40.

      --
      Please correct me if I got my facts wrong.
    2. Re:Pedant's Revolt by jrothwell97 · · Score: 1

      Apple's (NEXTSTEP's, really) dock is another solution. Single, flat list, everything immediately accessible. This system will eventually run into problems when the list gets too large, but the fact that items are distinguished by their (large enough to actually be distinguishable) icons makes it a bit better than a system based on text (which is long in one dimension). Compared to commands, it has the advantage that it's discoverable, and the disadvantage that it doesn't scale. I can fit about 40 icons along the bottom of my screen at usable size, whereas the shell lets me easily choose from over 2400 commands. But then again, most of the shell commands can't really sensibly be put in icons, and of those that can, I wouldn't be surprised if the number I used regularly were less than 40.

      Another disadvantage of using a Dock is that you have to choose what to put on there. For example, I have 15 application icons for the Finder, Safari, Mail, iChat, Skype, Address Book, iCal, iTunes, iPhoto, iMovie, TextEdit, Keynote, Pages, Numbers and System Preferences, along with icons for my external disk (the 'ANNEXE'), Documents, Pictures, Movies and the Trashcan. Those are a lot of links, but even now I find them insufficient when I need to run a whois: while it's not used frequently enough to warrant being Docked (I generally have a lot of minimised windows so space is at a premium on my crummy 1024*768 screen), I still find it irritating when I have to switch to the Finder, and then double-click three icons (Welchman HD>Applications>Utilities) to get to what I need.

      Again, this is all due to the difficulty of scaling. I really don't mind voice or shell commands as I know my way around both UNIX and Windows, but when designing with the novice in mind, it is difficult to strike a balance between giving the user good usability and giving him a fully-featured system with the ability to learn it and discover new programs.

      What would be interesting is if the computer could translate normal conversation into machine-friendly commands. For example, if I told my computer to let me write a letter to my friend John Smith, it would pick out the info from the address book, open a fresh Word/OOo Writer/Pages/whatever document, paste the information in, and tell you (possibly through TTS) that it's ready for you to start typing.

      Combined with some simple AI, voice commanding could become a very powerful tool indeed. While I shan't imagine I'll be kissing goodbye to my keyboard any time soon (I would still find dictation annoying, even if it was perfect, due to the time it takes to correct mistakes) I'd like to stow my mouse away in a cupboard somewhere. Saying 'File/Save/Exit Microsoft Word' is far more efficient, in my opinion, than using a mouse and keyboard to do the same thing.

      --
      Those using pirated Tinysoft signatures(TM) are a real threat to society and should all be thrown in jail.
  10. Uses in Telephony by Anonymous Coward · · Score: 2, Interesting

    This could be very useful in projects like FreeSWITCH which is an Open Source project for building telephony applications. More info at http://www.freeswitch.org/

  11. I could see how it could be useful in some apps by backslashdot · · Score: 1

    Say for remotely controlling say the TV or something .. instead of having to remember the channel number you could just say "TV (or other trigger word), Discovery channel". I guess combined with an LCD/OLED button remote it could be used. Also, on a phone ... it should be possible to use speech to text for certain stuff like adding items to a shopping list.

    The software has to be intelligent to know what to do when you press a button and say "shopping list, plums" etc.
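
    A rough sketch of the trigger-word idea, assuming the recognizer already hands back plain text. The channel table, the trigger word and the function name are all made up for illustration:

```python
import re

# Hypothetical channel table; names and numbers are invented for this sketch.
CHANNELS = {"discovery channel": 42, "news": 7}
TRIGGER = "tv"

def parse_command(utterance, trigger=TRIGGER, channels=CHANNELS):
    """Return a channel number if the utterance starts with the trigger word
    followed by a known channel name, otherwise None."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    if not words or words[0] != trigger:
        return None  # no trigger word: ignore it (e.g. speech coming from the TV)
    name = " ".join(words[1:])
    return channels.get(name)  # unknown channel names also yield None
```

    Anything that does not start with the trigger word is dropped, which is the whole point: background chatter has to contain both the trigger and a valid channel name before anything happens.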

    I don't think speech recognition is good enough yet that it can take dictation, unless the dictation is in a very specific form.

    1. Re:I could see how it could be useful in some apps by estevon07 · · Score: 1

      It's funny how many Slashdot readers immediately think of science fair type applications when something interesting like an open source voice reco app is released. Not that such applications don't have merit, but when compared with high dollar speech application companies like Nuance (http://www.nuance.com/) and Holly (http://www.holly-connects.com/), the real value of open source speech recognition becomes apparent - it's another critical piece of software needed to create a truly open source voice platform. Asterisk is great, but isn't a complete solution for vxml apps.

      Companies like Nuance know the open source alternatives to their own products are not yet ready for prime time, so they charge a huge premium for licenses. When open source can provide a complete and robust voice platform solution, these companies will know the end is near.

  12. work together? by Jesus_Corpse · · Score: 1

    Wouldn't it be a good idea to work with the (open source) speech recognition of IBM?

    http://news.zdnet.com/2100-9593_22-5383536.html
    or
    http://developers.slashdot.org/article.pl?sid=04/09/13/1058241

    1. Re:work together? by debatem1 · · Score: 1

      Unless something's changed pretty drastically, the IBM voice projects were dead a couple of years ago. I went up to the office where ViaVoice was handled and they wouldn't even let me buy a copy of it for linux.

  13. Mask out known audio? by crow · · Score: 1

    Well, if your TV is being controlled by the same computer (e.g., MythTV), then shouldn't the voice command be able to mask out anything the microphone picks up that matches the output sound? Is there software to filter the audio input and remove what is currently being played? I'm sure it's a bit tricky to get right, but it would be very useful for a range of applications including this topic and speakerphones.

    1. Re:Mask out known audio? by Anonymous Coward · · Score: 0

      Well, if your TV is being controlled by the same computer (e.g., MythTV), then shouldn't the voice command be able to mask out anything the microphone picks up that matches the output sound? Is there software to filter the audio input and remove what is currently being played? I'm sure it's a bit tricky to get right, but it would be very useful for a range of applications including this topic and speakerphones.

      It shouldn't be too hard, because it's basically echo cancellation, which has been well studied and is often done digitally these days.
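
      For the curious, here is a minimal sketch of one textbook approach, a normalized LMS (NLMS) adaptive filter. It assumes you have the played samples and the microphone samples as equal-rate lists of floats; the tap count, step size and the toy echo path in the demo are arbitrary choices for illustration, not tuned values:

```python
import random

def nlms_echo_cancel(played, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `played` from `mic`.

    played: samples sent to the speakers (the known reference signal)
    mic:    samples picked up by the microphone (echo plus any speech)
    Returns the residual, i.e. mic with the estimated echo removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # recent reference samples, newest first
    residual = []
    for x, d in zip(played, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate          # what is left after removing the echo
        energy = sum(h * h for h in history) + eps
        # NLMS update: step along the reference, normalized by its energy
        weights = [w + mu * error * h / energy for w, h in zip(weights, history)]
        residual.append(error)
    return residual

# Toy demo: the "room" delays and attenuates the played signal, nobody speaks.
random.seed(0)
played = [random.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * (played[n - 2] if n >= 2 else 0.0)
       + 0.3 * (played[n - 5] if n >= 5 else 0.0)
       for n in range(len(played))]
cleaned = nlms_echo_cancel(played, mic)
```

      In a real room the echo path is much longer than 8 taps and the filter has to stop adapting while someone is talking, which is where the "tricky to get right" part comes in.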

  14. Project's webpage in English? by Lord+Satri · · Score: 1

    Trying to learn more about it, I followed the project's website link on the SourceForge page to simon-listens.org, but it's German only; I found no English (or other language) info. Does anyone have any advice?

    1. Re:Project's webpage in English? by bedahr · · Score: 5, Informative

      We are sorry that there is no international homepage for this yet.

      BUT: you are strongly encouraged to contact me with any questions: grasch < at > simon-listens.org

      -- Peter

    2. Re:Project's webpage in English? by $RANDOMLUSER · · Score: 0, Redundant

      MODS: parent needs a +5 informative.

      --
      No folly is more costly than the folly of intolerant idealism. - Winston Churchill
    3. Re:Project's webpage in English? by battles · · Score: 1

      Is there a Windows binary of simon somewhere? My C++ compiler is too old to compile it.

    4. Re:Project's webpage in English? by bedahr · · Score: 1

      I don't have a windows machine here to play with...

      So sadly there is no official windows binary (no binaries at all for that matter) until now.

      (The alpha version is not really targeted for the end-user)

      -- bedahr

    5. Re:Project's webpage in English? by mlc · · Score: 1

      So here's a question: I don't speak (much) German. How do I make this software do something?

    6. Re:Project's webpage in English? by bedahr · · Score: 1

      Have you downloaded the software?

      There is a work-in-progress article about setting up simon on our project wiki: http://simon-listens.org/wiki

      As it is still far from complete, here are the key points:

      * Download and compile simon (this can be tricky on windows).

      * Simon will present you with a setup-type wizard when you first start it (you have to compile it first).

      * The wizard will prompt you for paths and settings.

      * You can then create a new language model / open an existing one (if you speak english you could try out the voxforge model) and specify commands.

      Please try it out and contact me if you have any questions.

  15. Whither Microsoft? by IGnatius+T+Foobar · · Score: 2, Insightful

    Offices full of people talking to their computers has been Bill Gates' wet dream for decades now. What will happen if open source gets there first?

    Actually, the reason we're not there yet is because most people don't want it. Keyboards and mice are simply a better way to give instructions to your computer than speech recognition is. Could you imagine the clatter of a dozen or more people in close proximity chattering to their computers?

    --
    Tired of FB/Google censorship? Visit UNCENSORED!
    1. Re:Whither Microsoft? by 3seas · · Score: 1

      Hmmm, from the http://htk.eng.cam.ac.uk/ site

      "HTK was originally developed at the Machine Intelligence Laboratory (formerly known as the Speech Vision and Robotics Group) of the Cambridge University Engineering Department (CUED) where it has been used to build CUED's large vocabulary speech recognition systems (see CUED HTK LVR). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. See History of HTK for more details."

    2. Re:Whither Microsoft? by MacarooMac · · Score: 1

      Agreed. Speech Recognition will realistically only become a viable substitute for the traditional keyboard/mouse interface - for the user majority, at least - when the AI (and the processing power required to run it) has advanced to the point where you and your box can actually hold a 'semi intelligent' interaction/conversation.

      Now, considering that much of said user majority out there is barely able to reach this iq threshold simply interacting amongst themselves, we should shortly be seeing this kind of product being advertised as

      '...productivity levels achievable will be limited to the chip spec installed in your user sub-group'

      --
      "He Who Dares Wins" ...or gets twenty-to-life for totaling their Bimmer on a poodle parade
    3. Re:Whither Microsoft? by hyades1 · · Score: 1

      Wither, Microsoft.

      --
      I've calculated my velocity with such exquisite precision that I have no idea where I am.
    4. Re:Whither Microsoft? by mustafap · · Score: 1

      That's really true for the office environment, but I would love voice recognition at home - while in the kitchen cooking, in the bath ( no - don't comment on that one), crashed on the couch. Or even in the car.

      --
      Open Source Drum Kit, LPLC deve board - mjhdesigns.com
    5. Re:Whither Microsoft? by Chapter80 · · Score: 1

      Actually, the reason we're not there yet is because most people don't want it.
      I think you're wrong in this point. The reason we are not there yet is not because of demand. It's because the technology isn't quite good enough yet. It's getting very close, relative to five years ago.

      Your argument is comparable to someone in the early 80's saying "The reason computers don't come with mice is because most people don't want it." While it's true that most people didn't want mice for DOS machines, the reason computers didn't come with mice was more a function of the software and the power of the machine. Computers weren't quite powerful enough to handle a GUI, and so powerful software to improve productivity didn't exist, so of course people didn't want it.

      Think of how many times you call a company these days, and your call is routed (or possibly handled entirely) by voice recognition software. It is DEFINITELY in demand. People want it. It's just not down to the PC yet.

      I have worked in offices among cubes of people who are all on the phone. I imagine it will be a lot like that - productivity will be SO much greater when conversing with your computer, that only the dinosaurs will stick with only mice and keyboards. Same group that is still using no mouse, I suppose.

      When PCs and voice recognition provide a truly powerful tool, that's when people will "want it". But the problem is the hardware and software capability right now. Not demand.

    6. Re:Whither Microsoft? by IGnatius+T+Foobar · · Score: 1

      Think of how many times you call a company these days, and your call is routed (or possibly handled entirely) by voice recognition software. It is DEFINITELY in demand. People want it. It's just not down to the PC yet.
      Actually, that's a perfect example. I hate those systems. I'd rather just press a key on the phone.
      --
      Tired of FB/Google censorship? Visit UNCENSORED!
    7. Re:Whither Microsoft? by orin · · Score: 1

      How can Open Source get there first? Speech recognition is built into Vista! The "Speech Recognition" Icon in the Control Panel kinda gives it away.

    8. Re:Whither Microsoft? by Chapter80 · · Score: 1
      You hate those systems, yet they are still in place, confirming my point.

      You were not the buyer of the system. The demand is there (by the buyer). Unfortunately, the technology isn't there for that to be a very good user experience YET. So the USERs don't *all* like them.

      (I happen to like these voice recognition systems, and nearly always use voice, as I find working through a button-driven menu very cumbersome on a cell phone. I'm sure neither of us is alone in our preference.)

  16. Open Source, or Microsoft-Owned? by kripkenstein · · Score: 4, Interesting

    Cue the obligatory lets set so double the killer delete select all. :) Speaking of Microsoft, according to HTK's FAQ:

    HTK was originally developed at the Cambridge University Engineering Department (CUED). In 1993 Entropic Research Laboratory Inc. acquired the rights to sell HTK and the development of HTK was fully transferred to Entropic in 1995 when the Entropic Cambridge Research Laboratory Ltd was established. HTK was sold by Entropic until 1999 when Microsoft bought Entropic. Microsoft has now licensed HTK back to CUED and is providing support so that CUED can redistribute HTK and provide development support via the HTK3 web site. [...] Microsoft retains the copyright to the existing HTK code
    [...]
    you are not allowed to redistribute (parts of) HTK3

    In other words, HTK - a critical part of the 'Simon' project - is owned by Microsoft. It is also not under a FOSS license: you can look at the code and use it for your own purposes, but you can't redistribute it. In fact, reading this, I wonder if Simon is not in violation of the license.
    1. Re:Open Source, or Microsoft-Owned? by bedahr · · Score: 5, Informative

      Simon is in no way connected to Microsoft.

      Simon does NOT contain the HTK toolkit - it merely executes commands.

      HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".

      We are aware of that and have not packaged any parts of HTK for the release - you have to download it yourself if you want to modify the model from within simon.

      It is not optimal, but we don't have the knowledge and / or manpower to code up something similar in a reasonable timeframe. And after all, it isn't that big of a deal, is it?

      -- bedahr

    2. Re:Open Source, or Microsoft-Owned? by Anonymous Coward · · Score: 1, Insightful

      And after all, it isn't that big of a deal, is it?
      That's a dangerous road for a free software project to take. Think of bitkeeper.
    3. Re:Open Source, or Microsoft-Owned? by Antique+Geekmeister · · Score: 1

      Most of us will appreciate the careful handling by people like you, and the requirements to use the available tools. But yes, it is a big deal: Microsoft has a very bad history of "embracing and extending" software, and clearly breaking inter-operability in the process. Take a look at what they did to Kerberos when they incorporated it into Active Directory, and how it broke compatibility, and the resulting lawsuits and required patches by MIT to address the problems created by Microsoft and interacting with *anyone* else's release of Kerberos.

      Microsoft also has a habit of abusing its patent portfolios to threaten open source projects, refusing to identify the patents they believe are involved. The end user license agreement with a tool like HTK can cripple development with it, and take away developers' ability to upgrade or to refuse such crippling changes in their own tools based on HTK.

      Like discovering that your McDonald's french fries are cooked with lard and thus not vegetarian, it's a big deal. (They called it "beef tallow" and hid it for years under the "natural flavors" part of the label. This led to a lot of screaming by misled Hindus and by misled vegetarians who'd been eating it.)

    4. Re:Open Source, or Microsoft-Owned? by techno-vampire · · Score: 1
      Like discovering that your McDonald's french fries are cooked with lard and thus not vegetarian, it's a big deal. (They called it "beef tallow..."


      Considering it's McBarfles we're talking about here, I'm not surprised. The important point, however, is that it's even more deceptive than you think. If they were, in fact, using lard, calling it "beef tallow" would be false advertising because lard comes from pigs, not cattle.

      --
      Good, inexpensive web hosting
    5. Re:Open Source, or Microsoft-Owned? by Anonymous Coward · · Score: 0

      And after all, it isn't that big of a deal, is it?

      Yes,it is.

      And you ask that here, of all places?

    6. Re:Open Source, or Microsoft-Owned? by JustinRLynn · · Score: 1

      I'm sorry, but I believe that in the software world the term "Open Source" is a trademarked term which, when referring to a license or software work, requires that the license or code adhere to the following definition:

      Introduction
      Open source doesn't just mean access to the source code. The distribution terms of open-source software must comply with the following criteria:

      1. Free Redistribution
        The license shall not restrict any party from selling or giving away the software as a component of an aggregate software distribution containing programs from several different sources. The license shall not require a royalty or other fee for such sale.

      2. Source Code
        The program must include source code, and must allow distribution in source code as well as compiled form. Where some form of a product is not distributed with source code, there must be a well-publicized means of obtaining the source code for no more than a reasonable reproduction cost preferably, downloading via the Internet without charge. The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed.

      3. Derived Works
        The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software. ... {requirements 4-10 snipped}

      -- The Open Source Definition, The Open Source Initiative (http://www.opensource.org/docs/osd)

      This software's license most obviously violates requirements 1, 2, and 3. These are perhaps the most important provisions of the definition and form the basis for the power of calling a license an open source license. By not adhering to this definition when calling licenses and software "Open Source" you dilute the power the term carries. Simply calling something 'open source' because they allow you to look at the source code is something we should avoid because 'Open Source' requires freedom, not just source.

    7. Re:Open Source, or Microsoft-Owned? by JustinRLynn · · Score: 1

      Well, I have to correct myself on the trademark status of 'open source'. It seems that since it's 'descriptive' it can be abused, but there is still a compelling case for those in the know to avoid using it in this way and do all that we can to prevent it from becoming simply descriptive. 'Open Source' should be synonymous with freedom.

    8. Re:Open Source, or Microsoft-Owned? by jonaskoelker · · Score: 1

      Like discovering that your french fries are cooked with lard and thus not vegan


      Fixed. Compare http://en.wikipedia.org/wiki/Vegan and http://en.wikipedia.org/wiki/Vegetarianism.
    9. Re:Open Source, or Microsoft-Owned? by Antique+Geekmeister · · Score: 1

      Reading the definitions you cite, I stand by "vegetarian". I'm not discussing an animal product that is harvested from animals that continue to live, such as eggs, or milk, I'm mentioning the use of beef fat rendered from slaughtered cows.

      Mind you, a lot of vegetarians will tolerate some of the nastier animal products put in cheese, so there are limits to what many people worry about. But it was a bit of a surprise to find out they did this.

      I do see the point of the person who pointed out that "lard" is pork, "beef tallow" is the same thing made from beef. I misunderstood that lard referred only to pork fat.

    10. Re:Open Source, or Microsoft-Owned? by bedahr · · Score: 2, Informative

      This software's license most obviously violates requirements 1, 2, and 3. These are perhaps the most important provisions of the definition and form the basis for the power of calling a license an open source license. By not adhering to this definition when calling licenses and software "Open Source" you dilute the power the term carries. Simply calling something 'open source' because they allow you to look at the source code is something we should avoid because 'Open Source' requires freedom, not just source.

      Simon does not violate this description in ANY way.

      HTK is not redistributed with simon so simon itself complies exactly with what you are writing.

      Simon does not depend on the HTK toolkit. It simply uses it to compile / maintain the model. If you have compiled the model already (simon explicitly asks if you have done that already when starting the first time) you can specify the path to it.

      Simon will then just use the model and can still start programs, type text, etc.

      There is absolutely no need for the HTK toolkit. Simon is also useful without it.

      Is e.g. X.org not open source because it has the means to put non-free software to use to make it even more powerful? (e.g. the nvidia driver)

      Simon itself stands under the GPLv3.

      -- bedahr
    11. Re:Open Source, or Microsoft-Owned? by JustinRLynn · · Score: 1

      You are correct in that Simon itself is open source software. I was speaking only about the parent post's use of the term open source to refer to the HTK toolkit, when its license does not meet that definition.

    12. Re:Open Source, or Microsoft-Owned? by Anonymous Coward · · Score: 0

      And after all, it isn't that big of a deal, is it?
      You must be new here.
    13. Re:Open Source, or Microsoft-Owned? by Degrees · · Score: 1
      Not knowing much at all about this whole field of software, I'm going to ask: what other software can build the model simon can use? I think that would be pretty cool, if I could use simon with something that is released under the GPL or BSD style licenses. Thanks!

      --
      "The most sensible request of government we make is not, "Do something!" But "Quit it!"
    14. Re:Open Source, or Microsoft-Owned? by bedahr · · Score: 1

      You would only need to change the modelmanager (modelmanager.cpp) to use different tools.

      It wouldn't be that hard I guess....

      -- bedahr

  17. Citation Needed by Anonymous Coward · · Score: 0

    Citation needed.

    Furthermore, is this really of significant importance to be included in the Slashdot comment archive?

    You seem rather noobish for a Wiki* pedant.

  18. For those not familiar with this meme by CaptainPinko · · Score: 3, Informative

    Basically it comes from a live voice recognition demo from Microsoft for their feature in Vista. Yes, I had to look this up myself.

    --
    Your CPU is not doing anything else, at least do something.
  19. shipping forecast by miruku · · Score: 1

    I was thinking last night about what could be used to auto translate the Met/BBC Shipping Forecast into lay speak (just 'cause). This project sounds promising.

    --
    MilkMiruku
  20. filthy open-source by jumbolo · · Score: 4, Informative

    simon is open source.
    julius is open source.
    htk is *NOT* open source.

    The latter is a micro$oft by-product, as clearly shown by the license that you have to first agree with and then send your email to them in order to download the tarballs...

    myself never done this since 1995.

  21. OT: site design by Anonymous Coward · · Score: 0

    webmasters: personalized bookmarking [primadd.net] scripts for your site
    wp and phpbb plugin available
    Your comment was good, but I've got to say that your web site appearance (and low-contrast color scheme) leaves a lot to be desired. Might want to take a closer look at that, if using it to advertise a product.
  22. Only one problem by ThatsNotPudding · · Score: 1

    You have to speak in the voice of Comic Book Guy.

  23. No, but I hope that they are running by Anonymous Coward · · Score: 0

    in the chinese cars. Sadly, they are brighter than we are. They have been stealing everything under the sun from American, but we can not seem to get them to steal the source code that controls the reagan. I am guessing that they are afraid that they would lose track of things as well.

  24. This is not about dictation software by idji · · Score: 5, Interesting

    Many people think that "Speech recognition software" = "dictation software" - as is clear from many comments here. That is simply not the case. Dictation is just one application of speech recognition - and a personal application at that - which is the only thing most people come across. Other applications are media transcription (closed captioning), media mining "What did Obama say about the prime mortgage market this week?", telephone call center controlling (Are our staff using naughty words? Is the customer using aggressive language?), telephone call mining ("bomb", "anthrax", ...), indexing vast audio archives of news broadcasts (keyword/topic tagging), aligning audio to human transcription (documentaries, DVD subtitles, witness testimonies, court or parliament proceedings - think of any event that is transcribed like UN conferences), etc. Don't you think CNN, BBC or any national film archive would be interested in searching through there millions of hours of recorded footage? Now you tell me - do you think that the holy grail of speech recognition is "HAL - please close the hatch", "Dear Mom, we are having a lovely time here..." or hearing any TV show in any language you want, or calling anyone in the world and being able to talk to them in your own language? Dictation Software is about the only speech-reco application that can be sold to the masses - all the rest is still fairly much below the horizon...

    1. Re:This is not about dictation software by houstonbofh · · Score: 1

      And no one ever thinks about the most common use, command and control. Yet every phone maze has it now, and voice dial is on almost every cell phone.

    2. Re:This is not about dictation software by Anonymous Coward · · Score: 0

      searching through there millions of hours

      "their".

  25. I18n? by Magitek0777 · · Score: 1

    Are these speech recognition engines designed for use with English in mind? What is the status of the technology for other languages? It seems that languages with far fewer sounds, like Japanese, would change the problem substantially and make it easier to have a quality product. Does the difficulty of the speech recognition problem change with the language?

    1. Re:I18n? by Anonymous Coward · · Score: 0

      Dutch, Russian, German, Italian, Hebrew models are on their way. Visit voxforge.org and help. Submit your speech and speech of your friends. Everything depends on your contribution.

  26. Reason we're not there yet... by gr8_phk · · Score: 1

    The reason we're not there yet is that standalone speech recognition software is stupid. We need KDE and gnome to have built-in speech recognition with a simple API so any application can just monitor the speech input. It should not come in as keystrokes though - must be separate. The speech engine should be a component so different ones can be used of course. If it was there, any app could use it easy enough.

    1. Re:Reason we're not there yet... by bedahr · · Score: 1

      Well simon is not that far off of that idea.

      The real recognition is "outsourced" to juliusd which communicates with simon over tcp/ip.

      The main simon program is just to maintain the language model / train it / etc.

      In my opinion exposing the recognition results over e.g. dbus would be a better way than to quadruple the efforts by splitting this (HUGE) task to gnome, kde, xfce, window, etc.
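
      To give an idea of what a client on the other end of that socket has to do, here is a small sketch that pulls the recognized words out of one result message. The message layout is modeled on the tagged output of Julius's module mode as I understand it; treat it as an assumption, since simon's own juliusd protocol may well differ. SAMPLE, WORD_RE and recognized_words are names made up for this example:

```python
import re

# Example payload in the style of Julius module-mode output (assumed format).
# Note that WORD="<s>" means the text is not well-formed XML, hence the regex.
SAMPLE = '''<RECOGOUT>
  <SHYPO RANK="1" SCORE="-2850.5">
    <WHYPO WORD="<s>" CM="1.000"/>
    <WHYPO WORD="lights" CM="0.912"/>
    <WHYPO WORD="off" CM="0.874"/>
    <WHYPO WORD="</s>" CM="1.000"/>
  </SHYPO>
</RECOGOUT>
.
'''

WORD_RE = re.compile(r'<WHYPO\s+WORD="([^"]*)"')

def recognized_words(message):
    """Extract the recognized words, dropping sentence-boundary markers."""
    words = WORD_RE.findall(message)
    return [w for w in words if w not in ("<s>", "</s>")]
```

      Once the words are out as a plain list, publishing them over dbus (or anything else) is a separate, much smaller problem.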

      -- bedahr

    2. Re:Reason we're not there yet... by gr8_phk · · Score: 1

      In my opinion exposing the recognition results over e.g. dbus would be a better way than to quadruple the efforts by splitting this (HUGE) task to gnome, kde, xfce, window, etc.
      Maybe. IMHO the UI needs to be involved though to make sure the speech goes to the right place. Some good forethought could create a really cool environment. Could the WM or something pick it up off dbus and route it to the appropriate apps? Remember, we want "the computer" to respond to voice, not a particular app. "the computer" would determine context and send the voice input to the appropriate app - the one with focus initially until a smarter router is devised. You need the mediator, so not every app that is voice enabled is picking up the same speech. The mediator could pick it up from dbus, and applications could too I suppose, but that would bypass "the computer" and chaos would follow with lots of windows open.
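
      A toy sketch of that mediator, just to pin the idea down. Everything here (class name, methods, the notion of registering a handler per app) is hypothetical and not part of any existing WM or dbus API:

```python
class SpeechRouter:
    """Hypothetical mediator: receives recognized text once and forwards it
    to exactly one application, the one that currently has focus."""

    def __init__(self):
        self.handlers = {}    # app name -> callable taking the recognized text
        self.focused = None   # name of the app that currently has focus

    def register(self, app, handler):
        self.handlers[app] = handler

    def set_focus(self, app):
        self.focused = app

    def dispatch(self, text):
        """Send text to the focused app; return its name, or None if dropped."""
        handler = self.handlers.get(self.focused)
        if handler is None:
            return None       # nothing focused or app not voice enabled
        handler(text)
        return self.focused
```

      A smarter router would replace the body of dispatch with something that looks at the text itself, but the apps would not need to change.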
  27. Re:This is not about dictation MOD PARENT UP by Anonymous Coward · · Score: 0

    I'm all out of mod points, but this an important point to make about speech recognition.

  28. My own personal acid test by PingXao · · Score: 1

    Write 'rite' right.

    Possibly incorrect grammatically, but it's the only obvious way to combine 3 homonyms into what passes for a sentence. Of course, someone saying that might be vehemently agreeing with you as well, "Right! Right! Right!". Sorting that out could be a mess. I've criticised the lack of progress on the speech recognition front for a decade. It's amazing how bad most speech recognition software is.

    Here's a better test... Take a standard page of text (about 200 words). Scan it and run it through an OCR program. Then randomly grab people off the street and have them read the text out loud into a microphone. If the speech recognition outperforms the OCR'ed result then it's a success.
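
    To score both sides of that test with the same yardstick, one common measure is word error rate: the word-level edit distance between the original text and the output, divided by the number of words in the original. A minimal sketch (the function name is my own, and it ignores punctuation handling entirely):

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # word missing from hypothesis
                           cur[j - 1] + 1,       # extra word in hypothesis
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1] / len(ref)
```

    Run it once on the OCR output and once on the speech transcript against the same 200-word page; the lower number wins.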

    This is good news. I hope OSS speech recognition spurs some serious innovation. The field is still wide open for quality algorithms and software IMO.

  29. Really cool to see OSS speech rec come back by seanthenerd · · Score: 1

    I'm working on a home automation project and we've been looking for an OSS, linux-compatible speech rec system, but it seemed like every single Linux speech project died in the early 2000s when IBM sold their freeware ViaVoice system and the new company started charging for it. Seems like every single Linux project used it as the backend. The only other option was CMU's Sphinx work which looked impressive but almost impossible for non-speech-experts to use directly. This will be really cool to try out - kudos to everyone working on simon.

  30. CMU Sphinx, another free speech recognizer by TorKlingberg · · Score: 2, Informative

    There is also CMU Sphinx, which is completely free (no HTK used) and very good quality.
    http://cmusphinx.sourceforge.net/
    http://en.wikipedia.org/wiki/CMU_Sphinx

  31. I use only computer dictation for medical notes by KWTm · · Score: 2, Informative
    At my office, we use a computer dictation system for medical notes. It is amazingly accurate for those who speak with accents within the norm. It works well for me, and I will typically dictate something like this:

    "The patient presents today with three complaints comma as follows colon new paragraph For the past week comma he has had right shoulder pain period new paragraph He has noticed that when he sneezes comma there are streaks of blood in his mucus period new paragraph He has been experiencing diaphoresis and is concerned that it may be related to his systemic lupus erythematosus for which he has been taking prednisone twenty milligrams q h s."
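
    I don't know how the product does it internally, but the spoken-punctuation convention itself is easy to picture as a post-processing step over the raw word stream. A minimal sketch, where the token table and function name are my own guesses for illustration:

```python
# Spoken tokens and their written form; an illustrative guess, not the
# vocabulary of any particular dictation product.
PUNCTUATION = {"comma": ",", "period": ".", "colon": ":"}

def render_dictation(transcript):
    """Turn a raw word stream with spoken punctuation into written text."""
    words = transcript.split()
    text = ""
    i = 0
    while i < len(words):
        pair = tuple(w.lower() for w in words[i:i + 2])
        if pair == ("new", "paragraph"):
            text = text.rstrip() + "\n\n"
            i += 2
        elif words[i].lower() in PUNCTUATION:
            # attach the mark to the previous word, then resume with a space
            text = text.rstrip() + PUNCTUATION[words[i].lower()] + " "
            i += 1
        else:
            text += words[i] + " "
            i += 1
    return text.strip()
```

    The obvious weakness is that a naive table like this also rewrites "period" or "colon" when they are meant as ordinary words, which in medical notes is not exactly rare. Presumably a real system uses context to tell the two apart.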


    I think the software is called "Enterprise Dictation System"; requires Internet Explorer, although there must be some component that's pushed out locally to the client since I can't imagine the sound data being sent over intranet to be interpreted. I dictate in chunks, and apparently the longer the chunk the more easily it can interpret what I say. For example, if I just dictate "to", then it may transcribe "to", "2", "two", or "too". If I say "to prevent this comma", then it knows that the first word should be spelled "to".

    It's surprisingly accurate, and is more accurate for esoteric medical terms than for common short words since for medical terms there is a relatively limited number of possibilities relative to the number of syllables.

    For some colleagues who speak with foreign accents --and even for certain colleagues who seem to speak with standard local accents-- recognition was quite poor, and they fall back on human transcription.

    Anyway, just wanted to share this experience. I was quite amazed at how well the dictation worked.

    Here's hoping we can build up a good Open Source/Free database of voice recognition data. Or at least, perhaps an Open Source engine, and then different companies can market their voice data.
    --
    404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
    [GPG key in journal]
  32. Open Source Chinese Speech Recognition? by ZorroXXX · · Score: 1
    Hi. I am currently learning Chinese and when reading this I thought that maybe speech recognition software could be useful (or maybe not, but at least I would like to try). Does anyone have any tips on what software (for Linux) I need to get that supports recognition of Chinese?

    I am not interested in teaching the computer to recognize my terrible pronunciation, but rather in having some program that expects to hear standard Chinese, which I could practice with.

    One extremely useful program I have found which is able to decode and show the tones is Wavesurfer. For those of you who do not know, tones play a very important part in Chinese speech, and they are kind of difficult to learn as a foreigner.

    Request: Can any of you with knowledge in this field please contribute a little to update http://tldp.org/HOWTO/Speech-Recognition-HOWTO/index.html? It is a bit dated.

    --
    When you are sure of something, you probably are wrong (search for "Unskilled and Unaware of It").
  33. WHOA! "Open Source"="can look at code"!!?? WTF? by KWTm · · Score: 1

    HTK is free of charge and open source (in the strict sense of you-can-look-at-the-code). It is, however, not "free".
    Hold on just a frakking minute.

    What the hell is "open source (in the strict sense of you-can-look-at-the-code)"? Since when did anyone start to mean "open source" as code that was merely available but not modifiable? As this sibling comment points out (please mod him up, by the way), the term "Open Source" has a very specific meaning. This meaning was determined at the time this term was invented, so you can't even use the same excuse as "free software" and hide behind the excuse of "but for the past 600 centuries, Shakespeare has been using the term 'open source' for this other meaning!"

    Microsoft has been muddying the water enough with terms like "Shared Source" and "Open" as in "OOXML". This thing about the "strict sense" of the term "open source" has got to be nipped in the bud.
    --
    404555974007725459910684486621289147856453481154 in hex is "You sank my Battleship?"
    [GPG key in journal]
  34. Re:I have used computer dictation by Anonymous Coward · · Score: 0

    IBM ViaVoice on an IBM P100 with 64MB running Windows 95, and it was brilliant. It only needed patience to train. Shame nobody else could be bothered and complained it was too hard to use.

  35. Shh! Not in front of the computer! by ErkDemon · · Score: 1

    Something like "Computer: Lights off" will reduce the chance that some random sentences from the TV will trigger the command. Unless you're watching Star Treck ofcourse.
    And as long as you never have any conversations about computers within earshot of the 'puter.


    "What'd you install that crappy voice recognition software on this computer for, Matt? See? Everything I say is coming up on the screen as syllables ..."

    (computer voice) "For-Mat-See. Formatting in progress ..."

  36. War stories.... by EmbeddedJanitor · · Score: 1

    About 15 years ago I worked for a company doing, amongst other things, VR for telephone use. These systems had localised dictionaries to handle accents. We struggled to get the stuff going properly and the only combination we got to work reliably was a Fijian Indian person talking to a British accent VR system. Go figure!

    --
    Engineering is the art of compromise.
  37. It's even farther down the path ... by Schtroumpf42 · · Score: 1
    So, what are the steps necessary ?
    • Implement voice recognition : needs to know the language, and actual semantics & meaning for (most of) the languages to approach human understanding. Additionally, most languages are inherently ambiguous (even when taking into account only their canonical meaning), so that translation can only be made contextually; Neural Nets and Markov Chains are good at that, but are about as easy to debug as a human being .... good luck with your Computer Therapist training
    • Implement a means to recognize tone of voice, including person-specific tones : the way I say "this price is crazy" DOES influence its meaning .. how to recognize it automatically ? Take that, and the fact that, from one person to another, you can have the same sounds, but different facial expressions, hence different meanings ... and you'll end up developing a Facial Recognition System, along with a Knowledge Base encompassing that of modern-life Ethnologists - a definite prerequisite for choosing between slang's Woman and war's Weapon in "bomb"
    • Develop a mind reader : since the cultural background of speakers is not known beforehand, and the extract may not be enough for the Knowledge Base to work, how to know enough of his contextual concepts in any other way ?
    • As an alternative to all previous points : implement a Surveillance Society(TM), with the abolishment of most Privacy Rights, so that you'll know some vague backgrounds for those individuals ... if they expressed some of it in sight of some Surveillance Device beforehand

    So, it's a bit longer than initially thought ... but, if considering the last point, it's getting here way faster than using Law-abiding processes ...

    --
    Disclaimer, or "legal deathtrap" :
    Definitions :

    • "products" or "shit" : the terms "Surveillance Society", along with all the patents, copyrights and other legal deathtraps derived from them
    • "Legal entities" or "blobs" : individuals , groups, agencies, companies or other (stuff & combination of stuff) like that
    • "owner" or "bastards" : George Walker Bush, and the persons that agree with his "shit" ... seems to include at least the following "blobs" :
      1. Former Video-Surveillance fan and Prime Minister of UK, Tony Blair
      2. Former Australian thingy, John Howard
      3. Current French President and former Interior Minister, who earned both salaries, along with a 175% increase in presidential salary for about SIX months, luxuries-fan, egotistical and megalomaniac, Nicolas Sarközy de Nagy-Bocsa
    • "representatives" or "SoB" : any "blob" that talks regularly in a Court of Law and turns Richer in the process, without having much to do apart from reading some convoluted texts (produced & obfuscated by their peers) ... similar to Middle Ages' Witches, except that they do not produce any medicine, visionary insights, or more generally, original thoughts ...
    • "retaliation" or "fsck you" : careful application of "SoB" on a "blob" that we wish to weed out of existence
    • "retaliation" or "Hasta La Vista Baby" (alternate definition for people that do not have as of 2008 a US-stamped, biometrically-identified, GPS-tracked Passport) : reckless application of Marines, Tomahawks (the 20'000 Kg version), Uranium-enriched munitions, Torture, and Shady Contractors to designated "blobs" and any other "blobs" that happen to live in the same country (or general area, provided that it doesn't involve US land, at least for the portion that "bastards" may one day visit)

    Claims :

    • Claim 1 : The "shit" and their various legal offsprings are the exclusive property of the "bastards"
    • Claim 2 : If any "blob" dares to speak of