Slashdot Mirror


Voice Recognition for a Techie?

kaybee asks: "I am a long-time developer, sysadmin, and general computer junkie (for fun and for work) who needs to seriously curb the usage of his hands. I'm curious as to the current voice recognition options, preferably usable on Linux and Windows. I prefer the command-line to a GUI, I prefer Vim to anything else, and I still read my email with Pine. I'd like to hear options for sending email via voice, which I hope is easy, and I'd love to hear of any solutions that allow effective coding via voice, which seems much more difficult."

102 comments

  1. Computer.... Computer... Hello computer... by TibbonZero · · Score: 4, Funny

    Oh, I'm sorry. I thought you said voice recognition for a Trekkie....

    --
    Tibbon
    tibbon.com
    1. Re:Computer.... Computer... Hello computer... by dbIII · · Score: 2, Funny
      I thought you said voice recognition for a Trekkie
      Not long after the movie with the whales I saw a mouse with integrated microphone for sale - almost worth getting it for trek joke value.
  2. Write it yourself by Kawahee · · Score: 2, Informative

    Write it yourself. Grab the Microsoft Speech SDK and WINE or some suitable interoperability layer and you should be good for Windows and Linux. The Microsoft Speech SDK doesn't require oodles of code to make it work, so you should be able to get a working sample under Windows in about half an hour. It comes with some rudimentary samples as well, and since it's not released under any particularly binding license you can just build your code around it.

    'Course you could go the other way with some Open Source speech recognition and Cygwin or similar.

    --
    I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
    1. Re:Write it yourself by amliebsch · · Score: 2, Informative

      Parent is on the right track, for sure. Microsoft may be evil, but their speech API is truly easy to use. Also, if you are willing to use Windows, Windows XP Tablet Edition comes out of the box with relatively full-featured voice command and dictation capabilities built into the OS. With a little training, it can probably do most of what you require. I have in the past actually used it to dictate an entire paper. In fact, I used it to write this post. Once you get used to getting all the proper punctuation commands, it is possible to dictate at a fairly good rate of speed.

      --
      If you don't know where you are going, you will wind up somewhere else.
    2. Re:Write it yourself by Miguel+de+Icaza · · Score: 0

      For the ABM crowd, cmusphinx might be a better toolkit. It has the open source goodness, is multi-platform (no WINE shit) and has several versions in different programming languages, including a pure Java version (sphinx-4).

      --
      Before adopting WHATWG, read the moonlight.NET EULA [http://www.microsoft.com/interop/msnovellcollab/moonlight.mspx]
    3. Re:Write it yourself by ChildeRoland · · Score: 2, Funny

      Is there a reason you inserted an extraneous comma into your last sentence?

      --
      The mark of a mature person is not creating arbitrary criteria for considering others mature.
    4. Re:Write it yourself by Anonymous Coward · · Score: 1, Informative

      Yes, there were two clauses separated by a comma which is perfectly good punctuation by normal standards of literacy.

  3. Sysadmining by voice by Anonymous Coward · · Score: 5, Funny

    No, computer, I said, "awk single quote left curly print dollar one right curly single quote file dot txt pipe sort pipe uniq dash see greater than a dot out"

    shudder

    1. Re:Sysadmining by voice by Tackhead · · Score: 5, Funny
      > No, computer, I said, "awk single quote left curly print dollar one right curly single quote file dot txt pipe sort pipe uniq dash see greater than a dot out"


      Oh yeah?


      { } . ! /
      & ; ^ # -
      < > @ \
      { } _ SYSTEM HALTED


      Left titty, right titty, dot bang slash.
      Ampersand semicolon, caret pound dash.
      Less than greater than, at back slash,
      left titty, right titty, under score crash.


      * # ! ! (
      ~ & | )
      ' " . . DEL
      # ^G ! ! working... done.


      Star pound bang bang, open-paren.
      Tilde and pipe, close-paren.
      One quote, two quote, dot dot delete,
      pound bell, bang bang, process complete.


      - Doktor Dynasoar posting some ASCII poetry, and the thread also includes the immortal Hatless Atlas, which I'm not even going to fantasize about getting past the filters.


    2. Re:Sysadmining by voice by dgatwood · · Score: 4, Funny
      / ! [ . * +
      $ $ $|-|1+
      # 3 11 H E LL
      A general protection fault has occurred. A general protection fault has occurred. This application will be terminated.

      slash bang open bracket dot star plus,
      dollar sign dollar sign code for cuss,
      pound three eleven, H-E double-hockey-pucks
      BSOD. BSOD. Windows really sucks.

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

    3. Re:Sysadmining by voice by amliebsch · · Score: 2, Funny

      Umm...isn't an "L" a hockey stick? Unless I'm woefully misinformed about the nature of hockey pucks, that is.

      --
      If you don't know where you are going, you will wind up somewhere else.
    4. Re:Sysadmining by voice by ArsonSmith · · Score: 1

      duh, stick doesn't rhyme

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    5. Re:Sysadmining by voice by Albanach · · Score: 1

      And when using windows, watch out for coworkers shouting "format see colon"

    6. Re:Sysadmining by voice by dgatwood · · Score: 1
      Sure, if you wanna get technical....

      :-D

      --

      Check out my sci-fi/humor trilogy at PatriotsBooks.

  4. Hand use by Doomstalk · · Score: 5, Funny

    [...]who needs to seriously curb the usage of his hands.

    Lest they... *ahem* wander.

    1. Re:Hand use by Jimekai · · Score: 1

      A proposition from a serious mind-uploader could explain the need to relinquish hand use, by stating it as a life's goal. I write such stuff for my own mind-uploading quest, under the guise of my other nickname of Jimekus. The Ingrid software is always ready and being tested, but my VB6 GUI Ingrid Command And Control Edition will only again be useful to me when I can get a wireless microphone that will accept my voice commands without interfering with the output from the Ingrid On Winamp Frontend. A recently discovered Spanish program called MouseCam needs to be rewritten first as a VB6 open source project. Imagine a mousecam that can read your lips. Even better than a wireless microphone.

      --
      Argumentum ad Probabilitum
    2. Re:Hand use by ArsonSmith · · Score: 1

      No way, think about it you could cup while rubbing with a pure voice controlled pr0n library.

      --
      Paying taxes to buy civilization is like paying a hooker to buy love.
    3. Re:Hand use by Doomstalk · · Score: 1

      This is a brave new world you describe, sir.

    4. Re:Hand use by I(rispee_I(reme · · Score: 1

      Somebody mod this shit up. We need our best minds on this.

    5. Re:Hand use by alienmole · · Score: 1

      Seeing as how our best hands are busy.

  5. wouldn't bother yet by joe+155 · · Score: 2, Informative

    I've actually used some voice recognition over the years, and it's got a lot better than it used to be. Still, the last time I used voice recognition software on my computer, even though I did loads of the training it just didn't seem to get it; eventually I had to give up... and it wasn't cheap either. Whilst I think it has the potential to do a lot in the future, I'm just not sure that it's really at the stage where it can be considered a full-time replacement, especially for technical jobs.

    --
    *''I can't believe it's not a hyperlink.''
  6. Speaking WPM != Chars Per Minute by FlameboyC11 · · Score: 5, Insightful

    The main issue I see with coding by voice is that each character needs to be said as a word. We only have 26 single sounds we can make (at least us English speakers), so pretty much everything besides the basic sounds has to be the result of multiple letters strung together. Here's some math:

    Let's say you type at about 40 wpm, or about 160 characters per minute (this is a low estimate of 4 chars per word), or about 2.7 characters per second.
    To be as productive speaking, you'd probably have to speak about the same number of words per second as you type characters, or 2.7 words. That's really fast.
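    For anyone who wants to check the arithmetic, here is a minimal sketch. The 40 wpm and 4 chars per word come from the estimate above; the function name and the one-word-per-character assumption are just for illustration. Note that 160 characters per minute works out to roughly 2.7 per second.

```python
# Back-of-the-envelope check of the typing-vs-speaking estimate.
# Inputs (40 wpm, 4 chars per word) are the post's figures; the rest is arithmetic.

def chars_per_second(words_per_minute: float, chars_per_word: float) -> float:
    """Convert a typing speed in words per minute to characters per second."""
    return words_per_minute * chars_per_word / 60.0

typing_cps = chars_per_second(40, 4)  # 160 characters per minute

# Assumption: every character must be spoken as its own word, so the required
# speaking rate (words per minute) equals the typing rate (chars per minute).
required_spoken_wpm = typing_cps * 60

print(f"typing: {typing_cps:.2f} chars/sec")
print(f"speech needed: {required_spoken_wpm:.0f} one-character words/min")
```

    The conclusion only holds if every character really does need its own spoken word, which is the assumption the estimate rests on.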

    Sorry bub, doesn't look like speech is a very good alternative. Hell, Brain Implants on the other hand...

    1. Re:Speaking WPM != Chars Per Minute by jonabbey · · Score: 2, Insightful

      The best thing to do is take a rest on your hands, and get professional help. Voice recognition for coding sucks.. believe me. You're better off doing something else altogether if it comes to that.

      Coding is very precise work, and voice recognition just isn't good at that. If you try coding with your voice, you'll soon find that your voice hurts, and you've been immensely frustrated at the whole experience.

      Have you had medical attention to your hands?

    2. Re:Speaking WPM != Chars Per Minute by TheWanderingHermit · · Score: 2, Informative

      We only have 26 single sounds we can make

      No. Not true, even in English. For example, "c" does not make a sound distinctly different from all other characters. Some letters, such as "x" make sounds that can easily be made from a combination of other letters. Including pairings and such, linguists say that the English language includes something closer to 45 single sounds.

      I used to teach Special Ed and saw software that could recognize entire words and use them in writing in a word processor. I have not used voice rec software in the 10 years since I left the field, but I don't see how, if it could be done by some of the pioneering programs 10 years ago, why many programs now would recognize only individual letters and not words.

      I have also heard about some writers that were using voice rec to do a lot of their writing as long ago as when we were using the software in sp.ed. (again, that was about 10 years ago).

    3. Re:Speaking WPM != Chars Per Minute by Eideewt · · Score: 2, Informative

      I think you may be confused. First of all, there are way more than 26 sounds in the English language. It's more like 49 individual consonants, vowels, and diphthongs, and many monosyllabic words can be constructed from those.

      As far as I can tell, you're saying that words would need to be spelled out character by character so you'd have to talk really fast to be productive. Custom dictionaries would go a long way towards fixing that. The main issue would be whether a particular speech recognition solution integrates well with the shell and/or dev environment being used. It would be fairly simple, with any software, to get it to recognize shell commands like rm, cp, |, and grep when they were spoken as words. When coding, it would also be pretty simple to recognize common keywords and operators and output the proper text. I don't think there would be much trouble with speed until stuff like variable names began to come up. Even then, the big problem isn't that storing variable names for later spoken use would be hugely difficult to implement; it's just that (afaik) it hasn't been yet.

      Assuming that most words could be recognized when spoken, you wouldn't need to speak at a higher WPM than you type at. Conversations happen at around 200 wpm (just over 3 words per second), according to Wikipedia, so speed wouldn't be much of an issue.

      I think the biggest problem with speech as an input for techies is that the software itself has not yet been written. While there may be recognition software that can comprehend speech at normal speed and append its dictionary as it runs, there's none that I know of that has been set up to function in a technical environment. It may be as simple as putting the pieces together, but it would probably require a lot of hacking on your own. The second biggest problem would be wearing out your voice, although that's something you can work with.

    4. Re:Speaking WPM != Chars Per Minute by Danny+Rathjens · · Score: 1

      And then add in letter combinations like "th". And then add in multiple different sounds from the same letters or combinations like the "th" in "think" versus the "th" in "this". The first is unvoiced, the second is voiced; like the difference between "f" and "v".

    5. Re:Speaking WPM != Chars Per Minute by Drachasor · · Score: 1

      I think his concern was carpal tunnel syndrome. Hence his comment on curbing the use of his hands. Your analysis is accurate otherwise; one should use hands if one can. At least for regular things.

      Hmm, as for coding it does make one think a bit. I think you might see this sort of thing eventually for coding, but you'd need a special compiler (and perhaps language) that had a bit of AI in it to avoid silly mistakes with commenting, commands, variables, that sort of thing. It could work potentially though.

      Hmm, that just brings to mind how our ability to speak math completely stinks. We have no good way to talk about mathematics out loud. This would hurt any attempt to give anything mathematical a voice interface.

    6. Re:Speaking WPM != Chars Per Minute by Anonymous Coward · · Score: 0

      Two things:

      The custom dictionaries are very important for getting this thing to work well... for some programs it is fairly easy to add custom words, variable names, macros, etc. Dragon Naturally Speaking is very good at that.

      However, if you want to dictate to a console window (such as a remote X window to a Linux machine), Dragon Naturally Speaking is so closely tied to Windows that it actually runs slower, I think because it can't do context-based predictions on the characters in the cin/cout pipe of a console or Java-based application.

      The Microsoft SAPI-based shareware programs deal with writing to console windows much better. Plus, you can write your own, if you want... I think there is even a bit of Perl code to interface with the Microsoft SAPI on the net somewhere...

    7. Re:Speaking WPM != Chars Per Minute by identity0 · · Score: 2, Informative

      Hooray, I just got out of linguistics class and happen to have my book on me. According to my "Contemporary Linguistics: An Introduction 5th Edition" by O'Grady, et al, there are 49 phonemes in American English. Keep in mind that variants and dialects of English can vary quite a bit, and the book itself says some speakers may be missing a few of the phonemes.

    8. Re:Speaking WPM != Chars Per Minute by (1+-sqrt(5))*(2**-1) · · Score: 1

      Not to mention the medieval esh, ezh, thorn, and yogh; which, I understand, are pronounced with subtle differences against their Roman transliterations: sh, zh, th and gh.

    9. Re:Speaking WPM != Chars Per Minute by Ubernurd · · Score: 1

      if it could be done by some of the pioneering programs 10 years ago, why many programs now would recognize only individual letters and not words

      I think what GP means is that you can speak whole words into a word processor and it will match them against words in its dictionary. Any it doesn't recognize would need to be spelled out.

      When coding, we use a lot of words that aren't in the dictionary. if, else and switch would be OK, but Degrees2Radians isn't going to be in any dictionary, so you're going to end up spelling a lot of words.

      --
      Stack overflow: pid 352258, proc httpd, addr 0x11f7ffff0, pc 0x12000195c Segmentation fault (core dumped)
    10. Re:Speaking WPM != Chars Per Minute by TheWanderingHermit · · Score: 1

      Maybe I'm wrong, since it's been 10 years, but I think you can train the systems for new words. It'll be a pain at first, but in the long run, it'll help. It might be a mess coding in something like Java, though, where so many methods are made up of 3-4 distinct words.

    11. Re:Speaking WPM != Chars Per Minute by LurkerXXX · · Score: 1
      No. Not true, even in English. For example, "c" does not make a sound distinctly different from all other characters.

      Yes it does, but only when followed by an h. The "ch" sound is distinctly different from sounds produced by any other letters. If it weren't for "ch", yes, 'c' would be a redundant letter.

  7. Find ways to save typing effort by Anonymous Coward · · Score: 3, Interesting

    Voice recognition is good for letter-writing but bad for overall computer usage, especially in UNIX shell (incl vi and especially Emacs). Picking programs that don't require jumping all over the keyboard for basic tasks can reduce the strain. Same goes for programming syntax: Python is a lot more RSI-friendly than Perl, for example. (IMHO) Write scripts that automate routine tasks, even if it's just one line with lots of regex.

  8. Voice Wreck Ignition by Anonymous Coward · · Score: 2, Funny

    Eye won stride to yews voice wreck iginition soft wear tomb ache a slash dot post. Eye was knot imp pressed. It was sofa king we todd did

  9. Just use a keyboard by Seriously,+who · · Score: 0

    Seriously, who would choose a voice recognition system over a normal keyboard? Given the problems inherent in understanding human speech, a keyboard is always going to be the superior solution.

  10. Circles within circles by XanC · · Score: 1

    Wouldn't he need it to be written in order to write it?

    1. Re:Circles within circles by Kawahee · · Score: 1
      --
      I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
    2. Re:Circles within circles by xenocide2 · · Score: 1

      I believe the gentleman's point was that a man who needs to stop typing things is the wrong person to have type up a program to do speech to text conversions.

      --
      I Browse at +4 Flamebait

      Open Source Sysadmin

    3. Re:Circles within circles by Kawahee · · Score: 1

      Touché

      --
      I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
    4. Re:Circles within circles by MrFlannel · · Score: 1

      No. He only needs to write a small aspect of it. Then he can use that to bootstrap in more functionality, ala compilers written in their own languages.

      --
      Clones are people two.
  11. Save your hands -- while you can by Lars512 · · Score: 5, Informative

    Seriously, if you're suffering hand or arm pain, you should think about the way you're doing things now. Speech recognition is unlikely to replace your current coding practices, although it might help with writing reports.

    Instead, try using the keyboard break feature in GNOME. To start with, have it kick you off your computer every 30 mins for a 3 min break, and don't allow yourself to postpone breaks. Get some equivalent software for Windows too. Use your 3 min breaks to walk around and stretch. Within a week, you won't be a lot less productive, but your arms will feel a lot better. Then you can maybe up it to 40 mins. In the short term, a course of anti-inflammatories might help (ask your doctor).

    Also, don't come home in the evening and play games on your computer, or do more work. Your arms probably can't take it. Equivalently, inform your employer of your condition and subsequent inability to work reckless overtime hours.

    These two things should get you started for long-term sustainable maintenance of your arms.

  12. Linux Adaptability by skwirlmaster · · Score: 3, Informative

    It's been a while since I've had to look into speech recognition for Linux, but this link should help you get started: Linux Accessibility Resource Site

    Read down to the section about speech recognition. I hope that helps.

    --
    My inner self is ineffable, so don't eff with me.
  13. Try using a GUI for email, etc... by bergeron76 · · Score: 1

    Seriously. Try not using CLI for everything and see if that helps your problem.

    Voice recognition is still hit-or-miss.

    --
    Don't think that a small group of dedicated individuals can't change the world. It's the only thing that ever has.
    1. Re:Try using a GUI for email, etc... by dbIII · · Score: 2, Informative
      Voice recognition is still hit-or-miss
      It seems to work with Nintendogs on low end hardware (Nintendo DS with 4MB memory). I suspect the secret is having a limited number of things to match - for example a voice menu with limited options in each context that sound very different.
    2. Re:Try using a GUI for email, etc... by Osty · · Score: 1

      It seems to work with Nintendogs on low end hardware (Nintendo DS with 4MB memory). I suspect the secret is having a limited number of things to match - for example a voice menu with limited options in each context that sound very different.

      But in Nintendogs, you're talking to a dog. It's okay if it doesn't quite understand the first time. In fact, that's expected and part of the "charm" of the dog. I don't mind telling my dog to "sit" a couple times, but if I had to tell my computer to "save" three or four times before it actually saved I'd be really pissed (or worse, if it acted like a real dog and tried to do something else to see if that's what I'm asking. "Roll over. Roll over. No, that's 'play dead'. Roll over." "Save. Save. No, that's 'reboot'. Save.")

      (Caveat: I haven't played Nintendogs, and it's been a while since I've had a dog of my own. However, I have very good friends with dogs, and even when well-trained they may take two or three repetitions of a command to do the right thing.)

  14. Shoot! by Bios_Hakr · · Score: 4, Informative

    For gaming on WinXP, I use an app called Shoot!. While playing Falcon, I use it for fairly simple (press T, wait 5 seconds, press 1) macros. I was dicking around and decided to set up a profile for some simple stuff in Cygwin. If I say "list", the program returns "ls". "List all" will return "ls -a". "List all long" will return "ls -la".

    You can, with some tweaking, even get it to understand complicated stuff. If I say "manual g r u b", I can get "man grub". "Vi save quit" could be mapped to ":wq" without too much trouble.
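    A rough sketch of how that kind of phrase-to-keystroke profile could behave, assuming the recognizer hands you plain text. The table and the `expand` function are hypothetical; this is not Shoot!'s actual profile format, just the mappings from the examples above with longest-phrase-first matching.

```python
# Hypothetical phrase table modeled on the examples in the post.
MACROS = {
    "list": "ls",
    "list all": "ls -a",
    "list all long": "ls -la",
    "manual": "man",
    "vi save quit": ":wq",
}

def expand(utterance: str) -> str:
    """Expand the longest matching phrase prefix; join spelled-out letters."""
    words = utterance.lower().split()
    # Try the longest prefix first so "list all long" beats "list".
    for end in range(len(words), 0, -1):
        phrase = " ".join(words[:end])
        if phrase in MACROS:
            rest = words[end:]
            # Trailing single letters ("g r u b") are treated as spelling.
            if rest and all(len(w) == 1 for w in rest):
                return f"{MACROS[phrase]} {''.join(rest)}"
            return " ".join([MACROS[phrase], *rest])
    return utterance  # no macro matched; pass the text through unchanged

print(expand("list all long"))
print(expand("manual g r u b"))
print(expand("vi save quit"))
```

    Matching the longest phrase first is what keeps "list all long" from being swallowed by the shorter "list" entry.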

    Anything you can type, it can do.

    I don't think it works under Linux. I don't know of anything like it under Linux. It does, however, work quite well inside PuTTY.

    --
    I'd rather you do it wrong, than for me to have to do it at all.
    1. Re:Shoot! by cgenman · · Score: 1

      This sounds very similar to the built-in voice recognition that Macintoshes have had for some time now (not to knock it). Anything you want typed or done can be triggered by a voice command, though it has to be scripted individually.

      In terms of Windows, the best that I know of is still Dragon Naturally Speaking, though I strongly recommend pirating it first to decide if it serves your needs. Unfortunately, even with regular training it still gets things wrong with alarming frequency. You have to retouch everything or your mail recipients will think you're an incoherent blogger.

  15. CMU Sphinx by Anonymous Coward · · Score: 0

    http://cmusphinx.sourceforge.net/html/cmusphinx.php

    No idea how to use it, but you're such a techie, you'll surely figure it out.

  16. Sphinx by Anonymous Coward · · Score: 0

    Java-based speech recognition. 'nuff said.

    1. Re:Sphinx by Anonymous Coward · · Score: 0

      Yes. So you can be 60 seconds behind on every command. Gives you time to consider whether you want to run the command or not. Think of it like a queue you can add and remove commands from.

      Be willing to not have any RAM left for anything else. And its required JRE will break everything else. And, being Java, it has memory leaks. Need I go on?

  17. to curb your hands.. by Anonymous Coward · · Score: 0

    ..please infringe a patent of your choice

  18. mmmmmmmaudio by MobileTatsu-NJG · · Score: 3, Interesting

    "I'd like to hear options for sending email via voice, which I hope is easy, and I'd love to hear of any solutions that allow effective coding via voice, which seems much more difficult."

    I've wondered about this myself. I tend to use my computer with the headphones on. Often, I'm listening to music or... well, just plain silence, just the standard dings of Windows. I do pay attention, though, to the sounds coming from the computer (i.e. the traditional hoo-hoo of receiving an email). I've always wondered about what more could be done with sound to make the user more aware of the goings-on with their computer, especially when a number of apps are actively working. I think I was inspired by an episode of Futurama I caught. One of the characters' personalities was in the Pilot's body. The Pilot, whose personality was in yet another body, was trying to describe how to interact with the ship. I remember him saying "Can you hear that faint little tone? That's the status of..." or something or other.

    In any event, it's fun to imagine. I wouldn't mind if a soft low-volume voice were to say "You have received an email from: John Smith." I had a job a few years ago where that would have been a nice little feature, since messages would come in that required urgent attention. My solution to the problem at the time was to use a custom filter that would specifically notify me of important messages by bringing a little window up to the surface. That was fairly annoying, though, when the computer was busy and it was slow as molasses to get the window to go away.

    --

    "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    1. Re:mmmmmmmaudio by MobileTatsu-NJG · · Score: 1

      I can't believe I misread that summary. He's talking about emailing via voice, not having the voice play back his emails. Yes, I'm an idiot, that's what I get when I post before my first coffee.

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

    2. Re:mmmmmmmaudio by alienw · · Score: 1

      Heh, I already have this feature in my new desktop machine. Apparently, the CPU fan is PWMed by the motherboard, and it changes its speed based on the powernow clock speed. Therefore, you hear the fan revving up whenever the CPU is doing something. To be honest, it's getting to be really annoying, simply because you hear it rev up every time you, say, drag something with the mouse.

    3. Re:mmmmmmmaudio by Anonymous Coward · · Score: 0

      Totally OT - are you talking about Farscape? I don't believe there is such an episode of Futurama, and Farscape does have The Pilot (with caps) whereas Futurama just has a pilot (no caps). :)

    4. Re:mmmmmmmaudio by MobileTatsu-NJG · · Score: 1

      "Totally OT - are you talking about Farscape? I don't believe there is such an episode of Futurama, and Farscape does have The Pilot (with caps) whereas Futurama just has a pilot (no caps). :)"

      OH man. I was thinking Farscape, and I typed Futurama. Geez. Yes, you're right. Man, double embarrassment.

      Hehe. Thanks, man. :)

      --

      "I like to lick butts!" by MobileTatsu-NJG (#32700246) (Score:5, Informative)

  19. Techie by suv4x4 · · Score: 1

    Yeah, have you ever talked dirty to your computer and felt as if the feelings were not mutual?
    I hope better voice recognition and TTS will resolve that.

  20. OSSRI, VoiceCoder by robocyberdroidbot · · Score: 3, Interesting

    I ran into this problem while working (coding) & trying to do grad school (in Comp Sci). The first point I'd make is, take a rest break (no computer use) for a while if you can. ASR isn't really there yet, & it won't help you with other things you might want to be pain-free for... seriously. That said, there is a group called "Open Source Speech Recognition Initiative" whose mailing list I'm on, but they don't have any product yet. Might get a better answer posting there, though. Or not. There's also a group on Yahoo (I think) called VoiceCoder. That's your best bet right now, although it's all about Dragon Naturally Speaking & various hacks & kludges to be able to do coding, use Dragon for Linux, etc. Dragon has been reported to run under WINE, but of course YMMV depending on your hardware, versions, etc., etc. Finally, whatever approach you try, expect it to take a good long while before you begin to approach your hand-using productivity. The technology isn't there yet, and even though I know how to improve it, I have no Ph.D. so no one would give me the $$ to do the research that could back up my claim.

    --
    nificant.
  21. perlbox using sphinx by Danny+Rathjens · · Score: 4, Informative
    The perlbox voice control app is kind of a stalled project, but it is a nifty front end for the open sphinx voice recognition engine.
    http://perlbox.sourceforge.net/
    http://cmusphinx.sourceforge.net/

    Command and control is a lot easier to do with voice recognition since the dictionary the engine has to choose from is so much smaller. Having voice recognition engines understand arbitrary words well is still a bit difficult.

  22. Record an MP3??? by Anonymous Coward · · Score: 0

    You just want to send email without typing? Record an MP3 and just email it off??? You'll have to do a few clicks but not too many one would think...The recipient can just listen to it. Yes, large package size, who cares? If the people getting your email know in advance that you are having probs with your hands and fingers, they will either understand and put up with it, or they really aren't your friends.

    I don't know the state of the art, but dragon naturally speaking and such like programs are out there now, for dictation, to use for email or whatever, maybe that is what you want to keep it to text-only.

    A long time ago I had a mac classic program that would open and run applications with voice, it worked well, too, but I can *not* recall the name of it right this second. someone else might remember it though.

    Coding, no idea, hire some neighbor kid cheap to type for you while you talk, they should pick up on it quickly enough...

  23. IBM ViaVoice by inotocracy · · Score: 2, Funny

    I was looking to make a headless system that I could bark commands at, and I was actually quite successful at developing my own. I used IBM's ViaVoice SDK and modified a few of the sample programs they had that were written in C. It took a little work getting the system running, it being a tad old and all, but I eventually got it down to where it was completely usable and could handle requests like "new mail" and "talk to me dirty". Oh, and yes, it was a Linux system it was running on (Slackware 8).

    Google it.

  24. Not Voice Recognition but still helps RSI by PockyFreak · · Score: 1

    A new and innovative input device has had some positive reviews floating around the net lately. It's called AlphaGrip and is basically a keyboard mapped onto a large game controller (with a trackball to boot). I ordered one a few days ago so I don't have first-hand experience with it yet, but the reviews come from some reputable sites (linked below). It claims to allow 50 wpm with only 30 hours of training. I'm not so sure about that, but I'm willing to find that out for myself. Sorry for the short post, but I'm eating a pizza with one hand and typing with the other and I have to be at work in five minutes and my wrist is killing me.
    http://www.extremetech.com/article2/0,1697,1949084,00.asp
    http://www.alphagrips.com/ (product page with live demonstration video)

    --
    Ah, di fish da Bibble!?!
    1. Re:Not Voice Recognition but still helps RSI by zogger · · Score: 1

      that is pretty spiffy! I bookmarked that link to their product page, and 99 bucks doesn't seem all that unreasonable if it is as comfortable as they say (or it looks). Keyboard and mouse takes three hands! nuts! It's always seemed 'tarded to me that way... And I have tried trackpad keyboards, don't like them, something like this, though, where you can sit back in a comfy chair and surf and type looks pretty neat.

  25. FWIW... by Anonymous Coward · · Score: 0

    Technically, it's "speech recognition".

  26. Phonetic Punctuation to the Rescue! by wowbagger · · Score: 1

    Obviously, what we need is a voice recognition algorithm with Phonetic Punctuation support built in.

    Of course, we will have to extend it - Victor Borge didn't have sounds for #, < or > - but I'm sure we can come up with something.

    Of course, some programming languages will be better than others - Ada will sound almost normal (other than having to bark out all the words in your best Drill Instructor parade voice), while Perl.... you'll need a good sock on the mike to keep the spit out, and people will think you have Tourette's.

  27. A thought for coding via voice: by Ayanami+Rei · · Score: 3, Interesting

    First, find a solution that makes it easy to enter text into a GUI (GNOME accessibility, WINE w/Dragon NaturallySpeaking, whatever).

    Find a subset of words that are short, easy to remember, easy to say, and above all -- accurately translated by the chosen voice recognition software.

    Then create a small Perl script that can take this coded input and convert it into a nicely formatted chunk of code.

    You can have different translators for different target languages... for example

    In shell programming, you might have the following:

    hash -> #
    bang -> !
    pipe -> |
    test -> [
    end test -> ]
    mark -> '
    quote -> "
    end mark/quote (keeps them balanced for shell scripts)
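
    A rough sketch of that substitution table, in Python rather than Perl just for illustration. The table comes from the list above; the function name and the crude one-space-per-token output are my own assumptions, and anything not in the table is passed through untouched.

```python
# Spoken-word-to-symbol table, copied from the list above.
SYMBOLS = {
    "hash": "#",
    "bang": "!",
    "pipe": "|",
    "test": "[",
    "mark": "'",
    "quote": '"',
}
# Two-word closers: "end test", "end mark", "end quote".
CLOSERS = {"test": "]", "mark": "'", "quote": '"'}


def translate(spoken: str) -> str:
    """Replace known spoken tokens with symbols; pass other words through."""
    words = spoken.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word == "end" and nxt in CLOSERS:
            # "end X" consumes two words and emits the closing symbol.
            out.append(CLOSERS[nxt])
            i += 2
        else:
            out.append(SYMBOLS.get(word, word))
            i += 1
    return " ".join(out)
```

    A real translator would also have to decide where spaces belong (no space after `#!`, for instance); this one only shows the lookup step.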

    For identifiers... don't name them. For example, let's say you wanted to do this:

    #!/bin/bash
    function hello_lcase {
        HELLO=$1
        if [ -z "$HELLO" ] ; then
            echo "Hello world"
        else
            echo -n "Hello from "
            echo $HELLO | sed -e 's/.*/\L\0/'
        fi
    }

    you would say:


    hash bang slash bin slash bash
    new function 1
    set local 1 ref in 1
    if test empty ref local 1 end test
    then
    echo string 1
    else
    echo option n string 2
    echo ref local 1 pipe program s e d option e space
    mark s slash dot star slash back upper l back 0 slash end mark
    end if
    end function 1


    you'd run the Perl script and it'd ask you:


    what do you want to call function 1: hello_lcase
    what do you want to call local variable 1 in function 1: HELLO
    what do you want to use for string resource 1: Hello world
    what do you want to use for string resource 2: Hello from

    and it'd output the script (maybe after running through indent)

    You could substitute "1" for any easily recalled mnemonic or symbol the speech->text translator is unlikely to mistranslate (in this case "foo" and "hello" would probably be fine as is)
    Then you'd get a chance to globally "refactor" your symbols and give them nice-looking names, only having to type them once.
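
    The placeholder step could look something like the sketch below (Python here, though the post suggests Perl). The placeholder grammar ("function N", "local N", "string N") is read off the example above; the function names are made up, and a real script would collect the names with prompts instead of taking a dict.

```python
import re

# Matches placeholders such as "function 1" or "local 2" in the dictated text.
PLACEHOLDER = re.compile(r"\b(function|local|string) (\d+)\b")


def find_placeholders(dictated: str) -> list[str]:
    """Return each distinct placeholder in order of first appearance."""
    seen = []
    for match in PLACEHOLDER.finditer(dictated):
        key = match.group(0)
        if key not in seen:
            seen.append(key)
    return seen


def apply_names(dictated: str, names: dict[str, str]) -> str:
    """Substitute chosen names; placeholders without a name stay as-is."""
    return PLACEHOLDER.sub(lambda m: names.get(m.group(0), m.group(0)), dictated)
```

    The list from `find_placeholders` is what you would be prompted about, once per symbol, and `apply_names` does the global rename in one pass.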

    --
    THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
    1. Re:A thought for coding via voice: by Anonymous Coward · · Score: 0

      I don't like that part with adding identifiers afterwards. That is nice as long as I have a pre-written listing before my eyes, but if I were to make it up in my mind and recite it "by heart", it would be difficult, like playing chess without board and pieces.

    2. Re:A thought for coding via voice: by Ayanami+Rei · · Score: 1

      Well, again you could use stand-in identifiers. You could leave some of them as is. I suspect the hardest part of dictated coding is reciting things like LD_LIBRARY_PATH over and over, so it's critical that for the duration of your dictation you can use some other term, or a number (if you just need a temporary local, that'd be easier).

      So instead you might have:

      new function shrink
      set local hello ref in 1
      if test empty ref local hello end test
      ...

      do you want to rename function "shrink": hello_lower
      do you want to rename local variable "hello": (carriage return)

      or perhaps:

      new function ident hello lower end ident
      which would optionally prompt:
      what do you want to call function "hello lower": hello_lower

      or for an extreme example:

      set explicit upper l d under word library under word path end explicit slash o p t slash c s w slash l i b colon ref explicit upper l d under word library under word path end explicit
      (for LD_LIBRARY_PATH="/opt/csw/lib":$LD_LIBRARY_PATH ... yeeesh)
      contrast:
      set ID library path end ID string ID blast wave lib end ID colon ref ID library path end ID
      and cue appropriate prompts
      The thing is you have to be able to read what the dictation software outputs into your text editor and be able to work with it... so it's probably better to have stand-ins than to try to educate the voice->text program about your commonly used identifiers (although it would be nice to preload it with some common idioms)
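
      For the "explicit ... end explicit" form, here is one guess at how the tokens in between could be turned into an identifier. The rules are assumptions read off the single LD_LIBRARY_PATH example: "upper" stays on for the rest of the identifier, "under" is an underscore, "word" just marks that a whole word follows, and "lower" is a hypothetical counterpart that the post never mentions.

```python
def spell_identifier(spoken: str) -> str:
    """Build an identifier from the tokens between 'explicit' and 'end explicit'."""
    upper = False
    parts = []
    for token in spoken.split():
        if token == "upper":
            upper = True          # assumption: stays on until "lower"
        elif token == "lower":
            upper = False         # hypothetical counterpart, not in the post
        elif token == "under":
            parts.append("_")
        elif token == "word":
            continue              # marker only; the next token is the word itself
        else:
            parts.append(token.upper() if upper else token)
    return "".join(parts)
```

      With these rules a word that is literally "under" or "upper" cannot be spelled, which is one more argument for the stand-in approach.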

      --
      THIS THING CAN TURN ON A DIME, MACROSSZERO STYLE ALSO FUCK BETA, ~NYORON
  28. More than just the technical aspect... by Myrano · · Score: 2, Interesting

    I just did a presentation on speech recognition software for the Office of Disabilities Services at my school, and since I see that you have a lot of response on the technical aspect of it, I'd like to bring up something else: how speaking to the computer affects *you*. One of the things that most surprised me about using speech recognition is how speaking comes from a different part of the brain than typing. Composition through speech is *very* difficult to start; don't think you're going to just dive in and compose an essay or report right off the top: even if the computer can understand you, you won't be able to coherently phrase your thoughts in a truly professional manner. Speech recognition is at its best when used for email and (ironically, at least I thought) instant messaging, because these forms of communication most accurately mimic speech. I don't know how it's going to affect coding, though, 'cause I wasn't brave enough to try that (but I can only imagine it would be difficult). I just wanted to offer a slightly different perspective on it. It certainly seemed like I was using different neural pathways or *something*, so just remember: as much as you're going to be training the speech recognition profile, you're going to be training yourself, as well!

  29. Languages-U can tune a piano,but U can't tuna fish by Anonymous Coward · · Score: 0

    The speech recognition out there may not be useful enough to work well under current CLI environments. But why do we need to use the same old CLI, and the same old programming languages? Why not work instead on developing programming languages and programming environments that are better tuned for use with input by speech recognition?

    When GUI interfaces first appeared, their design paradigm was a distinct departure from the old modal interfaces that had dominated before the GUI. Modality never went away. It is the right tool for certain problems. GUI interfaces, GUI frameworks, and GUI applications are the right tool for other problem sets. Why not re-frame programming frameworks, application design, and administrative frameworks again for use with input by speech recognition?

  30. VoiceCode by Stranger+Than+Fictio · · Score: 2, Informative

    Don't get too discouraged by the large number of commenters who haven't used speech recognition or who don't understand why someone might need to lay off the keyboard for a while. I wrote 100k lines of C++ code hands-free for my astronomy thesis over the course of two years, using speech recognition software that is now about 10 years out-of-date. There have been significant improvements in both the speech recognition technology and tools for coding by voice since then. For coding, take a look at the VoiceCode project at http://voicecode.iit.nrc.ca/VoiceCode/public/ywiki.cgi For other tools/approaches to coding by voice, see also the VoiceCoder group at Yahoo Groups: http://groups.yahoo.com/group/VoiceCoder/ I don't know of any open-source or non-commercial dictation software which matches the accuracy and ease-of-use of Dragon NaturallySpeaking (fair warning - I work for Nuance, which makes Natspeak, though I was a user long before I became an employee). Natspeak is only available for MS Windows, but you can always put a Windows box on your desk and connect to a Unix host via an X server (Exceed, X-Win32). That generally works well for command-line stuff, not so great for GUIs (but you say you prefer command-lines anyway).

    1. Re:VoiceCode by lpq · · Score: 3, Informative

      I tried Dragon NaturallySpeaking (costing ~ $500 or $600 at the time) and gave up because of the training problems and low recognition rates. Then I tried IBM's ViaVoice Professional, USB-Pro -- with digital signal processing in an included microphone and a digital connection to my computer. With a 1 paragraph training session, it was already over 95% and improving over Draggin'. It was easier to train, and you could train it on the text you were typing -- i.e. it was able to learn from corrections and merge them back into your voice profile.

      Unfortunately, IBM released it in 2001-2002, then forgot about it. They've since gone onto their non-training voice recognition solutions for sale to businesses. They seem to have advanced, but not in any retail product.

      Dragon has come out with updates, but from people who have used and trained on *both*, ViaVoice has higher accuracy (~1% difference). The ViaVoice product price has fallen, and Dragon has, of course, gone up....

      Whatever product you get, get a fast 2+CPU machine with lots of RAM - 2GB or more. The ViaVoice algorithm adapts to your talking speed -- it will perform more looks and comparisons and have greater accuracy as the processor speed goes up. ViaVoice stops comparing when it runs out of time (your speaking has gotten too far ahead). But it listens to the words, in context, to determine spelling. The more memory it has, the more vocabulary it can pull into memory. Note -- I am saying get a dual-cpu (or dual core) machine, the faster the better.

      ViaVoice was also released on Linux, but without as much application support.

      For coding support in voice products -- there just hasn't been enough demand.

      But for "wrist support" -- try a multi-faceted approach. Maybe voice recognition, maybe a tablet for input? Ergo keyboards, trackballs? It's not a comfy field. There isn't a great financial incentive to develop voice input for coding when you can hire foreigners for peanuts, and keep having eager generations of new hackers to come and be sacrificial lambs on the keyboards of progress...;-)

    2. Re:VoiceCode by Anonymous Coward · · Score: 0

      I wrote 100k lines of C++ code hands-free for my astronomy thesis over the course of two years

      If you coded for 40 hours a week and 50 weeks a year, that's 4000 hours over the two years, or a line of code roughly every 2.4 minutes. I really doubt you could pump out that much code for 2 years straight.

  31. Why is this modded down? by Ubernurd · · Score: 1

    Parent makes a good point and presents an alternative to the MS speech SDK as the submitter asked. Is it because he said "wine shit"?

    Had he said "with no need for wine" would you have modded him down?

    Even if he said "wine is shit", just because you don't agree with an ON TOPIC, informative post is no reason to mod it down. Read the moderator guidelines.

    You can mod this down too if you like.

    --
    Stack overflow: pid 352258, proc httpd, addr 0x11f7ffff0, pc 0x12000195c Segmentation fault (core dumped)
    1. Re:Why is this modded down? by Kawahee · · Score: 1

      It was done automatically. Check out his profile. I thought it peculiar because it was actually quite a useful post.

      --
      I'll subscribe to Slashdot when I see a month without a dupe, a typo, or an article the "editors" didn't read.
    2. Re:Why is this modded down? by Seraphim_72 · · Score: 1
      OK, I don't get it. Is it the real Miguel? If so, why the auto mod?

      Sera

      --
      Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    3. Re:Why is this modded down? by magetoo · · Score: 1
      Probably not.
      User page:
      "i play a pivotal role in a grand conspiracy to cripple the free software movement from within, by covertly embedding an unnecessary, yet seductively useful, patented technology in the very heart of the linux operating system's second most popular desktop environment."
      http://slashdot.org/comments.pl?sid=164643&cid=13745635:
      "Miguel De Icasa and Ximian/Mono people *know* this full well but don't want to admit how dangerous mono adoption is for the gnome community."
  32. the bonfire analogy by nido · · Score: 1
    While your suggestions are just fine, they're worthless to someone who has a real structural problem. I wrote the following in an email to my Cranial Osteopath when requesting an appointment sooner than 3 weeks out.

    During one of our early sessions (about a year ago, for me it seems like quite a long time), you said something about about the work being something like "peeling an onion" [in that the "trauma"/"lesions" comes off in layers]. I like the analogy, but I've settled on something slightly different for my own personal case.

    My body's condition is something like a bonfire. Big logs of firewood were set at birth. Bumps & falls added to the pile, and another load of combustibles were added when I sustained the head injury almost 8 years ago. Even carrying around this big load, I was relatively stable. Then I went off to college, and the laptop & all-nighters on projects were the sparks that set the stack on fire.

    Osteopathic treatment is like removing the wood from the fire. Fire Fighter [Doctor Osteopathic] comes along, "let's see what we can get at today," and pulls out the biggest pieces of firewood that are accessible. The fire goes into hibernation for a bit, until I go and stir it up again with the computer/whatever.

    At first I didn't notice much of anything [from the osteopathic treatments]. But no one had ever recognized the fire before, so I stuck with it, and after a while, "neat - the fire is starting to go down..."

    My little status reports are biased towards optimism - "Fire's down by 5% this week!" But the 2 or 3 weeks between appointments still drag, with the fire fully flared up after a week or so.

    I've gone & poured liquid oxygen on my little 10% pile of firewood [the time before last he said the level of trauma in my body was 10% of what it was a year ago, when I started treatment with him] over the last couple of days, and it's all flared up again, with something new this time - a distinct pressure on the left side of C1 [1st cervical vertebra], I think. And ... "T1" [1st thoracic vertebra] has been clunking real good...


    After my last appointment, I tried to avoid the internet, but I went and checked my email and before I knew it I'd spent 6 or 8 hours on the computer. Arms all inflamed, etc etc.

    The writer who asked the question has a bonfire going in his arms/shoulders/wherever. Most of the usual suggestions for "RSI" won't do anything to deal with the "stack of wood" (trauma/structural misalignments/etc) that creates the conditions for the bonfire to thrive.

    Also see my comments in this thread (be sure to follow the link to my comments in an earlier story), and perhaps others I've made in the last year (buy a subscription).
    --
    Learn the rules so you know how to break them properly.
    www.teslabox.com
    1. Re:the bonfire analogy by Lars512 · · Score: 1

      My advice only really applies to someone who's still typing and working on a normal keyboard. If you're unable to do even that for short periods (as in the case of my fiancée), then you need serious medical treatment, and I agree voice recognition is the way to go. She uses a tablet, which lets her do a little computer work, a year after the initial flare-up. In your case, spending 6 or 8 hours is clearly destructive when the problem is already so bad. I still think a keyboard monitor that kicks you off regularly (every 20-30 mins) would give you an interruption, a break in which to notice that your arms have actually started hurting again and that you should stop. Since it helps prevent the long destructive stints, you should find you get a little more computer time out of your arms in the long term. Perhaps just enough to not be disadvantaged in your chosen (preferably non-computer-based) employment.

    2. Re:the bonfire analogy by nido · · Score: 1

      ah, but the thing is, the discomfort's there from the moment I sit down. It started 7.5 years ago, in my first semester at college. I learned to ignore it, and while it does get worse if I don't take a break, the difference between taking a break and not is nearly imperceptible. Your suggestions might have done something for me then, but based on what I've learned since I started with the Osteopathic treatment program, what I've been through these last few years was inevitable.

      But I'm getting better now. The difference between tonight & 4.5 years ago (when I was still in College) is amazing. Before I was miserable after 30 minutes, now it takes 6 hours to really screw me up. Which is why I pimp Osteopathic care every chance I get.

      It seems like your fiancée would benefit from a consult with a capable Cranial Osteopath, or "biodynamic" cranio-sacral therapist. At least pick up a copy of Andrew Weil's Spontaneous Healing and read chapter 2. :)

      --
      Learn the rules so you know how to break them properly.
      www.teslabox.com
  33. speech recognition for Linux by belmolis · · Score: 2, Informative

    For something that runs on Linux directly, you might have a look at the Accessible Speech Recognition Technology software. It's a research project, not a polished system, but you might be able to hack it to do what you need.

    1. Re:speech recognition for Linux by belmolis · · Score: 1

      Sorry, I screwed up the URL above. Here it is: Accessible Speech Recognition Technology software.

  34. Terminology by 6031769 · · Score: 1

    In the UK the terms are different. Over here the process to which you are referring (having a computer hear, understand and interpret words being spoken to it) is called "Speech Recognition". This process is very tough because the machine needs to be trained by the individual doing the speaking - there are differences in dialect, accent, timbre, pitch, all kinds of voice attributes which can throw the machine off course.

    OTOH, "Voice Recognition" is used to describe the process of taking a voice sample and comparing it with previously analysed voice samples to provide a means of authentication. The very attributes listed above which make automated speech recognition so difficult make voice recognition possible.

    To answer your question: speech recognition is still way too flaky to be used in something which requires such precision as systems management, but the casual end user can probably get away with it. Since you use the CLI a lot you should find that any sort of speech-based input is really going to slow you down.

    --
    Burns: We're building a casino!
    McAllister: Arrr. Give me 5 minutes.
    1. Re:Terminology by PatrickThomson · · Score: 1

      I'm looking for something like this actually. Dead simple to set up, requires "training". Prerecorded samples which, if detected, run certain shell scripts. Gimmicky, like "computer, pause music".
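
      The "phrase detected, run a script" half of that is the easy part. A bare-bones sketch, assuming some recognizer has already handed you the phrase as text; the phrases and script paths below are invented for illustration.

```python
import shlex
import subprocess
from typing import Optional

# Hypothetical phrase-to-command table; the script paths are made up.
COMMANDS = {
    "computer pause music": "/usr/local/bin/pause-music.sh",
    "computer new mail": "/usr/local/bin/check-mail.sh",
}


def normalize(phrase: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in phrase.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def lookup(phrase: str) -> Optional[list]:
    """Return the argv for a recognized phrase, or None if unknown."""
    command = COMMANDS.get(normalize(phrase))
    return shlex.split(command) if command else None


def dispatch(phrase: str) -> bool:
    """Run the command for a phrase; report whether anything was run."""
    argv = lookup(phrase)
    if argv is None:
        return False
    subprocess.run(argv, check=False)
    return True
```

      Exact matching like this only works because the vocabulary is tiny; that is also why this kind of command-and-control is so much more forgiving than dictation.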

      --
      I am one of many. My idea is not unique, nor do I expect my voice alone to sway you. I speak in a chorus of opinion.
    2. Re:Terminology by Eideewt · · Score: 1

      If you're on Linux, try out cvoicecontrol. I just ran across it yesterday (thanks to this thread), and it seems to work pretty well. I've only just tested it, but setting it up is very simple and it seems accurate enough.

  35. I doubt that's supported by sentientbrendan · · Score: 1

    I seriously doubt that WINE supports the speech API. Last I heard only the most commonly used elements of win32 were supported. Maybe someone more familiar with WINE would care to comment.

  36. xvoice by TheRealDamion · · Score: 3, Interesting

    xvoice is a gtk1 X application which uses IBM's ViaVoice engine to provide voice control and dictation support to arbitrary X applications. xvoice.sf.net is the URL. The mailing list mainly covers issues of getting the ViaVoice libs working on modern distributions. The last release of VV was around the glibc2.0/2.1 era and most new ld.so's will struggle to execute the libraries and Java dependencies. It's also fairly hard to buy a copy of VV 2nd hand anywhere and IBM appear to ignore any request to release it.

    However once you get past all of these issues (actually even running the old gtk1 xvoice becomes hard on modern dists), it works a charm. As it's X clean, you can X to any X server, be it one run under OSX or Windows, or a Sun SPARC box. You just need the mic connected to the x86 Linux box the client runs on.

    This meets your requirement for editing in vim etc. The accuracy, I found was fantastic.

  37. Using speech recognition for anything... by 4D6963 · · Score: 1

    I'll Ike's peach recognition all hot. Hits deaf finite Lisa free king she tune awe wad aim in. Soon Hampshire wheel used at took converse hate tuna Delhi basis. Him a gin bee ink ape able loft haul King Kong chats norm ally a Zeus aim thai mass Jah king of. the hill Harry you sand dumb harass Sing Sing whool after behaving the peep hole gay thing York raze him ownings trains crypted, dead Bill Ike wad deaf hock.

    --
    You just got troll'd!
  38. Coming from speech recognition by obarel · · Score: 2, Informative

    It's possible to recognize speech pretty well (and no, the ridiculous examples of "I'll Ike's peach recognition all hot" don't really happen for any reasonable engine that uses language models, and most of them do these days).

    The main problem is that no one actually speaks or writes as eloquently as speech recognition demos make it look.

    Try this experiment: map backspace, delete and arrow keys to @ and try to write a letter or some code. You'll quickly give up. When you see demos of speech recognition, you never hear someone saying "Yesterday I went to the cinema. umm Monday actually. Ha, look the computer is still writing. Oh boy... delete delete delete delete ... delete ... delete ... replace Yesterday with Monday" (while it's possible to recognize "replace X with Y", you still have to be pretty focused not to say anything else).
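
    To make the "replace X with Y" point concrete, here is a toy dictation buffer. It assumes perfect recognition and a single made-up command pattern; everything that does not match the pattern is treated as dictation, which is exactly how "umm Monday actually" ends up in your letter.

```python
import re

# The only command this toy understands: "replace <old> with <new>".
COMMAND = re.compile(r"^replace (.+?) with (.+)$", re.IGNORECASE)


def apply_utterance(buffer: str, utterance: str) -> str:
    """Apply a 'replace X with Y' command, or append the utterance as dictation."""
    text = utterance.strip()
    match = COMMAND.match(text)
    if match:
        old, new = match.group(1), match.group(2)
        if old in buffer:
            # Replace only the most recent occurrence.
            head, _, tail = buffer.rpartition(old)
            return head + new + tail
        return buffer  # nothing to replace; a real system would have to ask
    return (buffer + " " + text).strip()
```

    Note the ambiguity it cannot resolve: if you actually wanted to dictate the words "replace butter with oil", the buffer gets edited instead.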

    The missing bit is the intelligent dialogue that is redundant when you type. When you type, you have arrow keys, control keys and backspace. When you talk, these things are part of the communication, and writing an intelligent dialogue system is not trivial. If you want another experiment for the limits of speech recognition, just try whatever you want a computer to do with a real person. Try to dictate code to someone, and you'll soon find that it's not that simple. A person can even ask at the right time "empty brackets?" after you say a function name followed by a semi-colon, yet it's still very difficult to dictate code (or even a letter without any corrections).

    There is another problem: Imagine that you type away, and suddenly you see that you've forgotten a semi-colon. The natural way to get back to it is to say "up" or "left". But as you're writing a game, you have the constants UP, DOWN, LEFT and RIGHT, so those words are now ambiguous. Hmmmmm.... Now you have to change your code (or the code you've downloaded) to suit the interface. Not good. Another option would be "missing semi-colon at the end of the line beginning with strcpy", but you need a very intelligent dialogue system for that.

    Note: I've assumed that the recognition is perfect (and the problem is with our brains), but of course it isn't.

  39. Not much out there by Anonymous Coward · · Score: 0

    Some early post suggested the MS Speech API/SDK. There is that, and for general correspondence (i.e., nothing in particular), it's pretty good.

    There's also Sphinx from Carnegie Mellon. I think IBM also recently released their source code for one of their engines (not ViaVoice, though).

    You still might be able to get Dragon for Windows, but I don't think there is a *nix version. I could be wrong... after all, Dragon created a speech recognition language library for Klingon, so they might have stuff for *nix and coders.

    Philips might have binaries available for *nix. I'm not sure if it's available for single users or if you have to pay licence-until-grave fees. I'm also pretty sure there isn't a coder's language library.

    Coding would actually be an excellent language library: there is a relatively limited vocabulary for coding and it occurs in reasonably predictable environment. But, there isn't a demand yet.

    You will have to hack around whatever solution you get. The first hack will be to get it to cooperate in Windows and in *nix. The next hack will be teaching it "coders language". I don't think either is trivial.

    Or, you could spend less time at the keyboard.

    Good luck!

  40. I had the same problem -- though mild by wonkavader · · Score: 1

    Yep, stop playing games, and get right with your keyboard. That means GET A GOOD CHAIR. A good steno chair, which will set you back $200. No arms. Back support.

    Mouse as little as possible.

    Anti-inflammatories are your friend. Much of what's happening to your hands is your own body's doing.

    You need to be able to heal over night as much as you do damage during the day. Do .01% more damage each day than you heal, and someday you simply won't have hands. So heal as much or more each night, and slowly you'll get better.

    Eat a little more amino acids or protein or whatever your doctor says gives you easy access to building blocks for healing.

    One deal here is that you're scared. You're in danger of losing your job/life's vocation etc. Fear causes stress, and stress SLOWS DOWN THE HEALING PROCESS.

    So relax, and have confidence that things will get better. Take a LOT of ibuprofen, and have a liver function test if you wind up doing that for months (under the supervision of a doctor).

    Yoga, meditation and other pleasant things will reduce your stress level. Embrace them.

    The first thing to realize is that you're going to be OK. You're going to do what's necessary, and it'll be less than you imagine. It's just going to take some time. After all, if your healing process was VASTLY (say 5%) worse than your damaging process, you'd have been laid up after a few weeks of typing. So you don't have to make many changes for the good to eliminate this problem in a few months.

  41. Low Back Shibboleths by SeanDuggan · · Score: 1

    Keep in mind that variants and dialects of English can vary quite a bit, and the book itself says some speakers may be missing a few of the phonemes.
    With, of course, the classic case of cot, caught, and bother, which are defined with three different phonemes, but where the average person in the US uses only two of them, depending on region.

    --
    This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
  42. An itch to scratch by booch · · Score: 1
    I think the biggest problem with speech as an input for techies is that the software itself has not yet been written. While there may be recognition software that can comprehend speech at normal speed and append its dictionary as it runs, there's none that I know of that has been set up to function in a technical environment. It may be as simple as putting the pieces together, but it would probably require a lot of hacking on your own. The second biggest problem would be wearing out your voice, although that's something you can work with.
    This sounds like an itch that somebody in such a situation would want to scratch. I think a lot of the basic speech recognition technology is out there. What would be really cool would be adapting it to an IDE environment. I can imagine setting it up so that it creates a dictionary of variables in use (like ctags), that you would teach it how to spell, then associate the spelling with a spoken word.
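
    The ctags-like dictionary is easy enough to sketch. This one scans source text for identifiers and keys each by a guessed "spoken form" (split on underscores and camelCase, lowercased). The regexes and function names are my own assumptions, and the hard part, teaching a recognizer those spoken forms, is not shown.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase / PascalCase / digit runs into separate words.
WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def spoken_form(identifier: str) -> str:
    """Turn 'LD_LIBRARY_PATH' or 'helloLower' into lowercase spoken words."""
    words = []
    for chunk in identifier.split("_"):
        words.extend(w.lower() for w in WORD.findall(chunk))
    return " ".join(words)


def build_dictionary(source: str) -> dict:
    """Map each spoken form to the first identifier that produces it."""
    table = {}
    for name in IDENTIFIER.findall(source):
        table.setdefault(spoken_form(name), name)
    return table
```

    So saying "library path"-style phrases would look up the exact spelling, and you never have to dictate the underscores. Collisions (two identifiers with the same spoken form) would need a prompt.
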
    --
    Software sucks. Open Source sucks less.
    1. Re:An itch to scratch by Eideewt · · Score: 1

      I agree. I'd like to do it myself (just because it's cool), but my schedule and mediocre programming skills may be a bit of a problem.

  43. Voice Recognition may lead to RSI in Vocal Cords by zark22 · · Score: 1

    A few years ago (ahem... 1996-1997) when VR became the big business buzz I was tasked with implementing a pilot project for a large government organization. The goal was to get rid of stenographers and have extremely-highly-paid analysts do their own bottom-of-the-pay-scale transcription. Keep in mind it was government -- which meant that most of the people in these positions were 50+ and many had never learned how to type, and only started using computers because my prior big project forced them to.

    Almost everyone abandoned the VR for various reasons... out of over 250 users, I think only 2 used the systems for more than a month or two. The accuracy was within spec (around 95% after training), but in a page there would still be 15-20 errors that needed proof-reading and correction. Never mind the speed issues as well as background noise, answering the telephone etc.
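
    Reading the 95% as per-word accuracy, the 15-20 errors per page is just arithmetic. The page length of 300 to 400 words is an assumption on my part, not something from the project.

```python
def expected_errors(words: int, accuracy: float) -> float:
    """Expected number of misrecognized words at a given per-word accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words * (1.0 - accuracy)


# Assumed page lengths; at 95% these come out at roughly 15 and 20 errors.
for page_words in (300, 400):
    print(page_words, round(expected_errors(page_words, 0.95), 1))
```

    Even 99% would still leave a few errors on every page, each of which has to be found by proof-reading.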

    The main point of this post is that several people started to get problems with their voice -- sore throats and loss of voice -- after only a short while using VR. This was particularly alarming given that it was specifically targeted at one of these people due to RSI in her hands from typing.

    The reason given was that in order for the computer to be accurate, people were forced to speak in a measured monotone, quite different from regular speech or using a dictaphone. This stressed people's vocal cords, some to the point where they suffered temporary voice loss for several days.

    My advice: vary your speaking tone, don't dictate for more than say 30 minutes straight before taking a break, and drink lots of water. It would really suck not being able to type or speak.

  44. It's sad that voice control has been deemphasized. by Richard+Steiner · · Score: 1

    Some folks might remember that OS/2 Warp 4 (September 1996) was released with both IBM voice navigation and IBM voice dictation technology as part of the standard package.

    The initial product package even included a headset microphone in the box.

    Not many people used it, and at that point in time it required some initial training to use in an effective manner (it had to learn each person's pronunciation habits), but there were still a few folks I knew that got a lot of mileage out of the technology at the time.

    I wonder why industry focus has fallen away from such tech? Is it that useless in the real world?

    --
    Mainframe/UNIX Bit Twiddler and long time Windows/Linux Hobbyist.
    The Theorem Theorem: If If, Then Then.
  45. SpeechLion by rbrewer123 · · Score: 2, Interesting
    I have a small project based on sphinx4 that allows command and control of Linux. It is really not ready for primetime yet, but help and feedback is appreciated. I have looked into dictation (for email) with sphinx4 but have not implemented it yet.

    http://freshmeat.net/projects/speechlion

  46. Plug for Dragon by tengu1sd · · Score: 1

    Dragon version 8 made major improvements in recognition. The Preferred version will read out loud. My wife has neck and shoulder problems; Dragon allows her to use a computer reasonably well. They have ratings for different microphones, and I sprang for a 5-Dragon USB mic. It doesn't make sense to cheap out after the software's installed. We got an upgrade offer after giving up on Dragon many versions ago. The version 8 release is actually worthwhile.

  47. Dragon Naturally Speaking by Anonymous Coward · · Score: 0

    Each time speech recognition gets mentioned on /. I see comments about how it doesn't work, or comments on ViaVoice. I am continually amazed at the lack of comments on DNS. DNS8 is a fantastic product with full command and control, auto-punctuation, and almost no training needed. The accuracy is so high it's hard to demo the slick correction tools; you really need to slur your speech to force it to mis-recognize. Unfortunately there is no Mac or Linux support. If you want it, please let the management know at http://www.nuance.com/naturallyspeaking/support/ . No work will be done unless there is a business case for it.

    Also watch for the DNS9 beta starting soon.