Posted by ryuzaki0 from the voice-command dept.
malacai writes "IBM has announced that ViaVoice will be available for Linux." Excellent: IBM does another good thing. Has anyone played around with ViaVoice much? I'm interested in potentially using it once my wrists fall apart.
Touchy feelies with ViaVoice
by Anonymous Coward · Score: 2
I've used ViaVoice under Win32. To get good recognition you need to do the full training exercise, which consists of reading about an hour of text: Mark Twain if you're a 'Merkin, or Alice in Wonderland if you're a Brit (or an Australian; apparently IBM is yet to realise there is a difference ;). Recognition is pretty good; you can use it effectively for prose. I found you can talk faster than you can type (and I can type pretty fast). Typos do creep in and these are frustrating. This sort of software has a long way to go, but conversely it's already come a long way! One bitch with ViaVoice: for some reason it slowed down MS Word, adding maybe 10 seconds to the MS Word startup (even if you're not using the MS Word support, which is painfully slow anyway). WordPad is the best way to enter text, but it lacks a multi-level undo feature! And yep, it's easy to delete half an hour's work. SAVE. SAVE. SAVE. :)
The PROBLEM with Voice Recognition
by whoop · Score: 2
I bought a year-old copy of ViaVoice for like $10 or $15 recently. It was able to handle natural speech just fine. You may be thinking of an earlier product line, VoiceType maybe.
But any voice recognition program for Linux should come with some sort of SDK so we can then make macros/scripts to interface with any program. If a company provides us with a decent shell, I'll be more than willing to help out and develop some of these interfaces.
Most speech control apps work best when they're integrated into the UI, or are at least able to interact with it in some way. Anyone know what plans Motif/GTK/QT and/or the X Consortium have to provide hooks for speech recognition integration into X?
I used ViaVoice a while back, and was impressed by the accuracy. Speed sucked, but then I was running on a machine significantly below their minimum spec, and had to wait for it to catch up every now and then.
Basically, ViaVoice is an excellent product, and is pretty useful for dictating documents in human languages. Naturally, it's hopeless for coding or entering commands at a shell prompt, but that's more because speech will never be a natural way to communicate stuff like that than because of any failing in ViaVoice. As others have mentioned, it could prove useful for X10 automation, though.
-- "The invisible and the non-existent look very much alike." -- Delos B. McKown
I have seen this mentioned, but I want to ask a direct question. Does the design of GTK facilitate speech recognition integration?
If we could get ViaVoice (or any other speech recognition software) to interface well with the GTK toolkit, you could suddenly have a huge number of speech-enabled applications, instead of having to make every application compliant (or only having to do so to make it work WELL).
Integration into the Window Manager was one of the criteria that was discussed in some essay a while back about creating a flexible UI for the future.
kvoicecontrol is pretty wicked.
by navindra · Score: 2
Launch applications (or perform any string of commands) by speaking into your mike. It works amazingly well.
For example, when I say "connect to internet", kvoicecontrol does "say connecting;/usr/local/bin/nconnect". 'say' is a cheesy speech synthesis program and 'nconnect' is a script that controls X-ISP remotely. Pretty nifty.
My only beef with kvoicecontrol right now is that it monopolises (sp?) my sound card even though the AWE64 is full duplex. Fortunately all I have to do is right click on the docked kvoicecontrol to disable it.
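The phrase-to-command idea described above can be sketched in a few lines of Python. This is only an illustration, not how kvoicecontrol itself is implemented: the table, the function names, and the `dry_run` flag are all made up for the example, and the single entry is the command string quoted in the comment.

```python
import subprocess

# Hypothetical phrase table; the one entry mirrors the example quoted above.
COMMANDS = {
    "connect to internet": "say connecting;/usr/local/bin/nconnect",
}

def normalize(phrase):
    """Lower-case and collapse whitespace so small variations still match."""
    return " ".join(phrase.lower().split())

def lookup(phrase, table=COMMANDS):
    """Return the shell command bound to a recognized phrase, or None."""
    return table.get(normalize(phrase))

def run(phrase, table=COMMANDS, dry_run=True):
    """Look up a phrase and run its command through the shell.

    With dry_run=True (the default here) the command is only returned,
    never executed.
    """
    command = lookup(phrase, table)
    if command is not None and not dry_run:
        subprocess.run(command, shell=True, check=False)
    return command
```

A recognizer would call `run()` with whatever text it produced; unknown phrases simply return `None`.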
ViaVoice is an excellent product (at least under Win32). Sometimes it amazes me how well it understands what I dictate; other times it plainly has no clue. In general it's very good if you have time to go back and correct whatever it has written. It is not suitable as a complete replacement for typing, since it expects you to dictate in a natural voice (e.g. infrequent stops/pauses between words). Telepathic speech isn't understood clearly by the engine. You would not be able to use this efficiently at a bash prompt or for coding.
I suppose if you wanted to write your own grammar (which is possible with Win32 tools right now), you might be able to make a C or a Perl grammar, but moving around the code would be painful.
Hopefully ViaVoice will integrate with most applications easily, as it does under Win32. Currently, you can speak to whatever textbox has focus under Win32, and if developers use the free SDK, more functionality (e.g. FONT BOLD ON) could be added to programs.
I don't expect WordPerfect to support ViaVoice, since they already seem to have a contract with Dragon Systems.
-- / \
\ / ASCII ribbon campaign for peace
x
/ \
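To make the "write your own grammar" and FONT BOLD ON remarks above a little more concrete, here is a toy command grammar in Python. It has nothing to do with the actual ViaVoice SDK, whose API is not shown in this thread; the grammar shape (FONT, then an attribute, then ON or OFF) and every name other than "FONT BOLD ON" are assumptions for illustration.

```python
# Hypothetical grammar: FONT <attribute> <state>. Only "FONT BOLD ON" appears
# in the comment above; the other words are illustrative additions.
ATTRIBUTES = {"BOLD", "ITALIC", "UNDERLINE"}
STATES = {"ON": True, "OFF": False}

def parse_command(utterance):
    """Return (attribute, enabled) if the utterance matches the grammar, else None."""
    words = utterance.upper().split()
    if len(words) != 3 or words[0] != "FONT":
        return None
    attribute, state = words[1], words[2]
    if attribute not in ATTRIBUTES or state not in STATES:
        return None
    return attribute.lower(), STATES[state]

def apply_utterances(utterances):
    """Split a sequence of utterances into dictated text and formatting changes."""
    formatting = {}
    text = []
    for utterance in utterances:
        command = parse_command(utterance)
        if command is None:
            text.append(utterance)          # ordinary dictation
        else:
            attribute, enabled = command
            formatting[attribute] = enabled  # formatting command
    return " ".join(text), formatting
```

Anything that does not match the grammar is treated as dictation, which is roughly the split between "command" and "text" that the comment describes.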
ViaVoice+Text2Speech = displayless solution!
by Tekmage · Score: 2
I like this! Not only could you run it mobile/wearable, you could literally use it right over the phone. "Ask" your home system for a piece of information, or tie it into the mail-server at work through the voice-mail system. Lots of potential.
Now, if only I could find that Linux link mentioned in the article... If someone finds it, please post a link.
-- --The more you know, the less you know.
ViaVoice: depends on the implementation
by CodeShark · Score: 4
While I am very interested in this announcement, the IBM voice technology I've worked with in Win32 (95 and NT) thus far is not sufficient for full-time use yet. I have used ViaVoice Gold for a couple of years now, and even with IBM's longest voice template "training", occasionally ViaVoice goes loopy and acts like it's dictating to itself, rather than translating from my voice. Thus I have not as yet been able to recommend the technology to my client customers.
However, the state of the art will obviously advance. Optical Character Recognition (OCR) technology four years ago was a "probable buy"; since then the accuracy has gone up and the cost has come down so much that it is now a "should buy", and any company requiring significant amounts of document translation is behind the times if it does not have at least one employee competently using OCR.
In voice recognition, IBM is definitely one of the "to market" leaders, especially in the consumer area. My thought is that the cleaner OS code in Linux may actually help IBM develop code that is much more powerful than the Win32 versions. IMHO the number one thing IBM can do to help ViaVoice succeed in the Linux arena (other than GPL'ing the code, which they probably will not do) is provide crystal clear documentation of the API and a powerful SDK to allow other programmers to develop "voice-drivable" applications. This would be similar to how IN-CUBE can be used to drive various applications from small voice commands. BTW, IN-CUBE is already available on Solaris, so maybe the Linux community can persuade CommandCorp to port their product (?)
The faster this technology develops, the better for all of us, especially the motion disabled who can use this technology as a true window to the world. The same group which produces ViaVoice also has a screen reader for the visually impaired which I would like to see in Linux as well.
Let IBM know of your interest, offer to act as a BETA tester, etc. The more we get involved in projects like these, the more quickly Linux will succeed in breaking the M$ stranglehold on the industry.
-- ...Open Source isn't the only answer -- but it's almost always a better value than the alternatives...
Didn't Gates say a few months back...
by cowbutt · Score: 2
...that "Linux would *never* get any sexy apps like voice recognition"? :)
(Never mind that kvoice was already under development.)
could be used for home automation
by ianna · Score: 3
Linux has all the potential to be the core of a home automation system... Voice control could be just one part of it...
Lots of software is already available to control X-10 devices
Speech recognition is a holy grail to many people like myself. I may be able to type 120+ WPM, but what good does that do when my hands hurt like hell the minute they touch a keyboard? That's the price we pay for years of constant computer use -- about 18 years in my case. I'd rather type, but it's getting too difficult. Anything that might save my career and my hobby is a godsend, even if it isn't open-source.
My HMO doesn't give a damn about my problems... Anyone have the download URL for this program?
VoiceType is great technology
by Shotgun · Score: 2
Once again, we have technology introduced that solves one problem, and people call it crap because it doesn't solve THEIR problem. I've used VoiceType (the predecessor [sp?] of ViaVoice) on OS/2 for several years now (it came with Warp 4). No, it wasn't any good for coding. But when you got to the documentation it was a godsend. Unfortunately, the computer is not 'intelligent' and will type what it hears. So if you pause to say 'uh' and 'hmm', it types 'uh' and 'hmm'. It's also neat to see what rustling papers say. However, if you scribble up a rough outline so that you can dictate in a semi-fluid manner, it makes for an excellent first draft system. You'll still have to go back and proofread, but not any more than you would with manual typing, and the dictation is WAY faster. (Note: try typing several paragraphs w/o hitting the backspace key.)
-- Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
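The 'uh' and 'hmm' problem mentioned above is the kind of thing a small post-processing pass over the dictated text could reduce. The sketch below is an assumption about how one might do that in Python, not a feature of VoiceType or ViaVoice; the filler list beyond 'uh' and 'hmm' is a guess.

```python
import re

# Assumed filler list: 'uh' and 'hmm' come from the comment, the rest are guesses.
FILLERS = {"uh", "um", "hmm", "er"}

def strip_fillers(text, fillers=FILLERS):
    """Remove standalone filler words from a dictated transcript."""
    kept = []
    for word in text.split():
        # Compare on the bare word so "uh," and "Hmm." are also caught.
        bare = re.sub(r"[^\w']", "", word).lower()
        if bare not in fillers:
            kept.append(word)
    return " ".join(kept)
```

Note that punctuation attached to a dropped filler goes with it, so the result still needs the proofreading pass the comment talks about.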
Sounds like great fun! :-)
by DigitalRonin · Score: 2
I'm sure I remember hearing about someone doing exactly that (yelling "format cee colon backslash return") just before a demo of some voice recognition software -- wish I could remember whose it was -- anyway, according to the story, it worked! Probably boll*cks, but a good story anyway.
The PROBLEM with Voice Recognition
by Silex · Score: 3
I purchased ViaVoice from IBM (for Win32) a while ago. The IDEA behind the technology is a good one. But the problem is, it's not only slower than typing, it's twice as frustrating, and may take up to 3x the time it takes to type the same document.
WHY?
(a) Accuracy -- My copy of IBM VoiceType came with a speaker-mic combination headset (made by Andrea). The documentation says that this is ideal for use with VoiceType. So I'm not going to blame my hardware for the inaccuracy of this product. It has trouble recognizing a lot of words. I don't have an accent, so that's not the problem. There are many technical reasons why this happens... but they don't matter to the end user.
(b) Method of Speech: You can't just talk into the mic like you normally talk. You have to pause between EACH word. But you MUST NOT pause or slow down while saying A WORD. This.. is.. a.. very.. unnatural way of speaking. Sometimes you forget to pause, or sometimes you accidentally pause within multi-syllable words. This is one of the major causes of errors.
(c) Although this product DOES have support for editing the text through voice, it's quite impractical. If you want to edit text that has already been typed, or you want to format text in a certain way, you're still going to have to use the keyboard, and possibly the mouse. You will find yourself trying to work with the mouse, keyboard and (now) trying to speak in a very unnatural way to the computer as well. It's not a matter of being HARD to do, it just doesn't make sense. It's easier to just type.
I think this application is not very useful for typing large documents. What it IS useful for is giving commands to the system through voice. I'm not sure how IBM plans on integrating this with Linux, because Linux systems vary greatly between each other (unlike Windows, which has very centralized control over the system, making it easy to make calls to all kinds of programs without knowing what the program really is). But if they can pull it off... maybe get it working with xterm or something, that would be great. And if they could get it working with an IRC and/or an ICQ client, that would certainly make life easier for many of us (it would be kind of like a low-bandwidth alternative to audioconferencing... especially if you could get the IRC client to 'say' all the text as it scrolls by).
This is a good application, but the whole voice recognition deal is really over-hyped. I hope IBM plans on porting some REAL software to Linux as well.
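The idea above of getting an IRC client to 'say' the text as it scrolls by mostly comes down to turning raw protocol lines into something speakable. Here is a minimal Python sketch of that step, assuming the standard `:nick!user@host PRIVMSG target :message` line format; the function name and the "nick says" wording are made up, and actually feeding the result to a speech synthesizer is left out.

```python
def speakable(line):
    """Turn a raw IRC PRIVMSG line into a sentence for a speech synthesizer.

    Returns None for lines that are not PRIVMSG messages.
    """
    if not line.startswith(":"):
        return None
    # Split off the sender prefix, then the command, then its parameters.
    prefix, _, rest = line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    target, sep, message = params.partition(" :")
    if not sep:
        return None
    nick = prefix.split("!", 1)[0]
    return f"{nick} says: {message.rstrip()}"
```

Each returned sentence could then be handed to whatever text-to-speech program is available on the system.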
Posted by Arborius:
Most speech control apps work best when they're integrated into the UI, or are at least able to interact with it in some way. Anyone know what plans Motif/GTK/QT and/or the X Consortium have to provide hooks for speech recognition integration into X?
get kvoicecontrol here
look in metalab.unc.edu/pub/Linux/apps/sound/speech
There are a couple of speech recognition-type things there
Of all the comments I've ever posted, this is definitely one of them
Lots of software is already available to control X-10 devices:
Heyu - http://www.prado.com/~dbs/
Xtend - http://www.jabberwocky.com/software/xtend/
TKx10 - http://www.houseofhack.com/tkx10/
WebX10 - http://members.tripod.com/~famewolf/webx10/
IR control is available using http://members.home.net:80/ncherry/common/lirc-0.5.3.tgz
Now we just need someone to integrate some functions in PHP, and we can control the house via the web.
Well, if ViaVoice will provide voice control and KDE a desktop interface, what will stop world domination even in this area? :)
Marco