IBM, TrollTech Integrate Linux Voice Recognition
Paladin128 writes: "Talk about cool technology. Linux may get widespread voice recognition before Windows, as this article mentions that IBM's ViaVoice will be bundled with Qt, and allow programmers to use BNF to create parsing rules, and bind voice input directly to Qt components via Qt signals and slots. This level of integration evidently wasn't possible with Win32, thus there were performance issues. And since Qt is open source, the GNOME people could easily find a way to integrate this technology into GTK+. Between adding voice to the handicapped accessibility list, offering KDE in more languages than Windows is available in (I don't use GNOME so I can't comment on how it's doing here), and more customization than Windows can ever hope to offer (such as choice of desktops), Linux could really make some waves this year." Just don't mention "rm -rf" when you're near the microphone ...
Let's see, which QT event should I bind *sneeze* to?
Reality checks:
- it's going to take just a few more CPU cycles than you have, no matter how many you have
- it's going to take ages to get translated to all these fancy little European languages
- it's going to take longer to tweak up to usability than all those Eterm background pictures
- it's still slower than typing.
Embedded devices, yeah, but they don't have the muscle. And nobody wants to spend hours to teach their coffee machines or garage doors to listen.
We have each other to misunderstand ourselves, the machines don't really care anyway.
Anyway, where's the tarball? This is going to be soo fun. I'll just have to clean up my place so I can convince someone to come over and witness my geeky coolness once it's running...
I think, therefore thoughts exist. Ego is just an impression.
Restaurants have this system called Squirrel that lets them input orders and it maintains the bills. Two of the common complaints with the touch screen interface are that the servers, who use the system, find it unintuitive, and that they often have their hands full. This sounds like you can make a system where they just have to say "Table 13, burger with fries, philly steak no onions, 2 cokes", and it would do alright. Servers tend to have their own lingo for the meals they serve, and the system could understand them, in list format, with the exceptions. You wouldn't expect the populace to deal with such a system, but a server would pick it up quickly, and they could do it while clearing away dishes.
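To make that concrete, here's a rough Python sketch of what the back end might do once the recognizer has turned the spoken order into text. It has nothing to do with Squirrel's or ViaVoice's actual interfaces; the MENU set, the parse_order name, and the "with"/"no" modifier words are all made up for illustration.

```python
import re

# Hypothetical house lingo; a real restaurant would load its own menu.
MENU = {"burger", "philly steak", "coke", "fries"}

def parse_order(utterance):
    """Turn 'Table 13, burger with fries, 2 cokes' into a structured order."""
    parts = [p.strip().lower() for p in utterance.split(",") if p.strip()]
    if not parts:
        raise ValueError("empty order")
    table = re.fullmatch(r"table (\d+)", parts[0])
    if not table:
        raise ValueError("order must start with 'Table <number>'")
    order = {"table": int(table.group(1)), "items": []}
    for part in parts[1:]:
        qty = 1
        counted = re.match(r"(\d+)\s+(.*)", part)
        if counted:  # leading quantity, e.g. "2 cokes"
            qty, part = int(counted.group(1)), counted.group(2)
        name, mods = part, []
        mod = re.search(r"\s+(with|no)\s+(.*)$", part)
        if mod:  # exception to the standard dish, e.g. "no onions"
            name = part[:mod.start()]
            mods.append((mod.group(1), mod.group(2)))
        # Crude plural handling: "cokes" -> "coke"
        if name not in MENU and name.endswith("s") and name[:-1] in MENU:
            name = name[:-1]
        if name not in MENU:
            raise ValueError(f"unknown item: {name!r}")
        order["items"].append({"name": name, "qty": qty, "mods": mods})
    return order

order = parse_order("Table 13, burger with fries, philly steak no onions, 2 cokes")
```

The point is that a fixed list format plus a small menu vocabulary keeps the parsing trivial; all the hard work stays in the recognizer.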
-no broken link
...does this new code integration mean that Debian will have to move KDE into non-free again? I never see the licence issues with this discussed (this stuff was announced before). ViaVoice is not Free Software. So, how do they plan to solve this?
It's... It's...
"We can confirm that Debian does *not* ship the version with the trojan horse. Our version predates it." [CA-2002-28]
Yes, VR is overrated... unless you don't have any hands. Or even if you've only got one hand. Or if you've got arthritis. And so on.
It's true that VR is not much use for able-bodied coders, but it is useful for able-bodied letter writers who don't type so fast.
-- Don't Tase me, bro!
As I saw, there are a lot of comments on how useless speech recognition is for coders. Of course it is. Voice recognition is made for secretaries, writers, housewives and handicapped persons, not for coders. Of course it won't interpret your speech on the command line (imagine how you'd look to someone while dictating vi commands). The idea is to make computers easy to use for non-computer-literate persons.
Typing CTRL-B is surely quicker than saying "bold on" surely? Hitting alt-tab a couple of times is quicker than saying "mail window" surely?
;)
Maybe Emacs users would benefit, since it probably *is* quicker to say "bold" than it is to type "meta-x-embolden-text" or whatever
Emacs baiting aside, though, this is great news for a segment of the disabled market, but I really don't see the mainstream applications. Not to mention how awful a place the average open-plan office would become if voice-recognition took off...
--
For example, use the Star Trek test. They've got very powerful computers (never mind that they can be infected with weird space-borne contagions), but what do they use voice controls for? Asking questions, controlling their environment, etc. When they need to program a new subroutine for the deflector dish, though, they use the keyboard.
Which brings up another question: Has anyone done any serious investigation into context-modifiable keyboards? My understanding on Star Trek is that their keyboards change their layouts depending on who's there, and what they're doing. I've always thought something like that would be fantastic, say, for switching into Quake or a flight sim -- make your keyboard LOOK like a control panel, so you don't have to remember that "." is strafe or whatever...
As for voice control, I'd really like to be able to control house systems (see my ÜberTiVo posting under the Set Top Box thread). To say "Play 1812" and have the system start playing it for me. Or "Where's my dinner?" and have the computer tell me to cook it myself (hey, gotta be realistic here). Or to just start rambling on, stream-of-consciousness, in a rant or rave about what's really annoying or cool, so I can edit it down to a letter later. That is what I think we need, and it's more on the application side than on the OS side.
Of course, we may already have good solutions for this, I just haven't been able to play with them yet... :(
Everyone here is knocking voice recognition as useless. Well, I'm here to say that it is not.
I used OS/2. Version 4 came with a version of voice recognition, and I ran it on a 100MHz Pentium with only 32Meg RAM. It ruled in the proper place.
First, the system is good for first drafts of text documents like long reports. Don't expect to get a perfect copy the first time through. The output from the voice input will require some cleanup. But guess what, so does anything I type.
Very few people type anything close to 80wpm. I only get around 40. Voice type allowed around 100wpm. For those l337 haX0rz that can type and think that everyone should be able to...go out and see the sun every now and then!!
I would write up my report in note form, basically just outlining what I wanted to say. Anything that I had to quote got a reference to the text I would quote from. Then I locked the bedroom door to keep out noise from wife and kids, gathered my notes and references around and started talking. An hour later I had the first draft of a ten page report. I've spent 4 hours doing it manually.
You may not have a need for it, but if you're in school or any other place where you have to produce long reports and you don't type with flaming fingers, then voice input can be a real boost to productivity.
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Use your imagination here, nobody said you have to use it for dictation. I'd like to set up my own HAL-style computer with a few microphones throughout the house and program it to open xmms and play a song or control lights and possibly other appliances. Even better, allow it to take remote voice commands, so I could call my computer from my cell phone and tell it to start making coffee or run apt-get update.
In any case, there are a million and one cool things you can do with voice recognition (well, until your HAL-9001 tries to kill you and you end up dead or on another plane of evolution, whichever comes first), and I'm sure the ideas I have right now are just the tip of the iceberg.
My other
I can speak faster than I can type.
If I can have my speech recognized *accurately* then I can gain in productivity.
Real-world proof: the articles of Charles W Moore at www.macopinion.com and www.applelinks.com
He creates all his articles using iListen for Macintosh.
If I can do the same with my linux boxen, then this is a dramatic leap forward for me.
A host is a host from coast to coast, but no one uses a host that's close
You don't have to spend hours training your coffee machine.
For that, all you need is command recognition. It's orders of magnitude simpler than dictation and can be done with little or no training.
Listen to ViaVoice's recordings of what it thinks you've said when playing with its correction feature and you'll see just how hard a job transcribing complete, dictated continuous speech with a wide vocabulary really is. Even deciding where one word ends and another begins is far from simple - but that sort of problem is so much simpler with a limited vocabulary and no continuous speech requirement. Both of those restrictions are perfectly acceptable for that sort of device.
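A toy Python illustration of why a small vocabulary helps, assuming the recognizer hands back a slightly garbled text guess: with only a handful of legal commands, you can just snap the guess to the nearest one. The COMMANDS list and match_command name are invented for this sketch, and fuzzy string matching stands in for what a real engine does at the acoustic level.

```python
import difflib

# Hypothetical command set for a coffee machine / garage door controller.
COMMANDS = ["brew coffee", "open garage", "close garage", "lights on", "lights off"]

def match_command(heard, cutoff=0.6):
    """Snap a noisy transcript to the closest known command, or None."""
    guess = heard.lower().strip()
    hits = difflib.get_close_matches(guess, COMMANDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None

best = match_command("brew coffe")  # one dropped letter still resolves
```

With five possible answers there is a lot of room for error before two commands get confused; with a 60,000-word dictation vocabulary there is almost none.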
I agree about coughing and, erm, well, thinking, er about what you, er, were trying to say. I always found I needed fair presence of mind to get something readable (especially if formal) down on the page. If you think the above is exaggerated, try dictation software and you'll see what I mean.
Greg
(Inside a nuclear plant)
Aaaarrrggh! Run! The canary has mutated!
I see a number of postings here to the effect that voice recognition, especially for dictation, will be largely useless. The problem is that these postings are considering the use of voice recognition as a replacement for typing within the current crop of user interfaces.
The true power of voice recognition is not in replacing the keyboard. It comes with allowing new forms of interaction with a computer. Consider the simple task of checking the weather. Pulling up a browser and heading to weather.com is no big task, but why would I want to sit at my computer and have to do that just so I can decide how heavy a sweater I'll need for the day? Why not just ask the computer to read me the forecast while I'm getting dressed?
Many people would assume in this scenario that one would call out: "Computer, browse to h t t p colon slash slash w w w dot weather dot com. Read page." How about simply calling a script instead that does all the hard work behind the scenes? "Computer, what is the weather forecast for today?" The use of predefined grammars, as the article describes, will make such queries very reliable as they will be much easier to recognize.
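Roughly what I have in mind, sketched in Python with regular expressions standing in for the BNF rules (this is not the ViaVoice or Qt API; GRAMMAR, dispatch, and the two handlers are names I made up, and the handlers only return a string where a real script would go fetch and read the forecast):

```python
import re

def weather(day):
    # Placeholder: a real handler would fetch the forecast and speak it.
    return f"fetching forecast for {day}"

def play(title):
    # Placeholder: a real handler would start the music player.
    return f"playing {title}"

# Each rule pairs a fixed phrase pattern with the script it triggers.
GRAMMAR = [
    (re.compile(r"computer,? what is the weather forecast for (?P<day>today|tomorrow)\??"), weather),
    (re.compile(r"computer,? play (?P<title>.+)"), play),
]

def dispatch(utterance):
    """Run the handler for the first rule the whole utterance matches."""
    text = utterance.strip().lower()
    for pattern, handler in GRAMMAR:
        match = pattern.fullmatch(text)
        if match:
            return handler(**match.groupdict())
    return None  # not in the grammar: ignore rather than guess

result = dispatch("Computer, what is the weather forecast for today?")
```

Anything outside the grammar is simply dropped, which is what makes this kind of query so much more dependable than free dictation.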
This may have been a simple example, but hopefully it gets the point across. Voice recognition is not going to replace typing. As many have said, some people can type much faster than they can dictate text. Once you start considering higher level interaction with the computer, however, the situation changes, and voice recognition systems will really show their colors.
-kris
So, Windows 200(1|2|3) = OS/2 1996. Glad the Windows world is catching on. Maybe now they'll adopt a system-wide object model like OS/2 1992.
There's no "we" in team, only "me"
Okay, somewhere in there is a wise-ass comment about the usefulness of voice-recognition for porn-surfing, but I won't stoop to that level... :-)
---
"They have strategic air commands, nuclear submarines, and John Wayne. We have this"
Hacker Public Radio is our Friend
The only drawback with the OS/2 version is that it only supports discrete, not continuous, dictation. This means that you need to pause between each word. For voice navigation, that's not a problem. You also have to go through a three-hour "training" session if you want it to work well.
So before you get all excited about how Linux might beat Windows, you should not forget that OS/2 is a real competitor here.
--
And the men who hold high places must be the ones who start
To mold a new reality... closer to the heart
Most of the coders I know type anywhere from 60-140 words per minute. When coding, this measure of speed goes out the window, but it still is a fair shade faster than actually discussing what they are in the midst of coding.
Most writers I know type anywhere from 60-170 wpm. I type on the lower end of this scale, about 80-90 wpm. Again, this is significantly faster than I can comfortably speak.
When *editing* code or text, however, voice commands cannot hold a candle to a combination of mouse and keyboard commands, especially with newer trackballs and 'wheel' mice.
"Page up. Page up. Page up. Stop. No, go up. Stop! Not delete! Damnit!"