Voice-Op Linux PDA
Anonymous Coward writes "At http://www.the-times.co.uk/interface/dailyextra5.html is
news of a voice-operated Linux handheld computer to be announced at CeBit next week. Sounds cool!" Oh yeah. Until someone shouts out, "ARRR-EMMM ARRR-EFFF STAR!" Then we'll see who's laughing.
What makes the Palm or a Newton Useful?
The user space apps.
Things like the names/dates/call logging application.
And, face it, most of the apps like that under the modern Unixes need to go on a resource diet if they want to fit on a handheld.
Who's been writing the low-resource version of Xcalendar? Or a database?
If it was said on slashdot, it MUST be true!
I recall reading once (in Risks perhaps?) about a workplace where they were testing voice recognition. All was well until a disgruntled employee walked down the corridor, shouting "FILE! EXIT! NO!", with predictable results.
I have L&H's Voice Xpress for my Windows machine, and have found its text-to-speech features to be rather adequate. Granted, it's not exactly the same as having your own personal secretary dictate the on-screen text to you, but then how many of us have a personal secretary? As for the speech-to-text: well, the enrollment process seemed rather lengthy, but I was able to use the program to do a fairly good job of dictating emails and such. But isn't this just a step away from those IBM commercials with the guy in Russia wearing his PC? Seems rather similar to me. This is my first post, so be sure to moderate me down! :)
If Murphy's Law can go wrong, it will.
Voice recognition? Do you honestly speak faster than you type? I mean, think about it for a moment. Find a passage, time yourself reading it in a normal voice that a very sophisticated speech-to-text program could interpret, and then time yourself typing it. You might be surprised. Then again, you may not. Depends on how fast you type I guess. Then you have to take the time to correct any errors, etc., with the speech-to-text.
I mean, for certain things voice is all well and good. "open /etc/fstab with vi. delete line 2." Good for someone who types slowly I guess.
Then there's the mouse thing, I remember the dilbert cartoon:
Pointy Haired Guy: Higher, higher, higher, ok click there. Now! No! Not There!
Dilbert: ::shakes his head::
Now consider this: the gimp explain voice control in that? "draw the mona lisa"?
Until (IF) our thoughts can be interpreted, I'm gonna support my old keyboard and mouse. I have ten fingers, but I can only make one sound come out of my mouth at once. And well, despite its dopiness, the mouse works great. Oh, and I'm convinced that m$ should have been a hardware company, not a software one. Look at the intellimouse explorer. Working optical mouse, great accuracy, you never need to clean the ball and rollers, and it never loses tracking unless on a mirror or very smooth white surface. I have no complaints with mine. Actually, I've never owned any m$ *hardware* that I really ever had complaints with.
I work for L&H, and I do remember at least one colleague who was testing stuff with Voice Xpress, and he said "Select all", and then "Delete" while trying some text processing commands. Unfortunately his active window was his e-mail program, more precisely his inbox... More than a year of e-mail gone :-)
So, it's not a Dilbert joke anymore, it happens for real...
superblog.org: all your favourite blogs on o
Now consider this: the gimp explain voice control in that? "draw the mona lisa"?
No, not quite. Voice control won't replace any 2-dimensional manipulator interfaces any time soon (at least not for non-disabled users). No one is claiming that the mouse will be rendered useless. After all, "a picture is worth..." Well, ya know.
BUT. How much do you really enjoy clicking around the gimp toolbox? Or, worse yet, searching for a filter you don't normally use in a 3- or 4-deep menu system while losing that exact pixel you were over in the image. Right there, a secondary interface via voice would be ideal. No need to lift your hand off the mouse or move the pointer at all. Just "Use filter A, settings 50%, 3, no." I'm generally against voice recognition, but this would be one of the few spots I'd definitely want to see it.
// zyqqh
Would *that* qualify as "free speech"?
Seriously, voice interfaces probably have a very limited usage. Some disabled users would benefit (much). Hands-free applications are very useful in cars and such, but typing is generally less tiresome.
Sure, many people type faster than they speak (at least if it is to be interpretable by a machine), but the main problem is that speaking for an hour is very tiresome (and irritating for those around), and commands by voice are difficult compared to mouse and keyboard ("Swap those two words,... three sentences back" as opposed to drag and drop or the arrow key dance).
Still, cool is always cool...
All opinions are my own - until criticized
http://www.developer.ibm.com/library/articles/nielsen1.html
Have a read of what Jakob Nielsen (one of the greats of User Interface design) says; he presents one of the better arguments as to why voice recognition just isn't that good a way of interacting with a machine. Most of the things that voice recognition is pushed forward for can be done better and with greater accuracy with your hands and a well thought out display. There are certain cases where it is the best option, and possibly a PDA is one of them (although I use a Psion and don't have any problem with it at all, and I wouldn't want voice recognition), but most of the time it's a gimmick that doesn't stand up to the demands of the user.
An Eye for an Eye will make the whole world blind - Gandhi
The killer applications for a PDA are the contact info, schedule, and memos - in general, maintaining a database made of records with a small amount of data in each field. Short messaging (integrated with E-mail) too, I guess - still small amount of data. Everything else is bells and whistles. People do not write long texts on a PDA - they use laptops, or at least buy one of the nifty folding keyboards for their PDA. People do not run GIMP on a PDA.
For these killer apps, a voice API is great: "show today's schedule". "new meeting, March 14th, at 10, with L&H". "new memo: buy milk for santa". "new expense: the L&H account, 112$, business lunch". "show contact Joe". "Message to Jane: Lunch at 2?".
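A rough sketch of what the text side of such a voice API might look like, assuming the recognizer has already turned speech into a string. It covers only four of the commands above, and the pattern list, the action names and the parse_command function are all made up for illustration, not anything L&H or Compaq have announced:

```python
import re

# Hypothetical command patterns for already-transcribed speech.
# The action names and the (action, fields) result format are assumptions.
PATTERNS = [
    ("show_schedule", re.compile(r"^show (?P<day>today|tomorrow)'s schedule$", re.I)),
    ("new_memo", re.compile(r"^new memo: (?P<text>.+)$", re.I)),
    ("show_contact", re.compile(r"^show contact (?P<name>.+)$", re.I)),
    ("message", re.compile(r"^message to (?P<to>[^:]+): (?P<text>.+)$", re.I)),
]


def parse_command(utterance):
    """Return (action, fields) for a recognized utterance, or None."""
    # Drop surrounding whitespace and a trailing full stop before matching.
    text = utterance.strip().rstrip(".")
    for action, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    # Anything outside the small grammar is simply not a command.
    return None


if __name__ == "__main__":
    for spoken in ["show today's schedule", "Message to Jane: Lunch at 2?"]:
        print(spoken, "->", parse_command(spoken))
```

The point of the small fixed grammar is that each record type (schedule, memo, contact, message) needs only a handful of fields, so the recognizer never has to cope with free-form dictation.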
I'd expect you'll need to push a button to make the PDA listen - I wouldn't like one which listens all the time (it might make sense for a desktop system but not for a PDA). I also expect you'd still have a touch-sensitive display, and be able to use a stylus for menu navigation and writing. Just like desktop systems did not give up the keyboard when they got the mouse!
Something like the "Itsy" would be perfect for the above. Take my REX-PRO and add handwriting recognition like the Palm's and voice recognition like the above and you end up with the perfect PDA. The only possible improvement would be integrating it with a cellular phone, or maybe with a holographic projector.
Obviously working on the voice UI would take a lot of effort to get right. I predict the initial offering - by L&H or whoever - will flop like the Newton, to be followed by a Palm-like successor which would get it right.
And both L&H and Compaq know this. That's why they are both using Linux; writing a voice UI that works is a classic open source "itch to scratch". They'll be able to obsolete the first generation software and replace it with a second open-sourced generation - while maintaining the same hardware platform, escaping the Newton's fate. Good move for them, good move for us, bad news for Microsoft.
I haven't checked in a while (may be a bit outdated), but here are some Linux speech apps
For those that really wanna play, check out ISIP's ASR project.
For those that are interested in acquiring speech corpora (training data) check out The LDC-online. Get the free guest account, use your perl skills and your imagination, and suddenly the TIMIT corpus is yours
Email me if you're interested in this kinda stuff (or want my timitgrab.pl script)... it's not my primary address, but I check it from time to time.
I ate my sig.
This is basically the last big hurdle on the way to what I call Gear. (The name comes from the short-lived SF series _Earth 2_, where it referred to the heads-up, voice-controlled computer/communicators the humans wore.) Consider:
Morning. Get up. Get dressed. Put on your Baldric, a Miss-Universe-style sash made of trendy-stereo-grey squares, roughly the size of cigarette packets. You're going for state-of-the-art, so your Baldric contains:
- a RAM RAID, four or five Gear Cells of high-capacity, non-volatile memory, redundantly copying each other so that nothing short of a flamethrower will cause memory loss.
- a Jack-In-The-Box, a cell containing a speaker, microphone, infrared and microwave transceivers, all sorts of cable in/outs, and all the software necessary to allow your Gear to communicate with the mobile phone network, internet, infranet, and you.
- a Brain Cell, a pluggable, replaceable processor.
- an Eye Ball, a cell containing a digital camera and a projector; this does most of the visual display work, projecting on a nearby wall, or connecting to your optional heads-up display.
- a Handle, a slightly oversized cell with a chord keyboard _and_ a Palm-style stylus/Graffiti-pad arrangement for quick, quiet text input.
You operate your Gear using voice commands, mostly, but like most power users you don't only use English. GearCorp have followed the example of Palm Computing, whose Graffiti is not quite standard handwriting but rather a modified, streamlined version. Knowing that some sounds are easier to detect than others, they invented a language called Glish. So: a casual user might open a work file with the command "Menu File. Open. Section 'Work'. Section 'Memo'. Document 'DailyMemo'." On the other hand, you, as a power user, would say "Fie Oh Dok At 'Work' At 'Memo' At 'DailyMemo'". Rolls off the tongue, and is much quicker for you and the Gear.
Go to work. That is, go to the park, sit there and conduct work in relaxed surrounds. Take calls, write programs or documents, "attend" meetings, all while sitting on a park bench watching the world go by. If you need confidentiality, use the Handle, or speak in Glish. In your briefcase you have a full-sized foldable keyboard and a foldable flatscreen with easel legs, so you can avoid using the Handle and the Eye Ball if you like.
I think it'd work. I think it'll be here within five years. And I think it'll change the computing world more than anything since VisiCalc.
: Fruitbat :
I have discovered a truly remarkable
Do you want the new user interface applications developed in open source on Linux, or only on MSWin3K and the occasional Macintosh? Yeah, I thought so... There are also the PDA-like devices that will come from the cell-phone makers, and it'd be nice to have good programming interfaces to them. Some things will be killer apps, others will be toys we get bored with quickly, but open development environments will make it easier for everybody to try things out.
Some user interfaces are just dumb replacements for keyboards on machines that have conventional-sized screens. There are a lot of problems for which this is adequate, including the typing-impaired but also applications where you want hands-free but don't need to be eyes-free, such as information kiosks ("mirror, mirror on the wall, where can I find beer in this airport?"), reference-finders for workers in messy environments ("zoom in on the picture of the carburetor"), etc.
Voice commands can also be mouse/menu substitutes, for people who like them. A long-known safety principle is to limit the commands to a relatively short set of very safe commands. You don't want to have "rm -fr *" there, but "mail" and "phonebook bob smith - yes - dial" are pretty safe. (Ok, there are still risks like that web site with the background sounds saying "phonebook 1-900-RIP-OFFF - dial", but you can decide how much risk management you want. And you want it to ignore almost anything after the keyword "Daddy".) One of my coworkers had a PC-based application; we'd be on a conference call, and he'd occasionally interrupt to tell his computer to fetch a file. He doesn't use it much any more - I'm not sure if the novelty wore off or if he decided to cut down his weirdness quotient on the phone.
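Here's a minimal sketch of that short-safe-list idea, assuming the recognizer hands over one transcribed utterance at a time. The command sets and the accept_commands function are hypothetical, and the rule that "dial" only works right after a spoken "yes" is my own reading of the "phonebook bob smith - yes - dial" example:

```python
# Assumed whitelist: commands that are harmless to run on their own.
SAFE_COMMANDS = {"mail", "phonebook", "yes", "no"}
# Assumed: commands that are only honored right after a spoken "yes".
CONFIRMED_COMMANDS = {"dial"}


def accept_commands(utterances):
    """Return the utterances that pass the whitelist, in order."""
    accepted = []
    confirmed = False  # True only if the previous utterance was "yes"
    for utterance in utterances:
        words = utterance.lower().split()
        if not words:
            confirmed = False
            continue
        keyword = words[0]
        if keyword in SAFE_COMMANDS:
            accepted.append(utterance)
        elif keyword in CONFIRMED_COMMANDS and confirmed:
            accepted.append(utterance)
        # Everything else (e.g. "rm -fr *") is silently ignored.
        confirmed = keyword == "yes"
    return accepted


if __name__ == "__main__":
    print(accept_commands(["phonebook bob smith", "yes", "dial"]))
    print(accept_commands(["phonebook 1-900-RIP-OFFF", "dial"]))
```

This stops the background-sound trick only as long as the hostile audio doesn't also say "yes", so it's risk management, not a cure.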
If you're willing to do voice input and output, portability becomes more practical, and computers can be a lot smaller because they don't need screens and keyboards, and more flexible because you can stick them in a pocket or backpack and use a headset. Sure, people will look at you funny walking down the street talking to yourself, but here in San Francisco, half the people on the streets are either talking to their cellphones or their liquor bottles, and society has adjusted to it. A hands-free voice portable makes an interesting combination with a GPS system and datacomm; it can give you directions while you're driving, tell you about nearby restaurants and traffic jams, and maybe let you call nearby cars ("Hey, CA123456, use your &^%&^% turn signal!").
MP3 players can also benefit from voice interfaces, since it mainly requires adding a bit of storage to the computer you're already carrying. ("Computer, play Dark Side Of The Moon three times, volume low, speakers, order large pizza from Foobaros.")
Bill Stewart
New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks