Talking Palm
Isotopia writes: "This article from the NY Times is very cool. It's about this guy from IBM who was able to put voice recognition on his Palm III and it talks to him!
It can remind him about meetings and it will tell him when his battery is getting low." I bet if you used this much, it would tell you how low the battery is -- frequently. That aside, it's amazing that IBM has been able to squeeze this onto a Palm.
Not that I am a huge fan of meetings or anything, but the last thing I want is more annoying handheld technology showing up in meetings.
*pager*
*cellfone*
*palm*
And now a frigging TALKING PALM? Then again...
Eliza + Talking Palm + Male Real Doll = no more meetings ever. Hmm....
http://windows.scares.us
talk to the hand cause the palm aint liss'ning. oh wait, yeah it is. hey palm, wassup G
yo my battery is audi 5000 aight peace out
lates, palm
slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
Personally, I think this sort of tech is better used in cell phones. A device which already has a decent text input system is probably only made more clumsy by including speech recognition and text to speech capabilities. Why? Because it "requires" switching modes of interfacing with the device, which is something humans don't tend to like. Rather, most people will choose one mode and stick with it. And, be honest now, you can guess which mode that will be: stylus or keyboard. On the other hand, in cell phones, the vastly predominant mode is already voice and hearing oriented. It would be really nice to be able to get rid of the keypad (or at least severely reduce its usage). Other reasons cell phones are a better place for this tech: when you listen to a cell phone, what you hear is private. Cellphones cannot speak at you: they ring first. Two different rings would be sufficient to distinguish between a person calling and the cell phone telling you something.
Helping with organizational effectiveness is our job.
IBM Via Voice is supposed to have similar software bundled with the new Ipaq 3700 and 3800 series, but since those won't ship until November, I haven't had a chance to play with it.
Also, there has been a voice-controlled Contacts lookup program on the Pocket PC for a while (too lazy to look up the link), as well as software that will read the time to you at regular intervals and when you turn the device on (TimeTalk).
I'm not trying to discount what's being done here on Palm (in fact, it's amazing they got it to work given the anemic processing power in Palms), but I wanted to mention that a lot of this functionality is available on Pocket PCs here and now.
Jenova_Six
screw X11, think of the marketing opportunities for X10!
It could be worked in subliminally, like this:
"time for meeting [buy an X10-cam] with your boss"
"loading zap!2000 [buy 2000 X10s put them everywhere]"
"time for kinky [tape your babysitter] sex with your [keep an eye on her] mistress at the Ritz"
microsoftword.mp3 - it doesn't care that they're not words...
They didn't. They made the palm bigger by adding at least a mic, speaker, and an additional processor to it. The first two are par for this course, though the handspring visor at least has a mic built in. The third makes this into a pretty basic accomplishment for someone with IBM's resources, especially if that CPU has more RAM attached to it, or embedded in it.
All I really want is a speech recognition module for visor. I don't want my palm to talk to me, one of the nice things about a handheld is that only I can tell what's going on on it. The visor already has a mic built in, so now I just need the speech recognition hardware/software in a handspring module.
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
A proper voice recognition system should be able to understand any words in the English language... the chances are this system is simply used to control a few Palm commands and therefore the incoming speech patterns only need to be compared to a few stored patterns. Then a system of pre-synthesising the outgoing speech would reduce further the demands on the CPU but use more disk. I have my Pentium 75 talking to me using the University of Edinburgh's Festival system on Linux by pre-synthesising the most important words.
By the way, the festival system is excellent and takes under ten minutes to download, compile and install!
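To make the two ideas above concrete, here is a minimal Python sketch, not what IBM or Festival actually do. It assumes each spoken command has already been reduced to a short feature vector (the three-number vectors in TEMPLATES are made up for illustration), and fake_synthesise is a hypothetical stand-in for a real synthesiser such as Festival. The recogniser only compares incoming speech against a few stored patterns, and the cache synthesises each phrase once and replays it afterwards, trading storage for CPU.

```python
import math

# Hypothetical stored patterns: command name -> feature vector.
# A real system would use acoustic features, not three hand-picked numbers.
TEMPLATES = {
    "memo": [0.9, 0.1, 0.3],
    "todo": [0.2, 0.8, 0.5],
    "date book": [0.4, 0.4, 0.9],
}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(features, max_distance=0.5):
    """Return the closest stored command, or None if nothing is close enough."""
    best = min(TEMPLATES, key=lambda name: distance(features, TEMPLATES[name]))
    if distance(features, TEMPLATES[best]) > max_distance:
        return None
    return best


def fake_synthesise(text):
    """Placeholder for a real text-to-speech call; returns fake 'audio' bytes."""
    return text.encode("utf-8")


class PhraseCache:
    """Synthesise each phrase once, then replay the stored result."""

    def __init__(self, synthesise):
        self._synthesise = synthesise
        self._cache = {}
        self.synth_calls = 0  # counts how often the expensive step ran

    def get(self, phrase):
        if phrase not in self._cache:
            self.synth_calls += 1
            self._cache[phrase] = self._synthesise(phrase)
        return self._cache[phrase]

    def preload(self, phrases):
        """Pre-synthesise the phrases that are expected to be needed most."""
        for phrase in phrases:
            self.get(phrase)


cache = PhraseCache(fake_synthesise)
cache.preload(["battery low", "note saved"])
command = recognise([0.85, 0.15, 0.3])
audio = cache.get("note saved")  # served from the cache, no new synthesis
```

Anything outside the small stored vocabulary comes back as None rather than as a wrong guess, which is about the best a few-pattern matcher can do.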
http://blog.grcm.net/
I can't see this being a big hit. My experience with voice recognition software, even on fast computers, has been that without a good microphone and very little background noise, recognition is horrific. Most of the world is, unfortunately, rather noisy and as such muttering into a palm pilot is going to produce very little workable speech - and yelling into a palm pilot is likely to get one arrested for being a freak.
Worse - imagine sitting in a boardroom meeting.
CEO: "well, gang, sales results are up for this quarter!"
fifteen cronies all mutter into their palm pilots in unison - "well comma gang comma sales results are up for this quarter exclamation mark new sentence" except for the one poor sap who accidentally brushed his thumb across the front panel of the palm while dictating, and is madly muttering "begin edit delete r-e-s-u-l-t-s-delete-s end edit". Just what the world needs - longer meetings.
Or a girl gives you her number at a bar, and you proceed to yell it into your palm pilot - is that cool? What about those of us who love using our palm pilots while in the bathroom? Imagine wandering into a public bathroom with geeks muttering in every stall? The kind of stuff I wake up in a cold sweat in the middle of the night having nightmares about, I tell you. Even grocery stores would produce entries like this:
TODO LIST: Don't forget attention shoppers to get sale on meatloaf a gift in aisle for mom seven
I can't see it being too useful.
-- "Ignorance more frequently begets confidence than does knowledge." (Charles Darwin)
"Palm, record new to-do item."
"Ready"
"Remember not to refer to boss as 'dickhead' when talking to you. End recording."
"Note saved."
(later) *Bling,bling* "Reminder: Weekly jerkoff meeting with Dickhead in 10 minutes."
"Um, I thought I told you we bumped that meeting up... Now please apologize to Mr. Cooper."
Kevin Fox
Saw and had several conversations with this person at an IBM-only conference up in Vancouver earlier this year. It's actually just a proof of concept to show off some cool uses of voice rec/synth technology.
:-)
It was a standard Palm III that had a snap-on module with its own processor. It ran off special batteries that only last for like 2 hours. Not really something ready for prime-time.
HOWEVER - he was doing some REALLY cool things with it. They have several languages in it. As a result, one of the applications was a basic language translator. He spoke in English, out came Japanese. He Graffiti'ed in English, out came German speech.
He was able to speak to create memos, appointments, to-dos, etc. It would also read those back to him.
While I'm not allowed (damn NDA!) to discuss the future plans that they have, suffice it to say that this is just the first step. If they get the funding to take his vision to reality, I'm DEFINITELY ditching my old Palm for a new IBM unit someday.
Also, all those IBM commercials showing really weird stuff (like the coke machine that dispenses when you use your cell phone, or the guy trading stocks in the middle of that park using the head mounted monocle display) - that's all REAL stuff that they actually DO have working today as prototypes.
God I wish we could fast forward 3 years....
Processing power: this is a nuisance. It's not that you can't get enough processing power into a handheld or cellphone these days, but:
User expectations (a.k.a., the Star Trek problem, a.k.a., even that clunker without circuit breakers that Kirk talked to could always understand him perfectly): This is a general speech-recognition problem, but it gets more intense the more mass-market you go. Palm pilots are largely successful because they don't try to do too much, but do what they do well. It's hard to set that kind of expectation reasonably for nontrivial speech recognition. Even worse, I think that people are actually more demanding of a self-contained special-purpose device (with more limited resources, as above) than they are of general PC software.
User interface design: this is still a largely unsolved problem; how do you really want to interact with a PDA by voice? It's hard to arrange a device so you can look at it and be close to the microphone at the same time, which complicates the picture. Dragon Systems back in their pre-acquisition days sold a product called "Dragon NaturallySpeaking Mobile Organizer" that was an interesting step along the way. They didn't put the speech recognition into the handheld -- speech was recorded into a handheld recorder, recognized on a PC and synched up with PDA later -- but the product did attempt to deal with the interface questions of large-vocabulary PDA-based speech recognition; e.g., when you say something, is it intended for your calendar, your email, or your address book? How many variations of "next Tuesday" can the device understand? The general interface problem, once everything's in the same device, is still open and interesting.
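The "next Tuesday" question is easy to show in code. The following Python sketch is only an illustration under stated assumptions (weeks start on Monday, and candidates_for_next is a made-up helper name, not anything from the Dragon product). It returns two plausible readings of "next <weekday>": the soonest such day after today, and that day in the following calendar week.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def candidates_for_next(weekday_name, today):
    """Return (soonest, in_next_week) readings of 'next <weekday>'."""
    target = WEEKDAYS.index(weekday_name.lower())

    # Reading 1: the first such weekday strictly after today.
    days_ahead = (target - today.weekday()) % 7 or 7
    soonest = today + timedelta(days=days_ahead)

    # Reading 2: that weekday in the following Monday-based calendar week.
    next_monday = today + timedelta(days=7 - today.weekday())
    in_next_week = next_monday + timedelta(days=target)

    return soonest, in_next_week


# Said on Monday 1 October 2001, the two readings are a week apart.
monday_readings = candidates_for_next("Tuesday", date(2001, 10, 1))

# Said on Wednesday 3 October 2001, both readings land on the same day.
wednesday_readings = candidates_for_next("Tuesday", date(2001, 10, 3))
```

When the two readings differ, the device has to either guess or ask, and that is before handling phrasings like "Tuesday week" or "a week from Tuesday".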
My Newton MessagePad 2000 (upgraded to 2100) has been talking for years. Apple wrote a Macintalk extension years ago, which was never released. It was leaked however, and is now widely available.
Furthermore, just recently, an old Dragon Dictate demo for the Newton has been found and released. While the Newton's vocabulary is limited, this is true voice recognition nonetheless.
I dislike Apple Computer in general, and the fact that they discontinued the Newton didn't help my opinion. Nonetheless, I still feel the Newton MP2.1k is the greatest PDA available, even today. Unfortunate that Apple no longer makes the best product they've ever produced.
Voice synthesis (I dunno about voice analysis, however) has been around since the early 1960's. A few years ago, I picked up a CD called "Computer Music Currents, Vol. 13 : A History Of Digital Sound Synthesis" published by a German outfit called Wergo. It contained nothing but rare, early recordings of engineers trying to produce music with computers, with some attempts going back to the late 1950's.
Anyway, this CD came with a booklet, and an interesting story. There's a famous scene in 2001: A Space Odyssey where HAL offers to sing "Daisy, Daisy, A Bicycle Built For Two" as he's dying. Arthur C. Clarke once visited AT&T Bell Labs in New Jersey in 1962 where he saw a demonstration of a "singing computer", in the form of an IBM 7094 Mainframe with voice synthesis capabilities. The engineers had taught the machine how to play the song, and then superimpose a synthesized voice on top of it, in realtime. It impressed him (or scared the shit out of him) enough that he chose to write it into the story, and what later became the film.
All of this was done under 128K of RAM, top to bottom.
The story also has an interesting anecdote about how many punched cards it took to pull it off -- something like 28,000 paper punch cards if I remember correctly. The engineers (one of whom later turned out to be my C and x86 Assembly instructor in college) remembered there was some concern about how to transport them, that putting them in the back seat of a Volkswagen would crush the axles. Heheheh..
Cheers,
Bowie J. Poag
It's juvenile, but I couldn't resist the image of a talking 'palm':
Dave?
What are you doing Dave?
I can't let you do that Dave.
Not again Dave!
It's only been fifteen minutes since the last time Dave.
You know it makes me feel dirty Dave.
You could at least wash me afterwards Dave.
Can't you just get a girlfriend instead Dave?