Using PDAs for Dictation?

Well... by acehole · 2002-11-22 06:10 · Score: 3, Interesting

This can be put down to a couple of reasons.

First off, buying a dictaphone is still much cheaper than a PDA with software.

And secondly the whole voice/word recognition program market hasn't really accomplished any great leaps or bounds over the past five years, not to mention it's not popular in the mainstream yet.

--
Be you Admins? nay, we are but lusers!

Re:Well... by Anonymous Coward · 2002-11-22 06:15 · Score: 0

> And secondly the whole voice/word recognition program market hasn't really accomplished any great leaps or bounds over the past five years, not to mention it's not popular in the mainstream yet.

We're assuming human years here, but in computer years that's the equivalent of nothing happening in 100 years. This means it probably won't pick up anytime soon until there's a major breakthrough in methodology. They say it gets exponentially harder to produce good dictation software.
Re:Well... by danormsby · 2002-11-22 06:19 · Score: 0, Offtopic

> First off, buying a dictaphone is still much cheaper than a PDA with software.

Dictaphone? I use my finger.

--
Omnis amans amens
Re:Well... by stratjakt · 2002-11-22 06:21 · Score: 5, Funny

>> First off, buying a dictaphone ...

DICTAPHONE? DICTAPHONE?

re-vulcanize my tires, post-haste. And make sure this post is on the next auto-gyro to Prussia.

--
I don't need no instructions to know how to rock!!!!
Re:Well... by Anonymous Coward · 2002-11-22 06:42 · Score: 0

Stand back, everybody! It's a penis joke!
Re:Well... by suman28 · 2002-11-22 06:55 · Score: 2

I definitely agree with the above poster that the market is very small and has not made many achievements. Also, you have to sometimes wonder about IBM. They invest lots of money, do so much research, come up with this great product and that's it. They just leave it. Take the thinkpads, linux, OS/2, Informix, Lotus Suite.... and so on. I just don't understand the mentality of the execs there.
Re:Well... by banzai51 · 2002-11-22 07:02 · Score: 4, Funny

Reminds me of what someone at work says about IBM:
"IBM: Where software goes to die."
Re:Well... by caino59 · 2002-11-22 07:05 · Score: 1

wow, yet another case of someone not even reading the original post....

caino

Don't touch my .sig there!
Re:Well... by _ph1ux_ · 2002-11-22 07:08 · Score: 2

They do this so they can stroke their egos when some other company develops a product that uses some sort of technology that IBM did some research on in the past.

It makes them feel like they are a super tech think tank ala PARC...

They do come up with some great stuff and I would bet that if IBM were a Japanese company the entire tech industry would look totally different.
Re:Well... by Anonymous Coward · 2002-11-22 07:28 · Score: 0

Dictaphone? Sounds to me like the guy should just spend the money and get an electronic robot suit. And probably one of those penis pumps too. What else could you possibly need?
Re:Well... by Anonymous Coward · 2002-11-22 07:44 · Score: 1, Informative

wrong..

I work in a retail location that sells both items, PDAs and Dictaphones. Basic Palm PDAs (the Zire or M105, for example) both run for $169.99 CDN...

Dictaphone is $300

Do the math, a basic palm would be cheaper and more cost effective...

There may be a better solution though...

Olympus, on the other hand, manufactures a very cool little digital voice recorder that has a USB docking station and uses MP3 compression to boot. Dictate to the recorder, download the MP3 to your desktop, run it through ViaVoice or Dragon NaturallySpeaking, and you get a Word document.
Re:Well... by Anonymous Coward · 2002-11-22 07:53 · Score: 2, Funny

This book must be out of date: I don't see "Prussia", "Siam", or "autogyro".
Re:Well... by utexaspunk · 2002-11-22 08:11 · Score: 1

...so that would explain why they've been big on linux...
Re:Well... by Ponty · 2002-11-22 08:14 · Score: 1

You're forgetting the cost of the software needed for that PDA. Also, do you know whether the basic Palm has the horsepower to run voice recognition? And you're forgetting that it needs a relatively high quality microphone.
Re:Well... by DFossmeister · 2002-11-22 08:52 · Score: 1

I always thought that was CA. They bought lots of good software products, bastardized them, and lost their market share.

At a pharm. where I used to work, when CA bought a product we were using, we immediately contacted all of our sales folks to find a replacement solution.

DFossmeister

--
No Not Again! Its whats for dinner.
Re:Well... by fishbowl · 2002-11-22 11:22 · Score: 2

"First off, buying a dictaphone is still much cheaper than a PDA with software."

Yes, but, buying a dictaphone (or a digital voice recorder, or a microcassette recorder, or a minidisc, which I personally prefer) isn't the whole solution. You forgot the price of the Typist. Or failed to consider that the original poster has no hands.

--
-fb Everything not expressly forbidden is now mandatory.

Simputer by papasui · 2002-11-22 06:11 · Score: 3, Informative

http://slashdot.org/article.pl?sid=02/11/19/234216&mode=thread&tid=100

Re:Simputer by Keighvin · 2002-11-22 06:19 · Score: 3, Insightful

The Simputer comes with Text-to-Speech out of the box, but not Speech-to-Text. It does have microphone and USB jacks, so loading additional software may be an option. The major downside is battery life, which is in the not-so-great realm.

--
Any spoon would be too big.

My god by Anonymous Coward · 2002-11-22 06:11 · Score: 5, Funny

Next thing, you'll be wanting a machine to wash your dishes and clothing, or, heck, let's be crazy, and send moving pictures around the world!

More to do with perception by zanerock · 2002-11-22 06:11 · Score: 5, Interesting

I think it has more to do with the perception of voice dictation as unreliable and resource intensive rather than any actual fact. As the poster points out, it can be done fairly cheaply.

I have not had much experience, but I think the other thing is that people are averse to any sort of training or teaching required, no matter the long term dividends.

Like most things, it comes down not to fact, but to perception and prejudice. Most people base their buying decisions on 30-second spots, not informed research, so the cost of educating people is too high for producers to incur.

Re:More to do with perception by Locutus · 2002-11-22 07:13 · Score: 5, Interesting

I met some people at COMDEX who have VR (voice recognition) running on the Sharp Zaurus. I've run IBM's VR software and it was pretty good 6 years ago. On the Zaurus, I would imagine that a 256MB CF card could hold a good sized dictionary so dictation appears to be possible. Especially since this guy was doing it on a 16MHz 386 years ago.

The ability of the Zaurus to take a MIC input makes a big difference, since a good MIC is important due to the noise cancelling features they have. All the PDAs with no external MIC option are pretty much useless for VR/Dictation.

LoB

--
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
Re:More to do with perception by numatrix · 2002-11-22 07:20 · Score: 1

Any idea what software they were running? I've got a hefty-sized SD card and a sharp zaurus myself...
Re:More to do with perception by benzapp · 2002-11-22 09:20 · Score: 1

Even 256 megs is way too much. I last ran voice dictation using the Voicetype dictation that came with OS/2 v4.0. That was 1996. At the time, I had a pentium 100 with 32 megs of ram. The whole voicetype install took up perhaps 80 megs. At the time I was using a 340 meg hard drive as my main boot drive, and a gig drive for storage.

I imagine a 400mhz Xscale has the same processing power as a Pentium 100. 64 megs of ram is common... Who knows.

Of course, my Casio E150 can barely run Pocket IE and play mp3z at the same time. But that is a 150mhz MIPS processor.

--
I don't read or respond to AC posts
Re:More to do with perception by Anonymous Coward · 2002-11-22 09:54 · Score: 0

>Especially since this guy was doing it on a 16MHz 386 years ago.

That's silly, they didn't have computers 386 years ago.

*ducks*
Re:More to do with perception by Cy+Guy · 2002-11-22 10:02 · Score: 4, Informative
PDAStreet has lots of info on PDA Voice Recognition:
- IBM VR for iPAQ (but I think that is just voice control, not dictation)
- Voice Mate Organizer - Voice Recognition PDA "allows people who are Blind and visually impaired to store and retrieve information" (but still might not be full dictation capable)
- One thread led me eventually to this 1999 Article on the status of VR for PDA's
- Finally after much Googling, I eventually came across this promising Press Release from about a year ago. That Company seems to have a product (basically an SDK), but it's not clear if anyone is putting it to use in consumer PDA application (Samsung uses them for some phones).
Since it doesn't look too promising I think you may want to expand your search beyond PDAs. I saw several references to the linux based simputer, maybe one of those with Linux based speech-to-text software is the way to go?
--
Work for Change & GET PAID!
Re:More to do with perception by drayzel · 2002-11-22 10:34 · Score: 1

I think that 256MB would be overkill. I tried out ViaVoice when it came with OS/2 Warp 4. The first time I installed it I trained it for about 60 minutes. My thinking was that if 10 minutes is good then 60 minutes would be even better. Training it for 60 minutes made a large 'dictionary' that slowed the app down so much it was virtually useless, so I deleted it. A few months later I installed it again but only trained it for 10 minutes. It worked very well with the smaller voice 'dictionary'. I later found out that if the voice dictionary is too large it just confuses the software. I guess I should have read the instructions!

Keep in mind this was an old Cyrix P150 with 32MB of RAM and a variety of other trash hardware. I think one of those XScale chips might handle it. BUT, I would think the FPU on those is pretty poor. If I were designing a cheap, low-voltage, small, cool-running CPU whose main function would be a contact list, scheduling, note taking, and silly games, I would kill or cripple the FPU portion... then again, if I were designing chips people would probably prefer a small notepad and pencil to a PDA.

~Zilch
Re:More to do with perception by Locutus · 2002-11-22 10:53 · Score: 2

It was their own, since L&H is the best known name for speech recognition (and before ENRON, for cooking the books ;).

I'm hoping to do some work with these guys so I will eventually find out more. This thread will be history by then so www.zauruszone.com is where you would look for new things Z related. IMHO.

LoB

--
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus
Re:More to do with perception by Chanc_Gorkon · 2002-11-22 11:43 · Score: 2

Actually, thanks to compression, dictionaries can fit in a lot smaller space. Try 2.53 MB. That's the size of the dictionary I have in MS Reader format on my e740. I think the problem with voice recognition on handhelds is that it just plain doesn't work, even on PCs and server-like devices. Ever call one of those stupid voice recognition callers? I have, on Xerox's tech support line. You have to say the serial number. Rattle it off too fast and it can't understand it. Slur even just a little bit and it screws the letter up. Never mind trying to call one in with a stuffed up nose! Thing is, these things plain don't work. THAT'S why no one has even tried on a PDA yet.

--
Gorkman
Re:More to do with perception by js7a · 2002-11-22 14:47 · Score: 2

> perception of voice dictation as unreliable and resource intensive rather than any actual fact...

Well, the fact is that typical word accuracy rates don't often get much better than 95%, and even when they do get better, it just makes the errors that much easier to miss while proofreading.
Your 3rd grade teacher might have approved of your writing with one out of thirty words spelled wrong, but your clientele are less likely to.
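
To put the 95% figure in concrete terms, here is a rough sketch of how word error rate is usually scored (word-level edit distance divided by the length of the reference text). The function names and the 300-word letter are illustrative assumptions, not anything from a particular dictation product.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def expected_errors(word_count, accuracy):
    """Average number of wrong words in a text of word_count words."""
    return word_count * (1.0 - accuracy)


# One wrong word out of five is a 20% word error rate.
wer = word_error_rate("open the pod bay doors", "open the pot bay doors")

# A hypothetical 300-word letter at 95% accuracy: about 15 words to hunt down.
errors = expected_errors(300, 0.95)
```

At one word in thirty wrong, the same letter would still carry about ten errors, which is the proofreading burden being described.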
Interface and software system quality for automatic dictation on the PC-compatible platforms has actually gone down, with many people opting for the old, discrete dictation systems rather than the new, continuous systems. Much of that has to do with the fact that operating system complexities have gone way up ever since Win32, but many if not most people can achieve superior word error rates using discrete systems with practice.
Also, many of the common handheld CPUs (e.g., ARM, Xscale) have around 0.4% of the amount of cache memory recommended for general-purpose, 32 MB RAM systems. That's a big part of the problem right there. But cache memory eats battery life when it's not eating silicon area.
So, those new XScale processors run at 400 MHz, but they have a 100 MHz memory bus and a tiny cache, and run about as fast overall as a Pentium II at 75 MHz. Even discrete automatic dictation on Pentium-class machines was not so great until they got to around 200 MHz or so, and by that time the vendors had already started making negative progress. Assuming that the same mistakes won't be made on the handheld platforms, based on Moore's law of performance doubling every 18 months, you should wait around a couple years.
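
The "couple years" estimate can be checked with a back-of-the-envelope calculation. This is only a sketch: it takes the post's own figures at face value (handhelds at roughly 75 MHz Pentium-equivalent, dictation becoming usable at around 200 MHz, performance doubling every 18 months) and treats clock speed as a stand-in for performance, which is a simplification.

```python
import math


def months_to_reach(current, target, doubling_months=18.0):
    """Months needed to grow from current to target performance
    if performance doubles every doubling_months."""
    doublings = math.log2(target / current)
    return doublings * doubling_months


# Figures taken from the post above; the function name is made up for this sketch.
months = months_to_reach(current=75, target=200)
years = months / 12
```

That works out to a little under one and a half doublings, or a bit more than two years, which lines up with the estimate in the post.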
Re:More to do with perception by js7a · 2002-11-22 15:00 · Score: 2

Commercial time-critical speech recognition software authors usually try to avoid using the FPU for signal processing or heuristic search probability calculations, because most FPU numbers are far more precise (and therefore much larger) than the optimum precision necessary. Using FPU numbers eats memory bus bandwidth, which is usually in short supply during the execution of speech recognition programs. This is not to say that FPUs which can quickly operate on integers are not appreciated, but they are uncommon.
Please see my reply to the grandparent post.
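
As a rough illustration of the memory-bandwidth point, here is a sketch that stores log-probabilities as 16-bit fixed-point integers instead of doubles. The scale factor of 256 (8 fractional bits) and the sample probabilities are assumptions made up for this example, not values from any real recognizer.

```python
import math
from array import array

SCALE = 256  # hypothetical fixed-point scale: 8 fractional bits


def quantize(log_probs):
    """Pack log-probabilities into signed 16-bit fixed-point values."""
    out = array('h')
    for lp in log_probs:
        q = int(round(lp * SCALE))
        out.append(max(-32768, min(32767, q)))  # clamp to the int16 range
    return out


probs = [0.5, 0.25, 0.01]
log_probs = [math.log(p) for p in probs]

as_double = array('d', log_probs)  # what an FPU-based search would move around
as_int16 = quantize(log_probs)     # same scores at lower precision

bytes_double = as_double.itemsize * len(as_double)
bytes_int16 = as_int16.itemsize * len(as_int16)

# Multiplying probabilities becomes plain integer addition of their logs.
combined = as_int16[0] + as_int16[1]  # approximates log(0.5 * 0.25)
combined_as_float = combined / SCALE
```

On typical platforms the 16-bit array takes a quarter of the bytes of the double array, and the combined score stays within a few thousandths of the exact logarithm, which is usually more precision than a search over word hypotheses needs.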
Re:More to do with perception by theLOUDroom · 2002-11-23 07:08 · Score: 1

How do you like your zaurus?
I'm thinking about picking one up this Christmas.
Have you had any trouble with that sliding panel?
I heard somewhere about a double symlink problem that happens when trying to run apps off SD cards. Have you run into that?
What are the few minor bugs you were going to return your zaurus for?

--
Life is too short to proofread.

It's not just the processor... by gpinzone · 2002-11-22 06:12 · Score: 5, Interesting

It's the other, most overlooked piece of hardware used in speech recognition, the microphone. The junky headset given away with ViaVoice or the el cheapo unit sold in Radio Shack for under $10 makes most people's experiences with voice recognition software less than favorable. Invest in a $50-$60 professional headset and the ability of the software to accurately detect your speech patterns improves dramatically. How are they going to shoe horn a high fidelity audio sound processor in there? Maybe a USB headset might be the answer assuming the device can accept USB devices.

I'm also going to assume that the current line of speech recognition products are MUCH better than what ran on your old 386.

Re:It's not just the processor... by Daytona955i · 2002-11-22 06:45 · Score: 2, Informative

The headphone isn't an issue; like you said, make it accept USB and get a good headset-type mic and you're good.

The problem is in recognizing what you said, the best software out there still sucks and you have to train it forever. No matter what you will have to train it to recognize your voice. My saying car and some one from Boston saying car are drastically different but they are the same word. Given a lot of training you can get something halfway decent but it still requires corrections. This is especially true if you have a cold, you just woke up or are sleepy.

It's a very complex thing and I don't see any significant breakthroughs anytime soon. I've used quite a lot of programs (with a good microphone) and you can get ok results, especially for simple things like "Open" and "Close", but I think we're a long way from really good dictation software.
-Chris
Re:It's not just the processor... by nojomofo · 2002-11-22 06:59 · Score: 5, Funny

My saying car and some one from Boston saying car are drastically different but they are the same word.

Hey! I resent that remahk! You ah stereotyping heah, and it's not fa-uh. Some of us from Bahston can say cah just like the rest of you. Just jealous, that's what you ah. Come up heah, and you'll be wicked sorry that you did. :-)
Re:It's not just the processor... by MobileC · 2002-11-22 07:24 · Score: 2, Interesting

The main problem is not the microphone.
It's the microphone circuit on the soundcard.
My brand new AWE-64 had a crap mic circuit.
The el-cheapo replacement was excellent.

--
Fran
:):):)
1st 1st Poster of the new Millennium!
Re:It's not just the processor... by adamy · 2002-11-22 07:36 · Score: 2, Funny

Stop bein such a Tahd

--
Open Source Identity Management: FreeIPA.org
Re:It's not just the processor... by CrazyJoel · 2002-11-22 07:37 · Score: 5, Interesting

I remember seeing a ViaVoice demo a couple of years ago. The guy doing the demo said they use these headmikes that are actually 2 microphones. One mike faces the mouth, the other faces away. The circuitry then filters out any environmental noise from your voice. Don't know how much they cost though.(I'm sure I could look it up)

--

Such is the infinite Grace of Popeye.
Re:It's not just the processor... by Zordak · 2002-11-22 07:45 · Score: 5, Funny

You used your dictaphone to post, didn't you? Somebody mail this guy a keyboard.

--

Today's Sesame Street was brought to you by the number e.
Re:It's not just the processor... by meteau · 2002-11-22 08:34 · Score: 2, Funny

"You used your dictaphone to post, didn't you?"

Thank you for the new sig :)

--
-- "You used your dictaphone to post, didn't you?"
Re:It's not just the processor... by markmier · 2002-11-22 10:10 · Score: 1

Obligatory Simpsons Reference:
It's CHOWDAH! Say it right!
chau...dehrrrrr...
OK, you asked for it! I'm gonna enjoy this!
*crash crash*
Re:It's not just the processor... by SunPin · 2002-11-22 14:39 · Score: 1

I use a fairly high-end USB microphone and DragonDictate 3.1 for my desktop. I played around with the modern ViaVoice 9 and NatSpeak 6 but natural speech just doesn't cut it for general use. The microphone is absolutely critical and the fact that this point was never pushed as hard as it should have been is a major reason that ASR is gasping for air.
It looks like the HPC from OQO would make this a moot point. Their unit is quite sufficient if it's anything resembling the Sony Picturebook which served me well for awhile.
The point here is that there should be something available for lower priced handheld units. The capability is there. As you and other respondents pointed out, the market might not be. Someone made a very good point about getting old code to open stores and another mentioned readily available code in journals and such.

--
Laws are for people with no friends.
Re:It's not just the processor... by stienman · 2002-11-22 14:52 · Score: 3, Informative

The guy doing the demo was probably dumbing down a basic microphone tactic that's been in use for decades.

There are not two microphones in that headset - that would just make it worse, since no PC it would run on is real time enough to match the sound samples together, etc, etc, etc.

Instead they use a dual port microphone. The element lies between the front of the mic (towards the speaker) and the back (towards ambient noise). Sound pressure from ambient noise tends to hit both the front and back simultaneously, while sound pressure from the speaker hits only the front. The difference gives mainly the speaker, with muted external sound.

Even cheap mics have that now. The main difference between a good mic and a bad one is its construction and materials, which affect its response characteristics.

-Adam
Re:It's not just the processor... by Alien+Being · 2002-11-23 22:27 · Score: 2

"There are not two microphones in that headset - that would just make it worse, since no PC it would run on is real time enough to match the sound samples together, etc, etc, etc."

I agree that they simply open the back of the mic, but you wouldn't need a processor to do it with two one-sided mics. Just combine the two signals with the polarity reversed.
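
A toy simulation of the polarity-reversal idea, under idealized assumptions: the front mic hears voice plus noise, the back mic hears only the same noise at the same instant, and subtracting one from the other leaves the voice. The tone frequencies, sample rate, and function name are made up for the sketch. A real headset would leak some voice into the back mic and see the noise arrive with slightly different phase, so the cancellation would be partial.

```python
import math


def simulate(n=200, rate=8000):
    """Return (voice, front, combined) sample lists for an idealized mic pair."""
    # Hypothetical signals: a 200 Hz "voice" tone and 60 Hz ambient hum.
    voice = [math.sin(2 * math.pi * 200 * t / rate) for t in range(n)]
    noise = [0.5 * math.sin(2 * math.pi * 60 * t / rate) for t in range(n)]

    front = [v + x for v, x in zip(voice, noise)]  # faces the mouth
    back = list(noise)                             # faces away from the mouth

    # Summing front with the polarity-reversed back signal is a subtraction.
    combined = [f - b for f, b in zip(front, back)]
    return voice, front, combined


voice, front, combined = simulate()
residual = max(abs(c - v) for c, v in zip(combined, voice))
front_error = max(abs(f - v) for f, v in zip(front, voice))
```

In this idealized case the residual is down at floating-point rounding level, while the front mic alone is off by the full amplitude of the hum. The same subtraction is what an analog circuit does when the two signals are summed with one of them inverted.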
Re:It's not just the processor... by Anonymous Coward · 2002-11-30 02:07 · Score: 0

With the Palm Tungsten, and some iPAQ models, Bluetooth communication is integrated. Some nice Bluetooth microphone/headsets are out there; maybe that's worth a try.

(After you got some suitable software, that is)

Because by drinkypoo · 2002-11-22 06:12 · Score: 5, Informative

Those speech recognition packages were only really capable of handling a few key phrases. In order to do seamless voice recognition that people will actually want to use, it is necessary to recognize any (reasonable) word from any (reasonable) speaker in a (reasonably) :) short amount of time.

IBM can't even manage to do this on, for example, a P3 733EB. How they're going to do it on a 300MHz XScale or SH chip or similar (let alone a Motorola Dragonball) is beyond me. I think your head is in the clouds.

With that said, voice recognition is very much on everyone's minds and it is coming. The limiting factor in handhelds right now is battery technology, which seems to be advancing more rapidly now than it has been in the last decade or so. With more power density comes faster processors and more ram, and the ability to perform these kinds of operations on smaller computers.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

Re:Because by Anonymous Coward · 2002-11-22 06:17 · Score: 2, Insightful

I think it's fairly clear that the person in this question has said they'd be happy with the functionality of the older software, if it were available on a PDA. That's not hard to understand is it? He's not asking why voice recognition is unpopular; given it is a niche application especially for people who can't use a keyboard. But for those people, isn't a PDA solution, even if it isn't up to "your" standards, a good idea?
Re:Because by photon317 · 2002-11-22 06:21 · Score: 5, Insightful

Yeah but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day. He doesn't want continuous speech that doesn't have to be trained and all that jazz - just simple old school voice recognition. Is it so much to ask that someone port the old algorithms to the palm?

--
11*43+456^2
Re:Because by tmark · 2002-11-22 06:45 · Score: 5, Insightful

Yeah but the author claims he was happy with discrete speech processing on a 386-16 that we had back in the day.

The author might be happy with what he had those days. The rest of the market would not be happy with that. In fact, the market is not happy with what we have now, as witnessed by the very low penetration of voice-recognition software. So why would we expect companies to spend the resources porting the old stuff when the new stuff won't even sell ?
Re:Because by FireballFreddy · 2002-11-22 07:03 · Score: 2, Interesting

I don't think voice recognition is going to take off much at all, not for the general consumer. I don't think many people want to spend 8 hours a day talking at their computer (or handheld, as the case may be). I imagine it'd leave you pretty hoarse unless the technology got to the point where you could quietly mumble or subvocalize. There is also a certain amount of privacy that comes with a "quiet" input device... you can hack away at the Linux kernel or type a naughty fantasy to your girlfriend and nobody knows the difference unless they look at your screen. Now imagine speaking each of them at work. ;)

Frankly I don't want the din of dozens of coworkers talking at their computers around me. I'll stick with my qwerty keyboard. And this means those with physical disabilities will be condemned to a corner of the market, getting less attention and, as a result, more expensive and lower quality products.

-FF

--
SQUEAK, the Death of Rats explained.
Re:Because by Anonymous Coward · 2002-11-22 07:04 · Score: 0

Is it so much to ask that someone port the old algorithms to the palm?

Dude, point us to an open-source implementation of these "old algorithms" and hey presto someone will port it. Otherwise, yeah, I think it is too much to ask of us hobbyist porters. (No offense but we're not, like, domain experts in Everything.)
Re:Because by banzai51 · 2002-11-22 07:24 · Score: 2, Insightful

Exactly. The real problem is that speach recognition is a niche demand. Speach recognition in and of itself has no mainstream uses. Think of an office full of people using speach recognition. Not pretty. At home? People only want speach recognition if it is tied to computer commands. ("Computer, download my email, filter for spam, then read back the names of the senders.") Who's left? People who find typing difficult because of a physical limitation. While a worthy cause, it may well not be a profitable one.
Re:Because by MobileC · 2002-11-22 07:27 · Score: 1

My Cyrix 200 went very nicely with Dragon Naturally Speaking 1.

--
Fran
:):):)
1st 1st Poster of the new Millennium!
Re:Because by drinkypoo · 2002-11-22 07:28 · Score: 2

I disagree, I think voice recognition will (eventually) become the way of interacting with computers. Think Sci-Fi TV; being able to just speak and have the computer respond to your requests. (Computer, locate wesley crusher. The airlock? Computer, open outer airlock door, safety override authorization...)
Er, sorry Wil.

Anyway, which is more "natural"... opening word and typing, or saying "Computer, please dictate a letter to such and such"? I think the answer to this is clear. It won't be replacing the secretary any time soon but this is how many people (I think most) do/would prefer to control their computers. Some things will likely always be best done with a keyboard; don't expect the keyboard to vanish any time soon. But especially in the case of portable computers which either have no keyboard or a substandard one, I would expect voice control to be the norm within five years or so. Text input on portable computers is simply too tedious.

With that said, I think there's also room for dictation on your PDA and then non-realtime conversion to text while you're not doing anything with it, or conversion done on your PC (of course that's also non-realtime) when you dock. Also what with mobile wireless internet getting cheaper you may actually find yourself speaking to your mobile device, which then sends an audio stream somewhere else for processing. If communications technology continues to outpace battery technology, this seems likely.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Because by Cyn · 2002-11-22 07:29 · Score: 2

get a zaurus and run it in dosemu.

--
cyn, free software and *nix operating systems enthusiast.
Re:Because by alanh · 2002-11-22 07:50 · Score: 5, Funny

I disagree, I think voice recognition will (eventually) become the way of interacting with computers. Think Sci-Fi TV; being able to just speak and have the computer respond to your requests.

I guess you haven't seen 2001: A Space Odyssey....

"Open the pod bay doors HAL."
"I'm sorry Dave, I'm afraid I can't do that."

Maybe it wasn't that Hal was insane, just his speech recognition software failed....

--
- AlanH
Re:Because by Anonymous Coward · 2002-11-22 08:19 · Score: 0

Dammit! The word is SPEECH!!
Re:Because by Casca · 2002-11-22 08:31 · Score: 2

I have to disagree that voice recognition will take over. The reason the sci-fi shows use it so much is because it is much more contrived when the actor stands at a keyboard madly typing away saying things like "Ok, now I'll locate wesley. Got him. Now I'll open the outer airlock door. I'll have to put in my safety override authorization now..." And without that, it would be hard to convey to the audience what was happening. Kind of like the way they added speech to the WOPR in WarGames.

Beyond that, not everything you do with a computer is language related. I don't know about you, but when I'm dealing with lots of large numbers I would much rather 10key them in than try to speak all of them. As with everything else, there is no one size fits all solution. Some things will lend themselves nicely to speech recognition, some things will work better with a different type of input.

--
Casca
Re:Because by Anonymous Coward · 2002-11-22 08:45 · Score: 0

Anyway, which is more "natural"... opening word and typing, or saying "Computer, please dictate a letter to such and such"? I think the answer to this is clear.

You're asking the wrong question. The question isn't what's more "natural", it's what's more efficient? It doesn't matter that speech is more natural than typing any more than it matters that walking is more natural than turning a steering wheel and stepping on an accelerator.
Re:Because by WatertonMan · 2002-11-22 09:13 · Score: 3, Interesting

Speech recognition is only a niche market because of the way it is integrated at present. If there was reliable speech recognition on PDAs then I suspect many people would use that instead of the nearly as unreliable handwriting recognition. (I can speak a lot more clearly than I can write.) Further, it would be a boon for businessmen on the go. You could dictate notes and letters while driving, for instance.
Sometimes niche markets turn out not to be. Just look at a lot of "desktop publishing" software. Back in 1986 that was still largely a niche market. Now it is indispensable for many, many people.
Re:Because by kableh · 2002-11-22 09:17 · Score: 2

I've used IBM's ViaVoice on my iPaq 3870, and it works fairly well. The vocabulary is somewhat limited, but I believe you can program it to open just about any program you want. The version I was using was a freebie that came with the iPaq. The only training it required was for the names of your contacts, and that worked fairly well too.

I think a big limitation is the mike in the iPaq. It is acceptable for VoIP use, but for voice recognition I think the SNR is too low.
Re:Because by WatertonMan · 2002-11-22 09:20 · Score: 2

I find controlling a computer with speech recognition software to be annoying. GUIs are simply much better. Until there is software that not only recognizes speech but can analyze text at a reliable and high level (i.e. convert text to meanings), it is useless. That is still a long way off. Yes, you can run text through profilers, categorizers and even all those things you did with Prolog. But it is very, very difficult to discern intent from most speech. Further, there are problems with feedback, which most GUIs give you very well.
In effect a useful speech recognition control system requires a reasonably functional AI program. So the analogies to HAL aren't far off. And real AI is still a long way off, assuming that our current methodologies could even achieve AI.
Speech recognition will be useful, but more for dictating letters or the like. Even then I think that the noise of dictating renders keyboards better. That's why PDAs are more interesting. While dictating to a desktop computer has questionable utility, dictating to a PDA on the go is much more useful. You can dictate a letter while driving down the road, send messages, and so forth.
Right now a lot of phones have email, but it is almost useless because of how hard it is to message with them. Imagine if a phone could convert your message to text and email it. Now that would be useful. Imagine a PDA that could be interacted with by voice (i.e. it reads your email aloud and lets you compose email the same way).
It wouldn't eliminate the use of the pointer/pen. (There are too many places where that is useful, just as a GUI is useful.) But I think that the way people use PDAs is much more conducive to voice software.
Re:Because by benzapp · 2002-11-22 09:34 · Score: 1

I think this is the major answer to the poster's question. It looks like a few weeks ago IBM released a PocketPC version of ViaVoice. It sounds sweet, but it is only available for the iPaq. No mention of the SNR. Maybe someday they will make an adapter? Of course, iPaqs don't have CompactFlash slots...

--
I don't read or respond to AC posts
Re:Because by JanneM · 2002-11-22 10:29 · Score: 5, Insightful

Voice being the natural way to interact with devices? Think it through: an entire office trying to dictate to their word processing program all at once, with people popping in to each other trying to talk about work; an airplane of road warriors all trying to dictate stuff to their respective laptops at once (without saying anything confidential); support departments trying to make dictation work with fifty other people speaking commands to their respective clients; or programmers trying to spell their way through their creations.

And have you ever actually tried speaking for eight to ten hours at a stretch? I'm not talking about random, occasional speech acts, but sustained, focused speech. You'd have about three weeks until laryngitis became an occupational hazard among white-collar workers.

Speech is nice, but it is very much a niche application. Not only now, but ever. A keyboard is faster than speech, and does not contribute to noise level or occupational damage nearly as much as sustained speech would. It's a nice, even essential, mode of operation for those apps when a keyboard just won't do; the disabled, firemen, surgeons and so on will rightly love the interface. For mainstream use, however, it's just not good enough even when it's perfect.

It could become an accessory input, on the lines of replacing menu commands for an app: mark text, say "cut", mark a place, say "paste" and so on, but it just would never replace keyboard input in any mainstream application.
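To make the "accessory input" idea concrete, here is a minimal sketch in Python. It assumes some recognizer has already turned the utterance into a single word; the `Buffer` class and `handle_utterance` function are hypothetical names made up for illustration, not part of any real dictation product. The pointer still does the marking and the voice only picks the menu command.

```python
class Buffer:
    """Tiny stand-in for an editor: some text plus a clipboard."""

    def __init__(self, text):
        self.text = text
        self.clipboard = ""

    def cut(self, start, end):
        # Move the marked range to the clipboard and remove it from the text.
        self.clipboard = self.text[start:end]
        self.text = self.text[:start] + self.text[end:]

    def paste(self, pos):
        # Insert the clipboard contents at the marked position.
        self.text = self.text[:pos] + self.clipboard + self.text[pos:]


def handle_utterance(buf, word, selection):
    """Dispatch one recognized word against the current pointer selection.

    `selection` is a (start, end) pair set with the pen or mouse.
    Returns True if the word was a known command, False otherwise.
    """
    start, end = selection
    commands = {
        "cut": lambda: buf.cut(start, end),
        "paste": lambda: buf.paste(start),
    }
    action = commands.get(word.strip().lower())
    if action is None:
        return False  # unknown word: ignore it rather than guess
    action()
    return True
```

With a vocabulary this small the recognizer only has to tell a handful of words apart, which is a much easier problem than free dictation.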

--
Trust the Computer. The Computer is your friend.
Re:Because by fishbowl · 2002-11-22 11:14 · Score: 3, Insightful

"Think of an office full of people using speech recognition. Not pretty."

Almost as frightening as an office full of people all using telephones.

You don't remember typewriters and adding machines, or for that matter, the dictaphone, do you?

--
-fb Everything not expressly forbidden is now mandatory.
Re:Because by fishbowl · 2002-11-22 11:16 · Score: 2

"And have you ever actually tried speaking for eight to ten hours at a stretch? "

Never been a telemarketer? A teacher? A tech support rep? A salesperson?

--
-fb Everything not expressly forbidden is now mandatory.
Re: Because by Corvus9 · 2002-11-22 11:21 · Score: 1

Think Sci-Fi TV; being able to just speak and have the computer respond to your requests.
I think Sci-Fi TV is a very poor model for UIs. Fictional computer UIs are designed to show the viewer what is going on, not to actually control any real device.
Consider the typical computer UI on television or movies; a single huge window with no widgets or chrome, where the user types extremely ambiguous commands like "Download Virus" in inch-high text.
This is what Sci-Fi TV voice UI is like. "Computer, open airlock". Which computer? I have 4 around me right now, and our sci-fi hero probably has dozens. Which airlock? Which door? What if someone's already inside it? The UIs in Sci-Fi are oversimplified to the point of absurdity.
Yes, I realize that these Sci-Fi computers have continuous, untrained, natural speech recognition with mega-AI to recognize ambiguous commands. But they will still need some kind of display to let users know the machine state. Either that, or you'll need to have a conversation just to walk through a door (shades of Hitchhiker's Guide!).
Re:Because by JanneM · 2002-11-22 11:29 · Score: 1

Yes, actually. I've been a teacher. But in none of those occupations do you actually have to keep going all the time for hours on end the same way that you do when interacting with your computer. And your voice _is_ tired and worn on those occasions when you have had to keep going for a full day.

And as I said, having to keep up a constant chatter is only one side of the coin; the other is to keep doing so while having to experience any number of other people doing the same. I won't even go into the problems of implementing speech recognition when the user is speaking in a sea of other voices and talking to other humans as well as to the computer.

--
Trust the Computer. The Computer is your friend.
Re:Because by SunPin · 2002-11-22 12:30 · Score: 1

You have a point. The training for even basic discrete dictation is intensive--despite what the marketing departments say--and is not exactly ready for your average clone. As I mentioned in the post, the University offices serving disabled students would be an ideal source of regular income for a developer that ports or writes voice recognition software for a PDA. It's a slow nickel but it would probably be easier to sell to (Rational) investors than a fast dime. Discipline and patience are required for speech recognition. Neither trait occurs with any significant frequency among the general consumer population. To get money, one would need a tech-savvy yet captive audience.

--
Laws are for people with no friends.
Re:Because by Elwood+P+Dowd · 2002-11-22 12:51 · Score: 2

Ever sat next to two people having a conversation on a bus? Was that really too loud?

Ever worked in a call center? You wind up with a room full of people talking all day long. Sustained speech at a reasonable volume is absolutely safe. You are the first person that I've ever heard suggesting otherwise.

When you're in the same room as someone else, do you type to them, or do you talk to them? Have you ever had a conversation that lasted for hours? Did you get laryngitis?

It would take quite a while until computer interfaces made speech a good way to control a computer, but once they did, they'd do it in such a way that programmers wouldn't have to spell every word they speak. That's easy. As is noise cancellation for a roomful of speakers. I don't understand your criticism. Speech recognition, perfected, would outdo typing. Sure, those of us that have spent years typing all day long, every day, can type faster than we talk. Perhaps if we spent that much time getting proficient at a particular type of speech input, we'd be better at that.

--

There are no trails. There are no trees out here.
Re:Because by russellh · 2002-11-22 13:18 · Score: 1

yeah, a niche, but there still is potential for nearly everyone to use it, in contrast to, say, a niche like cemetery management software.
by the way, my personal opinion of speech recognition is that for the rest of us (besides people like the writer in question) its usefulness is limited to STOP!... but on the other hand, the idea of scripting my computer with a tape recorder (of speech commands) is just... way too cool to contemplate.

--
must... stay... awake...
Re: Because by drinkypoo · 2002-11-22 13:27 · Score: 2

This post addresses the real issues better than the others so I will reply to this one.
Consider the typical computer UI on television or movies; a single huge window with no widgets or chrome, where the user types extremely ambiguous commands like "Download Virus" in inch-high text.

First of all, I'm not suggesting that anyone use LCARS as a model for a GUI, or even less anything from Hackers. Mission Impossible (with Cruise, not the oldies-but-goodies) might be okay... They actually had some kind of hip window manager in use on the laptop on the train, which was either running some flavor of Unix or looked like it.

But I would like to suggest that the way people speak to the computer on Star Trek (or similar) is exactly the way that people will speak to computers in the future. You might have different names for different computers, but I think it is far more likely that there will only be one computer doing the speech recognition.

I don't need to tell you (or any other /. reader, no matter how newbieish) that computers are only becoming faster and more powerful. This will lead to more applications like (working) natural speech recognition.

Computers are also getting better at interoperation. Witness items like .NET for example, which is of course also being implemented on Unix as free software, and which features a (hopefully completely) published specification; and XML, of course, which is being used to get all kinds of random things to talk to other random things, usually through http. I'm sure we can expect ever-more-advanced versions of these technologies to roll out over the years.

Also, voiceprint identification is very good now. If you combined it with speech recognition you'd have authentication (for most purposes) and a voice interface in one. You could walk from room to room, issuing commands which were heard by one computer with global pickups or by multiple computers that send your requests to a central server for processing -- either as audio samples or as decoded messages, perhaps depending on the power and location of the system, as well as how much idle CPU it has at the moment -- and then interpreted, after which time you see the results. Parts of this are no doubt done in millionaires' houses today, and all of it could be put together in one package if you owned all of the IP.

This is what Sci-Fi TV voice UI is like. "Computer, open airlock". Which computer? I have 4 around me right now, and our sci-fi hero probably has dozens. Which airlock? Which door? What if someone's already inside it? The UIs in Sci-Fi are oversimplified to the point of absurdity.

The "which computer" problem is explained away (I hope convincingly) above. If you really must address a certain system, you can do it by name. This goes for doors as well. The default door to open would be the one you are closest to (your position and facing should be relatively trivial to determine based on your voice with a significant number of speakers and training the system so that it can account for the various objects in the room and so on) and if it's an airlock (or similar... think floodgate) the amount of work which goes into making sure it won't open at an inopportune time is trivial compared to the difficulty of implementing these other features, except perhaps if you're training it not to open the door when you're having sex. ("Computer, lock my door" should be sufficient, however.) What would you call that function, antiflagrantedelico?
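The "default door is the nearest one, unless you name one" rule is simple enough to sketch. This is only an illustration in Python under generous assumptions: the speaker's position is already known as 2D coordinates (the hard part, which the post waves away), and `pick_door` and the door table are hypothetical names invented here.

```python
import math


def pick_door(doors, speaker_pos, name=None):
    """Choose which door an "open door" command refers to.

    doors:       dict mapping a door name to its (x, y) position
    speaker_pos: (x, y) position of the person speaking (assumed known)
    name:        optional door name spoken explicitly by the user

    Returns the chosen door name, or None if nothing sensible matches.
    """
    if name is not None:
        # An explicit name always wins; an unknown name matches nothing.
        return name if name in doors else None
    if not doors:
        return None
    # Otherwise default to the door closest to the speaker.
    return min(doors, key=lambda d: math.dist(doors[d], speaker_pos))
```

Safety checks such as "is someone already inside the airlock" would sit after this step and could still refuse the command.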

Yes, I realize that these Sci-Fi computers have continuous, untrained, natural speech recognition with mega-AI to recognize ambiguous commands. But they will still need some kind of display to let users know the machine state. Either that, or you'll need to have a conversation just to walk through a door (shades of Hitchhiker's Guide!).

It's okay for the computer to not recognize ambiguous commands for a while (this is more a BTW than anything) so the Mega-AI isn't even necessary. From a military standpoint you can simply train people to issue the proper commands. For ANY purpose you will have physical controls (maybe just virtual buttons on a flat console) to back up ALL commands (so you can do things even if you cannot speak) and you will have real, three-dimensional physical controls for critical, "real-time" issues like maneuvering, at least up until the point at which the computer can be creative in ways that don't kill you, or events occur at speeds too fast for humans to be a useful part of the equation at any level other than oversight.

I don't foresee the number of displays we have decreasing any time soon, unless the number of them decreases while the total display area increases. If we have a good method for small devices to request a display area, then many of them won't need any displays at all, but only if we have very large and easily addressable (accessible?) displays. We are all of course hoping for inexpensive video wallpaper or paint which will solve the display area issue for us. Then we have to decide how to grant applications the ability (and rights) to access it.

As for the HHG reference, besides being a fine example of humor, that would also be only an example of bad UI which can exist whether your interface is a windowing GUI, speech recognition, or a steering wheel. (Remember the days before steering was proportional?) Anything can be done right, or wrong, or of some intermediate quality.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Because by SunPin · 2002-11-22 14:49 · Score: 1

And have you ever actually tried speaking for eight to ten hours at a stretch?
Yes, I have. I've never been stricken with laryngitis.
It's a nice, even essential, mode of operation for those apps when a keyboard just won't do; the disabled, firemen, surgeons and so on...
WTF are you talking about?

--
Laws are for people with no friends.
Re:Because by Anonymous Coward · 2002-11-22 16:18 · Score: 0

HAL was NOT insane!!! He was stuck in a loop and unable to decide on mission objectives!
Re:Because by Anonymous Coward · 2002-11-23 01:59 · Score: 0

Who cares what he wants? A company won't take the time to make the software unless there's a market significantly bigger than 1 person for it.
Re:Because by Sri+Lumpa · 2002-11-23 09:40 · Score: 2

"Speech is nice, but it is very much a niche application."

True, but there is one niche where I think it would fit quite well: CLI replacement.

I am not talking about shell scripting, where you would have all the same problems as with programming, but about simple commands.

What do you think is easier to remember, an obscure find command or "find me all files created during the week of the twelfth, having the word report in their name and bigger than fifty kilobytes"?

Of course, STT won't be enough for that; you would also need semantic analysis, and geeks will still use a CLI for speed reasons, but it would still be useful.
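For what it's worth, here is roughly what that spoken request boils down to once the semantic analysis is done. This is a sketch in Python, not a speech front end: the names `matches` and `find_files` are made up for illustration, and it uses the file's modification time as a stand-in for "created", since creation time is not available portably.

```python
from datetime import datetime, timedelta
from pathlib import Path


def matches(name, size_bytes, modified, week_start, word="report", min_kb=50):
    """True if one file satisfies all three spoken conditions."""
    week_end = week_start + timedelta(days=7)
    return (
        word in name.lower()                    # "having the word report in their name"
        and size_bytes > min_kb * 1024          # "bigger than fifty kilobytes"
        and week_start <= modified < week_end   # "during the week of the twelfth"
    )


def find_files(root, week_start):
    """Walk `root` recursively and return the paths that match."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        # Assumption: mtime stands in for creation time.
        modified = datetime.fromtimestamp(info.st_mtime)
        if matches(path.name, info.st_size, modified, week_start):
            hits.append(path)
    return hits
```

A call like `find_files(".", datetime(2002, 11, 11))` would cover the seven days starting on that date. Turning "the week of the twelfth" into that date is exactly the part STT alone does not give you.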

--
"The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers." Bill Gates,
Re:Because by drinkypoo · 2002-11-23 14:57 · Score: 2

Not only that, but being forced to speak clearly (not being forced to speak in a particular accent like Dragon "Naturally" Speaking)... That thing never even got close to recognizing my speech, and I'm from California. The only "accent" I have (i.e., how much I deviate from the media) :P is that I speak quickly, but even when I slowed my speech down for it, it just couldn't pick me up.
Now, being forced to speak the way the dictionary indicates things should be pronounced, that would be a great boon to everyone. It would force people to look up some words (better include a text dictionary with a good spelling match, most people can't spell worth a damn) and force everyone to practice their diction. So that "diction", for example, doesn't sound like it contains a K. Or my name (Martin) doesn't become "Mardin". There's a glottal stop there, fuckers. It is also not Maaaahahhhhhhhhttiiiiinnnn! On the other hand the average Radio DJ would become even more unbearable to listen to. At least now they make me laugh.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Because by drinkypoo · 2002-11-23 15:16 · Score: 2

support departments trying to make dictation work with fifty other people speaking commands to their respective clients; or programmers trying to spell their way through their creations.

Obviously not all environments will lend themselves to voice operation. Phone support is definitely one of those situations in which you'll have problems; you can't speak to a person and speak to the computer simultaneously. Hence cash registers at Jack in the Crack will either just take your order for you (likely) or be operated by a pimply teenager (or loud surly old woman) who presses buttons with the same usual lack of ability that we witness now.

But I suspect that programmers will be using voice interfaces, and they'll actually be using RAD tools which consist of attaching tinkertoys together (to borrow a phrase N.S. borrowed from someone else in his turn) and perhaps they'll be speaking the code that makes up the portions of the tinkertoys. You have to think outside of these stupid flat desktop metaphors that we won't be stuck with forever if for no reason other than the cool factor of a pretty 3D GUI. I mean, Enlightenment... need I say more?

And have you ever actually tried speaking for eight to ten hours at a stretch? I'm not talking about random, occasional speech acts, but sustained, focused speech. You'd have about three weeks until laryngitis became an occupational hazard among white-collar workers.

Let me share a little something with you. I've spent a lot of time unemployed in my life -- some would say as much as possible -- and I spent a lot of that time sitting at a cafe talking all day, smoking cigarettes (spirits or luckys or top rollys... always something harsh) and drinking coffee, and then beer. And then as often as not, going out and pounding cocktails. I had some money left over from when I got laid off, at the time...

Anyway I spent a ridiculous amount of time talking and guess what, I never lost my voice. I spent a lot of it doing voice caricatures too, I'm a very emphatic fellow.

Speech is nice, but it is very much a niche application. Not only now, but ever. A keyboard is faster than speech, and does not contribute to noise level or occupational damage nearly as much as sustained speech would. It's a nice, even essential, mode of operation for those apps when a keyboard just won't do; the disabled, firemen, surgeons and so on will rightly love the interface. For mainstream use, however, it's just not good enough even when it's perfect.

You're thinking of doing the same things you do with a computer now in the future, and in essentially the same way. Obviously this will be the case, but I sincerely hope that in ten years we're doing a lot of new things with our computers. Some of the things I hope we'll be doing are things some people are doing now, like controlling our appliances (beyond just turning them on and off without feedback... X10 sucks) and actually using the computer as a tool to make all other aspects of our life easier. I don't know how many people I know who have a computer and a personal organizer and yet still make grocery lists on paper, so obviously we have a ways to go yet -- but that's always going to be true in some way or another. :P

Obviously some applications are apparently more deserving than others. You cite as examples conditions in which people cannot use their hands for one reason or another; these are obvious. You did leave out the most obvious, though, which is law enforcement and military use.

But really there is nothing you can do which you cannot do better with a computer, except be a luddite. A sufficiently advanced computer (the amount of advancement required varies) with the appropriate peripherals can improve anything. You could use a small, simple computer with some sort of biomedical peripherals to monitor pulse rate and other conditions while doing some purely physical activity to keep yourself working efficiently, at one end of the spectrum, where on the other you're doing some kind of crazy imaging or something, something you can't even do without a computer.

In most situations you'd really rather have your hands free. If you had some kind of glasses which would overlay video onto your sight (various solutions exist today) then voice recognition would be the best way to put different things there while you're working on something with your hands. With eye tracking thrown into the mix it becomes even more incredibly useful, and if you add a camera and a shitload of processing power, plus a nice laser rangefinder and the usual GPS and whatnot, you're in the big time baby. I mean, isn't that what we're looking for? Unfortunately the glasses have to be stylish, nothing more lame than (say) gargoyles. At least those of us who have a square head can pull off the arnie look.

It could become an accessory input, on the lines of replacing menu commands for an app: mark text, say "cut", mark a place, say "paste" and so on, but it just would never replace keyboard input in any mainstream application.

For most things in which you are generating reams of text, you will continue to use some form of keyboard. Of course a lot of that text can be replaced by dictation in the field, which you're doing anyway, but instead of typing it later you'll be grabbing the text from your PDA. Further proof that for many purposes, if you have enough processing power, you can even do away with some storage :)

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
Re:Because by banzai51 · 2002-12-04 15:25 · Score: 1

Sure, but your telephone doesn't reboot because Bob in the cubicle next to you is a tad loud.
You don't remember typewriters and adding machines, or for that matter, the dictaphone, do you?

Remember them, sure. But I didn't work in an office back in the stone age either. :-)

A writer *what* ? by Anonymous Coward · 2002-11-22 06:13 · Score: 0, Flamebait

To be grammatically correct, a real writer would say, "I'm a writer who....," not "I'm a writer that..." :-)

Re:A writer *what* ? by toxcspdrmn · 2002-11-22 06:17 · Score: 1

Dang that voice-recognition software!

--
"E pur si muove!" - attributed to Galileo Galilei, 1564-1642
Re:A writer *what* ? by stipe42 · 2002-11-22 06:28 · Score: 1

Only if you define a 'real writer' as one who gives a shit about grammatical rules. If breaking a grammatical rule does not impact the ideas being communicated, it was not a rule worth following.
stipe42
Re:A writer *what* ? by Anonymous Coward · 2002-11-22 07:05 · Score: 0

It does impact the idea in so much as it shows that the person with the idea might be a morom for not being able to follow basic rules.
Re:A writer *what* ? by Anonymous Coward · 2002-11-22 07:22 · Score: 0

An unsuccessful gimpy writer.
Re:A writer *what* ? by SunPin · 2002-11-22 15:22 · Score: 1

It does impact the idea in so much as it shows that the person with the idea might be a morom for not being able to follow basic rules.
Who, exactly, is the "morom"?

--
Laws are for people with no friends.

Re:Because it does not work. by Anonymous Coward · 2002-11-22 06:14 · Score: 0

Yes, we all know that. Nobody has denied this. But for those of us who can't use a keyboard, we're discussing alternatives. Did your bonobo brain fail to grasp that?

Dragon Dictate is portable by Anonymous Coward · 2002-11-22 06:14 · Score: 5, Informative

Dragon has a portable product that you dock to your PC to do the voice to text. You can bring it with you, then connect it when you're home. A digital recorder is available bundled with the software, or you can use any microcassette recorder and a Norcom playback and interface device. Search Google for info!

Re:Dragon Dictate is portable by fuzdout · 2002-11-22 06:30 · Score: 1

Yah, isn't it the Manderine Dragon Speak thingy?

My landlord has that and uses it to type emails. From what I've seen, it's very quirky and you have to speak in a very precise and correct manner (if you have even a very SLIGHT accent it won't type the correct word). Also, you have to specify all punctuation such as periods and question marks.

For someone with a disability it would be highly useful, as for the rest of us, it's more work than it's worth.

--
Fuzdout
..My sig ran away. Has anyone seen my sig?
Re:Dragon Dictate is portable by oiuyt · 2002-11-22 07:05 · Score: 1

Quick correction: Dragon's product line is NaturallySpeaking (or NatSpeak). Dictate was the discrete system that they introduced in 1989-1990 and was the state of the art until NatSpeak arrived in 1995ish.

Too bad the whole L&H thing happened. It would be much more interesting if Dragon still existed (for real, not as ScanSoft's ownership of the name/code, which isn't the same company anymore).
Re:Dragon Dictate is portable by oiuyt · 2002-11-22 07:08 · Score: 1

It works really well IMO. I haven't tried it since the first couple of releases back around 1997-1998, but even then it was much better than you're describing. Yes, you have to speak punctuation. You have to type it too; this isn't an unreasonable requirement. Accents very rarely caused major problems and the software is very good at adjusting to nearly any user.
Re:Dragon Dictate is portable by fuzdout · 2002-11-22 12:27 · Score: 1

I haven't used it personally, but I've watched my landlord use it and it always seems to need a little tinkering. My landlord got it a couple of months ago, so it's the newest version, and the thing with the accent still seems to bugger things up. Also, it has a hard time knowing which version to use of certain English words that sound alike but are spelled differently.

Example: There, their and they're.

It's okay for what it is, but I find given the choice hand typing is faster and easier.

--
Fuzdout
..My sig ran away. Has anyone seen my sig?

Amazing! by Anonymous Coward · 2002-11-22 06:15 · Score: 1, Funny

"I'm a writer that is 99% dependent, due to fine-motor disabilities, on voice dictation."

This isn't meant in jest at all. Despite the limitations you face for computer input, you've still managed to construct a more grammatically correct, and properly spelled "Ask Slashdot" than half the readers of this site. Bravo ... and hopefully our PDAs will one day catch up to the Jetsons.

Re:Amazing! by Anonymous Coward · 2002-11-22 07:37 · Score: 0

Actually, that's an improper use of the phrase "due to". In this context, it ought to be "owing to". Your main point, however, is still correct.
Re:Amazing! by HughsOnFirst · 2002-11-22 08:01 · Score: 2

I was using voice to text software a couple years ago, and grammatically correct, properly spelled test was never a problem. You can't misspell anything. On the other hand I got a rather odd reputation at work after sending a lot of email that looked like grammatically correct, properly spelled Markov text or surrealist poetry.

I did have fun dictating fiction using it, and changing the story to accommodate the bizarre errors it would insert.
I should try that again, it was pretty fun.
Re:Amazing! by maffoo · 2002-11-22 08:16 · Score: 1

properly spelled test was never a problem

Case in point...

stephan hawking by u19925 · 2002-11-22 06:17 · Score: 1, Insightful

What kind of machine does Stephen Hawking use nowadays? Last I heard, in the early '90s, he was still using an 80286-based machine. Today's 32-bit PDAs running at 400 MHz should be at least 50 times faster.

Re:stephan hawking by ceejayoz · 2002-11-22 06:20 · Score: 3, Informative

Stephen Hawking uses text --> speech, not speech --> text (considering he can't speak). Text --> speech is easy, speech --> text is not.
Re:stephan hawking by Anonymous Coward · 2002-11-22 06:37 · Score: 0

Easy? Have you heard what that thing sounds like when he talks? He might as well have a Talking Computron in front of him, that way he could entertain people with an elementary addition game if they got bored.
Re:stephan hawking by EvilBudMan · 2002-11-22 06:39 · Score: 1

Yes, someone please MetaMOD that +2 to -1. Exactly!

Function_Speech

text2speech=easy; speech2text=hard.
easy+hard=veryhard2do /end
Re:stephan hawking by Anonymous Coward · 2002-11-22 07:08 · Score: 0

hey did you get my phone call bitch?

-clay@rgv.rr.com
Re:stephan hawking by Anonymous Coward · 2002-11-22 07:13 · Score: 0

Wow! thank you, Captain Obvious! I am now so much more informed, having read that post. The scary part is that it actually was moderated informative, and, in respect to the post that it replied to, perhaps it was. We should have a -1 Common Sense or -1 Obvious moderation.
Re:stephan hawking by Cyn · 2002-11-22 07:31 · Score: 3, Funny

I think that Mr. Hawking's "speech" -> text would be trivially easy. For one thing, he just typed it - and for another, we have exact samplings of the voice that was generated and know how to regenerate it.

--
cyn, free software and *nix operating systems enthusiast.
Re:stephan hawking by Anonymous Coward · 2002-11-22 08:28 · Score: 0

AFAIK, Stephen Hawking designed this device himself, and it has very few inputs, possibly 1 or 2, which he uses to generate syllables. I think it also predicts the word, kinda like that Java app that was posted a while back.
Re:stephan hawking by Anonymous Coward · 2002-11-22 11:09 · Score: 0

As has been said before, his device is text to speech, not voice recognition. And if anyone's wondering why he's still using that 70s-era robotic voice rather than a newer, more understandable one, well, he apparently likes it.

Storage space? by StandardDeviant · 2002-11-22 06:17 · Score: 5, Insightful

I'm guessing the storage space requirements, in terms of the data files the programs would use to map vocalizations to meaning, would be the biggest stumbling block... Most mainstream PDAs only have 8 MB of RAM/storage combined, and Palm is still shipping devices with as little as 2 MB. Your best bet might be one of the StrongARM-based handhelds combined with a reasonably large CompactFlash/Secure Digital card... (E.g. Sharp Zaurus, Hewlett-ComPackard's iPaq, etc.) Of course, that's probably $300-500, but that's still less than a new laptop...

--

News for Geeks in Austin, TX

Re:Storage space? by Anonymous Coward · 2002-11-22 06:36 · Score: 0

used thinkpad P3 500MHz with USB and 2 PCMCIA slots for $500... check out ebay dude.
Re:Storage space? by corey_lawson · 2002-11-22 08:00 · Score: 1

I dunna know. I bought a used HP laptop for my wife last summer - P2-300, 96MB RAM, 14.1" LCD, $550.

It's not the best, but it's more than serviceable.
Re:Storage space? by Vocabularinist · 2002-11-22 08:25 · Score: 2, Informative

Absolutely.

Any PDA dictation system would need to have at least 1000 triphones. In total they would use around 20MB.
Re:Storage space? by Anonymous Coward · 2002-11-22 11:18 · Score: 0

"less than a new laptop" operative phrase: "new". dillweed.
Re:Storage space? by Anonymous Coward · 2002-11-22 11:21 · Score: 0

"less than a new laptop". "new". As in "not used".
Re:Storage space? by AnyoneEB · 2002-11-22 12:13 · Score: 1

The article said that the program he was thinking of took 10MB of hard drive space and 8MB of RAM. Probably too much for most Palms (some newer ones have 16MB of memory), but using a flash card you would have plenty of space for it. A Dragonball processor at 33MHz could probably be enough to run the software (might be pushing it though).
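The arithmetic can be written down as a quick back-of-the-envelope check. This Python sketch only uses numbers mentioned in this thread (10 MB of storage and 8 MB of RAM for the program; 2, 8 and 16 MB devices), and it makes one big simplifying assumption: that the device's whole memory pool is free for the program, which is optimistic. The function name `fits` is made up.

```python
def fits(device_mb, card_mb=0, need_storage_mb=10, need_ram_mb=8):
    """Rough check: can a PDA hold and run the dictation program?

    device_mb: built-in memory, shared between RAM and storage
    card_mb:   size of an optional flash card (storage only)

    Assumption: all of device_mb is available to this one program.
    """
    if card_mb >= need_storage_mb:
        # The program files live on the card; the device only supplies RAM.
        return device_mb >= need_ram_mb
    # No usable card: the files and the working RAM share the built-in pool.
    return device_mb >= need_storage_mb + need_ram_mb
```

By this estimate a 16 MB Palm without a card falls short (18 MB needed), while the same device with a flash card gets by, which matches the conclusion above. Whether a 33 MHz Dragonball could keep up is a separate question this does not address.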

--
Centralization breaks the internet.

dictaphone's EXSpeech by Anonymous Coward · 2002-11-22 06:18 · Score: 5, Informative

With a simple search for dictaphone I was able to find a product called EXSpeech. I think this is what you are looking for.

Re:dictaphone's EXSpeech by M00TP01NT · 2002-11-22 06:33 · Score: 3

Although I am a fan of Dictaphone, the EXSpeech product is hardly suitable for a PDA or for the general tasks that the original poster is looking for. From the site:

"EXSpeech(TM) offers a highly accurate continuous speech recognition solution that's fully integrated with Dictaphone's industry-standard Enterprise Express® voice and text management system. This state-of-the-art speech recognition technology, incorporated into a complete patient information workflow management system, can reduce transcription costs by more than 20% while speeding report turnaround."
Re:dictaphone's EXSpeech by Anonymous Coward · 2002-11-22 08:02 · Score: 0

I run some of Dictaphone's large dictation servers; their EXSpeech product is a module for their Enterprise Express dictation server. It is not in the ballpark of what this guy is looking for: 50k+ on top of their dictation software for the module. And this runs on a pretty good-sized server; no way their software would run on a PDA. I won't even start on the pathetic accuracy of the thing if you have any sort of accent.
Re:dictaphone's EXSpeech by Anonymous Coward · 2002-11-22 10:04 · Score: 0

Actually, I think you're looking more for something like the Walkabout or GoMD.

Dosent Voice work on PDAs. by jellomizer · 2002-11-22 06:19 · Score: 0

I think the real reason why the PDA doesnt take voice reconision like the 386 did is bascily the fact that the consumer market who is using the PDAs are use to PC level of quality and when they have voice reconision they want it to work just as good or better then what they have today. So if they made a PDA with 386 level of voice reconision which will get the job done. They will feel that they got riped off.
There is a second possible reason. Sience there is a large drop in the cost of hardware compared to software. That in order for the PDA dealers to seel their equiptment at a market barable price they need to save money on the software side. So Voice Reconision would be a larger undertaking in software side and the extra time to optimize it for the smaller power hardware will cost more. so your $200 PDA could be $300 which would loose some market share for the extra price in a feature that may not work.
There is a third possible reason. Once they make this device the universe will drasticly change into something different.
There is also a forth reason. That this has already happends. (My apoligies to the Late Douglas Adams)

--
If something is so important that you feel the need to post it on the internet... It probably isn't that important.

Re:Dosent Voice work on PDAs. by Cy+Guy · 2002-11-22 06:37 · Score: 0, Troll

Just curious, but was the haphazard spelling of your comment supposed to be making some point about the historic problems of voice recognition, or do you really spell (type?) that poorly? If it's you, you might want to invest in voice recognition software for yourself (one with built-in grammar correction might be good too).

If it was the former, then you should know that most of the VR problems were with homophones, such as your use of 'forth' instead of 'fourth' and rarely would it involve inserting additional phonemes into words such as an additional 't' in equiptment or putting a 'd' in happens.

--
Work for Change & GET PAID!
Re:Dosent Voice work on PDAs. by falzer · 2002-11-22 07:16 · Score: 3, Funny

Oh well. It happends.

i hope that voice recognition never really flies by greechneb · 2002-11-22 06:19 · Score: 2, Interesting

Can you imagine how bad it would be if everyone switched to voice recognition? Cellphones are bad enough, imagine if everyone was talking to their computers. The noise would be terrible. No matter how quiet you are, the noise would still grow rather large. Would you want to dictate something to your computer that is supposed to be private? Not when anyone can hear it. I'm waiting for something better, whatever it might be.

I dont get it by stackdump · 2002-11-22 06:19 · Score: 2, Informative

Is the poster just dissatisfied with existing software, or pissed because he wants to be computing Star Trek style and never will?

Re:I dont get it by Anonymous Coward · 2002-11-22 07:04 · Score: 1, Informative

navigating a pocketpc calendar program != dictation of letters etc.
Re:I dont get it by schof · 2002-11-22 07:06 · Score: 1

Is the poster just dissatisfied with existing software, or pissed because he wants to be computing Star Trek style and never will?

Why has this not been modded up? This is a link to a review of existing voice-recognition software on PDAs.
Re:I dont get it by Mark+Pitman · 2002-11-22 08:21 · Score: 1

Well, it has been modded up now, but it shouldn't have been. None of the software in that article will do what the poster is asking for: dictation.
Re:I dont get it by japhmi · 2002-11-22 08:53 · Score: 1

Well, considering that he mentioned that his "preference is 'discrete' speech," I don't think that he is pissed off because of not being able to do "Star Trek style" VR. He wants to use a PDA to be able to write stuff down, not just navigate his contact list.

--
"Giving money and power to government is like giving whiskey and car keys to teenage boys" P. J. O'Rourke

Re:Because it does not work. by Wiseazz · 2002-11-22 06:20 · Score: 3, Funny

Unfortunately, the keyboard is the most accurate way to input data.

According to my boss, it's actually something called an office administrator.

--
My sig sucks.

Re:Because it does not work. by Anonymous Coward · 2002-11-22 06:21 · Score: 0

Unfortunately, the keyboard is the most accurate way to input data.

And more unfortunately, people who can't use them are stuck with voice... or maybe you missed that central theme of this article? What bong-sucking crack-head modded this moron up?

Re:Because it does not work. by RAMMS+EIN · 2002-11-22 06:21 · Score: 3, Insightful

I wonder where your claim that it doesn't work comes from. I have some experience (a couple of hours) with Philips FreeSpeech 2000 (I guess it's called that - I don't have it at hand). It recognizes natural speech fairly accurately. My guess would be that discrete speech is easier to recognize, making for better results and requiring less hardware real estate. I am absolutely willing to bet that it works a lot better than typing for people with disabilities. Your turn again.

--
Please correct me if I got my facts wrong.

Dependable dictation by t0qer · 2002-11-22 06:24 · Score: 3, Interesting

Sorry no links....

There are dictation services available on the net. Basically you e-mail them an MP3 and they e-mail back a fully typed document.

As far as the reason for voice recognition not being on a PDA, I think it's space requirements. Of the two packages I've tried (Dragon Dictate and IBM), both require a lot of disk space to contain the recognition engine and your personal voice pattern files. Much more than your average PDA can hold. We're probably only a few years off from PDAs having that type of storage.

Re:Dependable dictation by djp928 · 2002-11-22 06:59 · Score: 0

Did you even read the question? The guy was happy with the stuff that ran on the old 386s only using 10MB of disk space back in the day. Many PDAs today can handle that.

-- Dave

--
Making fun of dumb people since 2009
Re:Dependable dictation by MobileC · 2002-11-22 07:29 · Score: 1

The original poster says they would be happy with discrete recognition, which doesn't use as much space and doesn't need as much training.

--
Fran
:):):)
1st 1st Poster of the new Millennium!
Re:Dependable dictation by Elwood+P+Dowd · 2002-11-22 13:00 · Score: 2

Again, he's talking about discrete voice recognition, which requires you to speak a certain way, but does not require training or voice profiles. Much harder to use than viavoice or dragon, but it works better. Much better when you get practice using it.

--

There are no trails. There are no trees out here.

Text to Speech, why so crappy? by taxman_10m · 2002-11-22 06:25 · Score: 2

In a similar vein, why does text to speech still sound as crappy as Stephen Hawking's text to speech device from his 80s documentaries?

I recently downloaded Microsoft Reader along with a text to speech add-in and it sounded horrible. Same thing with Adobe's eBook Reader (well, theirs was a little better).

But why is this so? Why is text to speech even difficult? If you just have a human person speak all the different phonetic sounds shouldn't it be a simple matter of stringing together those sounds in a relatively seamless way?

Re:Text to Speech, why so crappy? by stratjakt · 2002-11-22 06:35 · Score: 5, Insightful

It's not just the phonetic sounds, but the multitude of various inflections and emphases that are lacking, and are pretty hard to reproduce, unless the TTS engine can interpret the meaning of the text.

Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?

A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?

It's not just a hard task for computers, but people too.
Computers read aloud at about the same level as a poor orator. Pho-net-i-call-y, in a dull drab monotone. Drop by the local high school, and listen to them reading Shakespeare.

Reading aloud may be simple, reading it well and naturally is a skill.

--
I don't need no instructions to know how to rock!!!!
Re:Text to Speech, why so crappy? by CTho9305 · 2002-11-22 06:44 · Score: 2

Check out AT&T's TTS demo. It sounds REALLY good.

--
My server
Re:Text to Speech, why so crappy? by taxman_10m · 2002-11-22 06:53 · Score: 2

> Raising the voice at the end of a question may be easy enough. But how much? When? This is a question too, is it not?

Heuristics? You should be able to come up with a rudimentary rule set for certain things. And really the only limit to how accurate you can get is how much time you are willing to put into refining and lengthening the rule set.

> A good orator would read a more 'exciting' passage more quickly, and with more enthusiasm, punctuating key verbs and nouns. How is software to know which passages are more exciting, and which aren't?

How do we know? By matching key words and phrases. Is there even an attempt at this?

> It's not just a hard task for computers, but people too. Computers read aloud at about the same level as a poor orator. Pho-net-i-call-y, in a dull drab monotone. Drop by the local high school, and listen to them reading Shakespeare.

Even if it is too hard a task for a computer to leap beyond dull drab monotone for straight text to speech, do you know of any attempts at emphasis tags?
<quiet></quiet>, <excited></excited>

I find it really hard to believe that this hasn't advanced at all since the 80s.
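
To show what I mean by a rudimentary rule set, here is a rough Python sketch. The tag names (`<rise>`, `<excited>`) and the keyword list are made up for illustration; they are not the markup of any real TTS engine, and a usable rule set would need far more rules than this.

```python
import re

# Hypothetical keyword list; a real rule set would be much longer.
EXCITING_WORDS = {"amazing", "suddenly", "exploded", "incredible"}

def annotate(text):
    """Wrap each sentence in a made-up prosody tag chosen by simple rules."""
    # Split into sentences, keeping the closing punctuation on each one.
    sentences = re.findall(r"[^.!?]+[.!?]+", text)
    out = []
    for s in sentences:
        s = s.strip()
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", s)}
        if s.endswith("?"):
            # Rule 1: questions get a rising contour.
            out.append(f"<rise>{s}</rise>")
        elif s.endswith("!") or words & EXCITING_WORDS:
            # Rule 2: exclamations or "exciting" key words get more energy.
            out.append(f"<excited>{s}</excited>")
        else:
            # Default: leave the sentence flat.
            out.append(s)
    return " ".join(out)
```

Feeding it "This is a question too, is it not? The door exploded. It was fine." tags the first sentence as a rise, the second as excited, and leaves the third alone. Key-word matching like this is obviously crude, which is sort of the point: even the crude version is something.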
Re:Text to Speech, why so crappy? by lostchicken · 2002-11-22 06:58 · Score: 2

Voiceware's stuff here is really quite good. You just never hear the good TTS on desktops because the licenses are expensive, and only telcos can afford them.

You actually hear the voices all the time over the phone (recordings and such), but you just think it's prerecorded, and then spliced. I think part of GM's OnStar service may use TTS.

--
-twb
Re:Text to Speech, why so crappy? by stratjakt · 2002-11-22 07:03 · Score: 1

>> Even if it is too hard a task for a computer to leap beyond dull drap monotone for straight text to speech, do you know of any attempts at emphasis tags?

Sure, it's used in video games all the time.. Stuff like the commentator chatter in the background of a recent sports title. It's evolved beyond just a bunch of voice samples strung together.

It's also in Windows. Clippy the paperclip, Merlin the wizard, those guys are generated by the "Agent" ActiveX control. They use text to speech, and you can augment it with metatags to make the speech sound more natural.

But it takes too much massaging of the input. To format, say, this post so it could be 'spoken' in a more natural manner, would take longer than it took to write it.

To have a computer do everything automagically, that computer would have to have a good understanding of the English language, which is fairly complex and full of subtleties.

Then take the example of most Asian languages, where the enunciation of a given word completely changes its meaning.

--
I don't need no instructions to know how to rock!!!!
Re:Text to Speech, why so crappy? by fizbin · 2002-11-22 07:05 · Score: 2

> If you just have a human person speak all the different phonetic sounds shouldn't it be a simple matter of stringing together those sounds in a relatively seamless way?

No. For the complete answer take an introductory linguistics course and pester the professor.

Short answer: speech doesn't work that way. When you cut phonemes away from the surrounding ones, they no longer sound like speech and you can't string them back together - the result isn't heard as speech at all, but a bunch of random chirps and vowel sounds.

This is also part of why speech to text is so hard; the sound graph of, for example, /k/ looks completely different depending on what other phonemes are in the same syllable. (and so speech to text can't really match at the level of phoneme very well, and has to back off matches to the syllable level or longer) Sounds which we interpret as "identical" when used in speech look completely different when you plot out the frequencies involved (or take a look at the data). About the only phonemes which can be cut-and-pasted in isolation are vowels, and only the middle parts of long vowel sounds do that particularly well.

It frustrates your intuition, but the initial and final sounds of "cook" are not the same to some sound-sensing device that isn't connected to the human brain's special speech processors. That's because the human brain processes speech-like sound so that you hear as similar those sounds which require similar positions of the tongue, mouth, and other organs humans speak with. There's also noise correction in there like you wouldn't believe, which is how you can still understand stilted Hawking-like text to speech.

I suppose that the ultimate text to speech machine would run an intense physical simulation of air being forced over human vocal cords and through a human mouth with a tongue moving just right for each word, but:
1. the processing time would be, to put it mildly, massive
2. Doing the motion capture for that would be difficult and possibly quite painful
3. You'd still have the issue of pronouncing words within the context of a sentence
Re:Text to Speech, why so crappy? by outlier · 2002-11-22 07:05 · Score: 2

The problem is a bit more complex than you make it sound. People doing text-to-speech development are smart, and would have jumped on this idea years ago if it were as easy as it sounds at first.

To sound natural, speech has to incorporate prosody and intonation as well as being able to support coarticulation.

Coarticulation refers to the fact that the sound of a phoneme (the smallest unit of linguistic sound) is affected by those that come before and after it.

It is not an easy problem, but there have been some nice advances in concatenative text-to-speech systems. For example here is a pdf about IBM's approach to the problem.

We're not there yet, but things are improving.
Re:Text to Speech, why so crappy? by TheOneEyedMan · 2002-11-22 07:50 · Score: 1

The Torah and Haftorah have trope. These markings under the words are a combination of punctuation and musical notation used to make the sounds of reading it aloud similar across a variety of readers. I can imagine a form of trope designed for machine readable intonation. Maybe such a notation could even replace current methods of writing, as it provides additional contextual clues for machine translation and decoding. That way new readers and machines could more readily decode more difficult ideas.

--
Reality is that which refuses to go away when I stop believing in it. --Philip K. Dick (remove SPAM to email)
Re:Text to Speech, why so crappy? by frankie · 2002-11-22 07:52 · Score: 2

> why does text to speech still sound as crappy as Stephen Hawking's

It doesn't. Text to Speech has definitely come a long way since the days of Talking Moose and good old Fred. Victoria, for example, is completely intelligible even to my 2 year old daughter.
Re:Text to Speech, why so crappy? by Anonymous Coward · 2002-11-22 08:25 · Score: 0

IIRC Dr. Hawking once stated that the reason he has kept the stilted, robotic sounding voice, and not upgraded, is because people have now associated that voice to him.
Re:Text to Speech, why so crappy? by js7a · 2002-11-22 15:40 · Score: 2

It is a truism that speech recognition models can very often be "run in reverse" to synthesize speech. This is more time-intensive than some other forms of synthesis. However, the quality of such synthesis, which is usually easy to correct by ear, corresponds directly to the quality of the recognition models.

Re:Because it does not work. by Flabby+Boohoo · 2002-11-22 06:26 · Score: 1

The question was "why has this not happened yet". Using the stylus (ala Graffiti) will get the job done, but there is an accuracy problem, which is why keyboards are popular.

Yes, there is software out there that will get the job done I suppose, but because it is not too accurate, it never caught on with the public.

And since there is not a solid ROI, it has not happened. Someone would have to port something for moral reasons....

voice server by Steven+Rumbalski · 2002-11-22 06:27 · Score: 3, Insightful

Conjecture: Voice recognition on a PDA could work if you had a separate voice server over a wireless connection. So you have voice sent over a regular phone connection to your home PC (with modem) that does the recognition; it then spits back text (over another connection?) to your PDA.

Some might say that this would make VR too slow. I don't see why this would be noticeably slower than doing VR in person. After all, when we talk on the phone the person on the other end hears us almost instantaneously.

On a side note: my brother is a doctor who uses VR to do his dictations. It is much cheaper than paying a transcription service. He also does not need to review the transcriptions afterwards for accuracy, because he essentially reviews it as he speaks it.

Re:voice server by Mad+Bad+Rabbit · 2002-11-22 06:37 · Score: 2

Couldn't this be done without a PDA, just a cellphone capable of instant messaging (can you receive those at the same time you're on a voice call)?

--
>;k

Re:MOD PARENT DOWN! by MoCycleGeek · 2002-11-22 06:28 · Score: 2, Insightful

I don't think so.

He is correct, current markets go for the majority and don't bother for the minority (excepting small speciality groups).

Unless you show one of the big players how to turn it into a cash cow, they won't put too much time or money into it.

Tablet PC? by InnovATIONS · 2002-11-22 06:28 · Score: 2

Maybe as a middle ground this could be a good use for a Tablet PC, particularly since it would give you a bigger screen and interface for seeing and marking the text as it is input

Re:Tablet PC? by SunPin · 2002-11-22 15:10 · Score: 1

I agree with your point and used a Sony Picturebook for awhile with that in mind. However, wires were the main culprit that forced me to sell it--not background noise. As I mentioned in the post, discrete dictation has a higher tolerance for background noise than natural speech recognition so I know that my concept can work, it's just a matter of finding the right software/hardware setup. OQO seems to be moving in the direction of desktop power with handheld weight but until it's on a shelf for sale, it's vapor.
As an aside that has nothing to do with your comments, I have to categorically generalize everyone who made comments about desiring "Star Trek" quality recognition as idiots of the highest order. This isn't flamebait. Before commenting on this question, try to understand the reality of dictation. If you don't, just hop to another discussion.

--
Laws are for people with no friends.

Sharp Zarus + ViaVoice (or dragon for linux?) by Victor+Tramp · 2002-11-22 06:29 · Score: 2, Interesting

just an idea.. it's a handheld Linux based system, so why would this be such a bad idea? hell, while you're at it, install festival, so it can talk back

yes yes, a scripting nightmare.. perhaps some enterprising programmers could start something on sourceforge or something..

it's not like the technology isn't out there. It's certainly not perfect; the Zaurus isn't big on storage space, and it's hardly cheap. and of course there are countless threads on the imperfection of voice recog.. blah blah.. but good enough is a fine answer on the path to [unattainable] perfection.

Anyway; Keep It Simple, Stupid:

Zaurus + Microdrive + ViaVoice/Dragon libs [+ festival?] + glueware = handheld voice recognition..

what's the big deal?

--
US$0.02++

Re:Sharp Zarus + ViaVoice (or dragon for linux?) by Anonymous Coward · 2002-11-22 06:44 · Score: 0

Perhaps the Zaurus would need a CF sound card to pull this off?
Re:Sharp Zarus + ViaVoice (or dragon for linux?) by Locutus · 2002-11-22 07:40 · Score: 2

someone at Comdex said that L&H is doing work with voice recognition on the Zaurus. Don't know about Dictation...

--
"Anyone who stands out in the middle of a road looks like roadkill to me." --Linus

Audio to Audio? by Anonymous Coward · 2002-11-22 06:31 · Score: 0

All you people are talking about speech to text (and a few confused people are talking about text to speech), but what about just recording audio on these PDAs? Even with the storage limitations of most PDAs you can still fit several hours of low bitrate mono audio very easily. Do many PDAs have speakers and mikes built in? If so this should be a fairly simple project.
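
Back-of-the-envelope numbers for the "several hours" claim, as a small Python sketch. The 8 kbit/s bitrate and the 16 MB of free storage are my own assumptions (roughly low-bitrate speech codec territory), not figures from any particular PDA or codec.

```python
def hours_of_audio(storage_mb, bitrate_kbps):
    """Hours of mono audio that fit in storage_mb at a constant bitrate.

    storage_mb is in binary megabytes, bitrate_kbps in 1000 bits per second.
    """
    bytes_per_second = bitrate_kbps * 1000 / 8
    seconds = storage_mb * 1024 * 1024 / bytes_per_second
    return seconds / 3600

# Assumed example: 16 MB free, 8 kbit/s mono speech.
print(round(hours_of_audio(16, 8), 1))
```

Under those assumptions you get a bit over four and a half hours, so "several hours" holds up. Double the bitrate and you halve the time.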

It's always been my dream by The+Evil+Couch · 2002-11-22 06:32 · Score: 2, Funny

to use PDAs for World Dictation!

MARCH ON MY PALM MINIONS! Go forth! And ravage the world!

*cackles deviously*

--
The World's Worst Webcomic!

D I C T A T I O N by LJPeixoto · 2002-11-22 06:34 · Score: 1

Am I wrong, or does he just want dictation software (versus speech recognition)?

ViaVoice for PocketPC exists! by RevAaron · 2002-11-22 06:35 · Score: 5, Informative

You can get a version of ViaVoice for the PocketPC. However, it sucks. It's not a real dictation system, though: it only allows you to use a pretty small pre-defined group of commands, not general English word dictation. I was pretty disappointed. However, I wouldn't be surprised if eventually there will be a full-blown ViaVoice Embedded version for the PocketPC.

As usual, there are some results that come up with a simple Google search.

There was a Dragon Naturally Speaking beta for the Newton OS 2.1, and it works OK. But it's still a beta and is far from perfect.

If you're looking for voice recognition for other PDAs, including PalmOS or Linux devices, you'll probably have much less luck.

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad

Re:ViaVoice for PocketPC exists! by jswitte · 2002-11-22 06:48 · Score: 1

Dragon Naturally Speaking beta for the Newton OS 2.1

Anyway, the beta only works with digits, ZIP codes, and a few other limited things. I can't even get it to work (I have the DragonHack2 package, though I haven't tried using the line-in or splicing in my own microphone).

Beta? Bah! Give me the source code, the format for the dictionaries, and the full Dragon API to interface it with NewtonScript as well as C++ (including the ability to try to recognize just straight phonemes). As well as a way to do the recognition straight out of the sound channel, instead of using the current kludge (I think, the code in ViewFrame is a bit hard to understand) of using a VBO to transfer frames from NS to the recognizer code in C++ (wasting time switching environments, and stressing the VM system with the VBO). Surely using large dictionaries wouldn't be a problem with a custom written FAT extension to Paul's ATA driver? But no, would the pin-striped suits at Dragon hear of it (no pun intended). Never!

Never *mind* that it's a product that will never make them any money now that the Newton's dead. Never *mind* that the codebase used is probably 3 generations behind the current codebase (the current one is probably too processor-hungry for the Newton and would make the VM system absolutely choke on itself..) Hell, the codebase that DragonDemo uses is probably on par with (if not below) that used by Sphinx, the open-source project. Sphinx would be a bitch to port, especially with the state of C++ programming on the Newton being what it is, and I've been told it might be too slow even so..
Re:ViaVoice for PocketPC exists! by RevAaron · 2002-11-22 07:07 · Score: 2

Yeah it sucks. Not much more to say, other than Viva La Green! With DragonHack2, it worked with a friend's upgraded 2000, but not my 2100. I've no idea why, but I wasn't too excited to get it to work...

Sphinx might be portable, but yeah, it would probably require rewriting lots of it to force it to live in the C++ environment the Newton OS imposes.

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad

Research is underway... by Cyclopedian · 2002-11-22 06:35 · Score: 5, Interesting

This place at the University of Washington is working on a different model of speech recognition that could be conducive to PDA use (low-power, filter out extraneous info).

Basically, they are working to analyze speech in slices (phonemes) instead of the more computationally intensive task of the whole word. This would lead to a higher success rate and could be easily used across multiple accents of the same language (English, engrish, etc).

I'm excited about what they could accomplish there.

-Cyc

--
/.'s 10 Millionth

Re:Research is underway... by AndSoitGoes · 2002-11-22 09:25 · Score: 1

From your post I don't see anything particularly new about this. Voice recognition has been done based on phonemes for a long time. I did research on it 12 years ago.

The problem is that most of us do not pronounce every phoneme in every word, and we certainly don't hear every single one. Our brains take what does get across and fill in the rest.

Take a common phrase like "did you eat?", which my friends and I said about every day as we met after work or school. "did" has 3 phonemes, but a lot of people skip the last "d" sound. We shortened it to almost "di gi eat" (notice how different the "you" becomes).

Also, some phonemes are very similar, like m and n, although these sounds are produced very differently: m with the lips and n with the tongue. But when someone says "monday" you can distinguish m and n because there is no such word as "nomday". So even with phoneme recognition you still have to do word-based matching, and even sentence-based matching, to get high rates.
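
A toy Python sketch of that word-level matching, just to illustrate the idea. Letters stand in for phonemes here, and both the tiny lexicon and the confusion sets are invented for the example; a real recognizer scores phoneme hypotheses with probabilities rather than enumerating spellings like this.

```python
from itertools import product

# Toy lexicon and confusion sets; both are illustrative assumptions.
LEXICON = {"monday", "money", "nobody"}
CONFUSABLE = {"m": "mn", "n": "mn", "p": "pb", "b": "pb"}

def candidates(heard):
    """All strings reachable by swapping easily confused sounds."""
    options = [CONFUSABLE.get(ch, ch) for ch in heard]
    return {"".join(combo) for combo in product(*options)}

def recognize(heard):
    """Keep only the candidates that are real words in the lexicon."""
    return sorted(candidates(heard) & LEXICON)
```

If the phoneme stage hears "nomday", the candidates are "momday", "monday", "nomday" and "nonday", and only "monday" survives the lexicon check. The word list is doing the work that the acoustic signal alone can't.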

Another problem with phonemes is that the sounds change based on what phonemes come directly before and directly after. Doing speech therapy is hard because it is very hard to demonstrate some sounds without adding a second phoneme. Most people unconsciously put an "a" sound after a "p" sound when they try to show a child how to say "p", which gets confused with "b". This is hard for recognition because the "p" in "pa" is slightly different than the "p" in "pi".

Lack of a ADC/DAC is a big problem by Myrv · 2002-11-22 06:36 · Score: 4, Informative

Only recently have PDAs been shipping with anything approaching a good DAC, and many PDAs still lack any ADC support. Without a good analogue-to-digital converter built into the PDA you won't be able to do voice recognition. Remember that your 386 still required a soundcard to work properly. The same is true for PDAs today.

Re:Lack of a ADC/DAC is a big problem by dylan.ucd · 2002-11-22 09:20 · Score: 1

...and this is why the Quadra 840/av and its little sibling the Quadra 660/av were able to do voice recognition before just about any other PC... they had DSPs!

in fact, the Quadra 840/av had 2 ATT DSPs with considerable processing power! there is even some software that will use the DSPs for various calculations!

so the answer is pretty simple -- put some high powered DSPs in one of those PDAs, coupled with some decent software, and you would have the very device that you desire!

the end

Talking back? by dirvish · 2002-11-22 06:37 · Score: 2

Dictation is fine...as long as the damned thing doesn't start talking back to me.

--
FoundNews.com - get paid to blog.

Patience.... It's coming.... by jgrider · 2002-11-22 06:40 · Score: 5, Informative

(Disclaimer: I am currently consulting for a firm that is developing a Palm cradle with built-in dictation/voice recognition capabilities for the medical transcription market...)

Since the asker wanted to know WHY nobody has done this yet, I'll spell it out:

Basically the major pitfalls to developing this are:
1) Crappy algorithms that mangle what you really said into something unrelated :)
2) Power Consumption
3) Interfacing to the PDA (not hard to do, but non-trivial)
4) Limited PDA capabilities (Remember that Palm's DragonBall is a 68000-family chip with no floating-point unit, and things like speech recognition NEED floating point math, which must be emulated)

The solutions:

1) Somebody (not unlike me...) has to code the already existing better algorithms (check the literature - speech recognition is a mature technology, and publications abound) into a usable chunk of code, instead of simply recycling ViaVoice or NaturallySpeaking's libraries.
2) Add more battery storage.
3) Use another processor to do the conversion, then simply write it to the Palm in a serial stream.

I would just wait about a year, then ask that question again to your physician friends, and see what they whip out of their pockets... :)

Major reason: Learnout & Hauspie's corporate d by Adrenochrome · 2002-11-22 06:43 · Score: 2, Insightful

In the late 90's there were 3 major SpeechWreck vendors: IBM, Lernout & Hauspie and Dragon Systems.

Microsoft poured a bunch of cash into L&H. L&H eliminated some competition by purchasing Dragon.
L&H did some highly irregular accounting tricks, got themselves thrown in jail, and took their company down with them.

End result: There is only really one speech recognition vendor at this time, IBM, and they are just useless at marketing consumer products.

Keep an eye on Philips. They are currently spending big bucks developing their Speech Magic engine.

Your other option is to find a copy of Dragon Mobile. Record an audio file on your mobile, then have it recognized on your PC.

Not enough profit .... by mustangdavis · 2002-11-22 06:43 · Score: 2

1) Create PDA voice recognition software

2) ????

3) ???? (not profit!!!)

Seriously, TRUE voice recognition is only 99% accutate. It is bad enough trying to make corrections on a regular key board .. but on a PDA???? That would be rough!
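
To put a number on how much correcting that is, a quick Python sketch. The 500-word document and 20-word sentence are just example sizes I picked, and treating errors as independent per word is a simplification.

```python
def expected_errors(words, accuracy):
    """Expected number of misrecognized words at a given per-word accuracy."""
    return words * (1 - accuracy)

def chance_clean(words, accuracy):
    """Probability that every word in a passage comes out right,
    assuming errors are independent (a simplification)."""
    return accuracy ** words

print(round(expected_errors(500, 0.99), 2))
print(round(chance_clean(20, 0.99), 2))
```

At 99% that works out to about 5 corrections per 500 words, and roughly an 82% chance that any given 20-word sentence comes through untouched. Fine with a keyboard, painful with a stylus.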

Why not stick to using your laptop (which has MUCH more processing power) for voice recognition for now? You'll be able to run better software (software that does TRUE voice recognition, not phrase recognition) and have enough memory to run a text editor w/ spell check after you have completed your document.

This might be a great idea, but I think it might be a little ahead of its time ....

Just my two cents ...

--
HallmarkOrnaments.Com

Re:Not enough profit .... by tjcoyle · 2002-11-22 07:17 · Score: 1

Seriously, TRUE voice recognition is only 99% accutate. It is bad enough trying to make corrections on a regular key board .. but on a PDA???? That would be rough!
Evidently, keyboards are only 98.775757575757575757575757575758% accurate :)

Some thoughts... by RAMMS+EIN · 2002-11-22 06:43 · Score: 2

I have been wondering why speech recognition isn't more widely used as well. My conclusion was that there simply isn't enough interest in it. Companies won't make it until consumers are willing to buy it, and the consumers won't buy it until they are convinced it works better (and maybe even then they won't - see M$IE vs. the other browsers).

As an open-source zealot, I have to point out that Free software would be a solution here, as it is less concerned with profits. IBM seems to have open-sourced some code related to speech recognition, and there are a number of other projects out there, but even for open-source, there has to be sufficient interest in a project, and sufficient could mean _a lot_ in this case.

I think speech recognition is great, and I would use it if I used Windows. I just haven't found a good solution for XFree86 yet - not that I've looked very hard.

--
Please correct me if I got my facts wrong.

Re:Some thoughts... by Paul+Lamere · 2002-11-22 07:59 · Score: 2

There are some very active open source speech projects underway right now:
For synthesis, there's FreeTTS, a speech synthesis engine written in the Java(tm) programming language.
For recognition, there's Sphinx.
For embedded synthesis there's Flite, a small, fast TTS engine suitable for embedding.

IBM has a mobile version of ViaVoice by 123571113 · 2002-11-22 06:44 · Score: 1

Maybe this is a step in the right direction for you: http://www-3.ibm.com/software/speech/handheld/vvms_sr.html

Battery life Anyone? by dmayle · 2002-11-22 06:45 · Score: 1

Has everyone missed the key reason here? When we use PDAs, we usually only access them for a few minutes at a time, which is why we're able to get such great battery life. But imagine if the processor were constantly running to do recognition. The battery life would be practically nil...

It's the battery by PCM2 · 2002-11-22 06:45 · Score: 4, Insightful

Palm applications, in particular, are designed around the idea of "forms" -- you put a form up on the screen, and then you sit there waiting for the user to do something. You don't run a constant loop listening to a microphone every minute, because that sucks up the battery like crazy. The Palm programming philosophy says that 99% of the machine's time should be, essentially, idle. Voice recognition, on the other hand, is very processor-intensive -- probably too much so for a pair of AAA's.

--
Breakfast served all day!

Re:It's the battery by Vocabularinist · 2002-11-22 08:18 · Score: 1

You don't run a constant loop listening to a microphone every minute, because that sucks up the battery like crazy. The Palm programming philosophy says that 99% of the machine's time should be, essentially, idle.

You could either use a walkie-talkie style speak button, or a simple speech detection algorithm that fires up the main recogniser when someone starts speaking.
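For what it's worth, the second option can be sketched in a few lines. This is a minimal energy-threshold detector, not any shipping recogniser's front end: the names `frame_energy` and `detect_speech_start`, the 160-sample frame, the threshold of 1000 and the three-frame rule are all assumptions picked for illustration, and a real device would have to tune them against its own microphone and background noise.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def detect_speech_start(samples, frame_len=160, threshold=1000.0, min_frames=3):
    """Return the sample index where speech appears to start, or None.

    Speech is declared once `min_frames` consecutive frames are louder than
    `threshold`; shorter bursts (clicks, taps) are ignored.
    All default values are illustrative assumptions.
    """
    run = 0  # number of consecutive loud frames seen so far
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_energy(samples[start:start + frame_len]) > threshold:
            run += 1
            if run == min_frames:
                # Report the start of the first loud frame in the run.
                return start - (min_frames - 1) * frame_len
        else:
            run = 0
    return None
```

The loop only computes a sum of squares per frame, so the expensive recogniser would be started only once `detect_speech_start` returns an index.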
Re:It's the battery by FallLine · 2002-11-22 10:39 · Score: 2

I disagree. While you're correct about the zen aspect, you wouldn't need to have a constant loop like that. You could launch the voice recognition app from Launcher as you would any other application and simply have the user press a button, polling it every couple of milliseconds (as in other applications) to determine whether or not you should be "listening." Presuming that the DragonBall Palms have enough processing power (and a mic) to handle this task in the first place, I see no reason why it need be unduly burdensome. It'd be pretty much a standard application.

Opps, wrong link... by 123571113 · 2002-11-22 06:46 · Score: 1

This should work: http://www-3.ibm.com/software/speech/handheld/vvms.html

Voicetype on PDA by schoolsucks · 2002-11-22 06:46 · Score: 1, Informative

http://www-3.ibm.com/software/speech/handheld/ipaq_fam.html

Re:Voicetype on PDA by Anonymous Coward · 2002-11-22 07:07 · Score: 0

link is 404

Cheap Palms by suitti · 2002-11-22 06:47 · Score: 1, Interesting

My $150 Handspring Visor Platinum has 8 MB RAM, a 33 MHz Dragonball (68000) processor with no cache or FPU. It is claimed to perform at about 5 MIPS, about what a 386/25 could do. It has a microphone, but it is connected to pins for springboard modules only. You have to have a module to use it. The placement of the microphone suggests that it's there so that a cell phone module could be built.

8 MB RAM should allow recording for over 5 minutes, uncompressed, of 22 kHz, 8-bit sound. This is pretty good quality sound.
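A quick check of that estimate, as a sketch. It assumes the 22 kHz figure means 22,050 samples per second, one byte per sample, mono, and (unrealistically) that all 8 MB is free for audio; `recording_seconds` is just an illustrative helper name.

```python
def recording_seconds(ram_bytes, sample_rate_hz, bits_per_sample, channels=1):
    """Seconds of uncompressed PCM audio that fit in ram_bytes."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return ram_bytes / bytes_per_second


ram = 8 * 1024 * 1024                       # 8 MB, assuming all of it is usable
secs = recording_seconds(ram, 22050, 8)     # 22.05 kHz, 8-bit, mono (assumed)
print(f"{secs:.0f} s, about {secs / 60:.1f} minutes")
# prints: 380 s, about 6.3 minutes
```

So "over 5 minutes" holds with some room to spare, as long as nothing else needs the RAM.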

Given that you need a module for this unit anyway, you might add some hardware beyond just an A/D converter to make speech recognition quicker.

In any case, there's no reason that it can't be done. In fact, there are Palms with cell phones built in that can dial favorites in the address book using voice commands.

--
-- Stephen.

Re:Cheap Palms by Anonymous Coward · 2002-11-22 07:30 · Score: 0

I just want to know when the PDA manufacturers are going to get it right.

I can't justify buying one until they have a huge amount of storage, a huge amount of memory and some processing power to use it.

For example, I'd like them to double the current size of PDAs. Use 1/2 the new space for a power supply, and the other 1/2 for, say, an IBM Microdrive! (Nothing like having a gig of disk space on a PDA.) And more memory: 128M, or heck, why not 256M? Use the space freed up by the expansion for a more powerful CPU....

EARS by david.given · 2002-11-22 06:47 · Score: 5, Informative

Lo, many years ago I had a lot of luck with EARS on my 66MHz 486. It's a very simple discrete trainable recogniser; you have to teach it every word before it will recognise it. But it was fast then, so it should be really fast now, and it was pretty decent at recognising simple commands.
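To give a feel for what "discrete" and "trainable" mean here, below is a toy sketch of one classic approach: store a template for each trained word and pick the closest one using dynamic time warping. This is not a claim about how EARS works internally; the class name `DiscreteRecognizer` and the one-number-per-frame "features" are made up for illustration, and a real recogniser would use proper acoustic features.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


class DiscreteRecognizer:
    """Toy isolated-word recogniser: nearest trained template wins."""

    def __init__(self):
        self.templates = {}  # word -> list of trained feature sequences

    def train(self, word, features):
        self.templates.setdefault(word, []).append(list(features))

    def recognize(self, features):
        """Return the trained word closest to `features`, or None if untrained."""
        best_word, best_dist = None, float("inf")
        for word, examples in self.templates.items():
            for example in examples:
                dist = dtw_distance(features, example)
                if dist < best_dist:
                    best_word, best_dist = word, dist
        return best_word
```

Because the warping absorbs differences in speaking speed, a word said slowly can still match its template, which is part of why this style of recogniser is cheap enough for old hardware but only knows the words you have taught it.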

Re:EARS by Anonymous Coward · 2002-11-22 08:04 · Score: 0

Simple commands...

"command"

"Format"

"C colon"

"enter"

"y"

"enter"

A different solution by victim · 2002-11-22 06:47 · Score: 3, Insightful

Tiny devices like cell phones and PDAs don't have the CPU power for sexy, high quality voice recognition. They do however have wireless connectivity. So, solve the problem this way...

Install voice recognition servers, network connected boxes with powerful CPUs and the best voice recognition software you can get your hands on. A voice recognition client then just needs to send the voice data up to the server and get the translation back, say 100kbps up and some tiny amount back.
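As a rough sanity check on the 100 kbps figure, here is the raw bit rate of uncompressed PCM at a few common settings. The sample rates and bit depths are assumptions chosen for illustration, not anything the figure above is known to be derived from, and `pcm_kbps` is just an illustrative helper.

```python
def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw bit rate of uncompressed PCM audio, in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumed example formats: (label, sample rate in Hz, bits per sample)
formats = [
    ("8 kHz, 8-bit", 8000, 8),
    ("8 kHz, 16-bit", 8000, 16),
    ("11.025 kHz, 8-bit", 11025, 8),
    ("22.05 kHz, 8-bit", 22050, 8),
]

for label, rate, bits in formats:
    print(f"{label}: {pcm_kbps(rate, bits):.1f} kbps")
```

Telephone-style 8-bit audio at 8 kHz comes in under 100 kbps, while 16-bit at 8 kHz goes somewhat over, and that is before any compression.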

The payback comes because most devices will only use voice recognition for brief periods, so will present a negligible load on the servers. The dictation users will place a higher load on the servers, but even there, I'm guessing there is a lot of pausing involved. I'm also going to guess that some lag is acceptable for dictation. Presumably the person is thinking about what they are saying and proof reading later. This load can be prioritized lower to allow better immediate response for people issuing voice commands on their mobile devices.

Power consumption on the portable device will probably improve. They will have to operate their transmitter (think "talk time" vs. "on time"), but they won't need 5 watts of CPU doing recognition. (Guessing from a mobile G3 PPC, further validated, considering that the CPU spot of my iBook gets far hotter under solid use than a cellphone.)

So, just to pick numbers out of the air, a dual processor, high end commodity hardware voice server might serve 500 pda users giving intermittent commands and 6 simultaneous dictation users.

A company or school could easily justify the hardware cost of this service.

Now, someone go out and build one.

Re:A different solution by Anonymous Coward · 2002-11-22 06:54 · Score: 0

Dude, the original poster pointed out that the machines that used to do dictation 10 years ago are less powerful than the PDAs of today. So, like, your idea was great 20 years ago.

Modern PDAs have the equivalent of a PPro 200 MHz with 32 MB RAM. That was the first computer I ever bought, and it was faster than anything at the place where I worked for a year or so.
Re:A different solution by victim · 2002-11-22 07:36 · Score: 2

Dude... The original poster is deluded. 10 years ago dictation sucked. 5 years ago dictation sucked. (I've got boxes of the stuff that were purchased and stunk so bad no one used them.) I'm talking about dictation that actually works well enough to use.

Modern PDAs do not have the equivalent of a PPro 200. StrongARM 200s are about the same as a Pentium 90 for general purpose work, and right on useless for floating point work.

According to my compilation benchmarks (integer and data pushing, no floating point), a PPro 200 is right about 4 times faster than a strongarm 200. On the other hand the SA200s are little more than twice as fast as a 486-66dx2. :-)

The very latest XScale PDAs are probably about twice the speed of the SA-110 200MHz ones, but I don't have any similar hardware to benchmark.

I just unlinked my benchmark page from the web because I hadn't updated it in years, but here is a link. Fun to reminisce about all those machines I thought were so fast at the time...
metastones
Re:A different solution by TheOneEyedMan · 2002-11-22 07:58 · Score: 1

Is there enough fidelity over a cellular telephone's tiny microphone and noisy connection to process a full vocabulary? I know that one telephone company (AT&T, I think) spent something like a hundred million to develop the voice recognition for identifying calling card numbers and words like yes and no over the phone (with no other training). I'm a bit skeptical that this would work well without a lot of user-end training and a very clear phone call. Why not just use a wireless headset, a laptop, and existing voice recognition software? I assume the user has a motorized wheelchair, so wouldn't this be much simpler? Less cool, I guess... but you'd actually get what you wanted.

--
Reality is that which refuses to go away when I stop believing in it. --Phillip K. Dick (remove SPAM to email)

Hooray! Former publisher of "The Onion" returns... by DeHar · 2002-11-22 06:48 · Score: 1, Offtopic

T. Herman Zweibel has returned to us and is posting to Slashdot!

Sir: Last we heard you were rocketing into deep space. What happened? Has your loyal servant Standish discovered the workings of the giant rocket-ship?

We rejoice at having your wisdom returned to us.

read the post by Daytona955i · 2002-11-22 06:50 · Score: 1

Actually, if you read the post, he doesn't say he was happy with it, just that the requirements to run it were low. I think he was hinting that, since things generally get smaller and faster, speech recognition software would as well.

The fact is that the speech recognition used to SUCK! It has gotten a lot better since then but only because of the increased complexity of the software.
-Chris

Re:read the post by Anonymous Coward · 2002-11-22 07:07 · Score: 0

Well, there are two interpretations here. Either, as you think, he's demanding that very good, modern speech recognition should exist on something as small as a PDA just because "things get smaller and faster", in which case everyone jumps on his case for being unrealistic. Or, as I see it, he understands this paradigm, but is asking for at least something to be available on PDA, because he is concerned with having something small and portable that can do dictation, even if it's only as good as older dictation software. Since such software, though it has its drawbacks, is completely feasible and yet doesn't seem to exist, I prefer to give him the benefit of the doubt and assume he's asking about the latter.
Re:read the post by Anonymous Coward · 2002-11-22 07:10 · Score: 0

P.S. Also note that this person 1) seems to be a Slashdot geek, and 2) probably has more experience with dictation software than most of us will have in our entire lives, on account of his disability. I think assuming ignorance on his part is silly.
Re:read the post by captain_craptacular · 2002-11-22 08:32 · Score: 2

Whoa there big guy. Since when did any software get smaller and faster as time went on?

--
They who would give up an essential liberty for temporary security, deserve neither liberty nor security
Re:read the post by drinkypoo · 2002-11-23 15:18 · Score: 2

We find new ways to do things that reduce the amount of processing power required. Of course, then we use the processing power we saved, plus the additional processing power available because CPUs have advanced significantly since the software was written (this almost always happens), and we implement something else on top of it that has now become possible. The cycle continues.
I know you were just talkin' some smack but I had this thought way down here where no one moderates and thought I'd wander OT.

--
"You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"

OfficeXP Voice Recognition by Anonymous Coward · 2002-11-22 06:50 · Score: 0

Microsoft Office XP has voice recognition / dictation that ships with the product. You might be able to get a small laptop running Office XP to do some dictation. I don't know how well it works, but if the 1990s application worked for you maybe this one is acceptable.

Whaddaya mean, no market? by ianscot · 2002-11-22 06:51 · Score: 3, Insightful

People have been saying there's no market, here.

You don't have to be disabled in some way to think this'd be handy, do you? That's the story for this one person, okay. But if you hadn't heard of a PDA ever before, wouldn't this be one of the most likely functions you'd think of for them? It's a totally natural application for a handheld gadget like that, and one that really would have a natural market among all the middle manager types who made Palms so popular to start with. Right?

(Are there PDAs that can even read text in the other direction, though -- text to speech?)

--
"Fundamentalism" isn't about divine morality. It's about human authority.

Discrete is passe by outlier · 2002-11-22 06:52 · Score: 2, Informative

Unfortunately for you, discrete speech is seen as passe by the major players (IBM, L&H, MS). For a long time, the lack of continuous speech recognition was seen as the major barrier to widespread acceptance of general purpose dictation software (another barrier was the lack of support for large vocabularies). Eventually, processor power and algorithms evolved to the point that both barriers were overcome, and discrete speech (and small vocabs) were left by the wayside.

One byproduct of this was a decrease in voice error correction performance -- Most verbal corrections are single words (e.g., the user selects the misrecognized word, "foo" and repeats the intended word "bar" without any of the coarticulation cues that the continuous recognition engine relies on). The recognition of isolated words by a continuous speech recognizer is inferior to the performance of a discrete system, yet the major software companies removed the discrete recognition engines from their products. (for more on speech errors, see this or this pdf).

Anyway, the use of discrete recognition engines has been essentially abandoned by the major players, and seems to have been relegated to the specialty shops that cater to disabled users. One outcome of this is that there is very little innovation related to discrete speech, because it was seen as one of (many) historical barriers to the use of desktop speech reco. I can certainly understand the reluctance of the big companies to go back to an "inferior" recognition engine for handheld devices. Most likely, speech reco on the handheld will emerge in a client-server environment with the speech signal (maybe somewhat processed) being sent from the handheld to a server for recognition, and the text being returned to the handheld. We probably won't see a general purpose speech recognition application (as opposed to a limited vocab application) that runs solely on a handheld until continuous processing can be done entirely on the device.

ASK SLASHDOT: Using Computers For Spellchecking? by Anonymous Coward · 2002-11-22 06:54 · Score: 0

I am a 'editor' that is 99% dependant on storyes from other news sights. My preference is 'inflamatory' speech because of very low chance of lawsuit and its effectively infinite ability to produse ad revanue. Over the years, my rants have devolved into mindless ramblings about: The M$ Conspirasy, why stealing from companies is ok, but stealing from me isn't, and why it is bad for some people to censor, but it is fine for others. There are no artilces that I shouldn't spellcheck. Problem is, I can't. Back in the day, the re-quirements for Word 1.0 were DOS, 4M RAM and 100M HD. Why can't I run 'ispell' on my fucking Dual Athalon 1900MP with a Gig of RAM and a 100G HD?

Thanks
Taco

2001 A space odyssey - HAL 9000 by ehiris · 2002-11-22 06:55 · Score: 2, Insightful

Maybe approaching voice recognition through air waves is all wrong to start with:

Bowman: "Hello, HAL? Do you read me, HAL?"
HAL: "Affirmative, Dave, I read you."
Bowman: "Open the pod bay doors, HAL."
HAL: "I'm sorry Dave, I'm afraid I can't do that."
Bowman: "What's the problem?"
HAL: "I think you know what the problem is just as well as I do."
Bowman: "What are you talking about, HAL?"
HAL: "This mission is too important for me to allow you to jeopardize it."
Bowman: "I don't know what you're talking about HAL..."
HAL: "I know you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen."
Bowman: "Where the hell'd you get that idea, HAL?"
HAL: "Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move."

Re:2001 A space odyssey - HAL 9000 by Anonymous Coward · 2002-11-23 02:29 · Score: 0

Computers will never read lips. As a hearing-impaired individual who reads lips just to survive, I can tell you that you don't get enough information from lipreading. I need to use the little bit of hearing I have left to get anything out of lip reading.

Only a mammalian brain has the power to translate bits of garbled speech and lip movements fast enough to get any meaning out of it, and even then it's far less than 100%.

Why hasn't it occurred yet? by wayn3 · 2002-11-22 06:57 · Score: 1

Why hasn't it occurred yet?

Voice recognition has been "two to five" years away for the past twenty years. The reason it hasn't occurred is that it is too hard. The "two to five" year quotes come from overzealous marketing folks.

It won't happen for quite a while because the emphasis has moved away from wide-scale voice recognition to specific recognition requirements, such as recognizing numbers, "yes" or "no," a few verbs, or an individual's voice commands.

You would have to look to academia for answers, because the funds for the research dried up in the commercial market.

Re:Why hasn't it occurred yet? by Anonymous Coward · 2002-11-22 07:02 · Score: 1, Insightful

The point of the article was that the poster was quite happy with what voice recognition could do on his desktop. So your blathering about it not being there is missing the point. The poster's question was: why hasn't the software which works fine on older PCs been ported to PDAs, which now have equivalent or greater computing power?

If I were as obtuse as you, I would avoid posting on the intelweb for fear of people finding me out.

dragon solution by Slashdotess · 2002-11-22 07:01 · Score: 2

Actually, Dragon Naturally Speaking Doctor's Edition comes with a special USB dictaphone that plugs into the computer and translates the voice into text using Dragon's software. I'm not sure if it works with anything but Windows, but it's certainly cheaper than hiring someone to do it.

Re:dragon solution by swv3752 · 2002-11-22 07:25 · Score: 2

Sony's IC recorders come with Dragon Naturally Speak Standard Ed. and do the same thing. Again, Windows only. The PEGNX70v will let you do voice recordings, but I think it just doubles as an IC recorder. Some of the other PDAs on the market do the same. That is probably the best that is available right now.

--
Just a Tuna in the Sea of Life

Another whacky idea... by RevAaron · 2002-11-22 07:04 · Score: 2

You could try running this version of VoiceType within PocketDOS on a Handheld PC 2000 or PocketPC machine... Or, you could find an older PDA that has a 486-class processor. Not sure if PocketDOS can handle sound, but it worked great for running Lotus Agenda on the Jornada 720... It's not a DOS shell for WinCE, but an x86 emulator with DOS installed.

--

Working toward a usable PDA environment in the spirit of Newton OS: Dynapad

Speech-to-text? by thatguywhoiam · 2002-11-22 07:08 · Score: 3, Insightful

I think the question really is one of processing power, and pattern recognition. I have yet to see any truly impressive speech technology beyond what was available on a Mac in 1994.

The poster's question brings to mind a thought I've had lately, though, on PDAs and smart mobile phones. I've recently 'switched' from a Visor to just using my Sony Ericsson T68 as an organizer. Works great with iSync, etc.

The Palm-with-phone always made more sense to me than the phone-with-organizer. It seemed that the phone part could change shape - I could stick it in my ear in the form of a headset, with a connector to the Palm. A phone I need to hold up to my head. I can't surf with something held against my head that way.

However,

I've realized that I need a phone more, and more importantly, I only enter very small bits of text into the Palm. Furthermore, I spend much more time looking up things than entering things (as I use the Mac to enter data whenever possible).

This led me to the conclusion -- the one thing we are missing from the organizer/phone landscape, as the poster asked, is some kind of speech-to-text.

If I could literally hit a button and say "lunch with Dave next Tuesday" and have it enter that as live text... blammo. No more Palm, no more stylus. The phone already listens to voice commands. If it took short notes/appointments, I could literally walk around, call people, make appointments and notes, and not take the thing out of my pocket. Nice dream.

*sigh*

--
If Jesus wants me it knows where to find me.

Re:Patience.... It's coming.... by CommieLib · 2002-11-22 07:10 · Score: 2

ask that question again to your physician friends, and see what they whip out of their pockets

Is that a dictaphone in your pocket or are you just happy to see me?

--
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.

hello world by _ph1ux_ · 2002-11-22 07:10 · Score: 4, Informative

and it also has 8k of storage! You could store all of "Hel" on that thing

New fangled 386 16? by The+AtomicPunk · 2002-11-22 07:12 · Score: 1

Those were pretty damn old in 1990, when the 486/33 was king. =)

Re:New fangled 386 16? by Junta · 2002-11-22 08:27 · Score: 2

I thought the 486/33 was king about 92 or 93 maybe... I'm a bit fuzzy, but I remember the Pentium 60 was out about mid-94, and the DX2/66 was until then the holy grail, so it seems like 93 should have been about right for 486/33... I think about 90 I got a 286 and it was considered a decent performer at least...

--
XML is like violence. If it doesn't solve the problem, use more.
Re:New fangled 386 16? by Sandman1971 · 2002-11-22 08:36 · Score: 1

I beg to differ. I purchased my first IBM-type PC (before that I had Commodores) in the fall of 1990, my first year of college. At the time, the best affordable PC you could get was a 386-16 ($2500 Can). Though 486s did exist, they cost about the same as a small car. So in this region, the 386/16 was still King in 1990. 486s didn't really take off until a year or so later.

--
It's better to burn out than to fade away
Re:New fangled 386 16? by The+AtomicPunk · 2002-11-26 14:14 · Score: 1

Just because you were slumming doesn't mean the 486 wasn't "King". Geos are popular, I don't think many people refer to them as the "King" of the automobile.

I was just a kid, and I somehow managed to afford a 486/33 in 1990 without parental contribution, so I'm not sure what your problem was.

Newton had this.... by otis+wildflower · 2002-11-22 07:13 · Score: 1

... at least in beta form, from Dragon. It was compatible with the Newton 2x00 series, which featured a StrongARM 110 processor @ 162 MHz (which is only today being matched by PocketPCs); the big limitation was the memory available for vocabularies. With 64-128MB of RAM and 128-512MB of flash memory in upcoming handhelds, it should be easy to do voice recog (particularly in an IBM-branded Linux PDA? :). Talk about a killer app...

Hey Apple, how about a new PDA with voice recog, inkwell, color, bluetooth + 802.11, compactflash and a small OS X kernel?

There are ViaVoice for iPAQ PDA ... by Anonymous Coward · 2002-11-22 07:13 · Score: 1, Informative

See

http://www-3.ibm.com/software/speech/handheld/ipaq_fam.html

Plantronics makes Sound-Cardless Headphones by scotpurl · 2002-11-22 07:21 · Score: 3, Informative

Plantronics makes several headsets with microphone that only require a USB connection, but do not require a sound card. They work quite well, and this should lower the hardware requirements for a small, lower-powered device.

http://www.plantronics.com

and search for their DSP-*00 series. I picked up their DSP-500 (normally $110) for $40 on a deal.

Re:Plantronics makes Sound-Cardless Headphones by RAMMS+EIN · 2002-11-22 08:11 · Score: 1

``Plantronics makes several headsets with microphone that only require a USB connection, but do not require a sound card.''
Something about that makes me think their solution replaces the sound card with a piece of software, naturally limiting the device to the platforms supported by Plantronics (unless they release specs, but most hardware vendors seem not to). This really smacks of Winmodems to me.

--
Please correct me if I got my facts wrong.

Exactly! by Anonymous Coward · 2002-11-22 07:24 · Score: 0

Since when do corporations sell us what we need instead of creating a previously non-existing demand?

zeke

Distributed Speech Recognition by kylef · 2002-11-22 07:28 · Score: 3, Interesting

It is interesting that I JUST did a project on this subject for a Ubiquitous Computing class... My project was called "Distributed Speech Recognition." Here is a link:

Distributed Speech Recognition Project

I also have heard it through the grapevine that the big voice recognition companies are working on exactly this technology... I wouldn't be surprised if Speech .NET includes support for something like this in the near future. I believe I read on some website that support for Speech API on PocketPC was coming soon...

misconceptions of misrecognition by Vocabularinist · 2002-11-22 07:32 · Score: 2, Informative

Everyone has gotten so used to the idea that computers will do exactly what we tell them. SR will never be 100% reliable (or even 99%) because of the noisy communication medium - air. Therefore you will always need some handy error correction protocol (commonly called dialogue).

Have you ever wondered how well people recognize speech? If something is blurted out at random we rarely catch the meaning first time. "What?". If humans have a lot of trouble understanding each other (about 20% error rate) then computers have no chance when it comes to out-of-the-box out-of-the-blue dictation. And computers don't have the benefit of a decade of childhood, not to mention millions of years of evolution.

What I'm getting at is that computers need a great deal of context to succeed (to reduce the number of possible interpretations, and therefore the number of ways of getting it wrong).

(I'm a speech recognition engineer - our company went bust last year - another dot bomb).

1) the algorithms are good (trust me, I've seen them)
2) the training takes bloody ages - it takes weeks (and terabytes of data) to get good results across most of the speaking population.
3) dialogue is very hard.
4) actual recognition is fast (we had dozens of simultaneous recognitions on 600 MHz machines).

The take home message: Train the users. Manage expectations. Say bye bye to HAL.

Not Much Real Progress in 10 Years by reallocate · 2002-11-22 07:44 · Score: 2

Back in 1990, the requirements for IBM VoiceType were: DOS, 8MB RAM, 10MB of drive space with one of those new-fangled scorching 386-16MHz processors...

Except for web browsing, back in 1990 I'd have to say I was doing everything I'm doing now, with DOS running on a 386. Makes me wonder what real progress we've made. At the time, I ran DesqView, Lotus Magellan, Lotus Agenda, Brief, Word 5.5 for DOS, Borland's Turbo C and Turbo Pascal, TopSpeed's Modula-2 compiler, the MKS Toolkit (a KSH shell in each DesqView window), and assorted odds and sods. To this day, I've seen nothing on any platform to equal Brief and Magellan, and the rest of the bunch were no slouches either.

Since then, I've spent thousands of dollars on new hardware and software with only a marginal increase in capability. Yeah, sure, the fonts on my monitor look better, but what price prettiness?

--
-- Slashdot: When Public Access TV Says "No"

Re:Not Much Real Progress in 10 Years by Anonymous Coward · 2002-11-22 08:00 · Score: 0

Except for web browsing? Doesn't that seem like a large exception? I don't think you made this post from DOS.

I don't think that you're going to find much argument that the largest increase in functional technology exists in your "exception."
Re:Not Much Real Progress in 10 Years by reallocate · 2002-11-22 08:19 · Score: 2

No, I didn't make the post from DOS, but I might have. Graphical browsers for DOS exist, and I've used them.

We owe the popularity of the web to infrastructure and a version of Netscape that ran on Windows 3.1. I'm not sure I see a great deal of innovation there.

--
-- Slashdot: When Public Access TV Says "No"

Re:D I C T A T I O N by Senior+Frac · 2002-11-22 07:44 · Score: 2

That's the way I read it, yes. Didn't the author only want integrated replayable audio? A little tape recorder functionality in his palm OS device? I didn't see anything in the original article that suggested speech-to-text.

Market Forces. by gurps_npc · 2002-11-22 07:46 · Score: 1

Keep in mind the following:

True High Quality Voice Recognition is the next real Operating System Killer "feature". If Linux gets it going reliably before MS, it could wipe out MS, simply because voice commands are so user friendly. If MS gets it up and working before Linux does, Linux may end up the next "Apple". That is why MS is so interested in it, while IBM etc. do not care.

True Medium Quality Voice Recognition is the holy grail for PDA-type devices. It would allow people to drop the cumbersome stylus / Graffiti / micro keyboard data entry system, freeing up things like the Watch version of the PDA.

These things ARE heavily desired and researched, but what you considered to be "acceptable" quality using old hardware never took off because it was not really acceptable - it sucked. The technology, both hardware and software, is just not up to most people's needs yet. Try again in 5 years.

--
excitingthingstodo.blogspot.com

Re:Market Forces. by BrainInAJar · 2002-11-22 08:21 · Score: 1

No it's not.

I can't remember where I read it (likely newscientist or something) but the mind can focus on typing and thinking, but when you engage the language centres of the brain, the rest of it suffers. Voice recognition is a great thing for people with disabilities that prevent them from properly typing, but for the rest of us it's just a neat feature that'll likely never get used.

And besides that, I can type much faster than I can speak

Writing by Anonymous Coward · 2002-11-22 07:50 · Score: 0

"I'm a writer that is 99% dependent ..."

should be

"I'm a writer who is 99% dependent ..."

speaking of writing.

Re:i hope that voice recognition never really flie by kevin+lyda · 2002-11-22 08:02 · Score: 2

what?

this guy can't type. sorry it would disturb your sense of aesthetics but he needs vr to DO HIS FUCKING JOB. so i hope it does take off. and when it does, just buy earplugs until you can have some consideration for people who might not be as fortunate as you.

--
US Citizen living abroad? Register to vote!

Hey! by Akardam · 2002-11-22 08:03 · Score: 4, Funny

If I can find a machine to wash my dishes AND clothing, I'd say that'd be pretty cool!

Re:Hey! by Ponty · 2002-11-22 08:17 · Score: 2, Funny

"Honey, why is there a fork in the sock drawer?"

"Sorry dear, I'm an idiot."
Re:Hey! by NanoGator · 2002-11-22 11:16 · Score: 4, Funny

"If I can find a machine to wash my dishes AND clothing, I'd say that'd be pretty cool! "

Get a wife.

(I hope you're all happy, that comment cost me an expensive dinner.)

--
"Derp de derp."
Re:Hey! by sean23007 · 2002-11-22 20:13 · Score: 2

Well you could eat off your clothes or wear your dishes, and your problem would be solved! But I don't think anyone would think you're particularly cool... yum.

--

Lack of eloquence does not denote lack of intelligence, though they often coincide.

Re:Patience.... It's coming.... by leandrod · 2002-11-22 08:05 · Score: 3, Informative

> Palm's DragonBall is a RISC architecture, and things like speech recognition NEED floating point math which must be emulated

Dragonball is Motorola's, not Palm's. It is CISC, not RISC; more specifically, an M68K. RISC is usually better than CISC at floating point, but both architectures can go without a floating point unit, and that's what Dragonball does.

--
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin

Laugh all you want by sjbe · 2002-11-22 08:05 · Score: 2

It isn't an obsolete term, just an uncommon one. My wife is a doctor and has to dictate patient reports constantly. Guess what they use? Yup, dictaphones. And that is what they call them too. Sure, it's nothing more than a voice tape recorder of one type or another, but I honestly can't think of a more appropriate term. (More common, yes, but not more appropriate.)

NC microphones by Andy+Dodd · 2002-11-22 08:08 · Score: 4, Informative

Some mics do this mechanically also - They have a port on the reverse side of the mic element so it only detects pressure differences between the two sides of the mic, i.e. only nearby sounds coming from one side of the mic (your mouth). Plantronics has plenty of these - Such NC headsets are common thanks to cellular telephone handsfree kits being required by law in some states, and they are quite good. (I love my Plantronics headset.)

--
retrorocket.o not found, launch anyway?

Better??!! by Anonymous Coward · 2002-11-22 08:10 · Score: 0

I used VoiceType for a while, as it came free with OS/2 Warp 4.0. (Anyone remember that?) It was around 1994, and even on an FPUless machine (a NexGen, remember those?), it performed reasonably.

A few years later, around 1999, I found myself in college, and a friend of mine had just acquired the latest natural speech product from Philips. Accuracy? Zero, even after we spent hours training it.

Given that hardly anything has advanced since 1999 (except *BSD and Linux ;)), I somehow doubt today's offerings are any better, even if they do take twice the space and CPU time.

Re:Patience.... It's coming.... by ivanandre · 2002-11-22 08:18 · Score: 1

And what problems do RISC architectures have with floating-point math?
I thought this was the place where they shine!

Actually it does exist by Anonymous Coward · 2002-11-22 08:21 · Score: 0

From what I remember, and I may be wrong, but the current series of Ipaq PDA's from Compaq/HP actually ships with IBM ViaVoice on the CD if not on the machine.

Check out: http://www-3.ibm.com/software/speech/handheld/ipaq_fam.html

Dan

Real issue: no floating point on handhelds by Anonymous Coward · 2002-11-22 08:22 · Score: 0

Although on the surface a PDA appears to have the required "horsepower", in reality the handheld hardware is quite different from a PC's.

The biggest factor is that the majority of PDA's have no floating point unit and practically all algorithms for audio applications use floats.
This is the reason there was no mp3 for PDA up until recently, the codecs had to be written with only integer math.

In fact, I was involved in speccing a project to port speech recognition to a PDA. The entire project was just about rewriting all the float operations.
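As a rough illustration of what "rewriting the float operations" can look like, here is a minimal fixed-point sketch. It assumes a Q15 format (values in [-1, 1) stored as integers scaled by 2^15); the function names are made up for this example and are not from the project mentioned above.

```python
# Q15 fixed-point sketch: a value x in [-1, 1) is stored as the integer x * 2**15.
# All names here are hypothetical, chosen only for illustration.
Q = 15
ONE = 1 << Q  # 32768 represents 1.0


def to_q15(x: float) -> int:
    """Convert a float to a saturated Q15 integer (done offline, e.g. for filter coefficients)."""
    return max(-ONE, min(ONE - 1, int(round(x * ONE))))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using integer operations only, rounding before the shift."""
    return (a * b + (1 << (Q - 1))) >> Q


def fir_q15(samples, coeffs):
    """Integer-only FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0  # wide accumulator, scaled back down once at the end
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append((acc + (1 << (Q - 1))) >> Q)
    return out
```

The point of the sketch is only that every multiply becomes an integer multiply followed by a shift, which a CPU without an FPU can do natively.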

Besides that problem, all of the accurate dictation software is not really based on "audio recognition"; it's based on contextual linguistic analysis. Raw conversion of speech to text only averages 70% to 90% accuracy, depending on the clarity of the speaker. All of the accurate systems use context to reach high accuracy. For example, the chance that an English sentence starts with "The" is much higher than the chance of it starting with "Sea".

For this reason, dictation systems require huge contextual databases and complex analysis subsystems. So the actual reason why there are few "PDA dictators" is probably more due to disk space and access time issues.
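The "The" versus "Sea" example can be sketched in a few lines. This is a toy, not how any particular product works: the prior probabilities and acoustic scores below are invented numbers, and the names are hypothetical.

```python
import math

# Invented prior probabilities of a word starting an English sentence (illustration only).
START_PRIOR = {"the": 0.06, "sea": 0.0005, "see": 0.002}


def pick_first_word(acoustic_scores, prior=START_PRIOR, floor=1e-6):
    """Pick the candidate maximizing log P(audio | word) + log P(word starts a sentence).

    acoustic_scores maps each candidate word to a made-up acoustic likelihood.
    Words missing from the prior table fall back to a small floor probability.
    """
    def score(word):
        return math.log(acoustic_scores[word]) + math.log(prior.get(word, floor))

    return max(acoustic_scores, key=score)


# The audio slightly favours "sea", but context overrules it.
print(pick_first_word({"the": 0.4, "sea": 0.6}))
```

A real system would need such a table for every word in every context, which is where the large databases come from.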

Yes you've been living under a rock. by Stoutlimb · 2002-11-22 08:24 · Score: 2

'nuff said.

Approach is all WRONG! by Anonymous Coward · 2002-11-22 08:25 · Score: 0

I took a class on Speech Recognition and Voice interfacing at the University of St. Thomas. (See here: http://www.gps.stthomas.edu/ ) Taught by Dr. Wayne Lea, reference his book for further info: Lea, Wayne A. Editor (1986). "Trends in Speech Recognition" (Speech Science Publications, Apple Valley, MN)

A few interesting things about voice recognition: The approach to date has largely been one of signal processing. The idea is that we have to develop algorithms to map these sound waves to similar speech sound waves. Software like IBM's and Dragon's continually requires more powerful processors and bigger storage space because they "improve" this software by adding more algorithms and a bigger database of sound waves to refer to. The only real improvements in speech recognition in the past few years have been due to hardware improvements.

The approach is highly flawed because it does very little to recognize that the "signals" are speech, not just random sound waves that need to be matched (somehow) to similar sound waves. Speech has rules in its production: it is limited by the articulators (parts of the body making the sound: throat, tongue, etc.) and even more ruled by our "innate" rules for speaking. There's this funny little academic field that studies exactly these subjects, called Linguistics.

Linguists know that speech has many levels from which to begin recognizing it, from a high level where we can find Sentences to the lowest level of a morpheme. By processing speech "top down" like this, and chopping it into smaller and smaller more recognizable pieces it is possible to efficiently process speech. (This is a very, very high-level summary of how this could work). Until the APPROACH is made more intelligent, forget about "slim" applications that could run on a PDA, cellphone, or embedded apps like for cars/fighter jets, or for consumer electronics.

This field needs fewer electrical engineers and more linguists, but all the major players (L&H, IBM, etc) continue on with their brute force signal-recognition approach, relying almost exclusively on more muscular hardware for "improvements".

--Jonathan

Nope, nothing WinModem about it by scotpurl · 2002-11-22 08:28 · Score: 2

The DSP portion of these (Digital Signal Processing) is a large block about 4" x 1.5" x 0.5" in size, and is attached between the USB connector and the headphones. It contains the sound card (or whatever DSP they're using) for the earphones and the boom microphone. No offloading of stuff to the CPU.

The PDF file with the description of the device mentions Windows and Mac platforms are supported, so it sounds like they haven't written a driver for other operating systems.

The DSP unit is described as a 5-channel, 16-bit, 48kHz data processor from USB, and 24-bit 100dB signal to noise CODEC, with a 32-bit digital audio processing unit.

But don't trust me. Read the brochure yourself on their website.

Re:i hope that voice recognition never really flie by univgeek · 2002-11-22 08:29 · Score: 1

Everyone would not be using voice for everything...

The keyboard and mouse are much more powerful in their own ways, and I think most programmers might prefer that.

Come to think of it though, imagine a dictation engine tied to your programming language of choice. The number of possible elements is highly reduced, and it might actually be easier to code....

--
All bow to his Noodliness!! His Noodle Appendage has touched me!

ack! by Stoutlimb · 2002-11-22 08:35 · Score: 3, Informative

Ack it filtered out my URL.

http://www.dictaphone.com

Re:Because it does not work. by Anonymous Coward · 2002-11-22 08:38 · Score: 0

There are other options. Did you ever see that South Park with the cripple fight? I think if you started a real-life version of that you could make a lot of money. Then if you're rich you can pay someone to take dictation for you, probably a naked girl even.

We're working on that. by davids-world.com · 2002-11-22 08:41 · Score: 3, Interesting

Well, in two or three years, you will be able to buy something like that. We're working on it (MIT's Media Lab Europe).

until recently, the PDA processors were not good enough, but that is changing rapidly (even though there is, in my view, little use for so much power except language technology).

The resulting dictation systems will not replace conventional keyboard input for a while, however, as recognition rates are .97-.98 (accuracy), and that's a wrong word in at least every second sentence. In comparison to low-bandwidth input, however (as on the PDA with the stylus, or as in the author's case due to a fine-motor dysfunction), voice recognition is very competitive.
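To see where "every second sentence" comes from, here is a back-of-the-envelope sketch. It assumes 20-word sentences and independent per-word errors; both are simplifying assumptions for illustration, not figures from the research itself.

```python
def sentence_error_rate(word_accuracy: float, words_per_sentence: int) -> float:
    """Probability that a sentence has at least one misrecognized word,
    assuming each word is recognized independently with the given accuracy."""
    return 1.0 - word_accuracy ** words_per_sentence


# Assumed sentence length of 20 words (hypothetical, for illustration).
for acc in (0.97, 0.98):
    print(acc, round(sentence_error_rate(acc, 20), 2))
```

Under those assumptions, roughly 46% of sentences contain an error at .97 accuracy and roughly 33% at .98, which is in the neighbourhood of one bad sentence in every two or three.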

cheers from dublin.

appalled by the negativity! by peter303 · 2002-11-22 08:46 · Score: 2

The correspondent said it was working ten years ago, and a bunch of bozos said it's still too hard.

The only reason I can think of is that these small machines aren't as open to general developers as the generic PC, so you don't see as many niche applications.

Re:i hope that voice recognition never really flie by greechneb · 2002-11-22 09:08 · Score: 0, Offtopic

I didn't mean for him, I just meant I hope it doesn't take off for general business use. I don't want people talking like crazy all the time trying to get their computer to understand. I'm glad for anything that helps people with disabilities do their jobs. One of those being my father, who only has one arm. I just don't want to have to listen to everyone speak slowly so their computer will recognize their voice.

Re:i hope that voice recognition never really flie by CrazyJoel · 2002-11-22 09:11 · Score: 1

"Cellphones are bad enough, imagine if everyone was talking to their computers. The noise would be terrible."

I imagine keyboardless systems that might work by either sign language or whisper systems that translate the positions of the anatomy of your face and voicebox to text.

Of course, it will turn out that whispering is bad for your voice.

--

Such is the infinite Grace of Popeye.

Google Search Time! by hackwrench · 2002-11-22 09:16 · Score: 1

"speech recognition" "open source"

Re:i hope that voice recognition never really flie by Anonymous Coward · 2002-11-22 09:21 · Score: 0

Even better, the government could just place some microphones around in public/private places and collect data that is much more easily processed. Think "The words 'bomb', 'martyr' and 'allah' were just heard on floor 2 near room 532."

Dynaspeak by epityrum · 2002-11-22 09:22 · Score: 1

SRI is trying to commercialize its speech recognition system for small devices, called dynaspeak. You can get more information at http://www.dynaspeak.com/. While this isn't off-the-shelf, the version that I've played with is pretty usable.

NY Times Article... by Multiproximus · 2002-11-22 09:22 · Score: 1

Due to my lack of NYT registration, I can only point you to this link, which gets you to the NYT article. I assume it will discuss what you are asking. http://nooface.com/articles/01/10/11/1429210.shtml Good luck in your search. MP

--
Made with massively parallel wetware.

goalposts move by zenst · 2002-11-22 09:27 · Score: 1

Indeed, the technology required back then is in essence what today's PDAs have, but anything a programmer touches or a marketing peep sees instantly gets rejigged and revamped, with X, Y and Z added (sure, there is one peep somewhere who actually uses function Z), and the goalposts move. You see, CPU and technology are like money: the more resources you have, the more ways they get spent. Also, speech recognition back then handled just a single person with a distinct, well-trained accent and language. Today you'd be hard pushed to get a marketing dept to sell anything that didn't have flash graphics, multi-user and multi-language capabilities, etc. Yes, it could be done, but until some software company takes the approach that supermarkets have and does a basic, simple range of software, the requirements will keep on moving. Best bet IMHO is to use your PDA as a digital recorder and a PC to do the backend processing. I mean, do you need instant translation? Probably not, unless it's a universal translator, and they're still aspiring to that nirvana.

NLU - Natural Language Understanding by Anonymous Coward · 2002-11-22 09:36 · Score: 0

I think what you are anticipating is the release of consumer-oriented NLU software. NLU software does exist for embedded devices as well as for the desktop. In the near future you will see these products marketed.

Use Zaurus by ajaygautam · 2002-11-22 09:39 · Score: 1

I have used a Sharp Zaurus SL-5500 to capture voice. At low bitrate settings, it creates small files.

Then you can add in 1 GB Compact Flash cards and have a long conversation recorded :)

--
http://www.ajaygautam.com

Re:Major reason: Learnout & Hauspie's corporat by morgue-ann · 2002-11-22 09:56 · Score: 1

It looks like Dragon's "modern" products (Naturally Speaking) are still published (& maintained?) by ScanSoft. Some vendors still show the "obsolete" discrete-recognition Dictate available (Dragon Dictate Power Edition 3.0 $189). If you Google for the part number 01-022-24-01 you'll find other vendors.

L&H also took down Kurzweil (the K. that did voice recognition, not the K. music systems co.).

Kurzweil made a discrete recognition system (Voice) that I used for programming for a while when whateverthefuck is wrong with my hands was kicking up.

I ran it on a Pentium 120MHz laptop w/ 1GB hard drive & it worked pretty well.

I still have the CD's, but it'll only run on Win95, not Win98SE.

Discrete is better for programming because continuous relies on the underlying syntax of the natural language (e.g. English) to hint it at reasonable words for ambiguous sounds. In programming, I say things like "left paren eye spacebar equalsign spacebar..."
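A sketch of the idea, in case it helps: each discretely spoken token maps to a fixed piece of text, with no grammar model in between. The vocabulary table below is hypothetical (apart from the tokens quoted above) and has nothing to do with how Kurzweil Voice stored its words.

```python
# Hypothetical spoken-token vocabulary for dictating code one word at a time.
SPOKEN_TO_TEXT = {
    "left paren": "(",
    "right paren": ")",
    "spacebar": " ",
    "equalsign": "=",
    "semicolon": ";",
    "eye": "i",
    "zero": "0",
}


def transcribe(tokens):
    """Join discretely spoken tokens into source text.

    Tokens not in the table pass through unchanged, so identifiers that were
    added to the vocabulary as whole words come out as-is.
    """
    return "".join(SPOKEN_TO_TEXT.get(token, token) for token in tokens)


print(transcribe(["left paren", "eye", "spacebar", "equalsign", "spacebar", "zero"]))
```

Because every token is recognized on its own, odd sequences like these don't get "corrected" toward plausible English.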

Dragon's big advantage over Kurzweil was that it could be used completely hands free (including mouse) so they got the disabled market.

The discrete recognizers had to be trained (Voice called it "enrolling") so they'd know how you pronounce the phonemes, but they had huge vocabularies- you didn't have to speak every word, just enough to represent every sound. The version of Voice I had let me add words, so I could add gss_SetPixelMap & other symbols. There was a way to import a whole list of words, so I munged the output of nm & spent a couple of days training it to recognize every function, type, and variable in the PowerTV operating system.
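For anyone wanting to try the same trick today, here is a rough sketch of the munging step. It assumes the default text layout of nm (an optional address, a one-letter symbol type, then the name); the function name and the sample symbols other than gss_SetPixelMap are made up, and this is not the script I actually used.

```python
def symbols_from_nm(nm_output: str):
    """Extract a sorted, de-duplicated list of symbol names from nm-style text.

    Assumes lines of the form "[address] TYPE name"; anything else
    (blank lines, "file.o:" headers) is skipped.
    """
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or per-file header
        sym_type, name = parts[-2], parts[-1]
        if len(sym_type) != 1:
            continue  # not a one-letter symbol type, so not a symbol line
        names.add(name)
    return sorted(names)


# Made-up sample input for illustration.
sample = """\
00001000 T gss_SetPixelMap
         U printf
00002000 D frame_count

foo.o:
00001000 T gss_SetPixelMap
"""
print("\n".join(symbols_from_nm(sample)))
```

The resulting list, one word per line, is the kind of thing you would then feed to the recognizer's word-import feature.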

Kurzweil Voice & Dragon Dictate had extensive "correction" features because they made a lot of mistakes. This started as very irritating, but you'd get used to it & you could talk ahead a few words, then correct the 3rd word back, then continue where you left off. When correcting, it would present a list of close-sounding words to pick from or you could spell out the word & add it if it was new.

There was even a "you keep screwing up" helper where you could give it two words that it confused a lot & re-train them.

One problem with speaking discretely (pausing between each word) is that the back of your throat takes more wear & you can start to go hoarse, causing the recognizer to get worse. Yer supposed to avoid carbonated beverages @ lunch.

Instead of running these old apps on a PDA, why not run 'em on a Crusoe subnotebook or something like the OQO?

didn't the 386 require an expensive add-on card? by freakmaster · 2002-11-22 10:42 · Score: 1

My buddy in college ran some sort of voice recognition on his 486 (he was handicapped). he had a card in there which he said cost several thousand at the time. this is not to say that handhelds today can't do it. but perhaps the cpu power required is at least not THAT trivial

You can do it! by Anonymous Coward · 2002-11-22 11:06 · Score: 0

You can do it, just get an OQO!

www.oqo.com!

A personal story by kirkjobsluder · 2002-11-22 11:08 · Score: 1

Perhaps some real world experience is needed in this discussion. About a year ago, an extended career of bad typing habits (probably due to early childhood training on manual typewriters) ended up in a very nasty case of Repetitive Strain Injury (RSI). Basically the recommendation was to stop typing at all for an extended period of time, followed by extensive retraining. At that time I started working with Dragon NaturallySpeaking.

In my experience, the software does a pretty good job of recognizing dictation. It does a better job if I remember not to look at the screen while I am talking, a habit that is very difficult for me to break. Continuous speech recognition does better if you give it very long phrases as opposed to very short phrases. Yes, the text does require correction, but text requires correction no matter what input method you happen to use. Basically, dictation software means the difference between doing 500 painful words a day, and 2000 easy words a day. I don't need to continuously train the software, only when I come across a new word not in the dictionary. The main problem is that the primary text errors are not spelling typos that are easy to spot, but homonym errors that are more difficult to spot.

The hardest part is that it forces me to change the way that I write. Formerly I was a very "Beethovenian" writer. This term was inspired by the composer's tendency to use literal cut-and-paste to edit his documents pasting on as many as a dozen layers of new manuscript on top of a phrase until it was perfect. When typing, I tend to delete text several times as it is written. Speech recognition works best if you can dictate an entire paragraph and then edit afterwards. In most cases editing is fairly simple because most mistakes come up in the most probable match list. Speech recognition is probably not as good as typing, but for me is certainly better than the alternative of pursuing a career based around "would you like fries with that?" After all, Henry James suffered the same problem later in life and dictated most of his later manuscripts.

My primary irritation is that the speech models work very well for writing expository text, but they don't work very well for writing programming code. It would also be nice if Dragon NaturallySpeaking came with a better text editor, one that can handle multiple files opened at once.

off-line recognition by g4dget · 2002-11-22 11:11 · Score: 2

You can record on your PDA or digital recorder and then have it transcribed on your PC.

I think the problem with the older speech recognition systems was that they weren't good enough for most people. Also, making them work on low-end processors is a lot of work--it requires a lot of optimization and assembly language programming. The market just isn't enough to make that kind of investment.

Move over Harry Potter... by NanoGator · 2002-11-22 11:20 · Score: 3, Funny

"Over the years, my computer use has de-evolved to programming, FTP, email (Mozilla), word processing (OpenOffice) and Ricochet."

I'd say this guy found the magic combination of words to get his article posted on Slashdot. Heh.

--
"Derp de derp."

My PDA does this... by CODiNE · 2002-11-22 11:26 · Score: 1

Buy a Handera 330... CompactFlash and SD/MMC cards, built in microphone, save directly to them, export in .WAV format to your PC, do what u want with em... one button record, just hold down the record button and it turns on and starts up, no prob, works great. =)

http://www.handera.com/products/330specs.asp

-Don.

--
Cwm, fjord-bank glyphs vext quiz

Re:i hope that voice recognition never really flie by fishbowl · 2002-11-22 11:58 · Score: 2

So many people make the claim that people doing dictation would be too noisy for an office. Never seen the secretarial pools of the previous generation? Never worked in a call center where EVERYONE is CONSTANTLY talking on the phone? I just don't see why it's so easy to dismiss "dictation" as being impractical. It hasn't been that long since it was THE NORM in offices, I've personally worked in places where the phone thing is standard fare, and I remember my father's office with all the secretaries and their typewriters -- not the nice "quiet" IBM Selectrics, either. Oh yeah, they had ASR-33 teletypes and a couple of IBM printers going all the time, printing orders and invoices. Sure it was a bit noisy at times, but I've seen worse.

--
-fb Everything not expressly forbidden is now mandatory.

Handspring has a third-party solution by invisik · 2002-11-22 12:16 · Score: 0

http://www.handspring.com/products/Product.jhtml?id=240035&cat=170018

If you're doing straight dictation (speaking into a recorder and someone else transcribes it to paper/electronic) this will do ya. 8.5 hours of pure recording enjoyment. It's as portable as you want.

Now if you want to do Speech-to-Text in there too, the ViaVoice/Dragon solutions are more what you're looking for.

Good luck.

-m

--
http://www.invisik.com

I worked on this at MS by rufusdufus · 2002-11-22 13:16 · Score: 5, Interesting

I worked on dictation and dialogue on a PDA prototype at MS several years ago. It was called MiPad and was pretty cool. Well except that it really had to use a wireless network to a computer to get the recognition done.

There are a couple of reasons why this hasn't hit the market yet:
1) the PDAs really are not powerful enough to do decent recognition. Mainly, they don't have good enough audio input systems for reasonable speech quality. Also not enough disk space for dictionary storage. And the cpus are slow and the RAM is too low.

2) at least at MS it is not a top priority to make speech work for disabled users. Outrageous you say? Not so! Turns out when the speech guys approached the accessibility guys on the subject, they learned that speech recognition is not workable in most cases where accessibility is needed; that is to say, the market for disabled people who cannot use the keyboard but who CAN use speech input is actually quite small. Most people who don't have the motor function to type (or use some sort of keyed input like Stephen Hawking has) don't have the motor function to speak clearly enough for speech recognition to work. Bottom line: other solutions work better.

GUI & APPs eats CPUcyc, Use phonetic not engl by aaron_pet · 2002-11-22 13:39 · Score: 1

The user interface and background tasks eat CPU.

There is a symbol set for pronunciation... I don't need the stupid computer to figure out exactly what I'm saying.

If it checked for aspiration, fricatives (don't yet know what this means), popping noises etc, and recorded the sounds and the rate at which they occurred... You might be able to highly highly highly compress spoken language...

to maybe... 100 words/min, 300 syllables/min, 900 phonemes/min, 1800 bytes/min. The rate you speak would affect the bitrate...
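Spelling that arithmetic out as a sketch: the ratios of 3 syllables per word, 3 phonemes per syllable and 2 bytes per phoneme are simply what the numbers above imply, and the comparison figure of 8000 samples/s at 8 bits is standard telephone-quality PCM, added here only for scale.

```python
def phonetic_bitrate(words_per_min=100, syllables_per_word=3,
                     phonemes_per_syllable=3, bytes_per_phoneme=2):
    """Return (bytes per minute, bits per second) for a phonetic transcript."""
    bytes_per_min = (words_per_min * syllables_per_word
                     * phonemes_per_syllable * bytes_per_phoneme)
    bits_per_sec = bytes_per_min * 8 / 60
    return bytes_per_min, bits_per_sec


bytes_per_min, bps = phonetic_bitrate()
raw_pcm_bps = 8000 * 8  # telephone-quality audio, for comparison only
print(bytes_per_min, bps, raw_pcm_bps / bps)
```

So the estimate works out to 240 bits per second, a couple of hundred times smaller than even low-quality raw audio, and speaking faster or slower scales it linearly.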

It could be like midi for speaking.
you could record tonal stuff too...

And if storage space is not really much of a problem you could record the sound at a low bitrate, have it along with the sound markup, and compare the two later for extra reliability.

You could then have it process after the speaking is over, do some database searching, and CONTEXT remembering... and NO freaking MUDDLING of curse/taboo words... I'm sure that eats a lot of CPU time...

These stupid programs try to record directly to english... I want to see the phonetics, and manipulate the phonetic database by hand. I don't want to train a stupid program that doesn't let me undo my reading of the wrong line...

The training process could be made so much better by giving the user more control...

We should start an open database of phonetics and meanings... along with context information for use in these...
(It's in my plans)

More ideas: I have a computer program that lets me sing into it, and it tells me what note I'm singing.

--
Please use [ informative / summarizing ] SUBJECT LINES
Flame me here

No. by stienman · 2002-11-22 14:48 · Score: 2

Does anyone believe in keeping it simple, anymore?

Most people do, yes, but most companies believe in keeping it profitable. (at least for the few high executives, anyway)

-Adam

Re:No. by Anonymous Coward · 2002-11-22 23:27 · Score: 0

Linux price to performance ratio: Error: Divide by zero. Continue?(Y/N)

Yeah, no one looks after a Linux box do they?
And they don't cost more than some MSCE shit?

Linux price is not free, cheap and will get cheaper as more Linux shits come on the market, but free, Ha grow up dick head.

Lots being done, many done that by Revvy · 2002-11-22 15:00 · Score: 1

I have to ask why you're limiting yourself to PDAs because I'm surprised you haven't gotten a Sony Vaio, or a MicroPC, or one of the numerous HUD/voice combinations that exist. Heck, an iBook is pretty small, too, and you could easily hook up a small video screen to it and keep it in a backpack.

World Domination by Anonymous Coward · 2002-11-22 15:11 · Score: 0

I see that lots of americans can still make jokes about "World Domination" or "World Dictation" as in this case. Be happy, don't worry, there is no free lunch.

El Quakero

The New Sharp Zaurus Will Have this Feature by Anonymous Coward · 2002-11-22 15:18 · Score: 2, Insightful

The new model of the Sharp Zaurus will have a built in microphone and an application where you can dictate notes right into calendar.

Hmm... by entrylevel · 2002-11-22 15:25 · Score: 1

Why has IBM not yet taken a keen interest in *BSD?

--
Karma: Incomprehensible (Mostly affected by posting at +5, reading at -1, and metamoderating everything unfair.)

Re:Hmm... by earlytime · 2002-11-23 13:17 · Score: 2

> Why has IBM not yet taken a keen interest in *BSD?

because of the bsd license i presume.
seems to me that in the real world, the "pro-commercial" intents of the bsd license do more to discourage commercial use of bsd than the "anti-commercial" intents of the gpl seem to discourage commercial use of linux.

there are certainly a number of companies using bsd, and developing products based on bsd, but there are far more doing the same with linux. I suspect that in time there'll be a more even comparison between gnu(hurd) and bsd.

of course, there's a whole ideological war over this issue.

--

Not enough CPU by bluGill · 2002-11-22 15:42 · Score: 3, Interesting

Sure, a 386 could do voice recognition, but it required a special card that not only had higher quality sound inputs, but also had some DSPs to do the hard work. When IBM put voice recognition in OS/2 they warned you that a 486 was not enough. (Several people tried it anyway, and it worked only within narrow limits.)

To emulate a DSP requires a lot of floating point math. Most PDAs do not have floating point in the CPU because nothing would use it. The few times it is needed, emulation is easy enough, just very slow. No problem though, because as I said, floating point math isn't much used.

Don't forget that PDA cpus are not designed for speed above all else. They are designed for low power, which means they have to compromise something and require extra CPU cycles to get something done.

Finally, don't forget power requirements. In normal use the CPU is shut down most of the time, drawing essentially no power. Voice recognition would change that, and your battery life would suffer drastically.

Why not one of the tiny laptops by Anonymous Coward · 2002-11-22 16:24 · Score: 0

Maybe the author should check out the Toshiba Libretto or Fujitsu Lifebooks

Just because you have the speed doesn't mean itcan by bsouthwick · 2002-11-22 19:36 · Score: 2, Insightful

Just because you have 300 MHz doesn't mean it can do the same as a notebook computer. The CPU on a notebook has additional instructions (floating point arithmetic), and more importantly it has additional chipsets that support the main processor. Most PDAs use RISC processors, e.g. in Palms. The current algorithms use a lot of floating point instructions, and the RISC processors do not have floating point. Most computers have multimedia chipsets that are in addition to the main processor that most of you are thinking of. You mention only a few companies that have voice recognition, but there are many more that are not on the market now. One of them to watch is Apple: it has had voice recognition but is not doing a lot of new products with it. Dragon for Newton is the one I still use; it takes simple words and carries out instructions, but the Newton is not sold any more. The Newton is, I think, 100 MHz. The Zaurus should be able to do it with Linux. I think we will just have to wait. Microsoft will probably not do it because it has not been shown to be a money maker.

How is this insightful? by Anonymous Coward · 2002-11-22 21:57 · Score: 0

Stephen Hawking doesn't use dictation. He has some kind of finger-controlled input device. His wheelchair computer can speak, but that takes practically zero resources (back in the day I could make my Amstrad CPC speak to me), and it doesn't do speech recognition. Totally different thing, and not particularly relevant.

The solution : Distributed Speech Recognition by perak · 2002-11-23 00:23 · Score: 1

PDAs can use state-of-the-art speech recognition services by using the Distributed Speech Recognition (DSR) scheme. All you need is a connection to a Speech Recognition Service!

Back in 2000 I worked on a special compression scheme for DSR purposes. The results were really impressive: using only a 2 kbps data rate, you have access to speech recognition from anywhere (PC, PDA, smartphone) without any penalty in recognition accuracy!

I am currently making some tests with Zaurus too! (I had stopped working on that for about a year..)

Check : http://www.telecom.tuc.gr/~perak/speech

Monopoly Power by bobarmstrong · 2002-11-23 01:29 · Score: 1

Dragon's Naturally Speaking was pretty good, VERY good if you really trained it. But not long after it came out, Microsoft announced it would have voice recognition software built into a soon to come version of Windows. And Dragon's stock price dropped like a rock. They were the leaders in voice for years.

Interesting advancements in software dried up with the increased power of Microsoft's monopoly. Don't expect much until their control is ended.

Get a wearable pc by Anonymous Coward · 2002-11-23 01:58 · Score: 0

Get a xybernaut or other wearable pc. Expensive, though.

Split the problem? by Bazzargh · 2002-11-23 04:26 · Score: 2

I'd like one of these devices - and I've often wondered if this strategy would be any good:

Instead of doing speech recognition to 'English', recognize speech to a textual representation of the sound (like the International Phonetic Alphabet, just as an example). Transcription errors should be far fewer since you have a smaller set of patterns to recognize. The device should be capable of reading this version of the text back to you.

The memory required to store text is far smaller than that for sound, so I reckon even limited memory devices should be able to handle hours of dictation. When you return to base, a second program on your PC converts the phonetics to words, much like a spelling checker is used to correct transcription errors in OCR.
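
A minimal sketch of the two-stage idea, to make it concrete. Everything here is assumed for illustration: the tiny lexicon, the ARPAbet-style phoneme strings, the speaking rate, and the use of Python's difflib as the "spelling checker" for phonemes. A real system would need a full pronunciation dictionary and a language model to pick between homophones.

```python
import difflib

# Assumed figures for the storage estimate (not from the post).
WORDS_PER_MINUTE = 150      # typical dictation speed, assumed
SYMBOLS_PER_WORD = 5        # average phoneme symbols per word, assumed

# Hypothetical toy lexicon: phoneme string -> word.
LEXICON = {
    "HH AH L OW": "hello",
    "W ER L D": "world",
    "N OW T": "note",
    "D IH K T EY T": "dictate",
}


def phonetic_bytes_per_hour() -> int:
    """Estimate storage at one byte per symbol plus one separator per word."""
    return WORDS_PER_MINUTE * 60 * (SYMBOLS_PER_WORD + 1)


def phonemes_to_word(phonemes: str) -> str:
    """Map a phoneme string to a word, tolerating small transcription errors."""
    if phonemes in LEXICON:
        return LEXICON[phonemes]
    # Fall back to the closest known pronunciation, like a spelling checker.
    matches = difflib.get_close_matches(phonemes, list(LEXICON), n=1, cutoff=0.6)
    # Leave unrecognized sounds in brackets so the user can fix them by hand.
    return LEXICON[matches[0]] if matches else f"[{phonemes}]"


def transcribe(recorded: list[str]) -> str:
    """Second stage, run on the PC: convert stored phonemes into text."""
    return " ".join(phonemes_to_word(p) for p in recorded)


# "W ER L T" simulates the handheld mishearing the final consonant.
print(transcribe(["HH AH L OW", "W ER L T"]))
print(phonetic_bytes_per_hour(), "bytes per hour of dictation")
```

With these assumed numbers an hour of speech fits in tens of kilobytes, which supports the point that a memory-limited device could hold hours of dictation.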

The philosophy is somewhat similar to that of Graffiti on the Palm - instead of trying to recognize handwriting, they changed the problem to recognizing something similar-but-simpler. I think people would get used to reading notes they've taken as phonetics (as they did with Graffiti), particularly if the PDA was also capable of reading back to remind them.

As for the command mode stuff - I'm in favour of using bushmen clicks for that ;)

-Baz

Googled yet? by Anonymous Coward · 2002-11-23 09:08 · Score: 0

Here's a bunch of links.

OT: Your Sig by Sri+Lumpa · 2002-11-23 09:46 · Score: 1

"Linux price to performance ratio: Error: Divide by zero. Continue?(Y/N)"

Should be performance to price ratio, unless you mean Linux performance=0 so price/performance=n/0.
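
To spell out the arithmetic, here is a small sketch. The benchmark score of 100 is an arbitrary assumed value and the helper name is made up; the only point is which way round the division fails when the price is zero.

```python
def ratio(numerator: float, denominator: float) -> str:
    """Return the ratio as text, or the sig's error message on division by zero."""
    try:
        return str(numerator / denominator)
    except ZeroDivisionError:
        return "Error: Divide by zero"


price = 0.0          # free software
performance = 100.0  # arbitrary positive score, assumed for illustration

print("price / performance:", ratio(price, performance))
print("performance / price:", ratio(performance, price))
```

Only the second line hits the error, so the joke works for performance to price, not price to performance.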

--
"The obvious mathematical breakthrough would be development of an easy way to factor large prime numbers." Bill Gates,

Okay fine... by pyrote · 2002-11-23 19:20 · Score: 1

If the Palm/CE developers refuse to make something more than ViaVoice command crap for the Palm, why use a Palm/CE device?

True-to-life PCs and even Linux handhelds are smaller than ever... These new tablet PCs and a few small XP-capable machines are out there.

Not sure, but a while back there was an article on /. about Linux doing voice recognition. The very nature of Linux says that porting it to a handheld should be no chore.

If I'm wrong, let me know.

--
THE WORLD IS GOING TO END!!!! eventually.

Martco by Anonymous Coward · 2002-11-28 22:12 · Score: 0

Or my name (Martin) doesn't become "Mardin". There's a glottal stop there, fuckers. It is also not Maaaahahhhhhhhhttiiiiinnnn!

it becomes "Martco".

302 comments