Mac Version of NaturallySpeaking Launched

ai by User+956 · 2008-01-15 20:01 · Score: 2, Funny

MacSpeech, the leading supplier of speech recognition software for the Mac, has canned its long-running iListen product and has launched a Mac version of Dragon NaturallySpeaking

Tell me more about has launched a Mac version of Dragon NaturallySpeaking.

--
The theory of relativity doesn't work right in Arkansas.

Re:ai by arazor · 2008-01-15 20:46 · Score: 1

Teach me of fire mancub.
Re:ai by ozmanjusri · 2008-01-15 21:45 · Score: 0, Redundant

Dear aunt, let's set so double the killer delete select all.

--
"I've got more toys than Teruhisa Kitahara."
Re:ai by Anonymous Coward · 2008-01-16 02:41 · Score: 0

People spent an extra decade yelling at their PCs and the PCs pretended to listen.

Talking to oneself by flyingfsck · 2008-01-15 20:14 · Score: 4, Informative

I tried Dragon a number of times, but it feels too much like talking to oneself. Training it is a chore too. 99% accuracy after 5 minutes is probably true, but I type much better than that. I suppose it will be great for people who either can't type properly or are lysdexic.

--
Excuse me, but please get off my Pennisetum Clandestinum, eh!

Re:Talking to oneself by calculadoru · 2008-01-15 21:58 · Score: 0, Redundant

people who either can't type properly or are lysdexic

Which one are you then?

--
The power of accurate observation is commonly called cynicism by those who have not got it. -- G.B. Shaw
Re:Talking to oneself by Seumas · 2008-01-15 22:06 · Score: 2, Insightful

I tried Dragon years ago and after a couple hours or so of training, it still completely sucked. Same with IBM ViaVoice. Perhaps Google will help improve things with their GOOG411 service that they're using to build up a massive bank of phonetics. Otherwise, it seems like real speech recognition is never seriously going to get off the ground.
Re:Talking to oneself by alex4u2nv · 2008-01-15 22:30 · Score: 2, Funny

Training is tough because they replaced the iListen package with iStoppedListening.

Also, its use may be weak in dictating a paper, but it's great for dictating a command.

Think about it, you could walk up to your iComputer and say "Main Screen Turn on!!"
instead of pressing the power button.

--
-Alex. http://bit.ly/1iVPtfA
Re:Talking to oneself by rolfwind · 2008-01-15 22:37 · Score: 2, Interesting

I used it too a number of times. My typing accuracy is probably not much better than 99% - I'm a klutz. But whereas fixing a mistake in the middle of typing is pretty smooth and not too time consuming, Dragon makes a chore of every little mistake.

I won't recommend "Don't use it" because it's really a personal choice - some people love it and some hate it. But I have tried 3 versions so far (including the latest) and it wasn't so much a conscious decision to stop using it as much as I just eventually stopped bothering.

I could see using it to write up letters, a chore at which Dragon is very competent once trained (not necessarily faster or even as fast as typing, though), but that is a task I seldom engage in for extended durations.

But part of the dream of Speech Recognition is telling the computer to do this and that -- even just a simplistic version of what is in some Sci-Fi like in Star Trek -- and the computer just knows what it needs to do and does it. I'm not even talking anything as complicated as AI, just something like "look up slashdot" and it fires up the browser and goes to the site. Or while using Dragon the command won't be "Set my dentist appointment for 4:00pm Wednesday" but more like (open calendar app with mouse, put mouse on correct textbox and click) "Dentist Appointment.... Tab..... tab.... numeral 7...." (bring mouse over AM/PM selector and select PM).

This isn't something that is Dragon's fault -- I think in many years programs and OSes as well will have a number of keywords that will control them built in (if I'm not mistaken Apple has a primitive version of this but the speech recognition is crap). Dragon has great accuracy but the program is hopeless in commands and context (yes, I know it can be trained -- like a dog; a lot of effort for a few piddly tasks) and I think that's a major aspect of what many people would secretly like when they try out the program.
Re:Talking to oneself by xtracto · 2008-01-15 22:45 · Score: 1

I tried Dragon a number of times, but it feels too much like talking to oneself. Training it is a chore too. 99% accuracy after 5 minutes is probably true, but I type much better than that. I suppose it will be great for people who either can't type properly or are lysdexic.

99% accuracy means that for every 100 words (a paragraph) you will have a wrong word. Now, that accuracy is in the "optimal conditions" and talking at a specific pace. The problem with the other 1% is that the wrong word might not even be related to the text (whereas when you are writing, the error is mostly in spelling).

Personally, last time I tried Dragon was about 4 years ago (installed it for my mom to test it) and it was terrible. I wonder if it wouldn't be a good idea to design a special soundcard (not only the mic) to aid in the recognition?

--
Ubuntu is an African word meaning 'I can't configure Debian'
Re:Talking to oneself by CastrTroy · 2008-01-15 23:15 · Score: 1

I would recommend "don't use it" in an office environment, or any other environment where people can hear you speaking. Nothing more annoying than listening to somebody else say "Dentist appointment.... Tab.... Tab.... numeral 7..." all day long.

--

Anthropic principle: We see the universe the way it is because if it were different we would not be here to see it.
Re:Talking to oneself by Anonymous Coward · 2008-01-15 23:16 · Score: 0

you found the joke!

have a cookie.
Re:Talking to oneself by rucs_hack · 2008-01-15 23:52 · Score: 4, Funny

I tried it a few years back. I stopped when my youngest, who was still learning to talk, started going round the house saying 'mousegrid' all the time.

Good job he didn't get the whole thing though, which was typically:

"Mousegrid...."

"Mousegrid...."

"MOUSEGRID!!...."

"FUCKING MOUSEGRID YOU PIECE OF SHIT PROGRAM!!!"
Re:Talking to oneself by Anonymous Coward · 2008-01-15 23:53 · Score: 0

Training is not a chore at all on Dragon. In fact, on version 9 you can skip it entirely - not sure whether MacSpeech will implement that feature. But can you type at 160 words a minute? I doubt you would achieve a third of that! That's as fast as you can talk, and the Dragon software does a good job of keeping up!
Re:Talking to oneself by kurt555gs · 2008-01-16 00:51 · Score: 0, Flamebait

I wonder if they thought of including an algorithm to deal with the lisp that is present in most male MAC users?

This would be different than the Windows version.

Cheers

--
* Carthago Delenda Est *
Re:Talking to oneself by Sox2 · 2008-01-16 00:53 · Score: 2, Funny

hey, i'm using it now. It wonks fine.
Re:Talking to oneself by duvel · 2008-01-16 01:06 · Score: 4, Interesting

I am entering this comment while using Dragon NaturallySpeaking version 8.
I am not a native English speaker, but I am usually able to say just about anything I want. In this comments, I have not altered any of the mistakes (if any) that Dragon NaturallySpeaking made while I was dictating. As you can see, the error rate is probably a bit higher than 99 per cent correctness. Nevertheless, I used this extensively, because it increases the speed at which I can work.I often have to type reports, and it goes a lot faster while using this tool. The only problem is that these reports contain lots of enterprise specific (and IT specific) terms. Naturally, it takes a while before Dragon NaturallySpeaking knows all of these terms.
Other than that, I am very happy with it.

--
I have a photographic memory for numbers. I know almost a hundred of them.
Re:Talking to oneself by Anonymous Coward · 2008-01-16 01:09 · Score: 0

I wonder if they thought of including an algorithm to deal with the lisp that is present in most male MAC users?
They have. That same algorithm also corrects distortion from their holding of the microphone too close to their mouth, and the echo when inside it.
Re:Talking to oneself by ubrgeek · 2008-01-16 01:57 · Score: 1

If I'm not mistaken Apple has a primitive version of this but the speech recognition is crap.

Actually, from my experience it's pretty good, at least for short expressions. I've got mine set up to do things exactly like your Slashdot example. (I tell it "Browser slashdot" and it works great. I'm guessing that's because it knows "browser" means that I want the word right afterward to be the phonetic term "Slashdot" that I've previously told it means the Website, not "/ .") It's also useful for things like launching Mail.app and checking email. With AppleScript, it becomes even more useful (I can tell it to launch a pre-built "app" that can do just about any number of things using Automator). While it would obviously be trivial to have those apps on the Dock so that I can click them to launch, this way I don't have to take up Dock space to do so.

--
Bark less. Wag more.
Re:Talking to oneself by Anonymous Coward · 2008-01-16 02:10 · Score: 0

About 9 months ago, I broke my elbow. fucking ouch..

Anyway, as an experiment work gave me Dragon NaturallySpeaking so I could complete a bunch of reports and documents on time. I could type and use the mouse, but not for the sheer volume of work I had to complete. Plus, I was also enjoying some awesome pain killers for about a month there as well, which, whilst enjoyable, didn't help in the speed and accuracy department.

After about an hour of learning it and it learning me, I could use it as fast as I could type.

You do get punished for making mistakes. Making corrections or minor edits becomes really tedious, and after a while you simply cannot be bothered 'training' it to avoid the same mistake the next time.

There is absolutely no way it can replace a mouse and keyboard for regular interaction with the computer, but for keying in loads of text it is actually really great and I can recommend it.
Re:Talking to oneself by autophile · 2008-01-16 02:18 · Score: 1

Out of curiosity, how long did it take you to dictate the comment?
--Rob

--
Towards the Singularity.
Re:Talking to oneself by Anonymous Coward · 2008-01-16 02:33 · Score: 0

This would be different than the Windows version.
Yeah, the Windows version would fuck YOU in the ass... and you'd like it, Kurt. Come out of the closet!
Re:Talking to oneself by samkass · 2008-01-16 02:36 · Score: 2, Informative

Yeah, Apple's speech recognizer has very dissimilar goals to Dragon's (although both, if I recall correctly, got their start at Carnegie Mellon's speech labs). Apple is trying to build a speaker-independent, no-training-required recognizer that can handle short commands. Dragon doesn't care as much about speaker independence, but requires accuracy over sentences and paragraphs. Very different algorithmic, HCI and optimization problems.

--
E pluribus unum
Re:Talking to oneself by LMacG · 2008-01-16 02:56 · Score: 2, Insightful

> I tried Dragon years ago

Yeah, software never gets better or anything. And faster processors and more memory surely couldn't help.

--
Slightly disreputable, albeit gregarious
Re:Talking to oneself by commanderfoxtrot · 2008-01-16 04:07 · Score: 1

Yes, in an office environment it's a bit harder.

But if you have a private office/space it's great.

I use it frequently for long documents. I don't use the mousegrid or other tools - just for lots of text.

I wish there was a way to tell DNS 9 to ignore voice commands!

--
http://blog.grcm.net/
Re:Talking to oneself by ColdWetDog · 2008-01-16 04:23 · Score: 1

Interestingly, it sounds like you dictated it when I read your comment. And you "chopped" your diction when you did it. It's perfectly understandable (unlike half the comments here) but it is different.
I've found the same thing when trying various dictation programs over the years (mostly one version or another of Dragon). After a while, it works, but it just doesn't flow the same and it interrupts my train of thought (such as it is). It feels like what I have to do when talking to our outsourced-to-bog-knows-where transcription service. You have to speak slowly and distinctly and it screws up my thought patterns. I'd rather type so as not to kill my word flow. It generally needs little help in that regard. I may have to try this on my MBP just to say I did it. It's rather a Holy Grail of computerdom if you could get it to work.
Dear aunt, let's set so double the killer delete select all.

--
Faster! Faster! Faster would be better!
Re:Talking to oneself by Fluk3 · 2008-01-16 04:36 · Score: 0

Or you could go to CES saying "main screen turn off" over and over.

--
I've been upgraded to "bad"!
Re:Talking to oneself by Spleen · 2008-01-16 05:00 · Score: 1

I agree. A professor here teaches it in one of her classes. Every time it's not working correctly, others in my cube farm have to listen to me work on it. "Delete that" "Scratch that" "Go to the beginning of the line". They've never used it, and they hate it. I'm not fond of it myself.
Re:Talking to oneself by gobbo · 2008-01-16 06:23 · Score: 1

[...]But part of the dream of Speech Recognition is telling the computer to do this and that -- even just a simplistic version of what is in some Sci-Fi like in Star Trek -- and the computer just knows what it needs to do and does it. I'm not even talking anything as complicated as AI, just something like "look up slashdot" and it fires up the browser and goes to the site. [...]
This isn't something that is Dragon's fault -- I think in many years programs and OSes as well will have a number of keywords that will control them built in (if I'm not mistaken Apple has a primitive version of this but the speech recognition is crap). [...]
I used to do this on a Mac running OS 8 in the '90s, using the built-in commands and a system-wide macro utility called KeyQuencer (hey, that was a really great app). "Computer: check mail" etc. Not due to accessibility problems of my own, just geeking out. Once you extended it with scripting, it was pretty amazing for repetitive tasks, and I never had problems with recognition, once I got the hang of it.

My impression is that Apple hasn't developed it at all, and their SR technology is stuck in the '90s.

--
Damn those pesky terrorists
Re:Talking to oneself by netsharc · 2008-01-16 09:27 · Score: 1

Remember http://www.youtube.com/watch?v=KyLqUf4cdwc

--
What time is it/will be over there? Check with my iPhone app!
Re:Talking to oneself by Anonymous Coward · 2008-01-16 10:02 · Score: 1, Informative

The problem is that speech recognition won't work not only with IT specific terms, but with ALL named entities: family names, places, street names, etc...

Speech recognition on totally free speech is still a dream, and is completely useless because you can type faster than you speak. It might only be useful for doctors who want to take notes without needing to carry a laptop with them. Microsoft, which employed many of the top speech researchers, knows that pretty well. That's why they are not so active on that any more. Same for IBM.

BUT dedicated speech recognition (even in noisy environments) has made tremendous progress, and doesn't need any training any more. It will be part of your mobile phones pretty soon along with speech synthesis: to help drivers (GPS instructions), order a pizza, dial numbers, route telephone calls on hotlines, book tickets...
This was made possible because the task is much easier, since the context is narrow, and the recognizer only expects a few choices from the user.
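
To illustrate why a narrow context makes the task easier, here is a minimal sketch in Python. It is not how a real recognizer works: it uses plain string similarity from the standard library's difflib instead of acoustic models, and the menu phrases and the snap_to_menu name are made up for the example. The point is only that when the system expects a handful of phrases, a slightly misheard input can still be snapped to the right choice, and anything too far off can be rejected.

```python
import difflib

# Hypothetical menu of phrases a dedicated (narrow-context) system expects.
MENU = ["order a pizza", "dial a number", "book a ticket", "check the weather"]

def snap_to_menu(heard, choices=MENU, cutoff=0.6):
    """Return the expected phrase closest to what was heard, or None."""
    matches = difflib.get_close_matches(heard.lower(), choices, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(snap_to_menu("order a pizzer"))                 # misheard, still resolvable
print(snap_to_menu("look a ticket"))                  # misheard, still resolvable
print(snap_to_menu("dear aunt let's set so double"))  # nothing close enough
```

A free-dictation system has no such short list to fall back on, so every misheard word ends up in the text.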
Re:Talking to oneself by mbourgon · 2008-01-16 18:22 · Score: 1

Actually, I seem to remember that Dragon _won't_ get better over the years - the core software and algorithms haven't changed any, they just got bought and changed hands, and (again, IIRC) the people who bought it didn't know how it worked, just that it did, and that they owned the code to do so.

All I can find offhand is the wiki page, which says "Lernout & Hauspie bought Dragon Systems in 2000. The dictation system bubble burst in 2001, and Lernout & Hauspie had a spectacular bankruptcy. ScanSoft Inc. bought the rights for Dragon products. In 2005, ScanSoft bought Nuance Communications, and changed the name of the newly combined entity to Nuance."

--
"Sometimes a woman is a kind of religion, she can save your soul & set you free from all your sins" - Bad Examples
Re:Talking to oneself by Anonymous Coward · 2008-01-22 06:24 · Score: 0

> lysdexic

That's hilarious!!!

Isn't that... by Sylos · 2008-01-15 20:17 · Score: 2, Informative

the whole intention of Dragon? For those people who *are* impaired in some way or another? I mean...I could never "speak" out a paper or something. I'd end up tearing my vocal cords out.

--
'Number-memorizing Chinese people.'-Anon

Re:Isn't that... by cheater512 · 2008-01-15 21:07 · Score: 1

Yes, it's useful for those people.

It's also incredibly useful for people who can't shut up.
I know quite a few people like that. ;)
Re:Isn't that... by Propaganda13 · 2008-01-15 21:07 · Score: 3, Interesting

David Weber http://www.baen.com/author_catalog.asp?author=DWeber uses voice recognition software for writing novels.

David talking about it back in 2002.
"On a more technical front, I began using voice-activated software when I broke my wrist very badly about two years ago. I've found that it tends to increase the rate at which I can write while I'm actually working, but that it's more fatigue-sensitive than a keyboard. You can push your fingers further than you can push your voice when fatigue begins to blur your pronunciation and confuse the voice recognition feature of your software.

I don't think it's had a major impact on my writing style, but it does affect how I compose sentences. What I mean by that is that because the software prefers complete phrases, in order to let it extrapolate from context when it's trying to decide what word to use for an ambiguous pronunciation, I have to decide how I want a sentence to be shaped before I begin talking to a much greater extent than I had to do before I began typing."
http://sfcrowsnest.co.uk/features/arc/2002/nz5718.php
Re:Isn't that... by coolGuyZak · 2008-01-16 00:56 · Score: 1

I'm one of those people, but I wouldn't use it to enter text into a computer. It seems better suited as a transcription device.
Re:Isn't that... by ari_j · 2008-01-16 02:49 · Score: 1

I see it as being targeted more for, and more useful to, people who normally dictate instead of typing: lawyers, doctors, etc. The problem is that part of the joy of dictation is that you don't have to do the formatting yourself. For instance, a lawyer can dictate a letter and let his secretary (whose time isn't billed out at hundreds of dollars an hour) type it, format it, and print it on the lawyer's letterhead. Speech recognition software can't do that, so its only advantage is the extent to which it is faster or more accurate than just typing. The number of people who type slowly or inaccurately enough to benefit from this software is going down every day, as the last generation to grow up without computers and typing classes in school nears retirement.

I for one... by tieTYT · 2008-01-15 20:21 · Score: 5, Funny

...welcome our new Dear aunt, Let's set so double the killer delete select all

Re:I for one... by Anonymous Coward · 2008-01-15 22:59 · Score: 0

I personally like the Perl scripting one: http://www.youtube.com/watch?v=KyLqUf4cdwc

As the Apple ads have demonstrated... by kcbanner · 2008-01-15 20:23 · Score: 1, Funny

...Mac users will have no trouble chatting with their computer for 5 minutes. Think of how accurate the system will be if the users got into a heated debate!

--
Obligatory blog plug: http://www.caseybanner.ca/

Apple version by Wiseman1024 · 2008-01-15 20:34 · Score: 2, Funny

Will it recognize metrosexual accents?

--
I was about to say 13256278887989457651018865901401704640, but it appears this number is private property.

Re:Apple version by bobdotorg · 2008-01-15 20:53 · Score: 5, Funny

Will it recognize metrosexual accents?

Yes, select the check box: preferences/language settings/accent/Fanboi/Apple

This is the Mac equivalent to your current setting:

options/language settings/accent/Troll/WindowsME

--
__ Someday, but not this morning, I'll finally learn to use the preview button.
Re:Apple version by Anonymous Coward · 2008-01-15 21:40 · Score: 0

Score 3, Informative? Change metrosexual to homosexual and watch yourself get modded troll/flamebait. What's the difference?
Re:Apple version by ozmanjusri · 2008-01-15 21:50 · Score: 2, Funny

What's the difference?
Drop the soap and you'll find out.

--
"I've got more toys than Teruhisa Kitahara."
Re:Apple version by _merlin · 2008-01-15 21:51 · Score: 5, Funny

Score 3, Informative? Change metrosexual to homosexual and watch yourself get modded troll/flamebait. What's the difference?

The difference is that while metrosexuals try hard to be gay, homosexuals succeed.
Re:Apple version by Wiseman1024 · 2008-01-15 23:37 · Score: 2, Funny

Lol, Apple iFanboys are wasting their mod points on this. Better keep them busy here rather than have them influence meaningful discussion.

--
I was about to say 13256278887989457651018865901401704640, but it appears this number is private property.
Re:Apple version by Anonymous Coward · 2008-01-16 00:00 · Score: 0

Precisely. Which is why it's PC to make fun of them. Me, I would've just said homosexual.

(I'm the AC you're replying to)
Re:Apple version by Anonymous Coward · 2008-01-16 00:06 · Score: 0

Score 3, Informative? Change metrosexual to homosexual and watch yourself get modded troll/flamebait. What's the difference?

The difference is that while metrosexuals try hard to be gay, homosexuals succeed.

Score 4, Insightful? The defining trait of homosexuality is same-sex mating.

Mocking incidental characteristics associated with certain Western homosexual lifestyles is hateful towards homosexuals whether the target of your mocking commits to intercourse with same-sex partners or not. You fear and loathe the traits of homosexuals, but political correctness limits the expression of your loathing to non-protected groups exhibiting them.
Re:Apple version by Anonymous Coward · 2008-01-16 00:20 · Score: 0

butt fucking another dude is not an incidental characteristic of the homo lifestyle.
Re:Apple version by Malevolent+Tester · 2008-01-16 01:28 · Score: 1

Yeth.

--
If you haven't made a developer cry, you've wasted a day.
Re:Apple version by The+One+and+Only · 2008-01-16 07:21 · Score: 2, Funny

I can't speak for everyone, but personally, I have no problem with fucking other men and have in fact done it myself. But those "incidental characteristics" (i.e. being a flamboyant little fairy) are fucking annoying, and plenty of gay men are nothing like that.

--
In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199
Re:Apple version by Anonymous Coward · 2008-01-16 09:59 · Score: 0

Yes, and Chris Rock hates "niggaz". As a dinnermasher yourself, you have the right to criticize other pillowbiters without violating the P.C. code.

Regardless of anyone's (supposed) hatred for prissiness, a heterosexual cannot bash an actually-gay person for acting like a fairy or it will be considered prejudicial.
Re:Apple version by The+One+and+Only · 2008-01-16 12:40 · Score: 1

Gays don't actually consider me one of them because I also fuck women. Nor do I consider myself one of them. So for all I know I'm not allowed either.

--
In Repressive Burma, it's not just your connection that dies. slashdot.org/comments.pl?sid=314547&cid=20819199

Posting to oneself by Anonymous Coward · 2008-01-15 20:35 · Score: 0

"I suppose it will be great for people who either can't type properly or are lysdexic."

Getting First Post! will be a lot easier.

Whatever became of this technology? by lhaeh · 2008-01-15 20:37 · Score: 5, Insightful

The last time I tried using voice dictation was when I was running OS/2 Warp 4. Training took forever, and the experience of using it was nothing but an exercise in frustration, ending with me screaming at the bloody thing then seeing neat, yet random expletives on my screen. I later came across some budget software that required no training, yet worked surprisingly well compared to the $400 packages made by the big boys. That software really showed what voice dictation should be like, if only it was developed further.

The training and accuracy seem like things that can be overcome, but I would really like to see a solution for things like punctuation and function keys, things that don't naturally come with speaking. Instead of having to say "delete that" or " delete" it would be nice to just have a button that I can hold down when saying things I want interpreted as commands.

Re:Whatever became of this technology? by jimicus · 2008-01-15 21:14 · Score: 4, Insightful

A few things became of the technology:

1. 99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.
2. 99% accuracy rate is only achievable under ideal circumstances - i.e. using a top quality microphone hooked up to a good soundcard in an environment with very little background noise and no echo. Basically, circumstances you only get in a half-decent recording studio. In the real world, you seldom get this.
3. You need to be blessed with amazing self-discipline (and/or be able to guarantee that nobody is going to approach you while you're working). Otherwise you get back to work after a distraction and find yourself having to delete a conversation you just had with a co-worker.
4. If you're in an open-plan office (that's probably about 99% of UK offices these days) your colleagues will not thank you for spending all day talking.
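
A quick back-of-the-envelope check of point 1, as a sketch in Python. It assumes errors are spread evenly through the text (real errors tend to cluster), and the function name is just made up for the illustration:

```python
def lines_per_error(accuracy, words_per_line):
    """Average number of lines between recognition errors."""
    error_rate = 1.0 - accuracy          # fraction of words that come out wrong
    words_per_error = 1.0 / error_rate   # about 100 words at 99% accuracy
    return words_per_error / words_per_line

for wpl in (12, 15):
    print(f"{wpl} words/line: one error every {lines_per_error(0.99, wpl):.1f} lines")
```

That works out to one error every 8.3 lines at 12 words per line and every 6.7 lines at 15, which is where the "every 7 lines or so" figure comes from.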
Re:Whatever became of this technology? by forkazoo · 2008-01-15 22:29 · Score: 3, Interesting

The training and accuracy seem like things that can be overcome, but I would really like to see a solution for things like punctuation and function keys, things that don't naturally come with speaking. Instead of having to say "delete that" or " delete" it would be nice to just have a button that I can hold down when saying things I want interpreted as commands.

Yes, and to follow along the same line of thought, nobody has ever come out with anything like a speech recogniser designed for programming. Personally, I always figured that a good speech recognition system for both text and commands would need to make use of sounds that don't occur as text. So, you could do something like a special double-whistle to enter command mode, or honk like a goose for undo. Likewise, you could use gibberish words as commands instead of "delete that."

Obviously, it violates the principle that all computers you can talk to should work like Star Trek. But, it seems that just like a command line interface, a spoken interface could be fantastically useful if only somebody would decide that the operator will need some instruction in a few special arcane incantations.

Then, all we'll need is an extension to C so that function prototypes include a way to express the pronunciation of a function name, so a spoken interface IDE could use something like intellisense to parse the API I am using and away we go.
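
A toy sketch of the gibberish-words idea, in Python. It assumes the recognizer already hands over a list of word tokens; the nonsense words "zorp" and "flim" and the interpret function are invented for the example and don't come from any real product. Because the command words never occur in normal text, a phrase like "delete that" can be dictated literally:

```python
# Made-up nonsense words reserved as commands; they never occur in dictation.
COMMANDS = {"zorp": "undo", "flim": "newline"}

def interpret(tokens):
    """Turn a stream of recognized tokens into text, applying reserved commands."""
    lines = [[]]
    for tok in tokens:
        action = COMMANDS.get(tok)
        if action == "undo":
            if lines[-1]:
                lines[-1].pop()      # drop the last word on the current line
        elif action == "newline":
            lines.append([])         # start a new line
        else:
            lines[-1].append(tok)    # ordinary dictated word
    return "\n".join(" ".join(line) for line in lines)

print(interpret("delete that line zorp file flim done".split()))
```

Here "delete that" ends up in the text as ordinary words, "zorp" removes the word "line", and "flim" starts a new line.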
Re:Whatever became of this technology? by Anonymous Coward · 2008-01-15 23:07 · Score: 0

So, you could do something like a special double-whistle to enter command mode, or honk like a goose for undo.

It's bad enough in my open plan office with people's cheesy ringtones and 'loud Howard' style phone conversations, without adding a menagerie of poor animal impersonations. And as for the whistling - imagine the scope for sexual harassment lawsuits!
Re:Whatever became of this technology? by Ed+Avis · 2008-01-16 00:03 · Score: 1

Yes, and to follow along the same line of thought, nobody has ever come out with anything like a speech recogniser designed for programming.
Stay tuned for Perl 7.

--
-- Ed Avis ed@membled.com
Re:Whatever became of this technology? by dpbsmith · 2008-01-16 03:18 · Score: 1

99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.

Right, and my brief experience trying out ViaVoice convinced me that even that observation underestimates the seriousness of the problem.

1) You don't necessarily notice the errors as you make them.

2) Correcting errors takes a surprisingly large amount of attention and labor, as well as being a distraction from the real task.

3) Correcting errors on the keyboard feels like task-switching and feels distracting and laborious. Correcting errors by issuing verbal commands feels more natural and less laborious... but not infrequently gets you into the situation where the speech recognizer misunderstands your commands, which rapidly snowballs into a real mess.

It all reminds me of a situation some twenty years ago. In the company I worked for, typically, non-native-English-speaking scientists would type up their papers and hand them to an "editorial services" department, which began by rekeying them into a word processor. I was present when some sales representatives were making a pitch for OCR equipment to the department head. She listened politely until they said that the equipment was "99.5% accurate." She stopped them right there.

"If you can't guarantee 100% accuracy," she said, "I'm not interested. Almost all the labor in rekeying is incurred, not in the actual keying, but in the subsequent proofreading. A document that's almost error-free takes just about as long to proofread as one that has a normal number of errors. The only thing that would give us a significant labor savings would be a system so accurate that we could eliminate the proofreading step."

--
"How to Do Nothing," kids activities, back in print!
Re:Whatever became of this technology? by Bloodoflethe · 2008-01-16 03:44 · Score: 1

A few things became of the technology:

1. 99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.
2. 99% accuracy rate is only achievable under ideal circumstances - i.e. using a top quality microphone hooked up to a good soundcard in an environment with very little background noise and no echo. Basically, circumstances you only get in a half-decent recording studio. In the real world, you seldom get this.
3. You need to be blessed with amazing self-discipline (and/or be able to guarantee that nobody is going to approach you while you're working). Otherwise you get back to work after a distraction and find yourself having to delete a conversation you just had with a co-worker.
4. If you're in an open-plan office (that's probably about 99% of UK offices these days) your colleagues will not thank you for spending all day talking.
1. This is very true, but if you have a good enough grasp of the language and proper pronunciations, the product can achieve much better than 99% accuracy. That 99% is really based on you having an accent that will diverge from proper pronunciation. Another important thing is enunciation! Move those lips and tongues! (A nice side effect of this is that people will understand you more clearly than ever in your social and home lives.)
2. Ideal circumstances are not necessary if you have a good mic, and it doesn't have to be top of the line either. I have a $120 headset that has excellent noise cancellation (of course it was on special, but there is always a special somewhere). I did also buy a $50 Andrea Stereo USB adapter that does some really nice filtering and signal conversion, as well.
3. All you really have to do for this is have that nice noise cancellation mic and tell it to sleep and then launch into your convo.
4. I work in an open-plan office in the US. I translate text using this (English to Spanish) and no one in the office speaks a lick of it. It's funny though, it suppresses the idle chatter and more work gets done. Everyone seems to be able to tune me out fairly well, until after I take calls. (*curses* Tachar las últimas diez líneas! *curses*) My coworkers all know what "tachar" means now ;D. I really have to remember to turn off the software before answering any tech support calls that come my way. I just get calls infrequently enough that I spaz out when my headset rings! (Yes, I wear that hat too.)

--
"Little is much when little you need."
Re:Whatever became of this technology? by jdieterman · 2008-01-16 05:01 · Score: 1

My mother and I bought a copy of this software for our pastor as he was not fond of typing. As I recall, this was back in the Windows 95 days. I still remember testing it out first since the pastor would inevitably have questions. I cannot remember a time in my life since then that I have laughed so hard. The words Dragon thought we were saying were hilarious. We uninstalled it and gave it to the pastor under one condition: that my mother and I were allowed to be there when he configured it. Man, good times.
Re:Whatever became of this technology? by esj+at+harvee · 2008-01-16 06:23 · Score: 1

Not exactly true. There's a very interesting project called voice coder developed at nrc-it in Canada. It translates limited English expressions into code. The reason that approach was chosen is because most software using the current style of bmpyNms is literally unpronounceable and would require spelling out letter by letter, which does incredible damage to your voice as well as your temper. As I pointed out elsewhere, the complete and total lack of a backdoor API also makes it extremely difficult to use speech recognition for programming. The last thing needed is the ability for the IDE to export all symbols and their scope so that the speech recognition environment can create a very high accuracy grammar using those symbols.

On the other hand, I have been writing Python using speech recognition for a few years and I completely violate all coding standards because I'm not going to burn my vocal tract to comply with some tab's idea of good looking code. I have been using my hands to navigate and I only use voice to create symbols and comments. To make the jump to the next level, we will need editors with the ability to navigate to various features of code such as arguments, methods, nested parentheses, etc. These features are useless for hand users but invaluable for voice users. But, as usual, the problem is that the handicapped need tabs to write the code for them in the initial bootstrap phase.
Re:Whatever became of this technology? by torokun · 2008-01-16 08:02 · Score: 2, Interesting

Although your comments about open offices may be true (it may be a problem with colleagues), I offer my thoughts:

I was a software developer and am now an IP lawyer doing patent law stuff. I quickly discovered that dictating vastly increased my productivity. Most people in software have no idea what a boon to productivity this could be, or they'd be dictating specs and pseudocode and notes all the time. I actually think that software developers should seriously think about dictating pseudocode and handing it off to newbies for implementation details. Obviously, it's more directly applicable to the types of work a lawyer does though.

In any case, because the turn-around for transcription in our firm can be a half-day to a day, I got this software to try out. It is actually amazingly good. You can tweak the settings for special spellings or acronyms, and can train special words for odd names, etc. When I don't have time to have our word processing department transcribe something, I use this, and the accuracy is very very good.

One thing most people who don't usually do dictation may not realize is that you don't get the efficiency boost unless you really just look away from the screen and dictate a good chunk, then go back for editing when done. The best is to dictate an entire document without worrying about any corrections, then come back and review it the next day for errors. With Dragon though, it's probably better to do a few paragraphs, then go back and check. If you constantly let minor corrections interrupt you, you don't get the benefit of the increased speed.
Re:Whatever became of this technology? by Matt+Perry · 2008-01-16 09:19 · Score: 2, Interesting

I've been using the Naturally Speaking 9 Medical for the last eight months. I bought it to reduce the amount of typing I have to do for lengthy papers and documentation because of RSI injuries. I have a few responses based on my own experience.

1. 99% accuracy rate is actually pretty bad in the real world. In a typical document, you might expect 12-15 words per line - so you have one error every 7 lines or so.
I make more mistakes than that just from typing. Of course, I catch and correct them faster when using the keyboard than I do when dictating. How many times do you have to use the backspace key every seven lines or so?
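For what it's worth, the quoted arithmetic holds up. Here is a quick sketch, assuming errors are spread evenly through the text (real recognition errors tend to cluster, so treat it as a rough estimate; the function name is just made up for illustration):

```python
def lines_between_errors(accuracy, words_per_line):
    """Average number of lines between errors at a given per-word accuracy."""
    words_per_error = 1.0 / (1.0 - accuracy)  # 99% accuracy -> about one error per 100 words
    return words_per_error / words_per_line


# The quoted range of 12-15 words per line, at 99% accuracy.
for words_per_line in (12, 15):
    lines = lines_between_errors(0.99, words_per_line)
    print(f"{words_per_line} words/line: one error every {lines:.1f} lines")
```

At 99% that works out to roughly one error every 7 to 8 lines, which matches the "every 7 lines or so" figure.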

Part of reducing mistakes is learning that dictating clearly is a different skill than typing. Just because you can type well doesn't mean that you can speak and articulate your words clearly. Dictating to a computer has more in common with giving a presentation. If you litter your speech with "um," "ahh," and "ya know," then the program will dutifully represent that. Garbage in, garbage out. What's helped me is that I have a lot of experience with public speaking and narration. I've also produced a lot of training videos for companies that I've worked for which involves recording voice overs or presenting to the camera. So I'm comfortable "talking to myself" and learning to prepare what I want to say before I begin my delivery. These are useful skills that anyone can learn.

One of the first things I did when I got the program was try to read some of the documents that I had previously produced. There were some words that it wasn't recognizing correctly, and I later realized that these words were also in my custom dictionary in Word. You can train the software on individual words so I opened up my custom dictionary and taught it all of the words in there.

When dictating I don't worry too much about the mistakes because the dictation is just to get a first draft into the computer. Once I'm done, I proofread the document and use the keyboard to make corrections. Every now and then it'll hose some word, but if it's a word that I know that it knows, I'll just say the word "correction" and repeat the word clearly so that I know to fix that when editing the document. If it's a word that it just keeps getting stuck on I can select it and train it on the spot, or just type the correct word and then keep dictating. I usually take the latter approach so I don't get too distracted from dictating. But, this is a rare occurrence. As you keep tweaking its recognition, it gets better.

2. 99% accuracy rate is only achievable under ideal circumstances - ie. using a top quality microphone hooked up to a good soundcard in an environment with very little background noise and no echo. Basically, circumstances you only get in a half-decent recording studio. In the real world, you seldom get this.
Just such a microphone headset comes with the program when you buy it. It works well since it's a unidirectional mic and needs to be close to the sound source to pick up sound. I've used it in an environment with noise, including at work with other people around and at home with the TV on, and I haven't had any problem with it recognizing what I was saying.

3. Unless you happen to be blessed with amazing self-discipline (and/or can guarantee that nobody is going to approach you while you're working). Otherwise you get back to work after a distraction and find yourself having to delete a conversation you just had with a co-worker.
Or you just learn to say "microphone off" and it turns off the recognition engine. It can tell if you are saying it as a command or if it's part of a sentence that you are dictating and do the right thing. The program can recognize a bunch of different commands and apply them depending upon which program you are using. I must admit that I don't really use this feature. Browsing the

--
Slashdot: Failed Car Analogies. Amateur Lawyering. Anecdote Battles.

Already been done.. by Anonymous Coward · 2008-01-15 20:37 · Score: 3, Funny

"Computer... computer... hello computer?"

Grate product by Library+Spoff · 2008-01-15 20:39 · Score: 4, Funny

Am oosing it two type this comment. Didn't knead the fave mins train ming though...

--
Acid House saves Souls

Re:Grate product by Anonymous Coward · 2008-01-15 20:58 · Score: 0

Eye donut lycen ewe. Huma stubby real eSofa King Wee Todd Ted.
Re:Grate product by Anonymous Coward · 2008-01-16 04:39 · Score: 0

Wow, that software really socks!

But does it run on linux? by js_sebastian · 2008-01-15 20:43 · Score: 1

I know the answer. No it doesn't.

I own a copy of Dragon 9 but having to reboot into Windows to use it makes it too much of a hassle. Wine doesn't seem to handle it either.

It actually works quite well, although mileage may vary depending on the sound quality you get from your microphone, soundcard setup.

Re:But does it run on linux? by markdavis · 2008-01-16 00:28 · Score: 1

Probably not.

But I, personally, know several people that would buy a Linux version of NaturallySpeaking... including myself.

Perhaps the Mac version would be easier to port? Don't know. Best thing to do is send them Email saying you would pay for a Linux version. I did: questions@macspeech.com

Minion, do my bidding! by Anonymous Coward · 2008-01-15 20:46 · Score: 5, Interesting

I'll have to play with Dragon at some point; I just haven't gotten around to it yet. Aside from accuracy errors, the primary issue that bothers me about speech recognition solutions I've tried is the general lack of being able to recognize speech that seems natural to humans but isn't what the system is expecting as input.

This is especially true with over-the-telephone solutions. For example, I am with Rogers Wireless carrier here in Canada, and their automated customer service system prompts you for your phone number. My last 4 digits are 2125, and it is very natural to say "twenty-one, twenty-five" when giving the number to a human being. The speech system, unfortunately, is only sophisticated enough to understand one-digit-at-a-time mode, so you have to suffer through saying "two one two five". Which isn't truly a big deal, but it's frustrating having to learn each system's unique quirks and limits. I suppose the same can be said of any technology.
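Accepting both styles doesn't look like the hard part once the words are recognized. Below is a minimal sketch, assuming the recognizer has already turned the audio into English number words; the function name and word tables are invented for illustration and have nothing to do with how any carrier's system actually works:

```python
UNITS = {"zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
         "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
         "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}


def spoken_to_digits(phrase):
    """Turn 'twenty-one, twenty-five' or 'two one two five' into '2125'."""
    words = phrase.lower().replace(",", " ").replace("-", " ").split()
    digits = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in TENS:
            value = TENS[word]
            # A tens word followed by a non-zero unit is one two-digit number.
            if i + 1 < len(words) and UNITS.get(words[i + 1], 0) > 0:
                value += UNITS[words[i + 1]]
                i += 1
            digits.append(str(value))
        elif word in TEENS:
            digits.append(str(TEENS[word]))
        elif word in UNITS:
            digits.append(str(UNITS[word]))
        else:
            raise ValueError(f"unrecognised word: {word!r}")
        i += 1
    return "".join(digits)


print(spoken_to_digits("twenty-one, twenty-five"))
print(spoken_to_digits("two one two five"))
```

Both calls produce the same four digits. The sketch resolves a tens word followed by a unit as a single two-digit number, which is a design choice; a real system would still have to deal with the acoustic side and with callers who mix styles mid-number.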

Oral dictation (as opposed to fixation) is frustrating at best. Punctuation is a critical item that I can't stand dealing with. Trying to get the goddamn software to insert commas and semi-colons can be difficult enough, let alone wanting to actually insert the word "comma" into a paragraph. Then there's trying to spell out acronyms (aka "aka"), or inserting the contents between and including those parentheses. Until dictation of a document can be done with truly minimal correction and post-editing, and can be spoken at a very comfortable pace, I will stick to a keyboard.

Of course, the most entertaining aspect of watching someone else play with speech recognition is the inevitable habit of sounding completely unnatural while speaking. The monotone voice and sounding like a robot are bad enough, let alone those who think that shouting or talking ree... aaa... llll... lllyy... sloowwwww.... llly is going to help. The funniest I've seen was a woman who seemed to think that talking in cutesy baby-talk would win the system over to her side. :)

I just want a system that responds to commands via a programmable keyword. Only when speech recognition is Star Treky enough to respond to its name will I be happy. My computer will be named Minion.

Minion, inform the family I love them.
Minion, crawl the web for the highest quality, free pr0n you can find
Minion, order me my favourite pizza. Oh, and hack a credit card number from the net to pay for it.
Minion, tell some slashdoters off for me. Make sure it's worthy of +5 funny.

Re:Minion, do my bidding! by LordLucless · 2008-01-15 21:01 · Score: 2, Insightful

I used Dragon Naturally Speaking for a while ages ago, and you could program it to respond to its name. Or rather, you set up a "start" sound that would activate the listening algorithm. I had mine set to respond to "computer", but "minion" would work just as well.

I stopped using it after I accidentally left it on in training mode one day, when I was teaching it the word "bonza". The pet lorikeet outside my room made such a wide variety of noises, that from that time forth, it thought every word I said was bonza, and I couldn't be bothered retraining it - training time was more than 5 minutes back then.

I was using it more for commands than for dictation, and it was good at that, but there was one major drawback, and that was background noise - especially loud background noise emitted by the computer itself. One of the things I wanted to do was to get the computer to start and stop playing music on command. Unfortunately, once the music was playing, you had to really yell for the computer to differentiate the command from the music.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
Re:Minion, do my bidding! by andrewjhall · 2008-01-15 23:03 · Score: 2, Funny

I think I'd name mine Igor. Then, assuming I can find the right USB widgets, I can shout "Igor! Raise the lightning rod and find me a fresh brain" - at which point my life's final ambition will have been achieved.

That said, the USB iBrainExtractor is probably as much of a technical challenge as producing speech recognition that isn't a pain in the ass.
Re:Minion, do my bidding! by Narcogen · 2008-01-15 23:43 · Score: 2, Informative

MacOS has had a built-in feature called Speakable Items that does exactly this, and as an option you can have it respond only to things said after a specific key word-- in essence, the machine's name. "Minion" would work fine.

It is not true dictation. Essentially you create a script and give it a name. When your speech is recognized as the name of a corresponding script, the script is executed.

You can even make scripts that require multiple inputs. Some of the built-in ones in the Mac OS 9 days were knock knock jokes.
Re:Minion, do my bidding! by mwvdlee · 2008-01-16 00:16 · Score: 1

"twenty-one, twenty-five" = 201205.
Why do you expect a computer to get this right when humans don't?

--
Slashdot social media options: AIM, ICQ, Yahoo, Jabber and Mobile Text. Why no MySpace?
Re:Minion, do my bidding! by LordLucless · 2008-01-16 00:41 · Score: 1

That's essentially all software dictation is - it recognizes the pattern of your speech, and executes the corresponding instruction (prints the correct word). The thing that really defines quality software is the accuracy of its comparison algorithm, and the speed of its learning algorithm. But essentially they do the same as you describe, just with a much larger search space.

--
Just because you're paranoid doesn't mean there isn't an invisible demon about to eat your face
Re:Minion, do my bidding! by Anonymous Coward · 2008-01-16 02:33 · Score: 0

http://docs.info.apple.com/article.html?path=Mac/10.4/en/mh696.html

"Use the Speech Recognition pane of Speech preferences to turn speech recognition on, _set up how to signal your computer_ that you're speaking a command, create commands for applications, and open the Speakable Items folder."

OSX supports named Star Trek commands, built-in.

Posting from a Dragon Naturally Speaking Mac by Anonymous Coward · 2008-01-15 20:47 · Score: 5, Funny

iIt iworks iso iwonderfully iand iintegrates iwell iinto ithe iother iiproducts.

Re:Posting from a Dragon Naturally Speaking Mac by m85476585 · 2008-01-16 10:40 · Score: 1

But does it fit on the iRack?

Not to knock it.. by rastoboy29 · 2008-01-15 21:10 · Score: 1

But 99 out of 100 words correct still makes for a pretty lousy experience if you're trying to do anything serious. Personally, I think so much when I'm writing that typing is quite fast enough. Of course, I know not everyone is so fortunate.

--
expandfairuse.org

Re:Fanboys are getting awfully silent as of late.. by Anonymous Coward · 2008-01-15 21:15 · Score: 0

Yawn. We had it about a decade ago.

according to who? by dwater · 2008-01-15 21:16 · Score: 1

> The new product is said to reach 99% accuracy after 5 minutes of training.

According to MacSpeech, I suppose?

I'll bet what was said was something 99% different to what MacSpeech thought.

--
Max.

At Last! by Slurpee · 2008-01-15 21:18 · Score: 5, Interesting

I was at the Apple Dev conference in 1999 (or so) when the CEO of Dragon got up during Steve's keynote and announced that they were going to develop a Mac version of Dragon.

Almost 10 years later - and it's finally here!

Or at least a follow up announcement is here.

Re:At Last! by fortunato · 2008-01-15 22:50 · Score: 1

All I know is that if this means that my wife will be able to get to the right department when calling the insurance company to make a doctor appointment for our kids I'll be a happy camper. ;) I would forgo the cursing, redialing, and angry expletives that are required right now in order to make a simple pre-note that we are taking the kids in for their required annual physical.
Re:At Last! by evil_aar0n · 2008-01-16 05:08 · Score: 1

Maybe there's still hope for Duke Nukem Forever. Or G&R's Chinese Democracy.

--
Truth, Justice. Or the American Way.
Re:At Last! by Chaset · 2008-01-16 05:28 · Score: 2, Informative

Actually, almost 10 years ago, there WAS Dragon Naturally Speaking for Mac. I bought it, and its upgrade when it came out. (Unless my brain is totally whacked and it was some other voice recognition package for Mac) It came with a headset in the box, too. I'm sure that version is what that rep was talking about. It's funny... all these comments, and I didn't notice any high-scoring comments pointing out that there already WAS a voice recognition package for Mac years ago.

--
-- "This world is a comedy to those who think, a tragedy to those who feel."
Re:At Last! by Arcane_Rhino · 2008-01-16 07:07 · Score: 1

... the cursing, redialing, and angry expletives that are required ...
They aren't required. They are all part of the service that insurance companies are happy to provide.
Always remember, YOU are a valued... [click]
Re:At Last! by Slurpee · 2008-01-16 09:33 · Score: 2, Informative

What you are thinking of is "Dragon Power Secretary" which was available for early Macs in the early 90s - but dropped (way before OS X). The WWDC announcement came when OS X was also being announced in 1999. The announced product at WWDC never came out.

I was able to find this press release:

WWDC--SAN JOSE, Calif. and NEWTON, Mass.--(BUSINESS WIRE)--May 10, 1999--

Photo will be available at 2:30 pm EST on Associated Press via Business Wire

Dragon Systems, Inc. and Apple(R) Computer, Inc. today announced that Dragon Systems will create and market Macintosh-compatible products based on Dragon NaturallySpeaking, the top selling retail speech product in the U.S.(a) Dragon Systems Chairman, CEO, and Co-Founder Janet Baker, Ph.D. announced the company's plans during the keynote presentation at Apple's annual World Wide Developer's Conference (WWDC) in San Jose.

"It's great news for our customers that Dragon is bringing their world-class speech recognition software out on Macintosh," said Steve Jobs, Apple's interim CEO. "The underlying architecture of the Mac platform, with fast PowerPC processors and outstanding audio support, will make Macintosh the premier platform for Dragon NaturallySpeaking. Dragon's return to the Mac market is more evidence of the great business opportunities available on Macintosh for innovative developers."

"We have received many requests for a Macintosh version of Dragon Naturally Speaking and working with Apple we're going to deliver a high quality speech solution for Macintosh users," said Dragon Systems Chairman, CEO, and Co-Founder Janet Baker, Ph.D. "Over the last year we have seen Apple bring out some very innovative products and we think Dragon Systems will offer the ideal speech recognition solution for anyone who wants to extend the capabilities of their iMac, Power Macintosh G3, or PowerBook G3."

Dragon Systems' products for the Macintosh are planned initially for both American and British English, with the first U.S. product to be released later this year. French, German, and Japanese are also scheduled. Pricing, system requirements, and product specifications will be announced at product introduction.

Dragon Systems has a long history of supporting products for the Macintosh platform. Previously, Dragon Systems offered Dragon PowerSecretary(tm), a discrete recognition dictation system for the Macintosh.

I just saw these guys at macworld by Capt'n+Hector · 2008-01-15 21:19 · Score: 2, Interesting

I was a bit put off by their pricing scheme. It's $50 off the normal price (something like $200) if you buy it at macworld. The only problem is that it's a pre-order, so you can't try before you buy. Also, nobody has reviewed the software, since it doesn't exist yet, so if it turns out to be a stinker you're out $150. And if you don't like the product, their tech support will try and "walk you through" your problem to make it go away. They explicitly said "no refunds". No, thanks.

--
Quid festinatio swallonis est aetherfuga inonusti?
Africus aut Europaeus?

Re:I just saw these guys at macworld by CronoCloud · 2008-01-16 08:44 · Score: 1

My mother can't type (rheumatoid arthritis) so I bought DNS for her. She couldn't get past the first training sentence, it simply would not recognize her voice, but it worked perfectly well for me. So we called up ScanSoft and tried everything they suggested, and went back to them: No refunds.
Re:I just saw these guys at macworld by bill_mcgonigle · 2008-01-17 21:29 · Score: 1

So we called up ScanSoft and tried everything they suggested, and went back to them: No refunds.

Thanks for the warning - I won't even bother trying then.

You might see what your state has to say about defective products, merchantability and such. Your State's AG might have some info. Maybe you can just ask ScanSoft what your State's AG would have to say on the matter.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

Practical speech recognition, "House, lights on" by the+grace+of+R'hllor · 2008-01-15 21:23 · Score: 1

So I've always wanted to rig my house up with voice commands. My guess is I need the following:

*Simple* speech recognition. I want it to react to a keyword ("Computer", or "House", or similar sci-fi-ey) and then a few simple commands. Sphinx-2 seems ideal, but I'd need good dictionary files.
Ubiquitous microphones (preferably exclusively usable by the speech recognition engine. Setting proper /dev permissions will help). Probably the most difficult/expensive to get right; it needs to work in noisy environments.
Machine controllable electronics, sufficiently protected so that . Where those 433MHz remote switches come in I guess. Needs to be code protected, for obvious reasons.
Scripts to tie all this together.

Has anyone done this properly/successfully/usefully?

Use with caution by bozho · 2008-01-15 21:29 · Score: 0, Offtopic

http://ars.userfriendly.org/cartoons/?id=20010322/

Re:Practical speech recognition, "House, lights on by Anonymous Coward · 2008-01-15 21:46 · Score: 0

Surely you mean...

'Illuminate'
'Deluminate'

Ugh... why is MacSpeech doing this? by Anonymous Coward · 2008-01-15 21:53 · Score: 0

MacSpeech is scum that has been selling absolute shit for years.

Their iListen product was absolute unusable garbage, but that didn't stop them from marketing it as the Mac's equivalent of Dragon. Complete with "30 day money back guarantee" that meant you could get your money back only if you tried it full time for 30 days, and you convinced them that you had tried it full time for 30 days, and you had bought the special (shitty) microphone the software "requires" (I think they actually just had a marketing agreement with the manufacturer, because this TELEX headset was very low quality AND expensive), and they decided to give you a return authorization number, and the moon was in the house of Jupiter, etc.

Basically, nobody got their money back. Because they're liars and thieves who are used to selling garbage.

Truly unfortunate that this slime got the contract.

Re:Practical speech recognition, "House, lights on by amRadioHed · 2008-01-15 21:55 · Score: 2, Interesting

Back in the late 90's using only Applescript and the Apple built in speech recognition I was able to voice automate my music library. I don't remember all the details, but I could start and stop the music and select what artist I wanted to hear. It was pretty neat being able to say "Computer, play Nirvana" and getting my music all from the comfort of my bed.

--
We hope your rules and wisdom choke you / Now we are one in everlasting peace

Are cops using this now in Jail? by Anonymous Coward · 2008-01-15 22:01 · Score: 0

Because I helped an illiterate person write a letter in jail and spelled everything out for him verbally... then I thought it was like he led me right through a training program for a speech recognition program.

?

MST3K by Ethanol-fueled · 2008-01-15 22:13 · Score: 1

The writers must have been using that software when they wrote this song!

Re:But does it run on linux? = VMware by jackjeff · 2008-01-15 22:24 · Score: 1

Have you heard of VMware?

Re:Practical speech recognition, "House, lights on by forkazoo · 2008-01-15 22:46 · Score: 1

*Simple* speech recognition. I want it to react to a keyword ("Computer", or "House", or similar sci-fi-ey) and then a few simple commands. Sphinx-2 seems ideal, but I'd need good dictionary files.

Be careful what you use as the trigger, or else you won't be able to use the words "House" or "Computer" in any conversation while at home without the house thinking you are trying to command it, and starting the dishwasher or something. I suppose you could always name your house something sci-fi-ish, or fantasy-ish that would never come up in conversations, like "Malthikar." For extra points, establish some sort of visual avatar piped to your TV or something so you can see him while you talk to him.

As for implementation, Mac OS X comes with some sample code for dictionary-based untrained speech recognition. Should do exactly what you want. Since you can give a list of all possible words (the various valid commands) it works better than free-form recognition for general text input. And, you don't have to train it, so anybody who knows the right things to say could work your house. That just leaves having your app do the commands once they are recognized. I'm completely unfamiliar with that end of things, but I know there are home automation doodads which presumably shouldn't be that hard to access from a program.
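To make the trigger word plus fixed command list idea concrete, here is a rough sketch. It assumes the recognizer has already handed over plain text; the command table, the action names and the `dispatch` function are all hypothetical, and this is not Apple's sample code or any real home automation API. Only `difflib` from the Python standard library is real here.

```python
import difflib

# Hypothetical command vocabulary mapped to made-up action names.
COMMANDS = {
    "lights on": "LIGHTS_ON",
    "lights off": "LIGHTS_OFF",
    "start dishwasher": "DISHWASHER_START",
}


def dispatch(utterance, wake_word="malthikar", commands=COMMANDS):
    """Return the action for an utterance, or None if it should be ignored."""
    words = utterance.lower().replace(",", " ").split()
    # Anything that does not start with the trigger word is ordinary conversation.
    if not words or words[0] != wake_word:
        return None
    spoken = " ".join(words[1:])
    # Only the closest entry in the fixed list is considered, and only if it is
    # a near match; everything else is dropped rather than guessed at.
    match = difflib.get_close_matches(spoken, list(commands), n=1, cutoff=0.8)
    return commands[match[0]] if match else None


print(dispatch("Malthikar, lights on"))
print(dispatch("did you leave the house lights on"))
```

The first call maps to the lights-on action and the second is ignored because it doesn't open with the trigger word. Restricting the match to a short list is what makes the untrained case workable: the recognizer only has to tell a handful of phrases apart.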

Dragon is a NIGHTMARE. by Caspian · 2008-01-15 22:50 · Score: 3, Informative

I've worked with Nuance's server product in the Dragon NaturallySpeaking line as a developer. Their API is confusing, their speech recognition SUCKS, and their software bugs out in bizarre ways. It's also slow as a dog, and advanced functionality (like recognizing from wav files, as opposed to from a live audio stream) is so poorly implemented as to seem bolted on.

And the worst part? Nuance has a virtual monopoly in realistically priced (read: "in a budget that a normal small-to-medium-sized business can afford") general-purpose speech recognition systems. If I recall correctly, they bought out Lernout and Hauspie's speech recognition products and IBM's old consumer-level speech-recognition stuff. So you can't take your business elsewhere; there is no "elsewhere".

I loathe those guys.

--
With spending like this, exactly what are "conservatives" conserving?

Re:Dragon is a NIGHTMARE. by ChrisA90278 · 2008-01-16 06:15 · Score: 1

"Nuance has a virtual monopoly in realistically priced (read: "in a budget that a normal small-to-medium-sized business can afford") general-purpose speech recognition systems."

Not really. The best software as usual is free and Open Source. The trouble is that (1) it lacks a marketing budget so few people know about it and (2) it is software "by PhDs and for PhDs", meaning that it is not packaged with a slick installer and GUI.

The Mac and PC software we are talking about here came out of CMU decades ago and the basic science was funded mostly by DARPA. Well DARPA is still funding CMU and you can see their latest up to the minute work here
http://www.speech.cs.cmu.edu/ but like I said above the software's target user base has an advanced degree in computer science.

So Nuance does not have a monopoly on SR. They only have the monopoly on packaging and marketing SR. The state-of-the-art SR part is available to anyone for free.
Re:Dragon is a NIGHTMARE. by Caspian · 2008-01-16 10:09 · Score: 1

I looked into that stuff. It was so un-user-friendly as to be worthless. I can manage sendmail... this stuff is harder to use and more opaque than sendmail.

--
With spending like this, exactly what are "conservatives" conserving?
Re:Dragon is a NIGHTMARE. by Charbox · 2008-01-16 19:48 · Score: 1

CMU Sphinx is NOT a dictation system. It comes with no language models, you have to build your own from scratch. It is good for command and control ("lower temperature by five degrees") but not much else.

I've used the software by TheVelvetFlamebait · 2008-01-15 23:29 · Score: 1

I had about a 98% accuracy rating with the included microphone and no sound card.

--
You know, there is a difference between trolling and pointing out the flaws in your reasoning. Just saying.

Re:I've used the software by jimicus · 2008-01-16 00:46 · Score: 1

That extra 1% is the part that's difficult to get.

If you'd said "I got 98.995% accuracy with the included microphone", I'd be more interested.

Tea. Earl Grey. Hot. by Anonymous Coward · 2008-01-15 23:46 · Score: 0

should cover most needs ;-)

Accessibility by Selanit · 2008-01-15 23:53 · Score: 3, Insightful

Five minutes training for most people, but not everyone. My boss uses Dragon NaturallySpeaking, and it took him nearly two weeks to complete the five-minute training due to some complications.

Namely, he's blind. He cannot read the training phrases off the screen, because he can't see them. Instead he had to have a screen reader (JAWS in this case) read the phrases aloud to him so that he can repeat them back. But of course, Dragon was not expecting to hear audio input from anything other than the user, so that confused things. There were problems even using a headset. And since he can't actually use the program at all without having the screen reader running, it was pretty awful trying to get the training done. I'm not even sure how he finally managed to do it - I suspect he probably got a sighted friend to help. Thankfully the training files can be copied from one computer to another so you don't need to retrain it on each different installation.

Once the training was finally finished, it worked well. He has poor fine motor control as a result of leukemia treatments - he can type, but only slowly and with a high error rate. His speech is slightly slurred as well, which reduces the accuracy of the transcription. Even so, the Dragon transcriptions are definitely better than manual typing. It's helped him a lot.

I just wish that the Dragon programmers would come up with a more easily accessible training routine. There aren't a whole lot of users with the same disabilities as my boss, but for the few like him having good, well-trained dictation software is vital. With it, he can control his computer reasonably well, if rather more slowly than a sighted person with normal motor control. Without it, using the computer is basically impractical. When he can't use Dragon, sending a single rather short email can take upwards of an hour.

Re:Accessibility by jfim · 2008-01-16 01:42 · Score: 1

Have you tried the speech recognition in Windows Vista? I haven't tried it with the screenreader at the same time, but it seemed to work semi-decently and I'm curious as to what people with actual disabilities think of it. I was quite surprised by how well it seemed to be integrated with the OS-bundled applications.

--
Jean-Francois Im's blog
Re:Accessibility by Bloodoflethe · 2008-01-16 04:13 · Score: 1

Y'know, I was thinking, "What about all the blind people?" when I was training the software to recognize my voice (DNS9), and I wondered why they wouldn't allow the program to read you the phrase you are to use as you speak. Headsets would make this a snap, as there would be no noise from audio output.

--
"Little is much when little you need."

Compare like with like! by Anonymous Coward · 2008-01-16 00:01 · Score: 0

A lot of the comments here say something along the lines of "I tried it years ago and it was rubbish" - yeah well things have moved on!

It's like saying all cars still have wood framing and carburettors like they used to be - a contemporary vehicle is steel and has fuel injection - oh and an iPod dock...

Dragon 9 whilst still not perfect is really very good - and MacSpeech will build their product on that technology.

Oh and by the way the Google speech recognition is by Nuance too and ViaVoice whilst distributed by Nuance is an old separate IBM product....

When the software's history involves jail terms... by Futurepower(R) · 2008-01-16 00:12 · Score: 5, Interesting

This software's history includes jail terms. Speech recognition has gotten an extremely bad reputation for being worthless garbage, maybe because it is worthless garbage.

Even a 0.5 percent recognition failure rate is enough to make speech recognition software worse than worthless. The reason is that speech recognition software never makes a spelling mistake. Instead, the mistakes are often extremely difficult to recognize, and sometimes change the meaning in subtle ways. That's partly because when the software is confused it tries to select something that is grammatically plausible.
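For a rough sense of scale, the arithmetic can be sketched as follows. This assumes word errors are independent of each other, and it uses a 1,000-word document and 20-word sentences purely as illustrative figures; none of these numbers come from Nuance.

```python
def expected_errors(words, error_rate):
    """Expected number of misrecognized words in a document."""
    return words * error_rate

def clean_sentence_probability(sentence_words, error_rate):
    """Chance a sentence has no errors, assuming independent word errors."""
    return (1 - error_rate) ** sentence_words

rate = 0.005  # the 0.5 percent failure rate mentioned above

print(expected_errors(1000, rate))                      # 5.0
print(round(clean_sentence_probability(20, rate), 3))   # 0.905
```

Under those assumptions, roughly one sentence in ten contains a wrong word that a spell checker will not flag.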

The result is that it has become difficult to sell speech recognition software. A high enough percentage of people in the U.S. culture know that it isn't actually useful. The original owners of Dragon NaturallySpeaking sold the product to a company that sold it to the company that became Nuance, maybe because they felt the product was damaging the credibility of their trademarks.

Here is a quote from the ComputerWorld story linked in the earlier Slashdot story, Is Speech Recognition Finally 'Good Enough'?:

"In 1993 two executives from Kurzweill Applied Intelligence (which pioneered SR for the medical market) went to prison for faking sales. That firm was sold in 1997 to a Belgium SR firm, Lernout and Hauspie (L&H), which was reporting phenomenal sales growth at the time. Dragon Systems, which originated DNS that year, was reporting only anemic growth, and L&H had no trouble acquiring Dragon Systems in early 2000 in a stock deal. Within a year a series of accounting frauds came to light and L&H collapsed into bankruptcy. Its SR technology was sold in late 2001 to ScanSoft Inc., which kept the DNS line going. (It was then at Version 6.0.) ScanSoft later acquired Nuance and adopted its name.

"Thereafter, "It was with the launch of Version 8.0 (in November 2004) that the market became reinvigorated and took off," said Chris Strammiello, director of product management at Nuance. "We crossed an invisible line with Version 8.0, where the software actually delivered on its promises and offered real utility for the users. Sales have been growing at a rate of 30% yearly since then, except that we expect it to do better than 30% this year."

Read that again: "... the software actually delivered on its promises and offered real utility..." I called Nuance and was told that version 8 did not have a new recognition engine, but only had improvements in the user interface. A friend who owns and tested version 8 told me he could see no difference in accuracy between that and version 7.

So, in my opinion, Nuance has done common deceitful things that are called "Marketing":

1) Bring out new versions. Previously, when there has been a "new version" of Dragon NaturallySpeaking, I have called Nuance technical support and asked if there is a new recognition engine. I didn't call for version 9, but for the last two versions they have said no. So, nothing is changed; the software is still worse than useless to me, in spite of the fact that they advertise that the software is now more accurate.

How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. As I mentioned, those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.

In the words of a Slashdot reader: "The opinions expressed here may be those of my speech recognition so

noocular by Hognoxious · 2008-01-16 00:24 · Score: 2, Funny

I don't think dictation's the solution. If you're discelyc what you really need is a spielchucker.

And what about people who speak dyslexically? Yes, Dubya, as it happens I am looking at you.

--
Confucius say, "Find worm in apple - bad. Find half a worm - worse."

Re:noocular by CSMatt · 2008-01-16 01:03 · Score: 1

That depends on how good your spelling skills are. You still need to spell the word well enough that the spell checker can guess what word you want to use. I'm a horrible speller, and I know that I've encountered a number of times where I had to keep guessing at a word's spelling just to get the spell checker to recognize what it should be.
Re:noocular by Anonymous Coward · 2008-01-18 02:45 · Score: 0

Dude, if you're that bad you aren't dyslexic - you're illiterate.

But will it run on Linux.? by tiluki · 2008-01-16 00:27 · Score: 1

Seriously though, is it just me or is speech recognition support still sadly lacking under all current distros?

Based on the fact there are no leading edge projects out there. I mean, apart from IBM's ViaVoice a few years back (and now no more), and the CMU Sphinx project http://cmusphinx.sourceforge.net/html/cmusphinx.php is there any other Linux/FOSS solution?

Re:But does it run on linux? = WMware by markdavis · 2008-01-16 00:34 · Score: 1

In his case, that might be OK.

But for the rest of us - we choose to use Linux because we want to use Linux. For most Linux users, it doesn't make much sense to buy and install MS-Windows and Dragon to use in the free/open VirtualBox or the proprietary/closed VMware. With such a model, you cannot use the speech recognition in the Linux applications.

Can it write software? by tgd · 2008-01-16 00:53 · Score: 1

Understanding 99% of what I say correctly after 5 minutes is a lot better than the developers do...

First announced eleven years ago by Anonymous Coward · 2008-01-16 00:53 · Score: 1, Funny

This was first announced eleven years ago. It's about time. Maybe Pogue will stop using Windows now?

I acutally use it... by lanzek · 2008-01-16 01:19 · Score: 1

And it's amazing. I find that it's much more natural and fluid for language to go from thought to speech than from thought to typing. Also, the accuracy is better than typing (including spelling), and it comes with a headset that is more than adequate. Give it a try folks, and forget about carpal tunnel forever...

Re:I acutally use it... by jfim · 2008-01-16 01:36 · Score: 1

Give it a try folks, and forget about carpal tunnel forever...

Beware of straining your voice though. Also, while speech recognition is actually pretty good for natural text, it is pretty awful for programming due to the tediousness of entering punctuation and variable names, which aren't dictionary words most of the time.

--
Jean-Francois Im's blog

It's about accessibility... by Tibor+the+Hun · 2008-01-16 01:27 · Score: 2, Insightful

This is fantastic news for those who need extra accessibility features.
It may be fine for you or me to hit any key, but there are many other folks with various disabilities for whom such a task is not an easy one. So it may make more sense for them to use their voice and move on.

If any of us were to lose fingers or hands in an accident, I bet we'd all be using something like Dragon to continue our work, rather than try to become a tap dancer.

And let's not forget about accessibility in the workplace. This is great news for Mac shops, as now there is one less reason for having to support a rogue Windows machine...

--
If you don't know what AltaVista is (was), get off my lawn.

Re:It's about accessibility... by timftbf · 2008-01-16 01:58 · Score: 1

Thank you. It winds me up seeing the product getting a slamming because it's "only" 99% accurate, or because "it sucks - so much better to type". While they might be marketing it at people who are too lazy to type, or who think it's cool to talk to their computer, it's an absolute boon for people who really *can't* type.

My wife has been through bouts of severe RSI, and while a lot of the time she can now manage with a specialist keyboard, Dragon kept her able to work and to communicate through a long bad stretch after the initial onset, and is an ongoing help. Spending 5% of the time typing to go back and make manual corrections still means 95% less time painfully using the keyboard.

Now I have a real chance of getting her off the PC and onto a Mac - no more Windows support for me! :)

Urgh!! Wrong PLATFORM!!!! by wonkavader · 2008-01-16 01:34 · Score: 3, Interesting

It's fine to port this to the Mac. Fine. Good. Whoopie.

But they are so DROPPING THE BALL. They have the best voice-rec platform. (You can think it's not good enough, but it's still the best.) What they need is to port it to Linux. Duh! Wake UP!

No, I'm not just saying the usual "Does it run on Linux?" bit. Linux is the now (and coming even more) obvious OS for small devices. When you want to talk to ANY device in your home or car, or your cell phone or PDA, you'll be talking to LINUX. THAT'S where we need a great voice-rec system. We need it ported to Linux and opened for an API. This will catapult this annoying desktop app into a present-on-almost-everything type of software in a matter of a couple of years, as low-power devices provide enough oomph to do what the heavy machines of a few years ago did.

Re:Urgh!! Wrong PLATFORM!!!! by gstoddart · 2008-01-16 05:20 · Score: 1

No, I'm not just saying the usual "Does it run on Linux?" bit. Linux is the now (and coming even more) obvious OS for small devices. When you want to talk to ANY device in your home or car, or your cell phone or PDA, you'll be talking to LINUX. THAT'S where we need a great voice-rec system. We need it ported to Linux and opened for an API.

But, really, what is the incentive for a commercial organization to do this? They're not gonna get paid for it. So, other than some altruistic reason so that everyone can get access to it, why would they? What do they get out of it?

They bought the property from a now defunct company. Clearly, with the intent of making money off it. They're not motivated by social justice and "oooh, wouldn't it be cool if we had a universal, open voice recognition".

Make it profitable for them, and you might see this. In general, I think what you're hoping for is just wishful thinking.

Cheers

--
Lost at C:>. Found at C.
Re:Urgh!! Wrong PLATFORM!!!! by AnyoneEB · 2008-01-16 08:27 · Score: 1

He did not ask for a free edition for Linux. In fact, the applications he suggested were for various embedded systems which run Linux which most users would not be modifying. I am sure the hardware developers can handle licensing fees for their own devices if they think voice recognition is worth the cost.

In short: "For Linux" does not mean "For free".

--
Centralization breaks the internet.

It's a good thing, too. by benmhall · 2008-01-16 01:38 · Score: 3, Informative

My wife needed voice dictation software a year or two ago. She had been a Linux user. I gave her my PowerBook and bought iListen for her. It was terrible. And it was a resource hog. It used the Philips engine and, even with extensive training, was the pits. We even tried several high-quality mics to no avail.

She went from my G4/1.5GHz/1.25GB RAM PowerBook running iListen to Dragon NaturallySpeaking 8 on an IBM ThinkPad T23. (P3 1GHz, 768MB RAM, WinXP.) The difference was night and day. Not only did Dragon run much faster on the lowly P3, but the quality of speech recognition was _much_ better. As a result of this, she's now back to being a Windows user with Dragon.

At least it looks like our iListen purchase won't be a complete waste, as we can use it to upgrade to NaturallySpeaking for Mac. I'm glad that MacSpeech has killed iListen. It needed it. It was an embarrassment compared to Dragon.

Speech recognition has been a big hole in the Mac's software line-up. It looks like that is finally coming to an end. Now if only someone would release something that works for Linux.* I know that we'd have paid $200 for something approaching Dragon 8's capabilities.

----
*Yes, I know about IBM ViaVoice. Good luck getting that to work on any recent distribution. I also know about Sphinx. Unfortunately, it seems to be a perpetual research tool rather than an end-user program.

Correction: Dragon develops for Mac Again by JoeCommodore · 2008-01-16 02:10 · Score: 1

Dragon had a Mac product once before - Dragon Power Secretary. It was tied to specific apps. Didn't get much updating or new versions after the initial release and died an agonizing death.

--
"Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield

How about a Linux version? by Michael+Ross · 2008-01-16 02:14 · Score: 1

Here's hoping they support Linux next.

Haven't tried the recognition, but... by Pedrito · 2008-01-16 02:34 · Score: 1

I have used their speech synthesis products and they're quite impressive. I used one of the voices to dictate a textbook into an MP3 file so that I could then do a book-on-tape type thing to play my textbook in my car. The pronunciation was generally pretty good. I had to define the pronunciation of a few words here and there (it had problems with some of the less common geek words, like "macromolecular"). But after giving it the proper pronunciations, it was quite excellent. The voice sounded natural a good portion of the time.

Shame... by Anonymous Coward · 2008-01-16 02:54 · Score: 1, Funny

Eye lie kit mice elf.

99% accurate by Random+BedHead+Ed · 2008-01-16 02:55 · Score: 1

... has launched a Mac version of Dragon NaturallySpeaking ... The new product is said to reach 99% accuracy after 5 minutes of training.

About that 99% ... in honor of his Steveness, who tested the software while writing a recent keynote, one out of every 100 words is "Boom." But in true Mac fashion the options panels are severely minimalist, so the Boom feature cannot be disabled.

Re:99% accurate by russotto · 2008-01-16 05:17 · Score: 1

But in true Mac fashion the options panels are severely minimalist, so the Boom feature cannot be disabled.

You can turn it off with this command:

defaults write com.nuance.dragon boomtoday false

But you have to run it every day, as there's ALWAYS "boom" tomorrow.

Re:Practical speech recognition, "House, lights on by Anonymous Coward · 2008-01-16 02:58 · Score: 0

I am looking for exactly this, except for Windows XP. I don't need speech recognition for anything but playing music and it seems like this should be completely possible.

Do you happen to know of any (preferably free) software or plugins for existing media players that will do this? I would like the basic functions such as play songname, play artist, play genre, pause, next and previous.

It's about time! by a_chameleon · 2008-01-16 02:59 · Score: 1

Macs have been lagging behind for years now. With the advent of Windows Vista, complete with a 16kHz recognizer and "easy-to-use" integration into practically every section of the OS, the Mac had no choice when it comes to "Keeping up with the Joneses" ;-)

--
Your Audio Content, Live Streams, and more... To every phone: via Shout-Outs, or On-Demand www.PhonePortals.com -

And the 1%? by Cathoderoytube · 2008-01-16 03:13 · Score: 1

It should be noted that the 1% inaccuracy includes uncommon words that people never use like 'damn' 'this' 'stupid' 'software' 'that's' 'not' 'what' 'I' 'said' 'delete' 'no' 'don't' 'write' 'that'

--
I have nothing compelling to say

scotty picks up mouse, talks into it;-) by airdrummer · 2008-01-16 03:27 · Score: 0

the built-in s.r. works great for this...lifehacker.com has a good how2 on creating app.specific voice cmds.

the other issue is microphones: only built-in or hardwired work...i tried a bluetooth headset, and while i could successfully pair & tx/rx, the s.r. setup wouldn't calibrate:-(min.level was 1/2 scale)-:

then i read that b.t. is too noisy...

tabs just don't understand by esj+at+harvee · 2008-01-16 03:32 · Score: 2, Insightful

Reading the comments I see a bunch of tabs[1] with no clue about being disabled, the speech recognition market, the history of the product, and how Nuance is probably hampered by the management attitude towards money and the history of the code base.

For someone who's been disabled (temporarily or permanently) speech recognition means the difference between making a living and being able to support oneself, a mortgage, family etc. and sitting around on your ass in section 8 housing on Social Security disability. Pain from RSI once made it extremely difficult to feed myself. When you've experienced that level of pain, disability and the associated despair, you get the attitude that anything that gives a disabled person independence and an ability to make a living should be encouraged with all possible resources.

Listening to someone dictating using speech recognition will drive you mad. You would have the same problem with a blind person listening to text-to-speech. But that's not the fault of speech recognition or text-to-speech. That's the fault of management not providing the disabled person with an acoustically isolated environment (i.e. reasonable accommodat.

Desktop speech recognition is a monopoly because it's extremely expensive and difficult to develop speech recognition and there is not a large market. The market consists of lawyers, doctors, and the disabled. There is not enough money to support two companies (or more) to develop desktop speech recognition applications.

NaturallySpeaking is very buggy. There are bugs that cause people problems that were first seen in NaturallySpeaking 5. These are not hidden or hard-to-find bugs. They don't affect Nuance's ability to sell NaturallySpeaking. There's no reason for them to fix them except for the fact that they interfere with the use of many programs by the disabled. If you are just doing dictation into Microsoft Word or DragonPad, you'll never notice. If you try to dictate into Thunderbird, Firefox, OpenOffice,... you're screwed. For example, I cannot dictate directly into Firefox for this comment, I need to use a workaround for dictation and then paste the result into the text box. The reason why this problem exists is because Nuance management has the reputation of not making any change or feature unless you can make a business case and show them they will get revenue from that change. This is not such a bad model because it can keep Nuance profitable and the product available to people who truly need it (i.e. the disabled). The downside is that it doesn't leave room for changes necessary for the disabled.

I've heard from people working inside Dragon that part of the problem also is the code base. It was written by a bunch of Ph.D.'s who are really really good at speech recognition but are not so good at writing code. Also in the last few years, there has been huge turnover in the people working on the code as NaturallySpeaking was sold first to L&H and then to Nuance. That kind of change alone will wreak havoc on the code base as knowledge is lost and never really acquired by the new people. By the way, I have talked with some people from Nuance, and they are basically good people. They understand the needs of the handicapped but they are constrained in what they can do for us because of budget and resources.

When people talk about alternatives with open source speech recognition, only a tab would think they would work for the disabled. Their recognition speed is significantly slower, vocabulary size is smaller, and they are really more projects to keep grad students busy than be anything useful in the real world.

The last problem with speech recognition sits in your lap if you are a manager of a software product or a developer. As far as I can tell, the number of applications that are speech recognition friendly is vanishingly small. It seems to me that software developers go out of their way to make software handicap hostile. It starts with the multiplatform GUI toolkits that do not

Re:tabs just don't understand by gstoddart · 2008-01-16 05:35 · Score: 1

Listening to someone dictating using speech recognition will drive you mad. You would have the same problem with a blind person listening to text-to-speech. But that's not the fault of speech recognition or text-to-speech. That's the fault of management not providing the disabled person with an acoustically isolated environment (i.e. reasonable accommodat.

I've only ever known a single blind programmer.

He had a text-to-speech program running. He had the damned thing set to be so fast (he had astounding hearing and processed it very fast) and set the volume so low, it was a little wee low warble you barely noticed. From 5 feet away, it was gone in the background noise.

People in his immediate area didn't seem to care, and he was really good at what he did. Nobody was asking that he be put into some acoustically isolated environment. Hell, the people I want isolated are the people using speaker phones -- there's always one clown who figures that the 20 or so people nearby should listen to his con-call so he can fiddle away with something. I'll at least cut the blind guy some slack on their need to have the additional noise.

On a completely unrelated note, I once watched no less than five non-blind people watching the blind guy assemble a ping-pong table because they, collectively, couldn't figure out how to put it together. He was one of the most amazing people I've ever known when it came to taking something apart and putting it back together.

Cheers

--
Lost at C:>. Found at C.
Re:tabs just don't understand by TechnicolourSquirrel · 2008-01-16 09:04 · Score: 1

Your design decisions could leave one to think that you are deliberately trying to keep the disabled out of the workforce and dependent on charity.

What would be their motivation for doing this? Maybe I'm just a know-nothing 'tab', but I think your disability has twisted your mind a little bit.
Re:tabs just don't understand by esj+at+harvee · 2008-01-16 14:10 · Score: 1

What would be their motivation for doing this?

I think a significant hint lies in your use of the word "motivation". In order to be motivated to do something, you have to have conscious thought and a sense that there is something wrong happening. When it comes to accessibility and software development, accessibility issues don't even enter the mind of most software developers and if they do, they're overruled by most managers as irrelevant to the majority of the customer base. An interesting calculation would be to see if it's cheaper to warehouse disabled people than it would be to make the world accessible to them. From people's actions, I would say that their intuitive sense is that it would be cheaper to warehouse. I'm not saying that's what they think but that's what they demonstrate in their actions.

Maybe I'm just a know-nothing 'tab', but I think your disability has twisted your mind a little bit.

The fact that you raise this question at all shows that you are more aware than 99.9% of all software developers. As for twisted mind, please consider that I'm posting on Slashdot. A little more seriously, however, try living in a world where applications can degrade the accuracy of your input device. For example, Thunderbird, Firefox, OpenOffice, ChatZilla, AIM, Emacs, UliPad, jEdit, Nvu and others I can't remember right now all cause serious degradation in recognition accuracy. Other applications massively fail with minimal correction techniques (i.e. natural text). For example, tools like PyScripter, which use Smart IDE type technology, fail because if you try to correct a misrecognition, the wrong text is selected, deleted, and overwritten with the correction. Then you have applications like VMware Workstation, which doesn't accept any textual input from NaturallySpeaking, which means that in order to gain access you have to ssh in and use X11 forwarding to bring up a window to display on your XP X11 server. Then, maybe then you can dictate into an application. But even if you get the application to accept input from speech recognition, you have the problem that the built-in macros for cutting and pasting follow Windows conventions and can't be changed. As you probably can guess, any application without Windows style cut and paste doesn't work very well. A major shortcoming is you can't use the NaturallySpeaking Select-and-Say feature which is an absolute godsend for hands-free editing. Without it, you burn a significant amount of hand time just getting the cursor to the right place and changing the right text. Smart completion text boxes are another barrier. For example, in Firefox the search engine bar will drop down when the focus is in that window. If the drop-down is present, you can't dictate into that window. Many JavaScript editors will destroy your text if you try to correct a misrecognition.
If focus shifts in the middle of a recognition output, it's effectively like typing random keys on the keyboard and commanding your application to do God knows what. I'm always forgetting recognition is turned on when I'm in Thunderbird. And I'm always losing messages because of many characters shoved into Thunderbird. I could go on with more examples but I hope you get the idea of what you would bump into within the first day of use.

You do not need to take my word on all the failings I've described above. You can verify them for yourself. Pick up a copy of NaturallySpeaking, purchase a real microphone instead of the piece of crap in the box, install, train, use after throwing away your keyboard. Then you'll get a very clear example of just how inaccessible your working set of applications are. Then you get to make a choice about whether you are going to do something to make systems more accessible to speech recognition users or not.

Summary is wrong by Anonymous Coward · 2008-01-16 03:40 · Score: 0

The top selling speech recognition software is the one that comes bundled with every copy of Windows Vista.

Frankly, I'm thrilled by jslarve · 2008-01-16 04:03 · Score: 1

Now Mac users can enjoy the merciless flow of spam that PC users enjoy. "LAST CHANCE!! Nuance blah blah blah". AAAUGH!

Re:When the software's history involves jail terms by Fear+the+Clam · 2008-01-16 04:19 · Score: 1

This software's history includes jail terms.

Citation and relevance, please.

Dragon Naturally Speaking by christurkel · 2008-01-16 04:23 · Score: 1

In my job, I teach clients how to use this software every day.
The 99% accuracy is after the initial training. Then come the tutorials, which further enhance recognition and use and make it even more accurate. Dragon is invaluable for those who cannot use a computer any other way.
Accuracy does increase with time and use.

--

CDE open sourced! https://sourceforge.net/projects/cdesktopenv/

Good News! by Efialtis · 2008-01-16 04:34 · Score: 1

So, now that Dragon has been ported to Mac, it is only a little more time till it is ported to Linux.
I have searched for a good bit of software in Linux to do voice recognition, but nothing is a) easy to use, or b) (in my opinion) ready for market. This has left me with using Microsquat's OS and Dragon. IBM also has ViaVoice, but it isn't for Linux yet, either...
Still waiting

--
--E--

Could be great, this is what David Pogue uses by pbooktebo · 2008-01-16 04:40 · Score: 1

I do know that David Pogue uses DNS for all of his writing (NYTimes, books, etc.). He writes about Dragon often, and how he previously used to carry a Windows laptop essentially just for writing.

Many Mac folks, myself included, have installed Windows via Fusion or Parallels so that we can run DNS alongside OS X. I have got it working reasonably well, and have been doing all my writing and email via speech for about 6 months. There are still some frustrations, but in general it works great and I'm happy to have it.

The big question I have is whether this version will be better than using DNS via a virtual Windows machine. Unless the implementation is horrible, I'd expect so. Well, reasonably I would expect it to be crappy until a few versions in, but I'll be watching closely for reviews.

Assistive Technology by Anonymous Coward · 2008-01-16 04:45 · Score: 0

I work as an Assistive Technology Trainer, working with HE clients who have a variety of disabilities, from severe physical difficulties to learning difficulties such as Dyslexia.

I can understand the negative comments about voice recognition. Generally, they are based on poor experiences with Dragon (or god forbid ViaVoice, or iListen) as it was a few years ago. It is only since version 7 that recognition accuracy has increased to a useable level.

At present, we use DNS9, which, let me emphasise now, is BLOODY EXCELLENT.

Within 10 minutes of training, which involves reading a segment from a text, accuracy is between 85-95%. Further refinement comes when using the program after the training. This is a simple process, where the user can choose alternates to misrecognised phrases/words ("ice cream" for "i scream") for example, or misrecognised words can be selected and spelt by speaking the letters.

Once you have used the program for a week, it is 99% perfect.

Speech recognition depends entirely on the QUALITY of the headset you use, the POSITIONING of said headset, and the AVAILABLE MEMORY of the machine. In my experience of use since version 7, it is these 3 factors alone that cause untold difficulties. Dragon used to come with the crappiest headset known to man, which was supplied with the product, in the box. Standard analogue jacks, and rubbish microphone. What you need is a USB Plantronics headset, positioned correctly, and at least 2GB of RAM. Recognition will be perfect, I promise you.

My clients find Dragon enormously useful. For example, a student I was training was severely disabled, wheelchair bound, and had only the power of speech. With Dragon Naturally Speaking, she could COMPLETELY control her PC, from dictating documents/email, using the internet, opening folders etc. Everything a user using a mouse can do. Dragon allowed her to open a window on the world.

In addition, Dragon is one of the most intuitive and straightforward pieces of software I've ever used. It does exactly what it says on the tin, and within 10 minutes you are up and going and it works perfectly. Unless you have VISTA of course, but that's another story!

Cheers

Tim Symons AKA Anonymous Coward

radiology by tinku99 · 2008-01-16 05:07 · Score: 1

I use voice recognition in dictating radiology reports all the time. It actually works pretty well. It is faster than typing, particularly with voice macros. Maybe because we have a limited highly specialized vocabulary...

Re:When the software's history involves jail terms by Anonymous Coward · 2008-01-16 05:33 · Score: 0

Citation and relevance, please.

If you're going to throw out this tired old "i can't be bothered to google" shit, at least you can be bothered to read past the first sentence. Like to the part just after that where he links to the computer world story and quotes the part about the executives going to prison.

Relevance? He explained how after Nuance took over, they were selling the exact same recognition engine with a prettier UI each new version and claiming that each new version had a better recognition engine, giving anecdotal evidence that claims that two versions produced the same results. Is it different with the recent versions? Who knows, as they say in the stock market, past performance does not indicate future returns.

I have to admit though, that it would have been much more interesting if he'd found a case of someone going to prison because the software mangled their words into something else.

(Voice recognition + shell)alias = happy by ffflala · 2008-01-16 05:35 · Score: 1

Focusing on using speech recognition software for dictation has always seemed to me an aggravatingly limiting use for such technology. I want to use voice recognition at the command line, not as a substitute for typing a document! In that setting, 99% accuracy will be sufficient.

Even the voice recognition that was included with MS Office was accurate enough to recognize the far smaller dictionary of command terms I would need to use voice recognition as an OS interface, with accurate letter recognition and auto-complete features for navigating the file structure.

Piping terms into the shell may now be a possibility.
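A minimal sketch of the idea, assuming the recogniser hands back a plain text phrase: the phrase is matched against a small command vocabulary and mapped to a shell command. The vocabulary, the `to_shell` name and the 0.6 cutoff are all made up for illustration; only `difflib` from the Python standard library is real. With so few phrases to choose from, even a slightly garbled recognition can still land on the right command.

```python
import difflib

# Hypothetical command vocabulary: spoken phrase -> shell command.
COMMANDS = {
    "list files": "ls -l",
    "go home": "cd ~",
    "show disk usage": "df -h",
    "print working directory": "pwd",
}


def to_shell(heard, cutoff=0.6):
    """Map a (possibly misrecognised) phrase to a shell command.

    Returns None when nothing in the vocabulary is close enough,
    so that an unknown phrase is never executed by accident.
    """
    match = difflib.get_close_matches(
        heard.lower(), list(COMMANDS), n=1, cutoff=cutoff
    )
    return COMMANDS[match[0]] if match else None
```

Here `to_shell("list flies")` still resolves to `ls -l`, while a phrase outside the vocabulary returns `None`. Actually running the command (for example via `subprocess`) is left out on purpose, since a misheard command is more dangerous than a misheard word in a document.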

I've used speech recognition for commands... by argent · 2008-01-16 05:43 · Score: 1

My experience with using speech recognition for commands is, well, not good. You really need *better* accuracy than you do for transcription. But if that's what you want, it'd be trivial to implement. You could possibly do it with AppleScript and the existing tools.

Personally, I want it for transcription of recorded speech, not real-time transcription.

Re:I've used speech recognition for commands... by ffflala · 2008-01-16 06:08 · Score: 1

Can you give more detail on your experience using speech recognition for commands: what app & what problems?
Re:I've used speech recognition for commands... by argent · 2008-01-16 08:38 · Score: 1

Dragon Naturally Speaking on the iPaq Pocket PC.

The only command I could get it to reliably recognize was "Power Off".

I imagine it would work better for people from New York or San Jose, but my accent defeated it.

Re:When the software's history involves jail terms by not_anne · 2008-01-16 06:31 · Score: 1

Sometimes software is just a useful solution to a need.

When I was in college, I had to have unscheduled surgery on my right elbow the week before the final paper was due for my Anthropology class. The outline for the paper was complete; I just needed to type it out. Being in a cast after surgery made typing impossible but with one hand. I purchased and used DNS to finish my paper. The software was easy to use and errors were minimal, even for an Anthro paper with lots of jargon.

I got an A on the paper.

--
My comments here are my own; I do not speak for my employer.

While it isn't perfect I would argue that ... by rjwilmsi · 2008-01-16 06:36 · Score: 1

I actually bought Dragon NaturallySpeaking 9 about a week ago, and have been using it in the evenings at home since. Of course I was sceptical about how accurate the software would be, but I thought it was worth a gamble for the £40 asking price for the Standard Edition. I have since dictated only around 3000 words using the software and already I can dictate faster than I can type (50 words per minute typing), and the accuracy seems to be around 97% on sample articles. What also encourages me is that the accuracy has already increased and seems to be continuing to increase, so on that basis, and having only dictated a short document's worth of words, I expect this software to be very useful to me. I don't have any disabilities, but thought dictating would be more comfortable than typing for extended periods. However, I can already imagine that for people with disabilities, RSI or sight problems this software would be extremely useful. While it isn't perfect I would argue that it is of production quality.
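For anyone wanting to check an accuracy figure like that on their own sample articles, here is a rough sketch. It assumes "accuracy" means one minus the word error rate, i.e. the word-level edit distance between what was said and what was transcribed, divided by the number of words said. That is a common convention, not necessarily how Nuance measures it, and the function names are made up for illustration.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance (substitutions, insertions, deletions)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edits needed to turn the ref words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # word dropped by the recogniser
                cur[j - 1] + 1,          # extra word inserted
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """1 - word error rate, relative to the words actually spoken."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference text is empty")
    return 1.0 - word_errors(reference, hypothesis) / n


def expected_errors(words, accuracy):
    """How many wrong words to expect at a given accuracy."""
    return round(words * (1.0 - accuracy))
```

At 97% accuracy, `expected_errors(3000, 0.97)` works out to about 90 words needing correction in a 3000-word document, which is worth weighing against the time saved over typing.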

Re:chatting with computer... joke? by Psykechan · 2008-01-16 07:18 · Score: 1

If you have a Mac with a microphone (which should just about cover every consumer Mac made within the last decade), turn on speech recognition and say the words "Tell me a joke".

Have fun.

Re:When the software's history involves jail terms by Anonymous Coward · 2008-01-16 08:23 · Score: 0

Accuracy may improve without a new engine in various ways. The software may have a better dictionary, the engine may have been trained on more data, and the program may include better noise filters.

Also, speaking like a robot will actually hurt the performance. Software like this has been trained on real-world audio data. In fact, speaking overly clearly is a mistake people often make when speaking to speech recognition software. The software will handle casual, daily speech better than overly formal or "robot" accents.

Pissed Off by grolaw · 2008-01-16 11:20 · Score: 1

Let's see,

I've tried them all - and spent dozens of hours and hundreds of dollars for "custom dictionaries" - as a lawyer I dictate in a language that is almost, but not quite totally, unlike English.

I've had excellent "luck" with iListen and almost bought their "last" upgrade a few weeks ago. I decided not to because I couldn't get the helpdesk to tell me if my custom dictionaries and profile would transfer to the latest release.

I know why they didn't answer - and I'm not alone - the support mailing list posted this response today:

I can give you some basic information right now. MacSpeech Dictate is
a brand-new product written from the ground up around the same engine
that is in Dragon NaturallySpeaking. The new product is Intel only.
There was just no way for us to bring out a product based on the
Dragon engine and make it anything but Intel only. Besides being an
Intel only product you must be running MacOS 10.4.11 or higher. In
other words the last version of Tiger or a version of Leopard. Minimum
processor speed and memory requirements have yet to be determined as
we are still in development.

--
Technical Support Manager
MacSpeech, Inc.

Check out our online Helpdesk at:
http://www.macspeech.com/support/

Guess who is throwing in the towel? I've had it with the whole lot. Wake me in 10 years when somebody actually has a product that works. (And, don't blather at me - these bozos still don't know the final memory / processor configuration their new program will run on!) I suppose a Mac Pro 8 Core running at 3.2 gig and having 32 gig of RAM might suffice for that 99% accuracy - BUT that Mac Pro with a display and standard hard drive / standard graphics card + warranty would only cost 14,766.95

Oh well, MacSpeech by walter_f · 2008-01-16 12:02 · Score: 1

... the guys who promised German language support for iListen nearly ten years ago.
They just didn't deliver (as far as I know, as I quit paying attention to them some years ago).

A usable solution in the field of speech recognition would still be a very important feature (something like a "killer feature") for any desktop OS, be it Mac OS, Windows, or Linux.

Re:When the software's history involves jail terms by Ferante125 · 2008-01-16 12:57 · Score: 1

It is indeed funny how people talk when they use ASR. I work on speech recognition in grad school and we do usability experiments, and people need to be explicitly reminded not to talk like a robot. The performance is best when talking normally, but because people attribute human-like qualities to the recogniser, they talk to it like it was a robot child that didn't understand, which makes the performance a lot worse. But it should be stressed that it isn't the fault of the recognition that people talk like robots, it's just what people do naturally when they think there's a lack of understanding.

--
why procrastinate today when you can do it tomorrow?

Re:When the software's history involves jail terms by Anonymous Coward · 2008-01-16 13:27 · Score: 0

1) Bring out new versions. Previously, when there has been a "new version" of Dragon NaturallySpeaking, I call Nuance technical support and ask if there is a new recognition engine. I didn't call for version 9, but for the last two versions they have said no. So, nothing is changed; the software is still worse than useless to me, in spite of the fact that they advertise that the software is now more accurate.

How is it possible that the software is more accurate, if the recognition engine did not change? Maybe it isn't true. Or maybe the company improved the guesses the software makes when the software really has no clue what the user said. As I mentioned, those guesses have become so sophisticated that you can become confused about what you actually said, and you have to spend time re-creating your ideas. If you are saying simple things about a simple subject, this is not as much of a problem as when you are writing about contract negotiations, for example.

I'm in the biz, and I'm not about to start defending Nuance, but this comment is a bit unfair. The underlying recognition engine doesn't have to change for accuracy to increase. The way that these systems are trained is to take a large amount of speech data and build models of the way that people speak. These models are plugged into the engine, which one can think of as like a big search engine, but instead of searching over a document collection, one searches over possible word sequences.

Two possible ways that improvements can occur without changing the engine: if I train baseline models with 10x or 100x the original data (collected from a large number of users, with a variety of accents, vocal tract lengths, etc), my statistical models of what a /k/ sound, for example, is like get much more refined. "The best data is more data" is a frequently heard (and derided by some) motto. Secondly, there have been a significant number of algorithms developed over the last few years which improve the adaptation of the baseline models quickly to a particular speaker. I don't know the details of the Naturally Speaking adaptation algorithm, but likely it is related to these advances (speech companies tend to pick up the latest tricks and implement them relatively rapidly). Adaptation is technically not part of the engine proper, so one could say that the engine hadn't changed. In short: there are really two pieces to a speech recognizer: a bunch of models (which can be read in from a data file) and an engine to find the most likely word sequence.
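To make the models-versus-engine split concrete, here is a toy sketch. It is in no way Dragon's actual algorithm: the "engine" is a plain Viterbi search, the "models" are two small dictionaries of invented probabilities, and every name and number is hypothetical. The point is only that the same unchanged `decode` function gives a different answer once better-trained models are plugged in.

```python
import math


def decode(slots, bigram, floor=1e-6):
    """Engine: Viterbi search for the most likely word sequence.

    slots  -- one {word: acoustic probability} dict per spoken word
              (probabilities must be > 0)
    bigram -- {(previous word, word): probability}; "<s>" marks the start.
              Unseen pairs fall back to the small `floor` probability.
    """
    # best[w] = (log score, word sequence) of the best path ending in w
    best = {"<s>": (0.0, [])}
    for slot in slots:
        nxt = {}
        for word, acoustic in slot.items():
            candidates = [
                (
                    score
                    + math.log(bigram.get((prev, word), floor))
                    + math.log(acoustic),
                    path + [word],
                )
                for prev, (score, path) in best.items()
            ]
            nxt[word] = max(candidates, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]


# Acoustic evidence for two spoken words: nearly a coin toss.
SLOTS = [{"i": 0.5, "ice": 0.5}, {"scream": 0.55, "cream": 0.45}]

# Models trained on little data: no preference between the two phrases.
GENERIC = {("<s>", "i"): 0.5, ("<s>", "ice"): 0.5,
           ("i", "scream"): 0.5, ("ice", "cream"): 0.5}

# Models retrained on more data: "ice cream" is far more common.
RETRAINED = {("<s>", "i"): 0.5, ("<s>", "ice"): 0.5,
             ("i", "scream"): 0.1, ("ice", "cream"): 0.9}
```

With these made-up numbers, `decode(SLOTS, GENERIC)` picks "i scream" because the acoustics tip it that way, while `decode(SLOTS, RETRAINED)` picks "ice cream". Nothing in `decode` changed between the two calls, only the data file, so to speak.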

That said, yes, there has been a long problem with over-hyping of products. Speech recognition has helped for many different domains (and yes, I can gripe about lost-luggage systems like everyone else), but clearly we as a field have a long way to go. Language is complicated, and yet people (usually) have no trouble using it, which is what makes it such an interesting thing to study and work with.

Re:When the software's history involves jail terms by antirealist · 2008-01-16 15:27 · Score: 3, Interesting

I'm a radiologist who uses a Nuance product for several hours a day, every day, and my experience has been overwhelmingly positive. Whereas I used to waste a great deal of time editing and correcting mistakes by human transcriptionists, I only occasionally have to manually correct the Nuance transcriptions. Our throughput and efficiency have increased considerably since we started with the product, and there is absolutely no way that I'd ever return to the previous system. The adoption of speech recognition has been the biggest advance in my field since digital imaging, IMO. Oh, and "when the software is confused it tries to select something that is grammatically plausible"? I don't think so - the software has no concept of grammar.

Works with Boot Camp by Anonymous Coward · 2008-01-16 16:15 · Score: 0

After reading David Pogue's review of Dragon Naturally Speaking in the New York Times a few years ago, I had been considering that app as the only reason I personally have that I would ever want to buy a Windows machine. Now that the Macs can simulate a PC, I finally went ahead and installed Windows on a partition on my Mac recently and bought the Windows version of DNS. It works pretty good for me. I use it to transcribe telephone interviews I do for my work (I can capture Skype audio) -- playing the conversation back to myself off my iBook and repeating both sides into DNS on the Mini. It has problems -- like, my little Mac Mini with the factory 1 gig of RAM is a little slow; and it's tricky. Like, I have to wear earplug phones *and* a headset mike -- I won't go into it all, but so far it is definitely awkward. However, it does work. It's faster than using my fingers, and although it can be frustrating with the mistakes, it is close enough most of the time.

I've been working on getting the Mini to run the Mac OS and Windows at the same time using Parallels, so I can play my recordings and transcribe on the same computer. No dice so far -- the Mini is too slow, there are conflicts with the USB mike, and other problems; and I have other things to do than mess around with it.

So it's good for me that DNS is now available for the Mac. Although it does get my ass that I could have waited six weeks and bought DNS native for the Mac instead of going through all that Boot Camp and Parallels crap. Figures. But yeah, let me have it. It's a productivity tool, for this workin dog at least.

I can't wait until I can drive my robot by walking on my treadmill with needles in my skull. Now that -- that will be progress. I will totally have the power.

Utter Command is a powerful extension of DNS by Anonymous Coward · 2008-01-17 02:26 · Score: 0

I recently watched a demonstration of Utter Command which is an extension of Dragon NaturallySpeaking and it was pretty powerful.

I've heard about success with that use. by Futurepower(R) · 2008-01-17 02:43 · Score: 1

"... uses a Nuance product..."

Which Nuance product are you using? Is it a special medical version of Dragon NaturallySpeaking? Those are much more expensive, and I've never tried them. They also use special dictionaries provided by Nuance.

I've heard about success with that use. Partly the success seems to be due to the fact that there is never confusion about what you said, so that mistakes are easily corrected. Another reason is that technical words are much more easily recognized.

Speech recognition software checks the grammar to see if the use of a word is plausible.

Speech Recognition a Language-specific problem? by Magitek0777 · 2008-01-17 07:41 · Score: 1

Does anyone have any experience with speech recognition software in other languages? It seems to me that a language such as English, with its many exceptions to pronunciation rules and its grammar, would be a more difficult problem to solve than a language such as Japanese, which has a smaller set of sounds and more parsing-friendly grammatical particles (post-positions as opposed to prepositions) that mark the word's part of speech. Does the problem become easier or harder based on the language?

I worked with a DragonSoft founder by professorguy · 2008-01-18 09:44 · Score: 1

My friend, who developed some of DragonSoft's algorithms, was a founder of the company. I estimated his equity in the company (based on many conversations) at about $50 million US in 1999. His equity in 2001? Less than a million.

Yowza, did those sleazy Belgians ever take him for a ride!

Re:When the software's history involves jail terms by Anonymous Coward · 2008-01-19 02:27 · Score: 0

That is such nonsense. I suppose that if one is an excellent typist, the 99% accuracy rate is not great. But as a person who never learned to type, and has dictated to secretaries (some of whom generated work with less than 99% accuracy), I find the latest version (9) of Dragon Naturally Speaking to be outstanding. I do not find it frustrating at all. Frustrating is using two or three fingers to type more than a sentence or two.

The accuracy issue is overblown anyway. Responsible people proofread and revise what they produce, especially where documents are important.

The robotic talking habit is more ridiculous tripe. Never, because you don't have to talk like a robot to use the software.

You can actually generate work faster than people who type. And some typists get repetitive strain injuries.

Voice recognition may not be for everyone, but I think it is fantastic.

iListen by Anonymous Coward · 2008-01-24 15:08 · Score: 0

Well, I've been told that iListen is great. I wonder why they would pull it?

Re:When the software's history involves jail terms by Futurepower(R) · 2008-01-29 12:42 · Score: 1

It's true, voice recognition may be useful to people who don't know how to type.

"The accuracy issue is overblown anyway. Responsible people proofread and revise what they produce, especially where documents are important."

It depends on the material. In some kinds of dictation, voice recognition software makes plausible mistakes that may be missed by a proofreader, but make a big difference in meaning.

You are very unlikely to be aware of changes in your own accent.

Slashdot Mirror

176 comments