Consonants Not Required
billybob2001 writes: "A report at the BBC explains how voice-control of computers can be more successful using grunts and sighs, as "voice recognition programs often failed to accurately capture words". Dr Takeo Igarashi, of Brown University, suggests the use of "ahhhh" for skipping tracks on a CD, or adjusting TV volume, but I wonder what the effect would be on pr0n sites? Another suggestion is "uh oh" for undo. Perfect for online banking. Is this going to confuse your system or what?"
Surely "Ah, shit!" is the obvious choice for an undo command?
Linux advocates are in a no Win situation
Anyone that's worked at a Help Desk should know that Users have been trying this for years.
Carl G. Jung
--
"With one breath, with one flow, You will know Synchronicity" -La Policia
D'oh!
-- Ed Avis ed@membled.com
The computer can't distinguish words easily, so we'll give you a potentially much smaller vocabulary and see if it does better? Of course it'll do better, whether or not that smaller vocabulary contains consonants.
What I'd worry about is whether these unarticulated sounds sound more like background noise than articulated speech; if so, then you've made the situation worse by making it harder for the computer to know when you're talking to it.
On "uh oh": Dragon Dictate (discrete speech recognition from a few years ago) used "oops" for telling the SR system when it made a mistake; it was reasonably easy to distinguish from words that you actually wanted to put into your text with any frequency.
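The smaller-vocabulary point above is easy to try out as a toy simulation. This is only a sketch under loud assumptions: each "word" is a random point in a made-up 4-dimensional feature space, the microphone adds Gaussian noise, and the recognizer just picks the nearest template. The function name `error_rate` and every parameter are hypothetical, and none of this is how Dragon Dictate or the system in the BBC report works.

```python
import random


def error_rate(vocab_size, trials=500, dims=4, noise=0.15, seed=0):
    """Fraction of noisy utterances a nearest-template recognizer gets wrong."""
    rng = random.Random(seed)
    # Hypothetical "acoustic templates": one random point per vocabulary entry.
    templates = [[rng.random() for _ in range(dims)] for _ in range(vocab_size)]
    errors = 0
    for _ in range(trials):
        spoken = rng.randrange(vocab_size)
        # What the machine "hears": the template plus Gaussian noise.
        heard = [x + rng.gauss(0, noise) for x in templates[spoken]]
        # Recognize by picking the template closest to what was heard.
        guess = min(
            range(vocab_size),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(templates[i], heard)),
        )
        if guess != spoken:
            errors += 1
    return errors / trials


small = error_rate(5)    # a handful of grunts and sighs
large = error_rate(500)  # something closer to a real vocabulary
print(f"5 sounds: {small:.1%} wrong, 500 sounds: {large:.1%} wrong")
```

The small vocabulary should come out well ahead, and nothing in the model knows or cares whether the sounds contain consonants.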
Asking people to use another language when dealing with machines -- especially one that's more visceral -- is just asking for trouble. Computers are already seriously affecting the ability of humans to communicate orally: by concentrating language into the short bursts used in chats, we lose the particles of sentences that help establish context in speech (yes, there is a reason for "the" and "a"). Besides, here's an opportunity to alleviate a lot of the bad habits that make dialectal English so tough to understand for those outside the dialect: set the machines to understand one sort of English, so that everybody has to speak at least that type along with their colloquial speech. Of course, there's always the possibility of eugenic practices with this, so my proposal is this: teach the computer the differences between the 8 vowel sounds used by people in Colorado, where pretty much every vowel approaches the schwa (the schwa being the neutral position for the human vocal system and therefore the easiest to pronounce). After a while, people will realise that to be successful at using voice-activated systems they'll need to adjust their inflection, and eventually they will adjust it automatically when dealing with people who don't understand them, too.
But voice-activated systems are stupid anyway...speech is one of the slowest forms of human interaction, and is one of the few we have to actively concentrate on to perform. You know how people say, "Think before you speak"? That's because once you start speaking, a large portion of your brain activity is devoted to doing so...it actually becomes harder to think about what to say next. Pressing a button or turning a dial takes practically no thought...which is another reason why a speech written in spontaneous draft still sounds better than one that is spoken aloud. If we convert machines to speech recognition, we're effectively asking people to interact with them in dumber ways. And can you imagine the logic involved in processing a fairly simple statement like "This check in my hand should be processed by you and in return I'd like fifty bucks in tens and ten one dollar bills." Since the command isn't linear, the machine not only has to recognize what each word means, but also has to try to interpret them in order. And if humans can't construct complicated sentences like the one above -- which any human over the age of about 4 can understand; before that, kids can't identify the subject and object in complex sentences -- they'll be inconvenienced by speaking machines. Oh, and for a simpler example, try this: "My pin number? 376 uhhhhhh...Forty-two thirteen...aaaaaaaaaaaand...is it six? no. Eight?...oh! oh! sixty eight!" A human can understand that...we'd be annoyed, but we'd get it.
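To make the PIN example concrete, here is a rough sketch of what a deliberately naive machine might do with that utterance, assuming the recognizer has already turned the audio into text perfectly. The function `naive_pin` and its word tables are hypothetical, written only for this illustration. It keeps every number it hears, in order, and has no idea that "is it six? no. Eight?" were guesses that got retracted.

```python
import re

UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TEENS = {"ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13,
         "fourteen": 14, "fifteen": 15, "sixteen": 16, "seventeen": 17,
         "eighteen": 18, "nineteen": 19}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

UTTERANCE = ("My pin number? 376 uhhhhhh...Forty-two thirteen..."
             "aaaaaaaaaaaand...is it six? no. Eight?...oh! oh! sixty eight!")


def naive_pin(utterance):
    """Concatenate every number heard, ignoring hesitations and corrections."""
    tokens = re.findall(r"[a-z]+|\d+", utterance.lower())
    out = []
    pending_tens = None  # a "forty" still waiting for a possible "two"
    for tok in tokens:
        if pending_tens is not None:
            if tok in UNITS:
                out.append(str(pending_tens + UNITS[tok]))
                pending_tens = None
                continue
            out.append(str(pending_tens))  # a bare "forty" becomes 40
            pending_tens = None
        if tok.isdigit():
            out.append(tok)
        elif tok == "zero":
            out.append("0")
        elif tok in UNITS:
            out.append(str(UNITS[tok]))
        elif tok in TEENS:
            out.append(str(TEENS[tok]))
        elif tok in TENS:
            pending_tens = TENS[tok]
        # Anything else ("uhhhhhh", "no", "oh") is silently dropped.
    if pending_tens is not None:
        out.append(str(pending_tens))
    return "".join(out)


print(naive_pin(UTTERANCE))  # prints 37642136868
```

A human listener would presumably settle on 376 42 13 68 and throw away the stray "six" and "Eight". The sketch keeps both, which is the whole problem: getting from there to what the speaker meant takes an understanding of the sentence, not just better hearing.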
Hey freaks: now you're ju
2020: Computers everywhere are controlled by grunts, moans, sighs, and snorts.
2040: Computers are finally small enough that they're all embedded into our environments, but neural interfaces don't work, so we still grunt and snort into our computers, but it looks like we're just grunting and snorting in general. People use computers exclusively, and never talk to one another; thus, language is lost and we just grunt and snort a lot.
2060: Aliens visit hoping to find intelligent life, but instead find a bunch of snorting, grunting apes. They leave.
-- "Those who cast the votes decide nothing. Those who count the votes decide everything." -Joseph Stalin