IBM Strives For 'Superhuman' Speech Tech

Which ... by spiny · 2006-01-24 21:36 · Score: 3, Interesting

Which witch blew the blue candle out ?

--

Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.

Re:Which ... by jakeweston · 2006-01-24 21:46 · Score: 3, Funny

To wreck a nice beach...
Re:Which ... by cs02rm0 · 2006-01-24 21:48 · Score: 1

The one that understood context.
Re:Which ... by lahvak · 2006-01-24 21:49 · Score: 1

Not really a problem. Machine translation can already handle many words that are spelled the same but have different meanings (homographs), based on context and position in the sentence. With speech recognition, you just have more of those; you have to throw in homonyms, too.

For a simple example, blue in "the blue candle" cannot be a verb.

--
AccountKiller
Re:Which ... by prSpectiv2 · 2006-01-24 22:12 · Score: 1

But if the witch simply 'blew blue candles well'...

--
Nice guys don't finish last. In reality, they're abducted halfway through the race.
Re:Which ... by soulctcher · 2006-01-24 22:25 · Score: 1

It would still follow simple context rules of verb - adjective - noun.
Re:Which ... by paedobear · 2006-01-24 22:32 · Score: 1

As someone who works in his non-native language in the world of high-tech, I'd love to see the miracle context-aware machine translation software you speak of.
Re:Which ... by jcupitt65 · 2006-01-24 22:41 · Score: 5, Interesting

Or I can wreck a nice beach versus I can recognise speech.
Sometimes you need rather a large context to disambiguate: is this sentence part of a discussion on shore-front management, or spoken language understanding?
Re:Which ... by FirienFirien · 2006-01-25 00:33 · Score: 1

I agree with the parent, but will take it one step further:

So do we. I can recognise the differences and meaning of "Which witch blew the blue candle" when written, but if someone said it to me out of the blue (npi), I'd have to think through it a couple of times to parse it. If it were said as intended, with matching sounds, so that I had to rely entirely on context inside the sentence to decipher which word is which, I'd have as much of a problem as a computer. The semantic rules I was taught as a child are what enable me to understand the sentence; the exact same semantic rules, in a computer, will most likely parse faster than I will.

Teletext subtitles are a good example of this; if watching a live news program (in the UK, dunno what it's like in the US) the subtitles are transcribed by a typist. While I would have expected them to straight copy the prompter words to subtitles, occasionally the newsreader speeds up (running out of time, or whatever reason) and the typed text starts getting typos and - here's the key - phonographical errors. Again, you can usually pick it out as being incorrect by context, and reparse it correctly and move on. The exact same mistakes are those being noted in this topic as those experienced by speech recognition software.

--
Browsing with +2 to insightful posts and a higher threshold makes the average post seen seem a lot more ingenious
Re:Which ... by RossumsChild · 2006-01-25 01:37 · Score: 1

FWIW: this is a problem of enunciation, not the limitations of technology.

Recognize has a G and a Z in it. And 'B' and 'P' are (subtly) different sounds. The computer cannot be blamed for being unable to translate a language that the user isn't speaking correctly.
Re:Which ... by mwood · 2006-01-25 01:59 · Score: 2, Insightful

Just remember that *you* have a truly enormous and well-filled content-addressable memory, a huge and richly-connected semantic network, and untold numbers of self-adapting heuristics that have been trained all day every day for decades, with more coming into production constantly. It's hard for a machine to match that. Feeding 100,000 distinct pattern matchers in parallel is something most computers just aren't architected to do well. That a machine can do even a passable job of speaker-independent continuous speech recognition is an amazing achievement.

BTW what Teletext is like in the U.S. is that we don't have it. :-( We do have titling on some shows, but to compare that to Teletext is like comparing a single couplet to the poetry section of a library.
Re:Which ... by The+Spoonman · 2006-01-25 03:03 · Score: 1

Mary was merry when she got married. Sorry, couldn't resist. There's a slim portion of the population to whom that sentence makes no sense. They hear those three words exactly the same, regardless of speaker.

--
Which is more painful? Going to work or gouging your eye out with a spoon? Find out!
http://www.workorspoon.com
Re:Which ... by Helios1182 · 2006-01-25 03:14 · Score: 1

Part of Speech Tagger -> Word Sense Disambiguation. It should work on a sentence like that.
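
A rough sketch of the tagging half of that pipeline, in Python (no real word sense disambiguation here). Everything in it is an assumption made for illustration: the sound labels WICH and BLOO, the tiny lexicon, and the hand-written table of allowed tag transitions. A real tagger would be trained on annotated text rather than hard-coded.

```python
from itertools import product

# Hypothetical lexicon: each heard sound maps to (spelling, part-of-speech) candidates.
LEXICON = {
    "WICH": [("which", "DET"), ("witch", "NOUN")],
    "BLOO": [("blew", "VERB"), ("blue", "ADJ")],
    "the": [("the", "DET")],
    "candle": [("candle", "NOUN")],
    "out": [("out", "PART")],
}

# Hypothetical toy grammar: tag pairs that are allowed to sit next to each other.
ALLOWED = {
    ("START", "DET"), ("DET", "NOUN"), ("DET", "ADJ"), ("ADJ", "NOUN"),
    ("NOUN", "VERB"), ("VERB", "DET"), ("NOUN", "PART"),
}


def tag_and_spell(sounds):
    """Return every spelling whose tag sequence obeys the toy grammar."""
    results = []
    for candidate in product(*(LEXICON[s] for s in sounds)):
        tags = ["START"] + [tag for _, tag in candidate]
        # Keep the candidate only if every adjacent tag pair is allowed.
        if all(pair in ALLOWED for pair in zip(tags, tags[1:])):
            results.append([word for word, _ in candidate])
    return results


heard = ["WICH", "WICH", "BLOO", "the", "BLOO", "candle", "out"]
for spelling in tag_and_spell(heard):
    print(" ".join(spelling))
```

Under this toy grammar only one of the sixteen possible spellings survives, because a determiner cannot be followed by a verb and a noun cannot be followed by an adjective. A sentence like "blew blue candles well" would need a richer transition table.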
Re:Which ... by Squalish · 2006-01-25 04:18 · Score: 2, Interesting

The computer is being programmed with the goal of understanding the user, not some arbitrarily defined 'perfect speech' dialect/accent.

--
People in Soviet Russia, however, appear to be afflicted with amusing juxtapositions of the aforementioned situation
Re:Which ... by kryonD · 2006-01-25 05:34 · Score: 2, Insightful

Don't hold your breath on that. After spending seven years studying Japanese just to speak it conversationally, I can tell you flat out that there will never be on the fly translations between Japanese and English. Why you ask? Because the languages and cultures behind the languages are so drastically different, you often have to listen to several sentences before you can organize the correct context for words in the other language. Not to mention occasionally having to add material in the translated output to explain why a certain sequence of words means something.

For example, go watch Memoirs of a Geisha and note that Chiyo keeps calling Mameha "oneesan" (Oh-Nay-San) which literally and figuratively translates to big sister. They are not related, and it is not an affectionate reference that someone might make in English to an older woman who provides protection and guidance. The term actually holds a special meaning in the Japanese world of Hostessing (both Geisha and less formal such as snack bars) that I would find difficult to even explain in English. Good luck IBM.

--
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Re:Which ... by Anonymous Coward · 2006-01-25 06:37 · Score: 1, Interesting

Which witch blew the blue candle out

Computer: In my probability based language model, "*empty* Which witch" occurs more than "*empty* witch witch", "*empty* witch which", or "*empty* which which". Therefore I will assume "which witch".

My point is that computers, like human beings (shockingly enough), use contextual information. Assuming they don't is assuming dumb programmers, low computer resources, or not much interest in the problem. All 3 assumptions are wrong (given a relative definition of 'low')
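
A minimal sketch of the kind of bigram scoring described above, in Python. The counts, the sound labels (WICH, BLOO) and the smoothing constant are all invented for illustration; a real recogniser would estimate probabilities from a large corpus and search over far more hypotheses than this toy table allows.

```python
from itertools import product

# Hypothetical bigram counts ("<s>" marks the start of the sentence).
BIGRAM_COUNTS = {
    ("<s>", "which"): 50, ("<s>", "witch"): 2,
    ("which", "witch"): 8, ("witch", "which"): 1,
    ("which", "which"): 1, ("witch", "witch"): 1,
    ("witch", "blew"): 5, ("witch", "blue"): 1,
    ("which", "blew"): 1, ("which", "blue"): 3,
    ("blew", "the"): 20, ("blue", "the"): 1,
    ("the", "blue"): 15, ("blue", "candle"): 6,
    ("candle", "out"): 4,
}

# Hypothetical mapping from a heard sound to its possible spellings.
HOMOPHONES = {"WICH": ["which", "witch"], "BLOO": ["blew", "blue"]}


def score(words):
    """Multiply smoothed bigram counts along the word sequence."""
    total = 1.0
    prev = "<s>"
    for word in words:
        # Add a small constant so unseen pairs are unlikely, not impossible.
        total *= BIGRAM_COUNTS.get((prev, word), 0) + 0.1
        prev = word
    return total


def decode(sounds):
    """Pick the spelling sequence with the highest bigram score."""
    options = [HOMOPHONES.get(s, [s]) for s in sounds]
    return max(product(*options), key=score)


heard = ["WICH", "WICH", "BLOO", "the", "BLOO", "candle", "out"]
print(" ".join(decode(heard)))
```

With these invented counts the top-scoring spelling is "which witch blew the blue candle out"; different counts could of course produce a different winner.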
Re:Which ... by lahvak · 2006-01-25 07:18 · Score: 1

I have seen a couple of those. Back in the 1980s, a group from the Institute of Mathematical Linguistics at Charles University in Prague was developing two such systems, one for translation from English to Czech and one from Czech to Russian. I worked on the second one. The system analyzed the structure of each sentence, using grammar rules supported by a dictionary, then translated it to a sort of internal representation, which was then translated (again using a bunch of grammar rules supported by a dictionary) into the target language. At that time, it did not keep track of context between sentences, but it was certainly capable of deciding whether a word was a noun, a verb, etc. in many cases using the position and/or context within each sentence. Even if blue and blew were spelled the same, the system would have no trouble distinguishing them in the sentence we were discussing. The system at that time was certainly not capable of translation on the fly, but it was used quite successfully for translating computer manuals and other technical texts.

--
AccountKiller
Re:Which ... by mehu · 2006-01-25 09:04 · Score: 1

...not to mention languages like Chinese in which literally EVERY WORD has dozens of homonyms. Hell, first day of class when I took it way back, we learned 4 different meanings for the sound 'ma', and 3 of those used the same character radical- I'm sure there are plenty more.

And then there's Japanese, where the verb is always last, and subjects (or just about anything else, actually) are frequently just plain omitted. A person can tell what's going on based on context, but without some seriously advanced AI, I highly doubt a machine could do the same.
Re:Which ... by Mattcelt · 2006-01-25 10:35 · Score: 1

A computer's mastery of a language is almost identical to a human's. The methods for learning languages are hierarchical... you can learn things out of order (for instance if you're only interested in learning the written language), but for the most part, you must learn things in order. Based on my experience (which is not scholarly, but is rather extensive), here is the basic hierarchy for full mastery of a language:

Phonemes
-the sounds of the language

Diction/Accent
-how the sounds are pronounced in different situations

Vocabulary
-the words themselves

Grammar
-this and vocabulary tend to be complementary and are often learned together

Context
-What does each word mean in a given structure?

Inflection
-or other vocal cues not distinctly related to the language itself

Construction
-how many ways can you convey a particular idea? e.g., "Who are you?" vs. "Now who might you be?"

Sociology
-What social clues or second meanings are there? E.g., double entendres, puns, etc.

History
-What historical or background clues are there? E.g., non-literal meanings: having a "loose tongue" isn't to be taken literally.

Computers have finally gotten up through grammar and are now working on context. This is an important step, but it is truly a long way from mastery of any language. While it shows significant progress, it's still a loooooong way from "superhuman".
Re:Which ... by Sarisar · 2006-01-25 20:25 · Score: 1

Darmok and Jalad at Tanagra. Shaka...when the walls fell. Zinda - His face black, his eyes red. Temba, his arms open.

Of course even that is a cultural reference which only trekkies (er trekkers or whatever it is now) will understand. I doubt the IBM program would get it.
Re:Which ... by mehu · 2006-01-25 23:53 · Score: 1

Yeah, but given at least e.g. 80 characters with the reading 'shi' and only 4 possible tones, a large number of them still ARE homonyms, tone & all. Multi-character compounds might help narrow things down a bit, but there's still a lot of room for error.

And yes, Japanese has the homonym problem as well- was that 'seikou' referring to a bullseye, success, political platform, steel manufacture, affection, sex, etc.? Was that guy talking about paper, hair, or god (all 'kami')? Not to mention the DOZENS of different ways that first names can be written. But as for the multiple readings, that would only be a problem for reading, not for speech recognition.

Coherency? by PrinceAshitaka · 2006-01-24 21:38 · Score: 4, Insightful

From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"

Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best Machine translations of text I have seen always leaves the text a bit garbled and confusticated.

I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real time translation for the following reason. Sentences in different languages have different sentence structures. While in English the verb is usually the second part, in other languages the verb often comes last (German). For the translator to get the second word of a sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.

--
quis custodiet ipsos custodes

Re:Coherency? by Yahweh+Doesn't+Exist · 2006-01-24 21:48 · Score: 3, Interesting

yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.

since even "live" broadcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real-life it will be effectively real-time anyway for almost all practical purposes.
Re:Coherency? by grimJester · 2006-01-24 22:16 · Score: 1

In what cases is a four minute delay noticeable if the picture and sound are delayed four minutes too? I'd love this for watching movies that are currently completely incomprehensible to me.

For the 80% part, it's good enough to get the gist of what is said. It won't compete with professional human translators, but it will make translation easily available for those who don't have access to a translator.
Re:Coherency? by sumdumass · 2006-01-24 22:29 · Score: 2, Funny

I'm wondering if this was used during the lead-up to Iraq? "I'm unclear if there are bombs here" ends up getting translated into "there are nuclear bombs here".
Re:Coherency? by wizrd_nml · 2006-01-24 22:37 · Score: 2, Informative

For the translator to get the second word of a sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.
Not necessarily. An on-the-fly translator could translate words as it hears them, filling in the translated words in the correct location in the sentence. In other words, the sentence doesn't have to be completed in order. It can dynamically expand to fit in new words.
If you listen to human translators doing on-the-fly translation you'll see this is how they work.
Re:Coherency? by dancallaghan · 2006-01-24 22:40 · Score: 3, Interesting

but I personally don't think there could ever be real time translation for the following reason. [German]

You are going to have that problem whether it's a machine doing the translating or a human. As I understand it, interpreters of German get around this by some quick-thinking restructuring of the translated sentence, or they simply lag a half-sentence or so behind.

The real problem for machine translation is, and always has been, determining the sense of a word from context (indeed I recall a recent Slashdot article about some guy who suggests this is the separating factor between computers and animal intelligence). Most languages have a great many homonyms whose meaning a listener can determine only from the surrounding context and, often, general background knowledge of the language or topic at hand.
Re:Coherency? by MaxiumMahem · 2006-01-24 22:59 · Score: 1

I don't think 20% inaccuracy will be a problem. One of the great capacities of the human mind is to develop correct inferences from limited information. Of course the developers should always strive to do better, but being able to understand 4 out of every 5 words is probably enough for someone to grasp the meaning of a phrase, especially within a larger context of information.
And it's not as if people achieve 100% accuracy, even in their own languages. We constantly misread and mishear things, yet somehow communication manages to function. Indeed, I suffer from mild dyslexia and often misread words in books, usually without even realising it. I would guess I read at probably only 90-95% accuracy, but comprehend at close to 100% anyway; it would be interesting to test this (any psychologist present?). Human translators obviously do much worse than a person in their native language does. So the computer may be coming pretty close.
Lastly, grammar is probably the least of the issues for the program. Languages have their own code, complete with important bits of meta-information like conjugations and articles to tell what bit means what. People have trouble dealing with new/different rule concepts, probably due to the ingrained way we learn languages, but they are easy for machines. Translating from one set of grammar rules to another is pretty easy mechanically. The bigger issue for them is to actually understand what is said.
Re:Coherency? by Fruit · 2006-01-24 23:48 · Score: 1

Actually the verb comes last in relative clauses, which also happens to be the base word order for German. In main clauses a bit of movement causes the verb to end up as the second word of the sentence.
Re:Coherency? by kklein · 2006-01-25 00:07 · Score: 1

This is abysmal. Dr. Paul Nation and others have found that even if a non-native speaker understands 95% of the words in a text, he cannot accurately guess the remaining 5% and comprehension will suffer greatly. This isn't hard to imagine being that even at 95% text coverage, 1 in every 20 words will be unknown. Even if they bump this up to 80%, that is still useless. That's 1 in 5 words!
Granted, I'm pulling from my research in a different field (second language acquisition), but the concepts are the same. What is even worse is if this is WRONG 40% of the time! Then that isn't just MISSING information, it is INCORRECT information! This is utterly useless and probably represents more of an impediment to coherence than an aid.
I'm a geek, and I know we'll get computers to speak fairly authentic language in my lifetime, but as a linguist moving into cognitive science, I'm telling you all it's gonna be awhile. We barely know how WE do it, let alone make it happen with a completely different system architecture!
Re:Coherency? by Anpheus · 2006-01-25 00:53 · Score: 1

I think you misinterpreted the meaning of 'accuracy' used here. What they're saying is, their software will provide a sentence correctly approximately 60 to 80% of the time, and on other occasions, it may make errors in the context of certain words. That isn't to say it won't understand them, or that it will leave spots in the sentence blank (just to irk you!), but rather, it will simply make errors that we would call human errors.
Re:Coherency? by somersault · 2006-01-25 02:11 · Score: 1

I'd just like to point out that translators would likely be as good as a person speaking in their own native language (or better if, as you point out, most people don't even achieve 100% accuracy in their own language; just look at a few slashdot posts and notice all the spelling and grammar mistakes.. :p mine included sometimes!).

In lots of countries children are brought up learning other languages from an early age (most Germans are taught German, French and English I believe), and also you just get some people who are well equipped to learn languages and love doing it - translators probably come into this category (though presumably sometimes you just have to use the average person who knows both languages rather than someone who studied to be a translator). Just thought I'd point out that there is a difference between those who learn foreign languages in school and find it difficult (mostly because, at least in the UK, we aren't taught from a young enough age I'd say..), and those who actually study to be professionals in a foreign language.

Many people have enough trouble getting a job in their own country - just imagine how crazy it would be if you had to find a job where you didn't use your native language! (I can program in several languages but I don't think that counts :p). Though when you learn a language properly it does become 2nd nature, so maybe it isn't so amazing; just to a single language speaker like us, it seems to be.

--
which is totally what she said
Re:Coherency? by somersault · 2006-01-25 02:17 · Score: 1

You don't necessarily have to know how the brain does something to duplicate the results. Computers can replicate 'smart' decisions by using pure processing power, which can be increased through heuristics. For specific tasks you can usually get computers to have great results, i.e. if they trained up this system for use on the news then it would likely have a lot better accuracy than if they used it for every single program on TV (news usually seems to me to be about death and destruction and would use similar language most of the time - maybe the translator would be thrown off by news about Timmy being rescued from a well by Lassie).

I wouldn't think that the inaccuracy from the computer would be because of lack of knowledge of words; it would be more because of sloppy pronunciation etc. from the person speaking. But yes, it's hard for a computer to fill in the gaps when it comes to language, considering the range of possibilities. But then again, when you limit the system to a single domain, like news broadcasting, you can reduce the possibilities to such a level that the computer will be able to guess accurately what it missed. And being given 4 minutes to think about and analyse things certainly gives it time to check the possibilities..

--
which is totally what she said
Re:Coherency? by vertinox · 2006-01-25 02:40 · Score: 1

I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real time translation for the following reason...

Still, the only thing faster or just as fast is a human translator for real time translation. Even then it is more or less based on the skill of the person doing the translating.

--
"I am the king of the Romans, and am superior to rules of grammar!"
-Sigismund, Holy Roman Emperor (1368-1437)
Re:Coherency? by foo+fighter · 2006-01-25 03:13 · Score: 1

Try watching BBC World's newscast on PBS in the evening, muted with closed captioning on. They obviously have an automated speech-to-text system working in the background and it's terrible.

The News Hour with Jim Lehrer is much better, and I guess that they actually are paying humans to type instead of relying on an automated system.

--
obviously no deficiencies vs. no obvious deficiencies
Re:Coherency? by ArwynH · 2006-01-25 03:38 · Score: 1

You mean something like 'there is no clear bomb threat evident' and 'there is nuclear bomb threat evident'? :)
Re:Coherency? by Guspaz · 2006-01-25 06:15 · Score: 1

I agree, that accuracy is uselessly low. I mean, imagine this:

1) Take low-quality speech with lots of background noise
2) Run it through a bad voice recognition system
3) Run it through a bad translator
4) Run it through a bad text-to-speech engine

It would be a miracle if a human could even understand the output after so many steps of badness.

I'm not kidding either. I tried IBM ViaVoice five or six years ago. It was absolutely horrible and almost impossible to even train. Perhaps it has improved since then, but now we're talking about working with NO training.

Next, machine translators. Ever tried to translate anything with Babelfish or Google Language Tools? The results are barely comprehensible. I speak both English and French, (Quebec anglophone), and while I can read the source French text just fine, I can barely make sense of the English translation. Google is supposedly working on some statistical translation that will be worlds better, but they haven't shown anything publicly yet. Now, take the badly recognized speech and translate it badly.

Next, take our badly recognized badly translated text and run it through a text-to-speech engine. You know, the ones that sound perfectly natural and identical to a human for half the words and devolve into tonal nightmares for the other half.

This is doomed to failure with modern technology. As you said, 80% is horrible, and they're currently only at 60%. That is barely more than half the speech.
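
A back-of-the-envelope sketch of how errors compound across the stages listed above, in Python. It assumes each stage's errors are independent and that a word is only right if every stage gets it right, which is a simplification. The per-stage figures are placeholders, not measurements of ViaVoice, Babelfish, or any IBM system.

```python
from math import prod


def end_to_end_accuracy(stage_accuracies):
    """Chance a word survives every stage, assuming independent errors."""
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies, chosen only to illustrate the effect.
stages = {
    "speech recognition": 0.80,
    "translation": 0.80,
    "text-to-speech": 0.90,
}

overall = end_to_end_accuracy(stages.values())
print(f"end-to-end: {overall:.1%}")  # end-to-end: 57.6%
```

Under these made-up numbers, three individually tolerable stages leave well under two thirds of the words intact, which is the point about stacked badness.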
Re:Coherency? by Fratz · 2006-01-25 06:39 · Score: 1

Even the best Machine translations of text I have seen always leaves the text a bit garbled and confusticated.
Maybe this is because the input isn't grammatical and contains fake words. :)

--
-- Fratz, human
Re:Coherency? by kklein · 2006-01-25 11:26 · Score: 1

It's a good point, of course, and one I probably should have addressed better. But here's the major problem these systems always have: they aren't human.
Yes, it runs statistical analyses on the propositional content of the utterances it picks up, which is LIKE what we do (because the propositional content of human language output, esp. spoken output, is actually pretty sparse). It assumes a shared understanding of the world, a shared schema. This is actually one of the major problems behind communication with speakers of other languages anyway. Coming from different cultures, people often have vastly different schemas. So something that someone would say in one language, expecting to hit schemas resident and already active in the listener's brain, may hit absolutely nothing. Here's my newest favorite example, which is from my wife, a Japanese person:
Sore wa are da yo ne!
This translates, roughly, to "That is that thing over there, huh!"
Now, I speak pretty darn good Japanese, but whenever she spits this gem out, I get dizzy and fall over. Especially if it's in the middle of a disagreement. You have two deictic, demonstrative adjectives lacking any kind of antecedent. What she means is "that thing we were just talking about is an example of (whatever concept it is she's trying to tell me it's like)." But God knows what the concept is; I sure don't.
Now, Japanese is a mean example for this point, because it is just LOADED with propositionally empty phrases that assume shared schema. But every language has them, and not just in these little social phrases either. Take the word "crusade." This got Bush in trouble a couple years ago, because, yeah, it can mean a righteous struggle for good... Or it could call up images of a long-term Christian genocide of anyone who looked vaguely Arabic a few hundred years ago. It is likely to hit both schema equally and translating it is going to be difficult, even for a human interpreter.
Anyhoo, I'd love to fiddle with some of this software to confirm my suspicions that it's nowhere near ready for prime time, but until I can, I think I'm right to assume it doesn't work. That makes my computer and cognitive science armchair all the more comfy. "Fie!" and "fiddlesticks!" and "humbug!", I say!
Re:Coherency? by somersault · 2006-01-25 23:19 · Score: 1

heh.. well I think translating the actual words would be a good thing to start with - people can then take their subjective meaning for 'that thing' and 'crusade' and pick it up themselves. If a human can't work out for themselves what something refers to, then they're not likely to blame a computer for not being able to do so either..

actually being able to catch every single word that is being said definitively would be a great start (but again humans wouldn't even be able to do that all of the time!), then computers with massively parallel processors and lots of schemas could be used to infer context etc.. it's all very interesting I guess, though I'd find the context processing a lot more fun than the language processing, which I think would just be tedious.

--
which is totally what she said

first? by Anonymous Coward · 2006-01-24 21:39 · Score: 5, Funny

however the researchers stated "We still can't figure out what Bob Dylan is saying"

Re:first? by Orgazmus · 2006-01-24 21:42 · Score: 1

Bobs speck is totly legbl

--
The system had the verbosity of HTML combined with all the readability of compiled assembly viewed as bitmap images
Re:first? by Mr.+Bad+Example · 2006-01-25 03:32 · Score: 1

> however the researchers stated "We still can't figure out what Bob Dylan is saying"

It doesn't help that all the error messages come out as:

"Because something is happening here
But you don't know what it is
Do you, Mister Jones?"
Re:first? by bobdylan · 2006-01-25 05:35 · Score: 1

Hey man, it's like I once wrote "Don't criticize what you can't understand."

Together with my affected (and patent pending) Mumbling Singing Style, I was a guaranteed critical success before I ever picked up a guitar!

A great advance in technology! by themysteryman73 · 2006-01-24 21:39 · Score: 1

Reminds me of a Simpsons episode "Hello Homer, it's me, KITT from Knight Rider"

Seriously though, this is a great advance in technology, but will it still be as funny to listen to? It's always fun typing in words into speech recognition programs and listening to the unexpected results!

Re:A great advance in technology! by Steven_Lunn · 2006-01-27 01:45 · Score: 1

Yes, Jan Michael Vincent translated to January Michael Vincent

Nuances by AnonymousYellowBelly · 2006-01-24 21:43 · Score: 4, Funny

GB on TV: "We have prevailed"
Subtitle: "All your base are belong to us"

--
Disclosure: I'm stupid

Re:Nuances by argStyopa · 2006-01-25 02:10 · Score: 1

Not sure why this is rated as Funny (+5).
Sounds like a perfect translation success to me.

--
-Styopa

NSA Babelfish by Elixon · 2006-01-24 21:44 · Score: 2, Funny

I cannot wait until I buty the first eBabelfish gadget that I will put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that nobody will consider it "important technology for the national security" and restrict it by any means...

(I'm sure that this eBabelfish is already installed - not in my ear - but on the telecommunication centers...)

--
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.

Re:NSA Babelfish by sumdumass · 2006-01-24 22:32 · Score: 1

You don't want to understand what they are saying. I have heard them and just trust me on this.

BTW nice 'buty'
Re:NSA Babelfish by Elixon · 2006-01-24 23:26 · Score: 1

It makes me even more curious... But from now on they should be careful about what they say... they will never know who has headphones connected to an iPod and whose headphones are connected to an iPod w/ an "IBM Babelfish Inside" logo stuck on... :-)

( s/buty/buy/ => Sorry, I'm sometimes fighting with my notebook's keyboard...)

--
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.

Opensource? by Anonymous Coward · 2006-01-24 21:45 · Score: 1, Interesting

Will IBM make this technology public or will it be proprietary?

Re:Opensource? by omeg · 2006-01-24 22:22 · Score: 3, Insightful

Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.
Re:Opensource? by robyn217 · 2006-01-25 05:26 · Score: 1

When I interviewed Salim Roukos, the project lead for Tales, he said that a portion of the Tales code would be available as open source. However, I believe that in order for this "open source" portion of code to function as it does in the Tales project (i.e., translating TV on-the-fly), it needs some proprietary IBM software, and probably IBM consultants to integrate it. So, the real answer is: Yes, some of the technology will be open source, but it will come at a cost.
-robyn

Foreign languages are complex... by pubjames · 2006-01-24 21:52 · Score: 5, Insightful

I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.

This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.

Re:Foreign languages are complex... by Viol8 · 2006-01-24 22:00 · Score: 2, Insightful

"It's not until you learn another foreign language that you realise how complex languages are, and how subtle."

And how wierd sometimes. English for example loves to use the word "up" in all
sorts of unsuitable places:

give up
shut up
fed up
wash up
fuck up
laid up
muck up
turn up
free up
look up
make up
put up
screw up
hang up
wrap up
hold up
grow up

Wtf?

And how come we say "didn't he.." but in longhand it's "did he not...". Shouldn't
it be "did not he"? Why does the "not" shift to the other side of the pronoun?
But then all languages have similar wierd , illogical syntax.
Re:Foreign languages are complex... by MPHellwig · 2006-01-24 22:09 · Score: 4, Funny

And of course: "Up yours!" ;-)
Re:Foreign languages are complex... by Mushdot · 2006-01-24 22:16 · Score: 3, Interesting

I have a friend who works in Japan and he tells me the same. He often goes to watch English films that are subtitled in Japanese and tells me that they completely mistranslate most of the jokes and miss subtle nuances of speech. One example he gave was a scene from 'The Full Monty' (I'm doing this from distant memory so it might not be quite right - in fact, a bad translation :-)

One of the characters is shouting up to someone in their bedroom window. They don't respond to the shouting and the character says "He obviously can't hear me because of his triple glazing".

This is a sarcastic comment relating to the house owner's supposed wealth but in Japanese it was translated as:

"He has thick windows"

Perhaps in this case there was no easy way to translate - but I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only.
Re:Foreign languages are complex... by pubjames · 2006-01-24 22:21 · Score: 1

Another example of this I saw in a French film recently. A character was overhearing a conversation about a ship being under quarantine. He said "Is it the captain's birthday?" Makes no sense at all in English but in French it is a play on words and (feeble) joke. Impossible to translate.
Re:Foreign languages are complex... by pubjames · 2006-01-24 22:27 · Score: 1

I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only

I think it is more to do with the fact that they have to write the subtitles so that they can be read at the speed of the speech. And so they cannot go into subtleties. In fact often when there is fast dialogue they will miss whole phrases out.
Re:Foreign languages are complex... by virtualsid · 2006-01-24 22:45 · Score: 2, Insightful

I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

I'm not quite sure what you mean here not bother because of this technology?

I can't see anyone not wanting to bother learning a language because of this technology. Not unless it was a babelfish/universal translator type technology - i.e. basically invisible. In which case, what's the issue? ;-)

What are you going to do:
a) Walk around with a little device which translates with 60-80% accuracy when you're in a country where people speak a language you do not understand.
b) Try to learn the language so you don't have to rely on a gadget?

I think I know which one I'd choose - not that I can speak anything other than English, but I do try.

Once devices get to 100% accuracy, my argument disappears. I'd love for that to happen too :-)

Sid
Re:Foreign languages are complex... by pubjames · 2006-01-24 22:53 · Score: 1

I'm not quite sure what you mean here not bother because of this technology?

Perhaps you are not like most people... I often hear English-only speaking people say there is no point in learning another language because everyone learns English these days. This just gives them another excuse.
Re:Foreign languages are complex... by anum · 2006-01-24 22:59 · Score: 2, Insightful

Learning a foreign language is a net good and the only way to really understand another culture is to experience it. That said, there are a large number of languages and an even larger number of cultures. Do you intend to learn/experience them all?

Can you see no good in a rough translation for some purposes?

Calculators have largely eliminated the need (and in some cases the ability) for people to do basic math. Therefore we should eliminate calculators before these people start believing that they completely understand cube roots when they just know how to push buttons.

Oh yeah, that reminds me...Cartoons aren't real.

Good luck IBM and I hope this stuff becomes viable soon.

--
I don't think, Therefore I'm not.
Re:Foreign languages are complex... by pubjames · 2006-01-24 23:03 · Score: 1

Can you see no good in a rough translation for some purposes?

Of course.

But from the description I think this is being developed for military or intelligence work. In those fields, mistranslations can cause death. And unfortunately I think the current administration is unsophisticated enough to think that machine translation is better than (more expensive) human translation.
Re:Foreign languages are complex... by virtualsid · 2006-01-24 23:05 · Score: 1

I wrote:
I'm not quite sure what you mean here not bother because of this technology?

(I also can't write sense!)

You wrote:
Perhaps you are not like most people...

Perhaps you're right, perhaps I'm not like most people. In any case, this technology is not yet the kind that is useful to most people I believe.

I do think it's cool technology, but not really a cause for concern with languages.
Re:Foreign languages are complex... by Splab · 2006-01-24 23:30 · Score: 5, Funny

From The Boondock Saints:
Rocco: Fucking... What the fuck. Who the fuck fucked this fucking... How did you two fucking fucks...
[shouts]
Rocco: fuck!
Connor: Well, that certainly illustrates the diversity of the word.

Think that just about covers it...
Re:Foreign languages are complex... by bogado · 2006-01-24 23:30 · Score: 1

And of course: "Up yours!" ;-)

Well, in this particular case doesn't "up" mean what the word is supposed to mean?

--
[]'s Victor Bogado da Silva Lins
^[:wq
Re:Foreign languages are complex... by anum · 2006-01-24 23:33 · Score: 2, Interesting

Ya, I got ya'.

I almost added "I just hope GWB doesn't decide to fire all his intel linguists based on this post" but it seemed kind of like bashing the Prez and I would never do that...

Cheers

--
I don't think, Therefore I'm not.
Re:Foreign languages are complex... by Red+Alastor · 2006-01-24 23:38 · Score: 1

Do you remember the joke? I speak French and I can't figure out what it originally was.

--
Slashdot anagrams to "Sad Sloth"
Re:Foreign languages are complex... by pubjames · 2006-01-24 23:50 · Score: 1

The (stupid) character assumed that the captain was having a fortieth birthday party - forty being "quarante" in French, so a "quarantaine" sounds a bit like a word for a fortieth birthday party. I said it was feeble. But it is an example of a joke that's impossible to translate.
Re:Foreign languages are complex... by Archibald+Buttle · 2006-01-25 00:30 · Score: 1

There's a really simple reason why film subtitles omit jokes and get things wrong. It is almost never possible to directly translate from one language to another, so subtitles inevitably have to be an approximation of the original speech in order to help match the pacing of the original film. They also have to not be too wordy, since the viewer needs to watch the film, as well as read the subtitles.

Language is about more than just words, it's about phrases too. A speaker's choice of words and phrases adds an element of communication to their speech beyond the underlying message they're trying to get across. Film subtitles cannot hope to convey anything but the basics and even printed texts suffer when translated.
Re:Foreign languages are complex... by Red+Alastor · 2006-01-25 01:05 · Score: 1

That's far-fetched, since the right word would be "quarantième" and not "quarantaine". Besides, it's hard to make a sentence that doesn't make it obvious that it's the ship and not the captain who's the subject.

An example of English / French I saw in a movie :

- Yeah but...
- What about my butt ?!

I don't remember at all how they translated that :)

--
Slashdot anagrams to "Sad Sloth"
Re:Foreign languages are complex... by ChunderDownunder · 2006-01-25 01:29 · Score: 1

Well at least with subtitled movies you have some hint of the original subtext.
Worse is when films are dubbed so vocab needs to be matched to lip movements.
Or when actors of a different nationality to their characters are cast merely because they speak English.
Then there's news bulletins and documentaries where an English translation is loudly superimposed over the top of the native speaker. You can still hear the tones of the person they're interviewing but it's drowned out due to the translation.
All because television programmers dumb down content because the average television viewer is perceived to be "too stupid" to concentrate on subtitles. I'll take subtitled content any day...
Re:Foreign languages are complex... by Julian+Morrison · 2006-01-25 01:30 · Score: 1

I don't necessarily agree. Like most tech it's a tool - the task is up to the user. I find that fansubbed anime helps my Japanese. I'm picking out words and grammar from the flow of speech and simultaneously matching them against the translation. Often I can actually pick out where the translation was fudged or the subtleties were left out. Without the feedback from the subtitles, I wouldn't have that yet.

On the other hand, there are cases where I just want to read something quickly, and putting the page through Google translate is a whole lot better than staring at a page of squiggles.
Re:Foreign languages are complex... by makomk · 2006-01-25 01:52 · Score: 1

Then there's news bulletins and documentaries where an English translation is loudly superimposed over the top of the native speaker. You can still hear the tones of the person they're interviewing but it's drowned out due to the translation.

OT, but that reminds me - I saw a docudrama (in English) a while back where you could tell the documentary bit from the drama bit because the actors spoke in a foreign language and were subtitled, while the actual interviewees who spoke in a foreign language had a spoken translation superimposed. That makes no sense!
Re:Foreign languages are complex... by mwood · 2006-01-25 02:22 · Score: 1

"Did he not" and "did not he" both work.

But when did you last see anyone write either of these forms?
Re:Foreign languages are complex... by mwood · 2006-01-25 02:31 · Score: 1

Time to check out that Asimov story about a society where mechanical computation was so pervasive that people no longer learned arithmetic. "The Feeling of Power"
Re:Foreign languages are complex... by mwood · 2006-01-25 02:35 · Score: 1

Read the English translation of Lem's _Cyberiad_ before you tell us how impossible it is to translate humor. I'll buy the time-to-read argument, though.
Re:Foreign languages are complex... by milimetric · 2006-01-25 03:19 · Score: 1

Actually, I think it'll facilitate learning foreign languages much faster than we do right now. The biggest waste of time is hearing something, having to trudge through a dictionary to find what it means and then hearing the next thing. This way, you'll have streaming translations at your command so you can learn faster. Of course, lazy people would never learn another language in the first place but I'll be using this as a tool, not a substitute.
Re:Foreign languages are complex... by metlin · 2006-01-25 04:28 · Score: 1

It's spelt weird, not wierd. =)
Re:Foreign languages are complex... by Viol8 · 2006-01-25 04:32 · Score: 1

They might both work but no one has said the 2nd form probably since
Chaucer. But you see the 1st form quite often in more literary
writings.
Re:Foreign languages are complex... by poot_rootbeer · 2006-01-25 04:33 · Score: 1

I don't think TV and movie subtitling is going to be the primary application of this technology. Which is more cost-efficient to a typical TV station or film distributor -- buying one of IBM's big heavy translation computers and support to keep it running, or hiring a dozen bilingual humans to do translations the old-fashioned way?

I expect the spooks (CIA, NSA, etc.) to be the big customers, at least in the near-term. There's countless hours of recorded audio of terrorists or suspected terrorists talking to each other, and not enough time to translate them all by ear. Speech recognition, even if not flawless, will help the intelligence community separate the wheat from the chaff, freeing up time for the translators to work on the most critical cases instead of the mundane and the dead-ends.
Re:Foreign languages are complex... by neibwe · 2006-01-25 12:34 · Score: 1

That's why I like fan subs with their long parenthetical explanations (Ghost in the Shell Standalone Complex.) VH1-pop up video or Starship Troopers "Would you like to learn more" (?) forms of subtitles would be interesting (esp. if we were allowed to toggle them on and off.) ...and what ever happened to the old text messaging for sarcasm...?
Re:Foreign languages are complex... by neibwe · 2006-01-25 12:38 · Score: 1

I had a LessThanSign 's' GreaterThanSign for "sarcasm" that got filtered out. >=L
Re:Foreign languages are complex... by ceoyoyo · 2006-01-25 17:55 · Score: 1

i before e, except after c or when it says eh, as in neighbor and weigh. Oh yeah, and weird is weird.

Ghee... by Anonymous Coward · 2006-01-24 21:54 · Score: 4, Insightful

Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?

Re:Ghee... by forgotten_my_nick · 2006-01-25 00:21 · Score: 1

> cough cough echelon cough

Funny you should mention that. I recall a US government department set up just after 9/11 which one of the things it would be working on was a handheld device that could translate from English to Arabic on the fly.

Only reason I recall this is because the logo of said department was the all seeing eye shining some kind of beam over the rest of the world. Perhaps someone with a better TFH than me has a link. :)
Re:Ghee... by amliebsch · 2006-01-25 02:35 · Score: 1

You're probably thinking of the (now-defunct?) Information Awareness Office.

--
If you don't know where you are going, you will wind up somewhere else.
Re:Ghee... by SchwarzeReiter · 2006-01-25 04:22 · Score: 2

Man, if IBM markets this in 2006, the NSA has had it working since 2000
Re:Ghee... by benjamindees · 2006-01-25 14:16 · Score: 1

That department was around long before 9/11 ;)

--
"I assumed blithely that there were no elves out there in the darkness"

Re:Just what we need... by pubjames · 2006-01-24 21:55 · Score: 4, Insightful

More opportunities for Arabic speaking people to misinterpret western media.

I think you've got it the wrong way round haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

If they REALLY want to test it properly... by Viol8 · 2006-01-24 21:56 · Score: 4, Funny

...they should send it to Glasgow on a Saturday night just after the pubs
have closed.

"Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll
shoo ye!"

Re:If they REALLY want to test it properly... by LiquidCoooled · 2006-01-24 22:02 · Score: 1

Clippy: "It looks like your having a seizure, would you like me to call an ambulance?"

--
liqbase :: faster than paper
Re:If they REALLY want to test it properly... by GSV+Ethics+Gradient · 2006-01-25 01:19 · Score: 1

Clippy: (Interrupting himself) "It looks like you meant 'you're', not 'your'. Please learn to spell..." ;-)

Available with old version of Mandrake Linux by yamum · 2006-01-24 21:58 · Score: 1

ViaVoice was shipped with an older version of Mandrake Linux.

Anyone know where I can get this from?

It isn't worth it by YearOfTheDragon · 2006-01-24 22:00 · Score: 5, Funny

Maybe IBM is going to make speech recognition a reality, but Bill Gates said that this was possible a long time ago. Simply genius.

--
-= If you fight Dragons long enough, you will become a Dragon =-

On-The-Fly by Trurl's+Machine · 2006-01-24 22:02 · Score: 4, Informative

They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?

I have read a lot of auto-translated documents and it is always good for a laugh in terms of "crapslation cabaret". So far, there is no technology that could auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into a good context. A "fly" translated as an insect would be counted as a success!
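
To make that distinction concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the toy vocabulary, the senses, and both scoring functions are assumptions, not how any vendor actually computes its figures): one score counts words that were merely found in the vocabulary, the other counts words whose chosen sense matches the intended one.

```python
# Toy illustration only: the vocabulary, senses and sentence are invented.
VOCAB = {
    "on": ["upon"],
    "the": ["the"],
    "fly": ["insect", "in real time"],
}

def coverage(words):
    """Fraction of words that have any entry in the vocabulary."""
    return sum(w in VOCAB for w in words) / len(words)

def sense_accuracy(chosen, intended):
    """Fraction of words whose chosen sense equals the intended sense."""
    hits = sum(c == i for c, i in zip(chosen, intended))
    return hits / len(intended)

words = ["on", "the", "fly"]
# A naive system always picks the first listed sense for each word.
chosen = [VOCAB[w][0] for w in words]
intended = ["upon", "the", "in real time"]

print(coverage(words))                   # every word was "found"
print(sense_accuracy(chosen, intended))  # but "fly" got the wrong sense
```

The first number looks perfect because every word has a dictionary entry; the second drops because "fly" came out as the insect.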

Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to make a ridiculous faux-pas. Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...

Re:On-The-Fly by coofercat · 2006-01-24 22:39 · Score: 1

Doubtless what they're saying is over-stretched hype. However, the application of speech recognition to translation to natural language processing makes for some interesting stuff.

The problems you outline happen in English -> Canadian (and probably American too), let alone more complex translations (try calling a Canadian a 'native' - doesn't tend to go down well, but it's normal fare in the UK).

However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did - they fed in oodles of texts in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (eg. 'he's kicked the bucket'(English/UK) == 'he broke his pipe' (French/FR)).
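
A minimal sketch of that lookup-table idea in Python, purely as an illustration: the table layout, the locale codes, and the `translate_idiom` helper are hypothetical, and a real statistical system would learn phrase pairs from large corpora rather than from a hand-written dictionary.

```python
# Hypothetical idiom table; the entries are illustrative, not from any real system.
IDIOMS = {
    ("en-GB", "fr-FR"): {
        "he's kicked the bucket": "il a cassé sa pipe",
    },
}

def translate_idiom(phrase, src, dst):
    """Return the idiomatic equivalent, or None if the phrase is not in the table."""
    table = IDIOMS.get((src, dst), {})
    return table.get(phrase.lower().strip())

# Known language pair: the idiom is swapped for its counterpart.
print(translate_idiom("He's kicked the bucket", "en-GB", "fr-FR"))
# Unknown pair (no table for this locale): the lookup returns None.
print(translate_idiom("He's kicked the bucket", "en-GB", "fr-CA"))
```

Keying the table on the (source, target) locale pair rather than on the language alone leaves room for regional variants, at the price of one table per pair.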

Of course, the only way to really get this going is with natural language processing. At the moment though, computers can (AFAIK) only understand things they're expecting, as opposed to understanding anything and then reproducing it in another language. A way to go there, but I'm sure IBM are on it already... Natural Language processing has to evolve with the language, so it's always a bit of a moving target, and hard to do, because the kids keep inventing new versions of the language ("naar wot I'm sayin'?").

What are the dangers of seeing this in the wild anytime soon? Very slim, I'd say. Of course, they may release the raw speech-to-text engine as a binary, but the rest of it is experimental at best, and currently has enormous amounts of R&D budget absorbed into it (and NL will probably be on subscription). You may be able to buy it as a service sometime though, I guess...?
Re:On-The-Fly by Aceticon · 2006-01-24 23:34 · Score: 1

Portuguese is spoken in both Portugal and Brasil.

Still, for example, the slang word used in Portugal for "traffic jam" (bicha) is the slang word in Brasil for "gay".

Talking about the congestion on the streets of Lisbon takes a whole new meaning in Brasil.
Re:On-The-Fly by Red+Alastor · 2006-01-24 23:48 · Score: 2, Insightful

However, add in "domain knowledge" and you're in some interesting territory. I think this is essentially what Google did - they fed in oodles of texts in the various languages so that the system could statistically match phrases. At a simple level, you could have a lookup table of common colloquialisms (eg. 'he's kicked the bucket'(English/UK) == 'he broke his pipe' (French/FR)).
The problem is that while French/FR people will understand the expression, others like French/CA won't. And even if they did special lookup tables, you'll still miss subtlety. For instance, if I want to use the expression you gave as an example as a warning to someone in French/CA, I could say "You'll break your neck." which would carry the same meaning. But if I say that someone broke his neck, then it should be understood literally.

--
Slashdot anagrams to "Sad Sloth"
Re:On-The-Fly by blackest_k · 2006-01-25 00:11 · Score: 1

Machine translation is ropey, admittedly, but one of the best for Polish-English translation is
English Translator3 www.techland.pl
Earlier versions didn't know the difference between a shower of rain and taking a shower, for instance, although you still need to take care with Polish and polish: the capital P makes a difference.
It does provide alternative translations so you can do a basic translation and apply a more appropriate translation.
It's getting old now so perhaps there has been an update.

--
Blarney Quality Restaurant, Plants
Re:On-The-Fly by Cro+Magnon · 2006-01-25 01:56 · Score: 1

Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...

Hmmm, it might be tolerable if the Czech took the word for an adjective. OTOH, if he took it for a verb... *snicker*

--
Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.
Re:On-The-Fly by Bohiti · 2006-01-25 05:11 · Score: 1

Talking about the congestion on the streets of Lisbon takes a whole new meaning in Brasil.

..you mean Lithbon?
Re:On-The-Fly by Deluge · 2006-01-25 07:45 · Score: 1

You have to help me out here - I speak fluent Czech, but not having lived there for 17 years now I'm obviously a bit behind on the latest slang. I've been wracking my brain and I just can't think of the word/expression that translates into "to look for" and means sexual intercourse.

So please, expand my horizons :)
Re:On-The-Fly by Trurl's+Machine · 2006-01-25 09:33 · Score: 1

From Cafe Babel:

Let's imagine that a Pole and a Czech decide to stay in and watch a match at the Pole's house. During the match, our Czech friend feels hunger pangs and asks his companion to get some peanuts (burak) to go with his beer. The Pole will tell him that he doesn't have any beetroot at home but, if it takes his fancy that much, he can head out to the shop (sklep). The Czech will reply that he didn't know that the Pole had a cellar in his house, at which the Pole will look confused and tell him that he'll go and look for them himself. Our Czech friend will be lost for a response to the latter, since the word szukac, which means 'to look for something' in Polish, means 'to make love' in Czech!

IBM and Google cooperation to come? by Mostly+a+lurker · 2006-01-24 22:13 · Score: 2, Interesting

IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.

This won't make speech recognition mainstream by thbb · 2006-01-24 22:16 · Score: 4, Interesting

As has been the case for the past thirty years, descriptions of the system's prowess are still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...

Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition.

One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planification too. Hence, you can't plan as well what to do next when you speak, which is a big hurdle in the type of intellectual activities one carries with a computer.

Re:This won't make speech recognition mainstream by aug24 · 2006-01-25 00:59 · Score: 1

'Planification'?
Hmmm, this computer's going to have a hard time understanding you.
Justin.

--
You're only jealous cos the little penguins are talking to me.
Re:This won't make speech recognition mainstream by thbb · 2006-01-25 01:17 · Score: 1

Indeed, it shows I'm not a native English speaker. I meant planning if that wasn't clear. I'm sure you had figured it out, but it's yet another challenge for speech recognition and translation: pretty much everything we say or write follows a loose and approximate grammar that our listeners can only get through common understanding of the context...
Re:This won't make speech recognition mainstream by aug24 · 2006-01-25 01:43 · Score: 1

Absolutely - should've put a smiley on the end to make my meaning clear.

J.

--
You're only jealous cos the little penguins are talking to me.
Re:This won't make speech recognition mainstream by mwood · 2006-01-25 02:46 · Score: 1

Planification: the process (ation) of making (fic) plans. Easy. I would have said "planning" or "generating plans" though. "Planification" is probably a term of art. Eventually the recognizer would be set up to know this, and the metadata indicating domain-specificity could even help it work out the rest of the sentence.
Re:This won't make speech recognition mainstream by milimetric · 2006-01-25 03:16 · Score: 1

or more importantly, what if

Girlfriend - You're always typing on your computer

became

Girlfriend - You're always talking to the computer

you'd be in real shit. I mean, fuck that you can't think while typing, how are you going to communicate to your girlfriend while working on the computer? Type on her?
Re:This won't make speech recognition mainstream by aug24 · 2006-01-25 04:14 · Score: 1

Indeed. Now if you could just point out the computer that can do that... ;-)

--
You're only jealous cos the little penguins are talking to me.
Re:This won't make speech recognition mainstream by mwood · 2006-01-25 05:16 · Score: 1

Planning, or using metadata to resolve ambiguities in speech? Shakey was doing planning in a limited environment many years ago.

Is using more information about input symbols to reduce the number of viable choices difficult? I confess I haven't studied the matter.
Re:This won't make speech recognition mainstream by ijablokov · 2006-01-25 05:48 · Score: 1

So, in my capacity as an official spokesperson, I'll clarify:

"...IBM technology *is* being used to control computers and devices..." ...by customers buying Hondas, GMs, XM Radios, and in quite a number of enterprises within banking, healthcare, etc.

No conditional statement, is that better? ;-)

Awful default TTS by Council · 2006-01-24 22:19 · Score: 3, Insightful

Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.

--
xkcd.com - a webcomic of mathematics, love, and language.

Re:Awful default TTS by nogginthenog · 2006-01-24 22:46 · Score: 1

I know what you mean. I remember the speech functionality that came with my Amiga in 1989 was superior.
Re:Awful default TTS by wfWebber · 2006-01-24 22:48 · Score: 2, Informative

Then again, if they supplied a version that produced awesome quality voices, they'd be accused of trying to kill their TTS competition.

That said, in Microsoft Windows Vista (ETA 2019), the default TTS engine will be replaced by a new one sporting Anna. Have heard her in the preview and I have to say, it's one hell of an improvement.

--
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. -- Andrew S. Tanenbaum
Re:Awful default TTS by Viol8 · 2006-01-24 22:49 · Score: 1

Probably BECAUSE speech is a niche market, MS don't want to spend the
money on making it any better. So long as it sort-of works then the marketing
droids have something apparently bleeding edge to waffle on about in the sales
pitch knowing full well very few people will use it and discover how crap it
is, and the ones who do are such a small percentage anyway that they won't care.
Re:Awful default TTS by mrjb · 2006-01-24 23:53 · Score: 1

Amiga? In 1982, the TI-99/4a with Terminal Emulator II and speech synthesizer already did what XP's tin man does nowadays. Pity that machine was crippleware, you had to buy all kinds of add-ons for it to get some power from it.

--
Visit http://ringbreak.dnd.utwente.nl/~mrjb/growingbettersoftware to download your free copy of the book
Re:Awful default TTS by AndroidCat · 2006-01-25 01:54 · Score: 1

I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
That's never stopped Microsoft before. Look at the graveyard of all the companies who made some niche product that was smashed when MS included it as a feature in Windows. (It didn't always work as well, but it was there and killed the niche.) The real money in TTS isn't for individual computers, but for large services and such. (Which is a shame because there's a lot of emergent SOHO apps that could use cheap but good TTS and VR.)
MS has actually gone backwards on TTS, because the L&H TruVoice American English engine that comes with SAPI4 and MS Agent sounds much better than Mike, Mary and Sam. Of course, Lernout & Hauspie managed to graveyard themselves. MS has been stagnant on TTS since 1998, but hopefully they're doing more than just shipping a newly tweaked voice for Vista.

--
One line blog. I hear that they're called Twitters now.
Re:Awful default TTS by Kuciwalker · 2006-01-25 03:25 · Score: 1

Because most people don't care about having their computer read to them.
Re:Awful default TTS by mdarksbane · 2006-01-25 04:44 · Score: 1

That's because most of the included TTS modules are based almost directly off of military research done in the late 70's and released into the public domain. While more recently there have been a few open source endeavours to improve free TTS, most of the research in that area in the last 20 years has been by corporations that AREN'T MS (like IBM) and kept under strict lock and key or expensive licenses. The text to speech included in Windows and OS X has barely changed in over 20 years. Of course it sounds like crap.
Re:Awful default TTS by bill_mcgonigle · 2006-01-25 12:33 · Score: 1

This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

Patents. At least that's what's keeping Apple back. Reportedly they have a much better implementation in the lab but won't ship it due to patent issues.

One would have to further suppose that the patent holder isn't willing to license for what Apple and Microsoft consider reasonable prices. If they're in agreement on this they're probably right.

Anyway, just get festival.

--
My God, it's Full of Source!
OUTSIDE_IP=$(dig +short my.ip @outsideip.net)

What about SubHuman Speech? by shotgunefx · 2006-01-24 22:21 · Score: 1

Serious, you hear how some people "talk" these days?

--

-William Shatner can be neither created nor destroyed.

Re:What about SubHuman Speech? by cyberbian · 2006-01-25 02:44 · Score: 1

or ebonics?

I personally think they just wanted an upgrade to the old sound...

--
if I claimed I was emperor just because some watery tart lobbed a scimitar at me they'd put me away!

Re:Just what we need... by MichaelSmith · 2006-01-24 22:21 · Score: 1

Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

Yeah, that too.

--
http://michaelsmith.id.au

ViaVoice by TheRealDamion · 2006-01-24 22:27 · Score: 1

The xvoice team have failed to get IBM to recompile newer ViaVoice libraries, or even the same code against a more modern libc, ld.so and gcc environment making it quite hard to keep it working on newer distributions. It's also limited to ia32. They certainly don't seem likely to release the source code.

So I'm surprised to see an announcement like this one.

American or English? by squoozer · 2006-01-24 22:30 · Score: 2, Interesting

I realize that Americans and British (English at least ;o)) speak essentially the same language but I have yet to find any speech recognition software that can get more than roughly 85% of what I say correct. I have a fairly soft neutral English accent with pretty good enunciation so I would have expected to be getting a recognition rate in the high 90%s. I'm wondering if, as most of this software is developed in the US, it is tuned specifically to pick up on English with a US accent? I realize that you train the software for your voice but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?

--
I used to have a better sig but it broke.

Re:American or English? by Vengeance · 2006-01-24 23:23 · Score: 2, Funny

I'm sorry, what?!?!?

I cannot understand a word you're saying. What's with that accent?

--
It was a joke! When you give me that look it was a joke.
Re:American or English? by IamTheRealMike · 2006-01-24 23:27 · Score: 1

Existing speech recognition engines rely on statistical approaches, just like this "miracle" product does, to disambiguate sounds and words, and yes, about 80% accuracy sounds right. Of course this is too low when competing against a keyboard: even though speech recognition could be a lot faster, by the time you've corrected all the mistakes it works out slower. Hence it's only used in limited applications.
I have virtually no accent at all, except for very mild British overtones, yet speech recognition has never worked well for me either.
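
For what it's worth, accuracy figures like the 80-85% quoted in this thread are usually word-level: you line the transcript up against what was actually said and count substitutions, insertions and deletions. A minimal sketch of that calculation, assuming plain whitespace tokenisation and a standard edit-distance alignment (the function name is my own, not taken from any speech package):

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from output
            insertion = cur[j - 1] + 1  # extra word in the output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

On the classic pair, `word_error_rate("i can recognise speech", "i can wreck a nice beach")` comes out at 1.0: two words survive, but it takes four edits to repair the rest of a four-word reference. That is one reason a "percent correct" figure can flatter a recogniser.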
Re:American or English? by djmurdoch · 2006-01-25 01:58 · Score: 1

I have virtually no accent at all, except for very mild British overtones...

That claim makes no sense whatsoever. You have a regional accent, it just happens to come close to the one you hear around you most commonly. I'm guessing it's a midwest accent, aka "General American", aka the US TV network announcer accent.
Re:American or English? by deimtee · 2006-01-25 02:16 · Score: 1

Strewth mate, I don't have any bloody accent either, but the bastards can't recognize a bloody word I say!!

--
I'm guessing that wasn't on their radar screen...

Oh oh oh. by Anonymous Coward · 2006-01-24 22:33 · Score: 3, Funny

I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.

So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.

Not _that_ amazing by johndoe42 · 2006-01-24 22:42 · Score: 2, Interesting

It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.

The translation, on the other hand, sounds damned impressive. For unrestricted content, especially with an untrained voice (I imagine that IBM isn't individually training to each Al Jazeera talking head), 70% recognition sounds quite good. 70% accuracy post-translation ought to be quite a bit better than what's currently out there. The description of MASTOR, however, is useless -- it could easily describe anything that isn't word-for-word translation.

Re:Not _that_ amazing by dchaley · 2006-01-24 23:11 · Score: 1

Since you mentioned CSLI at Stanford, they are in fact already working on a speech-driven (human to system and system to human) in-car radio and navigation system in collaboration with Bosch. The prototypes are very impressive, but unfortunately not many details are available on the public web.

So yes, this is cool stuff, but as you say, not _that_ cool.

And German is an easy one by Ogemaniac · 2006-01-24 22:44 · Score: 4, Informative

It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.

Re:And German is an easy one by ookaze · 2006-01-25 02:53 · Score: 1

I would not say that German is easy.
Anyway, in Japanese, you forgot the fact that the verb is not even always present in the sentence (it is just guessed from the context), and that sometimes, with the exact same sentence, subject and object are switched depending on the context too.
This requires some training to understand. I still have not mastered it well, and seeing lots of fansubs shows me that I'm not the only one who has not mastered this (and I'm not the worst).
I guess a machine would have a really hard time with Japanese (and an even worse one with Chinese or Russian).

Japanese and English are quite different by Ogemaniac · 2006-01-24 22:48 · Score: 2, Insightful

and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually - we are laughing at things Japanese viewers aren't always intended to find funny, while missing half of the jokes that are supposed to be there.

Buyer beware by 99luftballon · 2006-01-24 23:04 · Score: 4, Insightful

Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a software side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.

So don't rush to buy. Let the labs check it out first.

Re:Buyer beware by kaos.geo · 2006-01-24 23:53 · Score: 1

Here's a link about the Lernout & Hauspie fallout... http://www.businessweek.com/2000/00_37/b3698218.htm Plus there was a much more detailed Wall Street Journal article (very well written, BTW), but it now seems to be either in the WSJ archives or unavailable to me. This was a mini-Enron-like story.
Re:Buyer beware by Koyaanisqatsi · 2006-01-25 12:20 · Score: 1

Well, I did use ViaVoice (the software in question) on a PC before, and it takes a bit of learning. At first you are asked to read a short text to it, so that it can analyse your speech patterns and gather the dynamics of your room and mike.

But after that, is that thing precise! It hardly misses a word, and when it does because it does not know the word (like a domain-specific word), you can add it to its dictionary, so that next time it is recognized. It's simply amazing how well it works - and I'm talking about a previous version, not this release.

Finally, the version I use is for parsing Portuguese, which incidentally is a bit more complex in structure than English.

My advice? Check it out before dismissing it, you may end up liking it :)

Re:Just what we need... by user9918277462 · 2006-01-24 23:13 · Score: 4, Insightful

There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.

Trusted Computing by The+New+Andy · 2006-01-24 23:15 · Score: 1

This is one of those things that won't be possible with trusted computing. With encrypted audio+video streams for everything, all these cool technologies won't be able to be made. Hopefully, someone makes a program like this which goes mainstream - that ought to educate people about trusted computing as soon as they try to sneak it in.

Re:Trusted Computing by benjamindees · 2006-01-25 11:49 · Score: 1

Why do you think content providers would let you do your own translating instead of providing you with an "official" translation?

--
"I assumed blithely that there were no elves out there in the darkness"

I'll just be happy if by el_womble · 2006-01-24 23:23 · Score: 1

it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.

Nintendogs: I've stopped trying to train my dog, it's never going to happen.
Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
Nokia: Even with just one voice command, my girlfriend's name, it still can't match my voice.

If this can translate foreign languages into American (sic) then it definitely sounds like it could stand a chance at translating English into text and commands.

--
Scared of flying, pointy things since 1979!

Re:I'll just be happy if by Cro+Magnon · 2006-01-25 02:01 · Score: 1

I once read about someone dictating to his voice-recog software when 2 of his cow-orkers stopped by. He said "Hi, Nick and Ben". The software printed "Hi, naked men".

--
Slow down, cowboy! It has been 4 hours since you last posted. You must wait another few hours.

funny this subject should come up... by dafragsta · 2006-01-25 00:04 · Score: 2, Interesting

I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha

the training process definitely has its ups and downs. The more you work with it however, the more it becomes attenuated to your own speech patterns and moreover, the quirky words we use every day. If you can get past the first two or three hours, you'll see that it is totally worth the effort, especially if this IBM tech isn't available to end-users for some time. There is also an aspect of the software training you, while you train the software. At the present time, I can dictate to slightly slower than I can probably type.

In the end, I can see where this would make a writing e-mails and other such time-consuming tasks, which involve spellchecking, grammar, and other proof reading significantly quicker. When you really hit your stride, it's easy to write at the speed of thought, which is really appealing. There are caveats, however. it's very easy to dictate several sentences worth of tax and taken for granted that it to everything down the way you attendedselect tax select select tax undo

Re:funny this subject should come up... by dafragsta · 2006-01-25 00:09 · Score: 1

A good case-in-point example of a pitfall is that I totally forgot that some of the text that I dictated from within the Slashdot window was mangled to hell and gone. That said, within Notepad, the results are very acceptable to say the least. I'm definitely getting closer to being able to write at the speed of thought. When the application is hitting on all eight cylinders. Another thing I forgot to mention is that occasionally it will get confused. If it happens to get confused with regard to the built-in commands. You will want to straighten that out in a hurry, because it was easily the most frustrating part of the training process. In the best case scenario, you'll be using these built-in commands to make the training process go faster.
Re:funny this subject should come up... by dafragsta · 2006-01-25 08:12 · Score: 1

Given that I wrote that post very very early this morning, and hadn't yet been to bed. I gave you the benefit of the doubt. I read my second post, which doesn't have any of the bad examples of mangled dictation, which is the one you replied to. I've come to the conclusion that you are just trolling.

I would say that I get into many a heated discussion with people who exhibit their grasp of grammar, not to mention significantly more intelligence in a much more convincing way than you have. Yet you're the first person that I can think of in a long time that flat out told me that I have bad grammar (punctuation notwithstanding), nor did you cite words that you think I've misused.

Given my fatigue at the time, it's possible I went a little overboard with the clichés, but I never overstepped my bounds with my vernacular. You can either point out the words you perceive I've misused, or may you forever get slapped in the face with a hefty self-pleasure device intended for females in the minds of anyone else who should fall victim to your misplaced aggression. I hope the sum of what gets your rocks off is: posting on Slashdot, being a completely unpleasant person, or better yet, both at the same time. If that's the case, then I have you pegged, and I didn't need to write three paragraphs of well-chosen words for you to know that, I just felt like putting you in your place by unveiling the linguistic capabilities of this fully armed and operational battle station, when I have both slightly less bloodshot eyes open, sans-toothpicks, and take the time to properly manage the dictation.

"There's nothing quite as exhilarating as pointing out the shortcomings of others, is there?"-Randal Graves, Clerks

Way to go on the Anonymous Coward post, it suits you.
Re:funny this subject should come up... by dafragsta · 2006-01-25 08:14 · Score: 1

That first sentence is not intended to be a non-sequitur either. As your warped little brain probably failed to read, there is still considerable training to do, especially with regard to pacing, which translates into sporadic and unpredictable punctuation.
Re:funny this subject should come up... by FCP · 2006-01-25 10:34 · Score: 1

I know many very intelligent people who seem to have a random-homophone-substitution filter between their brains and their tongues. It can get pretty funny sometimes. I suppose you're nagging about things like "attenuated" instead of "attuned." It's a pretty long leap from a couple of examples of that to proof that someone is vocabulary-challenged.

Anyway, most people who have that problem in speech will make many fewer substitutions while typing, so they might want to be extra vigilant when using dictation (as if one didn't have to be, anyway).

Oh, and: grammar Nazis are among the lowest forms of net.life, slightly above Anonymous Cowards. Oh, wait, you're both ... sux to be you.

--
.plan: file not found

Re:Tip by squoozer · 2006-01-25 00:08 · Score: 1

I gave up on speech recognition as everything but a toy a while ago but your tip could lead to some interesting mistakes. Take for instance the sentence fragment "Running to the door". If it is pronounced as you suggest it could easily be misunderstood by the machine to be "run in to the door" which could have nasty consequences.

--
I used to have a better sig but it broke.

I am surprised by Shar-Kali-Sharri · 2006-01-25 00:08 · Score: 1

... how critical people have been in their replies 'till now. I mean sure there are bound to be problems with this tech, but I think what's really interesting is the implications of a mostly successful on-the-fly translation - Babel fish, anyone... Supposedly with fast enough computers and advanced enough programs - imagine being able to communicate with everyone in the whole **cking world.... This would have enormous consequences for everything... humanity unite - (or probably bloody warfare ...). It might be true that this would probably remove some people's motivation for learning other languages... but if you look at the world today, there are quite a lot of bi-lingual people, but how many tri-lingual and, as the extreme consequence of this tech, 500-lingual.... You could potentially communicate with bloody QuEthc-indians..... This is what I think is the real issue here - not that some subtitles might miss a joke....

--
In Soviet Russia my signature is reading YOU

Re:I am surprised by Dark_MadMax666 · 2006-01-25 05:40 · Score: 1

I can already communicate with more people than I care to. All I needed to do was learn English. I don't think being able to speak Afrikaans or Mandarin will significantly expand my horizons in any meaningful way.

Real-time eavesdropping by 0xC2 · 2006-01-25 00:30 · Score: 2, Interesting

Although most of the discussion so far has focused on foreign language translation, this technology is about *real-time-audio-to-text* conversion. The feds will be able to monitor, analyze, and record our conversations in real time:

Monitor all conversation.
Apply real-time text filters.
Assign live agents to priority eavesdropping.
Profit!

If you could apply a filter to listen in to any call what would it be?

--
Be heard || Be herd

Re:Real-time eavesdropping by BlackTarw · 2006-01-25 01:13 · Score: 1

Just learn to speak Welsh, try training a computer to understand what a double 'L' sounds like ;)

Finally! by digitaldc · 2006-01-25 00:35 · Score: 1

We can figure out just what the hell Ozzy Osbourne is saying!

--
He who knows best knows how little he knows. - Thomas Jefferson

No, German also changes word order by hughk · 2006-01-25 00:47 · Score: 1

Although from the same linguistic family (but English also owes a lot to French and Latin) there are some important grammatical differences. The issue with interpreting German is that the verb (and any negation) may come at the end of the sentence. German can have some very long sentences.

For a human, the issue is that you can't interpret based on the phrase, so a human interpreter has quite a lot to do. The interesting thing is that experienced interpreters do this unconsciously.

I have been an admiring user of interpreters for many years now and one handled English/Japanese/Russian.

--
See my journal, I write things there

Re:No, German also changes word order by mwood · 2006-01-25 02:10 · Score: 1

For some entertaining examples, see Mark Twain's "The Awful German Language".
Re:No, German also changes word order by hughk · 2006-01-25 06:53 · Score: 1

He should know. He apparently spoke it rather well. I still find his comments from his travels through Germany rather amusing and my daughter was able to use some of his description of Heidelberg in her studies at the university there.

--
See my journal, I write things there

Translating Arab TV by Perl-Pusher · 2006-01-25 00:56 · Score: 2, Informative

I imagine it is easier to translate repetitive phrases such as "The Zionist oppressor shall be eliminated", "The great Satan America will be destroyed" and "Our martyrs have struck fear in the hearts of the infidels".

I was in Kuwait and watched Arab TV with English subtitles; it was enlightening to say the least. One long tribute to racism paid for by the Amir of Qatar. Only on Arab TV will you see such trash as "the Jews are descended from pigs".

Big deal, I can do that on my Apple ][ by Fear+the+Clam · 2006-01-25 01:05 · Score: 1

One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.

10 PRINT "DEATH TO AMERICA";
20 GOTO 10

RUN

I agree, it does by Ogemaniac · 2006-01-25 01:22 · Score: 1

But not to the extent of Japanese. I lived in Austria for a summer, and after just three months, with no prior study, I started "getting" it sometimes. On the other hand, with 2.5 years of university study and ten months of living in Japan, I often have a hard time following the logic of a long sentence - even when written and when I know all of the words.

Generally, it is estimated that it takes an English speaker about twice as long to learn a language from the Asian or Arabian groups as it does a European language.

Why "superhuman" tech? by ian_mackereth · 2006-01-25 01:36 · Score: 1

Is it really that hard to understand Chris or George Reeves saying "Up, up and awaaaayyy!"?

Re:Why "superhuman" tech? by serial_crusher · 2006-01-25 08:45 · Score: 1

Why superhuman? ...Because it can understand women too!

Speech Synthesis. by crhylove · 2006-01-25 01:39 · Score: 1

So I think there should be a program to resynthesize the "learned" words into the most exact average of any given way to say it. I'd love to hear the results, that would be fascinating.

--
I hold very few opinions. I hold information based on observation and fact. If you wish to disagree, please use facts.

Re:Just what we need... by sikandril · 2006-01-25 01:41 · Score: 1

You assume that this sort of thing hasn't been going on for many years now.

Excellent Product, Confused Reviewers by MarsGov · 2006-01-25 01:44 · Score: 2, Informative

ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained it's possible to achieve very decent accuracy.

Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system will use your voice to train its recognition models to improve its word selection. Dictation systems also ask for samples of your documents to train their language models on how you put words together; that also helps determine the probability of proper word choice. (Example of how you put words together: "Peanut butter sandwich" is a much more likely choice than "peanut butter sand," and will get a higher score.)
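
To make the "higher score" idea concrete, here is a rough sketch of a word-pair (bigram) language model with add-one smoothing. It is only an illustration under my own assumptions: the three-sentence corpus, the function names and the smoothing choice are all made up for the example, and no shipping dictation product is claimed to work exactly this way.

```python
import math
from collections import Counter

def train_bigrams(sentences):
    """Count context words and adjacent word pairs in a toy corpus."""
    contexts = Counter()
    pairs = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])           # every word that has a successor
        pairs.update(zip(tokens, tokens[1:]))  # (previous word, next word)
    return contexts, pairs, len(vocab)

def log_score(candidate, contexts, pairs, vocab_size):
    """Sum of log bigram probabilities, with add-one smoothing."""
    tokens = ["<s>"] + candidate.lower().split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        probability = (pairs[(prev, word)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(probability)
    return total

# Hypothetical "samples of your documents" used for training.
corpus = [
    "i ate a peanut butter sandwich",
    "she made a peanut butter sandwich",
    "the sand was hot",
]
contexts, pairs, vocab_size = train_bigrams(corpus)
sandwich = log_score("peanut butter sandwich", contexts, pairs, vocab_size)
sand = log_score("peanut butter sand", contexts, pairs, vocab_size)
```

With this toy corpus the "sandwich" hypothesis gets the higher (less negative) score, because "butter sandwich" has been seen in training and "butter sand" has not. A real recogniser would weigh a score like this against how well each candidate matches the audio.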

The IBM announcement is about embedded, task-oriented speech recognition. It's not "superhuman," according to the article's text and ignoring its headline. I'll have an opportunity to see it in action next week at SpeechTek West. Expect to see other product announcements about speech technology in the next few days as the conference approaches.

As for the TV translation software, it's still in the research stage according to the article. I've seen BBN's version of this software, and frankly it's amazing how good real-time translation can be.

Bell Canada deployed Emily a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity.

If you're interested in trying speech recognition and text-to-speech out for yourself, you can use Voxeo's servers, program in VoiceXML, and my Voice Conference Manager app as a starting point (yeah, VCM needs a new release, and it's getting one soon).

Re:Excellent Product, Confused Reviewers by Deluge · 2006-01-25 08:16 · Score: 1

"Bell Canada deployed Emily a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity."

This Emily should be taken out and f*cking shot. Half the time it doesn't understand a simple "Yes" and the other half the time you're trying to figure out how in the hell to express what you want (if it's more complicated than, say "pay bill") so the machine knows how to route your call. Every time I hit one of those goddamn speech recognition systems I just start stabbing 0 in an attempt to talk to a person.

At least with the old "Press 1 to hear a eunuch burp" kind of system you knew what your choices were! Now it's a stab in the dark as to whether the braindead twits that decided to blow all that money on a completely backwards and unnecessary system had the foresight to preprogram the voice recognition to recognize the words pertaining to MY issue. This is because when I actually call in, it is to solve a problem that a simple trip to the website won't.

We also showed off... by ijablokov · 2006-01-25 01:54 · Score: 1

...our speech-enabled Web browsers for mobile devices and set top boxes. More info on them here: http://ibm.com/pvc/multimodal

Not only do they allow you to navigate by voice, but using X+V (a blend of XHTML and VoiceXML), you could have fully speech-enabled Web apps. Example: "show me nearby sushi restaurants" or "movie schedules in my area".

We also released our Multimodal Tools Project for Eclipse a couple weeks ago: http://alphaworks.ibm.com/tech/mmtp

Go ahead and play. ;-)

Let's see it translate poems by roman_mir · 2006-01-25 01:57 · Score: 2, Interesting

When and if it can translate poems from language to language, while keeping the style, the nuances, the rhythm, the cultural references, the general idea and the details, then we will know - it is done. Until then, don't hold your breath.

--
You can't handle the truth.

Re:Let's see it translate poems by hunterx11 · 2006-01-25 02:23 · Score: 3, Interesting

I'd be happy enough if humans could do this.

--
English is easier said than done.

Re:Just what we need... by meringuoid · 2006-01-25 01:59 · Score: 1, Funny

Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

Pah. English-speaking people never misinterpret Arabic media. al-Jazeera is a terrorist front organisation and ought to be bombed, and that's all there is to it!

--
Real Daleks don't climb stairs - they level the building.

Re:Just what we need... by mwood · 2006-01-25 02:03 · Score: 2, Insightful

Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.

Anime fansubs! by CptNerd · 2006-01-25 02:06 · Score: 1

What a boon this will be to those anime fansub groups who can't find decent translators, or at least translators who aren't overworked.

--
By the taping of my glasses, something geeky this way passes

Re:Just what we need... by pev · 2006-01-25 02:18 · Score: 1

I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.

And more importantly (for them) no pesky staff translators with a conscience leaking what they transcribed for the greater good.

~Pev

Thanks for the laugh! by Ancient_Hacker · 2006-01-25 02:20 · Score: 1

Ah yes, super-duper speech recognition is right around the corner!

I've been hearing this every 6 months for about the last, oh, thirty years.

Given that the state of the art in something much simpler, like automatic language translation, is pitifully inadequate, how likely is it IBM has conquered speech recognition AND translation?

Har har har.

S-to-T in hospitals by stardancer · 2006-01-25 02:20 · Score: 2, Interesting

I know that one hospital in Norway has been experimenting with/testing speech-to-text software for a while, and reports say it's been very successful! (this supports what was said about speech recognition within a tight context in an earlier comment). I believe the plan is to, at some point, eliminate the need of secretaries transcribing what the doctors dictate, so that ideally the doctors can just speak into a mic and the text automagically appears in the patient's (electronic/digital) journal!

This of course worries secretaries, since they might eventually lose their job/"career". On the other hand it would improve efficiency *a lot*.

--
There's nothing too profound behind this sig.

Re:S-to-T in hospitals by FCP · 2006-01-25 10:56 · Score: 1

Actually, a lot of hospitals use voice recognition systems for dictation. There are usually still medical transcriptionists in the loop, but they have at least a partial text to work from instead of just doing straight-ahead transcription. I have heard some of them claim that working from partial text is slower than just doing what they used to do, because the interfaces provided for editing the text are slower than their blinding typing speeds.

--
.plan: file not found

Re:Sorry to disagree. by dunkelfalke · 2006-01-25 02:30 · Score: 1

Slavic languages are also Indo-European.

--
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.

Re:learn the langage ? by yoprst · 2006-01-25 02:34 · Score: 1

Tap all Arabic/international lines, install zillions of speech recognition nodes, make them write everything to log files and use grep to find whatever you want. Your Arabic may be a hundred times better, but you cannot do anything like that even if you hire the whole of Lebanon to help you.

Fantastic direction by Simonetta · 2006-01-25 02:35 · Score: 1

This is a fantastic development. It is exactly the kind of thing that 64-bit processors were made for. It is the 'killer app', the best since MP3 and CD-rippers. If it actually works, the high-tech equivalent of 'in-shaa Allah'.

We should encourage IBM to allow enough of the technology to 'escape' in order to enable other languages to be translated from speech into English. There should be some kind of open review of the translation involved, also. This can help prevent subtle errors in translation that will arise. Hopefully we can catch these before they get widespread.

Perhaps we should also remember the ancient parable of the Tower of Babel. This is a story from about 3000 years ago where a united monolingual people tried to pool all of their resources and build a tower to reach God. God, not wishing to have so many freeloaders and boors hanging around eating his food, drinking his liquor, dipping into his stash, and impregnating his angels, cast an environmental change over all the people that split them into many, many groups that spoke mutually incomprehensible languages. Perhaps this is an ancient folk explanation of how different languages came to be; perhaps it is a veiled warning about the consequences that can arise from having everyone speaking the same language.

In any event, kudos to IBM. Keep up the good work.

Re:Fantastic direction by RicktheBrick · 2006-01-25 05:52 · Score: 1

Can it distinguish between someone speaking to you and the television/radio that is playing in the background? I want speech and noise recognition so that the computer can recognize a plea for help over the noise of the furnace, refrigerator or any other noise generator in the house. We should be able to put speakers/microphones in every room of the house and the computer should be able to detect any problem in that house. For instance fire, any leak of either water or gas or break-in. The computer should communicate with the owner and just by getting yes or no answers be able to ensure that the owner is doing well.

Live experiment with Dragon 8 by bdwoolman · 2006-01-25 02:47 · Score: 4, Funny

Here we go:

I can wreck a nice beach. I can recognize speech.

Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.

I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.

It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.

Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.

Which which blew the blue candle. Failed on the second "which" the b*tch.

Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.

One trouble. You do have to put the mike to sleep during family discussions.

--
"No fear. No envy. No meanness." Liam Clancy

Re:Live experiment with Dragon 8 by jvkjvk · 2006-01-25 07:25 · Score: 1

I admit that I haven't used 8 but I tried 7 when it came out after a five year hiatus motivated by disgust at the absolutely horrible accuracy of speech recognition systems at the time.

I am on another hiatus. I found Dragon 7 unusable for work or home use. For one thing, it was slooow. Admittedly, I only had a dual Xeon 2.0 GHz with a couple gigs of RAM to play with - but still. For another, even after scanning my hundreds of megs of sent mail and documents and multiple hours of training, it still couldn't figure out what I was saying a quarter of the time.

I found that I was spending more time trying to get it to work properly and fixing errors than I would have spent just typing the documents. That's unworkable. I can't attribute the results to not spending enough time with the software, either, as I banged my head against it for a month before going back to the keyboard. It's a shame, as I would much rather create at least the rough draft verbally.

WoT by foo+fighter · 2006-01-25 03:10 · Score: 1

...perpetually monitors Arabic television...

Sounds like the results of a DOD/DARPA/NSA funded research grant. They'd love to be able to translate on the fly, instead of having to train and pay actual humans to manually translate several hours -- or even days and weeks -- after the original transmission.

Now that IBM has something kinda working and the grant money is running out they are trying to market it to the public. Kinda like Tang for the War on Terror-age.

--
obviously no deficiencies vs. no obvious deficiencies

Re:WoT by otis+wildflower · 2006-01-25 03:29 · Score: 1

Kinda like Tang for the War on Terror-age.

What's so funny about Tang? Growing boys need Tang!

I know what you mean by killmenow · 2006-01-25 03:14 · Score: 1

No matter how hard I try, TTS always sounds horrible. Just that same robotic, metallic voice saying "Would you like to play a game?"

'Twas Brillig by engineerofsorts · 2006-01-25 03:21 · Score: 1

I've always found it most entertaining to check the effects reciting Lewis Carroll's Jabberwocky has on any new/exciting speech reco program.

On a more serious note, however, my wife was involved in an ill-fated-due-to-ancient-technology project back in grad school in the early 70's which involved:

1. Speech recognition.
2. Machine translation into a universal grammar.
3. Translation of the universal grammar into various target languages.
4. Speech synthesis in the various target languages, using the same vocal qualities as the original speaker.

Pretty lofty goals considering they were probably using computers with discrete components in them.

Curiously, my wife (a native Japanese speaker) was teamed with the Suomi (Finnish) team because of the similarities in the two languages' structures.

--
Life is tough. Life is even tougher when you're stupid.

Re:learn the langage ? by 9Nails · 2006-01-25 03:38 · Score: 1

Quote: Tap all Arabic/international lines, install zillions of speech recognition nodes, make them write everything to log files and use grep to find whatever you want. Your Arabic may be a hundred times better, but you cannot do anything like that even if you hire the whole of Lebanon to help you.

That sounds like a job we should sub-contract to India! I'm sure it would be much cheaper, and the results would be equally hilarious.

Mod parent up! by Spy+der+Mann · 2006-01-25 04:15 · Score: 1

This is the *PERFECT* use for a technology like this! :)

what about... by blue_adept · 2006-01-25 04:18 · Score: 1

"boy, I sure hope my stupid radio.. doesn't... uh... play 92.3"

vs,

"Does your radio suck? boy I sure hope my stupid radio doesn't. Uh, play 92.3"

--

"Is this just useless, or is it expensive as well?"

breakdown of the article by Anonymous Coward · 2006-01-25 04:32 · Score: 1, Interesting

The article is really saying two things:

1. IBM has updated their ViaVoice large vocabulary continuous speech recognition (LVCSR) engine.

2. IBM has paired ViaVoice with some clever apps to use the ViaVoice output in interesting ways (e.g. "on the fly" recognition, translation).

Things that are not obvious from the article:

1. ViaVoice has been around for ages and has always been pretty darn good at LVCSR. Without seeing numbers and knowing exactly how they were measured, it's impossible to know how much of an improvement 4.4 is over previous versions.

2. Speaker-dependent speech recognition can always achieve much higher accuracy rates than speaker-independent systems like ViaVoice. Dragon NaturallySpeaking is an example of speaker-dependent speech recognition.

3. Limited grammatical contexts (i.e. language models with low perplexity) always give better recognition than when you don't know what to expect next. For example, when your phone only has to tell "home" and "wife" apart, it's a lot less likely to make a mistake than if it has to figure out which word out of a list of 50,000 you just said. The more context, the better. The most interesting tech in the article seems to be the algorithms "that can determine this context on the fly."

4. No improvements in translation technology were noted in the article; it sounds like they might as well have fed ViaVoice through BabelFish, made it happen in real time, and slapped a UI on it. The app might be new, but the tech is not.
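
Point 3 above can be made concrete with a little arithmetic. The following is only a rough sketch, not anything from IBM or the article: it treats the recognizer's uncertainty about the next word as a probability distribution and computes its perplexity (2 raised to the entropy in bits). The function names and the example distributions are made up for illustration.

```python
import math

def perplexity(probs):
    """Perplexity of a next-word distribution: 2 ** (entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def uniform(n):
    """Hypothetical worst case: every one of n words is equally likely."""
    return [1.0 / n] * n

# A phone that only has to tell "home" from "wife" apart.
phone = perplexity(uniform(2))

# Open dictation with a 50,000-word vocabulary and no context at all.
dictation = perplexity(uniform(50_000))

# Context skews the distribution, which lowers perplexity even when
# the vocabulary size stays the same (assumed 90/10 split).
phone_with_context = perplexity([0.9, 0.1])

print(phone, dictation, phone_with_context)
```

Under these assumptions the two-word phone has a perplexity of 2 and context-free dictation has a perplexity of about 50,000, which is the gap the "home" versus "wife" example describes. Real language models sit somewhere in between, because context makes some words far more likely than others.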

Capitalization????? by Khyber · 2006-01-25 04:42 · Score: 1

"I had to help my uncle Jack off a horse."

"I had to help my uncle jack off a horse."

Will it ever catch that one?

--
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.

Re:Capitalization????? by djkuhl · 2006-01-25 07:37 · Score: 1

Could be. In proper context to explain yourself, you'd stress the name more strongly than you would the verb. A computer could learn to pick up on that, but I still don't think it could ever easily understand the differences between there, their, and they're across human dialects. For some things, a computer would need a vast capacity for breaking a sentence down and weighing the importance of each term in it to understand the complete meaning.

Complexity by ratboy666 · 2006-01-25 04:43 · Score: 1

I have been reading up on literacy. One fascinating point is that written language (at least English) is many times richer (in vocabulary and structure) than spoken language as heard on, for example, television.

If this holds true for other languages, it may be easier to translate spoken material.

Still, I want to see the result.

Ratboy.

--
Just another "Cubible(sic) Joe" 2 17 3061

funded by DARPA by Anonymous Coward · 2006-01-25 04:43 · Score: 1, Informative

IBM does admit it. They thank DARPA
and other DoD groups for their funding
in their research papers. Most of the
current funding for speech-related research
in DoD is run through the GALE project:

Global Autonomous Language Exploitation

http://www.darpa.mil/ipto/programs/gale/index.htm

Salim Roukos of IBM, whom they quote in the
article, is the main player from the IBM
side and IBM is one of the main players in
this project. They were formerly a primary
player in TIDES:

Translingual Information Detection, Extraction and Summarization
http://www.darpa.mil/ipto/programs/tides/index.htm

In fact, that site has the last link I can find on
DARPA's site about TIA (Total Information Awareness),
which is a program formerly run by ex-Admiral Poindexter
(of Iran-Contra fame) and shut down by an act of Congress
(and erased from DARPA's site as if it never happened):

http://www.darpa.mil/ipto/programs/tides/accomplishments.htm

These are not classified projects. You can
read about most of the techniques in the proceedings
of conferences such as ACL, ICSLP and Eurospeech.

I helped apple... by xquark · 2006-01-25 04:54 · Score: 1

I helped apple wreck a beach!

--
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.

Re:Just what we need... by faust13 · 2006-01-25 05:15 · Score: 1

Or warrants.

Another scam by Master+of+Transhuman · 2006-01-25 05:25 · Score: 1

This must be the day of the week that scams are announced.

First we have software that cannot be reverse engineered and guarantees the free speech rights of Americans.

It comes attached to the Brooklyn Bridge and some Florida swamp land.

Now we have this crap: "By limiting the domain, the system can make assumptions or inferences about what the user would like to accomplish, he said."

This is not exactly "superhuman" speech recognition.

None of this is feasible absent conceptual processing technology. Period.

I don't know why I don't clean up at the public trough by simply announcing I have "true artificial intelligence" and wait for the checks to roll in before leaving for Brazil.

--
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!

OT: Scots by pipingguy · 2006-01-25 05:32 · Score: 1

"Brothers and sisters are natural enemies. Like Englishmen and Scots. Or Welshmen and Scots. Or Japanese and Scots. Or Scots and other Scots. Damn Scots! They ruined Scotland!"

Sorry.

Unlikely by rcbarnes · 2006-01-25 06:31 · Score: 2, Insightful

Transcription? Not too hard. Translation? I highly doubt it.

Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of them, SYSTRAN, is the most used translation engine online), and there was *no* discernible difference between engines of the eighties and current engines. I hope that they're not trying to claim that they suddenly overcame the vast problems of translation wholly independent of the linguistic community. That's just ludicrous.

I'd love to see this engine handle a parasitic sentence like this between two largely different languages and catch the nuance in the parens: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will hit by chance, but only because of similar structure; the engine is lucky, not actually parsing the "meaning."

--
"Fight for lost causes. You may discover they weren't."

Re:Sorry to disagree. by dunkelfalke · 2006-01-25 06:56 · Score: 1

i know, finno-ugric. used to learn estonian (the same language family as finnish) at school years ago

--
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.

It all makes sense now! by burndive · 2006-01-25 07:09 · Score: 1

That's why it's pronounced nuk-ya-lar.

--
...because "hacker" sounds way sexier than "code drone."

Re:Just what we need... by wolenczak · 2006-01-25 07:48 · Score: 1

Instead of Arabic they should have started translating from Dubya speech to English

Speech recognition would be easy.... by schlick · 2006-01-25 08:00 · Score: 1

if we all spoke Lojban.

--
"It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson

What about Dragon NaturallySpeaking? by bigpat · 2006-01-25 08:20 · Score: 1

Not for translation, but for just speech recognition? ScanSoft Dragon NaturallySpeaking 8 Preferred is in the top 30 for software sales through Amazon, so obviously some people find this useful.

Real great example ... by FCP · 2006-01-25 10:16 · Score: 1

I just love the example[1] the IBM marketroids chose for this: "For example, when asking for 'Radio 104.3 FM,' the new IBM-pioneered technology allows drivers to simply say, 'Tune to 104.3,' or 'Set the radio station to 104.3,' or 'Change the radio station to 104.3.'" Of all the amazing applications one could dream up, saving a driver from having to punch a radio preset is what they came up with.

I rather like "Open the pod bay door, Hal" myself.
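
For what it's worth, the radio example is exactly the limited-domain case: several phrasings collapse into one intent. Here is a toy sketch of that idea, operating on already-transcribed text. It is in no way IBM's method; the patterns and the function name `parse_tune_command` are invented for illustration.

```python
import re

# Hypothetical patterns for a "tune the radio" intent. Each one captures
# the frequency, e.g. "104.3".
_TUNE_PATTERNS = [
    re.compile(
        r"(?:tune|set|change)(?: the)?(?: radio)?(?: station)?(?: to)? "
        r"(\d{2,3}\.\d)"
    ),
    re.compile(r"(?:radio )?(\d{2,3}\.\d)(?: fm)?"),
]

def parse_tune_command(utterance):
    """Return the requested frequency as a float, or None if no match."""
    text = utterance.lower().strip()
    for pattern in _TUNE_PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return float(match.group(1))
    return None

for phrase in [
    "Radio 104.3 FM",
    "Tune to 104.3",
    "Set the radio station to 104.3",
    "Change the radio station to 104.3",
    "Open the pod bay door, Hal",
]:
    print(phrase, "->", parse_tune_command(phrase))
```

The four radio phrasings all resolve to the same frequency, and the pod bay door request falls through to None, which is the whole trick: the narrower the domain, the fewer things the system has to guess between.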

--
1. http://www-03.ibm.com/press/us/en/pressrelease/19150.wss

--
.plan: file not found

What is it they say about that preview button? by bdwoolman · 2006-01-25 12:53 · Score: 1

Anyway, QED for D8's shortcomings... and mine. d:-b

--
"No fear. No envy. No meanness." Liam Clancy

Dragon 8 made a real difference. by bdwoolman · 2006-01-25 13:29 · Score: 1

I was frustrated with earlier versions as well, although I was able to use them more than you, but not in my work. 8, yes. David Pogue of the New York Times has used Dragon in many versions for all his work due to carpal tunnel. He raved about Dragon 8 so I gave it a try. It really worked a lot better out of the box than any of the others. I tried to find that review for you, but was unable to do so. Here is an article he wrote that discusses Dragon 5 in context, which he likes better than ViaVoice. But 8 was a watershed. http://www.abilities.com/news-articles.html

It sounds like you were willing to put in the time to get the good of this program. I don't know if they sell an evaluation copy. You might find a boxed used version of DS8 on eBay since so many do not have the patience you showed to use speech to text. On the other hand IBM has been in this game for decades. Dragon beat them for a while (in my opinion) but this new software seems pretty unique. You might want to hold out for that. In any case this tech is maturing. There is hope.

I concur that it is a hog for resources.

--
"No fear. No envy. No meanness." Liam Clancy

Re:learn the langage ? by yoprst · 2006-01-25 15:09 · Score: 1

Can't you imagine how much better than nothing it is?

I don't think we disagree much by Ogemaniac · 2006-01-25 15:47 · Score: 1

No big deal. It's always subject+object+verb...

Actually, word order in Japanese is quite flexible, especially spoken. Subjects are often dropped, or sometimes tagged onto the end of the sentence as an afterthought. That is only part of it, however - not just the sentence, but its constituent parts are backwards. Dependent phrases are often in the opposite order, if/when words come at the end of the phrase rather than the beginning, descriptive clauses before the noun rather than after, words equivalent to "to" or "from" come after rather than before, negations come after the verb rather than before. About the only things that do match English are adjective before noun and subject at the beginning (if it is there at all).

Since when is this a fundamental part??? You know, there's the stuff and then there are complements. Just an opinion, I think we're firmly in "complements" terrain now

Semantically, these are the most important parts. Words like "car" "blue" and "to drive" are easy to translate. English articles and prepositions, and Japanese particles, are the words that define the relationships between the nouns, verbs, and adjectives in the sentence. These words are by far the hardest to translate and most difficult for a human to learn to use properly. When I correct papers for my Japanese colleagues, do they mess up terms like "high molecular weight polymer" or "oxygen dissolution"? Nope. They mess up a, an, and the. Same holds for me when I try to speak Japanese. I get the little words wrong and sometimes say something far different from what I intended.

Sorry to make your world a little sadder, but this is not only a language problem. It's a cultural one.

Yes, these sentiments are almost impossible to translate. English simply does not have a mechanism for it. The reverse also holds true in many situations.

Translation _is_ difficult because you really don't translate words or phrases (symbols, seen or heard) but semantic concepts (somebody help me here with the right technical word). This means purely symbolic methods (like "which word is equivalent to" or "search and replace") are bound to fail.

I agree. I think the statistical methods that are currently popular for machine translation will never get past the barely-understandable level. To do that, you have to have context and meaning. Computers are a long way from that point.
