IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
Which witch blew the blue candle out ?
Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.
yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.
since even "live" boradcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real-life it will be effectively real-time anyway for almost all practical purposes.
I have a friend works in Japan and he tells me the same. He often goes to watch English films that are subtitled in Japanese and tells me that they completely miss-translate most of the jokes and miss subtle nuances of speech. One example he gave was a scene from 'The Full Monty' (im doing this from distant memory so it might not be quite right - in fact, a bad translation :-)
One of the characters is shouting up to someone in their bedroom window. They don't respond to the shouting and the character says "He obviously can't hear me because of his triple glazing".
This is a sarcastic comment relating to the house owners supposed wealth but in Japanese it was translated as:
"He has thick windows"
Perhaps in this case there was no easy way to translate - but I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translatation only.
As it has been the case for the past thirty years, the description of the prowesses of the system are still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...
Ben Shneiderman is the person who, in my opinion, articulates the best the limits of speech recognition.
One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planification too. Hence, you can't plan as well what to do next when you speak, which is a big hurdle in the type of intellectual activities one carries with a computer.
You are going to have that problem whether it's a machine doing the translating or a human. As I understand it, interpreters of German get around this by some quick-thinking restructuring of the translated sentence, or they simply lag a half-sentence or so behind.
The real problem for machine translation is, and always has been, determining the sense of a word from context (indeed I recall a recent Slashdot article about some guy who suggests this is the separating factor between computers and animal intelligence). Most languages have a great many homonyms whose meaning a listener can determine only from the surrounding contenxt and, often, general background knowledge of the language or topic at hand.
I'd be happy enough if humans could do this.
English is easier said than done.