IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added."
Still, even at 80 percent, how good is this translation? If the 20% it gets wrong is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
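To put a rough number on that worry, here is a back-of-the-envelope sketch in Python. It assumes the quoted "accuracy rate" is a per-word figure and that errors are independent of one another; the article defines neither, so both are assumptions, and the function name is made up for illustration.

```python
# Back-of-the-envelope sketch. ASSUMPTIONS (not from the article):
#   1. "accuracy" means each word is right with probability word_accuracy
#   2. errors on different words are independent of each other

def clean_sentence_probability(word_accuracy: float, sentence_length: int) -> float:
    """Chance that every word in a sentence of the given length is correct."""
    return word_accuracy ** sentence_length


if __name__ == "__main__":
    # Compare the accuracy figures quoted in the article across sentence lengths.
    for accuracy in (0.6, 0.7, 0.8):
        for length in (5, 10, 20):
            p = clean_sentence_probability(accuracy, length)
            print(f"accuracy={accuracy:.0%}  words={length:2d}  clean sentence={p:.1%}")
```

Under those assumptions, a 10-word sentence at 80 percent per-word accuracy comes through with no errors only about 11 percent of the time. Real errors are unlikely to be independent, so treat this as an illustration rather than a measurement.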
I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different structures. While in English the verb is usually the second element, in other languages (German, for example) the verb often comes last. For the translator to produce the second word of the English sentence, it would have to wait until the end of what could be a long source sentence. This necessarily adds delay.
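The word-order argument above can be sketched in a few lines of Python. This is a toy model, not how any real translation system works: it assumes a one-to-one word alignment between source and target, and the function name is hypothetical. The example uses the German sentence "Ich habe das Buch gelesen" ("I have read the book"), where the participle sits at the end.

```python
# Toy model of translation latency caused by word order.
# ASSUMPTION: each target word corresponds to exactly one source word.

def words_heard_before_emitting(target_order: list[int]) -> list[int]:
    """target_order[k] is the 0-based source position of the k-th target word.

    Returns, for each target word, how many source words must already have
    been heard before that target word can be output in order.
    """
    heard = []
    furthest = 0
    for src_pos in target_order:
        # Output is sequential, so the wait never shrinks once it has grown.
        furthest = max(furthest, src_pos + 1)
        heard.append(furthest)
    return heard


if __name__ == "__main__":
    source = ["Ich", "habe", "das", "Buch", "gelesen"]
    target = ["I", "have", "read", "the", "book"]
    # "read" comes from "gelesen", which is the last source word (position 4).
    order = [0, 1, 4, 2, 3]
    for word, n in zip(target, words_heard_before_emitting(order)):
        print(f"{word!r} can be emitted after hearing {n} of {len(source)} source words")
```

In this example the third English word cannot be emitted until all five German words have been heard, and everything after it inherits that wait. The longer the clause before the final verb, the longer the minimum delay.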
quis custodiet ipsos custodes
I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.
It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.
This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.
I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to...?
More opportunities for Arabic-speaking people to misinterpret western media.
I think you've got it the wrong way round, haven't you? Did you mean to say "More opportunities for English-speaking people to misinterpret Arabic media"?
Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.
What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.
I'm glad people are doing good speech research (I know I've seen a demo of good IBM TTS somewhere), but I hope it finds its way into Windows someday.
xkcd.com - a webcomic of mathematics, love, and language.
Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.
and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually: we are laughing at things Japanese viewers weren't necessarily meant to find funny, while missing half of the jokes that are supposed to be there.
Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?
Speech recognition is riddled with problems. From a computing side it's enormously processor-intensive and memory-hungry. From a software side it's very complex code, and the 'learning' process is fraught with problems: surnames, company names and locations are all very poorly recognised.
So don't rush to buy. Let the labs check it out first.
There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (i.e., phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
Slashdot anagrams to "Sad Sloth"
Just remember that *you* have a truly enormous and well-filled content-addressable memory, a huge and richly-connected semantic network, and untold numbers of self-adapting heuristics that have been trained all day every day for decades, with more coming into production constantly. It's hard for a machine to match that. Feeding 100,000 distinct pattern matchers in parallel is something most computers just aren't architected to do well. That a machine can do even a passable job of speaker-independent continuous speech recognition is an amazing achievement.
:-( We do have titling on some shows, but to compare that to Teletext is like comparing a single couplet to the poetry section of a library.
BTW what Teletext is like in the U.S. is that we don't have it.
Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.
Don't hold your breath on that. After spending seven years studying Japanese just to speak it conversationally, I can tell you flat out that there will never be on-the-fly translation between Japanese and English. Why, you ask? Because the languages and the cultures behind them are so drastically different that you often have to listen to several sentences before you can work out the correct context for words in the other language. Not to mention occasionally having to add material to the translated output to explain why a certain sequence of words means something.
For example, go watch Memoirs of a Geisha and note that Chiyo keeps calling Mameha "oneesan" (Oh-Nay-San), which literally and figuratively translates to big sister. They are not related, and it is not an affectionate reference of the kind someone might make in English to an older woman who provides protection and guidance. The term actually holds a special meaning in the Japanese world of hostessing (both Geisha and less formal settings such as snack bars) that I would find difficult to even explain in English. Good luck, IBM.
I've dirtied my hands writing poetry, for the sake of seduction; that is, for the sake of a useful cause. --Dostoevsky
Transcription? Not too hard. Translation? I highly doubt it.
Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of those, SYSTRAN, is the most used translation engine online), and that there was *no* discernible difference between engines of the eighties and current engines. I hope that they're not trying to claim that they suddenly overcame the vast problems of translation wholly independent of the linguistic community. That's just ludicrous.
I'd love to see this engine handle a parasitic-gap sentence like this one between two largely different languages and catch the nuance in the parens: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will get it right by chance, but only because the two languages happen to share a similar structure. The engine is lucky, not actually parsing the "meaning."
"Fight for lost causes. You may discover they weren't."