IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
From The article "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"
Still even at 80 percent how good is this translation. If that 20% is the important parts of speech You could still be left clueless. Even the best Machine translations of text I have seen always leaves the text a bit garbled and confusticated.
I don't know how much delay is implied in the phrase "on the fly" , but I personally don' think there could ever be real time translation for the following reason. Sentences in different languages have different sentence structures. While in English the verb is usually the second part, in other languages the verb comes many times last (German). For the translator to get the second word of a sentence, it would have to wait till the end, of what could be a long sentence. This necessarily adds delay.
quis custodiet ipsos custodes
I'm afraid this type of technology will be used as an exuse for people not to learn foreign languages, which is a shame.
It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.
This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.
I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
Hmm, instantaniously translation from arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?
More opportunities for Arabic speaking people to misinterpret western media.
I think you've got it the wrong way round haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.
What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.
I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.
xkcd.com - a webcomic of mathematics, love, and language.
Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.
Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?
Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a computer side it's very com,plex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.
So don't rush to buy. Let the labs check it out first.
There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.