IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?
I have read a lot of auto-translated documents and they are always good for a laugh, a real "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into the right context. A "fly" translated as an insect would be counted as a success!
Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to commit a ridiculous faux pas. Polish and Czech, for example, are very much alike and share common roots for many words, but because of the way the two languages evolved, some neutral terms on one side of the border have become offensive on the other. Czech developed a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...
Not necessarily. An on-the-fly translator could translate words as it hears them filling in the translated words in the correct location in the sentence. In other words, the sentence doesn't have to be completed in order. It can dynamically expand to fit in new words.
If you listen to human translators doing on-the-fly translation you'll see this is how they work.
It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.
Then again, if they supplied a version that produced awesome quality voices, they'd be accused of trying to kill their TTS competition.
That said, in Microsoft Windows Vista (ETA 2019), the default TTS engine will be replaced by a new one sporting Anna. Have heard her in the preview and I have to say, it's one hell of an improvement.
Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway. -- Andrew S. Tanenbaum
I was in Kuwait and watched Arab TV with English subtitles; it was enlightening, to say the least. One long tribute to racism paid for by the Emir of Qatar. Only on Arab TV will you see such trash as "the Jews are descended from pigs".
ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained, it's possible to achieve very decent accuracy.
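To make the "constrained vocabulary" point concrete, here is a toy sketch in Python. It is not how ViaVoice works; it only shows why a small command set helps. The command list, the `snap_to_command` name, and the cutoff are all made up for illustration, and plain string similarity from the standard `difflib` module stands in for real acoustic scoring.

```python
import difflib

# Hypothetical in-car command set (an assumption, not from any IBM product).
COMMANDS = ["radio on", "radio off", "volume up", "volume down", "call home"]

def snap_to_command(hypothesis, commands=COMMANDS, cutoff=0.6):
    """Map a noisy recognizer hypothesis to the closest allowed command.

    Returns None when nothing in the small grammar is similar enough.
    """
    matches = difflib.get_close_matches(hypothesis, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# A slightly misheard phrase still lands on a valid command,
# because there are only a handful of things it could be.
best = snap_to_command("volume dawn")
rejected = snap_to_command("xyzzy")
```

With only five legal phrases, a near miss has very few competitors, which is the whole advantage of a limited domain.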
Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system will use your voice to train its recognition models to improve its word selection. Dictation systems also ask for samples of your documents to train their language models on how you put words together; that also helps determine the probability of proper word choice. (Example of how you put words together: "Peanut butter sandwich" is a much more likely choice than "peanut butter sand," and will get a higher score.)
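For the curious, the peanut butter example can be sketched with a toy bigram language model in Python. This is a minimal illustration under my own assumptions (a four-sentence made-up corpus, add-one smoothing, hypothetical function names), not what any commercial dictation engine actually ships.

```python
from collections import Counter
import math

# Tiny made-up text standing in for the user's sample documents.
CORPUS = [
    "i ate a peanut butter sandwich for lunch",
    "she made a peanut butter sandwich",
    "the kids played in the sand",
    "peanut butter sandwich is my favorite",
]

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in the training text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_score(phrase, unigrams, bigrams):
    """Sum of log P(word | previous word), with add-one smoothing."""
    vocab_size = len(unigrams)
    words = phrase.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

unigrams, bigrams = train_bigrams(CORPUS)
sandwich = log_score("peanut butter sandwich", unigrams, bigrams)
sand = log_score("peanut butter sand", unigrams, bigrams)
```

Here `sandwich` comes out higher than `sand` because "butter sandwich" appears in the training text and "butter sand" never does. A recognizer would combine a score like this with the acoustic score to pick the final words.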
The IBM announcement is about embedded, task-oriented speech recognition. It's not "superhuman," according to the article's text and ignoring its headline. I'll have an opportunity to see it in action next week at SpeechTek West. Expect to see other product announcements about speech technology in the next few days as the conference approaches.
As for the TV translation software, it's still in the research stage according to the article. I've seen BBN's version of this software, and frankly it's amazing how good real-time translation can be.
Bell Canada deployed Emily a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity.
If you're interested in trying speech recognition and text-to-speech out for yourself, you can use Voxeo's servers, program in VoiceXML, and take my Voice Conference Manager app as a starting point (yeah, VCM needs a new release, and it's getting one soon).
IBM does admit it. They thank DARPA
and other DoD groups for their funding
in their research papers. Most of the
current funding for speech-related research
in DoD is run through the GALE project:
Global Autonomous Language Exploitation
http://www.darpa.mil/ipto/programs/gale/index.htm
Salim Roukos of IBM, whom they quote in the
article, is the main player from the IBM
side and IBM is one of the main players in
this project. They were formerly a primary
player in TIDES:
Translingual Information Detection, Extraction and Summarization
http://www.darpa.mil/ipto/programs/tides/index.ht
In fact, that site has the last link I can find on
DARPA's site about TIA (Total Information Awareness),
which is a program formerly run by ex-Admiral Poindexter
(Iran-Contra fame) and shut down by an act of Congress
(and erased from DARPA's site as if it never happened):
http://www.darpa.mil/ipto/programs/tides/accompli
These are not classified projects. You can
read about most of the techniques in the proceedings
of conferences such as ACL, ICSLP and Eurospeech.