IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
Which witch blew the blue candle out ?
Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.
...More opportunities for Arabic speaking people to misinterpret western media.
Yes, I know that this is meant to be better speech recognition, but how about on-the-fly translation?
http://michaelsmith.id.au
From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"
Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
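To put a rough number on that worry, here is a back-of-the-envelope sketch in Python. It assumes the quoted figures are per-word accuracy and that word errors are independent of each other, which is almost certainly not true of the real system; the function name and the 10-word sentence length are made up for illustration.

```python
def sentence_accuracy(word_accuracy: float, sentence_length: int) -> float:
    """Chance that every word in a sentence comes out right,
    assuming (unrealistically) that word errors are independent."""
    return word_accuracy ** sentence_length


# Compare the accuracy figures quoted in the article for a 10-word sentence.
for acc in (0.6, 0.7, 0.8):
    print(f"word accuracy {acc:.0%}: "
          f"fully correct 10-word sentence {sentence_accuracy(acc, 10):.1%}")
```

Under those assumptions, even 80 percent per word leaves only about an 11 percent chance that a 10-word sentence comes through with no errors at all.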
I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different structures. While in English the verb is usually the second part, in other languages the verb often comes last (German). For the translator to get the second word of a sentence, it would have to wait until the end of what could be a long sentence. This necessarily adds delay.
quis custodiet ipsos custodes
however the researchers stated "We still can't figure out what Bob Dylan is saying"
Seriously though, this is a great advance in technology, but will it still be as funny to listen to? It's always fun typing in words into speech recognition programs and listening to the unexpected results!
GB on TV: "We have prevailed"
Subtitle: "All your base are belongs to us"
Disclosure: I'm stupid
I cannot wait until I buy the first eBabelfish gadget that I will put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that nobody will consider it "important technology for national security" and restrict it by any means...
(I'm sure that this eBabelfish is already installed - not in my ear - but on the telecommunication centers...)
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.
Will IBM make this technology public or will it be proprietary?
I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.
It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.
This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.
I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?
...they should send it to Glasgow on a Saturday night just after the pubs have closed.
"Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll
shoo ye!"
ViaVoice was shipped with an older version of Mandrake Linux.
Anyone know where I can get this from?
Maybe IBM is going to make speech recognition a reality, but Bill Gates said that this was possible a long time ago. Simply genius.
-= If you fight Dragons long enough, you will become a Dragon =-
They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?
I have read a lot of auto-translated documents and they are always good for a laugh, a real "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into the right context. A "fly" translated as an insect would be counted as a success!
Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to make a ridiculous faux pas. The Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...
IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.
As has been the case for the past thirty years, the descriptions of the system's prowess are still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...
Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition.
One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planning too. Hence, you can't plan what to do next as well when you speak, which is a big hurdle in the type of intellectual activities one carries out with a computer.
Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.
What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.
I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.
xkcd.com - a webcomic of mathematics, love, and language.
Serious, you hear how some people "talk" these days?
-William Shatner can be neither created nor destroyed.
The xvoice team have failed to get IBM to recompile newer ViaVoice libraries, or even the same code against a more modern libc, ld.so and gcc environment making it quite hard to keep it working on newer distributions. It's also limited to ia32. They certainly don't seem likely to release the source code.
So I'm surprised to see an announcement like this one.
I realize that Americans and British (English at least ;o)) speak essentially the same language but I have yet to find any speech recognition software that can get more than roughly 85% of what I say correct. I have a fairly soft neutral English accent with pretty good enunciation so I would have expected to be getting a recognition rate in the high 90%s. I'm wondering if, as most of this software is developed in the US, it is tuned specifically to pick up on English with a US accent? I realize that you train the software for your voice but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?
I used to have a better sig but it broke.
I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.
So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.
Some other articles about this technology:
http://news.google.com/news?hl=en&ned=us&ie=UTF-8&ncl=http://www.pcmag.com/article2/0,1895,1915071,00.asp
It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.
The translation, on the other hand, sounds damned impressive. For unrestricted content, especially with an untrained voice (I imagine that IBM isn't individually training to each Al Jazeera talking head), 70% recognition sounds quite good. 70% accuracy post-translation ought to be quite a bit better than what's currently out there. The description of MASTOR, however, is useless -- it could easily describe anything that isn't word-for-word translation.
It is as closer to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.
and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually - we are laughing at things Japanese aren't always intended to find funny, while missing half of the jokes that are supposed to be there.
go on, take a wild guess.
Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?
Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a software side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.
So don't rush to buy. Let the labs check it out first.
Pronounce any word ending with 'ing' as if it ended in 'in'.
That should boost your accuracy.
This is one of those things that won't be possible with trusted computing. With encrypted audio+video streams for everything, all these cool technologies won't be able to be made. Hopefully, someone makes a program like this which goes mainstream - that ought to educate people about trusted computing as soon as they try to sneak it in.
it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.
Nintendogs: I've stopped trying to train my dog, it's never going to happen.
Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
Nokia: Even with just one voice command, my girlfriend's name, it still can't match my voice.
If this can translate foreign languages into American (sic) then it definitely sounds like it could stand a chance at translating English into text and commands.
Scared of flying, pointy things since 1979!
so, can it do a bangalore accent, maybe it can call itself for support when gets into difficulties. but then if it's real time will the onscreen subtitles just say "your call is important to us". ouch, all those poor call centers
why ? those c**ksuckers in US governement agencies can't learn arabic ?
He's dead, you insensitive clod!
Huh, kAKO TO TRANSLATE?
I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha
the training process definitely has its ups and downs. The more you work with it however, the more it becomes attenuated to your own speech patterns and moreover, the quirky words we use every day. If you can get past the first two or three hours, you'll see that it is totally worth the effort, especially if this IBM tech isn't available to end-users for some time. There is also an aspect of the software training you, while you train the software. At the present time, I can dictate to slightly slower than I can probably type.
In the end, I can see where this would make a writing e-mails and other such time-consuming tasks, which involve spellchecking, grammar, and other proof reading significantly quicker. When you really hit your stride, it's easy to write at the speed of thought, which is really appealing. There are caveats, however. it's very easy to dictate several sentences worth of tax and taken for granted that it to everything down the way you attendedselect tax select select tax undo
... how critical people have been in their replies 'till now. I mean sure there are bound to be problems with this tech, but I think what's really interesting is the implications of a mostly successful on-the-fly translation - babblefish anyone... Supposedly with fast enough computers and advanced enough programs - imagine being able to communicate with everyone in the whole **cking world.... This would have enormous consequences for everything... humanity unite - (or probably bloody warfare ...). It might be true that this would probably remove some people's motivation for learning other languages... but if you look at the world today, there are quite a lot of bi-lingual people, but how many tri-lingual and, in the extreme consequence of this tech, 500-lingual.... You could potentially communicate with bloody QuEthc-indians..... This is what I think is the real issue here - not that some subtitles might miss a joke....
In Soviet Russia my signature is reading YOU
Don't be snarky, dude; they've got bills to pay!
Although most of the discussion so far has focused on foreign language translation, this technology is about *real-time-audio-to-text* conversion. The feds will be able to monitor, analyze, and record our conversations in real time:
Monitor all conversation.
Apply real-time text filters.
Assign live agents to priority eavesdropping.
Profit!
If you could apply a filter to listen in to any call what would it be?
Be heard || Be herd
for humans, you can do it for machines.
I, for one, well come R knew cybernetic mast er, HAL 9000.
PS, whatever you do, don't open the pod bay door!
We can figure out just what the hell Ozzy Osbourne is saying!
He who knows best knows how little he knows. - Thomas Jefferson
Someone is going to be heavily depressed if they try that _superhuman_ stuff on an EMINEM song. Or maybe they would conclude that EMINEM songs are way too much better than superhuman speech?
For a human, the issue is that you can't interpret based on the phrase, so a human interpreter has quite a lot to do. The interesting thing is that experienced interpreters do this unconsciously.
I have been an admiring user of interpreters for many years now and one handled English/Japanese/Russian.
See my journal, I write things there
I was in Kuwait and watched Arab TV with English subtitles; it was enlightening to say the least. One long tribute to racism paid for by the Amir of Qatar. Only on Arab TV will you see such trash as "the jews are descended from pigs".
so the definitive test of this technology... Bob Dylan.
One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.
10 PRINT "DEATH TO AMERICA";
20 GOTO 10
RUN
But not to the extent of Japanese. I lived in Austria for a summer, and after just three months, with no prior study, I started "getting" it sometimes. On the other hand, with 2.5 years of university study and ten months of living in Japan, I often have a hard time following the logic of a long sentence - even when written and when I know all of the words.
Generally, it is estimated that it takes an English speaker about twice as long to learn a language from the Asian or Arabian groups as it does a European language.
Terrorists have learned to subvert advanced intelligence gathering mechanisms by knocking back a few drinks prior to electronic communication and slurring slightly more than appropriate.
Is it really that hard to understand Chris or George Reeves saying "Up, up and awaaaayyy!"?
So I think there should be a program to resynthesize the "learned" words into the most exact average of any given way to say it. I'd love to hear the results, that would be fascinating.
I hold very few opinions. I hold information based on observation and fact. If you wish to disagree, please use facts.
ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained it's possible to achieve very decent accuracy.
Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system will use your voice to train its recognition models to improve its word selection. Dictation systems also ask for samples of your documents to train their language models on how you put words together; that also helps determine the probability of proper word choice. (Example of how you put words together: "Peanut butter sandwich" is a much more likely choice than "peanut butter sand," and will get a higher score.)
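For anyone curious what "a higher score" might look like in practice, here is a toy sketch in Python. It is a simple bigram model with add-one smoothing trained on three made-up sentences, not how ViaVoice or any commercial product actually does it; the corpus, the function name and the smoothing choice are all assumptions for illustration.

```python
import math
from collections import Counter

# Tiny made-up "user documents" standing in for language model training text.
corpus = [
    "i ate a peanut butter sandwich",
    "she made a peanut butter sandwich",
    "the kids played in the sand",
]

unigrams = Counter()
bigrams = Counter()
for sentence in corpus:
    words = sentence.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

vocab_size = len(unigrams)


def log_score(phrase: str) -> float:
    """Sum of log P(word | previous word), with add-one smoothing
    so that unseen word pairs get a small but non-zero probability."""
    words = phrase.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total


for candidate in ("peanut butter sandwich", "peanut butter sand"):
    print(candidate, round(log_score(candidate), 3))
```

Because "butter sandwich" appears in the training text and "butter sand" never does, the first candidate gets the higher (less negative) score, so a recognizer choosing between two acoustically similar guesses would prefer it.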
The IBM announcement is about embedded, task-oriented speech recognition. It's not "superhuman," according to the article's text and ignoring its headline. I'll have an opportunity to see it in action next week at SpeechTek West. Expect to see other product announcements about speech technology in the next few days as the conference approaches.
As for the TV translation software, it's still in the research stage according to the article. I've seen BBN's version of this software, and frankly it's amazing how good real-time translation can be.
Bell Canada deployed Emily a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity.
If you're interested in trying speech recognition and text-to-speech out for yourself, you can use Voxeo's servers, program in VoiceXML, and my Voice Conference Manager app as a starting point (yeah, VCM needs a new release, and it's getting one soon).
Well, sincerely, not very sorry. ;-P
>> [German]... is as closer to English as any other language.
No, can't agree: English is seen as a Germanic language, almost as close to it as, e.g., Portuguese and Spanish (their several forms excluding Basque), but not so close as, say, Dutch and German.
>> In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate.
Ok, agree, as most are Indo-European (excluding notable exceptions like Finnish and Basque and maybe disregarding Slavic branches).
>> Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English.
No big deal. It's always subject+object+verb with only the order changing (notable exceptions were American native tribes, which seemingly used only verbs) . We should always wait for the end of a phrase to understand the whole thing anyway.
>> To translate, you are essentially running the whole thing backwards.
Yep, as the French put it: "C'est la vie" (that's "So what?" without the attitude thing).
>> Worse yet, the fundamental parts of the language are quite different.
Whoa, whoa, whoa... waiddaminute here!
>> For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond.
Since when is this a fundamental part??? You know, there's the stuff and then there are complements. Just an opinion, I think we're firmly in "complements" terrain now.
>> However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes).
Yes, some changes in meaning are important in the overall picture, which more or less prove your point. Or, in other words, spices are important, not just the main stuff.
>> There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.
Sorry to make your world a little sadder, but this is not only a language problem. It's a cultural one. For instance, Japanese often attaches "respect" particles (like in nissan/nechan/onechan). You catch yourself imparting meanings you might not in other languages. My native language (Portuguese), for instance, has a diminutive suffix -- -inha (female) or -inho (male) -- which sometimes means "little", sometimes expresses affection, sometimes contempt, sometimes an appraisal... and it goes on...
Translation _is_ difficult because you really don't translate words or phrases (symbols, seen or heard) but semantic concepts (somebody help me here with the right technical word).
This means purely symbolic methods (like "which word is equivalent to" or "search and replace") are bound to fail.
...our speech-enabled Web browsers for mobile devices and set top boxes. More info on them here: http://ibm.com/pvc/multimodal
;-)
Not only do they allow you to navigate by voice, but using X+V (a blend of XHTML and VoiceXML), you could have fully speech-enabled Web apps. Example: "show me nearby sushi restaurants" or "movie schedules in my area".
We also released our Multimodal Tools Project for Eclipse a couple weeks ago: http://alphaworks.ibm.com/tech/mmtp
Go ahead and play.
When and if it can translate poems from language to language, while keeping the style, the nuances, the rhythm, the cultural references, the general idea and the details, then we will know - it is done. Until then, don't hold your breath.
You can't handle the truth.
What a boon this will be to those anime fansub groups who can't find decent translators, or at least translators who aren't overworked.
By the taping of my glasses, something geeky this way passes
How much do you want to bet that IBM sells this technology to the Middle East? So the terrorists won't have to learn English, they can get all our news real time, on the fly. The potential is that this will help in the war against terror. Allowing the other side to have it, would make it almost a moot point. It's just like Microsoft, google, et al, enabling China's human rights abuses because they want to make a buck.
I've been hearing this every 6 months for about the last, oh, thirty years.
Given that the state of the art in something much simpler, like automatic language translation, is pitifully inadequate, how likely is it IBM has conquered speech recognition AND translation?
Har har har.
this of course worries secretaries, since they might eventually lose their job/"career". on the other hand it would improve efficiency *a lot*.
There's nothing too profound behind this sig.
My Dixie Wrecked
It's stuff like this that really makes you wonder what life will be like in another 50 years. Even 15 years, for that matter. Gotta love computers.
Maybe he is an Arabic speaker who misinterpreted Slashdot without IBM's help.
This is a fantastic development. It is exactly the kind of thing that 64-bit processors were made for. It is the 'killer app', the best since MP3 and CD-rippers. If it actually works, that is: the high-tech equivalent of 'in-shaa Allah'.
We should encourage IBM to allow enough of the technology to 'escape' in order to enable other languages to be translated from speech into English. There should be some kind of open review of the translation involved, also. This can help prevent subtle errors in translation that will arise. Hopefully we can catch these before they get widespread.
Perhaps we should also remember the ancient parable of the Tower of Babel. This is a story from about 3000 years ago where a united monolingual people tried to pool all of their resources and build a tower to reach God. God, not wishing to have so many freeloaders and boors hanging around eating his food, drinking his liquor, dipping into his stash, and impregnating his angels, cast an environmental change over all the people that split them into many, many groups that spoke mutually incomprehensible languages. Perhaps this is an ancient folk explanation of how different languages came to be; perhaps it is a veiled warning about the consequences that can arise from having everyone speaking the same language.
In any event, kudos to IBM. Keep up the good work.
This is old news, check this article from 2000:
http://www.satirewire.com/news/0009/satire-voice.shtml
(yes, it's a classic)
I can wreck a nice beach. I can recognize speech.
Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.
I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.
It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.
Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.
Which which blew the blue candle. Failed on the second "which" the b*tch.
Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.
One trouble. You do have to put the mike to sleep during family discussions.
"No fear. No envy. No meanness." Liam Clancy
My LG 8100 has an excellent speech recognition capability. Say the person's name and the phone will dial. It is correct over 85% of the time. Impressive technology. Even a name like Zecchini is recognized.
...perpetually monitors Arabic television...
Sounds like the results of a DOD/DARPA/NSA funded research grant. They'd love to be able to translate on the fly, instead of having to train and pay actual humans to manually translate several hours -- or even days and weeks -- after the original transmission.
Now that IBM has something kinda working and the grant money is running out they are trying to market it to the public. Kinda like Tang for the War on Terror-age.
obviously no deficiencies vs. no obvious deficiencies
I, for one, welcome our 'Superhuman' Speech Tech equipped Arabic terrorist overlords who will create havoc in-flight (by asking "Who's your daddy" to flight attendants), scream out "your mum" to the public from parks, and much more, only possible with a speech recognition system "that can even comprehend the nuances of spoken words"TM
No matter how hard I try, TTS always sounds horrible. Just that same robotic, metallic voice saying "Would you like to play a game?"
I've always found it most entertaining to check the effects reciting Lewis Carroll's Jabberwocky has on any new/exciting speech reco program.
On a more serious note, however, my wife was involved in an ill-fated-due-to-ancient-technology project back in grad school in the early 70's which involved:
1. Speech recognition.
2. Machine translation into a universal grammar
3. Translation of the universal grammar into various target languages.
4. Speech synthesis in the various target languages, using the same vocal qualities as the original speaker.
Pretty lofty goals considering they were probably using computers with discrete components in them.
Curiously, my wife (a native Japanese speaker) was teamed with the Suomi (Finnish) team because of the similarities in the two languages' structures.
Life is tough. Life is even tougher when you're stupid.
My father used to have an older version of ViaVoice. My god, that thing was awful. Just for fun, I said "Beat up Martin" to it and it typed back "Be a martian." Well, better than "Eat up Martha."
A wise man once said, "wtf h4x."
This is the *PERFECT* use for a technology like this! :)
"boy, I sure hope my stupid radio.. doesn't... uh... play 92.3"
vs,
"Does your radio suck? boy I sure hope my stupid radio doesn't. Uh, play 92.3"
"Is this just useless, or is it expensive as well?"
The article is really saying two things:
1. IBM has updated their ViaVoice large vocabulary continuous speech recognition (LVCSR) engine.
2. IBM has paired ViaVoice with some clever apps to use the ViaVoice output in interesting ways (e.g. "on the fly" recognition, translation).
Things that are not obvious from the article:
1. ViaVoice has been around for ages and has always been pretty darn good at LVCSR. Without seeing numbers and knowing exactly how they were measured, it's impossible to know how much of an improvement 4.4 is over previous versions.
2. Speaker-dependent speech recognition can always achieve much higher accuracy rates than speaker-independent systems like ViaVoice. Dragon NaturallySpeaking is an example of speaker-dependent speech recognition.
3. Limited grammatical contexts (i.e. language models with low perplexity) always give better recognition than when you don't know what to expect next. For example, when your phone only has to tell "home" and "wife" apart, it's a lot less likely to make a mistake than if it has to figure out which word out of a list of 50,000 you just said. The more context, the better. The most interesting tech in the article seems to be the algorithms "that can determine this context on the fly."
4. No improvements in translation technology were noted in the article; it sounds like they might as well have fed ViaVoice through BabelFish, made it happen in real time, and slapped a UI on it. The app might be new, but the tech is not.
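To put a rough number on point 3: perplexity is 2 raised to the entropy (in bits) of the next-word distribution, so it works out to the "effective number of choices" the recognizer faces. A minimal sketch, assuming every word is equally likely (a simplification; real language models are not uniform) and using made-up helper names:

```python
import math

def perplexity(probs):
    """Perplexity of a next-word distribution: 2 ** entropy (in bits)."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2 ** entropy

def uniform(n):
    """Assumption for illustration: all n words are equally likely."""
    return [1.0 / n] * n

phone_grammar = uniform(2)        # "home" vs. "wife"
open_dictation = uniform(50_000)  # any word out of a 50,000-word list
with_context = [0.9, 0.1]         # context makes one of two words far likelier

print(perplexity(phone_grammar))   # two effective choices
print(perplexity(open_dictation))  # roughly fifty thousand effective choices
print(perplexity(with_context))    # fewer than two effective choices
```

The skewed case is the interesting one: once context makes some words much more likely than others, the effective number of choices drops even though the vocabulary stays the same size.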
"I had to help my uncle Jack off a horse."
"I had to help my uncle jack off a horse."
Will it ever catch that one?
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
I have been reading up on literacy. One fascinating point is that written language (at least English) is many times richer in vocabulary and structure than spoken language, as heard on (for example) television.
If this holds true for other languages, it may be easier to translate spoken material.
Still, I want to see the result.
Ratboy.
Just another "Cubible(sic) Joe" 2 17 3061
IBM does admit it. They thank DARPA
m
s hments.htm
and other DoD groups for their funding
in their research papers. Most of the
current funding for speech-related research
in DoD is run through the GALE project:
Global Autonomous Language Exploitation
http://www.darpa.mil/ipto/programs/gale/index.htm
Salim Roukos of IBM, whom they quote in the
article, is the main player from the IBM
side and IBM is one of the main players in
this project. They were formerly a primary
player in TIDES:
Translingual Information Detection, Extraction and Summarization
http://www.darpa.mil/ipto/programs/tides/index.ht
In fact, that site has the last link I can find on
DARPA's site about TIA (Total Information Awareness),
which is a program formerly run by ex-Admiral Poindexter
(Iran-Contra fame) and shut down by an act of Congress
(and erased from DARPA's site as if it never happened):
http://www.darpa.mil/ipto/programs/tides/accompli
These are not classified projects. You can
read about most of the techniques in the proceedings
of conferences such as ACL, ICSLP and Eurospeech.
I helped apple wreck a beach!
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
I would be very suspicious of any software making these sorts of claims. The proof is in the pudding, right? When I see an independent review, I might start believing the claims. Another poster was dead on when saying that there are a LOT of over-hyped, under-abled products in the speech recognition field. What if some software actually is 90% accurate? What would you think of someone who mis-typed every 10th word? And many products that make 90% or similar claims are doing so under *highly* idealized conditions which do *not* reflect real-life situations, so that the real-life number might be significantly lower.
IMHO, the largest problem in the speech recognition field is the weak recognition that it is SPEECH that is being recognized. Human speech has some interesting qualities which have been studied for years in a field called "Linguistics". Start bringing linguistic axioms to speech recognition and you will eventually come up with something worthwhile.
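For anyone wondering what a "90% accurate" figure even measures: the usual yardstick is word error rate, i.e. the word-level edit distance between what was said and what was typed, divided by the number of words said. A minimal sketch, not any vendor's actual scoring method; the function names are made up, and the second helper assumes errors hit each word independently, which real recognizers do not guarantee:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

def chance_sentence_is_clean(word_accuracy, n_words):
    """Assumes independent per-word errors (an idealization)."""
    return word_accuracy ** n_words

# The ViaVoice anecdote from earlier in the thread: every word comes out wrong.
print(word_error_rate("Beat up Martin", "Be a martian"))
# At 90% per-word accuracy, how often does a 10-word sentence come out clean?
print(chance_sentence_is_clean(0.9, 10))
```

Under that independence assumption, a 10-word sentence comes through untouched only about a third of the time at 90% word accuracy, which is why the headline number sounds better than the experience feels.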
I was really in doubt about the Slavic ones.
This must be the day of the week that scams are announced.
First we have software that cannot be reverse engineered and guarantees the free speech rights of Americans.
It comes attached to the Brooklyn Bridge and some Florida swamp land.
Now we have this crap: "By limiting the domain, the system can make assumptions or inferences about what the user would like to accomplish, he said."
This is not exactly "superhuman" speech recognition.
None of this is feasible absent conceptual processing technology. Period.
I don't know why I don't clean up at the public trough by simply announcing I have "true artificial intelligence" and wait for the checks to roll in before leaving for Brazil.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
We have been testing out IBM's recognizer against our own (we have been using a Sphinx variant). And so far IBM's results are quite fantastic. Using untuned models against our very noisy data IBM has shown impressive speed and accuracy.
"Brothers and sisters are natural enemies. Like Englishmen and Scots. Or Welshmen and Scots. Or Japanese and Scots. Or Scots and other Scots. Damn Scots! They ruined Scotland!"
Sorry.
So... Why is Nuance continuing to spank IBM so badly in this space, what with your Superhuman Speech Reco and everything?
Transcription? Not too hard. Translation? I highly doubt it.
Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of them, Systran, is the most used translation engine online), and there was *no* discernible difference between engines of the eighties and current engines. I hope that they're not trying to claim that they suddenly overcame the vast problems of translation wholly independent of the linguistic community. That's just ludicrous.
I'd love to see this engine handle a parasitic sentence like this between two largely different languages and catch the nuance in the parens: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will get it right, but only because the two languages happen to share a similar structure; the engine is lucky, not actually parsing the "meaning."
"Fight for lost causes. You may discover they weren't."
Try watching live sports with the captioning on and the sound off. Live captioning is pretty good for certain things, but it sucks for sports (lots of names, fast paced, etc). I'd estimate typical sports captioning at 80%. I think you'll find that you can generally get the gist of what is being said by the color commentator.
That's why it's pronounced nuk-ya-lar.
...because "hacker" sounds way sexier than "code drone."
if we all spoke Lojban.
"It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson
Not for translation, but for just speech recognition? ScanSoft Dragon NaturallySpeaking 8 Preferred is in the top 30 for software sales through Amazon, so obviously some people find this useful.
I just love the example[1] the IBM marketroids chose for this: "For example, when asking for 'Radio 104.3 FM,' the new IBM-pioneered technology allows drivers to simply say, 'Tune to 104.3,' or 'Set the radio station to 104.3,' or 'Change the radio station to 104.3.'" Of all the amazing applications one could dream up, saving a driver from having to punch a radio preset is what they came up with.
1 50.wss
I rather like "Open the pod bay door, Hal" myself.
--
1. http://www-03.ibm.com/press/us/en/pressrelease/19
.plan: file not found
Anyway, QED for D8s shortcomings... and mine. d:-b
"No fear. No envy. No meanness." Liam Clancy
It sounds like you were willing to put in the time to get the good of this program. I don't know if they sell an evaluation copy. You might find a boxed used version of DS8 on eBay, since so many do not have the patience you showed to use speech to text. On the other hand, IBM has been in this game for decades. Dragon beat them for a while (in my opinion), but this new software seems pretty unique. You might want to hold out for that. In any case this tech is maturing. There is hope.
I concur that it is a hog for resources.
"No fear. No envy. No meanness." Liam Clancy
No big deal. It's always subject+object+verb...
Actually, word order in Japanese is quite flexible, especially spoken. Subjects are often dropped, or sometimes tagged onto the end of the sentence as an afterthought. That is only part of it, however - not just the sentence, but its constituent parts are backwards. Dependent phrases are often in the opposite order, if/when words come at the end of the phrase rather than the beginning, descriptive clauses before the noun rather than after, words equivalent to "to" or "from" come after rather than before, negations come after the verb rather than before. About the only things that do match English are adjective before noun and subject at the beginning (if it is there at all).
Since when is this a fundamental part??? You know, there's the stuff and then there are complements. Just an opinion, I think we're firmly in "complements" terrain now
Semantically, these are the most important parts. Words like "car" "blue" and "to drive" are easy to translate. English articles and prepositions, and Japanese particles, are the words that define the relationships between the nouns, verbs, and adjectives in the sentence. These words are by far the hardest to translate and most difficult for a human to learn to use properly. When I correct papers for my Japanese colleagues, do they mess up terms like "high molecular weight polymer" or "oxygen dissolution"? Nope. They mess up a, an, and the. Same holds for me when I try to speak Japanese. I get the little words wrong and sometimes say something far different from what I intended.
Sorry to make your world a little sadder, but this is not only a language problem. It's a cultural one.
Yes, these sentiments are almost impossible to translate. English simply does not have a mechanism for it. The reverse also holds true in many situations.
Translation _is_ difficult because you really don't translate words or phrases (symbols, seen or heard) but semantic concepts (somebody help me here with the right technical word). This means purely symbolic methods (like "which word is equivalent to" or "search and replace") are bound to fail.
I agree. I think the statistical methods that are currently popular for machine translation will never get past the barely-understandable level. To do that, you have to have context and meaning. Computers are a long way from that point.
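To make the "search and replace" point concrete, here is a toy word-for-word substitution run on a verb-final German sentence. A minimal sketch only: the five-word glossary and the function name are made up for illustration, and no real translation engine is this naive.

```python
# Hypothetical toy glossary, just enough for one sentence.
GLOSSARY = {
    "ich": "I",
    "habe": "have",
    "den": "the",
    "hund": "dog",
    "gesehen": "seen",
}

def word_for_word(sentence, glossary):
    """Replace each word with its glossary entry, keeping the source word order."""
    return " ".join(glossary.get(word.lower(), word) for word in sentence.split())

print(word_for_word("Ich habe den Hund gesehen", GLOSSARY))
# prints "I have the dog seen"
```

Every individual word is "correct", yet the result is not English ("I have seen the dog"), because the substitution never touches word order, let alone meaning.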