IBM Strives For 'Superhuman' Speech Tech
robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."
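The summary says the transcripts are "indexed and searchable". The article doesn't say how IBM does this. A common textbook approach is an inverted index that maps each word to the places where it occurs. The sketch below assumes transcripts arrive as a hypothetical `{video_id: [segment_text, ...]}` mapping and supports simple all-words (AND) queries. `tokens`, `build_index` and `search` are illustrative names, not part of any IBM product.

```python
from collections import defaultdict

PUNCTUATION = ".,!?;:\"'"


def tokens(text):
    """Yield lowercase words with surrounding punctuation stripped."""
    for raw in text.lower().split():
        word = raw.strip(PUNCTUATION)
        if word:
            yield word


def build_index(transcripts):
    """Map each word to the set of (video_id, segment_no) pairs where it occurs.

    `transcripts` is assumed to look like {video_id: [segment_text, ...]};
    this layout is hypothetical, chosen only for illustration.
    """
    index = defaultdict(set)
    for video_id, segments in transcripts.items():
        for seg_no, text in enumerate(segments):
            for word in tokens(text):
                index[word].add((video_id, seg_no))
    return index


def search(index, query):
    """Return the segments that contain every word of the query (boolean AND)."""
    words = list(tokens(query))
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


transcripts = {
    "clip1": ["The minister spoke about oil prices.", "Markets reacted calmly."],
    "clip2": ["Oil exports rose this quarter."],
}
index = build_index(transcripts)
print(sorted(search(index, "oil")))         # [('clip1', 0), ('clip2', 0)]
print(sorted(search(index, "oil prices")))  # [('clip1', 0)]
```

A real system would also have to cope with recognition errors in the transcripts themselves, which a plain word index does nothing about.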
Which witch blew the blue candle out?
Fry: heh, Yakov Smirnoff said it
Leela: No he didn't.
From The article "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"
Still, even at 80 percent, how good is this translation? If that 20% contains the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
I don't know how much delay is implied by the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different structures. While in English the verb usually comes second, in other languages (German, for example) the verb often comes last. To produce the second word of the English sentence, the translator would have to wait until the end of what could be a long sentence. This necessarily adds delay.
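The delay argument above can be made concrete with a toy model. Assume, purely for illustration, that each sentence role (subject, verb, and so on) is a single word and that the translator speaks target words strictly in order. Then each target word can only be emitted once its source word has been heard. The role labels and the function name `emission_lags` are hypothetical; this is not how IBM's system works, just a sketch of the buffering cost.

```python
def emission_lags(source_roles, target_order):
    """For each word of the target sentence, return (role, n): n is how many
    source words must have been heard before that target word can be spoken.

    Toy model (an assumption): each role appears once, and the translator
    speaks target words strictly in order.
    """
    position = {role: i for i, role in enumerate(source_roles)}
    lags = []
    heard = 0
    for role in target_order:
        # Can't speak this word until its source word has arrived, nor before
        # the target words that precede it.
        heard = max(heard, position[role] + 1)
        lags.append((role, heard))
    return lags


# Verb-final source clause (as in a German subordinate clause), English SVO target.
german_like = ["subject", "object", "time", "verb"]
english = ["subject", "verb", "object", "time"]
for role, n in emission_lags(german_like, english):
    print(f"{role:8} needs {n} of {len(german_like)} source words")
```

The English verb, only the second word of the output, cannot be spoken until all four source words are in; with a longer object phrase the wait grows accordingly.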
quis custodiet ipsos custodes
however the researchers stated "We still can't figure out what Bob Dylan is saying"
Seriously though, this is a great advance in technology, but will it still be as funny to listen to? It's always fun typing words into speech recognition programs and listening to the unexpected results!
GB on TV: "We have prevailed"
Subtitle: "All your base are belongs to us"
Disclosure: I'm stupid
I can't wait until I buy the first eBabelfish gadget that I can put in my ear so I can understand the spoken language of my Russian colleagues... ;-) :-) I hope that somebody won't consider it "important technology for national security" and restrict it by any means...
(I'm sure that this eBabelfish is already installed - not in my ear - but in the telecommunication centers...)
Well, I've got to get back to work. When I stop rowing, the slave ship just goes in circles.
Will IBM make this technology public or will it be proprietary?
I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.
It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.
This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.
I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.
Hmm, instantaneous translation from Arabic, wonder who (cough cough Echelon cough!) they are marketing this to...?
More opportunities for Arabic speaking people to misinterpret western media.
I think you've got it the wrong way round haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?
...they should send it to Glasgow on a Saturday night just after the pubs have closed.
"Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll shoo ye!"
ViaVoice was shipped with an older version of Mandrake Linux.
Anyone know where I can get this from?
Maybe IBM is going to make speech recognition a reality, but Bill Gates said this was possible a long time ago. Simply genius.
-= If you fight Dragons long enough, you will become a Dragon =-
They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?
I have read a lot of auto-translated documents and they are always good for a laugh, a real "crapslation cabaret". So far, no technology can auto-translate a text document successfully. The "80% success" is a myth: they just count how many words were found in the vocabulary, not how many of them were put into the right context. A "fly" translated as an insect would be counted as a success!
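For recognition (as opposed to translation), accuracy is conventionally reported as 1 minus the word error rate (WER): substitutions, deletions and insertions counted over a word-level alignment with the reference transcript, divided by the number of reference words. The article doesn't say how IBM measured its 60 to 70 percent, so this is only a sketch of the standard metric, not IBM's method. It shows why a "found in the vocabulary" count would be misleading: every word in "i can wreck a nice beach" is a real dictionary word, yet against "i can recognize speech" the WER is 100%.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words,
    using a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return d[len(ref)][len(hyp)] / len(ref)


ref = "i can recognize speech"
hyp = "i can wreck a nice beach"
print(word_error_rate(ref, hyp))  # 1.0
```

Note that WER can exceed 100% when the recognizer inserts many extra words, and that it says nothing about whether a correctly recognized "fly" was later translated in the right sense.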
Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just make a ridiculous faux pas. Polish and Czech, for example, are very much alike and share roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...
IBM has been one of the pioneers in speech recognition for a long time. However, indications are that Google (in the lab) has been making tremendous progress in translation. While the two companies are bound to be fierce competitors, it would seem they would both have much to gain from cooperation in the area of language recognition and translation.
As has been the case for the past thirty years, the system's prowess is still described in the conditional form ("...IBM technology can be used to control computers and devices...") rather than the active form ("is being used")...
Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition.
One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, producing speech uses some of the same brain modules that planning requires. Hence, you can't plan what to do next as well while you speak, which is a big hurdle for the kind of intellectual activity one carries out with a computer.
Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.
What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.
This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.
I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.
xkcd.com - a webcomic of mathematics, love, and language.
Seriously, have you heard how some people "talk" these days?
-William Shatner can be neither created nor destroyed.
Yeah, that too.
http://michaelsmith.id.au
The xvoice team have failed to get IBM to recompile newer ViaVoice libraries, or even the same code against a more modern libc, ld.so and gcc environment making it quite hard to keep it working on newer distributions. It's also limited to ia32. They certainly don't seem likely to release the source code.
So I'm surprised to see an announcement like this one.
I realize that Americans and British (English at least ;o)) speak essentially the same language, but I have yet to find any speech recognition software that gets more than roughly 85% of what I say right. I have a fairly soft, neutral English accent with pretty good enunciation, so I would have expected a recognition rate in the high 90s. I'm wondering whether, as most of this software is developed in the US, it is tuned specifically for English with a US accent? I realize that you train the software for your voice, but AIUI all you are doing is tuning a basic speech model. Has anyone else had this problem or is it just me?
I used to have a better sig but it broke.
I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.
So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.
It's been well-known among language researchers that both speech recognition and parsing/comprehension are much easier when applied to a small problem domain. SRI in Palo Alto and CSLI at Stanford, for example, have a number of very impressive speech recognition packages that understand, for example, medicine-related sentences. The dashboard controls just sound like a logical progression of this to faster computers and an even smaller problem domain. They're cool nonetheless.
The translation, on the other hand, sounds damned impressive. For unrestricted content, especially with an untrained voice (I imagine that IBM isn't individually training to each Al Jazeera talking head), 70% recognition sounds quite good. 70% accuracy post-translation ought to be quite a bit better than what's currently out there. The description of MASTOR, however, is useless -- it could easily describe anything that isn't word-for-word translation.
It is as close to English as any other language. In general, European languages share the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese has no articles or prepositions, though it has postpositions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times when there is just no simple way to express a thought in one language that is trivial in the other.
and it is usually extremely difficult to translate jokes. The senses of humor are quite different as well. I think this is part of the charm of anime, actually - we are laughing at things Japanese aren't always intended to find funny, while missing half of the jokes that are supposed to be there.
Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?
Speech recognition is riddled with problems. From a computing side it's enormously processor-intensive and memory-hungry. From a software side it's very complex code, and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.
So don't rush to buy. Let the labs check it out first.
There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
This is one of those things that won't be possible with trusted computing. With encrypted audio+video streams for everything, all these cool technologies won't be able to be made. Hopefully, someone makes a program like this which goes mainstream - that ought to educate people about trusted computing as soon as they try to sneak it in.
it does what the current generation of speech recognition claims to do. I have yet to find any dictation software that is even remotely accurate, and the voice command software has been pap, at least for me. There is something about my accent that really upsets speech recognition software.
Nintendogs: I've stopped trying to train my dog; it's never going to happen.
Apple Speech: Only works if I use a terrible Californian accent. Not worth the embarrassment.
Nokia: Even with just one voice command, my girlfriend's name, it still can't match my voice.
If this can translate foreign languages into American (sic), then it definitely sounds like it could stand a chance at translating English into text and commands.
Scared of flying, pointy things since 1979!
I've actually never used any speech recognition software before today. That said, today just happens to be the day. That said, I tried out Dragon NaturallySpeaking for the first time, and it is a complete coincidence that this topic should come up. I'm actually dictating this post with Dragon, as we speak. ha ha
the training process definitely has its ups and downs. The more you work with it however, the more it becomes attenuated to your own speech patterns and moreover, the quirky words we use every day. If you can get past the first two or three hours, you'll see that it is totally worth the effort, especially if this IBM tech isn't available to end-users for some time. There is also an aspect of the software training you, while you train the software. At the present time, I can dictate to slightly slower than I can probably type.
In the end, I can see where this would make a writing e-mails and other such time-consuming tasks, which involve spellchecking, grammar, and other proof reading significantly quicker. When you really hit your stride, it's easy to write at the speed of thought, which is really appealing. There are caveats, however. it's very easy to dictate several sentences worth of tax and taken for granted that it to everything down the way you attendedselect tax select select tax undo
I gave up on speech recognition as anything but a toy a while ago, but your tip could lead to some interesting mistakes. Take, for instance, the sentence fragment "Running to the door". If it is pronounced as you suggest, it could easily be misheard by the machine as "run in to the door", which could have nasty consequences.
... how critical people have been in their replies until now. I mean, sure, there are bound to be problems with this tech, but I think what's really interesting is the implications of mostly successful on-the-fly translation - Babelfish, anyone... With fast enough computers and advanced enough programs, imagine being able to communicate with everyone in the whole **cking world.... This would have enormous consequences for everything... humanity unite (or probably bloody warfare...). It might be true that this would remove some people's motivation for learning other languages... but if you look at the world today, there are quite a lot of bilingual people, but how many trilingual? Taken to the extreme, this tech would make you 500-lingual.... You could potentially communicate with bloody QuEthc-indians..... This is what I think is the real issue here - not that some subtitles might miss a joke....
In Soviet Russia my signature is reading YOU
Although most of the discussion so far has focused on foreign language translation, this technology is about *real-time-audio-to-text* conversion. The feds will be able to monitor, analyze, and record our conversations in real time:
Monitor all conversation.
Apply real-time text filters.
Assign live agents to priority eavesdropping.
Profit!
If you could apply a filter to listen in to any call what would it be?
Be heard || Be herd
We can figure out just what the hell Ozzy Osbourne is saying!
He who knows best knows how little he knows. - Thomas Jefferson
For a human, the issue is that you can't interpret based on the phrase, so a human interpreter has quite a lot to do. The interesting thing is that experienced interpreters do this unconsciously.
I have been an admiring user of interpreters for many years now; one of them handled English, Japanese and Russian.
See my journal, I write things there
I was in Kuwait and watched Arab TV with English subtitles; it was enlightening, to say the least. One long tribute to racism paid for by the Emir of Qatar. Only on Arab TV will you see such trash as "the jews are descended from pigs".
One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles.
10 PRINT "DEATH TO AMERICA";
20 GOTO 10
RUN
But not to the extent of Japanese. I lived in Austria for a summer, and after just three months, with no prior study, I started "getting" it sometimes. On the other hand, after 2.5 years of university study and ten months of living in Japan, I often have a hard time following the logic of a long sentence - even when it's written and I know all of the words.
Generally, it is estimated that it takes an English speaker about twice as long to learn a language from the Asian or Arabic groups as it does a European language.
Is it really that hard to understand Chris or George Reeves saying "Up, up and awaaaayyy!"?
So I think there should be a program to resynthesize the "learned" words into the most exact average of any given way to say it. I'd love to hear the results, that would be fascinating.
I hold very few opinions. I hold information based on observation and fact. If you wish to disagree, please use facts.
You assume that this sort of thing hasn't been going on for many years now.
ViaVoice Embedded, the product that they're releasing, works on limited-domain problems: for example, tasks related to control of your car's peripherals. When the vocabulary and grammars are constrained it's possible to achieve very decent accuracy.
Dictation, however, is a completely different problem. There are far fewer constraints on what can be said, and the system makes errors as it picks through the possible choices. As a result, most dictation software requires training: the system uses your voice to train its recognition models to improve its word selection. Dictation systems also ask for samples of your documents to train their language models on how you put words together; that also helps determine the probability of the proper word choice. (Example of how you put words together: "peanut butter sandwich" is a much more likely choice than "peanut butter sand," and will get a higher score.)
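The "peanut butter sandwich" scoring described above can be illustrated with a toy bigram language model. This is a minimal sketch under stated assumptions: the four training sentences are made up to stand in for "samples of your documents", and add-one (Laplace) smoothing is used for simplicity. Commercial dictation engines use much larger corpora, better smoothing and proper handling of unknown words; `train_bigram` and `log_score` are hypothetical names.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count word pairs in a (toy) training corpus."""
    contexts, pairs = Counter(), Counter()
    vocab = {"</s>"}
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words[1:])
        contexts.update(words[:-1])          # how often each word is followed by something
        pairs.update(zip(words, words[1:]))  # (previous word, next word) counts
    return contexts, pairs, len(vocab)


def log_score(sentence, model):
    """Log-probability of a sentence under the bigram model, with add-one
    smoothing so unseen pairs get a small non-zero probability."""
    contexts, pairs, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((pairs[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(words, words[1:])
    )


# Made-up "documents" standing in for the samples a dictation system asks for.
corpus = [
    "I made a peanut butter sandwich",
    "she ate a peanut butter sandwich",
    "we walked on the sand",
    "peanut butter is sticky",
]
model = train_bigram(corpus)
for candidate in ["peanut butter sandwich", "peanut butter sand"]:
    print(f"{candidate!r}: {log_score(candidate, model):.2f}")
```

Both candidates contain only words the model has seen, so the higher score for "sandwich" comes purely from context: "butter" is followed by "sandwich" twice in the corpus and never by "sand".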
The IBM announcement is about embedded, task-oriented speech recognition. It's not "superhuman," according to the article's text and ignoring its headline. I'll have an opportunity to see it in action next week at SpeechTek West. Expect to see other product announcements about speech technology in the next few days as the conference approaches.
As for the TV translation software, it's still in the research stage according to the article. I've seen BBN's version of this software, and frankly it's amazing how good real-time translation can be.
Bell Canada deployed Emily a few years back, and the results to date have been excellent. A top-level question of "How can I help you?" replaces several layers of DTMF auto-attendant complexity.
If you're interested in trying speech recognition and text-to-speech out for yourself, you can use Voxeo's servers, program in VoiceXML, and my Voice Conference Manager app as a starting point (yeah, VCM needs a new release, and it's getting one soon).
...our speech-enabled Web browsers for mobile devices and set top boxes. More info on them here: http://ibm.com/pvc/multimodal
;-)
Not only do they allow you to navigate by voice, but using X+V (a blend of XHTML and VoiceXML), you could have fully speech-enabled Web apps. Example: "show me nearby sushi restaurants" or "movie schedules in my area".
We also released our Multimodal Tools Project for Eclipse a couple weeks ago: http://alphaworks.ibm.com/tech/mmtp
Go ahead and play.
When and if it can translate poems from language to language, while keeping the style, the nuances, the rhythm, the cultural references, the general idea and the details, then we will know - it is done. Until then, don't hold your breath.
You can't handle the truth.
Pah. English-speaking people never misinterpret Arabic media. al-Jazeera is a terrorist front organisation and ought to be bombed, and that's all there is to it!
Real Daleks don't climb stairs - they level the building.
Patriotic. What part of "*International* Business Machines" did you not understand? More likely it's to show that they really understand the problem and not just the English-only subset.
What a boon this will be to those anime fansub groups who can't find decent translators, or at least translators who aren't overworked.
By the taping of my glasses, something geeky this way passes
I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (ie, phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.
And more importantly (for them), no pesky staff translators with a conscience leaking what they transcribed for the greater good.
~Pev
I've been hearing this every 6 months for about the last, oh, thirty years.
Given that the state of the art in something much simpler, like automatic language translation, is pitifully inadequate, how likely is it IBM has conquered speech recognition AND translation?
Har har har.
This, of course, worries secretaries, since they might eventually lose their job/"career". On the other hand, it would improve efficiency *a lot*.
There's nothing too profound behind this sig.
Slavic languages are also Indo-European.
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
Tap all Arabic/international lines, install zillions of speech recognition nodes, make them write everything to log files and use grep to find whatever you want. Your Arabic may be a hundred times better, but you cannot do anything like that even if you hire the whole of Lebanon to help you.
This is a fantastic development. It is exactly the kind of thing that 64-bit processors were made for. It is the 'killer app', the best since MP3 and CD-rippers. If it actually works, the high-tech equivalent of 'in-shaa Allah'.
We should encourage IBM to allow enough of the technology to 'escape' in order to enable other languages to be translated from speech into English. There should be some kind of open review of the translation involved, also. This can help prevent subtle errors in translation that will arise. Hopefully we can catch these before they get widespread.
Perhaps we should also remember the ancient parable of the Tower of Babel. This is a story from about 3000 years ago where a united monolingual people tried to pool all of their resources and build a tower to reach God. God, not wishing to have so many freeloaders and boors hanging around eating his food, drinking his liquor, dipping into his stash, and impregnating his angels, cast an environmental change over all the people that split them into many, many groups that spoke mutually incomprehensible languages. Perhaps this is an ancient folk explanation of how different languages came to be; perhaps it is a veiled warning about the consequences that can arise from having everyone speaking the same language.
In any event, kudos to IBM. Keep up the good work.
I can wreck a nice beach. I can recognize speech.
Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.
I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.
It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.
Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.
Which which blew the blue candle. Failed on the second "which" the b*tch.
Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.
One trouble. You do have to put the mike to sleep during family discussions.
"No fear. No envy. No meanness." Liam Clancy
...perpetually monitors Arabic television...
Sounds like the results of a DOD/DARPA/NSA funded research grant. They'd love to be able to translate on the fly, instead of having to train and pay actual humans to manually translate several hours -- or even days and weeks -- after the original transmission.
Now that IBM has something kinda working and the grant money is running out they are trying to market it to the public. Kinda like Tang for the War on Terror-age.
obviously no deficiencies vs. no obvious deficiencies
No matter how hard I try, TTS always sounds horrible. Just that same robotic, metallic voice saying "Would you like to play a game?"
I've always found it most entertaining to check the effects reciting Lewis Carroll's Jabberwocky has on any new/exciting speech reco program.
On a more serious note, however, my wife was involved in an ill-fated-due-to-ancient-technology project back in grad school in the early 70's which involved:
1. Speech recognition.
2. Machine translation into a universal grammar
3. Translation of the universal grammar into various target languages.
4. Speech synthesis in the various target languages, using the same vocal qualities as the original speaker.
Pretty lofty goals considering they were probably using computers with discrete components in them.
Curiously, my wife (a native Japanese speaker) was teamed with the Suomi (Finnish) team because of the similarities in the two languages' structures.
Life is tough. Life is even tougher when you're stupid.
That sounds like a job we should sub-contract to India! I'm sure it would be much cheaper, and the results would be equally hilarious.
This is the *PERFECT* use for a technology like this! :)
"boy, I sure hope my stupid radio.. doesn't... uh... play 92.3"
vs,
"Does your radio suck? boy I sure hope my stupid radio doesn't. Uh, play 92.3"
"Is this just useless, or is it expensive as well?"
The article is really saying two things:
1. IBM has updated their ViaVoice large vocabulary continuous speech recognition (LVCSR) engine.
2. IBM has paired ViaVoice with some clever apps to use the ViaVoice output in interesting ways (e.g. "on the fly" recognition, translation).
Things that are not obvious from the article:
1. ViaVoice has been around for ages and has always been pretty darn good at LVCSR. Without seeing numbers and knowing exactly how they were measured, it's impossible to know how much of an improvement 4.4 is over previous versions.
2. Speaker-dependent speech recognition can always achieve much higher accuracy rates than speaker-independent systems like ViaVoice. Dragon NaturallySpeaking is an example of speaker-dependent speech recognition.
3. Limited grammatical contexts (i.e. language models with low perplexity) always give better recognition than when you don't know what to expect next. For example, when your phone only has to tell "home" and "wife" apart, it's a lot less likely to make a mistake than if it has to figure out which word out of a list of 50,000 you just said. The more context, the better. The most interesting tech in the article seems to be the algorithms "that can determine this context on the fly."
4. No improvements in translation technology were noted in the article; it sounds like they might as well have fed ViaVoice through BabelFish, made it happen in real time, and slapped a UI on it. The app might be new, but the tech is not.
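Point 3's "perplexity" has a simple definition: the exponential of the average negative log-probability the model assigned to the words actually spoken. Intuitively, it is the number of equally likely words the recognizer is effectively choosing between at each step. The sketch below uses made-up uniform probabilities for the two cases in point 3, a phone that only knows "home" and "wife" versus a 50,000-word list with no context; real language models are of course not uniform.

```python
import math


def perplexity(word_probs):
    """exp(average negative log-probability) of the words actually spoken."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


# Phone dialer that only knows "home" and "wife": each word has probability 1/2.
print(round(perplexity([1 / 2] * 3), 6))
# Open dictation with no context, uniform over a 50,000-word list.
print(round(perplexity([1 / 50_000] * 3), 6))
```

The first case comes out at 2 and the second at 50,000, which is the gap point 3 describes; a good language model lowers the second number by concentrating probability on the likely next words.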
"I had to help my uncle Jack off a horse."
"I had to help my uncle jack off a horse."
Will it ever catch that one?
Still waiting on Serviscope_minor to wake up to fucking reality and realize that Jessica Price isn't going to fuck him.
I have been reading up on literacy. One fascinating point is that written language (at least English) is many times richer in vocabulary and structure than spoken language, as heard on television, for example.
If this holds true for other languages, it may be easier to translate spoken material.
Still, I want to see the result.
Ratboy.
Just another "Cubible(sic) Joe" 2 17 3061
IBM does admit it. They thank DARPA and other DoD groups for their funding in their research papers. Most of the current funding for speech-related research in DoD is run through the GALE project (Global Autonomous Language Exploitation):
http://www.darpa.mil/ipto/programs/gale/index.htm
Salim Roukos of IBM, whom they quote in the article, is the main player on the IBM side, and IBM is one of the main players in this project. They were formerly a primary player in TIDES (Translingual Information Detection, Extraction and Summarization):
http://www.darpa.mil/ipto/programs/tides/index.htm
In fact, that site has the last link I can find on DARPA's site about TIA (Total Information Awareness), a program formerly run by ex-Admiral Poindexter (of Iran-Contra fame) and shut down by an act of Congress (and erased from DARPA's site as if it never happened):
http://www.darpa.mil/ipto/programs/tides/accomplishments.htm
These are not classified projects. You can read about most of the techniques in the proceedings of conferences such as ACL, ICSLP and Eurospeech.
I helped apple wreck a beach!
Arash Partow's Philosophy: Be a person who knows what they don't know, and not a person who doesn't know.
Or warrants.
This must be the day of the week that scams are announced.
First we have software that cannot be reverse engineered and guarantees the free speech rights of Americans.
It comes attached to the Brooklyn Bridge and some Florida swamp land.
Now we have this crap: "By limiting the domain, the system can make assumptions or inferences about what the user would like to accomplish, he said."
This is not exactly "superhuman" speech recognition.
None of this is feasible absent conceptual processing technology. Period.
I don't know why I don't clean up at the public trough by simply announcing I have "true artificial intelligence" and wait for the checks to roll in before leaving for Brazil.
Richard Steven Hack - This sig is TOO GODDAMN SHORT TO DO ANYTHING USEFUL WITH! MORONS!
"Brothers and sisters are natural enemies. Like Englishmen and Scots. Or Welshmen and Scots. Or Japanese and Scots. Or Scots and other Scots. Damn Scots! They ruined Scotland!"
Sorry.
Transcription? Not too hard. Translation? I highly doubt it.
Recent studies of the efficacy of machine translation found that modern engines have made only marginal progress over those of the *70s* (in fact, one of those, Systran, is the most used translation engine online), and that there is *no* discernible difference between engines of the eighties and current engines. I hope they're not trying to claim that they have suddenly overcome the vast problems of translation wholly independently of the linguistic community. That's just ludicrous.
I'd love to see this engine handle a parasitic-gap sentence like this one between two very different languages and catch the nuance in the parentheses: "Which report did she file (that report) without (her) reading (that same report)?" Sure, some engines will get it right by chance, but only because the two languages happen to share a similar structure; the engine is lucky, not actually parsing the "meaning."
"Fight for lost causes. You may discover they weren't."
I know, Finno-Ugric. I used to study Estonian (same language family as Finnish) at school years ago.
Conservatism: The fear that somewhere, somehow, someone you think is your inferior is being treated as your equal.
That's why it's pronounced nuk-ya-lar.
...because "hacker" sounds way sexier than "code drone."
Instead of Arabic, they should have started by translating Dubya-speak into English.
if we all spoke Lojban.
"It's because they're stupid, that's why. That's why everybody does everything." -Homer Simpson
Not for translation, but just for speech recognition? ScanSoft Dragon NaturallySpeaking 8 Preferred is in the top 30 for software sales through Amazon, so obviously some people find it useful.
I just love the example[1] the IBM marketroids chose for this: "For example, when asking for 'Radio 104.3 FM,' the new IBM-pioneered technology allows drivers to simply say, 'Tune to 104.3,' or 'Set the radio station to 104.3,' or 'Change the radio station to 104.3.'" Of all the amazing applications one could dream up, saving a driver from having to punch a radio preset is what they came up with.
I rather like "Open the pod bay doors, HAL" myself.
--
1. http://www-03.ibm.com/press/us/en/pressrelease/19
.plan: file not found
Anyway, QED for D8's shortcomings... and mine. d:-b
"No fear. No envy. No meanness." Liam Clancy
It sounds like you were willing to put in the time to get the good out of this program. I don't know if they sell an evaluation copy. You might find a boxed used version of DS8 on eBay, since so many people lack the patience you showed with speech to text. On the other hand, IBM has been in this game for decades. Dragon beat them for a while (in my opinion), but this new software seems pretty unique. You might want to hold out for that. In any case, this tech is maturing. There is hope.
I concur that it is a hog for resources.
Can't you imagine how much better than nothing it is?
No big deal. It's always subject+object+verb...
Actually, word order in Japanese is quite flexible, especially in speech. Subjects are often dropped, or sometimes tagged onto the end of the sentence as an afterthought. That is only part of it, however: not just the sentence, but its constituent parts are backwards. Dependent phrases are often in the opposite order: if/when words come at the end of the phrase rather than the beginning, descriptive clauses come before the noun rather than after, words equivalent to "to" or "from" come after the noun rather than before, and negations come after the verb rather than before. About the only things that match English are adjective before noun and subject at the beginning (if it is there at all).
Since when is this a fundamental part? You know, there's the core stuff and then there are complements. Just my opinion, but I think we're firmly in "complements" terrain now.
Semantically, these are the most important parts. Words like "car" "blue" and "to drive" are easy to translate. English articles and prepositions, and Japanese particles, are the words that define the relationships between the nouns, verbs, and adjectives in the sentence. These words are by far the hardest to translate and most difficult for a human to learn to use properly. When I correct papers for my Japanese colleagues, do they mess up terms like "high molecular weight polymer" or "oxygen dissolution"? Nope. They mess up a, an, and the. Same holds for me when I try to speak Japanese. I get the little words wrong and sometimes say something far different from what I intended.
Sorry to make your world a little sadder, but this is not only a language problem. It's a cultural one.
Yes, these sentiments are almost impossible to translate. English simply does not have a mechanism for it. The reverse also holds true in many situations.
Translation _is_ difficult because you don't really translate words or phrases (symbols, seen or heard) but semantic concepts (somebody help me here with the right technical word). This means purely symbolic methods (like "which word is equivalent to" or "search and replace") are bound to fail.
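To make the "search and replace" point concrete, here is a toy Python sketch. The four-word English-to-Japanese (romanized) lexicon is hypothetical and made up for illustration; it is not any real engine's method. Even when every word maps "correctly", naive substitution keeps English subject-verb-object order, while Japanese wants the verb last. Note also that the particles (wa, o) had to be glued onto the words by hand in the lexicon, which is exactly the context-dependent "little words" problem mentioned earlier in the thread.

```python
# Toy illustration only: a hypothetical four-word English-to-Japanese (romanized)
# lexicon, not a real translation resource or any engine's actual method.
LEXICON = {
    "i": "watashi-wa",      # "I" plus the topic particle wa, attached by hand
    "read": "yomimashita",  # past-tense "read"
    "the": "",              # Japanese has no article, so it maps to nothing
    "book": "hon-o",        # "book" plus the object particle o, attached by hand
}


def word_for_word(sentence: str) -> str:
    """Naive 'search and replace' translation: swap each word, keep English order."""
    swapped = (LEXICON.get(word, word) for word in sentence.lower().split())
    return " ".join(w for w in swapped if w)  # drop words that mapped to nothing


if __name__ == "__main__":
    literal = word_for_word("I read the book")
    natural = "watashi-wa hon-o yomimashita"  # verb last (SOV), as Japanese requires
    print(literal)             # watashi-wa yomimashita hon-o
    print(literal == natural)  # False: right words, wrong structure
```

Every word is "right" and the result is still wrong; fixing the order means knowing which word is the verb and which is its object, which is already a step beyond symbol swapping.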
I agree. I think the statistical methods that are currently popular for machine translation will never get past the barely understandable level. To do that, you have to have context and meaning. Computers are a long way from that point.