Slashdot Mirror


IBM Strives For 'Superhuman' Speech Tech

robyn217 writes "IBM unveiled new speech recognition technology today that can comprehend the nuances of spoken English, translate it on the fly, and even create on-the-fly subtitles for foreign-language television programs. One of the projects perpetually monitors Arabic television stations, dynamically transcribing and translating any words spoken into English subtitles. Videos can then be viewed via a web browser, with all transcriptions indexed and searchable."

26 of 289 comments

  1. Which ... by spiny · · Score: 3, Interesting

    Which witch blew the blue candle out ?

    --

    Fry: heh, Yakov Smirnoff said it
    Leela: No he didn't.
    1. Re:Which ... by jakeweston · · Score: 3, Funny

      To wreck a nice beach...

    2. Re:Which ... by jcupitt65 · · Score: 5, Interesting
      Or I can wreck a nice beach versus I can recognise speech.

      Sometimes you need rather a large context to disambiguate: is this sentence part of a discussion on shore-front management, or spoken language understanding?
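
      A minimal sketch of that idea, assuming a recogniser has already produced two acoustically similar candidates and that each candidate comes with a hand-picked list of topic words (the lists below are invented for illustration, not taken from any real system). The candidate whose topic words overlap most with the surrounding context wins.

```python
# Toy context-based disambiguation between acoustically similar transcriptions.
# TOPIC_WORDS is a hypothetical, hand-written table used only for illustration.
TOPIC_WORDS = {
    "I can wreck a nice beach": {"shore", "sand", "coast", "erosion", "tide"},
    "I can recognise speech": {"microphone", "transcription", "dictation", "language", "spoken"},
}


def pick_transcription(context: str) -> str:
    """Return the candidate whose topic words overlap most with the context."""
    context_words = set(context.lower().split())
    # On a tie, max() keeps the first candidate in dictionary order.
    return max(TOPIC_WORDS, key=lambda cand: len(TOPIC_WORDS[cand] & context_words))


if __name__ == "__main__":
    print(pick_transcription("the tide and erosion are reshaping the coast"))
    print(pick_transcription("dictation software turns spoken language into text"))
```

      With only a few words of context there may be no overlap at all, which is the "rather a large context" problem in miniature.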

  2. Coherency? by PrinceAshitaka · · Score: 4, Insightful

    From the article: "For now, all video processed through Tales is delayed by about four minutes, with an accuracy rate of between 60 and 70 percent" and "The accuracy rate could be increased to 80 percent, Roukos added"

    Still, even at 80 percent, how good is this translation? If that 20% is the important parts of speech, you could still be left clueless. Even the best machine translations of text I have seen always leave the text a bit garbled and confusticated.
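
    A rough back-of-the-envelope sketch of why 80 percent can feel worse than it sounds. It assumes the quoted figure is a per-word accuracy and that word errors are independent; neither assumption comes from the article, and real errors tend to cluster.

```python
def p_sentence_correct(word_accuracy: float, n_words: int) -> float:
    """Chance that every word in an n-word sentence is right,
    assuming independent per-word errors (a simplification)."""
    return word_accuracy ** n_words


if __name__ == "__main__":
    for accuracy in (0.6, 0.7, 0.8):
        chance = p_sentence_correct(accuracy, 10)
        print(f"{accuracy:.0%} per word: {chance:.1%} of 10-word sentences fully correct")
```

    Under those assumptions, roughly one ten-word sentence in nine comes through with no errors at 80 percent.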

    I don't know how much delay is implied in the phrase "on the fly", but I personally don't think there could ever be real-time translation, for the following reason. Sentences in different languages have different sentence structures. While in English the verb is usually the second part, in other languages the verb often comes last (German). For the translator to get the second word of a sentence, it would have to wait till the end of what could be a long sentence. This necessarily adds delay.

    --
    quis custodiet ipsos custodes
    1. Re:Coherency? by Yahweh Doesn't Exist · · Score: 3, Interesting

      yes, there will always be delay for the reason you state. but that's true even with human translators, yet no-one claims real-time meetings between people via translators is a waste of time.

      since even "live" broadcasts are usually delayed several minutes for technical and legal reasons anyway, if this technology can get to the state where you're just one or two sentences behind real-life it will be effectively real-time anyway for almost all practical purposes.

    2. Re:Coherency? by dancallaghan · · Score: 3, Interesting

      but I personally don't think there could ever be real time translation for the following reason. [German]

      You are going to have that problem whether it's a machine doing the translating or a human. As I understand it, interpreters of German get around this by some quick-thinking restructuring of the translated sentence, or they simply lag a half-sentence or so behind.

      The real problem for machine translation is, and always has been, determining the sense of a word from context (indeed I recall a recent Slashdot article about some guy who suggests this is the separating factor between computers and animal intelligence). Most languages have a great many homonyms whose meaning a listener can determine only from the surrounding context and, often, general background knowledge of the language or topic at hand.

  3. first? by Anonymous Coward · · Score: 5, Funny

    however the researchers stated "We still can't figure out what Bob Dylan is saying"

  4. Nuances by AnonymousYellowBelly · · Score: 4, Funny

    GB on TV: "We have prevailed"
    Subtitle: "All your base are belongs to us"

    --
    Disclosure: I'm stupid
  5. Foreign languages are complex... by pubjames · · Score: 5, Insightful

    I'm afraid this type of technology will be used as an excuse for people not to learn foreign languages, which is a shame.

    It's not until you learn another foreign language that you realise how complex languages are, and how subtle. Learning another language can literally change the way you think about things.

    This type of technology will make people think they completely understand a foreign language, but they won't. Their understanding will be crude, without the subtleties and cultural understanding.

    I can speak English and Spanish fluently, and if I watch an English film with Spanish subtitles I'm always thinking - damn, they missed a good joke there, they got that wrong, etc. (Equally so with a Spanish film with English subtitles). And film subtitles are done by professional translators. God only knows what a terrible job a computer would make of film translation.

    1. Re:Foreign languages are complex... by MPHellwig · · Score: 4, Funny

      And of course: "Up yours!" ;-)

    2. Re:Foreign languages are complex... by Mushdot · · Score: 3, Interesting

      I have a friend who works in Japan and he tells me the same. He often goes to watch English films that are subtitled in Japanese and tells me that they completely mistranslate most of the jokes and miss subtle nuances of speech. One example he gave was a scene from 'The Full Monty' (I'm doing this from distant memory so it might not be quite right - in fact, a bad translation :-)

      One of the characters is shouting up to someone in their bedroom window. They don't respond to the shouting and the character says "He obviously can't hear me because of his triple glazing".

      This is a sarcastic comment relating to the house owner's supposed wealth but in Japanese it was translated as:

      "He has thick windows"

      Perhaps in this case there was no easy way to translate - but I suspect films are probably translated in one pass and there is no time to understand the context of each sentence spoken so it's left to literal translation only.

    3. Re:Foreign languages are complex... by Splab · · Score: 5, Funny

      From boondock saints:
      Rocco: Fucking... What the fuck. Who the fuck fucked this fucking... How did you two fucking fucks...
      [shouts]
      Rocco: fuck!
      Connor: Well, that certainly illustrates the diversity of the word.

      Think that just about covers it...

  6. Ghee... by Anonymous Coward · · Score: 4, Insightful

    Hmm, instantaneous translation from Arabic, wonder who "cough cough echelon cough!" they are marketing this to.. ?

  7. Re:Just what we need... by pubjames · · Score: 4, Insightful

    More opportunities for Arabic speaking people to misinterpret western media.

    I think you've got it the wrong way round haven't you? Did you mean to say "More opportunities for English speaking people to misinterpret Arabic media."?

  8. If they REALLY want to test it properly... by Viol8 · · Score: 4, Funny

    ...they should send it to Glasgow on a Saturday night just after the pubs have closed.

    "Ye loooiii ahhh me jimmeh??! *belch* C'mere ya wee electrahnich bastid, I'll shoo ye!"

  9. It isn't worth it by YearOfTheDragon · · Score: 5, Funny

    Maybe IBM is going to make speech recognition a reality, but Bill Gates said that this was possible a long time ago. Simply genius.

    --
    -= If you fight Dragons long enough, you will become a Dragon =-
  10. On-The-Fly by Trurl's Machine · · Score: 4, Informative

    They really do it on the fly? You mean, [on the surface of] [a particular] [insect of a Musca domestica species]?

    I have read a lot of auto-translated documents and they are always good for a laugh as "crapslation cabaret". So far, there is no technology that can auto-translate a text document successfully. The "80% success" is a myth - they just count how many words were found in the vocabulary, not how many of them were put into a good context. A "fly" translated as an insect would be counted as a success!
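
    A small sketch of the difference between the two ways of counting. It is not IBM's scoring method, which the article does not describe; `vocabulary_hit_rate` is a hypothetical stand-in for "was the word in the dictionary", and `word_error_rate` is the usual word-level edit distance against a reference, divided by the reference length.

```python
def vocabulary_hit_rate(hypothesis: str, vocabulary: set[str]) -> float:
    """Fraction of output words that appear in the vocabulary at all."""
    words = hypothesis.split()
    if not words:
        raise ValueError("hypothesis must contain at least one word")
    return sum(word in vocabulary for word in words) / len(words)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "they translate it on the fly"
    hypothesis = "they translate it on the insect"
    vocabulary = set(reference.split()) | {"insect"}
    print(vocabulary_hit_rate(hypothesis, vocabulary))
    print(word_error_rate(reference, hypothesis))
```

    Here every output word is in the vocabulary, so the hit rate is a perfect 1.0, while comparison against the reference still flags one wrong word in six. Even the second measure says nothing about how badly that one word changes the meaning.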

    Even if you are not a bot but a human being with some knowledge of the other language and culture, it's very easy to involuntarily offend someone or just to make a ridiculous faux-pas. Polish and Czech languages, for example, are very much alike and use common roots for many words, but because of the way both languages evolved, some neutral terms on one side of the border have become offensive on the other side. Czechs evolved a euphemism for sexual intercourse based on the verb "to look for". Poles still use this word when they look for something, which leads to constant crapslation cabaret gags when a Polish tourist appears in a Czech town "looking for a parking lot". Now, auto-translate this...

  11. This won't make speech recognition mainstream by thbb · · Score: 4, Interesting

    As has been the case for the past thirty years, the description of the system's prowess is still written in the conditional form: "...IBM technology can be used to control computers and devices..." rather than the active form: "is being used"...

    Ben Shneiderman is the person who, in my opinion, best articulates the limits of speech recognition.

    One of my favorite phrases to explain this issue is: "You don't want to speak to a computer, because you can't speak and think at the same time". More precisely, speech utterance makes use of some modules in our brain which are required for planning too. Hence, you can't plan what to do next as well when you speak, which is a big hurdle in the type of intellectual activities one carries out with a computer.

  12. Awful default TTS by Council · · Score: 3, Insightful

    Speech-to-text is cool, but for 30 years they've been predicting it's the next new thing in interfaces, and it's remained a niche thing as it gets better and better. Maybe it'll hit the point where it's flawless and suddenly find new markets, but we'll see.

    What really bothers me is the state of Windows text-to-speech. The TTS that ships with the most popular operating system on Earth is easily trumped in understandability by a small third-party program I downloaded literally TWELVE YEARS AGO. I really wonder if M$ made some pact to give out crappy TTS so as not to stifle sales of some business partner's application.

    This seems pretty ridiculous, but I'm at a loss as to why their text-to-speech programs are of 12-year-old quality.

    I'm glad people are doing good speech research, (I know I've seen a demo of good IBM TTS somewhere) but I hope it finds its way into Windows someday.

    --
    xkcd.com - a webcomic of mathematics, love, and language.
  13. Re:Opensource? by omeg · · Score: 3, Insightful

    Of course it won't be open source. They achieved what they dub a "breakthrough in speech recognition". They plan on making a lot of money with this.

  14. Oh oh oh. by Anonymous Coward · · Score: 3, Funny

    I think it was about 1996 or maybe 1997 when I attended an IBM demonstration (for retailers) for its speech recognition software. Anyway, the lady who was narrating the text and. talking. like. a. robot. to. do. it. was half-way through when, for no apparent reason, the word uterus appeared in the text.

    So I'm sitting here thinking of how funny it was to the juvenile me back then, and how unfunny it seems right now. Oh well.

  15. And German is an easy one by Ogemaniac · · Score: 4, Informative

    It is as close to English as any other language. In general, European languages have the same basics as English (such as "the") and are fairly easy to learn and translate. Right now I live in Japan, where the language and its underlying way of thinking basically run in the reverse direction of English. To translate, you are essentially running the whole thing backwards. Worse yet, the fundamental parts of the language are quite different. For example, Japanese does not have articles or prepositions, though it has post-positions that roughly correspond. However, there are fewer of them, so they have "lots of meanings" when translated into English. Translation can be a "#$#, even for a human who understands both languages very well (which is why anime comes off so corny sometimes). There are countless times where there is just no simple way to express a thought in one language that is trivial in the other.

  16. Buyer beware by 99luftballon · · Score: 4, Insightful

    Speech recognition has long been the land of inflated promises and little returns. Anyone remember Lernout & Hauspie and its supposed 15 minutes learning time?

    Speech recognition is riddled with problems. From a computing side it's enormously processor intensive and memory hungry. From a computer side it's very complex code and the 'learning' process is fraught with problems - surnames, company names and locations are all very poorly recognised.

    So don't rush to buy. Let the labs check it out first.

  17. Re:Just what we need... by user9918277462 · · Score: 4, Insightful

    There's a very good reason they're testing this tech on Arabic speech primarily. Although they won't say it, I'd be very surprised if the DOD isn't sponsoring this. NSA would absolutely love to be able to translate and transcribe monitored Arabic speech (i.e., phone calls) in real time. No backlog of untranslated intercepts, no staff shortages.

  18. Re:Let's see it translate poems by hunterx11 · · Score: 3, Interesting

    I'd be happy enough if humans could do this.

    --
    English is easier said than done.
  19. Live experiment with Dragon 8 by bdwoolman · · Score: 4, Funny
    Here we go:

    I can wreck a nice beach. I can recognize speech.

    Well, Dragon Systems eight passed the beach test first try. Knowing the program, however, I did use pretty clear diction.

    I use Dragon Systems and find it absolutely great. There are a few persistent errors. For example, It frequently fails to get "there" and " there" right on the first try. But the fly down menu system enables me to quickly correct the problem on the run. Certainly I pick it up on an edit. If IBM has something better than this -- and it sounds like they do -- then it must be pretty darn good. Of course, you have to insert the punctuation verbally. But that comes with a little practice -- provided that you know what to do in the first place.

    It does take a little bit of investment in time. But not nearly as much as learning to type at seventy words a minute, which I can now do in dictation. I have added very little by way of customized commands etc. The program has done a lot of learning on its own.

    Let's try once again: I can't recognize beach. I can recognize speech. Oops. Okay, it failed that time. Let's try one more time: I can wreck a nice beach. I can recognize speech. Well, the phrases have to be enunciated pretty clearly or the program has trouble.

    Which which blew the blue candle. Failed on the second "which" the b*tch.

    Okay, okay. I'll put the laundry in the dryer. No I am not just screwing around on Slashdot again I'm getting some work done down here. Just a minute. Just a MINUTE.

    One trouble. You do have to put the mike to sleep during family discussions.

    --
    "No fear. No envy. No meanness." Liam Clancy