Slashdot Mirror


US Intelligence Seeks a Universal Translator For Text Search In Any Language (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: The Intelligence Advanced Research Projects Agency (IARPA), the U.S. Intelligence Community's own science and technology research arm, has announced it is seeking contenders for a program to develop what amounts to the ultimate Google Translator. IARPA's Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program intends to provide researchers and analysts with a tool to search for documents in their field of concern in any of the more than 7,000 languages spoken worldwide. The specific goal, according to IARPA's announcement, is an "'English-in, English-out' information retrieval system that, given a domain-sensitive English query, will retrieve relevant data from a large multilingual repository and display the retrieved information in English as query-biased summaries." Users would be able to search vast numbers of documents with a two-part query: the first giving the "domain" of the search in terms of what sort of information they are seeking (for example, "Government," "Science," or "Health") and the second an English word or phrase describing the information sought (the examples given in the announcement were "zika virus" and "Asperger's syndrome"). The system would be used in situations like natural disasters or military interventions in remote locations where the military has little or no local language expertise. Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages "to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains," the announcement explained. "As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages... 
Since language-independent approaches with quick ramp up time are sought, foreign language expertise in the languages of the program is not expected." The good news for the broader linguistics and technology world is that IARPA expects the teams competing on MATERIAL to publicly publish their research. If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the world.
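The "English-in, English-out" flow described above can be pictured with a toy Python sketch. It assumes every foreign-language document has already been machine-translated into English (the `english_text` field) and tagged with a domain. It ranks documents by simple query-term counts and uses the best-matching sentence as the query-biased summary. The `Document` class, the function names, and the sample data are hypothetical illustrations, not IARPA's design or any MATERIAL performer's system.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    language: str      # source language code; illustrative only
    domain: str        # e.g. "Government", "Science", "Health"
    english_text: str  # ASSUMPTION: output of an upstream machine-translation step


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def query_biased_summary(text: str, query_terms: set[str]) -> str:
    """Return the sentence that contains the most distinct query terms."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return max(sentences, key=lambda s: len(query_terms & set(tokenize(s))))


def search(docs: list[Document], domain: str, query: str, top_k: int = 3) -> list[tuple[str, str]]:
    """Domain-filtered search over translated documents, returning (doc_id, summary) pairs."""
    query_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        if doc.domain != domain:
            continue  # the "domain" half of the two-part query acts as a filter
        score = sum(1 for token in tokenize(doc.english_text) if token in query_terms)
        if score > 0:
            scored.append((score, doc.doc_id, query_biased_summary(doc.english_text, query_terms)))
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first, then by id
    return [(doc_id, summary) for _, doc_id, summary in scored[:top_k]]


# Hypothetical sample data: pretend these are English MT outputs of foreign-language reports.
SAMPLE_DOCS = [
    Document("d1", "sw", "Health", "Clinics reported new zika virus cases. Roads were closed."),
    Document("d2", "tl", "Health", "Vaccination drive continues. No virus outbreaks reported."),
    Document("d3", "so", "Government", "The zika virus budget was debated."),
]
```

Calling `search(SAMPLE_DOCS, "Health", "zika virus")` should return d1 and then d2, each with its best-matching sentence; d3 is excluded because its domain is "Government". A real system would replace the term count with a proper retrieval model and would have to cope with translation errors in `english_text`.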

47 comments

  1. Oxymoron by thechemic · · Score: 1, Funny

    US Intelligence

    --
    Let's make like a bird... and get the flock outta here.
    1. Re:Oxymoron by rmdingler · · Score: 1

      Clearly, two words in juxtaposition, unlike Canadian politeness, African poverty, or European Union discord.

      --
      Happiness in intelligent people is the rarest thing I know.

      Ernest Hemingway

    2. Re: Oxymoron by Anonymous Coward · · Score: 0

      This type of thing is essentially in a search engine company's line of business. I'm not sure of the value of giving the intelligence community first access to a tool like this. The public does not actually get much benefit from their work; it's usually baked into a big lie by top officials, who then use it to sell a useless war.

  2. Agreed by Anonymous Coward · · Score: 0

    Oxymoron... that's like, a moron that is addicted to OxyContin?

  3. Clever by Anonymous Coward · · Score: 0

    MATERIAL... They think they are so clever.

  4. Have they tried Google? by Anonymous Coward · · Score: 0

    It's free and fairly accurate.

  5. Easy to defeat... by bogaboga · · Score: 1

    Here's how:

    How about writing, "The sheep are coming..."

    ...And this to mean something entirely different in the bad guys' minds?

    Easy and effective. Isn't it?

    1. Re:Easy to defeat... by admin7087 · · Score: 1

      Jesus Christ, you're a genius! They didn't think of that! How did you come up with this idea?

    2. Re:Easy to defeat... by dbIII · · Score: 1

      How about writing, "The sheep are coming..."

      In a Desmond Bagley spy thriller the automatic translator turned "hydraulic ram" into "water sheep".

    3. Re: Easy to defeat... by Anonymous Coward · · Score: 0

      "The system would be used in situations like natural disasters or military interventions in remote locations where the military has little or no local language expertise."

      I don't think it's intended for surveillance. Not saying the government doesn't love to spy, but if that were its intended purpose they probably wouldn't have made it public:

      "The good news for the broader linguistics and technology world is that IARPA expects the teams competing on MATERIAL to publicly publish their research."

    4. Re: Easy to defeat... by Anonymous Coward · · Score: 0

      "The system would be used in situations like natural disasters or military interventions in remote locations where the military has little or no local language expertise."

      I don't think it's intended for surveillance. Not saying the government doesn't love to spy, but if that were its intended purpose they probably wouldn't have made it public:

      Not all languages are written. Most are not [chiefly in the trouble-making backwaters of the world].

  6. Will it translate /. binspam bad-translations? by davidwr · · Score: 0

    Will it be able to give meaning to poorly-translated newsfeeds like the ones in this Slashdot contributor's history?

    Sample:

    "Various framerates have been a warm theme before few years?"

    It gets worse from there.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
    1. Re:Will it translate /. binspam bad-translations? by sexconker · · Score: 1

      Will it be able to give meaning to poorly-translated newsfeeds like the ones in this Slashdot contributor's history?

      Sample:

      "Various framerates have been a warm theme before few years?"

      It gets worse from there.

      Different framerates have been a hot topic in recent years.

      That'll be .001 BTC.

    2. Re:Will it translate /. binspam bad-translations? by PolygamousRanchKid+ · · Score: 1

      Different framerates have been a hot topic in recent years.

      On Slashdot, different flamerates have been a hot topic in recent years.

      --
      Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
  7. Don't think that'll work by Rick+Schumann · · Score: 1

    I don't speak but a handful of words in a very short list of languages, I'm certainly no expert in language, but aren't there some languages that are so nuanced that a slight change in inflection, or tone, or emphasis, or maybe even cadence changes the entire meaning of what's being said? Wouldn't that be rather difficult to code for?

    1. Re:Don't think that'll work by slew · · Score: 2

      I don't speak but a handful of words in a very short list of languages, I'm certainly no expert in language, but aren't there some languages that are so nuanced that a slight change in inflection, or tone, or emphasis, or maybe even cadence changes the entire meaning of what's being said? Wouldn't that be rather difficult to code for?

      Two things...
      1. I think they are thinking about a computer database of text, not an audio database.
      2. Nobody really technically "codes" this stuff anymore; a deep learning network is conceived and configured in a framework and then trained with petabytes of data.

    2. Re:Don't think that'll work by Anonymous Coward · · Score: 0

      1) Text is even more ambiguous than spoken language
      2) Coded or trained, it's still something Google already did and nobody else is likely to make any landmark improvements without going directly through Google. Which, incidentally, is fused at the dickbutt with the NSA.

  8. yo, Dawg by turkeydance · · Score: 0

    PAWG

    1. Re:yo, Dawg by Anonymous Coward · · Score: 0

      phat ass white girls are definitely cool, but rosebud tranny porn is better!

  9. Isn't that...Google? by OldMugwump · · Score: 4, Funny

    Doesn't Google already do exactly that? Oh, wait. Yes, it does. But the DoD would have to let Google index their archive...

    --
    "Shoot, a fella could have a pretty good weekend in Vegas with all that stuff."
    1. Re:Isn't that...Google? by Anonymous Coward · · Score: 0

      Universal Translator : Google Translate :: A car : A Big Wheel.

  10. and they really really promise by Anonymous Coward · · Score: 0

    to only use it for good!!
    scout's honour!

  11. FTFY by slew · · Score: 3, Insightful

    If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the English speaking world.

  12. This will end in tears by Anonymous Coward · · Score: 1

    We need only to look at the BIBLE to see what happened the LAST TIME someone tried to create a Tower of Babel to see what will happen THIS time.

    1. Re:This will end in tears by chill · · Score: 1

      Yeah, but I'm pretty sure the machines are going to be rackmounts and/or blade servers. No one uses towers anymore.

      --
      Learning HOW to think is more important than learning WHAT to think.
    2. Re:This will end in tears by K.+S.+Kyosuke · · Score: 1

      That makes no sense. What does the story about an architectural project ruined by a fictional prig turned language inventor have to do with a later attempt to bridge the languages? Is anyone building another tower, or are we expecting to get yet more languages, or what?

      --
      Ezekiel 23:20
  13. I would like to apply for the job by iTrawl · · Score: 3, Insightful

    Dear Sir,

    My name is Mahindresh Jalabahamatra* from India. I would like to apply for the Universal Translator job that you are offering. I am very skilled in Universal Translation and have many years of experience. I have done Universal Translation for many clients in the past, and I consider your offered job as Universal Translator to fit my skills perfectly.

    Hoping to hear from you soon.

    --
    "Everybody's naked underneath" -- The Doctor
    1. Re:I would like to apply for the job by ems2004 · · Score: 2

      ...and as a proof for my skills I am currently on H1B visa specifically granted for Universal Translator..... I also have Master's degree in Universal Translator.... In my last 3 jobs I developed and wrote Universal Translator only....

      --
      ..... best things in life are not so free..........
    2. Re:I would like to apply for the job by rosshalz · · Score: 1

      Dear Sir,

      My name is Mahindresh Jalabahamatra* from India. I would like to apply for the Universal Translator job that you are offering. I am very skilled in Universal Translation and have many years of experience. I have done Universal Translation for many clients in the past, and I consider your offered job as Universal Translator to fit my skills perfectly.

      Hoping to hear from you soon.

      I'm Indian and I find this absolutely hilarious.. Should I be offended? Naa.. I've seen too many applicants use similar language... I will however laugh uncontrollably for the next 5 minutes at that name... Mahindresh Jalabahamatra

    3. Re:I would like to apply for the job by erapert · · Score: 1

      Could you please explain why that name is so funny?

    4. Re:I would like to apply for the job by Anonymous Coward · · Score: 0

      My name is Mahindresh Jalabahamatra* India. I have to apply to the universal translators are offered the job you want. I'm very efficient and universal translation has many years of experience. I have many clients is universal, the past and the translation I my skills as a universal translator to your job offer fit perfectly.

      FTFY

  14. I propose ... by CaptainDork · · Score: 0

    ... Sheldon Cooper.

    --
    It little behooves the best of us to comment on the rest of us.
  15. Bound to failure in natural context by MountainLogic · · Score: 0

    Natural language is inherently ambiguous, and real humans love to make it more so with slang and swearing. Take the story of the gorilla artist Jason Sprinkle from Seattle. He was once best known for attaching a ball and chain to the massive Hammering Man statue on Labor Day. He had a commission for an art project to support Job Corps, for which he made a giant heart and drove it around to different Job Corps sites, where he allowed participants to sign the art and his truck. One person wrote “Timberlake Carpentry Rules (the ‘Bomb’)” on the front bumper of the truck, using "the bomb" as slang for very cool. One day, pre-9/11, he was upset about cuts to city art funding and decided to park the truck, heart and all, in Seattle's main square to draw attention to the arts. Needless to say, the police interpreted the graffiti on his truck literally, and the artist ended up in jail for a month, which essentially ruined his life. OK, you might expect cops to panic in the heat of the moment, but if in the cold light of day prosecutors and the courts have such a problem handling slang, what are the chances some brainless code will be able to handle it?

    1. Re:Bound to failure in natural context by Anonymous Coward · · Score: 0

      You meant "guerilla"-- How many translation programs would understand that?

  16. Most difficult part done by manu0601 · · Score: 1

    The most difficult part of the project is completed: they found a nice backronym for it.

  17. Text search has limitations WITHIN a language. by hey! · · Score: 1

    As you've no doubt experienced when you've done a Google Search on a word which has multiple meanings. For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal? It so happens that Google tries to give you a mix of both answers, which I suspect may reflect the result of some ad hoc result tweaking.

    So you do sometimes have to know how to rephrase a query, e.g. "pictures of a flying crane" to "pictures of an aerial crane".

    The problem is that when you cross languages, words don't have a simple one-to-one relationship. For example, the Latin word "sacer" can mean either "holy" or "unholy"; in a sense English treats the concepts as antonyms, whereas Latin treats them as two kinds of the same thing. And there are idioms, like the Arabic "Ya'aburnee", which literally means "you bury me" but usually means "I love you" (i.e., I can't live without you). Of course you can program idioms like that into your translator, but you're still going to have to accept either lots of false positives or false negatives. If you're a native speaker of Arabic you can tell from context whether the document you're looking at is talking about love or burial; if you're looking at a machine translation you won't be as sure.

    But of course just as false positives don't make Google useless, false positives wouldn't make a multi-language search engine useless. You just have to be aware of the limitations. But what concerns me is the tendency of people to think this stuff works like magic.
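The false-positive versus false-negative trade-off can be made concrete with a toy Python sketch built around the "sacer" example. The lexicon, the pre-lemmatized Latin "documents", and the relevance labels are invented for illustration, and the sketch assumes a perfect lemmatizer has already run. Expanding the English query to every dictionary translation finds all relevant documents but also pulls in the accursed homo sacer; dropping ambiguous translations removes that false positive at the cost of a missed document.

```python
# Toy English -> Latin lexicon (hypothetical, for illustration only).
LEXICON = {"holy": {"sacer", "sanctus"}}

# Lemmas whose senses cover opposite English glosses, e.g. "sacer" = "holy" or "accursed".
AMBIGUOUS = {"sacer"}

# Pre-lemmatized toy "documents" (ASSUMPTION: a perfect lemmatizer has already run).
DOCS = {
    "d1": {"sacer", "templum"},  # a sacred temple: relevant
    "d2": {"homo", "sacer"},     # homo sacer, the accursed man: not relevant
    "d3": {"sanctus", "locus"},  # a holy place: relevant
    "d4": {"aqua", "via"},       # unrelated
}

# Relevance judgements a native reader would make from context.
RELEVANT = {"d1", "d3"}


def translate_query(term: str, drop_ambiguous: bool) -> set[str]:
    """Map an English query term to Latin lemmas, optionally skipping ambiguous ones."""
    candidates = LEXICON.get(term, set())
    return candidates - AMBIGUOUS if drop_ambiguous else set(candidates)


def retrieve(term: str, drop_ambiguous: bool) -> set[str]:
    """Return ids of documents containing any translated query lemma."""
    targets = translate_query(term, drop_ambiguous)
    return {doc_id for doc_id, lemmas in DOCS.items() if lemmas & targets}


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when a set is empty)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

With `drop_ambiguous=False`, `precision_recall(retrieve("holy", drop_ambiguous=False), RELEVANT)` gives a precision of 2/3 and a recall of 1.0; with `drop_ambiguous=True` it gives 1.0 and 0.5. Neither setting gets both right, which is the point: only context can tell the two senses of "sacer" apart.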

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    1. Re:Text search has limitations WITHIN a language. by Agripa · · Score: 2

      "Out of sight, out of mind."

      Translation:

      "Invisible idiot."

  18. 42 by Anonymous Coward · · Score: 0

    babelfish

  19. Lol. no, Google does it better than you, a human by raymorris · · Score: 1

    > For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal?

    No, Google would already know that neither of those interpretations is correct. Google tracks your search history, it knows who is asking. So when the CIA asks how to get rid of a mole, Google knows they are talking about a https://en.m.wikipedia.org/wik... mole.

  20. the how and the why are unrelated by raymorris · · Score: 1

    They expect people to publish research into how to take some English search terms and then search a pile of assorted documents in different languages. The public can see (some of) HOW one can search text. So we get to see some ideas about searching general text.

    Which text they later search, for what reasons, is a completely separate issue. If they can get a system like this developed, they would be foolish if they didn't use it in their national security mission. In fact, most intelligence is from open sources (OSINT). The challenge for the intelligence agencies is to glean some useful information from the billions of newspaper articles, forum posts, tweets, ads, presentations, scholarly papers, job postings, etc., that are available. For example, if a government posts a job ad for highly skilled machinists, separately places a requisition for Acme model 502 control circuits, receives a large shipment of helium, and the power plant in Skitsville is supplying a heavier load than normal, that suggests the country is building ______ in Skitsville. The challenge is finding all these little bits of information and then putting the pieces together. Before 9/11, various US agencies had different pieces of intelligence, but none had them all together to see how the tidbits fit together to reveal the danger.
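The "putting the pieces together" step can be illustrated with a toy Python sketch. Each agency's holdings are a list of observations tagged with a place and a normalized indicator, and a place is flagged only when at least `min_indicators` distinct indicators point to it. The Skitsville data mirrors the hypothetical example above, and the threshold of 3 is an arbitrary assumption; real all-source analysis involves far more than counting.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    source: str     # where the item turned up; kept for provenance, not used in scoring
    place: str
    indicator: str  # normalized label for what the item suggests


def flag_places(observations: list[Observation], min_indicators: int = 3) -> dict[str, list[str]]:
    """Flag places where at least `min_indicators` distinct indicators co-occur."""
    indicators = defaultdict(set)
    for obs in observations:
        indicators[obs.place].add(obs.indicator)
    return {place: sorted(found) for place, found in indicators.items() if len(found) >= min_indicators}


# Hypothetical holdings of two agencies that never share their data.
AGENCY_A = [
    Observation("job ad", "Skitsville", "skilled machinists"),
    Observation("procurement notice", "Skitsville", "Acme model 502 control circuits"),
]
AGENCY_B = [
    Observation("shipping manifest", "Skitsville", "large helium shipment"),
    Observation("grid report", "Skitsville", "heavier power load"),
]
```

Neither `AGENCY_A` nor `AGENCY_B` alone reaches the threshold, so `flag_places` returns an empty dict for each, but `flag_places(AGENCY_A + AGENCY_B)` flags Skitsville with all four indicators. That is the pre-9/11 failure in miniature: the signal only exists once the holdings are pooled.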

    Here's an entertaining example where there was no need to put the pieces together, the spy agency just needed to find this one secret published in the open. When the B2 bomber was revealed to the public, reporters only got a front view and had to stand 200 feet back, so they couldn't see the rear of the plane or the overall shape as would be seen from above. BEFORE even that much was revealed, Honda ran this ad:

    https://i.kinja-img.com/gawker...

    Honda got called to Washington to answer how the hell they knew exactly what the plane looked like - nothing like that had been released, the shape was classified at the time. Intelligence services from other nations only had to find that ad, in a mountain of ads, to get a picture of the USA's top-secret plane.

  21. Mmmm..k? Code: by TheOuterLinux · · Score: 1

    text=document.txt
    # Flatten the file to one line and strip quotes, send it to Google's translate endpoint with English (tl=en) as the target, then pull the translated strings out of the reply
    translate="$(wget -U "Mozilla/5.0" -qO - "http://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=en&dt=t&q=$(tr '\n' ' ' < "$text" | sed "s/[\"']//g")" | sed "s/,,,0]],,.*//g" | awk -F'"' '{print $2, $6}')"
    echo "$translate"

    Not tested, but should work. Idea came from here: http://www.webupd8.org/2016/03/translate-any-text-you-select-on-your.html?m=1
    Got more links and stuff at TheOuterLinux.com

    1. Re: Mmmm..k? Code: by TheOuterLinux · · Score: 1

      Did that with my phone. Had no idea it would mash it all together. :(

  22. Re:Babel by hackwrench · · Score: 1

    The Tower wasn't the instrument of bad translation, just the beneficiary of it. My takeaway was that, much like your example with the word "Tower", different words had different meanings for different people but were close enough to work in enough contexts, yet fell apart in all the contexts that were needed to build the tower.

  23. Re:Much better actually by hackwrench · · Score: 1

    You see, the software isn't predisposed to one interpretation or another. After the training is another matter however.

  24. C-code by Anonymous Coward · · Score: 0

    Most of my on-line footprint is C-code on github, and I'd be surprised if anyone or anything can read that mess.

  25. BabelFish by Anonymous Coward · · Score: 0

    Universal Translators may be bought at competitive prices from General Intelligence, a subsidiary of Douglas Adams Industries.

  26. Cool by zedaroca · · Score: 1

    Now they'll drone-murder us based on what an algorithm mistranslated.
    If a bilingual murderer at least had to listen to the Xbox recording of us joking in the living room, our chances would be better.