Slashdot Mirror


US Intelligence Seeks a Universal Translator For Text Search In Any Language (arstechnica.com)

An anonymous reader quotes a report from Ars Technica: The Intelligence Advanced Research Projects Agency (IARPA), the U.S. Intelligence Community's own science and technology research arm, has announced it is seeking contenders for a program to develop what amounts to the ultimate Google Translator. IARPA's Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program intends to provide researchers and analysts with a tool to search for documents in their field of concern in any of the more than 7,000 languages spoken worldwide. The specific goal, according to IARPA's announcement, is an "'English-in, English-out' information retrieval system that, given a domain-sensitive English query, will retrieve relevant data from a large multilingual repository and display the retrieved information in English as query-biased summaries." Users would be able to search vast numbers of documents with a two-part query: the first giving the "domain" of the search in terms of what sort of information they are seeking (for example, "Government," "Science," or "Health") and the second an English word or phrase describing the information sought (the examples given in the announcement were "zika virus" and "Asperger's syndrome"). The system would be used in situations like natural disasters or military interventions in remote locations where the military has little or no local language expertise. Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages "to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains," the announcement explained. "As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages... 
Since language-independent approaches with quick ramp up time are sought, foreign language expertise in the languages of the program is not expected." The good news for the broader linguistics and technology world is that IARPA expects the teams competing on MATERIAL to publicly publish their research. If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the world.

28 of 47 comments

  1. Oxymoron by thechemic · · Score: 1, Funny

    US Intelligence

    --
    Let's make like a bird... and get the flock outta here.
    1. Re:Oxymoron by rmdingler · · Score: 1

      Clearly, two words in juxtaposition, unlike Canadian politeness, African poverty, or European Union discord.

      --
      Happiness in intelligent people is the rarest thing I know.

      Ernest Hemingway

  2. Easy to defeat... by bogaboga · · Score: 1

    Here's how:

    How about writing, "The sheep are coming..."

    ...And this to mean something entirely different in the bad guys' minds?

    Easy and effective. Isn't it?

    1. Re:Easy to defeat... by admin7087 · · Score: 1

      Jesus Christ, you're a genius! They didn't think of that! How did you come up with this idea?

    2. Re:Easy to defeat... by dbIII · · Score: 1

      How about writing, "The sheep are coming..."

      In a Desmond Bagley spy thriller the automatic translator turned "hydraulic ram" into "water sheep".

  3. Don't think that'll work by Rick+Schumann · · Score: 1

    I don't speak but a handful of words in a very short list of languages, I'm certainly no expert in language, but aren't there some languages that are so nuanced that a slight change in inflection, or tone, or emphasis, or maybe even cadence changes the entire meaning of what's being said? Wouldn't that be rather difficult to code for?

    1. Re:Don't think that'll work by slew · · Score: 2

      I don't speak but a handful of words in a very short list of languages, I'm certainly no expert in language, but aren't there some languages that are so nuanced that a slight change in inflection, or tone, or emphasis, or maybe even cadence changes the entire meaning of what's being said? Wouldn't that be rather difficult to code for?

      Two things...
      1. I think they are thinking about a computer database of text, not an audio database.
      2. Nobody really technically "codes" this stuff anymore; a deep learning network is conceived and configured in a framework and then trained with petabytes of data.

  4. Isn't that...Google? by OldMugwump · · Score: 4, Funny

    Doesn't Google already do exactly that? Oh, wait. Yes, it does. But the DoD would have to let Google index their archive...

    --
    "Shoot, a fella could have a pretty good weekend in Vegas with all that stuff."
  5. FTFY by slew · · Score: 3, Insightful

    If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the English speaking world.

  6. Re:Will it translate /. binspam bad-translations? by sexconker · · Score: 1

    Will it be able to give meaning to poorly-translated newsfeeds like the ones in this Slashdot contributor's history?

    Sample:

    "Various framerates have been a warm theme before few years?"

    It gets worse from there.

    Different framerates have been a hot topic in recent years.

    That'll be .001 BTC.

  7. This will end in tears by Anonymous Coward · · Score: 1

    We need only to look at the BIBLE to see what happened the LAST TIME someone tried to create a Tower of Babel to see what will happen THIS time.

    1. Re:This will end in tears by chill · · Score: 1

      Yeah, but I'm pretty sure the machines are going to be rackmounts and/or blade servers. No one uses towers anymore.

      --
      Learning HOW to think is more important than learning WHAT to think.
    2. Re:This will end in tears by K.+S.+Kyosuke · · Score: 1

      That makes no sense. What does the story about an architectural project ruined by a fictional prig turned language inventor have to do with a later attempt to bridge the languages? Is anyone building another tower or are we expecting to get yet more languages or what?

      --
      Ezekiel 23:20
  8. I would like to apply for the job by iTrawl · · Score: 3, Insightful

    Dear Sir,

    My name is Mahindresh Jalabahamatra* from India. I would like to apply for the Universal Translator job that you are offering. I am very skilled in Universal Translation and have many years of experience. I have done Universal Translation for many clients in the past, and I consider your offered job as Universal Translator to fit my skills perfectly.

    Hoping to hear from you soon.

    --
    "Everybody's naked underneath" -- The Doctor
    1. Re:I would like to apply for the job by ems2004 · · Score: 2

      ...and as a proof for my skills I am currently on H1B visa specifically granted for Universal Translator..... I also have Master's degree in Universal Translator.... In my last 3 jobs I developed and wrote Universal Translator only....

      --
      ..... best things in life are not so free..........
    2. Re:I would like to apply for the job by rosshalz · · Score: 1

      Dear Sir,

      My name is Mahindresh Jalabahamatra* from India. I would like to apply for the Universal Translator job that you are offering. I am very skilled in Universal Translation and have many years of experience. I have done Universal Translation for many clients in the past, and I consider your offered job as Universal Translator to fit my skills perfectly.

      Hoping to hear from you soon.

      I'm Indian and I find this absolutely hilarious.. Should I be offended? Naa.. I've seen too many applicants use similar language... I will however laugh uncontrollably for the next 5 minutes at that name... Mahindresh Jalabahamatra

    3. Re:I would like to apply for the job by erapert · · Score: 1

      Could you please explain why that name is so funny?

  9. Most difficult part done by manu0601 · · Score: 1

    The most difficult part of the project is completed: they found a nice backronym for it.

  10. Re:Will it translate /. binspam bad-translations? by PolygamousRanchKid+ · · Score: 1

    Different framerates have been a hot topic in recent years.

    On Slashdot, different flamerates have been a hot topic in recent years.

    --
    Schroedinger's Brexit: The UK is both in and out of the EU at the same time!
  11. Text search has limitations WITHIN a language. by hey! · · Score: 1

    As you've no doubt experienced when you've done a Google Search on a word which has multiple meanings. For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal? It so happens that Google tries to give you a mix of both answers, which I suspect may reflect the result of some ad hoc result tweaking.

    So you do sometimes have to know how to rephrase a query, e.g. "pictures of a flying crane" to "pictures of an aerial crane".

    The problem is that when you cross languages, words don't have a simple one-to-one relationship. For example the Latin word "sacer" can mean either "holy" or "unholy"; in a sense English treats the concepts as antonyms whereas Latin treats them as two kinds of the same thing. And there are idioms, like the Arabic "Ya'aburnee" (unicode redacted), which literally means "you bury me" but usually means "I love you" (i.e., I can't live without you). Of course you can program idioms like that into your translator, but you're still going to have to accept either lots of false positives or false negatives. If you're a native speaker of Arabic you can tell from context whether the document you're looking at is talking about love or burial; if you're looking at a machine translation you won't be as sure.

    But of course just as false positives don't make Google useless, false positives wouldn't make a multi-language search engine useless. You just have to be aware of the limitations. But what concerns me is the tendency of people to think this stuff works like magic.
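A toy sketch of the one-to-many problem, assuming a hypothetical hand-written glossary (`GLOSSARY`) and three made-up documents (`DOCUMENTS`); a real system would presumably learn such mappings rather than hard-code them. It only illustrates why expanding an English query through every sense of a foreign word brings false positives with it.

```python
# Toy illustration only: the glossary and documents are invented for this example.
# Each foreign term maps to every English sense it can carry.
GLOSSARY = {
    "sacer": {"holy", "unholy"},                  # Latin: one word, two opposite senses
    "ya'aburnee": {"you bury me", "i love you"},  # Arabic idiom, transliterated
}

# Each document is reduced to the list of foreign terms it contains.
DOCUMENTS = {
    "doc1": ["sacer"],
    "doc2": ["ya'aburnee"],
    "doc3": ["sacer", "ya'aburnee"],
}


def search(english_query):
    """Return ids of documents containing any term whose senses include the query."""
    query = english_query.lower()
    matching_terms = {term for term, senses in GLOSSARY.items() if query in senses}
    return sorted(
        doc_id
        for doc_id, terms in DOCUMENTS.items()
        if matching_terms.intersection(terms)
    )


if __name__ == "__main__":
    # Opposite queries return identical results: without context, the search
    # cannot tell which sense each document intended.
    print(search("holy"))
    print(search("unholy"))
```

Telling the two senses apart would need the surrounding context of each document, which is exactly what a native speaker brings and a term lookup does not.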

    --
    Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
    1. Re:Text search has limitations WITHIN a language. by Agripa · · Score: 2

      "Out of sight, out of mind."

      Translation:

      "Invisible idiot."

  12. Lol. no, Google does it better than you, a human by raymorris · · Score: 1

    > For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal?

    No, Google would already know that neither of those interpretations is correct. Google tracks your search history, it knows who is asking. So when the CIA asks how to get rid of a mole, Google knows they are talking about a https://en.m.wikipedia.org/wik... mole.

  13. the how and the why are unrelated by raymorris · · Score: 1

    They expect people to publish research into how to take some English search terms and then search a pile of assorted documents in different languages. The public can see (some of) HOW one can search text. So we get to see some ideas about searching general text.

    Which text they later search, for what reasons, is a completely separate issue. If they can get a system like this developed, they would be foolish if they didn't use it in their national security mission. In fact, most intelligence is from open sources (OSINT). The challenge for the intelligence agencies is to glean some useful information from the billions of newspaper articles, forum posts, tweets, ads, presentations, scholarly papers, job postings, etc. that are available. For example, if a government posts a job ad for highly skilled machinists, and separately a requisition for Acme model 502 control circuits, and gets a large shipment of helium, and the power plant in Skitsville is supplying a heavier load than normal, that suggests the country is building ______ in Skitsville. The challenge is finding all these little bits of information, and then putting the pieces together. Before 9-11, various US agencies had different pieces of intelligence, but none had them all together, to see how the tidbits fit together to reveal the danger.

    Here's an entertaining example where there was no need to put the pieces together, the spy agency just needed to find this one secret published in the open. When the B2 bomber was revealed to the public, reporters only got a front view and had to stand 200 feet back, so they couldn't see the rear of the plane or the overall shape as would be seen from above. BEFORE even that much was revealed, Honda ran this ad:

    https://i.kinja-img.com/gawker...

    Honda got called to Washington to answer how the hell they knew exactly what the plane looked like - nothing like that had been released, the shape was classified at the time. Intelligence services from other nations only had to find that ad, in a mountain of ads, to get a picture of the USA's top-secret plane.

  14. Mmmm..k? Code: by TheOuterLinux · · Score: 1

    # Not used below; the text to translate is read from the current X selection via xsel -o.
    text=document.txt
    translate="$(wget -U "Mozilla/5.0" -qO - "http://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=en&dt=t&q=$(xsel -o | sed "s/[\"']//g")" | sed "s/,,,0]],,.*//g" | awk -F'"' '{print $2, $6}')"
    echo "$translate"

    Not tested, but should work. Idea came from here: http://www.webupd8.org/2016/03/translate-any-text-you-select-on-your.html?m=1
    Got more links and stuff at TheOuterLinux.com

    1. Re: Mmmm..k? Code: by TheOuterLinux · · Score: 1

      Did that with my phone. Had no idea it would mash it all together. :(
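A minimal sketch of how the request URL from the shell command in this thread could be built with proper encoding instead of stripping quote characters with sed. The endpoint and parameter names are copied from that command; the endpoint is unofficial, so whether it still responds, and what it returns, are assumptions this sketch does not test. `build_translate_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint and parameters are taken from the shell command in the thread.
# Assumption: this unofficial endpoint still exists; no request is made here.
BASE_URL = "http://translate.googleapis.com/translate_a/single"


def build_translate_url(text, target="en"):
    """Build the request URL, percent-encoding the text rather than deleting quotes."""
    params = {
        "client": "gtx",
        "sl": "auto",   # presumably: auto-detect the source language
        "tl": target,   # presumably: the target language
        "dt": "t",
        "q": text,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_translate_url('it\'s a "test"'))
```

Encoding keeps apostrophes and quotation marks in the query, which the sed filter in the shell version throws away.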

  15. Re:Babel by hackwrench · · Score: 1

    The Tower wasn't the instrument of bad translation, just the beneficiary of it. My takeaway was that, much like your example with the word "Tower", different words had different meanings for different people but were close enough to work in enough contexts, but fell apart in all the contexts that were needed to build the tower.

  16. Re:Much better actually by hackwrench · · Score: 1

    You see, the software isn't predisposed to one interpretation or another. After the training is another matter, however.

  17. Cool by zedaroca · · Score: 1

    Now they'll drone murder us based on what an algorithm mistranslated.
    If at least a bilingual murderer had to listen to the Xbox recording of us joking in the living room, our chances would be better.