US Intelligence Seeks a Universal Translator For Text Search In Any Language (arstechnica.com)
An anonymous reader quotes a report from Ars Technica: The Intelligence Advanced Research Projects Agency (IARPA), the U.S. Intelligence Community's own science and technology research arm, has announced it is seeking contenders for a program to develop what amounts to the ultimate Google Translator. IARPA's Machine Translation for English Retrieval of Information in Any Language (MATERIAL) program intends to provide researchers and analysts with a tool to search for documents in their field of concern in any of the more than 7,000 languages spoken worldwide. The specific goal, according to IARPA's announcement, is an "'English-in, English-out' information retrieval system that, given a domain-sensitive English query, will retrieve relevant data from a large multilingual repository and display the retrieved information in English as query-biased summaries." Users would be able to search vast numbers of documents with a two-part query: the first giving the "domain" of the search in terms of what sort of information they are seeking (for example, "Government," "Science," or "Health") and the second an English word or phrase describing the information sought (the examples given in the announcement were "zika virus" and "Asperger's syndrome"). The system would be used in situations like natural disasters or military interventions in remote locations where the military has little or no local language expertise. Those taking on the MATERIAL program will be given access to a limited set of machine translation and automatic speech recognition training data from multiple languages "to enable performers to learn how to quickly adapt their methods to a wide variety of materials in various genres and domains," the announcement explained. "As the program progresses, performers will apply and adapt these methods in increasingly shortened time frames to new languages... 
Since language-independent approaches with quick ramp up time are sought, foreign language expertise in the languages of the program is not expected." The good news for the broader linguistics and technology world is that IARPA expects the teams competing on MATERIAL to publicly publish their research. If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the world.
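For readers who want to picture the two-part query, here is a minimal sketch, not IARPA's design. It assumes the foreign-language documents have already been machine-translated into English (that step is not shown), and the names `Document`, `search`, `query_biased_summary` and `REPO` are hypothetical, invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    lang: str          # source language code (illustrative values only)
    domain: str        # e.g. "Government", "Science", "Health"
    english_text: str  # ASSUMPTION: output of a translation step not shown here


def query_biased_summary(text: str, query: str) -> str:
    """Return the first sentence that mentions the query, else the first sentence."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    for sentence in sentences:
        if query.lower() in sentence.lower():
            return sentence + "."
    return sentences[0] + "." if sentences else ""


def search(repo, domain, query):
    """Two-part query: filter by domain, then match the English phrase."""
    hits = []
    for doc in repo:
        if doc.domain == domain and query.lower() in doc.english_text.lower():
            hits.append((doc.lang, query_biased_summary(doc.english_text, query)))
    return hits


# Toy repository; the contents are made up for the example.
REPO = [
    Document("pt", "Health", "The clinic opened on Monday. Cases of zika virus rose in the north."),
    Document("sw", "Government", "The ministry discussed zika virus funding."),
    Document("tl", "Health", "Vaccination schedules were updated."),
]

print(search(REPO, "Health", "zika virus"))
```

The domain acts as a coarse filter and the phrase as the actual match, so the "Government" document that mentions zika virus is skipped when the domain is "Health". A real system would have to do the hard part this sketch stubs out: translating and matching across languages it has little training data for.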
US Intelligence
Let's make like a bird... and get the flock outta here.
Oxymoron... that's like, a moron that is addicted to OxyContin?
MATERIAL... They think they are so clever.
It's free and fairly accurate.
Here's how:
How about writing, "The sheep are coming..."
...And this to mean something entirely different in the bad guys' minds?
Easy and effective. Isn't it?
Will it be able to give meaning to poorly translated newsfeeds like the ones in this Slashdot contributor's history?
Sample:
"Various framerates have been a warm theme before few years?"
It gets worse from there.
Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
I don't speak but a handful of words in a very short list of languages, I'm certainly no expert in language, but aren't there some languages that are so nuanced that a slight change in inflection, or tone, or emphasis, or maybe even cadence changes the entire meaning of what's being said? Wouldn't that be rather difficult to code for?
PAWG
Doesn't Google already do exactly that? Oh, wait. Yes, it does. But the DoD would have to let Google index their archive...
"Shoot, a fella could have a pretty good weekend in Vegas with all that stuff."
to only use it for good!!
scouts honour!
If successful, this moonshot for translation could radically change how accessible materials in many languages are to the rest of the English-speaking world.
We need only to look at the BIBLE to see what happened the LAST TIME someone tried to create a Tower of Babel to see what will happen THIS time.
Dear Sir,
My name is Mahindresh Jalabahamatra* from India. I would like to apply for the Universal Translator job that you are offering. I am very skilled in Universal Translation and have many years of experience. I have done Universal Translation for many clients in the past, and I consider your offered job as Universal Translator to fit my skills perfectly.
Hoping to hear from you soon.
"Everybody's naked underneath" -- The Doctor
... Sheldon Cooper.
It little behooves the best of us to comment on the rest of us.
Natural language is inherently ambiguous, and real humans love to make it more so with slang and swearing. Take the story of the guerrilla artist Jason Sprinkle from Seattle. He was once best known for attaching a ball and chain to the massive Hammering Man statue on Labor Day. He had a commission for an art project to support Job Corps, for which he made a giant heart and drove it around to different Job Corps sites, where he allowed participants to sign the art and his truck. One person wrote “Timberlake Carpentry Rules (the ‘Bomb’)” on the front bumper of the truck, as slang for very cool. One day, pre-9/11, he was upset with cuts to city art funding and decided to park the truck, heart and all, in Seattle's main square to draw attention to the arts. Needless to say, the police interpreted the graffiti on his truck literally, and the artist ended up in jail for a month, which essentially ruined his life. OK, cops panicking in the heat of the moment you might expect, but if in the cold light of day prosecutors and the courts have such a problem handling slang, what are the chances some brainless code will be able to handle it?
The most difficult part of the project is completed: they found a nice backronym for it.
You've no doubt run into this ambiguity when you've done a Google search on a word which has multiple meanings. For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal? It so happens that Google tries to give you a mix of both answers, which I suspect may reflect the result of some ad hoc result tweaking.
So you do sometimes have to know how to rephrase a query, e.g. "pictures of a flying crane" to "pictures of an aerial crane".
The problem is that when you cross languages, words don't have a simple one-to-one relationship. For example, the Latin word "sacer" can mean either "holy" or "unholy"; in a sense English treats the concepts as antonyms whereas Latin treats them as two kinds of the same thing. And there are idioms, like the Arabic "Ya'aburnee" (unicode redacted), which literally means "you bury me" but usually means "I love you" (i.e., I can't live without you). Of course you can program idioms like that into your translator, but you're still going to have to accept either lots of false positives or lots of false negatives. If you're a native speaker of Arabic you can tell from context whether the document you're looking at is talking about love or burial; if you're looking at a machine translation you won't be as sure.
But of course just as false positives don't make Google useless, false positives wouldn't make a multi-language search engine useless. You just have to be aware of the limitations. But what concerns me is the tendency of people to think this stuff works like magic.
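To make the false positive / false negative trade-off concrete, here is a toy sketch. The gloss table and the `matches` function are hypothetical, made up for illustration, and a real translator would use context rather than a lookup table.

```python
# Hypothetical gloss table: each source phrase maps to every English reading,
# with the literal reading listed first.
GLOSSES = {
    "sacer": ["holy", "unholy"],
    "ya'aburnee": ["you bury me", "I love you"],
}


def matches(source_phrase: str, english_query: str, strict: bool) -> bool:
    """Does an English query word match a foreign phrase?

    strict=True  checks only the first-listed reading (risks false negatives).
    strict=False checks every reading (risks false positives).
    """
    readings = GLOSSES.get(source_phrase.lower(), [])
    if strict:
        readings = readings[:1]
    query = english_query.lower()
    # Compare whole words so that "holy" does not match inside "unholy".
    return any(query in reading.lower().split() for reading in readings)


# A search for "love" misses the idiom in strict mode...
print(matches("Ya'aburnee", "love", strict=True))
# ...while a search for "bury" hits it in loose mode, even in a love letter.
print(matches("Ya'aburnee", "bury", strict=False))
```

Neither setting is right for every document: the table has no way to know which reading the writer meant, which is exactly the context a native speaker supplies.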
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
babelfish
> For example, suppose you google "How do I get rid of a mole?" Are you worried about a skin condition or a small burrowing mammal?
No, Google would already know that neither of those interpretations is correct. Google tracks your search history, it knows who is asking. So when the CIA asks how to get rid of a mole, Google knows they are talking about a https://en.m.wikipedia.org/wik... mole.
They expect people to publish research into how to take some English search terms and then search a pile of assorted documents in different languages. The public can see (some of) HOW one can search text. So we get to see some ideas about searching general text.
Which text they later search, for what reasons, is a completely separate issue. If they can get a system like this developed, they would be foolish if they didn't use it in their national security mission. In fact, most intelligence is from open sources (OSINT). The challenge for the intelligence agencies is to glean some useful information from the billions of newspaper articles, forum posts, tweets, ads, presentations, scholarly papers, job postings, etc. that are available. For example, if a government posts a job ad for highly skilled machinists, separately puts out a requisition for Acme model 502 control circuits, gets a large shipment of helium, and the power plant in Skitsville is supplying a heavier load than normal, that suggests the country is building ______ in Skitsville. The challenge is finding all these little bits of information, and then putting the pieces together. Before 9-11, various US agencies had different pieces of intelligence, but none had them all together, to see how the tidbits fit together to reveal the danger.
Here's an entertaining example where there was no need to put the pieces together, the spy agency just needed to find this one secret published in the open. When the B2 bomber was revealed to the public, reporters only got a front view and had to stand 200 feet back, so they couldn't see the rear of the plane or the overall shape as would be seen from above. BEFORE even that much was revealed, Honda ran this ad:
https://i.kinja-img.com/gawker...
Honda got called to Washington to answer how the hell they knew exactly what the plane looked like - nothing like that had been released, the shape was classified at the time. Intelligence services from other nations only had to find that ad, in a mountain of ads, to get a picture of the USA's top-secret plane.
text=document.txt  # not used below; the pipeline reads the current X selection via xsel
translate="$(wget -U "Mozilla/5.0" -qO - "http://translate.googleapis.com/translate_a/single?client=gtx&sl=auto&tl=en&dt=t&q=$(xsel -o | sed "s/[\"']//g")" | sed "s/,,,0]],,.*//g" | awk -F'"' '{print $2, $6}')"
echo "$translate"
Not tested, but should work. Idea came from here: http://www.webupd8.org/2016/03/translate-any-text-you-select-on-your.html?m=1
Got more links and stuff at TheOuterLinux.com
The Tower wasn't the instrument of bad translation, just the beneficiary of it. My takeaway was that, much like your example with the word "Tower", different words had different meanings for different people but were close enough to work in enough contexts, yet fell apart in all the contexts that were needed to build the tower.
You see, the software isn't predisposed to one interpretation or another. After the training, however, is another matter.
Most of my on-line footprint is C-code on github, and I'd be surprised if anyone or anything can read that mess.
Universal Translators may be bought at competitive prices from General Intelligence, a subsidiary of Douglas Adams Industries.
Now they'll drone murder us based on what an algorithm mistranslated.
If at least a bilingual murderer had to listen to the xbox record of us joking in the living room our chances would be higher.