Natural Language Processing for State Security
Roland Piquepaille writes "Obviously, computers can't have an opinion. What computers are very good at, though, is scanning through text to deduct human opinions from factual information. This branch of natural-language processing (NLP) is called 'information extraction' and is used for sorting facts and opinions for Homeland Security. Right now, a consortium of three universities is for the U.S. Department of Homeland Security (DHS) which doesn't have enough in-house expertise in NLP. Read more for additional references and a diagram showing how information extraction is used."
Has anyone looked at the "diagram" cited in the original post? At first, I couldn't make sense of anything on there, especially the confusion of colors in the lower-right corner of the image. Then I spotted the "IE system components" and thought, "Oh, that explains it. This is describing Internet Explorer. No f***ing wonder it's a mess."
! troll
An Indian-American Hindu committed to non-violent thought/speech/action alarmed by the global explosion of radical Islam
For as smart as the average /. user is (allegedly anyway), it never ceases to amaze me to read the tags associated with a story. Are /. users incapable of comprehending what a *tag* is for?
/. needs certain "tags" to be given automatically ... ala Wheel of Fortune's vowels and consonants (Wheel of Fortune is a game show, google it if you don't know.)
... uhh... ok ... yeah - that makes sense.
It's supposed to be a way to identify an article based on keywords. It's not an opinion poll. Keywords like "yes", "no", and "duh", are completely irrelevant!
Every article on
We automatically give you - "yes", "no", "maybe", "duh", "slownewsday", "slashdotted", and "fud"
Search slashdot articles based on tags for "duh"
What comptuers are very good at, though, is scanning through text to deduct human opinions from factual information. This branch of natural-language processing (NLP) is called 'information extraction' and is used for sorting facts and opinions for Homeland Security.
Yeah, because we need AT&T giving wide-scale, undocumented wiretaps to the NSA, who use voice recognition to generate transcripts of everyone's phone calls, and then DHS can run NLP on those transcripts to compile a list of "persons of interest", who are then automatically added to the TSA no-fly lists.
Yeah, I can envision the future, and the future sucks.
Push Button, Receive Bacon
What comptuers are very good at, though,
.... is spell-checking.....
....something, apparently, the editors are not good at....
Have you read my journal today?
1337 to 10101001101
The slippery slope to being automatically flagged as someone to watch out for. No human control in the process, but one day when you go to apply for a loan or get your drivers' licence renewed, you might get a surprise.
Job? I don't have time to get a job! Who will sit around and bitch about being broke and unemployed then?
Number 891224 has expressed a dislike of Emperor Bush, incident reported to FBI and Homeland Security.
Great Intellect...
... I want to see this functionality in Internet search engines!
http://outcampaign.org/
There is a great little company in Brooklyn, NY called Alias-i. Some years ago they built this interesting "tool" called....guess....ThreatTracker. Information Extraction, Named Entity Recognition and other interesting stuff, if you are into this.
No, I don't work for them, but their LingPipe toolkit has some cooooool stuff.
Simpy
I would say that comptuers (sic) aren't very good at deducting human opinions yet. They _may_ become better. Are humans good at deducting other humans opinion yet?
I just can't be bothered.
"Obviously, computers can't have an opinion."
http://news.bbc.co.uk/2/hi/science/nature/5303126.stm
"Meet George, 39, single, quirky sense of humour, looking for friends to chat with online.
He's a profound intellect and speaks 40 languages, but is also prone to unwarranted rudeness and his banter can be slightly disjointed. "
Is that it could be used to train a true AI (uh... not "artificial insemination"... the other kind). Just what do you think you're doing, Dave?
Beer is proof that God loves us, and wants us to be happy.
I have, in aggregate, spent about 3 1/2 of the last 20 years working on using NLP for semantic information extraction.
Possible? Yes, given very narrow domains of discourse and lots of work.
It's clear "national security" has become what "the internet" or "the cold war" were in their prime: an all-purpose catchphrase to get funding for any research whatsoever, no matter how tenuously connected.
Look at the two project proposals below and imagine which one will have an easier time getting funding:
"An epistemological metaanalysis of object-subject interrelations and conflict avoidance in Beowulf"
or
"An epistemological metaanalysis of object-subject interrelations and conflict avoidance in Beowulf to better understand threats to NATIONAL SECURITY"
Trust the Computer. The Computer is your friend.
Wow, thanks for another waste of time. And you people stop linking to his blog in comments, he exists for nothing but ad clicks.
The tagging system here seriously sucks. Tags should be there to enable users to efficiently look up relevant topics, not so that slashdot editors can express their rather childish opinion of the article. (This is why I have tag display turned off.)
So this program is supposed to determine opinions held by people who write factual statements?
So if you fed it something like "President Bush and Clinton met briefly today, at a charity dinner for underprivileged children." Would the computer tell a Democrat that the author's opinion is that "Bush deserves to be impeached and the Republicans are all hate filled fearmongers" and tell the Republicans that the author's opinion is that "Clinton is evil and the Democrats are all hate filled idiots"?
Or would it be the other way around? I can see it being a useful tool on the War On Terrah: "folks, all of our evidence against this man is top secret and can't even be released to a court martial of highly respected and cleared military officers and the records sealed. But I can tell you that the computer says the man hates America."
There goes a promising career path. I know any technology can be used for good or for evil, but in today's political climate, it seems especially irresponsible to be aiding and abetting what may wind up becoming the pretext for torture of some 16 year old blogger.
Now, if you'll excuse me, I have to prepare myself for my upcoming extraordinary rendition....
Of course, stuff that is stated as fact could be opinion, conveniently made to look like fact. Hence Orwellian doublespeak. Given where AI currently stands, I would say that such an algorithm would not really be able to flag doublespeak.
Do not downmod posts "overrated" simply because you disagree with them.
Sounds kind of like DARPA's Information Processing Technology Office's GALE Program:
" The goal of the GALE (Global Autonomous Language Exploitation) program is to develop and apply computer software technologies to absorb, analyze and interpret huge volumes of speech and text in multiple languages, eliminating the need for linguists and analysts and automatically providing relevant, distilled actionable information to military command and personnel in a timely fashion. Automatic processing "engines" will convert and distill the data, delivering pertinent, consolidated information in easy-to-understand forms to military personnel and monolingual English-speaking analysts in response to direct or implicit requests."
Demented But Determined.
That doesn't stop the really determined idiot though. Oh no.
I have a spelling checker,
It came with my PC.
It plane lee marks four my revue
Miss steaks aye can knot sea.
Eye ran this poem threw it,
Your sure reel glad two no.
Its vary polished in it's weigh.
My checker tolled me sew.
A checker is a bless sing,
It freeze yew lodes of thyme.
It helps me right awl stiles two reed,
And aides me when eye rime.
Each frays come posed up on my screen
Eye trussed too bee a joule.
The checker pours o'er every word
To cheque sum spelling rule.
Bee fore a veiling checker's
Hour spelling mite decline,
And if we're lacks oar have a laps,
We wood bee maid too wine.
Butt now bee cause my spelling
Is checked with such grate flare,
Their are know fault's with in my cite,
Of nun eye am a wear.
Now spelling does knot phase me,
It does knot bring a tier.
My pay purrs awl due glad den
With wrapped word's fare as hear.
To rite with care is quite a feet
Of witch won should bee proud,
And wee mussed dew the best wee can,
Sew flaw's are knot aloud.
Sow ewe can sea why aye dew prays
Such soft wear four pea seas,
And why eye brake in two averse
Buy righting want too pleas.
-- "Candidate for a Pullet Surprise"
By Jerrold H. Zar, Northern Illinois University
Journal of Irreproducible Results 39, 1 (Jan.-Feb. 1994): 13
Deleted
That is just a totally ridiculous way to waste money. If the umerican government actually had people employed to READ a lot of text to see what people's opinions are, good; people can sometimes *get* how other people feel about a topic (though not often).
But a machine? COME ON! NLP has not moved (forward) AT ALL since it came. To rely on (and/or) pay for something like this is just a waste of money. An overoptimistic academic wanking-session doomed to fail. (I've seen it before).
They should use that money on I.Q. testing/screening the next presidential candidates; that will improve that "homeland thingy" more than anything.
That's why they're problems and not inconveniences.
Why do I immediately assume this will be abused?
DHS officer: Mr. 100%, I'm afraid we'll have to take you into custody. Our information extraction search on your blog concluded you are anti-American.
Me: From my blog? Is this about my criticism of the Iraq war?
DHS officer: Our results are classified, but please accompany us to GTMO for further "information extraction" to confirm the results of our investigation...
Ok, I know I'm taking a very cynical view here and that's pretty full of FUD, but why else does State security need this? Is this for them to monitor every chat room and blog?
What great breaking news! Praise be to Roland P. for his insight!
Your natural language parser will be considered accurate only once it can understand the meaning of "Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo". Until then it is useless.
Obviously, computers can't have an opinion.
Welcome the new opinion-based CAPTCHA-s!
A sarcasm detector, that's a real useful invention. - CBG
I was a little confused how they used the link between human brain activity on different wave lengths to extract opinion from written text, but Neuro-linguistic programming is apparently not the most popular term with NLP as an acronym.
This could be a double-edged sword for the government. What if it falls into the wrong hands? People all over could use the technology on the news to extract the real information, and realize that things are not what they seem.
Of course, I suppose that what they would probably do at that point is turn back to Fox, because it is more entertaining news, that is the reason the news sucks now anyways.
You take it, I don't want it...
Results 1 - 7 of about 14 for "roland piquepaille" "cmdrtaco" "fan fiction". (0.18 seconds)
This is a pumped-up military-grade version of Word executive summaries?
What comptuers are very good at, though, is scanning through text to deduct human opinions from factual information.
Funny, because neither of the articles states that. In fact, they don't even say that software can do that at all yet: "A new research program
... aims to teach computers to scan through text and sort opinion from fact." Or, "We're interested in seeing how we would extract information about opinions."
So yeah, it would be nice if they could sort opinions from facts. While they're at it, why don't they just recognize lies from truth too, because wouldn't that be doing the exact same thing? Then we can just run statements made by people suspected of committing a crime through the software, which can then sort out all the facts from the opinions, and we'll no longer need judges, juries or attorneys.
Roland, next time save yourself some time and just make the whole freaking thing up from scratch.
Dan East
Better known as 318230.
Another thing Roland's computer is not very good at is spell-checking his posts!
Screw national security. How about search? How about business and commerce? How about cultural exchange and global interaction? The chances of me getting attacked by a terrorist are less than those of getting hit by lightning; the chances of dealing with foreign cultures, foreign business and commerce are rapidly approaching 100%. There are 4 billion people out there who have the potential to mutually benefit from clean communication. Please don't patronize me. I'm not too worried about getting nailed by terrorists, but I am very bothered by the possibility of having my individual liberties nickeled and dimed to death.
"Right now, a consortium of three universities is for the U.S. Department of Homeland Security (DHS) which doesn't have enough in-house expertise in NLP."
If one of these NLP "expert" systems can extract fact or opinion from that sentence, we should delete it.
--
make install -not war
Knowing the general quality of the average programmer, it stands to reason that this code will only be validated to function in the usual case; thus, the 3l33t coder immediately realizes that simp1e substitutions present an initial defense against the naive academic's simple-minded algorithms and the cut-and-past output of their underpaid cheating slaves (which is, to mean, graduate students or even cheaper undergrads), bringing us closer to the more important question for which this test sentence is being written; therefore, we begin the second half of this ramble by introducing the astute and perhaps somewhat peeved reader to the conundrum with which the beast is to be tamed, but not before further wasting precious time on behalf of the experiment, not that any of this would be enough to induce buffer overflow attacks in the aforementioned poorly written code which would probably never even notice the following nop sled that is to be delivered by overflowing one of the many buffers in the parse tree required to decipher the previous drivel -- 0x0000 0x0000 0x0000 0xDEAD.
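For what it's worth, the "simp1e substitutions" trick only beats a matcher that compares raw tokens. A minimal sketch of the obvious countermeasure follows, assuming a hand-picked digit-to-letter table (`LEET_MAP`, `normalize` and `contains_keyword` are hypothetical names made up for this illustration, not anything from the article or from a real system):

```python
# Hypothetical digit-to-letter table, chosen only for illustration.
# The mapping is ambiguous in practice (e.g. "1" may stand for "l" or "i").
LEET_MAP = str.maketrans({"1": "l", "3": "e", "0": "o", "4": "a", "5": "s", "7": "t"})

def normalize(token):
    """Lower-case a token and undo the simple substitutions listed above."""
    return token.lower().translate(LEET_MAP)

def contains_keyword(text, keyword):
    """Check for a keyword after normalizing every whitespace-separated token."""
    return keyword in {normalize(t) for t in text.split()}

# A naive exact match misses the disguised word; the normalized match finds it.
naive_hit = "simple" in "a simp1e substitution".split()
normalized_hit = contains_keyword("a simp1e substitution", "simple")
```

Anything less regular than a one-character swap (misspellings, spacing tricks, homophones) would get past this, so it is only a first line of defense.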
Of course universities will be scrambling to help. Big dollars, imprecise goals..... and many of the professors would have done research in related fields.
Engineering is the art of compromise.
This story fits in the broader context of a developing "surveillance state" in the USA. Forget about wiretaps and such, I just want to focus on stuff that is out in plain view.
The 4th amendment says:
The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.
Evidence gathered via public cameras, recording of public conversations, etc. (all stuff out in "public") is generally protected by the doctrines of plain view or Open Fields, with the reasoning that people do not have an expectation of privacy when they are out in public. That makes sense as long as we maintain a sense of proportion.
Nowadays (versus in ye olde tymes when the bill of rights was written) it is becoming feasible for governments and large corporations to have a much "broader view" of events "out in public" - a view that is more broad and far-reaching than that of any regular person. In England, they've got thousands, probably tens of thousands, of cameras recording public areas 24x7. No one but the government can do that. Similarly, no one can read every post to slashdot, every post to every blog, discussion forum, etc on the web - you can't even do it for just 1/1000th of them.
Even with public tools like google, there are still some kinds of things - like the more orwellian uses of sophisticated NLP tools, that regular people just can't do, but large organizations like the government can, and seemingly want to do.
I think that when it gets to the point where large-scale and automated surveillance programs are used to gather evidence, that such searches no longer fall under the definition of "reasonable." That video monitoring of even a "fair-sized" minority (a purposely vague term on my part) of public places is not a reasonable search because the means to do so are far beyond those available to an average person or group of people.
So, what kind of doctrine am I proposing to replace "Plain View" or "Open Fields?" I don't quite know yet - maybe something that differentiates between actively searching for specific facts or events versus passive monitoring that records any and everything for later examination.
It just seems to me that when the primary reason that you can't expect privacy in some semi-public area is because the government has the equivalent of 10 million guards watching and listening to most every public space in real life and online, the situation has progressed far beyond the state of reasonable and off deep into the territory of excessive or extreme.
When information is power, privacy is freedom.
I wonder how long before we have to pledge allegiance to the NSA to support their war on terrorism?
Hmmm, someone at the front door at this late hour. Be right bac...%&$...no carrier.....
Next up: Solving the halting problem to prevent child pornography, finding Osama by solving the traveling salesman problem in constant time, defeating global warming with the Turing test, and using learning computers to stop illegal immigration!
Can someone clue me in on this funny mod? I must say I'm puzzled. Is it because it looks like astroturfing?
Knee-jerk big brother posts really don't belong here, as most of their research on subjectivity and sentiment is general-purpose - companies, for example, would love to know who are talking about their products and how they feel, without hiring people to scrape for this stuff all day.
:-O
Like the article says, most of the documents they work with are newspaper text and so on. And there's a LOT that needs to be manually annotated.
Anywho, scientists NEED funding somewhere, guys.
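To make the "subjectivity" idea a bit more concrete, here is a toy sketch of sorting sentences into fact and opinion by looking for cue words. The cue list is a tiny hand-made assumption for illustration only (real systems rely on those manually annotated corpora and trained models), and the function names are hypothetical:

```python
import re

# Tiny hand-made cue list, purely illustrative (an assumption, not a real lexicon).
OPINION_CUES = {"think", "believe", "feel", "love", "hate", "sucks",
                "great", "terrible", "should", "seems"}

def label_sentence(sentence):
    """Label a sentence 'opinion' if it contains any cue word, else 'fact'."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return "opinion" if words & OPINION_CUES else "fact"

def sort_sentences(text):
    """Split text on sentence-ending punctuation and bucket each sentence."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    result = {"fact": [], "opinion": []}
    for s in sentences:
        result[label_sentence(s)].append(s)
    return result

sample = "The two men met at a charity dinner. I think the tagging system sucks."
sorted_out = sort_sentences(sample)
```

A cue-word lookup like this cannot tell an opinion dressed up as fact from an actual fact, which is exactly the hard part.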
http://projects.ldc.upenn.edu/ace/
GALE seems geared towards translation and aggregation of data for convenient
access by mono-lingual military and intelligence personnel. The goal of the
ACE project is to provide classification of data based on what it actually
means.
Probably both...
Call me old fashioned, but I like a dump to be as memorable as it is devastating - Bender
Semantic analysis of opinion in USENET http://www.crs4.it/ict/dart06/program.html and follow the pdf link under G.Attardi of Univ. of Pisa Italy "Extracting Dependency Relations for Opinion Mining" - treats languages other than English - avoids Chomsky
Artificial intelligence is the study of how to make real computers act like the ones in the movies.
..."King George" instead of "Emperor Bush". The last jackass who thought he was king here was named George too, so it has a more significant ring to it.
"Our morality is good, theirs is repressive."- Partisanship Rule #3
Analysis complete.
Subject is: Terrorist
Aim is: Blow up Universities to destabilise Dollar
You mean it was not the computers that voted for George W Bush? Then who the hell did?
Sent from my ASR33 using ASCII
How about we develop a GALG (Global Autonomous Language Generation) bot that will spit out threats to US national security and spam this Orwellian system to oblivion? If GALE can analyze it, GALG can create it.
Sounds to me like spam advertisers would welcome the extra increased eyes and attention that a few additional threats included by their spambot will give their messages. -> PROFIT
What computers are very good at, though, is scanning through text to deduct human opinions from factual information.
Nope. Computers are good at processing data that has been formatted in a way that they can interpret and running that data through algorithms to come up with some sort of result. They're also good at making grilled cheese sandwiches.
Obviously there will be bias. That's the whole point. Life is biased. Deal with it. Not everybody is equally likely to commit a crime, for example 3-year-old girls are very unlikely to bomb skyscrapers. Is there anything wrong with not checking them ?
The point is to find relations between people that commit crimes so they can be caught red-handed TRYING to hijack a Boeing, finding 20 armed policemen inside the plane instead of the innocent passengers they were expecting to kill.
And if they're wrong? You cannot be sentenced without an independent review of the evidence. So what's the problem?
Let's take a stupid simple case. Say they find 45% of muslim redheads kill people at round points, then what exactly is wrong with making sure a policeman is watching round points near the places they live?
Life is biased. In a thousand ways. One of them is that YOU are biased (against neo-cons for example), so why whine about it ?
I don't know what 'opinion' means here. If we take opinion == suggestion, computers have already done that. Just look at the MS Office 2007 Beta. The spell checker now does more than check spelling: it also checks the grammar and the context in which a word is used. Then it gives some 'opinions' on how to correct the errors.
On Windows, the system often gives an 'opinion' too, for instance by telling you that your system is not secure when you turn off the Windows XP SP2 firewall. Doesn't that sound like an opinion to all of you?
Or does opinion here mean something that has not been programmed into the computer at all, information it obtains by itself? Good luck finding that, if it exists.
AI systems also need HUMANS to give the input and set up the Prolog. There's no way a computer can give its own pure opinion the way a human can, word by word.
For me, opinions from computers are suggestions or error messages that have already been input by HUMANS.
http://www.whitehouse.gov/
:-) No, this should be open-source, open standards, and distributed, not centralized. Only thing the State could do to help would be to waive copyright and database aggregation rights. Grant fair use license. Hell, abolish copyright and patents, but I digress. I wager "homeland security" would be much more assured, though.
eg. "Our goal, and our mission, is to help Lebanese citizens and Lebanese businesses not only recover, but to flourish, because we believe strongly in the concept of a democracy in Lebanon."
I really dread reading the newspaper anymore. One morning I'm going to find that someone has come out four-square for the concept of a democracy in the U.S.A.
This is incredibly useful and worthwhile research, but I fear it would be totally lost on DHS if it were left in their hands. Just the second-order effects of self-reference, not just exposing the Prez' badly parsed horse-puckey, they couldn't handle. Too much paperwork
jbdigriz
Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured or semistructured information from unstructured machine-readable documents. It is a type of concept extraction that automatically recognizes significant vocabulary items in text documents, such as names, terms, and expressions.
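As a rough illustration of that definition, the sketch below pulls a small structured record out of a free-text sentence using regular expressions. The patterns, the `extract_entities` name and the sample sentence are all assumptions made up for this example; real IE systems generally use trained models rather than two hand-written patterns:

```python
import re

# Hypothetical patterns for illustration only.
# Matches names of the form "[U.S. ]Department of <Capitalized Words>".
ORG_PATTERN = re.compile(r"\b(?:U\.S\. )?Department of [A-Z][a-z]+(?: [A-Z][a-z]+)*")
# Matches an all-caps acronym given in parentheses, e.g. "(DHS)".
ACRONYM_PATTERN = re.compile(r"\(([A-Z]{2,})\)")

def extract_entities(text):
    """Return a dict of structured fields pulled from unstructured text."""
    return {
        "organizations": ORG_PATTERN.findall(text),
        "acronyms": ACRONYM_PATTERN.findall(text),
    }

# Made-up sample sentence, loosely modeled on the story summary.
sample = ("A consortium of three universities is working for the "
          "U.S. Department of Homeland Security (DHS) on NLP.")
record = extract_entities(sample)
```

The unstructured sentence goes in and a record with named fields comes out, which is the "structured or semistructured information" part of the definition; recognizing the names themselves is the concept-extraction part.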
No-one seems to have pointed out that "to deduct human opinions from factual information" means to subtract human opinions from factual information. The intended word is deduce.
Apparently, everyone who actually knows English has now officially abandoned Slashdot. (Unless the lack of corrections is a sign that the /. audience is maturing, and no longer finds it necessary to correct every little error. Hmmm - sounds a little far-fetched...)
I'd rather have a computer flagging me than a human who may judge me by the color of my skin. Who cares about computer flagging? A computer is made by humans, and computers are stupid; they know nothing about judging. Hahaha.
...or how I learned to stop worrying and love corporate campaign contributions. How far has it gone? Well, the vice president of the United States is a military subcontractor for God's sake.
None of this shit has to work, it just has to cost a boatload of money. And the subcontractors who get the wads of cash have to be those who contributed to the right people.
I think I need a splint for my fractured brain... or else I need to spring this on some other people, and expand the mental carnage...
you really expect me to be able to express my opinion of what's so fucked up in this world in 120 characters or less?
Computers are already becoming very, very advanced, but to this day they can only do what we want them to do. We provide information to a computer and it processes it within seconds. Given the right input, computers can come up with great amounts of information. Besides that, computers can also hold millions of pieces of information in memory. What if computers are built so that they can think and act on their own without any human intervention, so that just by observing human behavior they will be able to act? Computers will then take over the world. People will not have a place to work. No use for military defences... it will be doom days for the human race... and this time it will not be because of a meteorite or a plague... but it will be because of our own doing. The time of the human race will end and the time of the machines will begin, and we will be slaves to the very things we created.... SCARY!!!!!1 hahahhhh
Natural language processing, as a part of information extraction or information retrieval, is used to represent information in an intelligent way: it takes the information found in unstructured machine-readable documents and presents it to the user in an efficient form.