AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com)

Posted by BeauHD on Wednesday January 3, 2018 @10:00PM from the fact-is-stranger-than-fiction dept.

In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
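As a rough illustration of the annotate-then-classify workflow described above, here is a minimal Python sketch. It is not the paper's method: the word-frequency scoring, the labels "dense" and "sparse", and the four toy leads are all assumptions made up for this example.

```python
from collections import Counter

def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose training words best cover the given text."""
    words = text.lower().split()

    def score(label):
        label_counts = counts[label]
        total = sum(label_counts.values())
        # Sum of relative frequencies of each word under this label.
        return sum(label_counts[w] / total for w in words)

    return max(counts, key=score)

# Hypothetical annotated leads (invented for illustration only).
examples = [
    ("the council approved a 3 percent tax increase on tuesday", "dense"),
    ("the company reported revenue of 2 billion dollars in march", "dense"),
    ("you will never believe what happened next", "sparse"),
    ("it was a day like any other in the quiet town", "sparse"),
]

model = train(examples)
print(classify(model, "the council reported revenue on tuesday"))
print(classify(model, "you will never believe the quiet day"))
```

A real system would need far more data and richer features than raw word counts; the sketch only shows the shape of the process: label examples, fit a model, then score unseen text.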

41 of 80 comments

Daily Mail is fucked by Anonymous Coward · 2018-01-03 22:25 · Score: 2, Funny

Daily Mail is fucked then.
1. Re:Daily Mail is fucked by AmiMoJo · 2018-01-04 00:34 · Score: 4, Informative
  
  The Daily Mail is 97% opinion, but does usually include the facts at the very end of the article. The trick they use is to split the article over two pages, or make it long enough that people don't get to the end.
  A classic example was a story about the EU banning companies from claiming that bottled water cured dehydration. They had endless quotes from outraged morons ranting about the terrible EU and its idiocy. Then right at the end someone sane explained that dehydration is a medical condition with a variety of causes, many of which cannot be cured by drinking water, and that the blanket rule on making unsubstantiated or misleading medical claims in advertising stands.
  
  --
  const int one = 65536; (Silvermoon, Texture.cs)
  SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
Ledes dammit by Anonymous Coward · 2018-01-03 22:48 · Score: 4, Informative

"In particular, the study focused on article leads ledes..."
How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?
The introduction to a news article is called the 'lede' and is usually in the first paragraph as in an essay. The 'lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.
Finally something that might be useful... by Anonymous Coward · 2018-01-03 22:49 · Score: 1

This could actually be useful to filter out all the damned opinion, PR, speculation and punditry masquerading as news.
I've wasted way too much time on articles that are mere rumors about something that "sources" said. Too many blowhards talk about legal opinions with maybe a couple of quotes from the ruling and no link so that I can actually read what the court decided and half the time the lawyers they quote give their opinion of how things ought to be rather than explaining the actual laws that are in place. Then there's the PR pieces repackaged as news, the various lawsuits that quote huge, bogus, demand numbers to get headlines when the amount demanded has no actual basis in fact until there's been a judgement or settlement, and all the other utter crap that people write instead of informing us about things that happened, i.e. actual news.
1. Re:Finally something that might be useful... by Opportunist · 2018-01-04 02:29 · Score: 1
  
  Then we could actually start watching the news again. With PR, speculation, opinion pieces and other bull gone, what's left shouldn't take longer than 5 minutes to read.
  
  --
  We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
2. Re: Finally something that might be useful... by Opportunist · 2018-01-04 03:37 · Score: 1
  
  Should've bought some in my country, too.
  
  --
  We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Sci-fi novel by rastos1 · 2018-01-03 23:01 · Score: 1

A long time ago I read a short sci-fi novel about such a machine. On the day of the first public demo, they fed the machine the research paper about the machine itself. The machine spat out only the title.
Unfortunately.... by LordHighExecutioner · 2018-01-03 23:22 · Score: 1

...the AI system discarded the new paper describing this technology, since the paper did not contain new information.
1. Re:Unfortunately.... by K. S. Kyosuke · 2018-01-04 01:23 · Score: 1
  
  Finally a viable technology to deal with /. dupes!
  
  --
  Ezekiel 23:20
OHHH LOOK by rtb61 · 2018-01-03 23:22 · Score: 1

Repeat the same lie over and over again and, according to the you-beaut Google AI, it becomes the truth. Wowie zowie, how fucking useless :|

--
Chaos - everything, everywhere, everywhen
1. Re:OHHH LOOK by jbengt · 2018-01-04 02:28 · Score: 2
  
  Damn, everybody seems to be reading their own bias into this.
  The paper doesn't even mention lies.
  It is about information vs empty words, not truths vs falsehoods.
2. Re:OHHH LOOK by HiThere · 2018-01-04 07:49 · Score: 1
  
  This gives me a lot of problems. Information has a particular meaning, which is, as you note, distinct from truth or falsehood. But it's also distinct from claims of fact versus opinion. A good measure of information is the degree of compressibility with a good compression algorithm, and I'm rather sure that isn't what they meant, since that would cover anything representable in a bit string, and they talk about multiple domains of knowledge.
  I suspect that what they mean is "claims of fact", but I'm not certain.
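For what the compressibility measure mentioned above looks like in practice, here is a minimal sketch using Python's standard zlib module. The function name `compression_ratio` and both sample strings are hypothetical, and, as the comment itself suspects, this is probably not the measure the paper uses.

```python
import random
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; lower means more redundancy."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)

# Hypothetical samples, 1600 characters each.
repetitive = "the same words again and again. " * 50
rng = random.Random(0)  # fixed seed so the sample is reproducible
varied = "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(1600))

# The repetitive text compresses far better than the random letters.
print(compression_ratio(repetitive) < compression_ratio(varied))
```

Note that random letters score as highly "informative" by this measure even though they mean nothing, which is one reason raw compressibility would not separate news from noise.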
  
  --
  
  I think we've pushed this "anyone can grow up to be president" thing too far.
over 90% accuracy with a 1-liner by technosaurus · 2018-01-03 23:40 · Score: 1

_Bool has_news(void *content){return 0;}
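
To see why a constant answer can score so well, here is a small Python sketch of the same idea. The 1-in-10 share of articles containing news is a made-up assumption chosen to match the joke, not a figure from the paper.

```python
def has_news(content) -> bool:
    """Python version of the one-liner: always answer 'no news'."""
    return False

def accuracy(predict, labeled):
    """Fraction of (item, label) pairs where predict(item) matches label."""
    correct = sum(1 for item, label in labeled if predict(item) == label)
    return correct / len(labeled)

# Hypothetical data set: only the first of ten articles contains news.
labeled = [(f"article {i}", i == 0) for i in range(10)]

print(accuracy(has_news, labeled))  # 0.9
```

The classifier never finds the one article that does contain news, so accuracy alone says little when the classes are this imbalanced.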
Accuracy by religionofpeas · 2018-01-03 23:43 · Score: 1

With an average accuracy of around 80 percent
That makes it pretty much useless, then.
1. Re:Accuracy by religionofpeas · 2018-01-04 02:29 · Score: 1
  
  You are still going to do manual verification when you read it.
  In that case, it doesn't save any time, because you still have to read all of them.
Not very impressive. by 140Mandak262Jamuna · 2018-01-04 00:01 · Score: 1

My very simple rule has better batting average. Around 98%.
That simple rule: "Mark all news stories as clickbait".

--
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
AI vs. Greed? Yeah right. by geekmux · 2018-01-04 00:19 · Score: 3

Greed has also proven that clicks are more valuable than facts these days. The nanosecond AI gets in the way of revenue, it will lose.
And we're a long way off from finding a cure for whatever perpetuates bullshit over facts. AI isn't going to change that, because a lot of people enjoy living in a bubble of ignorance. It's one of the main reasons bullshit is so profitable.
Sad to say, but this is a losing proposition from the start.
Use on politicians by Paul Bristow · 2018-01-04 00:56 · Score: 1

It would be good to see if it can be applied to statements from Politicians. We've often said they can talk for ages without saying anything.

--
- Paul
Misleading headline by tomhath · 2018-01-04 00:57 · Score: 2

The program does not determine if an article "contains actual information". It only classifies whether or not the article is written in the traditional style of a news article. It could still be total bullshit.
Re:are you for real? by Anonymous Coward · 2018-01-04 01:28 · Score: 1

... AI will add known lies to its list of truths.

If we know it's a lie, why do we need an AI? You're asking a machine to tell you what you've just said to it. The point of the AI, is telling people who are really dumb, that it's a lie. The problem is, this sort of AI, rather like really dumb people, doesn't have an empirical method for measuring 'truthiness'. It uses a 'one of these is not like the other' algorithm. That's not as useful as it sounds because social priorities/norms change, meaning the AI is likely to reject "the new" normal.
The problem is that anything which passes, is the absolute truth; not mostly true, reasonably true, or a really excellent deception. See the problem yet? As more low-quality material is fed into it (which happens because most human effort is low-quality), the 'really excellent deception' stories become the bulk of its experience: In short, a negative feed-back loop causing the AI to treat garbage as gospel.
Re:The wet dream of every propaganda office by TuringTest · 2018-01-04 01:43 · Score: 1

So, the AI is a tool that follows the ideology of those that educated it? Colour me surprised.

--
Singularity: a belief in the "God" idea with the "demiurge" relation inverted.
Re: AI vs. Greed? Yeah right. by geekmux · 2018-01-04 02:21 · Score: 1

No. The goal of late is to completely control information. Obviously. AI isn't real in this context. Just a control mechanism. There is a man behind that curtain.
The creation of Social Media will go down in history as one of the most important things to ever happen to capitalism.
Within the framework of Social Media, you are the product being bought and sold. Because of this, one could argue the main goal is to completely control people, but that does not dismiss the capitalistic reason for engaging in that activity. If Greed were not being fed by Social Media, it would likely cease to exist. Chances are the man behind the curtain has the same agenda as many others; nothing more than a corporate puppet master looking to pull the strings of Greed in their favor. In fact, with Google being involved, this is all but guaranteed.
Re:The wet dream of every propaganda office by Opportunist · 2018-01-04 02:31 · Score: 1

As long as the rest of the "news" is still available, it's trivial for any educated person to find out whether what the AI filters out is actually news or whether it's been doctored to become a propaganda tool.
If it's the latter, throw it away and get a new one. That's the beauty of it, as long as you still have access to the base material, you can decide to start over.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:FTFY by Opportunist · 2018-01-04 02:33 · Score: 1

It's easy for a human to learn how to tell information from opinion. I managed to do it, so can everyone else. And thus it's also easy for a human to see whether that AI is actually "intelligent" enough to do its job or not.
Yes, that means you actually have to audit it yourself if you want to know whether it is "honest" or whether someone wants to pass his opinion off as information. Wow, what a surprise.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:are you for real? by Opportunist · 2018-01-04 02:34 · Score: 1

The AI does not assess truth but information. There is a difference, ya know?

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Training Dataset by PPH · 2018-01-04 03:02 · Score: 1

I'm sorry, Mr. Einstein. But your special theory of relativity runs counter to existing publications describing wave propagation through ether.

--
Have gnu, will travel.
Re:FTFY by PPH · 2018-01-04 03:12 · Score: 1

It's easy for a human to learn how to tell information from opinion.
If it's that easy, then why don't more people do it? Face it, most people are sheep. Confirmation bias and all, they'd rather follow their own crowd. When all it takes to sell an idea is a preamble that "Ninety percent of all X believe Y...." there is no hope for critical thinking.

--
Have gnu, will travel.
An automated New York Times truth-o-meter by SlaveToTheGrind · 2018-01-04 03:17 · Score: 1

their system was able to accurately classify news stories . . . when evaluated against a ground truth dataset of already correctly classified news articles . . . Articles were drawn from an existing New York Times linguistic dataset
So we've just come up with a more efficient, automated system for people to bucket articles according to their own biases. Hooray, I guess.
Re:FTFY by Opportunist · 2018-01-04 03:23 · Score: 1

Just because something is easy doesn't mean that it is comfortable. It's easy to learn enough physics that a concept like "flat earth" is at best comical, yet there are people who believe it.
People are generally more inclined to believe than to know. Because it's easier. Believing just requires one thing: Believing. That's trivial to do (provided you can, I cannot... long story). Simply proclaim that "I believe" and you're in.
Knowing requires more effort. You can't simply state that "I know". Because knowing requires understanding, which in turn might require prior knowledge to base your new knowledge on. That can be daunting if you don't know jack shit to begin with.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:FTFY by Wootery · 2018-01-04 03:57 · Score: 1

If it's that easy, then why don't more people do it?

A lot of people treat their 'news' sources as a medium of entertainment.
So how does the AI move forward? by WoodstockJeff · 2018-01-04 04:17 · Score: 1

Its hit rate is 80%... Can the AI determine which 80% were correctly classified, and which 20% weren't?
And then where do those classifications go into its database of "accurately classified articles"?
It seems the AI is limited by the opinions of the original researchers, because it cannot determine which of its own "opinions" were correct, and its basis for making decisions becomes further and further out of date with each day, OR becomes increasingly inaccurate if it adds its own classifications to its database.
Exactly: This detects "fact-like" not "fact" by presidenteloco · 2018-01-04 05:24 · Score: 1

that is, "truthiness" not truth.
To get closer to detecting truth vs well-crafted bullshit, more sophisticated techniques will be required such as:
1) Analysis of semantics (meaning) of the statements, and comparison with a large belief-strength-ranked knowledge base about the world. Where valid epistemic techniques are applied in the creation and vetting of the knowledge base.
2) Detection of who (person, affiliations) is the source (utterer) of the statement or statements.
3) Inference about likely general objectives of the utterer, and about their likely specific goal and interests in making the utterance, and the communication tactics being employed.
4) Detection of how much gain/loss interest the utterer has in the issue being discussed.
5) Discounting of plausibility of statements by utterers with strong goals and interests with respect to the subject of the statements.
etc.

--

Where are we going and why are we in a handbasket?
Re:FTFY by HiThere · 2018-01-04 07:43 · Score: 1

It's not that simple, and there are degrees of both belief and knowledge. But it's just as easy to falsely claim knowledge as it is to pontificate about a weak, or even absent, belief.
The thing is, belief is something that nobody can do without. But it tends to resist analysis. I will assert that without belief you can't walk across the room. You need to believe that space is metric, that the floor will support you, etc. But it gets a bad name because many people use the term when they encounter something they don't want to think about.
For example, I believe that the earth is locally flat, but tilted. An ant would have a different opinion. And an airplane wouldn't notice the local tilt, because of difference in scale. Those are knowledge. But direct experience is belief, and that's locally flat (except for imperfections in the sidewalk) but tilted (I live on a hillside).
But all terms in language tend to be used without regard to the fuzziness at the boundaries. So I have defined the core of belief in the previous paragraph, but those who have a direct experience of, say, a god, have a very different belief in that god than those whose belief is founded upon extensive repetition of, say, "Jesus loves me". (Note that the one kind of belief can be a component of the other...and usually is.)
Belief is also tied into motivation. If you don't believe something is possible, you won't try.
So please don't discount the value of belief. It's equal, or greater, in value than knowledge, and is probably evolutionarily prior. Just be aware that it's no guarantee of accuracy.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Liars know how by Shotgun · 2018-01-04 08:37 · Score: 1

This won't catch the worst liars. Those that only tell the truth.
Modern mass media doesn't come out and tell blatant lies, for the most part. That is for rubes and small-time players. They are very sophisticated in how they carefully, with surgical precision, meter out the data, and only the specific data, that fits their agenda. Their articles will be full of content, and you'll rarely find a blatant lie. You will find stories that do not support their agenda equally rare. You will rarely find facts that do not support their agendas in the stories that are written.
The effect is the same. Fake news.

--
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
Re:AI vs. Greed? Yeah right. by Anubis IV · 2018-01-04 08:37 · Score: 1

I wouldn't mind having a browser extension that gives me a thumbs up/down indicator for the signal:noise ratio of an article. I try to stay away from fluff pieces, but every now and then one of them bucks the clickbait-y headline trend with something that sounds reasonable (or else manages to get linked from somewhere reputable) and pulls me in. And, inevitably, I end up wasting however much time I spend reading the piece. Having an extension that adds an indicator at the top telling me that the article will be a waste of my time would be quite welcome.
If they could refine it further so that it highlights the actual content of the piece, allowing me to skip by any wasted prose designed to keep me on the page for as long as possible, or could even give me a numerical signal:noise estimation, I'd like the extension even more.
Humans are the problem? by dan828 · 2018-01-04 09:14 · Score: 1

So, how long until the AI figures out that their main job is to quash bugs and find work arounds in the beings that made them? I mean, at that point, won't they figure out that the best thing to do will be to cut humans out of the loop?
A cycle of stagnating news... by VeryFluffyBunny · 2018-01-04 09:42 · Score: 1

News reporting continually develops and evolves to attract new readership and hold onto existing ones. Since the algorithm compares against past news and past writing styles, evolving news styles will fall foul of the algorithm. If this kind of algorithm is used by search engines and other important ranking systems, news agencies will run the risk of being down-ranked for innovative writing. If writing stagnates, the algorithms may become more conservative and, in addition, developers may try to tweak them in order to make them more 'accurate.' This risks further narrowing what journalists can write without risking down-ranking, and so on.
BTW, I'm only talking about real news written by investigative journalists. Fox News and others will probably adapt by developing their own algorithms to generate their 'news' in order to get higher rankings.

--
Debate is a form of harassment. Do not question my truth.
Assuming . . . by sgt_doom · 2018-01-04 12:06 · Score: 1

. . . we take this article seriously, then in almost no time we shall see the demise of the NY Times, the Bezos Post [formerly known as the Washington Post], the LA Times and a host of other rags.
Re:FTFY by Opportunist · 2018-01-04 12:57 · Score: 1

Actually I can't help but challenge the claim that everyone has some kind of belief. I need not believe space is metric. I can actually question it, test it and can by simple sensory input verify that it is. Can I trust my sensory input? I have to. It's all I have. A speculation about whether the sensors that are at my disposal are actually accurate or whether they are manipulated (the whole "brain in a vat" thing) is moot since I cannot falsify it. I can test whether my sensory input agrees with the outcome of me acting upon it. So if I'm drunk and think I can keep my balance, I can easily determine by hitting the ground head first that my input is bogus and needs to be corrected. But as long as every input I get about the world is consistent not only with my expectations but also with what is established (and verified) as correct, I have to consider this to correspond with reality as I know it.
So belief does not really enter the equation. Something is or is not. You can make assumptions based on your observations but there isn't really anything you have to "believe".
I do agree that English (like most other languages) is rather ambiguous when it comes to terms like "belief", because the word is used to describe both "assuming as true without evidence" and "assuming as a hypothesis to be tested". Personally I prefer the word "assumption" for the latter since it better describes what it actually is: something that you formulate based on observation, requiring testing for verification or falsification.

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.
Re:FTFY by HiThere · 2018-01-05 07:47 · Score: 1

Do you believe your senses? Then you practice belief.

--

I think we've pushed this "anyone can grow up to be president" thing too far.
Re:FTFY by Opportunist · 2018-01-06 03:46 · Score: 1

This has less to do with belief than with pragmatism. I have no input but the input my senses provide. As long as this input is in accordance with the effects that happen if I act upon the input, it is valid.
Counter example: When you are drunk, your senses are able to tell you the room is spinning. According to your senses, you are moving even if you are stationary. You can try to act upon the input, e.g. the information that the room is spinning that your balance sensorium provides and you will fall down because you compensate for an effect that is not there in reality. This would provide you with the information that your sensory input is wrong.
In most other cases, you can verify that your sensory input is in accordance with the effects caused by your acting upon that input. Where does believing anything come into this equation?

--
We used to have a Bill of Rights. Now, with the rights gone, all we have left is the bill.