AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com)
In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
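For anyone who wants the "annotate a batch, then train" idea from the summary in concrete form, here is a minimal sketch. Big caveats: the summary does not say which features or model the researchers used, so this swaps in a plain bag-of-words naive Bayes classifier, and the "dense"/"sparse" labels, the toy leads, and the function names are all made up for illustration.

```python
import math
from collections import Counter

PUNCT = ".,!?\"'()"

def tokenize(text):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def train(examples):
    """examples: list of (lead_text, label) pairs. Returns a model dict."""
    word_counts = {}        # label -> Counter of word occurrences
    doc_counts = Counter()  # label -> number of annotated leads
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for w in tokenize(text):
            counts[w] += 1
            vocab.add(w)
    return {"words": word_counts, "docs": doc_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest naive Bayes log score."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    tokens = tokenize(text)
    best_label, best_score = None, -math.inf
    for label, n_docs in model["docs"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # class prior
        for w in tokens:
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            score += math.log((counts[w] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated leads, invented for this sketch.
TRAIN = [
    ("The city council approved a 4 percent budget increase on Tuesday.", "dense"),
    ("The company reported quarterly revenue of 2 billion dollars.", "dense"),
    ("You will never guess what happened next.", "sparse"),
    ("It was a morning like any other in the quiet town.", "sparse"),
]

model = train(TRAIN)
print(predict(model, "The council approved the budget on Tuesday."))
print(predict(model, "You will never guess what happened."))
```

Four examples obviously prove nothing; the point is only the shape of the pipeline: annotated leads go in, a model that scores new leads comes out.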
Someone should be getting worried just about now.
If you'll be my bodyguard
I can be your long lost pal
I can call you Betty
And Betty, when you call me
You can call me Al
Daily Mail is fucked then.
"In particular, the study focused on article leads ledes..."
How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?
The introduction to a news article is called the 'lede' and is usually in the first paragraph as in an essay. The 'lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.
This could actually be useful to filter out all the damned opinion, PR, speculation and punditry masquerading as news.
I've wasted way too much time on articles that are mere rumors about something that "sources" said. Too many blowhards talk about legal opinions with maybe a couple of quotes from the ruling and no link, so I can't actually read what the court decided, and half the time the lawyers they quote give their opinion of how things ought to be rather than explaining the actual laws that are in place. Then there are the PR pieces repackaged as news, the various lawsuits that quote huge, bogus demand numbers to get headlines when the amount demanded has no actual basis in fact until there's been a judgement or settlement, and all the other utter crap that people write instead of informing us about things that happened, i.e. actual news.
when evaluated against a ground truth dataset of already correctly classified news articles
We'll tell you what's the truth based on what we've classified as the truth.
A long time ago I read a short sci-fi story about such a machine. On the day of the first public demo, they fed the machine the research paper about the machine itself. The machine spat out only the title.
"AI" System Sorts "News" Articles By Whether Or Not They Contain "Actual" Information
(also, how do you decide that a subjective judgment is "80% accurate"?)
How much is what companies are calling AI these days, just people in India being paid cents per hour manually processing requests?
...the AI system discarded the new paper describing this technology, since the paper did not contain new information.
Repeat the same lie over and over again and, according to the you-beaut Google AI, it becomes the truth. Wowie zowie, how fucking useless :|
Chaos - everything, everywhere, everywhen
The death of FOXNews.
_Bool has_news(void *content){return 0;}
With an average accuracy of around 80 percent
That makes it pretty much useless, then.
Did you not understand? Or are you a poorly functioning AI? Let's all hope with a bit more data you will improve.
Repeat the same lie over and over and the AI will just tell you it's a lie, over and over. Why would you think the AI will add known lies to its list of truths?
That simple rule: "Mark all news stories as clickbait".
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
Tonight!
Greed has also proven that clicks are more valuable than facts these days. The nanosecond AI gets in the way of revenue, it will lose.
And we're a long way off from finding a cure for whatever perpetuates bullshit over facts. AI isn't going to change that, because a lot of people enjoy living in a bubble of ignorance. It's one of the main reasons bullshit is so profitable.
Sad to say, but this is a losing proposition from the start.
That is all...
It would be good to see if it can be applied to statements from Politicians. We've often said they can talk for ages without saying anything.
- Paul
The program does not determine if an article "contains actual information". It only classifies whether or not the article is written in the traditional style of a news article. It could still be total bullshit.
No. The goal of late is to completely control information. Obviously. AI isn't real in this context. Just a control mechanism. There is a man behind that curtain.
The creation of Social Media will go down in history as one of the most important things to ever happen to capitalism.
Within the framework of Social Media, you are the product being bought and sold. Because of this, one could argue the main goal is to completely control people, but that does not dismiss the capitalistic reason for engaging in that activity. If Greed were not being fed by Social Media, it would likely cease to exist. Chances are the man behind the curtain has the same agenda as many others; nothing more than a corporate puppet master looking to pull the strings of Greed in their favor. In fact, with Google being involved, this is all but guaranteed.
I'm sorry, Mr. Einstein. But your special theory of relativity runs counter to existing publications describing wave propagation through ether.
Have gnu, will travel.
their system was able to accurately classify news stories . . . when evaluated against a ground truth dataset of already correctly classified news articles . . . Articles were drawn from an existing New York Times linguistic dataset
So we've just come up with a more efficient, automated system for people to bucket articles according to their own biases. Hooray, I guess.
The AI book that everyone should get is available for pre-order. "Artificial Intelligence For Dummies" by John Paul Mueller and Luca Massaron.
It is sorting by criteria it has been fed and programmed with. It has no way to make distinctions of its own; that is not the same thing at all, and it is not reliable. False or incorrect information is just as 'actual' as any other kind. It's essentially just an automated search engine using keywords. Fail. :/
Let's hope Trump reins in the anticompetitiveness of Google.
Its hit rate is 80%... Can the AI determine which 80% were correctly classified, and which 20% weren't?
And then where do those classifications go into its database of "accurately classified articles"?
It seems the AI is limited by the opinions of the original researchers, because it cannot determine which of its own "opinions" were correct, and its basis for making decisions becomes further and further out of date with each day, OR becomes increasingly inaccurate if it adds its own classifications to its database.
I could get better than 80% by just returning "no" for all articles.
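Whether "always say no" really beats 80% depends entirely on how lopsided the labels are, and the summary doesn't give the class balance. A quick sketch with made-up label mixes (both datasets below are hypothetical, as are the function names):

```python
def accuracy(predictions, labels):
    # Fraction of predictions that match the annotated label.
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def always_no(articles):
    # The "classifier" from the comment: ignore the input, answer "no".
    return ["no"] * len(articles)

# Hypothetical label mixes; the real balance is not stated in the summary.
balanced = ["yes"] * 50 + ["no"] * 50
skewed = ["yes"] * 10 + ["no"] * 90

print(accuracy(always_no(balanced), balanced))  # 0.5
print(accuracy(always_no(skewed), skewed))      # 0.9
```

So the constant answer only wins if more than 80% of the articles are labeled "no" to begin with. On a roughly balanced set, 80% is well above this baseline.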
Any system like this can and will be gamed, and is therefore at best an arms race. This is essentially a repeat of spam detection systems based on learning, except it's easier to detect spam.
NYT? I can hear the cries of "fake news" now.
Wonder how Fox News feels about being left out?
that is, "truthiness" not truth.
To get closer to detecting truth vs well-crafted bullshit, more sophisticated techniques will be required such as:
1) Analysis of semantics (meaning) of the statements, and comparison with a large belief-strength-ranked knowledge base about the world, where valid epistemic techniques are applied in the creation and vetting of the knowledge base.
2) Detection of who (person, affiliations) is the source (utterer) of the statement or statements.
3) Inference about likely general objectives of the utterer, and about their likely specific goal and interests in making the utterance, and the communication tactics being employed.
4) Detection of how much gain/loss interest the utterer has in the issue being discussed.
5) Discounting of plausibility of statements by utterers with strong goals and interests with respect to the subject of the statements.
etc.
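As a toy illustration of points 4 and 5 only, here is one way the discounting could be sketched. Everything in it is an assumption made up for the example: the function name, the linear discount, the 0-to-1 scales, and the `max_discount` cap. Estimating the inputs (points 1 through 3) is the actual hard part and is not attempted here.

```python
def discounted_plausibility(base_plausibility, stake, max_discount=0.5):
    """Scale down a statement's plausibility by the utterer's stake.

    base_plausibility: prior plausibility of the statement, in [0, 1]
    stake: how much the utterer stands to gain or lose, in [0, 1]
    max_discount: hypothetical cap on the discount applied at stake == 1
    """
    for name, value in (("base_plausibility", base_plausibility),
                        ("stake", stake),
                        ("max_discount", max_discount)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Linear discount: no stake leaves the score alone, full stake
    # removes max_discount of it.
    return base_plausibility * (1.0 - max_discount * stake)

print(discounted_plausibility(0.8, 0.0))  # disinterested source
print(discounted_plausibility(0.8, 1.0))  # source with everything at stake
```

A linear discount is the simplest possible choice, not a recommendation; a real system would presumably need something calibrated against data.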
Where are we going and why are we in a handbasket?
"... Articles were drawn from an existing New York Times linguistic dataset..."
Won't that just be garbage in, garbage out?
This would have correctly identified articles claiming there were weapons of mass destruction?
This won't catch the worst liars. Those that only tell the truth.
Modern mass media doesn't come out and tell blatant lies, for the most part. That is for rubes and small-time players. They are very sophisticated in how they carefully, with surgical precision, mete out the data, and only the specific data, that fits their agenda. Their articles will be full of content, and you'll rarely find a blatant lie. Stories that do not support their agenda are equally rare. You will rarely find facts that do not support their agendas in the stories that are written.
The effect is the same. Fake news.
Aah, change is good. -- Rafiki
Yeah, but it ain't easy. -- Simba
I wouldn't mind having a browser extension that gives me a thumbs up/down indicator for the signal:noise ratio of an article. I try to stay away from fluff pieces, but every now and then one of them bucks the clickbait-y headline trend with something that sounds reasonable (or else manages to get linked from somewhere reputable) and pulls me in. And, inevitably, I end up wasting however much time I spend reading the piece. Having an extension that adds an indicator at the top telling me that the article will be a waste of my time would be quite welcome.
If they could refine it further so that it highlights the actual content of the piece, allowing me to skip by any wasted prose designed to keep me on the page for as long as possible, or could even give me a numerical signal:noise estimation, I'd like the extension even more.
So, how long until the AI figures out that their main job is to quash bugs and find workarounds in the beings that made them? I mean, at that point, won't they figure out that the best thing to do will be to cut humans out of the loop?
News reporting continually develops and evolves to attract new readers and hold onto existing ones. Since the algorithm compares against past news and past writing styles, evolving news styles will fall foul of the algorithm. If this kind of algorithm is used by search engines and other important ranking systems, news agencies will run the risk of being down-ranked for innovative writing. If writing stagnates, the algorithms may become more conservative and, in addition, developers may try to tweak them in order to make them more 'accurate.' This risks further narrowing what journalists can write without risking down-ranking, and so on.
BTW, I'm only talking about real news written by investigative journalists. Fox News and others will probably adapt by developing their own algorithms to generate their 'news' in order to get higher rankings.
Debate is a form of harassment. Do not question my truth.
. . . we take this article seriously, then in almost no time we shall see the demise of the NY Times, the Bezos Post [formerly known as the Washington Post], the LA Times and a host of other rags.