Slashdot Mirror


AI System Sorts News Articles By Whether Or Not They Contain Actual Information (vice.com)

In a new paper published in the Journal of Artificial Intelligence Research, computer scientists from Google and the University of Pennsylvania describe a new machine learning approach to classifying written journalism according to a formalized idea of "content density." "With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles," reports Motherboard. From the report: At a high level this works like most any other machine learning system. Start with a big batch of data -- news articles, in this case -- and then give each item an annotation saying whether or not that item falls within a particular category. In particular, the study focused on article leads, the first paragraph or two in a story traditionally intended to summarize its contents and engage the reader. Articles were drawn from an existing New York Times linguistic dataset consisting of original articles combined with metadata and short informative summaries written by researchers.
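The annotate-then-train loop described in the summary can be sketched in a few lines. This is only an illustration, not the method from the paper: it assumes a plain bag-of-words Naive Bayes classifier with add-one smoothing, and the training leads, the "dense"/"sparse" label names, and the helper functions (tokenize, train, classify, accuracy) are all invented for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Invented toy data: (lead text, hand-assigned label). Not from the paper's dataset.
TRAIN = [
    ("The central bank raised interest rates by a quarter point on Tuesday", "dense"),
    ("The senate passed the budget bill by a vote of 52 to 48 on Friday", "dense"),
    ("The company reported quarterly revenue of 3 billion dollars on Monday", "dense"),
    ("It was the kind of morning that makes you wonder about everything", "sparse"),
    ("You never really know what a city is like until you wander it", "sparse"),
    ("Everyone remembers the feeling of a long summer that never ends", "sparse"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Log prior: fraction of training documents carrying this label.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(examples, model):
    """Fraction of labeled examples that the model classifies correctly."""
    correct = sum(classify(text, model) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)
print(classify("The bank raised rates on Tuesday", model))
```

An accuracy figure like the one quoted in the summary would come from calling something like accuracy() on labeled articles the model has not seen during training. Scoring the training set itself, as this toy data would allow, says little about how well a classifier generalizes.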

2 of 80 comments

  1. Ledes dammit by Anonymous Coward · · Score: 4, Informative

    "In particular, the study focused on article leads ledes..."

    How can we take this article seriously if the publication doesn't know the correct spelling of their own industry's terminology?

    The introduction to a news article is called the 'lede' and is usually in the first paragraph as in an essay. The 'lede' is a deliberate misspelling of 'lead' to prevent confusion in the days when printing was done with lead type.

  2. Re:Daily Mail is fucked by AmiMoJo · · Score: 4, Informative

    The Daily Mail is 97% opinion, but does usually include the facts at the very end of the article. The trick they use is to split the article over two pages, or make it long enough that people don't get to the end.

    A classic example was a story about the EU banning companies from claiming that bottled water cured dehydration. They had endless quotes from outraged morons ranting about the terrible EU and its idiocy. Then right at the end there was someone sane explaining that dehydration is a medical condition with a variety of causes, many of which cannot be cured by drinking water, and that the blanket rule on making unsubstantiated or misleading medical claims in advertising stands.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC