Slashdot Mirror


Computers Summarize the News

oily_ants writes "I get sick and tired of reading the same story on different web sites. That's why I like Slashdot so much. Good (??) summaries of all of the stuff out there on the net. Now there is a project at Columbia University by the nlp group that attempts to generate computer summaries of all of those news articles on different web sites. The project is called Newsblaster and the summaries are excellent. You can read about the project on regular news sites like Online Journalism Review or USA Today."

11 of 175 comments

  1. Well, there you go by Otter · · Score: 3, Interesting
    Now there is a project at Columbia University by the nlp group that attempts to generate computer summaries of all of those news articles on different web sites.

    Well, there's the answer to the Ask Slashdot from a couple of days ago.

  2. We've been doing that for ages. by almaw · · Score: 2, Interesting

    Our company has been running a similar service for a very long time. It's free, and you can get it here. It's called NewsScape.

  3. I think Newshub is better by brunnock · · Score: 3, Interesting

    http://newshub.com/

  4. Copyright Infringement - Fair Use Doctrine -NOT! by Anonymous Coward · · Score: 3, Interesting

    This is usually Copyright Infringement - Fair Use Doctrine is not applicable.

    Every one of these paraphrasers lifts large chunks of syntax.

    I would maintain that this is still plagiarism or a copyright violation unless it is done really well.

    And it never will be done really well unless NeuralNetwork chips are common and mankind has made advances in Artificial Intelligence research. Five years away at best.

    I dare the commercial services to hit Encyclopedia Britannica. Or I dare them to routinely slurp the New York Times and boast that they digest the New York Times.

    A massive civil suit is awaiting some of these early adopters planning on creating a business out of this.

    And they deserve it.

    It is just "Word Twiddling", however useful.

    If the twiddling is done live, once, per user client, then maybe it's OK, but none of these business models are set up THAT way.

  5. copyright/legality? by ddeboer · · Score: 5, Interesting

    What are the copyright or other legal issues with republishing news stories collected from web sites? The Newsblaster site clearly states where the information comes from - like every good college student is taught to cite information sources. On the other hand, on the bottom of many of the stories is the notice: "Copyright 2002 Associated Press. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed." Is collecting and condensing news stories "republishing" - does this violate copyright?

  6. Re:Also try... by FinnishFlash · · Score: 2, Interesting

    Yesterday?

    Okay, the timezone is different (I'm in Europe). But two weeks...

    There you can also easily see how differently different news agencies report the same story. For example, a few days ago there was this story about depleted uranium and it being dangerous:

    News agency #1: New study finds connection between depleted uranium and heightened risk of cancer.

    News agency #2: Soldiers exposed to depleted uranium bullets at risk of cancer.

    News agency #3: Children in Yugoslavia might get cancer because of NATO's depleted uranium bullets.

    Talk about reporting the facts objectively...

    It would have been interesting to see the summary generated from these three... Might have been a bit schizophrenic...

    --
    please proff read !
  7. Re:Also try... by elfkicker · · Score: 4, Interesting

    I've been using that for a couple of months now. (It has been available buried at http://www.google.com/news/newsheadlines.html) I find it very useful. I wish they'd explain exactly how it worked though. Is it all machine parsed? Are the articles listed in order of relevance, time posted, etc?

    I've seen a couple of occasions where it's had an article on a completely different subject under the header, but it's not the norm. It's always up to date. My only gripe is that it doesn't have an "Older Stories" link. I've gone back to try and find something I've seen before only to find that it had been pushed off.

    They also keep a list of links to news sources and current relevant resources at http://www.google.com/news/.

  8. How does it work? by AndyChrist · · Score: 3, Interesting

    I don't see any concrete information on what it does to summarize stories...is it using something like Cyc? Does it just have some heuristics for picking out the important parts of paragraphs?
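
    For a sense of what "heuristics for picking out the important parts" could look like, here is a minimal sketch of frequency-based sentence extraction. This is not how Newsblaster works (the thread gives no details on that); the function names, the tiny stopword list, and the scoring rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "are",
             "was", "on", "for", "that", "it"}

def split_sentences(text):
    # Naive split: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    # Lowercase words with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word frequency over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average frequency of the sentence's words; 0.0 for empty sentences.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

    Calling summarize(article_text, 2) returns at most two sentences. Sentences that share vocabulary with the rest of the text score higher, which is crude but shows the general idea.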

    Also, who else thought "neuro-linguistic programming" for at least a moment when they saw "nlp"?

  9. A Few Kinks, A few Comments by Irvu · · Score: 3, Interesting

    As the summary here shows there are still a few kinks in the system.

    While I have to agree with some people that this isn't in-depth reporting, I do think that it is pretty interesting AI. When it comes down to it, the problem is not that a computer might be summarizing our news. The problem is twofold.

    Firstly, people are not always inclined to look beyond summaries. When faced with typical time constraints people prefer to look at summaries because they do not have time to search across a dozen sources and articles. This is why USA Today became big in the first place. Nothing there is more than 1 column long. (Incidentally, did anybody else find it hilarious that this system "summarizes" USA Today, who themselves summarize other news sources?)

    Secondly, much of the news is the same. News is big business and most major news media tell the stories that sell. Because they are all targeting the same markets they tell the same stories and in the same ways. Therefore there is little difference between CNN, the NY Times, etc. in terms of tone and "facts", especially since much of "their news" comes from the same wire services such as Reuters. Fox News is different but that is because they have abandoned the mantle of impartiality and become all conservative all the time.

    In essence this system is perfect for the internet news style. Brief summaries of facts followed by more "in-depth" leads that we may peruse as we wish. The real question is, when will this begin drawing on sites like Indymedia, The Register and /.

  10. Tried the new google news search service yet? by skunkeh · · Score: 2, Interesting
    It's still in beta but it's already pretty impressive:

    http://news.google.com/

    It indexes a huge array of news sites several times a day for fresh stories - enter a search term and it will bring up all the headlines it can find for that subject. Best of all, it uses an algorithm to identify alternative coverage of any one story and lists these links in a block beneath the main search results. That way you get links to several different accounts of the same story (although in practise they end up being pretty similar due to using the same news agencies) without having to hunt around for them yourself.

    They're still working on the algorithm and are requesting as much feedback as possible - read more here.
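
    Google has not said how the "alternative coverage" grouping works, so the following is only a sketch of one simple way to group headlines about the same story: greedy clustering on word overlap (Jaccard similarity). The function names and the 0.3 threshold are assumptions picked for illustration.

```python
import re

def word_set(headline):
    # Lowercased set of alphanumeric tokens in the headline.
    return set(re.findall(r"[a-z0-9]+", headline.lower()))

def jaccard(a, b):
    # Size of the intersection over size of the union; 0.0 if both are empty.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def group_headlines(headlines, threshold=0.3):
    """Greedily place each headline into the first group whose first
    member is similar enough; otherwise start a new group."""
    groups = []
    for headline in headlines:
        words = word_set(headline)
        for group in groups:
            if jaccard(words, word_set(group[0])) >= threshold:
                group.append(headline)
                break
        else:
            # No existing group matched.
            groups.append([headline])
    return groups
```

    Each returned group can then be shown as one main link plus a block of related links. Comparing only against the first member of a group keeps the sketch short; it also means results depend on the order in which headlines arrive.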

  11. Now it just needs SOAP by jfsather · · Score: 3, Interesting

    I think this would be pretty cool if they could add some sort of a SOAP/XML-RPC type interface where you could query on sections, stories, whatever. It would be nice to allow content syncing like this.
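
    As a rough sketch of what such an interface might look like, here is a tiny XML-RPC server built on Python's standard xmlrpc.server module. Newsblaster offers nothing like this as far as the thread shows; the method names (list_sections, get_summaries), the sample data, and the example.com URLs are all hypothetical.

```python
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical in-memory data standing in for generated summaries.
SUMMARIES = {
    "world": [
        {"title": "Example story",
         "summary": "One-paragraph summary built from several articles.",
         "sources": ["http://example.com/a", "http://example.com/b"]},
    ],
    "sci_tech": [],
}

def list_sections():
    # Names of the sections a client can query, in sorted order.
    return sorted(SUMMARIES)

def get_summaries(section):
    # Summaries for one section; unknown sections yield an empty list.
    return SUMMARIES.get(section, [])

def build_server(host="127.0.0.1", port=8000):
    # Create the server and expose both functions under their own names.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(list_sections)
    server.register_function(get_summaries)
    return server
```

    Running build_server().serve_forever() starts the service; a client could then call xmlrpc.client.ServerProxy("http://127.0.0.1:8000").get_summaries("world") to pull the summaries for a section.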

    I was writing about this in response to a post in a user's journal the other day: even better would be to make a story content P2P system where you could allow story distribution. You might place a limit and only allow the summary to drive people to your site, but it could still help with bandwidth issues. This would basically be like an enhanced RDF/RSS type system, but over a P2P type network you wouldn't even really have to host your own feeds for people. Add in some sort of DB persistence and you could just say "get new headlines and summaries from site x"--the system would bring in all the new content. Anyway, that is just a dream I have and probably will never happen the way some people feel about their content.
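
    The "get new headlines and summaries from site x" part, minus the P2P layer, can be sketched with the standard library alone. This assumes the feed is plain RSS 2.0 with un-namespaced item, title, link and description elements (RDF-based RSS 1.0 uses namespaces and would need different lookups). The table layout and the function names are made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def open_store(path=":memory:"):
    # Persistence layer: remembers every item link already seen.
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS seen "
               "(link TEXT PRIMARY KEY, title TEXT, summary TEXT)")
    return db

def new_items(db, rss_xml):
    """Parse an RSS 2.0 document (as a string) and return only the
    items whose link has not been stored before."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        title = item.findtext("title", default="").strip()
        summary = item.findtext("description", default="").strip()
        if not link:
            continue  # nothing to key the item on
        known = db.execute("SELECT 1 FROM seen WHERE link = ?",
                           (link,)).fetchone()
        if known is None:
            db.execute("INSERT INTO seen (link, title, summary) "
                       "VALUES (?, ?, ?)", (link, title, summary))
            fresh.append((title, summary, link))
    db.commit()
    return fresh
```

    Fetching the feed text (over HTTP or from peers) is left out on purpose; whatever delivers the XML can hand it to new_items, and passing a file path to open_store keeps the "seen" list across runs.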