Slashdot Mirror


Computers Summarize the News

oily_ants writes "I get sick and tired of reading the same story on different web sites. That's why I like slashdot so much. Good (??) summaries of all of the stuff out there on the net. Now there is a project at Columbia University by the nlp group that attempts to generate computer summaries of all of those news articles on different web sites. The project is called Newsblaster and the summaries are excellent. You can read about the project on regular news sites like Online Journalism Review or USA Today."

26 of 175 comments

  1. Also try... by SlashChick · · Score: 5, Informative

    news.google.com. Just released yesterday. I haven't yet played around with it enough to say whether it's cool or not, but it does look promising.

    1. Re:Also try... by elfkicker · · Score: 4, Interesting

      I've been using that for a couple of months now. (It has been available buried at http://www.google.com/news/newsheadlines.html) I find it very useful. I wish they'd explain exactly how it worked though. Is it all machine parsed? Are the articles listed in order of relevance, time posted, etc?

      I've seen a couple of occasions where it's had an article on a completely different subject under the header, but it's not the norm. It's always up to date. My only gripe is that it doesn't have an "Older Stories" link. I've gone back to try to find something I'd seen before, only to find that it had been pushed off.

      They also keep a list of links to news sources and current relevant resources at http://www.google.com/news/.

  2. Say what? by turbine216 · · Score: 5, Funny

    I get sick and tired of reading the same story on different web sites. That's why I like slashdot so much.

    I'm sure most will agree with me when I say that this makes ABSOLUTELY NO SENSE.

    1. Re:Say what? by elmegil · · Score: 4, Funny

      Hey, maybe they like reading the same story over and over on the same web site.

      --
      7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
  3. Well, there you go by Otter · · Score: 3, Interesting

    Now there is a project at Columbia University by the nlp group that attempts to generate computer summaries of all of those news articles on different web sites.

    Well, there's the answer to the Ask Slashdot from a couple of days ago.

  4. Now to ask... by Sorthum · · Score: 3, Insightful

    ...whether this will include the obscure stories that are actually interesting, or whether it'll be just a rehash of the major stories that we can find in ten or twelve other places.

    1. Re:Now to ask... by oily_ants · · Score: 3, Informative

      It's just a rehash of all of those other stories. But the nice part about it is that it's in Reader's Digest condensed form. I only have to read one small paragraph to get the major points of the event instead of sifting through a long article that doesn't include much actual information. It is meant as a summary, so the information is NOT the obscure stuff (which is interesting) but quick and dirty summaries of important events.

  5. Newsbot by TheGreenLantern · · Score: 4, Funny

    Sounds like a good idea, but I'm worried about the "Newsbot's" objectivity. If I wanted to read a bunch of stories about the latest NVidia GeForce 4 release, 10 reasons more RAM is better, and why you should upgrade your hard drive, I'd just watch TechTV.

    --

    It hurts when I pee.
  6. I think Newshub is better by brunnock · · Score: 3, Interesting

    http://newshub.com/

  7. Impressive by Reality Master 101 · · Score: 5, Informative

    To tell you the truth, at first I thought the summaries were TOO good; I was suspicious that it wasn't really automated.

    But after looking at a few more stories, it looks like it just pulls sentences out of the stories that seem to have a different point to make, and strings them together.

    Sometimes you see some redundancy and some non-sequiturs, but I have to admit the illusion is pretty good.
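
    The guess above (pull out sentences that each make a different point and string them together) can be sketched roughly as follows. This is only an illustration of that guess, not Newsblaster's actual code: the function names, the word-overlap measure, and the 0.5 threshold are all assumptions made up for the example.

```python
import re

def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_set(sentence):
    # Lowercased set of words, ignoring punctuation.
    return set(re.findall(r"[a-z']+", sentence.lower()))

def jaccard(a, b):
    # Word overlap between two sets: 0.0 = nothing shared, 1.0 = identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def pick_distinct_sentences(stories, max_sentences=3, max_overlap=0.5):
    # Greedily keep each sentence that overlaps little with those already kept.
    chosen = []
    for story in stories:
        for sentence in split_sentences(story):
            words = word_set(sentence)
            if all(jaccard(words, word_set(c)) < max_overlap for c in chosen):
                chosen.append(sentence)
            if len(chosen) == max_sentences:
                return chosen
    return chosen

stories = [
    "The mayor opened the bridge. Traffic was heavy.",
    "The mayor opened the bridge today. Critics said costs were high.",
]
print(" ".join(pick_distinct_sentences(stories)))
```

    A selector this simple would also explain the redundancy and non-sequiturs: nothing in it checks that the chosen sentences follow from one another.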

    --
    Sometimes it's best to just let stupid people be stupid.
  8. breadth vs depth by Ubergrendle · · Score: 5, Insightful

    This is a somewhat dangerous trend, IMHO. CNN Headline News gives us blurbs...soundbites...with no substance. "Israelis shot Palestinians" or vice versa on a daily basis. Little reporting on the substance of negotiations, or on why there was a conflict in that location at that time and for what reason. The great thing about the internet is that there is great reporting in depth. I like to check out the Drudge Report, BBC, disinfo.com, etc. on a regular basis to get a good blend of various points of view so that I can form my OWN opinion. I don't want to be served watered-down sentence fragments by a corporate AOL/TimeWarner behemoth. Slashdot is one of a few exceptions to this rule, since they typically link to articles of substance and allow for dialogue and debate by (usually) intelligent users. The moderation system isn't perfect, but it helps dodge the trolls. My guess is that automated summaries will lose the flavour of good journalism/writing, and by taking an "average" will end up with a C+ "factual comprehension" review as opposed to multiple A+ "theory" and "synthesis" editorials.

    --
    John Maynard Keynes: "When the facts change, I change my mind. What do you do?"
  9. Still Some Work To Be Done... by po8 · · Score: 4, Funny

    Check out this odd story about incarcerated Browns. The summarizer could apparently still use some manual supervision.

  10. Seems nice enough... by Davorama · · Score: 4, Insightful

    So where's the slashbox for it?

    --

    Davo -- Free speech, free software, AND free beer.

  11. Copyright Infringement - Fair Use Doctrine -NOT! by Anonymous Coward · · Score: 3, Interesting

    This is usually Copyright Infringement - Fair Use Doctrine is not applicable.

    Every one of these paraphrasers lifts large chunks of syntax.

    I would maintain that this is still plagiarism or a copyright violation unless it is done really well.

    And it never will be done really well unless NeuralNetwork chips are common and mankind makes advances in Artificial Intelligence research. Five years away at best.

    I dare the commercial services to hit Encyclopedia Britannica. Or I dare them to routinely slurp New York Times and boast that they digest the New York Times.

    A massive Civil Suit is awaiting some of these early adopters planning on creating a business out of this.

    And they deserve it.

    It is just "Word Twiddling", however useful.

    If the twiddling is done live, once, per user client, then maybe it's OK, but none of these business models are set up THAT way.

  12. copyright/legality? by ddeboer · · Score: 5, Interesting

    What are the copyright or other legal issues with republishing news stories collected from web sites? The Newsblaster site clearly states where the information comes from, just as every good college student is taught to cite information sources. On the other hand, at the bottom of many of the stories is the notice: "Copyright 2002 Associated Press. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed." Is collecting and condensing news stories "republishing", and does it violate copyright?

  13. Direct NewsBlaster link by Alien54 · · Score: 3, Informative

    The direct link is here:

    www.cs.columbia.edu/nlp/newsblaster/

    Although I found some of the summaries slightly shallow, they are not bad.

    The problem is that it becomes an average of opinion, when you sometimes need that longer insightful article. This easily could become the news of sheep everywhere.

    This could be bad when facts come in to contradict initial impressions.

    oops

    --
    "It is a greater offense to steal men's labor, than their clothes"
  14. How does it work? by AndyChrist · · Score: 3, Interesting

    I don't see any concrete information on what it does to summarize stories...is it using something like Cyc? Does it just have some heuristics for picking out the important parts of paragraphs?

    Also, who else thought "neuro-linguistic programming" for at least a moment when they saw "nlp"?

  15. Read the papers by mizhi · · Score: 3, Informative

    Research Papers

    I'm not sure if they've done anything really novel. I skimmed through one of the more recent papers, on sentence ordering, but that seems to operate only on the same event. There's research like this going on at a lot of major universities like CMU and MIT. That said, it does look impressive.

    --
    Humorless sig goes here.
    1. Re:Read the papers by DavidKirkEvans · · Score: 5, Informative

      We have a summarization strategy that selects from three summarizers: one that works over documents describing a "single event" which is novel, one that works over documents describing a person (so-called biography events) using sentence extraction, and one that is a general sentence extractor based on the biographical summarizer which does use more than just TFIDF weighting for the extraction. (It has a notion of semantic classes, and some other stuff.)

      The "single event" summarizer is novel though. It uses a clustering component to cluster the sentences, then for each cluster it takes the intersection of the sentences (yes, we need to parse the text to do this, and we do) and RE-GENERATES (does not extract) a sentence that synthesizes the information from the cluster.

      There's a lot of other stuff going on as well: we're using a text categorization system that we developed here, a text clustering system, our own system for categorizing the images that come with the articles (you'll be able to browse by image categories soon as well) and some other stuff.
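
      For readers unfamiliar with the TFIDF weighting mentioned above, here is a minimal sketch of plain TF-IDF sentence ranking. It is a baseline illustration only: the comment says the real extractor uses more than TF-IDF (semantic classes and so on), and the clustering and sentence regeneration are not attempted here. The function names and the length normalisation are assumptions for the example.

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    # Lowercased alphabetic tokens only.
    return re.findall(r"[a-z]+", sentence.lower())

def tfidf_rank(sentences):
    # Treat each sentence as a "document" and rank by average TF-IDF weight.
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scored = []
    for sentence, tokens in zip(sentences, tokenized):
        tf = Counter(tokens)
        total = sum(count * math.log(n / df[term]) for term, count in tf.items())
        # Divide by length so long sentences do not win on size alone.
        scored.append((total / max(len(tokens), 1), sentence))
    # Highest-scoring sentences first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = tfidf_rank(["the cat sat", "the cat ran", "the dog barked loudly"])
print(ranked[0][1])
```

      Terms that appear in every sentence get a weight of zero, so the ranking favours sentences carrying words the others lack.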

  16. A Few Kinks, A few Comments by Irvu · · Score: 3, Interesting

    As the summary here shows there are still a few kinks in the system.

    While I have to agree with some people that this isn't in-depth reporting I do think that it is pretty interesting AI. When it comes down to it the problem is not that a computer might be summarizing our news. The problem is twofold.

    Firstly, people are not always inclined to look beyond summaries. When faced with typical time constraints people prefer to look at summaries because they do not have time to search across a dozen sources and articles. This is why USA Today became big in the first place. Nothing there is more than 1 column long. (Incidentally, did anybody else find it hilarious that this system "summarizes" USA Today, who themselves summarize other news sources?)

    Secondly much of the news is the same. News is big business and most major news media tell the stories that sell. Because they are all targeting the same markets they tell the same stories and in the same ways. Therefore there is little difference between CNN, the NY Times, etc in terms of tone and "facts". Especially since much of "their news" comes from the same wire services such as Reuters. Fox News is different but that is because they have abandoned the mantle of impartiality and become all conservative all the time.

    In essence this system is perfect for the internet news style. Brief summaries of facts followed by more "in-depth" leads that we may peruse as we wish. The real question is, when will this begin drawing on sites like Indymedia, The Register and /.

  17. I'm afraid to Slashdot a great site, but... by babbage · · Score: 4, Funny

    www.headlinehaikus.com

    Basically, it looks at the headlines on Yahoo/Reuters, finds sentences that scan as 5/7/5, and uses Perl cleverness to present them as little news haikus (or senryu, if you wanna be picky). It's great stuff:

    Today:
    Commonwealth Group Blasts Zimbabwe Poll

    but defended by
    separate observer groups
    from South Africa

    Also today:
    Amnesty Charges U.S. Violated Rights of Detainees

    possible suspects
    connected to the attacks
    including their right

    My last birthday (Feb 4):
    Saudi Proposed Saddam Overthrow to US - Prince:

    we agree upon
    the various issues that
    we agree upon

    Christmas, 2001:
    Deep emotion, little joy in Bethlehem Christmas:

    Palestinians
    without the special permits
    very bad this year

    Sept 10 2001:
    Belarus Opposition Demands New Vote, Plans Rally:

    We do not agree
    with the official result
    RE RUN UNLIKELY

    June 1 2001:
    Bridgestone says some Ford Explorers defective:

    I am just here
    to say what needs to be said
    I am not here

    I'm hooked :)

    They have archives going back to the beginning of 2001, with only a few holes (e.g. the days after September 11), and they talk about how they are doing everything. Bonus points: you can have the haiku headlines mailed to you automagically every day. I just hope they have the bandwidth (etc) to withstand Slashdot....
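
    The 5/7/5 scan described above can be sketched roughly like this. The site reportedly uses Perl; this Python version is just an illustration, and the vowel-group syllable counter is a crude assumption that miscounts plenty of English words (it gets "various" wrong, for one). All function names are made up for the example.

```python
import re

def count_syllables(word):
    # Crude heuristic: one syllable per run of vowels.
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a silent trailing 'e' after a consonant ("separate"),
    # but keep "-le" endings ("little") and double vowels ("agree").
    if re.search(r"[^aeiouy]e$", word) and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def as_haiku(text, pattern=(5, 7, 5)):
    # Return the lines if the words split exactly into the pattern, else None.
    lines, current, total, idx = [], [], 0, 0
    for word in text.split():
        if idx == len(pattern):
            return None  # words left over after the last line
        current.append(word)
        total += count_syllables(word)
        if total == pattern[idx]:
            lines.append(" ".join(current))
            current, total, idx = [], 0, idx + 1
        elif total > pattern[idx]:
            return None  # a word straddles a line break
    if idx == len(pattern) and not current:
        return lines
    return None

print(as_haiku("but defended by separate observer groups from South Africa"))
```

    A real version would presumably need a pronunciation dictionary rather than a vowel-counting rule to scan reliably.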

  18. How wrong can you get? by TheAwfulTruth · · Score: 3, Insightful

    Wrong on every count.

    Besides the fact that /. gets its news from other places and is always hours or days late with it, the worst thing you can do is get all your news from one source.

    Every news site has some kind of slant to it. CNN, NPR, /. (And my favorite of your list, "USA Today".) Sometimes you get more information than is contained in a story merely by seeing how different people report the story! Reading one-paragraph summaries of the day's news will tell you nothing at all. Maybe worse, it will mislead you because there isn't enough information.

    I read news from about 10 sources a day and if I see multiple articles that I'm not interested in they're easy to skip. If I am interested in them I read them on all sites. You get much, much more information that way.

    Though you do need to pick your sites. If you look at CNN, MSNBC and Salon and all three are merely parroting Reuters then you know you're not doing yourself any good.

    --
    Contrary to popular belief, coding is not all free blow-jobs and beer. Those things cost MONEY!
  19. Re:What Will Google Do Next? by Jason Levine · · Score: 4, Informative

    And don't forget http://catalogs.google.com/ for online searching of mail-order catalogs. (They scan 'em, OCR 'em, and make 'em searchable.)

    --
    My sci-fi novel, Ghost Thief, is now available from Amazon.com.
  20. Now it just needs SOAP by jfsather · · Score: 3, Interesting

    I think this would be pretty cool if they could add some sort of a SOAP/XML-RPC type interface where you could query on sections, stories, whatever. It would be nice to allow content syncing like this.

    I was writing about this in response to a post in a user's journal the other day: even better would be a story content P2P system that allows story distribution. You might place a limit and only allow the summary, to drive people to your site, but it could still help with bandwidth issues. This would basically be like an enhanced RDF/RSS type system, but over a P2P type network you wouldn't even really have to host your own feeds for people. Add in some sort of DB persistence and you could just say "get new headlines and summaries from site x"--the system would bring in all the new content. Anyway, that is just a dream I have, and it probably will never happen given the way some people feel about their content.

  21. Re:Copyright Infringement - Fair Use Doctrine -NOT by LinuxParanoid · · Score: 3, Insightful

    I think you are confusing plagiarism with a violation of copyright. I am primarily concerned with the legal issue of Copyright violation raised by the previous poster, not an amorphous ethical one.

    As Bitlaw points out, under the Copyright Act, four factors are to be considered in order to determine whether a specific action is to be considered a "fair use." These factors are as follows:

    1) the purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes;
    2) the nature of the copyrighted work;
    3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
    4) the effect of the use upon the potential market for or value of the copyrighted work.

  22. Re:Copyright Infringement - Fair Use Doctrine -NOT by LinuxParanoid · · Score: 3, Insightful

    Dang, clicked the submit button by mistake.

    Attempting to apply the four factors there, while some could be argued either way, I can see that on balance, you both might be right. I could probably make a stronger case that it doesn't qualify as fair use than that it does, based on those four factors. I think I was focusing overmuch on the "amount taken" criterion and overlooking the others.

    --LP