Computers Summarize the News

Posted by ryuzaki0 on Friday March 15, 2002 @04:43AM from the replaced-by-a-very-small-shell-script dept.

oily_ants writes "I get sick and tired of reading the same story on different web sites. That's why I like Slashdot so much. Good (??) summaries of all of the stuff out there on the net. Now there is a project at Columbia University by the NLP group that attempts to generate computer summaries of all of those news articles on different web sites. The project is called Newsblaster and the summaries are excellent. You can read about the project on regular news sites like Online Journalism Review or USA Today."

Also try... by SlashChick · 2002-03-15 04:45 · Score: 5, Informative

news.google.com. Just released yesterday. I haven't yet played around with it enough to say whether it's cool or not, but it does look promising.

--
Simpli - Your source for San Jose dedicated servers and colocation!
1. Re:Also try... by elfkicker · 2002-03-15 05:04 · Score: 4, Interesting
  
  I've been using that for a couple of months now. (It has been available buried at http://www.google.com/news/newsheadlines.html) I find it very useful. I wish they'd explain exactly how it worked though. Is it all machine parsed? Are the articles listed in order of relevance, time posted, etc?
  
  I've seen a couple of occasions where it's had an article on a completely different subject under the header, but it's not the norm. It's always up to date. My only gripe is that it doesn't have an "Older Stories" link. I've gone back to try and find something I've seen before only to find that it had been pushed off.
  
  They also keep a list of links to news sources and current relevant resources at http://www.google.com/news/.
Say what? by turbine216 · 2002-03-15 04:46 · Score: 5, Funny

I get sick and tired of reading the same story on different web sites. That's why I like slashdot so much.

I'm sure most will agree with me when I say that this makes ABSOLUTELY NO SENSE.
1. Re:Say what? by elmegil · 2002-03-15 04:47 · Score: 4, Funny
  
  Hey, maybe they like reading the same story over and over on the same web site.
  
  --
  7 November 2006: The day Americans realized corruption and incompetence weren't addressing 11 September 2001
Newsbot by TheGreenLantern · 2002-03-15 04:51 · Score: 4, Funny

Sounds like a good idea, but I'm worried about the "Newsbot's" objectivity. If I wanted to read a bunch of stories about the latest NVidia GeForce 4 release, 10 reasons more RAM is better, and why you should upgrade your hard drive, I'd just watch TechTV.

--

It hurts when I pee.
Impressive by Reality+Master+101 · 2002-03-15 04:53 · Score: 5, Informative

To tell you the truth, at first I thought the summaries were TOO good; I was suspicious that it wasn't really automated.

But after looking at a few more stories, it looks like it just pulls sentences out of the stories that seem to have a different point to make, and strings them together.

Sometimes you see some redundancy and some non-sequiturs, but I have to admit the illusion is pretty good.
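
For the curious, the "pull out sentences that make a different point and string them together" idea can be sketched in a few lines of Python. This is only a guess at the general approach, not Newsblaster's actual code: the function names, the naive sentence splitter, and the 0.5 word-overlap cutoff are all assumptions made up for illustration.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    """Lower-cased set of the words in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def jaccard(a, b):
    """Word overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def extract_summary(articles, max_sentences=3, max_overlap=0.5):
    """Keep each sentence that is not too similar to any sentence already kept."""
    chosen = []
    for article in articles:
        for sentence in split_sentences(article):
            w = words(sentence)
            if all(jaccard(w, words(c)) <= max_overlap for c in chosen):
                chosen.append(sentence)
            if len(chosen) == max_sentences:
                return " ".join(chosen)
    return " ".join(chosen)

articles = [
    "The court ruled on Friday. The ruling surprised analysts.",
    "The court ruled on Friday. Shares fell sharply.",
]
print(extract_summary(articles))
```

A scheme like this never looks at whether neighbouring sentences follow from each other, which would be consistent with the occasional non-sequitur.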

--
Sometimes it's best to just let stupid people be stupid.
breadth vs depth by Ubergrendle · 2002-03-15 04:54 · Score: 5, Insightful

This is a somewhat dangerous trend, IMHO. CNN Headline news gives us blurbs...soundbites...with no substance. "Israelis shot Palestinians" or vice versa on a daily basis. Little reporting of substance of negotiations; why there was a conflict in that location at that time for what reason. The great thing about the internet is that there is great reporting in depth. I like to check out the Drudge report, BBC, disinfo.com, etc on a regular basis to get a good blend of various points of view so that I can make my OWN opinion. I don't want to be served watered down sentence fragments by a corporate AOL/TimeWarner behemoth. Slashdot is one of a few exceptions to this rule, since they typically link to articles of substance and allow for dialogue and debate by (usually) intelligent users. The moderation system isn't perfect, but it helps dodge the trolls. My guess is that automated summaries will lose the flavour of good journalism/writing, and by taking an "average" will end up with a C+ "factual comprehension" review as opposed to multiple A+ "theory" and "synthesis" editorials.

--
John Maynard Keynes: "When the facts change, I change my mind. What do you do?"
Still Some Work To Be Done... by po8 · 2002-03-15 04:55 · Score: 4, Funny

Check out this odd story about incarcerated Browns. The summarizer could apparently still use some manual supervision.
Seems nice enough... by Davorama · 2002-03-15 04:55 · Score: 4, Insightful

So where's the slashbox for it?

--
Davo -- Free speech, free software, AND free beer.
copyright/legality? by ddeboer · 2002-03-15 04:58 · Score: 5, Interesting

What are the copyright or other legal issues to republishing news stories collected from web sites? The Newsblaster site clearly states where the information comes from - like every good college student is taught to cite information sources. On the other hand, on the bottom of many of the stories is the notice: "Copyright 2002 Associated Press. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed." Is collecting and condensing news stories "republishing" - does this violate copyright stuff?
Re:Read the papers by DavidKirkEvans · 2002-03-15 05:46 · Score: 5, Informative

We have a summarization strategy that selects from three summarizers: one that works over documents describing a "single event" which is novel, one that works over documents describing a person (so-called biography events) using sentence extraction, and one that is a general sentence extractor based on the biographical summarizer which does use more than just TFIDF weighting for the extraction. (It has a notion of semantic classes, and some other stuff.)
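
As a point of reference for the "just TFIDF weighting" baseline mentioned above, here is a minimal Python sketch of TF-IDF sentence scoring. It is not the Columbia system (which, as noted, uses more than this): the function names are hypothetical, and treating each sentence as its own "document" for the IDF count is an assumption made to keep the example small.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-cased word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def tfidf_rank(sentences):
    """Rank sentences by summed TF-IDF, highest first.

    TF is a word's count divided by the sentence length; IDF is
    log(N / df), where df is the number of sentences containing the word.
    """
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))

    scored = []
    for sentence, tokens in zip(sentences, token_lists):
        score = 0.0
        if tokens:
            tf = Counter(tokens)
            score = sum(
                (count / len(tokens)) * math.log(n / df[word])
                for word, count in tf.items()
            )
        scored.append((score, sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

ranked = tfidf_rank([
    "the cat sat",
    "the cat ran",
    "the quantum flux inverted",
])
print(ranked[0][1])
```

Words that appear in every sentence get an IDF of zero, so sentences built from rarer words float to the top.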

The "single event" summarizer is novel though. It uses a clustering component to cluster the sentences, then for each cluster it takes the intersection of the sentences (yes, we need to parse the text to do this, and we do) and RE-GENERATES (does not extract) a sentence that synthesizes the information from the cluster.
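
To make the cluster-then-intersect step concrete, here is a rough Python sketch. It is a heavy simplification and an assumption throughout: it compares bags of words rather than parsed sentences, uses a made-up single-pass clustering with a hypothetical 0.4 similarity threshold, and stops at the shared words instead of regenerating a sentence.

```python
import re

def word_set(sentence):
    """Lower-cased set of the words in a sentence."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))

def cluster_sentences(sentences, threshold=0.4):
    """Single-pass clustering.

    Each sentence joins the first cluster whose seed (first) sentence
    has a Jaccard word overlap of at least `threshold`; otherwise it
    starts a new cluster.
    """
    clusters = []
    for sentence in sentences:
        w = word_set(sentence)
        for cluster in clusters:
            seed = word_set(cluster[0])
            union = w | seed
            if union and len(w & seed) / len(union) >= threshold:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters

def shared_words(cluster):
    """Words that occur in every sentence of the cluster."""
    return set.intersection(*(word_set(s) for s in cluster))

clusters = cluster_sentences([
    "The mayor resigned on Monday",
    "On Monday the mayor resigned abruptly",
    "Stocks fell in Tokyo",
])
for cluster in clusters:
    print(sorted(shared_words(cluster)))
```

The shared words are only the raw material; turning them back into a grammatical sentence is the generation step this sketch leaves out.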

There's a lot of other stuff going on as well: we're using a text categorization system that we developed here, a text clustering system, our own system for categorizing the images that come with the articles (you'll be able to browse by image categories soon as well) and some other stuff.
I'm afraid to Slashdot a great site, but... by babbage · 2002-03-15 05:49 · Score: 4, Funny

www.headlinehaikus.com
Basically, it looks at the headlines on Yahoo/Reuters, finds sentences that scan as 5/7/5, and uses Perl cleverness to present them as little news haikus (or senryu, if you wanna be picky). It's great stuff:

Today:
Commonwealth Group Blasts Zimbabwe Poll

but defended by separate observer groups from South Africa

Also today:
Amnesty Charges U.S. Violated Rights of Detainees

possible suspects connected to the attacks including their right

My last birthday (Feb 4):
Saudi Proposed Saddam Overthrow to US - Prince:

we agree upon the various issues that we agree upon

Christmas, 2001:
Deep emotion, little joy in Bethlehem Christmas:

Palestinians without the special permits very bad this year

Sept 10 2001:
Belarus Opposition Demands New Vote, Plans Rally:

We do not agree with the official result RE RUN UNLIKELY

June 1 2001:
Bridgestone says some Ford Explorers defective:

I am just here to say what needs to be said I am not here

I'm hooked :)

They have archives going back to the beginning of 2001, with only a few holes (e.g. the days after September 11), and they talk about how they are doing everything. Bonus points: you can have the haiku headlines mailed to you automagically every day. I just hope they have the bandwidth (etc) to withstand Slashdot....
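
The 5/7/5 check itself is easy to approximate. The site uses Perl; the sketch below is Python and is not their code. The syllable counter is a crude vowel-group heuristic (an assumption, and it will miscount plenty of English words), and the function names are invented for illustration.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def split_haiku(text, pattern=(5, 7, 5)):
    """Return the three lines if word boundaries fall exactly on 5/7/5, else None."""
    lines, current, total, idx = [], [], 0, 0
    for word in text.split():
        if idx >= len(pattern):
            return None  # words left over after the last line
        current.append(word)
        total += count_syllables(word)
        if total == pattern[idx]:
            lines.append(" ".join(current))
            current, total = [], 0
            idx += 1
        elif total > pattern[idx]:
            return None  # a word straddles a line break
    if idx == len(pattern) and not current:
        return lines
    return None

haiku = split_haiku("but defended by separate observer groups from South Africa")
if haiku:
    print("\n".join(haiku))
```

A real implementation would presumably want a pronunciation dictionary rather than vowel counting, since the heuristic gets words like "agree" wrong.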

--
DO NOT LEAVE IT IS NOT REAL
Re:What Will Google Do Next? by Jason+Levine · 2002-03-15 06:15 · Score: 4, Informative

And don't forget http://catalogs.google.com/ for online searching of mail-order catalogs. (They scan 'em, OCR 'em, and make 'em searchable.)

--
My sci-fi novel, Ghost Thief, is now available from Amazon.com.