Computers Summarize the News
oily_ants writes "I get sick and tired of reading the same story on different web sites. That's why I like Slashdot so much. Good (??) summaries of all of the stuff out there on the net. Now there is a project at Columbia University by the NLP group that attempts to generate computer summaries of all of those news articles on different web sites. The project is called Newsblaster and the summaries are excellent. You can read about the project on regular news sites like Online Journalism Review or USA Today."
What are the copyright or other legal issues with republishing news stories collected from web sites? The Newsblaster site clearly states where the information comes from, just as every good college student is taught to cite information sources. On the other hand, at the bottom of many of the stories is the notice: "Copyright 2002 Associated Press. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed." Is collecting and condensing news stories "republishing"? Does this violate copyright?
I've been using that for a couple of months now. (It has been available, buried at http://www.google.com/news/newsheadlines.html.) I find it very useful. I wish they'd explain exactly how it works, though. Is it all machine parsed? Are the articles listed in order of relevance, time posted, or something else?
I've seen a couple of occasions where it's had an article on a completely different subject under the header, but it's not the norm. It's always up to date. My only gripe is that it doesn't have an "Older Stories" link. I've gone back to try to find something I'd seen before, only to find that it had been pushed off.
They also keep a list of links to news sources and current relevant resources at http://www.google.com/news/.