Computers Paraphrase English
AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.
I've built search for a few sites using Verity's K2 product, which offers something similar. If you (programmatically) ask it to return a summary of each hit, what you get is what it considers representative of the document as a whole, not merely the first few lines or a paragraph. It actually works pretty well, but then it should: a couple of years ago it cost almost as much as my house...
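I don't know how K2 actually builds its summaries. As a rough illustration of the general idea (pick the sentences that best represent the whole document rather than just the opening lines), here is a toy frequency-based extractive summarizer. The `summarize` function, the tiny stopword list and the average-frequency scoring are all my own assumptions, not Verity's method or API.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use far larger ones).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "as", "with", "was", "be"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    # Lowercased word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, k=2):
    """Return the k sentences whose words are most frequent across the whole
    text, kept in their original order (hypothetical helper, not K2's API)."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in words(s))

    def score(i):
        ws = words(sentences[i])
        # Average document-wide frequency, so long sentences aren't automatically favored.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    top = sorted(range(len(sentences)), key=score, reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

doc = ("Search engines index documents. Verity sells a search engine. "
       "Cats are nice. A search engine summary picks representative sentences.")
print(summarize(doc))
```

The off-topic sentence about cats scores lowest because none of its words recur elsewhere, so it never makes the summary; real summarizers add things like position weighting and redundancy removal on top of this.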
It's official. Most of you are morons.
Unfortunately, there isn't yet a way to use computers to detect dupes.
Or is there?!?
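Joking aside, near-duplicate detection is a well-studied problem. One common approach compares the sets of overlapping word n-grams ("shingles") of two texts using Jaccard similarity. A minimal sketch follows; the shingle size of 3, the 0.5 threshold and the `is_dupe` helper are arbitrary assumptions for illustration.

```python
import re

def shingles(text, n=3):
    # Set of overlapping n-word sequences ("shingles") from the lowercased text.
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    # Size of the intersection divided by size of the union; two empty sets count as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_dupe(story, earlier_story, threshold=0.5):
    # Hypothetical helper: flag a story whose shingles largely overlap an earlier one.
    return jaccard(shingles(story), shingles(earlier_story)) >= threshold

old = "Computers can now paraphrase English news articles automatically"
new = "Computers can now paraphrase English news articles automatically, says researcher"
print(is_dupe(new, old))
```

Here the new headline shares 6 of its 8 shingles with the old one (similarity 0.75), so it gets flagged. At scale you would hash the shingles (e.g. with MinHash) instead of comparing full sets pairwise.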
Karma: Chevy Kavalierma.
Lojban is among the more interesting newer languages. It can be parsed just like C! Esperanto is somewhat interesting. English will be regarded in the future as a curious artifact: it was swept along with the technology revolution simply because ASCII didn't include accents and extra marks on letters. Eventually we'll get away from vocalization altogether and have purely numerical, written languages.
Right now, working with English on computers means dealing far more with the strangeness of the language than with the more interesting issues of cognition that lie underneath.
-Libertarian secular transhumanist
An article posted before already covered all this; the paper it came from was mentioned in the comments, and this one is just another in a series of papers by the same researcher.
OK, nothing else to see here; move on to the next redundant post. (Is that a paraphrase of 'dupe'?)
I believe this was covered in an earlier Slashdot story about this site: http://www1.cs.columbia.edu/nlp/newsblaster/
Here is a quote from their site:
Columbia Newsblaster is a system to automatically track the day's news. There are no human editors involved -- everything you see on the main page is generated automatically, drawing on the sources listed on the left side of the screen.
Every night, the system crawls a series of Web sites, downloads articles, groups them together into "clusters" about the same topic, and summarizes each cluster. The end result is a Web page that gives you a sense of what the major stories of the day are, so you don't have to visit the pages of dozens of publications.
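The quote doesn't say how Newsblaster does its clustering or summarizing. As a toy illustration of the group-then-summarize pipeline it describes, here is single-pass clustering of articles by cosine similarity of word counts, with each cluster "summarized" by the first sentence of its most central article. The 0.3 threshold, the function names and the centrality heuristic are all assumptions; Columbia's multidocument summarizer is certainly far more sophisticated than this.

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Bag-of-words term counts (no stemming or stopword removal in this toy).
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def cluster(articles, threshold=0.3):
    """Single-pass clustering: each article joins the first cluster whose
    seed (first member) is similar enough, otherwise it starts a new cluster."""
    vecs = [vectorize(a) for a in articles]
    clusters = []  # each cluster is a list of article indices
    for i, v in enumerate(vecs):
        for c in clusters:
            if cosine(v, vecs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters, vecs

def summarize_cluster(articles, members, vecs):
    # Pick the member most similar to the rest of its cluster; return its first sentence.
    def centrality(i):
        return sum(cosine(vecs[i], vecs[j]) for j in members if j != i)
    best = max(members, key=centrality)
    return re.split(r"(?<=[.!?])\s+", articles[best].strip())[0]

articles = [
    "Storm hits coast. Thousands lose power after the storm.",
    "A storm hit the coast overnight. Power is out for thousands after the storm.",
    "Team wins championship. Fans celebrate the championship win.",
]
clusters, vecs = cluster(articles)
for members in clusters:
    print(members, summarize_cluster(articles, members, vecs))
```

The two storm stories land in one cluster and the sports story in another. Comparing only against each cluster's seed keeps the pass cheap but makes the result order-dependent; a real system would more likely use something like agglomerative clustering over many features.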
Newsblaster is an academic project from the Natural Language Processing group at Columbia University's Department of Computer Science. It is designed to demonstrate the Group's technologies for multidocument summarization, clustering, and text categorization, among others. It is funded under DARPA TIDES and KDD and has been operational online since September 2001.
Current and future enhancements include international perspectives, multilingual capability, and tracking events across days.