Computers Paraphrase English
AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.
Won't you need someone to write the stuff to be paraphrased in the first place? Explain to me how that replaces reporters with small shell scripts.
For you to say that this technology will someday replace reporters makes me think that you're clueless about what reporters do. Do you realize that the biggest parts of a reporter's job are gathering facts and making judgments about 1) which stories are worth reporting, 2) which are the relevant facts about a story and 3) who's lying and who's telling the truth about a story? The actual writing that you see is many times almost incidental to most of what a reporter does. You might not like the judgments that a reporter makes (and I could agree with that in many cases), but software can't go out into the world and talk to people and use judgment and intuition to find information to write about.
As an ex-reporter and editor, I find it laughable that anyone might think this technology will replace reporters. It's sort of like suggesting that machines that can read source code and interpret it can somehow figure out what new software people want and then write it. Both possibilities are equally insane.
The poster incorrectly assumes that this could be used to replace reporters. The problem is that computers have a difficult time generating new text. The methods that computers use to evaluate text (as any user of grammar-check would realize) aren't that great.
In fact, most language models cannot generate even a large portion of English text. Those that do have a good range rarely have good accuracy, because there are many things that we "just don't say that way." This is why, when you're talking to a non-native speaker, you often cannot explain why something they said was wrong: there is no real grammar rule against saying it that way.
So if we rule out syntax-based models, that just leaves statistical models. I worked in an NLP lab during the summer of 2002, and my prof there said that syntax and statistics are like the two sides of the force. Statistics are quick and easy but are seductive. They corrupt you and leave you unable to really think about the language itself. You only think in terms of bigrams and HMMs.
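For anyone who hasn't seen one, here is roughly what "thinking in bigrams" looks like. This is only a minimal sketch of a generic bigram model, not the method from the story or anything Barzilay uses; the function names and the three-sentence toy corpus are made up for illustration. It counts which word follows which, turns the counts into probabilities, and generates text by sampling one word at a time.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary markers


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for prev, curr in zip(tokens, tokens[1:]):
            counts[prev][curr] += 1
    return counts


def bigram_prob(counts, prev, curr):
    """Maximum-likelihood estimate of P(curr | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[curr] / total if total else 0.0


def generate(counts, rng, max_len=20):
    """Sample a sentence by repeatedly picking a follower of the last word."""
    word, out = START, []
    for _ in range(max_len):
        followers = counts.get(word)
        if not followers:
            break
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        if word == END:
            break
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    # Toy corpus, invented for this example.
    corpus = [
        "the reporter wrote the story",
        "the editor read the story",
        "the reporter read the paper",
    ]
    model = train_bigrams(corpus)
    print(bigram_prob(model, "the", "reporter"))
    print(generate(model, random.Random(0)))
```

The model has no notion of grammar at all. It only knows that "reporter" followed "the" in a third of the cases it saw, which is why it can get better only by seeing more text.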
So even though these systems are doing well, they are mostly statistical. Thus, it's hard to get incremental improvement. You have to have larger corpora, and larger corpora usually have more errors, thus defeating any advantage you might get by capturing more aspects of a language.
In my opinion, only with well-developed language models that can effectively generate NL can we get anywhere. Which is what Barzilay is working on, but it's still a long, long, long way off.
----------
I am an expert in electricity. My father held the chair of applied electricity at the state prison.
Obviously this is a developing field. The best models seem to use phrases from the original text. Anyway, the Mac OSX example above shows that it is useful to users willing to take it with a massive grain of salt, even if we are not into full computational sentience yet.
When it works even a little better it will replace all those awful grade school teachers who assign paraphrasing as a homework assignment. The reporters who might have been replaced by it will have already lost their jobs, except for the ones in AhaIndia of course who will paraphrase for the rest of us, usually at a marginally better level than the machine.
The research is interesting, and I'd like to understand Barzilay's notation (is that APL or calculus of statement?) in the paper (pdf) I found on Google. Also see the papers on her site.
Of course structured text is easier, and news stories are known to have most of the meat in the beginning, but this is great stuff.
One interesting older system is ThoughtTreasure, which was built to understand a story and answer questions about it. The author also did work on news analysis ("NewsForms"). There are tools out there, and I've been making a survey of them myself. If anyone has information about practical NLP tools for real-world tasks, please post.