Computers Paraphrase English
AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.
So one day, instead of complaining about Michael and co., everyone will be moaning about someone else's code; that seems more appropriate for a nerd site somehow ;)
Google News already uses a similar technique to decide what to put in the summary beneath the headline. It does not paraphrase, but it does extract a summary.
Also, if you have Microsoft Word lying about, there is a feature called AutoSummarize which is surprisingly good, almost as effective as going through a document yourself looking for the main points.
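For the curious, extractive summarization of the kind described above can be sketched in a few lines. This is a minimal Luhn-style word-frequency scorer, not how Google News or Word actually do it (their methods aren't described here); the stopword list and the `summarize` helper are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real systems use far larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "was", "by", "at", "be"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        # Average frequency of the sentence's content words, so long
        # sentences don't win just by being long.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable (even with reverse=True), so ties keep original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Sentences that repeat the article's most common content words float to the top, which is roughly why extracted summaries of news stories tend to look reasonable.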
There is no god
#!/bin/sh
curl "$1" | paraphrase | slant -patriotic -stupid > fox_news_story.txt
Omnes stulti sunt.
Yes, but until it can post duplicate articles with slightly different phrases, it will never replace CowboyNeal!
Won't you need someone to write the stuff to be paraphrased in the first place? Explain to me how that replaces reporters with small shell scripts.
All someone has to do now is marry this technology with a term-paper database, and "Hello Original Work!"
The question will then become, how many different unique "paraphrases" can the system ultimately generate?
Unfortunately, there isn't yet a way to use computers to detect dupes.
Or is there?!?
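For what it's worth, near-duplicate detection is a fairly well-understood trick. A minimal sketch, assuming stories are compared by overlapping word "shingles" and Jaccard similarity; the 0.5 threshold and the `is_dupe` helper are illustrative choices, not anything Slashdot is known to use.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles (overlapping word windows) from the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; 1.0 means identical shingle sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_dupe(story1, story2, threshold=0.5, k=3):
    # The 0.5 threshold is an arbitrary illustrative choice, not a tuned value.
    return jaccard(shingles(story1, k), shingles(story2, k)) >= threshold
```

Swapping a single word only breaks the few shingles that contain it, so lightly reworded dupes still score high.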
Karma: Chevy Kavalierma.
So, will there be any difference between paraphrasing and copying now in an educational setting? Seems like this could make writing a report pretty easy...
1) Brainstorm some key points/ideas
2) Have this program data mine for relevant articles online
3) Feed sections of each article into the program and have a finished paper
Granted, the tech isn't quite that powerful yet and probably wouldn't do a whole paper, but it sure looks like it could supply several paragraphs of material per page...
Lojban is among the more interesting newer languages. It can be parsed just like C! Esperanto is somewhat interesting. English will be regarded in the future as a curious artifact; it was swept along with the technology revolution simply because ASCII didn't include accents and extra marks on letters. Eventually we'll get away from vocalization altogether and have purely numerical, written languages.
Right now, trying to work with English in computers deals way more with the strangeness of the language than the more interesting issues of cognition that lie underneath.
-Libertarian secular transhumanist
~~~
Isn't this the way those trashy love novels are written?
Someone set up us the bomb!
Eat at Joe's.
How is this going to replace reporters? Reporters don't just paraphrase other reports. They actually are supposed to search for stories (hopefully factual!) on their own.
Back in the late 1980s I had a word processor for my Amiga with a function that would do a global search and replace of every Xth word (user-settable) with a synonym from the built-in thesaurus... Very handy for those term papers I so hated in high school...
I'm assuming this (of course, I didn't RTFA) is far more advanced than what we had back then, but the idea has been around for quite a while at least...
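The every-Xth-word trick is simple enough to sketch. This assumes a tiny hand-made thesaurus (hypothetical; the Amiga program's word list and exact behavior aren't known here) and simply skips counted words that have no entry.

```python
import re

# Tiny hypothetical thesaurus; the original program's built-in one is unknown.
THESAURUS = {
    "big": "large",
    "quick": "rapid",
    "said": "stated",
    "show": "demonstrate",
    "help": "assist",
}


def replace_every_nth(text, n, thesaurus=THESAURUS):
    """Replace every nth word with a synonym when the thesaurus has one."""
    # Splitting on a capturing group keeps the spaces and punctuation as
    # tokens, so joining the pieces back together preserves the layout.
    tokens = re.split(r"(\W+)", text)
    count = 0
    out = []
    for tok in tokens:
        if tok and re.fullmatch(r"\w+", tok):
            count += 1
            if count % n == 0:
                syn = thesaurus.get(tok.lower())
                if syn:
                    # Keep an initial capital if the original word had one.
                    tok = syn.capitalize() if tok[0].isupper() else syn
        out.append(tok)
    return "".join(out)
```

With no sense disambiguation at all, it will happily turn "bad" into something that means the wrong thing, which is roughly the gap modern paraphrasers are trying to close.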
Never ask a geek why, just nod your head and slowly back away. -Rob Malda
AhaIndia submits story discussing paraphrasing of articles. This technology, destined to replace reporters shell, is still in its infancy. Huh, perhaps we'll still need humans after all . . .
For you to say that this technology will someday replace reporters makes me think that you're clueless about what reporters do. Do you realize that the biggest parts of a reporter's job are gathering facts and making judgments about 1) which stories are worth reporting, 2) which are the relevant facts about a story and 3) who's lying and who's telling the truth about a story? The actual writing that you see is many times almost incidental to most of what a reporter does. You might not like the judgments that a reporter makes (and I could agree with that in many cases), but software can't go out into the world and talk to people and use judgment and intuition to find information to write about.
As an ex-reporter and editor, I find it laughable that anyone might think this technology will replace reporters. It's sort of like suggesting that machines that can read source code and interpret it can somehow figure out what new software people want and then write it. Both possibilities are equally insane.
conduct interviews and generate original copy. These people are called reporters.
The people who take this copy off the wire and paraphrase it for publication in the local paper are called copy writers.
This software will reduce the number of copy writers needed, not reporters.
This is certainly an issue for the copy writers and their families, but overall it's really just another case of blue-collar workers being replaced by robots.
The idea of a 'style dial' I find a bit more disturbing.
KFG
The poster incorrectly assumes that this could be used to replace reporters. The problem is that computers have a difficult time generating new text. The methods that computers use to evaluate text (as any user of a grammar checker would realize) aren't that great.
In fact, most language models cannot generate even a large portion of English text. Those that do have a good range rarely have good accuracy, because there are many things that we "just don't say that way." This is why, when you're talking to a non-native speaker, you often cannot explain why something they said was wrong: there is no real grammar rule against saying it that way.
So if we rule out syntax-based models, that just leaves statistical models. I worked in an NLP lab during the summer of 2002, and my prof there said that syntax and statistics are like the two sides of the Force. Statistics are quick and easy, but seductive. They corrupt you and leave you unable to really think about the language itself; you only think in terms of bigrams and HMMs.
So even though these systems are doing well, they are mostly statistical. Thus, it's hard to get incremental improvement. You have to have larger corpora, and larger corpora usually have more errors, thus defeating any advantage you might get by capturing more aspects of a language.
In my opinion, only with well-developed language models that can effectively generate NL can we get anywhere. Which is what Barzilay is working on, but it's still a long, long, long way off.
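To make "thinking in terms of bigrams" concrete: a minimal bigram language model with add-one smoothing might look like the sketch below. It illustrates the statistical approach in general, not anything from Barzilay's work; the helper names and the toy corpus in the test are made up.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count bigrams, padding each sentence with <s> and </s> markers."""
    bigrams = defaultdict(Counter)
    vocab = set()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            bigrams[prev][cur] += 1
    return bigrams, vocab


def bigram_prob(bigrams, vocab, prev, cur):
    """Add-one (Laplace) smoothed estimate of P(cur | prev).

    The vocabulary size includes the padding markers, a common simplification.
    """
    counts = bigrams.get(prev, Counter())
    return (counts[cur] + 1) / (sum(counts.values()) + len(vocab))


def sentence_prob(bigrams, vocab, sentence):
    """Probability of a whole sentence as a product of bigram probabilities."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= bigram_prob(bigrams, vocab, prev, cur)
    return p
```

The model prefers word orders it has already seen and knows nothing about why they are grammatical, which is both the "quick and easy" part and the limitation the post describes.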
----------
I am an expert in electricity. My father held the chair of applied electricity at the state prison.
Re journalistic integrity: there's the possibility that a single entity could issue a release to the wire services in some kind of 'compiled' form, containing just the syntactic/semantic relations. (How this would differ from the way releases are issued now is a good question, but I guess there'd have to be reporters on hand to ask about details, so maybe journalism would be saved after all... but not if templates for the information were used and the templates themselves were expected to fill in the missing gaps...)
You could imagine how each news outlet could receive the release, use its own reconstructive code to flesh out the [NP][VP][NP] ("who did what to who"* scenario), and then write its own story from that.
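A toy version of the "compiled release" idea: each relation is stored as agent/action/patient slots, and each outlet plugs them into its own templates. Everything here (`Relation`, the template names, the house styles) is hypothetical and glosses over real problems like verb morphology and agreement.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    """One hypothetical 'who did what to who' record from a compiled release."""
    agent: str       # NP: who acted
    past: str        # VP, simple past ("approved")
    participle: str  # VP, past participle, needed for passive voice
    patient: str     # NP: who or what was acted on


# Hypothetical house styles; a real outlet would need far more than two.
TEMPLATES = {
    "wire": "{agent} {past} {patient}.",
    "passive": "{Patient} was {participle} by {agent}.",
}


def realize(rel, style):
    """Fill the chosen outlet's template with the relation's slots."""
    fields = {
        "agent": rel.agent,
        "past": rel.past,
        "participle": rel.participle,
        "patient": rel.patient,
        # Sentence-initial form of the patient for templates that start with it.
        "Patient": rel.patient[:1].upper() + rel.patient[1:],
    }
    return TEMPLATES[style].format(**fields)
```

Even this toy shows where slant could creep in: choosing the passive template quietly moves the agent to the end of the sentence.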
Editing scripts could decide which details in the story would shine damaging light on that paper's politics, stuff them in the 37th paragraph that no one reads, and write a potentially misleading headline that allows a reading with exactly the slant the paper wants to give the story. And DONE: they've printed the ostensible truth, but since few people are going to read the whole article, they've done their job and done it well.
"Wait a minute, isn't that what happens now anyway?" Maybe, but now papers can save that much more on spin-sters' salaries. And then there'd be yet more English majors who can't find a job. Go capitalism, yay. *shudder*
*It's "who", not "whom". No one has said "whom" in English for a century or so, and those who do say it only do so because they 'think' it's correct. Any time I hear someone use it for real, I shudder to think that they're so neurotic about their grammar that they use a form they've been told is right but have never actually heard themselves. None of my linguistics profs ever used "whom", EVER. I think they privately hate the word.
P.S. This entire post have been wrote by a really good scripts.
Intelligent Design: because MATH is HARD.
An article posted here before already told us all this; the paper it was based on was mentioned in the comments, and this one is another in a series of papers by the same researcher.
OK, nothing else to see here, move on to the next redundant post (Is that paraphrasing 'dupe'?)
Of course the time will come when machines summarize articles, and I believe I have seen this tried already, with mixed success. It would be kind of neat to see /. run both a summary engine and a paraphrase engine on submitted articles. Then we could have three article descriptions: the poster's description, a machine summary of the same article, and a machine paraphrase of the original poster's summary.
Would be nice to be able to summarize + paraphrase large articles and documents. Not all of us have the necessary time to read 20+ page documents.
:)
It won't replace original works, but it could help reduce a lot of the extraneous data on the web.
- Dan
To wit, there are attributes of register, tone, and modality that can be applied not just to individual sentences but to entire pieces of text, and that may be able to indicate a piece's slant, political tone, reading level, and (ahem) ability to incite readers to flame.
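Reading level, at least, is easy to estimate mechanically. Below is a sketch of the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), where higher scores mean easier text. The syllable counter is a rough vowel-group heuristic (an assumption, not part of the formula), so scores will be approximate.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups, drop a silent final 'e', minimum one."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Tone and slant are much harder than this, since they depend on meaning rather than on counts of words and syllables.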
Some of the decision making processes you're talking about that go on during editing and truth judgments admittedly will probably not be computerized. But some of them can.
The point of the responses here is not to relegate journalism or wordsmithy (as it were) to the level of manual labor, in the way that manual labor has been replaced by machines. But the truth is that machines are more complex now, and they're ready to take on more complex tasks. Some things about language are very much NOT a mystery. Neither is code.
Obviously this is a developing field. The best models seem to use phrases from the original text; anyway, the Mac OS X example above shows that it is useful to users willing to take it with a massive grain of salt, even if we are not into full computational sentience yet.
When it works even a little better, it will replace all those awful grade-school teachers who assign paraphrasing as homework. The reporters who might have been replaced by it will have already lost their jobs, except for the ones in AhaIndia of course, who will paraphrase for the rest of us, usually at a marginally better level than the machine.
The research is interesting, and I'd like to understand Barzilay's notation (is that APL, or a calculus of statements?) in the paper (PDF) I found on Google. Also see the papers on her site.
Of course structured text is easier, and news stories are known to have most of the meat in the beginning, but this is great stuff.
One interesting older system is ThoughtTreasure, which was built to understand a story and answer questions about it. The author also did work on news analysis ("NewsForms"). There are tools out there; I've been making a survey of them myself. If anyone has information about practical NLP tools for real-world tasks, please post.
The main problem is that languages, especially English, are so idiomatic that mechanical translators will be at too much of a disadvantage; take the Babelfish translator, for instance.
Furthermore, the English language is so flexible that just about any word can arbitrarily substitute for anything else; for instance, take 'bad' meaning 'good'.
It would be impossible to program a machine to understand the full spectrum of idiomatic phrases, but the future may lie in employing neural net technologies so that computers can do some limited learning.
cogito ergo sig...
I believe this was covered in an earlier Slashdot story about this site: http://www1.cs.columbia.edu/nlp/newsblaster/
Here is a quote from their site:
Columbia Newsblaster is a system to automatically track the day's news. There are no human editors involved -- everything you see on the main page is generated automatically, drawing on the sources listed on the left side of the screen.
Every night, the system crawls a series of Web sites, downloads articles, groups them together into "clusters" about the same topic, and summarizes each cluster. The end result is a Web page that gives you a sense of what the major stories of the day are, so you don't have to visit the pages of dozens of publications.
Newsblaster is an academic project from the Natural Language Processing group at Columbia University's Department of Computer Science. It is designed to demonstrate the Group's technologies for multidocument summarization, clustering, and text categorization, among others. It is funded under DARPA TIDES and KDD and has been operational online since September 2001.
Current and future enhancements include international perspectives, multilingual capability, and tracking events across days.
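The clustering step described in the quote can be sketched with bag-of-words vectors and cosine similarity. This greedy single-pass clusterer is an assumption for illustration only; the quote doesn't say what algorithm Newsblaster uses, and the 0.3 threshold is arbitrary.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (no stemming or stopword removal)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(articles, threshold=0.3):
    """Greedy single pass: each article joins the most similar existing
    cluster (compared against that cluster's first article) if the
    similarity reaches the threshold, otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of (text, vector) pairs
    for text in articles:
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c[0][1])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([(text, vec)])
        else:
            best.append((text, vec))
    return [[text for text, _ in c] for c in clusters]
```

Each resulting cluster could then be handed to a summarizer, which is the overall shape of the pipeline the quote describes.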
From the article:
The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate professor of computer science at Cornell University, said that while the program would not yield paraphrases as zany as those in the Monty Python sketch, it is fairly adept at rewording the flat cadences of news service prose.
Two women came up with this! Why doesn't it surprise me in the least that women are officially researching ways to automate the process of saying the exact same thing in an infinite number of different ways?
LK
"Hi. This is my friend, Jack Shit, and you don't know him." - Lord Kano
...when we can replace upper level management with small shell scripts.