Slashdot Mirror


Computers Paraphrase English

AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.

31 of 212 comments

  1. hmm... sounds familiar... by Dorothy+86 · · Score: 4, Funny
    This technology, destined to eventually replace most reporters with very small shell scripts

    This shirt?

  2. Automated slashdot? by TwistedSquare · · Score: 3, Funny
    This technology... thankfully still in its infancy.

    So one day instead of complaining against michael and co., everyone will be moaning about someone else's code - seems more appropriate for a nerd site somehow ;)

    1. Re:Automated slashdot? by Steve+Franklin · · Score: 3, Insightful

      "It might even know the proper use of to/too and your/you're."

      Yeah, but can it manage to use "There are" instead of "There is" with a plural subject?

      Actually, the long-known solution to most of these *oh so difficult* translation problems is to translate everything into a neutral interlanguage like Interlingua and then translate that into other languages, sending the Interlingua version along for the ride, thus preventing degradation in further translations. Then all that local linguists have to concentrate on is ONE set of problems: translating their local language into and out of Interlingua. Interlingua, being tightly defined, is much easier to machine translate into and out of other languages. So all this lunacy of trying to machine translate Chinese into English, German, Hungarian, Estonian... (you get the picture) is an incredible waste of time and resources and isn't the best way to solve the problem.
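
      To put rough numbers on the pivot-language argument, here is a back-of-the-envelope sketch. It assumes each translator is one-directional (A to B is a different program from B to A) and that the pivot language is not counted among the N languages; both function names are made up for illustration.

```python
def direct_translators(n_languages):
    """One translator per ordered language pair (A -> B differs from B -> A)."""
    return n_languages * (n_languages - 1)


def pivot_translators(n_languages):
    """One translator into the pivot and one out of it, per language."""
    return 2 * n_languages


for n in (3, 5, 20):
    print(n, direct_translators(n), pivot_translators(n))
```

      For 20 languages that works out to 380 direct translators against 40 through a pivot. The count says nothing about translation quality, which is the harder part of the argument.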

      --
      Hic iacet Arthurus, rex quondam rexque futurus.
  3. fox_news.sh by sinclair44 · · Score: 5, Funny

    #!/bin/sh
    curl "$1" | paraphrase | slant -patriotic -stupid > fox_news_story.txt

    --
    Omnes stulti sunt.
    1. Re:fox_news.sh by drakaan · · Score: 4, Funny
      perl makestory.pl -slant "liberal dem party-line" -severity "raving" -subject "Cheney Halliburton motives"

      Fair is fair ;)

      --
      "Murphy was an optimist" - O'Toole's commentary on Murphy's Law
    2. Re:fox_news.sh by drakaan · · Score: 3, Insightful

      Well, some of us (me, for instance) listen to Fox News and NPR (my own personal take on fair and balanced) and see that the party line is alive and well on both major sides of the political fence. That's part of the reason I'll never be a Democrat or a Republican (or a Libertarian, or any other label you want to stick on a like-minded group of people). The news has information in it. Look for it, compare notes, and make up your own mind what's news.

      --
      "Murphy was an optimist" - O'Toole's commentary on Murphy's Law
    3. Re:fox_news.sh by niom · · Score: 4, Funny
      Fair is fair ;)

      Except when immediately followed by "and balanced".

      --
      -- Repeat with me: "There is no right to profits".
  4. Very small shell scripts by bunnyman · · Score: 5, Funny

    Yes, but until it can post duplicate articles with slightly different phrases, it will never replace CowboyNeal!

    1. Re:Very small shell scripts by EvilTwinSkippy · · Score: 3, Funny

      Yes, but a system will not take the place of CowboyNeal until it posts duplicate articles with slighty different phrases!

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    2. Re:Very small shell scripts by matth · · Score: 3, Funny

      After taking the place of CowboyNeal will a system like posting duplicate articles, phrase slightly different? Yes!

    3. Re:Very small shell scripts by EvilTwinSkippy · · Score: 3, Funny

      CowboyNeal's system is posting slighty different phrases. Yes takeing me places!

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  5. But still.... by AgBullet · · Score: 5, Insightful

    won't you need someone to write the stuff to be paraphrased in the first place? Explain to me how that replaces reporters with small shell scripts.

    1. Re:But still.... by EvilTwinSkippy · · Score: 3, Insightful
      There are reporters? Crap, every other article in my local fishwrap is Reuters, the other half is AP. There are one or two articles for local color, generally homicides or documenting yet more ways our local government is a) corrupt, b) inept, and/or c) playing partisan politics with/against the state government.

      By the time it's printed in the "News" it's usually pretty old.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  6. Re:Interesting use of Technology by Tim+C · · Score: 4, Informative

    I've provided search engine functionality to a few sites using Verity's K2 product, which provides a similar piece of functionality. If you (programmatically) ask it to return a summary of each hit, what you get is what it considers to be representative of the document as a whole, not merely the first few lines, or a paragraph, or whatever. It actually works pretty well, but then it should, as (a couple of years ago) it cost almost as much as my house...
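
    For anyone wondering what "representative of the document as a whole" could mean in practice, here is a toy extractive summarizer. It is not how K2 works (I have no idea what it does internally); it is just the classic word-frequency heuristic, and the function name and the 4-letter cutoff used as a crude stop-word filter are my own assumptions.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    # Count words of 4+ letters across the whole document.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(w for w in words if len(w) > 3)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        # Average document frequency of the sentence's words.
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    best = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in best]


text = ("The parrot is dead. The shop owner insists the parrot is resting. "
        "Lunch was fine.")
print(summarize(text))
```

    Sentences full of words that recur across the document score highest, which is why this picks something other than "the first few lines" when the lead is not the most typical sentence.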

  7. The Ultimate Tool For Plagiarism by popo · · Score: 4, Interesting


    All someone has to do now is marry this technology with a term-paper database, and "Hello Original Work!"

    The question will then become, how many different unique "paraphrases" can the system ultimately generate?

    --
    ------ The best brain training is now totally free : )
    1. Re:The Ultimate Tool For Plagiarism by SurgeonGeneral · · Score: 3, Interesting

      Yes, we've all heard the arguments against cheating.

      Especially the, 'you're only cheating yourself' one.

      It's irrelevant because this will not affect the way we cheat so much as the way we learn and the way we write. Think about it beyond your personal experience in high school.

      1. On the micro scale, an autosummarize feature like this will allow someone to take another's essay and put their facts into their own words. But I don't see how this makes any difference to the cheater other than saving him an hour. To see this tech as a problem on this level is to ignore the future.

      2. On the medium scale, it will allow someone to take multiple papers, extrapolate all the facts and their sources and then string them together again with their own interpretation. This will allow the learner to come up with a new argument and possibly a fresh insight based on the available information. In this case, it saves the learner a few hours of reading, though he has to do the same amount of thinking and logical reasoning. Is it a shame that the person doesn't have to waste time reading irrelevant information? Still, looking at it on this level is not thinking very deeply.

      I take history in university and the essays we have to write are done by data mining books. Lots of books. We have to read large amounts of material in as short a time as possible. We have to find out what is important and what is relevant. Am I really learning how to analyze facts? I don't think so. I am learning how to write university papers and theorize based on incomplete information. I am learning how to make a lot of wasted time look like a lot of work.

      3. The macro scale. What if every book ever written was replicated in full electronically and available for parsing? What if I could extrapolate every fact from every source even remotely relevant to a topic? I'm right back to where I was before: hours and hours of reading. Yet my argument will be more solid and my information more complete than it ever could be using the outdated method of data mining: looking in the indexes of books. In this case, what am I learning? I am learning how to think. I am learning how to spot holes, inconsistencies, fallacies, etc. In this case the technology has eliminated cheating altogether because there is no single source to copy from. And if I want to understand how all these facts are related to each other I either have to think about it or read another author's interpretation of it (thus I could still cheat in the classical sense).

      4. But let's look at it on one more level, the very tiniest level and the most futuristic. A well constructed paragraph or sentence can't be parsed down, and wouldn't make sense if it was. The facts contained in a paragraph only become important in relation to one another. So in the end, it could just change the way we write. Enough with this puffed up crap, enough with padding your papers: either state what's important or nothing at all. A well constructed essay in the future will be one that can't be "autosummarized" without losing all its intelligibility.

      --
      -- "Man is born free, and everywhere he is in chains." Jean Jacques Rousseau
  8. Dupe by greenhide · · Score: 5, Informative

    Unfortunately, there isn't yet a way to use computers to detect dupes.

    Or is there?!?

    --
    Karma: Chevy Kavalierma.
  9. School Reports by gregfortune · · Score: 3, Insightful

    So, will there be a difference between paraphrasing and copying now in an educational setting? Seems like this could make a report pretty easy...

    1) Brainstorm some key points/ideas
    2) Have this program data mine for relevant articles online
    3) Feed sections of each article into the program and have a finished paper

    Granted, the tech isn't quite that powerful yet and probably wouldn't do a whole paper, but it sure looks like it could supply several paragraphs of material per page...

  10. Rethink English ! by Thinkit3 · · Score: 4, Informative

    Lojban is among the more interesting newer languages. It can be parsed just like C! Esperanto is somewhat interesting. English will be regarded in the future as a curious artifact: it was swept along with the technology revolution simply because ASCII didn't include accents and extra marks on letters. Eventually we'll get away from vocalization altogether and have purely numerical, written languages.

    Right now, trying to work with English in computers deals way more with the strangeness of the language than the more interesting issues of cognition that lie underneath.

    --
    -Libertarian secular transhumanist
  11. Fake literature by MAPA3M · · Score: 4, Funny

    Isn't this the way those trashy love novels are written?

  12. Or games... by A55M0NKEY · · Score: 3, Funny

    Someone set up us the bomb!

    --

    Eat at Joe's.

  13. Something similar existed on the Amiga by Serk · · Score: 3, Interesting

    Back in the late 1980s I had a word processor for my Amiga that had a function whereby it would do a global search and replace of every Xth word (user settable) with a synonym from the built-in thesaurus... Very handy for those term papers I so hated in high school...

    I'm assuming this (Of course I didn't RTFA) is far more advanced than what we had back then, but the idea for this has been around for quite a while at least...
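
    From memory, the idea boils down to something like the sketch below. The tiny thesaurus, the function name, and the rule of leaving a word alone when no synonym exists are all my assumptions here, not a description of the actual Amiga program.

```python
# Hypothetical stand-in for the word processor's built-in thesaurus.
THESAURUS = {"big": "large", "quick": "fast", "happy": "glad"}


def replace_every_nth(text, n, thesaurus=THESAURUS):
    """Swap every n-th word for a synonym, if the thesaurus has one."""
    out = []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            # Look up the lowercased word; keep the original if unknown.
            out.append(thesaurus.get(word.lower(), word))
        else:
            out.append(word)
    return " ".join(out)


print(replace_every_nth("the big dog is quick and happy", 2))
# -> the large dog is quick and happy
```

    Only positions 2, 4 and 6 are candidates with n=2, so "quick" and "happy" survive untouched. Punctuation and capitalization are ignored to keep it short.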

    --
    Never ask a geek why, just nod your head and slowly back away. -Rob Malda
  14. Do you know what reporters DO? by DavidinAla · · Score: 4, Insightful

    For you to say that this technology will someday replace reporters makes me think that you're clueless about what reporters do. Do you realize that the biggest parts of a reporter's job are gathering facts and making judgments about 1) which stories are worth reporting, 2) which are the relevant facts about a story and 3) who's lying and who's telling the truth about a story? The actual writing that you see is many times almost incidental to most of what a reporter does. You might not like the judgments that a reporter makes (and I could agree with that in many cases), but software can't go out into the world and talk to people and use judgment and intuition to find information to write about.

    As an ex-reporter and editor, I find it laughable that anyone might think this technology will replace reporters. It's sort of like suggesting that machines that can read source code and interpret it can somehow figure out what new software people want and then write it. Both possibilities are equally insane.

  15. Someone must research a story . . . by kfg · · Score: 5, Interesting

    conduct interviews and generate original copy. These people are called reporters.

    The people who take this copy off the wire and paraphrase it for publication in the local paper are called copy writers.

    This software will reduce the number of copy writers needed, not reporters.

    This is certainly an issue to the copy writers and their families, but overall it's really just a blue collar worker being replaced by a robot issue.

    The idea of a 'style dial' I find a bit more disturbing.

    KFG

  16. Generation isn't that easy by Ezubaric · · Score: 4, Insightful

    The poster incorrectly assumes that this could be used to replace reporters. The problem is that computers have a difficult time generating new text. The methods that computers use to evaluate text (as any user of grammar-check would realize) aren't that great.

    In fact, most language models cannot generate even a large portion of English text. Those that do have a good range rarely have good accuracy, because there are many things that we "just don't say that way." This is why when you're talking to a non-native speaker, you often cannot explain why something they said was wrong. This is because there is no real grammar rule against speaking in a given way.

    So if we rule out syntax-based models, that just leaves statistical-based models. I worked in an NLP lab during the summer of 2002, and my prof there said that syntax and statistics are like the two sides of the force. Statistics are quick and easy but are seductive. They corrupt you and leave you unable to really think about the language itself. You only think in terms of bigrams and HMMs.

    So even though these systems are doing well, they are mostly statistical. Thus, it's hard to get incremental improvement. You have to have larger corpora, and larger corpora usually have more errors, thus defeating any advantage you might get by capturing more aspects of a language.
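
    For those who haven't met bigrams, here is a minimal sketch of a maximum-likelihood bigram model. The three-sentence corpus and the function names are invented for illustration, and there is no smoothing, which is exactly what makes the small-corpus problem visible.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the sentence boundaries.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def bigram_prob(counts, prev, cur):
    """Unsmoothed estimate of P(cur | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][cur] / total if total else 0.0


corpus = ["the parrot is dead", "the parrot is no more", "the bird is dead"]
model = train_bigrams(corpus)
print(bigram_prob(model, "the", "parrot"))  # seen twice out of three
print(bigram_prob(model, "the", "more"))    # never seen, so probability 0
```

    Any word pair missing from the corpus gets probability zero, however reasonable it is, so the only way forward with a model like this is more text.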

    In my opinion, only with well-developed language models that can effectively generate NL can we get anywhere. Which is what Barzilay is working on, but it's still a long, long, long way off.

    --

    ----------
    I am an expert in electricity. My father held the chair of applied electricity at the state prison.
  17. The article, summarized by MacOS X by sakusha · · Score: 4, Interesting
    MacOS X has a summarization feature implemented in the Services menu. I decided to summarize the CNet article just to see what I got, and because I like the idea of summarizing an article about summarizing.
    In the famous sketch from the TV show "Monty Python's Flying Circus," the actor John Cleese had many ways of saying a parrot was dead, among them, "This parrot is no more," "He's expired and gone to meet his maker," and "His metabolic processes are now history."
    ...The program gathers text from online news services on specific subjects, learns the characteristic patterns of sentences in these groupings and then uses those patterns to create new sentences that give equivalent information in different words.
    The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate professor of computer science at Cornell University, said that while the program would not yield paraphrases as zany as those in the Monty Python sketch, it is fairly adept at rewording the flat cadences of news service prose.
  18. Hardly news... by JayJay.br · · Score: 4, Informative

    This article posted before already tells us all this, the paper that originated it was mentioned in the comments, and this one is another of a series of papers by this researcher.

    OK, nothing else to see here, move on to the next redundant post (Is that paraphrasing 'dupe'?)

  19. Bring on the Machines by DumbSwede · · Score: 4, Interesting
    I don't think many people read the article. While Michael suggests this could replace reporters, it is not about summarizing a whole article, but merely paraphrasing individual sentences and elements. This would be useful for checking for plagiarism where one author has merely line by line paraphrased another. Another useful area is in language translation, where the paraphrasing may make the translation more understandable. I don't think today's translation programs allow you to say the same thing two or three times, but repeat it back differently (paraphrase) if not understood by your listener the first time.

    Of course the time will come when machines summarize articles, and I believe I have seen where this has already been tried with mixed success. It would be kind of neat to see /. use both a summary engine and a paraphrase engine on submitted articles. Then we could have 3 article descriptions: the poster's description; a machine summary of the same article; and a machine paraphrase of the original poster's summary.

  20. Typical /. story.. maybe they need the engine? by mattr · · Score: 4, Insightful
    Slashdot needs to implement another new editorial policy: if you have nothing intelligent or really funny/biting to say, don't! An interesting topic with another half-assed presentation.

    Obviously this is a developing field. The best models seem to use phrases from the original text. Anyway, the Mac OS X example above shows that it is useful to users willing to take it with a massive grain of salt, even if we are not into full computational sentience yet.

    When it works even a little better it will replace all those awful grade school teachers who assign paraphrasing as a homework assignment. The reporters who might have been replaced by it will have already lost their jobs, except for the ones in AhaIndia of course who will paraphrase for the rest of us, usually at a marginally better level than the machine.

    The research is interesting, and I'd like to understand Barzilay's notation (is that APL or calculus of statement?) in the paper (pdf) I found on Google. Also see the papers on her site.

    Of course structured text is easier, and news stories are known to have most of the meat in the beginning, but this is great stuff.

    One interesting older system is ThoughtTreasure, which was built to understand a story and answer questions about it. The author also did work on news analysis ("NewsForms"). There are tools out there, and I've been making a survey myself. If anyone has information about practical NLP tools for real-world tasks, please post.

  21. It's unlikely to catch on... by ChunKing · · Score: 3, Insightful

    The main problem is that languages, especially English, are so idiomatic that mechanical translators will be at too much of a disadvantage - take the Babelfish translator for instance.

    Furthermore, the English language is so flexible that just about any word can arbitrarily substitute for anything else - for instance, take 'bad' meaning 'good'.

    It would be impossible to program a machine to be able to understand the full spectrum of idiomatic phrases, but the future may lie in employing neural net technologies so that computers can do some limited learning.

    --
    cogito ergo sig...
  22. Columbia News Blaster by Richard+Allen · · Score: 3, Informative

    I believe this was covered in a related Slashdot story before, regarding this site: http://www1.cs.columbia.edu/nlp/newsblaster/

    Here is a quote from their site:
    Columbia Newsblaster is a system to automatically track the day's news. There are no human editors involved -- everything you see on the main page is generated automatically, drawing on the sources listed on the left side of the screen.

    Every night, the system crawls a series of Web sites, downloads articles, groups them together into "clusters" about the same topic, and summarizes each cluster. The end result is a Web page that gives you a sense of what the major stories of the day are, so you don't have to visit the pages of dozens of publications.

    Newsblaster is an academic project from the Natural Language Processing group at Columbia University's Department of Computer Science. It is designed to demonstrate the Group's technologies for multidocument summarization, clustering, and text categorization, among others. It is funded under DARPA TIDES and KDD and has been operational online since September 2001.

    Current and future enhancements include international perspectives, multilingual capability, and tracking events across days.
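
    To give a feel for the "groups them together into clusters" step in the quote, here is a rough sketch of one simple way to do it: greedy clustering on word overlap (Jaccard similarity). This is only an illustration and is not claimed to be Newsblaster's actual method; the function names, the 0.3 threshold, and the headlines are all made up.

```python
def tokens(text):
    """Lowercased words of 4+ letters, with surrounding punctuation stripped."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {w for w in words if len(w) > 3}


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(headlines, threshold=0.3):
    """Put each headline in the first cluster it overlaps enough with."""
    clusters = []  # each entry: {"words": set, "items": list}
    for headline in headlines:
        words = tokens(headline)
        for c in clusters:
            if jaccard(words, c["words"]) >= threshold:
                c["items"].append(headline)
                c["words"].update(words)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"words": set(words), "items": [headline]})
    return [c["items"] for c in clusters]


news = [
    "Parrot found dead in pet shop",
    "Pet shop parrot found dead, owner says",
    "Stock markets rally on tech earnings",
]
print(cluster(news))
```

    The two parrot headlines share four of six distinct long words and land together; the markets story shares none and starts its own cluster.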