Slashdot Mirror


Computers Paraphrase English

AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.

14 of 212 comments

  1. The Ultimate Tool For Plagiarism by popo · · Score: 4, Interesting


    All someone has to do now is marry this technology with a term-paper database, and "Hello Original Work!"

    The question will then become, how many different unique "paraphrases" can the system ultimately generate?

    --
    ------ The best brain training is now totally free : )
    1. Re:The Ultimate Tool For Plagiarism by EvilTwinSkippy · · Score: 2, Interesting
      Actually you can use topic maps to decompose a body of work into individual statements and then use a set of randomly generated "flavors" to re-constitute the facts into an original work. The rules about what goes where are pretty cut and dried.
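A minimal sketch of the decompose-and-reconstitute idea described above, in Python. It does not use real topic maps: the facts are assumed to be already broken into (subject, action, object) statements, and `FACTS`, `FLAVORS`, and `reconstitute` are hypothetical names made up for illustration.

```python
import random

# Hypothetical facts, assumed to be already decomposed into
# (subject, action, object) statements.
FACTS = [
    ("the program", "rewrites", "wire copy"),
    ("the student", "submitted", "the paper"),
]

# Hypothetical "flavors": templates that each express the same statement.
FLAVORS = [
    "{subject} {action} {object}.",
    "It is {subject} that {action} {object}.",
    "{object} is what {subject} {action}.",
]

def reconstitute(facts, flavors, seed=None):
    """Render each fact with a randomly chosen flavor and join the sentences."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    sentences = []
    for subject, action, obj in facts:
        template = rng.choice(flavors)
        sentence = template.format(subject=subject, action=action, object=obj)
        # Capitalize only the first character so the rest stays untouched.
        sentences.append(sentence[0].upper() + sentence[1:])
    return " ".join(sentences)

print(reconstitute(FACTS, FLAVORS, seed=1))
```

With three flavors and independent choices per fact, this toy version can produce at most 3 ** len(FACTS) distinct renderings, which is one way to think about popo's question of how many unique paraphrases a system can generate.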

      More stuff to help people avoid shitwork, only for humanity to discover our purpose in life IS to do shitwork.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
    2. Re:The Ultimate Tool For Plagiarism by KrispyKringle · · Score: 2, Interesting
      This isn't necessarily the big problem it appears. I've heard of many college professors and high school teachers using automated plagiarism detectors in the news, and that strikes me as stupid, as well. I mean, if a student has to write a paper on _The Bell Jar_, I'm sure he can find one online. But in most classes, you expect some level of familiarity with the students, on the part of the teacher. If a kid who sleeps in every class and whose comments tend to be off topic or stupid turns in a paper worthy of The Atlantic Monthly, the teacher ought to realize something is up. Sure, it may not be absolute proof of wrongdoing, but it warrants a talk with the student about his erratic performance.

      College courses might be a bit tougher; there are certainly plenty in which the course is simply too large for the professor to know all the students, but in most courses, the subject matter is novel enough that finding a paper online that's relevant should be pretty difficult.

      I went to a high school with quite a lot of cheating (probably at least half the students engaged in it occasionally or more), and it really did get me. The co-valedictorian was this fat bitch who cheated on a regular basis (and even had been caught at it). And even in college I've seen some things that were borderline or worse. But there are better answers to this than ``let's do everything we can to stop cheaters.''

      First, cheating is symptomatic of misplaced priorities and pressures. The students who cheated the most were the ones who didn't really understand why they should go to school.

      Second, as trite as it sounds, you really are only cheating yourself. The kids who cheated the most in my high school didn't get very far (save perhaps for the co-valedictorian). I never cheated, and (not to toot my own horn, of course) I was the other co-valedictorian, I went to the prestigious school, I had the career opportunities, etc. The thing that always struck me as funny was that most of the kids who cheated didn't do very well anyway.

      And finally, even if some people cheat and get good grades, does it really matter? Your grades aren't relative to others, they are your own. Sure, colleges look at what percentile you are in, but I don't think cheating ever helped anyone that much to begin with. And grades themselves, cheating or no, are pretty meaningless; grade inflation and average GPAs vary enough from school to school as to be useless as objective indicators anyway. You hope colleges can see a bit more into the personality of their applicants than simply the GPA (and if they can't, it's the admissions system, not cheating, that's at fault).

      I guess I'm a bit offtopic now. Ah, well.

    3. Re:The Ultimate Tool For Plagiarism by SurgeonGeneral · · Score: 3, Interesting

      Yes, we've all heard the arguments against cheating.

      Especially the, 'you're only cheating yourself' one.

      It's irrelevant because this will not affect the way we cheat so much as the way we learn and the way we write. Think about it beyond your personal experience in high school.

      1. On the micro scale, an autosummarize feature like this will allow someone to take another's essay and put their facts into their own words. But I don't see how this makes any difference to the cheater other than saving him an hour. To see this tech as a problem on this level is to ignore the future.

      2. On the medium scale, it will allow someone to take multiple papers, extrapolate all the facts and their sources and then string them together again with their own interpretation. This will allow the learner to come up with a new argument and possibly a fresh insight based on the available information. In this case, it saves the learner a few hours of reading, though he has to do the same amount of thinking and logical reasoning. Is it a shame that the person doesn't have to waste time reading irrelevant information? Still, looking at it on this level is not thinking very deeply.

      I take history in university and the essays we have to write are done by data mining books. Lots of books. We have to read large amounts of material in as short a time as possible. We have to find out what is important and what is relevant. Am I really learning how to analyze facts? I don't think so. I am learning how to write university papers and theorize based on incomplete information. I am learning how to make a lot of wasted time look like a lot of work.

      3. The macro scale. What if every book ever written was replicated in full electronically and available for parsing? What if I could extrapolate every fact from every source even remotely relevant to a topic? I'm right back to where I was before: hours and hours of reading. Yet, my argument will be more solid and my information more complete than it ever could be using the outdated method of data mining: looking in the indexes of books. In this case, what am I learning? I am learning how to think. I am learning how to spot holes, inconsistencies, fallacies, etc. In this case the technology has eliminated cheating altogether because there is no single source to copy from. And if I want to understand how all these facts are related to each other I either have to think about it or read another author's interpretation of it (thus I could still cheat in the classical sense).

      4. But let's look at it on one more level, the very tiniest level and the most futuristic. A well constructed paragraph or sentence can't be parsed down, and wouldn't make sense if it was. The facts contained in a paragraph only become important in relation to one another. So in the end, it could just change the way we write. Enough with this puffed up crap, enough with padding your papers - either state what's important or nothing at all. A well constructed essay in the future will be one that can't be "autosummarized" without losing all its intelligibility.

      --
      -- "Man is born free, and everywhere he is in chains." Jean Jacques Rousseau
  2. Something similar existed on the Amiga by Serk · · Score: 3, Interesting

    Back in the late 1980s I had a word processor for my Amiga that had a function whereby it would do a global search and replace of every Xth word (user-settable) with a synonym from the built-in thesaurus... Very handy for those term papers I so hated in high school...

    I'm assuming this (Of course I didn't RTFA) is far more advanced than what we had back then, but the idea for this has been around for quite a while at least...
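The every-Xth-word synonym swap described above is simple enough to sketch in a few lines of Python. This is not the Amiga program's actual behavior, only a guess at the idea; the tiny `THESAURUS` dictionary and the `swap_every_nth` function are hypothetical.

```python
# Hypothetical miniature thesaurus; a real one would be far larger.
THESAURUS = {
    "big": "large",
    "show": "demonstrate",
    "use": "employ",
    "idea": "notion",
}

def swap_every_nth(text, n, thesaurus=THESAURUS):
    """Replace every n-th word with a synonym when the thesaurus has one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()  # note: runs of whitespace collapse to single spaces
    for i in range(n - 1, len(words), n):
        core = words[i].rstrip(".,;:!?")   # word without trailing punctuation
        trailing = words[i][len(core):]    # punctuation to put back afterwards
        synonym = thesaurus.get(core.lower())
        if synonym is not None:
            # Simplification: the synonym's own casing is used as-is.
            words[i] = synonym + trailing
    return " ".join(words)

print(swap_every_nth("we use a big idea", 2))  # → we employ a large idea
```

Words with no thesaurus entry are left alone, so a large X on a short paper may change almost nothing.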

    --
    Never ask a geek why, just nod your head and slowly back away. -Rob Malda
  3. Someone must research a story . . . by kfg · · Score: 5, Interesting

    conduct interviews and generate original copy. These people are called reporters.

    The people who take this copy off the wire and paraphrase it for publication in the local paper are called copy writers.

    This software will reduce the number of copy writers needed, not reporters.

    This is certainly an issue to the copy writers and their families, but overall it's really just a blue collar worker being replaced by a robot issue.

    The idea of a 'style dial' I find a bit more disturbing.

    KFG

  4. The article, summarized by MacOS X by sakusha · · Score: 4, Interesting
    MacOS X has a summarization feature implemented in the Services menu. I decided to summarize the CNet article just to see what I got, and because I like the idea of summarizing an article about summarizing.
    In the famous sketch from the TV show "Monty Python's Flying Circus," the actor John Cleese had many ways of saying a parrot was dead, among them, "This parrot is no more," "He's expired and gone to meet his maker," and "His metabolic processes are now history."
    ...The program gathers text from online news services on specific subjects, learns the characteristic patterns of sentences in these groupings and then uses those patterns to create new sentences that give equivalent information in different words.
    The researchers, Regina Barzilay, an assistant professor in the department of electrical engineering and computer science at the Massachusetts Institute of Technology, and Lillian Lee, an associate professor of computer science at Cornell University, said that while the program would not yield paraphrases as zany as those in the Monty Python sketch, it is fairly adept at rewording the flat cadences of news service prose.
  5. Re:School Reports by roninmagus · · Score: 2, Interesting

    I do very much hope so; as a computer science major who hhaaatteess general studies classes, I hope very much that the English/History classes which so graciously waste my programming time with useless writings go down the drain. Of course, my website is entirely such useless writings, so I stand trumped.

    However, I did meet my girlfriend and hopefully future wife in Sophomore English at MTSU. Go figure.

  6. yr comment's a journalism integrity question... by geekpuppySEA · · Score: 2, Interesting
    ...not nec a problem to be solved by the code. Which BTW probably are a leetle more complex than small shell scripts, and see a good textbook like Jurafsky and Martin (pub 2000) for why.

    Re journalistic integrity - There's the possibility that a single entity could issue the release to the wire services, they could release it in some kind of 'compiled' form (where it's just the syntax/semantic relations.) (How this could be different from how releases are issued now is a good question, but I guess there'd have to be reporters on hand to inquire about details... so maybe journalism might be saved after all... but not if templates for information were used, and the templates themselves needed to fill in the missing gaps...)

    You could imagine how each news outlet could receive the release, and use their own reconstructive code to flesh out the [NP][VP][NP] ("who did what to who"* scenario) and then write their own story from that.

    Editing scripts could decide what in the story would be details that would shine damaging light on that paper's politics, and then stuff those details in the 37th paragraph that no one reads, write a potentially-misleading headline that would allow for a reading that would tell its readers the exact slant they want to give the story, and DONE - they've printed the ostensible truth, but since few people are going to read the article, they've done their job and done it well.

    "Wait a minute, isn't that what happens now anyway?" Maybe, but now papers can save that much more on spin-sters' salaries. And then there'd be yet more English majors who can't find a job. Go capitalism, yay. *shudder*

    *it's who. not whom. No one has said whom in English for a century or so, and then only because they 'think' it's correct. Anytime I hear someone saying it for real, I shudder to think that they're so neurotic about their grammar that they use something they've been told is right but have never really heard themselves. None of my linguistics profs ever used "whom", EVER. I think they privately hate the word.

    P.S. This entire post have been wrote by a really good scripts.

    --
    Intelligent Design: because MATH is HARD.
  7. Re:fox_news.sh by EvilTwinSkippy · · Score: 2, Interesting
    I'm a big fan of completely blocking out the major news outlets and simply investigating matters on my own. I take a mental highlighter to the actual facts as stated in an article, and disregard the interpretation.

    I have discovered there are very few people actually collecting news. In many cases I boil a dozen or so stories down to a single quote from the same source, or even funnier, one reporter's misinterpretation of another reporter's work. My favorite is when an American reporter writes that "the bomb was detonated from approx 330 feet away." They ripped off someone else's estimate of "about 100 meters."
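The arithmetic behind the 330-foot example can be checked directly. A short Python sketch, where `meters_to_feet` is a hypothetical helper and the only hard fact used is that the international foot is defined as exactly 0.3048 meters:

```python
METERS_PER_FOOT = 0.3048  # exact, by definition of the international foot

def meters_to_feet(meters):
    """Convert a length in meters to feet."""
    return meters / METERS_PER_FOOT

feet = meters_to_feet(100)
print(round(feet, 2))   # → 328.08
print(round(feet, -1))  # → 330.0 (rounded to the nearest ten feet)
```

So "approx 330 feet" is just "about 100 meters" converted and rounded to the nearest ten, which makes a rough estimate look more precise than it was.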

    You are correct though, anyone who takes what they see or hear at face value is a fool. Regardless of the source.

    --
    "Learning is not compulsory... neither is survival."
    --Dr.W.Edwards Deming
  8. Bring on the Machines by DumbSwede · · Score: 4, Interesting
    I don't think many people read the article. While Michael suggests this could replace reporters, it is not about summarizing a whole article, but merely paraphrasing individual sentences and elements. This would be useful for checking for plagiarism where one author has merely line by line paraphrased another. Another useful area is in language translation, where the paraphrasing may make the translation more understandable. I don't think today's translation programs allow you to say the same thing two or three times, but repeat it back differently (paraphrase) if not understood by your listener the first time.

    Of course the time will come when machines summarize articles, and I believe I have seen where this has already been tried with mixed success. It would be kind of neat to see /. use both a summary engine and a paraphrase engine on submitted articles. Then we could have 3 article descriptions: the poster's description; a machine summary of the same article; and a machine paraphrase of the original poster's summary.

  9. Re:Rethink English ! by Just+Some+Guy · · Score: 2, Interesting
    Right now, trying to work with English in computers deals way more with the strangeness of the language than the more interesting issues of cognition that lie underneath.

    That's true. Computer languages that don't stick close to "regular" human expression are very popular and growing quickly. Languages that resemble written English are dwindling rapidly.

    After all, code is meant to be written, not read, and programmers should strive to write such that their work can't be understood by anyone not an expert in the language they're using.

    Put another way: as long as I have to fix other people's code, or I want my boss to be able to read my code without me spending an afternoon explaining it to him, I really hope it doesn't look like a string of line noise. English-like constructs may be distracting for some, but they're pretty handy for the rest of us.

    --
    Dewey, what part of this looks like authorities should be involved?
  10. Re:Interesting use of Technology by znu · · Score: 2, Interesting

    Mac OS X users can select text and choose 'Summarize' from the Services menu in any Cocoa or Services-enabled Carbon application. Summarization is also available to any application programmatically through the Find By Content API.

    --
    This space unintentionally left unblank.
  11. Re:Do you know what reporters DO? by shaitand · · Score: 2, Interesting

    "which stories are worth reporting"

    With this technology, ALL of the stories could be reported.

    "which are the relevant facts about a story"

    odd, I myself get very pissed about reporters who don't give ALL the facts. If you mean summarizing, that is EXACTLY what this is supposed to do.

    "who's lying and who's telling the truth about a story"

    That's for the reader to decide. A reporter who makes judgements concerning what they are reporting and expresses their view of the subject is a bad one. At least in terms of news, a review of course is another matter since that is its entire purpose.

    "You might not like the judgments that a reporter makes"

    A reporter shouldn't be making judgements, this is constant, most reporters do and slant the news toward what THEY believe is the truth, letting their own opinion of the matter interfere with the information they provide me to use to form MY opinion. A reporter should be a fact gatherer and a writer, nothing more. Gather the facts, put as much information about the subject as possible down in as concise a manner as possible SO THAT I THE READER can decide what it means, who is telling the truth and whether or not it's interesting.