Slashdot Mirror


Computers Paraphrase English

AhaIndia submits a link to a story discussing computerized paraphrasing of English news articles. This technology, destined to eventually replace most reporters with very small shell scripts, is thankfully still in its infancy.

16 of 212 comments

  1. But still.... by AgBullet · · Score: 5, Insightful

    won't you need someone to write the stuff to be paraphrased in the first place?? explain to me how that replaces reporters with small shell scripts.

    1. Re:But still.... by EvilTwinSkippy · · Score: 3, Insightful
      There are reporters? Crap, every other article in my local fishwrap is Reuters, the other half is AP. There are one or two articles for local color, generally homicides or documenting yet more ways our local government is a) corrupt, b) inept, and/or c) playing partisan politics with/against the state government.

      By the time it's printed in the "News" it's usually pretty old.

      --
      "Learning is not compulsory... neither is survival."
      --Dr.W.Edwards Deming
  2. School Reports by gregfortune · · Score: 3, Insightful

    So, will there be difference between paraphrasing and copying now in an educational setting? Seems like this could make a report pretty easy...

    1) Brainstorm some key points/ideas
    2) Have this program data mine for relevant articles online
    3) Feed sections of each article into the program and have a finished paper

    Granted, the tech isn't quite that powerful yet and probably wouldn't do a whole paper, but it sure looks like it could supply several paragraphs of material per page...

  3. Replace reporters?? by dyj · · Score: 2, Insightful

    How is this going to replace reporters? Reporters don't just paraphrase other reports. They actually are supposed to search for stories (hopefully factual!) on their own.

  4. Do you know what reporters DO? by DavidinAla · · Score: 4, Insightful

    For you to say that this technology will someday replace reporters makes me think that you're clueless about what reporters do. Do you realize that the biggest parts of a reporter's job are gathering facts and making judgments about 1) which stories are worth reporting, 2) which are the relevant facts about a story and 3) who's lying and who's telling the truth about a story? The actual writing that you see is many times almost incidental to most of what a reporter does. You might not like the judgments that a reporter makes (and I could agree with that in many cases), but software can't go out into the world and talk to people and use judgment and intuition to find information to write about.

    As an ex-reporter and editor, I find it laughable that anyone might think this technology will replace reporters. It's sort of like suggesting that machines that can read source code and interpret it can somehow figure out what new software people want and then write it. Both possibilities are equally insane.

    1. Re:Do you know what reporters DO? by DavidinAla · · Score: 2, Insightful

      I'm sorry, but you're SO ignorant about the way the process works that I can't begin to correct all of your misunderstandings. If you really and truly believe that it's even possible to give readers ALL of the available information every single day, you're completely unaware of how much information is out there.

      Do you want to report what is on the menu at every restaurant in town every day? What about an attendance list of who made it to school at every school in town? What about the results of every medical test at every medical provider in town? What? You say those things aren't news under normal circumstances? Well, you've just made a judgment about what should be reported. At the most elementary level -- and as simplified absurd examples -- that's the first step of what a reporter learns to do.

      There is far, far, far too much information to do what you propose. The reporter is a "gatekeeper of information," whether he likes it or not. Someone has to decide what's news and what is going to be included and what is going to be cut. SOMEONE MUST MAKE THOSE DECISIONS. Reporters and editors do it every day. You might not always agree with their decisions -- and I'm a huge critic of many decisions made in news organizations today -- but to say that nobody should be making those decisions betrays a lack of understanding of the volume of information available.

  5. Generation isn't that easy by Ezubaric · · Score: 4, Insightful

    The poster incorrectly assumes that this could be used to replace reporters. The problem is that computers have a difficult time generating new text. The methods that computers use to evaluate text (as any user of grammar-check would realize) aren't that great.

    In fact, most language models cannot generate even a large portion of English text. Those that do have a good range rarely have good accuracy, because there are many things that we "just don't say that way." This is why when you're talking to a non-native speaker, you often cannot explain why something they said was wrong. This is because there is no real grammar rule against speaking in a given way.

    So if we rule out syntax-based models, that just leaves statistical-based models. I worked in a NLP lab during the summer of 2002, and my prof there said that syntax and statistics are like the two sides of the force. Statistics are quick and easy but are seductive. They corrupt you and leave you unable to really think about the language itself. You only think in terms of bigrams and HMMs.

    So even though these systems are doing well, they are mostly statistical. Thus, it's hard to get incremental improvement. You have to have larger corpora, and larger corpora usually have more errors, thus defeating any advantage you might get by capturing more aspects of a language.
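
    To make the bigram point concrete, here is a minimal sketch of a bigram model, assuming nothing about Barzilay's system or any real NLP toolkit; the function names and the two-sentence toy corpus are made up for illustration. It shows why such a model wants ever larger corpora: any word pair it has not seen gets probability zero, however natural the pair sounds.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are hypothetical start/end markers for this sketch.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def bigram_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


# Toy corpus, invented for illustration only.
corpus = [
    "the reporter wrote the story",
    "the editor cut the story",
]
model = train_bigrams(corpus)

print(bigram_prob(model, "the", "story"))     # seen twice out of four "the" contexts
print(bigram_prob(model, "editor", "wrote"))  # never seen, so the estimate is zero
```

    "The editor wrote" is perfectly good English, yet the model rules it out simply because the pair never appeared in the data.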

    In my opinion, only with well-developed language models that can effectively generate NL can we get anywhere. Which is what Barzilay is working on, but it's still a long, long, long way off.

    --

    ----------
    I am an expert in electricity. My father held the chair of applied electricity at the state prison.
  6. Re:fox_news.sh by drakaan · · Score: 3, Insightful

    Well, some of us (me, for instance) listen to fox news and NPR...my own personal take on fair and balanced...and see that the party line is alive and well on both major sides of the political fence. That's part of the reason I'll never be a democrat or a republican (or a libertarian, or any other label you want to stick on a like-minded group of people). The news has information in it. Look for it, compare notes, and make up your own mind what's news.

    --
    "Murphy was an optimist" - O'Toole's commentary on Murphy's Law
  7. Re:Rethink English ! by TwistedSquare · · Score: 2, Insightful
    English will be regarded in the future as a curious artifact

    One man's informative is another man's troll... Esperanto was interesting and look where it got. Nowhere. People will speak in what's easiest. English is becoming a de facto standard that will continue to be the most spoken language in the world. People won't use odd designed languages because it will be harder than current languages, which got where they are today through iterative refinement to be the best suited language for us to communicate in.

  8. Paraphrase by JediDan · · Score: 2, Insightful

    Would be nice to be able to summarize + paraphrase large articles and documents. Not all of us have the necessary time to read 20+ page documents.

    It won't replace original works, but it could help reduce a lot of extraneous data on the web :)

    --
    - Dan
  9. Maybe you're not sure what linguists do... by geekpuppySEA · · Score: 2, Insightful
    Hey, don't troll this stuff out quite yet - sure it's future ware right now, but think ahead, and ... more to the point, read some about it. There's more to language and computational linguistics than you might think. Just because your (former) line of work stands to be partially replaced doesn't mean that the technology is insane.

    to wit, there are attributes of register, tone, and modality that can be applied not just to individual sentences, but to entire pieces of text that may be able to indicate a piece's slant, political tone, reading level, and (ahem) ability to incite readers to flame.

    Some of the decision making processes you're talking about that go on during editing and truth judgments admittedly will probably not be computerized. But some of them can.

    The point of the responses here is not to relegate journalism or wordsmithy (as it were) to the level of manual labor, as manual labor has been replaced by machines. But the truth is that machines are more complex now and they're ready to take on more complex tasks. Some things about language are very much NOT a mystery. Code isn't either.

    --
    Intelligent Design: because MATH is HARD.
    1. Re:Maybe you're not sure what linguists do... by DavidinAla · · Score: 2, Insightful

      Maybe you're not clear about the difference between a reporter and an editor.

      It's theoretically possible that an editor could be replaced in some instances by software, but not the reporter. The reporter doesn't have anything to start with -- no sentences for software to analyze. A reporter normally starts with some vague thing like a source in the city clerk's office telling him that some bogus expenditures are being put into the sanitation department budget for next year, but nobody really knows what's going on. It's about rumor and bits and pieces of evidence picked up almost from the wind. The reporter has to follow up on lots and lots of little wisps of nothing and figure out which ones are worth checking out and maybe writing about.

      Software cannot do that. Until there is really perfect AI software -- which I think is so unlikely as to preclude reasonable speculation for the purpose of this conversation -- reporters won't be replaced by software.

  10. Typical /. story.. maybe they need the engine? by mattr · · Score: 4, Insightful
    Slashdot needs to implement another new editorial policy: if you have nothing intelligent or really funny/biting to say, don't! An interesting topic with another half-assed presentation.

    Obviously this is a developing field. The best models seem to use phrases from the original text. Anyway, the Mac OSX example above shows that it is useful to users willing to take it with a massive grain of salt, even if we are not into full computational sentience yet.

    When it works even a little better it will replace all those awful grade school teachers who assign paraphrasing as a homework assignment. The reporters who might have been replaced by it will have already lost their jobs, except for the ones in AhaIndia of course who will paraphrase for the rest of us, usually at a marginally better level than the machine.

    The research in the paper (pdf) I found on Google is interesting, and I'd like to understand Barzilay's notation: is that APL or calculus of statement? Also see the papers on her site.

    Of course structured text is easier, and news stories are known to have most of the meat in the beginning, but this is great stuff.

    One interesting older system is ThoughtTreasure which was built to understand a story and answer questions about it. The author also did work on news analysis ("NewsForms") too. There are tools out there, I've been making a survey myself too. If anyone has information about practical NLP tools for real world tasks please post.

  11. It's unlikely to catch on... by ChunKing · · Score: 3, Insightful

    The main problem is that languages, especially English, are so idiomatic that mechanical translators will be at too much of a disadvantage - take the Babelfish translator for instance.

    Furthermore, the English language is so flexible that just about any word can arbitrarily substitute for anything else - for instance, take 'bad' meaning 'good'.

    It would be impossible to program a machine to be able to understand the full spectrum of idiomatic phrases but the future may lie in employing neural net technologies so that computers can do some limited learning.

    --
    cogito ergo sig...
  12. Re:Automated slashdot? by Steve+Franklin · · Score: 3, Insightful

    "It might even know the proper use of to/too and your/you're."

    Yeah, but can it manage to use "There are" instead of "There is" with a plural subject?

    Actually, the long known solution to most of these *oh so difficult* translation problems is to translate everything into a neutral interlanguage like Interlingua and then translate that into other languages, sending the interlingua version along for the ride, thus preventing degradation in further translations. Then all that local linguists have to concentrate on is ONE set of problems: translating their local language into and out of Interlingua, and Interlingua, being tightly defined, is much easier to machine translate into and out of other languages. So...all this lunacy of trying to machine translate Chinese into English, German, Hungarian, Estonian...--you get the picture--is an incredible waste of time and resources and isn't the best way to solve the problem.
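
    The arithmetic behind that claim can be sketched in a few lines. This assumes one translator per direction and nothing about how Interlingua systems are actually built; the function names are made up for illustration. Translating directly between N languages needs N*(N-1) translators, while going through a single pivot needs only 2*N (one into the pivot and one out of it per language).

```python
def direct_translators(n_languages):
    """One translator for every ordered pair of distinct languages."""
    return n_languages * (n_languages - 1)


def pivot_translators(n_languages):
    """One translator into the pivot and one out of it, per language."""
    return 2 * n_languages


# The five languages named in the comment above.
languages = ["Chinese", "English", "German", "Hungarian", "Estonian"]
n = len(languages)

print(direct_translators(n))  # direct approach
print(pivot_translators(n))   # pivot (interlingua) approach
```

    For these five languages that is 20 direct translators against 10 through a pivot, and the gap widens quickly as languages are added.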

    --
    Hic iacet Arthurus, rex quondam rexque futurus.
  13. Re:The Ultimate Tool For Plagiarism by iabervon · · Score: 2, Insightful

    Since the point of term papers is not, in fact, to learn to write term papers, it is likely that, as the production of term papers becomes possible while missing the point, the assignment should be changed to retain the point.

    The ability to do research (of known information, at least) has already been changed by technology. Google, PubMed, and other sites make real literature research possible for high school students with just a web browser, and the kind of slogging through printed books that I learned in high school is now entirely obsolete, like long division. Doing a term paper on the Oneida Community in high school, I was limited to the books in my high school library and town library (and my parents' tableware), and I had to look in card catalogs and chase references to do it. Today, I can just type "Oneida Community" into Google, and I get primary sources, the site's own information, photos, and various essays on the subject. The old skills simply don't produce as good information as is trivially available today.

    Technology replaces the gruntwork in research, and allows a given assignment to take less time finding the information and more time thinking about it. If the point is to teach students to learn new things, shouldn't it be encouraged to eliminate with technology all of the parts which are not part of learning new things, but rather part of demonstrating that you have learned them?

    If you are required to come to a novel conclusion after looking at everything written on your topic, and then argue that position, it doesn't matter how little of the text you personally wrote originally; if the result is a logical argument, you must have understood the topic and selected suitable raw material for it.

    I think the real problem with term papers is that you are encouraged to come to the same conclusions that everybody else does, but to put it in your own words. In writing such a term paper, good research will turn up something that is exactly what you want to say. New things aren't part of the assignment at all. The task is essentially to rephrase something that's already well-written, and this task will soon be automatable.