Slashdot Mirror


Paraphrasing Sentences With Software

prostoalex writes "Cornell University researchers are making progress in paraphrasing and "understanding" complete sentences in a software application. Analyzing sentences on the semantic level allows the software application to treat two sentences, expressing similar thoughts and ideas, but written in a different manner, as a single semantic unit. Significant achievements in this area could revolutionize the information searching field."

203 comments

  1. This translation just got out by Anonymous Coward · · Score: 2, Funny

    Imagine a beowulf cluster of this

    1. Re:This translation just got out by orthogonal · · Score: 0, Redundant

      Imagine a beowulf cluster of this

      Imagine John Ashcroft, Admiral Poindexter, and the National Security Agency using a Beowulf cluster of these to scan everybody's email.

      I wonder if there's a Bayesian filter that picks out atheists, free-thinkers, commies, anti-war activists, and Democrats.

      Pass that list through a geo-locator, and the thought police can be at your door by midnight. (According to Solzhenitsyn, they always knock on your door at midnight.)

    2. Re:This translation just got out by dk.r*nger · · Score: 1

      I wonder if there's a Bayesian filter that picks out atheists, free-thinkers, commies, anti-war activists, and Democrats.

      Hmm.. Why not just criminate anyone sending emails with a subject different from "FW: fw: fw: FW: READ THIS!!! FW: fw: Something cute"

    3. Re:This translation just got out by mattjb0010 · · Score: 1

      No, a translation of a beowulf "cluster" goes something like:
      while (true) { print "So. The Spear-Danes in days gone by and the kings who ruled them had courage and greatness. We have heard of those princes' heroic campaigns..."; }
      # props to Seamus Heaney

    4. Re:This translation just got out by Anonymous Coward · · Score: 0

      All these years and that stale joke is still funny.

    5. Re:This translation just got out by Anonymous Coward · · Score: 0
      Imagine a beowulf cluster of this


      Which has *exactly* the same meaning as "Fuck-off, AC!"

    6. Re:This translation just got out by Anonymous Coward · · Score: 0
      They then employed computational biology techniques to identify sentence templates, or lattices.

      Or rather: string matching techniques (alignment), which are now also employed in computational biology.
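      The alignment idea mentioned above can be sketched in a few lines. This is a minimal illustration only, assuming a plain longest-common-subsequence alignment over words; it is not the researchers' actual method, and the names align_words, template, and SLOT are made up for this example.

```python
def align_words(a, b):
    """Align two word lists using a longest-common-subsequence (LCS) table."""
    n, m = len(a), len(b)
    # lcs[i][j] holds the LCS length of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    # Walk the table and emit aligned pairs; None marks a gap on one side.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((a[i], b[j]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            pairs.append((a[i], None))
            i += 1
        else:
            pairs.append((None, b[j]))
            j += 1
    pairs.extend((w, None) for w in a[i:])
    pairs.extend((None, w) for w in b[j:])
    return pairs


def template(a, b):
    """Keep shared words; collapse each run of differing words into one SLOT."""
    out = []
    for x, y in align_words(a, b):
        if x == y:
            out.append(x)
        elif not out or out[-1] != "SLOT":
            out.append("SLOT")
    return out


s1 = "a bomb exploded in the market".split()
s2 = "a bomb went off in the square".split()
print(template(s1, s2))  # ['a', 'bomb', 'SLOT', 'in', 'the', 'SLOT']
```

      The shared words form the backbone of a template and the differing stretches become slots, which is roughly the intuition behind treating two differently worded sentences as one pattern.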

  2. The problem is... by Anonymous Coward · · Score: 4, Insightful

    That there's absolutely nothing formulaic about idioms, which comprise 80% or so of English conversation. A human learns it by years of experience, a computer has to be given programming for every idiom there is.

    1. Re:The problem is... by Anonymous Coward · · Score: 1, Interesting

      80% sounds a bit high. Did you make it up, or is there a source for it?

      I doubt that any system designed to deal with idioms would be programmed with every idiom. More likely, they would take a huge corpus of text and do tons of statistical manipulations to it, such that idioms would be roughly equivalent to non-idiomatic phrases expressing the same concept.

    2. Re:The problem is... by mirko · · Score: 1

      well, if you use a distributed web application with learning capabilities to fill it, I think this could easily be sorted out.

      --
      Trolling using another account since 2005.
    3. Re:The problem is... by ravydavygravy · · Score: 5, Informative

      a computer has to be given programming for every idiom there is.

      Rubbish - Ever heard of Machine Learning?

      There has been much work on resolving coreferance and named-entity recognition problems has been onging for several years, with the aim being to lead onto full NLP. This research seems interesting in that it takes work from another field (genetic sequence matching) and applies it to an NLP problem. What links them all is that in almost every case, the research involves machine learning at some point... it makes no sense to hand-code millions of case-specific rules, when a machine can learn them faster and better...

      Read their paper and you'll see that indeed it's an unsupervised learning approach - even nicer in that it doesn't require you to label training examples for the algorithm...

      ~D

    4. Re:The problem is... by ravydavygravy · · Score: 1

      There has been much work on resolving coreferance and named-entity recognition problems has been onging for several years,

      And if only I spent as much time on my english usage research.... :-)

      Obviously, I meant:

      There has been much work on resolving coreferance and named-entity recognition problems in recent years,

      ~D

    5. Re:The problem is... by mirko · · Score: 1

      This web site gives a nice example of what I meant in my above post...

      --
      Trolling using another account since 2005.
    6. Re:The problem is... by Anonymous Coward · · Score: 0

      This is true. One of my friends who came over to the UK to learn English had several lessons on common idioms for this reason.

    7. Re:The problem is... by Anonymous Coward · · Score: 0

      I met him. That guy was a total idiom.

    8. Re:The problem is... by MegaHamsterX · · Score: 0, Troll

      People must smoke crack before filling the questions out. It needs common sense, but there's nothing common about it.

    9. Re:The problem is... by Anonymous Coward · · Score: 0

      Dude, she's a girl. And a pretty nice one too.

    10. Re:The problem is... by Zardoz44 · · Score: 1
      You are right. It is difficult. Understanding syntax and semantics without a proper context is impossible for humans, never mind computers. I looked into these problems for a thesis several years ago and focused most of my attention on this idea:

      Link Grammar. (Google Cache since their page isn't responding. )

      I don't remember all the details, but it is basically a program which parses sentences and links all the parts, given a dictionary of rules. For instance, it shows which noun links to which verb, hence Link Grammar.

      After getting involved with this I realized how insanely complicated it is to program a computer to understand text that it reads. Machine learning is about the only way this can happen since it has to learn volumes of context surrounding language that we take for granted.

    11. Re:The problem is... by lonb · · Score: 1

      Statistic: 80% of statistics are made up

      --
      "Ain't I a stinka..." - Bugs
    12. Re:The problem is... by amplt1337 · · Score: 1

      On the contrary. Most idioms are built out of conceptual metaphoric models -- such that there are a finite (albeit rather large) number of learned associations that govern language use, as well as most abstract thought.

      If you're interested in this topic, see Lakoff & Johnson, Philosophy in the Flesh (I don't get a kickback) which discusses the embodied mind and the metaphoric nature of human reasoning (as well as language). There may also be some discussion of this in Metaphors We Live By, by the same authors, though I've only read excerpts of that work & it is several years older.

      --
      Freedom isn't free; its price is the well-being of others.
    13. Re:The problem is... by ArgumentBoy · · Score: 1

      >a computer has to be given programming for every idiom there is.

      >>Rubbish - Ever heard of Machine Learning?

      No, I agree that idioms are not the problem. Those can be learned, in just the same way that we all learn what 'sup means, or when we all figured out that "bad" meant good in some contexts.

      One problem is what goes under the title of indirect speech acts. These are utterances whose meaning does not track to the words and their syntactic connections. For instance, "can you pass the salt?" is a request, not a question, but computer-type algorithms choke on this stuff. Another problem is apparent but meaningful violation of Grice's maxims - for example, A says 'can you give me a ride?' and B replies 'my car is in the shop.' A formal algorithm will choke on that, too, but we all understand it.

      The authors are working with journalistic text, which is relatively simple. But as we learned with OCR, 95% accuracy is still a disaster. I doubt that the project will ultimately succeed, though I wish them well.

    14. Re:The problem is... by Anonymous Coward · · Score: 0

      It's 76% of all statistics, everyone knows that... Even a third grader can tell you that 93% of all statistics are made up!

    15. Re:The problem is... by ravydavygravy · · Score: 1

      but computer-type algorithms choke on this stuff.

      Note also, however, that this is one of the areas in NLP that has received a huge amount of attention over the past two decades - people were producing papers on the recognition of indirect speech acts back in the 80's...

      Explanation-Based Learning of Indirect Speech Act Interpretation Rules

      Now, speech act theory and its applications are not directly my field of expertise, so maybe someone who does research in that area could let us know what the state of the art achieves these days...

      ~D

    16. Re:The problem is... by MegaHamsterX · · Score: 1

      Mods, go to the site referenced before modding me a troll, pick an object and play the game. Did you bother to make an account?

      My post concurs with the above poster's thoughts on the subject.

    17. Re:The problem is... by ArgumentBoy · · Score: 1

      I'm not aware of the work you cite, but the url has expired. Can you send me another cite, or even a copy of the paper offline? TIA d-hample@wiu.edu

  3. First use of this technology by mcrbids · · Score: 4, Funny

    I think that the first and best use of this technology would be to help the editors of Slashdot find duplicate articles!

    Think about the possibilities...

    Of course, the biggest problem with that is that there wouldn't be nearly as many cool articles to read!

    --
    I have no problem with your religion until you decide it's reason to deprive others of the truth.
    1. Re:First use of this technology by Dreadlord · · Score: 1

      I know your comment is meant to be a joke, but after thinking about it, I guess a similar system could give false positives. Let's say a story about an event is posted, and then an update regarding the event is posted a while later: both will definitely contain many similar sentences.

      --
      The IT section color scheme sucks.
    2. Re:First use of this technology by jkrise · · Score: 1

      I think this technology should be used in the SCO case first. Find out how differently constructed programs achieve the same result!

      --
      If you keep throwing chairs, one day you'll break windows....
    3. Re:First use of this technology by Arleo · · Score: 2, Insightful

      Or how about removing redundant comments?

    4. Re:First use of this technology by mangu · · Score: 1
      ...a story about an event was posted, and then an update regarding the event is posted a while later...


      Yes, but that's exactly what's meant as a "dupe" in slashdot. The story may not be the same, but karma whores still get lots of points by re-posting comments from the earlier story.

    5. Re:First use of this technology by Dreadlord · · Score: 1

      Nope, a dupe is posting exactly the same story because 2 users submitted it with different words, at different times.
      example of dupes:
      id Says 60fps Is Enough For Doom III
      DOOM III to be capped at 60 fps
      an example of what I mean is:
      a story regarding Doom III at QuakeCon is posted, later, a story about a specific feature in Doom III is discussed.
      The second example may not be the best one, but it gives an idea of what I actually mean.
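      The two dupe headlines above make a handy test case. A minimal sketch, assuming plain word overlap (Jaccard similarity) as the measure; this is not how Slashdot or the system in the article works, and the function names are made up for illustration:

```python
import re


def tokens(title):
    """Lowercase a headline and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Share of distinct words that two headlines have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = jaccard("id Says 60fps Is Enough For Doom III",
                "DOOM III to be capped at 60 fps")
print(round(score, 3))  # 0.143
```

      Only "doom" and "iii" overlap, so the score is low even though both headlines cover the same story. Keyword overlap alone misses this kind of dupe, which is where sentence-level paraphrase detection would have to do the work.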

      --
      The IT section color scheme sucks.
    6. Re:First use of this technology by Polo · · Score: 1

      Funny, I was thinking it could read my mail and when I say "this is spam", it would know from then on it would help filter out these mortgage/viagra/etc offers

    7. Re:First use of this technology by Anonymous Coward · · Score: 0

      I think that the first and best use of this technology would be to help the editors of Slashdot find duplicate articles!

      Think about the possiblities...

      Of course, the biggest problem with that is that there wouldn't be nearly as many cool articles to read!

      Slashdot dupes are like barfing. Food and stories just aren't as good the second time around.

  4. This reminds me of the Infocom classics by chewtoy-11 · · Score: 5, Interesting

    I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.

    There is a mailbox here.

    --
    C. Griffin
    "Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
    1. Re:This reminds me of the Infocom classics by pubjames · · Score: 3, Insightful

      I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.

      Yes. I can't be the only one that is disappointed that text adventure development essentially died. The great limiting factors always used to be memory (with no disc drives, the whole game had to be stored in a very limited amount of memory) and processing speed. Now that we have both of these in abundance it should be possible to write a real "interactive novel", but I guess that will never happen. Shame, it's a great format for cell phones and pdas.

    2. Re:This reminds me of the Infocom classics by TwistedGreen · · Score: 2, Insightful

      Um, Infocom's text interface wasn't too complex. I mean, it was mostly simple commands in the form "verb + noun."

      > open mailbox

    3. Re:This reminds me of the Infocom classics by Anonymous Coward · · Score: 2, Informative

      That'd be Scott Adams games.

      Infocom's parser was much better. "Put the big bunch of keys in the blue box under the table." can be parsed by it, for example.

      As the OP said, this isn't near the level of what's mentioned in the article, but it's certainly better than you imply.

    4. Re:This reminds me of the Infocom classics by franklinrh · · Score: 1

      If only Mapquest worked as well as the Infocom games.

      I remember trying to get my friend's character lost by walking randomly when they left the keyboard.

      Meanwhile Mapquest will try to have me circle the block while driving down a straight road. Maybe it's my just desserts!

      --
      Can anyone spare 120 chars? I'm saving mine to buy a link at Fark.
    5. Re:This reminds me of the Infocom classics by blancolioni · · Score: 5, Informative

      Interactive fiction hasn't died, and you can certainly play it on your PDA. Furthermore, it's generally acknowledged that the quality of modern works has surpassed that of Infocom. Baf's guide is probably a good place to dip your toes in, but there's resources all over the place and the annual competition has just finished.

      An interactive novel, at least the kind you're probably thinking about with deeply implemented characters and so forth, is probably AI-complete. It's not about the disk space and processor speed, it's about the inherent trickiness.

    6. Re:This reminds me of the Infocom classics by Anonymous Coward · · Score: 0

      T0 p4r4p|-|r453: I 41w4y5 10v3d +|-|3 +3x+ 4dv3|\|+ur3 g4m35 by I|\|f0(0m. T|-|3y w3r3 w4y 4|-|34d 0f +|-|3ir +im3, 4|\|d I |-|4v3 b33|\| +ru1y 4m4z3d 0|\| 53v3r41 0((45i0|\|5 by +|-|3 50f+w4r3'5 4bi1i+y +0 'u|\|d3r5+4|\|d' w|-|4+ I w45 45ki|\|g i+ +0 d0. Of (0ur53 I'm 5ur3 +|-|i5 i5 134p5 4|\|d b0u|\|d5 b3y0|\|d w|-|4+ w45 4v4i14b13 b4(k +|-|3|\|, bu+ i+'5 +ru1y 4m4zi|\|g |-|0w f4r 4|-|34d 0f +|-|3ir +im3 +|-|3y 4(+u411y w3r3.

    7. Re:This reminds me of the Infocom classics by pubjames · · Score: 1

      Interactive fiction hasn't died

      Yes, I know about the stuff you are talking about.

      it's generally acknowledged that the quality of modern works has surpassed that of Infocom.

      That's the problem... The modern games have only just surpassed games that were created for machines of 12 years ago.

      It's not about the disk space and processor speed, it's about the inherent trickiness.

      Not today, but it was an extremely limiting factor when you are trying to get a whole game into 32Kb of memory.

      Yes, it is a tricky problem. But the problem isn't being addressed - that's my point. Nearly all of the 'modern' interactive fiction I've seen uses engines that haven't changed much in a decade.

    8. Re:This reminds me of the Infocom classics by Sargent1 · · Score: 3, Informative

      There are changes to the various interactive fiction languages to address various problems and shortcomings in the field. The trouble is, most of the easy stuff has been done. What's left now is trying to figure out what hard stuff can be done, or is even worth doing.

      For example, right now most of the languages accept sentences of the form [VERB] [DIRECT OBJECT] [PREPOSITION] [INDIRECT OBJECT]. Occasionally someone suggests, "Why not add adverbs?" The general consensus is that doing so suddenly requires the author(s) to consider a gigantic range of actions (what's the difference in result between "squeeze toothpaste tube slowly" and "squeeze toothpaste tube violently"?), and that, though such parsing can be done, it doesn't add to the world model.
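      A rough sketch of that sentence form, for anyone who wants to see it concretely. It assumes a tiny hard-coded word list and is not how TADS, Inform, or the Infocom parser actually work; parse_command, PREPOSITIONS, and ARTICLES are hypothetical names.

```python
PREPOSITIONS = {"in", "on", "under", "with", "to"}  # hypothetical word list
ARTICLES = {"the", "a", "an"}


def parse_command(text):
    """Split a command into verb, direct object, preposition, indirect object."""
    words = [w for w in text.lower().split() if w not in ARTICLES]
    if not words:
        return None
    verb, rest = words[0], words[1:]
    # The first preposition with words on both sides splits the two objects.
    for i, word in enumerate(rest):
        if word in PREPOSITIONS and 0 < i < len(rest) - 1:
            return {
                "verb": verb,
                "direct": " ".join(rest[:i]),
                "prep": word,
                "indirect": " ".join(rest[i + 1:]),
            }
    # No usable preposition: treat everything after the verb as the direct object.
    return {"verb": verb, "direct": " ".join(rest) or None,
            "prep": None, "indirect": None}


print(parse_command("open mailbox"))
print(parse_command("put the keys in the blue box"))
```

      Note that an adverb such as "slowly" would simply end up glued onto the direct object here. Parsing it out is easy enough; deciding what the game world should do differently because of it is the hard part.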

      Nevertheless, even in traditional interactive fiction there is language development going on to increase what can be done. The example I am most familiar with is TADS 3 (http://tads.org/t3dl.htm), which is adding a lot of deeper simulation aspects, such as varying light sources, a better concept of distance, easy ways of getting around the standard atomicity of the world being broken up into discrete rooms, and support for deeper interaction with non-player characters. The big leap here is in giving a ready-made and easy-to-use framework for such advances.

    9. Re:This reminds me of the Infocom classics by Retired+Replicant · · Score: 1

      Yeah, the toothpaste example you used would probably be easier to model realistically in a fully-rendered 3d simulation with an accurate physics engine. A text adventure doesn't model everything. It just provides the most important/significant descriptors and then lets the various readers' imaginations fill in everything else. This is why having the game respond appropriately to a player's actions in a text adventure is so much more complicated than it is in a game with a 3d graphics/physics engine.

    10. Re:This reminds me of the Infocom classics by jafuser · · Score: 1

      I'm still amazed at how they were able to parse some things. I used to throw all kinds of stuff at it to try to make it look dumb, but more often than I expected it handled things quite well.

      Does anyone have any insight into the algorithm they used?

      --
      Please consider making an automatic monthly recurring donation to the EFF
    11. Re:This reminds me of the Infocom classics by JoeBuck · · Score: 1

      It could do a little bit better than this. It understood direct and indirect objects ("give him the orb"), as well as some particles (the difference between "put on" and "put down"), and could figure out some omitted words from context. But they could do this because the situation was so limited.

    12. Re:This reminds me of the Infocom classics by Tablizer · · Score: 1

      Um, Infocom's text interface wasn't too complex. I mean, it was mostly simple commands in the form "verb + noun."

      That is all that is needed in a porn command interface:

      [verb-bleep] me!

      [verb-bleep] faster!

      [verb-bleep] my [noun-bleep]!

      [verb-bleep] her [noun-bleep]!

      I want to [verb-bleep] you!

      (Optionally append, but ignore "baby")

  5. comments? by mutagenman · · Score: 1, Interesting

    Will this get rid of the 10 people who get +5 informative from stealing the link out of the comment a few spots up?

    1. Re:comments? by mirko · · Score: 2, Funny

      if these people get an "informative" when they paraphrase the article, they should be metamodded to "insightful"...
      but the day the mods will be replaced by parsers, I think I'll get one to post instead of me.

      --
      Trolling using another account since 2005.
  6. google? by Anonymous Coward · · Score: 4, Interesting

    so would this allow something like google to pick up a phrase and relate it to the results instead of just picking up keywords?

    1. Re:google? by millette · · Score: 2, Interesting

      Actually, google already does this a little. If I can find an example, I'll reply again. Excite, the old search engine, used to pick out synonyms (well, that's how I heard it explained once) by comparing pages and related content.

    2. Re:google? by millette · · Score: 5, Informative
      Just discovered this:
      Now when searching Google, you can use a ~ (tilde) to find pages using synonyms of the word you're searching for. For instance, search for:


      css ~help

      and you'll get sites with tutorials, guides, support, etc.
    3. Re:google? by zerblat · · Score: 1

      It seems like Google has started to use stemming (or something similar). If you search for "linux print", it also finds pages containing "linux printing". IOW it considers words with certain suffixes (probably -ing, -s, -ed etc, depending on word class) to be equivalent to their stem, i.e. the word with the suffix stripped off. This isn't such an important thing in English, since there aren't so many different suffixes, but it can be very important for more inflective languages.
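      To make the suffix-stripping idea concrete, here is a deliberately naive sketch. It assumes a three-entry suffix list and makes no claim about what Google really does; real stemmers handle many more rules and exceptions, and the names naive_stem and matches are invented for this example.

```python
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny suffix list


def naive_stem(word):
    """Strip one known suffix, keeping at least three characters of stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, page_text):
    """True if every query word shares a stem with some word on the page."""
    page_stems = {naive_stem(w) for w in page_text.split()}
    return all(naive_stem(q) in page_stems for q in query.split())


print(naive_stem("printing"))                         # print
print(matches("linux print", "Linux printing howto"))  # True
```

      Even this toy version shows the trade-off: "news" gets stemmed to "new", so a stemmer that is too eager starts matching words that have nothing to do with each other.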

      --
      Please alter my pants as fashion dictates.
    4. Re:google? by millette · · Score: 1

      Read about it first in slashdot *hehe* Thanks - didn't know that!

    5. Re:google? by Frogg · · Score: 2, Interesting

      ..also worth noting that Google have recently introduced a very powerful implementation of word stemming. (Yup, this is separate to the synonyms, but is still related)

      It's enabled by default - if you want exact match words (like it was a month ago) you need to search for: +keyword

    6. Re:google? by Arslan+ibn+Da'ud · · Score: 1

      For instance, search for:
      css ~help
      and you'll get sites with tutorials, guides, support, etc

      ...but you won't get DeCSS!

      --

      Practice Kind Randomness and Beautiful Acts of Nonsense.

  7. how it can be useful by Dreadlord · · Score: 4, Interesting

    one of the ways I can think of to use this technology is to improve search engine capabilities, instead of looking for exactly the same words, search engines then can look for similar sentences, giving more accurate results.
    However, after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages.

    --
    The IT section color scheme sucks.
  8. Hrm by Auckerman · · Score: 3, Interesting

    I was too lazy to read the article so I used the Summarize feature in OS X to parse the sentences down since it seems a bit wordy.

    Okay, maybe I exaggerate a bit here, I did read the article and while the summarize isn't that far off from what these guys are doing...

    --

    Burn Hollywood Burn
    1. Re:Hrm by plumby · · Score: 1

      Word has a similar feature.

      So for the 1% summarisation of the article "The sentence-based paraphrasing system could improve machine translation, according to Barzilay".

    2. Re:Hrm by Anonymous Coward · · Score: 0

      I assume this is a little more advanced than that. Summarize has been around for a while, and is implemented in MS Word and Abiword for sure (perhaps OOo as well). Hopefully this is the next step.

  9. Google News? by cryptor3 · · Score: 4, Interesting

    I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.

    1. Re:Google News? by mghiggins · · Score: 1

      That's an interesting question: in the field of language comprehension, is the cutting edge of research in academia or in industry?

      Anyone know?

      --
      All opinions expressed herein are not my own; I haven't had free will since last year when aliens ate my brain.
    2. Re:Google News? by Chalybeous · · Score: 1

      No idea, sorry; however, I was thinking this technology might eventually enable news avatars like Ananova (or Ray Kurzweil's "Ramona" avatar) to assimilate and rephrase information on the fly, without need for humans to write the stories.
      As an English Lit student, this would be a total boon. "Computer, Jane Eyre - ten minute precis, then compile all listed journal articles on issues of gender and class for an essay."
      (Not that I'd have the computer write it for me, but given the amount of data I have to wade through, it would be a useful summary tool!).

      --

      "It is dark. You are likely to be eaten by a grue." -- Zork

    3. Re:Google News? by otisg · · Score: 1

      This is simple.
      The cutting edge is in the academia. With time, that cutting edge moves to the industry realm. In parallel, the academia moves and stays ahead.

      Consider Google. What is one of the things they take pride in, and one of the things that makes Google so good? Its people. Who are its people? A large percentage of Doctors (PhDs). That's academia moving to industry.

      --
      Simpy
    4. Re:Google News? by Kappelmeister · · Score: 4, Informative

      I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.

      No, but Regina Barzilay, who is the researcher featured in the article, worked (with me) on the Newsblaster project at Columbia University, where she indeed applied these techniques to multidocument summarization. Newsblaster gathers and clusters news like Google News, but produces more sophisticated summaries.

    5. Re:Google News? by BubbleNOP · · Score: 1

      A counter-question: does it matter? Suppose the cutting edge of research is in academia. Obviously industry is what actually gets products out to the customers. So, even though academia may be further ahead, it takes time to take their ideas and unpolished code and make them work in practical applications, and researchers are often not interested in doing that. So if you want to take something that you can use, you may just forget about academia unless you want to spend lots of time making a product out of stuff from research papers and then suffering the legal consequences...

  10. Deploy on babelfish.altavista.com by Anonymous Coward · · Score: 0

    I hope they make use of this new technology on machine translation sites like Babelfish, because the dreck that Babelfish shoots out is utter shit!

  11. Translation software? by znaps · · Score: 2, Informative

    I'm sure this would improve translation software too, since a paraphrased sentence should be easier to translate into something sensible.

  12. Fascinating read by zhenlin · · Score: 1

    But... I wonder, will it produce 'In Soviet Russia' pseudo-paraphrasing.

    I wonder what its application could be, other than to detect duplicates... Perhaps, a tool to suggest ways of rewriting sentences? Or maybe part of a more advanced grammar check?

    1. Re:Fascinating read by Jugalator · · Score: 5, Insightful

      I wonder what its application could be, other than to detect duplicates... Perhaps, a tool to suggest ways of rewriting sentences? Or maybe part of a more advanced grammar check?

      My first thought was translation tools. GOOD translation tools that understand the grammar in the source language, and use the grammar in the destination language to form the resulting sentence.

      There has been some work on something to solve this problem, where a phrase in language A was translated to some special "universal" code, and then finally to language B. The developers would then need to make the translator translate all languages to the universal code, and vice versa. The universal code could be whatever necessary to make the software as easily as possible be able to preserve the "meaning" of the sentence.

      However, if this is done, the problem could change from this:

      Source: I love hot dogs.
      Destination: Ich liebe heiBe Hunde. (i.e. a literal translation, from Altavista Babelfish)

      ... to this:

      Source: I love hot dogs.
      Destination: Ich liebe Nahrung. ("I love food")

      In case the universal language wasn't advanced enough and the English -> universal conversion was "lossy". So we might exchange our current problem of mangled grammar for one of lost information.

      Here's a web site about it, and I'm sure there are many more.
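      The "lossy" pivot problem described above can be shown with a toy example. This is only a sketch under heavy assumptions: the concept codes and both lookup tables are invented for illustration, and no real translation system is this simple.

```python
# Toy lookup tables; every entry is an illustrative assumption.
TO_PIVOT = {
    "en": {"i": "SELF", "love": "LOVE", "hot dogs": "FOOD", "food": "FOOD"},
}
FROM_PIVOT = {
    "de": {"SELF": "Ich", "LOVE": "liebe", "FOOD": "Nahrung"},
}


def translate(phrases, src, dst):
    """Map each source phrase to a pivot concept, then to the target language."""
    concepts = [TO_PIVOT[src][p.lower()] for p in phrases]
    return " ".join(FROM_PIVOT[dst][c] for c in concepts)


print(translate(["I", "love", "hot dogs"], "en", "de"))  # Ich liebe Nahrung
print(translate(["I", "love", "food"], "en", "de"))      # Ich liebe Nahrung
```

      Both inputs collapse to the same output because the pivot vocabulary has only one FOOD concept. The grammar survives, but the distinction between hot dogs and food in general is gone.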

      --
      Beware: In C++, your friends can see your privates!
    2. Re:Fascinating read by Jugalator · · Score: 1

      Destination: Ich liebe heiBe Hunde

      Cool, /. doesn't understand Unicode, and not even Latin characters (!) like the German sharp s. Is it still living in the world of 7 bit characters or what? :-O

      --
      Beware: In C++, your friends can see your privates!
    3. Re:Fascinating read by Anonymous Coward · · Score: 1, Interesting

      They didn't like people using some of the odder Unicode characters to do page widening tricks, and stuff. It's a shame, because some of these extra characters were quite pretty.

    4. Re:Fascinating read by Anonymous Coward · · Score: 0

      How hard could it be to build a list of accepted unicode symbols. Maybe just the ones that have html entity equivalents, like è? To me the fact that they didn't do this signifies they don't give a rat's ass about international slashdotters.

    5. Re:Fascinating read by Trejkaz · · Score: 3, Interesting

      I guess you could try using Esperanto or Lojban as your intermediary language. Lojgan in particular is computer parseable *and* human understandable, so it would probably be the easiest to write translations for.

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    6. Re:Fascinating read by Anonymous Coward · · Score: 0

      Yeah, they are right lazy bastards. It wouldn't be hard at all. Of course, being Open Source, they'd probably want someone else to do it (only to reject that patch from Slashdot's slashcode, I imagine.)

    7. Re:Fascinating read by PingPongBoy · · Score: 1

      I guess you could try using Esperanto or Lojban as your intermediary language. Lojgan in particular is computer parseable *and* human understandable, so it would probably be the easiest to write translations for.

      The name Logban is kind of anti-intuitive isn't it? You can't even keep the spelling consistent.

      --
      Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
    8. Re:Fascinating read by Trejkaz · · Score: 1

      And that's a third spelling. ;-)

      --
      Karma: It's all a bunch of tree-huggin' hippy crap!
    9. Re:Fascinating read by davew2040 · · Score: 1

      Human language is necessarily compact, because spoken communication can be time-consuming if it becomes long-winded. We tend to rely heavily on context.

      Context is always a useful thing, but I see no reason why a computer "universal language" couldn't relax context requirements, and use a bit more information to represent the various grammatical constructs that make up a sentence. Rather than "hot dog" or "food", a more complex object could be used, including perhaps visual and historical information (whatever information the computer deterministically decides is both pertinent and possibly ambiguous).

      Anyway, point being, a computer is not limited to (short) strings, so I'm sure that could be a benefit.

    10. Re:Fascinating read by amplt1337 · · Score: 1
      There has been some work on something to solve this problem, where a phrase in language A was translated to some special "universal" code, and then finally to language B. The developers would then need to make the translator translate all languages to the universal code, and vice versa. The universal code could be whatever necessary to make the software as easily as possible be able to preserve the "meaning" of the sentence.
      Well, at least one linguistic school believes that this is a realistic description of what humans do. See The Language Instinct by Steven Pinker. (Man, I'm all over the Amazon links today.) He refers to it as "mentalese" -- the idea that our words are converted into a subverbal conceptual level.

      The problem is that, if poorly implemented, this would instead bear an uncomfortable resemblance to the discreditable analytical-philosophy theory of how language works. People haven't been able to create a truly expressive value-free and concept-only language with which to communicate (try translating even a not-too-flowery author like Hemingway into logic propositions), and it would be people who would need to create the "universal idea set".

      Unless, of course, there were a way for the machines to do it statistically...

      --
      Freedom isn't free; its price is the well-being of others.
    11. Re:Fascinating read by Anonymous Coward · · Score: 0

      The only trouble is that Esperanto is overly simple. It only has three tenses, compared to the dozen or so in English. That means that information is lost in the translation. For instance:

      She went.
      She had gone.

      Those are both past tense sentences, but one is simple past and the other is past perfect. I don't believe that Esperanto is capable of distinguishing them.

  13. YAY! by Anonymous Coward · · Score: 0

    So now
    (all your base...)==(I'm a tard)
    ?

  14. Fascinating by Raindance · · Score: 2, Insightful

    Things like this are what make academic research Really Cool and allow useful things to come about. Go Cornell.

    I'd note that this is a novel approach, and, for better or for worse, it goes about doing things much differently than our minds do.

    Actually, though, it's closer to how humans understand writing (stringing together atomic words/phrases in an implicit context) than previous statistical methods. ... and I'd relate my 2nd and 3rd paragraph if it wasn't 3am here. Goodnight, slashdot. :)

    RD

  15. Paraphrased version by Anonymous Coward · · Score: 1, Interesting

    Maybe prostoalex could learn something from the Cornell researchers! How about this for an article summary, eh?

    Cornell University researchers could revolutionize the information searching field by analyzing sentences on the semantic level to allow a software application to treat two sentences, expressing similar thoughts and ideas but written in a different manner, as a single semantic unit.

  16. So... by CyberSlugGump · · Score: 1


    Who will be first to post the paraphrased article so I don't have to RTFA?

    1. Re:So... by Anonymous Coward · · Score: 0

      You can just read the comments from the first time Slashdot posted this story.

    2. Re:So... by Anonymous Coward · · Score: 0

      A paraphrase has already been posted. What's worse, if you do RTFA - someone has gone and paraphrased the entire article.

  17. Does this mean... by Powercntrl · · Score: 1

    The days of "All your base are belong to us" Engrish may soon be over? A brand new AirSoft gun I just purchased has the phrase "No point at the creature" molded into the plastic. Don't get me started on the owner's manuals for consumer electronics. Japan needs this software, bad. If it comes at a cost of no more "All your base" jokes, well, that's a cost I think society will have to bear.

    --

    ---
    DRM is like antifreeze, to the MPAA/RIAA it's sweet, to the consumers it's poison.
    1. Re:Does this mean... by Anonymous Coward · · Score: 0

      Wouldn't it work in the opposite direction better?

      "Wash your hands. It's the law." What does that mean?

      **runs through paraphraser**

      "Wet your applause. That thing is the rule." Ah so... That makes much mole sense.

    2. Re:Does this mean... by Adam9 · · Score: 1

      Sounds like the Chinglish Files

    3. Re:Does this mean... by dancingmad · · Score: 1
      Japan needs this software, bad.

      How much Japanese is it that you speak? Get off your high horse before you talk about others.

      --
      "There is no time, sir, at which ties do not matter," Jeeves, (Jeeves and the Impending Doom)
  18. It's been done by CanadaDave · · Score: 2, Interesting
    Microsoft Word had AutoSummarize in Word 97, or was it 2000? Anyhow it seems to be absent in Word XP. It was the trashiest thing I'd ever seen. Actually I used to use it all the time to write my abstract. It provided a nice way for me to remember everything I talked about in my report, and I think it made an effort to use keywords which came up a lot in the report. But sometimes it did things which made no sense at all. Too bad Microsoft wasn't Open Source; their AutoSummarize feature might actually be half decent by the year 2003, but instead they abandoned it to work on other projects, I guess.

    I looked again and whaddayaknow? I asked the paperclip about AutoSummarize and it is still there in the Tools menu after all! Looks like I don't have that feature installed though.

  19. Who didn't think of Reginald Barclay? by philipdl71 · · Score: 1

    Two ideas led to the system, said Regina Barzilay...

    Speaking of natural language recognition, I parsed this sentence from the article as reading, "Two ideas led to the system, said Reginald Barclay ..." :)

  20. Someone help me out here by prockcore · · Score: 5, Funny

    I'm too lazy to read the article.. could someone write some software to paraphrase it for me?

    1. Re:Someone help me out here by Anonymous Coward · · Score: 0

      I'm too lazy to read the article.. could someone write some software to paraphrase it for me?

      Yes: "You lazy bastard. Go fuck self."

  21. My take on this by product+byproduct · · Score: 1

    If strcmp says that two strings are different, but you say that they mean the same thing, then the problem is with your language, not with strcmp.

    1. Re:My take on this by ravydavygravy · · Score: 1

      You're absolutely right - however last time I checked, we all speak some form of natural language, right?

      Would you prefer if we all spoke some sort of language governed strictly by some computer-linguistic grammar? I'll get started on the Yacc code right away... :-)

      ~D

    2. Re:My take on this by ideonode · · Score: 2, Interesting

      Yes, but strcmp can say two strings are identical, yet they can convey different information. Big-endian vs. little-endian, anyone?

      Binary identity does not imply semantic equivalence. It all depends on how the data is interpreted.
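
      A small sketch of the endianness point in Python rather than C. The helper name interpret is made up for illustration; int.from_bytes is the standard library call doing the work.

```python
def interpret(raw: bytes) -> tuple:
    """Read the same bytes as an unsigned integer in both byte orders."""
    big = int.from_bytes(raw, byteorder="big")
    little = int.from_bytes(raw, byteorder="little")
    return big, little


raw = b"\x00\x01"
# A byte-wise comparison calls these two buffers identical...
print(raw == b"\x00\x01")
# ...yet the value they stand for depends on who is reading them.
print(interpret(raw))
```

      The comparison only answers "same bytes?"; what the bytes mean is decided by the reader.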

    3. Re:My take on this by mopslik · · Score: 1

      If strcmp says that two strings are different, but you say that they mean the same thing...

      ...then you might have an excellent example of the "richness" of a language, and not necessarily a "problem" with it. The following sentences would all be different to strcmp, but are semantically the same[*] for all intents and purposes:

      "It's enormous".
      "It's immense".
      "It's massive".
      "It's huge".

      Part of the reason why languages haven't dropped multiple words with the same meaning is that people enjoy using a variety of words to express the same idea. Without this variety, a language seems stale and boring. Consider the classic anecdote about Esperanto's lack of adoption due to it being too regular.

      [*] Quibble as you will, but any distinctions between these words are so subtle that they are used interchangeably in regular English conversation.
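
      To make the strcmp point concrete, here is a rough Python sketch. The CANONICAL table is a hypothetical, hand-written synonym list covering only the four words above; a real paraphrase system would have to learn such equivalences rather than have them typed in.

```python
import re

# Hypothetical synonym table, hand-written for this example only.
CANONICAL = {
    "enormous": "huge",
    "immense": "huge",
    "massive": "huge",
    "huge": "huge",
}


def normalize(sentence: str) -> tuple:
    """Lowercase, strip punctuation, and map each word to its canonical form."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return tuple(CANONICAL.get(word, word) for word in words)


def same_meaning(a: str, b: str) -> bool:
    """Compare two sentences after synonym normalization."""
    return normalize(a) == normalize(b)


print("It's enormous." == "It's massive.")              # plain string comparison
print(same_meaning("It's enormous.", "It's massive."))  # comparison after normalizing
```

      Word-for-word substitution like this only handles the easy case; sentences that are reworded structurally need something smarter.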

    4. Re:My take on this by jrockway · · Score: 1

      > "It's enormous".
      > "It's immense".
      > "It's massive".
      > "It's huge".

      Damn! I have GOT to remember to close the shades before I undress :) But, uh, thanks :)

      --
      My other car is first.
  22. Japanese manuals by Space+cowboy · · Score: 2, Funny

    Finally, auto-translate, then auto-parse can rid us of these "manuals" :-)

    Simon

    --
    Physicists get Hadrons!
  23. Goodbye, Cliff Notes... by IvyMike · · Score: 3, Funny

    Hello, automatic paraphrasing of literature.

    P.S. Just joking, kids. Stay in school!

  24. What about... by millette · · Score: 2, Funny

    Let's see the srtwfaoe cut its tteeh anigist tihs lttilte puzzle! (blatant reference to an older article)

  25. Ob - RvB by Seraphim_72 · · Score: 0, Flamebait


    riiiiipppppp

    What is that? What are you doing?

    I'm paraphrasing. This intro is too long.

    Paraph....well don't paraphrase...don't. Look, I will read whatever is in the script and you just type whatever I say. So just type what ever I say.

    Just type whatever I say.

    No, don't type everything I say. Just type .. what's .. in the ... damn..

    No! Not everything... just guh, guh, er, duh, duh...

    That's not funny.

    You're such a cock bite.

    Alright now that - ok, that's gotta - that, take that off, because that is firs ... number one that's offensive, and secondly, I am not a cock bite, seriously. I, I am not a cock bite, that is rude. Just, put the fucking logo ok? Just put up the logo. Assholes.

    cock bite

    --
    Slashdot, where armchair scientists get shouted down and armchair theologians get modded up.
    1. Re:Ob - RvB by ravydavygravy · · Score: 0, Offtopic

      funniest machinima ever....

  26. Another Killer App by varjag · · Score: 4, Funny

    They should use this technology to transcribe legalese into plain English and back. Like, you feed it with "Due to unanticipated circumstances as listed under the terms of the clause 17(a), we may be unable to comply with your request within this and successive fiscal year(s)", and it spits out "bugger off".

    Of course, millions of lawyers worldwide would lose their jobs, but I, being bitten by them, just take it as an added benefit.

    --
    Lisp is the Tengwar of programming languages.
  27. It has to be said ... by B3ryllium · · Score: 1

    Paraphrase THIS!

    (from the I'll-Paraphrase-YOU! department)

  28. I get it. by yo303 · · Score: 1
    Significant achievements in this area could revolutionize the information searching field.

    Significant achievements [GOOD] in this area could revolutionize [IS] the information searching field. [THIS].

    yo.

  29. Finally ... by makapuf · · Score: 4, Funny

    a "-1, redundant" generator.

  30. Forget Research! by eWarz · · Score: 1

    What about true speech recognition? As I understand it this could go a long way towards making speech recognition work effectively. Me: "Computer, I want to write an email." Computer: "One moment please."

    1. Re:Forget Research! by Anonymous Coward · · Score: 0

      Me: "Computer, I want to write an email." Computer: "One moment please."

      How about:

      Me: "Computer, I want to write an email."
      Computer: "Well whoopee fucking shit!"

  31. Paraphrase of the article. by fven · · Score: 4, Informative

    Without thinking too much about it, we paraphrase all the time. Trying to give a sentence to a computer to reword is a complicated task.

    At Cornell University, researchers decided to avail themselves of two different sources of the same news and use computational biology methods to make it possible for computers to automatically paraphrase input sentences. Their first step was to compare the two different sources of the same news.

    Eventually, it is hoped that this research will have benefits in computer processing of natural-language queries, translation engines, and in assisting people with certain types of reading disabilities.

    The project began when two ideas came together, said one of the Cornell researchers, Regina Barzilay. Regina Barzilay is an assistant professor of computer science at the Massachusetts Institute of Technology.

    The vast amount of duplicated content online is a valuable resource for computer systems learning to paraphrase. A number of reporters report the same news but using different wording. The redundant sources of news are able to assist in learning the different ways one piece of information can be paraphrased, as the same basic facts are reported in each. So with these multiple sources, you can sort out the noise and get the facts and then work out different ways of stating those facts.

    Even with similar styles of writing, paraphrasing of sentences is more than just working out and substituting synonyms. The researchers provide a couple of common business phrases to illustrate this:

    After the latest Fed rate cut, stocks rose across the board.
    Winners strongly outpaced losers after Greenspan cut interest rates again.

    The next step was to use computational biology techniques to determine how much two sentences had in common and how closely they were related. The technique used was similar to the one biologists use to see how close two sets of genes are that may have started from the same seed but then evolved. They are different but have a degree of similarity.
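
    As a rough illustration of the alignment idea (not the researchers' actual method, which the article does not spell out), here is a Python sketch that aligns two sentences word by word using a longest-common-subsequence table, the same kind of dynamic programming used for pairwise sequence alignment. The function names and the example sentences are made up.

```python
def lcs_words(a, b):
    """Return the longest common subsequence of two word lists."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table forward to recover the shared words in order.
    common, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            common.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return common


def similarity(s1: str, s2: str) -> float:
    """Share of words that align, between 0.0 (nothing) and 1.0 (identical)."""
    a, b = s1.lower().split(), s2.lower().split()
    if not a or not b:
        return 0.0
    return 2 * len(lcs_words(a, b)) / (len(a) + len(b))


print(lcs_words("the cat sat on the mat".split(), "the dog sat on a mat".split()))
print(similarity("the cat sat on the mat", "the dog sat on a mat"))
```

    The words that align form the shared backbone of the two sentences; the words that do not align are the places where they vary.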

    The important thing was to compare news sources that were written differently but covered the same event. This generated a whole set of word patterns that were kind of the same. This was exactly the core data needed to inform a computer paraphrasing technique.

    The Reuters and AFP news sources were used to test the system. News was selected from English articles produced between September 2000 and August 2002.

    The system developed by the researchers performs two groupings; firstly comparing articles from the same source:

    Word-based clustering methods were used to identify sets of text that had a high degree of overlapping words. This method identified articles that reported distinct acts of violence occurring in Israel and the Palestinian territories.
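
    A minimal sketch of what clustering by word overlap can look like, in Python. The article does not say which clustering method was used, so the Jaccard measure, the greedy single pass, the 0.5 threshold, and the sample sentences are all assumptions chosen for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two word sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(sentences, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed overlaps enough."""
    clusters = []
    for sentence in sentences:
        words = set(sentence.lower().split())
        for c in clusters:
            if jaccard(words, c["seed"]) >= threshold:
                c["members"].append(sentence)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append({"seed": words, "members": [sentence]})
    return [c["members"] for c in clusters]


# Made-up sentences, not taken from the Reuters or AFP data.
groups = cluster([
    "stocks rose after the rate cut",
    "stocks fell after the rate cut",
    "the team won the final match",
])
print(groups)
```

    Note that the first two sentences land in the same group even though they say opposite things, which is why overlap alone cannot decide what counts as a paraphrase.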

    Computational biology techniques were then used on these sets of articles to generate lattices or sentence templates for the computer to use. Each lattice contains a number of sets of words that occur in parallel and empty slots where arguments, such as locations, number of fatalities, times and dates can be inserted.
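
    One way to picture such a lattice, sketched in Python. The structure below (a list of positions, each holding either interchangeable wordings or a capitalized slot name) is an assumption made for illustration, and the lattice content is invented rather than taken from the researchers' data.

```python
from itertools import product

# Hypothetical lattice: each position lists the wordings that occur in
# parallel; a capitalized name marks an empty slot for an argument.
LATTICE = [
    ["stocks", "shares"],
    ["rose", "climbed"],
    ["after"],
    ["EVENT"],
]


def realize(lattice, arguments):
    """Spell out every path through the lattice, filling slots from arguments."""
    sentences = []
    for path in product(*lattice):
        words = [arguments.get(token, token) for token in path]
        sentences.append(" ".join(words))
    return sentences


for sentence in realize(LATTICE, {"EVENT": "the rate cut"}):
    print(sentence)
```

    Every path through the same lattice is treated as a paraphrase of every other path, with the slot values carried over unchanged.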

    The challenge was to sort out which lattices were indeed due to different events and which were due to writing variability.

    The researchers were thus able to identify common templates used by journalists to describe similar events. I.e., journalists who take the same article and change or take out a word, add a detail, reverse the sentence and so on are hereby busted.

    One of the templates, or lattices, read: Palestinian suicide bomber blew himself up in NAME on DATE killing NUMBER (other) people and injuring/maiming NUMBER. In addition to the injuring/maiming variable, there are several variables within the name argument: settlement of, coastal resort of, center of, southern city, or garden cafe.

    43 AFP and 32 Reuters templates were thus discovered by the system. The researchers then cross-compared these lattices.

    They compared the

  32. Pleasure-ism by DrewCapu · · Score: 2, Funny

    The next generation of students sure will have it much easier than us. How is a teacher supposed to catch plagiarism with software like that?

    Oh wait...

    Mrs. G: Johnny, come here for a second.
    Johnny: Yes Mrs. G?
    Mrs. G: What did you mean by "Shrub claimed that Basket Hamper and the Hatchets of Sin will be blown out" in your current events report?
    Johnny: Oh, whoops! What I meant to say there was, "Bush says Bin Laden and the Axes of Evil will be defeated." Sorry about that. Darn that defective spell-check and grammar-check!

    1. Re:Pleasure-ism by Anonymous Coward · · Score: 0

      It's "Axis Of Evil". Axis like an axle, you know.

      Not that it was funny either way.

  33. Correctly paraphrasing is a difficult problem by Serious+Simon · · Score: 1

    after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages

    ...is a good example :)

  34. Not a coincidence.. ? by Channard · · Score: 2, Funny
    What's the betting Infogrames code has in fact been reused for this application? Twenty years down the line...

    Auto Greeter Machine: I welcome you to our country, and greet you with open arms. Please enjoy your stay - we have a fine range of tourist facilities, restaurants, bars and so forth. And on a personal note, may I say that you are likely to be eaten by a grue.

    1. Re:Not a coincidence.. ? by Prior+Restraint · · Score: 1

      What's the betting Infogrames code has in fact been reused for this application? Twenty years down the line...

      I think you mean Infocom.

      Infocom --> Activision
      Infogrames --> Atari

  35. Maybe they could add this function by b00le · · Score: 1

    According to The Guardian "In his pre-trial interview, the cannibal said that after eating Brandes he felt much better and more stable. Brandes spoke good English, he said, and since eating him his English had improved."

  36. Hmmm by Anonymous Coward · · Score: 0

    Perhaps it can make sense of Bill Gates talks on security. I know I can't.

  37. Just wait till they try and teach it 1337 by Anonymous Coward · · Score: 0

    7#1$ m@(#1n3 $uXX0rZ

  38. Finally! by Lord+Bitman · · Score: 1

    So now we can run a simple program and it will tell us what the media is really saying, without all their bullshit and padding. That is, it will go through an entire article, and pull out the stupid statistics like death counts. Thus, three pages of bullshit and padding are reduced to:

    10 People were killed, and 30 injured. Arabs Suck, America is great.

    It took a complex computer program and years of research to figure out that all the news stories could be summed up in 3 lines.

    (don't mark this as troll right-off, read the article first.)

    --
    -- 'The' Lord and Master Bitman On High, Master Of All
    1. Re:Finally! by WuphonsReach · · Score: 1

      Similar to one of the short stories at the start of the Foundation series where the foundationers are visited by a high-ranking official from the old empire.

      He says a lot while he's there, but after they run it through some sort of language processor they find out that he said exactly *zip*.

      Aren't weasel-words fun?

      --
      Wolde you bothe eate your cake, and have your cake?
  39. Obligatory Paraphrases by KoolDude · · Score: 2, Funny


    How do you paraphrase Slashdot ?
    Ans : Dupes for nerds, stuff that matters again and again.

    How do you paraphrase Microsoft Innovation ?
    Ans :

    --
    getSexySig(); /* returns sexy signature */
    1. Re:Obligatory Paraphrases by Lord_Dweomer · · Score: 1
      "How do you paraphrase Microsoft Innovation ? Ans : "

      Apple?

      --
      Buy Steampunk Clothing Online!
  40. I have it installed by Inda · · Score: 1

    Summary
    It's been done by CanadaDave (544515) on Thu December 04, 9:20

    Microsoft Word had AutoSummarize in Word 97, or was it 2000? Anyhow it seems to be absent in Word XP.

    -----

    Fantastic bit of programming there, Bill.

    Not really the same thing Mr. Dave. :)

    --
    This post contains benzene, nitrosamines, formaldehyde and hydrogen cyanide.
  41. Newspeak, anyone? by Dopefish_1 · · Score: 1

    Don't bother with the Yacc code, Orwell already did the work for you. As I recall, eliminating synonyms was one of the primary goals of newspeak.

    --

    #include <sig.h>
  42. coreference by Anonymous Coward · · Score: 0

    nuff said

  43. But could it..... by MegaHamsterX · · Score: 2, Funny

    But could it understand Babelfish translations?

  44. The real Question is... by CrystalChronicles · · Score: 1

    when will this thing be ready to 'summarize' whole articles? I'm a senior next year. heh heh

  45. Better idea by richie2000 · · Score: 2, Insightful
    Significant achievements in this area could revolutionize the information searching field.

    Not to mention the increased ability to quickly spot "re-written" bought term papers.

    --
    Money for nothing, pix for free
    1. Re:Better idea by davew2040 · · Score: 1

      And if it's *too* good, then it'll realize that 99% of *legitimate* term papers are unwitting rehashes anyway...

  46. Interesting by Illserve · · Score: 4, Funny

    There's this algorithm called Latent Semantic Analysis which has been under development for quite some time (freely available!). It's quite good at comparing the semantic content of 2 bits of speech based on its database of many thousands of books (in fact you can specify the education level by choosing different databases).

    The output of LSA has been shown to be roughly equivalent to human scorers for examining summary essays produced in tests.

    Point is that, by combining this here paraphrasing algorithm with LSA, we can have computers summarizing text and other computers giving them grades on it. This takes students and teachers out of the equation entirely. Saves us big bucks and gets public education back on its feet!

    1. Re:Interesting by Illserve · · Score: 1

      silly Rabbit, messed up the link

      It's coloradO.edu

    2. Re:Interesting by Jack+Tanner · · Score: 1

      There's this algorithm called Latent Semantic Analysis which has been under development for quite some time (freely available!).

      LSA is freely available to those who have lotsa $$$. Even if you don't get the original Susan Dumais implementation (from Telcordia -- http://lsi.research.telcordia.com/), the algorithm itself is extremely patented.

    3. Re:Interesting by Anonymous Coward · · Score: 0

      So we can use all the money for public education to buy computers?

  47. SCO Analysis by richie2000 · · Score: 4, Funny
    I tried running this on all statements and press releases coming out of SCO and Darl McBride for the last six months and after a thorough semantic analysis, this is the resulting summary:

    "Pass me the crackpipe, man!"

    Proudly karma-whoring since the turn of the millennium

    --
    Money for nothing, pix for free
    1. Re:SCO Analysis by SurgeonGeneral · · Score: 1

      In all my days I've never actually seen anyone try to fake attach a sentence to their post and pretend it's part of their .sig

      guess there's a first time for everything... but why'd you do it?

      --
      -- "Man is born free, and everywhere he is in chains." Jean Jacques Rousseau
    2. Re:SCO Analysis by richie2000 · · Score: 1
      Dunno really. Came to think of it (since I felt like I was just karma-whoring with that comment, even if I don't need it) and thought it'd look cool like that.

      Like all other disasters, it seemed like a good idea at the time. :-)

      Patenting fake .sigs since the turn of the millennium

      --
      Money for nothing, pix for free
    3. Re:SCO Analysis by SurgeonGeneral · · Score: 1

      hehe! You are a funny guy =)

      --
      -- "Man is born free, and everywhere he is in chains." Jean Jacques Rousseau
  48. extracting & searching on memes by crovira · · Score: 1

    and deep contextual dependency.

    Neat trick if they can pull it off. Then Google results would really improve.

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  49. YaY! by MonkeyINAbaG · · Score: 1

    No more Google eBay ads selling us shit they really don't sell!
    They can finally target advertising directly into our brain!

  50. Yes. by Gordonjcp · · Score: 4, Funny

    An American friend of mine was terribly confused by the expression "Crash us a fag, mate".

    1. Re:Yes. by Anonymous Coward · · Score: 0

      There's a story of a Southern American traveling in Russia who at one point exclaims, "Well I'm just tickled pink."

      The translator thinks for a while, and comes up with, "Scratch me until I bleed."

    2. Re:Yes. by Anonymous Coward · · Score: 0

      I was more confused when a British acquaintance went out to have "a smoke and a fag" ... :)

    3. Re:Yes. by JoeBuck · · Score: 1

      Then there are the English women puzzled by the expressions they get from Americans when they say "Knock me up the next time you're in town".

    4. Re:Yes. by Gordonjcp · · Score: 1

      Well, "knocked up" is a common British colloquialism for "pregnant", usually referring to an unwanted pregnancy. Fuck knows what it means to Americans.

  51. I can see it now by castlec · · Score: 1

    Tie this little puppy into a speech recognition system then jack your DVD jukebox into your computer. Now you have suddenly obtained the ability to search all your pr0n for the exact type of scene you like, or your woman likes, to see. Identify, "I like X"; "Give it to me from X"; "You like that X, don't you?"; Seems like a perfect application to me.

    --
    When I tell an object to delete this, am I killing it or telling it to kill me?
  52. Versification ! by nonos · · Score: 1


    A possible application :

    We could feed a technical report to the computer and the output will be pure poetry, imagine Slashdot in verses !

    1. Re:Versification ! by Short+Circuit · · Score: 1

      Ballmer's a Microsoft Man
      He plugs them whenever he can

      But with Linux about
      He started to shout
      And his mouth spouted much "sleight-of-hand"

  53. LOLITA? by spongman · · Score: 2, Interesting

    Can anyone else shed any light on how far the LOLITA project (under Roberto Garigliano) got at Durham University? Yeah, it's a research project, but last I heard (10 years ago) it was able to parse complete texts (for example, newspaper articles) and answer simple questions based on them. I believe there was also work underway to make it understand/'speak' Chinese/Russian. There was also supposed to be some kind of 'script' support which would give it contextual information about certain situations (the common example was what contextual knowledge you need when you go into a restaurant, and how that knowledge can help you understand what is said there).

    1. Re:LOLITA? by Anonymous Coward · · Score: 1, Funny

      As you've said, 10 years have passed, so they had to rename the project to BARELYLEGAL

    2. Re:LOLITA? by QwkHyenA · · Score: 1
      Good job there bud. You just reminded 100k geeks they need to check out the Alt.Binaries tonight. And my download rate was just starting to creep up...

      --
      LFS. Have you built your system today?
  54. Re:This is a Pirst Frost! by Anonymous Coward · · Score: 0

    yuo are on TEH SPOKE!!!!!!11

    paraphrased to

    In SOVIET RUSSIA, #other people's messages posted before your own avoid simply duplicating what has already been said by YOU

  55. Spamfilter by Goodbyte · · Score: 3, Interesting

    Shouldn't this make it possible to improve spam filters?

  56. Link for the Reuters Corpus stories. by openmtl · · Score: 1, Informative

    A lot of Reuters stories are available for research purposes as a set corpus. See http://about.reuters.com/researchandstandards/corpus/ for details on this. Perfect and designed for just this sort of work. Also BT a few years back was working on a summariser called Prosum. Don't know what happened to that in the .don churn.

    --

  57. One good use by JayJay.br · · Score: 1

    Actually, this might be a Good Thing for e-learning projects. One great challenge for e-learning is to give precise evaluation automatically. With this, a teacher could write his own essay and machines could evaluate students' essays, taking the teacher's as reference.

    Neat stuff. And the paper is really well written, IMHO. The "story" doesn't say enough.

    1. Re:One good use by timjdot · · Score: 1

      Also the problem is inaccuracy. Inaccuracy is fine for most professions but there are some in which it causes serious problems. As we move to using computers that categorize inaccurately then we will lose creativity. This is fine, we are losing human cultural independence even more rapidly than species; so, clearly we are headed to a monopoly of knowledge in which non-conformant thinking is not existable (as computers will omit or mis-classify it out of existence). Already the Internet has become a set of "top pages" as seen from the results of search engines. Knowledge breadth has been greatly lost as 99% of the pages will not show up in the first page of hits. Typically, I suspect, the 99.99% that have something different to say about the subject being searched. Not to mention that many English speakers do not mean what they say semantically. A great example was quoted in one of the books of NLP I read and the irony was the sentence was ill-formed grammatically but being used as an example of how hard semantic analysis is as if the sentence were correct English.

      --
      Expect Freedom.
  58. Good luck... they tried to do this 50 years ago by Anonymous Coward · · Score: 0

    See Wittgenstein for a related concept... and why it probably won't work. People have tried defining language logically for a long time... the semantics of it never work. Ultimately you can't use language to "completely" describe itself.

  59. One small step... by drskrud · · Score: 1

    ...towards a Natural Language Compiler?

  60. Already been done by jez9999 · · Score: 1

    Babelfish already does this.

  61. Re:First use of this technology = plagiarism by Tarlbot · · Score: 1

    The first use I thought of was using this software to paraphrase an assignment so you could more easily pass it in as your own work, and it would be more difficult to prove that it was the same as a previous work since it had different words. Possibly the software has some sort of paraphrase signature that would make this possible to detect?

  62. Douglas Adams by squaretorus · · Score: 2, Funny

    Another area in which the world is poorer for the lack of a Douglas Adams wandering (or more likely flying first class) around it.

    I would have LOVED to see him tackle a 'text message adventure' along the lines of the old Infocom classics. He has written a number of pieces (some of which are collected in The Salmon of Doubt) about how much he enjoyed this marriage of writing and computing. The flexibility and restrictions of the medium would have led to something pretty neat I'm guessing. Of course - then he'd have pissed another 10 years down the drain discussing making it into a movie with Disney!

    Damn I want to swap to another parallel universe sometimes. One where Adams did EVERYTHING we think he'd have been good at, and where Britney Spears lives next door and cooks me pastries for breakfast on Sundays!

    1. Re:Douglas Adams by GigsVT · · Score: 1

      where Britney Spears lives next door and cooks me pastries for breakfast on Sundays!

      She doesn't do that for you?

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    2. Re:Douglas Adams by Anonymous Coward · · Score: 0

      Um... did you ever hear of the game Starship Titanic?

    3. Re:Douglas Adams by thatguywhoiam · · Score: 1
      I would have LOVED to see him tackle a 'text message adventure' along the lines of the old Infocom classics.

      He did - a game called Starship Titanic was written by Adams, in conjunction with a game developer (Simon & Schuster? can't remember...)

      It combined a text adventure interface with some nice 3D graphics that would move around above the text box, in a Mystian sort of way. The game itself was very funny, had some beautiful designs and ideas, and was almost totally impossible. In other words it was par for the course for ol' Douglas.

      --
      If Jesus wants me it knows where to find me.
    4. Re:Douglas Adams by Frogg · · Score: 1

      I believe there was a Hitchhiker's Guide done by Infocom, but I don't know if Adams was involved in its production.

      I remember enjoying the game quite a lot, but I was young at the time.

    5. Re:Douglas Adams by iNetRunner · · Score: 1

      You mean something like this: http://www.douglasadams.com/creations/infocomjava.html

      --
      Store with salt
    6. Re:Douglas Adams by IMSoP · · Score: 1

      For completeness, Douglas Adams wrote to my knowledge three complete computer games:
      1) an Infocom interactive version of "The Hitch-Hiker's Guide to the Galaxy" (regarded, I believe, as one of the most challenging, but rewarding, works of interactive fiction);
      2) another Infocom game, entitled "Bureaucracy", also highly regarded in the Interactive Fiction community;
      I believe both of these are available from the interactive fiction archive along with software to play them on just about anything.

      and 3) Starship Titanic, which was a kind of exploration of what you could do with CD-ROMs: a mixture of interactive fiction, lush 3D graphics, and a lot of sound files of all the things talking to you...

    7. Re:Douglas Adams by Anonymous Coward · · Score: 0
      Damn, I want to swap to another parallel universe sometimes. One where Adams did EVERYTHING we think he'd have been good at, and where Britney Spears lives next door and cooks me pastries for breakfast on Sundays!
      With my luck, I'd get a universe where Douglas Adams died prematurely, we'd have Bush and Ashcroft at the helm and Britney Spears at the top of the charts.

      Oh, wait.
    8. Re:Douglas Adams by STrinity · · Score: 1

      1) an Infocom interactive version of "The Hitch-Hiker's Guide to the Galaxy" (regarded, I believe, as one of the most challenging, but rewarding, works of interactive fiction)

      The main reason it was so challenging was that it wasn't free roaming like the Zork series: instead of being able to go in any direction you like and backtrack as needed, the game presented you with a series of rooms, each containing a discrete puzzle that had to be solved before you could proceed. On top of that, some of the puzzles required seemingly inconsequential items from previous levels, and if you forgot them, you'd die. For example, if you didn't take the mail from your mat at the beginning of the game, you couldn't get a Babel fish on the Vogon ship. A minor mistake on one level would kill you ten levels later.

      --
      Les Miserables Volume 1 now up with my reading of
    9. Re:Douglas Adams by squaretorus · · Score: 1

      I've played Starship Titanic - excellent piece of kit! I meant 'text message adventure' as in text message to your mobile phone!

  63. I'm saved! by Short+Circuit · · Score: 1

    Now I can get my 2000 word English research paper up to the required 3000 words, and have the required unreadability!

  64. old technology by Grimwiz · · Score: 1

    I used to work at a company called Uniplex. They bought technology that could precis English text. One of their examples was cutting down "Alice in Wonderland" to 10% of its original length. It weighted words according to some magic algorithm that tried to retain the most important phrases.

    Whilst the resulting document was a bit odd, you could certainly use it to remind yourself about the story.
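
    The "magic algorithm" isn't described here, so the following is only a rough sketch of one common way to precis text by weighting words: score each sentence by how frequent its words are in the whole document and keep the top-scoring fraction. The stop-word list, the naive sentence splitter and the function names are all assumptions made for illustration, not the Uniplex technology.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was",
              "is", "that", "on", "at", "she", "he", "her"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lower-case words with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def precis(text, ratio=0.1):
    """Keep roughly `ratio` of the sentences, chosen by average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * ratio))
    best = sorted(range(len(sentences)),
                  key=lambda i: score(sentences[i]), reverse=True)[:keep]
    # Emit the kept sentences in their original order so the story still flows.
    return " ".join(sentences[i] for i in sorted(best))
```

    Calling precis(text, ratio=0.1) on a long text keeps about a tenth of its sentences, which is in the spirit of the "10% of its original length" example above.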

    --
    -- Don't believe everything you read, hear or think
  65. Dictionary by PingPongBoy · · Score: 1

    If I was paraphrasing a passage I don't understand, I would need a dictionary and grammar rules. If the grammar was normal or normalized, I would still need the dictionary.

    So, what would a dictionary for a computer look like? How can basic concepts be defined for computer understanding?

    Would it look perhaps like a Prolog program?

    --
    Know your pads. One time pad: good for cryptography. Two timing pad: where to take your mistress.
  66. Advances in Automatic Text Summarization by fingal · · Score: 4, Informative

    If anyone is interested in the history of this field then I would highly recommend the book with the above title, edited by Inderjeet Mani and Mark T. Maybury (it's on Amazon). Lots of very interesting articles, including discourse trees and a brief bit of stuff about summarising non-textual assets such as diagrams, video streams, etc.

    --

    The only Good System is a Sound System

  67. um by Anonymous Coward · · Score: 0
    Cornell University researchers are making progress in paraphrasing and "understanding" complete sentences

    maybe they should work on this first before building the app.

  68. Schoolkids by Azghoul · · Score: 2, Interesting

    My guess is any slick technology set up with this will let plagiarism run rampant.

    Google translator already let my sister-in-law "cheat" on a German paper, but the translation was "too good" so she got caught. Paraphrasing that's excellent (obviously would take a while, but what the hell, we can play Apple II games on a Palm not 20 years later....) could be real messy.

  69. Wow... by davew2040 · · Score: 1

    With technology like this, we could probably compress the Internet into about 200 or so unique sites!

    We might even arrive conclusively at the twenty or so keywords that comprise 99% of Slashdot posts. Oh heck, I'll even give it a partial head start: "Linux, Linus, MPAA, RIAA, SCO, RTFA, Gates, Lucas, outrage, Rings, Rockets, RMS"

  70. What it'll actually be used for: by domovoi · · Score: 1
    "Significant achievements in this area could revolutionize the information searching field..."

    Significant achievements in this area will revolutionize the lazy plagiarist field.

    1. Re:What it'll actually be used for: by Retired+Replicant · · Score: 1

      That is the first thing I thought of. Another way for college students to cheat...just what the world needs. On the other hand, if the machines can actually start understanding stuff, we'll all be obsolete and irrelevant anyway.

  71. Language-use is dynamic, language is not by eyenot · · Score: 0

    One will be required to think and to phrase oneself like Ray Romano or Paddington Bear in order for software to fully 'understand', and for one to understand the software's response. Which sucks. Why bother trying? Are we really up to seeing if 'language rule updates' can keep up with changes in actual language? Or will we find that language stagnates just because somebody makes a dictionary vocally conversational?

    --
    "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
    1. Re:Language-use is dynamic, language is not by eyenot · · Score: 0

      by 'actual language' i meant language-use. contextually this is obvious since i was contrasting it to the idea of 'language rules' which i portrayed as obviously stagnant, illustrated by analogy of 'updates'. wow, it would suck having to explain contexts like this. cornell must suck.

      --
      "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  72. meme generation by eyenot · · Score: 0

    it not ain't if i say 'boo-ya' up to the shizzo!! phooonkeee-pbbbt! mess with it.

    --
    "Stratigraphically the origin of agriculture and thermonuclear destruction will appear essentially simultaneous" -- Lee
  73. Call Infocom! by Hoi+Polloi · · Score: 2, Interesting

    Just think of the ramifications this will have for Zork. Now I'll be able to say "Will you just open the damn egg?"

    --
    It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
  74. For the lazy, or interested, a summary via OS X! by 2nd+Post! · · Score: 4, Informative
    Set on the lowest setting, a summary of the article is:

    The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.

    At a roughly 10% size:

    The researchers used gene comparison techniques to identify word patterns from different news sources that described the same event.

    The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.

    ...When two reporters describe the same news event, for instance, they may use different details, but they tend to report about the same basic facts, said Barzilay.

    ...you have genes which started from the same kind of seed, and then they change during evolution [but] there is some similarity," said Barzilay.

    ...Given a sentence to paraphrase, the system finds the closest match among one set of lattices, then uses the matching lattice from the second source to fill in the argument values of the original sentence to create paraphrases.

    At a quarter size:

    The researchers used gene comparison techniques to identify word patterns from different news sources that described the same event.

    The method could eventually allow computers to more easily process natural language, produce paraphrases that could be used in machine translation, and help people who have trouble reading certain types of sentences.

    ...When two reporters describe the same news event, for instance, they may use different details, but they tend to report about the same basic facts, said Barzilay.

    ...Second, to sort out sentence similarities, the researchers borrowed techniques from computational biology that determine how closely related organisms are by finding similarities among genes.... you have genes which started from the same kind of seed, and then they change during evolution [but] there is some similarity," said Barzilay.

    ...Lattices are made up of words or parallel sets of words that occur across several examples, and arguments, or slots, where names, dates or number of people hurt or killed occur.

    ...One pattern, or lattice, read: Palestinian suicide bomber blew himself up in NAME on DATE killing NUMBER (other) people and injuring/wounding NUMBER.

    ...Given a sentence to paraphrase, the system finds the closest match among one set of lattices, then uses the matching lattice from the second source to fill in the argument values of the original sentence to create paraphrases.

    ...The researchers' ultimate goal is to use the system to allow computers to be able to paraphrase like humans, and to understand paraphrases, "but that's very far [off]", said Barzilay.

    ...Barzilay's previous work, which used a different technique to paraphrase at the level of words and phrases rather than sentences, is part of the Columbia News Blaster project, which summarizes news stories.

    ...The researchers' system has the potential to accomplish the same thing by taking one human translation and creating 10 paraphrases of it automatically, she said.

    ...The system could be used to produce paraphrases based on a specific model, for example, for aphasic readers, who find it difficult to read certain types of phrases, she said.

    ...For example, the system learned incorrectly that "Palestinian suicide bomber" and "suicide bomber" were the same, and that "killing 20 people" is the same as "killing 20 Israelis", said Barzilay.
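
    To make the lattice step in the summary a bit more concrete, here is a toy sketch: match a sentence against a pattern from one source, pull out the slot values, and pour them into the paired pattern from the other source. The pattern pair, the slot names and the function names are hypothetical, and an exact regex match stands in for the "closest match" search the researchers describe, so treat it as an illustration of slot filling only, not their system.

```python
import re

SLOTS = ("NAME", "DATE", "NUMBER")

# Hypothetical paired patterns: the same kind of event worded two ways.
# Each slot appears once per pattern to keep the regex simple.
LATTICE_PAIRS = [
    (
        "a suicide bomber blew himself up in NAME on DATE killing NUMBER people",
        "NUMBER people were killed on DATE when a suicide bomber struck in NAME",
    ),
]


def template_to_regex(template):
    """Turn a pattern into a regex: slots become named groups, words are literal."""
    parts = []
    for token in template.split():
        if token in SLOTS:
            parts.append(rf"(?P<{token}>.+?)")
        else:
            parts.append(re.escape(token))
    return re.compile(r"\s+".join(parts) + r"\.?$", re.IGNORECASE)


def paraphrase(sentence):
    """Return a paraphrase built from the paired pattern, or None if nothing matches."""
    for source, target in LATTICE_PAIRS:
        match = template_to_regex(source).match(sentence.strip())
        if match:
            # Replace every slot name in the target pattern with its captured value.
            filled = re.sub(r"\b(NAME|DATE|NUMBER)\b",
                            lambda m: match.group(m.group(1)), target)
            return filled[0].upper() + filled[1:] + "."
    return None
```

    A real system would learn the patterns from aligned news text and score near matches instead of demanding an exact one, which is also where errors like the ones Barzilay mentions can creep in.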

  75. Online Machinese syntax parser by Jugalator · · Score: 1

    Here's a site demo'ing the Machinese syntax parser. It can build parse trees for sentences you type in where the components in the sentence are separated and related to each other.

    http://www.connexor.com/demos/syntax_en.html

    --
    Beware: In C++, your friends can see your privates!
  76. How I do this in my product by MarkWatson · · Score: 3, Interesting
    I use a fairly effective algorithm to do this in my product:

    I first classify the text into a category, then weight every word in the text based on how much it contributed to this classification. I then output, as a "summary", the one or two sentences in the original text that contribute most to the classification of the entire text.

    Not really summarization, but useful.

    -Mark
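
    For anyone curious, the approach described above can be sketched roughly as follows. This is not the product's code: the categories, the word weights and the function names are invented for illustration, and a real classifier would learn its weights from training data.

```python
import re

# Hypothetical per-category word weights (assumed, not from any real product).
CATEGORY_WEIGHTS = {
    "sports": {"team": 2.0, "score": 1.5, "game": 1.5, "coach": 1.0},
    "finance": {"stock": 2.0, "market": 1.5, "shares": 1.5, "bank": 1.0},
}


def words(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Pick the category whose weighted words add up to the highest total."""
    totals = {
        category: sum(weights.get(tok, 0.0) for tok in words(text))
        for category, weights in CATEGORY_WEIGHTS.items()
    }
    return max(totals, key=totals.get)


def summarize(text, max_sentences=2):
    """Return the category plus the sentences that contributed most to it."""
    category = classify(text)
    weights = CATEGORY_WEIGHTS[category]
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Score each sentence by the weight its words carry for the chosen category.
    scored = [(sum(weights.get(tok, 0.0) for tok in words(s)), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:max_sentences]
    # Report the chosen sentences in document order.
    return category, [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

    The output is a selection of existing sentences, not newly written text, which matches the "not really summarization" caveat above.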

  77. Another site by flicken · · Score: 1
    Here's a web site about it, and I'm sure there are many more.

    Here is another website about a similar idea, Universal Networking Language (UNL).

    --
    20 mil and I will! Learn Esperanto with 20M others.
  78. Plagiarism by gassendi · · Score: 1

    If this works, it would make catching plagiarists almost impossible.

    1) Google the paper topic
    2) Cut'n'paste
    3) Run it through the Cornell application
    4) Turn it in
    5) Collect the grades

    Doubtless Cornell University researchers are already modifying this to create software for catching plagiarists. A bit like IronPort buying SpamCop?

  79. Re:For the lazy, or interested, a summary via OS X by marciot · · Score: 1

    This post piqued my interest. I don't own a Mac so now I'm curious about how this thing works. I wasn't even aware that there was such a thing as a summarizing algorithm. How does it work? I did a search for "Summary" "OS X" on google and I got no interesting leads. Can anyone give me some pointers to places where I could either play with a summarizing program (maybe a web based one) or learn more about how it works?

  80. Re:For the lazy, or interested, a summary via OS X by bill_mcgonigle · · Score: 1

    Get some text on your screen with a Cocoa app. Say, this post with Safari.

    Select the text.

    Choose from the Application (e.g. Safari) menu in the menubar Services...Summarize.

    The Summary tool pops up. Hooray! The sad part is they demoed it at MacWorld Boston '97, and released it in Jaguar, IIRC.

    --
    My God, it's Full of Source!
    OUTSIDE_IP=$(dig +short my.ip @outsideip.net)
  81. Translations by z_gringo · · Score: 1

    Depending on how that develops, it will have a great impact on translation software.

    Imagine using a computer to translate from one language to another, and ending up with a grammatically correct result. That would be amazing.

    --
    -- -- Warning. Do not stare directly at the sun.
    1. Re:Translations by rupert2000 · · Score: 1

      The trick is to keep Japanese phrases from translating to: "All your base are belong to us"

  82. Automated Essay Scoring is amazing by Anonymous Coward · · Score: 0

    I think it's a revolution in psychometrics and psychological assessment that's already here and waiting to expand exponentially in use.

    I do research on psychological assessment for a living, and it's amazing to me all the applications NLP has for psychological assessment.

    Already, as you say, existing algorithms and software can produce scoring systems that have the same validity as the average rating of a group of human raters. In most cases, the predictive validity of an AES system exceeds that of a single human rater. That is, the scores assigned by NLP scoring algorithms are more valid, in the sense of better predicting other criteria, than the scores assigned by a single human rater.

    I don't do educational assessment, I work on clinical assessment, and so far I've seen nothing done with NLP. It has the potential to really cause a revolution in clinical psychological assessment.

    For one, it has the potential to lead to a renaissance in use of "projective" tests, which have fallen out of use for good reason. It also has the potential for standardized scoring of clinical interview responses, which is amazing to me.

    Absolutely amazing stuff.

  83. Re:For the lazy, or interested, a summary via OS X by 2nd+Post! · · Score: 1

    Describes it a little, since it's written with Apple's Summarize Service.

    I think Apple uses the service internally in their file indexing and search feature, too!

  84. What About the Summarize Service in OSX? by Smurfboy · · Score: 1


    It does a bit of what this article describes, works wonders, has been around for quite some time, and it's amazingly accurate; check it out: highlight any text in OSX, hit the application menu, then Services and Summarize. Simple!

    k.h.

    --
    k.h.
  85. Automated Plagiarism by SpaceShaver · · Score: 1

    Think what a boon this will be for students and reporters who don't want to do their own work. You find the article containing the target subject, plug in the style you want it paraphrased into and let it crank.

    Stealing from one person is called plagiarism. Stealing from many is called research.

  86. Did they know... by Anonymous Coward · · Score: 0

    ...that newswires have cross-licensing arrangements? So AFP might well be able to take a Reuters feed (and vice versa) and minimally rewrite it. I'm not sure that they do, but that sort of thing is pretty common. For instance, Reuters tend to specialise in finance while AFP are more a media service, so AFP might source some of their finance news from Reuters.

    That'd rather throw out the assumptions behind this research, wouldn't it?

  87. The Real Challenge... by rupert2000 · · Score: 1

    Company representatives said quote, "The real challenge was finding a software developer that hadn't slept through English class and knew how to diagram sentences."

  88. A test case for them... by TheTranceFan · · Score: 1
    Here's a test case transcription from a (purported) real human, Donald Rumsfeld:

    "Reports that say something hasn't happened are interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know."

    If the software can summarize that for me, I'm all ears :-)

  89. Re:Second use of this technology by Thing+1 · · Score: 1

    Or how about removing redundant comments?

    --
    I feel fantastic, and I'm still alive.