Paraphrasing Sentences With Software
prostoalex writes "Cornell University researchers are making progress in paraphrasing and "understanding" complete sentences in a software application. Analyzing sentences on the semantic level allows the software application to treat two sentences, expressing similar thoughts and ideas, but written in a different manner, as a single semantic unit. Significant achievements in this area could revolutionize the information searching field."
I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.
There is a mailbox here.
C. Griffin
"Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
Will this get rid of the 10 people who get +5 informative from stealing the link out of the comment a few spots up?
So would this allow something like Google to pick up a phrase and relate it to the results, instead of just picking up keywords?
One way I can think of to use this technology is to improve search engines: instead of looking for exactly the same words, they could look for similar sentences, giving more accurate results.
However, after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages.
The IT section color scheme sucks.
I was too lazy to read the article, so I used the Summarize feature in OS X to pare the sentences down, since it seems a bit wordy.
Okay, maybe I exaggerate a bit here. I did read the article, and while the Summarize feature isn't that far off from what these guys are doing...
Burn Hollywood Burn
I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.
80% sounds a bit high. Did you make it up, or is there a source for it?
I doubt that any system designed to deal with idioms would be programmed with every idiom. More likely, they would take a huge corpus of text and do tons of statistical manipulations to it, such that idioms would be roughly equivalent to non-idiomatic phrases expressing the same concept.
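A rough sketch of the statistical idea in the comment above: phrases that show up in the same contexts end up with similar co-occurrence vectors, so an idiom can land near a plain phrase without anyone listing idioms by hand. Everything here is an assumption made for illustration: the six-sentence toy corpus, the idiom pre-joined into the single token `kicked_the_bucket`, and the choice of cosine similarity over raw counts. A real system would need phrase detection and a far larger corpus, and nothing here is claimed to be what the Cornell researchers do.

```python
import math
from collections import Counter

# Hypothetical toy corpus. The idiom is pre-joined into one token so it can
# be counted like any other word (a simplifying assumption).
CORPUS = [
    "the old king kicked_the_bucket last winter",
    "the old king died last winter",
    "her grandfather kicked_the_bucket in hospital",
    "her grandfather died in hospital",
    "the old king laughed at the joke",
    "her grandfather laughed at the film",
]


def context_vector(target, sentences):
    """Count the words that share a sentence with `target`."""
    vec = Counter()
    for sentence in sentences:
        words = sentence.split()
        if target in words:
            vec.update(w for w in words if w != target)
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


idiom = context_vector("kicked_the_bucket", CORPUS)
sim_died = cosine(idiom, context_vector("died", CORPUS))
sim_laughed = cosine(idiom, context_vector("laughed", CORPUS))

print(f"idiom vs 'died':    {sim_died:.3f}")
print(f"idiom vs 'laughed': {sim_laughed:.3f}")
```

On this toy data the idiom scores closer to "died" than to "laughed" purely because of the contexts it appears in, which is the effect the comment is guessing at.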
Maybe prostoalex could learn something from the Cornell researchers! How about this for an article summary, eh?
Cornell University researchers could revolutionize the information searching field by analyzing sentences on the semantic level to allow a software application to treat two sentences, expressing similar thoughts and ideas but written in a different manner, as a single semantic unit.
I looked again and whaddayaknow? I asked the paperclip about auto summarize and it is still there in the Tools menu after all! Looks like I don't have that feature installed though.
They didn't like people using some of the odder Unicode characters to do page widening tricks, and stuff. It's a shame, because some of these extra characters were quite pretty.
I guess you could try using Esperanto or Lojban as your intermediary language. Lojban in particular is computer parseable *and* human understandable, so it would probably be the easiest to write translations for.
Karma: It's all a bunch of tree-huggin' hippy crap!
Can anyone else shed any light on how far the LOLITA project (under Roberto Garigliano) got at Durham University? Yeah, it's a research project, but last I heard (10 years ago) it was able to parse complete texts (for example, newspaper articles) and answer simple questions based on them. I believe there was also work underway to make it understand/'speak' Chinese/Russian. There was also supposed to be some kind of 'script' support which would give it contextual information about certain situations (the common example was what contextual knowledge you need when you go into a restaurant, and how that knowledge can help you understand what is said there).
Shouldn't this make it possible to improve spam filters?
Yes, but strcmp can say two strings are identical, yet they can convey different information. Big-endian vs. little-endian, anyone?
Binary identity does not imply semantic equivalence. It all depends on how the data is interpreted.
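A minimal sketch of that point in Python, assuming a four-byte payload read as an unsigned 32-bit integer. The helper `decode_u32` and the payload values are made up for the example; `int.from_bytes` is the standard-library call doing the work.

```python
def decode_u32(raw: bytes, byte_order: str) -> int:
    """Interpret four bytes as an unsigned integer in the given byte order."""
    return int.from_bytes(raw, byteorder=byte_order, signed=False)


# Two payloads that are byte-for-byte identical (no NUL bytes, so a C
# strcmp over them would also report equality).
payload_a = bytes([0x01, 0x02, 0x03, 0x04])
payload_b = bytes([0x01, 0x02, 0x03, 0x04])

same_bytes = payload_a == payload_b

# Hypothetical scenario: sender A wrote big-endian, sender B little-endian.
value_a = decode_u32(payload_a, "big")
value_b = decode_u32(payload_b, "little")

print(same_bytes)        # True
print(value_a, value_b)  # 16909060 67305985
```

The comparison only says the bytes match. Which number they stand for depends on the convention the reader applies.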
My guess is any slick technology set up with this will let plagiarism run rampant.
Google translator already let my sister-in-law "cheat" on a German paper, but the translation was "too good" so she got caught. Paraphrasing that's excellent (obviously would take a while, but what the hell, we can play Apple II games on a Palm not 20 years later....) could be real messy.
Just think of the ramifications this will have for Zork. Now I'll be able to say "Will you just open the damn egg?"
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
I first classify the text into a category, then weight every word in the text based on how much it contributed to this classification. I then output, as a "summary", the one or two sentences in the original text that contribute most to the classification of the entire text.
Not really summarization, but useful.
-Mark
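The classify-then-pick-sentences approach described in the last comment can be sketched roughly as follows. The comment does not say which classifier or weighting is used, so this swaps in a tiny Laplace-smoothed naive Bayes model and uses the log-likelihood ratio between the winning category and its best rival as the word weight. The training data, function names, and sample text are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: category -> example documents.
TRAINING = {
    "sports": [
        "the team won the match with a late goal",
        "the striker scored a goal for the team",
    ],
    "tech": [
        "the software parses each sentence",
        "the search engine ranks each page with software",
    ],
}


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(training):
    """Per-category word counts plus the shared vocabulary."""
    counts = {
        cat: Counter(w for doc in docs for w in tokenize(doc))
        for cat, docs in training.items()
    }
    vocab = set().union(*counts.values())
    return counts, vocab


def log_prob(word, cat, counts, vocab):
    """Laplace-smoothed log P(word | category)."""
    total = sum(counts[cat].values())
    return math.log((counts[cat][word] + 1) / (total + len(vocab)))


def classify(text, counts, vocab):
    scores = {
        cat: sum(log_prob(w, cat, counts, vocab) for w in tokenize(text))
        for cat in counts
    }
    return max(scores, key=scores.get)


def word_weight(word, winner, counts, vocab):
    """How much `word` pushed the decision toward `winner` (needs >= 2 categories)."""
    rivals = [log_prob(word, c, counts, vocab) for c in counts if c != winner]
    return log_prob(word, winner, counts, vocab) - max(rivals)


def summarize(text, counts, vocab, n=1):
    """Return the category and the n sentences that contributed most to it."""
    winner = classify(text, counts, vocab)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = [
        (sum(word_weight(w, winner, counts, vocab) for w in tokenize(s)), s)
        for s in sentences
    ]
    top = {s for _, s in sorted(scored, key=lambda p: p[0], reverse=True)[:n]}
    return winner, [s for s in sentences if s in top]  # keep original order


TEXT = (
    "The weather was mild on Saturday. "
    "The team scored a late goal and won the match. "
    "Fans went home afterwards."
)

counts, vocab = train(TRAINING)
category, summary = summarize(TEXT, counts, vocab, n=1)
print(category)
print(summary)
```

Summing the weights favours long sentences. Dividing by sentence length would be one way to even that out, at the cost of favouring short sentences that happen to contain a single strong word.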