Paraphrasing Sentences With Software

Posted by ryuzaki0 on Wednesday December 3, 2003 @09:01PM from the university-paraphrase-progress dept.

prostoalex writes "Cornell University researchers are making progress in paraphrasing and "understanding" complete sentences in a software application. Analyzing sentences on the semantic level allows the software application to treat two sentences, expressing similar thoughts and ideas, but written in a different manner, as a single semantic unit. Significant achievements in this area could revolutionize the information searching field."

This reminds me of the Infocom classics by chewtoy-11 · 2003-12-03 21:06 · Score: 5, Interesting

I always loved the text adventure games by Infocom. They were way ahead of their time, and I have been truly amazed on several occasions by the software's ability to 'understand' what I was asking it to do. Of course I'm sure this is leaps and bounds beyond what was available back then, but it's truly amazing how far ahead of their time they actually were.

There is a mailbox here.

--
C. Griffin
"Can I keep his head for a souvenir?" --Max from Sam 'N Max Freelance Police
comments? by mutagenman · 2003-12-03 21:06 · Score: 1, Interesting

Will this get rid of the 10 people who get +5 Informative for stealing the link out of the comment a few spots up?
google? by Anonymous Coward · 2003-12-03 21:07 · Score: 4, Interesting

so would this allow something like google to pick up a phrase and relate it to the results instead of just picking up keywords?
1. Re:google? by millette · 2003-12-03 22:25 · Score: 2, Interesting
  
  Actually, google already does this a little. If I can find an example, I'll reply again. Excite, the old search engine, used to pick out synonyms (well, that's how I heard it explained once) by comparing pages and related content.
2. Re:google? by Frogg · 2003-12-04 02:42 · Score: 2, Interesting
  
  ..also worth noting that Google have recently introduced a very powerful implementation of word stemming. (Yup, this is separate from the synonyms, but is still related)
  
  It's enabled by default - if you want exact match words (like it was a month ago) you need to search for: +keyword
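
A minimal sketch of the stemming idea, assuming a naive suffix-stripping rule. This is not Google's actual implementation; `naive_stem`, `matches`, and the suffix list are hypothetical names made up for illustration. The `exact` flag stands in for the `+keyword` behaviour described above.

```python
# Hypothetical suffix list; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def matches(query_word, document_word, exact=False):
    """Compare two words by stem, or literally when exact=True."""
    if exact:
        return query_word.lower() == document_word.lower()
    return naive_stem(query_word) == naive_stem(document_word)

print(matches("search", "searching"))              # stems are compared
print(matches("search", "searching", exact=True))  # literal comparison only
```

With stemming on, "search" and "searching" count as the same term; with the exact flag they do not.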
how it can be useful by Dreadlord · 2003-12-03 21:07 · Score: 4, Interesting

One of the ways I can think of to use this technology is to improve search engine capabilities: instead of looking for exactly the same words, search engines could then look for similar sentences, giving more accurate results.
However, after reading the article, I wonder whether the research can be applied to Latin languages, as they did the research on semantic languages.

--
The IT section color scheme sucks.
Hrm by Auckerman · 2003-12-03 21:09 · Score: 3, Interesting

I was too lazy to read the article so I used the Summarize feature in OS X to parse the sentences down since it seems a bit wordy.

Okay, maybe I exaggerate a bit here, I did read the article and while the summarize isn't that far off from what these guys are doing...

--

Burn Hollywood Burn
Google News? by cryptor3 · 2003-12-03 21:09 · Score: 4, Interesting

I'm curious as to whether Google News, since it draws from various news sources and groups articles by topic (similar to paraphrasing, perhaps), uses any of the same techniques.
Re:The problem is... by Anonymous Coward · 2003-12-03 21:12 · Score: 1, Interesting

80% sounds a bit high. Did you make it up, or is there a source for it?

I doubt that any system designed to deal with idioms would be programmed with every idiom. More likely, they would take a huge corpus of text and do tons of statistical manipulations to it, such that idioms would be roughly equivalent to non-idiomatic phrases expressing the same concept.
Paraphrased version by Anonymous Coward · 2003-12-03 21:14 · Score: 1, Interesting

Maybe prostoalex could learn something from the Cornell researchers! How about this for an article summary, eh?

Cornell University researchers could revolutionize the information searching field by analyzing sentences on the semantic level to allow a software application to treat two sentences, expressing similar thoughts and ideas but written in a different manner, as a single semantic unit.
It's been done by CanadaDave · 2003-12-03 21:20 · Score: 2, Interesting

Microsoft Word had AutoSummarize in Word 97, or was it 2000? Anyhow it seems to be absent in Word XP. It was the trashiest thing I'd ever seen. Actually I used to use it all the time to write my abstract. It provided a nice way for me to remember everything I talked about in my report, and I think it made an effort to use keywords which came up a lot in the report. But sometimes it did things which made no sense at all. Too bad Microsoft wasn't Open Source, their AutoSummarize feature might actually be half decent by the year 2003, but instead they abandoned it to work on other projects I guess.
I looked again and whaddayaknow? I asked the paperclip about AutoSummarize and it is still there in the Tools menu after all! Looks like I don't have that feature installed though.
Re:Fascinating read by Anonymous Coward · 2003-12-03 22:05 · Score: 1, Interesting

They didn't like people using some of the odder Unicode characters to do page widening tricks, and stuff. It's a shame, because some of these extra characters were quite pretty.
Re:Fascinating read by Trejkaz · 2003-12-03 23:07 · Score: 3, Interesting

I guess you could try using Esperanto or Lojban as your intermediary language. Lojban in particular is computer parseable *and* human understandable, so it would probably be the easiest to write translations for.

--
Karma: It's all a bunch of tree-huggin' hippy crap!
LOLITA? by spongman · 2003-12-03 23:47 · Score: 2, Interesting

Can anyone else shed any light on how far the LOLITA project (under Roberto Garigliano) got at Durham University? Yeah, it's a research project, but last I heard (10 years ago) it was able to parse complete texts (for example, newspaper articles) and answer simple questions based on them. I believe there was also work underway to make it understand/'speak' Chinese/Russian. There was also supposed to be some kind of 'script' support which would give it contextual information about certain situations (the common example was what contextual knowledge you need when you go into a restaurant and how that knowledge can help you understand what is said there).
Spamfilter by Goodbyte · 2003-12-04 00:02 · Score: 3, Interesting

Shouldn't this make it possible to improve spam filters?
Re:My take on this by ideonode · 2003-12-04 01:00 · Score: 2, Interesting

Yes, but strcmp can say two strings are identical, yet they can convey different information. Big-endian vs. little-endian, anyone?

Binary identity does not imply semantic equivalence. It all depends on how the data is interpreted.
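
A small sketch of the point above, assuming a byte-for-byte comparison in the spirit of strcmp/memcmp. The byte values are arbitrary, picked only for illustration: two buffers compare as identical, yet reading them under different byte-order conventions yields different numbers.

```python
import struct

# Two buffers holding exactly the same four bytes.
a = bytes([0x01, 0x02, 0x03, 0x04])
b = bytes([0x01, 0x02, 0x03, 0x04])

# Byte-wise comparison says they are identical.
same_bytes = a == b

# Interpret the same bytes as a 32-bit unsigned integer under two conventions.
as_big_endian = struct.unpack(">I", a)[0]     # most significant byte first
as_little_endian = struct.unpack("<I", b)[0]  # least significant byte first

print(same_bytes, as_big_endian, as_little_endian)
```

The comparison only establishes that the bits match; what they mean depends on the reader's convention.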
Schoolkids by Azghoul · 2003-12-04 01:51 · Score: 2, Interesting

My guess is any slick technology set up with this will let plagiarism run rampant.

Google translator already let my sister-in-law "cheat" on a German paper, but the translation was "too good" so she got caught. Paraphrasing that's excellent (obviously would take a while, but what the hell, we can play Apple II games on a Palm not 20 years later....) could be real messy.
Call Infocom! by Hoi+Polloi · 2003-12-04 02:43 · Score: 2, Interesting

Just think of the ramifications this will have for Zork. Now I'll be able to say "Will you just open the damn egg?"

--
It is by the juice of the coffee bean that thoughts acquire speed, the teeth acquire stains. The stains become a warning
How I do this in my product by MarkWatson · 2003-12-04 03:24 · Score: 3, Interesting

I use a fairly effective algorithm to do this in my product:
I first classify the text into a category, then weight every word in the text based on how much it contributed to this classification. I then output, as a "summary", the one or two sentences in the original text that contribute most to the classification of the entire text.
Not really summarization, but useful.
-Mark
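
The approach described above could be sketched roughly as follows. This is not the poster's actual product code: the category names, the word weights in `CATEGORY_WEIGHTS`, and the function names are all hypothetical, and a real system would presumably learn the weights from training text rather than hard-code them.

```python
import re

# Hypothetical per-category word weights, hard-coded only for illustration.
CATEGORY_WEIGHTS = {
    "search": {"search": 2.0, "engine": 1.5, "query": 1.5, "results": 1.0},
    "games": {"game": 2.0, "adventure": 1.5, "parser": 1.0, "zork": 2.0},
}

def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score(text, weights):
    """Sum the weights of every word in the text for one category."""
    return sum(weights.get(w, 0.0) for w in words(text))

def classify(text):
    """Pick the category whose weights give the text the highest score."""
    return max(CATEGORY_WEIGHTS, key=lambda c: score(text, CATEGORY_WEIGHTS[c]))

def summarize(text, max_sentences=2):
    """Return the category and the sentences that contribute most to it."""
    category = classify(text)
    weights = CATEGORY_WEIGHTS[category]
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(sentences, key=lambda s: score(s, weights), reverse=True)
    top = set(ranked[:max_sentences])
    # Keep the chosen sentences in their original order.
    return category, [s for s in sentences if s in top]

sample = (
    "A search engine matches each query against its index. "
    "The weather was nice today. "
    "Better results come from understanding the whole query. "
    "I had coffee."
)
category, summary = summarize(sample)
print(category)
print(summary)
```

The output is extractive: sentences are picked from the original text as they stand, which fits the remark that this is not really summarization.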