Slashdot Mirror


Open Source Automated Text Summarization?

TrebleJunkie writes "I've spent some time recently looking for open source projects dealing with Automated Text Summarization -- automatically generating detailed summaries from longer documents -- to no avail. I can find a lot of research papers and several commercial projects, but no open source code or projects. Does anyone out there know of any?"

3 of 38 comments

  1. The way it is supposed to work! by AllMightyPaul · · Score: 0, Redundant

    It's too bad the Internet didn't keep with the way it was supposed to work back in the beginning with the and such. If that were the case, you could look for headers and body blocks to determine content.

    If you know how the text will be formatted you can have the script look for a proliferation of words that are not things such as "a" or "and" or "of" and perhaps search for common threads, but other than that, I got nothing!
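
The word-counting idea above can be sketched roughly as follows. This is only a minimal illustration, not an established summarizer: the stop-word list is a tiny hypothetical one, the sentence splitter is naive, and the function names (`split_sentences`, `content_words`, `summarize`) are made up for this example. It scores each sentence by the average document-wide frequency of its non-stop-words and keeps the top few in their original order.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be far longer.
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "in", "is", "it", "that", "on", "for"}


def split_sentences(text):
    # Naive split: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    # Lowercase words with stop words such as "a", "and", "of" removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    sentences = split_sentences(text)
    freq = Counter(content_words(text))  # document-wide counts of content words

    def score(sentence):
        words = content_words(sentence)
        # Average frequency, so long sentences are not favored just for length.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original document order
    return [sentences[i] for i in keep]
```

Calling `summarize(document_text, max_sentences=3)` returns a list of up to three sentences. Averaging rather than summing the counts is a design choice made here so that sentence length alone does not decide the ranking.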

  2. A simple kinda-solution by wickidpisa · · Score: 1, Redundant

    I know this isn't exactly what you are looking for, but I remember SAT prep books that teach you to read the first line of every paragraph to get a quick summary. Granted, it works better for the SATs than it does IRL, but it often works pretty well and it's better than nothing. You could whip up a simple Perl script to extract the first line of each paragraph in no time.
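
A rough sketch of that first-line trick, written in Python rather than Perl. It assumes paragraphs are separated by blank lines and treats the "first line" as the first sentence; `lead_sentences` is a hypothetical helper name, and the sentence detection is naive (abbreviations like "e.g." will trip it up).

```python
import re


def lead_sentences(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    leads = []
    for paragraph in paragraphs:
        flat = " ".join(paragraph.split())  # undo hard line wraps inside the paragraph
        # First run of text ending in ., ! or ? that is followed by whitespace or the end.
        match = re.match(r".+?[.!?](?=\s|$)", flat)
        # Fall back to the whole paragraph if no sentence terminator is found.
        leads.append(match.group(0) if match else flat)
    return leads
```

Joining the result with newlines, for example `"\n".join(lead_sentences(document_text))`, gives the quick-and-dirty summary.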

  3. No Offence intended but, Why?? by Why Should I · · Score: 0, Redundant

    Can't seem to think about any reason you would possibly want to automate this.
    Surely the whole point of summaries is that they are a shortened version of human-generated English (or whatever human language) that embodies the general context of the document.
    This just seems like one of those things that:
    1. Is best not automated
    2. Probably cheaper done by hiring a clerk to read and summarise than by using a computer
    Seriously though, I can't think of any reason why you would really need to automate this. Is there one?