Slashdot Mirror


Building a Searchable Literature Archive With Keywords?

Sooner Boomer writes "I'm trying to help drag a professor I work with into the 20th century. Although he is involved in cutting-edge research (nanotechnology), his method of literature search is to begin by digging through the hundreds of 3-ring binders that contain articles (usually from PDFs) that he has printed out. Even though the binders are labeled, each article can only go under one 'heading', and there's no way to do a keyword search on subject, methods, materials, etc. Yeah, Google is pretty good for finding stuff, as are other online literature services, but they only work for articles that are already online. His literature also includes articles copied from books, professional correspondence, and other sources. Is there a FOSS database or archive method (preferably with a web interface) where he could archive the PDFs and scanned documents and be able to search by keyword? It would also be nice to categorize them under multiple subject headings if possible. I know this has been covered ad nauseam for things like photos, but I'm not looking at storage as such: instead, I'm trying to find what's stored."
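The core of what the submitter describes is a many-to-many relationship: one article can sit under several subject headings, which a binder cannot do. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not a recommendation of any particular FOSS package; the schema, the function names (open_archive, add_document, search), and the sample data are hypothetical, and it assumes the article text has already been extracted from the PDF or produced by OCR.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Create a hypothetical schema: documents plus many-to-many subject headings."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id     INTEGER PRIMARY KEY,
            title  TEXT NOT NULL,
            source TEXT,   -- e.g. 'PDF', 'book copy', 'correspondence'
            body   TEXT    -- text assumed to be extracted or OCR'd elsewhere
        );
        CREATE TABLE IF NOT EXISTS headings (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL
        );
        CREATE TABLE IF NOT EXISTS document_headings (
            document_id INTEGER REFERENCES documents(id),
            heading_id  INTEGER REFERENCES headings(id),
            PRIMARY KEY (document_id, heading_id)
        );
    """)
    return conn

def add_document(conn, title, source, body, headings):
    """Store one document and file it under every heading in `headings`."""
    cur = conn.execute(
        "INSERT INTO documents (title, source, body) VALUES (?, ?, ?)",
        (title, source, body))
    doc_id = cur.lastrowid
    for name in headings:
        conn.execute("INSERT OR IGNORE INTO headings (name) VALUES (?)", (name,))
        (heading_id,) = conn.execute(
            "SELECT id FROM headings WHERE name = ?", (name,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO document_headings VALUES (?, ?)",
                     (doc_id, heading_id))
    conn.commit()
    return doc_id

def _like_pattern(keyword):
    # Escape LIKE wildcards so a keyword such as '50%' is matched literally.
    escaped = keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def search(conn, keyword, heading=None):
    """Titles whose text contains `keyword`, optionally limited to one heading.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    """
    sql = "SELECT DISTINCT d.title FROM documents d"
    params = []
    if heading is not None:
        sql += (" JOIN document_headings dh ON dh.document_id = d.id"
                " JOIN headings h ON h.id = dh.heading_id AND h.name = ?")
        params.append(heading)
    sql += " WHERE d.body LIKE ? ESCAPE '\\' ORDER BY d.title"
    params.append(_like_pattern(keyword))
    return [row[0] for row in conn.execute(sql, params)]
```

For example, after `add_document(conn, "Paper A", "PDF", "citrate reduction of gold salts", ["materials", "methods"])`, the same article turns up under both `search(conn, "gold", heading="materials")` and `search(conn, "citrate", heading="methods")`. A larger archive would more likely use SQLite's FTS5 full-text extension (where the Python build includes it) rather than `LIKE` scans, with a small web front end on top.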

2 of 211 comments

  1. Personal Document Management by steveha · Score: 3, Interesting

    I am hoping that someone will make a nice personal document management package as free software.

    If you use Windows, you can buy this:

    http://www.nuance.com/paperport/

    The basic features would be:

    • Scan in a document (group multiple pages into a single PDF)
    • Easily scan a page and insert it into a pre-existing PDF (if you missed a page yesterday, today go back and put it in)
    • OCR the documents and provide an index to allow searching
    • Provide a really convenient photocopier feature (scan+print)
    • Fast and easy. Scan in color, but detect black-and-white pages and auto-convert them to greyscale. Do not pop up any dialogs; when the user clicks the "Scan!" button, start scanning.
    • Also allow dropping in saved HTML pages, OpenOffice.org documents, etc. Manage the user's saved documents, no matter what kind of documents they are.
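Two items on that list interact: OCR'd pages need an index for searching, and inserting a missed page later shifts the page numbers after it. The sketch below shows one way to handle both with a toy inverted index (word to set of (document, page) pairs) in plain Python. The class and method names are hypothetical, and it assumes the OCR step has already turned each scanned page into text, for example with a FOSS engine such as Tesseract.

```python
import re
from collections import defaultdict

_TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return _TOKEN.findall(text.lower())

class PageIndex:
    """Toy inverted index over OCR'd pages: word -> {(doc, page_number)}."""

    def __init__(self):
        self.pages = defaultdict(list)     # doc name -> page texts, in order
        self.postings = defaultdict(set)   # word -> set of (doc, 1-based page)

    def _reindex(self, doc):
        # Remove this document's old postings, then re-add every page so
        # that page numbers shifted by an insertion stay correct.
        for hits in self.postings.values():
            hits -= {hit for hit in hits if hit[0] == doc}
        for number, text in enumerate(self.pages[doc], start=1):
            for word in tokenize(text):
                self.postings[word].add((doc, number))

    def add_page(self, doc, text, position=None):
        """Append a page, or insert it at a 1-based position (the missed-page case)."""
        if position is None:
            self.pages[doc].append(text)
        else:
            self.pages[doc].insert(position - 1, text)
        self._reindex(doc)

    def search(self, *words):
        """Sorted (doc, page) pairs containing every query word (AND semantics)."""
        query = tokenize(" ".join(words))
        if not query:
            return []
        sets = [self.postings.get(word, set()) for word in query]
        return sorted(set.intersection(*sets))
```

Re-indexing the whole document on every insertion is the simplest correct choice and costs time proportional to its page count, which is acceptable for a personal archive; a bigger system would store page identifiers that do not change when pages are inserted.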

    In a perfect world, the GNOME guys and the KDE guys would both start competing over who can make the slickest product and we all would win.

    steveha

    --
    lf(1): it's like ls(1) but sorts filenames by extension, tersely
  2. Comment removed by account_deleted · Score: 4, Interesting

    Comment removed based on user account deletion