Slashdot Mirror


Building a Searchable Literature Archive With Keywords?

Sooner Boomer writes "I'm trying to help drag a professor I work with into the 20th century. Although he is involved in cutting-edge research (nanotechnology), his method of literature search is to begin with digging through the hundreds of 3-ring binders that contain articles (usually from PDFs) that he has printed out. Even though the binders are labeled, the articles can only go under one 'heading' and there's no way to do a keyword search on subject, methods, materials, etc. Yeah, Google is pretty good for finding stuff, as are other on-line literature services, but they only work for articles that are already on-line. His literature also includes articles copied from books, professional correspondence, and other sources. Is there a FOSS database or archive method (preferably with a web interface) where he could archive the PDFs and scanned documents and be able to search by keywords? It would also be nice to categorize them under multiple subject headings if possible. I know this has been covered ad nauseam with things like photos and the like, but I'm not looking at storage as such: instead I'm trying to find what's stored."

1 of 211 comments

  1. Re:Document Management Software and OCR by electrons_are_brave · Score: 5, Insightful

    As an ex-librarian, I can give you a professional's answer. You need a professional. But - if that's not possible, then what you are aiming for is a dream, and a huge data entry task to boot. And you will be creating a system that he will never be able to maintain. Aim lower. Ask him - does he want to keep the paper copies or move them all onto computer? Not both.

    If he wants to keep the paper - it's simple. Weed weed weed. 60% of what anyone holds is rubbish, and if it's available online (and I mean in a proper source, not a disappearing link) he'll find it when he needs it. (I'm thinking he can't be using much of it given the difficulty of finding it.) So that will leave you with about 20 three-rings out of the hundreds. Number each document, put them in a filing cabinet by MAIN SUBJECT. If you want to spend your life typing then, by all means, use incite, the word referencing system or some simple library freeware to create a db with author, title, journal etc and main subject (or maybe two).

    If he wants them all digital - same deal. Scan the ones that aren't there. Forget any sort of magic software that will catalogue for you, you crazy dreamer. The best you can do is use incite or some other referencing software to search for and make a record of the ones that have the record available online. And then type the rest in.

    Personally, he sounds like a hoarder, so he will probably resist both suggestions. If this is the case then sort the folders into main subject and type a list (bib reference) and stick it to the front of each. At least that will cut down on his search time - but again, it's a lot of typing.
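For anyone who does go the "type it in" route, the db described above (author, title, journal, and a main subject or two) is small enough to sketch with Python's built-in sqlite3 module. This is only an illustration, not any particular library package: the table names, column names and helper functions below are made up for the example, and the row id is assumed to double as the number written on each paper copy.

```python
import sqlite3


def open_catalogue(path=":memory:"):
    """Open (or create) the catalogue. All names here are hypothetical."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            id      INTEGER PRIMARY KEY,  -- also the number on the paper copy
            author  TEXT NOT NULL,
            title   TEXT NOT NULL,
            journal TEXT
        );
        -- One row per (document, subject), so a document can sit under
        -- more than one heading.
        CREATE TABLE IF NOT EXISTS subjects (
            document_id INTEGER NOT NULL REFERENCES documents(id),
            subject     TEXT NOT NULL,
            PRIMARY KEY (document_id, subject)
        );
    """)
    return conn


def add_document(conn, author, title, journal, subjects):
    """Insert one reference and its subject headings; return its number."""
    cur = conn.execute(
        "INSERT INTO documents (author, title, journal) VALUES (?, ?, ?)",
        (author, title, journal),
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT OR IGNORE INTO subjects (document_id, subject) VALUES (?, ?)",
        [(doc_id, s.strip().lower()) for s in subjects],
    )
    conn.commit()
    return doc_id


def find_by_subject(conn, subject):
    """List (id, author, title) for every document filed under a subject."""
    rows = conn.execute(
        "SELECT d.id, d.author, d.title FROM documents d "
        "JOIN subjects s ON s.document_id = d.id "
        "WHERE s.subject = ? ORDER BY d.id",
        (subject.strip().lower(),),
    )
    return rows.fetchall()


def search_titles(conn, keyword):
    """Substring search on titles (SQLite LIKE ignores case for ASCII)."""
    rows = conn.execute(
        "SELECT id, author, title FROM documents WHERE title LIKE ? ORDER BY id",
        (f"%{keyword}%",),
    )
    return rows.fetchall()
```

Usage would look something like `n = add_document(conn, "A. Author", "Example title", "Example Journal", ["synthesis", "microscopy"])` followed by `find_by_subject(conn, "microscopy")`; the values are placeholders. Note this only searches what somebody has typed in, so it does nothing to shrink the data entry job.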