Building a Searchable Literature Archive With Keywords?
Sooner Boomer writes "I'm trying to help drag a professor I work with into the 20th century. Although he is involved in cutting-edge research (nanotechnology), his method of literature search is to begin with digging through the hundreds of 3-ring binders that contain articles (usually from PDFs) that he has printed out. Even though the binders are labeled, the articles can only go under one 'heading' and there's no way to do a keyword search on subject, methods, materials, etc. Yeah, google is pretty good for finding stuff, as are other on-line literature services, but they only work for articles that are already on-line. His literature also includes articles copied from books, professional correspondence, and other sources. Is there a FOSS database or archive method (preferably with a web interface) where he could archive the PDFs and scanned documents and be able to search by keywords? It would also be nice to categorize them under multiple subject headings if possible. I know this has been covered ad nauseum with things like photos and the like, but I'm not looking at storage as such: instead I'm trying to find what's stored."
From the sound of it, you want to verify that your product supports document tagging (not unlike Slashdot's tagging system I guess) so that he can attach his categories to documents as he puts them in (or more likely as you do the manual labor, right?).
... where he could archive the PDFs and scanned documents and be able to search by keywords?
So, my big concern is the part where you said he scans things from books and articles and so some of the PDFs might just be massive images, right? I don't think you're going to find systems with OCR built in so you might have quite the chore on your hands. If you don't have it electronically or if it's just an image electronically, you may have to implement some sort of process for getting a doc into this system so it can be searched, right? Look into GOCR or Tesseract if this is the case.
Also, judging by your nickname ("Sooner Boomer"), you're at the University of Oklahoma. Why in the world would you name yourself after a group of people who not only disobeyed the Indian Appropriation Act but also moved out onto Native American territory before it was officially declared property of the United States? And then you also chose "Boomer" which refers to "white settlers who believed the Unassigned Lands were public property and open to anyone for settlement, not just Indian tribes. Their reasoning came from a clause in the Homestead Act of 1862, which said that any settler could claim 160 acres of public land. Some boomers entered and were removed more than once by the United States Army." If you are a descendant of either a Sooner or a Boomer, I respectfully do not agree with their actions.
My work here is dung.