Slashdot Mirror


Preserving Old Research Notes and Documents?

twistedcubic asks: "I have several thousand 8.5 x 11 inch dead tree pages of notes and research that take up too much storage space. I would like to have all these notes scanned into PDF files (for example) so I can recycle the pages and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"

101 comments

  1. In a few months time... by Anonymous Coward · · Score: 5, Funny
    In a few months time... coming as a duplicated story post...

    "I have several thousand PDF files taking up too much disk storage space. I would like to have all these files printed on to 8.5 x 11 inch dead tree pages of notes so I can delete the files, empty the recycle bin and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"

    For future reference, I suggest a printer.

    --BladeMelbourne

    1. Re:In a few months time... by sribe · · Score: 2, Informative

      Check out the Fujitsu ScanSnap. It's their lowest-end document scanner, but it's still faster than all the slow consumer-level junk, and it comes with a version of Acrobat that will OCR the images and put the text in a "hidden" layer for searching.

    2. Re:In a few months time... by fbjon · · Score: 1
      I can recommend this new IBM printer, it'll only take a few minutes with it.

      Just for future reference.

      --
      True confidence comes not from realising you are as good as your peers, but that your peers are as bad as you are.
  2. Easy by ptaff · · Score: 3, Insightful

    10-year old nephew and a scanner.

    1. Re:Easy by holy+zarquon's+singi · · Score: 1

      Or you can outsource to somewhere the cost of living is extremely low, like Cambodia. It's a labour-intensive job. I use admin staff when I want this stuff done, but it's not my money I'm paying them with...

      --
      "...we should just trust our president in every decision that he makes and we should just support that." B.Spears 2003
    2. Re:Easy by Anonymous Coward · · Score: 1, Interesting

      I live in Cambodia...it costs about $4.50 to get a quite thick book photocopied and bound.

      I'd imagine it'd cost a similar amount to get them to do the scanning, maybe a discount for bulk (maybe not, because scanning is more labour intensive than photocopying)...you'd probably have to teach them to make PDFs though...not hard.

      If you provided a computer/scanner, it'd probably be cheaper too...then you could just pay somebody $100/month to scan books all day (you could probably pay less...but $100 isn't much in the scheme of things...so might as well make them happy)

      Then you'd have to add the cost of shipping...the cost of bribing the customs officials to get your books from the port...It'd probably only be a couple of cubic metres...so you could get door to door freight.

      But yeah...much cheaper than the US or anywhere developed, but still not that cheap.

    3. Re:Easy by Anonymous Coward · · Score: 0

      I have been planning a move to Cambodia!

      (Specifically to Sihanoukville, from Munich, Germany, where I am currently an English teacher. [I'm 22].)

      Please email me (rot13'd address: eiventu ng lnubb qbg pbz) if I could ask you a few questions.

      Thanks!!

  3. Lots of Work by Zecritic · · Score: 1, Insightful

    Even if you could scan all of them, are you going to just leave them named

    untitled, untitled-1, untitled-2... untitled-3000

    or are you going to rename all of them and organize them in some way? You probably won't find a solution that won't take a lot of time and work.

    --
    "Scientists have proof without certainty; Creationists have certainty without proof" -Ashley Montagu
    1. Re:Lots of Work by Saeed+al-Sahaf · · Score: 1

      Absolutely true, and honestly, "several thousand" pages does not take up that much room. Keep them until you don't need them, then recycle them. Just let it go, it's probably not that important.

      --
      "Who are in control, they are not in control of anything - they don't even control themselves!" - Glen Beck
    2. Re:Lots of Work by twistedcubic · · Score: 1

      Actually, they take up two 6-foot bookcases, and I no longer have room for them, nor do I want to spend money on physical storage.

    3. Re:Lots of Work by akgunkel · · Score: 1

      Um, that's quite a bit more than "several thousand" pages. My 6' cases hold at least 10,000 pages per shelf and have 5 shelves. You could have more than 100,000 pages! You'd better check the duty cycle on whatever hardware you use.

    4. Re:Lots of Work by justforaday · · Score: 2

      You don't want to spend money on physical storage, yet you're asking about a service that will do the job of scanning for you? Here's a hint: for the cost of hiring someone to do this job for you, you can rent a small room at a self-store place for 15-20 years.

      --
      I'll turn into a supernova and burn up everything. Well I'll turn into a black little hole and you'll turn into string.
    5. Re:Lots of Work by dqbiggerfam · · Score: 1

      Will the self-storage locker be heated/insulated? Lots of heat/cold and humidity can be severely damaging to paper. While unheated lockers are somewhat cheap, heated ones cost a little more. The digital archive will be easier to copy if needed, anyway.

    6. Re:Lots of Work by twistedcubic · · Score: 1

      The documents I have are actually useful, in that it would be a hindrance to have to drive to a storage space to refer to them. Money is not a problem.

    7. Re:Lots of Work by mikael · · Score: 1

      You can always place them in named directories - just by using the thumbnail images you can sort the files out without having to rename them all. And you can always organise these into categories (with Unix at least) using symbolic links.
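A minimal sketch of the symbolic-link idea, assuming a Unix-like system and Python 3. The file name `untitled-17.png`, the category names, and the helper `link_into_category` are hypothetical, chosen only for illustration. Each scan stays where the scanner put it, and one link per category points back to it, so a page can sit in several categories without being copied or renamed.

```python
import os
import tempfile
from pathlib import Path


def link_into_category(scan, category_dir):
    """Create a symbolic link to `scan` inside `category_dir` and return the link."""
    category_dir.mkdir(parents=True, exist_ok=True)
    link = category_dir / scan.name
    if not link.is_symlink():
        # Use a relative target so the whole tree can be moved or burned to disc intact.
        target = os.path.relpath(scan, start=category_dir)
        link.symlink_to(target)
    return link


def demo():
    """File one fake scan under two categories and list the categories that hold it."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        scan = base / "scans" / "untitled-17.png"  # hypothetical scanner output
        scan.parent.mkdir()
        scan.write_bytes(b"fake image data")
        for category in ("topology", "lecture-notes"):  # hypothetical categories
            link_into_category(scan, base / "categories" / category)
        return sorted(p.parent.name for p in (base / "categories").glob("*/*"))
```

Calling `demo()` builds a throwaway tree and returns the category names that now contain a link to the scan.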

      The advantage of transferring all of these pages onto disk storage is that you can store around 1000 pages on a single CD-ROM, 4000 pages on a DVD, and 60K pages or more on a USB drive, and be able to take them with you wherever you go, without having to lug around a box of papers.

      I did this with articles from old second-hand magazines that I had to throw out due to water damage.

      And as other comments have pointed out, even if paper is stored somewhere high enough away from direct contact with water, humidity and cold will eventually give paper and books a musty smell.

      --
      Vintage computer adverts: http://www.vintageadbrowser.com/computers-and-software-ads
    8. Re:Lots of Work by E8086 · · Score: 1

      I did something similar with old family slides, about 50 years' worth. But my "several thousand" was only about 2200 and I could scan three at a time with the $35 scanner I found on eBay. The originals are still in excellent condition; this was done for ease of distribution. No need to be sending 5 boxes of slides and an old projector around the country, and it would be bad if any of it was lost or damaged. It's a lot easier to write copies to DVDs and mail them for about $2.50 each. Sounds like you need cheap labor. Scanning services are very expensive, so I'd go with the younger relative you consider skilled enough not to screw up and who you can pay in the form of pizza and/or other foodstuffs. Keep in mind this only works to a point: eventually they grow up and need compensation in cash.

      If you're teaching or researching at a school you may be able to find some lab/teaching assistants willing to help out with a few pages once in a while. If you're teaching and want to risk trusting some students in need of extra credit, that's up to you.

      Scanning documents is very labor intensive and time consuming. You can either pay a lot and have it done semi-quickly or be willing to spend lots of time doing it yourself.

      --
      F7 doesn't work, ignore spelling and grammar
    9. Re:Lots of Work by crazyphilman · · Score: 1

      I've done something like this in the past. Here's an approach you could use:

      1. Look at how you've organized your paper documents. If they're in boxes, or file cabinet drawers, they must be named or coded, right? And within a coded box or drawer, you must have folders that are themselves named or coded (or notebooks?).

      2. Name your top-level directories after your boxes or file cabinet drawers (or whole file cabinets if we're talking about a LOT of papers). Then, move down through the hierarchy to more specific groups of documents. Drawer, folder set, folder. Do all this with your directory structure.

      3. As you scan each article page in, name each one by taking its title, replacing spaces with dashes, and appending a page number, like this: "Some-Article-I-Wrote_1.png". Collect groups of input scans into PDF documents using your favorite PDF editor. Then, name the aggregate article using article name, with date and time appended (to help with duplicate titles): "Some-Article-I-Wrote_9-14-2005_1207PM.pdf"

      4. You can search your archive using your favorite operating system's built in search, searching on title, directory names, etc. It works.

      It worked for me, anyway. ;)
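The naming scheme in step 3 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and zero-padding the hour to two digits is an assumption, since the example in the comment only shows a 12:07 PM timestamp.

```python
from datetime import datetime


def slug(title):
    """Replace runs of whitespace with single dashes."""
    return "-".join(title.split())


def page_filename(title, page):
    """Name for one scanned page, e.g. Some-Article-I-Wrote_1.png."""
    return f"{slug(title)}_{page}.png"


def article_filename(title, scanned_at):
    """Name for the assembled PDF, with date and time appended to avoid duplicate titles."""
    hour12 = scanned_at.hour % 12 or 12          # 0 and 12 both display as 12
    suffix = "AM" if scanned_at.hour < 12 else "PM"
    stamp = (
        f"{scanned_at.month}-{scanned_at.day}-{scanned_at.year}"
        f"_{hour12:02d}{scanned_at.minute:02d}{suffix}"  # zero-padded hour is an assumption
    )
    return f"{slug(title)}_{stamp}.pdf"
```

With the title "Some Article I Wrote" and a scan time of 14 September 2005, 12:07 PM, these reproduce the two file names given in step 3.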

      --
      Farewell! It's been a fine buncha years!
  4. Dee, dee, dee... by atomic-penguin · · Score: 2, Insightful

    Filing cabinet.

    --
    /^([Ss]ame [Bb]at (time, |channel.)){2}$/
    1. Re:Dee, dee, dee... by Creepy+Crawler · · Score: 0, Troll

      Go Mencia!

      He wins a Dee, Dee, Dee award.

    2. Re:Dee, dee, dee... by dubl-u · · Score: 1

      Let's also consider the humble shredder.

      I just cleaned out several boxes of stuff that I've been lugging around for more than a decade. Once I actually sat down and went through it, I realized that I'd never look at 95% of it again. I kept the stuff with sentimental value, and the rest is confetti. What a relief!

  5. Not the ideal solution, but a start.. by NanoGator · · Score: 0

    Okay, I don't know of a 'mass scanner' sort of device where you can dump a bunch of paper into it and it'll automatically handle it. But I can tell you that my aunt has a scanner that had a feeder that would accommodate one sheet at a time. She had it set up so she'd just feed the paper through, push a button, and it'd scan it for her and save it somewhere.

    Unfortunately, I'm having a terrible time remembering the brand of it. I also don't know if they're even made these days. It's not a great solution to your problem, but I imagine it'd be a bit easier than using a flatbed scanner.

    Apologies, this isn't that helpful of a post. I'm just hoping I can spark a memory or two in somebody who knows the answer and can post it.

    --
    "Derp de derp."
    1. Re:Not the ideal solution, but a start.. by NanoGator · · Score: 3, Informative

      Sorry to reply to my own post, but I felt bad about the unhelpfulness of my previous comment. I headed over to Visioneer's site (www.visioneer.com) and found a few scanners that handle like 25 pages at a time. The more you spend, the faster it scans. Sorry, I cannot personally recommend a scanner in particular. Never had one like this.

      Good luck!

      --
      "Derp de derp."
  6. searchable db? by Bluntzilla · · Score: 2, Interesting

    i would try and convert the pages to some sort of text format to allow searching...

  7. http://www. by brandanglendenning · · Score: 0

    google.com

  8. Buy a scanner with an ADF by zhiwenchong · · Score: 3, Insightful

    ADF (Automatic Document Feeder) scanners are fairly pricey (good ones are in the US$400 - US$1000 range), but you can get a cheapie Brother MFC-3240C All-In-One (C$140) that has a 20-page document feeder and then get a slave (e.g. some grad student) to feed in your pages for you.

    My Brother MFC-3240C scanner comes with the PaperPort application, which generates PDFs and supports double-sided scanning even though the scanner doesn't support it. (You just flip over the whole stack once you've scanned one side, and start scanning the other side. PaperPort knows how to automatically reconcile the pages.)
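How PaperPort reconciles the two passes internally is not described here, but the general idea can be sketched in Python. The sketch assumes that flipping the whole stack reverses its order, so the backs arrive last sheet first; `merge_duplex` is a hypothetical helper name, and a feeder that pulls sheets in a different order would need a different pairing.

```python
def merge_duplex(fronts, backs):
    """Interleave a pass of front sides with a pass of back sides.

    `fronts` is in reading order. `backs` is assumed to be in reverse order,
    because the stack was flipped over as a whole before the second pass.
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must contain the same number of pages")
    merged = []
    for front, back in zip(fronts, reversed(backs)):
        merged.extend([front, back])
    return merged
```

For a three-sheet stack, `merge_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"])` puts the six sides back into reading order.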

    If you have Acrobat Professional, you can do a Paper Capture(TM) which is basically doing an OCR on the PDF and then storing the recognized words as "keywords" so that the PDF is searchable via Spotlight or other indexing mechanisms.

    A document scanner is indeed a very useful piece of equipment -- I use it to scan notes and scrap paper containing rough ideas, often with lots of mathematics. Sometimes writing stuff on paper is just easier than typing in LaTeX...

    The eminent computer scientist Edsger Dijkstra also liked to write stuff using pen and paper. His digitized works, called EWDs (after his initials, Edsger Wybe Dijkstra) are available here:
    http://www.cs.utexas.edu/users/EWD/

    1. Re:Buy a scanner with an ADF by Anonymous Coward · · Score: 0
      WOW, thanks! Selected one at random and found a new hero! From EWD707:

      A sample from your catalogue of services indicates that probably more than half of your courses aim at teaching how to live with --or even: how to convert to!-- IBM products. They may represent "training", but I cannot call them "education". The problem how to get useful work done with IBM products seems, indeed, severe --that, at least, is the only conclusion that I can draw from the vast number of courses your profitable company devotes to its numerous aspects--, but the problem seems better evaded than solved. (What about courses how to convert away from IBM products?) Industrial mistakes are not sacrosanct, just because they have been made on a large scale. From an educational point of view, your organization is on --or beyond-- the verge of fraudulence, and your invitation to cooperate with your endeavour can, therefore, be interpreted as an insult.

      Presumably by way of temptation, you list the names of the "celebrities" of this education circus, whose ranks I could join by cooperating with your company. (I could even have my photograph reproduced in your next folder! How jolly!) I know most of them. When their work is sent to me, I leave it lying around in my University office, for the amazement of my visitors: usually it leaves them flabbergasted to see that such superficial stuff is not only printed, but apparently even sold. I would like you to understand that there exists a scale of values, according to which your invitation to join that crowd is also no less than an insult.

    2. Re:Buy a scanner with an ADF by jd · · Score: 2, Insightful
      That would be the best method, but I would seriously question the wisdom of PDF files. Although they represent documents fairly well, the format is too proprietary and too variable to be safe. You want the baseline documents to be in a format you can read at ANY time in the future, not just three weeks down the road.


      The merger of Adobe and Macromedia, the constant toying with DRM schemes, the allowing of unsafe code in current Adobe formats, etc., make format choice as vital as scanner choice.


      A good example of this was the use of LaserDiscs for the 1980s survey of Britain to commemorate the Domesday Book. The Domesday Project is now unusable on anything but a very small number of machines, because they weren't adequately careful.


      Oh, and disks are also an important decision. Do NOT go with Blu-Ray or HD-DVDs, because these formats are fighting a battle to the death. One will win and whoever uses the other will end up with media no future computer will be able to read.


      It is interesting to note that Papyrus documents with iron oxide inks have proven the most durable of all written media. More modern papers are designed (quite intentionally) to fail in a fraction of the time, as are modern inks. Durability is expensive, and cheap sells.


      The same is true of electronic and optical media. The "silver" alumin(i)um CDs are much less durable than the "gold" disks, but both will fail in the space of decades even if kept well. If kept poorly, the surface will not just scratch, it'll peel off within a few months. (I know from experience.)


      In comparison, the old magnetic "core" memories were pretty much guaranteed to hold data for a century or two.


      Assuming you don't want to keep re-copying the notes, you want to pick formats and media that meet the sort of timescale they'll potentially remain important - plus 10%. Where a note may be of historical usefulness (and nobody can really predict those in advance), you want to pick a format and a medium that is as durable as you can reasonably afford to invest in.


      Even where the notes are relatively trivial, YOU may want to read them later, and virtually no format in existence today has lasted for very long in comparison to a human lifespan. Indeed, computers themselves have not existed for long, in comparison to a human lifespan.


      I pity those scientists who may still have important logs on 8" floppies or drum hard drives. They're not going to find it easy to retrieve the data now, even if the data is still there to retrieve. And whilst the CIA probably has forensics to read ancient magnetic storage systems with decayed data, I doubt they'd loan the machines to careless researchers, even if the researchers had the sorts of money you'd need to hire such equipment and the data was valuable enough for them to spend the money.


      In other words, don't digitize (or file) for the sake of doing so. Think about when you would want the information and pick a technology that you can be confident will exist THEN (and preferably now as well).

      --
      It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
    3. Re:Buy a scanner with an ADF by fm6 · · Score: 3, Insightful
      That would be the best method, but I would seriously question the wisdom of PDF files. Although they represent documents fairly well, the format is too proprietary and too variable to be safe. You want the baseline documents to be in a format you can read at ANY time in the future, not just three weeks down the road.
      I'm not a big fan of PDF, at least as it's commonly used. (It's essential for prepress applications, but it's most commonly used for online document sharing, an application for which it sucks.) So I hate to disagree with a fellow PDF-hater. But your arguments against using it are nonsense.

      Technically, yes, PDF is a proprietary format, but it's a well-documented, widely licensed one. Really, it's just PostScript with a few organizational elements. Both PostScript and PDF have many third-party implementations, including one that's available under the GPL.

      The merger of Adobe and Macromedia, the constant toying with DRM schemes, the allowing of unsafe code in current Adobe formats, etc., make format choice as vital as scanner choice.
      I don't see what the merger with Macromedia has to do with anything. DRM would be an issue if Adobe was the only source for PDF software -- but it's not.
      A good example of this was the use of LaserDiscs for the 1980s survey of Britain to commemorate the Domesday Book. The Domesday Project is now unusable on anything but a very small number of machines, because they weren't adequately careful.
      Hindsight is all very well -- but what format would you have chosen? Floppy disks would have been too expensive, CDs didn't exist yet. If it had been up to me, I would have chosen 9-track mag tape -- and I would have been wrong. (I still have a 9-track tape containing a backup of my student files, and no way to read it!) In any case, that mistake had to do with a choice of hardware. It's a lot easier to recreate old software than old hardware.

      I'll skip past all your other hardware examples (papyrus???) and skip to...

      In other words, don't digitize (or file) for the sake of doing so....
      What, you think this is some kind of whim? If these documents are at all important, he has to bring them online. As long as they exist only in dead tree form, they are awkward to access, expensive to store, and run the risk of being lost in day-to-day use, to say nothing of the odd natural disaster.
    4. Re:Buy a scanner with an ADF by pipingguy · · Score: 2, Interesting


      If you have Acrobat Professional, you can do a Paper Capture(TM) which is basically doing an OCR on the PDF and then storing the recognized words as "keywords" so that the PDF is searchable via Spotlight or other indexing mechanisms.

      Maybe I'm mistaken, but doesn't Google index PDFs? If that's the case, you can just upload it to a website and wait for it to be crawled for later searching.

      That doesn't really help with the scanning problem though. Parent's solution of slave usage might be best.

    5. Re:Buy a scanner with an ADF by Anonymous Coward · · Score: 0

      Relatively cheap inkjet printer-scanner-fax combo devices (like the HP OfficeJet series) often have automatic document feeders with a capacity of a few dozen pages. With the right software you can scan straight to multipage-PDF.

    6. Re:Buy a scanner with an ADF by Matthew+Bafford · · Score: 1
      Maybe I'm mistaken, but doesn't Google index PDFs? If that's the case, you can just upload it to a website and wait for it to be crawled for later searching.
      The OCR process still has to be performed. I seriously doubt Google's PDF indexer is actually OCRing the PDFs. The majority of the PDFs out there are generated from digital text - and as such have easily indexed content.

      Scanned documents would simply be images until OCR was applied (or manual transcribing).
    7. Re:Buy a scanner with an ADF by sribe · · Score: 4, Insightful

      That would be the best method, but I would seriously question the wisdom of PDF files. Although they represent documents fairly well, the format is too proprietary and too variable to be safe. You want the baseline documents to be in a format you can read at ANY time in the future, not just three weeks down the road.

      Bull. PDF is completely open and is not going away. To get the specs you merely have to download them for free from Adobe's web site. There are multiple open-source implementations of PDF readers. Although Adobe is adding features all the time, the basic format that would be used for storing scanned images has been stable and forward-compatible for years and years. There are multiple court systems which have designated PDF as the format for filing, storing, and archiving court records. There is work on an official national standard for long-term archiving of records in PDF format. (PDF/A specifies things like: the PDF must embed the fonts used, and so on, to ensure that it will be portable across OS's and decades.)

      ...the constant toying with DRM schemes...

      A flaming example of a red herring. Your scanner software is not going to create a PDF with any DRM unless you tell it to. And some future version of your PDF reader is not going to suddenly refuse to read non-DRM'd files.

      The "silver" alumin(i)um CDs are much less durable than the "gold" disks, but both will fail in the space of decades even if kept well.

      Most "gold" CDs are merely "silver" CDs with a gold-colored label on the top. It's not even clear that the gold vs aluminum reflective layer is a real issue. But the dye type does matter, hugely.

    8. Re:Buy a scanner with an ADF by Nutria · · Score: 1

      I would have chosen 9-track mag tape -- and I would have been wrong. (I still have a 9-track tape containing a backup of my student files, and no way to read it!) In any case, that mistake had to do with a choice of hardware. It's a lot easier to recreate old software than old hardware.

      It's no problem to buy a "desktop" 9-track tape drive that understands EBCDIC. I'm sure you could then write a program to convert the data to something modern.
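A conversion program of the kind mentioned above could be very small. The sketch below is in Python and assumes the bytes have already been read off the tape into memory. Which EBCDIC code page a given tape uses is not known from this thread; `cp037` (one of the EBCDIC codecs that ships with Python) is only an assumed default, and `ebcdic_to_text` is a hypothetical helper name.

```python
def ebcdic_to_text(raw, codec="cp037"):
    """Decode EBCDIC bytes to a Python string.

    cp037 is one common EBCDIC code page; the right choice depends on the
    machine that wrote the tape, so treat the default as a guess.
    """
    return raw.decode(codec)


def text_to_utf8_file(text, path):
    """Write the decoded text out in a modern encoding."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

If the tape turns out to hold ASCII instead, passing `codec="ascii"` to the same function covers that case.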

      --
      "I don't know, therefore Aliens" Wafflebox1
    9. Re:Buy a scanner with an ADF by fm6 · · Score: 1
      Why do you assume my data is in EBCDIC? IBM did own 90% of the market in that era, but that still leaves a lot of ASCII-based machines, especially on university campuses, where lab people preferred to have their own minis (usually a PDP) rather than share a mainframe, and no serious CS program relied on IBM systems.

      Buying a desktop 9-track would cost me something like $300 -- more than I care to spend to read one tape with data of purely nostalgic value. If I ever care enough, I'll send the tape to a legacy/recovery service.

    10. Re:Buy a scanner with an ADF by Nutria · · Score: 1

      Why do you assume my data is in EBCDIC? IBM did own 90% of the market in that era,

      I didn't assume. I was being prudent, since, as you say, Big Blue did have a huge chunk of the market.

      --
      "I don't know, therefore Aliens" Wafflebox1
    11. Re:Buy a scanner with an ADF by E8086 · · Score: 1

      The "silver" alumin(i)um CDs are much less durable than the "gold" disks, but both will fail in the space of decades even if kept well.

      I've given up on trying to find a storage medium that will last "forever". I'm going with multiple copies on multiple newer mediums. I have a few docs that first lived on 5.25" floppies. I just copied them to the newer option when it became popular: 3.5" floppy, then larger HDD, then CD, then DVD and flash memory card. Storage formats don't go obsolete that fast; the CD has been around since 1980 and it's still used. I still have working 5.25" and 3.5" floppy drives taking up space in empty drive bays. They haven't been connected or used in years but they still work. I'm sure there are enough people who have held onto old working hardware that you should have no problem finding something that will read the DVD-R and DVD+R formats in 25 years. Don't bother with finding the "will last forever" format, just keep a couple of copies lying around.
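Keeping several copies only helps if you can tell a good copy from a damaged one. A minimal Python sketch of one way to check, assuming the archive is an ordinary directory tree: record a SHA-256 digest per file once, then compare each copy against that record after every migration. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def damaged_or_missing(original_manifest, copy_root):
    """List files whose copy is absent or no longer matches the original digest."""
    copy_manifest = build_manifest(copy_root)
    return sorted(
        name
        for name, digest in original_manifest.items()
        if copy_manifest.get(name) != digest
    )
```

Storing the manifest alongside each copy means any one surviving copy can be used to vet the others.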

      --
      F7 doesn't work, ignore spelling and grammar
  9. Legal Services Firm by Anonymous Coward · · Score: 1, Informative

    There are tons of companies that specialize in electronic document scanning & OCR, usually for the legal industry. It would probably cost .05 to .10 a page, but you might be able to cut a deal as an individual rather than a law firm.
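A rough cost range at those rates is simple arithmetic, sketched here in Python. It assumes ".05 to .10" means 5 to 10 cents per page (the currency is not stated) and works in whole cents to avoid floating-point rounding; `scanning_cost_cents` is a hypothetical helper.

```python
def scanning_cost_cents(pages, low_cents=5, high_cents=10):
    """Return the (low, high) total cost in cents for a per-page price range."""
    return pages * low_cents, pages * high_cents


def as_dollars(cents):
    """Format a whole number of cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"
```

At these rates, 3,000 pages comes to between $150.00 and $300.00, while the 100,000-page figure suggested elsewhere in the thread would run from $5,000.00 to $10,000.00.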

    1. Re:Legal Services Firm by Anonymous Coward · · Score: 1, Insightful

      While I'm all for bargaining... I'm pretty sure that it would work out the other way. The Law Firm will pay less due to their volume. Although they usually need it yesterday and may pay a premium for that service.

      His one bargaining point is that he likely can wait much longer for his papers to be scanned. So he could negotiate on having his papers on a very low priority queue.

  10. OCR probably not the way to go by Nutria · · Score: 4, Insightful
    OCR is no match for eyeballs. You'd spend so much time editing it for slight errors, it wouldn't be worth your time.

    Are the notes graphics-heavy (i.e., scientific/engineering)?

    If not, give it to a typing service. Once you show them how much "stuff" you have, I'm sure they'll give you a discount. They might even agree to use OpenOffice2 (because it handles huge documents well, the files are small, and it has an excellent PDF exporter).

    You'd still have to scan in the pictures/drawing/graphs, and place them appropriately, which will take time.

    Also, there are firms that specialize in digitizing paper documents (mostly forms and regularized documents for businesses). Depending on the amount of hand-writing & graphics, it might not be appropriate, though.

    All in all, no matter how you do it, the project will
    • take a long time
    • cost a lot of money
    --
    "I don't know, therefore Aliens" Wafflebox1
    1. Re:OCR probably not the way to go by robson · · Score: 1
      All in all, no matter how you do it, the project will
      * take a long time
      * cost a lot of money

      Hey, but at least having picked those, it's guaranteed to be good ;D
    2. Re:OCR probably not the way to go by Anonymous Coward · · Score: 0

      You obviously have never dealt with NMCI. It cost a metric assload of cash and it was very late. It is also very crappy.

  11. Scan to PDF with OCR behind the image by fatboy-fitz · · Score: 2, Informative

    There are companies that will do this for you. For example, IMC in WV (http://www.imcwv.com/). They can scan it all to PDF using the image as what you see in the PDF backed up with the OCR'd text. That way the document is somewhat searchable, but you always see the exact scan of the doc when you look at the PDF.

    --
    I'm better, because I'm bigger
    1. Re:Scan to PDF with OCR behind the image by apothoray · · Score: 1

      If you want to do it yourself and you're desperate for OCR, you can easily do this in the commercial version of Adobe Acrobat (I've used it on 4.0, 5.0, 6.0, but I haven't tried 7.0). One of the "Paper Capture" options is to place the OCR'd text right behind the image.

      In general, I scan at high resolution (600 dpi) black and white using an Epson Perfection 3170 scanner with ADF. I don't bother with the OCR. You can either scan straight into Adobe Acrobat or into Adobe Photoshop if you need to touch up any of your images (coffee stains, anyone?). I find that I rarely go back to my old research notes, but if I need them, I can still print the relevant parts for working around the lab. I've been doing this for about 6 years (initially with a really bad scanner) and I've saved a lot of file cabinet space. I've also done this with all of my old photocopied journal articles.
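To get a feel for what 600 dpi black and white means in storage terms, here is a small Python sketch of the uncompressed size of one page. It assumes a letter-size (8.5 x 11 inch) sheet and one bit per pixel, and it ignores file headers and row padding; `raw_scan_bytes` is a hypothetical helper. Real files are normally much smaller, since bilevel scans compress well.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Uncompressed size in bytes of a scan, ignoring headers and row padding."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes
```

A letter page at 600 dpi is 5100 x 6600 pixels, so `raw_scan_bytes(8.5, 11, 600)` gives 4,207,500 bytes, roughly 4 MB before compression; the same page in 24-bit colour would be 24 times that.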

  12. The unorthodox method by UnapprovedThought · · Score: 2, Funny
    1. Climb to the top of a tall building
    2. Find the side that is closest to the parking lot
    3. Shake all of the pages out
    4. Have an assistant below shoo away potential meddlers
    5. Pull out your 12Mpixel camera
    6. Take several pictures as the papers flip end-over-end
    7. ??? (do some really amazing 3-D stuff with GIMP)
    8. Convert pictures to PDF
    1. Re:The unorthodox method by Daxster · · Score: 1

      Slight correction to steps 7 through 8:
      7. ??? 8. Profit!

      --
      Death by snoo-snoo!
    2. Re:The unorthodox method by tigersha · · Score: 1

      And since you used an Open Source tool it must be good, yes?

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
  13. imDex by cstew · · Score: 3, Informative

    Disclaimer: I used to work for this company as a coop student.

    I would contact PRG Schultz as they have done this for large clients in the past. They have a program called imDex which is pretty slick. Basically, it's a searchable, cross-indexable database, so you'll have OCR'd text, along with TIFFs or PDFs of the documents. If you would like more information, let me know.

  14. What are you going to store them on? by the+eric+conspiracy · · Score: 2, Informative

    The problem is then you have to come up with a safe long term way to store digital data.

    Clue:

    There isn't one.

    The best thing to do is NOT convert the paper to digitized format. Find some space instead, and store the paper. Your data will be much safer.

    1. Re:What are you going to store them on? by Hydroksyde · · Score: 2, Insightful

      You can easily make backups of data on a computer. You could put multiple copies in many places, all around the country or even all around the world. But paper has this annoying habit of losing data easily when it is burned or made wet, and there goes your only copy. If the world trade centre were full of paper, the disaster would have had a much greater impact economically.

    2. Re:What are you going to store them on? by aminorex · · Score: 3, Informative

      Not unless the notebooks in question were made of acid-free archival paper. I've seen cheap paper falling apart in 5 years, irrecoverable in 10. Phase-change media, like CD-RW, will easily outlast my children.

      --
      -I like my women like I like my tea: green-
    3. Re:What are you going to store them on? by cfavader · · Score: 3, Interesting

      The fact of the matter is, documents on paper are not nearly as available as electronic copies. Hell, you could let thousands of people read all those documents at once for just a tiny amount of money in bandwidth costs (unless you have a university host it for free, which I'm sure they will). For most of us, this accessibility is easily worth keeping a backup of the data, even if it also requires us to store it on new media as time goes on (i.e. switch from floppies to CD-Rs to DVD-Rs to whatever every 5-10 years).

    4. Re:What are you going to store them on? by jhoger · · Score: 2, Insightful

      How do you store paper in a long term way without copying it? Clue: there isn't one.

      You have to copy EVERYTHING to new media eventually. You need to have a plan, and you need to execute it. Simple as that. Paper will disintegrate, and yes, hardware will become obsolete. You just need to progress to the stone in the river before the current one is submerged.

      But which is easier/cheaper to propagate to new media and make backup copies of? Digital data in open, documented, implemented formats, or paper? Which is cheaper and easier to store?

      There's also the argument that computers become obsolete. Well, yeah... but I think you would have a hard time finding many computers in the last 25 years that don't have a software emulator around. All you need to do is archive an, ideally, open source emulation of the machine that implements the software, and fire it up to transfer the stuff to the next machine when it becomes necessary.

      The only real impediment to the survival of data is that it becomes uninteresting and is therefore not actively maintained.

    5. Re:What are you going to store them on? by lucm · · Score: 1
      Magnetic media can be wiped by EMR, so they are a no-go. The only remaining mainstream digital storage technology is optical (CDs or DVDs), but even those are not reliable. Cheap DVDs (like Princo) can be unreadable within 2 years, often sooner.

      Apart from stone engraving, paper is probably the most reliable long-term archive solution, as long as it is stored properly. Of course it is difficult to search and index, but this is a different issue.

      --
      lucm, indeed.
    6. Re:What are you going to store them on? by SeeTheLight · · Score: 1
      Phase-change media, like CD-RW, will easily outlast my children.
      That is only true if you get a good brand (or good batch or whatever) of CDRW's. I've had brand new Memorex CDRW's that LOST all their data 1 week after having data burned on them, without ever leaving my cool desk and without ever being in direct sunlight.
    7. Re:What are you going to store them on? by Effika · · Score: 1

      What CDRW's would you recommend?

  15. Go low tech? by andreMA · · Score: 2, Informative
    If you just want to have it to refer to very infrequently and (possibly) print a page, look into having it filmed as microfiche. Viewers are fairly cheap and in a pinch a strong lens (loupe, possibly) will do.

    Many libraries will have reader-printers that for a small fee (eg, $0.20/page?) you can print a copy.

    Most of the expense with fiche is the production of the silver halide original; diazo copies are relatively cheap. If it's really important to you, have a copy made and lock the original film in a safe deposit box (or at least offsite)

  16. Children? by flyneye · · Score: 0, Funny

    Isn't that what children are for?
    Surely one of your kids has screwed up and needs the responsibility of justifying his food and shelter. Why not just make Jr. scan for a few hours whenever he screws up and stays out late, steals the car, etc. Hell, kids are taking up valuable processor cycles for hours as it is. Time to show them that computers are for more than games, www., and pr0n.
    The patience taught by scanning even the first hundred pages or so will be priceless. The look on their face when they find out they have to convert and title them should be savored as a rare delicacy. The defeat at learning that they will, or lose recreational computer time, will be better than palladium.

    --
    *Repent!Quit Your Job!Slack Off!The World Ends Tomorrow and You May Die!
  17. Some tips by scheme · · Score: 1

    I've helped set up something like this. The best small-scale solution would be to get a good flatbed scanner with an automatic document feeder (ADF). You can get decent HP scanners for about $400-700.

    Once you have the scanner, you can set up a few scanning profiles that automatically set resolution, color depth, black & white threshold, etc. Then scan the notes into Adobe as images. If you scan them in as monochrome images at 100-150 dpi you can get fairly small files that are very readable on screen and as printouts.
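
    As a rough check on the file sizes involved, here is a minimal sketch of the uncompressed size of one scanned page. The function name and the letter-size defaults are assumptions for illustration only; files saved with compression will typically come out smaller than this upper bound.

```python
def raw_page_bytes(dpi, bits_per_pixel=1, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one scanned page (an upper bound)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    # Compare monochrome (1 bit per pixel) with 8-bit grayscale.
    for dpi in (100, 150, 300):
        mono = raw_page_bytes(dpi)
        gray = raw_page_bytes(dpi, bits_per_pixel=8)
        print(f"{dpi} dpi: mono {mono / 1024:.0f} KiB, gray {gray / 1024:.0f} KiB")
```

    At 150 dpi a monochrome letter page is roughly 257 KiB before compression, and the same page in 8-bit grayscale is eight times that.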

    Finally get an RA or student labor to feed the documents in to Adobe and save them in separate files. The ADF lets you do 25-100 sheets at a time, so the helper can start a scan and surf the web or something until the batch is done.

    N.B.: Having a flatbed scanner lets you handle odd sized sheets of paper or delicate stuff. Although you can scan and ocr the documents, ocr is probably going to screw things up a bit and you probably don't want to try to read through the documents to catch and correct the ocr errors. Also if you have any math, diagrams, or handwriting in the notes, the ocr program will probably produce unusable junk.

    --
    "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  18. ask the institution by Bastian · · Score: 1

    Not sure anyone like Kinko's does this. If they do, the price for several thousand pages will almost certainly be greater than the cost of buying an auto-feed scanner.

    I assume if you've collected that much research, you work for a university or some sort of research institution. My undergrad college of 1,100 students had like three of these, including one that was part of some ginormous Xerox do-everything-and-then-collate-and-bind-it-(if-you're-printing,-of-course) machine that was sitting out so that anyone with a campus ID could use it. Maybe it's time to talk to the powers that be about buying some equipment?

  19. ADF by sakusha · · Score: 1

    I'm doing the same thing, scanning piles of my old college papers using an automatic document feeder. I bought an HP 8250 because it does duplexing, so I can just tell it to scan both sides of the sheet. I have lots of mixed double-sided/single-sided documents (like notebooks), it's a lot easier to just scan everything doublesided and go through it in Acrobat and just delete the blank pages. Plus, with the duplexing feeder, I can cut the bindings off old books, drop the whole stack in the feeder, and scan the whole thing. But I haven't quite decided to destroy my old textbooks like that yet.
    The HP 8250 software was just updated for MacOS X 10.4, which makes me really happy since I bought it just before 10.4 shipped, and they updated it promptly. It works well for bulk scanning on a Mac, and it was pretty hard to find a good Mac ADF duplex scanner. It also does 35mm slides, but it would probably be better to get some better software for that job, something like Silverfast SE.
    Anyway, lots of my documents are handwritten (and many in Japanese too), so OCR isn't workable. I don't really need machine-readable documents to do text search, but I could always use Acrobat for OCR on some documents. I think there's a way to keep the graphic image intact while the searchable OCR text is on an invisible layer, but I haven't quite figured out that Acrobat function yet.

  20. HP Digital Sender by metamatic · · Score: 1

    The ideal solution would probably be an HP Digital Sender. It's about the size of a laser printer, except it's kind of a laser printer in reverse. You load a document into it, it scans the whole thing at 30-40 pages per minute, turns it into PDF, and then sends the PDF across ethernet via SMTP to wherever you want. Works with any OS, obviously. It's sold as an alternative to fax.

    The problem is that they're about $2500 each (MSRP $3200), because they're a niche item. Shame really, because if they'd dropped in price the way laser printers have, they could have made fax a thing of the past.

    As it is, I spend ages screwing around with a flatbed scanner, like every other poor sod trying to solve his personal filing problems.

    --
    GCHQ Quantum Insert installed. If only our tongues were made of glass, how much more careful we would be when we speak
  21. simple by urdine · · Score: 2, Funny

    Print them in a book and wait for Google Print to scan them all for you.

  22. wiki by wikinerd · · Score: 1

    Easy: Release all your research papers under the GNU Free Documentation Licence and post them on Wikipedia. If they delete them as original research, post them to my wiki where we keep everything.

    1. Re:wiki by jimmypw · · Score: 0

      How would he post them to Wikipedia if they're currently on paper? Surely he would have to scan them to send them.

    2. Re:wiki by Anonymous Coward · · Score: 0

      A bit of an attention whore, are we? You take every opportunity to spam your website on Slashdot. How disgusting.

  23. Digital Copier with ADF by amliebsch · · Score: 1

    The ideal way to do this would be to find a digital copier with an automatic document feeder, like a Ricoh Aficio or a Canon ImageRunner, that you have access to. They generally have a function to scan instead of copy, and they scan at very high speeds (even duplex scanning!). The data is often retrievable over a simple LAN - I've even seen some that support TWAIN over LAN.

    --
    If you don't know where you are going, you will wind up somewhere else.
  24. hylafax by np_bernstein · · Score: 4, Interesting

    Go buy a modem and grab an old fax machine, then fax the documents to yourself. You should be able to fax a decent number of pages at a time and can walk away and leave it running. These will be saved as multi-page TIFFs, which, while not searchable PDFs, at least solve part of your problem.

    --
    RandomAndInteresting.com defending the world from stupidity since 1979
    1. Re:hylafax by nine-times · · Score: 1
      I hope you're joking but I have a hard time believing that this is the best solution. I'm not sure why this was modded interesting instead of funny. Fax quality is generally low, and data transmission over a phone line will increase the time spent needlessly.

      It's really no improvement over the obvious: get a scanner with an autofeeder. However, I'm assuming that he's looking for something more industrial/quick, since that answer is so obvious. I'm guessing he must have already checked out Kinkos (which I think is another obvious possibility) and they said that they couldn't do it.

  25. A store by XCorvis · · Score: 2

    Try a legal copyist.

  26. Some tips-ABBYY Finereader. by Anonymous Coward · · Score: 0

    "N.B.: Having a flatbed scanner lets you handle odd sized sheets of paper or delicate stuff. Although you can scan and ocr the documents, ocr is probably going to screw things up a bit and you probably don't want to try to read through the documents to catch and correct the ocr errors. Also if you have any math, diagrams, or handwriting in the notes, the ocr program will probably produce unusable junk."

    ABBYY Finereader does an excellent job. On more difficult pages I had to do some tweaking, but part of that was my inexperience with the product.

    And yes it can handle diagrams, pictures, etc.

    1. Re:Some tips-ABBYY Finereader. by scheme · · Score: 1
      ABBYY Finereader does an excellent job. On more difficult pages I had to do some tweaking, but part of that was my inexperience with the product.

      My experience was that OCR was not up to the task 2 years ago. However this was with notes and papers where up to 50% of the page was mathematics. Once you start seeing some of the more esoteric or specialized mathematical symbols, I think the OCR just breaks down.

      However, even with 95% accuracy on math symbols, that would leave a lot of pages to be reviewed for small errors that might significantly alter the meaning. For example, the visual difference between a lowercase chi and x or a lowercase nu and v is fairly small but can make a huge difference if they get swapped.

      --
      "When you sit with a nice girl for two hours, it seems like two minutes. When you sit on a hot stove for two minutes, it
  27. Seriously...This is a good idea by Noksagt · · Score: 1

    Thousands of pages? That is VERY little. One piece of paper is ~0.1mm thick. 10,000 pages would take up only 1 m, which is only one or two drawers in a filing cabinet.
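
    The arithmetic behind that estimate, as a minimal sketch. The function name is made up for illustration, and the 0.1 mm default is the thickness quoted above; binders, dividers, and air between sheets are ignored.

```python
def stack_height_m(sheets, sheet_thickness_mm=0.1):
    """Height in metres of a tightly packed stack of loose sheets."""
    return sheets * sheet_thickness_mm / 1000.0


if __name__ == "__main__":
    for sheets in (2_000, 5_000, 10_000):
        print(f"{sheets} sheets: about {stack_height_m(sheets):.2f} m")
```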

    1. Re:Seriously...This is a good idea by twistedcubic · · Score: 1

      O.k., I guess I underestimated the page count. Nonetheless, the notes and binders which hold them take up a lot of space.

  28. Get a Document Scanner, not a Flatbed + ADF by Noksagt · · Score: 1

    The problems with flatbeds are that they are often slow & the ADF jams quite a bit. A nice scanner dedicated to documents is what you want.

    I am extremely happy with my Canon DR-2080C. Note: It is the only piece of hardware I've bought, knowing that it won't work with Linux. I ran Windows SPECIFICALLY to use this document scanner. It looks like it has been discontinued & the DR-2050C is the model to get now. Looks like it does larger documents, which is nice. These do duplex scans in one pass, so you can get about 40 sides (so 20 2-sided pages) per minute. These will probably set you back ~$650 new.
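
    To turn that speed into a time estimate, a rough sketch. The function name is hypothetical and the 40 sides per minute default is the figure quoted above; time spent reloading the feeder, clearing jams, and naming files is not counted.

```python
import math


def scan_minutes(sheets, sides_per_minute=40, duplex=True):
    """Minutes of pure scanning time for a pile of sheets."""
    sides = sheets * 2 if duplex else sheets
    return math.ceil(sides / sides_per_minute)


if __name__ == "__main__":
    minutes = scan_minutes(5_000)
    print(f"5000 double-sided sheets: about {minutes} minutes of scanning")
```

    Under those assumptions, 5,000 double-sided sheets come to 250 minutes, a bit over four hours of feeder time.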

    If you have more money to spend, there are even better document scanners available.

    1. Re:Get a Document Scanner, not a Flatbed + ADF by Anonymous Coward · · Score: 0

      Wow...SPECIFICALLY, you say? That was a stressful decision. Glad you were the man at the helm that day. Wow. SPECIFICALLY.

  29. Microfilm! by theonetruekeebler · · Score: 1
    If you want to guarantee that the work can be read in fifty years, get the pages converted to microfilm or microfiche. Then they can be read a century from now by anybody with a light source and a magnifier. Consult your local university library on how to have this done. It's quite possible you'll end up scanning it all to TIFF files and sending a DVD to a service bureau, thereby giving yourself a digital copy as well.

    Unless you need the capability to grep the documents, there's little point in digitizing old notes. Digitization carries a number of risks, anyway, not the least of which is that in a few decades (and by "a few" I mean one or two) you may find the information unreadable by any still-functioning hardware. Then again, you could just upload it to "the Internet" and let various system administrators guarantee its perpetuity.

    A frank question you have to ask, though, is how important it is to preserve this information. A strong test is how often anyone has needed to refer to these old notes in the intervening years. It's difficult to say this about the output of one's labor, but it may well be that it truly serves no further purpose and what you really need to do is bypass the scanner, go directly to the recycling center and bid it all farewell.

    --
    This is not my sandwich.
    1. Re:Microfilm! by Anonymous Coward · · Score: 0

      How are you planning to store microfilm for a century, and what are guarantees that it'll actually stay preserved for that long?

      What's to say a "katrina" will not wipe your entire library one day?

      It's fairly trivial to store redundant copies of your digital files, even in multiple locations worldwide. The costs are minimal too.

    2. Re:Microfilm! by theonetruekeebler · · Score: 2, Insightful
      How are you planning to store microfilm for a century

      In a drawer or filing cabinet.

      and what are guarantees that it'll actually stay preserved for that long?

      Wet-film microfilm has an estimated survivability of 500 years in ideal conditions and a minimum of 100 years in any reasonable conditions. To my knowledge this exceeds the lifetime of any digital medium.

      It's fairly trivial to store redundant copies of your digital files, even in multiple locations worldwide. The costs are minimal too.

      It's fairly trivial to store redundant copies of your microfilm, even in multiple locations worldwide. The costs are minimal too.

      --
      This is not my sandwich.
    3. Re:Microfilm! by Anonymous Coward · · Score: 0

      Don't try to argue that the cost of copying 100 microfilms is comparable to cost of copying X megabytes of data. Those are several orders of magnitude in difference.

      What happens to your filing cabinet during a flood, fire or a simple burglary?

    4. Re:Microfilm! by theonetruekeebler · · Score: 2, Insightful
      I have not "tried to argue" that copying 100 microfilms costs the same as copying 100 sets of bits. That's inane. What I have argued is that if this data is important enough to preserve for a century, it should be archived to a non-digital medium. And after the initial transfer, the cost of duplicating a master film is...

      Ah, fuck it. I'm tired of doing your research for you. You log in as an AC, then expect a legitimate user to Google "lifetime of microfilm" and "cost of microfilm transfer" because you're too sorry to educate yourself. I no longer see any benefit in changing the relationship between my knowledge and your ignorance.

      The only reason to use Slashdot as an Anonymous Coward is if you would be fired, arrested, or sued for your post.

      --
      This is not my sandwich.
    5. Re:Microfilm! by Anonymous Coward · · Score: 0

      A comment is either right or wrong. Attaching the pseudonym "theonetruekeebler" to it confers no extra legitimacy at all.

  30. PDF is NOT proprietary by Noksagt · · Score: 1
    I would seriously question the wisdom of PDF files. Although they represent documents fairly well, the format is too proprietary
    Yes, PDF is controlled by Adobe. No, most wouldn't consider it proprietary. It is completely documented & has implementations for both authoring tools and viewers not written by Adobe. It is considered by most to be an open format.
    and too variable to be safe.
    Each version has had incremental changes. Readers which support version 1.5 of the PDF spec are backwards-compatible to earlier versions. I even use xpdf to open 1.6 PDFs (created in Acrobat 7), which aren't compatible. I get warnings, but I can use the documents.

    Even if there is a MAJOR change in the spec, there are open source viewers & you won't be stuck out in the cold. This is why a lot of places DO use PDF for archiving. If they don't think they'll be stuck out in the cold, why do you?
    With the merge of Adobe and Macromedia,
    This is neither here, nor there.
    the constant toying with DRM schemes
    Authors get to decide whether a document is protected or not. Patches for the free viewers are available to remove DRM if you have accidentally added it.
    the allowing of unsafe code in current Adobe formats,
    Can you elaborate?
    A good example of this was the use of Laserdisks for the 1980's survey of Britain to commemorate the Domesday Book. The Domesday Project is now unusable on anything but a very small number of machines, because they weren't adequately careful.
    I daresay that PDF has much broader adoption than laserdisks ever had. Certainly, ANY software format (deprecated or not) is more accessible than dead removable media formats. However, you can always find a reader & dump the files onto a new medium.

    As far as medium goes, I agree that magnetic medium makes the most sense. Put it on the hard drive of at least one networked machine & back it up to tape. Hard drives also die, but there is no excuse not to make the data accessible & the files can always be recovered from backup.
  31. This is the best advice by Linuxathome · · Score: 1

    I also have a Brother MFC and it's the best investment we've made. Actually, Dell sells an MFC at a much cheaper price, but we had to give it up because the scanbed doesn't support legal size paper (but the ADF does). Dell's MFC also comes with PaperPort. You can probably purchase from Dell with some back to school bargains (check the discount deal sites like techbargains, xpbargains, fatwallet, etc.). Even the cheapest laser MFC is a network printer (although scanning requires USB connection)--which means you can connect both via ethernet and USB, the ethernet is for printing for all the other systems on your LAN and the USB is for your Windows box. I also second the suggestion about Adobe Acrobat, it's the best piece of software out there, even though it's a Windows only piece of software.

  32. Maybe you should try djvulibre by Anonymous Coward · · Score: 0

    To scan and store hand-written notes it might be better to use DJVU format http://djvulibre.djvuzone.org/. You can find free readers for almost every platform (including Zaurus!) and filesize is very small despite the good quality.

    You can also convert to/from PDF and PS using a free (non-gpl but open source license) gs driver from AT&T.

    1. Re:Maybe you should try djvulibre by twistedcubic · · Score: 2, Informative

      Dude! I already found a $100 scanner that does the job and works in Linux (HP Officejet 4215). It scans really fast. My only problem up till now was that PDF rendering was too slow. But then I compared the results to DJVU... Wow! The DJVU files render incredibly fast! Thanks!

  33. It's possible... by NemoX · · Score: 2

    This is what we do at work. We spend about $5000 on the set up, but remember that this is an enterprise where we scan about 125,000 pages to .pdf a month. It is probably possible for about $500 or so, for what you are looking at (oh, and some programming)

    First, you'll need a low-volume scanner. (Check the duty cycle to make sure it can handle your bookshelf of papers.) Then, you'll need something to convert the images to PDF. If you have any programming experience, write a quick app that uses ImageMagick (http://www.imagemagick.org/) to convert from TIFF to PDF. Put each binding in its own folder, and pretend the "untitled1.pdf" says "page1.pdf" ;)

    If you want to get fancier, have the front-end app rename the untitled1.tiff to whatever you'd like. Also, you can embed extra information into the PDF by using metadata and the Adobe XMP SDK (free download from Adobe). Make the metadata like:
    TITLE="My Book"
    AUTHOR="Bart Simpson"
    etc.
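
    The "quick app" for the TIFF-to-PDF step could look something like this minimal sketch. It assumes ImageMagick is installed and reachable as `convert` (newer installs may use `magick` instead), and that the scanner names pages with a trailing number such as untitled1.tiff; the function names and folder layout are made up for illustration. It does not touch the XMP metadata part.

```python
import re
import shutil
import subprocess
from pathlib import Path


def page_number(path):
    """Trailing number in a name like untitled12.tiff (0 if there is none)."""
    match = re.search(r"(\d+)$", Path(path).stem)
    return int(match.group(1)) if match else 0


def build_convert_command(tiff_paths, pdf_path, binary="convert"):
    """Command line that asks ImageMagick to merge TIFF pages into one PDF."""
    if not tiff_paths:
        raise ValueError("need at least one TIFF page")
    return [binary, *[str(p) for p in tiff_paths], str(pdf_path)]


def convert_folder(folder, binary="convert"):
    """Merge every *.tiff in one binding's folder into <folder>/<name>.pdf."""
    folder = Path(folder)
    pages = sorted(folder.glob("*.tiff"), key=page_number)
    command = build_convert_command(pages, folder / (folder.name + ".pdf"), binary)
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found; is ImageMagick installed?")
    subprocess.run(command, check=True)  # raises if ImageMagick reports an error
    return command
```

    Sorting by the trailing number rather than by name keeps untitled10.tiff after untitled2.tiff, which a plain alphabetical sort would get wrong.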

  34. What are your retrieval needs? by davidwr · · Score: 1

    Are these notes preserved "just in case" they are needed?

    Do you actually REFER to the notes every now and then?

    Do you need text or just scanned-images?

    Do the advantages of having them outweigh the advantages of destruction? Remember, if you destroy it then it can't come back and haunt you in a lawsuit. But then again, it can't help you either. Caution - before you destroy anything make sure you have an official data-retention policy, and stick with that policy. Otherwise, destroying data CAN be seen as a sign that you have something to hide.

    Once you've answered these questions, you can decide among your options for each document

    - destruction
    - file in archives, probably off-site, if necessary secure against fire or other disaster
    - microfilm or microfiche
    - scan images
    - scan and "95% accuracy" OCR
    - scan and 99.99% accuracy OCR with human verification.

    If you scan or micro-photo-copy, you have to decide what to do with the originals - keep on site, archive off site, or destroy.

    If you scan, you have to have a plan to copy the data to new media and new file formats as old ones become obsolete. If you have any 8-inch floppies or obsolete-format computer files lying around, you know the problem I'm talking about.
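
    One way to make such a copying plan checkable is to record a checksum for every file and compare it after each move to new media. The following is only a sketch of that idea, not something from any particular archiving tool; the function names are invented for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_copy(manifest, copy_root):
    """Return the relative paths that are missing or differ in the copy."""
    copy_root = Path(copy_root)
    problems = []
    for relative, expected in manifest.items():
        candidate = copy_root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(relative)
    return problems


if __name__ == "__main__":
    # Tiny self-contained demo using throwaway directories.
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        Path(old, "notes.pdf").write_bytes(b"scanned page")
        Path(new, "notes.pdf").write_bytes(b"scanned page")
        print(verify_copy(build_manifest(old), new))
```

    Keeping the manifest alongside each copy means the next migration can be verified the same way.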

    You should also set an "expiration date" on all documents. If a document has to be preserved until, say, 2010, it's OK to convert to digital and destroy the original, since 5 years from now it's almost certain you'll still be able to read it. If you'll need it in 2100 however, I'd recommend keeping the paper copy or at least a microfilm copy.

    One more thing - if you need to preserve color information, color microfilm may age a lot faster than black and white, causing color shifts. This is probably okay for line drawings, charts, and such but not for photographs.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  35. Mod Overrated Parent Down by Anonymous Coward · · Score: 0

    This is an anti-Adobe troll. As has been pointed out, PDF is -not- a proprietary format.

    1. Re:Mod Overrated Parent Down by tigersha · · Score: 1

      Actually, PDF is not much different from Postscript (it adds all the paging stuff and its imaging part is a subset of Postscript) but you never hear the fanatic whackos whine about that.

      --
      The dangers of excessive individualism are nothing compared to the oppressiveness of excessive collectivism
  36. Document Management System by JoeCommodore · · Score: 1
    What you are looking for is a Document Management System; such things have been around for microcomputers for at least a decade. They include a high-performance scanner and quality OCR software tied to a text/image database. The database holds the scanned data in a searchable and usable form, and the originals are stored in an indexed image archive (to look at images or to verify accuracy). Once finalized, the data is burned to CD to archive if needed.

    Legal businesses and accounting departments use this stuff regularly.

    Have you googled for it? There might be a SourceForge FOSS project along those lines.

    --
    "Enjoy what you're doing! If it becomes drudgery, you're doing it wrong!" - Jim Butterfield
  37. Retaining Legality by digitalsushi · · Score: 1

    Those handwritten notebooks are legal documents proving you did that work when you dated it. That's why engineers keep notebooks. If they are digital, do they keep their status as legal documents? This is an important question to me, so I hope someone knows :D

    --
    slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    1. Re:Retaining Legality by Star+Stealing+Girl · · Score: 1

      They do in the airline maintenance field. Airlines are switching their mechanics from paper docs to computerized paperless systems. The FAA has approved these new digital documents as legal documents proving work has been done.

      --
      All my money went to Nigeria and all I got was this lousy sig. . .
  38. Einstein Papers Project by Tihstae · · Score: 1

    Ask the experts at the Einstein Papers Project. They have been doing this for quite some time.

  39. XEROX by Anonymous Coward · · Score: 0

    Xerox makes several high-speed grayscale document scanners.
    If you search them out you can find services that utilize these machines. Typically they OCR the pages side by side with a raw image scan. So even if the OCR is only so-so, you can search the documents and read the original handwritten pages. These services are most utilized by law firms that are given crates of documents for legal cases.
    It's not cheap but it's very fast and convenient.

  40. DjVu, not PDF by TeXMaster · · Score: 2, Informative
    There is a file format which is specifically created for this kind of stuff, and it's called DjVu. There is a free (as in open source) reference library, and proprietary tools by LizardTech.

    (Of course, you will still need to spend lots of time scanning, naming and classifying those pages. The ADF and 10yo nephew suggested in another post might be useful for that.)

    DjVu offers very compact representation without the need to OCR the document (I've converted a 13 megs scanned PDF into a 600K DjVu which was much faster and easier to read), and optionally a "hidden text layer" if you want to OCR it to make it searchable.

    --
    "I'm never quite so stupid as when I'm being smart" (Linus van Pelt)
  41. Junk them by egriebel · · Score: 1
    Seriously. Throw them out. Especially if more than 5 years old. What are the chances that you are going to need to refer to them? Compare the time and effort to what you could be doing instead.

    If you really need to keep them, throw them in boxes and put them in document storage somewhere. Then, on the off chance you might need them for patent disputes, etc., you can hire someone for $8/hr to go thru them.

    --
    ACHTUNG! Das computermachine ist nicht fuer gefingerpoken und mittengrabben. Ist nicht fuer gewerken bei das dumpkopfen.
  42. My Suggestion by Goo.cc · · Score: 1

    Does this really need to be done quickly? If not, you could do it yourself with just one to three pages a day, which should be very manageable. This would save you the money of paying someone and it would give you the chance to quality check each page as you go.

    Unless you plan on using OCR, these documents could also be saved in tiff, png, or jpeg formats. Personally, I would consider a format that allows for the embedding of keywords into the file, so that searching will be easier later on.

    Good luck.

  43. Possible Solution: by Whatchamacallit · · Score: 1

    1. Get a scanner with a document feeder.
    2. Get software to scan to PDF format.
    3. Get Google Desktop Search which will index the contents of PDF or get an Apple Mac with Mac OS X 10.4 (Tiger) and Spotlight will index your PDF's. If you have a Mac, you may be able to scan to PDF without needing Adobe Acrobat.

    Don't know about scanner services, but check around and you might find someone who can scan the documents to PDF and give you a DVD-R or CD-R's with the files. Kinko's? Print Shop?

    We have highend Kodak scanners that are unbelievably quick (24ppm) and scan both sides of a page at the same time. Of course, they cost $15,000.00 USD. So that's not practical for most budgets.

  44. Big photocopiers by adl99 · · Score: 1

    In the last two weeks, I have done broadly the same thing. I work at a hospital with a large Canon photocopier (ir5020i). This has an auto-document feeder like any self-respecting copier would.

    It is also a network device to scan / print. I took in my computer (mac mini) plugged into the ethernet port and (adding 20-30 minutes of fiddling) was away.

    So.. make friends with your local big company (a hospital would be good - you can make a small donation).
    Bear in mind though that it took me pretty much all (working) day and I only had the equivalent of about 6 reams of paper (3000 sheets). Thank goodness it was a public holiday!

    To save time, go through meticulously beforehand with a staple remover. To separate the sheets, place them vertically and blow down onto them to get air between them. HTH.

  45. I can do it. by AvitarX · · Score: 1

    I work at a place that does this kind of stuff. I can say though that storage space is cheaper than you would think and in even the medium term a better idea.

    If you really need to be able to access it, though, something like that should cost between 10 and 20 cents a page (in that quantity), depending on the standards for accuracy and on feedability. (If it is 100-page documents in 3-ring binders with no staples, clean edges, and no Post-its, expect it to cost a lot less than 2-page documents covered with stickies and stapled.)
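
    For budgeting, the per-page range above works out as in this small sketch. The function name is hypothetical, and the 10 and 20 cent defaults are just the figures quoted here, not a price list.

```python
def scanning_cost_usd(pages, cents_low=10, cents_high=20):
    """Low and high cost estimates in dollars for a scanning job."""
    return (pages * cents_low / 100, pages * cents_high / 100)


if __name__ == "__main__":
    low, high = scanning_cost_usd(3_000)
    print(f"3000 pages: ${low:.2f} to ${high:.2f}")
```

    For 3,000 pages that is $300 to $600, which is in the same range as buying a document scanner outright.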

    I am not plugging my place of employment; it should be easy to find somewhere local.

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
  46. Hardware is not the limitation by cander0000 · · Score: 1

    I explored offering this service as part of my consulting practice a few years back. Turned out hardware and software weren't the limitations, even if I could use a simple ADF scanner with a reasonably-priced temp staff. The limitation was the time to organize it into usable categories and make the file name match the type of content, etc. I'm intrigued by some of the new software such as PaperPort http://www.scansoft.com/paperport/standard/ that apparently makes the documents searchable, minimizing the need to title and folder each scanned image, but I'm sure it's relying on some kind of OCR and not sure how kind that would be to hand-written docs. As long as it allows a simple interface to tag each scanned image, you're gold.
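
    The kind of simple tagging meant here can be sketched as a small in-memory index. This is only an illustration of the idea, not how PaperPort or any other product works; the class and method names are made up.

```python
from collections import defaultdict


class TagIndex:
    """Maps free-form tags to scanned file names, ignoring case."""

    def __init__(self):
        self._files_by_tag = defaultdict(set)

    def tag(self, filename, *tags):
        """Attach one or more tags to a scanned file."""
        for label in tags:
            self._files_by_tag[label.strip().lower()].add(filename)

    def find(self, *tags):
        """Sorted list of files that carry every one of the given tags."""
        wanted = [self._files_by_tag.get(t.strip().lower(), set()) for t in tags]
        if not wanted:
            return []
        return sorted(set.intersection(*wanted))


if __name__ == "__main__":
    index = TagIndex()
    index.tag("scan_0001.pdf", "Lecture notes", "1998")
    index.tag("scan_0002.pdf", "lecture notes", "1999")
    print(index.find("lecture notes", "1999"))
```

    Tagging at scan time like this avoids having to invent a meaningful file name and folder for every page.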