Slashdot Mirror


Large-Scale Paper-To-Digital Conversion?

An anonymous reader writes "I've just been asked to digitize several dozen sets of lecture outlines at the university where I work. Basically, professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page. However, every desktop scanner I've ever used takes 1-2 minutes of user-attention per page and the resulting files end up Huge, impossible-to-read, or both. All I have at my disposal is my PowerBook, Acrobat, a couple hundred dollars of department funds for a new scanner (this maybe?), and, if I ask nicely, overnight use of the secretary's Win2k box. Any ideas? Sheet-fed scanner recommendations? Better file formats than PDF (or better PDF settings)? Do any of you students have usability advice?"

459 comments

  1. Get stuffed by October_30th · · Score: 4, Insightful

    Uh. How about telling your prof. to get stuffed and get a real secretary.

    --
    The owls are not what they seem
    1. Re:Get stuffed by Amiga+Lover · · Score: 5, Interesting

      I think you're right on the money. May be well worth taking the job to an outside agency. There are many print shops using Xerox Docutechs, which scan in many hundreds of sheets at once to print copies of documents. The scanning takes barely a second a page, and it wouldn't surprise me if the document format being stored inside the docutech is something that can be used for this purpose.

      I've had a similar job, where our school's lecturers wanted their notes in the same style so one of my jobs as admin assistant was retyping chapters from textbooks & inserting the original illustrations. That didn't start out too bad until lecturers started basing course notes on entire quarters of books, expecting them to be retyped completely in their own style. Give an inch they'll try to take a mile - use the few hundred $$ to get it professionally scanned.

    2. Re:Get stuffed by Anonymous Coward · · Score: 0

      WOW! That's *so* helpful! Just refuse to do the job your employer is paying you to do... DAMN... why didn't I think of that?

    3. Re:Get stuffed by SoSueMe · · Score: 2, Interesting

      ...retyping chapters from textbooks & inserting the original illustrations. That didn't start out too bad until lecturers started basing course notes on entire quarters of books...

      Isn't that copyright infringement?
      Unless, of course, they wrote the textbooks.

    4. Re:Get stuffed by October_30th · · Score: 3, Insightful
      WOW! That's *so* helpful! Just refuse to do the job your employer is paying you to do... DAMN... why didn't I think of that?

      How do you know he's getting paid to do it? Some professors have a nasty habit of getting all their nasty, menial and boring stuff done by their students who are already working on their degree projects 12 hours a day, six days a week.

      Ok, so for some reason I assumed that the poster is a student so my initial reaction was probably off. I would never assign such a menial, dead-end task to my postgrad students, nor would I have accepted such a task without objections when I was still a student.

      --
      The owls are not what they seem
    5. Re:Get stuffed by Walt+Dismal · · Score: 3, Insightful

      No, seriously, this request shows an utter lack of concern by someone who may be a professor, but is also a bad manager and possibly an idiot. Your response perhaps should be to scope out the project and toss the estimate and the funding issue back into his lap. But do not let yourself be used as slave labor.

    6. Re:Get stuffed by djplurvert · · Score: 5, Insightful

      In addition to the points already made it is not unreasonable to simply tell the prof that his/her expectations are unreasonable. Perhaps "get stuffed" is a bit over the top but I've found that employers (even professors) will listen to reasonable explanations.

      I used to have a boss that would say things like "this should only take you about five minutes". I finally told him, "nothing takes just five minutes, if I have to stop what I'm doing there is a startup/teardown cost for every task." I convinced him that there was a granularity of 1/2 hour for every random task he wanted done. The discussion was fruitful for both of us, he was more reasonable about his expectations and put a bit more thought into what he wanted to distract me from my primary task to do.

      Now, the original idea is a reasonable proposition; however, it isn't really the sort of thing that should be done for just one prof. Perhaps several departments can combine their resources to set up something that will allow this type of thing to be done in a reasonable time frame.

      plurvert

    7. Re:Get stuffed by Amiga+Lover · · Score: 1

      Isn't that copyright infringement?

      probably

    8. Re:Get stuffed by Glonoinha · · Score: 2, Interesting

      Even half an hour is being generous, on your side. As a consultant the smallest unit of time I was even allowed to quote was four hours, meaning that the client was looking at at least a $500 bill every time I got involved (or if I was involved, each time they changed directions or wanted something done differently because they changed their mind.) Needless to say, I was allowed to stay focused on the actual project and rarely got hit with mickey mouse crap like changing the colors or fonts or rearranging the buttons on the screen because the secretary likes the word 'Yes' instead of 'Ok'.

      Granted if it was something reasonable and I could do it without shifting gears (mentally) I would usually slipstream it into the work I was doing and not write it up. If it was redoing work that I had already done, or worse if I recommended doing it one way and they mandated I did it some other way and after I was knee deep in it decide to go yet another direction or even in the direction I originally suggested ... there is significant rework that needs to be done and the associated ramp down / ramp up time is often a big chunk.

      --
      Glonoinha the MebiByte Slayer
    9. Re:Get stuffed by Adian · · Score: 5, Insightful

      On the contrary, it's your job as a professional and as an employee to keep your employers in tune with what is possible and what is most efficient for the manhours/money involved. As an employee you are also responsible for keeping them informed of ways to save money wherever that can be done. This particular job could take hundreds of manhours to do in-house, versus paying a place that actually specializes in these services to do it. I'd guess the university either has this equipment on campus or already has a contract with some company for something similar.
      Besides, it sounds like they are not aware of the time involved in scanning tens, never mind hundreds, of pages. It doesn't sound like they are too anxious to make it easy for him to get the job done either (not buying him new equipment, using the secretary's Win2k box after hours??).
      I've volunteered my efforts before on a simple scanning job that required hundreds of regular photos to be scanned in at relatively good quality (why else do it otherwise), and ended up taking forever. Upon informing the client of the amount of time required, they adjusted the way the job was being handled.
      I think being straight with your employers, and clients is the best approach to any situation where too much is being expected. The times I've had these instances come up, and recommended different approaches that resulted in money being saved, or manhours on a task being reduced, I saw benefit in my paycheck through raises or promotions.

      --
      Adian
    10. Re:Get stuffed by sumdumass · · Score: 1

      Wouldn't it be fair use? Especially if they bought the books for learning material. Or am I missing something? Is he doing this so that everyone can escape buying the books?

    11. Re:Get stuffed by gkuz · · Score: 2, Informative
      Xerox Docutechs, which scan in many hundreds of sheets at once to print copies of documents. The scanning takes barely a second a page, and it wouldn't surprise me if the document format being stored inside the docutech is something that can be used for this purpose.

      Truly, ignorance is bliss. This is clearly written by someone who has seen a DocuTech only from a distance.

      We have three of them where I work, and I have worked very very closely with them on a number of projects. Sure, they scan quickly, but you can't get the data out of them. They are copiers, essentially.

    12. Re:Get stuffed by curator_thew · · Score: 2, Informative

      Fair use for educational purposes has narrowed. In patent law, Madey v. Duke (2002) found that a university wasn't allowed a non-commercial exception because "experimentation" was "furthering the business aims of the university". Yes, this was a hugely contentious case.

    13. Re:Get stuffed by Anonymous Coward · · Score: 1, Funny

      How can I get overnight use of the secretary's box?

    14. Re:Get stuffed by sumdumass · · Score: 1

      As kind of an afterthought, I was originally thinking they would buy the books and then copy them to reorganize the information to more closely match the content being taught.

      After looking into that case and some more thought, I see your point and agree. Also, it could be seen as trying not to pay for the content in the way it was originally intended. Thank you.

    15. Re:Get stuffed by Man+Eating+Duck · · Score: 4, Informative


      I've been working with various versions of the Docutech system for about six years, and they're in use in most of the professional copy/print shops around, at least in Scandinavia. They scan full page and double sided, 600 dpi at about 1 page/sec. Newer versions also can handle full colour.

      Native document format is tiff images with a proprietary control file (structuring, positioning etc), but you can easily convert it to pdf.

      I'd guess that a professional shop will charge you about 30 cents a page if you accept the raw document files without 'touching up'. This is more than adequate if you're just going to reproduce it on paper, or even distribute the PDFs. It'll weigh in at about 100k a page for the tiff format, and a lot less for the PDFs. This is black and white, which in most cases will suffice.
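
      A rough sketch of that arithmetic, for anyone who wants to plug in their own numbers. The 30 cents and 100k per page are the guesses above; the number of sets and pages per set are purely illustrative assumptions (the submitter only says "several dozen" sets of "100+" pages), and the function name is made up for this example.

```python
# Back-of-the-envelope estimate for having a print shop scan the notes.
# The per-page figures are the rough guesses from this comment; the set and
# page counts are illustrative assumptions, not numbers from the submitter.

def estimate_job(sets, pages_per_set, cost_per_page=0.30, kb_per_page=100):
    """Return (total pages, total cost in dollars, total TIFF size in MB)."""
    pages = sets * pages_per_set
    cost = pages * cost_per_page
    size_mb = pages * kb_per_page / 1000  # treating 1 MB as 1000 kB
    return pages, cost, size_mb


if __name__ == "__main__":
    pages, cost, size_mb = estimate_job(sets=24, pages_per_set=100)
    print(f"{pages} pages, about ${cost:.2f}, about {size_mb:.0f} MB of TIFF")
```

      With those assumed counts (24 sets of 100 pages) the job is 2400 pages, roughly $720 and 240 MB of TIFF, which is well past "a couple hundred dollars" and worth knowing before anyone promises a delivery date.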

      Professional equipment (as in contracting a print shop) is definitely the way to go. I know that at the University of Oslo, Norway, they have established an in-house shop that will do this type of work internally for just about cost. Maybe that's an idea to put forth to the management? Surely your university will find other uses for it than just your assignment.

      Hope this helps :)

      --
      Are you a grammar Nazi? I'm trying to improve my English; please correct my errors! :)
    16. Re:Get stuffed by Anonymous Coward · · Score: 1, Informative

      Uhhh we regularly use docutechs to convert from printed matter -> tiff -> pdf.

      I assumed all models allowed this, but I guess not.

    17. Re:Get stuffed by E_elven · · Score: 1

      Well you obviously have time to spare. Taking Sunday off to 'avoid going insane'.. damn pussies.

      --
      Marxist evolution is just N generations away!
    18. Re:Get stuffed by Anonymous Coward · · Score: 1, Informative

      LaTeX

      If the professors are doing lecture notes with many equations they might consider putting the notes in LaTeX format...

      It's really the best way to do lecture notes that can be edited... far more useful than a pdf file of a scan of hand written notes.

      Of course that makes things easy on you by making things hard for them...

    19. Re:Get stuffed by eliza_effect · · Score: 3, Funny

      professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page.

      Spend the money on buying them copies of Mavis Beacon Teaches Typing.

    20. Re:Get stuffed by Anonymous Coward · · Score: 0

      The poster has a laptop, PDF software, and a couple of hundred bucks. The goal is to convert hand written documents (100+ several times over) and post them on a webpage. Why would this even be considered as anything but a low-grade time waster. YOU DON'T HAVE THE RESOURCES. What are you going to buy with that "couple of hundred dollars"? A scanner?...please.

      I would like to build a hydro-electric dam, I have a shovel and some pocket change, can someone here on slashdot suggest something?

    21. Re:Get stuffed by Anonymous Coward · · Score: 0

      Being straight, as Adian suggests, is a very good idea. But to help with this, I bet your school library or a nearby school library has a scanner system for scanning documents for blind or visually impaired students and staff. Find that office and talk to *them* about whether you can use their equipment, commission their part-time employees to record these documents, and how much time and money it usually takes so that you have a reasonable estimate of the job to bring back to your employers. They should at least be able to give you good estimates of the amount of time needed to process such documents, written down so that you can budget the time properly with your professor's knowledge.

      I assume that your professor is not evil, and will be reasonable about helping you pull the strings to get the needed resources. Perhaps, for example, he can fund the secretary for a computer upgrade she has been demanding so you can use her system after hours? Or finally get the system he's been wanting?

    22. Re:Get stuffed by idlerich · · Score: 1

      Speaking as a university professor, those people asking you to digitize hundreds of pages need to get a clue. Point them towards a LaTeX manual, and get on with your life. You have better things to do.

    23. Re:Get stuffed by MarcQuadra · · Score: 1

      When I started my job the first project I was handed was a leftover from the last set of folks there:

      Digitize 2500 CDs. Now. And without spending any money.

      I eventually sat down with the CIO and explained that this isn't something you can do in a day, week, or month. We eventually had student workers plug away at it in their free time and it took eight months.

      All this because someone said, "sure, I just digitized a CD a few minutes ago, digitizing the whole roomful shouldn't take more than a few days."

      --
      "Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
    24. Re:Get stuffed by Paisley+Phrog · · Score: 2, Informative

      Exactly. I work at a print/copy shop, and the last color machines we had (Xerox DocuColor 3535) served as document scanners as well as copiers/printers. Place your document in the feeder, select "scan" from the menu, pick your resolution, and press the big green button. When it completes, you log into the printer's internal web services page (reachable from its own IP address) and select how you want to download your files...JPEG, TIF, or PDF. I loved the PDF feature....too bad the 3535 was absolutely horrendous at everything else for high end color.

      This feature isn't available on every Xerox color (we don't have it on our new 2045, unfortunately), so you'll have to check around a little. Check with print shops and see if their Xeroxes have a "scan to file" capability.

    25. Re:Get stuffed by bronney · · Score: 1

      That's exactly what I told my bossy :P.. I'm a graphic designer and when you do something that has to do with creating something from nothing (i.e. cooking, lying..), distractions really multiply your startup/teardown time.

      I gave a nice analogy for people who don't understand the creative business though, and most of them understand the difficulties involved:

      "Think of how you're going to write 6 poems on different subjects in parallel".


      -bron

    26. Re:Get stuffed by Anonymous Coward · · Score: 0

      "Yes, this was a hugely contentious case."

      Thank you for including that caveat. I believe it bears repeating.

      This was a hugely contentious case.

      In fact it is clear as a bell that fair use is specifically strengthened for educational use. The fact that schools often maintain "copy shops" where they still pay outrageous fees on materials copied for class use is an example of mismanagement and greed at the administrative level and not a reflection of fair use at all.

    27. Re:Get stuffed by Short+Circuit · · Score: 1

      I'm sure that the publishers would rather charge them for it. Take instructions on how to reorder the book, then ship to Xerox for "On-demand" publishing.

      Oh, and the limited-edition nature of the books would make them more valuable, both in the collector sense and in the I'm-a-student-and-I-need-this-textbook sense.

    28. Re:Get stuffed by JAgostoni · · Score: 1

      Everytime someone on /. brings up document formats there are always people that say "LaTeX is the way to go, everything else sucks." While I agree that LaTeX is a great way to go there are many things, in my opinion, wrong with suggesting it especially in this situation:

      1. The professors are obviously not thinking this out as it is (maybe not having the time?) and you expect them to learn LaTeX AND rewrite their lecture notes? Maybe they only do new stuff but (a) that doesn't solve the poster's problem and (b) they'll need a TA for sure ... old dog new trick sort of thing

      2. It doesn't really apply here because the documents already exist and if you (the poster) want them in LaTeX, guess who they're going to ask to do it? And a couple of hundred dollars is not going to buy you enough intern-time to re-do all the lecture notes in LaTeX.

      3. People want their GUIs ... especially profs. As an aside, are there any good/easy-to-use layout apps that output good, clean LaTeX? Not sure what the point is by then, but are there any?

    29. Re:Get stuffed by oregonnerd · · Score: 1

      Kill all professors in question--secretively, of course, so that you won't be blamed. Join in the search (first) for the professors. Secondly, when they're found in the cold room for the cooks, exclaim how badly mutilated the carcasses are. Thirdly, discover a note, "This is what happens to professors that want to scan impossible things and have them posted on their asinine sites". Fourthly, pick some student you really hate, blame it on him/her, and call the FBI. Fifthly (this is just for insurance) check into the local padded-room center and scribble JCLs on the walls.

      --
      oregonnerd...a nerd in Oregon, of course
    30. Re:Get stuffed by LaCosaNostradamus · · Score: 1

      Unfortunately, "digitizing the whole roomful shouldn't take more than a few days" is a prevailing and default line of thought, and is part of a larger mental illness that has led people to tell me things on the order of "any teenager can do what you do with computers". It is always a priority with me to see that the pain of computer work climbs back up the chain of command so that they don't start reverting to this way of thinking. Eternal Vigilance!

      --
      [You have a stable society when some nut guns down a schoolyard and the law doesn't change.]
    31. Re:Get stuffed by MarcQuadra · · Score: 1

      LOL. I remember quite clearly sitting in the office with the new CIO and explaining that digitizing a CD took about ten minutes, and how 2500 of them would take 25,000 minutes if I wasn't doing anything else. Then I explained how there are only 420 work/minutes each day for me, and how even the slightest deviation would stop the digitizing, and that the minimum digitizing time was sixty days, fudged to 180 because I have other work (an entire job) to do.
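
      For anyone who has to make the same case to their own CIO, the arithmetic is easy to sketch. The numbers are the ones above (2500 CDs, about ten minutes each, 420 working minutes a day); the function name and the "fudge" multiplier are just illustrative, with 3x being the jump from sixty days to 180.

```python
import math

# Reproduces the estimate in this comment: 2500 CDs at about ten minutes each,
# with 420 working minutes per day. The fudge factor models the jump from
# sixty days to 180 for someone who also has an entire other job to do.

def digitizing_days(cds=2500, minutes_per_cd=10, work_minutes_per_day=420, fudge=1):
    """Working days needed, rounded up to whole days, times a fudge factor."""
    total_minutes = cds * minutes_per_cd
    return math.ceil(total_minutes / work_minutes_per_day) * fudge


if __name__ == "__main__":
    print(digitizing_days())         # uninterrupted, full-time
    print(digitizing_days(fudge=3))  # alongside a regular workload
```

      25,000 minutes over 420-minute days rounds up to 60 working days, and the 3x fudge gives the 180 quoted to the CIO.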

      The look on the face of these people when they realized how out-of-whack their non-estimations were was priceless.

      Anywho, they were happy to hear that I found a way to get it done at no 'real' cost by having student workers do the grunt-work. Now it's almost done and they want a LOT of features added to the web-based front-end. I have to explain again how if they don't want to spend massive loot to farm it out to a contractor I have to have the time to LEARN AN ENTIRE COMPUTER LANGUAGE which could take months.

      --
      "Sometimes, I think Trent just needs a cup of hot chocolate and a blankie." -Tori Amos on Nine Inch Nails
    32. Re:Get stuffed by Anonymous Coward · · Score: 0

      "Think of how you're going to write 6 poems on different subjects in parallel".

      No offense, but if that request causes you any difficulties then you are not creative enough, and I suspect you only took this job out of laziness. Do us all a favor and ask your boss to hand you a mop.

    33. Re:Get stuffed by Anonymous Coward · · Score: 0

      > I was handed was a leftover from the last set of folks there:
      > Digitize 2500 CDs. Now. And without spending any money.

      Sorry, but what is 'digitizing' a CD? Do you mean scan stuff into images and put them on a CD, or burn copies of CDs, or record the (already digital) music on CDs into the computer, or what?

      I guess I've just never heard the term used in that context before.

    34. Re:Get stuffed by idlerich · · Score: 1

      Re 1. and 2.: LaTeX is not hard to learn. It's very common in the scientific community, because it's very easy to structure a document nicely, and it is superb when it comes to handling mathematical expressions. Re 3.: Yes, there's LyX, which its authors describe as a "what-you-see-is-what-you-mean" editor. It's very user friendly. You get all the advantages of LaTeX without having to learn it. I've always felt that university professors should be able to handle technology better than the average consumer. Regardless of specialization, an academic should be computer-literate.

    35. Re:Get stuffed by Anonymous Coward · · Score: 0

      where you work

      Spoken like a President & CEO, if I'm not mistaken ;-)

    36. Re:Get stuffed by JAgostoni · · Score: 1

      Perhaps LaTeX is not difficult to learn but in order for this to be a viable solution, the person has to be willing to learn it. Why would they learn such a wonderful, powerful language when they can use a half-assed tool in Windows? As silly as that sounds, when I worked for a University, that was pretty much the response. Granted, I didn't exactly work for the sharpest Profs...

  2. Kinkos? by axonal · · Score: 5, Informative

    Some Kinkos have those big goliath Xerox scanners which act just like copiers. Load up a stack of papers, and it will scan the pages and store them. Not sure about PDF export/etc though.

    1. Re:Kinkos? by anglete · · Score: 1

      I agree, it will be cheaper for kinkos to do it with their machine than you to do it and waste a couple days figuring it out.

    2. Re:Kinkos? by zenquest · · Score: 5, Informative

      We have a Xerox WorkCentre Pro 65 at my school. It can scan at around 50-60 pages per minute, and will do double-sided. It will do PDF output, too. (and email it or FTP it to you, if so configured)

      Our teachers use them for exactly the purpose described. If you don't have one of these type machines around anywhere, then definitely give Kinkos or some similar establishment a try.

    3. Re:Kinkos? by Anonymous Coward · · Score: 1, Informative

      The current price for Scan to PDF at my Kinko's branch is a $25 setup and $.25 per page. Since there are hundreds of pages, you can probably get them to waive the setup, since it's really just there to gouge folks who want a couple pages scanned.
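
      A quick sketch of what that pricing means against a budget. The $25 setup and $.25 per page are the prices quoted above; the $200 budget is only one reading of "a couple hundred dollars", and the function names are made up for this example. Amounts are kept in whole cents to avoid floating-point rounding.

```python
# Scan-to-PDF pricing as quoted above: a flat setup fee plus a per-page charge.
# All amounts are in cents. The 200-dollar budget used below is an assumption.

def scan_cost_cents(pages, per_page_cents=25, setup_cents=2500):
    """Total cost in cents for scanning the given number of pages."""
    return setup_cents + pages * per_page_cents


def pages_within_budget(budget_cents, per_page_cents=25, setup_cents=2500):
    """Largest number of pages that fits in the budget (0 if setup alone exceeds it)."""
    if budget_cents < setup_cents:
        return 0
    return (budget_cents - setup_cents) // per_page_cents


if __name__ == "__main__":
    budget = 200 * 100  # assumed budget, in cents
    print(pages_within_budget(budget))                 # paying the setup fee
    print(pages_within_budget(budget, setup_cents=0))  # setup fee waived
```

      On a $200 budget that is 700 pages with the setup fee, or 800 if they waive it, so a handful of 100+ page sets rather than several dozen.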

    4. Re:Kinkos? by DJStealth · · Score: 1

      Take it to kinkos and keep the difference in $$.

    5. Re:Kinkos? by Anonymous Coward · · Score: 0

      That sounds like it fits within his budget of a couple hundred dollars!

    6. Re:Kinkos? by zenquest · · Score: 4, Informative

      Going to Kinkos? Yeah, it's a bit pricey, but not totally out of bounds.

      If he's at a large university, some other department might have one of these. Xerox doesn't charge for scans when you lease the machine. They only charge for how many prints or copies are made, so it would be essentially free for another department to allow him to use their machine. It doesn't even require any additional setup, since you can enter any email address into the machine and have it send the document there directly from the copier. (assuming the SMTP server has been set on the machine)

    7. Re:Kinkos? by glowurm · · Score: 2, Informative

      Confirmed. Your local Kinko's should have the resources to scan the pages to PDF in an automated fashion. Call around and talk to a couple of locations, if necessary, to get the proper terms and pricing, but it should run about US$0.25 per page. The Kinko's I'm on familiar terms with (intimately familiar - too much work done there) uses Canon hardware and software and can burn the resulting files on to CD for you (at a cost of US$9.95/cd, of course!)

      They can usually do up to tabloid size sheets (11"x17") through the feeder, and expect a turn-around time of 24 hours or so. They can also do OCR scans on the resulting files. The OCR conversion runs around US$9.95 for the first page, and US$2.50-ish for additional pages, and they won't do any correcting of the errors that result from software retardedness unless you pay extra. It hurts, but if you gotta have it...

      Good luck!

    8. Re:Kinkos? by Luzumsuz+Lazim · · Score: 2, Informative

      Our department has Konica 7165 copier, which has the scan-to-email capability. It can e-mail the scanned document as multi-page-pdf or tiff files, thus you don't need to convert it to pdf page by page.

      And, use a low resolution setting (say 100dpi) for handwritten documents. It will do just fine. Pdf (depending on the driver though) compresses the image. If you use a machine something like the Konica, try to set the threshold/brightness to a level such that the empty portion of the pages will appear as plain white; this will increase the compression ratio significantly.
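
      To see why forcing the background to plain white helps so much, here is a small stand-alone sketch. Everything in it is an assumption made for illustration: the "page" is synthetic (noisy light-gray paper with a few dark strokes), and zlib stands in for whatever compression a given copier's PDF output actually uses.

```python
import random
import zlib

# Illustration only: a synthetic grayscale "scan" and zlib as a stand-in
# compressor. Real copiers and PDF drivers use their own image compression.

def make_noisy_page(width=200, height=200, seed=0):
    """8-bit grayscale page: noisy light paper plus a dark stroke every 20 rows."""
    rng = random.Random(seed)
    rows = []
    for y in range(height):
        row = bytearray(rng.randint(200, 255) for _ in range(width))
        if y % 20 == 10:
            for x in range(20, width - 20):
                row[x] = rng.randint(0, 60)  # dark "ink"
        rows.append(bytes(row))
    return b"".join(rows)


def threshold(pixels, cutoff=128):
    """Map every pixel to plain white (255) or plain black (0)."""
    return bytes(255 if p >= cutoff else 0 for p in pixels)


def compressed_size(pixels):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(pixels, 9))


if __name__ == "__main__":
    page = make_noisy_page()
    print("noisy background:", compressed_size(page), "bytes")
    print("plain white:     ", compressed_size(threshold(page)), "bytes")
```

      The thresholded page compresses to a small fraction of the noisy one, because long runs of identical white pixels are exactly what a compressor handles best. The cutoff of 128 is arbitrary here; on a real machine it is the threshold/brightness knob.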

      So, my recommendation is to try to find a Kinkos which has this type of machine. If you can't find one, just tell the professors that it is simply not a reasonable task that can be done in finite time.

    9. Re:Kinkos? by Anonymous Coward · · Score: 0

      I used to work here www.fvtech.com maybe you can give them a call.

    10. Re:Kinkos? by digitalrust · · Score: 2, Informative

      At the FedEx Kinko's where I work (dig the new name), we use the Canon ImageRunner 105 and scan directly into Acrobat. It's very convenient and pretty fast. We have control over dpi of the scan, pure B/W vs. greyscale, and minimal halftone settings. There's no company-defined pricing for this; we charge $0.25 per page, with a $10 minimum. It creates huge files though, unless you reduce the dpi of the scan.

      Another option is to look in the phone book under "litigation copying" or "legal copying". Lawyers often scan thousands of legal documents and have them indexed by keyword. Data entry people get paid to skim each document to record the keywords before the documents are actually scanned. Price quotes are based on the quality of the originals (staples, torn sheets, etc.)

    11. Re:Kinkos? by ryg0r · · Score: 0
      Going to Kinkos? Yeah, it's a bit pricey, but not totally out of bounds.

      Yeah, I can't afford the outrageous prices of the Romania or Marbella either.

      C'mon, who else thought that Kinko's was an escort service? Fess up!

      --
      Karma whoring .sigs don't work
    12. Re:Kinkos? by Anonymous Coward · · Score: 0

      Two words: "Unfunny!"

  3. Knee to the grindstone... by Faust7 · · Score: 1, Funny

    Flex your fingers, crack your knuckles, and get some eyedrops... because you're going to be doing a lot of typing.

    1. Re:Knee to the grindstone... by Exocet · · Score: 4, Insightful

      "Ummm yeahhhh... if you could just do that..."

      Faust7 is right about this one. Frankly, OCR is ok, but not great - on nice text on book-or-better paper. Handwritten notes? With equations? No. Not unless your profs have some damn fine handwriting and we all know that that is absolutely not the case.

      My advice is the same as Faust7's with these additions: spend some of that money on a really nice keyboard, wrist-rest and/or maybe a nice monitor. You are going to be needing all three. If there are any left over funds, get some really nice tea. I suggest Twinings English Breakfast or Prince of Wales, if you're going to go bagged.

      --
      Exocet Industries - Taking over the world, one computer at a
    2. Re:Knee to the grindstone... by comm3c · · Score: 1

      or instead of buying a scanner, buy some people to type it with you and have fun.

    3. Re:Knee to the grindstone... by base_chakra · · Score: 1

      My advice is the same as Faust7's with these additions: spend some of that money on a really nice keyboard, wrist-rest and/or maybe a nice monitor. You are going to be needing all three. If there are any left over funds, get some really nice tea. I suggest Twinings English Breakfast or Prince of Wales, if you're going to go bagged.

      And then get some really nice bath toys and a really nice beanbag chair, and if you can, one of those little stands that holds bananas up so they stay fresh longer (a really nice one) if you're going to do the whole "banana" thing.

    4. Re:Knee to the grindstone... by LittleBigLui · · Score: 1
      I suggest Twinings English Breakfast or Prince of Wales, if you're going to go bagged.
      At least English Breakfast is also available loose-leaf. But he should spend his time typing, not fiddling with loose-leaf tea, so bagged is the way to go for our brave keyboard knight.
      --
      Free as in mason.
    5. Re:Knee to the grindstone... by Anonymous Coward · · Score: 0

      Hmm... solution we came up with (but costs $$$, but maybe you could leverage this into a campus-wide system):

      1) InputAccel server license.
      2) dual-processor Windows server, with fast processors, and a SCSI card, oh, and 1GB+ of RAM.
      3) Fujitsu makes some nice SCSI scanners w/ automatic doc feeders.

      InputAccel can manage the scan-OCR-PDF process.

      Like I said, this will cost $$$. You can easily spend $3-5000 on the scanner alone.

    6. Re:Knee to the grindstone... by xScruffx · · Score: 1

      Or he could ship the notes to me. I'll be happy to type 'em up (figures/formulae included).

      * Pops about two dozen Advil.

  4. well... by Anonymous Coward · · Score: 5, Funny

    if I ask nicely, overnight use of the secretary's Win2k box

    Plus, if you're lucky, you could also get other after-hours favors from the secretary as well ;-)

    1. Re:well... by Anonymous Coward · · Score: 1, Funny
      Plus, if you're lucky, you could also get other after-hours favors from the secretary as well

      No no no... he asked only for the secretary's Win2k box. A mistake, if you ask me.

    2. Re:well... by PsiPsiStar · · Score: 2, Funny

      Maybe. But I doubt she has a scanner too.

      --

      ___
      It's the end of my comment as I know it and I feel fine.
    3. Re:well... by Anonymous Coward · · Score: 2, Funny

      I thought you were going to get a Funny Mod from me until you failed to reference the word box.

    4. Re:well... by Anonymous Coward · · Score: 0

      > Plus, if you're lucky, you could also get other after-hours favors from the secretary as well ;-)

      Secretary of my last department was a guy in his mid 30's, pushing 300 lbs. Smart funny guy tho. Nice to know the 50's are still with us, eh?

    5. Re:well... by Anonymous Coward · · Score: 0

      if I ask nicely, overnight use of the secretary's Win2k box Plus, if you're lucky, you could also get other after-hours favors from the secretary as well ;-)

      It's a Win2K box. Too many viruses, especially if he has to use the back door to access it and doesn't use good protection.

  5. High Speed Scanner by Anonymous Coward · · Score: 2, Informative

    You need a high speed scanner. Fujitsu makes a nice one that works pretty well.

    1. Re:High Speed Scanner by Judg3 · · Score: 1

      Indeed, the AC is right - Fujitsu (and a few others) is the way to go. Back when I used to work for a stock brokerage, all of the overwhelming amount of paper that customers had to fill out would be scanned in with a few high-speed Fujitsus into Hyland's OnBase document management system.

      Sadly, this approach is way out of league for the small budget the poster has.
      I'd have to wonder if a consumer scanner, even a nice one like that HP, can keep up with the constant use required of it.
      Much like laser printers, the Fujitsu scanners have complete rebuild kits that you can use to bring them back to like-new state, which I don't think the consumer-grade HP scanners would have. But then again, if you get a good year or so out of a $300 scanner before needing a new one, that's a lot better than buying a high speed scanner (they easily run $3000 used).

      --
      Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
    2. Re:High Speed Scanner by Nogami_Saeko · · Score: 2, Informative

      Do NOT get that HP scanner. I have the same model, and while the hardware is just fine, HP's scanning software is garbage.

      I run PaperPort to store all of my bills, documents, etc. The HP scanner software simply will not use the resolutions and options I want PaperPort to use (200 DPI, B&W).

      When using the sheetfeeder, the damn thing always scans in 24bit at 200DPI no matter what I try and set as a default - then I have to manually convert every page.

      Go with a different model.

      N.

      --
      "Nothing strengthens authority so much as silence." - Charles de Gaulle
    3. Re:High Speed Scanner by Anonymous Coward · · Score: 0

      HP software for consumer grade hardware is garbage in general. I have installed many HP printers and while the hardware is great, the software is nothing but trouble. Fortunately, you usually don't have to use HP software if the driver is built into the operating system.

    4. Re:High Speed Scanner by arete · · Score: 1

      I've found the HP _drivers_ to be very good, the HP user software to be shitty.

      However, if you were running Linux, ImageMagick's "convert" would do a good job of automating that process for ya. I think Photoshop has a similar level of possible automation, but I haven't tried it.
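      A rough sketch of what that automation might look like, assuming ImageMagick is installed and the scanner software dumps one 24-bit PNG per page into a folder. The function names, folder layout, and the 50% threshold are made up for illustration; the threshold in particular would need tuning for faint pencil. The code only builds the command lines, so nothing runs until you hand them to subprocess.

```python
from pathlib import Path


def build_convert_command(src, dst_dir):
    """Build an ImageMagick `convert` command that turns one color scan
    into a bilevel (black-and-white) Group 4 compressed TIFF."""
    src = Path(src)
    dst = Path(dst_dir) / (src.stem + ".tif")
    return [
        "convert", str(src),
        "-colorspace", "Gray",   # drop the 24-bit color first
        "-threshold", "50%",     # assumed cutoff; tune for faint handwriting
        "-density", "200",       # record the 200 DPI scan resolution
        "-compress", "Group4",   # CCITT Group 4 fax compression
        str(dst),
    ]


def build_batch(scan_dir, dst_dir, pattern="*.png"):
    """Build one command per scanned page, in filename order."""
    pages = sorted(Path(scan_dir).glob(pattern))
    return [build_convert_command(page, dst_dir) for page in pages]


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for cmd in build_batch("scans", "bw"):
        print(" ".join(cmd))
```

      To actually run the batch, pass each list to subprocess.run(cmd, check=True) once the output folder exists.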

      --
      Looking for freelance Actionscript (Flash/Flex) or ColdFusion work and/or freelance developers. Email me, put Slashdot
  6. Simple. by jebell · · Score: 5, Funny

    Outsource the job to India.

    --
    This is my sig. There are many like it but this one is mine.
    1. Re:Simple. by GothChip · · Score: 4, Insightful

      I know the parent post was funny but he's thinking along the right lines.

      Take the few hundred you have to spend on equipment and spend it hiring a few temps.

      A good typist should be able to type up hand written notes faster than scanning them all in and manually fixing all the mistakes.

    2. Re:Simple. by pendragn · · Score: 2, Insightful

      Outsource the job to India.

      Not as bad an idea as it sounds. My advice is to not waste the department's money, and your time, buying, installing, and using a sheet feed scanner. Somebody in your local area assuredly has one already that they either rent out to people in your situation, or that they use to do the work you need done.

      Use the funds that the department gave you to have your local copy shop do the work. They will almost certainly do it faster than you could, and the end product will most certainly be better than what you could provide. This is the kind of thing that the people who work at copy shops do for a living.

      Also PDF is a great format for this, highly portable, and so far fairly version proof. You don't have to worry about the PDF being obsolete before the professor decides to change the structure of his class.

    3. Re:Simple. by Anonymous Coward · · Score: 1, Interesting

      The Internet Archive did or plans something like this for their scanned book project. The cost of sending scanners to (India | China | Malaysia) and paying a few cents per page in labor is less than doing the same job here.

    4. Re:Simple. by modge · · Score: 1

      That would be what I was thinking. Get the students to do it for a pittance (I'm a student, so I know how easy it is to make mugs of students). Still convert the text to PDF in the end, of course.

      --
      I am a sig
    5. Re:Simple. by Anonymous Coward · · Score: 0
      A good typist should be able to type up hand written notes faster than scanning them all in and manually fixing all the mistakes.


      Erm, if anyone knows a few skilled typists who are also proficient in maths, physics or any other faculty research subject, just contact the topic poster :)
    6. Re:Simple. by Anonymous Coward · · Score: 0

      Actually, not a joke - a good solution.
      The price will beat a Xerox WorkCentre any day, iff you have a one-off job to do on it.

      If however, there will be a continual stream of workload then get the Xerox.

    7. Re:Simple. by Anonymous Coward · · Score: 0

      I'm your guy! I type 88 corrected words per minute, and I'm a whiz with Word's Equation Editor.

      Seriously I used to type up course notes from Word to PDF for an EE professor, including circuit diagrams in Visio.

      I thought I was doing a good job until the students started laughing at the Greek letters I was getting wrong!

    8. Re:Simple. by dasmegabyte · · Score: 1

      The company I used to work for processed newspaper ads into html, thus repurposing static content without the graphic ad designers having to learn anything new or do any additional work, which they weren't willing to do. Most of our clients saw the internet as a necessary evil that was probably robbing them of subscribers, but that they had to have some small piece of, so it wouldn't have been worth more than a few measly dollars.

      We converted these by sending the scanned tif and gif files to a US company which outsourced the creation of the new graphics to India and/or Pakistan (it used employees in both to hedge bets against political instability). Cost to us was something like a dollar or so for straight cut and paste graphics conversions, and $5 for more "artistic" work, adding full colour clip art and the like. We estimated it took them about two hours for this...which means the outsourcing company took in about $2.50 per hour. No word on what the "artists" received.

      Incidentally, at one point they replaced this expensive, manual process for most ads with a simple resizing program. It wasn't searchable, but nobody cared too much...and it saved so much money.

      --
      Hey freaks: now you're ju
    9. Re:Simple. by arantius · · Score: 1

      As pointed out, there's lots of complicated maths and so forth. Ignoring that ... We figure some guesses ... Several dozen papers of 100 ish avg pages each makes let's say 4000 pages. A few hundred dollars may be $500. So, we have $0.125 per page.

      In other words the typists have to handle 41.2 pages per hour to manage earning minimum wage. Sure.
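      Working the numbers through as a quick sanity check: $500 over 4000 pages is $0.125 per page. The $5.15 hourly wage below is an assumption (the US federal minimum at the time), inferred from the 41.2 figure rather than stated in the post.

```python
def required_pages_per_hour(budget, pages, hourly_wage):
    """Return (dollars per page, pages per hour needed to earn hourly_wage)."""
    rate_per_page = budget / pages
    return rate_per_page, hourly_wage / rate_per_page


# $500 budget and 4000 pages are the guesses from the post;
# $5.15/hour is an assumed minimum wage.
rate, pages_per_hour = required_pages_per_hour(500.0, 4000, 5.15)
print(f"${rate:.3f} per page, {pages_per_hour:.1f} pages/hour")
```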

      --
      Health is simply dying at the slowest rate possible.
    10. Re:Simple. by syukton · · Score: 1

      ever tried to type a diagram? I doubt most typists are up to par wrt ASCII art. Or for that matter, ever tried to type up a mathematical formula?

      --
      Reinvent the wheel only at either a lower cost, greater effectiveness, or your own personal enrichment and satisfaction.
  7. HP Copiers by kevinank · · Score: 2, Informative

    The large multi-function HP Printer/Copiers will scan and e-mail a PDF of an entire stack of papers just as you would use a normal copier. I'm sure that the other manufacturers have similar features, but it is the HP equipment that we use at work.

    --
    LibBT: BitTorrent for C - small - fast - clean (Now Versio
    1. Re:HP Copiers by XaXXon · · Score: 2, Insightful

      Will you please tell both of us where we can get one for a few hundred dollars, as specified in the question?

      I think the real answer is that this guy is S.O.L. .. he's just going to have to spend some good quality time getting to know a consumer-level scanner, and let the professor know to do his notes in software initially.

    2. Re:HP Copiers by plankers · · Score: 3, Informative

      The Konica ones where I work do a similar thing -- they can email you a TIFF or a PDF of a huge stack of paper. Ours are only black & white, and will only do a fixed resolution, but a newer color copier would fix all those shortcomings. Many universities and colleges have print centers that have this type of equipment if your department doesn't.

      Worst case, you can get an HP scanner and the automatic document feeder for it. If this is going to happen a lot it should be pretty easy to justify the $500 or so for the scanner, ADF, and a copy of Acrobat.

    3. Re:HP Copiers by plankers · · Score: 1

      You might not be able to get a copier that does this for a couple hundred bucks, but if a place on campus has a copier you can use, either for free or cheap (since scanning doesn't use toner or paper, after all), you win.

    4. Re:HP Copiers by kevinank · · Score: 2, Informative

      The big copiers run a couple of thousand dollars, but the multi-function fax/scanner/printers from HP are in the approximate price range and are all able to scan stacks of paper rather than individual sheets. The easiest way to get one of the large printers for less than a few hundred dollars is to start calling alumni who work for HP and ask them if they'll make an equipment donation.

      --
      LibBT: BitTorrent for C - small - fast - clean (Now Versio
    5. Re:HP Copiers by Provocateur · · Score: 1

      it should be pretty easy to justify the $500 or so for the scanner, ADF, and a copy of Acrobat.

      you left out the secretary...

      --
      WARNING: Smartphones have side effects--most of them undocumented.
    6. Re:HP Copiers by Zak3056 · · Score: 1

      Will you please tell both of us where we can get one for a few hundred dollars, as specified in the question?

      How about this?

      Scanner/Copier/Laser Printer with a 50 page document feeder for $400. And you can get an inkjet model for $150.

      The really important thing for him to have here, I think, is the document feeder, based on his complaint of having to spend 1-2 minutes per page. Anything else (say, converting tiff to pdf and compressing the whole thing) can be accomplished via software.

      --
      What part of "shall not be infringed" is so hard to understand?
    7. Re:HP Copiers by mangu · · Score: 1
      you can get an HP scanner and the automatic document feeder for it


      Some guys in my office did that. Bad idea. The HP automatic feeder is a POS and the software that comes with the scanner is worse. No more than one page per minute, lots of jamming, lots of BSOD's. No Linux driver. Entirely GUI, no way to automate anything, you have to click your way through every load, which is about ten pages max. If the pages are old, ie dog-eared, or paper that's somewhat thinner or thicker than usual, then it's back to manual feed.


      No, in my experience I would never recommend an HP scanner to anyone.

  8. I'd go for the by Anonymous Coward · · Score: 0, Funny

    overnight use of the secretary's box ...

  9. HP Digital Sender by Guanix · · Score: 4, Informative

    The HP Digital Sender series are really great for this stuff. You feed it a stack of paper and it scans it, 15 pages per minute, and can store the PDF on a file server or you can send an email with the PDF attached directly from the network sender! It's a bit expensive, but try to look around for one, maybe the local copyshop?

    Guan

    1. Re:HP Digital Sender by HBI · · Score: 1

      There used to be a smaller model for about $1500, the 8100C. We have one of these and it's quite useful.

      Not as fast as they claim though. Take the speed with a grain of salt, assume half.

      --
      HBI's Law: Frequency of calling others Nazis is directly correlated with the likelihood of the accuser being Communist.
    2. Re:HP Digital Sender by W2k · · Score: 3, Informative

      Great product. Unfortunately, its price is listed at about 10x the "few hundred dollars" the original submitter specified in his posting.

      I've found the Canon CanoScan flatbeds do a good job of automatically scanning straight to PDF; only minimal user intervention (hit "enter") is required. There's a special mode for scanning text which enhances contrast, so messy notes and diagrams should be fine, too. The resulting PDFs are also remarkably small in size for what is essentially a huge bitmap. I've a Canon CanoScan 8000F myself; it's very fast and can do higher DPIs than most people need, and although it might be a bit out of his price range, I'm sure the cheaper models can do the same job nearly as well.

      --
      Quality, performance, value; you get only two, and you don't always get to pick.
    3. Re:HP Digital Sender by Florian+Weimer · · Score: 1

      Most digital copiers can do similar things nowadays. Typically, you rent such machines, and it's not too expensive in this case, especially if the device is also used as a true copier.

    4. Re:HP Digital Sender by Necr0maN · · Score: 1

      He could always propose the fact that the scanner would actually do everything described in his job, and that way they could fire him, fork out a little more cash for the big ass scanner, and save the money they'd have to spend on him.

    5. Re:HP Digital Sender by Anonymous Coward · · Score: 0

      Get an HP LaserJet 3100 multifunction printer/scanner from eBay and you are all set. Caution, this printer only works with Windows. There is no Linux support (read: it's considered a boat anchor).

      The printer can be found on eBay for around a hundred dollars.

    6. Re:HP Digital Sender by Monkelectric · · Score: 1
      Unfortunately, its price is listed at about 10x the "few hundred dollars" the original submitter specified in his posting.

      Well, the submitter's boss is an idiot. Just because you have $200 to spend on a project doesn't mean the project can be done for $200. Undoubtedly what's going on is that someone has made it his responsibility to do this project without any real concern for the amount of effort. This is pretty standard in just about any business/university :(. I worked for a university and every day the question was "how can we stretch our resources further?". My servers operated continually under a load AVERAGE of 4-5.

      --

      Religion is a gateway psychosis. -- Dave Foley

    7. Re:HP Digital Sender by Zak3056 · · Score: 3, Informative

      The HP Digital Sender series are really great for this stuff. You feed it a stack of paper and it scans it, 15 pages per minute, and can store the PDF on a file server or you can send an email with the PDF attached directly from the network sender!

      I have one of these on my office network, and I agree that they're pretty good machines--though I have some complaints about them.

      First off, I don't believe their functionality justifies the $3100 price tag. While the feature set is good, for that kind of money, this thing should be able to OCR, and not have to rely on 3rd party software for that functionality.

      Secondly, their "scan to file server" feature requires a server side daemon to run--you can't simply drop the document to an SMB or NFS server. Further, the daemon only runs on WinNT/2k/XP systems, and you need to do a little bit of hacking to get it to run as a service, instead of opening it manually (or via startup folder) on login.

      Third, it can be DOG SLOW. In particular, when scanning multiple large jobs (particularly at higher resolution) the thing will bog down. It also can only handle a fairly small number of jobs in queue at any one time. One of our secretaries can fill its queue in short order, and have to wait about ten minutes before she can scan the next document packet. When she's trying to scan a hundred packets, this essentially becomes her main focus for a work day.

      All in all, our Toshiba copiers seem to do the same job better--of course, they have their own problems (i.e. over $20k each, with a poor user interface, and they don't do color, and don't OCR either.)

      --
      What part of "shall not be infringed" is so hard to understand?
    8. Re:HP Digital Sender by Anonymous Coward · · Score: 0

      > Great product. Unfortunately, its price is listed at
      > about 10x the "few hundred dollars" the original
      > submitter specified in his posting

      Well, that depends on how you read his question. One can read it as "I have a couple of hundred dollars to spend, and thought the thing to do with them is to buy a scanner". If that is true, "don't buy a scanner, buy copier time at Kinkos" is a good answer.

    9. Re:HP Digital Sender by Anonymous Coward · · Score: 0

      Don't ever buy an HP scanner if you want to use it with your Mac. Their Mac drivers and support range from appalling to non-existent, and to add insult to injury, they claim (on the box, in their specs) that it will work. It doesn't. Save yourself the money, and more importantly the frustration, and go for another brand (Canon, Epson, whatever). Check out the comments on www.versiontracker.com to find out the quality of the drivers and support (check out HP ScanJet there and you will be dished some really strong language...).

    10. Re:HP Digital Sender by glen604 · · Score: 1

      First off, I don't believe their functionality justifies the $3100 price tag. While the feature set is good, for that kind of money, this thing should be able to OCR, and not have to rely on 3rd party software for that functionality.

      We used to have one of these where I worked as well. The big problem I had with it was that when it broke (mechanical failure), the model we had was obsoleted, and HP's advice was "buy a new one"! Kind of irritating when it costs over 3 grand to begin with (and it was only about 2 years old), I couldn't find anyone to service it, and the warranty is only one year.

      We decided that it wasn't worth spending 3 grand to get another one, with a guaranteed life of only one year, so I put together a solution using an old spare cheap pII we had lying around and a scanner with an ADF- much easier to fix/replace if necessary

    11. Re:HP Digital Sender by Adrick42 · · Score: 1

      HP had this to say about your link. "The page you requested can not be found."

  10. Format by bobthemuse · · Score: 2, Interesting

    While PDFs are pretty well supported, you'll still be storing it as raster data, so there won't be any size decrease over using an image format, such as PNG.

    Are there any web-based packages for searching documents, based on OCR-extracted keywords? Obviously with messy hand-written notes, formulas, etc, OCR won't work reliably. For a similar project, I'd like to OCR the files and use the text data solely for keyword searching. Obviously not perfect, but better than just images.

    PNG is your friend....

    1. Re:Format by Chuckaluphagus · · Score: 5, Informative

      I have to scan and store very high-res black-and-white images for work, and I've found that the best format to save in is TIF with a CCITT Fax 4 compression. It will only work for black-and-white files, but for a full page of text and graphics scanned at 2-color, 600 dpi, you can get a file about 100 kbyte. The image quality is superb, and it's far, far more efficient than PDF.

      The program I use to convert to TIF is IrfanView (http://www.irfanview.com/), a generally excellent image viewer. It's free, too, so no worries there. It offers a ton of options for compression settings for different formats, so you can try other file formats as needed.
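      To put the 100 kbyte figure in perspective, here is a back-of-the-envelope estimate of what a page weighs before compression. US Letter paper (8.5 x 11 inches) is an assumption, and the function name is made up for illustration. The 150 dpi grayscale line is only there as a comparison point.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page (ignores row padding)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed US Letter page; 1 bit per pixel is the 2-color scan from the post.
bilevel_600 = raw_scan_bytes(8.5, 11, 600, 1)
gray_150 = raw_scan_bytes(8.5, 11, 150, 8)

print(f"600 dpi bilevel, raw:   {bilevel_600:,} bytes")
print(f"150 dpi grayscale, raw: {gray_150:,} bytes")
print(f"Ratio if the 600 dpi page compresses to 100,000 bytes: "
      f"{bilevel_600 / 100_000:.0f}:1")
```

      On those assumptions the raw 600 dpi bilevel page is about 4.2 MB, so a 100 kbyte file means roughly 40:1 compression.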

    2. Re:Format by Waffle+Iron · · Score: 1
      PNG is your friend....

      PNG works well for scanning random documents. To my surprise, however, I've found that JPEG can work even better.

      I played around for a while to find a format to replace the TIFFs that scanner manufacturers seem to think you want to use for documents. For my purposes, TIFF was horrible because its two-tone image detection can totally lose details that aren't high-contrast, like all the handwriting on a carbon-copy form. The required dpi resolution also makes the images way too large for viewing on a monitor without serious downscaling.

      It turned out that scanning grayscale JPEGs at 150 dpi always gave me a readable copy no matter how dim the original was, and it usually came out with significantly smaller files than formats like PNG or GIF. (Probably because JPEG naturally filters out high-frequency sampling noise.) Even with this relatively low sample resolution, very fine print is still readable because the grayscale gives it some "antialiasing". The JPEG images do have some visible compression artifacts, but I don't care about that for the purposes of simply archiving documents in a readable format while taking up as little disk space as possible.

    3. Re:Format by Anonymous Coward · · Score: 0

      We faced a similar problem, and found that scanning at a very high resolution, running a despeckle filter, downsampling to say 100dpi, crushing the palette to 8 colors, then using pnmtops with the rle option made small and very readable files, even when the text was dim in some places (ie, a typed page with pencil-written notes on it)

    4. Re:Format by alannon · · Score: 2, Informative

      Storing what you describe as a PDF should be almost the same size as the TIFF you describe, except for the small overhead of the PDF wrapper. PDFs support CCITT Fax 3 & 4, as well as ZIP & run-length compression on monochrome images.

      I run a micro-publishing business which often involves scanning a lot of B&W images at high resolution. I'll agree that storing files as TIFFs makes them much easier to edit, though. Our final publishing happens as PDFs, though, and it does not bloat the size of the images.

    5. Re:Format by Naffer · · Score: 1

      Maybe it's just me, but I thought that JPEG wasn't very good at high contrast areas (black ink on white paper) and was better suited to things like photographs.

    6. Re:Format by klossner · · Score: 1

      The PDF format can handle CCITT compression in the same amount of space as a TIFF file. Finding a PDF writer application to use this format may be a trick, but the format itself is not at fault.

    7. Re:Format by Waffle+Iron · · Score: 1

      That's what I initially thought too, and it's probably true for most situations where the JPEG compression artifacts on high-contrast transitions aren't acceptable. However, if you're willing to put up with the artifacts, it turned out for me that grayscale JPEG can give good compression, retention of dim details and text readability for archiving documents with widely different contrasts and brightnesses.

    8. Re:Format by CoolGuySteve · · Score: 1

      As a student, I have to say that PDFs are the most annoying format ever. Especially when it's mostly a plain text typed document that could have been stored in RTF or a scan of someone's handwriting that could have been stored as an image format.

      PDF requires you to load Adobe's bloated and crappy software every time you want to go and check something on the course website. It also forces you to spend time downloading and installing Adobe's crappy and bloated software.

      The only documents for which PDFs are acceptable are ones with elaborate formatting such as brochures or papers done in LaTeX.

    9. Re:Format by Anonymous Coward · · Score: 0

      I agree that IrfanView is a really good image viewer and converter. It is really a shame that it is not available on Linux. I've been sort of hoping that some bright Linux hacker might some day notice IrfanView and make a clone of it for Linux. :)

    10. Re:Format by Compuser · · Score: 1

      I thought Irfanview was initially an offshoot of GQView.

    11. Re:Format by Anonymous Coward · · Score: 0
      We ... found that scanning at a very high resolution, running a despeckle filter, downsampling to say 100dpi, crushing the palette to 8 colors, then using pnmtops with the rle option made small and very readable files

      Yeah, but do you really want to spend 5 minutes per page when you have 100+ pages to do?

  11. Ounce of prevention... by drsmack1 · · Score: 0

    This whole problem could be eliminated if these papers were put into PDF as soon as they are created. That said; I would explore solutions from the legal profession - they have a lot of things that do this.

  12. If you're being 'asked' by Space+cowboy · · Score: 4, Insightful

    Just say 'No'. (If you're being told, it's a different matter, of course).

    It sounds to me like a damned hard job to automate (which is the only way it's not going to be a constant drain on your time), and you're being given next-to-no resources to even come up with a creative solution. Sometimes the best answer is in fact 'No' - it forces people to re-evaluate what they're asking. It comes with the danger of being sacked if it's you that's being unreasonable, of course....

    Simon.

    --
    Physicists get Hadrons!
    1. Re:If you're being 'asked' by malia8888 · · Score: 4, Insightful
      I really agree with Space cowboy. My former husband was a college professor. He was very brilliant in his field, but anything outside of his narrow realm daunted him. He wanted to put pennies in our fusebox when the lights went out. He stared at a breaker box in the condo like it was the control panel of an alien spacecraft.

      Explain the enormity of this scratched-note-to-finished-PDF job to this educator. Use crayons, mirrors, yarn and tape if necessary to get your point across. Just be diplomatic :P

      --
      Harpo Tunnel Syndrome--my wrist feels funny.
    2. Re:If you're being 'asked' by n1ywb · · Score: 1

      I have to agree, this project sounds like a waste of time and resources. The only way to do a NICE job of this would be to type up all the text and create neat digital versions of all the diagrams, and that's more than a one-person job. Sounds to me like something the profs should be doing their damn lazy selves.

      --
      -73, de n1ywb
      www.n1ywb.com
    3. Re:If you're being 'asked' by Anonymous Coward · · Score: 1, Insightful

      Saying no is a good option; then follow it up by telling the teachers that if they want copies of their in-class notes, THEY are going to have to change their habits as well. So a better answer than no would be: NO, but if you can use a laptop to take notes (and join the 21st century with the rest of us), I can easily make copies of those for you.

    4. Re:If you're being 'asked' by rmarll · · Score: 1

      This does not seem like that big of a deal to me. Scan the papers. Once you know what you're doing it shouldn't take more than a few seconds to scan a page. I've done this on cheap ($100) consumer hardware, it's not bad.

      No matter the format, a 100+ page PDF file of pictures of text is going to be bigger than a text document. Get over it.

      What kind of University of Hick doesn't have a scanner anyway.

    5. Re:If you're being 'asked' by Anonymous Coward · · Score: 1, Interesting

      Most of my profs would just scan in the handwritten notes and put them on the net. They were absolutely astonished when one prof showed off his multivariable calculus notes that looked readable, and I think more of them will be doing this. They don't seem to mind the fact it's in handwriting in a huge JPG; in fact they love it. I don't think students mind either, since they have to read the handwriting during lectures anyway, so having to read the notes won't be too bad.

    6. Re:If you're being 'asked' by miffo.swe · · Score: 2, Insightful

      I agree totally. Some people tend to look at an admin as someone who does magic. They don't understand that some things either cost money or take time. Perhaps it would be better to give the people writing these things a laptop in the first place. It sounds like a great waste of time to duplicate the work when it should have been given to the admin in digital format in the first place.

      --
      HTTP/1.1 400
    7. Re:If you're being 'asked' by Anonymous Coward · · Score: 0

      You seem like someone who will lose their job to an Indian soon. Nice attitude.

    8. Re:If you're being 'asked' by curator_thew · · Score: 1

      > I agree totally. Some people tend to look at an admin as someone who does magic. They dont understand that some things either costs money or takes time.

      Garbage. This is why you take their requirements, produce an estimate of what it will take for the work to be done, get their buy-in (or not, they may cancel the project) and do it.

      Like any professional, you need to negotiate with them so they understand the problems and what it takes from your side of the fence.

      Then what the poster (or his faculty) should do is get the professors on a track of putting the work in typed up format, and so on.

    9. Re:If you're being 'asked' by Anonymous Coward · · Score: 0

      I agree wholeheartedly. I think you should offer at least these three proposals: 1) scan notes directly to PDF 2) re-type notes, and convert to PDF 3) do step 1, then evaluate step 2. You can price each of these options, and help them to understand that this is not a simple, one-person job. They will either point out alternatives, or be forced to accept one of your proposals, or cancel the project entirely. In any case, you aren't stuck doing an impossible job!

    10. Re:If you're being 'asked' by Anonymous Coward · · Score: 0
      Explain the enormity of this scratched note-to-finished Pdf to this educator. Use crayons, mirrors, yarn and tape if necessary to get your point across. Just be diplomatic

      Don't forget "and find another job first." The prof will easily see this as a failing of YOU and not of technology, and you'll be out on your ass.

    11. Re:If you're being 'asked' by Dun+Malg · · Score: 1
      What kind of University of Hick doesn't have a scanner anyway.

      Universities aren't monolithic entities with a giant pool of common resources. For example, if you work in the lit department you can't just wander over to the chemistry department and borrow a spectrographic analyzer. Universities are a collection of small fiefdoms where resources are guarded jealously. Every department stencils, stamps, or engraves its name all over its equipment not so much to prevent theft by outsiders (who can remove such markings at their leisure once they get it off campus), but to prevent "permanent borrowing" by other departments. There are constant battles over office space, "turf wars" over grants, and even attempts to mount "hostile takeovers" of portions of other departments. It's just not one big happy university family.

      So basically, only a "University of Hick" would be small enough to "have a scanner". There are likely dozens of scanners at the university in question, but the department he works for is too small to have needed one yet.

      --
      If a job's not worth doing, it's not worth doing right.
    12. Re:If you're being 'asked' by dasmegabyte · · Score: 1

      Unlike everybody else here, I disagree.

      For one thing, it seems this guy is a student worker of some kind...either a computer assistant, a work study kid, or some other sort of paying-for-my-education-with-work kind of employee. Which means if he says no, there's a good chance that he'll lose his workstudy job, and thus be unable to pay his tuition. At the worst case, there's a chance you'll piss off your professor...and first hand I can tell you that having professors who like you is more important than good grades, honors projects or stipends. Being Professor X's golden boy could mean glowing recommendations, exciting research projects, and in some cases scholarship awards (my wife got a few hundred dollars from one of these, recommended by a professor who loved her to death).

      For another, it's not like a professor asking a student worker to digitize his notes is anything new. A lot of my friends have been asked by their professors to type up notes in the worst henscratch imaginable, to make transparencies, to proofread papers and so on. I even had a professor ask me to write a note for each student in his three classes giving permission to photocopy one of his "famous essays" on logic design, which he then signed.

      If the problem is just volume and feasibility of getting 100 handwritten pages scanned in, you should say so. It shouldn't be, though. From the "minute per page thing," I assume you're using the scanner incorrectly...try turning off color and reducing the resolution for a massive speed boost. 150 dpi grayscale should be plenty for pencil; it'll keep file sizes down and keep your machine from chugging away when compressing/OCRing things. Speaking of which, do these documents really need to be digitized to be searchable, or merely digitized so the prof can put them on his website for download? It's probably only the latter...but if it is the former, I suggest brushing up on your typing skills. You've just become a stenographer.
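
      To put rough numbers on the resolution and color advice, here is a minimal sketch of the uncompressed size of one scanned page. It assumes US Letter paper (8.5 x 11 in) and no compression, neither of which comes from the thread; `raw_page_bytes` is a made-up helper name.

```python
def raw_page_bytes(dpi, bits_per_pixel, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one scanned page.

    Assumption: US Letter paper unless other dimensions are passed in.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


gray_150 = raw_page_bytes(150, 8)     # 150 dpi, 8-bit grayscale
color_300 = raw_page_bytes(300, 24)   # 300 dpi, 24-bit color

print(f"150 dpi grayscale: {gray_150 / 1e6:.1f} MB per page")
print(f"300 dpi color:     {color_300 / 1e6:.1f} MB per page")
print(f"ratio:             {color_300 / gray_150:.0f}x")
```

      Under those assumptions the color scan carries twelve times as much raw data per page, which is roughly where both the slow scans and the huge files come from.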

      --
      Hey freaks: now you're ju
  13. The most important thing by Timesprout · · Score: 5, Funny

    Is to first make an exact copy (by hand) of all the existing documents. It's vital to have a full backup so that, in case anything goes wrong with the scanning process, you can always restore the manila folders to their original filled state.

    --
    Do not try to read the dupe, thats impossible. Instead, only try to realize the truth
    What truth?
    There is no dupe
  14. HP Digital Sender by wik · · Score: 1

    See if the department can afford an HP Digital Sender. While they're quite pricy, they'll feed, scan, and email you a PDF.

    http://h10010.www1.hp.com/wwpc/us/en/sm/WF05a/15179-64175-64404-12126-64404-25324.html

    --
    / \
    \ / ASCII ribbon campaign for peace
    x
    / \
  15. ADF Scanners by Loiosh-de-Taltos · · Score: 5, Informative

    What I suggest and use is the HP 4C scanner. It's a SCSI-II only scanner that can be found on Ebay for under $10 usually. They also have an automatic document feeder option that can be found on Ebay. This scanner was originally designed for both Windows and Apple compatibility as well. It cannot handle 2-sided sheets.

    The scanner has four different pieces of software you can choose to use, I'd suggest Precision Scan Pro as that makes multi-document scanning easier.

    1. Re:ADF Scanners by LightForce3 · · Score: 2, Informative

      I agree, an ADF scanner is definitely the way to go on your budget. However, I'd recommend purchasing a new one instead of buying used, especially since you'll be doing high-volume work. I'd also be wary of HP scanners, as I've had bad experiences with their PrecisionScan Pro software, and have been told that in general HP software is sub-par.

    2. Re:ADF Scanners by silverhalide · · Score: 2, Informative

      I have used this setup to scan in tens of thousands of pages. All you need is Adobe Acrobat 4 or 5 (full version) and the Deskscan driver. Drop a stack in, click scan, and walk off and go do something. Come back in 5 minutes, put the pile back in to scan the back sides, click continue, you're done. Acrobat automatically interweaves the fronts and backs for you so the no-duplex thing is a non-issue. Ideal speed/quality settings: 300 dpi black and white threshold scanning. Tweak the threshold for the first page, should be good for the rest of them... Resulting files are 30-40k a page and look great when reproduced on a laser printer.
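
      For anyone whose software does not do the interleaving, the reordering itself is simple. This is only a sketch under one assumption: that flipping the whole stack over means the back sides get scanned last sheet first. `interleave_duplex` is a hypothetical helper, not anything Acrobat exposes.

```python
def interleave_duplex(fronts, backs, backs_reversed=True):
    """Merge two single-sided passes into reading order.

    fronts: front sides in scan order (page 1, 3, 5, ...)
    backs:  back sides in scan order; assumed reversed because the
            stack was flipped over before the second pass.
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must have the same page count")
    if backs_reversed:
        backs = list(reversed(backs))
    merged = []
    for front, back in zip(fronts, backs):
        merged.append(front)
        merged.append(back)
    return merged


fronts = ["p1", "p3", "p5"]
backs = ["p6", "p4", "p2"]   # flipped stack: last sheet's back comes first
print(interleave_duplex(fronts, backs))
```

      If your feeder happens to keep the back sides in forward order, pass `backs_reversed=False` instead.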

    3. Re:ADF Scanners by detritus. · · Score: 2, Informative

      I definitely second this recommendation. IMO, one of the best scanners ever made. I have a newer usb HP scanner that doesn't even come close to the speed of this thing. They just don't make bulky, well built quality scanners like the 4C anymore.

      And for the record, you aren't limited to only 4 software applications for scanning (at least in Windows, any application will work if it uses TWAIN). Perhaps you were referring to the document feeder having limited software compatibility?

      (Off topic, but amusing nonetheless if you didn't know, there's an easter egg that's quite humorous..)

  16. LaTeX is your friend. by Anonymous Coward · · Score: 0

    Just hand back half of the stack, then do the half you kept up in latex.

  17. Outsource to India by Animats · · Score: 0, Troll

    This is a job made for outsourcing to India.

    1. Re:Outsource to India by dougmc · · Score: 1
      This is a job made for outsourcing to India.
      The parent post may be redundant (as it's already been suggested), but it's not a troll. It's a reasonable solution to this problem.

      This is going to be a job that uses lots of time, but may not require much training to perform. Perfect for outsourcing, be it to the secretary, Kinkos or India.

    2. Re:Outsource to India by Syberghost · · Score: 1

      You should have known you'd be marked Troll on that. This is Slashdot; it's only appropriate for oil production jobs to be outsourced.

  18. Change your major? by Anonymous Coward · · Score: 0

    Change your major?

    Hey, it's a thought.

  19. Long time scanning per page by dicepackage · · Score: 1

    I had to do something similar with about a thousand or so pages, except they were all separate files. I would concentrate on doing everything one step at a time. What I mean by that is scan all the pages into your computer and then begin making them into PDF files or whatever format you prefer. On my scanner it took about a minute per page, so my main problem was just not having anything to do during the time while it was scanning. Don't worry about this; use this time to do something else, such as reading a book, or have another computer next to you to surf the web or play games on.

  20. vi, LaTeX, 10 coffeepots and reduced sleep by Anonymous Coward · · Score: 0

    I am digitalizing my lecture notes with LaTeX. Takes some time, but results in perfect output quality and small file sizes. Needless to say I am not using any wimpy wysiwyg-stuff to produce the graphics, that's what the picture-environment was made for.

  21. HP Scanjet 5550c is not what you want by GraZZ · · Score: 4, Informative

    Definitely keep clear of the Scanjet 5550c; there's a reason why it's the cheapest feed scanner out there. It will frequently jam if you a) load more than 5 sheets into the feeder or b) use any sort of paper that has been handled by human beings.

    Our Engineering Society was trying to put up an exam archive with one of them and quickly gave up and started scanning with the flatbed.

    Also the scanner has no sane support (one of the few HP scanners that doesn't)

    1. Re:HP Scanjet 5550c is not what you want by unholysheep · · Score: 1
      I'll second that suggestion. I used a Scanjet 4xxx series w/ an ADF at work, and it was super. I assumed the 5550C would only be better. WRONG!

      The jams with the ADF make it worthless, agreed. But the thing that makes even the flatbed nearly useless as well is the drivers. They are absolutely the worst pieces of software I have ever seen. The v2 drivers hose up the system (pc+scanner) so badly that they both have to be reset, and the v1.2 drivers force you to 'preview' everything once before you can set your scan options---this includes ADF use! What was HP thinking?!?

      I wish I had never purchased this hunk of junk. ;(

    2. Re:HP Scanjet 5550c is not what you want by Anonymous Coward · · Score: 0

      "It will frequently jam if you a) load more than 5 sheets into the feeder or b) use any sort of paper that has been handled by human beings."

      I find that this is true for any HP feed scanner or printer that doesn't cost >800 dollars.

    3. Re:HP Scanjet 5550c is not what you want by Anonymous Coward · · Score: 0

      If you set it to scan as a PDF file, it will create a file for each and every page it will scan. (Really NOT the way to go.) The only way to make it do a multi-pages pdf file is to import everything in Paperport and then export to pdf.

      Also, instead of placing the full sheet on the scanning glass then scan, that scanner rolls the sheet in front of the scanning head (like old-style fax machines). The result is that you never get straight lines from the autofeeder.

      Conclusion: Get something else to do the job.

  22. DjVu by alienw · · Score: 3, Informative

    Acrobat sucks ass for bitmap images. It doesn't display them very well, they don't print out well, and the files are huge. DjVu is a new image format that compresses extremely well (a few kilobytes a page -- actually comparable to ASCII text). It's somewhat proprietary, but it's probably the best solution here. There are free web-based services that can compress your images. You can try some of them and see for yourself.

    1. Re:DjVu by Ed+Avis · · Score: 4, Informative

      For scanned documents, tic98 compresses even better than DjVu. It's free software and you can even read the author's PhD thesis about it.

      --
      -- Ed Avis ed@membled.com
    2. Re:DjVu by mystik · · Score: 2, Informative

      I haven't tried tic98 (mentioned lower in this thread) but I can vouch for DjVu. I routinely scan notices, bills and whatnot mailed to me, then destroy them (rather than maintain a large paper file)

      300DPI Black & White scans take about 19kb. They are quite readable, and with 300DPI information, make pretty good printouts.

      --
      Why aren't you encrypting your e-mail?
    3. Re:DjVu by LightForce3 · · Score: 1

      An alternative file format may be a possibility, but if these files are going to be distributed to students, you may want to stick with PDF.

      Also, **good results are possible** with PDF, you just have to know what settings to use and tweak. I scan textbooks for the Academic Support office at a university. The settings I use are 300dpi 1-bit black and white with adaptive compression at highest quality, and I can fit about 30 pages in a few MB.

      ~~LF

    4. Re:DjVu by jskiff · · Score: 2, Informative

      Disclaimer: I work for the company the sells the commercial version of DjVu, LizardTech

      DjVu is licensed from AT&T labs, and has both a commercial component and an open source component called DjVuLibre. The technology works by analyzing documents, particularly scanned color documents, for hard edges. Hard edges typically indicate text, while smooth, continuous tones indicate background images. DjVu then "segments" the two types of imagery on the page into different layers and compresses them using different formats for optimal compression and quality.

      Okay, enough marketing. While it does have some warts, it's a pretty cool technology to work with. That, of course, and I'm happy to have any job these days.

      --
      It's "no one," not "noone." Who the hell is noone anyway?
    5. Re:DjVu by verbovet · · Score: 1

      Was true 5 years ago, not now. cjb2 from the GPLed djvulibre produces smaller lossless djvu.

    6. Re:DjVu by renzema · · Score: 1

      Acrobat actually works quite well for text-based bitmaps. Simply run a Paper Capture (under the document menu). It effectively does a text recognition on the document then only stores the text for the sections that it can recognize. Makes the documents nice and small.

    7. Re:DjVu by alienw · · Score: 1

      That doesn't work with handwritten notes.

  23. Do what I do by Anonymous Coward · · Score: 0

    Wank off for a bit

  24. How about a tackling it differently by Anonymous Coward · · Score: 0

    At my uni the course-capture guys started with the scanning approach. Of course it didn't work out since it is impossible.

    Eventually they rigged up a system of sticking a cheap video camera in the class, and giving the prof a chalkboard capable of printing whatever he wrote on it. That would just get converted to a PDF (no conversion or OCR), and the taped course converted to an MPEG1.

    If you can't handle the load, look at alternative solutions like the above. YMMV.

  25. Fax machine by markprus · · Score: 5, Interesting

    Just fax the documents to a computer.

    1. Re:Fax machine by Anonymous Coward · · Score: 2, Informative

      Don't rely on faxing.

      Fax on a computer is actually TIFFg3 format, created by Sam Leffler whose name is on all the old BSD UNIX copyrights. It's a wonderful tool, and a wonderful format, and easily transformed to other needed formats by the various tools in the TIFF library still published by Sam, but faxes provide only 196 dots per inch at "fine" resolution, and that's usually not good enough for documents and hand-written notes.

      A good flatbed scanner is really your friend, for price and performance.

    2. Re:Fax machine by foltzwerk · · Score: 1

      Very creative way to get your pages into a Multi-page TIFF file! However, without a large and reliable document feeder this task would be a very boring and long babysitting job.

  26. Mongo Fax by onyx+pi · · Score: 1

    Mongofax it to yourself. Will come to an inbox near you as an email with pdf attachment. No need for a scanner. Works as fast as your fax can chew through your docs.

  27. color simplification? by Anonymous Coward · · Score: 0

    the scanner might be acting like it's scanning a multi-colored photograph when reading in a hand-drawn lecture frame. try to see if the scanner can simplify colors or if your PDF maker could do it. Put another way, instead of inputting 64000 possible resultant colors, use 16 or some other low count number, as typical lecture slides (and pens) only use a small number of colors (typically black, red, blue, green, and orange)
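
    A minimal sketch of what that color simplification amounts to, assuming pixels arrive as plain (R, G, B) tuples. The palette values and the names `nearest_color` and `quantize` are illustrative assumptions; a real scanner driver or PDF tool will have its own palette and matching rule.

```python
# Assumed palette: paper white plus the pen colors mentioned above.
PALETTE = [
    (255, 255, 255),  # white (paper)
    (0, 0, 0),        # black
    (255, 0, 0),      # red
    (0, 0, 255),      # blue
    (0, 128, 0),      # green
    (255, 165, 0),    # orange
]


def nearest_color(pixel, palette=PALETTE):
    """Return the palette entry closest to pixel (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def quantize(pixels, palette=PALETTE):
    """Snap every pixel to its nearest palette color."""
    return [nearest_color(p, palette) for p in pixels]


def bits_per_pixel(color_count):
    """Bits needed to index color_count distinct colors."""
    return max(1, (color_count - 1).bit_length())


scanned = [(250, 248, 240), (20, 15, 30), (200, 30, 40)]  # off-white, near-black, reddish
print(quantize(scanned))
print(bits_per_pixel(len(PALETTE)), "bits per pixel for this palette")
```

    Six colors fit in 3 bits per pixel, and the large flat areas that result also tend to compress far better than photographic noise.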

  28. Recruit the community by SoSueMe · · Score: 5, Interesting

    Do it the open source way.

    Get several (dozen) other students to use their own equipment and time in exchange for a copy/copies of the completed work.

    I would hazard a guess that there are more than a few people who would like to have a copy of the complete series of the lecture outlines.

    1. Re:Recruit the community by pjt33 · · Score: 1

      I rather doubt the lecturers intend to sell those notes to the students. I realise that my experience at a British university may not correspond at all to the experience at an American college, but most of our lecturers just handed notes out on the door at the first lecture. If that's not what the lecturers in question here are planning on doing, one wonders why they don't go to a publisher and get their notes published as a book.

    2. Re:Recruit the community by pecko666 · · Score: 1

      Well, it was the same at the university where I studied. But for many students printed material is much more readable than handwritten. I personally took my teacher's slides, translated them (they were written in English, which is not my native language), and then TeX-ed them (because of lots of equations and such stuff). At the beginning of the course, every student had a photocopy of the original slides. At the end, no one used them; they preferred the TeX-ed version. So this 'volunteering' may actually work, because the final document will be much more readable/manageable/searchable... and, with the teacher's agreement, it can probably be published online.

  29. Easy by JensR · · Score: 5, Interesting

    Get some students of the professor's course to type them into LaTeX. Give them some points they'd otherwise get for homework.
    a) Publication quality DVI/PS/PDF files
    b) The student can deepen their knowledge of the topic
    Everyone happy. Used to work like this at the university I went to. And you may be even lucky that some student typed these notes in for himself.

  30. DjVu format is pretty good for scanned docs. by artemb · · Score: 3, Interesting

    I found that DjVu format produces substantially smaller file than PDF for the same scanned image.

    There is an open-source project http://djvu.sourceforge.net/ that provides code for reading DjVu docs, but I have no idea where to get DjVu encoder.

    1. Re:DjVu format is pretty good for scanned docs. by Anonymous Coward · · Score: 2, Informative

      Sigh... In THE VERY SAME PACKAGE! Is reading documentation contained in tarballs you download really *that* hard?!

  31. froogle says... by ZiggyM · · Score: 1

    I put "continuous feed scanner" in froogle, sorted by price, and found one for around $400. You can do it 25 pages at a time with this. (Microtek X12USL 2400x1200dpi 42bit).

    1. Re:froogle says... by Anonymous Coward · · Score: 0
      thanks for the link! I appreciate it!

      /sarcasm

  32. PDF is good by Datasage · · Score: 1

    Where i used to work, we digitized 4-5 million documents per month. But these were mostly printed copies.

    We had a set of high-speed sheet fed scanners, it would be then checked, and linked to a database. The documents in most cases were shipped to a vault.

    --
    In America we are imprisoned by our fear of them.
  33. Outsource it by bshroyer · · Score: 0, Troll

    This looks like a job for cheap manual labor. Try India. Or an unpaid intern.

    Don't you dare moderate this as a troll. You know as well as I do that this is probably the only viable solution.

    Bret

    --
    The cure for cancer is coming: Reovirus
    1. Re:Outsource it by cloudmaster · · Score: 4, Insightful

      Maybe he *is* the cheap manual labor / unpaid intern...

  34. If you had a budget.... by nurb432 · · Score: 1

    Get one of those Canon scanner/copier/printer thingies..

    They can scan direct to PDF at an amazing rate of feed using the standard sheet feed.

    Since it has dual purposes, you might con them into one, shared among a couple of departments...

    --
    ---- Booth was a patriot ----
  35. Xerox DocumentCentre by Anonymous Coward · · Score: 0

    We have a 332 ST at work and recently added the scanning software to it; it can export PDF's (image PDF's no OCR stuff) straight to a FTP site. pretty nice. Of course this seems like you'd already have to have a documentCentre to begin with.

  36. God be with you by Anonymous Coward · · Score: 0

    I'm sorry to hear of your trouble. I offer prayers for you and your professors.

  37. where to look by bcrowell · · Score: 2, Insightful
    Have a look at the archives of this mailing list, which is mainly populated by Project Gutenberg folks.

    But the broader question is whether this is really a good idea. The result is going to be huge files, which will be messy, hard to read, and will lack an index or table of contents. Seems like a case of profs with too much ego and not enough willingness to put their own work into more useful form.

  38. fax by Anonymous Coward · · Score: 1, Informative
    Many people seem to forget that the cheapest, most common, and most reliable sheet-feed scanner is the old-fashioned fax.

    Use the department funds to sign up an account at interpage.net, which will allow you to fax stuff off to yourself and receive it as an email attachment. Then use the fax machine in the office to run everything through.

    That takes care of the scanning part; cataloging, organizing, and etc will take a lot more time.

    You may be able to persuade some professors to fax you the stuff themselves, saving you a bit of time.

  39. Scanner recommendation by seanmcelroy · · Score: 1

    We use a Canon DR-3050 at work to do about 5,000 pages/week. It scans at 20 PPM, and you can put in a batch of about 75-100 pages and say 'go' and not worry about it. It's a $4,000 scanner, but it works really well for continuous processing.

    As for formats, if it has handwritten stuff on it, you probably won't be able to OCR it and just store that. PDF image files are a pain, but so are lots of individual TIF's. Your students probably won't have a smart image viewer that can thumb through multiple pages of a multi-image TIF file, but if the prof's can mandate they download a free one somewhere, that'd probably be the way to go... even less proprietary than Adobe's PDF.

    --
    Be very, very careful what you put into that head, because you will never, ever get it out. -Thomas Cardinal Wolsey
  40. Not Uncommon by kannibal_klown · · Score: 1

    It's not as bad as it seems.

    At work, we have several multifunction printers / copiers / faxes / scanners. These things are huge, and take reams of paper at a time for input, and don't take too long. Besides, it's completely automated (you might just have to import the resulting images into pdf which can be done easily). I've used it in the past to scan in my notes and worksheets the professor's handed out. It makes storage a lot easier.

    Someone already suggested Kinko's. Yes, they might have it. Also, I've seen some smaller copying places in Newark have similar devices. So it's common enough that you can find it easily.

    If a friend or contact doesn't have access to such a device, then I'd suggest paying a copy shop to do it for you. I doubt it would be that expensive, and you can bill the school for it.

    The problem isn't that hard to solve (unless you want to try to do it in your apartment). But it's a good thing to bring up on slashdot, as many people might learn about this in case they need to do it in the future.

  41. Xerox DocuShare by Paladin814 · · Score: 1
    The company I work for has looked at a corporate solution for this very problem. After much research, we have decided to use Xerox's Docushare solution with flowport.

    Basically you walk over to a Xerox copier with a sheet feeder attached and using a cover sheet created in flowport, scan in your documents into Docushare. They are stored as fairly high quality PDFs. The Docushare software also does an OCR on the files and then makes them text searchable.

    Although not perfect, it is by far the best solution I have seen. It sounds like you do not have the funds to implement this at your school (the price of the Xerox copier and dedicated docushare server) but if you only have a limited number of these documents, then you would not need to have the infrastructure and perhaps Xerox would do this for you. Xerox has many offices in major cities.

  42. Xerox's Flowport by daigu · · Score: 1

    One option would be to use Xerox's Flowport. You would have to check what is available to you locally - but I can tell you that making a PDF with a Xerox copier and Flowport of 100 pages is a few minutes of work.

    Also, try looking for what others are doing in the university setting.
  43. HPs are cheap on Ebay by fille · · Score: 1

    I just bought an HP ScanJet 6250C with ADF on ebay for 100 euros. I have not tried it yet but it scans all pages in the feeder (25?) after a press on the button. Some multifunctionals (fax, printer and scanner in one thing) have a feeder too and are much cheaper than a scanner with an ADF.

  44. Try making GIFs by PapayaSF · · Score: 2, Informative

    GIFs compress very well, especially with source material that's in limited colors. Try making a page into an 8-color or even 4-color GIF at about 150 dpi. The handwriting should be about as readable as the original.

    Also, if you're scanning material with copy on both sides, you might get some visible bleed-through. Try scanning such pages with a sheet of black paper between the page and the lid of the scanner, then adjust contrast to ensure white whites and black blacks.
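
    The contrast adjustment is essentially a levels stretch. A minimal sketch, assuming 8-bit grayscale values (0 = black, 255 = white); the black and white points of 50 and 220 are illustrative guesses and would need tuning per scanner, and `stretch_levels` is a made-up name.

```python
def stretch_levels(gray, black_point=50, white_point=220):
    """Push near-blacks to 0 and near-whites to 255, linear in between.

    gray: iterable of 8-bit grayscale values.
    Faint bleed-through lighter than white_point ends up pure white.
    """
    if not 0 <= black_point < white_point <= 255:
        raise ValueError("need 0 <= black_point < white_point <= 255")
    scale = 255 / (white_point - black_point)
    out = []
    for value in gray:
        if value <= black_point:
            out.append(0)
        elif value >= white_point:
            out.append(255)
        else:
            out.append(round((value - black_point) * scale))
    return out


# ink, ink, mid-gray, mid-gray, paper, faint bleed-through
print(stretch_levels([30, 50, 100, 150, 220, 240]))
```

    Anything the scanner recorded as light gray, including ghosted text from the reverse side, becomes plain white, which also helps whatever compression runs afterwards.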

    --
    Q: What does the "B." in Benoit B. Mandelbrot stand for? A: Benoit B. Mandelbrot
    1. Re:Try making GIFs by Anonymous Coward · · Score: 0

      Does making a suggestion on Slashdot to make GIFs count as a troll?

  45. Volunteers? by Anonymous Coward · · Score: 1, Interesting

    After you get all of it scanned in and put through OCR, there will still be a ton of mistakes you'll need to correct.

    now, at this point, you'll likely start wishing that you live in Canada (if you already don't).

    The key is in volunteers, to bastardize "1984". Get a number of fairly intelligent high school kids that haven't done their 40 hours of community service (a graduation requirement).

    Now, make them look at the originals, the scanned, and correct all the discrepancies

    bonus: if the kids are the nerdy types, tell them that they're learning university material for free.

    they could start paying you!

  46. Handwritten!! by sciop101 · · Score: 1

    The only handwritten stuff I saw professors use were in math/statistics classes and math-heavy engineering classes. Survey class professors lecture and test the same stuff every year. Go with October 30's advice.

    --
    The only thing new in this world is the history that you don't know.[Harry Truman]
  47. Digicam by Anonymous Coward · · Score: 0

    Use a digital camera and save as jpg. It's a lot faster than scanning, and the quality is just as good.

    1. Re:Digicam by Anonymous Coward · · Score: 0

      I would hate to be student who had to download the notes that way.

  48. i like my fujitsu scanner... by bbdd · · Score: 1

    i have a fujitsu scanpartner fi-4120c desktop scanner. only offers a page feeder, though, no scan bed, so you will need everything to be loose pages.

    very fast, and will do both sides in one pass, if you are working with double-sided pages. at 200x200 resolution (you might need higher, ymmv) and scanning double sided pages, i get something like 3 seconds per page (counting one double-sided page as two pages). for software i am just using the included scanner driver and twain software and adobe acrobat.
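
    For a sense of scale, a back-of-the-envelope sketch comparing that rate with the 1-2 minutes per page from the question. The set count is an assumption ("several dozen" taken as three dozen, 100 pages each), the 90 s figure is the midpoint of the question's range, and reloads and jams are ignored.

```python
def scan_minutes(pages, seconds_per_page):
    """Total feed time in minutes, ignoring reloads and jams."""
    return pages * seconds_per_page / 60


PAGES_PER_SET = 100   # "often 100+ page" stacks, per the question
SETS = 36             # assumption: "several dozen" taken as three dozen

for label, secs in [("duplex sheet-fed, 3 s/page", 3), ("flatbed, 90 s/page", 90)]:
    per_set = scan_minutes(PAGES_PER_SET, secs)
    total_hours = scan_minutes(PAGES_PER_SET * SETS, secs) / 60
    print(f"{label}: {per_set:.0f} min per set, {total_hours:.0f} h for all sets")
```

    With those numbers the sheet-fed scanner is an afternoon of work and the flatbed is more than two working weeks of babysitting.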

    cdw has it here, i'm sure it can be had for cheaper. i got mine for $800 i think. a little more expensive, but the speed is well worth it in time savings.

    1. Re:i like my fujitsu scanner... by FirstOne · · Score: 1
      "I have a fujitsu scanpartner fi-4120c desktop scanner. only offers a page feeder, though, no scan bed, so you will need everything to be loose pages."

      Along that same line..

      If you have access to a PC, Fujitsu's ScanSnap is somewhat cheaper and will automatically create a single large PDF file in a single pass. It will scan both sides at 150, 200 or 300 dpi resolution in a single pass and the input tray has a capacity for 50 pages. But, you can scan larger documents merely by adding more pages to the front of the input tray while it is scanning documents from the back of the input tray. Note: Don't forget to remove pages from output tray periodically.

      I've scanned 500+ page documents in about an hour using that technique.

      Price is $389; a $100 rebate will get you a turnkey solution for under $300.

      It comes with a full version of Adobe Acrobat 5.0/6.0 and scanned documents will automatically appear as one or more PDF files. User selects number of pages per PDF. You can also fire up the paper capture plug in/function and OCR the scanned images in the PDF.

  49. No good answers AFAIK by John+Miles · · Score: 4, Informative
    I've run into a similar problem, and have no good solutions in the general case. I'm on a mailing list for users and collectors of Tektronix test equipment (oscilloscopes, logic and spectrum analyzers, and so forth). Last year, Tektronix's legal department issued a copyright release that permits the reproduction and distribution of documentation for test equipment that they (Tek) no longer support. This was of great interest to the people on the TekScopes list, because it gave a green light to scanning and trading/selling copies of manuals. I've scanned in a few manuals for some equipment I own, and it's a huge pain in the butt any way you look at it.

    Electronic test-equipment manuals are pretty much worst-case candidates for scanning. In Tek's case, the schematic volumes often consist of hundreds of double-sided, nonstandard-sized foldout sheets (11x23" for example) with lots of fine detail that must be reproduced clearly. You can either scan the pages in segments and leave it to the reader to reassemble them, or you can take the manuals to Kinko's and have the foldout pages shrunk to 11x17" or 8.5x11" for scanning. Either way, it's a real hassle, and highlights a clear need for a "prosumer" duplex sheet-feed scanner solution.

    A few years ago you could buy scanners like this one that could handle arbitrary sheet sizes, but I haven't seen them in stores lately. These may be easier to use than flatbed scanners, assuming the precision they offer is sufficient for your application. I don't know how well they'd work on densely-printed schematics.

    Other than bitching about the state of the scanner marketplace, I don't have much to suggest. There are a few hints that will improve the quality and usability of your final document:
    • There are other formats, like DjVu, that have certain advantages over .PDF, but think carefully before using them. Will you be able to read your files 10, 20 years from now? In .PDF's case, the answer is an unequivocal 'yes' because of widespread government, military, and commercial standardization around it. I hate to see people spend hours scanning manuals in DjVu or another nonstandard format, because I'm 95% sure I won't be able to read them years down the road on a completely different platform.
    • To make the document searchable, use an OCR package like FineReader if possible... but expect to spend even more time babysitting the process.
    • Experiment with your scanner resolution settings to minimize the resulting .PDF file size. There's a big difference in size between 200 dpi and 300 dpi, and between a B&W and color scan.
    • For some mysterious, forehead-slapping reason, flatbed scanners often use glossy-white backing material in the lid. This encourages bleedthrough of text on the reverse side of double-sided material, making your scanned documents look sloppy and compress poorly. Placing a sheet of black paper, plastic, or cardboard material between your document and the scanner lid will make a big difference.
    --
    Dahlmann tightly grips the knife, which he may have no idea how to use, and steps out into the plain.
    1. Re:No good answers AFAIK by deranged+unix+nut · · Score: 3, Informative

      Since purchasing a Canon G2 (4 megapixel) digital camera, I have discovered that it works pretty well for producing readable quality duplications of 8.5"x11" sheets of paper and whiteboard notes.

      This camera can be controlled programatically. Automation would be needed to make it practical for a large scale, but it is much quicker than most flat-bed scanners and the quality would be okay for hand-written notes. It would be easy to take multiple overlapping pictures and leave it to software to re-assemble the images.

      (Yes, it is a goofy solution, but it works well for me as I normally have my camera handy.)

    2. Re:No good answers AFAIK by jensend · · Score: 2, Informative

      Bunk. DjVu has an open-source implementation and well-documented specs. It will thus be readable no matter what happens to LizardTech. Similarly, the main reason PDF can be counted on to be readable in the distant future is not its installed user base (that changes quickly enough to be fairly well negated as an advantage over the 10-20 year timespan you suggest), but rather that it is an open format.

      DjVu is probably the best format for the poster's needs. I had a university class where nothing was ever handed out to students in hard copy and documents were instead posted on the web; .doc was used for the kind of documents PDF is good for, while PDF was used for scanned-in (but not OCR'd) articles and so forth. This was a nightmare; the PDFs were absolutely huge, and just scrolling through them would bring a >1ghz computer to its knees. It would even have been better to use uncompressed TIFFs.

    3. Re:No good answers AFAIK by John+Miles · · Score: 1

      DjVu has an open-source implementation [sf.net] and well-documented specs.

      Peachy. So did that really cool Hangman game I wrote in 9th-grade computer class. But that doesn't mean I could put my hands on it now.

      Believe me, 20 years from now, you are not going to appreciate saving a measly gigabyte or two by using DjVu instead of .PDF. Your daughter's nose ring will have more mass storage than that. You may, however, be willing to sell your soul to Bill Gates's kids for a DjVu reader that works on Windows 2020 Maxi-DRM Edition.

      DjVu is probably the best format for the poster's needs.

      You could very well be right; I'm not saying it's a bad format, or that it won't do the job. DjVu is certainly a lot nicer to browse onscreen than .PDF. The obsolescence issue may not be a concern for the original poster, given that the course notes will probably be obsolete a couple of semesters from now.

      It all comes down to picking the right tool for the job. Still, I felt it was appropriate to offer a counterpoint to all the DjVu cheerleading that's going on. The same debate has come up on the Tek and Agilent mailing lists before, and for those applications, it should be .PDF or nothing, because people will need to support that equipment for (human) generations to come.

      --
      Dahlmann tightly grips the knife, which he may have no idea how to use, and steps out into the plain.
    4. Re:No good answers AFAIK by Anonymous Coward · · Score: 0

      Hey, that's great! I've been thinking of getting a digital camera instead of a scanner but the only thing I've been wary about is the scanning time... Can you tell how fast your set-up is?

    5. Re:No good answers AFAIK by deranged+unix+nut · · Score: 1

      It takes about 1 second for the camera to "boot up" after I turn it on and take the lens cap off, then it takes about 1 second to focus and take a picture.

      Additional pictures are between 1/4 second and 1 second depending on focus time.

      It works well enough that I can even take readable pictures (after a couple tries) of transit passes for our vanpool reporting while sitting in the van as it zips down the freeway. :)

    6. Re:No good answers AFAIK by Anonymous Coward · · Score: 0

      I got so fed up in college at being ripped off by the textbook racket (So sorry, this year we're using the 19th edition..) that I began photocopying my books and returning them to the bookstore. After I got my digital camera, I'd buy my books, set up a tripod and capture software, and spend a couple of hours turning pages and pressing the spacebar. Print to PDF and return the books. Not entirely honest? Perhaps, but when the campus bookstore told me that they'd only buy back my software engineering text for $7 because it wasn't going to be used the next semester, and I then found *my book* back on the shelf being sold for $75, I snapped. Thus began my personal war with the bookstore and the college textbook industry.

    7. Re:No good answers AFAIK by deranged+unix+nut · · Score: 1

      While I don't encourage theft of copyrighted material (as copyrights are the only thing that makes my industry viable), most university libraries do keep at least one copy of the current textbooks on hand, and I remember one case mentioned on Slashdot where the only difference in the "new" version of the textbook was that the 1000 words of introduction were replaced with a picture of the author. If/when I go back to school, I will do my homework before I purchase any textbooks.

    8. Re:No good answers AFAIK by Anonymous Coward · · Score: 0

      "most university libraries do keep at least one copy of the current textbooks on hand"

      I don't know about that. My university didn't, and if you consider the cost of replacing the library's copy of every $100 textbook every couple of years when they release a new edition you can see why it would be cost prohibitive.

      I'm convinced that the textbook industry is a racket. Seriously, how is a calculus book published this year going to be any better at teaching you calculus than any of the calc books from the last 20 years? The textbook publishers can create artificial market leverage in the university setting. If you have a class that requires you to do problems from the text like mathematics or engineering, you're not going to be able to use a past edition. I've had classes where the required text was written by the instructor, and I've had friends complain that their professor would require the latest edition (only available new at full price) by specifically assigning coursework including the problems changed from the previous edition.

      If I had known about half.com when I was still in school, there probably would have been a few more texts that I would have purchased instead of copying. What I object to are the overly high prices and blatant market manipulation. It becomes a matter of economics too. It's not worth my time to copy a 1000 page text if it only costs me $40-50.

    9. Re:No good answers AFAIK by Anonymous Coward · · Score: 0

      OK, thanks a lot! I guess it's off to the camera store I go...

  50. Outsourcing by Anonymous Coward · · Score: 0

    Just pay some little kids (younger siblings?) like 3 bucks each to type it up.

  51. Gotta be careful though. by Faust7 · · Score: 5, Funny

    Outsource the job to India

    "No, no, not my entire job, just this one part. No, I can do the rest. No, really. No! No... please..."

  52. Digital Copy machine by Doppler00 · · Score: 1

    I don't understand why, but most people don't realize that most new copy machines are also PRINTERS and DIGITAL SCANNERS. I always find it funny when companies purchase fax machines/scanners/copy machine/printers when they really only need one device.

    If you can find access to a digital copier at your university somewhere, you can just put the whole stack of paper in the sheet feed and it should be able to scan every page double sided and put it on a network drive somewhere.

    It might take awhile to figure out how to set this up, but it's infinitely easier than trying to scan each page by hand using a crummy consumer scanner.

  53. Use your students! by Anonymous Coward · · Score: 0

    Don't put messy handwritten notes on the web. It's very unprofessional and looks rubbish. Ask for student volunteers to transcribe the notes into LaTeX, then use HTML/PDF conversions for the web.

    It'll take longer, but it will be worth the effort, especially when it comes to maintaining the notes in the future.

  54. Solution --- New job! by Anonymous Coward · · Score: 0

    'nuff said ;-)

  55. Try using a camera. by Anonymous Coward · · Score: 0

    At your budget, I'd get a digital camera (Nikon Coolpix on e-bay, for example), shoot the pages, and put the pages together with acrobat as pictures. Spies can shoot at speed, and I expect 3 secs/page might be a realistic guess.

  56. Large Scale Paper to Digital Conversion by felila · · Score: 4, Informative

    I do conversion for fun, at Distributed Proofreaders.

    The problem is the mixture of graphics, equations, and text.

    It's easy enough to turn a page of text into a smallish file. Get a good automatic-feed scanner ($3500 or so) and a copy of ABBYY OCR software. If the original isn't too speckly, tiny, or smudged, ABBYY will give you a 95% accurate text you can then correct. Best format to save in? Depends on what the school is going to do with the files. If they're to be posted on web sites, perhaps XHTML. If it's just for preservation, plain text (if there are no Greek characters) or XML with UTF-8.

    Equations -- well, there's supposedly a version of XML for math, but Distributed Proofreaders has ended up using TeX, as it seems to be the mathematical standard. While this would work for preservation, it wouldn't work for a web site.

    For a web site, perhaps the best way would be to intersperse text with pngs of the equations and graphics. The pngs would still take a lot more space than text, but the files would be smaller than PDF versions of the whole page.

    1. Re:Large Scale Paper to Digital Conversion by mrchaotica · · Score: 1

      There are programs that can easily convert TeX equations to PNGs.

      I would save the archival version as TeX, and just run it through a TeXtoHTML (TeXtoPNG for equations) kind of program if you want a copy for a website.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

  57. One solution by jjohnson · · Score: 1

    At work, I set up a document scanning function for our BAR system (Business Approval Request)--everything that's submitted must include documentation, which is often a paper quote or invoice.

    We bought an HP Scanjet with sheet feeder for about $200 (sorry, don't remember the exact model), and use Paperport to scan the documents to a network folder named for the person requesting the scan (the executive assistant does it). We save in 300 dpi TIFF files in 1 bit color (B+W), which are small (8.5" x 11" comes out around 50K), and extremely clear and legible, and can be printed out again at almost the same quality. The scanning is pretty fast, and it includes batches. The only slow part is that PaperPort (which comes with the scanner) scans to MAX files, which need to be saved as TIFFs.
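
    As a sanity check on the "around 50K" figure, here is a small Python 3 sketch of what it implies. The function names are hypothetical and the 100-page stack is just the size mentioned in the original question. The post doesn't say which TIFF compression PaperPort applies, so the ratio below is only what the numbers imply, not a property of any particular codec.

```python
# Back-of-envelope check on the "around 50K per page" figure.
RAW_BYTES = 2550 * 3300 // 8    # 8.5" x 11" at 300 dpi, 1 bit per pixel
REPORTED_BYTES = 50 * 1024      # "around 50K", taken as 50 KiB (assumption)


def compression_ratio(raw_bytes, stored_bytes):
    """How many times smaller the stored file is than the raw bitmap."""
    return raw_bytes / stored_bytes


def stack_size_mb(pages, bytes_per_page):
    """Total size in MiB for a stack of pages of equal size."""
    return pages * bytes_per_page / (1024 * 1024)


if __name__ == "__main__":
    ratio = compression_ratio(RAW_BYTES, REPORTED_BYTES)
    total = stack_size_mb(100, REPORTED_BYTES)
    print(f"implied compression: about {ratio:.1f}:1")
    print(f"100-page stack: about {total:.1f} MB")
```

    In other words, a 100-page set of notes at these settings would come to roughly 5 MB, which is a reasonable download for a course web page.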

    --
    Anyone who loves or hates any language, platform, or manufacturer, doesn't know what they're talking about.
  58. A Fujitsu scanner, SANE and Quartz Python bindings by sabi · · Score: 5, Informative
    Such as the fi-4120c is what I'd recommend. You might have to stretch your budget a bit. The cheap HP sheet feeders are very unreliable; we went through two HP 5550c's enduring constant paper jams before switching to a better (Fujitsu) scanner.

    Unfortunately you don't have much use for something like Acrobat Capture because you have handwritten notes to deal with. To process the files, SANE and/or TWAIN interfaces are reasonably easy to write code for. The cool thing about SANE is that you can run the saned daemon on any Mac or Linux box, and with a couple of lines of config file changes, it's instantly available over the network from any Mac, Windows, or Unix box (there are TWAIN bridges for Mac/Windows so it even shows up in Photoshop and so forth); there are also standalone GUI clients like XSane.

    I wrote a document management system in Python/wxWidgets (for Windows) in about a month part-time, and it works very well. Either on Mac or Windows, PDF makes sense because of the ubiquity of the viewers, even if you lose a bit in compression compared to more optimized formats such as DjVu. On Windows you can easily embed the Acrobat ActiveX control; on Mac OS X you have native PDF support, Panther's Preview kicks ass, and there are several open-source PDF browsing components such as the ones out of TeXShop or Glen Low's Graphviz port you can embed in your own app.

    Given a choice I would probably pick the Mac to do this project, because of the wonderful Quartz/CoreGraphics Python bindings. You can just draw right to PDF, and place PDF files as if they were images; for example, here's a short script to rotate a bunch of PDF files:

    #!/usr/bin/python

    from CoreGraphics import *
    import math, sys

    for inputPDFPath in sys.argv[1:]:
        inputProvider = CGDataProviderCreateWithFilename(inputPDFPath)
        inputPDF = CGPDFDocumentCreateWithProvider(inputProvider)
        if inputPDF is None:
            print >> sys.stderr, \
                "unable to open '%s': perhaps is not a PDF file?" % inputPDFPath
            continue
        outputContext = CGPDFContextCreateWithFilename(
            inputPDFPath + '-rotated.pdf', None)

        for pageNumber in xrange(1, inputPDF.getNumberOfPages() + 1):
            mediaBox = inputPDF.getMediaBox(pageNumber)
            rotatedBox = CGRectMake(0, 0, mediaBox.getMaxY(), mediaBox.getMaxX())
            outputContext.beginPage(rotatedBox)
            outputContext.saveGState()
            outputContext.translateCTM(0, rotatedBox.size.height)
            outputContext.rotateCTM(-math.pi/2)
            outputContext.drawPDFDocument(mediaBox, inputPDF, pageNumber)
            outputContext.restoreGState()
            outputContext.endPage()
        outputContext.finish()

    You could also use ReportLab, but because a lot of the PDF processing code is written in Python it's somewhat slower and memory-hogging for high-volume use. (I used ReportLab on Windows for the above project, and use CoreGraphics Python bindings for my research, so I do know what I'm talking about mostly :)
  59. Eww. by Anonymous Coward · · Score: 0

    Chances are she's a plump, old, matronly, bespectacled hausfrau.

    1. Re:Eww. by Hatta · · Score: 1

      Dunno man, couple of the secretaries in my department are real foxes.

      --
      Give me Classic Slashdot or give me death!
  60. There Are Ways Other Than Outsourcing by Cprossu · · Score: 1

    but not for $200. You can get a Canon DR-2080C off of eBay for $630, and it supports both USB 2 and SCSI-II interfaces.

  61. Try a panasonic by big+tex · · Score: 1

    The HP one you picked looked OK, but the feeder looks a little chintzy.

    We have a Panasonic at work, and use it to scan in design packages. It's something like the model KV-S7065C. Don't be fooled by the 'low volume' tag - we routinely make 100-page PDFs out of it (high volume = insurance office), even though it will take a few minutes. Thing works great. Highly recommended. The Panasonic comes with software that allows you to save all as a single file, break into xxx-page-long files (where you get to pick xxx), and many other features.
    My favorite is that it makes it easy to create pdf's with changes in page size / resolution. Our packages are mostly design calcs (8.5x11, 300dpi) with a few drawings (11x17, 600dpi), and it works slick.

    We used to send out ~5-10 fedex packages a week, but now we just scan and email. Saves so much money, time, and they can get packages right away.

    A good way to keep down on the cost is to get a B&W scanner - you probably don't need color anyway, and it keeps the file size way down.

    --
    I think I need a new sig here.
  62. digital camera? by Rob+Bos · · Score: 1

    For that price, a digital camera on a fixed mount might be easier than a scanner. Lay out the sheet, take a shot, lather, rinse, repeat. Generate a PDF using imagemagick/ghostview.
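
    The last step can be sketched in Python 3 along these lines. This is only an illustration under a few assumptions: ImageMagick's `convert` is on the PATH (newer releases call it `magick`), the shots have zero-padded filenames so sorting by name gives page order, and the directory and output names are placeholders. The helper `build_convert_command` is made up for this example.

```python
import subprocess
from pathlib import Path


def build_convert_command(image_dir, output_pdf, pattern="*.jpg"):
    """Build an ImageMagick command that joins page photos into one PDF.

    Hypothetical helper: files are taken in filename order, so the camera
    must number its shots with zero padding (e.g. IMG_0001.jpg).
    """
    pages = sorted(str(p) for p in Path(image_dir).glob(pattern))
    if not pages:
        raise ValueError(f"no files matching {pattern} in {image_dir}")
    # Grayscale usually gives a smaller file than full-color photos.
    return ["convert", *pages, "-colorspace", "Gray", str(output_pdf)]


if __name__ == "__main__":
    # "shots" and "lecture01.pdf" are placeholder names.
    cmd = build_convert_command("shots", "lecture01.pdf")
    subprocess.run(cmd, check=True)
```

    Building the argument list separately from running it makes it easy to print and inspect the command before committing to a long batch.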

  63. My dad's office by pavera · · Score: 5, Informative

    My father is an attorney.
    He has a couple of high-speed scanners from Panasonic. They cost less than a thousand dollars ($400-500, if I remember correctly), they scan at about 20 ppm, and the software that came with them will save each scanned group of pages as a separate document (PDF, TIFF, whatever). My dad uses this setup to scan all of the files that his cases generate (shrinking his document storage from about 1000 sq ft to 2 shelves in a bookcase). We are talking files that consist of 10,000+ pages, and normally he saves a year's worth of cases on 3-4 CDs. They can scan up to 500 pages at a time.
    Here is a link:
    High Speed Scanners
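
    For a feel of what those speeds mean, here is a rough Python 3 estimate. It only counts scan time per page, ignores reloading the feeder and clearing jams, and the 1.5 minutes per page for a flatbed is simply the middle of the 1-2 minute range from the original question. `scan_hours` is a made-up helper name.

```python
def scan_hours(pages, minutes_per_page):
    """Hours of pure scanning time for a stack of pages."""
    return pages * minutes_per_page / 60


if __name__ == "__main__":
    jobs = (("100-page lecture set", 100), ("10,000-page case file", 10_000))
    for label, pages in jobs:
        sheet_fed = scan_hours(pages, 1 / 20)   # 20 pages per minute
        flatbed = scan_hours(pages, 1.5)        # assumed 1.5 min per page
        print(f"{label}: sheet-fed {sheet_fed:.2f} h, flatbed {flatbed:.1f} h")
```

    A 100-page stack is about five minutes on the sheet feeder versus two and a half hours of babysitting a flatbed.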

    1. Re:My dad's office by um3k · · Score: 1

      normally he saves a years worth of cases on 3-4 cds.

      I wouldn't put too much confidence in those CD-Rs lasting as long as you think.
      I hope your father is testing those CD-Rs regularly for signs of degradation or failure!

    2. Re:My dad's office by Anonymous Coward · · Score: 0

      Dad the Lawyer is also my solution.

      Is it just me or must there be a printer scanner combination somewhere that is actually decent?

      I can just imagine aliens landing here and going "holy shit their copy process sux0rz"

      Gutenberg is probably spinning and spinning and spinning.

    3. Re:My dad's office by just+some+computer+j · · Score: 1

      I work for a litigation support company, i.e. scanning and copying legal stuff. I would call a litigation support company and see what they can do for you. I would suggest someone other than Icon or OmniDocs, because they suck ass and don't do a very good job. A mom and pop kind of place would do a great job, and quality control the scanning to see if all the pages were scanned in and in proper order. And depending on how many pages you have, you could get all of it back on CD in a day or two, making you look like a god. Try that, and if you need suggestions for a good company to help you out let me know.

      --
      eh, this sucks, I am going back to bed....
    4. Re:My dad's office by PhunkyOne · · Score: 1

      I don't see a single one of these that's under a couple thousand on froogle - most are 5-6k.

    5. Re:My dad's office by beakburke · · Score: 1

      I would assume that the cases stay on a HD too.

      --
      ----- Question authority, but not ours. Hate the man, but we're not him.
    6. Re:My dad's office by pavera · · Score: 1

      They stay on HD for a year, CD for 5-7, then they are gone.

    7. Re:My dad's office by pavera · · Score: 1

      It wasn't the best link sorry,
      my dad has the s2025c which you can get for around $500, I found two places on froogle that have it for $700 and I didn't even search hard.

      The s2026c can be had for around $700 (retail is $900 from what I've seen)

    8. Re:My dad's office by um3k · · Score: 1

      There are many documented cases of CDs being unreadable after just 1-2 years...

  64. Outsource it and save yourself the trouble by BiteMyShinyMetalAss · · Score: 1

    I worked on a similar project in the past, where I had to PDF a lot of paper-based documents.

    A nice ADF scanner will save your sanity. We had a newer ScanJet, resembling the 5550c, where you couldn't feed too much at once or it would jam up. We later got hold of an older HP Network ScanJet that worked like a champ. If I could remember the model numbers, I'd give them to you. :(

    That said, from the sounds of your situation, outsourcing would be the best solution. They already have the high-end scanners and the high-end software to work with your documents (i.e. Acrobat Capture), and all you'll have to worry about is giving them the documents and picking up the CDs with the PDFs on them. I don't remember what it cost us, but I'd wager that the overall value was superior.

    Good luck!

  65. You need to Think out-side-of-the-PDF-box by The+Time+Keeper · · Score: 1

    Don't waste your money buying a scanner. Teach the professor to use M$ PowerPoint or OO Impress. Those slides can easily be web published.

  66. Use a fax service by nev4 · · Score: 1

    Fax the documents to something like an efax account. Most universities have a heavy-duty fax machine lying around somewhere. Or you could just give it to the secretary and say, "Hey, the prof. asked me to give you these, fax them to this number." Then, in comes your fax already converted to an electronic format. Most of the free fax services only allow you to receive a few faxes per month, but you could always just sign up for one of the better ones and then cancel.

  67. All you can do... by cliffiecee · · Score: 5, Insightful

    Is say "Sure. I'll get this done- when I can. Don't expect it to be done for at least a few weeks, maybe longer."

    DON'T CLEAN UP THE SCANS. Don't even look at the scans. DO NOT RETYPE ANYTHING.

    With the kind of volume you say you're receiving, the only way you're going to survive is to:

    1. close your eyes,
    2. load the documents into the feeder,
    3. press 'scan'.
    4. Make sure everyone knows this policy.

  68. Not for now but for later by schotty · · Score: 1

    Get the Dr. to use a PC in the first place. That way all you need to do is clean up the material. Sketches, flowcharts, and the like can be entered using the appropriate tool.

    --
    Sigs are nice guns ...
  69. Xerox makes a great product... by val1s · · Score: 1

    http://www.xerox.com/go/xrx/equipment/product_details.jsp?Xcntry=USA&Xlang=en_US&prodID=DigiPath&cat=Product+Taxonomy%2fProduction+Workflow%2fFreeFlow+Digital+Workflow
    I'm not suggesting you buy it, but find a local service provider that has one. If your school is large enough, it may have something like this already. These are the type of scanners that drive Xerox's 120-180 ppm printers; they are lightning quick (60 double-sided pages a minute) and surprisingly good quality.

  70. It's not a technology problem it's a problem of by Bob+Bitchen · · Score: 2, Insightful

    poorly set expectations. How did the professors get the idea that it was possible? It's not possible under the constraints that you are faced with. If money were not a limiting factor you could do this. But I'll assume money is a factor, and time as well. So go back and tell them that it's possible, but it's going to cost this much to automate the process, this much if you type it in by hand, this much if someone else does it but with poorer accuracy, and so on and so forth. Put the burden on them to decide how they want to deal with this. Only then will the appropriate solution be found and chosen.

    --
    http://tinyurl.com/3t236
  71. USG has this problem in Iraq by Anonymous Coward · · Score: 0

    I've heard the US military has this problem with the millions of documents taken from various government offices in Iraq. there's no easy way to get the information from them except to use fast scanners to put them into pdf. then, you just have to hope they can find a translator to look at a random document and hope it has some valuable info in it.

    if someone came up with a google-like crawling engine that would OCR all the pages and put them into a searchable database, it would make their jobs a whole lot easier.

    another poster suggested outsourcing to india. but if someone in the US developed the above-mentioned product, the USG would probably pay the 1000% premium to avoid outsourcing, and you'd be rich. ah, if only i were a programmer...

  72. HP9200C? by MrChuck · · Score: 2, Informative
    We have one at work. You put in a pile of papers, tell it "go" and it emails a PDF of each to you. I've been struggling without a manual to reconfigure it a bit.

    Cheap? Dunno. It was just there. In any sort of volume though, the cost drops precipitously (cheaper than you doing it on a flatbed scanner!).

    Check out something like that (or indeed that) used, use it, resell it. Or new, then use/resell. Or get the school to buy it.

    If this is a continuous thing, then all the better to own.

  73. Visioneer OneTouch 8650 by Anonymous Coward · · Score: 0

    I just bought this one for work and it seems nice, so far. It's relatively inexpensive (around $200) and has a 50 sheet capacity document feeder. Of course, if you want newer and faster (and 2.5 times the price), you can always go for the 9450 PDF model, which as the name suggests can export directly to PDF files.

  74. What I Use for a Similar Task by LightForce3 · · Score: 1

    I work at the Academic Support office at a university. Much of what I do is scanning textbooks for visually impaired students, and I've recently started using Adobe Acrobat 6.0 Standard for some books. After a semi-scientific study, I found that scanning in black and white (that's 1-bit pure B&W, not 8-bit grayscale or whatever) and using Acrobat's adaptive compression gives good results with a small file size. Of course, this is usually with printed text, so YMMV.

    The scanner I use is an HP ScanJet 7400C, and while the scanner is OK, the software has some major flaws that require workarounds. However, this is a fairly old scanner with old software (last updated in 2001, I think), so more recent versions may be improved.

    Someone else suggested a high speed scanner from Fujitsu. I don't have any experience with these, but in addition to being very fast, they are very expensive and may require you to buy additional hardware (some of them use a SCSI interface instead of USB).

    I'd suggest spending the money you have on a mid- or high-end consumer scanner with a good Automatic Document Feeder.

    If you've got more questions, I'd be happy to answer them as best I can. Feel free to reply here or send me an email. If you do email, be sure to put "Slashdot" in the subject line.

    ~~LF

  75. Do you work at Kent State? by NevarMore · · Score: 2, Interesting

    Kent State just announced their FlashNotes website. I go to school there; email me at fiveonethree@yahoo.com and I would be more than happy to come down and help you sort out your options.

    A bit of opinion on the project. This is not a good idea. It's one more tool that students will rely on to memorize information instead of taking time to THINK about their subjects and really LEARN the material.

  76. Canon DR-2080C by Anonymous Coward · · Score: 0
    I have a Canon Dr-2080C which works great.

    It takes up to 50 pages at a time, scans both sides in one pass at up to 20 sheets per minute, and produces a PDF file containing the original graphic image of the page plus (optionally) a OCR'ed text version for doing searches.

    You can set it up to start scanning whenever pages are added, and then just refill it as you're doing other things (I've scanned in several books this way).

  77. ScanSoft OmniPage by New+Folder · · Score: 1
    ScanSoft OmniPage is supposed to handle the OCR component of PDF creation. I haven't tried it out, but it seems to be targetted at this problem.

    /not affiliated with ScanSoft -- just trying to help bring an end to PDF's full of rasterized text

  78. *Large*-scale conversion by Anonymous Coward · · Score: 0

    100 pages is really not that much.

    There are entire companies in the Middle East with hundreds of employees typing in or scanning paper documents for REAL large-scale conversion jobs like 10000+ pages. The employees are paid about $0.10 per hour. I read an article about such a company once, and they mentioned Lockheed-Martin as one of their customers; apparently they had a *huge* amount of specs on paper that they needed to digitize.

    Just my 12 minutes...

  79. Though Adobe sucks, by Hero+Zzyzzx · · Score: 1

    Acrobat Capture 3.0 is the way to go. I think Adobe makes really, really crappy software in the Acrobat line of products, but Capture gets the job done, and can create pretty small PDFs if you get the settings right. The other OCR'ing PDF creation alternatives, when you're creating hundreds of pages, are MUCH more expensive.

    You can't do this project practically for a couple hundred dollars: you need a duplexing auto-feed scanner, and those are not cheap. For a project I manage, we knew we were going to need to turn tens of thousands of pages of paper into PDF, and we dropped the bomb on a Ricoh 450de duplexing document scanner: it does 55 pages per minute, both sides. This scanner has been trouble-free and heavily used for 4 years; I cannot recommend it highly enough.

    You really don't want to do this on a scanner without an autofeeder, and if documents change from single to double sided you REALLY don't want to scan them on a simplex scanner: it'll take you forever, and be more error-prone in terms of screwing up the order of or omitting pages.
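
    If you do get stuck with a simplex feeder, the usual workaround is two passes (all the fronts, then flip the whole stack and scan all the backs) and then interleaving the results. Here is a minimal Python 3 sketch of that reordering, assuming every sheet is double-sided and the second pass comes out last-back-first. The function name and page labels are made up for illustration.

```python
def interleave_duplex(fronts, backs_reversed):
    """Merge two simplex passes over a double-sided stack into page order.

    fronts: sides 1, 3, 5, ... in the order they were scanned.
    backs_reversed: the even sides as scanned after flipping the whole
    stack over, i.e. the back of the last sheet comes first.
    """
    if len(fronts) != len(backs_reversed):
        # A mismatch usually means a double feed or a skipped page.
        raise ValueError("front and back passes have different page counts")
    backs = list(reversed(backs_reversed))
    ordered = []
    for front, back in zip(fronts, backs):
        ordered.extend([front, back])
    return ordered


if __name__ == "__main__":
    fronts = ["p1.tif", "p3.tif", "p5.tif"]
    backs = ["p6.tif", "p4.tif", "p2.tif"]
    print(interleave_duplex(fronts, backs))
```

    The length check is the cheap insurance here: it catches exactly the omitted-page errors described above before they end up in the posted PDF. Mixed single- and double-sided stacks would still need blank backs removed by hand.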

    You have a few options, assuming you can't get the scratch to get the stuff to do this project right in-house:

    1. Say no. It's not practical without the right equipment.
    2. Outsource it. There are companies that do paper-to-pdf conversion, but they are not cheap.
    3. Bite the bullet and do it on a crappy scanner. You WILL hate yourself if you go this route, though, never mind buying (or gluing together with a scripting language) the right software.
  80. Visioneer Paperport + document feeder by Anonymous Coward · · Score: 0

    We have a modified need for the same thing -- scanning meeting notes, customer diagrams, and so on, all to PDF. The documents are usually 5-10 pages. We've been happy with a Visioneer Paperport 9450, which comes with a document feeder ($500 including a full copy of Acrobat).

  81. mods on crack by Anonymous Coward · · Score: 0

    This is a troll but a previous post is considered +5 teh fny?!?!?

  82. What about usability? by Anonymous Coward · · Score: 1, Insightful

    Why do it all one way? It sounds like a very great deal of stuff that may never be used by students. Why not try to find a prof who will cooperate with letting you see his/her webpage usage patterns?

    In my experience, it is very hard to predict what students will use for any given class based on the moronic ramblings of /.ers claiming to represent all students. On the other hand, by trying different things in my classes, I've been able to find out what my students will use eagerly. Hint: It ain't the same type of thing for every class!!!!!!

    I'd like to say that you're at a really shitty university that would take this kind of student-hostile course of action, but then, I checked out MIT's Open Courseware only to find that the first course I looked at, Gilbert Strang's linear algebra, was a botch job. There was a postage-stamp-sized video of Strang telling anecdotes on the first day of class that could only be appreciated by someone who'd already taken the class. So much for leveraging the web's inherent strong points!

  83. Latex and lots of typing by Anonymous Coward · · Score: 0

    Write it all into latex. You'll be happy afterwards.

    Messy text doesn't go well with OCR. You might get some of it through, but you still have to proofread it thoroughly and you may miss some stupid lookalikes.

    Diagrams probably require drawing them by hand. Either plot the diagrams with matlab/octave or approximate them with bezier curves with your favorite figure-drawing editor.

  84. How about.. by InternationalCow · · Score: 1

    Having the professors scan their own shit? Where I work, there's no way a professor would ever consider asking such an impossible thing. They would either scan it themselves or have their secretaries do it (FYI, every department in our hospital has one or more flatbed scanners including some automated ones). I mean, this is real donkey work for which you are likely to be too highly trained and too expensive. Again - no way.

    --
    ----- One learns to itch where one can scratch.
    1. Re:How about.. by DrDebug · · Score: 1

      I agree. This is work that should be done by each professor, or at least a graduate assistant. Dumping work like this on the author requires (a) a lot of the author's time, and (b) equipment he doesn't have.

      If I were the author, I would just scan the document in and post it as the scanner found it. The author is under no obligation to correct the paper's errors. The author may not even have the expertise to correct the handwritten errors.

      Just my humble opinion.

  85. Acrobat Capture by Wesley+Felter · · Score: 1

    I'm surprised no one has mentioned Acrobat Capture, which is designed for exactly your scenario. The JBIG2 plugin can make really small PDFs from scanned documents. The downside is that it's not cheap.

  86. The Tech exists, check w/ Xerox by RedLeg · · Score: 1

    A number of years ago, the LARGE (read beltway bandit) contracting firm I was working for landed a private contract with a major insurance firm. Said firm had been NAILed in a class action lawsuit, and as part of the resultant consent decree, had to digitize ALL of its paper policies and contracts, going back years. This averaged over 90 front-and-back pages per customer, and there were millions of customers. They had (originally) about 90 days to get it done.

    This insurance company custom built several scan assembly lines, which used automated (Xerox IIRC) scanners and document handlers, as well as lots of custom software (that we customized).

    This was more than seven years ago, so I would be surprised if the core technology isn't available at Kinkos or maybe even somewhere within your own university. Ask around, and if it's not there, call the local Xerox rep and ask to have one of the devices out on a demo. Whether the uni buys one or not, you can probably get YOUR work done, and make YOUR prof happy.

  87. Some photocopiers support this by adamsc · · Score: 4, Informative

    Check whether any of the photocopiers around campus support scanning: we have a Canon ImageRunner in one of the labs which I support. It's extremely fast - ~1 second per page for a double-sided scan and the feeder is pretty robust - we have grad students who take handwritten lecture notes for an entire class and dump this stack of a couple hundred crumpled pages into the feeder and end up with a PDF a couple minutes later.

    1. Re:Some photocopiers support this by Rude+Turnip · · Score: 3, Interesting

      We have one of these in our office and they're great for taking stacks of workpapers from clients, scanning them in and getting rid of the originals. You can email a PDF directly to someone, or store the PDF on a server somewhere.

  88. Digital Copiers by Geronimo6260 · · Score: 1

    Somewhere in the department (or at least in the university) there must be a high capacity digital copier. If it isn't already installed you may have to beg to get the scanner support enabled, but copiers are built to work with huge quantities of paper and they'll turn your outlines into PDFs and either email them to you or save them somewhere for you to retrieve in a matter of minutes.

  89. Comic Book Reader by astrashe · · Score: 1

    If you can get away with it (which seems unlikely), you should just make JPGs of the pages and put them in .cbr files.

    It would be much easier than scanning or typing the stuff up, and there's a good free viewer for windows.

    I'm amazed that you can OCR handwritten pages at all -- that's incredible. I had no idea the technology was that good.

    1. Re:Comic Book Reader by LightForce3 · · Score: 1

      I'm amazed that you can OCR handwritten pages at all -- that's incredible. I had no idea the technology was that good.

      Well, you can OCR anything, including your neighbor's dog, but that doesn't mean an OCR engine will produce anything intelligible. ;)

      Seriously, I think in this case OCR would be a waste of time. Even the really good OCR engines (like ABBYY Fine Reader, which I use at work, supposedly very good) have significant problems with handwritten text and will produce lots of errors, making the ability to search the document worthless. Also, to my knowledge, there are no OCR engines that correctly handle formulas and symbols.

  90. Comment removed by account_deleted · · Score: 3, Insightful

    Comment removed based on user account deletion

  91. Works for me by sglow · · Score: 5, Interesting

    I tend to scan lots of documents and set up a simple perl script that uses the 'scanimage' command line tool to do the scanning. Using my Epson Perfection 1650 scanner (pretty standard flatbed scanner) I can scan an 8"x10" page in black & white mode in about 10 seconds.

    I actually added a button to the Nautilus GUI shell so I can move to the directory I want and hit the button to scan a page to that directory. Very convenient.

    I scan to tiff and then use the convert utility (part of imagemagick) to convert to png. The resulting files typically run about 100K to 200K depending on the content.
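
    For anyone who'd rather not read perl, here is a rough Python sketch of the same scan-to-tiff-then-png idea. It is not the script linked below. The file naming is made up, and the `--mode Lineart` and `--resolution` options depend on the scanner backend, so treat those values as assumptions to check against `scanimage --help` for your device.

```python
import subprocess
from pathlib import Path


def scan_commands(out_dir, page_number, resolution=300):
    """Build the two commands for one page: scan to TIFF, then convert to PNG."""
    tiff = Path(out_dir) / f"page{page_number:04d}.tiff"   # hypothetical naming scheme
    png = tiff.with_suffix(".png")
    scan = [
        "scanimage",
        "--format=tiff",
        "--mode", "Lineart",              # backend-specific option; value is an assumption
        "--resolution", str(resolution),  # backend-specific option as well
    ]
    convert = ["convert", str(tiff), str(png)]  # ImageMagick
    return tiff, scan, convert


def scan_page(out_dir, page_number, resolution=300):
    """Scan one page and leave only the PNG behind."""
    tiff, scan, convert = scan_commands(out_dir, page_number, resolution)
    with open(tiff, "wb") as fh:
        # scanimage writes the image data to standard output
        subprocess.run(scan, stdout=fh, check=True)
    subprocess.run(convert, check=True)
    tiff.unlink()
    return tiff.with_suffix(".png")
```

    Calling `scan_page(".", 1)` once per sheet would give page0001.png, page0002.png and so on in the current directory.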

    If anyone's interested in seeing the perl script I've posted it to: www.ollies.net/scanscript.html

    Steve

    1. Re:Works for me by dhammabum · · Score: 1

      This is quite right - the guy doesn't need PDF where a simple image will do. There is no hope of OCR here.

      --
      I am not a robot. I am a unicorn.
  92. You Don't Need OCR by LightForce3 · · Score: 1

    By the way, you shouldn't need to do any OCR with these files. I do use OCR (or what Acrobat 6.0 Standard calls "Paper Capture") for my scanning, but only because that allows the PDF to be read aloud, which gives greater accessibility to visually impaired students.

    Besides, even the best OCR packages (we have ABBYY Fine Reader, supposedly very good) will do a poor job with handwritten text, and no OCR package that I know of will correctly do formulas.

    ~~LF

  93. handwritten? In this day and age? by 16K+Ram+Pack · · Score: 1
    Sorry, but this is madness.

    When I started work 18 years ago, we still had no word processors and had to write our notes. Secretaries would type them, get things wrong, we'd have to redraft them and they'd do them again. Of course, any later changes would have to go back to them.

    I don't even think about handwriting now. It's just terribly wasteful.

    The only handwritten things I see nowadays are things like compliments on report approvals, where someone is trying to add a personal touch (and birthday cards).

  94. Cheap sheet-feeder scanners by billstewart · · Score: 1
    There are lots of cheap sheet-feeder scanners out there if you only need to handle ~30-50 pages in a batch. Fax machines can do it, so it's obviously not inherently expensive, though most of the current cheap scanner market is oriented towards single-page flatbed scanners that do photographs well. That $299 HP scanner may be overkill, but it'll be solidly built and well-supported. You can also check out multi-function printers from brands like Brother that'll scan, fax, copy, and print, but be sure the resolution is good enough.

    A couple years ago I bought a sheet-feeder scanner at Fry's for $29. In addition to regular paper, it could also handle business cards. Unfortunately it got stolen out of my office, and I couldn't find a cheap replacement; I'm now using a flatbed scanner.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:Cheap sheet-feeder scanners by whereiswaldo · · Score: 1

      A problem I found with consumer-targeted products is that they can only handle a smaller number of pages at a time, and the sheet feeder is pretty basic. Recent photocopier machines (that I have seen) have much better sheet feeders and can handle more pages at a time. Anyone else have similar experiences?

  95. Check other departments by vandelais · · Score: 1

    Most universities have excessive duplication of information storage because their departments are like Chinese fiefdoms.
    Public affairs, publications, admissions, business services. Ask around. One of these probably has what you're looking for or knows somebody else at another nearby college/University that does.
    You may be just a phonecall/fax/fedex package away.

    --
    Game: Player 'Donald J Trump' now has AI skill level 'experimental'.
  96. cheap labor! by hesperant · · Score: 1

    I would suggest hiring a typist or two. Often you can get 7 cents a page (side) of typed document in digital format, although offering up to 50 cents per page is the real way to go. Hire a school kid or the like who types fast and well and bam, you're set. The added benefit of helping someone out is just cool.

    1. Re:cheap labor! by DarkDust · · Score: 1

      I second that! Hiring some school kids for a few dollars per hour to retype the scripts in OpenOffice (which can export to PDF natively) gets you the most satisfying results. You just can't scan in hand-written scripts without a human correcting the mess, so why not let a human do it from the beginning?

  97. I have a solution, but not for $200... by neophyte+grognard · · Score: 1

    ...closer to $2000. Caveats: this assumes you want to produce an image file, no OCR, black and white; otherwise, you're on your own. I worked for a company that scanned vehicle loan documents for customer service call centers. They used a Kodak 2500 (not sure they make it any more) to scan up to 50 ppm, double-sided. We normally used about 100 dpi for images, which were perfectly legible but kept the image sizes down (about 120kb for a single sided 8.5 x 28 inch sheet). This data was imported into a workstation (old Win98 box with a SCSI card) running a software package called Paperflow. The images could then be indexed by hand and exported to a file server, and index information moved to a database (we were using something that worked with SQL, but I can't remember what). This is only worth the effort and cost if it becomes an ongoing project -- all future notes also get scanned. Option 2: Outsource, but it might not be necessary to go to India. We bought all of our equipment from a company called Mackin Imaging (www.mackinimaging.com), and they do stuff like this for schools, banks, insurance companies, etc. all the time. I have no idea what they'd charge for a one-off project, but it will be done the way you want, indexed, no dropped pages, etc. Hope this helps.

  98. As such a student by kabloom · · Score: 1

    Speaking as such a student, I really hate that kind of PDF. They can be megabytes in size, sometimes a megabyte per page, and they're usually not worth my time or effort to download, and they're difficult to read. Get them to type (or LaTeX) their lecture notes. Offer to convert those to PDF yourself. Don't scan them. Don't encourage them to generate more handwritten PDFs. If you really must, then don't do PDFs, but use the most compressed image format you can find.

  99. You know by Freston+Youseff · · Score: 1

    I think that the actual results of this would be less than stellar there, James Bond. Try it sometime, it actually doesn't work.

    --

    1. Re:You know by Anonymous Coward · · Score: 0

      Uhhh, actually if you don't care about OCR this works quite well. I digitize all my personal docs this way. Takes about 10 secs a page to position, snap, pull the doc.

  100. Re:Bring down emissions by jpmkm · · Score: 0, Offtopic

    Swing and a miss. Just keep posting that and some day it will be on-topic.

  101. Powerpoint by thespacegeek · · Score: 1

    As much as I hate to say it, I really hate having class notes that aren't in PowerPoint. PowerPoint allows me to search it very easily. It also has the added benefit that the professor can use the slides in next year's class, which helps me to concentrate on the lectures. It's a really big pain, but I don't think there is any way around typing them by hand...

  102. Give profs a choice by billstewart · · Score: 1

    In some fields, typing really is difficult, because you need to draw pictures, so scanning is probably appropriate. But in many fields, most of the material is text, and they ought to be typing it anyway :-) So if they're the type that can be motivated this way, give them a choice of ugly scans (8-bit color, 300dpi) or else submitting their typed notes, and give them a friendly interface for uploading their typed notes (if you can support web, email, and also drag&drop, that increases the chances that they'll use it.)

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  103. To reduce file size by Anonymous Coward · · Score: 0

    I assume you're scanning most of the documents in grayscale. This sounds obvious, but almost no one does it: reduce the colour depth! I find that most grayscale documents are still perfectly legible at 16 colours (or even 8 depending on how clear the original was). You only need 4 bits for 16 colours instead of 8 bits for 256. Virtually any compressed file format will be able to take advantage of this - you'll get an immediate 2x reduction in size. And you'll probably get even more than that, because the reduced-colour image will compress better too. When I have a lot of pages to scan, I set up a macro that does these three steps:
    - increase contrast and brightness (generally clears up any blotches in the background)
    - auto-equalize (to restore black/white balance, since increasing brightness usually makes the text lighter too)
    - reduce to 16 (or 8) colours (without dithering)
    This means you need a graphics program that can perform these steps. I use Corel PhotoPAINT only because that's what I'm used to, but any half-decent graphics program (commercial or free) should be able to do those steps.
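
    To make the 8-bit versus 4-bit arithmetic concrete, here is a small Python sketch on a plain list of grey values. It is not the PhotoPAINT macro: the contrast step is a simple linear stretch standing in for the brightness/equalize steps, and all function names are made up for illustration.

```python
def stretch_contrast(pixels):
    """Linearly rescale 8-bit grey values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)
    return [(p - lo) * 255 // (hi - lo) for p in pixels]


def quantize(pixels, levels=16):
    """Map 8-bit grey values (0-255) onto `levels` equal bins, with no dithering."""
    return [p * levels // 256 for p in pixels]


def pack_4bit(indices):
    """Pack 4-bit palette indices two per byte (pads with 0 if the count is odd)."""
    indices = list(indices)
    if len(indices) % 2:
        indices.append(0)
    return bytes((indices[i] << 4) | indices[i + 1] for i in range(0, len(indices), 2))


grey = [50, 100, 150, 150]            # four 8-bit pixels = 4 bytes
indexed = quantize(stretch_contrast(grey))
packed = pack_4bit(indexed)           # same four pixels in 2 bytes
```

    Four pixels go from four bytes to two before any compression is applied, which is the "immediate 2x" part; the extra gain from better compression depends on the image.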

  104. General principles for document imaging by mangastudent · · Score: 2, Interesting
    I used to develop systems to do this sort of thing ("document imaging"), so here are a few basic principles:

    The quality of the scanning is obviously important; get or borrow the best scanner you can. The point made about putting a black backing onto a flatbed scanner is important. Also important is adjusting the scanner settings so that you get minimum noise (random black dots) without degrading the stuff you want to keep.

    For this sort of thing you almost certainly want to do it bi-level/B&W/one bit deep (hopefully there are no shaded pictures, but you can use screening for those), and to my knowledge nothing has been developed that compresses these images better than CCITT Group IV (fax machines use Group III). You almost certainly don't want to use grey-scale, at least not for your final images.
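
    As a rough illustration of why one-bit images of mostly blank pages shrink so well, here is a toy run-length encoder in Python. To be clear, this is NOT Group IV: the real scheme codes each line relative to the one above it and uses fixed code tables. The sketch only shows the underlying idea that long runs of identical pixels collapse to a few numbers.

```python
def run_lengths(row):
    """Encode a row of 0/1 pixels as [first_value, run, run, ...]."""
    if not row:
        return []
    runs = [row[0]]
    count = 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)   # close the finished run
            count = 1
    runs.append(count)           # close the final run
    return runs


# 2550 pixels across (8.5 inches at 300 DPI) with one 12-pixel pen stroke
row = [0] * 1000 + [1] * 12 + [0] * 1538
encoded = run_lengths(row)       # [0, 1000, 12, 1538]
```

    A skewed or noisy scan breaks those long runs into many short ones, which is one way to see why straightening and cleanup help the compression.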

    You should see if you can find some post-processing software; we used to use ScanFix, which would straighten the image (which makes Group IV compression a lot better) and depending on settings clean it up as well. You also need to decide upon the size of the final images; you want to scan at 200 to 300 or even 400DPI, but you don't have to have final versions at those high resolutions.

    The standard used to be TIFF images with Group IV compression, but not every image viewer can read them, or display them well (esp. if the image needs resizing, and I doubt you can assume everyone reading these has their monitor at a high resolution).

    If PDF will accept and display images compressed with Group IV compression, you're probably best off with that, since Acrobat Reader is ubiquitous and fairly easy to use.

    PNG is a nice format that I use by preference for > 1 bit deep images, but a quick check of some PNG documentation says that Group IV "often" compresses a lot better than 1 bit "greyscale" PNG; it was simply not designed for document imaging. And you also want to avoid JPEG, it's a lossy (will introduce artifacts) system that also wasn't designed for bi-level images.

    Hope this helps.

  105. PDF settings by mdkemp · · Score: 2, Informative
    If you intend for people to print this stuff out, PDF is definitely the file format of choice. The size of the resulting files will largely depend on the scanning resolution and color settings you use, as well as the type of compression in the PDF.

    If the lecture notes you're scanning don't contain any grayscale or color graphics, your best bet is to scan in black-and-white mode (as opposed to color or grayscale) for smallest file size. I'd suggest scanning at 300 DPI for sharp-looking printouts. Be sure to play around with the "threshold" value (or equivalent) in your scanning software until you figure out what looks best. If it's not set to a good level, text may look too thick and blocky, or thin lines might disappear completely.
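
    A toy Python example of what the threshold setting does, assuming 8-bit grey values where 0 is black and 255 is white. The pixel values are invented: 170 stands for a faint pencil stroke, 60 and 55 for solid ink, and the rest for slightly uneven paper.

```python
def threshold(pixels, cutoff=128):
    """Return 1 (black) for grey values darker than cutoff, otherwise 0 (white)."""
    return [1 if p < cutoff else 0 for p in pixels]


row = [250, 245, 170, 240, 60, 55, 250]

too_low = threshold(row, 128)    # faint stroke vanishes
about_right = threshold(row, 200)
too_high = threshold(row, 248)   # paper shading turns black, strokes merge into a blob
```

    With the cutoff at 128 the pencil stroke is lost, at 200 both strokes survive as separate marks, and at 248 the paper itself starts going black.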

    Once you have a monochrome scan, you'll want to save in a lossless compression format that preserves the monochrome attribute of the image, such as compressed TIF, and not as JPEG. When exporting to PDF, you could experiment with both ZIP and fax (CCITT group 3/4) compression types -- both compress black-and-white images very well. If your PDF software doesn't have those options, the default should probably be good enough. Even at 300 DPI, most pages should fit into about 30K or so.
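
    For a feel of the numbers, the Python sketch below works out the uncompressed size of a letter-size page at 300 DPI and one bit per pixel, then runs zlib (the same deflate family as the "ZIP" option in PDF) over a synthetic, mostly blank page. The page content is made up, so the compressed size it prints says nothing about real handwriting; only the raw size is a hard figure.

```python
import zlib


def raw_bilevel_bytes(width_in, height_in, dpi):
    """Uncompressed size of a 1-bit page, with each row padded to a whole byte."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return ((width_px + 7) // 8) * height_px


def synthetic_page(width_in=8.5, height_in=11, dpi=300):
    """Build a mostly white page with a few sparse dark strokes (made-up data)."""
    row_bytes = (round(width_in * dpi) + 7) // 8
    rows = round(height_in * dpi)
    page = bytearray(row_bytes * rows)        # all zero bits = blank page
    for y in range(200, rows - 200, 100):     # one thin "text line" every 100 rows
        start = y * row_bytes + 40
        page[start:start + 200] = b"\xaa" * 200
    return bytes(page)


raw = raw_bilevel_bytes(8.5, 11, 300)         # 1,052,700 bytes, about 1 MB
packed = zlib.compress(synthetic_page(), 9)
print(raw, len(packed))
```

    So a monochrome page starts at about a megabyte before compression, and how close you get to tens of kilobytes depends on how much ink and noise is on it.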

  106. University doesn't already have this service? by needacoolnickname · · Score: 3, Informative

    Most universities already have this service. The professor might not know it exists, but check the other departments to see if they have one (not the scanner - but the service at the school). It is usually somewhat intertwined with a Distance Learning center or department.

    It takes away the cost of printing lectures/notes/required readings from the departments and tacks it onto the students who now seem to pay for printing above a certain limit in the labs.

    At least this is the way at the universities I have worked at.

    1. Re:University doesn't already have this service? by Anonymous Coward · · Score: 1, Informative

      The university I attended also had similar services for disabled students, so that's another place to check. Scanning and OCR is invaluable for blind students.

  107. The Solution by AeZero · · Score: 1

    I've recently been asked to do a similar task. I spent about a week writing a custom software application. My resulting PDFs are approximately 50-100K per page, based on page content. The PDFs themselves will contain approximately 50-75 pages per "set". Our organization will be using this solution with a Fujitsu 4340C Scanner. We're looking at thousands of pages per month. So far, everything seems to be working well. While we used the Fujitsu scanner, any TWAIN compliant scanner should suffice. If you are able to do the custom development in a Windows environment, I'd be happy to share my experiences and the tools I utilized in the project.

  108. Canon has a good scanner for that by fulgan · · Score: 1

    The company I work for developed an invoice archiving system for small to medium companies.

    Without going into the details, we researched several scanning solution and the best price/quality machine we found was the Canon DR-2080C.

    It's a double-sided, single-pass color scanner designed for archiving documents. You can load it with 15-20 pages a go, set it to scan all documents to PDF and have it automatically deskew (which is really nice if you're going to OCR the documents afterward). The only issue we've had with it was with a Dell system that wouldn't recognize it no matter what (Dell's folks are working on it, I'm told).

    There is, however, no Linux driver available.

  109. GIGO by foofoodog · · Score: 1

    In another industry, programmers, no matter how smart they are, should not create 100's of pages of code that is not distributable, readable, searchable or re-usable. If they do then they are not doing what they are being paid to do.

    Give your professors a copy of Open Office and have them redo the work in a format that can be read, indexed, searched and distributed.

    --
    Can I bum a sig?
    1. Re:GIGO by dougmc · · Score: 1
      Give your professors a copy of Open Office and have them redo the work in a format that can be read, indexed, searched and distributed.
      While I do agree that this would be ideal in an ideal world, I know that we do not live in an ideal world. If you are requiring that the professors rewrite their notes on a computer so that they can be indexed and searched and such before they're put online, you're pretty much guaranteeing that most of them never get put online.

      Images of handwritten pages (be it pdf, png, tif, jpeg, whatever) may suck, but they're better than nothing, and they're the best you're likely to get without dedicating many many man hours towards rewriting them in a better format.

  110. Xerox Scanner doesn't do OCR by Anonymous Coward · · Score: 0

    Yes, it makes a PDF of all the pages, but each page is just a picture. There's no way to search for text in the result. Also, graphics are much larger than text.

    1. Re:Xerox Scanner doesn't do OCR by zenquest · · Score: 2, Interesting

      Xerox bundles OCR as a software add-on. It works well when you get it all set up at your company. By the time you get back to your desk, the document is open and ready to be OCR'd with a drag and drop.

      It obviously wouldn't be so convenient if he had to go to Kinkos, but they might have it set up on one of their machines. (Yeah, I doubt it, too.)

    2. Re:Xerox Scanner doesn't do OCR by timeOday · · Score: 2, Insightful
      Yes, it makes a PDF of all the pages, but each page is just a picture. There's no way to search for text in the result.
      There is no way you're going to solve that problem with one person and a couple hundred dollars.

      I know there are Adobe archival systems that store the scanned image, along with whatever text they manage to recognize. You don't expect near 100% OCR accuracy from an old, largely handwritten sheaf of lecture notes and transparencies. But hopefully enough is recognized to be of some use.

    3. Re:Xerox Scanner doesn't do OCR by dougmc · · Score: 2, Insightful
      Xerox bundles OCR as a software add-on. It works well when you get it all set up at your company. By the time you get back to your desk, the document is open and ready to be OCR'd with a drag and drop.
      The original question said that the notes were handwritten. Has anybody had any sort of success whatsoever in reading handwriting with OCR? (Not that I'm aware of.)
    4. Re:Xerox Scanner doesn't do OCR by Anonymous Coward · · Score: 0

      Get a decent digital camera that does close up and just take pictures. It's fast, they look good, and no OCR is going to do what you want. If they're too lazy to type, that's the best you can do.

    5. Re:Xerox Scanner doesn't do OCR by jayminer · · Score: 2, Informative

      This claims handwriting recognition. (Although it requires some sort of structure in the OCR'd page, I think.)

    6. Re:Xerox Scanner doesn't do OCR by Anonymous Coward · · Score: 2, Funny



      Well I hope someone develops something soon, I've been unable to read my own handwriting since 1995.

    7. Re:Xerox Scanner doesn't do OCR by ahfoo · · Score: 1

      Yep, in fact, that's what these high priced one-second scan machines consist of, just an array of digital cameras that stitch the image back together. That's why they're so much faster than a flatbed.
      Doing my own experiments, I've found that my old 1.3 megapixel digital camera can easily produce a readable image of one page of a magazine but when I run it through OCR I probably get less than forty percent accuracy on glossy paper with small fonts. If I use a book with fairly large fonts I can get OCR probably closer to eighty percent which is still not too hot, but you can easily get an idea of what it says.
      But that's with a 1.3 megapixel camera. These days six megapixels are cheaper than that one was when I bought it four years ago. I read, on Slashdot if I'm not mistaken, that professional scan machines usually have an array of image sensors in the range of fifteen megapixels. So perhaps two six megapixel cameras in a mounting and with a script to automatically paste them together would give you acceptable results.
      Alternately, there was another story, also on Slashdot, about camera phones that had built-in stitching that allowed decent scans with a lower resolution camera. Some day that will likely be interesting, but I'd think two high end digital cameras could give you some interesting results with a bit of scripting magic. Perhaps you could use a naming scheme during capture so you could keep track of which images were meant to be stitched in which order.
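
      One way the naming scheme could work, sketched in Python. The `pageNNNN_top.jpg` / `pageNNNN_bottom.jpg` pattern is purely hypothetical, and the stitching itself is left to whatever image tool you use; this only pairs up the halves in page order and skips pages where one half is missing.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: page0001_top.jpg, page0001_bottom.jpg, ...
NAME = re.compile(r"^page(\d+)_(top|bottom)\.jpg$")


def group_for_stitching(filenames):
    """Return {page_number: [top_file, bottom_file]} for pages that have both halves."""
    halves = defaultdict(dict)
    for name in filenames:
        match = NAME.match(name)
        if match:
            halves[int(match.group(1))][match.group(2)] = name
    return {
        page: [parts["top"], parts["bottom"]]
        for page, parts in sorted(halves.items())
        if "top" in parts and "bottom" in parts
    }


shots = ["page0002_bottom.jpg", "page0001_top.jpg", "page0001_bottom.jpg",
         "page0002_top.jpg", "notes.txt", "page0003_top.jpg"]
pairs = group_for_stitching(shots)
```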

  111. Ask MIT by TV-SET · · Score: 1

    You might want to contact MIT and ask around, since they were/are doing a lot of what you need. Check out their MIT OpenCourseWare.

    Maybe you can also convince your professors to use their notes - then it's just a simple wget job for you. :)

    --
    Leonid Mamtchenkov ...i don't need your civil war...
  112. GIF +OCR is really just fine. by billstewart · · Score: 1
    This is the kind of application where GIF files are really just fine. Unless your prof is showing photographs, which really do need good color depth and can be handled separately, most class material is either text or line drawings, so a limited color palette (16-256 colors) is enough, and GIF-type lossless compression is more appropriate than JPEG-type lossy fuzziness (which is better for photographs.)

    The big annoyance of image files of any type is that they make it hard to cut&paste text, but if you're working from raw bitmaps anyway, you don't lose out by using GIF instead of PDF to package the pictures. (PDFs that are created from text make it possible to retrieve the text, at least with newer PDF versions, but you don't have text to retrieve.) So also try running the thing through an OCR to extract anything you can, but don't expect much.

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
    1. Re:GIF +OCR is really just fine. by PEdelman · · Score: 1

      DjVu also supports real text embedded together with the image, plus its position in the scanned bitmap. So you can search for text within a DjVu file. Works really neat.

      And [Ff]ree DjVu viewers can be obtained for multiple platforms.

      --
      Like science? Comics? Wicked...
      Funny By Nature
  113. ADF Scanners-Drive a Tank. by Anonymous Coward · · Score: 1, Interesting

    I see I'm not the only one with one. The nice things about it are that it's built like a tank (weighs like one too), and can handle legal size. The only downsides are that the resolution isn't as high as modern scanners, and that the sheet-feeder is bulky.

    Anyway I run the output through this and a bit of OCR (doesn't have to be perfect), and store it in a Database.

  114. One word: SCSI scanner. by NNland · · Score: 1

    I once had the job of doing basically the same thing as the asker was asked to do, though I was to scan 176 pages in total.

    I ended up getting a login on an old dual PPro 200, running NT4, with an old HP SCSI scanner. Scanned in at 300 dpi, 256 color greyscale. Each page took around 10-15 seconds, each save and page swap took 15-20 seconds (I used the kodakimg app that came with NT, saving to compressed tiff). I initially tried the same thing with a USB scanner, but each page scan took 1-2 minutes. To hell with that.

    After the 176 pages (88 pages double-sided), my arms and back were sore, but it wasn't too bad. Thankfully, I got them in two sets (115 and 61 pages), and it was even easier.

    I did a scripted resize with photoshop to fit each image on its own page (though you could probably do the same thing with the 'resize image' powertoy for XP), generated a single web page that contained all of the images, loaded the web page (it took a few minutes), and used 'print to pdf' which is available if you have Acrobat installed.
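
    The "single web page that contained all of the images" step is easy to script. A minimal Python sketch follows; the file names are placeholders and the `width:100%` styling is just one guess at how to fit each scan to a printed page, not what I actually used.

```python
from html import escape


def build_gallery(image_names, title="Lecture notes"):
    """Return an HTML page with one full-width image per scanned page, in order."""
    rows = "\n".join(
        f'<p><img src="{escape(name, quote=True)}" alt="Page {i}" style="width:100%"></p>'
        for i, name in enumerate(image_names, start=1)
    )
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{rows}\n</body></html>\n"
    )


page = build_gallery(["scan001.png", "scan002.png"])
```

    Write the returned string to an .html file next to the images, open it in a browser, and print to PDF from there.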

    If the asker only has one class, and it's only 1-200 pages/week, 2-3 hours to do it all isn't that bad. No OCR, but I've seen so many crappy conversions it is hard for me to trust them.

    1. Re:One word: SCSI scanner. by hey · · Score: 1

      One word: SCSI scanner is multiple words!

    2. Re:One word: SCSI scanner. by NNland · · Score: 1

      Point for you, but a SCSI scanner is still good advice.

  115. professionally by curator_thew · · Score: 2, Insightful


    The professional approach is to go back to them and clarify the outcome:

    (a) you can scan the documents in, and they'll take X amount of space, and Y time; and this doesn't include OCR;
    (b) you did a few tests (using the supplied document) and these are the results for TIFF, JPG, PDF, etc;
    (c) OCR is probably infeasible (or not, do some tests) because of the nature of the documents;

    Include in (a) the option of purchasing an automated document scanner, and the corresponding reduction in time.
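
    The X and Y in (a) are simple multiplication, sketched here in Python. The inputs are assumptions to replace with your own test results: 36 sets of 100 pages is one reading of "several dozen sets", and 10 seconds and 150K per page are borrowed from figures other posters reported for flatbed scans.

```python
def estimate(pages, seconds_per_page, kb_per_page):
    """Return (hours of scanning, megabytes of output) for a batch of pages."""
    hours = pages * seconds_per_page / 3600
    megabytes = pages * kb_per_page / 1024
    return hours, megabytes


# Assumed inputs: 36 sets x 100 pages, 10 s and 150 KB per page
hours, megabytes = estimate(36 * 100, 10, 150)
print(f"{hours:.1f} hours, {megabytes:.0f} MB")
```

    Rerun it with the timings from a sheet-fed scanner to show the reduction in time that the purchase would buy.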

    Based upon all the above, get a clear go-ahead, and make the purchase if new equipment is authorised.

    You said "where I work": this is your job: it's a bit poor to do as the other posters suggest and refuse to do the work: you need to make sure that the customer (professors) understand exactly what they are getting, and give them a choice to buy into it or not - i.e. "clarify the expectations".

    If you assess that it's 2 weeks' worth of work, and the professors don't disagree, then your supervisor just has to put up with it.

  116. We are a Law Office and use a Kyocera 5530 by sir+lox+elroy · · Score: 1

    with the Scan to PC Option Package. I can set 50+ pages on the sheet feeder and hit scan to PC and it will make a single PDF in about 5 minutes on the Server. (For note, we use a Mac OS X Panther server, which Kyocera does support.) It is helping us go from paper to electronic quite well. The 5530 is a Copier with network abilities, and the scanner adds a second network interface and address. It can also scan to e-mail. This system works quite nicely. We also used Fujitsu ADF 11x17 scanners; I believe they were the 1100's, but it's been quite a while. Those could do 22 ppm but had a dedicated PC and the kofax card got quite hot, and the one we had only worked in Win 3.11. There are quite a few systems out there that are good at that, but the prices are going to be rather hefty.

    --
    Kosh: "Understanding is a 3 edged sword, your side, their side, the Truth."
  117. You must not have looked too hard. by turbomonkey2k · · Score: 0, Flamebait

    Your research skills astound me in their nonexistence. If you are typical of today's college student, then I fear not for my job security. For less than $100 you can get a Lexmark(x125) at freakin Office Depot that will sheetfeed scan as well as color print and fax.

    Now you're lazy but, I'm smart and lazy so I'd just go to Kinkos, give them the stack to process and present the receipt to the professor for reimbursement. I would also be surprised if your school doesn't already have the facilities to perform your needed task.

    One of the things you will learn in your life is that usually your problem has already been solved multiple times by multiple people, and the least bit of effort on the internet will generally provide myriad examples of these solutions. Though, I can't believe this problem actually made it to slashdot. Must be a slow news day.

  118. HP Scanjet 5550c scanner in use by jessedh · · Score: 1

    We use this every day at my office to automate / consolidate information from various investors and then send them out via pdf. It works pretty well, but it's not too fast.

  119. What about doing it with a bit more considerate? by Anonymous Coward · · Score: 0

    Here in Germany we do hire living people to do this. It's a bit more than just scanning and making PDFs, you should prepare the courseware with a little more respect than that, or the future students will hate you, and for a very good reason.
    Is manual work so impossible to pay for these days, or should everything be as dirt cheap as possible?

  120. Definitely xerox by adamiis111 · · Score: 1

    HP is nice for many things, but the xerox docuscan is really the solution for you. Just get the most expensive one you can afford (more money = faster) and go for it. MIT press classics recently did a bigger project converting all their books to PDF for reprint - too bad you don't have those funds. http://www.xeroxscanners.com/default.asp?pageid=100

  121. Digitizing lecture notes by weresquirrel · · Score: 1

    Quick suggestions:

    1. Get the profs to do it in a digital format, _any_ digital format that can be scrolled -- ideally something which can incorporate diagrams and equations, but really whatever the default IT word processor is, or if individual, work with them -- because the result, from the experts themselves in a searchable "hands on" format, will simply be an order of magnitude better than anything you can scan in and attempt to make searchable.
    2. Look for local scanning firms. We just finished a 15,000 page run for a client via a local legal dbms firm here in town (Seattle) for $0.10 a page, in PDF format at 600 DPI. We spent almost as much "coding" (in the paralegal sense) the documents into a SharePoint list.
    3. Scan and bear it. You are right, unless you have been very lucky in choice of scanners and the people involved didn't wrinkle the sheets too much, it is going to require an attentive human monitor.

  122. outsource it to India by test007 · · Score: 1

    Why not outsource the job to India? For a few hundred dollars you should be able to hire a few people to type everything for you.

    --
    There are 10 kinds of people. Those who understand binary and those who don't
  123. Xerox Document Centre by Morthaur · · Score: 1

    Find yourself a Xerox Document Centre that you can borrow for a few minutes. I can place a stack of sheets in the tray, select a destination folder, and hit 'go.' A few minutes later, there is a .pdf file sitting on a network share. I have used this for everything from digitising entire books (cut the binding and stack the sheets up) to small documents such as a CV; it is fast and effortless.

    --

    +++++++
    "Look, dear, it's a crazy hairy scary man!"
  124. DjVu-Management. by Anonymous Coward · · Score: 0

    True, but you still need some kind of document management system, else all you have is a pile of scanned images. Where's bill so and so, sent August of 2002? Also some documents will need to be kept in paper form for legal reasons.

  125. PDF Settings by AvitarX · · Score: 2, Informative

    I've been reading a few minutes and nobody seems to have addressed your settings question.

    You should scan in grey-scale or, if there is high enough contrast (pen notes, not pencil), in black and white. Grey-scale with medium or even low quality JPEG compression is going to be much smaller than the defaults. Pure black and white with Group 4 compression will be even better. At work we scan pages at 300 DPI that way and get 20 to 30 k files (I think, haven't done it for a while).

    Also typically images for web viewing of even text are scanned at 72 dpi (all the scholarly journals at my university). This can make things hard to read but really shrinks the file (about 1/16th the size of 300 dpi).
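
    A rough sketch of the arithmetic behind the resolution figures above, not a measurement. It assumes a US Letter page (8.5 x 11 inches), which the post does not state, and it counts raw pixels only; real file sizes depend on the compression used and on how much ink is on the page.

```python
# Raw (uncompressed) size of a scanned page at different resolutions.
# Assumption: US Letter paper, 8.5 x 11 inches; change the defaults for A4.

def raw_page_bytes(dpi, bits_per_pixel=1, width_in=8.5, height_in=11.0):
    """Bytes of raw pixel data for one page (1 bit = pure black and white)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


def pixel_ratio(low_dpi, high_dpi):
    """Fraction of the pixels kept when dropping from high_dpi to low_dpi."""
    return (low_dpi / high_dpi) ** 2


if __name__ == "__main__":
    for dpi in (72, 200, 300):
        bw = raw_page_bytes(dpi, bits_per_pixel=1)
        grey = raw_page_bytes(dpi, bits_per_pixel=8)
        print(f"{dpi:>3} dpi: {bw:>9,} bytes B/W, {grey:>10,} bytes greyscale")
    print(f"72 vs 300 dpi: {pixel_ratio(72, 300):.4f} of the pixels")
```

    By pixel count alone, 72 dpi works out to roughly 1/17 of 300 dpi, which is in line with the "about 1/16th" quoted above.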

    Also if the scanner is set low res pure black and white it will scan a lot faster, but still be pretty slow.

    The other option is to pay someone to do it. If you have all of the stuff ready at once and give the pros a week or so to do it when they aren't busy you can probably get as low as 50 cents a page.

    Blah blah, I lost my train of thought 2 paragraphs ago

    --
    Wow, sent an e-mail as suggested when clicking on "use classic" banner, and got a fast response that addressed my msg
  126. perfect device by hpavc · · Score: 1

    It's all about an HP network scanner ... I have been using one for quite some time now, scanning in magazines and DnD books etc. It creates multipart PDFs, puts them on a network share, etc.

    --
    members are seeing something, your seeing an ad
  127. docutech is the way to go... by capsteve · · Score: 4, Informative
    being in the prepress industry, i see more and more traditional printing going the way of xerography. of the competitors in the field, xerox probably has the best system with the docutech series... you may want to consider kinko's which is an authorized user/vendor of the docutech system.

    on a side note, if the professors are utilizing a lot of additional material which might include handwritten information, you might consider encouraging them to transcribe that material (hopefully you're not the TA that has to do the transcription) into a digital form, be it text or WORD. this'll definitely help in reducing the size of your files.

    also consider looking into adobe's pdf service, if you're overwhelmed with just organizing the material itself. probably not so kosher to suggest it on /. but it could be something the school already has an agreement with adobe for (taking into account the units of acrobat the school itself might be using). i know it's not rolling your own, but sometimes using an "out of the box" solution to get things up and running so you can explore other solutions has its merit as well...

    --
    three can keep a secret, if two are dead - benjamin franklin
    1. Re:docutech is the way to go... by Hognoxious · · Score: 0
      you might consider encouraging them to transcribe that material (hopefully you're not the TA that has to do the transcription) into a digital form, be it text or WORD. this'll definitely help in reducing the size of your files.
      Quite. I don't see any point at all, artistic reasons aside, in digitising handwritten stuff (except as a prelude to OCRing it, and I doubt the current state of the art is up to that).
      It's like using a horse to pull a Ferrari.
      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  128. PDF of handwritten notes is DUMB!!! by madstork2000 · · Score: 4, Insightful

    It makes no sense at all to me to have a PDF created of handwritten notes, since most students will probably just download and print out the PDF anyway. The only advantage is that it may save a few trees, since not everyone will print them out.

    It sounds like the school wants to shift the production costs (i.e. printing) to the students. This seems inefficient compared with the old way, where the instructor could go to the copy center and have the notes copied at the school's expense (I know these expenses are often passed along to the students anyway), rather than at the students' DIRECT expense: their time for downloading, then printing out on their own equipment or using their own printing accounts at the computer center.

    If the notes were being OCR'd and then made available on-line, or post-processed in such a fashion that they are searchable, indexed, etc., it would be useful. Otherwise this seems like a waste of time and money.

    -MS2k

    1. Re:PDF of handwritten notes is DUMB!!! by kingsy · · Score: 1

      How about saving some trees, buddy.... 3/4 of the notes, if the lecturer printed them for EVERY student, would be lost, discarded or left unused and end up in the bin anyway. Now that's what I WOULD call "a waste of time and money".

    2. Re:PDF of handwritten notes is DUMB!!! by P-Nuts · · Score: 1

      Sure, most students on receiving these notes will just print them, but there are some advantages to having the notes available in electronic format. Once electronicized the notes can be made available to students into the future, but photocopied notes won't be available to future years. I've taken courses where the previous lecturer's notes were more useful than those of the lecturer giving the course in my year.

      Come on, storage is cheap these days, so even scanned PDFs aren't prohibitively large. Various journals have their archives as scanned pages, from when they were produced by less electronic means, and the filesizes, although around 10 times the size of pukka electronic files, are not so large that they strain the bandwidth of an academic network connection.

    3. Re:PDF of handwritten notes is DUMB!!! by alephdelta · · Score: 1

      Give them a Knoppix CD, teach them LaTeX, and you will have the problem solved for future notes.

  129. As a Student by Anonymous Coward · · Score: 0

    Scanned-in images of handwritten stuff that has been produced by clever lecturer types are usually very difficult to read. The Engineering department where I study puts all its outline solutions in single-page PDFs produced from scanned images. It takes ages to print them out one at a time and the quality is so bad they are almost not worth having.

    Getting someone to spend some time converting the documents to LaTeX makes for very easy to read and editable output. Yes it would take ages, but it can be updated and errors corrected. It also produces something that is much much more useful to me, the student.

    Ian

  130. Get a Canon Document Scanner by spizm · · Score: 3, Insightful

    The company I work at scans large amounts of documents to PDF format on a daily basis. Depending on the volume some people do, we use either a Canon DR-3060 or DR-5020 document scanner. These will scan both sides of a page simultaneously, clean up the image (despeckle and deskew) and convert them into TIF or PDF all on the fly. They're fast too. Between 20 and 50 pages per minute. Only problem is that they're expensive.

    For your budget, you may be able to afford the Canon DR-2080C which goes for around $600. It has all the features of the more expensive ones, but it's meant for smaller volumes like what you're dealing with. With that, you'd be able to scan 100 pages into a pdf document in around 5 minutes.
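
    To put the feeder speeds above next to the flatbed experience from the original question (1-2 minutes of attention per page), here is a back-of-the-envelope sketch. The function names are made up for illustration, and it ignores jams, reloading the feeder, and file handling.

```python
# Time to get a stack of pages scanned, sheet-fed versus flatbed.
# Rates come from the thread: 20-50 pages per minute for the document
# scanners, 1-2 minutes per page for a desktop flatbed.

def sheetfed_minutes(pages, pages_per_minute):
    """Minutes for a sheet-fed scanner running at a steady rate."""
    if pages_per_minute <= 0:
        raise ValueError("pages_per_minute must be positive")
    return pages / pages_per_minute


def flatbed_minutes(pages, minutes_per_page):
    """Minutes of hands-on time when every page is placed by hand."""
    return pages * minutes_per_page


if __name__ == "__main__":
    stack = 100  # one set of lecture notes, per the original question
    for ppm in (20, 50):
        print(f"sheet-fed at {ppm} ppm: {sheetfed_minutes(stack, ppm):.0f} min")
    for mpp in (1, 2):
        print(f"flatbed at {mpp} min/page: {flatbed_minutes(stack, mpp):.0f} min")
```

    At 20 pages per minute a 100-page stack takes the 5 minutes quoted above, against 100 to 200 minutes of hands-on time on a flatbed.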

  131. OR by geekoid · · Score: 3, Interesting

    Charge by the hour, at least 50 dollars an hour. That way you can hire 3 students at 10 bucks an hour to do the actual work.

    --
    The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  132. JBIG2 inside PDF by jab · · Score: 2, Interesting
    Actually, the newer PDF specifications and newer PDF viewers (like the totally excellent xpdf utility, and oh yeah, Adobe acroread 5 and onwards) all support JBIG2 compression. JBIG2 is a token based compression technology giving roughly similar file size and image quality compared to DjVu, but with the advantage that everyone and their uncle can deal with the PDF file format.

    So, I recommend scanning to TIFF (or TIFF inside PDF). Even if you don't currently have the encoding software, you can convert to JBIG2 compression later as it becomes more and more ubiquitous in the future.

    And definitely use an automated document feeder of some sort to keep from going crazy. Newer Xerox machines work pretty well for this (I use a DocumentCentre 440ST for this all the time) unless you have hundreds of thousands of pages to deal with, in which case you should either invest in industrial scanning equipment or outsource to a scanning center that has it.

  133. ScanSnap by Fujitsu by Anonymous Coward · · Score: 0

    Well, if it was compatible with Mac OS X I would think the ScanSnap would be the thing to get. I've been wanting something just like this myself and I use an iBook, but unfortunately this is one of those reasons why people say "windows is better for business".

    Anyway, if you could find someone with a PC I think the ScanSnap would be what you're looking for. It's $300-$400, scans directly to PDF, scans both sides at the same time, sheet fed, etc. Here's the URL... http://www.scansnap.com

    SOMEBODY PLEASE WRITE SOME MAC OS X SOFTWARE FOR THIS BEAUTIFUL ADF SCANNER!

  134. Fixed Link by magefile · · Score: 1

    The script is here.

    The parent's link didn't work because freecache only caches files larger than 5 MB, while that is ~1 KB.

    Is there any way to use SANE (or any other Linux scanning software) to scan over a network? I've got my printer shared, and I access it via Samba. Is there an equivalent setup for scanners, perhaps using DaemonTools?

  135. ask your professor to be precise ... by pikine · · Score: 3, Insightful

    I think what your professor wants is not a bitmapped copy of his handwritten notes or some vector curves that resemble such, but actually a typeset version of the lecture notes. If that is the case, assuming that his handwritten notes are sparse (and hopefully without diagrams, since it takes more time to mess around with them), you can definitely do a stack of 100 sheets in a week, or, as someone already suggested, hire some typists to help you out.

    --
    I once had a signature.
  136. Re:Hello by Anonymous Coward · · Score: 0

    Michael isn't "a little" anything.

  137. Very interesting by Mr.+Ophidian+Jones · · Score: 1
    That's one of the best articles I've seen on this event.

    Sponsored by Intel Corporation, I run one of the Grand Challenge teams, Team Overbot. We have a vehicle (a modified six wheel drive Polaris Ranger), a shop in Redwood City, funding, equipment, and people. We're well along; the vehicle has most of its actuators and some of the sensors working, and about a third of the software is running. We're one of the five DARPA-accepted teams.

    Many of us are Stanford alumni or students, but this is not a Stanford project.

    Our basic technical approach is to build a rugged, reliable vehicle with conservative control strategies. Others may be faster, but we expect they'll get into trouble at high speed. Our top speed is 40MPH. The real problem with the Grand Challenge is not going fast on the easy parts; it's getting through the hard parts.

    The 6WD chassis we're using is one of the most bump-tolerant platforms around. It can go over railroad ties at top speed without problems and without going airborne. The center of gravity is low. The front and mid axles have independent suspension; the rear axle is a swing arm. This simplifies low-level vehicle control. All wheels can be driven, although at higher speeds, we will switch from 6WD to 4WD.

    We have five computers on board. Three are small PC/104 machines, and two are Pentium 4 machines. All run QNX (the OS for when it has to work.) All are industrial-strength ruggedized units. The actuators are all servomotors driven by industrial microcontrollers. All this hardware is off-the-shelf industrial control gear.

    Sensors include LIDAR, doppler RADAR, sonars, cameras, INS, GPS, etc. Some of them are used in unusual ways. That's all I'll say about that.

    The pathfinding strategy is indeed borrowed from video game technology. It's more structured than Brooks-type behavior based robotics, and it's less structured than Latoumbe-type planning. There are three layers of control; the top one we call the "back seat driver", because it has only advisory authority over the "driver".

    We have road map and topo data onboard, but it's used more as a hint than as rigid guidance. We take the waypoints DARPA gives us (on a CD, at 0430 hrs the morning of the race) and load it in. There's no offline preplanning. Wouldn't help in the real world.

    If nobody wins this year, which is quite likely, we'll be back next year with a faster vehicle.

    Post questions and I'll answer them here.

    John Fagogle
    Team Fuckbot

  138. Depends how good you want to do it by Danh · · Score: 4, Informative

    If you want to do a good job, you have to type it, in LaTeX. It's the only way to get something nice and something the professors will be able to enhance in future.

    If a digitized copy of the manuscripts will do for you, you can go the scan -> image enhancement -> OCR -> save to PDF way.

    For scanning, you already got a lot of good comments how to automatise the scanning of dozens of scripts. If you lack these possibilities also a SCSI or USB desktop scanner should do the job (it's definitely less than 1 min per page), so you scan a script in 2 hours. No need to bother to outsource the job to India. Probably you can scan B/W and don't need greyscale or colors. I would scan handwritten scripts at 200 DPI and save the whole pictures in front of the OCRed text, so the user doesn't see the OCRed text and can only use it for selecting and copy&paste. It would be too much work to correct the OCRed text here. For machine written text I would use 300 dpi or more for better OCRing.

    As image enhancement you only need to be able to automatically orient the page so that the text is horizontal. I don't remember if Acrobat does it, but for this job I would anyhow get a good OCR program.

    As OCR program I recommend FineReader, but also Omnipage is ok. FineReader does better OCR than Omnipage and Acrobat. It also saves better to PDF (with retaining all of the paragraph structure) than Omnipage.

    If you keep the image before the OCRed text in the PDF you can expect files of 10MB for 100 pages for B/W scan at 200 dpi. OCRing of machine written text has become incredibly accurate, so you can do real OCR there and throw away the bitmap picture. This of course gives much nicer output (and smaller filesize), but you need to spend a lot of time correcting the text. Here the best OCR program really pays off (you probably have a lot of words which are not in a dict, need custom dicts (does Acrobat have them?), ...). A program with a single flaw (e.g. one that recognizes your formula as text, or code as paragraph text, ...) will make you waste a lot of time correcting it on every second page.
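
    As a sanity check on the "10MB for 100 pages" estimate, the sketch below works out what compression ratio that figure implies for 1-bit scans at 200 dpi. The US Letter page size and the reading of "10MB" as 10,000,000 bytes are assumptions, not something stated in the post.

```python
# Implied compression ratio for a bitmap-backed PDF of scanned pages.
# Assumptions: US Letter pages (8.5 x 11 in), 1 bit per pixel,
# and "10MB" taken as 10,000,000 bytes.

def raw_bw_page_bytes(dpi, width_in=8.5, height_in=11.0):
    """Uncompressed bytes for one pure black-and-white page."""
    return round(width_in * dpi) * round(height_in * dpi) // 8


def implied_compression(total_bytes, pages, dpi):
    """How many times smaller the stored pages are than the raw bitmaps."""
    per_page = total_bytes / pages
    return raw_bw_page_bytes(dpi) / per_page


if __name__ == "__main__":
    raw = raw_bw_page_bytes(200)
    ratio = implied_compression(10_000_000, 100, 200)
    print(f"raw page at 200 dpi: {raw:,} bytes")
    print(f"implied compression for 10 MB / 100 pages: {ratio:.1f}:1")
```

    Under these assumptions the estimate corresponds to about 100 KB per page, or a little under 5:1 compression of the raw bitmap.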

  139. Ascent + High Speed Scanner by Anonymous Coward · · Score: 0

    If you're going to be doing this often I suggest that you look at Ascent Capture (http://www.kofax.com/) with one of the supported high speed scanners. I support about 10-15 users on Fujitsu M-4097 scanners which will do about 50 pages per minute (duplex simplex is 26) and they're quite reliable. My guess is that they're about $8000 (US) though, I'm not generally involved in the money side of things.

  140. Large Scale OCR and/or PDF conversion by kingsy · · Score: 2, Informative

    The answer is simple. You are at a university. MOST modern photocopiers do inbuilt pdf conversion or OCR'ing to network drives or email. Find one of them.

  141. True! by NumbThumb · · Score: 1

    Some company (I believe it was called D-Info) did that a couple of years ago: they were the first to offer a phone book for all of Germany on a CD-ROM. They took the info from the hard-copy phone books (which are, or rather used to be, a public record and thus not copyrighted) -- but they were not allowed to scan it, because layout and such *was* copyrighted (which was made rather clear by the Deutsche Telekom, or, back then, the Deutsche Post).

    So, what did they do? They had a couple of hundred Chinese (mostly) women typing away for some months. It seemed to have worked quite well, until the Telekom started to release their own CD-ROM.

    --
    I have discovered a truly remarkable sig which this 120 chars is too small to contain.
  142. Ok. Use voice Recognition by Anonymous Coward · · Score: 1, Insightful

    1. Get Dragon NaturallySpeaking.
    2. Dictate the essay, albeit a bit lengthy, into it.
    3. Import to Word or your favorite word processor.
    4. Add any cool equations and such that you cannot dictate.
    5. Publish to PDF.

    Nice small file size I'm sure.

    Scanning is nice, but OCR only works with fonts it can recognize. Not Professorese.

    It could take you a day or so to dictate, but after you're finished, more than likely you will have a lot fewer spelling and random letter and symbol problems.

    But again, this might be more work than you want to do. Why? Well, if you do it this way and make a nice clean portable document that everyone can read, you might find yourself getting more "extra work" than you wanted.

  143. 8100C by MushMouth · · Score: 2, Informative

    They no longer make it but they can be found on ebay for a few hundred bucks and no I am not selling this one or one at all.

    1. Re:8100C by larkost · · Score: 2, Informative

      I use an 8100C on a daily basis, and it is a good little machine, but there are a few gotchas:

      The color tiffs use a deprecated form of TIFF that was rescinded from the standard as unworkable. On top of that, the version they use does not work with any variant of libtiff. Basically you are stuck working with a few Windows programs... and GraphicConverter on the Mac. Photoshop will sometimes even choke on them.

      When we try to scan yellow documents the scanner will occasionally freeze up. It seems to happen sooner in TIFF mode, later in PDF mode... but eventually it just freezes and has to be rebooted. Reloading firmware does nothing to abate this.

      Oh... and it does mean that the scanner has to feed a windows box, as I have not found a means of attaching it to anything else.

    2. Re:8100C by MrChuck · · Score: 1
      The 9200C we have is on a mixed network, but it uses SMTP to get the PDF to the user. My issue is that it's using an SMTP server I've been trying to retire for a couple months. And all these scanners seem to point to it by IP addr.

      Sigh.

      Clearly, the OP should check out what's needed and get a sample or 3 sent to his/her laptop.

      But the main point is that the tech is out there for mid-sized volumes of scanning. OCR is a different game, and at 99% reliable, that means a typo every few lines.
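
      The "99% reliable" remark can be turned into a quick estimate. This is only a sketch: it assumes errors are independent, and the 80 characters and 12 words per line are illustrative figures rather than anything measured.

```python
# What a given OCR accuracy means in errors per line of text.
# Assumptions: errors are independent of each other; a line holds
# about 80 characters or about 12 words (illustrative figures only).

def errors_per_line(units_per_line, accuracy):
    """Expected number of wrong units (characters or words) per line."""
    return units_per_line * (1.0 - accuracy)


def chance_line_is_clean(units_per_line, accuracy):
    """Probability that a whole line comes through with no error."""
    return accuracy ** units_per_line


if __name__ == "__main__":
    for label, units in (("per-character", 80), ("per-word", 12)):
        expected = errors_per_line(units, 0.99)
        clean = chance_line_is_clean(units, 0.99)
        print(f"99% {label} accuracy: {expected:.2f} errors per line, "
              f"{clean:.0%} of lines clean, "
              f"one error every {1 / expected:.1f} lines")
```

      If the 99% is per word, that is one error roughly every eight lines, which matches "a typo every few lines"; if it is per character, fewer than half the lines come through clean.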

      Not sure why this was worth /. rather than a simple comp.something.scanning news question.

    3. Re:8100C by Anonymous Coward · · Score: 0

      As a matter of fact /. is not worth a dime.

  144. Use EFAX or some such by Anonymous Coward · · Score: 0

    No one has mentioned what I consider to be the most practical solution. Using a physical fax machine, a virtual efax account, and Acrobat software, you can convert mass paper docs into digital images.

    Algorithm below...

    0. Sign up for an e-fax account
    1. Find a standard cheap fax machine with auto-feeder
    2. Send the documents to 'yourself' via efax. Phone/PSTN -> digital image
    3. Receive the document as a fax image in your email
    4. Install Adobe Acrobat, so you can print as PDF
    5. Using efax's client, print the faxes as a PDF

    Done

  145. VirPack by WyrdOne · · Score: 1

    I work for a company that develops Document Imaging & Delivery software & enterprise solutions for the mortgage industry.

    We recommend the use of Fujitsu M4099D or Kodak i810 scanners for large scale scanning jobs.

    However you probably don't need the 50,000+ pages per day scan rates that we spec systems for.

    For you I think I would recommend the Fujitsu 4750 or Kodak i80 scanners.

    Now if you are interested in our software, contact me off-forum and I'll put you in touch with our sales guy. ( dchubb@virpack.com )

  146. Lots of options by Anonymous Coward · · Score: 0

    a) Use a service to scan, organize, store
    b) Use an undergrad student to scan, organize, store
    c) Use a grad student to scan, organize, store
    d) Use a post-doc to scan, organize, store

  147. In my perfect world by foofoodog · · Score: 1

    I guess it has to do with the goals of the project. It looks to me like the current approach is low effort for low value. I assume that after scanning they will have to attach some meta-data to the files and perhaps have them reviewed for legibility and make corrections, which brings it to medium effort for medium value.

    I figure if you are going to do medium effort anyway you might as well shoot for high value. I see it as medium effort for someone to transcribe their own notes of a subject they understand completely and it returns high value.

    Perhaps it can't realistically be done with existing work but if the people paying the professors want to get more out of their investment then they might introduce it as part of the process and include it as part of the deliverables as it were. I understand that universities are as political if not more so than corporations and that what I have suggested may not always work either place.

    --
    Can I bum a sig?
  148. either outsource or use a digital camera by misanthrope101 · · Score: 2, Insightful
    I've used a flatbed for this type of thing, and it works, but it takes forever and it's frustrating. It isn't hard, mind you, but time-consuming and mind-numbing. The first 30 pages are easy and then you get really really sick of it. If you do scan it yourself, you don't need more than 200dpi or so, and you can save as high-quality jpeg. This isn't artwork, and there is no need for perfection. Acrobat will accept any image file. I'd scan with a standalone image program (I use ACDSee and it works well) and then feed the images into Acrobat. But as far as a recommendation...

    Have it professionally done, like other people here have recommended. High-end sheetfed scanners are great, but you probably can't afford one, and it wouldn't make sense as a one-time expense for this small of a job. I'm a big fan of just handing someone some money and it's magically accomplished.

    Alternatively, use a digital camera and well-lit copy stand. You can improvise a copy stand with a tripod or whatever, but make sure you have a lot of light. It's a lot faster than using a scanner, and the results are acceptable if you have a good camera. The more megapixels the better - don't use the old 1.3mp one you have lying around. 3mp will technically work, but more is better. Ideally a digital SLR pointed straight down at the page, a very well-lit area (a clamp light on either side of the page works nicely), and you sitting there sipping Starbucks while you hit a cable shutter release after you flip every page. You could get a few hundred pages an hour done this way--your only limitation is how fast you can turn the pages. You'd only have to stop to transfer images to your computer, and you only have to do that often if you don't have enough memory cards. After you get all the pages into the computer, feed them into Acrobat and you're done.
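
    To relate the megapixel advice above to scanner resolution, the sketch below estimates the effective dpi when a full page fills the frame. The US Letter page size and the pixel dimensions given for each camera class are assumptions chosen as typical values, not figures from the post.

```python
# Effective scan resolution of a camera photo of a full page.
# Assumptions: a US Letter page (8.5 x 11 in) fills the frame exactly,
# and the pixel dimensions below are typical for each megapixel class.

def effective_dpi(px_long, px_short, page_long_in=11.0, page_short_in=8.5):
    """Dots per inch on the page, limited by the tighter of the two axes."""
    return min(px_long / page_long_in, px_short / page_short_in)


CAMERAS = {
    "1.3 MP (1280x960)": (1280, 960),
    "3 MP (2048x1536)": (2048, 1536),
    "6 MP (3008x2000)": (3008, 2000),
}

if __name__ == "__main__":
    for name, (px_long, px_short) in CAMERAS.items():
        print(f"{name}: about {effective_dpi(px_long, px_short):.0f} dpi")
```

    Under these assumptions a 3 MP camera lands a little below the 200 dpi suggested earlier for flatbed scans, and a 1.3 MP camera falls well short of it.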

    If you don't want to use acrobat you could make a web-page with thumbnails linked to the hi-res images. Then your end-users wouldn't need to download the Acrobat reader. I love Acrobat's ubiquity but hate the file sizes and the slow start-up time.

  149. How I Do It by MSInsight · · Score: 3, Informative

    I scan and upload various land use and financial documents for a county and its townships to the internet on a shoe-string budget - actually, no budget - all volunteer, public service for fellow citizens. This is my prescription:

    Stay with your current flat-bed scanner. Do not waste money on a sheet-fed scanner. You do not have nearly enough money for a high-end Fujitsu or Bell & Howell sheet-fed scanner which will reliably get the job done without mechanically screwing up. The pros use high-end scanners because they never screw up and they go fast. Cheap sheet-fed scanners miss sheets or jam up too often to trust them with anything. Make a sign-up sheet for work-study or volunteer students in your academic department to sit down at your computer and scanner and scan the documents into the computer. Give them free pops and gummy bears (slur it so it sounds like "rum & beers") or something similar which won't transfer from fingers to documents. Just take a few minutes to set them up and show them what to do. Keep it simple. Let those empty minds waiting to be filled with knowledge (and beer) do the time consuming zombie work. You should focus your attention on how to put the files on the website.

    The scan file format I use is Portable Network Graphics format or PNG format. On average, it compresses black and white graphics 20-25 percent smaller than the widely used GIF format. PNG format is also supported to a basic enough level to be displayed using MS Internet Explorer, Netscape, Mozilla, and other internet browsers.

    I use free Xsane scanning software on a linux system to scan the documents. Xsane can be set to scan in line-art mode, also known as black and white mode. This software can also be set to save files directly to disk in PNG format and automatically change the file names using numerical iteration, i.e., file-01.png, file-02.png, file-03.png, etc. without the need for human intervention to change the file name each time. I use a 100 dpi scan resolution setting because documents do not need to look ultra-smooth; they just have to be legible. Anything beyond that is a waste of hard drive space. Using this resolution also means I do not have to spend time embedding the graphic file in html code to constrain its width so it can be viewed on the average 15", 800x600 resolution monitor. I just insert weblinks to the individual, one-page graphic files: "Page 1, 2, 3, 4, ...", with each page number hyperlinked to a corresponding graphic file. Your graphic files will run 15-25kb each. The use of PDF graphics format is a waste of time and space unless a professor gives you a MS Word file of their lecture notes which you can convert directly into a PDF file with embedded text. That is the only case in which I would use PDF over PNG. Good luck.
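
    The "Page 1, 2, 3, 4, ..." link line described above could be generated rather than typed by hand. This is a minimal sketch, assuming the file-01.png naming pattern from the post; the function name and the two-digit padding default are made up for illustration.

```python
# Build the "Page 1, 2, 3, ..." line of links to one-page image files.
# Assumption: files are named <prefix>-NN.<ext>, as in file-01.png.
from html import escape


def page_links(prefix, page_count, ext="png", width=2):
    """Return an HTML fragment linking each page number to its image."""
    links = []
    for n in range(1, page_count + 1):
        name = f"{prefix}-{n:0{width}d}.{ext}"
        links.append(f'<a href="{escape(name, quote=True)}">{n}</a>')
    return "Page " + ", ".join(links)


if __name__ == "__main__":
    # Paste the printed fragment into the course web page.
    print(page_links("file", 4))
```

    For a set with more than 99 pages, pass width=3 and have the scanning software number the files the same way.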

  150. High volume scanning by Anonymous Coward · · Score: 1, Informative

    I work for a scanning service, we do hundreds of boxes of paper at a time. I can only speak to the really high volume scanners, and Kodak is the best, hands down. Most of them are actually two scanners in one, one for the front and one for the back. An i830 will do about 80 ppm both sides at 200 dpi. That's a $64K scanner. I think the Slashdotters gave you a pretty good list of less expensive models. If it's all clean stacks of mostly white paper a less expensive scanner will do well. More expensive scanners are not only faster, but recognize contrast better (even black writing on pink paper) and feed torn, curled or irregular paper better.

    Equally important is good scanning software that will break up your images into the files you want. Software can read bar coded sheets of paper to start new documents, or to index them in a database. Legal firms use software to match a database to the Bates stamps that ID files submitted in discovery. Software can remove the black border around the page, de-skew the image, or de-speckle the image.

    As for PDF, it's a great choice compared to a lot of formats. And with Acrobat, you can do bulk conversion of multi-page tiffs to PDF in one pass.

    A service bureau will do the job for less than 10 cents a page.

    My throw-away email account is bugmenot@fastmail.us in case of questions.

    Hope this helps

  151. Acrobat Capture by Skuld-Chan · · Score: 1

    There's a little-known program that's pretty good for this job called Acrobat Capture - it uses ISIS-compatible scanners.

  152. digital camera by sewagemaster · · Score: 1

    Since these are handwritten notes and would be hard to OCR anyway (and the files tend to be huge), how about using a digital camera and taking snapshots of the pages?

  153. Sounds like an English Prof.... by Nikker · · Score: 1

    It sounds like they want something where they just feed a couple of scraps into a machine and everything is typed up for them (no secretary needed). Pick one of the profs who seems cool as a test pilot, and see if you can get him into the tech spirit and have everything input directly on a touch screen / stylus setup on his desk.

    This would also show him up front what the system can and can't recognize, so it can be fixed immediately, before he decides to "submit" the page into the main file system / backup.

    It would take a little more time if you want to do it right and write a program from scratch (get marks for it?), but a good idea would be to get both sides to come to a happy medium (prof vs. computer): make it so it weans him off exaggeratedly bad penmanship, and put in a couple of hours designing a decent UI for formulas that is variable-friendly (lambda, pi, etc.) and understands on which side of the divisor everything belongs. It would be even kinda cool if it could solve the problems too, as it went along.

    Because I am not sure what program you are in, this may/may not be feasible.

    An alternative would be to ask your profs if they are buddies with the math/comp sci profs, who could donate some eager brains in exchange for possible standing within the course ;) That way it costs no money and the profs are working for you. They will like your idea!!

    Good Luck

    --
    A loop, by its nature, continues. If that didn't make sense, start reading this sentence again.
    1. Re:Sounds like an English Prof.... by Anonymous Coward · · Score: 0

      ... with equations? You did read the original story, didn't you? Guess not.

  154. digital camera? by Anonymous Coward · · Score: 0

    How about a 4 megapixel digicam, 1 click and it's 'scanned' very fast, and easy!

  155. Two inexpensive ideas by totoanihilation · · Score: 1

    Two inexpensive ideas that come to mind:

    1. Find the best fax machine in the establishment. They usually have sheet-feeders and are fairly quick.
    Fax the document to your OSX-enabled powerbook. You should then have every page in individual .tiff files. Write an applescript to automate GraphicConverter to batch-process the images.

    2. (This is a method I've used, and it works fairly well, although it requires some manual labor).
    Find a music stand, a tripod, and a digital camera. Aim the camera at the music stand, and take pictures of each side of the pages as needed. Depending on the writing, you can take the pictures at one megapixel.
    Again, Applescript, GraphicConverter to adjust the whites and convert the files...

    HTH

  156. Typical pages, please! by Cardbox · · Score: 1

    If you could scan & post (with lossless compression eg. GIF) a couple of vaguely typical pages then we could all try our favorite compression software on it & get some idea of how much storage could be saved.

  157. Mass Scanning by carldot67 · · Score: 2, Insightful

    I looked into this once for a client. Agencies charge around 5c a page but that is only to scan. Add more for OCR, manual verification and/or transfer to M$ Word or what-have-you. I think I recall seeing 50c a page for such value-adds. Agencies are good because you don't need to buy the kit (30K and up) or watch it run (they need feeding and jam quite a lot, especially if the paper is lower quality). Agencies also make sense for shops with nil/low expectation of producing more paper in the future. Get some quotes, references and examples of their work and start with a short trial run.

    --
    I wish at was Friday, but I dont want to wish my life away. So I wish it was last Friday.
  158. That's what Groklaw is doing by DaveAtFraud · · Score: 1

    PJ & Co. at Groklaw are faced with an easier problem and the best solution they have is to OCR what they can and then have individual volunteers fix the stuff that the OCR process misses. I say they have an easier problem because they are getting "published" court documents that have been scanned in as graphics. For that matter, you could also technically do what the court is doing and simply scan the notes as graphics and publish them that way as PDFs. That is, don't even try to convert them to text.

    --
    They that can give up essential liberty to obtain a little temporary safety deserve neither safety nor liberty.
    Ben
    1. Re:That's what Groklaw is doing by mikeee · · Score: 1

      Clearly, the easy solution is just to get your documents included as a filing in one of the SCO cases and let Groklaw do it.

  159. I'm archiving stuff at my university by adrew · · Score: 4, Informative

    We've undertaken a pretty large archiving job at my university. We're scanning every page of every newspaper we've ever printed (started in 1927) up to the time we have digital archives, starting around 1993 or so. We're also scanning about 80 yearbooks of 300 or so pages each. Hopefully this can offer you some help or suggestions.

    We have a dual-processor G4 and an Epson 1640XL large-format FireWire scanner with the optional auto document feeder. It's probably a bit out of your budget ($2899 + ~$1200 for the ADF) but it's awesome. It can scan at up to 1600dpi and the ADF can automatically duplex and scan both sides of the page. We're using OmniPage Pro X for OCR software.

    Right now we're more concerned with scanning the documents and getting them online, so we haven't started OCR'ing everything yet. But the ADF is awesome. It can scan both sides of all 300+ pages of a yearbook automatically in about 2 1/2 hours.

    The newspapers are a bit different. They're getting a bit fragile in their old age so we have to manually scan them. We scan them at 300dpi in full color, so the 12x18 pages are around 50MB per page. But the scanner takes less than a minute per page. It's impressive.

    We use Photoshop's web gallery feature to generate the image galleries. Pretty simple really. Let me know if you have any questions.

    1. Re:I'm archiving stuff at my university by Hecatombles · · Score: 1

      Hello, I have some questions:
      Did you make a reference scan when you started the scanning process, to compare against another reference scan later? I ask because I'd like to know whether the scanning mechanics degrade, and when.
      Which format are you using for archiving, and how large are the files?

      Thank you in advance

    2. Re:I'm archiving stuff at my university by adrew · · Score: 1

      We haven't actually done any reference or control scans. This scanner, however, is a beast. It's built like a tank--with the ADF it weighs about 70 pounds. It automatically focuses and adjusts the optics before each scan job. We haven't experienced any image degradation yet.

      We're saving all the images as TIFF files with LZW compression (lossless). The 12" x 18" pages at 300 dpi are around 40-50MB. We scan the yearbook pages (9x12) at 200 dpi grayscale and they are around 2MB per page.
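
      As a rough cross-check on those sizes (a sketch, not the poster's own calculation), the raw, uncompressed size of a page follows from its dimensions, resolution, and bit depth. The bit depths below are assumptions: 24-bit for "full color" and 8-bit for grayscale.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) image size in bytes for a scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Newspaper page: 12" x 18", 300 dpi, 24-bit color (assumed bit depth)
newspaper = uncompressed_bytes(12, 18, 300, 24)
# Yearbook page: 9" x 12", 200 dpi, 8-bit grayscale (assumed bit depth)
yearbook = uncompressed_bytes(9, 12, 200, 8)

print(f"newspaper: {newspaper / 1e6:.1f} MB raw")
print(f"yearbook:  {yearbook / 1e6:.1f} MB raw")
```

      The raw figures come out somewhat above the 40-50MB and 2MB quoted, which is what you would expect once lossless LZW compression has been applied.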

      HTH.

  160. MOD PARENT UP by hopethishelps · · Score: 1

    Seems a reasonable comment to me. Should never have been modded down.

  161. best conversion I know by tfcdesign · · Score: 1

    Get an efax account and fax them to yourself. They'll arrive as multi-page TIFFs. Quality is very good. Sign up for a free account at efax.com to test it.

  162. I do this all the time by s.a.m · · Score: 2, Informative

    Hopefully you'll get to read this one and hopefully it won't get modded down to oblivion.

    Yes, there are scanners out there that can work for you. I have a Canon DR-5020: we just feed it a ton of paper, come back in a few, and it's done. It can scan VERY quickly. PDF format would work just fine as well. It's the best option, especially since these are handwritten notes.

    If this is a requirement which is going to be on-going then you will have to pony up the money and spend a few thousand. If you're not ready to do that, you may be in luck. Some places will lease it out to you and with that few hundred bucks I'm sure you can easily get a hold of one for about a week or 2.

    Look up people who do Document Imaging, and you should find a lot of businesses that come up. If you're in the Washington DC area then maybe I can help you out quite a bit.

  163. Re:Get stuffed - outsource to india by remolacha · · Score: 3, Informative

    we've gotten a bunch of jobs like this - turning handwritten documents into searchable pdfs - and had a lot of luck sending them to firms in india, either by sending the documents snailmail or scanning with a sheet feeder and ftp'ing. the firm we got the best results from was called suntec, suntecindia.com I believe. I know outsourcing is a touchy subject these days, but they were all set up for this, we weren't, and their prices were quite good.

  164. How to do it... by Foofoobar · · Score: 1

    I've had to do this exact same thing for lawyers, and had to advise them on how to do it for documents and briefs that were hundreds of pages long.

    Basically, they have form fed scanners that can handle hundreds of pages in just a few minutes. If you use these scanners in conjunction with a good text recognition software (like Textbridge for example), you can convert them into plain text docs.

    HOWEVER... they must exist in some typed-up version first. It cannot recognize handwriting. It CAN recognize typed script. But nowadays, most of those typed documents are done in word processors and do not need to be converted.

    Save some time and teach them how to use a word processor. Might I suggest Open Office? :)

    --
    This is my sig. There are many like it but this one is mine.
  165. Xerox DigiPath with Scanner is the answer. by Anonymous Coward · · Score: 0

    I work with this equipment (Xerox DigiPath and Docutech) every day, along with a lot of other digital printing software/copiers/printers.

    Xerox's DigiPath can scan in all those documents, create DigiPath TIFFs (which GhostScript does quite nicely), PDF (very high quality), and regular PostScript files.

    DigiPath contains a program called Scan & Make Ready. All of those documents can be stored, converted, printed, whatever.

    I currently work with thousands of jobs that have been converted to this format, it is ideal for black and white document storage.

    Find a digital print shop in your area, get a quote for the conversion, it should be reasonable.

    Another poster suggested just FAXing the documents. This too is a great FREE way: using a fax and a fax server linked to GhostScript, you'd be able to accomplish the same results.

    Using the DigiPath, you can make changes to the pages, and other things.

    I do not work for Xerox. I work for a digital printing company that uses Xerox color and black/white production equipment. I have scanned and converted many documents in my time, doing the same thing you want to.

    Good Luck!

  166. Visioneer XP 450 Strobe by PetieG · · Score: 1

    This desktop scanner is very fast to scan and fast to xfer (ala USB2) -- i recommend it.

  167. Don't Pay For Scanning Services by Terrigena · · Score: 1

    I do conversion services on projects of 5,000,000+ pages for government, medical and financial industries. What you are describing is not "large scale" conversion. Working two hours to scan 100 pages for an instructor is not too bad with a flat bed, but if you have the budget to purchase a new scanner I think the one you cited will work fine. We use Fujitsu and Bell & Howell, but those are for production environments with 40,000+ pages a day.

    An idiot above suggested you pay to have them professionally scanned. That is a bad idea, the cost would probably exceed that of a new scanner.

    A lot of people don't like to download adobe's software so you should provide the documents in two formats. Stick with PDF and also do GIF.

    1. Re:Don't Pay For Scanning Services by Anonymous Coward · · Score: 0

      >> An idiot above suggested you pay to have them professionally scanned. That is a bad idea, the cost would probably exceed that of a new scanner.

      Correct. I work for an office supply chain and our copy center charges twenty-five cents per page for scanning to tiff, while making a hardcopy costs only eight cents per page!

      Now for the math: $300.00 / $0.25 = 1,200 pages scanned. Oops, I forgot that we charge $17.00 to burn your tiffs to a cd. So $300 - $17 = $283 / $0.25 = you're fucked.
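
      Spelling that arithmetic out (a small sketch; the function name is made up here, and the $300 budget, $0.25 per page, and $17 CD fee are the figures quoted above), working in whole cents to avoid floating-point rounding:

```python
def pages_affordable(budget_cents, per_page_cents, fixed_fee_cents=0):
    """Whole pages a budget covers after subtracting any fixed fee."""
    remaining = budget_cents - fixed_fee_cents
    if remaining <= 0:
        return 0
    return remaining // per_page_cents


without_cd = pages_affordable(30_000, 25)
with_cd = pages_affordable(30_000, 25, fixed_fee_cents=1_700)
print(without_cd, with_cd)
```

      With the CD fee included, the budget covers 1,132 pages, which is only about eleven of the 100-page stacks from the original question.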

  168. We do this kind of work at our University Library by Anonymous Coward · · Score: 1, Informative

    I work in the library at the University that I go to. We have something called "E-Reserves" where a professor can submit a bunch of documents that they want available for students to download and view on their computer. We recently set up a really neat system consisting of a sheet-fed scanner, a piece of software from Doculex called "Gobe" in combination with some scripts that we wrote. Here's how it works:

    1) Professor submits stack of documents
    2) A person at the library makes sure all the copyright stuff is in order.
    3) For each document, a piece of software that's part of Gobe is used to create a "cover sheet"
    4) The documents are stacked on top of each other with a cover sheet on top of each
    5) The stack is placed in a sheet-fed scanner
    6) Hit go
    7) ???
    8) It's on the web in 10 minutes!
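
    The cover-sheet trick in the steps above can be sketched roughly like this. This is not how Gobe or the library's scripts actually work (the post doesn't say); the page records, the "cover" flag, and the file names are all hypothetical, and recognizing a cover sheet in a real scan is assumed to have happened already.

```python
def split_on_cover_sheets(pages):
    """Group a flat list of scanned pages into named documents.

    Each page is a dict. A cover sheet is marked with {"cover": True} and
    carries the output "name"; this record layout is hypothetical.
    """
    documents = {}
    current = None
    for page in pages:
        if page.get("cover"):
            # A cover sheet starts a new document and is not kept itself.
            current = page["name"]
            documents[current] = []
        elif current is None:
            raise ValueError("page found before any cover sheet")
        else:
            documents[current].append(page["image"])
    return documents


batch = [
    {"cover": True, "name": "chem101-week1.pdf"},
    {"image": "scan0001.tif"},
    {"image": "scan0002.tif"},
    {"cover": True, "name": "chem101-week2.pdf"},
    {"image": "scan0003.tif"},
]
result = split_on_cover_sheets(batch)
print(result)
```

    The point of the design is that one pass through the sheet feeder can carry many documents, with the cover sheets doing the job a person would otherwise do by naming files by hand.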

  169. Scanning is silly, if it is valuable, type it. by npendleton · · Score: 1

    I am PDFing lecture notes now. At my scientific publishing organization, we regularly scan our old journal articles, creating huge PDFs. But these are already well indexed in our databases and our website.

    But lecture notes are teaching materials, which many people are investing a lot of time in using, more than the journal articles. These notes should be done right.

    Lecture notes should have Tables of Contents, indexes, legibility, bookmarks, and so on. If teachers are teaching with them, the articles should be defined and linked from the ToCs and indexes. They should be typed, so files are smaller and legible. The math should be LaTeX or scanned and placed. All the artwork, equations, animations, and related files should be embedded in the PDF.

  170. Use a fax by osmood · · Score: 1

    I needed to scan up to 50-page documents to PDF and found that the cheapest way to do it, and the most automatic, was a small Oki fax, with a fax-to-email feature. It's quick, self-sufficient (no PC needed), efficient (small PDF size) and Just Works. The cost was $1500 AU from a small reseller. If you can justify that for other jobs as well it's a good solution.

  171. Re:We do this kind of work at our University Libra by Anonymous Coward · · Score: 0

    I should add, multi-page documents are placed into a single Adobe PDF document because the cover sheet separates the multipage documents, and the cover sheet provides info such as what to name the file and where to put it. It's all very slick! No OCR though. :(

  172. Some thoughts by JohnsonWax · · Score: 1

    First, call MIT. Their Opencourseware is the largest such project, and certainly they have a mountain of useful information to pass along.

    Second, don't do any cleanup yourself. If they can't give you electronic text (PDF, Word, etc.) then give them nothing but a PDF scan. If they don't want to take the time, have them pull the funds for transcription out of their budget.

    If you have to scan, don't desktop scan. I have a small office with an automated copier/scanner. My unit probably has a total of 10 of these machines. They're cheap. Go find one you can use. They're 10x faster than a desktop machine and you can post-process the PDF.

    If you have to transcribe, hire work study students. Where I am, they'd cost you about $3/hr. If you hire students in your programs, they'll learn something along the way.

    Finally, if you are going to the effort, do yourself a favor and invest in a CMS (some are very inexpensive) and put in the time to semantically code your work. That way when they want to convert it to xyz or to change the presentation (how will you handle students with disabilities?) you can do it without too much effort.

  173. University Library? by Solstice · · Score: 1

    Did you happen to check the University's library? They may have a large volume scanner that you can use. It's worth a shot, and they may even let you use it for free.

    University libraries tend to have high-volume document scanners left over from specific projects that they may not use on a daily basis. They may have used it to, say, convert all of the theses they have on file to a digital form, or for a digital periodical archive or something like that.

  174. I have been using that exact scanner by Anonymous Coward · · Score: 0

    I've been using that exact model HP scanner to scan a large number of documents into PDF's. For its price, it works pretty well. It takes about 10 - 15 seconds per page to scan. I've had a few jams, and I've found that they happen more after it's been running for a while, so when it starts jamming I take a break and let it cool down. The sheet feeder has a capacity of 35 pages, and filling it to capacity doesn't cause any jamming (more than normal).

    Even though I'm not doing character recognition, I'm using OmniPage Pro 14. It has a batch mode to automate scanning, and it can also handle double sided pages by scanning all one side, then prompting you to put the paper back in the sheet-feeder upside down. OmniPage Pro can also read PDF files, so if I want to do OCR on the files I have scanned I can load the PDF's back into OmniPage at a later date.
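
    If your software doesn't merge the two passes for you, the reordering is simple to do yourself. A minimal sketch, assuming the back sides come out in reverse order once the whole stack is flipped (check this against your own feeder; the page labels are placeholders):

```python
def interleave_duplex(fronts, backs):
    """Merge two single-sided passes into reading order.

    fronts: front sides in order (pages 1, 3, 5, ...).
    backs:  back sides as scanned after flipping the whole stack, which is
            assumed to reverse their order (..., 6, 4, 2).
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes have different page counts")
    merged = []
    for front, back in zip(fronts, reversed(backs)):
        merged.extend([front, back])
    return merged


ordered = interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"])
print(ordered)
```

    The length check matters in practice: a single double-fed sheet in either pass shifts every later page, and it is easier to rescan than to untangle.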

    If you can, try scanning the documents as black and white instead of grey scale. They'll be much smaller. I think I'm averaging about 40k per page for black and white scanned documents.

  175. Distributed hand transcription or a fax machine. by munpfazy · · Score: 1

    Perhaps the option which is more labor intensive (but not for you) is to tell the profs to assign a few pages to each student in the course and ask that the students typeset their section. If there are difficulties about assigning that sort of thing, you can always make it optional and have students volunteer. Most probably will. Make sure everyone is using the same format - e.g. LaTeX with standard packages - and then just combine everything.

    Sure, it's a lot of work. But at the end you'll have a beautiful set of notes, and they'll be easy to edit the next time the class is taught. It will also save you the trouble of either trying to get image->text converters to handle equations or the file size hit that comes with encoding everything as images.

    I've seen this work beautifully in a single grad level technical course. Might be a lot more difficult in a general-ed class, where students have less invested in the material and may not even understand the text they're typesetting.

    A very cheap, funky, but workable alternative is to use a fax machine and a faxmodem. Chances are there's a machine somewhere in the department which will take a stack of hundreds of pages. Just send them all to a computer and convert them to the compressed format of your choice. Our campus has a centralized system that does this. The result is an ugly, low-resolution mess, but it does work. The quality won't be anything close to what a professional typesetter would produce, but it has the advantage of being both free and ongoing.

    - Munpfazy (rather thinks it ought to be the prof's job to typeset their own damn notes... but can see why that might not be easy to argue.)

  176. document feeder scanner by Anonymous Coward · · Score: 0

    Get a document feeder scanner. Lots of multifunction machines that do fax have these. I don't have the model number handy, but we have a MF machine at my office that supports scans from the doc-feeder. I scanned in ~60 pages with it one day last week. Just loaded them up, set the DPI, and went to lunch. When I got back, there was a pretty reasonably sized PDF.

  177. Kinkos isn't worth it (probably). by darkonc · · Score: 2, Interesting
    Somebody else noted that Kinko's would probably charge $.30/page. That's $30/100 pages. If you can manage to set up a sheet-feeding scanner such that you can do one page/30 seconds you would be getting cheaper results by paying that person $30/hour to do the same job.

    As other people pointed out, if you can get a couple of departments in on this, then you can more easily amortize the costs of really good equipment to do this...

    One thing that I'll note is that I don't really like PDFs for this sort of stuff. If you really have a 100 page article, you're going to be looking at a 3 meg file and, perhaps, a 30 second startup time... That's fine for someone who's going to read the document from cover to cover, or print it... On the other hand, it's a pain if you only want to look at pages 37 and 38.

    GrokLaw gets PDFs of court filings regularly, and I got so fed up with PDF's that I created a (semi-automated) batch system to split up the PDF's into separate PNG images and create a simple index.

    You can see a sample here. Far easier to view a page or two there (IMNSHO) -- but not as easy if you just want to download and print it.
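
    The index half of a system like that is easy to sketch. This is not the poster's actual script; it assumes the PDF pages have already been rendered to PNG files by some other tool (Ghostscript can do this), and the file names and title are placeholders.

```python
import html


def build_index(page_files, title):
    """Return a minimal HTML page linking to one image per scanned page."""
    items = "\n".join(
        f'<li><a href="{html.escape(name, quote=True)}">Page {number}</a></li>'
        for number, name in enumerate(page_files, start=1)
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


pages = [f"page-{n:03d}.png" for n in range(1, 4)]
index_html = build_index(pages, "Sample filing")
print(index_html)
```

    Writing the returned string to index.html next to the images gives readers a page they can jump into at page 37 without downloading the rest.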

    Before you go too far, you might want to get a good handle on how people are likely to use what you produce -- use that knowledge to decide just how you want to organize the result. You may want to make it available in two (or more) different formats. It's not that difficult to bulk convert things between different forms (at least, not if you can dual boot into Linux, or have OS X).

    --
    Sometimes boldness is in fashion. Sometimes only the brave will be bold.
  178. Try to fax it... by Anonymous Coward · · Score: 0

    Try to fax it with a fax machine to a computer with a fax-modem.

  179. Canon 2080c by b0bby · · Score: 1

    The Canon 2080c does double-sided well - you can set it to discard pages with less than a given % of black, so you can toss in a mixture of single & double sided pages & the software tosses out the blanks. I've been happy with them for the larger scanning jobs; they can output multipage tifs or pdfs, rotate the image, etc. They're closer to $500-600 though, I think...
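
    The blank-page rule is simple enough to do in software after the fact, too. A rough sketch, not Canon's implementation: pages are plain lists of rows of 0/1 pixels (1 = black), and the 0.5% cutoff is an arbitrary assumption you would tune on real scans.

```python
def black_fraction(page):
    """Fraction of pixels that are black in a bilevel page (1 = black)."""
    total = sum(len(row) for row in page)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in page) / total


def drop_blank_pages(pages, min_black=0.005):
    """Keep only pages whose black coverage reaches the cutoff."""
    return [page for page in pages if black_fraction(page) >= min_black]


blank = [[0] * 100 for _ in range(100)]
written = [[1 if x % 10 == 0 else 0 for x in range(100)] for _ in range(100)]
kept = drop_blank_pages([written, blank, written])
print(len(kept))
```

    Real blank sides are rarely perfectly white (show-through, dust, punch holes), which is why the cutoff is a small percentage rather than zero.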

  180. Use a Digital Camera on a Tripod by |>>? · · Score: 1

    As some of you know, I hit the road over a year ago, but my wife couldn't bear to leave her favorite recipes behind, so we set our digital camera up on a tripod, with some rulers stuck to the table to stop sheets from mis-aligning and just photographed the lot. It works a treat!

    --
    |>>? ..EBCDIC for Onno..
  181. LyX for LaTeX!! by IronBlade · · Score: 2, Interesting
    Get some students of the professor's course to type them into LaTeX.

    Use the fairly user-friendly LyX to do the LaTeX-ing.
    Heck, get the academics themselves using it to prepare their notes in the first place!
    They might actually thank you for introducing them to this convenient and easy document processor.

    --
    Important info:
    http://www.lifeaftertheoilcrash.net
    http://dieoff.org/synopsis.htm
    http://www.peakoil.net
  182. One word: SCSI scanner-Any Port in a storm. by Anonymous Coward · · Score: 0

    I used to work for Epson, and we had high-end scanners ($1000+) that would take SCSI, FireWire, Ethernet, or Parallel Port if you wanted. I'd personally prefer Ethernet, because you can put the machine anywhere. FireWire's nice for the local machine.

  183. Distributed workload by Adrick42 · · Score: 1

    Everyone seems to be attacking this with technological solutions. I say return the papers to the profs in question and tell them you will be glad to put up anything that is submitted in digital format. (Read: not on paper)

    The time-suck that this would represent would be enormous and IMHO any director would understand why undertaking this project in this fashion would be ridiculous.

  184. Here's the best solution to go with... by bigtrouble77 · · Score: 1

    I think this guy needs a scanner like the Xerox DocuMate 510. It's only $350 and pretty much does what he needs. At 10 pages per minute it's definitely not a speed demon, but it doesn't really sound like he's going to be scanning tens of thousands of documents anyway. The only pain is going to be dealing with those double-sided pages.

    I think this scanner plus a summer temp should be enough to get all those prof notes scanned and organized. Definitely not an unrealistic request.

  185. Bunch of BS by WhatsAProGingrass · · Score: 1

    Lecture notes on photocopy paper sounds like a way for the prof to make even more cash off the student. I'd ask him for more money for the trouble. He is going to make his crappy-ass handwritten notes into a supplement book and make students buy it for like 15, 20 bucks. I think the prof should spend some quality time typing his crap up if he wants to make money off it. I mean, some lecture notes are good, but when you're in lecture class, don't you take notes?

    As for mass scanning, I would definitely take it to the pros.

    --
    Mark
  186. JPEG2000 by Sparr0 · · Score: 1

    I know your target audience can't view them, so they aren't an end solution, but JPEG2000 would be very appropriate for storing the initial scanned images. It performs exceptionally well for compressing things like handwriting (lots of distinct changes) that aren't 'sharp' enough for great PNG/GIF compression.

  187. Suggestions by foltzwerk · · Score: 1

    As long as you're not expecting to OCR the professor's handwriting you should be able to do this task fairly quickly: under an hour once the hardware is configured. This Microtek scanner http://tinyurl.com/ys5dk is a good value. I have had terrible experiences http://tinyurl.com/2zsor with HP document feeders.

    If you have access to Microsoft Office XP Professional or later you could use the Microsoft Office Document Scanning program to easily scan all pages into one large multi-page TIFF document. In turn, the TIFF document can be processed into PDF files with the free Paper Capture plugin for Adobe Acrobat (not sure if the Paper Capture plugin is available on Mac). There are probably Open Source software tools available to do the above-mentioned process; anyone care to chime in, or to suggest Mac software to accomplish the same?

    Scanning at 1-bit 300dpi would be ideal (for speed and final doc size), but if the professor's notes are in varying shades you may be required to scan at 8-bit. I know this is easy because last month I used my Epson Perfection 2400 scanner with my 1.8GHz Windows/Office XP laptop to scan 130 pages in 50 minutes. The result was a multi-page TIFF file that was easily converted to PDF. This was accomplished without an automatic document feeder. I was also using a USB 2.0 connection, which I believe helped speed things up. Good luck.
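
    To see why 1-bit is so much smaller than 8-bit, here is a toy sketch of what going bilevel does to one row of pixels. The cutoff of 128 and the sample values are arbitrary assumptions, and real scanner software chooses its threshold more cleverly than this.

```python
def threshold_row(gray_row, cutoff=128):
    """Map 8-bit gray values (0 = black, 255 = white) to bits (1 = black)."""
    return [1 if value < cutoff else 0 for value in gray_row]


def pack_bits(bits):
    """Pack a row of 0/1 values into bytes, most significant bit first."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the final byte
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


gray_row = [250, 240, 30, 20, 25, 245, 250, 255, 10, 200]
bits = threshold_row(gray_row)
packed = pack_bits(bits)
print(len(gray_row), "bytes at 8-bit ->", len(packed), "bytes at 1-bit")
```

    Before any compression, the 1-bit row takes about an eighth of the space, and bilevel pages also compress much better. The price is that faint pencil or shaded diagrams on the wrong side of the cutoff simply vanish, which is the "varying shades" caveat above.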

  188. priorities by tinkerton · · Score: 1

    1. make sure you use a good sheet feeder.

    2. that's all.

    The optimal choice of compression is not important. Does it matter if one format takes twice as much space as another format? 100 pages will fit on a CD.

    Scanning time : if a few thousand * 20 seconds is acceptable, then i wouldn't bother too much about that either.
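
    For a feel of what "a few thousand * 20 seconds" means in hours, a trivial sketch (the page counts are just example values):

```python
def scan_hours(pages, seconds_per_page):
    """Total feed time in hours, ignoring jams and reloading the feeder."""
    return pages * seconds_per_page / 3600


for pages in (1000, 3000, 5000):
    print(f"{pages} pages: {scan_hours(pages, 20):.1f} hours")
```

    At 20 seconds a page, 3000 pages is a bit under 17 hours of feeder time, so it is a job for a couple of overnight runs rather than an afternoon.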

    OCR: only consider it for indexing typed pages, if at all.

  189. Outsource it abroad by rahard · · Score: 1
    I know a group of people here (in Indonesia) who got a job of scanning various things (images, documents, etc.). Their office is open 24 hours/day (3 or 4 shifts?), scanning, scanning, ... and scanning. The output is then written to CD(s) and sent back.

    I don't know if they're still doing it though.
    But, for labor intensive tasks, just outsource it.
    It's not a classified / top secret document anyway, right?
    You can work on more productive things.

    my 2 cents

  190. Useful software link by egork · · Score: 1

    I once did almost exactly the work you have to do. I put all the paper into a sheet-fed scanner (a friend of mine in "another" department had one), got JPEGs at 300dpi resolution, and burned them all onto a CD. Then I ran them all through OCR on my PC and finally through a translation program (Promt). Later on I would eyeball the translation and correct it manually, but you do not need this step at all as you do not need any translation.

    The whole setup worked just fine for me. Well, if I had no friend with a scanner solution I would probably just buy myself one and use document management software. My favorite one is FineReader; a Macintosh version is also available.

    PDF is good if you want to package the images as books, but I believe jpegs can be processed on almost any system. We actually used these HP digital senders, but not that much.

  191. Get them typed by The_reformant · · Score: 1

    As a student I can honestly say that even when they're all scanned and everything, they're still rubbish. My advice is to take those hundred dollars and get someone to type them for you, or do it yourself. Once you get to know LaTeX it's pretty quick for laying out equations and stuff; only scan in the diagrams etc. and include them in your LaTeX document in EPS format.

    Seriously, the last thing the students need is for their professor to think he's fulfilled the task of putting the course notes on the web when all that's up is some scans of barely legible scrawls.
    For $5/hour I'd happily translate some notes into LaTeX; maybe you could approach the students on the course to see if they'd be willing to do the donkey work.

    --
    I have discovered a truly remarkable sig which this post is too small to contain.
  192. Don't bother by An+Onerous+Coward · · Score: 2, Insightful

    Frankly, I've seen professors' handwritten lecture notes, and 90% of them add nothing to the educational process. Certainly not more than a quick note saying, "Read sections 2.1, 2.2, and 2.4, paying special attention to least-squares curve fitting and finding orthonormal bases." They're generally disorganized and difficult to follow because they usually take a lot of material for granted when they write.

    The mere fact that it's handwritten means that it's basically a rough draft that was hastily flung together. Send them back to him, and have him type them in and rework them until he figures they're worth recycling for next semester. The prof will save time in the long run, and the students will have something nice, clean, and organized to peruse.

    --

    You want the truthiness? You can't handle the truthiness!

  193. Old Tech way by thogard · · Score: 1

    Dig out the campus directory and look under "secretary pool" for someone that can type in the messy text at speeds that may only be 1/2 the speed of a cheap scanner. They will most likely type it into Word (or maybe WordPerfect), but then if you convert it to LaTeX, you can add in all the nice formulas and figures in a way that they can be properly maintained.

  194. Large-scale? Ha! by Anonymous Coward · · Score: 0

    Sorry, but 100+ pages isn't large scale. I work for a litigation support company, and we regularly get 10,000+ pages a week to scan, OCR, and load into a database. And then we also hire people to read through them all and do sorting and filtering. And yes, there are scanners that do what you want, but they cost a bundle. Good luck.

  195. Spring for tiff conversion and ftp storage by georgeha · · Score: 2, Informative

    or just get the ftp storage and a DocuJob Converter, which converts DocuTech jobs to TIFF or PS, or just use a DigiPath instead.

  196. Scansoft Visioneer PaperPort Office 9 by seier · · Score: 0

    Holy smokes, it would seem that almost no one, or in fact no one, knows what the hell they are talking about. Half the messages in here should have PaperPort in the subject line, but they don't. PaperPort has been the best paperless office solution since 1998, and presumably before that. PC Magazine just named it the best document handling system again in their June 8, 2004 issue. Just get an epensive ADF from Visioneer, HP, Brother, or anyone else that provides integration (no, not TWAIN-compliant drivers, but a scanner whose buttons and software will actually integrate with PaperPort 9. Make sure it says PaperPort 9, not 7, and not 8; those won't work very well with 9). If you have to go with PaperPort 7 or 8 then I'd recommend downloading PDFCreator, an open source application from SF.net. Cheers, Christian Blackburn

  197. Brother MFC series by DarthBobo · · Score: 1

    I have a Brother MFC-8420; it's a $400 laser printer/sheet-fed copier/scanner. It comes with PaperPort software (which you need to upgrade for PDF capability). I use it to scan journal articles, notes and bills and it does a great job.

    You absolutely aren't going to find a solution under $500 that doesn't turn you into a 10c/hour slave.

    --
    +--------------------- You idiot! I told you we were facing the wrong way!
  198. Alternatives by RebornData · · Score: 2, Informative

    Hardware for image acquisition:
    Check to see if the department copy machine has scan functions... most built in the past few years do, even if they aren't used in most places for that. You'll get a decent sheet feeder and way faster scanning than most desktop sheet-fed scanners.

    If you have to buy something and have to go *really* cheap, you could get a multi-function print / scan / fax thing. Most will handle legal size, because they're not actually moving the sheet-fed paper onto the flatbed glass... the image element stays stationary while the paper goes by. But, of course, you get what you pay for... expect to spend time dealing with paper jams and skipped pages. However, it should be faster than hand-feeding a flatbed.

    Software:
    I mention this simply because nobody else has (that I've found): ScanSoft OmniPage Pro is designed for highly repetitive, batch-oriented OCR. It has options for doing automated or hand-tweaked "area recognition" (separating text from graphics) and has the best proofreading UI I've seen... it flags "low confidence" recognitions automatically, and displays both its best dictionary guesses and the actual scanned words. Not sure it will help much with hand-written work, but for printed material it works well.

    Format: Your primary concern when looking for a destination file format should be longevity... will the files be readable 5 years from now? I've seen a number of people recommending highly efficient but obscure compression schemes, which are a terrible idea if you want the data to stick around. Saving a few bits doesn't do you much good if you can't figure out what they mean. I recommend that people scan to two formats, just for safety (Omnipage can do this automatically).

    -R

  199. Xerox DocuMate 510 by jball · · Score: 1

    This scanner by Xerox looks promising. Note it does not do duplex (two-sided scans).

  200. Find a Scanner that Does PDF by cmacb · · Score: 1

    http://www.pcmag.co.uk/Products/Hardware/1145964

    for example. But HP used to make one too, I couldn't find it just now. It was pretty cool. You could set it up in a central area and let everyone get to it. Scan in your document, and the scanner would send you the results as an E-MAIL attachment. This technology REALLY should have replaced faxing by now.

    Anyway, if you make the process easy enough, maybe those lazy professors will do it for themselves. They will for a while at least, 'till the new-toy effect wears off.

  201. Just for some perspective... by MickLinux · · Score: 1

    ... if this job is "scan it straight to PDF", then the result will be huge, really eat bandwidth, and not be very useful to the students. It'll take forever to load.

    On the other hand, if you want something fast, accurate, easy to use, and useful, then you have a job similar to what I and two others did -- at $15-$25 per page.

    http://www.brookscole.com/cgi-brookscole/course_products_bc.pl?fid=M20b&product_isbn_issn=0534408427&discipline_number=13

    Of course, when we first started jobs like this, the publisher specified MS Word 5.1a for Mac; and it took us 1/2 an hour per page ($11-$15/pg). Then they wanted it in HTML, so they specified MS Word 98. That jumped our time per page to 1.5 hours, and at $15 per page, we lost around $7000.

    Then we changed it to Quark + Acrobat, with pieces available in Word (but no final prepublishing in Word), and that took us an hour per page at $25 per page. At that rate, we still went broke, but barely finished our contract, saved the publisher ~$100k by reducing the page count, and made an excellent study guide.

    However, as of right now, we said that our next bid would have to be significantly higher ($70k-$100k), and the publisher decided they want to try someone else.

    But you are right about the college professors not realizing what they were asking. That hour per page not only included layout, graphics, equations, and formatting. It included approximately 2-3 complete rewrites on the text, chapter after chapter, and sometimes I had to suggest the final wording, myself.

    --
    Correct Horse Battery Staple: 72 bits of entropy. Enter "Correct H" into google. When it generates the phrase, that's
  202. Re:DjVu is way better by Anonymous Coward · · Score: 0

    The PDF capture thing doesn't work very well actually. You get those ugly documents that are a mixture of quasi-recognized text and bitmaps.

    The documents look really ugly, and the faithfulness to the original is not so good.

    DjVu manages to produce a fast-rendering, totally faithful image (with hidden searchable OCRed text) in smaller files than PDF's mixed format.

    Go with DjVu, it's open source.

  203. DjVu is way better than PDF by Anonymous Coward · · Score: 0
    DjVu is way better than PDF for scanned documents (see http://djvu.sf.net, http://www.djvuzone.org and http://www.djvu.com).


    The files are about 5 times smaller than with PDF for black and white 300dpi scans, and 10 or 20 times smaller for color scans (nothing even comes close to DjVu for high-res document scans).


    DjVu is open source (the decoders and viewers at least). There are open source compressors, but they are not very good for scanned docs. You are better off using the free conversion server (see http://any2djvu.djvuzone.org ), or the commercial app from LizardTech (there is a free download version).


    -- Anonycous Moward.

  204. 300$ doesn't mean large scale. by Stonent1 · · Score: 1

    That's a PERSONAL scanner. You need something like this

    I'm telling you, don't go the cheap route for something like this. I've worked at several companies that generate and scan in thousands of invoices per person per day. They used some heavy leased Bell+Howell scanners with software called Documentum, which provided a browser frontend to the invoices, similar to what Google does with PDF files. You could search text (even handwritten, IIRC) and display the documents in your browser and print them.

  205. Another vote for Xerox workflow by Anonymous Coward · · Score: 0

    I agree with the other posters: you don't want to try this with consumer products, there are just too many pages. Xerox is the way to go. I've used their DigiPath solution with their professional scanner, and the main advantages are the speed of some 100 pages/minute, 600dpi, and that it never jams or misses a sheet. Once it is scanned in, you can just export it to PDF using their software. It's the standard for big companies and you should be able to find them at Kinko's or at a professional print shop.

  206. I think Fujitsu has one of those scanners by Anonymous Coward · · Score: 0

    Fujitsu has one of those scanners with autofeed.
    I saw one at work, but never used it.

  207. Who do you work for? by Anonumous+Coward · · Score: 1

    Asking for the impossible is easy, delivering the impossible for just under 200 dollars is slightly more difficult.

    You need to start by answering a simple question, probably together with your boss (or even letting him answer it for you): who works for whom?

    If the students work for the teachers, then you can publish huge and illegible scans and let the students work them out.

    If the teachers work for the students, then the teachers should deliver cleanly typed and formatted electronic documents, which you can turn into neat .pdfs for the students without any scanner at all.

    If you work for the school, then the school should provide you with whatever means it reasonably takes, money- and timewise, to process the work you have, even if that means buying industrial scanners and exorbitantly-priced software for handwriting recognition, or sitting there for weeks, typing out the hand-written papers.

    My point is: everybody wants to offload all their responsibilities on the admin, but that's surely not a reason for the admin to go along with that. If they want you to do the impossible, they should also pay you accordingly. Do they?

  208. some practical advice for you by sribe · · Score: 1

    However, every desktop scanner I've ever used takes 1-2 minutes of user-attention per page and the resulting files end up Huge, impossible-to-read, or both. All I have at my disposal is my PowerBook, Acrobat, a couple hundred dollars of department funds for a new scanner (this maybe?), and, if I ask nicely, overnight use of the secretary's Win2k box. Any ideas?

    - You do need more than a couple hundred dollars, but certainly not 10x--so maybe you can talk them up a bit.

    - I have one of these on my desk connected to my Mac (see note) and am sure you would be pleased with its performance. CDW sells it for $999, and I've seen it offered by one of their partners (Scantastik if I recall correctly) for under $800.

    - I think that this less expensive scanner might be just fine for what you want to do. CDW has it for $480, and the Fuji web page mentions a $100 rebate.

    - The problem with the hugeness of PDFs relates to the graphics file format. You can embed graphics in PDF using more than 1 format, and much software defaults to JPEG. What you want for typed or handwritten pages (no color diagrams or photos) is 1-bit TIFF with CCITT Group 4 compression. That will easily get you back down to < 100K per page, often 20-30k per page at 300dpi.
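
    To put rough numbers on the bit-depth point, here is a minimal sketch (Python; the function name is made up for illustration) of uncompressed page sizes. It assumes a US Letter page at 300 dpi and only covers the raw pixel data; the compressed figures quoted above come on top of that and depend on the page content.

```python
# Back-of-the-envelope page sizes before any compression.
# Assumption: US Letter (8.5 x 11 in) scanned at 300 dpi.

def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

bilevel = raw_page_bytes(8.5, 11, 300, 1)   # 1-bit black and white
color = raw_page_bytes(8.5, 11, 300, 24)    # 24-bit RGB

print(f"1-bit:  {bilevel / 1e6:.2f} MB per page")
print(f"24-bit: {color / 1e6:.2f} MB per page")
print(f"ratio:  {color // bilevel}x")
```

    So a 1-bit scan starts out 24 times smaller than a 24-bit color scan of the same page, before the compressor has done anything.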

    Note: the fi-4120c does not come with a Mac driver; I wrote my own and it's not yet complete, thus not fit for distribution. In fact, you'll find that the kind of scanner you want is generally not supported on the Mac at all. So you definitely need to check into borrowing that Windows box.

  209. Re:A Fujitsu scanner, SANE and Quartz Python bindi by Anonymous Coward · · Score: 0

    Listen to this guy! The HP line 5500/7400/8200 is the cheapest with automatic document feeders, but I would not recommend it for daily usage.

    Fujitsu's 4120c has a nice little ADF. You won't have a flatbed, but that shouldn't matter since you already have one. It's a bit pricier but it'll save you headaches. You might also want to look for the older Fujitsu 3091.

  210. Lots of Options by PhunkyOne · · Score: 1
    I have started doing this at my organization on lots of different levels.

    For personal level scanning we use Visioneer 9450's with Acrobat 6.0 Professional - I have found that Acrobat 6 makes files smaller if you are willing to sacrifice backwards compatibility. These work pretty well when people want to do maybe 5 batches of 5-20 sheets per day. A little spendy for many users, which leads me to our next method.

    Each division has a Xerox DocuCenter 425 (DocuCenters are smaller units - not behemoths like the print/copy center class DocuTechs). These units have scan-to-email capability at approx 25-30 pages a minute, including double-sided capability. They work quite well, allowing employees to walk up, select their name, drop their document in the feeder and hit the start button. 30 sec to a minute later it's in their inbox.

    Next, for the copy center in the department we have a Kodak i60 scanner (I think it scans 60 sheets a minute - could be wrong though). This one scans both sides at the same time, unlike the Xerox, which sucks it through a second time for the second side. It comes with Kodak Capture software, which does a great job at processing jobs, including blank page removal, which is quite helpful if you have a set of documents that have material on some of the back sides of pages but not all. This works really well as we have a stand-alone computer dedicated to this task.

    Next, in the copy center we have a Xerox DocuCenter Pro 75, which does essentially the same thing except it drops the PDFs or TIFFs directly to a Novell or SMB share.

    I hope this helps people some... costs are as follows (a rough idea): the Visioneer about $200-250, the Xerox 425 about $8000-10000 (also serves as network printer and copier), the Kodak i60 about $1600-2100, and the WCP-75 about $35000 give or take.

  211. How about getting your Prof into the 1990's? by bigt_littleodd · · Score: 1
    Seriously, come on...scanning handwritten notes for PDF distribution?

    He/she could get a used Palm on eBay for less than what you could buy a scanner for. Sure, the handwriting recog doesn't work for everyone, but it's a step in the right direction.

    Your prof prolly already has a computer. Can't the thoughts be typed by the professor? That sure would make more sense for everyone.

    Maybe your prof needs a new computer. How about one of those newfangled tabletPC thingies? Again, there's the handwriting recog problem.

    --
    Let's play Four Horsemen of the Apocalypse. I'll be Pestilence.
  212. I did this once by acidrain69 · · Score: 1

    I had to scan in a photocopy of an old book written by a prof. I used ABBYY FineReader 5.0 and some kind of Canon fax/printer/scanner solution. The Canon probably wasn't worth more than $200-400; they had it on hand and I borrowed it. It didn't hold the whole book at once (300+ pages, all ugly photocopies), so I did small stacks at a time, and I would scan more pages while editing the scans from the previous stack. Photocopies are fugly, so you have to remove all the little marks and such, and some of the text and equations came out bad, so I had to kind of paint the document as well. For the most part I would just format the text and run the OCR on that, and keep the graphs and equations as simple black and white images.

    It took a while, but I was about to be out of a job, so I kind of dragged my feet on it. If you have nice source documents, it shouldn't take too long to do this, and the OCR software is pretty good nowadays (unlike when I tried it in the early 90's on a grayscale scanner).

    FineReader is real sweet and worth the money. Someone had a similar question on slashdot and it was recommended, and I used it when I had to do this, so I'll pay it forward and recommend it for you.

    --
    -- Having a Creationist Museum is like having an Atheist place of worship
  213. Large-scale solution by masoncooper · · Score: 1

    My office is actually working on a large scale paper-to-digital conversion. So far we have nearly 300,000 sheets scanned, but we are far from being finished. Scanning the paper is only the first part of the equation. Since you are scanning a hundred pages max, I'd recommend a plain scanner with an ADF capacity of at least 30 pages. Our office is using our high-speed copiers to scan-to-tiff as fast as 120ppm on the newest units. Documents are separated by a dark sheet with a large X on it. A scanrouter server then takes these files and drops them in a folder where someone, using a thumbnail view, identifies the first page and classifies it by person and document type (depo, condensed depo, cv, etc...). They then select the entire document (looking for the next large X) and drop it onto a custom application I wrote.
    This program will rename/move these files into an image share under the format //#/00000001.tif where # is the next available folder number beginning with 1. The program uses an auto-complete box for the person's name so we reduce the number of misspellings, and the available doctypes can be modified using an ini file.
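
    The numbering scheme is easy to sketch. The following is not the custom application itself, just a small Python illustration with made-up helper names; it assumes folders are named with plain integers and pages with eight-digit, zero-padded numbers, as in the pattern above.

```python
import os
import tempfile

def next_folder_number(image_share):
    """Return the next free numeric folder name, starting at 1."""
    numbers = [int(name) for name in os.listdir(image_share) if name.isdigit()]
    return max(numbers, default=0) + 1

def page_name(index):
    """Eight-digit, zero-padded page file name, e.g. 00000001.tif."""
    return f"{index:08d}.tif"

# Tiny demo on a throwaway share that already holds folders 1 and 2.
with tempfile.TemporaryDirectory() as share:
    os.mkdir(os.path.join(share, "1"))
    os.mkdir(os.path.join(share, "2"))
    folder = next_folder_number(share)

names = [page_name(i) for i in range(1, 4)]
print(folder, names)
```

    Zero-padding keeps the pages in the right order under a plain alphabetical sort, which matters once a document runs past nine pages.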

    Next, we run another custom program that will crawl ALL folders in that image share looking for any .TIF file that does not have an identically named .TXT file. More on this later. Each file it finds matching these criteria is placed in a database.
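
    For anyone who wants to try the same crawl, a minimal Python sketch might look like this. The function name is hypothetical, and it simply returns paths rather than writing to a database; the case-insensitive matching is an assumption on my part.

```python
import os
import tempfile

def find_unrecognized(image_share):
    """Yield paths of .tif files that have no same-named .txt beside them."""
    for folder, _subdirs, files in os.walk(image_share):
        # Compare case-insensitively so 00000001.TIF matches 00000001.txt.
        lower = {name.lower() for name in files}
        for name in sorted(files):
            stem, ext = os.path.splitext(name)
            if ext.lower() == ".tif" and stem.lower() + ".txt" not in lower:
                yield os.path.join(folder, name)

# Tiny demo: page 1 already has OCR output, page 2 does not.
with tempfile.TemporaryDirectory() as share:
    os.makedirs(os.path.join(share, "1"))
    for name in ("00000001.tif", "00000001.txt", "00000002.tif"):
        open(os.path.join(share, "1", name), "w").close()
    pending = [os.path.basename(path) for path in find_unrecognized(share)]

print(pending)
```

    Because the check is "is there a .txt next to it", the crawl can be rerun at any time and only picks up pages that still need OCR.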
    Finally, another program plugs into the COM interface of ScanSoft's OmniPage Pro 14, pulls files from this database, recognizes them, and places the output TXT file next to the accompanying TIF. The benefit of using a database is that we can unleash a large group of machines on recognizing these pages and start/stop them as needed. We figured that when we're done, we'll have needed approximately 30 days of computing time on a P4/2.8GHz to finish off 250,000 files (for those counting, that's about 10 seconds a page).
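
    The 30-day figure is easy to check, and the same arithmetic shows why spreading the queue over several machines pays off. A rough Python sketch, assuming the work splits evenly (the ten-machine case is just an example, not our actual setup):

```python
SECONDS_PER_DAY = 24 * 60 * 60

def ocr_days(pages, seconds_per_page, machines=1):
    """Wall-clock days to OCR `pages`, assuming the work splits evenly."""
    return pages * seconds_per_page / machines / SECONDS_PER_DAY

one_box = ocr_days(250_000, 10)                  # the single P4 case
ten_boxes = ocr_days(250_000, 10, machines=10)   # hypothetical pool of ten

print(f"1 machine:   {one_box:.1f} days")
print(f"10 machines: {ten_boxes:.1f} days")
```

    250,000 pages at 10 seconds each is 2.5 million seconds, a little under 29 days on one box.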

    Now here's where it all comes together, we use an application called Summation which will import a .dii file (which is a specially formatted text file describing each folder, generated by yet another app) and allow users to search for text by person, type, anything, and read/print out the corresponding page.

  214. Copyright by Phazz666 · · Score: 0

    That's copyright plain and simple. Be careful, the FBI might bump copyright up to 4th under music piracy on their agenda.

  215. xerox docushare does EXACTLY this by Anonymous Coward · · Score: 0

    http://www.xerox.com/go/xrx/equipment/product_details.jsp?tab=Overview&prodID=DocuShare&Xcntry=USA&Xlang=en_US

  216. Scanner Wiz by vonkas · · Score: 1

    Unless your PowerBook has USB2, stay away from anything but Firewire for the interface. HP has notoriously bad scanner software for Macs; Canon is much better. And it's got to be single-sided sheetfed (doublesided = no end of trouble). As prep, copy double-sided material onto single-sided sheets. Scan at 100-150 dpi resolution and use Acrobat 5 or 6 to make the pdfs. If you're good at scripting (or ask a Uni script wiz) you can string it all together with AppleScript. If mastered, you will become the University PDF Producer! Don't underestimate that title - doing what you intend to is unique, extremely useful and will be highly valued!

  217. ScanSnap by takasuz · · Score: 2, Interesting

    ScanSnap may be just what you need if the notes are on uniform-sized paper (e.g. A4 or letter). You need Acrobat (included) on a Windows machine, but you just set the notes on the scanner and click the mouse, then it scans 50 sheets (both sides in one pass) without human intervention and gives you an Acrobat file in a few minutes. It is small and light, so you can easily bring it into the secretary's office. The price is also reasonable ($495 with Acrobat 6.0), and it seems they are even offering a $100 rebate now.

    The specified resolution is for color documents. For a b/w one, you will get a better resolution. You can obtain scan samples from a Japanese page (pdf files at the bottom).

    Actually, a newer model, fi-5110EOX, is already available in Japan, and I think that is why they are offering a rebate now. The new model has a USB 2.0 connection and a higher resolution mode (excellent) that is not possible with fi-4110.

  218. HP 4c also works with SGI IRIX by green+pizza · · Score: 1

    The HP 4C scanner also works with Impressario, the printing/scanning software that ships with IRIX 6.5. I used to have this exact scanner running on my Silicon Graphics Indy.

  219. Distributed proofreading? by cwm9 · · Score: 2, Interesting

    I don't know what the specifics of your work are, but you probably have a huge supply of untapped workpower at your fingertips.

    The students who are taking these classes could easily be a source of tappable work hours.

    See Project Gutenberg's proofreading site for an example of this type of effort. http://www.pgdp.net/c/default.php

    If you could get the professors to offer a little bit of extra credit for proofreading or converting a page, the task could be much easier for you.

    Envision this: You use an ADF to scan an entire stack of notes in order, but you don't worry about how the scanning goes on each page. Then you xerox the whole stack and place the copies in a binder in someone's office. The students are then offered 10 points extra credit per page translated from .TIF to Word/WordPerfect/Mathematica, whatever, up to three pages worth.

    The points are justified since the student is in the class and learning something by carefully duplicating, analyzing, correcting, and studying the professor's notes for that class. (Can you imagine a more likely way to end up accidentally committing three pages of facts to memory?)

    You can place the .TIF files in a class-accessible online folder, and accept the end result in an e-mail.

    If the file isn't legible, the student can check the xeroxed copy out from the binder. Since it's just a copy, you don't need to worry about losing it.

    You could skip the scanning altogether, and ask the students to return any pages they don't finish translating.

    Obviously this works best for large classes where the student:pages ratio is large.

    Make sure you number pages if you do anything like this.

  220. Ricoh Aficios, Ancient Fujitsus, and OmniPage Pro by BigBlockMopar · · Score: 4, Informative

    we've gotten a bunch of jobs like this - turning handwritten documents into searchable pdfs

    We had to do this, too. For a Court, which requires the reasons, decisions, etc. to be publicly available online.

    *Thousands* of documents, hundreds of pages each. The responsible department got me, as the IT guy, to set it up for them (after they'd already bought the stuff to do it).

    Basically, a couple of Ricoh Aficio series copier/scanners, a couple of ancient Fujitsu sheet-feed scanners, and a bunch of students sitting all day in front of computers running OmniPage Pro.

    The Ricohs were great on paper - fast, networked, etc. but their scanner drivers were poor (reminded me of bad CD-ROM drivers - "Copywrite 1995 Behavior Tech Computer. All right reverse." [sic,sic,sic]), and their service (contract) involved having to call the Ricoh guy because the scanner portions randomly wouldn't appear on the network, then wait for him to appear while at least one of the students sat idle. 2 stars out of 5.

    Ancient Fujitsu scanners, black and white only, don't remember the model number, required proprietary SCSI cards, no support under Windows NT/XP/2K. These were commercial-grade super-expensive scanners when new (about 1990). Installed Windows 95 on a bunch of relics with ISA slots for the SCSI cards and let 'er rip. Scanning was fast, feed was reliable like a good-quality photocopier or fax machine. Only issue was requirement for an old computer running an old OS; better overall than Ricohs - 4 stars out of 5.

    OmniPage Pro 12 - reading was *excellent*, far better than anything else I've ever seen. Handled French and English, simple monochrome diagrams, etc. with only very small occasional formatting problems. Print to a PDF using Acrobat on the file server. Only real problem was stability, frequently locking up and losing the scan and OCR on page 99 of a 104 page document. 2 stars out of 5, being punitive because of frustration.

    As they got to be more proficient with OPP, and as OPP's dictionaries filled up, we were able to add more and more computers and scanners, so that they were running around, tossing files into the scanners, stapling scanned documents back together, and occasionally rebooting one of the Windows 95 workstations. Peak was 15 computers and scanners.

    Task took 3 students 3 months full-time.

    --
    Fire and Meat. Yummy.
  221. Welcome to the world of tomorrow by Hecatonchires · · Score: 1

    Besides slipping in an obligatory Futurama quote, I'm here to enlighten you. A great many universities/educational institutions are making course materials available online. This is often done as a pdf on a website - whether the site is password protected for current students only is a non-issue here.

    Why pdf? So it looks the same if the student views it on a mac, windows or linux pc. Why online? Because that's the zeitgeist! Everything should be available online!

    Why not printed? Because a university's primary occupation is not printing. (It's providing administration officers with jobs)

    A lecturer's/professor's job is not waiting in line at the printshop. (it's raking in the research funding and spending it on fast cars and the unibar)

    Why do the students have to print it out? They don't, they can read them on a screen. Don't have a computer? Tough luck. Get access to one quick. My university would not accept handwritten anything. They provided a plethora of labs, and access times around the clock.

    The fact that these are handwritten notes that need to be pdf'd (scanned, ocr'd, whatever) is the real problem. Enforce some discipline on the teaching staff, make them learn to use (office presentation tool of choice). Most of them allow saving as pdf/whatever (depending on plugin).

    --

    Yay me!

  222. Prof. Clueless, PhD by im+a+fucking+coward · · Score: 2, Insightful

    Basically, professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page.

    After I stopped laughing, I realized this may be a serious inquiry rather than a joke. I've assisted local government agencies in converting clear, printed, 8.5x11" text documents into searchable text / pdf documents, and the cost for these is over 10 cents a page. (Tax and mill levy records have to be verified 100% correct, as I'm sure your prof's notes need to be.) That's with volume discounting (> 500,000 pages), using nearly perfect ascii text documents, not scribbled notes.

    So my advice is to get a few bids from outside contractors, then submit a realistic estimate based on the average. Hint: Given those spec's, it's clear you/your management have no idea what's involved in this process. (Shows at least a modicum of IQ that you had the good sense to ask, however.) If you simply need to scan/save as pics (jpg/tiff -> pdf), you can do this yourself at reasonable cost/effort expenditure. Seems to be implied that you need OCR capabilities for handwritten text, as complicated as equations at that, so you're really pretty screwed. Even simply creating 100-200 kb jpg's & emailing them in an automated process is going to run into problems when the campus mail servers refuse to accept attachments larger than a Meg.

    Good luck, BWAhahahahaha!

  223. Get a really hungry dog by Anonymous Coward · · Score: 0

    and let nature take its course. "The dog ate your homework."

    Maybe try reading the notes into a voice recognition software program.

  224. Re:HP Digital Sender & HP 4101 MFP by C0L0PH0N · · Score: 1

    My company has a new requirement to scan about 30,000 contracts per year (1-4 pages per contract). I've been looking at these digital senders, and have been pretty impressed. In a recent development at HP, the HP Digital Sender 9100c has been imported into the HP4101 MFP, which is more capable and is cheaper to boot. Biggest difference: the 9100c does 15ppm at 300dpi, the HP4101 MFP does 25ppm at 600dpi. With an additional software package called DSS 3.0, the HP machine can scan your documents, convert them to TIFF or PDF, drop them into a folder on your server according to instructions from the control panel, which are configurable, and can email or fax them also. I am arranging a demo of the HP 4101 this week at my company, so do not have any experience, but have spoken with an IT director who uses them both (9100c and 4101), and he is very impressed with the HP4101, and it's cheaper. I think you can lease them for under $100, if you give a 4-year commitment. I don't know if that works for your department, but if it does, you could get a lot of bang for your buck. Of course, haven't had demo yet, and am paying attention to reports here of slow scan times, etc, and I will be sure to include that in my demo testing. Also the bit about getting a daemon running on the server. Thanks for the tips, all!

  225. Fujitsu Scanners are great by Anonymous Coward · · Score: 0

    I've used Fujitsu scanners for these things. Not the small ones (Scanpartner etc), but the big ones M3096 and M3097. There are duplex capable versions. Watch out to get a SCSI device and not a video interface model for which you need a special interface board from Kofax or Xionics. These scanners are made for large volumes.

  226. Yeah Right. by Laroue · · Score: 1

    Your only realistic option is to take the documents to an outside agency. Take them to kinkos and have them scanned in. Take the digital copy back and upload it.
    You should present to your boss that the allocated budget for your project is enough for a one time job. In other words there aren't any revisions, no professors get to modify the documents afterwards. Which in and of itself does greatly reduce any value to the project.
    To do this kind of job correctly, you need a high speed scanner, either Kodak, Fujitsu or Panasonic. Some software that will scan the documents in (easy to find or to make). And time. The scanner and the time are your big costs. A couple hundred just isn't enough to do it more than once.
    I would really recommend documenting what you can do for that amount and showing it to the boss. I imagine the powers that be will change their minds and realize that would be a problem for each professor.
    On the other side you could push for the profs giving you the documents digitally; it is after all a college. One could reasonably expect the profs to type their own material, or have their TAs type it for them.
    Just my 2 cents....
    Either way good luck, sounds like you need it.

    --
    #### ## Laroue ####
  227. HP All in ones by strider_starslayer · · Score: 1

    As much as I find HP all in one devices to be crashy pieces of crap, they are cheap, and most can do ADF scanning for under $300. You'll have to handhold the machine to get it working, but the ADF and scanner on it are actually rather solid units. (then again, what could you possibly screw up in an ADF and a scanner?!?)

    I also have to agree with others, you MUST set it up as your policy that you will only put the documents into the scanner, press scan and walk away - nothing more; or else your productivity in other areas will be consumed by this project (unless you can get more money for this project to hire students)

    --
    -Millions of Monkeys, Millions of typewriters, 6 hours of sorting through faeces encrusted pages to find: This post
  228. Use the students to do it by puusism · · Score: 1

    At my university (Univ. of Helsinki), the students at the Maths department typeset the handwritten lecture notes as part of their LaTeX course. The lecture notes are divided to small stacks of perhaps 40 pages each, and the students form teams and divide the pages among them to do the job.

    Everyone benefits: the students get LaTeX experience and curriculum units, and the professional staff needs only to proofread the results.

    --
    - Ismo
  229. the stupid version by bumby · · Score: 1

    Do as the lecturers at my school: take a picture of the paper with a digicam, import the image as a gif, and put it in a MS Word document. Only ~20Mb / doc-file. The dial-up users love it! ;)

    --
    Hey! That's my sig you're smoking there!
  230. here is how I do it. by _Qiang_ · · Score: 0

    sounds like we have a similar job.

    We have an HP scanner where I can load the papers into the paper feeder and press one button, and the scanning starts automatically. Of course, you can set the scanning to low resolution, which results in a smaller pdf file.

    But that only scans one side; you can turn the pages over and scan the other sides of the papers. Then in PDF pro you can rearrange the page order very easily.
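
    If you would rather script the reordering than drag pages around, the merge is simple. A minimal Python sketch, assuming that flipping the whole stack makes the backs come out last sheet first (check this against your own feeder; the function name and page labels are made up):

```python
def interleave(fronts, backs, backs_reversed=True):
    """Merge two single-sided passes into reading order."""
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must have the same page count")
    if backs_reversed:
        # Flipping the stack means the last sheet's back was scanned first.
        backs = list(reversed(backs))
    merged = []
    for front, back in zip(fronts, backs):
        merged.extend([front, back])
    return merged

fronts = ["p1", "p3", "p5"]   # first pass, in feeder order
backs = ["p6", "p4", "p2"]    # second pass, after flipping the stack
order = interleave(fronts, backs)
print(order)
```

    The length check catches the common failure where the feeder pulled two sheets at once on one of the passes.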

    It's not a hard job if you have a decent HP scanner (or any brand) and a pdf professional.

  231. Zylab by Tune · · Score: 1

    Ask these guys. They seem to know how to go about this business (or at least claim to do so)..

  232. How I did this by Mondor · · Score: 1

    First of all, did you say HANDWRITING? If so, do you expect documents to be just pictures, not recognized text? Anyway, you will need a printer with ADF (automatic document feeder). If you want recognition of printed text, you can use FineReader - highly recommended Russian OCR. Handwriting may be saved as PDF, or, if size matters, TIFF (group 4, if I remember right, black-white tiff, that uses about 30kb per page). Concerning the automatic scanning, you could use that feature in FineReader, or search the internet for such a free utility. It is a matter of an hour to write such a utility for this special purpose. In my case I did everything with HP ("AllInOne") SJ 3300 with ADF and FineReader 7. Just reload the tray with new papers from time to time (with high precision it takes about one page per minute). Seems like your professor wants to buy a horse for a hamster price, as such a printer and ADF and software cost more than $300, maybe 3 times more.

  233. It is simple by Anonymous Coward · · Score: 0

    When bidding on a complicated matter, under-promise and over-deliver.

  234. easy - by zarniwhoop · · Score: 1

    outsource it to India!

  235. Scanners by Gax · · Score: 1

    I bought a Brother MFC 8820D sheet-feed scanner for work. It reduced my workload by 2/3 when copying large amounts of legal documents. The "send to email" feature is nice, but a bit pointless for large scans. Once scanned, the files are saved as PDF and multi-page TIFFs.

    If you go down this route, you should check your multi-page scans before saving them. Acrobat has a random buffering problem, which causes some pages to be placed in the wrong order.

  236. I use the Brother MFC9880 by Anonymous Coward · · Score: 0

    I looked around a lot for exactly this - I've decided to try and run a "paperless house" (boy did I have a big shredding session) because I'm going to be a new New Age traveller and sell up and travel - but I want all my documents available.

    I found that the Brother MFC9880 with a network card will do the job cheaply. It is a fax/scanner/copier/printer and has a sheet feeder. It converts to multi-page TIFF (for B&W) or JPEG (for color) and will email the job to different email accounts on an SMTP server (you can set this up via a web interface).

    It's a lot cheaper than Xerox. A bit of an awkward UI, but then most product UIs are designed by engineers with Aspergers anyway.

    It doesn't jam much, either. I just wish it had a built-in shredder (my shredder motor overheated after 30 minutes so I had to resort to the end-of-the-3rd-Reich-style burning of documents).

  237. Parent is bang on the money by aegilops · · Score: 1

    Sorry for the "me too" but I would totally endorse this recommendation. We were advised to get one by our sister company. Although I was a bit skeptical at first, it soon became apparent that it was a tremendous time saver, particularly compared to the laborious manual alternative. We got the model with the 50-page sheet feeder (I would consider that a 'must') and it was great. Same size as a small fax machine, dead simple to use. Integrated with our Exchange address book too.

    We never bothered pushing the model to explore further functionality (e.g. I proposed we looked at programming it to scan documents to save output TIFFs into a central folder, which we could then use best-of-breed OCR software to convert to text) but the potential was clear.

    Aegilops

  238. what about accessibility? by Anonymous Coward · · Score: 0

    Great. Scan it to a TIFF to post-process, or direct to a PDF. But what about accessibility?

    If those PDFs can't be read out by a screen reader then you've just landed yourself in a LOT of trouble. When a blind person wants access to those documents you'll soon learn about disability rights!

  239. I've had EXACTLY the same problem!!! by Anonymous+Writer · · Score: 1

    I've been trying to find out how to do this myself for ages!!! I used to archive handwritten documents and sketches using a Visioneer PaperPort Vx sheet-feed scanner on a Windows 95 laptop years ago. I could manage to save at least a file cabinet drawer worth of pages onto a single CD-R. The setup worked great, and was even portable so I could travel with it! It scanned pages pretty quickly.

    The kind of medium I was scanning could cause problems. Sheets of pad paper, paper-bound notebooks, and even hard-bound notebooks that I took apart would usually have remaining bits of binding glue that would cause a paper jam. I would have to pull the page from the other end of the scanner to help it avoid jamming. Since then I've switched to using spiral-bound unruled notebooks with covers solid enough to keep the corners of the pages from curling due to wear and tear. The spiral binding ensured that I didn't have to deal with binding glue jams. Crisp flat pages also prevented jams due to curled corners.

    I scanned them in at 300 dpi in black and white using the text enhanced mode so that the contrast was adjusted automatically for better compression. Without this, the blank areas of a scanned page would be perceived as having some shade, and the scanned image would have some pixel dithering to represent the shade. This would cause difficulty for the compression algorithm and result in a large file size. With the text enhanced mode, the blank areas were perceived as being absolutely white, which would maximise the efficiency of the compression algorithm. This would result in much smaller file sizes. At first, I used the PaperPort software's ".MAX" proprietary file format, but I ended up converting them to LZW-compressed TIFFs so that I could open the documents on computers not equipped with PaperPort software.
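
    The effect of a shaded background on compression can be shown with a rough Python sketch. Everything here is an assumption for illustration: the "page" is synthetic, the shading is simulated as random grey noise rather than real scanner dithering, and `zlib` (Deflate) stands in for the LZW compression used in TIFF. The point is only that a uniformly white background compresses far better than a noisy one.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 200  # tiny synthetic "page", one byte per pixel


def synthetic_page(seed=0):
    """Build a greyscale page: a few dark strokes on slightly shaded paper."""
    rng = random.Random(seed)
    rows = []
    for y in range(HEIGHT):
        row = bytearray()
        for x in range(WIDTH):
            if y % 20 == 10 and 20 <= x < 180:
                row.append(rng.randint(0, 40))     # dark "ink" stroke
            else:
                row.append(rng.randint(215, 255))  # paper that is not quite white
        rows.append(bytes(row))
    return b"".join(rows)


def threshold(pixels, cutoff=128):
    """Force every pixel to pure black (0) or pure white (255)."""
    return bytes(255 if p >= cutoff else 0 for p in pixels)


noisy = synthetic_page()
clean = threshold(noisy)

noisy_size = len(zlib.compress(noisy, 9))
clean_size = len(zlib.compress(clean, 9))
print(f"shaded background: {noisy_size} bytes compressed")
print(f"pure white background: {clean_size} bytes compressed")
```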

    If the papers you need to scan are crisp uncurled pages without residual binding glue like that you find on pads, scanning will be a breeze. You can use a scanner with an automatic document feeder, because you won't have to worry about paper jams. Otherwise, you will have to scan each page manually. The Visioneer Strobe XP 450 PDF looks like a good one for this. If they do have curls or glue but are all of a uniform size, a flatbed would be your best bet, because you wouldn't have to worry about jams and would have to only manually set the cropping size just once. If the papers vary in size a great deal (say if you were scanning in a bunch of receipts of different lengths and widths) a sheet-feed scanner would be better because they crop the pages automatically, although you would have to worry about jams. At least the Visioneer ones do. There is another sheet-feed scanner for the Mac called the TravelScan 464M, but I don't have any experience with it, so I don't know if it automatically crops.

    I eventually decided that I would like to try scanning in greyscale, because although black and white was fine for printed text, I felt that it wasn't clear enough for handwriting and sketches. I knew that the file sizes would be larger, so I decided I would need to burn them onto DVD. I bought the first laptop to burn DVDs immediately when it first came out, which was the PowerBook with SuperDrive. To my disappointment, I found that Visioneer dropped support for the Mac when OS X was introduced, so I couldn't use their scanners. I got a legacy Visioneer Strobe Pro scanner on eBay, ordered the Mac OS 9 installation disk from Visioneer, and I tried installing the PaperPort software for System 9, wit

  240. White paper by lachlan76 · · Score: 1

    I think the white paper is so that (semi)transparent materials will come out white. Imagine that the background was black, and you had to manually photoshop all the black out of a taped together/irregular/smaller than scan size image.
    I could be wrong of course, but the white paper helps me, and I can't think of any other reason.

  241. Use a fax machine to scan large stacks of paper by vrmlguy · · Score: 1

    Fax machines typically have excellent sheet-feeders. Take your stack of papers, and fax them to a PC with fax software installed. This will create TIFF files. Then, "print" the files to PDFs.

    --
    Nothing for 6-digit uids?
  242. Forget Kinkos - how about 0.15/pg & 0.10/pg O by Anonymous Coward · · Score: 0

    go here : http://www.GNLServices.com

    Located in Ohio...can handle out of state jobs.

  243. High-speed scanner by ares284 · · Score: 1

    I have a similar task at a law firm. It's a pain, but having a high-speed scanner is a godsend. I use a Canon DR-5020. It's rather old, and only black and white, but it hardly ever clogs up, and is really quite fast. I can only imagine how much faster the newer models are. Scanning 100 pages at a time is nothing for this scanner (the more the merrier - there is nothing more annoying than scanning a big box of papers in which every 10 sheets or so is stapled together). Of course, I imagine the price tag is a bit hefty, but these scanners are quite easy to use. I rarely need to stop scanning due to paper jams, etc. In addition, it scans directly into Adobe Acrobat without a problem whatsoever. I scan at 300dpi b&w. I typically have around 5,000 - 10,000 pages in a document, usually ranging from 300MB to 500MB (we then burn them to CDs for storage - one CD for around 10-20 boxes full of paper, so it's quite a space saver, eh?). The last document I did was 311MB (PDF) with 6,870 pages.
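
    For a sense of scale, the figures above work out to roughly 46 KB per page. The short Python sketch below does that arithmetic and extrapolates to a 100-page set of lecture notes. The function names are made up, it assumes 1 MB = 1024 KB, and handwritten pages with diagrams may well compress worse than the legal documents these numbers come from.

```python
def kb_per_page(total_mb, pages):
    """Average size of one page in KB (assuming 1 MB = 1024 KB)."""
    return total_mb * 1024 / pages


def estimate_mb(pages, page_kb):
    """Estimated size in MB of a document with the given page count."""
    return pages * page_kb / 1024


# Figures quoted in the comment: a 311 MB PDF holding 6,870 pages.
average = kb_per_page(311, 6870)
print(f"average page: {average:.1f} KB")
print(f"100-page stack at the same rate: {estimate_mb(100, average):.1f} MB")
```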

    Check out Canon's high-speed scanners here.

    If you need a color high-speed scanner, may I suggest the DR-9080C.
    It claims to do "90 pages-per-minute (black-and-white or grayscale) and 50 pages-per-minute (color)."


    -Ares

  244. HP Digital Sender by Anonymous Coward · · Score: 0

    You want an HP Digital Sender. It will do color and grayscale scanning of one- and two-sided documents. It then converts the scanned documents to a PDF file and emails it to wherever you'd like. I use this thing daily at work in order to scan handwritten notes to post on my website. The only downside is these things start at about three thousand dollars.

  245. my 2 cents by bucket74 · · Score: 1

    While we're all giving suggestions here's mine -

    I do this at my job quite a bit... I recommend capturing the pages at 300 dpi grayscale and then converting/saving them as bitonal TIFFs. That way Acrobat (6.0 for sure, not certain about earlier versions) can automatically apply Group 4 compression to them (the compression used by faxes). This will reduce your file size tremendously. Converting the grayscale scans to bitonal is fairly simple...

    in Photoshop: first run auto contrast under the Image menu, then Image>Mode>Bitmap (options: 50% threshold, output resolution = 300 dpi).

    If you feel the quality is too low try capturing at a higher resolution (but still output to 300 in your bitonal conversion).
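
    What those two steps do to the pixels can be sketched in plain Python. This is a simplified stand-in, not Photoshop's actual algorithm: `auto_contrast` here is a plain min/max stretch (Photoshop's version also clips a small fraction of outliers), the function names are invented, and the sample row of pixel values is made up to show why the contrast step matters for faint writing on greyish paper.

```python
def auto_contrast(pixels):
    """Stretch grey values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image, nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def to_bitonal(pixels, threshold=0.5):
    """Map each grey value to 1 (white) or 0 (black) at the given threshold."""
    cutoff = threshold * 255
    return [1 if p >= cutoff else 0 for p in pixels]


# Faint pencil (about 150) on greyish paper (about 235): hypothetical values.
row = [150, 155, 230, 240, 235, 152]

print(to_bitonal(row))                 # without stretching, the pencil vanishes
print(to_bitonal(auto_contrast(row)))  # with stretching, the strokes survive
```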

    If you don't have that much to do this is perfectly practical to do yourself on a flatbed (don't forget black construction paper for the double sided scans). If you have more than a hundred, I'd recommend outsourcing to a service provider.

    *begin shameless sales pitch*

    Coincidentally, I'm a digital imaging project coordinator so anyone feel free to send me a PM if you have any work of this nature you'd like to farm out.

    *end shameless sales pitch*

  246. Taking this a bit further by NeverEnoughTime · · Score: 1

    Many comments are covering the hardware and software for a project such as the original question posed, but I'm curious: does anyone use this approach to digitize their general office documentation? How many IT managers out there actually scan in all the invoices, work orders, support requests, purchase orders, etc. into a digital database? Do you think it would even be worthwhile?

  247. UMI is your friend by Foozy · · Score: 1

    UMI does this as a business. They've even got your specific need covered for next time: XanEdu

    The company's XanEdu division provides electronic learning support materials for college students. Professors use XanEdu resources to create online CoursePacks, a unique way to provide comprehensive, up-to-date information packets to supplement other educational materials. Students access CoursePacks via the Web and download information as needed.

    I don't work there (any more).

  248. My experience: by DigitalSorceress · · Score: 1

    I used to scan/OCR typewritten hard-copy manuscripts for a technical publisher. I would get a 500+ page manuscript on Friday and have to deliver ASCII text files for all of it by Sunday evening.

    I used a Hewlett Packard 4C scanner with 50 page doc feeder.

    It wasn't too bad - drop in 50 pages, hit scan... go do something for 30 to 50 minutes, come back, repeat. (You aren't really asking about OCR, so I'll skip the gruesome details)

    Of all the scanners in the consumer market ($1000 or under ish), past and present, that I've tried I've always liked the HP 4C and its replacement series (6200c or somesuch?). They would handle about 1 page every 30 to 50 seconds with the doc feeder at the resolution I was using.
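
    As a back-of-the-envelope check on those feeder speeds, here is a small Python sketch. The function name is made up, and the estimate ignores reloading the feeder, jams, and any OCR time.

```python
def scan_minutes(pages, seconds_per_page):
    """Unattended scanning time in minutes at a steady per-page rate."""
    return pages * seconds_per_page / 60


# One 50-page feeder load, then a whole 500-page manuscript,
# at the fast (30 s) and slow (50 s) ends of the quoted range.
for pages in (50, 500):
    fast = scan_minutes(pages, 30)
    slow = scan_minutes(pages, 50)
    print(f"{pages} pages: {fast:.0f} to {slow:.0f} minutes")
```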

    --

    The Digital Sorceress
  249. Get stuffed by dzinegrp · · Score: 1

    ...or gimme your lecture recordings.

    Since professors love to hear themselves talk, perhaps they recorded or will record their lecture notes. Play these through a voice-text conversion application.

  250. This is a GREAT strategy! by crighton · · Score: 1

    Yes, yes, yes

    Even if it's not LaTeX (the tool doesn't REALLY matter), this is the way to go. It gets the students engaged with the material, they organize and format it in a way that works for them, and the job gets done. It's a strategy that benefits a lot of people.

  251. Common job for law firms by SpaceGhost · · Score: 1

    This is a common task for law firms. Typically in the discovery process you get tons of documents, and it is becoming more and more common to send those out to a vendor to get imaged as 300 dpi Group 4 TIFFs. There should be some vendors in your area that can do this and provide OCR services (IF it's machine-generated text; handwriting doesn't OCR). Look for firms advertising litigation support. It's also common to number each page for future reference; the vendor can do this and it can make it a lot easier to find things. iPro is a popular scanning software, and Ricohs are popular scanners. Tell the vendor you just want PDFs; that should be easy for them.

  252. OCR by Raven42rac · · Score: 1

    Have you looked into OCR software? You may have to do some cleaning up afterward, but the files would be much smaller because they would be characters instead of being basically pictures. I don't know if the software would be able to work well with the messy handwriting, but I would give it a shot. Look here for some more information.

    --
    I hate sigs.
  253. Use a digital camera by anvilsoup · · Score: 1

    How about taking a digital photo of each page? You might need a macro lens, or to take the photo from further away with zoom. Obviously a web cam won't give you enough resolution though.

  254. Workable Methods and Tools by Anonymous Coward · · Score: 0

    (1) Switch on TV, scanner, and get some kind of distraction from the mind-numbing endeavour.
    (2) SoftSnow.biz - Probably not as good for handwritten notes as it should be, but is very highly recommended amongst people I know who scan entire books (200 pages average)

    Alternatively, take a look at project gutenberg's distributed proofing tools.
    (3) HTML Tidy
    (4) Reproof XHTML manually / convert to other formats as needed.

    The slowest part is the manual proofreading.

  255. Scanning sub by x-guru · · Score: 1

    See if there is a local scanning sub-contractor that can handle that work en masse.

    These guys have large very fast machines that could scan all of your documents in just a few minutes, and I am sure the charge would be minimal.

    Best of Luck.

    --x

  256. general recommendations by Anonymous Coward · · Score: 0

    You've described a fairly average problem. Here at Penn State, our Electronic Reserves department scans course reserves using HP scanners with document feeders, Acrobat, and photocopies of pages. They build low res PDFs, tweaked for the express purpose of being displayed on the web. As I recall, they scan in grayscale at 200 dpi for text, and set Acrobat to reduce page size, use maximum page compression, etc.

    I work in the Preservation department and we use the Xerox Digipath system, which has high-speed black and white scanners and can build PDFs from them. Fast but highly expensive ($50,000 for the entire system) and the individual image files are stored in the Xerox proprietary file format.

    If you can get a Fujitsu scanner with your department funds, I would HIGHLY recommend that. They don't curl pages as they pull the documents through, and can get decent scanning speeds. Some models also scan both sides of the page at once. I would definitely recommend using web display settings in Acrobat when saving the files to PDF. I would, however, recommend using 300 dpi when scanning, just so you get decent images going into Acrobat. It's also the national guideline recommended by the Digital Library Federation for scanning anything in grayscale.

    I would also brief the professors who want these materials online that they're going to have to accept delays in getting those images up and loss of image quality. 100s of pages do not go up in a few hours, all cleaned up and OCRed. I wouldn't even bother with cleaning and OCR. The pages are being scanned so people can read them, not search through them. If the professors want more out of this, they should kick in money for it.

    And back up all of the PDFs to some media (CD, hard drive, etc.). A worthwhile investment so the pages don't have to be scanned a second or third time.

  257. obvious solution by xpyr · · Score: 1

    How about telling your professor to learn how to type? Buy a laptop for him and the savings will add up.