Ask Slashdot: State-of-the-Art In Amateur Book Scanning?

Posted by timothy on Sunday December 27, 2015 @07:14AM from the well-equipped-household dept.

An anonymous reader writes: I have a shelf full of books and other book-like things ranging from old to very old that I would like to turn into PDFs (or other similarly portable format), and have been on a slow-burn quest for the right hardware and method to do so on a budget. These are mostly sentimental — things handed down over generations, and they include family bibles, notebooks, and photo albums, as well as some conventional — published, bound — books from the late 19th and early 20th Century. None of them are especially valuable as antiques, as far as I know; my goals in preserving them are a) to make them available to other people in my family who are into genealogy or just nostalgia, and b) so I can read some of those old, interesting books (et cetera) without endangering them any more than it takes to scan them once. I was intrigued by the (funded, but not yet available) scanner mentioned earlier this year on Slashdot; it seems to do a lot of things right, but like any crowdfunded project, the proof is in the pudding, and the pudding hasn't yet arrived. It's also cheap, and that fits my household budget. What methods and hardware are you using to scan old documents? Any tips you have from a similar project, with regard to hardware, treatment of the materials being scanned, light sources, file formats, clean-up and editing tools, file-size-vs-resolution tradeoffs? In the end, I'm likely to err toward high-resolution scans, since they can be knocked down to size later if need be, but I'd be interested in hearing about what tradeoffs you've found to work for you.

One big question that I'd like to have answered: Is there stand-alone Free / Open Source software, or even just cheap software (I am mostly on Linux, by choice, but won't leap onto a sword to keep my Free Software purity) that makes for easy correction of the distortion introduced by camera-based imaging? If I could easily uncurl and keystone-correct pages, then a lot of input methods (even my phone) are suddenly much more attractive. My old Casio camera could do this 10 years ago, but I haven't found a free software desktop utility that lets me turn photos into nicely squared-up pages.
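
For anyone curious what keystone correction involves underneath, here is a minimal pure-Python sketch, not any particular tool's method. It assumes you already know the four corners of the page in the photo (the coordinates below are made-up example values) and only computes the projective mapping between corner sets; actually resampling the pixels would still need an imaging library, and page curl is a separate problem that a single flat mapping like this does not address.

```python
# Sketch only: the projective ("keystone") mapping defined by four corners.
# All function names and corner coordinates here are illustrative.

def solve_linear(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("corners are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return 8 coefficients h so that each src corner maps onto its dst corner."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h, point):
    """Map one (x, y) point through the coefficients from homography()."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Corners of a skewed page as seen in a photo (hypothetical pixel values),
# and the squared-up rectangle we want them to land on.
photo_corners = [(120, 80), (980, 140), (1040, 1500), (60, 1420)]
page_corners = [(0, 0), (1000, 0), (1000, 1400), (0, 1400)]
h = homography(photo_corners, page_corners)
```

When resampling an image you would normally compute the mapping in the other direction (page to photo) and look up a source pixel for every output pixel, which avoids holes in the result.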

122 comments

Don't! by For+a+Free+Internet · 2015-12-27 07:22 · Score: 1, Funny

Scanning is stealing from GOD

--
UNITE with the Campaign for a Free Internet because today, our future begins with tomorrow!
1. Re:Don't! by fustakrakich · 2015-12-27 07:29 · Score: 2, Funny
  
  Why is that being modded down? The copyright gods demand blood. This person is in violation.. Just read that little notice on the first or second page... "All rights reserved. No part of this book may be reproduced in any form..."
  
  --
  “He’s not deformed, he’s just drunk!”
2. Re:Don't! by Anonymous Coward · 2015-12-27 07:57 · Score: 0
  
  Well that's dubious, to say the least, to claim any universal morality from the letters on a page which may have never been legally binding in the first place.
  Copyright law is extremely complex, but suffice it to say: there are plenty of times when this is legal; just look at how many books Google has digitized and made freely available.
  Also, in some jurisdictions, it is still legal to make a personal backup copy of pretty much any media, even though this has been explicitly banned now in many countries.
3. Re:Don't! by ArmoredDragon · 2015-12-27 08:03 · Score: 2, Informative
  
  Why is that being modded down? The copyright gods demand blood. This person is in violation.. Just read that little notice on the first or second page... "All rights reserved. No part of this book may be reproduced in any form..."
  Nobody modded him down. He's from Gay Nigger Association of America. I'm not joking, or trolling, or trying to be racist, that's the actual name of the group he represents, more details about them here:
  https://en.wikipedia.org/wiki/...
  He gets down modded so much that all of his posts are -1 right out the gate.
4. Re:Don't! by Anonymous Coward · 2015-12-27 08:55 · Score: 0
  
  Why is that being modded down? The copyright gods demand blood. This person is in violation.. Just read that little notice on the first or second page... "All rights reserved. No part of this book may be reproduced in any form..."
  Because you are a fool. How many sock puppets do you have, anyway?
  Just look at yourself. What would your parents think?
5. Re:Don't! by Anonymous Coward · 2015-12-27 09:51 · Score: 0
  
  You the mod bomber?
6. Re:Don't! by NostalgiaForInfinity · 2015-12-27 10:47 · Score: 1
  
  "All rights reserved. No part of this book may be reproduced in any form..."
  It says that, but it's actually a lie. In the US, publishers cannot restrict fair use.
7. Re:Don't! by suutar · 2015-12-28 08:04 · Score: 1
  
  Remember though that "fair use" (in the US) is not a right, it's a defense. You're still infringing, you're just doing so in a way that results in no damages. If you can sustain the case long enough to convince the judge that the use is indeed fair, that is.
8. Re:Don't! by Anonymous Coward · 2015-12-28 11:55 · Score: 0
  
  Remember though that "fair use" (in the US) is not a right, it's a defense. You're still infringing, you're just doing so in a way that results in no damages. If you can sustain the case long enough to convince the judge that the use is indeed fair, that is.
  Actually, that depends on who you ask.
  If you ask an ethical lawyer, he or she will tell you that "fair use" is simply an exercise of rights arising under the 9th Amendment. It's merely the way in which the right to reasonable conduct manifests itself in copyright law. There's no need to admit an infringement of any other party's rights ("I have the right to do this, it's reasonable conduct, end of story"). Laws and precedents to the contrary are illegal violations of the highest law in the land, and thus irrelevant. Simple, straightforward, and easy. Ethical lawyers understand that the oaths they have sworn to uphold the Bill of Rights require this of them, they can't just pretend the 9th Amendment doesn't exist.
  If you ask an unethical lawyer, he or she will tell you that you have to admit to wrongdoing ("infringing") before you can claim the defense ("I was doing something wrong, but it's actually ok in the eyes of these fifty precedents which my lawyer will argue resemble the facts of this case, in great detail and at considerable length"). It's time consuming, complicated, and very expensive. The unethical lawyers see the 9th Amendment as a horrible, evil, no good, very bad thing. They're terrified that somebody will start asserting a strong right to ethical practice of law under the 9th Amendment, which will invalidate huge portions of how law is currently practiced. They find it convenient to pretend the 9th Amendment doesn't exist.
  Unfortunately, the unethical lawyers vastly outnumber the ethical ones. They pretty much control the selection process for high judicial positions, and that means nobody gets selected for those offices who will rock the ethics boat.
  The inmates are running the asylum, and this will continue to be the case until the public wakes up to the importance of having a strong say in legal ethics matters.
without endangering them? by Anonymous Coward · 2015-12-27 07:23 · Score: 0

A nineteenth-century book should cope with a good few readings unless it has been stored _really_ poorly. And if members of your family want to look at the books, why not lend them?
If you want to build/use a book scanner, so be it -- cool project! -- but a few not-very-valuable books isn't a reason :) Don't post-rationalise, just build.
1. Re:without endangering them? by fizzer06 · 2015-12-27 09:49 · Score: 1
  
  Search Google Books for free downloads. They might have already done the hard work on those older books.
2. Re:without endangering them? by Anonymous Coward · 2015-12-27 23:56 · Score: 0
  
  A nineteenth-century book should cope with a good few readings unless it has been stored _really_ poorly. And if members of your family want to look at the books, why not lend them?
  If you want to build/use a book scanner, so be it -- cool project! -- but a few not-very-valuable books isn't a reason :) Don't post-rationalise, just build.
  The submission seems like a rationalization for hoarding, not for reading.
  10 to 1 he's got a garage full of bikes / wagons to fix, too many cats, and 386's lying around as well.
Keep your books by AndyKron · 2015-12-27 07:24 · Score: 0

Just keep your books. They'll outlive you anyway, so why waste time digitizing them?
1. Re:Keep your books by Anonymous Coward · 2015-12-27 07:28 · Score: 0
  
  For other people to read? You don't want to be flipping through books from 150 years ago every day if you expect them to outlive you.
  Also, there is a point in most people's lives that they decide to do things like this for longevity.
2. Re:Keep your books by Anonymous Coward · 2015-12-27 07:31 · Score: 0
  
  I find it difficult to search a paper book.
3. Re:Keep your books by Anonymous Coward · 2015-12-27 07:32 · Score: 0
  
  Sheesh, how is this answer helpful? He's not saying he wants to get rid of his paper books, he just wants to scan them. Maybe he wants to read them on his tablet, or something.
  To OP:
  Since it's obvious you won't find any intelligent answer here, I would recommend you ask your question on http://www.diybookscanner.org/.
4. Re: Keep your books by Anonymous Coward · 2015-12-27 07:35 · Score: 0
  
  Try the Microsoft Office Lens app.
5. Re:Keep your books by amiga3D · 2015-12-27 07:46 · Score: 4, Interesting
  
  I collected paperback and hard cover books for almost 5 decades. I had storage bins in the attic and garage full of them. They all went to a "friends of the library" benefit sale. About two thousand books gone freeing up space and now I have more than that amount on a hard drive and about 200 on my phone. No more dead tree books and magazines for me. I can pull up something to read any time and any place. Technology is fabulous.
6. Re:Keep your books by AndyKron · 2015-12-27 07:59 · Score: 1
  
  I will admit reading a digital book is easier now that I'm 5+ decades old.
7. Re:Keep your books by Anonymous Coward · 2015-12-27 11:36 · Score: 0
  
  By all means keep your books, but as an accessible-anywhere, roughly searchable (OCR isn't perfect, but it can be OK enough for search), show-your-friends-and-family alternative access medium, PDFs are a great addition.
  It's also perfect for those books that you have no particular attachment to but are keeping because you think you might want to look up subject x sometime. I had a bunch of physics texts like this - not really my area (solid state physics mostly), no particular historical significance, age or quality, but marginally interesting and possibly something I might want to refer to. Scanned the lot, sold and gave them away at my nearest uni, now have less clutter but can still access the content - win win.
8. Re:Keep your books by Anonymous Coward · 2015-12-27 18:49 · Score: 0
  
  I don't. Table of contents gives a coarse-grain breakdown by subject, and the index gives you the fine-grain search. And being designed by a human with author feedback it gives more relevant references than a simple text find. Corner fold and bookmarks for future references.
9. Re:Keep your books by Anonymous Coward · 2015-12-28 03:13 · Score: 0
  
  Indeed, why respond at all if you just want to tell the questioner not to use technology?
10. Re:Keep your books by Anonymous Coward · 2015-12-28 07:52 · Score: 0
  
  Ahh, the Slashdot Luddite.
  They seem to have multiplied to the point that they've nearly overrun the place.
  Completely useless wastes of flesh, they insist on posting their useless drivel in response to anyone asking a technical question.
Already exists commercially:Fujitsu ScanSnap SV600 by Anonymous Coward · 2015-12-27 07:27 · Score: 0

The Indiegogo project is just a clone of something that already exists:
http://scanners.fcpa.fujitsu.com/scansnap11/features_sv600.html
Project? by RandomUsername99 · 2015-12-27 07:32 · Score: 4, Informative

If you're looking for a project, what we use at my university library to scan some of the rarest and most delicate books on the planet, is definitely achievable at home. It's simply a table with interchangeable wedge shaped foam pieces, and a rack above with two cameras pointing down. Since the book is on a v cradle, the pages lay flat. You can change the angle and position of the cameras to point squarely at the pages. There's a pedal that will snap a picture with both cameras at once, so once you've got it set up, all you need to do is flip the pages and hit the pedal. You might need to readjust if the book is particularly thick, but that's all pretty intuitive once you're used to the setup.
1. Re:Project? by Anonymous Coward · 2015-12-27 08:59 · Score: 0
  
  Funny - I googled 'V Cradle Book' and got plenty of meaningful hits. Your Google-Fu is weak old man.
2. Re:Project? by Anonymous Coward · 2015-12-27 09:03 · Score: 0
  
  Actually, he/she is perfect for the forum of our Slashdot /.
  CAP === 'subsume'
3. Re:Project? by Anonymous Coward · 2015-12-27 09:04 · Score: 2, Informative
  
  If you're looking for a project, what we use at my university library to scan some of the rarest and most delicate books on the planet, is definitely achievable at home. It's simply a table with interchangeable wedge shaped foam pieces, and a rack above with two cameras pointing down. Since the book is on a v cradle, the pages lay flat. You can change the angle and position of the cameras to point squarely at the pages. There's a pedal that will snap a picture with both cameras at once, so once you've got it set up, all you need to do is flip the pages and hit the pedal. You might need to readjust if the book is particularly thick, but that's all pretty intuitive once you're used to the setup.
  Probably an Atiz BookDrive. Yes, it is possible to homebrew one; Instructables has directions but it's a lot of work.
  The OP's best bet, really, is to Google for "non-destructive book scanning service" and find one to do it for him professionally.
4. Re:Project? by Marxist+Hacker+42 · 2015-12-27 09:15 · Score: 1
  
  The. Very. First. Link. On. Google. http://pro.atiz.com/
  
  --
  SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
5. Re:Project? by Anonymous Coward · 2015-12-27 09:23 · Score: 0
  
  Awww, the AC is butthurt because he's not very good at google.
  https://www.google.com/search?q=V Cradle Book
6. Re:Project? by RandomUsername99 · 2015-12-27 09:42 · Score: 0
  
  That's why I said "If you're looking for a project." If the OP wasn't looking for a project, then a service would be a better fit than a project. The OPs focus on DIY methodologies indicated he was looking for a project.
  We have a number of commercial ones, including a few dozen BookEye units for quick scan-and-deliver jobs, and some BookDrives for more delicate work, but one of the preservation guys made a couple by hand that are still regularly used.
7. Re:Project? by Anonymous Coward · 2015-12-27 09:47 · Score: 0
  
  Ladies and gentlemen.... One of those idiots that has no idea what GOOGLE is for.
8. Re:Project? by RandomUsername99 · 2015-12-27 10:14 · Score: 1, Funny
  
  I know some people have already criticized the obtuseness of your comment, so I'm going to try and turn this around to make it a more positive experience by saying some positive things about your contribution to the conversation.
  1) You figured out what search engine to use to get more information without me having to specify it, and that's pretty great.
  2) You read at least some of my comment, because you picked up on key concepts like "V-Cradle", which was given that rather obvious name both by the shape of two wedges together, and the shape of a partially opened book.
  3) You almost used sentences, and that shows a pretty reasonable effort with basic communication skills.
  4) Even though you missed the distinction between my comment, which was supposed to be an industry-informed idea proposal, rather than a complete step-by-step solution to this problem, I'm pretty sure that you've got the aptitude to make such distinctions in the future, and improvement is always exciting.
  Have a great day!
9. Re:Project? by Anonymous Coward · 2015-12-27 19:02 · Score: 0
  
  Another explanation for the lazy AC poster. You don't connect the foot pedal to the cameras -- the foot pedal alerts the SOFTWARE to capture the video image from the cameras and add them to the document. The cameras could be web cams, DSLRs, GoPros, cell phones, or any other type of high quality camera with a real time video output.
  
  .
10. Re:Project? by Anonymous Coward · 2015-12-28 03:17 · Score: 0
  
  I saw this link in one of the comments above, and when I went to the site, I believe the scanner there has a V-cradle.
  http://www.diybookscanner.org/
diybookscanner.org forum by tebee · 2015-12-27 07:32 · Score: 4, Informative

I would suggest you look here http://www.diybookscanner.org/...
I'm planning to do much the same thing as you myself, but I've still not decided how to do it and other things have been occupying my attention recently, so I've not kept up with developments for a year or so.
There are plenty of ideas there and suggestions for software and workflows that will do what you want .

--
N.B. this user is far too lazy to write a witty and intelligent sig.
1. Re:diybookscanner.org forum by Anonymous Coward · 2015-12-27 08:38 · Score: 0
  
  I love the idea, as it originally started out being about cheap book scanners. However, the latest kit is $1,200! Quite a bit beyond my budget. I know there's a lot of kits on the forums where people have built a scanner for much less. I would prefer a kit like OP talked about though.
2. Re:diybookscanner.org forum by Anonymous Coward · 2015-12-27 12:19 · Score: 0
  
  Agreed, it's a great resource. A huge variety of hardware builds on the forum, everything from cardboard up to laser-cut plywood and even some efforts at automatic page turning. You can build yourself or buy a kit. With glass to flatten the pages and good lighting, you can get very good quality scans quite quickly. I haven't kept up perfectly, but there are at least two Linux software packages, Scan Tailor and Book Scan Wizard, that help with the work of cleaning up the images, and there are also some efforts at software that assembles the results into an ebook or PDF.
3. Re:diybookscanner.org forum by sweet+'n+sour · 2015-12-27 13:26 · Score: 1
  
  I've built a book scanner from that site a number of years ago and it has worked well enough for what I needed.
  
  The real problem isn't the hardware though, it's the multiple programs needed to process the images and get everything into a small text searchable pdf file afterwards.
  
  To give you an idea, my workflow usually starts by importing all the left pages into Lightroom, process for things like correcting blacks and whites, keystoning, skewing, and cropping, and then I export everything as jpg files. I then repeat for the right pages. This has to be done separately because each camera sees the book from a different angle -- lighting is usually different, keystoning will be different, and even the distance of the camera to the page has to be taken into account for correct cropping. After that, I run a perl script to combine the left and right pages so that they're numbered sequentially, and then finally import into Adobe Acrobat Pro to make text searchable pdf files. I've tried all other OCR software, and Acrobat has them all beat. If there is color in the images the pdf file will be HUGE. I've scanned some of my son's books for school that were in color and attempting to view them on an iPad 3 was folly.
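  
  The renumbering step is simple enough to sketch. This is not the Perl script mentioned above, just a minimal Python illustration of the same idea, assuming each camera's exports sort correctly by name and that the file names (left_0001.jpg and so on) are placeholders. It only builds a rename plan; nothing on disk is touched.

```python
def interleave_pages(left_files, right_files, prefix="page_", width=4, ext=".jpg"):
    """Pair every source file with a sequential output name.

    Output order is left 1, right 1, left 2, right 2, ... so that a plain
    alphabetical listing of the new names matches the reading order.
    """
    if len(left_files) != len(right_files):
        raise ValueError("left and right page counts differ")
    plan = []
    number = 1
    for left, right in zip(sorted(left_files), sorted(right_files)):
        for source in (left, right):
            plan.append((source, f"{prefix}{number:0{width}d}{ext}"))
            number += 1
    return plan


# Hypothetical export names from the two cameras.
plan = interleave_pages(
    ["left_0002.jpg", "left_0001.jpg"],
    ["right_0001.jpg", "right_0002.jpg"],
)
```

  Each (source, target) pair in the plan can then be handed to something like shutil.copy, which keeps the per-camera originals intact in case the numbering needs to be redone.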
  
  There is a program called Scan Tailor that also helps process images. It does a decent job of finding the borders of the pages for auto-cropping, and attempts to correct skewed pages, however it requires looking through each page to make sure it's found everything correctly. Too often I'll find it crops incorrectly, missing things like page numbers in the corners of the page. http://scantailor.org/ When I'm looking to make the smallest PDF files possible, I'll use this after Lightroom.
  
  This Indiegogo campaign looks to make this whole thing a lot simpler (Czur Scanner): https://www.indiegogo.com/proj...
  
  It's apparently an all-in-one solution with hardware and software. The video shows it doing black and white well enough, but I question how well it will deal with color (they don't show any demos of color books). Seemed good enough to purchase (I did so), even if only for black and white, and simpler than the DIY setup I've been using.
4. Re:diybookscanner.org forum by sunhou · 2015-12-27 14:30 · Score: 1
  
  I built a primitive single-camera scanner using a cardboard box, a piece of glass, and a point-and-shoot camera and tripod I had handy, after reading that site. It was a pain lifting the glass to turn the page, and I spent a lot of time trying different lights (and locations of lights), but it worked well enough that I decided to pursue it seriously.
  I started looking at building one of the better scanners using plans on that site. But after a lot of time thinking about it, and reading about the many decisions that went into the Archivist scanner kits, I finally decided to just buy one of their kits. Yes, it was $1200, but in the end I decided that just getting good quality wood, glass, and lights wouldn't be all that cheap, and I don't have a lot of free time, so it was worth it to me.
  I got a couple of refurbished cameras from Canon, and a Raspberry Pi to drive the cameras. I use the SpreadPi software, and connect to it via a web browser on my laptop (I don't have room for a monitor/keyboard near the scanner, but can lay my laptop on a nearby bed), and a super-cheap foot pedal that I use to trigger the cameras.
  I can scan about 1300 pages/hour (about twice as fast as when I started several months ago). SpreadPi then lets me download a tar file containing the JPG images (from the web browser). I then spend about 2 minutes per book opening a few pages in GIMP (the front cover, back cover, one even page, and one odd page) to determine cropping regions, then use a Perl script that calls ImageMagick to crop and rescale everything and stitch into a PDF. I reduce the image size a bit to reduce file size without compromising readability much. I also convert to a grayscale colormap for B&W books to further reduce file size.
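  
  A rough sketch of what that crop/rescale/stitch step could look like, written in Python rather than Perl and not the actual script. It assumes the ImageMagick 6-style convert command (newer installs may use magick instead), and the crop rectangles, the 60% scale, and the file names are placeholder values. The function only builds the command lines so they can be inspected before anything runs.

```python
def build_commands(pages, even_crop, odd_crop, scale_percent=60,
                   grayscale=True, pdf_name="book.pdf"):
    """Return ImageMagick command lines: one per page, then one for the PDF.

    Each crop is (width, height, x_offset, y_offset) in pixels, as read off
    a sample even page and a sample odd page in an image editor.
    """
    commands, outputs = [], []
    for index, source in enumerate(pages, start=1):
        width, height, x, y = even_crop if index % 2 == 0 else odd_crop
        target = f"out_{index:04d}.jpg"
        command = ["convert", source,
                   "-crop", f"{width}x{height}+{x}+{y}", "+repage",
                   "-resize", f"{scale_percent}%"]
        if grayscale:
            command += ["-colorspace", "Gray"]  # smaller files for B&W books
        command.append(target)
        commands.append(command)
        outputs.append(target)
    # A final convert call with several inputs and a .pdf output stitches them.
    commands.append(["convert", *outputs, pdf_name])
    return commands


commands = build_commands(
    ["a.jpg", "b.jpg"],
    even_crop=(1500, 2200, 40, 60),
    odd_crop=(1500, 2200, 120, 60),
)
```

  Each entry can be passed to subprocess.run(command, check=True). Covers would need their own crop rectangles, which this sketch leaves out.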
  I save all the original JPGs because I expect later I may re-do the post-processing in a better way. E.g. for now I am not doing OCR. I'm mostly scanning children's books and math books, in preparation for an extended international trip. I didn't want to haul a bunch of my kids' books with me, nor the part of my library I need for my work. But for now, the 2 minutes or so of attention per book I need to devote (after physical scanning) makes this not feel like a chore.
5. Re:diybookscanner.org forum by loyukfai · 2015-12-27 16:27 · Score: 1
  
  Is this primitive enough? : )
  http://byfai.com/content/diy-b...
6. Re:diybookscanner.org forum by loyukfai · 2015-12-27 16:29 · Score: 1
  
  Yes, definitely, it took me quite some time to set up the software environment. But that was a few years ago.
  If anyone is interested in the story, it's here:
  http://diybookscanner.org/foru...
  The hardware setup:
  http://byfai.com/content/diy-b...
At home is as at home does by djupedal · 2015-12-27 07:34 · Score: 2

Sit down with transcription software and read those books aloud. Done.
1. Re:At home is as at home does by BenBoy · 2015-12-27 07:41 · Score: 5, Funny
  
  Just describe the illustrations ... common wisdom is that they'll be about a thousand words each ;-)
2. Re:At home is as at home does by PNutts · 2015-12-27 08:29 · Score: 2
  
  Sit down with transcription software and read those books aloud. Done.
  "Low texting then that we will require what what was asking but our answer will be okay." - Charles Dickens, A Tale of Two Cities
3. Re: At home is as at home does by Anonymous Coward · 2015-12-27 14:40 · Score: 0
  
  Hmm, what was the original quotation? The only one I know is far far better, though I much prefer A Sale of Two Titties by Darles Chickens.
4. Re: At home is as at home does by DarkVader · 2015-12-28 08:59 · Score: 1
  
  I've not heard of that one, only the Edmund Welles version.
Re:Already exists commercially:Fujitsu ScanSnap SV by WolphFang · 2015-12-27 07:35 · Score: 1

It's NOT cheap. Seems rather Windows Centric.

--
leather-dog muksihs
Blog: @muksihs
K.I.S.S : unbind the book, scan the pages by sanf780 · 2015-12-27 07:39 · Score: 1

This method is destructive, as you are removing the pages from the book. However, it gets you an adequate scan without the need of controlled ambient light or running transformations from a photo such that the page seems to be flat. Any other method is complicated, so expect to invest a lot of time.
do it right by sribe · 2015-12-27 07:48 · Score: 2

Is this out of your budget? Buy one, sell it on eBay when you're done. Anything else, you'll just be wasting huge amounts of your time.
1. Re:do it right by Anonymous Coward · 2015-12-27 08:36 · Score: 0
  
  Is this out of your budget? Buy one, sell it on eBay when you're done. Anything else, you'll just be wasting huge amounts of your time.
  The Fujitsu SV600 is low res compared to a modern flatbed scanner. He'd be better off using a high quality 10MP+ digital camera to take the scans and post-processing them using software that's dedicated to that kind of thing.
2. Re:do it right by zAPPzAPP · 2015-12-27 12:53 · Score: 1
  
  Does that matter?
  The pictures only have to be high enough res to reliably enable OCR, anything above that is irrelevant.
3. Re:do it right by sribe · 2015-12-27 14:29 · Score: 1
  
  The Fujitsu SV600 is low res compared to a modern flatbed scanner. He'd be better off using a high quality 10MP+ digital camera to take the scans and post-processing them using software that's dedicated to that kind of thing.
  I see simple math is beyond the grasp of this AC ;-)
4. Re:do it right by ortholattice · 2015-12-27 16:39 · Score: 1
  
  Does that matter?
  The pictures only have to be high enough res to reliably enable OCR, anything above that is irrelevant.
  
  It might not matter if you are scanning novels to produce ebooks, but for technical works with equations you want to see the actual text layout in a pdf, and for small subscripts less than 300dpi (preferably 400) is a no-go.
5. Re:do it right by sribe · 2015-12-27 18:18 · Score: 1
  
  It might not matter if you are scanning novels to produce ebooks, but for technical works with equations you want to see the actual text layout in a pdf, and for small subscripts less than 300dpi (preferably 400) is a no-go.
  That's true. (Heck, I scan everything at 600.) But 1) this scanner is not less than 300dpi, and 2) 10MP is barely over 300dpi. So the grandparent post was still completely silly ;-)
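  
  The arithmetic behind the "barely over 300dpi" remark can be checked in a few lines. The numbers here are assumptions, not from the post: a 3648 x 2736 pixel frame (a typical 10MP 4:3 sensor) and a US Letter page (11 x 8.5 inches) filling the whole frame, which is the best case.

```python
def effective_dpi(pixels_long, pixels_short, page_long_in, page_short_in):
    """Pixels per inch on the page when the page exactly fills the frame.

    The lower of the two axis values is the one that limits detail.
    """
    return min(pixels_long / page_long_in, pixels_short / page_short_in)


# Assumed 10MP frame and assumed Letter-sized page.
dpi = effective_dpi(3648, 2736, 11.0, 8.5)
print(round(dpi))  # about 322
```

  Any margin left around the page in the frame lowers the figure further, while a smaller page filling the same frame raises it.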
6. Re:do it right by Anonymous Coward · 2015-12-28 05:35 · Score: 0
  
  The Fujitsu SV600 is low res compared to a modern flatbed scanner. He'd be better off using a high quality 10MP+ digital camera to take the scans and post-processing them using software that's dedicated to that kind of thing.
  I see simple math is beyond the grasp of this AC ;-)
  I see you have no idea what "+" means. Seems like a far more fundamental failure to me.
pdf by Threni · 2015-12-27 07:55 · Score: 1

"I would like to turn into PDFs (or other similarly portable format)"
What is it about PDF files which you think makes it portable? You'd be better off with PNG format.
1. Re:pdf by amiga3D · 2015-12-27 07:59 · Score: 2
  
  If he wants to put it in a reader he'll need to use a book format such as epub or mobi. PDF would work as well but I think there are better choices nowadays.
2. Re:pdf by sehlat · 2015-12-27 08:07 · Score: 1
  
  If he wants to put it in a reader he'll need to use a book format such as epub or mobi. PDF would work as well but I think there are better choices nowadays.
  PDF would NOT work as well. PDFs do not scale well. You end up either having to scroll through the "pages," which distracts from the reading, or you end up trying to read stuff sized for "bigger than your tablet" in text shrunk to fit the page.
  "Oh you need little teeny eyes
  For reading little teeny print
  Like you need little teeny hands
  For milking mice!"
3. Re:pdf by amiga3D · 2015-12-27 09:39 · Score: 1
  
  I've managed to read PDFs in an eReader but it is not the best choice obviously. I have converted all my books to ePub but magazines have pictures which complicates things. Usually I read the magazine PDFs on a laptop. A 15" screen is best for those.
4. Re:pdf by Anonymous Coward · 2015-12-27 12:30 · Score: 0
  
  Most (well, practically every) OS I can think of has PDF viewing software available, usually multiple options. It also bundles all of those images into something that acts like an ebook, and if you can be bothered running OCR software over it you can have searchable text hidden behind those nice looking scanned pages, which is very convenient.
  It may not be perfect, but it is pretty darn good.
5. Re:pdf by Threni · 2015-12-29 06:41 · Score: 1
  
  It's kind of because I use a kindle all the time now that I mentioned the subject. Yes, they support PDF format but they're horrific and almost unusable. There's no reason I can think of to want to create them. Reading them is exactly the same as reading an overly large PNG file; you have to zoom in/out, scroll around etc. It's just a UI disaster.
Decent hardware and software available by Anonymous Coward · 2015-12-27 08:04 · Score: 0

I purchased one of these kits when they're periodically available:
http://www.diybookscanner.org/forum/viewtopic.php?t=2593
It's slightly a pain to put together, but really only took an afternoon with a power drill. As far as software, I use a combination of ScanTailor (http://scantailor.org/) and Tesseract (https://github.com/tesseract-ocr/tesseract). I like ScanTailor, but I've never been able to get it fully automated, so I still have to spend time manually fixing pages. Tesseract has also been flaky in the past with its pdf generation. Basically, it used to modify the image, which I hated, but I think the newer versions stopped doing that. At least, I was able to finally get a version that stopped doing that.
Anyway, from my experience, I've been happy with what I've been able to do, but the process is not fully automated. Though, I find it fast enough.
1. Re:Decent hardware and software available by Anonymous Coward · 2015-12-27 08:07 · Score: 0
  
  I guess I forgot to include the full workflow:
  1. Take pictures using the above DIY book scanner
  2. Convert images to tiffs and run a script that interlaces the file names. Basically, you want a lexicographic ordering of the pages in the correct order. Technically, ScanTailor can reorder things, but you'll hate yourself. Do it here.
  3. Use ScanTailor to fix the images
  4. Use tiffcp to combine all of the processed tiffs into a single tiff
  5. Use tesseract to OCR the tiffs *and* combine them into a single pdf
  Again, not fully automated and not completely pretty, but it's really not that bad and I've been extremely happy with the result.
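  
  Steps 4 and 5 can be sketched as follows. This is an illustration, not the poster's script: the file names are placeholders, and it assumes tiffcp (from libtiff) and tesseract are on the PATH. It also shows why step 2 insists on lexicographic ordering: a plain sort puts page_10 before page_2 unless the numbers are zero-padded.

```python
def ocr_pipeline(tiff_pages, combined="book.tif", output_base="book"):
    """Return the two command lines for steps 4 and 5 without running them."""
    ordered = sorted(tiff_pages)  # relies on zero-padded names from step 2
    return [
        # tiffcp: all inputs first, the multi-page output file last.
        ["tiffcp", *ordered, combined],
        # tesseract: input image, output base name, then the "pdf" config,
        # which writes output_base + ".pdf" with a searchable text layer.
        ["tesseract", combined, output_base, "pdf"],
    ]


# Unpadded names sort in the wrong order; padded names sort correctly.
unpadded = sorted(["page_2.tif", "page_10.tif"])
padded = sorted(["page_0002.tif", "page_0010.tif"])

steps = ocr_pipeline(["page_0010.tif", "page_0002.tif", "page_0001.tif"])
```

  Each entry in steps can be run with subprocess.run(command, check=True) once the names look right.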
The proof is not in the pudding by John+Jorsett · 2015-12-27 08:06 · Score: 3, Interesting

I'm not much of a grammar Nazi, but I'm seeing this error everywhere now and I'm afraid it'll become the norm. The saying is, "The proof of the pudding is in the eating," which makes a lot more sense when you think about it.
1. Re:The proof is not in the pudding by Anonymous Coward · 2015-12-27 08:47 · Score: 0
  
  If the proof /is/ in the pudding, you just have to dig for it...
2. Re:The proof is not in the pudding by PNutts · 2015-12-27 09:51 · Score: 2
  
  I'm not much of a grammar Nazi, but I'm seeing this error everywhere now and I'm afraid it'll become the norm. The saying is, "The proof of the pudding is in the eating," which makes a lot more sense when you think about it.
  Too late. It originated in the 20's and became common in the 50's.
  https://en.wiktionary.org/wiki...
3. Re:The proof is not in the pudding by Anonymous Coward · 2015-12-27 09:55 · Score: 0
  
  Yep, this is the old sense of "proof" meaning "test." If people just said, "the real test of the pudding is in the eating," they'd understand more easily.
  Of course, Americans would probably have to say "cake" or "pie" instead of "pudding," since they seem to think it means "custard."
4. Re:The proof is not in the pudding by RockDoctor · 2015-12-28 10:31 · Score: 1
  
  That's a US usage, and you know how often they get our language wrong.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
unpaper is the GPL software for curls, etc by raymorris · 2015-12-27 08:09 · Score: 5, Informative

The software piece you mentioned for turning scans into nice clean rectangles exists as "unpaper". Here's one fork: https://www.flameeyes.eu/proje...
The people who have bothered to fork and improve unpaper probably did so because they did a project similar to yours, so you might ask them about other tips and resources.
As someone else said, while pdf is convenient for READING a book, it's not a particularly great format for archiving a collection of images which you may want to convert to another format later. There are several good grayscale image formats to choose from. To order those images into a cohesive document, perhaps with separate chapters, one could produce html via a tiny Perl or shell script. That would preserve the images in their native format for later conversion as needed in the future.
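For what it's worth, here is a minimal sketch of that "tiny script" idea, written in Python rather than Perl or shell. The function name, the chapter structure, and the file names are all hypothetical; the point is only that the page images stay untouched and the HTML just puts them in order.

```python
from html import escape
from urllib.parse import quote


def build_index(title, chapters):
    """Build one HTML page that shows page images in order, grouped by chapter.

    `chapters` is a list of (chapter_title, [image_file_names]) tuples.
    """
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        '<meta charset="utf-8">',
        f"<title>{escape(title)}</title>",
        "</head>",
        "<body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for chapter_title, images in chapters:
        parts.append(f"<h2>{escape(chapter_title)}</h2>")
        for name in images:
            # quote() keeps spaces and odd characters in file names valid as URLs.
            parts.append(
                f'<p><img src="{quote(name)}" alt="{escape(name, quote=True)}"></p>'
            )
    parts += ["</body>", "</html>"]
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    page = build_index("Family Bible", [("Genesis", ["p001.png", "p002.png"])])
    with open("index.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

Drop the resulting `index.html` next to the images and any browser becomes the reader.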
buy a scanner by pz · 2015-12-27 08:14 · Score: 3, Interesting

Keystoning is easy to correct in Gimp. But that's going to be pretty labor intensive, and you really would want something automatic. I'd follow what others have said and buy one of the better products, like a professional scanner, and re-sell it once you're done. You can buy the ScanSnap SV600 (which everyone else seems to be recommending) for under $600 -- is that budget-friendly for you? If not, have you looked into renting such a device, or using one at a local library?
As an analogy, if you wanted to scan the old family slides, then the way to go is to buy a used Nikon pro-level slide scanner, do your stuff, and re-sell it with nearly zero loss, with the understanding that you're putting the couple of thousand dollars of purchase price at risk. I'm in the midst of doing exactly that, although given the number of slides I have to scan, I bought the scanner with the expectation that it will be a full write-off, and that's the price of not risking loss of family heirlooms by shipping the slides somewhere to have a minimum-wage flunky do the scanning.

--

Put my fist through my alarm clock with its ding-dong death inside my ear. - The Blackjacks.
1. Re:buy a scanner by Anonymous Coward · 2015-12-27 16:14 · Score: 0
  
  ImageMagick is my preferred tool for batch image processing like this, although gimp and others may have scriptable interfaces. If you can scan your pages consistently, whether with a flatbed or more probably a camera, you can handle all the cropping, keystone correction, rotation, etc. with a simple IM script. For example, I bulk-"scan" small photo prints with my DSLR on a tripod, then use an IM one-liner to crop and rotate them all the same way. The requirement is that the size of the images and the light do not change for the batch, and that you don't bump the camera. I even scan whole sheets of mounted slides and use this script to crop out each little picture for indexing. Nothing fancy, no edge detection or machine vision, just common ImageMagick commands: tweak some parameters to get it right on a sample from the batch, then run it.
  https://github.com/Fasrad/photopticon
Squaring up photos of pages by Anonymous Coward · 2015-12-27 08:16 · Score: 1

ImageMagick can do (most of) what you want for squaring up photos of pages. It is free/open source software. I'm not sure that I'd describe it as "easy" though: you would have to manually mark out the fixes required for each page.
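To give an idea of what "manually mark out the fixes" can look like: ImageMagick's `-distort Perspective` takes pairs of "where a point is now" and "where it should end up". Below is a hedged Python sketch that only builds the command line; the helper name, file names, and corner coordinates are made up, and you would read the real corners off the photo yourself.

```python
def perspective_command(src, dst, corners, width, height):
    """Build an ImageMagick command that squares up one photographed page.

    `corners` holds the four page corners as (x, y) pixel positions in the
    photo, ordered top-left, top-right, bottom-right, bottom-left. They are
    mapped onto a width x height rectangle at the top-left of the canvas,
    which is then cropped out.
    """
    if len(corners) != 4:
        raise ValueError("expected exactly four corner points")
    targets = [
        (0, 0),
        (width - 1, 0),
        (width - 1, height - 1),
        (0, height - 1),
    ]
    # -distort Perspective wants "srcX,srcY dstX,dstY" for each control point.
    pairs = " ".join(
        f"{sx},{sy} {tx},{ty}" for (sx, sy), (tx, ty) in zip(corners, targets)
    )
    return [
        "convert", src,
        "-distort", "Perspective", pairs,
        "-crop", f"{width}x{height}+0+0", "+repage",
        dst,
    ]


if __name__ == "__main__":
    cmd = perspective_command(
        "page_0001.jpg", "page_0001_flat.png",
        [(120, 80), (2900, 60), (3000, 4000), (90, 4100)],
        2400, 3600,
    )
    print(" ".join(f"'{part}'" if " " in part else part for part in cmd))
```

With ImageMagick installed, `subprocess.run(cmd, check=True)` would execute it; on ImageMagick 7 the program is called `magick` instead of `convert`. Keep the target size no larger than the photo, since the distorted image keeps the original canvas size before the crop. This fixes keystoning only, not page curl.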
1. Re:Squaring up photos of pages by BetterSense · 2015-12-27 16:25 · Score: 1
  
  If you scan the pages in batches consistently, the crop and rotate parameters will be the same for all the pages in a batch. Then you just have to tweak the parameters once per batch. If you use a tripod and don't bump the camera, the batches can be big.
  
  I wrote a script to do this exact thing. You put the crop, rotate, etc. parameters into a text file which you can save along with the batch of images. It also supports multicropping each image so that you can use it to build indexes of negative and slide pages, which is why I really wrote it. I can take pictures of dozens of Printfile pages of slides or negatives and auto-crop out the individual images to build an index of low-res images for browsing. I also use it for batch-scanning photo prints at full resolution, where it just trims the edges off, fixes slight rotation, etc.
  
  https://github.com/Fasrad/photopticon
  
  The IM crop command basically does all the work. It is a very powerful program.
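For readers who just want the gist of the "parameters in a text file" approach, here is a small Python sketch. To be clear, this is not the photopticon script and not its file format; the `key = value` layout, the key names, and the output folder are assumptions for illustration. The `-rotate`, `-crop`, and `+repage` options are standard ImageMagick.

```python
def parse_params(text):
    """Parse 'key = value' lines; '#' starts a comment. Hypothetical format."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()
    return params


def batch_commands(params, files, out_dir="cropped"):
    """Return one ImageMagick command per file, all using the same parameters."""
    commands = []
    for name in files:
        cmd = ["convert", name]
        if "rotate" in params:
            cmd += ["-rotate", params["rotate"]]
        if "crop" in params:
            # +repage drops the leftover canvas offset after cropping.
            cmd += ["-crop", params["crop"], "+repage"]
        cmd.append(f"{out_dir}/{name}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    settings = parse_params(
        "rotate = -0.7\n"
        "crop = 2400x3600+310+200  # page area, measured on one sample shot\n"
    )
    for cmd in batch_commands(settings, ["img_0001.jpg", "img_0002.jpg"]):
        print(" ".join(cmd))
```

Tune the two numbers on one sample image, save the text file with the batch, and the same commands apply to every shot taken before the camera was moved.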
2. Re:Squaring up photos of pages by CaptQuark · 2015-12-27 19:30 · Score: 1
  
  Paint Shop Pro also has a batch mode for making image corrections. You open a sample document, turn on the macro recorder, make your corrections (crop, straighten, brightness, contrast, re-sample resolution, save location, etc), save the macro, then apply the macro to all the files in a folder.
  
  Doing a quick review of the resulting output images allows you to correct any anomalies because the original files don't have to be overwritten. Very convenient and useful for a ~$50 software package.
  
Dead tree technology by rlh100 · 2015-12-27 08:21 · Score: 1

I am an aficionado of dead tree technology. I find reading long documents online is very tiring. That is why I prefer dead tree technology.
Dead tree technology has many benefits:
It never needs to be recharged.
It is very portable. Just toss it into your bag. No cords or power supply.
It is very easy to share with some one. Just hand the book to them.
It has a very user friendly user indexing system called "dog ear".
Simply fold a corner of a page over and you can find your place again.
It is very easy to make notes with a pen or yellow highlighter technology.
But only if it is your own book.
Character image resolution is excellent. No "jaggies" in the font.
Reading a book has a great tactile feel.
Holding it in your hands, turning the pages.
The only drawback is that it requires an external light source. Sunshine can be great to read by.
Yes, I do like reading using my "dead tree" technology. The only problem is that in a decade or two, children will be asking me about my odd hand held device. Do I really never have to charge it? How can I use it if it does not connect to the Internet? What if I have a question or want to text my friends? Do I really need a different one for each book I want to read?
Apologies for this being off topic.
RLH
1. Re:Dead tree technology by Anonymous Coward · 2015-12-27 08:44 · Score: 0
  
  Drawback:
  rots
  weight
  spines break, and you already vandalise them
  take up space, which may be limited
  a pain when you have to move
  attracts bugs
  inability to change typeface and size.
  Text files are tiny. You can have a massive searchable library on your shittiest of devices, accessible by your entire family simultaneously. You can take as many books with you as you like, and not be limited by pocket or bag space. Or are you claiming you don't even have a smart-phone?
  e-readers do not have jaggies, and they do not take away the paper book option just yet. Get it? They are not mutually exclusive, durrr.
2. Re:Dead tree technology by amiga3D · 2015-12-27 09:33 · Score: 1
  
  Ereader pros:
  Never have to find my place
  Searchable
  All my books on an SD card
  No light needed
  Page turn with a touch
  Ultimately portable
  Never wears out
  Change font type and size with ease
  Read on any device
  I have to carry a phone anyway
3. Re:Dead tree technology by Anonymous Coward · 2015-12-27 10:30 · Score: 1
  
  I can pick up a book printed 150 years ago and as long as I understand the language it was written in I can read it. Do you honestly think that in 150 years someone will be able to do the same with today's ebooks given that even the publishing industry can't agree on a standard file format?
4. Re:Dead tree technology by Bing+Tsher+E · 2015-12-27 13:11 · Score: 1
  
  To many people it doesn't matter. That book will have been filtered through the memory hole at the Ministry of Truth a dozen times before 150 years has passed.
5. Re: Dead tree technology by hpycmprok · 2015-12-27 15:45 · Score: 1
  
  Explain to the kids that books are like television for really smart people.
6. Re: Dead tree technology by unami · 2015-12-28 00:40 · Score: 1
  
  sure, just use an e-book managing software like calibre and save your books in multiple formats, including at least one plain text. i'd doubt that there won't be any scripts in 150 years that will transfer your .rtf, .txt, .epub,... whatever into the format du jour, as long as you don't keep it on then-obsolete media. there's no problem reading more obscure formats (like c64 roms, e.g.) from 30 years ago today, let alone a simple text-file.
7. Re:Dead tree technology by RockDoctor · 2015-12-28 10:24 · Score: 1
  
  I have to carry a phone anyway
  I don't have to carry a phone, so all the other advantages you list cease to be advantages.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
8. Re:Dead tree technology by DedTV · 2015-12-28 12:49 · Score: 1
  
  Few people bemoan the loss of favor of stone and metal tablets or papyrus scrolls for print. Paper won't be any more enduring to the public than those technologies were.
  
  In 150 years, paper books will likely be a tiny niche market, at best, as people with sentimental and institutional attachments to paper die off and people get acclimated to whatever form of digital media consumption is most convenient at the time.
  
  And while I don't own an 8-track player, a VCR, or a punch card reader, if I had media in those formats and wanted to convert them to a modern format, it wouldn't be anything close to impossible to do so now. It's unlikely it'll become so within 150 years, assuming human society doesn't regress or collapse before then, so file formats aren't a major concern for personal media.
9. Re:Dead tree technology by amiga3D · 2015-12-28 17:54 · Score: 1
  
  Well then, there's an exception to every rule.
10. Re: Dead tree technology by Anonymous Coward · 2015-12-29 01:13 · Score: 0
  
  I have a quel database on qic-20 tape I need to read...
11. Re:Dead tree technology by RockDoctor · 2015-12-30 15:31 · Score: 1
  
  That's a rule to remember.
  E.g., the next time you try to pitch something as "the solution to every [BLAH] user's needs", your audience is unlikely to believe you, while if you say "85% of users of X-DEVICE will find this useful," they are more likely to find your presentation "credible". (The "85%" may be as crude an estimate as you like; but you do estimate your market, no?)
  "Credible" is not the same as "invest-worthy", but it's an appreciable step that way.
  
  --
  Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
best grayscale formats by rlh100 · 2015-12-27 08:26 · Score: 1

I scan B&W pages of historical manuals I have (SunOS 1.1, not Solaris). What would you recommend for grayscale and why?
RLH
1. Re:best grayscale formats by John+Bokma · 2015-12-27 08:39 · Score: 3, Interesting
  
  DjVu: https://en.wikipedia.org/wiki/...
  
  --
  
  Perl Programmer for hire
2. Re:best grayscale formats by Anonymous Coward · 2015-12-27 12:24 · Score: 1
  
  What is the current status of djvu viewing software? I toyed with it a little years ago but never did anything serious with it because finding software was a hassle on some platforms, whereas pdf seems more or less ubiquitous.
Google did it by Anonymous Coward · 2015-12-27 08:33 · Score: 0

So you can too
Actually a great product on Kickstarter for this.. by Anonymous Coward · 2015-12-27 08:53 · Score: 0

The C-zur Scanner (https://www.indiegogo.com/projects/czur-scanner-build-your-own-digital-library#/ ) does look perfect for this job.
Mod parent up by Dadoo · 2015-12-27 08:54 · Score: 1

Agreed. I scanned a bunch of books that way, using a commercial-grade Fujitsu scanner, capable of scanning about 60ppm - both sides. I got a little over 20,000 pages in, and I had to quit, because the work was so intense. That was more than 10 years ago, and I still haven't been able to get back to it.
There's more to scanning a book than just scanning. Between preparing the book for scanning and making sure it scanned correctly, there's a lot of work involved.

--
Sit, Ubuntu, sit. Good dog.
Re:Already exists commercially:Fujitsu ScanSnap SV by myid · 2015-12-27 09:34 · Score: 1

The IndieGogo project is just a clone of something that already exists:
http://scanners.fcpa.fujitsu.com/scansnap11/features_sv600.html
The ScanSnap SV600 looks good, but you'll have to keep something in mind: The web page for this link says, "* Maximum document scanning thickness is 30 mm (1.18 in.)"
A maximum thickness of 1.18 inches would limit the books you can scan, unless you break apart a thick book, and scan it in sections.
Default Ubuntu Scanner Software by Anonymous Coward · 2015-12-27 09:52 · Score: 0

I used Simple Scan to scan my High School yearbook from cover to cover. Saved as pdf, it was a large file (10mb) but well worth it. The hardest part was centering each page on the flatbed scanner, and in the right orientation before scanning. Once I got the hang of it, it took about 20-30 minutes for the entire yearbook.
Bittorrent by jtownatpunk.net · 2015-12-27 10:01 · Score: 1

Seriously. Most popular books are already in electronic formats. While there may be some solid questions about whether it's legal, I think it's a perfectly ethical move. You're going from print to print, not audiobook, play, or movie version of a printed book that you own.
And scanning it is probably illegal anyway so it's not like all the extra work will protect you. Yeah, Google got away with it but they've got millions of dollars worth of lawyers who argued that their work was done for research purposes.
Scan Tailor by kilf · 2015-12-27 11:36 · Score: 1

I often scan in music from bulky books. I find Scan Tailor (http://scantailor.org/) works pretty well. It lets you crop, unbend, despeckle etc. in a wizard-like way. The drawback is that it weirdly insists on TIFF format input and output. So you have to be handy with tools like pdf2pnm, pnmtotiff, tiffcp and tiff2pdf, etc.
Works really well apart from that.
Scanning by Anonymous Coward · 2015-12-27 12:13 · Score: 0

IRTI.net makes really cheap SW and devices.
Czur Scanner by Anonymous Coward · 2015-12-27 13:16 · Score: 0

I've ordered one...it looks promising.
https://www.indiegogo.com/projects/czur-scanner-build-your-own-digital-library
1080 webcam... by FatdogHaiku · 2015-12-27 13:48 · Score: 2

I was asked to look at scanning file cabinets full of legal-size folders with contracts (mostly NCR forms) and handwriting on the inside covers of the file folders themselves. Using a bit of lumber and a square, you can make a frame to position the item to be scanned and, if need be, mount the webcam so it can be run up and down a mast for very thick items. Our focal plane was about 15" up for a good legal-size paper image. They had an old XP box, so I used IrfanView for capture and X-Mouse Button Control 2 to map a right click to IrfanView's batch scan mode (right mouse button remapped to ctrl+shft+a), giving sequenced file numbers to the output. They had to reset the counter for each new contract and change the destination folder; in the end we worked in the same temp folder and moved each contract before starting the next task. Books would be easier in that regard. IrfanView supports a lot of plugins, but if you make a good fixture for aligning your books below the camera, keystoning should not be an issue. I don't think I would do two pages at once at 1080, but on a small book it might be OK. The main thing is lighting: even lighting makes all the difference in that rig. If I had a budget I would have set up LEDs; we just used two desk lamps. Not counting the webcam and XP box, we were way under $50. I understand why you want to do this; good luck however you go.

--
You have the right to remain sentient. If you give up the right to remain sentient, you will be elected to public office
What resolution by Anonymous Coward · 2015-12-27 14:50 · Score: 0

I see lots of comments here about using cameras (including 1080 webcams, which aren't really 1080 monochrome pixels, of course).
Back in the day, 300 dpi resolution was considered minimal for laser printers to be acceptable print quality, with 1200dpi being typical for decent typesetting.
So you need, say, 12,000 pixels along the long edge (1200 dpi over a page roughly 10 inches tall) to get that kind of good resolution for a page.
An article I turned up says "the OCA project outputs variable–resolution JPEG2000 files built from lossy camera–generated JPEG files. A consumer area array digital camera is used to produce images with resolution from 600 dpi at 4.5 inches x 7 inches down to just 300 dpi for 11 inches x 14 inches works."
So Google is using an array of cameras that are stitched (a decent solution, if you can adaptively calibrate), but, still, this is for "access" not "preservation": i.e. it's readable, in the same sense that a photocopy is readable, but not a real replica of the original work.
I use a lot of IEEE journals online, and sure, they're all pdf, but anything with graphics comes across pretty badly. You'd really want that original paper from the 1970s if you want the photos or graphs to be readable.
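To put rough numbers on the dpi talk above, here is a back-of-the-envelope sketch in Python. The page size (US letter, 8.5 x 11 inches) and the 1920x1080 webcam frame are my example values, and it ignores lens quality and Bayer interpolation, so real-world results will be somewhat worse.

```python
def pixels_needed(width_in, height_in, dpi):
    """Pixel dimensions required to capture a page at a given dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(frame_long_px, frame_short_px, page_long_in, page_short_in):
    """Best-case dpi when one page fills the frame; the tighter axis wins."""
    return min(frame_long_px / page_long_in, frame_short_px / page_short_in)


if __name__ == "__main__":
    for dpi in (300, 600, 1200):
        w, h = pixels_needed(8.5, 11, dpi)
        print(f"{dpi} dpi: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
    webcam = effective_dpi(1920, 1080, 11, 8.5)
    print(f"1080p webcam on a letter page: about {webcam:.0f} dpi")
```

The takeaway: a letter-size page at 300 dpi already needs roughly 8.4 megapixels, and a 1080p frame covering the same page works out to under half of 300 dpi.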
Donate to Internet Archive by martiniturbide · 2015-12-27 14:55 · Score: 1

It can be interesting to donate some books to the Internet Archive. While your local library may sell your donated books to students or recycle them, the Internet Archive will scan them and put them on openlibrary.org.

I also hear that you can pay the Scanning Service closest to your location to scan your books, but you will need to check on that.
Check if by any chance your books are already digitized on OpenLibrary.org
The Internet Archive Book Drive - https://openlibrary.org/bookdr...
Scanning Services - http://archive.org/scanning
easy to convert from djvu to pdf or anything by raymorris · 2015-12-27 15:38 · Score: 1

Although I don't have the answer to the exact question you asked, I can point out one thing. It's easy to convert from djvu to pdf if, at some point in time, you want a pdf copy for some reason. The reverse isn't so true. If you archive it as pdf, you can't readily convert to anything else without losing information.
Overall, pdf is reasonable for viewing (right now), but not good for editing, manipulating, and archiving. Even for viewing, pdf at its heart assumes it is being printed on letter-sized paper, and that's the layout you'll always get. It doesn't flow or scale well or work well on widescreen displays. This is because pdf is essentially the Postscript printer language, zipped. It's designed for printing, not for screens of varying sizes, resolutions, and aspect ratios.
It is not the scanner - it is the software. by xtronics · 2015-12-27 15:48 · Score: 1

OK - open source has a really good OCR engine - tesseract.
But that is only one part - you need software that can recognize layout - differentiate pictures from text etc.
There are two approaches - put a text layer under a bitmap (searchable image) - or make a real document with fonts and pictures where needed (clear-scan). (Hopefully an ODT file.)
Even in Windows clear-scan is iffy - diagrams with text confuse the software. Clear-scan to ODT is what we want - but can't have yet..
Notes and links on this: https://wiki.xtronics.com/inde...
1. Re:It is not the scanner - it is the software. by Anonymous Coward · 2015-12-27 16:54 · Score: 0
  
  > OK - open source has a really good OCR engine - tesseract.
  Tesseract sucks sweaty donkey balls.
  Seriously, I gave it a try and fed it 600dpi clean b/w scans of some pages, and what I got out was utter garbage. It would have been faster to retype the pages than correct the errors in Tesseract's output, no joke.
  I have been told that Tesseract is a very mature product that a lot of people have put a lot of hard work into. While I thank their efforts, I can only describe the result as 'tragic.'
Scan Tailor is Opensource, Fopydo for capture by jwillis84 · 2015-12-27 16:42 · Score: 1

On the path to essential we all take a few detours to learn things.. one of my favorite 'sayings'.
Scan Tailor fits your original description and price range.
http://scantailor.org/
There is a GitHub site for downloading the installer, works on Windows 7 for me, but I see no limitations to prevent it from working on OSX or Linux.
The documentation isn't great, but the software is very good, quite on par with most of the BookDrive or BookScanner types of programs.
Digital Book Collecting, or Scanning or Ripping depending on how you prefer to call the process; is basically two things:
1. Capture
2. Post processing
Capture is usually to a series of TIFF files, which are losslessly compressed image files. Sometimes people compile those directly into PDF files, but they are usually not satisfied with the size or the results.
So the "gold standard" is direct to TIFF (although direct to large JPG is kind of becoming common)
You generally want to make sure the images are scanned at around 300x300 dpi to make really good Optical Character Recognition (OCR) possible. (ABBYY FineReader has been the gold standard for OCR for years.) Also, an image is not indexable or "searchable", which is what people start wanting when they need to search a document.
A PDF will hold multiple TIFF images and the results of an OCR scan in a single file, and it's a nice format in which you can open the document and use the built-in "Find" to skim the index and go right to a page.
A PDF can also have a full functional Contents page and Index with clickable "hot links" to take you direct to a page.. this is also almost "expected" these days, but first you need software to OCR and index it, and usually someone to make the links for you.
A "Cross Document" searcher like FileCenter by Lucion will even index multiple PDF files in a catalog and let you search between them for references. FileCenter will also work directly with Fujitsu TWAIN scanners to let you capture and OCR everything that will fit in the scanner into arbitrary folders on your computer or home NAS device. It's fairly inexpensive paperless office software (and it actually works, I use it a lot). http://www.lucion.com/
For Step #1 Capture you need some type of stable camera stand and a camera to snap a picture of a document/book. If it is a loose group of pages, a scanner can work; Fujitsu usually makes the best and still supports TWAIN on their high end. They have Automatic Document Feeder (ADF) and flatbed models, and ADF+flatbed all-in-ones. Fopydo makes some stiff plastic construction board type stands for very low cost that will support a book or documents and your cell phone for capturing images, and they are available on Amazon. Atiz makes very high end scanning "booths" which support professional DSLRs and flood lights to illuminate opposing sides of a 'V' shaped cradle with a plexiglass levitated platform for pressing the pages of a book flat before photography. They are somewhat cumbersome to use and require a permanent location dedicated to scanning. Atiz also formerly made a Canon Powershot model to take advantage of less expensive prosumer cameras for shooting images, but the Booksnap is no longer available. The Planetary or Overhead shooting tower that uses a cell phone cam or a dedicated image sensor built into the tower is becoming more popular. Fujitsu makes one of high quality, but it appears a bit slow and it's still quite expensive.
For Step #2 you will want to break it down into Prep work before the OCR, then Post work after the OCR and finally Binding or Publishing the eBook to a format of your choice. Scan Tailor, BookDrive, and others are for Prep work before the OCR, they let you adjust contrast, tease out image artifacts or correct for under/overexposure and the "bleed through" bright lights and thin pages can bring out from the opposing side of the page that was imaged. OCR requires either the freebie copy of whatev
Re:Already exists commercially:Fujitsu ScanSnap SV by CaptQuark · 2015-12-27 18:56 · Score: 2

We have ScanSnap scanners at work and one of the biggest pains is they do NOT support the TWAIN/ISIS driver standard. That means you cannot scan using any software except the ScanSnap software. And at ~$900 it is a little expensive for home use.

No TWAIN driver by CaptQuark · 2015-12-27 19:09 · Score: 1

We have ScanSnap scanners at work and one of the biggest pains is they do NOT support the TWAIN/ISIS driver standard. That means you cannot scan using any software except the ScanSnap software. And at ~$900 it is a little expensive for home use.

1. Re:No TWAIN driver by sribe · 2015-12-28 02:54 · Score: 1
  
  We have ScanSnap scanners at work and one of the biggest pains is they do NOT support the TWAIN/ISIS driver standard.
  True. You want TWAIN or ISIS, you have to move up to fi series scanners. I personally don't care about using standalone scanning software--it gets me what I want.
Project Gado -- Open source Document Scanning by softcoder · 2015-12-27 19:22 · Score: 1

You might investigate Project Gado.
A free open source robot for taking pictures of documents without exposing them to danger.
Not sure if it has all the software you want, but there is an open source community developing for it, the Univ of Finland seems to be the hub.
http://projectgado.org/2015/07...
linearbookscanner by jandar · 2015-12-27 22:32 · Score: 1

If you are able to build this thing, look at linearbookscanner. This would be my preferred method of digitizing but to build it is above my ability :-(.
Camera alignment by oneiros27 · 2015-12-28 02:49 · Score: 1

If you have to correct for keystoning, your cameras aren't aligned well.
You want to use a mirror for alignment, as it allows you to verify that the camera is in the correct place -- a non-reflective target only ensures that you're pointed at the correct place.
The Czur has no platen, and therefore there will be distortion due to curved pages which would have to be corrected for. It also won't be able to image as well closer into the binding -- if you have to spread the book flat, you're going to end up damaging the spine.

--
Build it, and they will come^Hplain.
1. Re:Camera alignment by Anonymous Coward · 2015-12-28 08:47 · Score: 0
  
  If you have to align your cameras carefully, your software isn't doing its job well.
  Seriously, automatic curvature correction and keystone correction make more sense than a platen and careful camera alignment these days. The computer is plenty fast enough to handle it - why make the human do the work?
correcting lens by Anonymous Coward · 2015-12-28 03:53 · Score: 0

There are perspective correcting lenses available for what you want to do. This type of lens makes the image look as if the film laid on top of the object photographed, with no pincushion or perspective distortion. Ready your wallet!
The best setup would be a copy stand with appropriate lighting, a high resolution video camera and accompanying capture card. This type of camera records the equivalent of several million pixels at live video speed. $$$
Second best would be the copy stand and a digital camera with good macro capability, modified so that a contact closure trips the shutter. The camera is set on manual, fixed exposure and focus. Unless the lighting is very bright (lens stopped down), the depth of field won't be as forgiving as the above solution but it will work.
For lighting I suggest ordinary 500 watt quartz-halogen work lights operated on 240 volts. Blinding amount of light with the color temperature just right for the "tungsten" setting on your camera. Such lights must be brought up on 120 and then switched to 240 just long enough for the exposure. Cold mirrors are recommended (Edmund scientific) to keep the heat off the document. This lighting setup is both cheaper and produces much more light than any LED photography light source I've found.
Check your local library by meloneg · 2015-12-28 03:58 · Score: 1

Before building something yourself, make sure you don't have access to better equipment locally. The main library here in Cleveland has what they call a Preservation Lab that has library-grade equipment available for public use.
http://cpl.org/clevdpl/
Separate digitizing from correction by iamacat · 2015-12-28 04:38 · Score: 1

First digitize using the best solution which is easily available to you now, like a good flat bed scanner, and then look for correcting software later. So long as you have the original JPEGs/PDFs, you can continue enhancing them without putting your documents in danger.
Seems preferable to waiting for perfect hardware/software while your archive deteriorates further.
1. Re:Separate digitizing from correction by DarkVader · 2015-12-28 09:07 · Score: 1
  
  Flatbed book scanning? In 2015? 7 seconds per page for 300dpi in greyscale?
  Wow, that's terrible.
This might be an option by Anonymous Coward · 2015-12-28 08:17 · Score: 0

This scanner is supposed to be released in the first part of 2016.
https://www.indiegogo.com/projects/czur-scanner-build-your-own-digital-library#/