The DIY Book Scanner
azoblue writes "Daniel Reetz did not want to lug around heavy textbooks, so he built a book scanner to create digital copies. '... over three days, and for about $300, he lashed together two lights, two Canon Powershot A590 cameras, a few pieces of acrylic and some chunks of wood to create a book scanner that's fast enough to scan a 400-page book in about 20 minutes (PDF). To use it, he simply loads in a book and presses a button, then turns the page and presses the button again. Each press of the button captures two pages, and when he's done, software on Reetz's computer converts the book into a PDF file. The Reetz DIY book scanner isn't automated — you still need to stand by it to turn the pages. But it's fast and inexpensive.'"
This would be a good activity for the winter months when farming isn't possible.
Here come the Publisher's Copyright Enforcement Gundams to give you "What For!".
Imagine that, thinking you could actually DO Something like that with your very own property.
What cheek!
Guaranteed! This comment 100% Anthrax free!
...a horde of pirates willing to steal my Intellectual Property!!
Except for the lack of an automatic page-turner, Daniel's device is the same as one you can buy commercially for about $20,000 (http://www.treventus.com/bookscanner_pageturner.html).
He was wise to decide on manual page-turning.
I'm a Programmer. That's one level above Software Engineer and one level below Engineer.
I do this for my law school textbooks (unless you're a book publisher, in which case I am joking and would never break the law).
I was excited when I read this because it is a pain in the ass to turn the pages in a 1000-page Constitutional Law textbook. Thus, you can imagine my disappointment when I read that his machine doesn't automate this.
Most universities have at least one library which has a Ricoh scanner that does exactly what his does, i.e. it writes out a PDF onto your USB stick. I don't know where he's a graduate student, but I bet if he looked in his library he could have saved himself $300.
How soon before the manufacturer of the $20,000 commercial version files a lawsuit against him? That would be extraordinarily sad because the American system of patent/copyright only serves to stifle independent innovation like this.
I wonder what he studies. It seems like the majority of the work was done by others (hacked firmware, post-processing, PDF conversion). Is he a mechanical engineer? (I suspect an amateur couldn't design such a structure.)
It may work well enough for basic textbooks, but the problem is that (for high-quality scans) you can't ever get the same image quality from an $800 camera that you can from an $80 scanner. At 1200 DPI, a scanner is equivalent to a ~384 MP camera. Even scanning at "only" 300 DPI is ~90 MP, a far bigger image than any consumer-grade camera can provide.
The cameras he used were only five megapixels.
Might work for looking at the pages on your iPhone. Not gonna look very readable on your laptop screen, and forget about reading the book's footnotes.....
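For anyone who wants to check the DPI-to-megapixel comparison above, here is a rough sketch of the arithmetic. The page size is my assumption (US Letter, 8.5 x 11 inches); the comment doesn't say what scan area it has in mind. Under that assumption the numbers come out lower than the ones quoted: about 8.4 MP at 300 DPI and about 134.6 MP at 1200 DPI. A larger scan area would give larger numbers.

```python
def megapixels(width_in, height_in, dpi):
    """Pixel count, in megapixels, of a flat scan of the given area."""
    width_px = width_in * dpi
    height_px = height_in * dpi
    return width_px * height_px / 1_000_000


if __name__ == "__main__":
    # Assumed page size: US Letter, 8.5 x 11 inches (not stated in the thread).
    for dpi in (300, 600, 1200):
        print(f"{dpi} DPI -> {megapixels(8.5, 11, dpi):.1f} MP")
```

Swap in your own page dimensions to see what resolution a camera would need to match a given scanner setting.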
~
What a coincidence! I too have a book scanner that scans books, and requires a human operator to attend to turning the pages.
It's called a scanner.
http://bkrpr.org/doku.php
Same thing, much cheaper (I built mine for ~150 USD.)
http://CryoLANparty.com/ A lan I'm staff on!
If so, wouldn't it be easier to just rip out the binding and put in the pages? The $15 cost of buying another copy is less than all that boring, repetitive manual labor.
Reetz also writes really awesome electronic music as Fake
You must not have ever gone to college. A textbook for $15? Get real.
He keeps talking about how expensive the books are. Clearly he is just using this to scan other people's books to avoid paying.
Still a pretty cool build though :P
And how is this better than using a tripod with a horizontal swinging arm and a digicam?
built by one of their ex-employees.
It's in a case in their front office.
This is a market that relies on outrageous reproduction prices just like CDs used to. They are equally doomed. I know a LOT of college students who no longer buy books ... they rent them for free by buying them, shooting them, and returning them. It may take a couple of hours to do manually without a device like this, but $80 per hour is pretty good wages for a college student.
"Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
The publisher of the textbooks I use offers e-versions. The e-versions are cheaper than the physical book, and even though they are copy-protected, they can easily be saved as a PDF file.
Just use a bandsaw to cut off the spine and feed it through a normal scanner with a sheet feeder. Duh. Faster, cheaper, and better results along the spine.
Oh, you wanted to keep the books INTACT?
-- Minds are like parachutes... they work best when open.
from the comments with the article
posted by: irrational | 12/11/09 | 11:56 pm
I do it in 5 steps, and you get rid of the book when you’re done since you don’t need to store it. After you get done putting 200 hours into your creation, you’ll have spent thousands of dollars worth of your time. I solved this problem much more quickly years ago:
1. Buy a good sheet-fed and high-speed scanner. I have a Panasonic KV-S2026 color.
2. Get a decent jigsaw from Home Depot. Use metal cutting blades (24 teeth/inch or better)
3. Saw the spines off the book and for God’s sake use some C-clamps on each end of the book. Preferably sandwich them between two flat boards.
4. Remove and feed sheets through the scanner to OmniPage and text recognize the pages.
5. Save as PDF.
6. Repeat. You now have searchable digital books!
might not be able to *write* the entire collection of Shakespeare, but with this setup, I'm quite sure that they would be able to digitize it!
Ironically, all these books that he and others are trying to scan into a digital format were created in a digital format from the start, sitting on a publisher's computer somewhere.
Thanks copyright laws! Thank you very little.
One semester's worth of books in college today runs around $1000. With this device you can return the books after you've scanned them. If you rip out the binding, most bookstores are going to frown on returns.
So this device saves about $700 the first semester, and $1000/semester after that.
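The savings math above, as a quick sketch. The assumptions are mine: $1000 of books per semester (the figure in this comment), a one-time $300 build cost (from the story), and a full refund on every returned book, which not every bookstore will give.

```python
def cumulative_savings(semesters, books_per_semester=1000, scanner_cost=300):
    """Net dollars saved after `semesters`, assuming full refunds on returns."""
    return semesters * books_per_semester - scanner_cost


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"after semester {n}: ${cumulative_savings(n)}")
```

The first semester nets $700 because the scanner cost is paid once; each later semester adds the full $1000.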
Yes. The photocopier in the maths department where I work scans a stack of sheets and emails the PDFs to you. Others can save it to a USB flash drive. It is great for things like theses we have lost source for but have unbound. But the point of the machine described here is to scan whole books non-destructively.
I'm amazed at how good OCR has gotten. I did the same thing without building anything: just connected my Canon PowerShot A540 to a tripod, laid the tripod on a coffee table, put the book on the floor, and started snapping away. Fed the JPGs to ABBYY FineReader 10, and it spit out plain text that was *at least* 97-98% accurate on every page. I did not use any special lights, do not know anything about photography, and frankly thought I'd have to buy all sorts of special equipment. The only other thing I added for convenience's sake was Dirk's CanoRemote so that I would not move the camera (however imperceptibly) every time I pressed the shutter.
I thought about doing this several years ago to archive a huge stack of old lab notebooks, then we bought some Ricoh copiers that were also scanners with a platen large enough to scan two pages at once. I was able to turn a 300 page notebook into pdfs in about a half hour.
If Slashdot were chemistry it would look like this:Cadaverine
The scanner was described 3 months ago in a question to Ask Slashdot:
http://ask.slashdot.org/story/09/09/27/199251/Software-To-Flatten-a-Photographed-Book
The answer:
http://ask.slashdot.org/comments.pl?sid=1383895&cid=29559637
I have a project that requires text recognition. I need to quickly identify the presence of text URLs in several thousand photographs. In the easy cases, the URL is a solid color on a contrasting background, added as a band across the top or bottom of the photo. But in the hard cases it's a partially transparent watermark across the center of the photo that may be rotated several degrees from horizontal. The good news is that the URLs all start with "http://", and I don't need the software to capture the entire URL, just let me know that it's present. I need a solution that is faster than a human and reasonably reliable. Can current OCR software handle this? Thanks!
Nothing for 6-digit uids?
I used to use my Sony DSCP72 Cyber-shot 3.2MP digital camera to digitize chapters of my textbooks in college and convert them to PDF (I forget what software I used), but I stopped once I realized I never actually read them. The major limiting factor in how many pages I could scan at once was the camera's battery life, and in this design the cameras he uses are still powered by AA batteries. Being able to use an AC adapter would be useful, though that's really just a limitation of the cameras he's using. Since turning off the flash improves the camera's battery life a lot, the halogen light is nice to keep the images bright. The other improvements over simply taking the pictures with a camera are pretty minor, but definitely make it less cumbersome.
Images of pages taken from my netbook's 1.3 MP camera are actually pretty readable as well. For the poor college student, simply using a webcam and some decent lighting is a viable alternative: it's a lot cheaper, more space efficient, and gets around the problem of having to use batteries, at the cost of being a bit slower, since you have to take more time setting up the book and camera and can only take one page at a time.
See also the BookLiberator, a somewhat more compact cube-in-cradle design, that's also easy to build. Although soon you won't have to build your own: we're prototyping a manufacturable, flat-packed kit to sell from our online store; see questioncopyright.org/bookliberator for more about the project. It should be ready next year.
None of which is to detract from Reetz's accomplishment, of course. This renaissance in personal book scanners is going to make it easier for all of them, in the long run, especially as we can share the same open source software among all the scanners.
http://www.red-bean.com/kfogel
On a recent visit to a copier manufacturer's show room, the account manager indicated it's pretty common for people to buy a textbook, cut off the spine, and load all the pages from said book into the copier's autoloader. Some copiers will scan both sides of the page at once when using the autoloader, so you get VERY fast double sided scanning to PDF. On top of the extra speed, you don't get the shadow effect from scanning the book with spine intact.
I have a book I would LOVE to preserve digitally. I have an extremely rare and out of print book -- it doesn't have an ISBN or anything! Technically, though, I believe it is copyrighted. I would like to scan it in and OCR it into a usable format that can then be put anywhere. (PDF bitmap pages are ridiculously large!) It is "Home Again" by James Edmiston. Copyright 1955 by James Ewen Edmiston, Jr. First Edition, signed by the author. Library of Congress number 55-5265. It is a significant and important book, in my opinion, and quite likely valuable as well. (Originally sold for $4, quite likely worth a lot more now...) I wonder how long the copyright will last on this book?
I am in Northern Virginia, so if anyone has a book scanning rig, I'd love a chance to use it.
Now if only textbooks came as e-books, then this whole tech would be unnecessary.
"It was a watershed moment when I realized getting an 8-megapixel Canon camera was cheaper than buying a bunch of textbooks."
Therein lies the real problem. Textbooks are too damn expensive and have been for many years.
-ted
You have a Kodak printer too, eh? :P
You can run it from a Windows VM.
Do what thou wilt shall be the whole of the Law
. . . a light box and two cameras?
Perhaps I'm not smart enough to see it, but that's a pretty low standard of DIY, even for "Instructables".
http://www.youtube.com/user/bookscanner#p/c/14E09F2A975DB14F I think that it would be easy to actually make this. It uses vacuum to simply pull the pages up, and then when it gets to the end of the page, turns off one of the vacuums to toss the page over.
Mods - please mod this right to the top!
Quoting from link:
...
Or, everyone had been thinking so, until I found
@that a scratch of an eraser can turn a page, and
@that if you place the scanner upside down you don't need to flip the book,
Careful. I attend Concordia University (Montreal, Canada) and some books cannot be returned once unwrapped from their shrink wrap. Those books usually include an online access code, software or some other gizmo that the book store considers unsellable once open.
Something just like this setup was in a comment on an Ask Slashdot article:
http://ask.slashdot.org/comments.pl?sid=1383895&cid=29559637
I'd be for cutting off the binding of all the books and using a standard duplex scanner - you'll be able to sell the books on to a poor student (or give them away) and you'll be able to sell copies of the format shift to your fellow students; you'll need proof that a) they own the book and b) the publisher doesn't do an electronic public sale already. You could even buy a glued-tape style binding machine if you found that it was cost effective.
I am skeptical that fair-use rights to create the digital copy would remain once you sell (or return) the original.
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
There's plenty of people working on this at the DIY Book Scanning site, but what they all lack... is page turning. I found this great project some students came up with that is simplistic and doesn't require you to preload pages at all.
Incorporate that, with the glass/plexi platen of the stock DIY book scanning projects, and you have a 100% complete, automatic, turn-it-on-and-walk-away book scanner from beginning to end.
Oh it's absolutely illegal. But how would you ever hope to catch them?
A while back I got a Fujitsu ScanSnap S510. Now when I want to scan a book, I just saw the spine off (table saw, band saw or even a steel ruler and X-acto knife will do the trick). Take the loose sheets, about 40 at a time, and put them into the ScanSnap. The ScanSnap comes with Acrobat Pro and does a fine job of making a searchable PDF file of the book. The paper? Into the recycle bin. I've cleared off several feet of shelf space.
He would get a much smaller file with the same or better readability with DjVu.
$1000? I averaged under $500 per year on textbooks in college (graduated 1.5 years ago). It will vary based on the school you attend, the books your professors assign, and what effort you put into finding used copies or buying old editions and getting updated problem sets from the library copies or from classmates. Of course it also depends on the major you're in, I'm an engineer and I think the price gouging isn't quite as severe in this field.
But $1000 per semester... geez... that's just not okay.
http://www.pageflip.com/Pricing.html, $350. Wireless button operated/foot pedal. I'm sure you could rig it up somehow on a timer so that, say, every three seconds the page is turned and a picture taken, rinse and repeat.
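A minimal sketch of that timer idea, with everything hardware-related left as a placeholder. `turn_page` and `take_picture` are hypothetical callables you would have to wire up to the page turner and the camera trigger yourself; nothing here talks to a real device, and the three-second interval is just the figure from the comment.

```python
import time


def scan_book(num_spreads, turn_page, take_picture, interval_s=3.0, sleep=time.sleep):
    """Photograph `num_spreads` two-page spreads, turning the page between shots.

    turn_page and take_picture are hypothetical hooks supplied by the caller.
    Returns the number of pictures taken.
    """
    shots = 0
    for i in range(num_spreads):
        take_picture()
        shots += 1
        if i < num_spreads - 1:
            turn_page()
            sleep(interval_s)  # give the page time to settle before the next shot
    return shots


if __name__ == "__main__":
    # Dry run with print statements standing in for the hardware.
    count = scan_book(
        3,
        turn_page=lambda: print("turn page"),
        take_picture=lambda: print("click"),
        interval_s=0.0,
    )
    print(f"{count} spreads captured")
```

Passing `sleep` in as a parameter is only there so the loop can be tried out without actually waiting.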
All, I'd have thought.
To have a right to do a thing is not at all the same as to be right in doing it
Most of the fields I looked at 2 years ago had books costing between $60 and $150 each, times an average of about 1 2/3 books per class, times 5 classes to keep your graduation on schedule.
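Working that estimate through, as a sketch that assumes the prices are in dollars and uses exactly the numbers in the comment (1 2/3 books per class, 5 classes):

```python
from fractions import Fraction


def semester_book_cost(price_per_book, books_per_class=Fraction(5, 3), classes=5):
    """Estimated textbook spend for one semester, in dollars."""
    return price_per_book * books_per_class * classes


if __name__ == "__main__":
    low = semester_book_cost(60)
    high = semester_book_cost(150)
    print(f"roughly ${int(low)} to ${int(high)} per semester")
```

That works out to $500 at the low end and $1250 at the high end, so $1000 per semester sits inside the range.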
Well, yes, that one word was in jest.