Domain: diybookscanner.org
Stories and comments across the archive that link to diybookscanner.org.
Comments · 21
-
Re:diybookscanner.org forum
Yes, definitely, it took me quite some time to set up the software environment. But that was a few years ago.
If anyone is interested in the story, it's here:
http://diybookscanner.org/foru...
The hardware setup:
-
diybookscanner.org forum
I would suggest you look here http://www.diybookscanner.org/...
I'm planning to do much the same thing as you myself, but I've still not decided how to do it and other things have been occupying my attention recently, so I've not kept up with developments for a year or so.
There are plenty of ideas there and suggestions for software and workflows that will do what you want.
-
Brittle books don't mix w/ a flat spine
If you're dealing with old books, you want a scanner that can cradle the book without opening it up flat.
And 60 pages per minute is actually pretty slow for these scanners. As you're imaging two pages at once, you only need to approach a page flip a second to get 120 pages/minute:
http://arstechnica.com/gadgets...
Note that the costs have gone up since that article was written. It used to be $500 + electronics ... it's now $1200 + electronics + shipping (as it's no longer someone doing it in his free time, and now a company doing it ... but it also now comes painted).
If you have access to a plywood cutting machine, all of the cutting patterns are available under GPL:
http://www.diybookscanner.org/
But as it holds the pages flat (with glass that presses down on the pages), rather than the book's spine flat, you don't have to worry about trying to correct for the distortion from curved pages. (or damage your books in the process)
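The throughput figure above is simple arithmetic, and a minimal sketch makes it easy to try other flip rates. The function name and the default of two pages per capture are illustrative assumptions, not anything taken from the scanner's own software.

```python
def pages_per_minute(seconds_per_flip: float, pages_per_capture: int = 2) -> float:
    """Pages imaged per minute when every flip exposes `pages_per_capture` pages.

    Hypothetical helper: two cameras shoot both facing pages on each flip.
    """
    if seconds_per_flip <= 0:
        raise ValueError("seconds_per_flip must be positive")
    flips_per_minute = 60.0 / seconds_per_flip
    return flips_per_minute * pages_per_capture


# One flip per second with two pages per flip gives 120 pages/minute.
print(pages_per_minute(1.0))  # 120.0
# A more relaxed flip every two seconds still reaches 60 pages/minute.
print(pages_per_minute(2.0))  # 60.0
```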
-
Been there, done that
I typically do not post but I figured I would put in my two cents worth here.
I have been digitizing books for over a decade using various technologies, including the very expensive predecessor to this kind of scanner, namely the PS7000 from Minolta. http://www.microfilmworld.com/...
The problem with this kind of scanner is that the extremities of any object you are scanning tend to look fuzzy, even with a high megapixel image sensor, even with background removal. Even after you allow for PC-driven deskewing and flattening of book pages, you still get fuzziness at the extremities. If memory serves, a 16 megapixel camera produces images (prior to processing) of a little over 300 dpi, which is okay but not great.
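As a rough check on the "little over 300 dpi" figure, the resolution of a camera capture is just pixels divided by inches covered. The sketch below assumes a 4:3 sensor of 4608 x 3456 pixels (about 16 megapixels) and a capture area of 15 x 11 inches; both numbers are assumptions picked for illustration, not measurements of any particular scanner.

```python
def effective_dpi(px_wide: int, px_high: int,
                  inches_wide: float, inches_high: float) -> float:
    """Return the limiting dots-per-inch of a camera capture.

    The lower of the horizontal and vertical figures is reported,
    since that axis bounds the detail you can actually resolve.
    """
    if inches_wide <= 0 or inches_high <= 0:
        raise ValueError("capture area must have positive dimensions")
    return min(px_wide / inches_wide, px_high / inches_high)


# Assumed values: ~16 MP sensor (4608 x 3456) framing a 15 x 11 inch area.
print(round(effective_dpi(4608, 3456, 15.0, 11.0)))  # 307
```

Framing a smaller area with the same sensor raises the figure, which is one reason to point one camera at each page rather than one camera at the whole open book.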
You can test this out for yourself by taking any book, holding down the edges if needed, then snapping a picture with your smart phone. Then import into either Photoshop or GIMP and play around with the picture to clarify it. You'll see what I mean.
A better approach would be something on the order of a flatbed scanner. In this example, the distance between the image head and the object being scanned is almost 0. (We're accounting for only the thickness of the glass, and some small spacing between the traveling image head and the glass.)
The results from this approach are crystal-clear, and need little or no computational correction. The text looks sharp and frequently requires no background removal.
If you are scanning a book, the best approach is to use a sheet feed scanner of some sort. The Fujitsu ScanSnap series is a good entry level option. It's affordable and it produces great results. The downside is you have to cut off the spine of the book in order to make it work. If you have a priceless book, this is not an option.
http://www.diybookscanner.org/
These fine folks offer a frame that allows for two-camera scanning of books without destroying the books. You supply the cameras and the computer that drives it; the software to stitch everything together is open source and free.
The goal of the operation here is to keep one camera each directly pointed at each page face of a book. This naturally minimizes distortion. The book sits in a cradle, and frequently has a 90-degree piece of glass which drops down and flattens the pages out (sapphire glass preferred). I haven't experimented much with this personally due to time, expense and spacing requirements, but based on what I have seen from example results, this is about as close as you are going to get to perfection without having to throw your book in the trash when finished.
The proposed Czur scanner will work in a pinch if you have nothing else on hand, but I wouldn't rely on it as a production device at all. The results have historically been too lousy.
-
Fails to account for collectors.
Write my name in the front of my 1st edition of Dune? Yeah I'm the kind of moron who'd love to do that! Wait, no, I'm not..
Writing your name in the front of your book immediately devalues it. Depending on the book, possibly by many, many times the value of a digital copy.. and a discounted digital copy?.. Yeah right!
So they harvest some data, get you to fuck up your possessions, and then give or sell you things you already have the right to.
There are some cool DIY book scanners out there. Hey look how fast google is! http://www.diybookscanner.org/
-
Photocopy!
Years back, when in yee olde academy, I used to run into costs surpassing $500 per semester for books. I think they're north of $1,000 for a full semester now.
Sooooo, I used to get all my books bought and would immediately head up to the local copy shop and then spend around $120+ or so and get double-sided copies of all my books on 11" x 17" paper. Then I'd return the books for full refund and then pocket the difference, a savings of up to 70% or more off the cover price. The process would take a few hours but for a poor college student who was literally going without food here and there, it was worth it. It was moderately inconvenient to have to lug around bags filled with stacks of photocopies but I managed.
Nowadays, this is all obsolete. The copiers are now digital and making digital PDF scans of the books (or copies) is certainly within the realm of reality.
http://www.diybookscanner.org/
These charming folks here have a scan fixture for $500 (you supply the cameras and light and computer and GPL software, however). I used to hear of frat houses on campus pooling cash to buy one or just a few books which would be passed around to all the members who are taking the same class.
I wonder how long it will take before some enterprising folks start to pool their cash and buy (or build) one of these fixtures to get around artificially-created barriers such as we see here?
-
Re:False economy
Frankly, I like the idea presented by these guys better:
http://www.diybookscanner.org/
They have the book lying down on its spine and supported at a nice 45-ish degree angle that prevents too much of a tear. However, they use ordinary cameras instead of the scanning tech used in a...well...scanner. Though I believe cameras tend to work faster than a scanner, so I don't see a downside.
-
Re:Expand you horizons
I agree, you do have free tuition, I am not disagreeing with that, but I just wanted to clarify that no tuition didn't mean no costs at all. (I once had the Finnish fever, before I realised I could never scrounge together the living money to actually go study there...yeah, talk about shattered dreams :D)
Off-topic, but as far as books and pictures are concerned, I suggest you visit this website and their forums, they have a whole thing going around it:
-
English link
The link in parent post from Google Dutch.
Also, ugh, back scan all over! Can't read the bloody thing due to the back page image being scanned in. (courtesy of a flatbed, back-lit scanner?)
I think it should have been scanned with one of those front book scanners (like the ones they make here[1]). I dare presume that would have eliminated the problem?
-
Re:Market Analysis
Scan the books yourself. Don't pay for it twice. I can't imagine this being illegal on books you already own. (And they can't exactly put DRM on physical books.)
-
Re:Fujitsu ScanSnap or similar
My Scansnap can do 20 double-sided pieces of paper a minute. My camera can't.
I use the camera to capture documents like books that I can't send through the Scansnap, but it's much more effort.
-
Re:ADF Scanner and notepad
+1 for the parent
or simply shoot the pages with a digital camera and if needed do some post-processing.
You can even have full color if you need it.
This is overkill for your project, but may lead some interesting places.
http://www.diybookscanner.org/
-Greg
-
Re:Electronic Hoarder
Anyone have an easy way to convert an existing paper library to a useful elibrary?
http://www.diybookscanner.org/
I haven't built one yet, but I want to. The open source software offerings in this space just keep getting better all the time; toolchains to invoke gocr, etc., and now someone's even running a service to OCR the pages for you, as long as it's OK for them to keep a copy in the Internet Archive.
The claims are that you can scan a novel in 20 minutes or less. It might take you quite a few nights and weekends to get through 9000 books at that pace, but if you never start you'll never finish.
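To put "quite a few nights and weekends" into numbers, here is a back-of-the-envelope sketch. The 20 minutes per book and the 9000 books come from the text above; the 10 hours of scanning per week is purely an assumed pace for illustration.

```python
def total_scan_hours(books: int, minutes_per_book: float) -> float:
    """Total hands-on scanning time in hours for a whole library."""
    return books * minutes_per_book / 60


def weeks_to_finish(total_hours: float, hours_per_week: float) -> float:
    """Calendar weeks needed at a steady weekly scanning pace."""
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    return total_hours / hours_per_week


hours = total_scan_hours(9000, 20)
print(hours)  # 3000.0

# Assumed pace: 10 hours of scanning per week (hypothetical figure).
print(weeks_to_finish(hours, 10))  # 300.0
```

At that assumed pace the library takes a little under six years, so the 20-minute figure matters less than how many hours a week you can actually spare.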
-
Re:Another reason
It's almost impossible to find ANY ebooks for a lot of the stories I'd like in my library. It's not worth the publishers' time or investment to have them processed into ebooks, but God forbid anyone distribute them for free without paying. They're MUCH better off with everyone just forgetting the stories and the authors.
It's sort of like the movies they let rot in the vaults, locked away where nobody can pirate them.
I've been seriously thinking about putting together the Instructable book scanner... http://www.diybookscanner.org/
Is that the kind of rig you're using, or something else?
As for sending the author the money, John Scalzi, at least, has said he doesn't WANT you to send him money instead of buying his books, because he has a good relationship with his publisher, and wants them to do well.
But books I'm thinking about scanning because I can't find them as ebooks at all...
Leonard Wibberley, the "Grand Fenwick" books especially, but he wrote some other great stories too.
T.J. Bass, "Almost Human" and "The Godwhale"
Trevanian the "Sanction" books
Dennis Schmidt, "Wayfarer"
Thomas Burnett Swann, lots of really sweet mythology/fantasy
most of Peter Benchley (OK, guilty pleasure.)
I think I own all of these in paper, but most of them I bought used, which doesn't contribute to the authors, their heirs or their publishers ANYWAY.
I would just like to have them as ebooks to clear out some space.
A few of these might have a title or two available on the torrent sites, and a lot of authors ONLY appear on the torrent sites, all user scanned, proofread (usually poorly), and converted.
Chief among those, of course, is J.K. Rowling's series, since she STILL hasn't allowed a legal ebook version to be published, because she thinks that will keep it from being pirated.
-
Re:Students will complain
it'd hardly be rocket surgery to rig up a stand to hold the smartphone/camera.
Like this Do-It-Yourself Book Scanner?
-
Re:Anyone got error rates?
This is tesseract without training so the error rates are going to be high. It doesn't say if it is specifically using the development version, but if it's not, there is no layout analysis. That doesn't stop you from doing the scanning now and then doing the OCR sometime in the future. Consider Diybookscanner.org for a much faster, cheaper, etc. way to scan your books.
-
Re:Blue print company
Even if your numbers are right it's so much cheaper to use 2 or 4 DSLRs with kit lenses (Canon's EF-S 18-55mm 1:3.5-5.6 IS for example has practically no distortion, so no reason to spend more than $175 on a lens). Most projects I've seen in the last year use the 450D, which costs roughly $600 including the lens I mentioned above. It features 12.2 MP and IIRC its successor will feature 15 MP for the same price. Canon is very popular in that field because they are the only manufacturer offering a stable API for accessing their cameras.
You can find further information here. There is also Atiz which offers very promising sets including software. I haven't seen their products in action yet (they don't do much business in Europe) and AFAIK they only offer book scanning devices, but the software should be able to do maps as well.
Another option is to use traditional overhead scanners. They are extremely expensive but their quality is unmatched. Zeutschel and Imageware are pretty large manufacturers.
-
Build your own....
Simply set up a rig with 2 digital cameras and a plexiglass V to photograph 2 pages at a time. It's quite fast and cheap.
http://www.diybookscanner.org/
Works great. I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.
-
Plenty of links, but what about page turning?
There's plenty of people working on this at the DIY Book Scanning site, but what they all lack... is page turning. I found this great project some students came up with that is simplistic and doesn't require you to preload pages at all.
Incorporate that, with the glass/plexi platen of the stock DIY book scanning projects, and you have a 100% complete, automatic, turn-it-on-and-walk-away book scanner from beginning to end.
-
Re:How about a $300 home-built scanner?
Follow the http://diybookscanner.org/ link. He says he's migrating it to there.
-
How about a $300 home-built scanner?
Some guy posted a great Instructable on building your own high speed book scanner, purposely designed to rapidly photograph book pages without curves. He even includes a software stream that OCRs the contents and sticks them into PDFs.
It's been quite popular -- so much so that he's created an online forum at http://www.diybookscanner.org/ dedicated to discussions from DIY book scanners all over the place, where they talk about builds, parts, and software.
I've been very tempted to build one myself just to avoid carrying heavy books around in my backpack.