Scan a Book In Five Minutes With a $199 Scanner? (teleread.com)
New submitter David Rothman writes: Scan a 300-page book in just five minutes or so? For a mere $199 and shipping — the current price on Indiegogo — a Chinese company says you can buy a device to do just that. And a related video is most convincing. The Czur scanner from CzurTek uses a speedy 32-bit MIPS CPU and fast software for scanning and correction. It comes with a foot pedal and even offers WiFi support. Create a book cloud for your DIY digital library? Imagine the possibilities for Project Gutenberg-style efforts, schools, libraries and the print-challenged as well as for booklovers eager to digitize their paper libraries for convenient reading on cellphones, e-readers and tablets. Even at the $400 expected retail price, this could be quite a bargain if the claims are true. I myself have ordered one at the $199 price.
...
You still have to turn pages manually, I had expected they would have automated that (well, perhaps better if you still want to return the book to the library later).
Any digital camera on a tripod can do the same thing.
You've been able to do this for years and years a different way.
1. Get a sheet-fed scanner like a Fujitsu ScanSnap ($400)
2. Cut the binding off the book
3. Place the stack of pages into the scanner
4. Get a coffee
And you're done, the thing's 600 DPI and does both sides in the same pass. It creates a PDF directly, and you then want to OCR the PDF, running a sharpen filter on the text, and decide on how much you want to compress the PDF. A 1000 page textbook ends up being about 700 megabytes, in crystal clear quality.
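As a rough sanity check on those figures, here is the arithmetic spelled out. The page size (US Letter) and the 8-bit greyscale depth are assumptions made for illustration; the post only gives the 600 DPI, 1000 pages, and 700 megabytes.

```python
# Back-of-envelope check of the figures quoted above.
pages = 1000
total_mb = 700

mb_per_page = total_mb / pages  # average size of one compressed page

# Uncompressed size of one page at 600 DPI.
# ASSUMPTIONS (not from the post): US Letter page, 8.5 x 11 inches,
# stored as 8-bit greyscale (1 byte per pixel).
dpi = 600
width_px = round(8.5 * dpi)
height_px = round(11 * dpi)
raw_mb = width_px * height_px / 1_000_000

# How hard the PDF compression has to work to reach the quoted size.
compression_ratio = raw_mb / mb_per_page

print(f"{mb_per_page:.2f} MB per page on average")
print(f"{raw_mb:.2f} MB raw per page at {dpi} DPI")
print(f"roughly {compression_ratio:.0f}:1 compression")
```

Under those assumptions the quoted size works out to well under a megabyte per page, which is why the choice of compression level matters so much.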
"print-challenged"? Is this some form of computer-abelism? Against people that are physically unable to buy and setup a printer? This is a microtechnoaggression, plain and simple. I have to go check my overclocking privilege.
Much cheaper, same functionality
P.S. If anyone from Staples is reading, your website is a bag of arse.
Confucius say, "Find worm in apple - bad. Find half a worm - worse."
You can get the same result at the same speed with just a smartphone or a camera and the free software ScanTailor.
Just craft a holder for the smartphone/camera out of an old cardboard box or something (holding it by hand is also possible, but gets tiring after some pages), and photograph the book two pages at a time. Then load the pictures into ScanTailor, which splits the pages, crops, rotates and sharpens the text automatically. You're done.
In fact, watching the video, I suspect that their software is a ripoff of ScanTailor.
The actual big news here: The company doing the indiegogo is located in Shenzhen, China.
This is the first one of these I've seen. It struck me as very odd that the video narrator had an almost perfect Midwest accent but terrible grammar and word choice; when I looked at the location of the startup, it became more obvious that this was actually an Indiegogo out of China.
Anyway, good on them; I expect that we will be seeing a lot more people doing crowd-sourcing from non-U.S. locations, given that VC tends to be pretty tight outside of specific regions of the U.S. (which is, in turn, why most startups that go anywhere are U.S. based, rather than being in Europe, or elsewhere, where the funding climate is pretty terrible).
This would be very useful for damaged or brittle books, where even a slight flex can do irreparable damage, or odd books like the Chinese triangle or English 4-way books, which are very hard to scan.
The only reason devices that can display printed sheet music like tablets and e-ink readers are not popular is that they are essentially useless for sight reading. A foot pedal for page turns could easily create a reader for musicians. It would catch on like wild fire and the music publishers could finally start to distribute good editions again. I have been saying this for years and no one listens, it is the usual routine with industry not seeing the forest for the trees that are still being cut to print music.
Forget everything you assume about whether or not there is a market for large format e-readers. Categorically there is and all it would take is a foot pedal. So simple, but currently the great music publishing houses are in crisis because of digital equipment, and unless they get with the program and start to distribute standard editions in digital form they will all die and be bought out by the large corporate bastards who have essentially hamstrung the music publishing industry with senseless worries about DRM. All they have succeeded in doing is to force musicians to cheat and file-share scans of music, and in doing so they have also greatly degraded the once esteemed high art of printed music notation and distribution. Precious few are only now realizing the mistake they made with their fears about their copyrights being broken.
I will gladly pay reasonable amounts for well edited digital sheet music, in fact I still buy from the best publishers that are still around. If I can also not have to waste money on the ink and printing racket so I can play music that is out of print I would be in heaven. I know most real musicians who read and understand the importance of well done sheet music will also do the same and pay for decent music editions in digital format.
There is much more than just books and the literary arts at a crossroads because of today's technology! Let's get together and put the ink and paper out of business once and for all; I say it has become archaic and far too costly environmentally and socially.
This message was not sent from an iPhone because Peter Sellers really was a deviated prevert without a dime for the call
convenient reading on cellphones, e-readers and tablets.
Strangely, most people seem to disagree with that very idea. Reading is not convenient on electronic devices. Paper is still the best medium for books. If I have the book, why would I want to read it digitally?
The one thing an electronic library is good for is rapid searching. If you need a vast amount of knowledge available at a fingertip, and on the road, not in your library, then it's great.
For everything else, I and most other people prefer to turn around, take the book from the shelf and look it up there.
Assorted stuff I do sometimes: Lemuria.org
It's simply easier to read the PDF although the file size is enormous and you're basically looking at images of some yellowing old book which means lots of panning and zooming particularly on small devices. And forget reading it on an e-reader.
So yeah I think you could automate scanning of books, but the second step of getting it into EPUB format is the tricky part.
Buy a cheap set of USB racing foot pedals and a micro-usb adapter and voila, you can probably already do that. Or at the most a simple driver to interface the pedals as standard inputs and assign macros to them.
I have scanned 100 books from my personal library and realized I can't find nice open source software to OCR the images and search over the text of the entire library for keywords. At some point I created my own clone of Google Books, with OCRopus for translating the images and my own front end for searching and highlighting keyword matches. It would be very useful if we had a way to manage searching in hundreds of books, taking notes and remembering the page/citation. It would work like a research library.
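For anyone wanting to try the search part, a minimal sketch of a keyword index over already-OCR'd page text might look like the following. It is not the poster's front end; it assumes the OCR step has already produced plain text per page, and the function names and the tiny sample library are made up for illustration.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")

def build_index(library):
    """Build an inverted index from OCR output.

    library maps a book title to a list of page texts (page 1 first).
    Returns a dict mapping each word to a set of (title, page_number).
    """
    index = defaultdict(set)
    for title, pages in library.items():
        for page_no, text in enumerate(pages, start=1):
            for word in WORD.findall(text.lower()):
                index[word].add((title, page_no))
    return index

def search(index, query):
    """Return sorted (title, page) pairs whose page contains every query word."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical sample data standing in for real OCR output.
library = {
    "Book A": ["The quick brown fox", "jumps over the lazy dog"],
    "Book B": ["A brown dog sleeps"],
}
index = build_index(library)
print(search(index, "brown"))      # pages mentioning "brown"
print(search(index, "brown dog"))  # pages mentioning both words
```

Because each hit carries the title and page number, the same structure gives you the page/citation for notes. Real OCR output would also need some tolerance for misrecognized words, which this sketch ignores.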
It is often too difficult to read PDFs and scans on mobile devices. We could use software to identify individual words in the scanned page and reflow the text to match the narrow screen size of phones and tablets. The reflowed document would use the original images of the words; only the rows and pages would be changed. Then we could read without panning and zooming.
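The reflow idea can be sketched as simple greedy line breaking over word images. This assumes some earlier step has already cut the page into word crops in reading order and measured their pixel widths; the function name, the gap value, and the sample widths are all hypothetical.

```python
def reflow(word_widths, screen_width, gap=4):
    """Greedy line breaking for word images.

    word_widths: pixel widths of the word crops, in reading order.
    screen_width: available width of the target screen in pixels.
    gap: pixels of spacing inserted between neighbouring words.
    Returns a list of lines, each a list of indices into word_widths.
    A word wider than the screen ends up alone on its own line.
    """
    lines, current, used = [], [], 0
    for i, width in enumerate(word_widths):
        needed = width if not current else used + gap + width
        if current and needed > screen_width:
            # The word does not fit: close this line and start a new one.
            lines.append(current)
            current, used = [i], width
        else:
            current.append(i)
            used = needed
    if current:
        lines.append(current)
    return lines

# Hypothetical word widths from one scanned line, reflowed to 120 px.
print(reflow([50, 60, 40, 80, 30], screen_width=120))
```

Each returned line lists which word crops to paste side by side, so the original glyph images are reused untouched and only the row breaks change. Segmenting the words in the first place is the hard part and is not shown here.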
Personally I prefer to just download the books with utorrent, usually somebody already scanned it, so no need to spend a couple of hundred bucks.
This is actually quite revolutionary. There seems to be a push now for digital libraries in developing countries where getting physical books is very difficult or prohibitively expensive; www.tandli.com came up on a recent episode of BSDNow, for instance.
Glad to see these things happening and that the community was strongly behind it!
This is just a camera and some CPU board for image processing and interfacing (Wifi, USB, HDMI).
If they opened their algorithms, you could probably do the same with an RPi and its camera module (assuming there is no AF or aperture control built in).
Forget everything you assume about whether or not there is a market for large format e-readers. Categorically there is
Categorically? Have you done any market research? Or are you just projecting your own desire (so strong that you've essentially posted off-topic to bring it up) onto everyone else, because you can't imagine why they wouldn't want the same thing?
A large format e-reader would be considerably heavier than a few dozen pages of sheet music. Yes, it could store more data, but that's not really going to be of much use to someone playing a fixed set. You can't fold it down the middle to save space. You can't make arbitrary notes on it. It (probably) doesn't photocopy too well to share with your fellow musicians (and you certainly couldn't put it into a feeder and leave it to copy while you make a cup of tea). It would probably be disproportionately expensive as well, since you would not be manufacturing them in the kind of numbers they make, for example, Kindle Paperwhites in - imagine the costs of equipping an entire orchestra. Page turns would have to be faster, and that black-white refresh would be a hell of a distraction. E-readers - last time I checked - are still not quite as bright or as crisp as printing on actual paper. And e-readers, reliable as they are, still have failure modes. The battery can run out, or simply fail. The footpedal is a separate mechanical device that can fail. Paper doesn't have a failure mode, apart from being actively destroyed.
systemd is Roko's Basilisk.
Since this product gets free placement here at /., I figure it is okay to put in a word for the good folks at Distributed Proofreaders.
Books are scanned and [sometimes roughly] OCR'd.
Each and every word, period, hyphen, and ellipsis on each and every page is scrutinized by at least three proofreaders.
Each bold, italic, underline and indent is evaluated by at least two formatters.
The work is finalized in HTML, proofread as a whole, and published to Project Gutenberg in various formats, txt, pdf, html and epub.
The resulting publication typically has far fewer publishing errors than the original book. This is especially true of books from the 17th century where drinking was part of a typesetter's expectation.
Be a part of it.
Sign up at http://www.pgdp.net/c/
I've had one of these for quite some time now, and it looks pretty much the same except more expensive and without the foot pedal option (great idea!)
The important thing is the software rather than the hardware: the software is meant to detect the curvature of the pages on a bound book and adjust for it. It sort of works most of the time on the SV600, but it's not especially fast and neither is it entirely reliable.
I gave up on it mostly because the software for the Mac was pretty unreliable. I do note they release updates for it very regularly so maybe I should try it again as I haven't touched it in over half a year.
Jolyon
Please read my Canon EOS tech blog at http://www.everyothershot.com
figure ebooks average $10. little more for some new releases and a lot less for catalog titles. why spend $400 to pirate paper books?
How is turning a paper page any easier than pressing a button?
http://media.johnlaudun.org/wo...
I see your only exposure to a device capable of displaying electronic documents is the Apple Newton. You might want to check out the advances made in the past 22 years.
While you're at it, you might want to lose the brick mobile phone, the "Where's the beef?" bumper sticker and the Hypercolor t-shirt.
Offtopic member of the bridgetending fraternity that he may be, he does raise a valid question.
I typically do not post but I figured I would put in my two cents worth here.
I have been digitizing books for over a decade using various technologies, including the very expensive predecessor to this kind of scanner, namely the PS7000 from Minolta. http://www.microfilmworld.com/...
The problem with this kind of scanner is the fact that the extremities of any object you are scanning tend to look fuzzy, even with a high-megapixel image sensor, even with background removal. Even after you allow for PC-driven skewing and flattening of book pages, you still get fuzziness at the extremities. If memory serves, a 16 megapixel camera produces images (prior to processing) of a little over 300 dpi, which is okay but not great.
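The "little over 300 dpi" figure is easy to check with some arithmetic. The sketch below assumes the sensor's full frame exactly covers the scanned area with a matching aspect ratio; the 15 x 11 inch open-book spread is an illustrative assumption, not a number from the post.

```python
import math

def effective_dpi(megapixels, area_w_in, area_h_in):
    """Approximate DPI when the whole sensor frame covers the given area.

    Assumes the sensor's aspect ratio matches the area's aspect ratio,
    so no pixels are wasted on margins. Real setups do waste some.
    """
    pixels = megapixels * 1_000_000
    return math.sqrt(pixels / (area_w_in * area_h_in))

# ASSUMED scan areas, for illustration only.
spread = effective_dpi(16, 15, 11)        # open book, two pages at once
single = effective_dpi(16, 8.5, 11)       # one US Letter page

print(f"16 MP over a 15 x 11 in spread: about {spread:.0f} dpi")
print(f"16 MP over one 8.5 x 11 in page: about {single:.0f} dpi")
```

So a single overhead camera shooting a whole spread lands near the quoted figure, while pointing one camera at each page gets noticeably more resolution out of the same sensor.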
You can test this out for yourself by taking any book, holding down the edges if needed, then snapping a picture with your smart phone. Then import into either Photoshop or GIMP and play around with the picture to clarify it. You'll see what I mean.
A better approach would be something on the order of a flatbed scanner. In this example, the distance between the image head and the object being scanned is almost 0. (We're accounting for only the thickness of the glass, and some small spacing between the traveling image head and the glass.)
The results from this approach are crystal-clear, and need little or no computational correction. The text looks sharp and frequently requires no background removal.
If you are scanning a book, the best approach is to use a sheet feed scanner of some sort. The Fujitsu ScanSnap series is a good entry level option. It's affordable and it produces great results. The downside is you have to cut off the spine of the book in order to make it work. If you have a priceless book, this is not an option.
http://www.diybookscanner.org/
These fine folks offer a frame that allows for two-camera scanning of books without destroying the books. You supply the cameras and the computer that drives it; the software to stitch everything together is open source and free.
The goal of the operation here is to keep one camera each directly pointed at each page face of a book. This naturally minimizes distortion. The book sits in a cradle, and frequently has a 90-degree piece of glass which drops down and flattens the pages out (sapphire glass preferred). I haven't experimented much with this personally due to time, expense and spacing requirements, but based on what I have seen from example results, this is about as close as you are going to get to perfection without having to throw your book in the trash when finished.
The proposed Czur scanner will work in a pinch if you have nothing else on hand, but I wouldn't rely on it as a production device at all. The results have historically been too lousy.
Paper degradation is an extremely common problem with document archiving.
Wait, so people think that a crappy Chinese company that needs funding from Indiegogo has solved camera-based book scanning AND curve elimination? HAHAHAHAHAHAHAHA
Google uses a laser to get 3d page data and eliminate curves... software line straightening is iffy at best. That video doesn't show a lot of output, does it? What about gutters? Hot spots from lights? Shadows?
tl;dr; People are still stupid.
The only reason devices that can display printed sheet music like tablets and e-ink readers are not popular is that they are essentially useless for sight reading. A foot pedal for page turns could easily create a reader for musicians. It would catch on like wild fire and the music publishers could finally start to distribute good editions again. I have been saying this for years and no one listens, it is the usual routine with industry not seeing the forest for the trees that are still being cut to print music.
You clearly have done zero research. There's a number of options, the most popular I've come across is the AirTurn, although the Cicada works well too from what I've heard.
"Don't meddle in the affairs of a patent dragon, for thou art tasty and good with ketchup." ~ohcrapitssteve
Maybe I'm just ignorant of how well OCR works, but I'd be very concerned with scanning mathematical texts. If a subscripted 'm' in the text were transcribed as an 'n' in the scan, there are a lot of probability and algebraic formulas that become dangerously misleading. Rho and p are two more characters I definitely wouldn't want interchanged. Is this a valid concern? Is the existing software pretty good at making these subtle distinctions, or is this meant more for pleasure reading than for textbooks? Because it would be great to forgo lugging huge reference bibles across campus, but not if there's a chance of compromising the integrity of the information.
I have been saying this for years and no one listens, it is the usual routine with industry not seeing the forest for the trees that are still being cut to print music.
But did you EVER google a USB Foot Pedal?
Anywhere from 10 to 200 dollars.
I've tried scanning some of my books with a camera. This is simply an overhead scanner with manual page turning; you can buy them already. Realistically, it probably takes around 2-3s to scan a page, so it's about 20 minutes to scan a 500 page book. That's a lot of time to sit at a table turning pages.
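To put numbers on that, here is the timing arithmetic, including the pace the headline claim would require. The only inputs are the figures already quoted (2-3 s per page, 500 pages, and 300 pages in five minutes); the function names are made up.

```python
def scan_minutes(pages, seconds_per_page):
    """Total scanning time in minutes at a fixed per-page pace."""
    return pages * seconds_per_page / 60

def seconds_per_page_needed(pages, minutes):
    """Per-page pace required to finish `pages` in `minutes`."""
    return minutes * 60 / pages

# The 2-3 s per page estimate applied to a 500 page book.
for pace in (2, 2.5, 3):
    print(f"{pace} s/page: {scan_minutes(500, pace):.1f} min for 500 pages")

# The pace implied by "a 300-page book in five minutes".
print(f"{seconds_per_page_needed(300, 5):.1f} s/page for 300 pages in 5 min")
```

The claimed five minutes only works if you sustain about one page per second, or one flip every two seconds if each shot captures two facing pages.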
But let's say you're willing to put in the work. The hard part in making this work is the software, not some $200 digital camera on a stick. And the really hard part in making this work is not on books that are as well behaved and flat as the ones they use in their demo, but on thicker hardcovers, exactly the kind of expensive books you want to preserve by scanning. Unfortunately, they don't talk about their software much, which leads me to believe that they haven't completed it yet. If they had, they could already be selling it without the hardware scanner.
If you're dealing with old books, you want a scanner that can cradle the book without opening it up flat.
And 60 pages per minute is actually pretty slow for these scanners. As you're imaging two pages at once, you only need to approach a page flip a second to get 120 pages/minute:
http://arstechnica.com/gadgets...
Note that the costs have gone up since that article was written. It used to be $500+electronics ... it's now $1200 + electronics + shipping. (as it's no longer someone doing it in his free time, and now a company doing it ... but it also now comes painted).
If you have access to a plywood cutting machine, all of the cutting patterns are available under GPL:
http://www.diybookscanner.org/
But as it holds the pages flat (with glass that presses down on the pages), rather than the book's spine flat, you don't have to worry about trying to correct for the distortion from curved pages. (or damage your books in the process)
Build it, and they will come^Hplain.
To me, the device looks like a camera on a copy stand.
My guess is that it uses a camera from a cell phone, some LEDs to provide illumination, and the foot pedal is the shutter trigger.
To scan, you hit the foot pedal to snap a photo, turn the page, hit the foot pedal again to snap another photo, turn the page, snap another photo, turn the page again, snap another photo, etc. Software then combines the photos into a scanned document.
Burning of the great library of Alexandria
Does the processing take place on the PC or the scanner? Ideally it would process on the scanner and just look like a USB drive to the computer, so it would be OS independent.
I've heard this TTS numerous times but only from Chinese companies trying to sell something. Most videos with this TTS were credited with creation by Alibaba Group.
It sounds mostly convincing but has always been accompanied by poor grammar - perhaps it sounds less realistic when the grammar is correct?
Does anyone know which TTS engine and voice this is? A brief interwebs search pulls up nothing that sounds like this.
Such a shame. Louisiana had the best books.
the global leaders in copyright abuse and piracy, to come up with a cheap way to easily and quickly produce a digital copy of a printed book.
David Rothman and Timothy, you are both stupid idiots if you think this device is worth anything. With a computer, OCR software, and a camera, I can already do the same thing. What makes this device any better? The software? Probably not. Hey, if you have money to burn, just burn it instead of buying more trash.
What's with the BlackBerry Passport and stock footage of Toronto in the video - from a Chinese company?
*** Don't be dull.***
Why don't foreign companies hire native English speakers to improve the texts of their translated material? This seems particularly bad with Chinese companies, whose English manuals and ad copy seem awkward, at best, as if they'd been translated literally, using either a Chinese to English dictionary or the software equivalent, rather than colloquially.
The English narration of the video about this product at Indiegogo is a prime example of this. I don't understand why the narrator didn't fix it himself, assuming it was a human and not a really good text-to-speech program. I haven't done voice work in a long while, but when I did, I would have made suggestions to the client for free, just so I wouldn't end up sounding like an idiot.
Oh well, at least they didn't say "All your books are belong to us!"
Speed is the key here, and if you watch carefully, they never show a real time video of one person scanning several pages. They keep changing the actors or shooting from different angles or speeding up the video. I want to see a 10 seconds real time video of one person scanning 10 pages before I buy one. They could have easily shown that but decided not to. Instead they claim they have new "algorithms". A bad sign.
$199 for a scanner that will scan a book in 5 mins and send a copy back to the chinese govt.
Buy a simple USB pedal set (common for transcription), use Pedable or other software.
It's already done.
Lots, and lots and lots of reasons to dis this offering here. No new tech, just buy eBooks at $10 a pop, who wants hundreds of books on a device, what's so hard about destroying a book, OSS software already exists that does this, etc. etc.
Here's the thing for me: I want a research library I can take wherever I go. I am a heavy research library book user, and I buy a lot of used books, trying to get out of print texts. When I need a book, I need that exact book, and no substitute will do, because none exists. No, many of the books I want CANNOT be downloaded because someone else scanned it, or as an eBook. I cannot destroy library books, and if I own the book I'd rather have the paper copy too. Whether something is possible with similar tech and software is irrelevant. It needs to be fast and convenient. A well integrated system to do this is worth a lot, a lot more than $234 (the final cost if you buy it now). I have worked with a number of Windows and Linux-based processing chains for scanning, page clean-up, compression, OCR, etc., and they are painful to use without exception; some are more painful than others, but none is anything like painless.
Will this really do the job I hope it will? I don't know, but I will find out.
Second class citizen of the New Gilded Age
He means this library fool.
(Some) Luddites in academia will still object if you show up in class and pull out a tablet with the book digitized on it. The dead-tree-textbook-publishing racket will die a slow and painful death as the publishing professors and companies seek to maintain their monopoly. $400 for a "new" Calculus textbook printed this year when the previous edition of that same book was in print for only 2 years? In most other areas of life this would be called extortion.
Organization? You must be joking..
Oh, what a relief! Minnesota has never produced anything of value, so it's no big loss.
Thanks for the schooling fellow Coward!
The Indiegogo site says "Your sketches, paintings, and notes can be scanned and stored in the Czur cloud".
Do we have the option to use our choice of server (maybe local)?
What if I don't want everything that I scan going to a company in China?
What if one day the "Czur cloud" is gone - is the scanner then unusable?
Has anybody tracked down these answers? The product seems appealing if non-cloud, independent operation is allowed.
Chinese don't write good books so they come up with a way to destroy the value of them. Stop ripping off authors. Stop giving money to the Chinese!
You might want to actually read what I've written, which is in reply to someone suggesting e-ink for sheet music.
systemd is Roko's Basilisk.
It's not about document archiving; it's about "live" documents printed to be used, not archived, and the impracticalities of applying e-ink as the solution in certain cases.
systemd is Roko's Basilisk.
bwahahaha
I have several terabytes of ebooks that I downloaded for free off the internet. All these and more fit into a physical space the size of a paperback novel, and I didn't have to spend millions of dollars to purchase them.
Yes, for me and for millions of others like me, they have definitely replaced printed books.
-- shiftless (410350)
I have millions of dollars "worth" of books (mostly in PDF, epub, and mobi formats) on a 3.5" form factor hard drive the size of a cheap paperback novel. I downloaded them all for free off the internet. I give out free copies to anyone and everyone. I don't care about legalities. Copyright is a dying, obsolete idea, perpetrated by a criminal state and corporate monopolists who'd own and monetize the English alphabet if they could find a way to do it.
Nothing you have said is based in reality though.
You go try to fold the stack of sheet music for "Rhapsody in Blue" or any other moderately lengthy classical piece and tell me if it's still smaller, lighter and more convenient than an E ink reader. You might be right if you've just got some crap rock band and play the same four chords over and over, but for most trained musicians switching to a reader would be a lot better.
Ever heard of "email" or a "web site"?
So you have terabytes of books --- that you will never read. Bonus for you.
I would be rather dubious about getting adequate quality images for OCR without controlling the lighting better. (I also wouldn't consider trying a task like this without pretty good OCR. That is near enough a solved problem these decades, given reasonable original images.)
Getting decent enough images to accurately render figures - graphs, or in one book I scanned previously, the tear-down/re-build photos for the wheel hub on a broken car I owned. As presented, there is no effort at controlling the curvature of the pages. That is incredibly annoying to attempt to read, and is going to be highly destructive to attempts to OCR the images. Text size will vary along each line, along with the focus.
With a HP flat bed scanner, running a stack of open source OCR components, and manually turning the book and the pages, I could get 4 - 5 pages per minute, which was adequate. Otherwise, find a reliable scanning company in India, and post the books over there, if your time is more valuable than my off-shift time is.
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
Ah, another aficionado of dead tree technology. I find reading long documents online is very tiring. That is why I prefer using dead tree technology by printing the document.
Dead tree technology has many benefits:
It never needs to be recharged.
It is very portable. Just toss it into your bag. No cords or power supply.
It is very easy to share with some one. Just hand the book to them. Remember to put your name in it.
It has a very user friendly user indexing system called "dog ear".
Simply fold a corner of a page over and you can find your place again.
It is very easy to make notes with a pen or yellow highlighter technology. But only if it is your own book.
Character image resolution is excellent. No "jaggies" in the font.
Reading a book has a great tactile feel.
Holding it in your hands, turning the pages.
The only drawback is that it requires an external light source. Sunshine and daylight are great to read by but indoor lights work just as well. Even a flashlight under the bed covers.
Yes, I do like reading using my "dead tree" technology. The only problem is that in a decade or two, children will be asking me about my odd hand held device. Do I really never have to charge it? How can I use it if it does not connect to the Internet? What if I have a question or want to text my friends? Do I really need a different one for each book I want to read?
Apologies for this being off topic.
RLH
Slashdot a product review and best deals site? What next, fruit and vegetable sales? Slashdot was once a repository of great tech info doled out by tech snobs; now it's so downmarket it reads like it's being produced in a basement by work experience interns. A sad, sad, sad day. PS I am looking for a good home delivery service............
A cheaper version of this device is what everyone wants:
http://spectrum.ieee.org/automaton/robotics/robotics-software/book-flipping-scanning
No pedal needed (just put your media on the desk), book mode available, but lacking wifi - it's been invented: http://www.sceye.eu/de/
Other disadvantages of dead tree books: they come in one type size per book, they take up room, and they weigh a lot. (We had a structural engineer in to compensate for the weight of the bookshelves.)
"When you have eliminated the unacceptable, whatever is left, however improbable, must be the truthiness" - Holmes