Google's Book Scanning Technology Revealed
blee37 writes "Last March we discussed Google's patent for a rapid book scanning system. This article describes and provides pictures of how the system works in practice. Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers who wrote a research article on essentially identical technology. There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages."
Can RTFA for me
I often wondered if it would be possible for a book to be scanned while closed, using some kind of MRI technology that digitally sliced the book page by page, picking up on the density difference between the ink and the paper slice by slice.
Sea shanties were sung to accompany shipboard tasks (often repetitive ones). Is Google paving the way for the librarian chantey?
Looks kinda like this guy's machine:
http://hardware.slashdot.org/story/09/12/13/1747201/The-DIY-Book-Scanner?art_pos=3
human type book into PC, machine print book on paper, machine binds book ---time goes by--- machine unbind book, robot and human flip pages of book, machine photograph book, machine put book on PC.
Simply set up a rig with 2 digital cameras and a plexiglass V to photograph 2 pages at a time. It's quite fast and cheap.
http://www.diybookscanner.org/
Works great. I built one to turn a couple of rare automotive books into PDFs so I don't damage a $180.00 book in the garage.
Thank you, kind overlord!
Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers
Surely the whole point of the patent system is to grant exclusive use for a period in return for publishing full details of how whatever it is works? How can you have a patent without divulging the crucial information?
On YouTube.
The link is slashdotted, so I'm not sure if this is the same technology mentioned in TFA...
...and information about how Google wants to use music to help humans flip pages.
...you will know it is time to turn the page when Tinkerbell rings her little bells like this...
Google's book scanning technology? Two guys and an Epson V500.
The United States invaded half of Europe.
Nerd rage is the funniest rage.
Plus, I could envision a system where you loaded many books into a cartridge of sorts, say about 6 feet long, with a divider of some kind placed between each book.
As the scanner worked its way down the cartridge, it could detect the dividers, which would delineate one book from the next.
Thus even if the scanner were slow, perhaps it could scan say 50 books in one pass.
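A minimal sketch of the divider logic, assuming (hypothetically) that the scanner produces a sequence of slices in cartridge order and that some test can tell whether a given slice is a divider. The names `split_books` and `is_divider` are illustrative only, not part of any real scanner.

```python
def split_books(slices, is_divider):
    """Group consecutive non-divider slices into books.

    `is_divider` is a hypothetical predicate supplied by the caller.
    Consecutive dividers (or a divider at either end of the cartridge)
    do not produce empty books.
    """
    books, current = [], []
    for s in slices:
        if is_divider(s):
            # A divider closes the current book, if one is in progress.
            if current:
                books.append(current)
                current = []
        else:
            current.append(s)
    # The last book may not be followed by a divider.
    if current:
        books.append(current)
    return books
```

For example, `split_books(["a", "a", "|", "b"], lambda s: s == "|")` groups the slices into two books.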
It involves pigeons, doesn't it?
Back when we called them "Service Bureaus" book scanning was fast, easy, and cheap, as long as you didn't want the book back.
You deliver your book, magazine, phone book, map, large format document, or whatever to a Service Bureau.
They will then use a paper saw to cut off the binding and trim the other three sides, leaving perfectly smooth edges.
Then they put the whole mess into a hopper. The hopper feeds the pages to a scanner.
When it's done, flip the pile over and put it back into the hopper to get the odd-numbered pages into the scanner.
What you get back is your original book (as a pile of pages with no binding) and a CD-ROM of its contents as both the original TIFF images and OCR'd text files. Now you can also get them in PDF/A and DjVu formats.
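Putting the two hopper passes back into reading order is a small interleave. This sketch assumes, following the description above, that the first pass captures the even-numbered sides in sheet order and that flipping the pile makes the second pass return the odd-numbered sides in reverse sheet order. Whether the flipped pile actually comes out reversed depends on the hopper, so `collate_two_passes` is a hypothetical helper, not any bureau's real software.

```python
def collate_two_passes(first_pass, second_pass):
    """Merge two single-sided hopper passes into reading order.

    Assumes first_pass holds the even-numbered sides in sheet order and
    second_pass holds the odd-numbered sides in reverse sheet order,
    because the pile was flipped over before being fed in again.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same sheets")
    pages = []
    # Undo the reversal caused by flipping the pile, then pair each
    # sheet's odd side with its even side.
    for odd_side, even_side in zip(reversed(second_pass), first_pass):
        pages.extend([odd_side, even_side])
    return pages
```

If a particular hopper does not reverse the pile, dropping the `reversed(...)` call is the only change needed.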
I suppose Google's point is that they don't want to ruin the books, or maybe they are proud enough of their 3D scanner to use it at all costs. But think of this: there are usually thousands, perhaps millions, of copies of the books I've seen in Google's library, so destroying one copy seems fair enough.
The farmer in the dell, the farmer in the dell... hi-ho the merry-o, the farmer in the dell.
I think this technology was developed to scan rare books, the kind you can't destroy, you know?
But once Google puts them online, they are no longer rare.
"Twist the spine"
There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages.
From TFA:
The patent describes how a musical tone can be played from the speakers at regular intervals to give the operator a pace to flip pages to.
Not sure what this means, but what is the difference between a "musical tone" and a "tone"? Probably none, except a pleasant timbre with a pitch. From the description, it likely just means a pleasant-sounding "beep" or "ding" or something that recurs at intervals so people know when it's safe to turn the page.
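To make the "pleasant beep at regular intervals" reading concrete, here is a minimal sketch: one function schedules a cue per page turn, the other synthesizes a sine tone with a fade-out, which is roughly what separates a "musical tone" from a harsh beep. The pitch, duration, sample rate and decay constant are arbitrary assumptions, not values from the patent.

```python
import math

def cue_times(num_pages, seconds_per_page, start=0.0):
    """Times (in seconds) at which to play a flip cue, one per page turn."""
    return [start + i * seconds_per_page for i in range(1, num_pages + 1)]

def ding(freq_hz=880.0, duration_s=0.2, sample_rate=8000):
    """Samples of a sine tone with an exponential fade-out (a soft 'ding').

    All parameters are illustrative defaults, not taken from the patent.
    """
    n = round(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate) * math.exp(-5.0 * t / n)
        for t in range(n)
    ]
```

With `cue_times(3, 2.0)` the operator hears a cue every two seconds; slowing or speeding up the pace is just a change to `seconds_per_page`.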
In any case, hardly "using music" to encourage page-flipping -- which brings up weird images of people "Sweatin' to the Oldies" while turning pages for Google.
The University of Tokyo's version is demoed using a manga... go figure. The high-speed camera approach is also really cool. Reminds me of that TNG episode (yeah yeah, I know) where the aliens built that casino/hotel based on a book for that astronaut... Picard hands the novel over to Data and asks him to summarize it. Data just flips through all the pages in about 3 seconds and spews out the madness.
Johnny Five had no problem flipping pages and scanning them back in 1986. I don't see what the big deal is here.
Doesn't patent law require Google to disclose the invention in order to get it protected? I mean, I only have a vague idea of how it works, but I thought this was one of the points.
I am sure that in this worldwide depression, Google can easily find people willing to carefully place and turn books for $1/day. Sugar cane farmers in S. America work for $1/day. I would think being a book scanner would be a highly sought after position. Si Senor, the room has AC to keep the books comfortable?
I followed this project very closely, having known the project team lead and talked to him about the different projects he had going for the library deal. I remember talking to him 8 years ago about how he was accomplishing the scanning part, and he told me they had even created their own scanning software.
Today I saw the coolest little gadget that some homebrew tinkerer made; it was covered on /. a month ago, but I don't have the link, sorry.
He used 2 cheap cameras and a big-ass metal frame to hold the book open and then flip the pages hydraulically (or something like that). This makes more sense and is much more cost-effective for you and me. Then use a PDF tool to combine all the images into a single PDF, voila!
Since then, I don't even want to bother hearing about this project, as I know it cost millions in software development and hardware, and this other guy built his for under $1,000. It goes to show that just because you are Google-big doesn't mean you need to spend Google-big money to get the job done.
It's surprisingly hard to automate page-turning. I saw the first page-turning machine many years ago, at the Census Bureau. It was used for 1970 Census form booklets, and used a vacuum belt to hold the booklet down while a wheel with vacuum holes rolled over the page to turn the page. This only worked for booklets with known dimensions, and it was rather rough on the booklets. But it was fast, doing about two flips a second.
It's such a boring job for humans that they screw up. A hand appears in the picture, or they turn two pages. So you need automation, or at least automated error checking.
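One cheap form of automated error checking is to read the printed page number from each captured image and flag anything that breaks the sequence: a double turn shows up as a gap, a missed turn as a repeat. This sketch assumes some hypothetical OCR step has already produced one page number (or `None`) per image; it does not catch a hand in the frame.

```python
def check_page_sequence(detected):
    """Return (index, problem) pairs for a list of detected page numbers.

    `detected` holds the page number read from each captured image in
    capture order, or None when no number could be read (hypothetical
    output of an OCR step).
    """
    problems = []
    previous = None
    for i, number in enumerate(detected):
        if number is None:
            problems.append((i, "unreadable"))
            continue
        if previous is not None:
            if number == previous:
                problems.append((i, "repeated page"))
            elif number < previous:
                problems.append((i, "out of order"))
            elif number > previous + 1:
                # e.g. two pages turned at once
                problems.append((i, f"skipped {number - previous - 1} page(s)"))
        previous = number
    return problems
```

A clean run returns an empty list, so the check can gate whether a book needs a human review pass.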
The problem with mechanism design is making it both fast and gentle. There are lots of things that will work at one page every five seconds. Getting to two pages a second and never tearing one is tough. Most of the existing designs are simplistic; they're just some dumb mechanism making a repetitive motion with an air picker. The book-scanning developers haven't progressed to closed-loop force control yet.
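To illustrate what "closed-loop force control" means, as opposed to a mechanism replaying a fixed motion, here is a toy proportional loop. The page is modelled as a linear spring and the force sensor is simulated, so every number (target force, stiffness, gain, tear limit) is a hypothetical placeholder; a real page turner would need real sensing and a far more careful controller.

```python
def regulate_force(target_n, stiffness_n_per_mm, gain=0.5, steps=50, tear_limit_n=None):
    """Toy proportional force loop pressing on a page modelled as a linear spring.

    Each step reads the simulated contact force, compares it with the target
    and moves the actuator by an amount proportional to the error.
    """
    position_mm = 0.0
    history = []
    for _ in range(steps):
        measured = stiffness_n_per_mm * position_mm   # simulated force sensor
        error = target_n - measured
        position_mm += gain * error / stiffness_n_per_mm
        force = stiffness_n_per_mm * position_mm
        # Abort instead of pushing harder than the page can take.
        if tear_limit_n is not None and force > tear_limit_n:
            raise RuntimeError("force exceeded tear limit")
        history.append(force)
    return history
```

With `0 < gain < 1` the simulated force approaches the target from below without overshooting; a gain above 1 overshoots, which is exactly the case a tear limit is meant to catch.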
Festo, the German robotics and actuator company, could probably build a better page turner. They build a wide range of machines which handle delicate objects fast in production environments. Their Bionic Tripod with Fin-Gripper is an example.
No, but a book is not 6 feet tall.
The time an MRI scan takes grows with the resolution you want to have.
More voxels = More time.
The question is whether the MRI, slow as it is, would be faster than manually flipping each of 300 pages and taking individual scans.
In the Z dimension, you'd need extremely high resolution to tell 300 pages apart. And within each page, you need resolution high enough that the pages are clearly visible and not blurry.
For the record, a high-definition, anatomy-research-grade MRI scan of a brain has only 256x256x200 voxels and can take half an hour. And for that we used special high-speed techniques (3D MPRAGE), which restrict you to 200 voxels in one of the dimensions. (You could make it even faster, but the results would be blurry and contain more artifacts.)
For a library archive, you're going to need many more voxels, and could end up spending a couple dozen hours. (Well, at least a book is not a living patient and can remain motionless for 30 hours without any problem.)
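The "couple dozen hours" figure can be reproduced with back-of-the-envelope arithmetic, scaling the brain scan (256x256x200 voxels in about 30 minutes) linearly with voxel count, as the "more voxels = more time" rule of thumb does. The book size (6 x 9 inches), lateral resolution (150 dpi) and two slices per page for 300 pages are assumptions chosen for illustration. In a real 3D sequence, scan time is driven mainly by the number of phase-encoding steps rather than the raw voxel count, so treat this as order-of-magnitude only.

```python
# Reference point from the comment above: 256 x 256 x 200 voxels in about 30 minutes.
BRAIN_VOXELS = 256 * 256 * 200
BRAIN_MINUTES = 30.0

def book_voxels(width_in, height_in, dpi, slices):
    """Voxel count for a page area sampled at `dpi`, times the number of Z slices."""
    return round(width_in * dpi) * round(height_in * dpi) * slices

def estimated_hours(voxels):
    """Scale the brain scan time linearly with voxel count (a rough simplification)."""
    return voxels / BRAIN_VOXELS * BRAIN_MINUTES / 60.0

# Hypothetical 6 x 9 inch book at 150 dpi, 300 pages sampled twice each (600 slices).
hours = estimated_hours(book_voxels(6, 9, 150, 600))
print(f"{hours:.1f} hours")  # 27.8 hours
```

Doubling the lateral resolution to 300 dpi quadruples the voxel count, and therefore the estimate, under the same linear assumption.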
I agree the tech isn't there yet, but it isn't inherently impossible.
Speaking of tech:
- You're hoping that the ink and the paper can be told apart based on their signal intensity (which is usually influenced by proton density); otherwise the image will be just a homogeneous single-color blob.
- You're hoping that two adjacent pages can be told apart (air should provide contrast, but is there enough of it between the pages of a closed book?). Otherwise you'll have to build even more complicated 3D models to slice a continuous blob into pages, and hope you don't slice at an angle and end up with one page spread across two different slices.
- You're hoping that no other material in the book will react (for example, that it doesn't have metallic paint), because the gold leaf on older books can "eat" the MRI signal and produce images with hypointense shadows.
- You're hoping that no other material in the book will react, part II: some components absorbing the radio waves might convert the energy into heat, and unlike a living patient, a book has no circulatory system or other way to thermoregulate, so it might get pretty damn hot after 30 hours of scanning.
- You're hoping that the device's price will drop dramatically, because you'll need a lot of machines taking 30 hours per book to replace the work of one guy quickly flipping books in front of a camera (30 machines if it takes the guy 1 hour to flip through a whole book on the Japanese scanner).
- You're hoping that you'll find a way to pack 30 machines together, each with a huge magnetic field, without them disturbing each other (which *is* doable today by calibrating the machines) and without causing major problems for the environment (30 machines with magnetic fields of several tesla each: DO NOT WEAR ANYTHING BUT A CLOTH OR PAPER GOWN when nearby, and leave all your electronic gizmos and credit cards in a different building).
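For the page-separation problem in the list above, the simplest possible approach is to threshold an intensity profile taken along the Z axis and treat each run of above-threshold samples as one page. This sketch assumes, optimistically and exactly as the list says one would have to hope, that paper gives more signal than the air gaps and that the pages lie flat so a single profile is meaningful; `find_pages` and the threshold are hypothetical.

```python
def find_pages(profile, threshold):
    """Return (start, end) slice ranges (end exclusive) where the signal
    along Z stays above `threshold`, each range taken to be one page.

    Assumes paper is brighter than the air gaps between pages.
    """
    pages = []
    start = None
    for z, value in enumerate(profile):
        if value > threshold and start is None:
            start = z                      # entering a page
        elif value <= threshold and start is not None:
            pages.append((start, z))       # leaving a page at an air gap
            start = None
    if start is not None:                  # last page touches the end of the profile
        pages.append((start, len(profile)))
    return pages
```

If there is too little air between pages, adjacent runs merge into one and this approach fails, which is the point where the more complicated 3D modelling becomes necessary.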
Meanwhile, with the Japanese system, you just have one guy (who can be a poor student hired for low pay) mindlessly flipping pages in front of a 3D camera, which captures the pages on the fly mid-flip and flattens them in post-processing.