Slashdot Mirror


Google's Book Scanning Technology Revealed

blee37 writes "Last March we discussed Google's patent for a rapid book scanning system. This article describes and provides pictures of how the system works in practice. Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers who wrote a research article on essentially identical technology. There are also videos of robotic page flippers and information about how Google wants to use music to help humans flip pages."

30 of 100 comments

  1. Now my PC by bugs2squash · · Score: 5, Funny

    Can RTFA for me

    --
    Nullius in verba
  2. MRI technology? by maillemaker · · Score: 5, Interesting

    I often wondered if it would be possible for a book to be scanned while closed, using some kind of MRI technology that digitally sliced the book page by page, picking up on the density difference between the ink and the paper slice by slice.

    --
    A work that expires before its copyright never enters the public domain and thus enjoys eternal copyright protection.
    1. Re:MRI technology? by CRCulver · · Score: 2, Insightful

      MRIs are very slow. Ever have one?

    2. Re:MRI technology? by SnarfQuest · · Score: 5, Insightful

      When the book is closed, the ink from facing pages will be mashed together, so you will need to be able to tell which page the ink is attached to. Since the ink mostly sits on top of the paper (if it soaked through, you wouldn't be able to read the other side very well), it is a very thin layer. Your scanning technology would need to be able to sense very small volumes of ink. I don't think we are anywhere close to the necessary precision yet.

      --
      Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
    3. Re:MRI technology? by trb · · Score: 2, Interesting
      The patterns generated by 2 pages of text superimposed on each other (with one set in mirror image) are not impossible to read. Take a two-sided page, hold it up to the light and try to read it. It may seem difficult, and the symbols may be fully or partially superimposed, but it's not impossible. It may be solvable with sufficient compute, which means that if you can't do it now, you'll probably be able to do it on your cell phone in 10 years.

      As for finding the boundaries between books in a stack, if a scanner can scan pages in a closed book, I think it will have little trouble separating the books.
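A toy model may help make the superimposition point concrete. The sketch below is illustrative only and assumes pages are tiny binary bitmaps (1 = ink); `mirror`, `show_through` and `recover_front` are hypothetical helpers, not any real scanning API. It shows that if one side were already known, the other could be read back everywhere except where the two sides' ink overlaps. In a real closed book neither side is known in advance, which is where the "sufficient compute" would have to come in.

```python
from typing import List, Optional

Bitmap = List[List[int]]  # toy page: 1 = ink, 0 = blank paper

def mirror(page: Bitmap) -> Bitmap:
    """Flip a page left-to-right, as the back side looks through the paper."""
    return [list(reversed(row)) for row in page]

def show_through(front: Bitmap, back: Bitmap) -> Bitmap:
    """What you see holding the sheet to the light: dark wherever either side has ink."""
    seen_back = mirror(back)
    return [[f | b for f, b in zip(frow, brow)] for frow, brow in zip(front, seen_back)]

def recover_front(observed: Bitmap, back: Bitmap) -> List[List[Optional[int]]]:
    """Read the front back out, given the back side.

    Pixels covered by the mirrored back's ink are ambiguous and returned as None.
    """
    seen_back = mirror(back)
    return [[None if b else o for o, b in zip(orow, brow)]
            for orow, brow in zip(observed, seen_back)]

front = [[1, 0, 0], [0, 1, 0]]
back = [[0, 0, 1], [1, 0, 0]]
observed = show_through(front, back)
print(observed)                       # [[1, 0, 0], [0, 1, 1]]
print(recover_front(observed, back))  # [[None, 0, 0], [0, 1, None]]
```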

    4. Re:MRI technology? by Thud457 · · Score: 2, Interesting

      yeah, it'd really suck if Google applied their sizable brainage to solving a problem that would have the side effect of making MRIs cheap and fast. totally suck.


      I haven't verified this, but my father-in-law told me the guy who invented the MRI wanted to develop it as a medical scanner to the point where it was cheap enough that everybody could afford it. Then GE et al. locked up the idea and turned it into a profit center.

      --

      the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

    5. Re:MRI technology? by TooMuchToDo · · Score: 2, Interesting
      http://nanoscale.blogspot.com/2007/09/secret-joys-of-running-lab-helium.html

      The downside of liquid helium is that it's damned expensive, and getting more so by the minute. Running at full capacity I could blow through several thousand liters in a year, and at several dollars a liter minimum plus overhead, that's real money. As a bonus, lately our supplier of helium has become incredibly unreliable, missing orders and generally flaking out, while simultaneously raising prices because of actual production shortages. I just had to read the sales guy the riot act, and if service doesn't improve darn fast, we'll take our business elsewhere, as will the other users on campus. (Helium comes from the radioactive decay of uranium and other alpha emitters deep in the earth, and comes out of natural gas wells.) The long-term solutions are (a) set up as many cryogen-free systems as possible, and (b) get a helium liquifier to recycle the helium that we do use. Unfortunately, (a) requires an upfront cost comparable to about 8 years of a system's helium consumption per system, and (b) also necessitates big capital expenses as well as an ongoing maintenance issue. Of course none of these kinds of costs are the sort of thing that it's easy to convince a funding agency to support. Too boring and pedestrian.

      By the way, I spend most of my days on site at the largest US particle accelerator. Let me know if you'd like to chat with the cryo dept. about how much the tankers of liquid helium cost ;)
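The blog's cost argument is simple arithmetic. The sketch below uses illustrative figures that are assumptions, not numbers from the post (one system using 1,000 liters a year at $5 per liter); only the "about 8 years of a system's helium consumption" ratio for a cryogen-free system comes from the quoted text, and the function names are hypothetical.

```python
def annual_helium_cost(liters_per_year: float, dollars_per_liter: float,
                       overhead_fraction: float = 0.0) -> float:
    """Yearly liquid-helium spend for one system, including fractional overhead."""
    return liters_per_year * dollars_per_liter * (1.0 + overhead_fraction)

def cryogen_free_upfront(annual_cost: float, years_of_consumption: float = 8.0) -> float:
    """Upfront price of a cryogen-free system, expressed (as in the quoted post)
    as a multiple of one system's yearly helium consumption."""
    return annual_cost * years_of_consumption

# Assumed, illustrative inputs (not from the post).
yearly = annual_helium_cost(1000, 5.0)
upfront = cryogen_free_upfront(yearly)
print(f"helium per year: ${yearly:,.0f}")        # helium per year: $5,000
print(f"cryogen-free upfront: ${upfront:,.0f}")  # cryogen-free upfront: $40,000
```

Put another way, under the post's own ratio the cryogen-free option only pays for itself in avoided helium purchases after roughly 8 years, before counting its own running costs.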

    6. Re:MRI technology? by guruevi · · Score: 2, Interesting

      I never heard the story, but you might be confusing MRI with X-ray machines. You might also remember the stories about X-ray machines in shoe stores and why that wasn't a good idea.

      But either way, the costs are not unrealistically high: you can pick up a used MRI machine for about $100k. GE doesn't have a monopoly on MRIs; Siemens, Hitachi and a few others make them as well. The physics alone, however, rules out an MRI machine for most people. The magnets involved are so strong that they become dangerous when any metal is brought into the room (see MRI safety videos for examples of the missile effect). The higher-powered machines (1.5T and up) require high-power, supercooled magnets which draw a lot of power from the grid (about 100A, the maximum capacity of the average house installation). And afterwards you would still need to interpret the images, so you'll need a doctor familiar with your MRI system, as even the simplest images can have artifacts that are easily misinterpreted.
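For scale, the 100A figure can be turned into power with P = VI. This assumes a North American 240 V residential supply (the comment gives no voltage) and treats 100 A as a peak draw rather than a continuous one:

```latex
P = V I \approx 240\,\text{V} \times 100\,\text{A} = 24\,000\,\text{W} = 24\,\text{kW}
```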

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    7. Re:MRI technology? by Daley_G · · Score: 2, Insightful

      MRIs have resolution down to 90nm.

      Simpler/faster solution would be to insert a piece of paper in between all the pages to be scanned...

      Wouldn't that defeat the purpose of using the MRI to begin with? Inserting ONE sheet of paper between EVERY page in a book doesn't seem like it would take much more effort than flipping the page and photographing it.

    8. Re:MRI technology? by Snaller · · Score: 2, Funny

      Brilliant idea!

      Make it so!

      --
      If Google really cared they would fix Android Chrome to reflow text, instead of discriminating
    9. Re:MRI technology? by cdfh · · Score: 2, Funny

      which means that if you can't do it now, you'll probably be able to do it on your cell phone in 10 years.

      I can't solve the TSP for 1000 cities on my desktop computer today, but I suspect in 10 years' time I'll be able to solve it on my mobile phone.
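The joke rests on how fast the brute-force search space grows. For a symmetric TSP with n cities there are (n-1)!/2 distinct tours, so each added city multiplies the work rather than adding to it. The sketch below only counts tours for naive enumeration; practical exact solvers prune the search heavily, so this is an upper bound on brute-force effort, not a statement about the best algorithms. The function names are illustrative.

```python
import math

def symmetric_tour_count(n: int) -> int:
    """Distinct round trips through n cities when distances are symmetric: (n-1)!/2."""
    if n < 3:
        return 1  # with 1 or 2 cities there is only one possible route
    return math.factorial(n - 1) // 2

def decimal_digits(x: int) -> int:
    """Number of decimal digits in a positive integer."""
    return len(str(x))

# How many digits the tour count has as the city count grows.
for n in (5, 10, 20, 1000):
    print(n, decimal_digits(symmetric_tour_count(n)))
```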

  3. Librarian Chantey by _Sprocket_ · · Score: 4, Informative

    Sea shanties were sung in association with shipboard tasks (often repetitive in nature). Is Google paving the way for the Librarian chantey?

    1. Re:Librarian Chantey by FooAtWFU · · Score: 5, Funny

      Cape Cod ladies don't use no books -
      Haul away, haul away!
      Well they read their stories on robotic Nook®s
      and we're bound away for Australia!

      --
      The World Wide Web is dying. Soon, we shall have only the Internet.
  4. summary of summary. by Anonymous Coward · · Score: 5, Funny

    human type book into PC, machine print book on paper, machine binds book ---time goes by--- machine unbind book, robot and human flip pages of book, machine photograph book, machine put book on PC.

  5. Build your own.... by Lumpy · · Score: 4, Interesting

    Simply set up a rig with 2 digital cameras and a plexiglass V to photograph 2 pages at a time. It's quite fast and cheap.

    http://www.diybookscanner.org/

    Works great. I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.
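With a two-camera V rig, each trigger produces one image of the left-hand page and one of the right-hand page, so the two cameras' files have to be interleaved back into reading order before building the PDF. A minimal sketch, assuming both cameras fire once per spread and their file lists are already sorted by capture time; `interleave_spreads` and the file names are hypothetical, not part of any diybookscanner.org tool:

```python
from typing import List

def interleave_spreads(left_shots: List[str], right_shots: List[str]) -> List[str]:
    """Merge images from two cameras into reading order.

    Assumes each trigger captures one spread: the left camera sees the
    left-hand page, the right camera the right-hand page, and both lists
    are sorted by capture time.
    """
    if len(left_shots) != len(right_shots):
        raise ValueError("cameras captured a different number of spreads")
    pages: List[str] = []
    for left, right in zip(left_shots, right_shots):
        pages.extend([left, right])  # left page comes before right page in a spread
    return pages

print(interleave_spreads(["left_001.jpg", "left_002.jpg"],
                         ["right_001.jpg", "right_002.jpg"]))
```

Raising an error on mismatched counts is a deliberate choice: a single missed shot would otherwise silently shift every later page out of order.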

    --
    Do not look at laser with remaining good eye.
    1. Re:Build your own.... by Malard · · Score: 2, Funny

      I built one to turn a couple of rare automotive books into PDF so I don't damage a $180.00 book in the garage.

      Great, so now you can damage a $1800 laptop instead?

      --
      XBMC | Pulse-Eight
    2. Re:Build your own.... by 0100010001010011 · · Score: 2, Informative

      Or damage some cheap 8.5x11 paper that you print the relevant pages on.

    3. Re:Build your own.... by Lumpy · · Score: 4, Interesting

      What idiot would use a $1800 laptop in the garage to view a PDF?

      Let me guess, you change your oil wearing a cashmere sweater and silk shirts as well.

      Nope. I risk my $40.00 Fujitsu tablet PC, which views PDFs just fine but doesn't have enough horsepower to do much else. It works awesome as a garage PC to read PDFs and read the engine codes with my RS232 OBD-II scanner/logger.

      --
      Do not look at laser with remaining good eye.
    4. Re:Build your own.... by Coren22 · · Score: 2, Funny

      Maybe he is running Acrobat Reader in Vista

      --
      APK likes to ask for responses to the same things over and over. Maybe he just likes the responses?
  6. Missing the point? by johnw · · Score: 4, Insightful

    Google is secretive, but the system's inner workings were apparently divulged by University of Tokyo researchers

    Surely the whole point of the patent system is to grant exclusive use for a period in return for publishing full details of how whatever it is works? How can you have a patent without divulging the crucial information?

    1. Re:Missing the point? by pclminion · · Score: 3, Insightful

      I work for a company with a lot of patents. Our products are protected partially by patents and partially by trade secret information. In other words, to recreate our product you would need to license the patents AND figure out how we did the other stuff, that is NOT patented, but is secret. There's no reason you can't mix patented and trade secret technology in one solution.

    2. Re:Missing the point? by zavyman · · Score: 2, Informative

      While you may be correct in certain circumstances, your wording gives a false impression that this always works. You must disclose the best mode when filing a patent application.

      The specification . . . shall set forth the best mode contemplated by the inventor of carrying out his invention.

      "The best mode requirement creates a statutory bargained-for-exchange by which a patentee obtains the right to exclude others from practicing the claimed invention for a certain time period, and the public receives knowledge of the preferred embodiments for practicing the claimed invention." Eli Lilly & Co. v. Barr Laboratories Inc., 251 F.3d 955, 963, 58 USPQ2d 1865, 1874 (Fed. Cir. 2001).

      The best mode requirement is a safeguard against the desire on the part of some people to obtain patent protection without making a full disclosure as required by the statute. The requirement does not permit inventors to disclose only what they know to be their second-best embodiment, while retaining the best for themselves. In re Nelson, 280 F.2d 172, 126 USPQ 242 (CCPA 1960).

      As you hint at, there's nothing wrong with combining one invention with another, one protected by patent law and the other by trade secret.

  7. Musical page flipping. by Yaztromo · · Score: 4, Funny

    ...and information about how Google wants to use music to help humans flip pages.

    ...you will know it is time to turn the page when Tinkerbell rings her little bells like this...

    Yaz.

  8. Executive summary by 93+Escort+Wagon · · Score: 2, Funny

    Google's book scanning technology? Two guys and an Epson V500.

    --
    #DeleteChrome
  9. Lemme guess... by Chris+Burke · · Score: 2, Funny

    It involves pigeons, doesn't it?

    --

    The enemies of Democracy are
  10. We used to call them "Service Bureaus" by kriston · · Score: 4, Interesting

    Back when we called them "Service Bureaus" book scanning was fast, easy, and cheap, as long as you didn't want the book back.

    You deliver your book, magazine, phone book, map, large format document, or whatever to a Service Bureau.
    They will then use a paper saw to cut the binding off and trim the other three sides to make perfectly smooth edges.
    Then they put the whole mess into a hopper. The hopper feeds the pages to a scanner.
    When it's done, flip the pile over and put it back into the hopper to get the odd-numbered pages into the scanner.
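Scanning both sides of a loose stack on a one-sided feeder means reassembling the two passes afterwards. A minimal sketch, assuming the first pass returns one side of every sheet from the top of the pile down and that flipping the pile reverses the sheet order for the second pass (whether it does depends on how a given hopper feeds); `collate_two_pass` is a hypothetical name:

```python
from typing import List

def collate_two_pass(first_pass: List[str], second_pass: List[str]) -> List[str]:
    """Interleave two single-sided scan passes into page order.

    first_pass:  one side of each sheet, top sheet first.
    second_pass: the other side, scanned after flipping the pile, so the
                 last sheet comes first; it is reversed here before pairing.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must contain one image per sheet")
    pages: List[str] = []
    for side_a, side_b in zip(first_pass, reversed(second_pass)):
        pages.extend([side_a, side_b])
    return pages

print(collate_two_pass(["s1a", "s2a", "s3a"], ["s3b", "s2b", "s1b"]))
# ['s1a', 's1b', 's2a', 's2b', 's3a', 's3b']
```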

    What you get back is your original book (as a pile of pages with no binding) and a CD-ROM of its contents in both original TIFF and OCR'd text files. Now you can get them as PDF/A and DjVu formats.

    I suppose Google's point is that they don't want to ruin the books, or maybe they are proud enough of their 3D scanner to use it at all costs. But think of this: there are usually thousands, perhaps millions, of copies of the books I've seen in Google's library, so destroying one copy of a book seems fair enough.

    --

    Kriston

    1. Re:We used to call them "Service Bureaus" by Monkeedude1212 · · Score: 3, Informative

      There are A LOT of books out there which would NOT be suitable for this method. A friend of mine studying Museum Studies at university often has to read books that are incredibly old. I believe the university has a couple that date from around the 1830s, which is older than the books you find in the historical village we have in town.

      Yes, the university lets you read books that are old enough to belong in a museum. She showed me one of them once. It was like a manuscript: thick leather binding, nothing written on the front, heavy faded pages. I almost couldn't believe it.

      Sadly, that was the most exciting part of it. The writing was drier than a desert, and it was on a subject I had zero interest in. They are supposedly starting to go ALL digital, so I have no idea what they're going to do with the old books and manuscripts they've got sitting around.

      I hope they don't destroy them.

    2. Re:We used to call them "Service Bureaus" by swillden · · Score: 2, Interesting

      That's insufficiently destructive.

      They should use the method from Vernor Vinge's novel "Rainbows End", where the books are fed into what is essentially a giant chipper/shredder. The shredded pages are then blown through a tunnel studded with cameras, swirled around so that every side of every piece of paper is photographed at some point, and then all of the images are reassembled to form complete images of every page. At the end of the tunnel is an incinerator which burns the shredded paper.

      The books are gone.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
  11. Re:Video of book scanner... by Jeremy+Erwin · · Score: 3, Informative

    Elegant, hypnotic, and not what Google uses. Google scans the books lying flat. It projects a grid-like pattern over the pages in IR, photographs the distorted image using 3D cameras, recreates a 3D model of the book, and uses that model to undistort the pages. It uses human slaves to turn the pages, since robots aren't as gentle.

    The link isn't slashdotted anymore
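The last step described above, undistorting a page once its 3D shape is known, can be illustrated in one dimension. If the page curls along one direction only (as it does near the spine) and the surface height has already been measured at several points across it, each point's position on the flattened page is its arc length along the curve. This is a simplified sketch under those assumptions, not Google's actual pipeline; `flatten_profile` is a hypothetical name:

```python
import math
from typing import List, Sequence

def flatten_profile(xs: Sequence[float], zs: Sequence[float]) -> List[float]:
    """Map samples on a curved page cross-section to positions on the flat page.

    xs: horizontal positions as seen from above (sorted ascending).
    zs: measured surface height at each x (e.g. from a projected pattern).
    Returns the arc length from the first sample to each sample, which is
    where that point would land if the page were pressed flat.
    """
    if len(xs) != len(zs):
        raise ValueError("xs and zs must have the same length")
    if not xs:
        return []
    flat = [0.0]
    for i in range(1, len(xs)):
        # length of the straight segment between consecutive samples
        flat.append(flat[-1] + math.hypot(xs[i] - xs[i - 1], zs[i] - zs[i - 1]))
    return flat

# A flat page keeps its positions; a tilted segment comes out longer than it looks from above.
print(flatten_profile([0.0, 1.0, 2.0], [0.0, 0.0, 0.0]))  # [0.0, 1.0, 2.0]
print(flatten_profile([0.0, 3.0], [0.0, 4.0]))            # [0.0, 5.0]
```

Summing straight segments slightly underestimates the length of a smooth curve, so denser height sampling gives a better estimate.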

  12. Re:Short Circuit by Ksevio · · Score: 2, Funny

    Yeah, but Johnny Five was ALIVE!