Slashdot Mirror


Book-Digitizing Robots

Makarand writes "Robotic digitization systems are the new help available to complete voluminous scanning tasks. Robots that can turn the pages of books and newspaper volumes and attain scanning speeds of more than 1000 pages/hour are now available. They even use puffs of compressed air to separate sticky pages!"

54 of 233 comments

  1. Freedom 'Bots by rdewald · · Score: 5, Insightful
    I think there is a touch of naivete in this notion:

    "Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"


    I am not sure it would. It might turn them on to the idea of thinking for themselves, though. That could have interesting consequences. Unfortunately, just this very possibility is threatening to those who are now profiting from their ignorance. These people are likely in a position to be gatekeepers for the dissemination of information.

    But having a robot do something which is enhanced by mindless repetition is a natural robotic application. Then having that application be something that could enable political liberation is an interesting twist on the old "robots in service to humanity" ideals. I'm not so sure that those holding the reins are going to be so interested in this--call me cynical.

    What I would like to see is a similar device for converting analog recordings, in whatever form, be it tape, vinyl, or wax cylinders, to an open digitized format and then have those recordings made available in like fashion. It might be just as interesting to turn those kids in Africa on to Mozart, or oral arguments from the Supreme Court.
    --
    The best way to do is to be.
    1. Re:Freedom 'Bots by Herg · · Score: 2, Funny

      Isn't Mozart already available in digitized format?

    2. Re:Freedom 'Bots by Joe+the+Lesser · · Score: 4, Funny

      Would it make a difference for those who now have their minds closed to the idea of democracy?

      Are you talking about the US Government here?

      --
      "I only speak the truth"
      Karma: null(Mostly affected by an unassigned variable)
    3. Re:Freedom 'Bots by CodeHog · · Score: 2, Insightful
      "Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"

      Think about the power of bringing food and water to little communities in the middle of Africa. Now that's powerful.

      --
      Fat, drunk, and stupid is no way to go through life, son.
    4. Re:Freedom 'Bots by KrispyKringle · · Score: 4, Informative
      Interesting point. However, it's useful to note that there are a lot of charitable and commercial corporations which currently fund (perhaps for the PR value rather than their own good intentions, and because the US dollar goes so far in most parts of Africa) technology initiatives and other educational programs. I've posted in the past about a program I'm involved in, funded by a couple of US corporations, to put computers and networks in a West African university.

      In regards to your vinyl recording idea, couldn't you just hook up a record changer (yes, they do make these; they have a big spindle and an arm) to a DAT or similar digital recording device, and then use some audio software to cut tracks at blank space?
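
      The "cut tracks at blank space" step could be sketched roughly like this. It is only a minimal sketch: it assumes the recording is already loaded as a list of float samples in the range -1.0 to 1.0, and the function name, threshold, and gap length are all made up for illustration. Real vinyl has surface noise, so the threshold would need tuning.

```python
def split_tracks(samples, sample_rate, threshold=0.02, min_gap_seconds=2.0):
    """Return (start, end) sample-index pairs, end exclusive, one per track.

    A track ends once at least `min_gap_seconds` of consecutive samples
    stay below `threshold` in absolute amplitude (hypothetical defaults).
    """
    min_gap = int(min_gap_seconds * sample_rate)
    tracks = []
    start = None     # index where the current track began, if any
    quiet_run = 0    # consecutive quiet samples seen inside a track

    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i          # first loud sample opens a new track
            quiet_run = 0          # a short pause does not end the track
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap:
                # Close the track at the first sample of the quiet run.
                tracks.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0

    if start is not None:
        # Recording ended mid-track; trim any trailing quiet samples.
        tracks.append((start, len(samples) - quiet_run))
    return tracks


if __name__ == "__main__":
    # Toy signal at 10 samples/second: loud, long gap, loud, short tail.
    demo = [0.5] * 10 + [0.0] * 6 + [0.3] * 4 + [0.0] * 2
    print(split_tracks(demo, sample_rate=10, min_gap_seconds=0.5))
```

      Each returned pair could then be written out as its own file by whatever audio tool you prefer.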

    5. Re:Freedom 'Bots by qoncept · · Score: 4, Insightful

      Wouldn't they need something capable of viewing these digitized formats first?

      --
      Whale
    6. Re:Freedom 'Bots by PateraSilk · · Score: 3, Funny

      To crib some characters from lower down--

      It might turn them on to the idea of thinking for themselves, though.

      Mbutu: Whoa. Plato sez this is all a shadow of some higher plane of existence.

      Kwasa: Die Hutu scum!

      Unfortunately, just this very possibility is threatening to those who are now profiting from their ignorance.

      Mbutu: Whoa. Marx sez the capitalists exploit the surplus wealth from their employees. Adam Smith sez each person has the ability to trade freely in the marketplace to maximize his or her advantage. Why am I digging these diamonds for foreign robber barons again?

      Kwasa: Die Hutu scum!

      "Think about the power of bringing our library to little schools in the middle of Africa."

      Mbutu: Whoa. Gandhi sez nonviolence is the best way to solve problems. What do you say to that, class?

      Kwasa: Die Hutu scum!

      Ahh, well. One can always dream, right?

      --
      Danke tres mucho, tovarishch.
    7. Re:Freedom 'Bots by gurps_npc · · Score: 4, Insightful
      I think your concept of converting analog to digital is ridiculous.

      Analog by definition is ALWAYS readable. It is the SINGLE format that is by definition OPEN, can always be understood by anyone, and can stand the test of time. Aliens could discover an analog recording 50 billion years from now and decode it without knowing ANYTHING else about our culture. But right now, data encoded 25 years ago in an open digital format is often incredibly hard to translate to a usable form.

      Digital requires people to understand the digital format. The ONLY advantage to it is quality via the suppression of unintended noises. But if we are copying something that started out as Analog, then the quality improvement is minimal at best.

      Do not blindly use Digital for things where Analog is far better.

      --
      excitingthingstodo.blogspot.com
    8. Re:Freedom 'Bots by Tackhead · · Score: 4, Insightful
      > Analog by definition is ALWAYS readable. It is the SINGLE format that is by definition OPEN, can always be understood by anyone, and can stand the test of time. Aliens could discover an analog recording 50 billion years from now and decode it without knowing ANYTHING else about our culture. But right now, data encoded 25 years ago in an open digital format is often incredibly hard to translate to a usable form.

      Hey Glortzotnik! Check this out! These humans, they used lasers to inscribe little hills and valleys in aluminum discs 12" in diameter for video, then smaller hills and valleys in aluminum discs 5" in diameter for audio, and then they used lasers to start chemical reactions that changed the color of a dye layer in big sloppy round holes with lots of fuzziness around the edges for video again.

      Okay, nothing wrong with that, but the funny part - get this - they called the laser paintings and the chemical dyes "digital", as if it were somehow different from scratching clay with a stick or a wax cylinder with a needle. Laugh riot, these humans!

      To a DSP engineer, everything is analog.

    9. Re:Freedom 'Bots by konch · · Score: 4, Insightful

      Actually, Africans such as the Igbo people of Nigeria have always had democratic institutions. And most Africans I know are very well informed. The people who need to learn more about democracy are the Americans. They've got a long way to go.

    10. Re:Freedom 'Bots by Idarubicin · · Score: 2, Funny
      Freedom 'Bots

      Word to the wise--since the invasion of Iraq is over now, we're allowed to call them French Bots again.

      --
      ~Idarubicin
  2. Yeah but... by mschoolbus · · Score: 2, Funny

    What about that Speed Reading TV Offer I took advantage of?!?!?!?!

  3. Digitizing Pr0n? by Flamesplash · · Score: 2, Funny

    They even use puffs of compressed air to separate sticky pages!

    Whoa! I guess some pr0n really does have decent articles.

    --
    "Not knowing when the dawn will come, I open every door." - Emily Dickinson
    1. Re:Digitizing Pr0n? by msheppard · · Score: 4, Funny

      I'm afraid a "puff of compressed air" ain't gonna unstick those pages.

      M@

      --
      Krispy Cream is people
  4. Hard to read on a screen. by Obscenity · · Score: 3, Insightful

    After a long night of coding, or sleeping for that matter, it is hard to focus on the text on the screen. Scrolling down is another matter; I end up putting text up to 200% zoom in Mozilla. So now we can all print out these digitized copies and read them. This is neat stuff, sure, but reading from a screen is hard, and most people will print it out anyways. The good thing is that people can now download it from the net, assuming it is hosted on a site.

    --
    OMG OMG OMG WTF OMG WTF BBQ STFU RTFM, OMFG OMG OMG OMG ROFL LMAO OMG WTF STFU ROFLMAO
    1. Re:Hard to read on a screen. by JR · · Score: 2, Interesting

      I often read a great deal of my news and general research on the screen. I do this at a variety of screen resolutions, but often at 1024 x 768 up to 1600 x 1200 always at a refresh of 75 Hertz or higher.

      I've made no special adaptations for purity of screen color or gamma.

      I have excellent low light vision and wear sunglasses only on the brightest of days or in special circumstances like spending time in high glare situations (on the water, bright sand, snow, etc.).

      I've even read entire novels on the comparatively low resolution of an early Palm III. In that instance, the greater annoyance was more the small amount of text per "page" than the quality of the image.

    2. Re:Hard to read on a screen. by Eccles · · Score: 2, Funny

      This is neat stuff sure, but reading from a screen is hard, and most people will print it out anyways.

      Am I the only person reading Slashdot who gets amused by someone who says that?

      You won't get first post that way, anyways...

      --
      Ooh, a sarcasm detector. Oh, that's a real useful invention.
  5. Short Circuit by sin(theta) · · Score: 5, Funny

    Finally, Johnny-5 is coming alive!

  6. Current Books? by Acidic_Diarrhea · · Score: 2, Interesting

    With all this trouble of digitizing books, when the publishers send their books to libraries - do they include digital copies? They really should. Although, I don't know if there's an RIAA equivalent in the literary world but if there is, the idea of giving a digital copy might frighten them. Librarians? Has a publisher ever mentioned digital copies that are in a non-crippled format?

    --
    I hate liberals. If you are a liberal, do not reply.
    1. Re:Current Books? by Drakin · · Score: 2, Informative

      I believe that up until recently most contracts between publishers and authors didn't include rights to publish digital versions.

      Not sure who has uncrippled digital versions in the non-fiction line of books, but in fiction, Baen leads the way, between their Webscriptions service, free library, and the CDs included with some of their recent hardcovers. They provide the books in HTML, RTF, Microsoft Reader, some format that's Palm/Psion/WinCE friendly, and Rocket Ebook.

      The first two are more than enough... their HTML setup is quite good actually.

  7. Scanned pages by Ed+Avis · · Score: 4, Interesting

    This story is a good opportunity to plug some free software you could use to help digitize books.

    Stuart Inglis's tic98 is a lossless compressor designed for black-and-white scanned documents. It achieves better compression ratios than anything else, or at least it did a couple of years ago. If you have scanned documents to make available online, it's fairly simple to write a CGI script to convert tic98 on the fly to PDF.

    Hopefully someone else will reply to this comment with a recommendation of good free OCR software.

    --
    -- Ed Avis ed@membled.com
    1. Re:Scanned pages by tempestdata · · Score: 5, Informative

      Actually, I've seen this robot operate in person and it is a work of art. The way the arms move makes you think it's going to rip the book to pieces, yet somehow it manages to pick up exactly one page (it detects if it has picked up two pages and drops the extra page) and flip it.

      I was the lead developer for the software side that actually does the crunching on the images. However, I'm not sure exactly how much I am allowed to talk about it, so I won't. Basically, the software side of it produces PDFs, JPGs and TXT files from the OCR performed on the images.

      --
      - Tempestdata
    2. Re:Scanned pages by tempestdata · · Score: 2, Informative

      Oh... and no, unfortunately, it's not open source.

      --
      - Tempestdata
  8. DMCA smack down by BMonger · · Score: 3, Funny

    Those people in #bookz on IRC are gonna be so excited about this...

  9. Hmm... by stratjakt · · Score: 3, Interesting

    What do the newspapers, and more likely magazines think of this?

    Now the magazine rack at 7-11 will show up on Kazoom and all that.

    I mean, comic books or "graphic novels" as the nerds call 'em already get traded freely, but that's because some joker with no life takes a day out of his life to scan and crop each page.

    But if you could just take the magazines, stick 'em in this robot, then share 'em, it could hurt the publishing industry the way it's hurt the recording industry.

    And everyone will justify it by saying "why should I buy a magazine when it only has one good article and the rest is crap!"

    So what measures can we expect to see? Lighter inks, crazier fonts to screw with the robot's OCR? Funny paper that makes it hard to flip pages?

    --
    I don't need no instructions to know how to rock!!!!
    1. Re:Hmm... by bob_jordan · · Score: 4, Funny

      "So what measures can we expect to see? Lighter inks, crazier fonts to screw with the robot's OCR? Funny paper that makes it hard to flip pages?"

      I think you just described a typical issue of Wired. Are they worried about people copying?

      Bob.

    2. Re:Hmm... by Phantasmo · · Score: 2, Insightful

      But if you could just take the magazines, stick 'em in this robot, then share 'em, it could hurt the publishing industry the way it's hurt the recording industry.

      The music industry hasn't been hurt by filesharing, it has been helped.
      People want the CD case, the inside jacket filled with graphics and lyrics.

      Similarly, most people hate reading off of a computer monitor. Lots of magazines give away some (or all) of their articles on their webpage already. If anything this'll inspire more subscriptions.

      Of course, all of this assumes that some magazine geek is going to shell out the cash for an OCR robot.

      --

      The US Army: promoting democracy through unquestioned obedience
  10. Re:But can they also by TopShelf · · Score: 2, Funny

    sure, as long as they get Popular Mechanics or something...

    --
    Stop by my site where I write about ERP systems & more
  11. Re:Great, but.. by daves · · Score: 2, Insightful

    ... or until someone donates one to Project Gutenberg.

    --
    People who disagree with you are not automatically evil, greedy, or stupid.
  12. I'm all for democracy, of course... by CommieLib · · Score: 5, Funny

    But does this passage puzzle you a bit?

    "Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"

    I'm not sure I get the connection:

    Mbutu: Hey, Kwasa, check out this copy of "The Horse Whisperer" on my Palm Pilot.

    Kwasa: Incredible! We must hold free elections immediately!

    --
    If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
  13. Project Gutenberg by Mechanik · · Score: 5, Insightful

    What do we need to do to get one of these donated to Project Gutenberg? Right now one of the biggest things holding them up is a lack of volunteers to manually scan the books.


    Mechanik

    1. Re:Project Gutenberg by tempestdata · · Score: 4, Interesting

      Well, I have some good news for you. While I was working (and I still am, actually) on this project, I asked the Digital Library Projects Manager, who is basically in charge of this project, about releasing the books they scan to the public. His reply was that they were probably going to release a pretty significant portion of the books they scan to the public. The rest would only be available within Stanford University Libraries.

      So, you may at one point see those books freely available for download, provided they can get those copyright issues ironed out.

      --
      - Tempestdata
    2. Re:Project Gutenberg by Musashi+Miyamoto · · Score: 3, Informative

      Actually, the primary thing holding up Project Gutenberg is the Sonny Bono Copyright Term Extension Act. The copyright law was recently extended so that nothing published after the early 1920s is going into the public domain.

      There is a large body of great 20th century works that will not enter the public domain for many years. Stuff by F. Scott Fitzgerald, Joseph Conrad, Arthur Conan Doyle, Rudyard Kipling, Willa Cather, Wallace Stevens, Yeats, Virginia Woolf, et al.

      It's a shame. I actually enjoy reading literature, and I am forced to go to the library for anything newer than 1923.

  14. Re:Great, but.. by Daniel+Boisvert · · Score: 4, Insightful

    All it takes is one *really* large project. If somebody like the Library of Congress started scanning/digitizing their collection (I know--subject/verb agreement :), it would obviate the need for just about any smaller libraries to do so. You don't need thousands of libraries to scan the same book, you only need one, and then you can replicate electronically. Surely there are specialty libraries around that have unique collections, but again--all you need is one...

    I didn't RTFA, but this could be useful not only for developing countries, but as a "force-multiplier" of sorts for smaller community libraries. En masse digitizing of published works would allow smaller libraries to compete on a more even footing with larger ones, without having to invest loads of money into their collections and facilities to hold them.

    Any well-heeled library patrons out there want to donate some money earmarked for one of these things to the large library of your choice?

  15. Archival Projects by borkus · · Score: 4, Insightful

    This would be awesome for records/document archiving. I knew a guy who worked at our State Library who had to catalog courthouse records across the state. He'd go out to some remote county where all the marriage, land and court records were on paper and try to figure out what they had. Some of the records went back to before the American Revolution. In nearly all cases, the only records were on paper.

    If he could drag this robot along to a courthouse and scan the records over a couple of weeks, it would allow him to digitize that information quickly. Not only would the digital copies be easier to search, they would be easier to preserve. One courthouse, where the file room was in the basement, nearly lost all of its old records to a flood.

  16. Finger lickin good by dspfreak · · Score: 4, Funny
    They even use puffs of compressed air to separate sticky pages!

    I'm glad they didn't go with the design where it licked its thumb before turning each page. I hate that!

    --
    "Tolerance is the virtue of the man without convictions." -- G. K. Chesterton
  17. Book Ripping and Burning! by Dr.+Evil · · Score: 4, Funny

    Time for a change in terminology.

  18. I can't wait for digital textbooks by gnurb · · Score: 2, Interesting

    I did quite a bit of research on a low-cost book scanner a while ago, because the thought of not having to lug around a heap of books from class to class is a dream come true. I hope this technology really takes off, and they find a way to make the whole thing a bit smaller/cheaper. I bet textbook publishers are scared silly about this.

    --
    hooray! it's a sex wiki
  19. LORD - Don't you people see what's happening here?! by blakespot · · Score: 4, Funny
    I don't know about you, but when I see a robot latched onto one of humanity's tomes of knowledge, poring over it at 1000 pages / hour, puffing and aiming its high resolution CCD, I see what is clearly the first step in the rise of machines which will lead to the utter annihilation of humankind!!! We can't just feed them our knowledge!!

    For the love of GOD, someone check this!!


    blakespot

    --
    -- Heisenberg may have slept here.
    iPod Hacks.com
  20. Does it cost that much? by zebadee · · Score: 4, Insightful

    The article says it would become cost effective for 5.5 million pages. Later it says it costs between $1 and $4 per book in the Far East. So if you estimate a book to have around 300 pages, doing the digitising manually would be about $18,333 to $73,333 per 5.5 million pages (i.e. 5,500,000/300 multiplied by cost per book). From the way the article is written I expected it to cost A LOT more. I guess the proofreading cost for manual conversion could be high?
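
    For anyone who wants to poke at the numbers, here is the same back-of-the-envelope sum as a small script. The 300 pages per book is the guess from the comment above, not a figure from the article, and the function name is just illustrative.

```python
def manual_cost_range(total_pages, pages_per_book, low_per_book, high_per_book):
    """Return (low, high) total cost of manual digitising in dollars."""
    books = total_pages / pages_per_book   # number of books in the job
    return books * low_per_book, books * high_per_book


if __name__ == "__main__":
    # 5.5 million pages, an assumed 300 pages per book, $1 to $4 per book.
    low, high = manual_cost_range(5_500_000, 300, 1.0, 4.0)
    print(f"${low:,.0f} to ${high:,.0f}")
```

    Changing the pages-per-book guess moves the totals in inverse proportion, so a 150-page average would double both ends of the range.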

  21. It's already starting to hit home... my experience by boy_afraid · · Score: 3, Informative

    Not too long ago I had to do a research paper for a college class. No big deal, I've done many of them, and I was not looking forward to this one. Well, I went to the Houston Public Library in Downtown (which I hadn't been to in many, many, many, you get the idea, years). I got the library card that gave me access to some computer terminals and the computer card catalogue. I was amazed at what they had converted electronically, and at the links to other sites that had dictated material. I was also amazed that I could get all this same access from home using the information printed on the library card. So I go home (I have Road Runner cable modem) and do my research there instead of being trapped in the library, and get to work. I find electronic versions of lots and lots of textbooks, magazines, government docs, and much more. What took me a notch or two down from my high horse was that I even found that they had radio talk shows transcribed (which I used in my research paper), and that helped a lot!

    There is a lot of information ALREADY converted from text and audio sources at your fingertips that was unfathomable a few years ago. And all of this is free from the website (and links to other sources) from the public library. Talk about your one stop shop.

  22. Heidelberg press by Ars-Fartsica · · Score: 3, Informative

    Using air to separate and move paper is not new. Heidelberg platen presses (you may remember them from high school graphic arts classes) have had this feature for about fifty years.

  23. Destroying books to save them by shoppa · · Score: 3, Interesting
    The page-turning robots are unique because they do little (or no?) damage to the book to get them digitized.

    The more traditional way to preserve the contents of the old books is to destroy them in the process. Actually cutting the page out of the book lets you get a much higher quality scan because the page is then really truly flat. (Yes, there are correction techniques for turning scans of non-flat pages into flat "projections" but they aren't nearly as good as just ripping the page out and scanning it.)

  24. Buzzwords without a clue. by Anonymous Coward · · Score: 2, Funny

    Words like "Democracy" and "Freedom" are to an American what "Java" and "XML" used to be to a manager. Nowadays I guess it might be "C#" and "Dot NET".

  25. Been around the spook community since 70s/80s by crovira · · Score: 2, Informative

    This is not new.

    The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.

    The first scanners had very small CCD arrays and these had to be pulled across the page horizontally as well as vertically AND it had vacuum "bars" on robot-arm "page turners".

    --
    MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
  26. Distributed Proofreaders by Ugmo · · Score: 2, Informative

    Once books are digitized and OCR'd they need to be proofread by humans. The people who can afford this machine might do it another way but Project Gutenberg has volunteers at Distributed Proofreaders.

    There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.

  27. The NWAA responds to this threat by Lumpy · · Score: 2, Funny

    The NWAA (Novel Writers Artists Association) has announced that they will fight in Congress for legislation against this piracy tool.

    "These reading Bots will put the book publishing business under within months..", their congressional representative said.

    "There hasn't been this strong of an attack against the goodness of books and authors since that evil man Gutenberg created that evil printing press." Word on the street is that Hilary Rosen is going to be hired as their spokesperson to help outlaw this evil that will undermine American life as we know it.

    --
    Do not look at laser with remaining good eye.
  28. Naivete might be a little soft by ianscot · · Score: 2, Interesting
    That was my reaction too. You sort of head down the same path, though -- poor people in underdeveloped countries can't "think for themselves"? What do you base this observation on?

    Having traveled in subsaharan Africa a bit, I can safely say that people I met there aren't "closed to the idea of democracy." (They're sometimes consciously "closed" to the idea of allowing mammoth, conscience-free American-based multinational corporations to subvert the democratic institutions they do have, though.)

    I bet that was just an isolated quote the reporter chose, though. Seems more like her/his bias than the librarian's, at first glance.

    --
    "Fundamentalism" isn't about divine morality. It's about human authority.
  29. Better for what? by rdewald · · Score: 2, Interesting

    Analog is subject to degradation every time it is reproduced. Digital conversion halts the degradation at conversion. Ones are ones and zeroes are zeroes from then on.

    --
    The best way to do is to be.
    1. Re:Better for what? by sketerpot · · Score: 2, Informative

      Which is why you use forward error correction. Have you ever scratched a CD? It can still play, thanks to FEC. (Cross-Interleaved Reed-Solomon Coding, to be specific---good at correcting fairly small numbers of errors, like somebody drilling a 1 mm hole in a CD).
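
      To make the idea of forward error correction concrete, here is a toy sketch. It is NOT the Reed-Solomon scheme CDs use; it swaps in the much simpler Hamming(7,4) code, which adds three parity bits to every four data bits and can repair any single flipped bit per 7-bit block. The function names are just for illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword.

    Layout (1-based positions): p1 p2 d1 p4 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the damaged bit back
    return [c[2], c[4], c[5], c[6]]


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1                       # simulate a "scratch" on one bit
    print(hamming74_decode(word))      # the original data bits come back
```

      The redundancy is the whole trick: the reader never has to ask for the data again, it just recomputes the parity checks and fixes what it finds.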

  30. Typical shortsighted response by siskbc · · Score: 2, Insightful
    Here's a hint, it's Africa. They can't eat books!

    Yeah, but if they don't learn to read, they're going to be stuck with the same subsistence agriculture that hasn't worked too fucking well for them recently. That or UN or NGO handouts that only serve to strengthen the oppressive regimes that are torturing these people, because little of the aid that reaches the docks reaches the people, thanks to rampant corruption.

    Here's the current process:

    1. Africa has crappy food production

    2. West sends food

    3. Food is intercepted by dictator's thugs.

    4. Dictator sells food or uses it to extort loyalty

    5. Dictator becomes rich and powerful

    6. People become dependent upon the west and their dictator for food.

    7. People get worse at farming, continue to starve, and dictator becomes yet stronger.

    8. Goto 1.

    Seems to me that education and empowerment might be part of the way to break that shitty cycle. Keeping people poor and incapable of supporting themselves isn't.

    --

    -Looking for a job as a materials chemist or multivariat

  31. Make them free by b-baggins · · Score: 2, Interesting

    "We have hunger and want in the world because evil men use the vehicle of government to deny men that liberty which they need to produce abundantly."

    Ezra Taft Benson

    Make them free, and they'll bring the food and water into their villages themselves.

    --
    You can tell a great deal about the character of a man by observing those who hate him.
  32. Alternative to flipping by Ed+Avis · · Score: 2, Interesting

    Instead of picking the book up and flipping the pages, couldn't you use X-ray tomography (or possibly microwave tomography) to get a 3d image of the book and extract pages from that?

    This assumes two things: that the ink makes a difference to X-ray penetration compared to just paper, and that the resolution of the scanner is high enough to pick out individual pages. But typical medical scanners are pretty high-res I think. Has anyone tried this?

    --
    -- Ed Avis ed@membled.com
  33. Another way to scan.... by chippcom · · Score: 2, Interesting

    I seem to remember a few years back, during a tour of MIT's Media Lab, a project underway to basically MRI scan a closed book, then 'slice and dice' it page by page via some sophisticated algorithms into separate files which could then be OCR'ed. The plus to this approach is that for some books, just opening them would damage them beyond all repair.

    I thought it a pretty cool idea. Anyone ever heard of this before?

    -Chipp