Book-Digitizing Robots
Makarand writes "Robotic digitization systems are the new help available to complete voluminous scanning tasks. Robots that can turn the pages of books and newspaper volumes and attain scanning speeds of more than 1000 pages/hour are now available. They even use puffs of compressed air to separate sticky pages!"
I am not sure it would. It might turn them on to the idea of thinking for themselves, though. That could have interesting consequences. Unfortunately, this very possibility is threatening to those who are now profiting from their ignorance. These people are likely in a position to be gatekeepers for the dissemination of information.
But having a robot do something that is enhanced by mindless repetition is a natural robotic application. Then having that application be something that could enable political liberation is an interesting twist on the old "robots in service to humanity" ideals. I'm not so sure that those holding the reins are going to be so interested in this--call me cynical.
What I would like to see is a similar device for converting analog recordings, in whatever form, be it tape, vinyl, or wax cylinders, to an open digitized format and then have those recordings made available in like fashion. It might be just as interesting to turn those kids in Africa on to Mozart, or oral arguments from the Supreme Court.
The best way to do is to be.
What about that Speed Reading TV Offer I took advantage of?!?!?!?!
They even use puffs of compressed air to separate sticky pages!
Whoah! I guess some pr0n really have decent articles.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
After a long night of coding, or sleeping for that matter, it is hard to focus on the text on the screen. Scrolling down is another matter; I end up putting text up to 200% zoom in Mozilla. So now we can all print out these digitized copies and read them. This is neat stuff, sure, but reading from a screen is hard, and most people will print it out anyway. The good thing is that people can now download it from the net. Assuming it is hosted on a site.
OMG OMG OMG WTF OMG WTF BBQ STFU RTFM, OMFG OMG OMG OMG ROFL LMAO OMG WTF STFU ROFLMAO
Finally, Johnny-5 is coming alive!
Music wants to be free.
With all this trouble of digitizing books, when the publishers send their books to libraries - do they include digital copies? They really should. Although, I don't know if there's an RIAA equivalent in the literary world but if there is, the idea of giving a digital copy might frighten them. Librarians? Has a publisher ever mentioned digital copies that are in a non-crippled format?
I hate liberals. If you are a liberal, do not reply.
This story is a good opportunity to plug some free software you could use to help digitize books.
Stuart Inglis's tic98 is a lossless compressor designed for black-and-white scanned documents. It achieves better compression ratios than anything else, or at least it did a couple of years ago. If you have scanned documents to make available online, it's fairly simple to write a CGI script to convert tic98 on the fly to PDF.
Hopefully someone else will reply to this comment with a recommendation of good free OCR software.
-- Ed Avis ed@membled.com
Those people in #bookz on IRC are gonna be so excited about this...
What do the newspapers, and more likely the magazines, think of this?
Now the magazine rack at 7-11 will show up on Kazoom and all that.
I mean, comic books or "graphic novels" as the nerds call 'em already get traded freely, but that's because some joker with no life takes a day out of his life to scan and crop each page.
But if you could just take the magazines, stick 'em in this robot, then share 'em, it could hurt the publishing industry the way it's hurt the recording industry.
And everyone will justify it by saying "why should I buy a magazine when it only has one good article and the rest is crap!"
So what measures can we expect to see? Lighter inks, crazier fonts to screw with the robot's OCR? Funny paper that makes it hard to flip pages?
I don't need no instructions to know how to rock!!!!
sure, as long as they get Popular Mechanics or something...
Stop by my site where I write about ERP systems & more
... or until someone donates one to Project Gutenberg.
People who disagree with you are not automatically evil, greedy, or stupid.
But does this passage puzzle you a bit?
"Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"
I'm not sure I get the connection:
Mbutu: Hey, Kwasa, check out this copy of "The Horse Whisperer" on my Palm Pilot.
Kwasa: Incredible! We must hold free elections immediately!
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
What do we need to do to get one of these donated to Project Gutenberg? Right now one of the biggest things holding them up is a lack of volunteers to manually scan the books.
Mechanik
All it takes is one *really* large project. If somebody like the Library of Congress started scanning/digitizing their collection (I know--subject/verb agreement :), it would obviate the need for just about any smaller libraries to do so. You don't need thousands of libraries to scan the same book, you only need one, and then you can replicate electronically. Surely there are specialty libraries around that have unique collections, but again--all you need is one...
I didn't RTFA, but this could be useful not only for developing countries, but as a "force-multiplier" of sorts for smaller community libraries. En masse digitizing of published works would allow smaller libraries to compete on a more even footing with larger ones, without having to invest loads of money into their collections and facilities to hold them.
Any well-heeled library patrons out there want to donate some money earmarked for one of these things to the large library of your choice?
This would be awesome for records/document archiving. I knew a guy who worked at our State Library who had to catalog courthouse records across the state. He'd go out to some remote county where all the marriage, land and court records were on paper and try to figure out what they had. Some of the records went back to before the American Revolution. In nearly all cases, the only records were on paper.
If he could drag this robot along to a courthouse and scan the records over a couple of weeks, it would allow him to digitize that information quickly. Not only would the digital copies be easier to search, they would be easier to preserve. One courthouse, whose file room was in the basement, nearly lost all of its old records to a flood.
I'm glad they didn't go with the design where it licked its thumb before turning each page. I hate that!
"Tolerance is the virtue of the man without convictions." -- G. K. Chesterton
Time for a change in terminology.
I did quite a bit of research on a low-cost book scanner a while ago, because the thought of not having to lug around a heap of books from class to class is a dream come true. I hope this technology really takes off, and they find a way to make the whole thing a bit smaller/cheaper. I bet textbook publishers are scared silly about this.
hooray! it's a sex wiki
For the love of GOD, someone check this!!
blakespot
-- Heisenberg may have slept here.
iPod Hacks.com
The article says it would become cost effective for 5.5 million pages. Later it says it costs between $1 and $4 per book in the Far East. So if you estimate a book to have around 300 pages, doing the digitising manually would be $18,333-$73,333 per 5.5 million pages (i.e. 5,500,000/300 multiplied by the cost per book). From the way the article is written I expected it to cost a LOT more. I guess the proofreading cost for manual conversion could be high?
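For anyone who wants to rerun that estimate with different numbers, here is a minimal sketch of the arithmetic in Python. The function name and parameter names are made up for illustration, and the 300 pages per book is the same guess as above, not a figure from the article.

```python
def manual_cost_range(total_pages, pages_per_book, cost_low, cost_high):
    """Return the (low, high) cost in dollars of digitizing total_pages manually.

    pages_per_book is an assumed average; cost_low and cost_high are the
    per-book prices quoted for manual conversion.
    """
    books = total_pages / pages_per_book  # number of books in the page count
    return books * cost_low, books * cost_high


# Figures from the comment: 5.5 million pages, ~300 pages/book, $1 to $4 per book
low, high = manual_cost_range(5_500_000, 300, 1.0, 4.0)
print(f"${low:,.0f} - ${high:,.0f}")
```

Changing the pages-per-book guess moves the result proportionally, so a 150-page average would double both ends of the range.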
Not too long ago I had to do a research paper for a college class. No big deal, I've done many of them, and I was not looking forward to this one. Well, I went to the Houston Public Library in Downtown (which I hadn't been to in many many many, you get the idea, years). I got the library card that gave me access to some computer terminals and the computer card catalogue. I was amazed at what they had converted electronically and at the links to other sites that had dictated material. I was also amazed that I could get all this same access from home using the information printed on the library card. So I go home (I have a Road Runner cable modem) and do my research instead of being trapped in the library, and get to work. I find electronic versions of lots and lots of textbooks, magazines, government docs, and many many more. What put me a notch or two down from my high horse was that I even found that they had radio talk shows transcribed (which I used in my research paper), and that helped a lot!
There is a lot of information ALREADY converted from text and audio sources at your fingertips that was unfathomable a few years ago. And all of this is free from the website (and links to other sources) from the public library. Talk about your one stop shop.
Using air to separate and move paper is not new. Heidelberg platen presses (you may remember them from high school graphic arts classes) have had this feature for about fifty years.
The more traditional way to preserve the contents of the old books is to destroy them in the process. Actually cutting the page out of the book lets you get a much higher quality scan because the page is then really truly flat. (Yes, there are correction techniques for turning scans of non-flat pages into flat "projections" but they aren't nearly as good as just ripping the page out and scanning it.)
Words like "Democracy" and "Freedom" are to an American what "Java" and "XML" used to be to a manager. Nowadays I guess it might be "C#" and "Dot NET".
This is not new.
The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.
The first scanners had very small CCD arrays and these had to be pulled across the page horizontally as well as vertically AND it had vacuum "bars" on robot-arm "page turners".
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Once books are digitized and OCR'd they need to be proofread by humans. The people who can afford this machine might do it another way but Project Gutenberg has volunteers at Distributed Proofreaders.
There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.
The NWAA (Novel Writers Artists Association) has announced that it will push for legislation in Congress to fight this piracy tool.
"These reading Bots will put the book publishing business under within months...", their congressional representative said.
"There hasn't been this strong of an attack against the goodness of books and authors since that evil man Gutenberg created that evil printing press." Word on the street is that Hilary Rosen is going to be hired as their spokesperson to help outlaw this evil that will undermine American life as we know it.
Do not look at laser with remaining good eye.
Having traveled in sub-Saharan Africa a bit, I can safely say that people I met there aren't "closed to the idea of democracy." (They're sometimes consciously "closed" to the idea of allowing mammoth, conscience-free American-based multinational corporations to subvert the democratic institutions they do have, though.)
I bet that was just an isolated quote the reporter chose, though. Seems more like her/his bias than the librarian's, at first glance.
"Fundamentalism" isn't about divine morality. It's about human authority.
Analog is subject to degradation every time it is reproduced. Digital conversion halts the degradation at conversion. Ones are ones and zeroes are zeroes from then on.
The best way to do is to be.
Yeah, but if they don't learn to read, they're going to be stuck with the same subsistence agriculture that hasn't worked too fucking well for them recently. That or UN or NGO handouts that only serve to strengthen the oppressive regimes that are torturing these people, because little of the aid that reaches the docks reaches the people thanks to rampant corruption.
Here's the current process:
1. Africa has crappy food production
2. West sends food
3. Food is intercepted by dictator's thugs.
4. Dictator sells food or uses it to extort loyalty
5. Dictator becomes rich and powerful
6. People become dependent upon the west and their dictator for food.
7. People get worse at farming, continue to starve, and dictator becomes yet stronger.
8. Goto 1.
Seems to me that education and empowerment might be part of the way to break that shitty cycle. Keeping people poor and incapable of supporting themselves isn't.
-Looking for a job as a materials chemist or multivariat
"We have hunger and want in the world because evil men use the vehicle of government to deny men that liberty which they need to produce abundantly."
Ezra Taft Benson
Make them free, and they'll bring the food and water into their villages themselves.
You can tell a great deal about the character of a man by observing those who hate him.
Instead of picking the book up and flipping the pages, couldn't you use X-ray tomography (or possibly microwave tomography) to get a 3d image of the book and extract pages from that?
This assumes two things: that the ink makes a difference to X-ray penetration compared to just paper, and that the resolution of the scanner is high enough to pick out individual pages. But typical medical scanners are pretty high-res I think. Has anyone tried this?
-- Ed Avis ed@membled.com
I seem to remember a few years back, during a tour of MIT's media lab, a project underway to basically MRI scan a closed book, then 'slice and dice' it page by page via some sophisticated algorithms into separate files which could then be OCR'ed. The plus to this approach is that for some books, just opening them would damage them beyond all repair.
I thought it a pretty cool idea. Anyone ever heard of this before?
-Chipp