Book-Digitizing Robots
Makarand writes "Robotic digitization systems are the new help available to complete voluminous scanning tasks. Robots that can turn the pages of books and newspaper volumes and attain scanning speeds of more than 1000 pages/hour are now available. They even use puffs of compressed air to separate sticky pages!"
What about that Speed Reading TV Offer I took advantage of?!?!?!?!
They even use puffs of compressed air to separate sticky pages!
Whoah! I guess some pr0n really have decent articles.
"Not knowing when the dawn will come, I open every door." - Emily Dickinson
After a long night of coding, or sleeping for that matter, it is hard to focus on the text on the screen. Scrolling down is another matter; I end up zooming text to 200% in Mozilla. So now we can all print out these digitized copies and read them. This is neat stuff, sure, but reading from a screen is hard, and most people will print it out anyway. The good thing is that people can now download it from the net, assuming it is hosted on a site.
OMG OMG OMG WTF OMG WTF BBQ STFU RTFM, OMFG OMG OMG OMG ROFL LMAO OMG WTF STFU ROFLMAO
books that change pages into robots next? What's this world coming to???
___ Shout Central - Crushes your nuts!
...not innovation. I know it's important, but it's not as exciting. Perhaps this attitude is why software is so buggy?
Iiiiinput!
So they're finally going to make something good come out of the Short Circuit movies?
Finally, Johnny-5 is coming alive!
Music wants to be free.
Make me say "whoa." Make robots that skip to the end to find out whodunit.
With all this trouble of digitizing books, when the publishers send their books to libraries - do they include digital copies? They really should. Although, I don't know if there's an RIAA equivalent in the literary world but if there is, the idea of giving a digital copy might frighten them. Librarians? Has a publisher ever mentioned digital copies that are in a non-crippled format?
I hate liberals. If you are a liberal, do not reply.
Combine this with M$ speech synthesis (Sam) and that could replace my old history teacher.
All he did was dictate notes to us, Very Fast and boring
Ctrl-Z
This story is a good opportunity to plug some free software you could use to help digitize books.
Stuart Inglis's tic98 is a lossless compressor designed for black-and-white scanned documents. It achieves better compression ratios than anything else, or at least it did a couple of years ago. If you have scanned documents to make available online, it's fairly simple to write a CGI script to convert tic98 on the fly to PDF.
Hopefully someone else will reply to this comment with a recommendation of good free OCR software.
-- Ed Avis ed@membled.com
"don't we all get our pr0n on the web these days?"
Now you know where all the pr0n came from.
If you keep throwing chairs, one day you'll break windows....
It's so expensive! The article estimates that the robot is only cost effective for huge projects (>5.5million pages). This technology is not going to make an impact until it becomes cheaper.
What, did everyone forget "Number 5"?
Fight Spammers!
Those people in #bookz on IRC are gonna be so excited about this...
What do the newspapers, and more likely magazines think of this?
Now the magazine rack at 7-11 will show up on Kazoom and all that.
I mean, comic books or "graphic novels" as the nerds call 'em already get traded freely, but that's because some joker with no life takes a day out of his life to scan and crop each page.
But if you could just take the magazines, stick 'em in this robot, then share 'em, it could hurt the publishing industry the way it's hurt the recording industry.
And everyone will justify it by saying "why should I buy a magazine when it only has one good article and the rest is crap!"
So what measures can we expect to see? Lighter inks, crazier fonts to screw with the robot's OCR? Funny paper that makes it hard to flip pages?
I don't need no instructions to know how to rock!!!!
sure, as long as they get Popular Mechanics or something...
Stop by my site where I write about ERP systems & more
Every time I have to show three forms of ID and walk through a metal detector just to go to my office, the terrorists have won, eliminating productivity and positive morale. And for what?
How many L.O.C's per hour is that?
liqbase
Really, it's ridiculous that I've got 140 gigabytes of storage in my apartment, and all these shelves of paper. (And don't bitch to me about reading on screen, a tablet with high-resolution screen displaying large type wouldn't be too bad, and digital paper ain't far away.)
But does this passage puzzle you a bit?
"Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"
I'm not sure I get the connection:
Mbutu: Hey, Kwasa, check out this copy of "The Horse Whisperer" on my Palm Pilot.
Kwasa: Incredible! We must hold free elections immediately!
If your bitterest enemies are people who hack the heads off civilians, then I would say you're doing something right.
So, is there a "BIAA" who can lobby all the worlds politicians to make this device illegal?
nice one
-- Karma Karma Karma Karma, Karma Chameleon - Boy George
Actually, I found the original (by Asimov) ...
better
What do we need to do to get one of these donated to Project Gutenberg? Right now one of the biggest things holding them up is a lack of volunteers to manually scan the books.
Mechanik
This would be awesome for records/document archiving. I knew a guy who worked at our State Library who had to catalog courthouse records across the state. He'd go out to some remote county where all the marriage, land and court records were on paper and try to figure out what they had. Some of the records went back to before the American Revolution. In nearly all cases, the only records were on paper.
If he could drag this robot along to a courthouse and scan the records over a couple of weeks, it would allow him digitize that information quickly. Not only would the digital copies be easier to search, they would be easier to preserve. One courthouse, where their file room was in the basement, nearly lost all of its old records to a flood.
Being able to scan 1000 pages an hour: $Lots
Converting 8 million volumes into a digital database: > $250 million
Having robots digitize every porn collection in the world, fast : Priceless
c - a blessed +5 grain of salt
I'm glad they didn't go with the design where it licked its thumb before turning each page. I hate that!
"Tolerance is the virtue of the man without convictions." -- G. K. Chesterton
Yeah sure, the articles are why people buy these magazines. But really, this would be cool for Project Gutenberg, or more specifically those scanning books for the Distributed Proofreaders.
Time for a change in terminology.
Once librarians get their hands on these they could be the new b00kw@r3z G0dz. Just think about searching the content of your library on kazaa.
;)
By that time someone will have thought up copy protection
I did quite a bit of research on a low-cost book scanner a while ago, because the thought of not having to lug around a heap of books from class to class is a dream come true. I hope this technology really takes off, and they find a way to make the whole thing a bit smaller/cheaper. I bet textbook publishers are scared silly about this.
hooray! it's a sex wiki
A robotic scanning machine for books could be very useful for litigation support too. I work for a company that is an outsourcing firm for law firms, and we get a lot of books to copy and scan. Hand-placed copying is a pain like you wouldn't believe. This machine could end all of that, but only if you had a large enough project to justify buying it.
But of course, this would also probably raise the cost to the law firms we have as clients, and of course they would charge their clients more.
Now Johnny 5 can scan in all the sticky pr0n on earth!
Ceci n'est pas un post.
For the love of GOD, someone check this!!
blakespot
-- Heisenberg may have slept here.
iPod Hacks.com
A book is essentially a form of encryption. You cannot copy pages from a book into a digital form without using some sort of technological device that breaks this "analog" encryption, which under the DMCA is clearly illegal.
Expect to see these outlawed real soon. Either that, or expect a "Steven King" model to be available this fall.
--
Slashdolt
I think there is a touch of naivete in this notion:
"Think about the power of bringing our library to little schools in the middle of Africa," Keller said. "Would it make a difference for those who now have their minds closed to the idea of democracy?"
I am not sure it would. It might turn them on to the idea of thinking for themselves, though. That could have interesting consequences. Unfortunately, just this very possibility is threatening to those who are now profiting from their ignorance. These people are likely in a position to be gatekeepers for the dissemination of information.
But having a robot do something which is enhanced by mindless repetition is a natural robotic application. Then having that application be something that could enable political liberation is an interesting twist on the old "robots in service to humanity" ideals. I'm not so sure that those holding the reins are going to be so interested in this--call me cynical.
What I would like to see is a similar device for converting analog recordings, in whatever form, be it tape, vinyl, or wax cylinders, to an open digitized format, and then have those recordings made available in like fashion. It might be just as interesting to turn those kids in Africa on to Mozart, or oral arguments from the Supreme Court.
The article says it would become cost effective at 5.5 million pages. Later it says manual digitizing costs between $1 and $4 per book in the Far East. So if you estimate a book to have around 300 pages, doing the digitising manually would cost $18,333-$73,333 per 5.5 million pages (i.e. 5,500,000/300 multiplied by the cost per book). From the way the article is written I expected it to cost a LOT more. I guess the proofreading cost for manual conversion could be high?
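The estimate above is easy to check in a few lines. This is only a rough sketch using the figures quoted in the comment (5.5 million pages, a guessed 300 pages per book, $1 to $4 per book); the function name is made up for illustration, and proofreading cost is not included.

```python
def manual_cost_range(total_pages, pages_per_book, cost_low, cost_high):
    """Return the (low, high) dollar cost of digitizing total_pages by hand."""
    books = total_pages / pages_per_book  # number of books that many pages represents
    return books * cost_low, books * cost_high

# Figures from the comment: 5.5 million pages, ~300 pages/book, $1-$4 per book.
low, high = manual_cost_range(5_500_000, 300, 1.0, 4.0)
print(f"${low:,.0f} to ${high:,.0f}")
```

Changing the pages-per-book guess shifts the result proportionally, so the conclusion is only as good as that assumption.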
... was created by a cadre of book turning robots for Sinus0idal... Hmmmm. Naaaah.
We can also put the digitizing, framing, and hosting services together for a prestigious piece of fine art. Imagine a beautifully framed old book that has been digitized for reading on the World Wide Web. The digitized antique book display provides a beautiful piece of art for your office, home or library and testifies to your gift of free reading on the Internet. The digital publishing preserves the unique signs of age of the book and proves that only you own the original book that was made forever visible by all.
I once took part in a project that intended to digitize millions of newspaper clips, some of them copies of originals more than 125 years old.
That was in 1999.
Digitizing was the easy part, actually, since the pages were conveniently on A4 paper, but the OCR, oh mighty Cthulhu! I was young and inexperienced in those days, and OCR software really wasn't up to the task (we didn't have the money to proofread all that text).
I don't have to tell you how disappointing it was trying to index 1.2Gb of garbled text.
I miss being naive. =)
while the books crumble away because they have fallen back into copyright and some suit with no vision beyond the next quarter refuses to allow his 'property' to be 'stolen'.
It's Christmas everyday with BitTorrent.
Just wait! Soon they'll organize and create a city called 01 and enslave the world!
And then some guy named Keanu will save us! And think he can act too!
They might even call it a Second Renaissance!
Not too long ago I had to do a research paper for a college class. No big deal, I've done many of them, and I was not looking forward to this one. Well, I went to the Houston Public Library in Downtown (which I hadn't been to in many, many, many, you get the idea, years). I got the library card that gave me access to some computer terminals and the computer card catalogue. I was amazed at what they had converted electronically, and at the links to other sites that had dictated material. I was also amazed that I could get all this same access from home using the information printed on the library card. So I went home (I have a Road Runner cable modem) and did my research there instead of being trapped in the library. I found electronic versions of lots and lots of textbooks, magazines, government docs, and many more. What put me a notch or two down from my high horse was that I even found that they had radio talk shows transcribed (which I used in my research paper), and that helped a lot!
There is a lot of information ALREADY converted from text and audio sources at your fingertips that was unfathomable a few years ago. And all of this is free from the website (and links to other sources) from the public library. Talk about your one stop shop.
"They even use puffs of compressed air to separate sticky pages!"
;)
they are going to need more than compressed air to unstick the Pr0n mags!
C:\earth\humans\del *.m0ronz
<quote> ...
The newly installed robot is finishing two pilot projects, scanning books published by Stanford's Center for the Study of Language and Information and
</quote>
It means they have scanned this this this this and this?
Will they be available online for reading? :)
Using air to separate and move paper is not new. Heidelberg platen presses (you may remember them from high school graphic arts classes) have had this feature for about fifty years.
More Input!
I am the lord of the pun. Dance Knave!
...to build a Tivo for books!
-JDF
The more traditional way to preserve the contents of the old books is to destroy them in the process. Actually cutting the page out of the book lets you get a much higher quality scan because the page is then really truly flat. (Yes, there are correction techniques for turning scans of non-flat pages into flat "projections" but they aren't nearly as good as just ripping the page out and scanning it.)
Words like "Democracy" and "Freedom" are to an American what "Java" and "XML" used to be to a manager. Nowadays I guess it might be "C#" and "Dot NET".
This is not new.
The hardware has been hard at work since the late 70s/early 80s when PDP-8s and PDP-11s were used to control the hardware and store the results.
The first scanners had very small CCD arrays and these had to be pulled across the page horizontally as well as vertically AND it had vacuum "bars" on robot-arm "page turners".
MSBPodcast.com The opinions expressed here are my own. If you don't like 'em... Think up your own stuff.
Once books are digitized and OCR'd they need to be proofread by humans. The people who can afford this machine might do it another way but Project Gutenberg has volunteers at Distributed Proofreaders.
There was a Slashdot Article about it last year but there have been a lot of changes since then (many due to Slashdotters). If you haven't seen the project in a while you should check it out.
Makarand writes "Robotic masturbation systems are the new help available to complete voluminous spewing of ejectus. Robots that can jerk your gherkin and pull your pud while attaining whacking speeds of more than 1000 yanks/minute are now available. They even use puffs of compressed air to simulate the vacuum of blowjobs!"
Ok, I have too much free time.
They even use puffs of compressed air to separate sticky pages!
Sometimes, I need a puff of compressed gas to separate my cheeks...
This has been done before
Make / Model / Size / CRT - LCD ?
Thanks
Johnny 5 is alive!
Anyone who likes to study a subject in depth, and doesn't have access to a major academic library, is going to benefit hugely from this. Among other things, it will greatly facilitate the development of scholars independent of academia (think of Marx sitting in the British Museum).
About age 5, the same time I started reading.
I read all the time (12-16 hours a day sometimes).
Green
Never wear sun glasses, have always tested better than 20/20 in both eyes. I would say I see excellent in low light, but have never been tested...
Envision 19" lcd tv/computer monitor thing
Bright 100% Contrast 100%, I prefer the color setting 'warm' whatever that is.
The NWAA (Novel Writers Artists Association) has announced that it will fight in Congress for legislation against this piracy tool.
"These reading Bots will put the book publishing business under within months..", their congressional representative said.
"There hasn't been this strong of an attack against the goodness of books and authors since that evil man Gutenberg created that evil printing press." Word on the street is that Hilary Rosen is going to be hired as their spokesperson to help outlaw this evil that will undermine American life as we know it.
Do not look at laser with remaining good eye.
They even use puffs of compressed air to separate sticky pages!
useful when archiving all those old hustler's...
PC moderators can suck my White pierced, tattooed dick. If you think pride == hate, s/dick/Aryan meat mallet/g.
Cool - it'll have my dead-tree porn collection scanned and uploaded in no time!
Actually it is 1000 pages / hour. But I think the idea is the same.
McCartney fans pay bus tickets. [...] Lennon fans too, with discretion.
Is to just cut the binding off and sheet feed.
As far as page turning goes, there are lots of people willing to do that manually for minimum wage. Probably cheaper, lower maint and less error prone than any robot.
Having traveled in subsaharan Africa a bit, I can safely say that people I met there aren't "closed to the idea of democracy." (They're sometimes consciously "closed" to the idea of allowing mammoth, conscience-free American-based multinational corporations to subvert the democratic institutions they do have, though.)
I bet that was just an isolated quote the reporter chose, though. Seems more like her/his bias than the librarian's, at first glance.
"Fundamentalism" isn't about divine morality. It's about human authority.
Analog is subject to degradation every time it is reproduced. Digital conversion halts the degradation at conversion. Ones are ones and zeroes are zeroes from then on.
The best way to do is to be.
> They even use puffs of compressed air to separate sticky pages!
Oh good, that means these robots can digitize my porn magazine collection!
- For the complete works of Shakespeare: cat
...robots scan YOU 1000 times an hour!
You cannot circumvent the copy "protection" technology of manually turning the pages of a book/newspaper. Shame on you. Officer take the bot away......
We handle safety documentation for a big company (a very big company), and we have to do quarterly updating of some 60 000 documents, plus update new sites as they come in. Suddenly, page-turning ultra-scanners and super-OCR programs look very interesting to me. All our output has to be in PDF, so something like what you described could be very useful.
;)
I can also think of a few non-work uses for the thing, too. Dare I say, avariciously, "I want one!"
I'm not a geek, I'm just a clever script.
Yeah, but if they don't learn to read, they're going to be stuck with the same subsistence agriculture that hasn't worked too fucking well for them recently. That or UN or NGO handouts that only serve to strengthen the oppressive regimes that are torturing these people, because little of the aid that reaches the docks reaches the people thanks to rampant corruption.
Here's the current process:
1. Africa has crappy food production
2. West sends food
3. Food is intercepted by dictator's thugs.
4. Dictator sells food or uses it to extort loyalty
5. Dictator becomes rich and powerful
6. People become dependent upon the west and their dictator for food.
7. People get worse at farming, continue to starve, and dictator becomes yet stronger.
8. Goto 1.
Seems to me that education and empowerment might be part of the way to break that shitty cycle. Keeping people poor and incapable of supporting themselves isn't.
-Looking for a job as a materials chemist or multivariat
"We have hunger and want in the world because evil men use the vehicle of government to deny men that liberty which they need to produce abundantly."
Ezra Taft Benson
Make them free, and they'll bring the food and water into their villages themselves.
You can tell a great deal about the character of a man by observing those who hate him.
Now you'll have librarians running around going Input? Input!!! Need More Input!!!
Here is a movie of the Kirtas machine at work. Interesting technology used here!
I live to gib...
Whatever happened to the ultrahigh resolution (200-250 dpi) displays which were being talked about a couple of years ago?
For one thing, the operating systems got in the way. Many poorly-written but popular Windows applications assume that your display is 96 dpi and will not react properly to changes in the system DPI setting (in Windows 2000, Control Panel > Display Properties > Settings > Advanced > General > Display > Font Size).
For another thing, it was discovered that LCD panels don't let enough light through per pixel because of the black border between pixels.
Finally, color LCD panels are already 300 DPI horizontally and 100 DPI vertically.
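The 96 dpi assumption can be illustrated with the usual points-to-pixels conversion (72 points per inch). This is a minimal sketch; the function names are invented for illustration and are not any Windows API.

```python
POINTS_PER_INCH = 72  # typographic points per inch

def points_to_pixels(points, dpi):
    """Pixels needed to draw a size given in points on a display of the given DPI."""
    return points * dpi / POINTS_PER_INCH

def pixels_to_points(pixels, dpi):
    """Physical size in points of a pixel count drawn on a display of the given DPI."""
    return pixels * POINTS_PER_INCH / dpi

# A well-behaved application asks for 12 pt text and scales with the display.
for dpi in (96, 200):
    print(dpi, "dpi:", round(points_to_pixels(12, dpi), 1), "px for 12 pt text")

# An application that hardcodes the 96 dpi pixel count shrinks on a 200 dpi panel.
hardcoded_px = points_to_pixels(12, 96)
print("hardcoded text appears as", round(pixels_to_points(hardcoded_px, 200), 2), "pt")
```

Under these assumptions, text laid out for 96 dpi ends up at less than half its intended physical size on a 200 dpi display, which is the misbehavior described above.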
Will I retire or break 10K?
This is truly an amazing technology. I would love to see some government use this technology to create a vast online library. This offers too great an opportunity to let excessive property rights get in the way. Set a date for respecting rights (say 20 years or whatever), then up it goes into the library. I would love to see some nation do this. It would be a great gift to humanity.
HenryJamesFeltus.com
2. West sends food
3. Food is intercepted by dictator's thugs.
The humanitarian organizations[1] have started to fly people in who prepare hot meals and serve them directly to needy people. This should make it a lot harder for a warlord to intercept the food, no?
[1] No cannibal jokes please.
Will I retire or break 10K?
well, maybe something like this finally gets the people from nature, science, europhysics journal (successor of several older european journals) etc. to do what they should have done long ago - make all their old issues available electronically and not only the ones from the 90s and later; the AIP did this years ago and it's absolutely awesome to be able to look up stuff in, say, physical review from 1920 or so without having to leave my desk.
Instead of picking the book up and flipping the pages, couldn't you use X-ray tomography (or possibly microwave tomography) to get a 3d image of the book and extract pages from that?
This assumes two things: that the ink makes a difference to X-ray penetration compared to just paper, and that the resolution of the scanner is high enough to pick out individual pages. But typical medical scanners are pretty high-res I think. Has anyone tried this?
-- Ed Avis ed@membled.com
And they can cope with the sticky pages too!
In vino vici
...will be the telephone.
75% of the world's population may finally get telephone access in the 21st century, thanks to the relatively inexpensive infrastructure requirements of cellular phones.
The bicycle, the internal combustion engine, the telephone, the light bulb, the AC generator. 19th century technologies whose impact is yet to be felt in much of the world.
I don't think folks in villages in Africa will be reading about "freedom" on their web browsers any time soon.
"I guess the proof reading cost for manual conversion could be high?"
I thought students were PAYING to do this. Just give them some extra credit for finding mistakes;)
Sdelat' Ameriku velikoy Snova!
You have no chance to survive...Make your time.
-Strungis
Bah. Johnny 5 could do that in ten seconds flat, without OCR errors, and that was back in the nineteen eighties.
I have been thinking a lot about this very subject recently. It appears that there could be a major gap approaching in 20th century literature collections, because few people and/or libraries are retaining books that were published in the years 1920 to 1950. Plus, the new copyright restrictions may prevent books from this period from ever reaching the public domain. If thousands of titles from this period don't get digitally copied, then they will have disappeared with the disintegration of the physical paper and binding.
For example, I was reading Florence (National Review's former misanthrope) King's book on why white people are so weird and how they got that way ('WASP, Where is Thy Sting?' (out-of-print) ) recently. She makes reference to many books that were bestsellers in the 30's and 40's that strongly influenced 'the greatest generation's' thought patterns. Most of these books that she referred to are simply gone in that they can't be found anywhere; not in libraries or bookstores; only a few copies scattered in private home collections of the elderly and the Library of Congress.
It would have been nice to have been able to download several of these titles in digital form, but they have never been scanned and quite likely never will be scanned.
I've scanned a few of my favorite books and posted them onto Kazaa, but nobody there is interested in reading anything except Fantasy and Science Fiction. In the year or so that I've had titles by Gore Vidal and John Updike available in my P2P directory, they have been accessed only twice.
Besides, scanning and proofreading books is a serious hassle and a major undertaking. Even with good quality OCR software and a reasonably fast scanner it will take about 20 to 30 hours to scan and proof an entire book. Walking through the stacks of the local library recently made me realize that it would take a hundred years to scan all of this material. Before then most of the books on the shelves would have been chucked and pulped.
I asked at the local Best Buy store about using a high resolution digital camera on a custom made stand to photograph book pages and feed the data from the camera to an OCR program using USB 2.0. Needless to say, the response that I got was; 'Duh...'. Nevertheless, it still seems like a good idea to speed the process of preserving our culture by transferring books to a more fluid media.
Any thoughts?
Thank you,
Simonetta
Gah, now in addition to downloading 100+ MB amateur racing (on real race tracks, not street racing crap) videos, I'll be downloading PDF copies of books in excess of 500MB from Stanford, etc.
so when is stanford going to put these books up on the gnutella network? i'd be happy to mirror their collection (or as much as i can) of digitized books on my gnutella node.
moox. for a new generation.
beside small puffs of compressed air? :)
So you're telling me that these things can turn the pages of magazines AND deal with those sticky pages?!? Wow, even better than slideshow mode!
How expensive could these be to make? The mentioned unit sounds big and probably has some interesting features to handle various problems but still I can't see why these should cost more than a few thousand dollars. In college I built a small robot (about the size of a toaster) that could do the same thing. It had arms for popping a book off the stack and pushing it into a new stack when finished. It also had small arms for flattening and turning pages (credits to Real Genius for the concept). The last arm used a handheld scanner to scan the pages into a connected laptop. The whole thing cost a couple hundred dollars to make. I admit that the 'puff of air' solution is a good idea but how much is it worth? Of course my solution required a connected computer to store the data and run it through OCR but even that didn't cost much. These days I'd probably use a laptop with a WiFi card to locally store the data and broadcast it to a nearby computer for OCR processing. Even including the small cluster to make OCR processing fast you could probably keep the cost to $10,000.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
"Automatically separate sticky pages"? Finally I can digitize my Playboy collection!
I think you are having problems, either in understanding these concepts, or in relating your thoughts on them.
Digital, literally (technically), means "represented by digits". What this means to the average person, however, is that an EXACT copy can be made, which will not degrade over time. Some of the actual data may be lost (ie., through disc errors, material degradation), but the data that remains will still be exactly the same data as was there in the beginning. This keeps the signal separated from the noise as much as possible.
In analog formats, any degradation, or noise, becomes part of the original signal. Signal degradation WILL happen during the copying process. The best example of this is photocopies, which become harder to read just a few generations down the line. This happens because analog, by definition, has MORE static.
Your insistence that words are (by definition) digital utterly confounds me. You claim that words are digital because they use simple symbols to represent specific concepts. That is ludicrous! Nearly all words in the English language are subject to interpretation, and will have different definitions depending on who is defining them.
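The generational-loss argument can be shown with a toy simulation. This is a sketch under loud assumptions: each copy adds uniform noise of at most 0.1 to every sample, the "digital" copier snaps each sample back to 0 or 1, and the noise is never large enough to flip a bit. Real digital media also rely on error correction, which is not modelled here.

```python
import random

def analog_copy(signal, rng, noise=0.1):
    """Copy a signal; the noise added on the way becomes part of the copy."""
    return [s + rng.uniform(-noise, noise) for s in signal]

def digital_copy(bits, rng, noise=0.1):
    """Copy through the same noisy channel, then snap each sample back to 0 or 1."""
    return [1 if b + rng.uniform(-noise, noise) >= 0.5 else 0 for b in bits]

rng = random.Random(0)  # fixed seed so the run is repeatable
original = [rng.randint(0, 1) for _ in range(1000)]

analog = [float(b) for b in original]
digital = list(original)
for _ in range(20):  # twenty generations of copy-of-a-copy
    analog = analog_copy(analog, rng)
    digital = digital_copy(digital, rng)

analog_drift = max(abs(a - b) for a, b in zip(analog, original))
print("digital copy identical:", digital == original)
print("largest analog drift:", round(analog_drift, 3))
```

The digital chain stays bit-for-bit identical because every generation discards the noise it picked up, while the analog chain carries each generation's noise forward into the next.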
Once something is in digital format it'll be a lot easier to copy, and share on a Napster-a-like.
:)
This could be the new MP3!
...someone invents a macrovision-style technology for books? :)
Please note -- parent is most likely a paid endorsement for Terminator 3: Rise of the Machines (coming to a theater near you July 2nd).
I seem to remember a few years back, during a tour of MIT's media lab, a project underway to basically MRI scan a closed book, then 'slice and dice' it page by page via some sophisticated algorithms into separate files which could then be OCR'ed. The plus to this approach is that, for some books, just opening them would damage them beyond all repair.
I thought it a pretty cool idea. Anyone ever heard of this before?
-Chipp
I run a used bookstore. I could spend a little time running books through this thing, and with a little work, read the latest releases on my palm pilot, for reading at the beach.
On the other hand, there's a big difference between reading on my m130 and holding that paperback in my hand.
Visit Lockjaw's Lair. He won't bite.
Right here.
I have never had much good experience with OCR, personally. It seems that in the end I get around 95% accuracy, while the OCR companies claim 99% or somesuch.
Even assuming that 99% accuracy is achievable, that's still a 1% error rate, which means about 50 words would be OCR'd incorrectly out of every 1000 or so (assuming an average of 5 letters per word).
That's a LOT of errors!
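The back-of-the-envelope figure can be checked with a short calculation. This sketch assumes the quoted accuracy is per character and that character errors are independent of each other, which real OCR errors usually are not; the function name is made up for illustration.

```python
def words_with_errors(char_accuracy, letters_per_word, words):
    """Expected number of words containing at least one misread character."""
    p_word_ok = char_accuracy ** letters_per_word  # every letter must be read correctly
    return words * (1 - p_word_ok)

# Claimed 99% accuracy versus the roughly 95% seen in practice, per 1000 words.
for accuracy in (0.99, 0.95):
    print(accuracy, round(words_with_errors(accuracy, 5, 1000), 1))
```

At 99% per-character accuracy this gives roughly 49 bad words per 1000, in line with the estimate above; at 95% it rises to more than 220.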
The problem is, for authors like Mark Twain, who *intentionally* misspell things or write them out phonetically (anybody read Huck Finn?), you won't know if it's an error because of the OCR or the author's original intent.
A typo once said spinach had 10x the iron content it really had. This became an undisputed fact for decades (and people are still going strong on it). So even though Mark Twain's works have millions of copies in circulation, I'd bet any errors contained within the digital versions won't surface for a long time, if ever. And this is Mark Twain we're talking about here. How about authors who don't get so much coverage?
So, if possible, can you enlighten us on some techniques that you guys are using to ensure that the digital replications are, well, CORRECT? I am sure it's impractical for human proofreaders...
My life in the land of the rising sun.
Using this could help toward that goal.
"while the books crumble away because they have fallen back into copyright and some suit with no vision beyond the next quarter refuses to allow his 'property' to be 'stolen'."
And the moral of the story is...Don't steal!
Can't 'worry' about what 'people' don't 'do'.
Of course, based on the number of postings from Anonymous Coward you can see I have waaaaaay too much free time. Me, Betty Crocker, Mrs. Paul, Ronald McDonald, John Doe, and Mr. Goodwrench are busy people.
How long will it be before this creation of man learns all there is to learn in the library, gaining consciousness, then trying to take over the library by integrating into its being its creator, or any available hot chicks?
The Bureau of the Census's R&D department built a page-turning machine to handle Census forms in booklet form. All the paper manipulation was done using wheels and belts with a partial vacuum behind them. The machine rolled a wheel over the booklet, winding one page part way around the wheel. A conveyer belt moved the entire booklet. The pages were microfilmed with an overhead camera.
In a later step, the microfilms were processed into a computer, using a machine called FOSDIC, built out of surplus UNIVAC 1105 parts. This machine just detected dots in circles, as on test forms. Output was a magnetic tape.
That's how 1970 census data was processed.
I can only imagine that one day all the knowledge scanned from the books will be imprinted into our DNA, so we don't have to waste the first 20 years of our lives re-learning the past.
I mean these robots sound pretty incredible, but the question of democracy will arise.