Distributed Proofreaders Posts 5,000th E-book

Posted by timothy on Tuesday August 24, 2004 @06:41PM from the error-checking-and-correcting dept.

bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers. Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in an OCR'ed text. The project has thousands of volunteers, of which many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."

10 of 144 comments

Hm! by martingunnarsson · 2004-08-24 18:46 · Score: 4, Interesting

They should offer their services to authors and magazines, and raise some money from what they do. It wouldn't be enough to split among the proofreaders involved, I guess, but the project itself could get some money to buy... well, whatever they might need. Perhaps they already do this; I'm too lazy to find out :-)

--
Martin
500 people read it? by tod_miller · 2004-08-24 18:52 · Score: 3, Interesting

The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers.

Hardly a non-put-downable... I suppose the fact that it is a Biography (shouldn't that be bibliography? *chuckle*) of English literature is kinda symbolic.

I guess this more than doubles the total number of people who have read this book though!

I like Gutenberg. I hope they start a system where you can download copyrighted books for a micropayment; I would pay good money for text ebooks.

Let's hope ebooks don't go the way of music: keep the costs low, with no DRM fluffing up the download. If you can click 3 times and start reading a new book, and it costs you euros, then you would prefer that to d/l'ing gigs of warez.

Anyone who illegally downloads lots of books tends to be the person who doesn't read them much anyway (Someone boasted to me that they had 300 O'Reilly books, squirming under the desire to tell me that they were eBooks, off irc, oh lawks, what a riot, I wish I was your friend, go away)

--
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Who picks this stuff? by Animats · 2004-08-24 19:16 · Score: 3, Interesting

"Final Report of the Louisiana Purchase Exposition Commission"?
Still, I look forward to the day when someone starts digitizing the Mechanics Institute Library in San Francisco. It's a beautiful private library one can join. The books are in excellent condition, and there are century-old original editions on the shelves.
But it's the magazine collection that's stunning. They have Popular Mechanics in bound volumes, all the way back to the beginning, when it was a serious scientific journal. All the major railroad magazines from the heyday of railroading. Every issue of Electric Railway Journal (the trade magazine of streetcars). Few other libraries kept that stuff.
Re:good books? by adolfojp · 2004-08-24 19:44 · Score: 3, Interesting

http://www.icarusindie.com/Literature/Library/

That site has a couple of good ones. You should first read "The Lost Continent". The book was written during or shortly after WWI and follows a hypothetical development of the world if the New World and the Old World had lost communication until 200 years later. The most interesting thing about those old science fiction books is to contrast their world view with ours and to see what futuristic devices they thought would exist by now.

Cheers,

Adolfo
Re:I need a new job by jonathan_ingram · 2004-08-24 20:07 · Score: 5, Interesting

Luckily, you do not need either grammar or spelling skills -- just the ability to match text against a source image. Indeed, it may even be an *advantage* to not be a great linguist! One of the key things we emphasise is that we want an exact copy of the source material -- we do not want people 'correcting' or 'updating' the originals to bring them into line with the way the language is written today.

--
-- Help Digitise the Public Domain at DP.
Re:Make them renew each year by iamdrscience · 2004-08-24 20:07 · Score: 3, Interesting

Lawrence Lessig proposes a similar scheme in "The Future of Ideas". I doubt he was the first, but that's just what you made me think of. It's a good book, even though it can get kind of dry at times (it is, at least in some capacity, a book about law after all).

As far as your scheme goes, though, I would really like a hard extension limit, and I think 25 years for a default term is really too much (I mean, to use your example of Apple II games, many of those games wouldn't even quite be out of term yet). I think 5 or 10 would be much better.
Re:because by jonathan_ingram · 2004-08-24 20:09 · Score: 4, Interesting

because playboy hasnt lapsed into the public domain yet...

Very true, although several of us do keep talking about searching for some Victorian Porn to put through the site :). There's actually quite a lot of public domain 'erotica' (anything written and published before 1923, for example) -- we just need people to scan it and contribute it to the site! We've had a couple of 'racy' books, and not surprisingly they tend to be proofed very quickly.

--
-- Help Digitise the Public Domain at DP.
formatting by golgotha007 · 2004-08-24 21:16 · Score: 3, Interesting

I think the Gutenberg project is a terrific idea!

My only complaint is with the formatting. Project Gutenberg uses hard formatting within the text. I think that's an extremely stupid idea.

There should be zero formatting within the text (other than paragraph breaks). Whatever client you're using should provide the formatting for you.

Let the client handle the presentation!!
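To make that concrete, here is a minimal sketch of what a client could do, assuming the text is hard-wrapped with line breaks inside paragraphs and blank lines between them. The function names `unwrap_paragraphs` and `render` are made up for illustration and are not part of any Project Gutenberg tool.

```python
import re
import textwrap


def unwrap_paragraphs(text: str) -> list[str]:
    """Split hard-wrapped plain text into paragraphs with the wraps removed."""
    # Assumption: paragraphs are separated by one or more blank lines.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace, including line breaks, to one space.
    return [" ".join(block.split()) for block in blocks if block.strip()]


def render(text: str, width: int) -> str:
    """Re-wrap the text to whatever width the client wants to display."""
    paragraphs = unwrap_paragraphs(text)
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)


sample = (
    "It was the best of times,\n"
    "it was the worst of times.\n"
    "\n"
    "Second paragraph\n"
    "here."
)
print(render(sample, width=40))
```

Something this naive would mangle poetry, tables, and anything else where the line breaks carry meaning, so a real client would need more than this.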
Request for MATH experts by jhutch2000 · 2004-08-25 00:52 · Score: 5, Interesting

Right now, we've got plenty of old math-intensive books ready to move through the DP system. Because of ASCII's terrible ability to handle equation formatting, we use TeX layout. The average DPer doesn't know TeX and it's a rather steep learning curve to get started on. So, since Slashdot is full of self-professed geeks... all you TeX geeks should join up and help with the TeX-formatted MATH texts. I've got plenty of books scanned and ready to go, so don't think you'll run us out of 'em any time soon!
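
For a rough idea of the difference (an illustrative example, not taken from any book in the DP queue): in plain ASCII the quadratic formula has to be flattened into something like x = (-b +- sqrt(b^2 - 4ac)) / (2a), whereas the TeX source keeps the structure of the fraction and the radical:

```latex
\[
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
```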

JHutch
Accuracy by jefu · 2004-08-25 01:45 · Score: 3, Interesting

I have worked on the distributed proofing of a couple of texts and found that the accuracy of a page after the second proofing was often close to perfect.
One of the books I worked on was the "Anatomy of Melancholy" and I (conveniently) have a copy myself. There were often more differences between the scanned image of the page and my copy than between the scanned image and the proofread text.
Don't underestimate the amount of work people put into this, either: for "Anatomy of Melancholy" it often took 30 minutes to proof a single page because the page often had Latin and very small footnotes.