Distributed Proofreaders Posts 5,000th E-book


Posted by timothy on Tuesday August 24, 2004 @06:41PM from the error-checking-and-correcting dept.

bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, A Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers. Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in an OCR'ed text. The project has thousands of volunteers, of whom many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."

12 of 144 comments


Exxcelent Werk by andrewa · 2004-08-24 18:43 · Score: 5, Funny

I am prowd to bee won off thows prewf reeders

--
:(){ :|:& };:
1. Re:Exxcelent Werk by eingram · 2004-08-24 18:56 · Score: 5, Funny
  
  Well, I ran your comment through Word's spelling and grammar checker, took the first suggestions, and cleaned it up for you.
  I am prow to bee won off thaws prow readers.
  I say we get rid of the volunteers, Word does a great job!
I need a new job by jamoan · 2004-08-24 18:46 · Score: 5, Funny

Wear can I apply? i have excellent grammer skills.
1. Re:I need a new job by jonathan_ingram · 2004-08-24 20:07 · Score: 5, Interesting
  
  Luckily, you do not need either grammar or spelling skills -- just the ability to match text against a source image. Indeed, it may even be an *advantage* to not be a great linguist! One of the key things we emphasise is that we want an exact copy of the source material -- we do not want people 'correcting' or 'updating' the originals to bring them into line with the way the language is written today.
  
  --
  -- Help Digitise the Public Domain at DP.
Re:500 people read it? by wolfdvh · 2004-08-24 19:09 · Score: 5, Insightful

I like Gutenberg, I hope they start a system where you can download copyright books for a micropayment, I would pay good money for text ebooks.
Rather than setting up a complicated system to make micro-payments that only some people would follow anyway, do what I do: determine a fair value for yourself and make a donation. Not for one book, but estimate a year or two's worth so you don't 'nickel and dime' the value of your donation with transaction fees.
Re:A shame by MikeCapone · 2004-08-24 19:17 · Score: 5, Insightful

I just don't understand the point of retroactive copyright extensions. The idea behind copyrights, like patents, is to encourage innovation by allowing the creator an exclusive right for a limited time. If people believe copyright terms need to be extended to achieve this goal, fine. I disagree, but whatever. However, I think it's ludicrous that terms should be extended on works that have already been created, unless maybe they think that extending terms retroactively will lead to more works being produced in the past?

There's nothing to understand. Everything's about money now. Nobody cares about books, art or people. If you can make money - especially on the work of authors usually living near poverty - long after they are dead, then you are the winner of this big capitalistic orgy!

--
Treehugger? Treehugger... Treehugger!
Re:law of averages? by jonathan_ingram · 2004-08-24 19:53 · Score: 5, Informative

However, I am curious as to just how accurate the proofreading is.

The answer is: surprisingly accurate. We proof one page at a time, working from the original scanned images, and emphasise that people should try as hard as they can to stick to the source material. As counter-intuitive as it may appear, this type of proofreading is actually hardest to do with material from the late 18th/19th century -- subtle changes in spelling (and small changes in accent systems for the non-English languages) make errors much harder for human proofreaders to correct than the earlier material, where spelling consistency was completely optional!
Each page is OCRed (and the ability of modern OCR programs is a major improvement over that of programs from even a couple of years ago), proofread twice, and then the whole document is reviewed twice before being posted. We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.

--
-- Help Digitise the Public Domain at DP.
Re:Hm! by jonathan_ingram · 2004-08-24 20:02 · Score: 5, Informative

It's an interesting idea, but at the moment we're concentrating on providing proofreading services for Project Gutenberg. Every book which goes through the site has been scanned by one of our unpaid volunteers (except for those which have been, to use a slightly emotive term, 'raided' from sites that provide page images) -- and we already have enough books in our queue to keep us going for a year, even if we all stopped scanning immediately!
Also, we are very comfortable with being a provider of *public domain* material, and I think many members wouldn't feel comfortable moving into the copy-restricted domain.

--
-- Help Digitise the Public Domain at DP.
Re:What about 5001? by jonathan_ingram · 2004-08-24 20:13 · Score: 5, Funny

The next book won't yield a news item, but is no less important. You are very welcome to join us, and help us proof all the books which will also provoke no news items until text 10,000 comes along -- which you can also complain about :).

--
-- Help Digitise the Public Domain at DP.
Re:law of averages? by littlem · 2004-08-24 20:26 · Score: 5, Insightful

We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.

At the risk of going over very old and well-trodden ground, if PG wanted to be useful for "scholarly purposes" it should long ago have corrected the original mistake of using plain text, and used a markup that could have kept page numbers and other meta-information for scholars, while giving the common reader a clean text with a suitable style sheet. But even today the PG website carries a "justification" for sticking to plain text, which makes it clear that scholars don't even figure in the intended audience for PG texts.
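
To make the markup idea above concrete, here is a minimal sketch in Python. The page-break marker is an assumption, loosely modelled on TEI's `<pb n="..."/>` element, and is not a format Project Gutenberg or DP is known to use; the function names `reader_text` and `page_index` are hypothetical too. It only shows that one marked-up source can yield both a clean reading text and a page index for scholars.

```python
import re

# Hypothetical page-break marker, loosely modelled on TEI's <pb n="..."/>.
# This is an illustrative assumption, not an actual PG or DP convention.
PAGE_BREAK = re.compile(r'\s*<pb n="(\d+)"/>\s*')


def reader_text(marked_up: str) -> str:
    """Return the text with page markers removed, for ordinary reading."""
    return PAGE_BREAK.sub(" ", marked_up).strip()


def page_index(marked_up: str) -> dict[int, str]:
    """Map each page number to the text that follows its marker."""
    parts = PAGE_BREAK.split(marked_up)
    # With one capture group, split() returns:
    # [text before first marker, number, text, number, text, ...]
    return {int(num): text.strip() for num, text in zip(parts[1::2], parts[2::2])}


sample = '<pb n="1"/>It was a dark night. <pb n="2"/>The rain fell.'

print(reader_text(sample))
print(page_index(sample))
```

The sketch ignores harder cases such as words hyphenated across a page break, which a real scheme would have to handle.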
from the error-checking-and-correcting dept. by GothChip · 2004-08-24 21:45 · Score: 5, Funny

I didn't realise this department existed at Slashdot.
Request for MATH experts by jhutch2000 · 2004-08-25 00:52 · Score: 5, Interesting

Right now, we've got plenty of old math-intensive books ready to move through the DP system. Because of ASCII's terrible ability to handle equation formatting, we use TeX layout. The average DPer doesn't know TeX, and it has a rather steep learning curve to get started on. So, since Slashdot is full of self-professed geeks...all you TeX geeks should join up and help with the TeX-formatted MATH texts. I've got plenty of books scanned and ready to go, so don't think you'll run us out of 'em any time soon!

JHutch
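
As a small illustration of the formatting problem described above (the formula is a generic example, not taken from any book in the DP queue): in plain ASCII the quadratic formula has to be flattened into something like x = (-b +- sqrt(b^2 - 4ac)) / (2a), whereas TeX markup records the structure directly:

```latex
\[
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
```

The markup is still plain text, so it can be checked against the page image like any other line, but it typesets as a proper fraction with a radical.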