Distributed Proofreaders Posts 5,000th E-book
bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers.
Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in an OCR'ed text. The project has thousands of volunteers, of which many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."
The answer is: surprisingly accurate. We proof one page at a time, working from the original scanned images, and emphasise that people should try as hard as they can to stick to the source material. As counter-intuitive as it may appear, this type of proofreading is actually hardest to do with material from the late 18th/19th century -- subtle changes in spelling (and small changes in accent systems for the non-English languages) make errors much harder for human proofreaders to correct than the earlier material, where spelling consistency was completely optional!
Each page is OCRed (and the ability of modern OCR programs is a major improvement over those of even a couple of years ago), proofread twice, and then the whole document is reviewed twice before being posted. We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.
-- Help Digitise the Public Domain at DP.
Also, we are very comfortable with being a provider of *public domain* material, and I think many members wouldn't feel comfortable moving into the copy-restricted domain.
-- Help Digitise the Public Domain at DP.