Distributed Proofreaders Posts 5,000th E-book

Posted by timothy on Tuesday August 24, 2004 @06:41PM from the error-checking-and-correcting dept.

bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, A Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers. Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in OCR'ed text. The project has thousands of volunteers, of whom many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."

Exxcelent Werk by andrewa · 2004-08-24 18:43 · Score: 5, Funny

I am prowd to bee won off thows prewf reeders

--
:(){ :|:& };:
1. Re:Exxcelent Werk by eingram · 2004-08-24 18:56 · Score: 5, Funny
  
  Well, I ran your comment through Word's spelling and grammar checker, took the first suggestions, and cleaned it up for you.
  I am prow to bee won off thaws prow readers.
  I say we get rid of the volunteers, Word does a great job!
So.... by TheRedHorse · 2004-08-24 18:44 · Score: 4, Funny

....I guess the slashdot editors aren't members?
Wonderful by Chasuk · 2004-08-24 18:45 · Score: 4, Informative

As I get older, reading texts on-screen gets easier. My vision is still 20/20, but I now require reading glasses, which are generally out of reach when I need them. Project Gutenberg has come in as a real lifesaver (well, sanity-saver) now that I'm turning into a geezer. That, and the price is perfect!

--
Neopets - the best free game on the Int
Hm! by martingunnarsson · 2004-08-24 18:46 · Score: 4, Interesting

They should offer their services to authors and magazines, and raise some money from what they do. It wouldn't be enough to split between the involved proof readers I guess, but the project itself could get some money to buy...well, whatever they might need. Perhaps they already do this, I'm too lazy to find out :-)

--
Martin
1. Re:Hm! by jonathan_ingram · 2004-08-24 20:02 · Score: 5, Informative
  
  It's an interesting idea, but at the moment we're concentrating on providing proofreading services for Project Gutenberg. Every book which goes through the site has been scanned by one of our unpaid volunteers (except for those which have been, to use a slightly emotive term, 'raided' from sites that provide page images) -- and we already have enough books in our queue to keep us going for a year, even if we all stopped scanning immediately!
  Also, we are very comfortable with being a provider of *public domain* material, and I think many members wouldn't feel comfortable moving into the copy-restricted domain.
  
  --
  -- Help Digitise the Public Domain at DP.
I need a new job by jamoan · 2004-08-24 18:46 · Score: 5, Funny

Wear can I apply? i have excellent grammer skills.
1. Re:I need a new job by jonathan_ingram · 2004-08-24 20:07 · Score: 5, Interesting
  
  Luckily, you do not need either grammar or spelling skills -- just the ability to match text against a source image. Indeed, it may even be an *advantage* to not be a great linguist! One of the key things we emphasise is that we want an exact copy of the source material -- we do not want people 'correcting' or 'updating' the originals to bring them into line with the way the language is written today.
  
  --
  -- Help Digitise the Public Domain at DP.
Is it possible... by cujo_1111 · 2004-08-24 19:03 · Score: 4, Funny

...that a million net monkeys can fix the complete works of Shakespeare so that the language is spoken the correct way?

Instead of 'What light through yonder window breaks?' we get 'Who is that hot chick I can see through my binoculars?'

--
If I point out that you are incorrect, making me a foe does not make you any more correct.
Re:500 people read it? by wolfdvh · 2004-08-24 19:09 · Score: 5, Insightful

I like Gutenberg, I hope they start a system where you can download copyright books for a micropayment, I would pay good money for text ebooks.
Rather than setting up a complicated system to make micro-payments that only some people would follow anyway, do what I do: determine a fair value for yourself and make a donation. Not for one book, but estimate a year or two's worth so you don't 'nickel and dime' the value of your donation with transaction fees.
A shame by iamdrscience · 2004-08-24 19:10 · Score: 4, Insightful

I think it's really a shame that current copyright laws (and retroactive extensions) have limited Project Gutenberg to texts from a little after the turn of the century and before.

I just don't understand the point of retroactive copyright extensions. The idea behind copyrights, like patents, is to encourage innovation by allowing the creator an exclusive right for a limited time. If people believe copyright terms need to be extended to achieve this goal, fine. I disagree, but whatever. However, I think it's ludicrous that terms should be extended on works that have already been created, unless maybe they think that extending terms retroactively will lead to more works being produced in the past?
1. Re:A shame by MikeCapone · 2004-08-24 19:17 · Score: 5, Insightful
  
  I just don't understand the point of retroactive copyright extensions. The idea behind copyrights, like patents, is to encourage innovation by allowing the creator an exclusive right for a limited time. If people believe copyright terms need to be extended to achieve this goal, fine. I disagree, but whatever. However, I think it's ludicrous that terms should be extended on works that have already been created, unless maybe they think that extending terms retroactively will lead to more works being produced in the past?
  
  There's nothing to understand. Everything's about money now. Nobody cares about books, art or people. If you can make money - especially on the work of authors usually living near poverty - long after they are dead, then you are the winner of this big capitalistic orgy!
  
  --
  Treehugger? Treehugger... Treehugger!
Rsync your own Gutenberg library by gtoomey · 2004-08-24 19:47 · Score: 4, Informative

You can rsync your own copy of the Gutenberg library. I used the Aarnet mirror, as it's closest to me and fast.
Just be aware that the full Gutenberg collection is some 135GB, and much of it is GIF, JPG and MP3 files (spoken-word books). So I just used --include in rsync to download only the .txt, .htm and .html files. It's a more manageable 10GB download.
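
For anyone wanting to try the same thing, here is a rough sketch of how such a filtered rsync call might be put together. It only builds and prints the command; it does not run it. The mirror URL and the local directory are placeholders (assumptions, not the real mirror address), and the exact filter rules the poster used are not given, so the include/exclude pattern below is just one common way to do it.

```python
import shlex
from typing import List


def build_rsync_command(source: str, dest: str, extensions: List[str]) -> List[str]:
    """Build an rsync argument list that copies only files with the given extensions."""
    # -a: archive mode, -v: verbose, --prune-empty-dirs: skip directories left empty by the filters
    cmd = ["rsync", "-av", "--prune-empty-dirs"]
    # Include every directory so rsync can still recurse through the tree.
    cmd.append("--include=*/")
    # Include each wanted file type (.txt, .htm, .html in the post above).
    for ext in extensions:
        cmd.append(f"--include=*.{ext}")
    # Exclude everything that no earlier include rule matched.
    cmd.append("--exclude=*")
    cmd.extend([source, dest])
    return cmd


# Hypothetical placeholder: substitute the rsync address of a real mirror near you.
MIRROR = "rsync://mirror.example.org/gutenberg/"

command = build_rsync_command(MIRROR, "./gutenberg/", ["txt", "htm", "html"])
# shlex.join quotes the wildcard patterns so the line can be pasted into a shell.
print(shlex.join(command))
```

The order of the rules matters: rsync applies the first matching filter, so the includes have to come before the catch-all exclude.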
Re:law of averages? by jonathan_ingram · 2004-08-24 19:53 · Score: 5, Informative

However, I am curious as to just how accurate the proofreading is.

The answer is: surprisingly accurate. We proof one page at a time, working from the original scanned images, and emphasise that people should try as hard as they can to stick to the source material. As counter-intuitive as it may appear, this type of proofreading is actually hardest to do with material from the late 18th/19th century -- subtle changes in spelling (and small changes in accent systems for the non-English languages) make errors much harder for human proofreaders to correct than the earlier material, where spelling consistency was completely optional!
Each page is OCRed (and the ability of modern OCR programs is a major improvement over that of even a couple of years ago), proofread twice, and then the whole document is reviewed twice before being posted. We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.

--
-- Help Digitise the Public Domain at DP.
Make them renew each year by Anonymous Coward · 2004-08-24 19:57 · Score: 4, Insightful

It's so Disney can keep milking Mickey Mouse.

Here's what I want to see:

You get automatic copyright for 25 years. After that, you must pay $1 per year to keep something in copyright. If you can't be bothered to keep track of your stuff and pay the $1, it lapses into the public domain.

Disney will pay the $1 for Mickey ($1 for Steamboat Willie, $1 for each other cartoon, $1 for each book, etc.). But forgotten gems, like ancient Apple ][ games, will become legal public domain items.

I'd actually like to see a hard limit of 50 years or so for copyright, but even if you can't get that, at least the above scheme makes a lot of stuff lapse into the public domain.

A cool feature: if the legal trail is tangled and murky, and no one knows who owns it anymore, no one will pay the $1 and it will fall into public domain. Let's say LSD Software wrote a fun game for the Commodore 64. Then ABC Games bought the game from LSD (who kept the rights to use the music in future games). Then ABC Games went under, but its assets were bought by PDQ Games, which later split into PDQ Software and Foo Bar Games. After that it gets REALLY complicated... anyway, after all that, who exactly owns that fun game? No one knows. It would take a court case to decide, but no one will bother so no one will ever know. Under the current system, you are technically a pirate if you keep the game, but there is no one you can pay a license fee to in order to legally have the game! Catch-22.

Heck, Disney should want this. They make big bucks by Disney-ifying public domain stuff, so they should make sure things will actually go into the public domain in the future.
Re:good books? by jonathan_ingram · 2004-08-24 19:58 · Score: 4, Informative

There are many sites which have taken some of the more popular works from Project Gutenberg, and put a more user-friendly directory style front end to them. One of the best is Blackmask.com, which also contains works from non-Gutenberg free book providers. There are 312 works in the 'Science Fiction' section alone.

--
-- Help Digitise the Public Domain at DP.
Re:because by jonathan_ingram · 2004-08-24 20:09 · Score: 4, Interesting

because Playboy hasn't lapsed into the public domain yet...

Very true, although several of us do keep talking about searching for some Victorian Porn to put through the site :). There's actually quite a lot of public domain 'erotica' (anything written and published before 1923, for example) -- we just need people to scan it and contribute it to the site! We've had a couple of 'racy' books, and not surprisingly they tend to be proofed very quickly.

--
-- Help Digitise the Public Domain at DP.
Re:What about 5001? by jonathan_ingram · 2004-08-24 20:13 · Score: 5, Funny

The next book won't yield a news item, but is no less important. You are very welcome to join us, and help us proof all the books which will also provoke no news items until text 10,000 comes along -- which you can also complain about :).

--
-- Help Digitise the Public Domain at DP.
Re:law of averages? by littlem · 2004-08-24 20:26 · Score: 5, Insightful

We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.

At the risk of going over very old and well-trodden ground, if PG wanted to be useful for "scholarly purposes" it should long ago have corrected the original mistake of using plain text, and used a markup that could have kept page numbers and other meta-information for scholars, while giving the common reader a clean text with a suitable style sheet. But even today on the PG website is a "justification" for sticking to plain text making it clear that scholars don't even figure in the intended audience for PG texts.
Re:How strange by jonathan_ingram · 2004-08-24 20:48 · Score: 4, Informative

I'll let you in on a secret -- this isn't really our 5000th book! Some larger works are split into multiple projects, so while this is our 5000th *project*, it's around 10% off being our 5000th *book*. The text we chose for *this* 5000 was supposed to be appropriate for an internal celebration, rather than one which would be announced to the world -- it's a great example of the sort of text which would be very unlikely to get into PG if DP didn't exist, and it gives us useful biographical information to use in the 'blurb' for future projects. It's hard to stop people from submitting stories to Slashdot, though :).

--
-- Help Digitise the Public Domain at DP.
from the error-checking-and-correcting dept. by GothChip · 2004-08-24 21:45 · Score: 5, Funny

I didn't realise this department existed at Slashdot.
Request for MATH experts by jhutch2000 · 2004-08-25 00:52 · Score: 5, Interesting

Right now, we've got plenty of old math-intensive books ready to move through the DP system. Because of ASCII's terrible ability to handle equation formatting, we use TeX layout. The average DPer doesn't know TeX, and it has a rather steep learning curve to get started on. So, since Slashdot is full of self-professed geeks... all you TeX geeks should join up and help with the TeX-formatted MATH texts. I've got plenty of books scanned and ready to go, so don't think you'll run us out of 'em any time soon!
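
To give a feel for the difference, take the quadratic formula (an arbitrary example, not from any particular DP project, and not necessarily in DP's own formatting conventions). In plain ASCII you end up with something like x = (-b +- sqrt(b^2 - 4ac)) / (2a), whereas TeX markup along these lines keeps the real structure of the equation:

```latex
% The quadratic formula, marked up so the fraction, root and plus-minus sign survive.
\[
  x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]
```

Multiply that by every equation on every page of a textbook and you can see why proofers who already read TeX are so useful.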

JHutch