Distributed Proofreaders Posts 5,000th E-book
bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers.
Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in an OCR'ed text. The project has thousands of volunteers, of which many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."
They should offer their services to authors and magazines, and raise some money from what they do. It wouldn't be enough to split between the involved proof readers I guess, but the project itself could get some money to buy...well, whatever they might need. Perhaps they already do this, I'm too lazy to find out :-)
Martin
The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers.
Hardly unputdownable... I suppose the fact that it is a biography (shouldn't that be bibliography? *chuckle*) of English literature is kinda symbolic.
I guess this more than doubles the total number of people who have read this book though!
I like Gutenberg. I hope they start a system where you can download copyrighted books for a micropayment; I would pay good money for text ebooks.
Let's hope ebooks don't go the way of music: keep the costs low, with no DRM fluffing up the download. If you can click three times and start reading a new book, and it costs you euros, then you would prefer that to d/l'ing gigs of warez.
Anyone who illegally downloads lots of books tends to be the person who doesn't read them much anyway. (Someone boasted to me that they had 300 O'Reilly books, squirming under the desire to tell me that they were eBooks, off IRC. Oh lawks, what a riot, I wish I was your friend, go away.)
#hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
Still, I look forward to the day when someone starts digitizing the Mechanics Institute Library in San Francisco. It's a beautiful private library one can join. The books are in excellent condition, and there are century-old original editions on the shelves.
But it's the magazine collection that's stunning. They have Popular Mechanics in bound volumes, all the way back to the beginning, when it was a serious scientific journal. All the major railroad magazines from the heyday of railroading. Every issue of Electric Railway Journal (the trade magazine of streetcars). Few other libraries kept that stuff.
All in all, I have to say that I think this project is better than nothing at all. I am sure that the proofreading is better than what was there before.
However, I am curious as to just how accurate the proofreading is. I think that they try to improve accuracy by having many different volunteers; accuracy in numbers and all that. However, just because many people think in a certain way does not mean that what they think is accurate. Just look at standardized tests. They are specifically designed to make use of common mistakes, so that the majority (the swell of the bell curve) all get the wrong answer together. Only a slim minority will get all the questions correct. Considering how many people (even educated people) score around average on even the verbal and English sections of tests such as the SAT and GRE, I wonder if certain passages in books will be incorrectly edited on a mass scale. This would especially be true for older or more complex works.
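To put rough numbers on the "accuracy in numbers" worry, here is a minimal sketch. It assumes a simple majority vote among independent readers, which is only an illustrative model and not a description of how DP actually works; the function name `majority_correct` and the probabilities used are hypothetical.

```python
from math import comb

def majority_correct(p: float, n: int) -> float:
    """Probability that a strict majority of n independent readers is right,
    when each reader is right with probability p (assumed model; use odd n
    so that ties cannot occur)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

if __name__ == "__main__":
    # Compare an "easy" passage (p above 0.5) with a "trap" passage (p below 0.5).
    for p in (0.9, 0.4):
        for n in (1, 3, 5):
            print(f"p={p}, n={n}: {majority_correct(p, n):.4f}")
```

Under this model, more readers help when each one is more likely right than wrong, and hurt when a mistake is the common answer, which is the standardized-test scenario described above.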
http://www.icarusindie.com/Literature/Library/
That site has a couple of good ones. You should read "The Lost Continent" first. The book was written shortly after, or during, WWI and follows a hypothetical development of the world if the New World and the Old World had lost communication until 200 years later. The most interesting thing about those old science fiction books is to contrast their world view with ours and to see what futuristic devices they expected to exist by now.
Cheers,
Adolfo
Luckily, you do not need either grammar or spelling skills -- just the ability to match text against a source image. Indeed, it may even be an *advantage* to not be a great linguist! One of the key things we emphasise is that we want an exact copy of the source material -- we do not want people 'correcting' or 'updating' the originals to bring them into line with the way the language is written today.
-- Help Digitise the Public Domain at DP.
Lawrence Lessig proposes a similar scheme in "The Future of Ideas". I doubt he was the first, but that's just what you made me think of. It's a good book, even though it can get kind of dry at times (it is, at least in some capacity, a book about law after all).
As far as your scheme though, I would really like a hard extension limit and I think 25 years for a default term is really too much (I mean, to use your example of Apple II games, many of those games wouldn't even quite be out of term yet). I think 5 or 10 would be much better.
Very true, although several of us do keep talking about searching for some Victorian porn to put through the site.
-- Help Digitise the Public Domain at DP.
``You get automatic copyright for 25 years. After that, you must pay $1 per year to keep something in copyright. If you can't be bothered to keep track of your stuff and pay the $1, it lapses into the public domain.''
;-)
I would even go a bit further. Why even have a default term at all? (And 25 years is a LONG time.) And $1 is arguably a bit low. If you really care, you can pay a bit more. Maybe we can even have different levels of protection: pay nothing if you allow modifications, pay more to retain exclusive rights to distribution, etc.
I think this is an interesting idea worth investigating. Thank you for publishing it!
Oh, and BTW, I will be using your idea as if it were mine, unless you pay your $1, of course.
Please correct me if I got my facts wrong.
I think Project Gutenberg is a terrific idea!
My only complaint is with the formatting. Project Gutenberg uses hard formatting within the text. I think that's an extremely stupid idea.
There should be zero formatting within the text (other than paragraph breaks). Whatever client you're using should provide the formatting for you.
Let the client handle the presentation!!
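As a rough illustration of what "let the client handle it" could look like, here is a minimal sketch that strips hard line breaks and re-wraps each paragraph to whatever width the reader wants. It assumes paragraphs are separated by blank lines; the function name `reflow` is hypothetical, and this is not how any particular ebook reader does it.

```python
import textwrap

def reflow(hard_wrapped: str, width: int = 72) -> str:
    """Join hard-wrapped lines into paragraphs, then wrap each to `width`.

    Assumes a blank line marks a paragraph break.
    """
    paragraphs = []
    current = []
    for line in hard_wrapped.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs)

if __name__ == "__main__":
    sample = (
        "It was the best\nof times, it was\nthe worst of times.\n"
        "\n"
        "Second paragraph\nhere."
    )
    print(reflow(sample, width=40))
```

Note that a naive join like this also flattens anything where the line breaks carry meaning, such as poetry or tables, so a real client would need some way to tell those apart.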
This is all my fault! :-(
I got a bit carried away. This 5000th project was organized so that as many proofreaders as possible would work on it. (Although any book going through DP runs a chance of being proofread by many separate people, usually proofreaders stick with a certain book for a while, so that the work has only been seen by 50 or so.) I was so glad we pulled it off that I sent a story to Slashdot without thinking.
Right now, we've got plenty of old math-intensive books ready to move through the DP system. Because of ASCII's terrible ability to handle equation formatting, we use TeX layout. The average DPer doesn't know TeX, and it has a rather steep learning curve to get started on. So, since Slashdot is full of self-professed geeks...all you TeX geeks should join up and help with the TeX-formatted MATH texts. I've got plenty of books scanned and ready to go, so don't think you'll run us out of 'em any time soon!
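For anyone who hasn't seen TeX markup, here is a small illustration of the difference. The quadratic formula is used only because it is familiar; it is not taken from any DP project, and the exact markup conventions DP uses may differ.

```latex
% ASCII approximation: x = (-b +/- sqrt(b^2 - 4ac)) / (2a)
\[
  x = \frac{-b \pm \sqrt{b^{2} - 4ac}}{2a}
\]
```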
JHutch
One of the books I worked on was the "Anatomy of Melancholy" and I (conveniently) have a copy myself. There were often more differences between the scanned image of the page and my copy than between the scanned image and the proofread text.
Don't underestimate the amount of work people put into this, too: for "Anatomy of Melancholy" it often took 30 minutes to proof a single page, because the page often had Latin and very small footnotes.
Yeah! I'm one of the "several" that Jon's referring to. I got a real kick out of a recent book that was posted by us to PG...
Sane Sex Life and Sane Sex Living
For a turn-of-the-century study of sex (published 1919), this guy was amazingly (IMHO) progressive! A very fun read!
JHutch
Some of those are legit too. Professional/Professor reading gets shortened to profreading. The other mistakes are mostly users.
-Dizzle
"I most likely AM so interested in myself."