Slashdot Mirror


Distributed Proofreaders Posts 5,000th E-book

bbc writes "Distributed Proofreaders has posted its 5,000th ebook to Project Gutenberg. The book, A Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers. Distributed Proofreaders is a project that distributes the otherwise gargantuan task of correcting scanning and recognition errors in an OCR'ed text. The project has thousands of volunteers, of whom many hundreds are active on any given day. It is currently the main supplier of etexts for Project Gutenberg."

144 comments

  1. Exxcelent Werk by andrewa · · Score: 5, Funny

    I am prowd to bee won off thows prewf reeders

    --
    :(){ :|:& };:
    1. Re:Exxcelent Werk by eingram · · Score: 5, Funny
      Well, I ran your comment through Word's spelling and grammar checker, took the first suggestions, and cleaned it up for you.
      I am prow to bee won off thaws prow readers.
      I say we get rid of the volunteers, Word does a great job!
    2. Re:Exxcelent Werk by craigmarshall · · Score: 1

      I think ewe've spelled "Excellent" wrong, idiut.

  2. So.... by TheRedHorse · · Score: 4, Funny

    ....I guess the slashdot editors aren't members?

  3. Wonderful by Chasuk · · Score: 4, Informative

    As I get older, reading texts on-screen gets easier. My vision is still 20/20, but I now require reading glasses, which are generally out of reach when I need them. Project Gutenberg has come in as a real lifesaver (well, sanity-saver) now that I'm turning into a geezer. That, and the price is perfect!

    1. Re:Wonderful by Anonymous Coward · · Score: 3, Informative

      The price may be right, but donating is good too.

  4. Hm! by martingunnarsson · · Score: 4, Interesting

    They should offer their services to authors and magazines, and raise some money from what they do. It wouldn't be enough to split between the involved proof readers I guess, but the project itself could get some money to buy...well, whatever they might need. Perhaps they already do this, I'm too lazy to find out :-)

    --
    Martin
    1. Re:Hm! by FlipmodePlaya · · Score: 2, Interesting

      Sounds like a cool idea, and I'm not sure if they've done this either. I know that if I were sending a magazine out to a ~million readers, I would place great stock in my editing. The Distributed Proofreaders project probably wouldn't want to be held liable for the mistakes of volunteers, especially with the possibility of trolls.

    2. Re:Hm! by jonathan_ingram · · Score: 5, Informative
      It's an interesting idea, but at the moment we're concentrating on providing proofreading services for Project Gutenberg. Every book which goes through the site has been scanned by one of our unpaid volunteers (except for those which have been, to use a slightly emotive term, 'raided' from sites that provide page images) -- and we already have enough books in our queue to keep us going for a year, even if we all stopped scanning immediately!

      Also, we are very comfortable with being a provider of *public domain* material, and I think many members wouldn't feel comfortable moving into the copy-restricted domain.

    3. Re:Hm! by Anonymous Coward · · Score: 1, Insightful

      Offer their services? It's 99% volunteer work. Why would someone volunteer to proofread some magazine? Gutenberg works because the books that it generates are for non-commercial/academic use - that's why volunteers feel they're doing something good when they're contributing.

    4. Re:Hm! by Anonymous Coward · · Score: 0

      Why would someone volunteer to proofread some magazine?

      Oh, honey, come and see -- we've proofread hundreds of thousands of pages of magazines -- just not current ones.

    5. Re:Hm! by baegucb_18706 · · Score: 2, Interesting

      Australia has somewhat more favorable copyright laws. Take a look at http://gutenberg.net.au/, which has some texts you can't download in the USA *wink*

    6. Re:Hm! by jonathan_ingram · · Score: 3, Informative

      Yes, Australia is currently 'Life+50', which means that a work becomes copyright free 50 years after the death of the author (sadly, this will be changing to 'Life+70' soon). I live in the EU, which is 'Life+70'. There's a significant amount of material which is copyright free in the EU and Australia, but still copy restricted in the USA -- basically, anything published after 1922 by an author who died before 1934. We recently started a 'DPEU' to focus on these works. At the moment the focus is on Eastern European languages, but there's a wide variety of content (including some English material).
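
      The 'Life+N' arithmetic above can be sketched in a few lines. This is only an illustration, not legal advice: the function names are made up for the example, the US rule is reduced to the "published before 1923" shorthand used in this thread, and terms are assumed to run to the end of the calendar year in which they expire.

```python
def public_domain_year(death_year, term_years):
    """First calendar year in which a work is free under a 'Life+N' rule.

    Assumption: the term runs to the end of the calendar year, so the
    work becomes free on 1 January of death_year + term_years + 1.
    """
    return death_year + term_years + 1


def free_under_life_plus(death_year, term_years, current_year):
    """True if the author's works are free under 'Life+N' in current_year."""
    return current_year >= public_domain_year(death_year, term_years)


def free_in_us(publication_year):
    """Simplified US shorthand from this thread: published before 1923."""
    return publication_year < 1923


def free_abroad_only(publication_year, death_year, term_years, current_year):
    """Free under 'Life+N' but still copy restricted under the US shorthand."""
    return (not free_in_us(publication_year)
            and free_under_life_plus(death_year, term_years, current_year))
```

      Taking 2004 as the current year (an assumption, chosen because it matches the "died before 1934" cutoff for Life+70), free_abroad_only(1925, 1930, 70, 2004) is True, while an author who died in 1934 would not qualify until 2005.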

    7. Re:Hm! by Anonymous Coward · · Score: 0

      It would be really cool if they'd start doing census images. No more paying $100+/year to various genealogy sites, none of which are complete.

      There are various places trying to do some census work, but usually only for their own county or some such.

    8. Re:Hm! by Anonymous Coward · · Score: 0

      I think it's time to take advantage of countries like Canada, ZA, NZ, China, India, Indonesia, Malaysia, Argentina, etc. who are still Berne-term countries (life+50).

      DP-50 anyone?

  5. I need a new job by jamoan · · Score: 5, Funny

    Wear can I apply? i have excellent grammer skills.

    1. Re:I need a new job by Anonymous Coward · · Score: 1, Funny

      You make with the grammar, and the orthographic things, do I. Good concept, it is.

    2. Re:I need a new job by jonathan_ingram · · Score: 5, Interesting

      Luckily, you do not need either grammar or spelling skills -- just the ability to match text against a source image. Indeed, it may even be an *advantage* to not be a great linguist! One of the key things we emphasise is that we want an exact copy of the source material -- we do not want people 'correcting' or 'updating' the originals to bring them into line with the way the language is written today.

  6. Slow down! by tod_miller · · Score: 3, Funny

    I am still on the 4986th book, this one isn't that good, but I have to finish it, oh, page 34, line 7 there is a mistake in the 4th word, I think you know it, yes.

    Other than this I just found, the other 4985 are AOK so far.

    Good work guys. Free the books. ook.

    (re-reading Sourcery on the commute today... ook oook)

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    1. Re:Slow down! by Anonymous Coward · · Score: 0

      Sourcery? That's the one with the ape in it right?

  7. 5052 by squidinkcalligraphy · · Score: 1, Informative

    And since then they seem to have proofed another 52 books - that's not a bad rate considering...

    --
    "I think it would be a good idea" Gandhi, on Western Civilisation
  8. What about 5001? by after · · Score: 0, Troll

    Why must Slashdot get all excited when a number like 5000 pops up? I don't understand why everyone is so excited about numbers. I took my 500th shit this month, you don't hear me calling the press do you?

    What about the 5001st book? Will that also yield a news item?

    1. Re:What about 5001? by xsupergr0verx · · Score: 0, Offtopic

      The other day I was the 87298734th visiter to a website. I won a trip to Hawaii.

      The fireworks told me. I put in my email, but I haven't gotten any tickets yet. Mostly just offers to make my penis bigger...

      How did they ever know??

      --

      Click here for a free picture of an iPod!
    2. Re:What about 5001? by xsupergr0verx · · Score: 1, Funny

      This being the 24th day, he drops about 21 ish bombs a day.

      That sir, is not constipation. That is uncontrollable demon bowels.

      --

      Click here for a free picture of an iPod!
    3. Re:What about 5001? by aussie_a · · Score: 2, Funny

      I took my 500th shit this month, you don't hear me calling the press do you?

      I'm assuming your signature link is related to this, so yes, you could say you did call the press.

    4. Re:What about 5001? by dragonp12 · · Score: 0, Offtopic

      Your 500th shit this month? Have you got some kind of medical problem?

      --
      This is me. Don't like it? That's unlucky.
    5. Re:What about 5001? by jonathan_ingram · · Score: 5, Funny

      The next book won't yield a news item, but is no less important. You are very welcome to join us, and help us proof all the books which will also provoke no news items until text 10,000 comes along -- which you can also complain about :).

    6. Re:What about 5001? by Anonymous Coward · · Score: 0

      Your 500th shit? Are you like 2 years old?

  9. because by Anubis350 · · Score: 1

    because Playboy hasn't lapsed into the public domain yet...

    wait a few (read: lots of) years and you'll be seeing 'em tossed up there, editors dutifully rendering pictures into ASCII, etc.

    --
    "goodbye and hello, as always" ~Prince Corwin, from Zelazny's Amber series
    1. Re:because by jonathan_ingram · · Score: 4, Interesting
      because Playboy hasn't lapsed into the public domain yet...


      Very true, although several of us do keep talking about searching for some Victorian Porn to put through the site :). There's actually quite a lot of public domain 'erotica' (anything written and published before 1923, for example) -- we just need people to scan it and contribute it to the site! We've had a couple of 'racy' books, and not surprisingly they tend to be proofed very quickly.

    2. Re:because by jhutch2000 · · Score: 2, Interesting

      Yeah! I'm one of the "several" that Jon's referring to. I got a real kick out of a recent book that was posted by us to PG...

      Sane Sex Life and Sane Sex Living

      For a turn of the century study of sex (published 1919), this guy was amazingly (IMHO) progressive! A very fun read! JHutch
  10. 500 people read it? by tod_miller · · Score: 3, Interesting

    The book, a Short Biographical Dictionary of English Literature, by John W. Cousin, was proofed for this special occasion by over 500 volunteers.

    Hardly a non-put-downable... I suppose the fact that it is a Biography (shouldn't that be bibliography? *chuckle*) of English literature is kinda symbolic.

    I guess this more than doubles the total number of people who have read this book though!

    I like Gutenberg, I hope they start a system where you can download copyright books for a micropayment, I would pay good money for text ebooks.

    Let's hope ebooks don't go the way of music: keep the costs low, no DRM fluffing up the download. If you can click 3 times and start reading a new book, and it costs you euros, then you would prefer that to d/l'ing gigs of warez.

    Anyone who illegally downloads lots of books tends to be the person who doesn't read them much anyway (Someone boasted to me that they had 300 O'Reilly books, squirming under the desire to tell me that they were eBooks, off irc, oh lawks, what a riot, I wish I was your friend, go away)

    --
    #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    1. Re:500 people read it? by wolfdvh · · Score: 5, Insightful
      I like Gutenberg, I hope they start a system where you can download copyright books for a micropayment, I would pay good money for text ebooks.

      Rather than setting up a complicated system to make micro-payments that only some people would follow anyway, do what I do: determine a fair value for yourself and make a donation. Not for one book, but estimate a year or two's worth so you don't 'nickel and dime' the value of your donation with transaction fees.

    2. Re:500 people read it? by Anonymous Coward · · Score: 0

      Anyone who illegally downloads lots of books tends to be the person who doesn't read them much anyway (Someone boasted to me that they had 300 O'Reilly books, squirming under the desire to tell me that they were eBooks, off irc, oh lawks, what a riot, I wish I was your friend, go away)

      Would you rather be the friend of somebody who downloaded 300 O'Reilly books, or somebody who has read 300 O'Reilly books?

    3. Re:500 people read it? by tod_miller · · Score: 2, Interesting

      Would that 'honesty book shop' appease Authors? I meant buy new releases and older copyrighted works (even out of print copyrighted that Gutenberg won't touch) If I want to re-read some Orwell, Asimov, Steinbeck or others, where do I go? (pah, library...)

      Who cares what publishers think, they are wondering how they can be a middle man in a digital age. We will start with good bi-format books, all available in eBook, all 100% well formatted. Then some will move more over into eBooks.

      Then every internet whore will inflict their putrid poetry onto the world. Tum tee tum.

      --
      #hostfile 0.0.0.0 primidi.com 0.0.0.0 www.primidi.com 0.0.0.0 radio.weblogs.com
    4. Re:500 people read it? by quisph · · Score: 1
      Rather than setting up a complicated system to make micro-payments that only some people would follow anyway, do what I do: determine a fair value for yourself and make a donation.
      Making a donation to Project Gutenberg is all well and good, but it will not get you any copyrighted ebooks, which is what the grandparent was talking about.

      It's all moot, though. The suggestion doesn't make much sense either way. Even if PG were interested in distributing copyrighted works, the copyright owners would probably demand quite a bit more than a few pennies per download.

    5. Re:500 people read it? by Mannerism · · Score: 1

      Not all of it; just a few pages each. Thank God this one went out of copyright before Congress extended it, or it might not have become such a hit.

  11. Is it possible... by cujo_1111 · · Score: 4, Funny

    ...that a million net monkeys can fix the complete works of Shakespeare so that the language is spoken the correct way?

    Instead of 'What light through yonder window breaks?' we get 'Who is that hot chick I can see through my binoculars?'

    --
    If I point out that you are incorrect, making me a foe does not make you any more correct.
  12. How strange by iamdrscience · · Score: 1

    I would have thought that if they were going to hold this up as a landmark that they would have picked a more notable book. I mean, I skimmed through that book and it seemed to have a lot of information and everything, but it's hardly something you would ever read all the way through.

    1. Re:How strange by jonathan_ingram · · Score: 4, Informative

      I'll let you in on a secret -- this isn't really our 5000th book! Some larger works are split into multiple projects, so while this is our 5000th *project*, it's around 10% off being our 5000th *book*. The text we chose for *this* 5000 was supposed to be appropriate for an internal celebration, rather than one which would be announced to the world -- it's a great example of the sort of text which would be very unlikely to get into PG if DP didn't exist, and it gives us useful biographical information to use in the 'blurb' for future projects. It's hard to stop people from submitting stories to Slashdot, though :).

    2. Re:How strange by bbc · · Score: 2, Interesting

      This is all my fault! :-(

      I got a bit carried away. This 5000th project was organized so that as many proofreaders as possible would work on it. (Although any book going through DP runs a chance of being proofread by many separate people, usually proofreaders stick with a certain book for a while, so that the work has only been seen by 50 or so.) I was so glad we pulled it off that I sent a story to Slashdot without thinking.

    3. Re:How strange by jhutch2000 · · Score: 1

      It's ok. We still love you, man!

      JHutch

    4. Re:How strange by SnarfQuest · · Score: 1

      It's ok. We still love you, man!

      But you have to bring the beer for the real thing.

      --
      Who would win this election: Andrew Weiner vs Andrew Weiner's weiner.
  13. Re:lol fp by Anonymous Coward · · Score: 0
  14. A shame by iamdrscience · · Score: 4, Insightful

    I think it's really a shame that current copyright laws (and retroactive extensions) have limited Project Gutenberg to texts from a little after the turn of the century and before.

    I just don't understand the point of retroactive copyright extensions. The idea behind copyrights, like patents, is to encourage innovation by allowing the creator an exclusive right for a limited time. If people believe copyright terms need to be extended to achieve this goal, fine. I disagree, but whatever. However, I think it's ludicrous that terms should be extended on works that have already been created, unless maybe they think that extending terms retroactively will lead to more works being produced in the past?

    1. Re:A shame by MikeCapone · · Score: 5, Insightful

      I just don't understand the point of retroactive copyright extensions. The idea behind copyrights, like patents, is to encourage innovation by allowing the creator an exclusive right for a limited time. If people believe copyright terms need to be extended to achieve this goal, fine. I disagree, but whatever. However, I think it's ludicrous that terms should be extended on works that have already been created, unless maybe they think that extending terms retroactively will lead to more works being produced in the past?

      There's nothing to understand. Everything's about money now. Nobody cares about books, art or people. If you can make money - especially on the work of authors usually living near poverty - long after they are dead, then you are the winner of this big capitalistic orgy!

    2. Re:A shame by 16K+Ram+Pack · · Score: 2, Insightful
      The change has of course happened because of the industrialisation of reproduction. At one time, if you wanted to hear music, you went to a show or bought the sheet music. Performance was expensive and does not scale up.

      You could fill a music hall with people and pay the performers. You want to open another music hall? You need another set of performers.

      Recorded music meant that each copy scaled the initial costs down. Over time this has become even more exaggerated, though. At one time, record production and promotion were quite amateurish, which meant that records cost very little to make, but quite a lot in terms of the pressing/sleeve production.

      Now, the situation is that CDs cost very little to record and manufacture but the music costs a huge amount to produce in terms of promotion/PR/grooming etc. The cost of CD number 1 is huge but by the time you reach CD number 2 million, it costs very little.

      This means that the people involved are not small entrepreneurs of the Fred Karno kind, but major corporations with everything that comes with it.

    3. Re:A shame by ceeam · · Score: 1

      But don't be afraid! As the rules of evolution and the free market state, these kinds of civilizations are bound to destroy themselves, clearing the way for healthier ones!
      PS: Just don't take the world with it. :(

    4. Re:A shame by Anonymous Coward · · Score: 0
      If you can make money - especially on the work of authors usually living near poverty - long after they are dead, then you are the winner of this big capitalistic orgy!

      I think you are confusing the word "capitalistic" with "greedy". The desire to use any means available to pad your purse is not a capitalistic characteristic but rather a sign of the sad sinful state of mankind.

      This is true regardless of the system of government, regardless of economic theory. Cain tried to keep the best for himself and we've been going downhill ever since.

      The chief problem is not the system, whether it be capitalistic, communistic, or what not, but the people in the system.

      If one's cosmology consists of "Look out for ME" then any philosophy adopted will simply be a tool used for looking out for ME.

    5. Re:A shame by jdavidb · · Score: 1

      Moreover, I submit that it is an unconstitutional ex post facto law. There is a reason the Constitution prohibited retroactive laws, but we seem to be ignoring that principle today.

    6. Re:A shame by MikeCapone · · Score: 1

      Yes, but capitalism *encourages* that sort of behavior. Corporations are not accountable to anybody or any principle except their shareholders, and if someone runs the corporation in an ethical and common sense way but makes less money than he could have by screwing everybody, he'll be punished and replaced by the shareholders (a bunch of people who don't feel responsible for whatever happens, they just want a good return on their investment).

      A nice explanation of how this all works out can be found in the documentary The Corporation.

  15. Who picks this stuff? by Animats · · Score: 3, Interesting
    "Final Report of the Louisiana Purchase Exposition Commission"?

    Still, I look forward to the day when someone starts digitizing the Mechanics Institute Library in San Francisco. It's a beautiful private library one can join. The books are in excellent condition, and there are century old original editions on the shelves.

    But it's the magazine collection that's stunning. They have Popular Mechanics in bound volumes, all the way back to the beginning, when it was a serious scientific journal. All the major railroad magazines from the heyday of railroading. Every issue of Electric Railway Journal (the trade magazine of streetcars). Few other libraries kept that stuff.

    1. Re:Who picks this stuff? by Fallen+Andy · · Score: 1

      Sigh. Nothing can really capture the reality. Once, when I was much, much younger, a friend gave me some "Boy's Own Papers" ca. 1912. I still lust after some of the gadgets in the adverts. Whoa. Seriously interesting stuff.

      If you want to see geek heaven go look through the adverts...

      I keep going to look in the hope that someone will put Olaf Stapledon or EE "Doc" Smith up but alas rights are a real bitch...

      (1950's SciAm are pretty cool too - stuff about electroluminescence and (cough) computers).

    2. Re:Who picks this stuff? by jonathan_ingram · · Score: 3, Informative

      If they were published before 1923, then they're public domain, and we'd love to have them in PG! All you need is a scanner, and some spare time :).

      Until the middle of last year, we focused almost exclusively on books. Since then, we've been putting some very interesting periodicals through the site (Punch, The Strand Magazine, Scientific American, Notes & Queries, to name but a few). Magazines aimed specifically at boys (or, indeed, girls) would be a great addition to the pile!

    3. Re:Who picks this stuff? by jhutch2000 · · Score: 1

      As with all volunteer efforts, if you want to see it done, do it yourself! :) In the example library you gave, it is probably just that no volunteer currently providing scans is a member of that library. If you have the time and inclination, grab some of those wonderful sounding periodicals and a scanner, and get to work. Head over to the DP forums and we'll all be glad to help you get started. JHutch

  16. good books? by null-sRc · · Score: 1



    does anyone have suggestions for fiction titles on Gutenberg?

    i need a good read, but i don't want to pay or find something good myself.

    --
    -judging another only defines yourself
    1. Re:good books? by mrchaotica · · Score: 2, Informative

      Sherlock Holmes mysteries, old sci-fi (Jules Verne, H.G. Wells, etc), Edgar Allan Poe's short stories... there's lots of good stuff.

      --

      "[Regarding the 'cloud,'] ownership was what made America different than Russia." -- Woz

    2. Re:good books? by adolfojp · · Score: 3, Interesting

      http://www.icarusindie.com/Literature/Library/

      That site has a couple of good ones. You should first read "The Lost Continent". The book was written shortly after, or during, WWI and follows a hypothetical development of the world if the new world and the old world had lost communication until 200 years later. The most interesting thing about those old science fiction books is to contrast their world view with ours and to see what futuristic devices would exist by now.


      Cheers,

      Adolfo

    3. Re:good books? by jonathan_ingram · · Score: 4, Informative

      There are many sites which have taken some of the more popular works from Project Gutenberg, and put a more user-friendly directory style front end to them. One of the best is Blackmask.com, which also contains works from non-Gutenberg free book providers. There are 312 works in the 'Science Fiction' section alone.

    4. Re:good books? by Anonymous Coward · · Score: 0

      wow, i read the whole thing in one sitting and all i can say is i feel i have wasted 2 hours of my life, thanks for nothing, if i want to read ignorant trash i shall keep to the posts here in the future.

    5. Re:good books? by dwhitman · · Score: 1

      I just finished the PG edition of Joyce's Ulysses. It's a difficult but rewarding read. I loved having it available as an etext, but I have to say, the proofreading was marginal at best. The entire last chapter had no punctuation marks at all!

      Seriously, the text was in pretty bad shape, with lots of common OCR errors: 1 = I, 5 = S, b = h, etc., chapter titles missing, etc. Does DP take on new versions of existing PG books? I'd volunteer to try and do a better job on Ulysses.
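
      Digit-for-letter confusions like 1 = I and 5 = S are the easy ones to hunt mechanically. A minimal sketch, assuming nothing about the tools PG or DP actually use: flag every token that mixes letters and digits. The name suspicious_tokens is made up for the example, and a swap like b = h cannot be caught this way, since the result is still all letters.

```python
import re

# A token is suspicious if it contains at least one ASCII letter
# and at least one digit, e.g. "5aid" or "1t".
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def suspicious_tokens(text):
    """Return tokens that mix letters and digits, in order of appearance."""
    return MIXED_TOKEN.findall(text)


if __name__ == "__main__":
    sample = "He 5aid that 1t was the year 1904."
    print(suspicious_tokens(sample))  # ['5aid', '1t']
```

      Legitimate tokens such as "2nd" or "19th" get flagged too, so the output is a list for a human to check against the page image, not a list of automatic corrections.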

    6. Re:good books? by bbc · · Score: 1

      "Does DP take on new versions of existing PG books?"

      Yes, but I don't know if there are any conditions attached.

      It would be better, next time, to keep notes of all the errors you encounter and send them to Project Gutenberg, where volunteers will use them to correct the book.

      The Project Gutenberg FAQ tells you what to send where, and how.

    7. Re:good books? by bbc · · Score: 1

      What sort of stuff do you like? Scary? Romantic? Adventurous? Scientific? Weird?

      Some of the famous literature that is in the public domain: Jules Verne, Sherlock Holmes, Frankenstein, War of the Worlds, Wuthering Heights, the Bible, anything Shakespeare, Aesop's fables, Mother Goose, Alice in Wonderland, Wizard of Oz, Ulysses (both Homer's and Joyce's versions), The Picture of Dorian Gray, Heart of Darkness, Treasure Island, The Jungle Books, et cetera, et cetera.

    8. Re:good books? by jonathan_ingram · · Score: 3, Insightful
      Does DP take on new versions of existing PG books?


      Yes, we do -- although as I mention in an earlier post, we have a year's worth of material as it is, without going back and re-doing the older material already in PG. However, as you say, some of PG's content is below the standards we expect of newly produced text. Hopefully we can go back and correct *all* PG's content over time. The main factor stopping us is that we need page scans of any project before it can go through DP. If you know of any page images of a clearable edition of Ulysses, or indeed if you have a clearable edition which you are willing to scan, then we would gladly put it through the site.
    9. Re:good books? by JohnFluxx · · Score: 1

      I've just finished listening to H.G. Wells's The Invisible Man.

      I download a load of texts, put them on my iPAQ, then use flite to do text->speech and read them out.

      Btw, has anyone thought of marking up any of the books so they can be read better by something like festival? (emotions, sex of character etc)

  17. law of averages? by Anonymous Coward · · Score: 2, Interesting

    All in all, I have to say that I think this project is better than nothing at all. I am sure that the proofreading is better than what was there before.

    However, I am curious as to just how accurate the proofreading is. I think that they try to improve accuracy by having many different volunteers; accuracy in numbers and all that. However, just because many people think in a certain way, does not mean that what they think is accurate. Just look at standardized tests. They are specifically designed to make use of common mistakes, so that the majority (the swell of the bell curve) all get the wrong answer together. Only a slim minority will get all the questions correct. Considering how many people (even educated people), get around average on even the verbal and English sections of such tests as the SAT, GRE, etc., I wonder if certain passages in books will be incorrectly edited on a mass scale. This would especially be true for older or more complex works.

    1. Re:law of averages? by jonathan_ingram · · Score: 5, Informative
      However, I am curious as to just how accurate the proofreading is.

      The answer is: surprisingly accurate. We proof one page at a time, working from the original scanned images, and emphasise that people should try as hard as they can to stick to the source material. As counter-intuitive as it may appear, this type of proofreading is actually hardest to do with material from the late 18th/19th century -- subtle changes in spelling (and small changes in accent systems for the non-English languages) make errors much harder for human proofreaders to correct than the earlier material, where spelling consistency was completely optional!

      Each page is OCRed (and the ability of modern OCR programs is a major improvement over those of even a couple of years ago), proofread twice, and then the whole document is reviewed twice before being posted. We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.
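
      One way to picture what a second proofreading round adds is to diff its output against the first round, word by word. The sketch below is only an illustration using Python's standard difflib; whether DP compares rounds like this is an assumption, and round_changes is a name made up for the example.

```python
import difflib


def round_changes(round1, round2):
    """Return (old, new) word-level changes between two proofing rounds."""
    a, b = round1.split(), round2.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions and deletions show up with an empty string on one side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    first = "It was tbe best of times"
    second = "It was the best of times"
    print(round_changes(first, second))  # [('tbe', 'the')]
```

      A page where the second round changes nothing is a reasonable sign that the first round was already clean; a page with many changes is one a reviewer might want to look at again.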

    2. Re:law of averages? by littlem · · Score: 5, Insightful
      We've also recently become much more aware of the need to make useful texts which can be used for scholarly purposes in the future, leading to such improvements as retention of all page numbers.

      At the risk of going over very old and well-trodden ground, if PG wanted to be useful for "scholarly purposes" it should long ago have corrected the original mistake of using plain text, and used a markup that could have kept page numbers and other meta-information for scholars, while giving the common reader a clean text with a suitable style sheet. But even today on the PG website is a "justification" for sticking to plain text making it clear that scholars don't even figure in the intended audience for PG texts.

    3. Re:law of averages? by jonathan_ingram · · Score: 3, Informative

      DP is 'semi attached' to PG -- I think you'll find that we are much more concerned both with keeping page and edition information, and with marking such information up in an appropriate way, than some of the traditionalists inside PG are.

      For example, many of us make sure that we produce a valid XHTML edition of each project, and that the page numbers and edition information of the source are preserved. For an example text, see Graham Wallas -- Human Nature In Politics. We are currently working on a markup and stylesheet which will improve the end-user experience in several ways (and then, sigh, we will have to go back and move all the books we've already done to this new system. This may take a while :) ).

    4. Re:law of averages? by arcade · · Score: 1

      At the risk of going over very old and well-trodden ground, if PG wanted to be useful for "scholarly purposes" it should long ago have corrected the original mistake of using plain text,

      Personally I'm of the opinion that almost everything is better represented as plain text. In extreme cases, maybe plain text + italics, bold, and the ability to link in pictures.

      I can understand other arguments, but in general, I think plain text is the most universal and common format - and thus best suited.

      Maybe everything should have a 'source' with more meta-formatting, but with plaintext as the default 'export'.

      --
      "Rune Kristian Viken" - http://www.nwo.no - arca
    5. Re:law of averages? by bbc · · Score: 1

      "if PG wanted to be useful for "scholarly purposes" it should long ago have corrected the original mistake of using plain text"

      The sort of scholar that would make such unqualified statements about the need for mark-up has no place in academia.

      Project Gutenberg has excellent reasons to stay with plain text as the most basic distribution format, reasons that have proven themselves over time.

      Smart scholars have many uses for plain Gutenberg texts.

    6. Re:law of averages? by bbc · · Score: 2, Informative

      "However, I am curious as to just how accurate the proofreading is."

      That's very hard to tell, as there is no gold standard for accuracy. There are two sometimes conflicting goals in regards to accuracy that we have; one is to preserve the author's intent, the other to preserve the actual printed text. At some points these two conflict, for instance, when we would like to normalize spelling to increase readability.

      There is currently some talk going on at the DP forums as to which system would be best to eliminate common errors that everybody tends to overlook.

      We already have several systems in place to help us with these. For instance, we use a specially modified font that helps to highlight differences between letters. It's dog ugly, but that's intentional; because it grates, you see errors much more quickly.

      Also, once common errors are identified as such, we write software that can help us find such errors.

      Finally, we use these new-found methods to look at books we posted to Project Gutenberg in the past, to measure the increase in accuracy.

    7. Re:law of averages? by pvanheus · · Score: 1

      I agree, PG text format is not good for reproducing the features of a printed book. Much better to use something like TEI. However, marking up in a semantic format raises a hairy issue: the proofreader needs to interpret the meaning of textual elements (such as italics which are used for a foreign language term - that's different from italics used as emphasis). That requires more training than simple PG markup. And of course there is the issue of a decent user interface...

      Having said this, maybe these problems can be overcome. Any suggestions?

    8. Re:law of averages? by BranMan · · Score: 1

      Plain text is done in PG for ONE reason - practicality. They can convert a book into plain text and leave it - any kind of markup language will eventually expire, standards will change, etc. But pure plain text with no formatting will be readable forever. Plus, page numbers are simply now included in the plain text - no meta data, no markups, nothing extra - so that scholars and students can refer to page numbers in footnotes.

      The audience for PG is EVERYONE - every single person on the whole planet, regardless of the hardware they have (even old 8080s). Language translation aside, the least common denominator must be used. In this case that is plain text.

    9. Re:law of averages? by littlem · · Score: 1

      Plain text can be generated from any reasonable markup language. The converse is not true. I completely understand that many PG contributors wouldn't want the hassle of applying complicated markup, and that it's better to have a text in plain text than not at all. Fine - that's the practical situation as it exists. It just grates with me when people deny that well marked up text would be more useful to scholars and keep trying to argue that plain text would be the ideal format even if it weren't for these other constraints.

      Although I knock PG a bit, my real frustration is that no government, no university (well, very few) is willing to pay to have people create electronic texts with all the desirable scholarly apparatus included.
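
      A minimal sketch of that one-way conversion, using Python's standard html.parser. The sample fragment and the "pagenum" class name are made up for illustration and are not taken from any actual PG or DP markup. It only shows that the plain text falls out of the marked-up version mechanically, while the page marker and the italics cannot be recovered from the plain text afterwards.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Collect text content, dropping tags and (hypothetical) page markers."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a page-number span

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            # Nested element inside a skipped span: track depth.
            self.skip_depth += 1
        elif tag == "span" and ("class", "pagenum") in attrs:
            # Assumed convention: page numbers live in <span class="pagenum">.
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def to_plain_text(markup):
    """Return the text of a well-formed fragment with markup removed."""
    parser = PlainTextExtractor()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace runs left behind by the removed markup.
    return " ".join("".join(parser.parts).split())


sample = '<p><span class="pagenum">[Pg 12]</span>It was a <i>tour de force</i>.</p>'
print(to_plain_text(sample))
```

      The sketch assumes well-formed input; unclosed tags inside a page marker would throw off the depth counter.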

    10. Re:law of averages? by BranMan · · Score: 1

      I'm sorry - you still don't get it. Plain text will be good FOREVER. They can, in the future, go back to books that were scanned 50 years ago, and everyone can still read them. They want to scan them ONCE, record them ONCE, and never touch them again. Any idea what numerous businesses and government agencies have gone through because of outdated file formats and media? PG can't afford that, and never will be able to. That's why they use plain text.

  18. Re:Hi by Anonymous Coward · · Score: 0

    Hello, Ano!

  19. What you say? by Anonymous Coward · · Score: 0

    [n/t] :D

  20. Rsync your own Gutenberg library by gtoomey · · Score: 4, Informative
    You can rsync your own copy of the Gutenberg library. I used the Aarnet mirror as it's closest to me and fast.

    Just be aware that the Gutenberg library is some 135GB, and much of it is gif, jpg and mp3 (spoken word books). So I just used --include in rsync to download the .txt, .htm and .html files. It's a more manageable 10GB download.
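
    A rough sketch of how such a filter could be assembled, written in Python so the argument list is easy to inspect before running anything. The mirror URL and destination directory are placeholders, not the actual Aarnet address, and the exact options the poster used are not given, so this is one plausible combination: rsync needs the directories included ("*/") and everything else excluded for per-extension includes to take effect.

```python
import shlex


def rsync_text_only_args(source, dest, extensions=("txt", "htm", "html")):
    """Build an rsync command that copies only files with the given extensions."""
    args = [
        "rsync",
        "-av",                 # archive mode, verbose
        "--prune-empty-dirs",  # skip directories that end up with no matching files
        "--include=*/",        # descend into every directory
    ]
    args += [f"--include=*.{ext}" for ext in extensions]
    # Everything not matched by an include rule above is skipped.
    args += ["--exclude=*", source, dest]
    return args


# Placeholder source and destination, for illustration only.
command = rsync_text_only_args("rsync://mirror.example.org/gutenberg/", "gutenberg/")
print(shlex.join(command))
```

    The order matters: rsync applies the first matching rule, so the includes have to come before the catch-all exclude.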

    1. Re:Rsync your own Gutenberg library by Black+Acid · · Score: 2, Informative

      I use --exclude \*.zip --exclude \*.iso --exclude \*.mp3 with wget to achieve similar results. The advantage of this is you get all the images and indexes, without wasting space on computer synthesized spoken books (yech), zipped files which you already downloaded the contents of, and 4.7GB/700MB DVD or CD ISOs. On the other hand, the Project Gutenberg CD and DVD Project is worth looking into for "best of" collections if you don't want the whole library.

  21. Make them renew each year by Anonymous Coward · · Score: 4, Insightful

    It's so Disney can keep milking Mickey Mouse.

    Here's what I want to see:

    You get automatic copyright for 25 years. After that, you must pay $1 per year to keep something in copyright. If you can't be bothered to keep track of your stuff and pay the $1, it lapses into the public domain.

    Disney will pay the $1 for Mickey ($1 for Steamboat Willie, $1 for each other cartoon, $1 for each book, etc.). But forgotten gems, like ancient Apple ][ games, will become legal public domain items.

    I'd actually like to see a hard limit of 50 years or so for copyright, but even if you can't get that, at least the above scheme makes a lot of stuff lapse into the public domain.

    A cool feature: if the legal trail is tangled and murky, and no one knows who owns it anymore, no one will pay the $1 and it will fall into public domain. Let's say LSD Software wrote a fun game for the Commodore 64. Then ABC Games bought the game from LSD (who kept the rights to use the music in future games). Then ABC Games went under, but its assets were bought by PDQ Games, which later split into PDQ Software and Foo Bar Games. After that it gets REALLY complicated... anyway, after all that, who exactly owns that fun game? No one knows. It would take a court case to decide, but no one will bother so no one will ever know. Under the current system, you are technically a pirate if you keep the game, but there is no one you can pay a license fee to and legally have the game! Catch-22.

    Heck, Disney should want this. They make big bucks by Disney-ifying public domain stuff, so they should make sure things will actually go into the public domain in the future.

    1. Re:Make them renew each year by iamdrscience · · Score: 3, Interesting

      Lawrence Lessig proposes a similar scheme in "The Future of Ideas". I doubt he was the first, but that's just what you made me think of. It's a good book, even though it can get kind of dry at times (it is, at least in some capacity, a book about law after all).

      As far as your scheme though, I would really like a hard extension limit and I think 25 years for a default term is really too much (I mean, to use your example of Apple II games, many of those games wouldn't even quite be out of term yet). I think 5 or 10 would be much better.

    2. Re:Make them renew each year by Anonymous Coward · · Score: 0

      Jerry Pournelle said that the way it used to be was you got 26 years, and if you renewed, you got another 26 years. You could only renew once.

      He said he was content with a mere 52 years of copyright, but now of course it is something like the life of the author plus 70 years (or 95 years for corporate works).

      Anyway, that's where I got the 25 years. It should be a reasonably long term anyway, since some books take years to build their fame and become profitable.

      You could make a separate category for video games, and make it less (10 years, say).

    3. Re:Make them renew each year by RAMMS+EIN · · Score: 2, Interesting

      ``You get automatic copyright for 25 years. After that, you must pay $1 per year to keep something in copyright. If you can't be bothered to keep track of your stuff and pay the $1, it lapses into the public domain.''

      I would even go a bit further. Why even have a default term at all? (and 25 years is a LONG time) And $1 is arguably a bit little. If you really care, you can pay a bit more. Maybe we can even have different levels of protection - pay nothing if you allow modifications, pay more to retain exclusive rights to distribution, etc.

      I think this is an interesting idea worth investigating. Thank you for publishing it!

      Oh, and BTW, I will be using your idea as if it were mine, unless you pay your $1, of course. ;-)

      --
      Please correct me if I got my facts wrong.
    4. Re:Make them renew each year by EpsCylonB · · Score: 1

      You don't want to raise the bar to get something copyrighted; the best ideas often come from those who don't have a lot of money.

      But creating an intellectual property tax to be paid after a piece of IP turns 25 is, IMHO, a good idea. Take the example of the Beatles: if it wasn't for Disney and friends lobbying to have copyright extended, their work would already be public domain. But the Beatles' music still makes money. Fair enough: while it is financially lucrative, the copyright holders can afford to pay the state to protect their intellectual property. If they don't pay then the work becomes public domain forever.

      I don't however think this should apply to patents, in the interests of competition they should run out in 15 - 25 years.

    5. Re:Make them renew each year by bbc · · Score: 1

      "And $1 is arguably a bit little."

      It's not about the money, it's about the effort. Most people won't be willing to renew most works. As a result, these works become public domain (and verifiably so).

      This creates several beneficial situations:

      1. If you want to use a work that the author lost interest in, you can.

      2. If you want to use a work that the author still is interested in, you now have a way to find out who the author is and how he can be contacted.

      (When I say 'use' I mean 'use in a way that would be prohibited by copyright law, such as copying and distributing'.)

      There are several weaknesses in this system as far as I can see:

      a. Big Copyright can abuse this system to 'claim' copyrights on works they do not really own. Somebody looking for a licence will approach the wrong entity. As a result, that somebody may break the law when using the work, even though he thinks he licenced it correctly.

      Therefore, this new system would have to be accompanied by a law that states you are not allowed to falsely claim copyright on something you did not make.

      b. A lot of the interest lies in derivative works; if I create a popular cartoon character called Dickey Dog, I would want to be able to control what happens to Dickey. However, at one point I might lose sight of all the Dickey works out there. I might forget to register a postcard one of my interns drew of Dickey Dog a long time ago (come on folks, we're artists, not accountants!), and as a result, Dickey would become public domain.

      Of course, if Dickey were that successful, I could (and probably should) trademark him, but will a court allow double protection? (In Europe, courts and law-makers frown upon double protection; if something was once under copyright, you cannot try and extend that copyright through the backdoor of trademark law--well, at least in theory you cannot, in practice judges are people too, with faults and everything.)

      The thing is, copyright was never intended to cover derivative works, but now that it does, it forms an extra and powerful incentive to invest in derivative works.

    6. Re:Make them renew each year by bbc · · Score: 1

      "Heck, Disney should want this."

      Very much so.

      The fact that Big Copyright have declared themselves fierce opponents of any law that would reintroduce registration and renewal in the US has made some people remark that their ultimate motive is control.

    7. Re:Make them renew each year by Anonymous Coward · · Score: 0

      I might forget to register a postcard one of my interns drew of Dickey Dog a long time ago (come on folks, we're artists, not accountants!), and as a result, Dickey would become public domain.

      Not so. You could have a registered trademark on Dickey Dog, and/or a "design patent" on the Dickey Dog design. Possibly that would be enough to keep full control of Dickey Dog, or possibly the forgotten postcard would be reproducible in the public domain. There should be a "good faith grace period" where you can pay up on something you overlooked... but you can't get penalties or damages from someone who used the thing you overlooked during the period where you had let it lapse.

      I'm not a lawyer, so maybe I have overlooked something.

      I don't think Disney should lose all control of Mickey Mouse just because the "Steamboat Willie" cartoon is so old. But I also think it's crazy that basically nothing lapses into the public domain anymore. We can and should hammer out a compromise.

  22. I hope... by dj245 · · Score: 0

    ...they don't use Microsoft Word

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
    1. Re:I hope... by bbc · · Score: 1

      Project Gutenberg posts texts in formats that are offered by the volunteers and that the gatekeeping volunteers know how to check.

      If a volunteer feels comfortable with MS Word, then by all means they should try and commit a book in that format. The only demand Project Gutenberg makes is that the etext is also submitted in 'plain vanilla text' format, so that anybody can read the text, anywhere and anytime.

  23. formatting by golgotha007 · · Score: 3, Interesting

    I think the Gutenberg project is a terrific idea!

    My only complaint is with the formatting. Project Gutenberg uses hard formatting within the text. I think that's an extremely stupid idea.

    There should be zero formatting within the text (other than paragraph breaks). Whatever client you're using should provide the formatting for you.

    Let the client handle the presentation!!

    1. Re:formatting by Kvasio · · Score: 1

      yes, zero formatting would be great for those silly Eiffel Tower-shaped Guillaume Apollinaire's poems; finally they would be readable.

    2. Re:formatting by mikis · · Score: 1

      Most -- or all? -- of the books posted from Distributed Proofreaders to Project Gutenberg in recent time are available both as plain ASCII text (hard formatted) and as HTML -- meaning it can be converted to "soft formatted" txt, PDF or anything else.

    3. Re:formatting by jhutch2000 · · Score: 1

      We're getting there. It is surprising just how much work is required to get a set "standard" in place for such a setup.

      Currently, there is at least one effort to come up with an XHTML-conformant standard (it has stalled somewhat due to summer volunteer burnout) and a TEI-lite-conformant standard. The problem is getting a standard simple enough for the average lay person to remember it well enough to actually mark up texts, while complex enough to handle 99% of the texts we see.

      It ain't easy!

      JHutch

    4. Re:formatting by ragnar · · Score: 2, Informative

      The problem you raise is not so easy to solve. While it sounds nice to separate content from presentation, in many cases the presentation is part of the content. Take the indentation of poetry for example, or for a more specific example, e. e. cummings. Once you wade into these areas you start talking about marking the text, which is a tricky issue. The Text Encoding Initiative has been hammering out a solution for a decade, but the learning curve is steep.

      As much as I think the project is digging themselves into a hole with hard formatting, I can understand why they do it. The alternative is a nasty can of worms.

      --
      -- Solaris Central - http://w
  24. Helping improve OCR software? by Anonymous Coward · · Score: 3, Insightful

    It seems to me that this project could have a large impact on OCR readers.

    Think about it. You have thousands of volunteers poring over images, and then providing the corrected text (if necessary). Couldn't this also be used to "train" the OCR software to become better at identifying text?

    If you log the image, the original OCR'd text, and the manually verified text you could use it in a test case for future OCR software.

    I do this all the time when I write data validation/cleanup software.. I run my input data through a program, capture the output, and manually verify that it is correct.. making changes if necessary. I then use the two pieces of information in my test cases as a benchmark. If I introduce a bug in my code that causes something I already wrote to suddenly break, or output incorrect results, I know about it instantly. Works great with database correction code.

    Maybe I'm simplifying this too much, but I sure hope someone is capturing all this great data. It could come in handy..
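
    The capture-and-verify approach described above could look roughly like this. It is a sketch only: clean_record is a stand-in for whatever cleanup code is under test, and the sample pairs are invented, not real OCR or DP data.

```python
def clean_record(raw):
    """Hypothetical cleanup step: trim and collapse internal whitespace."""
    return " ".join(raw.split())


# Input/expected pairs, captured once from a run and then verified by hand.
GOLDEN_CASES = [
    ("  The  quick brown fox ", "The quick brown fox"),
    ("proof\treading", "proof reading"),
]


def run_regression(func, cases):
    """Return the cases whose current output no longer matches the verified output."""
    failures = []
    for raw, expected in cases:
        actual = func(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


if __name__ == "__main__":
    for raw, expected, actual in run_regression(clean_record, GOLDEN_CASES):
        print(f"REGRESSION: {raw!r} gave {actual!r}, expected {expected!r}")
```

    For OCR, the same shape would apply with a page image as input and the proofread text as the verified output, which is why logging all three pieces is worthwhile.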

  25. from the error-checking-and-correcting dept. by GothChip · · Score: 5, Funny

    I didn't realise this department existed at Slashdot.

  26. Shocking by CBDSteve · · Score: 2, Funny

    But not as shocking as this

    1. Re:Shocking by Kryxan · · Score: 1

      So wait, you're telling me that only 1,390 sites found on Google misspelled the word "proofreading", while 484,000 got it right? I figure that's fairly good, statistically.

      1390/484000 ~= 0.003

    2. Re:Shocking by Anonymous Coward · · Score: 0
      So wait, you're telling me that only 1,390 sites found on Google misspelled the word "proofreading", while 484,000 got it right? I figure that's fairly good, statistically.

      1390/484000 ~= 0.003

      No, he's pointing out that the misspelled sites were mostly offering high quality "profreading" services.
    3. Re:Shocking by Dizzle · · Score: 2, Interesting

      Some of those are legit too. Professional/Professor reading gets shortened to profreading. The other mistakes are mostly users.

      --
      -Dizzle
      "I most likely AM so interested in myself."
  27. What books to read by nuggz · · Score: 1

    There are so many books there, how can you choose one to read?

    1. Re:What books to read by bbc · · Score: 2, Informative

      There are several websites that offer free ebooks, and that allow people to review them.

      Of the authors I got to know through Project Gutenberg, Stephen Leacock and Theodor Storm stick out in my mind the most. Oh, and Hendrik Conscience turned out to be less boring than I thought after proofing the first of his books to go through DP (but so far he's only available in Dutch).

    2. Re:What books to read by Anonymous Coward · · Score: 0

      Why don't you start at 'A'?

  28. Proof Readers of Regular Books by Anonymous Coward · · Score: 0

    A lot of books I download need to be proofread as well, usually books that are still copyrighted in my particular jurisdiction. I still haven't figured out what the best format is when I contribute my own books as well, HTML?

  29. books galore ! by chrisranjana.com · · Score: 0

    wow that is awesome community work indeed !

    --
    Chris ,
    Php Programmers.
  30. Distributed Scanners by Anonymous Coward · · Score: 0

    my brothers (Team-Lib) crossed that number many years back.

    Keep up the good work anyway.

  31. To paraphrase... by milo_Gwalthny · · Score: 1

    Everything's always been about money. There's just more money in it now.

    --
    Milo
    1. Re:To paraphrase... by 16K+Ram+Pack · · Score: 1
      Yes, but also that money attracts certain types of people, particularly easy money.

      When the money isn't easy, you often end up with a situation where the only people in it are people with a love for it. There's still a lot of businesses like this - places that sell home brewing equipment or knitting machine shops - no-one makes much more than a good living and a pension from these places.

  32. Request for MATH experts by jhutch2000 · · Score: 5, Interesting

    Right now, we've got plenty of old math-intensive books ready to move through the DP system. Because of ASCII's terrible ability to handle equation formatting, we use TeX layout. The average DPer doesn't know TeX, and it has a rather steep learning curve. So, since Slashdot is full of self-professed geeks...all you TeX geeks should join up and help with the TeX formatted MATH texts. I've got plenty of books scanned and ready to go, so don't think you'll run us out of 'em any time soon!

    JHutch

    1. Re:Request for MATH experts by andy314159pi · · Score: 1

      Hi JHutch,
      I cannot find anything on the project page linking to "mathematicians sign up here for proof reading."
      Maybe you can post a link?

    2. Re:Request for MATH experts by RealAlaskan · · Score: 1

      I'm an occasional volunteer there, and I know LaTeX and math. So, I just took a look. Where are the math books? You got me all hot to help, and the only thing I see is Hilbert's ``Foundations of Geometry''.

    3. Re:Request for MATH experts by jhutch2000 · · Score: 1

      Once you login to the site, you'll see all the books currently in the first round. The only MATH book currently in first round is Hilbert's "Foundations of Geometry." There are quite a few books, in English and other languages, waiting in the wings. They get released one at a time, so as not to overwhelm the list with a single type of work.

    4. Re:Request for MATH experts by jhutch2000 · · Score: 2, Informative

      Only one MATH book is ever in the first round at any one time. Hilbert's book is that one right now.

      The logic behind this is simple. Most of our volunteers avoid these books like the plague and if we kept releasing new ones, pretty soon the entire first round would be only MATH books.

      To see what's waiting in the queue for English language math books, see here. For Languages Other Than English (LOTE) math books, see here.

  33. Public apology by bbc · · Score: 3, Informative

    I would like to apologize to TPTB (The Powers That Be) at Distributed Proofreaders for messing up by posting this story to Slashdot.

    The 5000th Posted celebrations were supposed to be internal. There is a discrepancy between works posted and books posted: sometimes a book gets split up. The big celebrations were intended for 5000 actual books posted.

    I am afraid I got a little carried away, and hope Slashdot will still carry the real story of 5000 books posted to Project Gutenberg.

    1. Re:Public apology by Anonymous Coward · · Score: 0

      Don't worry..this story WILL show up again on Slashdot.

      (Just like every other story ever on here did.)

  34. I miss the smell of damp paper - by Snart+Barfunz · · Score: 1

    but it's good to finally get electronic versions of those books that are bought by the yard to fill the bookshelves in 'Bohemian' pubs and coffee shops. To round out the experience, download the text of these books and write a PERL script to 'pulp' them. One gripe - plenty of books by Abbott but none by Costello - call that a library?

    --
    --- Yx3 = Delilah ---
  35. Any chance images could be made available? by Anonymous Coward · · Score: 0

    If the scanned images were made available after the books are "finished", then people would be able to make better scholarly use of the e-books. It is essential to have the "raw data."

    Also, I often find what I think is an error, and it would be very convenient to check it against the scanned image, as going to the library or sending an email to someone else to have it checked is usually too much trouble.

    1. Re:Any chance images could be made available? by jonathan_ingram · · Score: 3, Insightful

      Yes, the long term plan is to make the page images we use in proofreading available for end users. There are several logistical problems with this (mainly to do with bandwidth and disk space), but all the images are archived for the time when we can make them available.

      It's possible that we might interface with something like the Million Book Project, which makes page images, but no text, available.

  36. Accuracy by jefu · · Score: 3, Interesting
    I have worked on the distributed proofing of a couple of texts and found that the accuracy of a page after the second proofing was often close to perfect.

    One of the books I worked on was the "Anatomy of Melancholy" and I (conveniently) have a copy myself. There were often more differences between the scanned image of the page and my copy than between the scanned image and the proofread text.

    Don't underestimate the amount of work people put into this too - for "Anatomy of Melancholy" it often took 30 minutes to proof a single page because the page often had Latin and very small footnotes.

  37. What ever happened to Project Gutenberg 2? by PetoskeyGuy · · Score: 1

    This previous story mentions a possible split with a company charging for all the books and taking the name. I see now that http://www.projectgutenberg.info/ doesn't seem to be selling books anymore, but www.worldebooklibrary.com is up. Did they give up the Project Gutenberg trademark?

    1. Re:What ever happened to Project Gutenberg 2? by jhutch2000 · · Score: 2, Informative

      There was a lot of internal contention about that "pay" site using the Gutenberg trademark. For the most part, the furor has died down, and as I understand it, for the most part, the World E-book library thing has given up use of the Gutenberg trademark and some checks and balances have been put in place to prevent the unilateral decision that led to that controversy.

  38. Re:MOD PARENT ENCITEFULL by Anonymous Coward · · Score: 0
    smell like a 80% decayed horse-penis that has been raped by 17 skunks.

    Now, the really sad question is, how do you know either of these smells?

  39. Have you read an ebook? by invein · · Score: 1

    if not, give it a try:

    1. Download a text: (say Alice's Adventures in Wonderland). The new site has a vastly improved interface; listing books in available formats (always plain text, sometimes pdf, palm doc, tex)

    2. Have at it in your text reader of choice. If you are on the Mac, I highly recommend the free tofu. It breaks the text into columns that are as high as the window. Navigate by shifting columns or pages of text. This simple change makes a huge difference when reading large amounts of text. It makes reading books on my laptop pleasant rather than an ordeal.

    What about on other platforms? What are the best programs for reading etexts?

  40. $1 the first year, $2 the 2nd, by snooo53 · · Score: 1
    To avoid the problem of companies renewing something infinitely, I'd suggest not having a minimum period, and charging $1 for the first year. For each subsequent year, the renewal fee would double. So the 2nd year it'd be $2, 3rd year $4, etc...

    By the time the copyright got to 21 years it'd be over a million dollars to renew it, which would strongly encourage people to just let it go to the public domain. This way would also protect small time inventors/writers, since even at 7 years, it's only $64 to renew.
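
    The arithmetic is easy to check with a small sketch. The function names are made up here, and the fee schedule is just the one proposed above: $1 in year 1, doubling every year after.

```python
def renewal_fee(year):
    """Fee in dollars for the given year of protection (year 1 costs $1)."""
    if year < 1:
        raise ValueError("year must be at least 1")
    return 2 ** (year - 1)


def cumulative_cost(years):
    """Total paid to keep a work in copyright for `years` years."""
    return sum(renewal_fee(y) for y in range(1, years + 1))


for year in (7, 21):
    print(f"year {year}: fee ${renewal_fee(year):,}, total so far ${cumulative_cost(year):,}")
```

    Year 7 costs $64 and year 21 costs $1,048,576, matching the figures above; the running total by year 21 is about twice that.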

    --
    The sending of this message pretty much inconveniences everyone involved.
  41. Once again by lawpoop · · Score: 1
    I have to question why humans are doing the bulk of the editing for project gutenberg. It seems to me that if you have two independent scans and OCRs of a text, then it's highly unlikely they would agree on a mistake. Any discrepancy would be a mistake on the part of one or more of the OCRs, which should then be sent to a human for revision.

    Again, I think it's unlikely that two independent OCR processes (separate text, separate scanners, separate OCR packages) would both make the same mistake, so the result is mostly trustworthy as long as they're in agreement.
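
    A minimal sketch of that comparison, assuming the two OCR passes are already available as plain strings. It uses Python's standard difflib on a word-by-word basis; the sample strings are invented, and real pages would also need line breaks and hyphenation normalized first.

```python
import difflib


def disagreements(ocr_a, ocr_b):
    """Return (text_a, text_b) pairs where two OCR passes of the same page differ."""
    words_a, words_b = ocr_a.split(), ocr_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    flagged = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            # Only these spans would need to go to a human proofreader.
            flagged.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return flagged


pass_one = "It was the best of tirnes, it was the worst of times"
pass_two = "It was the best of times, it was the vvorst of times"
for a, b in disagreements(pass_one, pass_two):
    print(f"check: {a!r} vs {b!r}")
```

    Spans where both passes agree are skipped, so the scheme only helps to the extent that the two passes really do fail independently.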

    --
    Computers are useless. They can only give you answers.
    -- Pablo Picasso
    1. Re:Once again by jonathan_ingram · · Score: 2, Informative
      (this story is off the front page now, so I doubt you will be looking for an answer, but I'll answer you anyway :) )

      I have to question why humans are doing the bulk of the editing for project gutenberg.


      There are several reasons. Firstly, there are lots of people around who can spare five minutes to proofread a page -- particularly when it has already been OCRed. Secondly, we are a completely volunteer organisation, with no 'plan' as to the books we scan, and so having to find and scan two separate copies of a text would reduce the amount of material on the site considerably. In particular, it would almost certainly stop us from proofing some of the older and/or harder material.

      I can only suggest that you join DP, and test the process out. I think you'll find that it works surprisingly well.
  42. You mean I've been WASTING my TIME? by Anonymous Coward · · Score: 0

    You mean I've been grammar trolling slashdot, correcting the retarded writing of idiots on slashdot, when I could've been contributing to society in a meaningful way?

    Crap!

  43. Sorry, but you don't get it. by shadow_slicer · · Score: 1

    How about a compromise. A format that uses plain text to store both data and metainformation.

    You could also store documentation for this format and even source code (in a variety of languages) for a program that converts the metadocument into straight text. Then you won't have to worry about converting each of them painfully or worry about outdated formats.

    And even if the worst happens and the format becomes outdated and unreadable, the text is still there, hidden in markup. It wouldn't be that hard for someone to reverse engineer it and write a small program to convert it to a more recent format.

    With the markup data separated from the content, you can choose to only show certain portions of the content. It'd be much harder to filter plain text.

    Plain text has no real advantage over a sensible format. HTML documents will be readable (maybe not pretty, but readable) just as long as plain text.

    Of course eventually when ASCII dies and we all switch to a 32bit character set (endorsed by our alien overlords), neither "plain text" nor any sensible format will be readable by anyone.
    Heck, even switching to 16bit character unicode would screw it up.

  44. Old Scientific Works? by 4of12 · · Score: 1

    How about old scientific works, journals up to say 1920's?

    I know the more recent journal articles are copyrighted and therefore must have some lengthy protection on them, but what about classic old articles (like some of Einstein's work in the early 1900's)?

    --
    "Provided by the management for your protection."
    1. Re:Old Scientific Works? by bbc · · Score: 1

      We're producing Scientific Americans, and maths books. Basically, anything is welcome, although for science stuff we often need to produce TeX, which only a subset of our proofers can handle.