Slashdot Mirror


Project Gutenberg's 32nd Birthday

David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles. Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofers and editing one page per day."

33 of 178 comments

  1. Must...avoid...Steve...Gutenberg...joke... by mikeophile · · Score: 4, Funny

    Seriously, awesome work people.

  2. You can't be serious by ryants · · Score: 5, Funny
    even help out by visiting the Distributed Proofers and editing one page per day.
    You can't seriously be asking Slashdotters to volunteer as proofreaders.
    --

    Ryan T. Sammartino
    "Ancora imparo"

    1. Re:You can't be serious by BabyDave · · Score: 4, Funny

      Could be worse - they could be asking the Slashdot editors!

    2. Re:You can't be serious by thinkninja · · Score: 4, Funny
      It was the best of times, it was the worst of times...
      -1 Redundant
      --
      "The number of Unix installations has grown to ten, with more expected." (Unix Programmer's Manual, 2nd ed.; june 1972)
    3. Re:You can't be serious by Aldarondo · · Score: 5, Interesting
      As one that has been involved with Distributed Proofreaders for the past 18 months, yes we are serious about having Slashdot people proofread. The last time a story about D.P. ran in November, thousands of new users joined us and helped us grow and expand to our current size.

      Go and check it out, there is great work being done there. (I am a bit biased though). Click here for a history of DP.

  3. Now for the marketing... by Blaine+Hilton · · Score: 4, Insightful

    Now all we need is more people promoting this in schools and printing the books. Much like the IA Bookmobile. It seems like the people who could use this the most, don't even know it exists.

  4. very timely for me by b17bmbr · · Score: 5, Interesting

    i am going to be teaching modern civ next year in high school (i have been at the junior high for 7 years), and have already gone to the site and gotten works from aristotle, plato, locke, montesquieu, et al. thanks guys. there is still something to be said for a classical education. glad somebody is doing all they can to preserve the classics, especially with all the assaults on it from the social reconstructionists.

    --
    My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
  5. founding fathers by Tablizer · · Score: 4, Funny

    ...first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web

    I knew it! This country was founded by COBOLers.

    1. Re:founding fathers by Anonymous Coward · · Score: 3, Funny

      ADD 1 TO POST-POINTS.
      MOVE "Funny" TO POST-STATUS.

      (That's Cobol, for those who don't know)

  6. Really great work by the guys behind the project! by jaemark · · Score: 5, Interesting

    There's really a problem though about getting the word out to people, in pretty much the same way the popularity of libraries today has been dropping. A good idea would be a separate advocacy site to come up with lists of texts in the project (i.e. What's New?, Most Popular, etc.) to help people wade in immediately.

  7. More free books by Cruciform · · Score: 5, Informative

    The Baen Free library has a number of titles available in several formats.

    It's a great way to introduce readers to a series or a talented new author.

  8. 'reader' books not much cheaper by Chmarr · · Score: 3, Insightful

    Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than physical books.

    I went to the MS Reader site and followed the links to the online publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

    So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

    If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.

    1. Re:'reader' books not much cheaper by Joe+Tie. · · Score: 3, Interesting

      Someone else mentioned the fact that he's got a reader with him all the time anyway, which makes it pretty convenient to have a book or three in there. I'm not going to bring a book around with me everywhere I go just on the off chance that I might get stuck in a long line, or waiting for someone. But when such an event happens, having good reading material right at hand is very nice. Also nice is being able to have a selection of books in there at any one time, just in case I finish one book while waiting somewhere.

      Battery life isn't much of an issue for me. I've got an older iPAQ, and even with that I can usually squeeze about ten hours out of it with the addition of an extra battery pack that's small enough to tote around with the PDA. Hooking it up isn't much of an issue. Take out of pocket, plug into PDA. And if at home, the power situation wouldn't be an issue.

      --
      Everything will be taken away from you.
  9. XML please by DrXym · · Score: 3, Insightful

    Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.

    1. Re:XML please by starseeker · · Score: 4, Informative

      I think they discuss this somewhere. The whole point of ASCII is that it can be accessed simply, by almost any machine. It is as stable a format as you will find for data storage, anywhere. They are committed to these books being widely readable, and ASCII is the best way to assure this.

      However, I agree that some books (most actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading - you have to consider formatting and scanning images as well.

      As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have pdfs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large scale one even by Gutenberg standards. Worthwhile, yes. But involved.

      --
      "I object to doing things that computers can do." -- Olin Shivers, lispers.org
    2. Re:XML please by DarkOx · · Score: 3, Informative

      The entire point of the project is to preserve the content in a format that is both human and machine readable. If, fifteen years from now, I don't have any software from the present and XML is long dead, I will still be able to read standard ASCII text, even if I am just cat(ing) it through less or printing it as is. I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful. I am not saying that it would be hard to write such software, but the concept is to make sure it's easy, and always easy, to get the data. Also, they do put chapter breaks in as text, so if you want to find one, most word processors and e-book readers these days, even the fifteen-year-old ones, can find text strings.
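
      A minimal sketch of that last point, in Python: finding chapter breaks in a plain-text book with nothing more than a pattern search. The sample text and the "CHAPTER" heading pattern are made up for illustration; real e-texts mark their chapters in various ways, so treat the pattern as an assumption.

```python
import re

# Hypothetical sample; real e-texts vary in how they mark chapters.
SAMPLE = """\
CHAPTER I

It was the best of times.

CHAPTER II

It was the worst of times.
"""


def find_chapters(text):
    """Return (line_number, heading) pairs for lines that look like chapter headings."""
    # Assumed convention: the word CHAPTER followed by a Roman or Arabic numeral.
    pattern = re.compile(r"^CHAPTER\s+[IVXLC\d]+\.?\s*$")
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.match(line)
    ]


if __name__ == "__main__":
    for number, heading in find_chapters(SAMPLE):
        print(f"line {number}: {heading}")
```

      Nothing here depends on a parser or a schema, which is the appeal of plain text; the cost is that the heading convention has to be guessed per book.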

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    3. Re:XML please by Eloquence · · Score: 4, Insightful
      I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful.

      This is complete bullshit. With a proper setup you would convert the source into multiple output formats, including TXT, but you would keep the source in a format that maintains meta information such as formatting, chapters and pages. XML is used in the entire industry exactly with the expectation that it will be around for decades. Even if it won't, the open source code that we have to parse it will not magically disappear -- PG would keep using it to generate output texts from the XML source through all these years. You might as well argue that ASCII will go away.

    4. Re:XML please by Teancum · · Score: 4, Informative

      Michael Hart has repeatedly made mention that he does not want to get caught up into the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

      With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:

      The HTML Writers Guild - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. Currently they are moving to a version of XML with some standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on this website.

      Project Gutenberg XML - This is a group more dedicated to XML, but it has a very similar purpose.

      The point here is that once the data is put into ASCII text format, projects like this can and are being done. If you really feel that you want to help with the effort, please join one of these. Also, at any time you can also take the Project Gutenberg files yourself and do this, but at least this gives you a forum to share your work once you are done.

    5. Re:XML please by belbo · · Score: 3, Informative

      The final ASCII version is also produced by hand. After two rounds of proofing, the text gets into a queue. From that queue, a 'post-processor' checks it out and reformats it according to the Gutenberg guidelines, along with any error corrections that might still be necessary. Then she or he uploads the final version to Project Gutenberg, where the 'whitewashers' check the text yet again before posting it to the archive.

      About the XML: You are in fact welcome to produce an XML version, I believe some fellows at DP indeed do that already. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.

      belbo, post-processor at DP

      (Boy I do hope there are no spelling errors in this *g*)

      --
      "Just believe everything I tell you, and it will all be very, very simple."

    6. Re:XML please by fm6 · · Score: 4, Insightful
      The whole point of ASCII is that it can be accessed simply, by almost any machine.
      Just because you store something in XML doesn't mean people have to use XML to read it. The whole point of XML is to have a format that you can easily transform. Transforming into ASCII is particularly easy.
      XML markup that can be easily translated into LaTeX
      If it's a good content-oriented XML app, it's easily transformed into LaTeX, or anything else. If it isn't a good content-oriented XML app (the StarOffice native format comes to mind) then it shouldn't be used for an online document repository.

      I think the basic problem with the Gutenberg/DP people is that they've been doing things a certain way for so long, and they don't want to retool. And I can see their point -- changing over to XML is a lot of work. And the core DP team already seems pretty busy keeping the web site going.

      On the other hand, I do wish they'd make it a priority. Right now I'm a volunteer proofreader, concentrating on getting out the famous Britannica 11th edition. The amount of information that gets lost in scanning in Greek and other text with weird phonological conventions is just appalling. And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.

      Then again, it wouldn't be that hard to go back and insert proper markup. For 90% of the text there's a simple transform between the Gutenberg conventions and a reasonable XML format. The other 10% probably need another look anyway, and wouldn't be hard to do if they've saved the scan images. I haven't had the heart to ask if they do.

    7. Re:XML please by DrXym · · Score: 3, Insightful
      Yeah but the entire point of XML is that it defines structure not presentation. If you want to go off and produce something which is readable in some other format (e.g. text), feed the document through some XSL transformation or perl script and it pops out the other end in any way you desire. Someone else can feed it through something that produces a PDF, someone else a Palm e-Book, someone else braille. And this can all be automated on the server. Everyone is happy.


      As for XML being long dead, this is highly unlikely. XML is just structured data and is itself just text. It would be trivial 5, 10, or even 100 years from now to pull out the data from the xml format in any way you please. Unless the grammar is horribly mangled (MS Office), it would even be possible to infer it without even knowing the grammar. I would trust Gutenberg to collectively come up with a format which would be simple for proof readers and parsers alike.
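
      A rough sketch of both points, in Python with only the standard library. The element names (book, title, chapter, heading, p, emph) are invented for this example and are not an actual Project Gutenberg schema. The first function renders plain text from the structured source; the second pulls out every piece of text without knowing the grammar at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element names are assumptions made for the sketch.
SOURCE = """\
<book>
  <title>A Sample Book</title>
  <chapter>
    <heading>Chapter 1</heading>
    <p>It was a <emph>very</emph> dark night.</p>
  </chapter>
</book>
"""


def to_plain_text(xml_source):
    """Render the structured source as plain text, blocks separated by blank lines."""
    root = ET.fromstring(xml_source)
    blocks = [root.findtext("title", default="").upper()]
    for chapter in root.iter("chapter"):
        blocks.append(chapter.findtext("heading", default=""))
        for para in chapter.iter("p"):
            # itertext() walks nested elements, so inline markup is flattened.
            blocks.append("".join(para.itertext()))
    return "\n\n".join(blocks) + "\n"


def all_text(xml_source):
    """Collect every non-blank text chunk without relying on any element names."""
    root = ET.fromstring(xml_source)
    return [chunk.strip() for chunk in root.itertext() if chunk.strip()]


if __name__ == "__main__":
    print(to_plain_text(SOURCE))
    print(all_text(SOURCE))
```

      The same source could feed a second function that emits HTML or LaTeX instead; only the rendering step changes, not the stored text.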

    8. Re:XML please by fm6 · · Score: 4, Insightful

      ... that plain old ASCII is one constant that hasn't needed changing.

      I think you're a little unclear as to what ASCII is. As the "A" in "ASCII" indicates, it's oriented towards American applications. And it consists of a mere 128 characters, which includes 32 control characters that you don't use in text.

      In point of fact, Project Gutenberg has long outgrown the 96 graphic characters in ASCII, though I think they themselves are ignorant of the fact. They seem to have experimented with characters until they found a set that displays the same on "normal" Windows, Macs and Unix/Linux. The result is something they call "extended ASCII" but that's actually a subset of both ISO's Latin1 character set and Microsoft's Latin1 code page.

      When is this an issue? Well, I'm a DP volunteer, and I'm concentrating on the Britannica 11th edition. Lots of geographic entries, all of which contain degree symbols. This symbol is not in ASCII! If you follow the DP instructions, you end up entering byte 186 (decimal). If you're using the ISO or Microsoft Latin1 set (and if your computer is localized for the U.S., Canada, or Western Europe, you probably are) then 186 does in fact display as a degree symbol. But if your system is localized for Eastern Europe, you're probably using Latin2, and this byte stands for an S with a cedilla accent!

      In short, "ASCII" is actually less universal than well-formed HTML, in which you represent the degree symbol with a character entity (&deg;) that's the same everywhere.
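
      A small Python illustration of the code page problem, using only standard-library codecs. It decodes the same byte under Latin1 and Latin2 and compares that with the HTML entity. (Strictly speaking, Latin1 assigns byte 186 to the masculine ordinal indicator, which most fonts draw almost like a degree sign; the variable names are just for this sketch.)

```python
import html

RAW_BYTE = bytes([186])  # the byte value discussed above

# The same byte read under two different 8-bit code pages.
as_latin1 = RAW_BYTE.decode("iso8859_1")  # U+00BA, a small raised "o"
as_latin2 = RAW_BYTE.decode("iso8859_2")  # U+015F, s with cedilla

# An HTML entity names the character itself, independent of any code page.
from_entity = html.unescape("&deg;")      # U+00B0, DEGREE SIGN

if __name__ == "__main__":
    for label, char in [("Latin1", as_latin1), ("Latin2", as_latin2), ("entity", from_entity)]:
        print(f"{label}: {char!r} (U+{ord(char):04X})")
```

      The byte only means something once you know which table to look it up in, whereas the entity carries its meaning with it.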

      Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine.

      Hardly a representative example. The Declaration of Independence was hand-written, and thus doesn't include a lot of fancy fonts or formatting. A better example is a contemporary novel, such as 1984.

      As it happens I just finished re-reading this one. I read a Plucker file that somebody had transformed from an HTML version, which in turn came from the Project Gutenberg "ASCII" version. Readable enough. But all the typographic niceties -- italics, boldface, etc. -- were reduced to ALL CAPS in the text version, and that was retained in the HTML version. Pretty distracting -- made me feel like somebody was shouting at me. Double Plus Ungood! Thoughtcrime!

      ...once the data is put into ASCII text format, projects like this [XML] can and are being done.

      You make it sound easy. A lot of information is lost when your primary version is "ASCII". It all has to be put back by hand. There's no avoiding this for the large body of existing Gutenberg texts. And of course as recently as 5 years ago, there wasn't a real choice anyway. Even HTML had issues, and serious XML tools didn't exist.

      But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML. If people still want "ASCII" copies, the XML is easily transformed into that. Though I think a lot more people will want the HTML version -- a format which is actually accessible to more people than "ASCII".

      There are two reasons this won't happen soon.

      The first is that somebody will have to design and implement the necessary XML apps for inputting and proofreading the texts. (Which would also eliminate a lot of the errors proofreaders make, like entering [Greek: Tau] when they mean [Greek: T].) A huge project. As it stands, the people who maintain the DP web site have their work cut out just to keep the existing software working. That's a vali

  10. Oh, who reads books anymore anyway? by Faust7 · · Score: 4, Funny

    I absorb all information directly through a USB link from my laptop to my head. Pretty nice, except for the typographical migraines. I always have ibuprofen in hand when visiting Slashdot.

  11. it's all lost and stoof by shadowbearer · · Score: 3, Funny

    I like what happens when you run across a title which isn't on the site.

    Example: "It's not there, eh? -- Canadian"

    Heh.

    SB

    --
    It's old. The more humans I meet, the more I like my cats. At least they are honest.
  12. Too bad... by Insurgent2 · · Score: 5, Interesting

    Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there get to sit collecting dust instead of benefitting mankind.

  13. Re:Really great work by the guys behind the projec by Cthefuture · · Score: 3, Insightful

    Yes, they need something like that badly.

    I remember poking around on PG not long ago but soon forgot about it.

    If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.

    And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.

    --
    The ratio of people to cake is too big
  14. Greenstone by gmaestro · · Score: 4, Interesting

    Great to see a project like this run on Free software. Read more at Greenstone's website.

  15. We should all actually read this by tie_guy_matt · · Score: 4, Insightful

    Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.

    In this day and age when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.

    A red white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.

    So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

  16. Thanks for support, plans for future by gbnewby · · Score: 5, Informative
    Thanks to everyone who has helped contribute eBooks and other support to Project Gutenberg! If you haven't already, please visit Distributed Proofreaders and proof a page today!

    Lots of plans for the future:

    • Post-#10000 formatting changes. We'll be rearranging our directories to make it easier to find things. Likely we'll go with something OAI (OpenArchives.org) compliant
    • Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.
    • New ways to donate. "Sponsor a book"
    • More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (i.e., our Doctorow's Down and Out)
    • Your ideas! Visit gutenberg.net to sign up for newsletters, find out how to get started producing an eBook, and find eBooks


    Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.



    Dr. Gregory B. Newby

    Chief Executive and Director

    Project Gutenberg Literary Archive Foundation
    http://gutenberg.net

    A 501(c)(3) not-for-profit organization with EIN 64-6221541

    gbnewby@pglaf.org

  17. First sale doctrine by yerricde · · Score: 3, Informative

    The law specifically says you can not distribute a work that is copyrighted without the copyright holders permission.

    True, 17 USC 106 says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109:

    Notwithstanding the provisions of section 106(3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

    fair use laws, but the DMCA removed most of those

    From the DMCA: "Nothing in this section shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."

    --
    Will I retire or break 10K?
  18. Cheaper, but useful? by yerricde · · Score: 3, Insightful

    A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.

    It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.

    they would realize that it would be cheaper in the longrun to get texts off Gutenberg, instead of buying pre-bound books elsewhere.

    Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?

    --
    Will I retire or break 10K?
  19. Ah, that explains the "Midi-Sum and Nite Dream" by Pac · · Score: 3, Funny

    It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.

  20. How to spread the word... by evilviper · · Score: 4, Insightful

    Here's what I did...

    A while back, I used wget to mirror the entire Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other more efficient way to do things)

    Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support I also included the free archiver, Ultimate Zip on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions what to do.
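
    For anyone who wants to repeat the compression step without the command-line tool, here is a rough Python equivalent using the standard bz2 module. The function name is made up for this sketch, and unlike the bzip2 command's default behaviour it leaves the original files in place.

```python
import bz2
from pathlib import Path


def compress_text_files(directory):
    """Write a .bz2 copy next to every .txt file under directory; return the new paths."""
    written = []
    for source in sorted(Path(directory).rglob("*.txt")):
        target = source.with_name(source.name + ".bz2")
        # compresslevel=9 corresponds to the -9 (largest block size) setting.
        target.write_bytes(bz2.compress(source.read_bytes(), compresslevel=9))
        written.append(target)
    return written


if __name__ == "__main__":
    # "." is only a placeholder; point this at a local mirror directory.
    for path in compress_text_files("."):
        print(path)
```

    Reading each file fully into memory is fine for individual e-texts; a streaming approach would be the better choice for very large files.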

    One of the great things about CDs is how easy they are to transfer... One stamp, and a 5cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).

    Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.

    Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...

    The second reason is that monitors are all backlit... That means, reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is my 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page) that is small, light, silent, compatible with everything, and most importantly, it needs to have good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant