Project Gutenberg Publishes 10,000th Free eBook

Posted by CmdrTaco on Thursday October 16, 2003 @08:45AM from the worthwhile-endeavours dept.

AndrewRUK writes "Earlier today, Project Gutenberg's founder, Michael Hart, announced that the project has passed the milestone of 10,000 free eBooks available, with the publication of the Magna Carta. Project Gutenberg was founded in 1971, with the aim of "[making] information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search." In the 32 years since the project started, over 10,000 books, ranging from the Bible to school textbooks, and from the complete works of Shakespeare to the USA's Declaration of Independence, have been made freely available to the public by Project Gutenberg."

4 of 281 comments

Can't be said enough... by daeley · 2003-10-16 08:48 · Score: 1, Redundant

Come join the proofreaders that make Project Gutenberg possible!

--
I watched C-beams glitter in the dark near the Tannhauser gate.

Congratulations! by apsmith · 2003-10-16 08:49 · Score: 1, Redundant

Now why was my story on this rejected earlier today? Oh well...

Go to Distributed Proofreaders if you'd like to help out!

--
Energy: time to change the picture.

Come Help Out by dave2112 · 2003-10-16 08:49 · Score: 0, Redundant

You can help us with the next 10,000. Join the Distributed Proofreaders Project at http://www.pgdp.net/c/default.php

plain ASCII makes no sense by CoughDropAddict · 2003-10-16 12:46 · Score: 0, Redundant

As with the last story about Project Gutenberg, I have the same comment. I love the philosophy of Project Gutenberg, but the fact that they continue to use plain text as the canonical format makes the collection seriously less useful. Using XML would give only advantages.

What advantages? Advantages like indicating what words are actually part of a title, so that a reader could display titles in large print and provide a table of contents. Advantages like having real bold, italic, and underline. Advantages like being able to handle characters not in ASCII. Advantages like allowing a reader to break lines however makes most sense for that situation (for example, handhelds are going to have shorter lines than a large monitor). The list goes on.
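To make the title and line-breaking points concrete, here is a minimal sketch in Python. The markup is an assumption for illustration: the element names `book`, `chapter`, `title` and `p` are hypothetical, not a schema Project Gutenberg publishes, and the helper names `table_of_contents` and `render` are made up for this example.

```python
import textwrap
import xml.etree.ElementTree as ET

# Hypothetical markup: <book>, <chapter>, <title> and <p> are illustrative
# names, not a schema that Project Gutenberg publishes.
SAMPLE = """
<book>
  <chapter>
    <title>The First Chapter</title>
    <p>This paragraph is stored without hard line breaks, so a reader is free to wrap it at any width.</p>
  </chapter>
  <chapter>
    <title>The Second Chapter</title>
    <p>A handheld might wrap at thirty columns and a large monitor at eighty.</p>
  </chapter>
</book>
"""


def table_of_contents(root):
    """Collect chapter titles; possible only because titles are marked as such."""
    return [title.text for title in root.iter("title")]


def render(root, width):
    """Render the book as text wrapped to the given column width."""
    lines = []
    for chapter in root.iter("chapter"):
        # The reader decides how a title looks; here it is simply upper-cased.
        lines.append(chapter.findtext("title", default="").upper())
        lines.append("")
        for para in chapter.iter("p"):
            # Normalise whitespace, then break lines to suit the display.
            text = " ".join((para.text or "").split())
            lines.extend(textwrap.wrap(text, width=width))
            lines.append("")
    return "\n".join(lines)


root = ET.fromstring(SAMPLE)
```

Calling `render(root, 30)` and `render(root, 80)` on the same source gives a handheld-width and a monitor-width layout, and `table_of_contents(root)` lists the titles without guessing which lines are headings.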

Their argument for continuing to favor ASCII is to support the widest possible usability, now and in the future, since markup languages can come and go. This doesn't stand up to scrutiny, though, for the simple reason that XML contains strictly more information than plain text. XML can be flawlessly converted to plain text by a program, but the opposite is not true: plain text cannot be converted to XML. Was that line break the end of a stanza, or simply a line of a paragraph? Is that single line in all-caps a title or is it a paragraph of shouting? This information simply cannot be extracted from plain text. Not to mention the problem of characters that aren't in ASCII.
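The one-way nature of that conversion can be sketched in a few lines of Python. This is only an illustration under assumed markup: `poem`, `title`, `stanza` and `line` are hypothetical element names, and `to_plain_text` is a made-up helper, not a tool the project provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the element names are assumptions for illustration.
MARKED_UP = (
    "<poem>"
    "<title>A TITLE</title>"
    "<stanza><line>First line</line><line>Second line</line></stanza>"
    "<stanza><line>Third line</line></stanza>"
    "</poem>"
)


def to_plain_text(xml_source):
    """Flatten the markup: title first, then stanzas separated by blank lines."""
    root = ET.fromstring(xml_source)
    blocks = [root.findtext("title", default="")]
    for stanza in root.iter("stanza"):
        # Each <line> becomes one text line; the stanza boundary becomes a blank line.
        blocks.append("\n".join(line.text or "" for line in stanza.iter("line")))
    return "\n\n".join(blocks)


plain = to_plain_text(MARKED_UP)
```

Going from `MARKED_UP` to `plain` is mechanical. Going back is not: nothing in `plain` records that "A TITLE" was a title rather than a one-line stanza in capitals, so a program working from the text alone would have to guess.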

Suppose that XML is just a fad, that it's a horrible joke being perpetrated by hordes of clueless professionals who love buzzwords. Suppose no one uses XML in 10 years. Even if this is true, XML is still a better choice than plain text, because XML has enough information to automatically convert the books into whatever superior format emerges in the future. Plain text does not.