Project Gutenberg's 32nd Birthday

Must...avoid...Steve...Gutenberg...joke... by mikeophile · 2003-07-04 06:05 · Score: 4, Funny

Seriously, awesome work people.

You can't be serious by ryants · 2003-07-04 06:08 · Score: 5, Funny

even help out by visiting the Distributed Proofers and editing one page per day.

You can't seriously be asking Slashdotters to volunteer as proofreaders.

--

Ryan T. Sammartino
"Ancora imparo"

Re:You can't be serious by BabyDave · 2003-07-04 06:12 · Score: 4, Funny

Could be worse - they could be asking the Slashdot editors!
Re:You can't be serious by thinkninja · 2003-07-04 06:16 · Score: 4, Funny

It was the best of times, it was the worst of times...
-1 Redundant

--
"The number of Unix installations has grown to ten, with more expected." (Unix Programmer's Manual, 2nd ed.; June 1972)
Re:You can't be serious by Aldarondo · 2003-07-04 06:35 · Score: 5, Interesting

As someone who has been involved with Distributed Proofreaders for the past 18 months: yes, we are serious about having Slashdot people proofread. The last time a story about D.P. ran, in November, thousands of new users joined us and helped us grow to our current size.
Go and check it out; there is great work being done there. (I am a bit biased, though.) Click here for a history of DP.
Re:You can't be serious by claudius0425 · 2003-07-04 06:38 · Score: 2, Funny

Hey yo, some us /. homies aint got no grammer problems.

--
Phus. Sysiphus.
Re:You can't be serious by MadCow42 · 2003-07-04 07:03 · Score: 2, Funny

Well, it wouldn't be THAT bad, we'd just have 5 different versions of each book, each released about a day apart.

MadCow.

--
I used to have a sig, but I set it free and it never came back.
Re:You can't be serious by JDWTopGuy · 2003-07-04 13:25 · Score: 2, Funny

How about a project to translate the books to 1337 speak?

20,000 1346u35 und3r 7h3 534
h4x0r3d by Ju135 V3rn

7h3y w4s 0wn3d by 7h3 c4p74!n

Or even worse, the Bible in 1337: (7h3 n3w h4x0r v3r510n?)

7h0u 5h417 n0t k!11, d00d.

--
Ron Paul 2012
Re:You can't be serious by croddy · 2003-07-05 02:34 · Score: 2, Interesting

well, that was fun. I think it would be more addictive if I got to do pages in order though...

Now for the marketing... by Blaine+Hilton · 2003-07-04 06:08 · Score: 4, Insightful

Now all we need is more people promoting this in schools and printing the books. Much like the IA Bookmobile. It seems like the people who could use this the most don't even know it exists.

Re:Now for the marketing... by AndroidCat · 2003-07-04 07:18 · Score: 2, Insightful

and speedy internet connection
The first Gutenberg books I came across were being passed around BBSs at 2400 bps or so. When they started 32 years ago, it was 110, maybe 300 bps. Who cares? Check the size of the files; these aren't Word documents, you know.
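A back-of-the-envelope check of that point. This sketch assumes a hypothetical 500 KB plain-text book and about 10 bits on the wire per byte (8 data bits plus start and stop bits on a typical async serial link), and it ignores compression and protocol overhead:

```python
def transfer_seconds(size_bytes: int, bps: int, bits_per_byte: int = 10) -> float:
    """Approximate transfer time over an async serial line.

    Assumes 10 bits per byte (8 data + 1 start + 1 stop) and no
    compression or protocol overhead; both are simplifications.
    """
    return size_bytes * bits_per_byte / bps


BOOK_BYTES = 500_000  # hypothetical size of a plain-text novel

for bps in (110, 300, 2400):
    minutes = transfer_seconds(BOOK_BYTES, bps) / 60
    print(f"{bps:>5} bps: {minutes:6.1f} minutes")
```

Even at 2400 bps the whole book arrives in well under an hour, which is the poster's point about plain text being small.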

--
One line blog. I hear that they're called Twitters now.

very timely for me by b17bmbr · 2003-07-04 06:16 · Score: 5, Interesting

I am going to be teaching modern civ next year in high school (I have been at the junior high for 7 years), and have already gone to the site and gotten works from Aristotle, Plato, Locke, Montesquieu, et al. Thanks, guys. There is still something to be said for a classical education. Glad somebody is doing all they can to preserve the classics, especially with all the assaults on it from the social reconstructionists.

--
My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.

founding fathers by Tablizer · 2003-07-04 06:16 · Score: 4, Funny

...first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web

I knew it! This country was founded by COBOLers.

--
Table-ized A.I.

Re:founding fathers by Anonymous Coward · 2003-07-04 06:21 · Score: 3, Funny

ADD 1 TO POST-POINTS.
MOVE "Funny" TO POST-STATUS.

(That's Cobol, for those who don't know)

Really great work by the guys behind the project! by jaemark · 2003-07-04 06:17 · Score: 5, Interesting

There's a real problem, though, with getting the word out to people, much as library use today has been dropping. A good idea would be a separate advocacy site that comes up with lists of texts in the project (e.g. What's New?, Most Popular, etc.) to help people wade in immediately.

More free books by Cruciform · 2003-07-04 06:18 · Score: 5, Informative

The Baen Free Library has a number of titles available in several formats.

It's a great way to introduce readers to a series or a talented new author.

'reader' books not much cheaper by Chmarr · 2003-07-04 06:19 · Score: 3, Insightful

Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than physical books.

I went to the MS Reader site and followed the links to the online publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (softcover or hardcover).

So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.

Re:'reader' books not much cheaper by Jonathan · 2003-07-04 06:24 · Score: 2, Interesting

So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend

Well, I don't use MS-Reader myself (For commercial e-books I like the cross-platform Mobipocket), but a major reason I like e-books is I like to read them on my PDA -- not to save money. I carry my PDA around anyway, and having e-books means less to carry. I would purchase all my books as e-books if they were available as such.
Re:'reader' books not much cheaper by Joe+Tie. · 2003-07-04 06:46 · Score: 3, Interesting

Someone else mentioned the fact that he's got a reader with him all the time anyway, which makes it pretty convenient to have a book or three in there. I'm not going to bring a book around with me everywhere I go just on the off chance that I might get stuck in a long line, or waiting for someone. But when such an event happens, having good reading material right at hand is very nice. Also nice is being able to have a selection of books in there at any one time, just in case I finish one book while waiting somewhere.

Battery life isn't much of an issue for me. I've got an older iPAQ, and even with that I can usually squeeze about ten hours out of it with the addition of an extra battery pack that's small enough to tote around with the PDA. Hooking it up isn't much of an issue: take it out of my pocket, plug it into the PDA. And at home, power isn't an issue at all.

--
Everything will be taken away from you.

Huh??? by lilricky · 2003-07-04 06:20 · Score: 2, Insightful

"...to anyone and everyone then on what later became the web..." What?? Was HTTP around in 1971? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for it. Perhaps I'm misreading the article.

Re:Huh??? by dissy · 2003-07-04 07:22 · Score: 2, Insightful

> "...to anyone and everyone then on what later became the web..." What??

I think they are saying in 1971 it was distributed to anyone and everyone...
Then, on what later became the web, they distributed it there too.

Keep in mind the web ripped most of its ideas from Gopher, and FTP before that, so the web wasn't a breakthrough idea out of nothingness.
But I don't think they meant it as 'distributed on one medium, which later turned into the web'.

That's at least how I believe it was supposed to be read... Hard to tell without commas and whatnot ;}

XML please by DrXym · 2003-07-04 06:23 · Score: 3, Insightful

Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.

Re:XML please by starseeker · 2003-07-04 06:37 · Score: 4, Informative

I think they discuss this somewhere. The whole point of ASCII is that it can be accessed simply, by almost any machine. It is as stable a format as you will find for data storage, anywhere. They are committed to these books being widely readable, and ASCII is the best way to assure this.

However, I agree that some books (most actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading - you have to consider formatting and scanning images as well.

As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have PDFs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large-scale one even by Gutenberg standards. Worthwhile, yes. But involved.
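What an XML-to-LaTeX step might look like, as a minimal sketch. The element names (book, title, chapter, head, p, emph) are made up for the example and do not come from any published Gutenberg schema. The code walks the tree with Python's standard ElementTree and escapes LaTeX's special characters:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; element names are illustrative only.
SAMPLE = """<book>
  <title>A Tale of Two Cities</title>
  <chapter>
    <head>The Period</head>
    <p>It was the best of times, it was the <emph>worst</emph> of times.</p>
  </chapter>
</book>"""

LATEX_SPECIALS = {"\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
                  "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
                  "~": r"\textasciitilde{}", "^": r"\textasciicircum{}"}


def escape(text):
    """Escape characters that LaTeX would otherwise interpret."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in (text or ""))


def inline(elem):
    """Render mixed content: element text, <emph> children, and their tails."""
    out = [escape(elem.text)]
    for child in elem:
        body = inline(child)
        out.append(r"\emph{" + body + "}" if child.tag == "emph" else body)
        out.append(escape(child.tail))
    return "".join(out)


def to_latex(root):
    lines = [r"\documentclass{book}",
             r"\title{" + inline(root.find("title")) + "}",
             r"\begin{document}", r"\maketitle", r"\tableofcontents"]
    for chapter in root.iter("chapter"):
        lines.append(r"\chapter{" + inline(chapter.find("head")) + "}")
        for p in chapter.iter("p"):
            lines.append(inline(p) + "\n")  # blank line ends the paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines)


print(to_latex(ET.fromstring(SAMPLE)))
```

The title page and table of contents come from LaTeX's own `\maketitle` and `\tableofcontents`, so the markup only has to say what is a title and what is a chapter heading.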

--
"I object to doing things that computers can do." -- Olin Shivers, lispers.org
Re:XML please by DarkOx · 2003-07-04 06:45 · Score: 3, Informative

The entire point of the project is to preserve the content in a format that is both human- and machine-readable. See, if I don't have any software from the present here in fifteen years and XML is long dead, I will still be able to read standard ASCII text, even if I am just cat(ing) it through less or printing it as is. I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful. I am not saying that it would be hard to write such software, but the concept is to make sure it's easy, and always easy, to get the data. Also, they do put chapter breaks in as text, so if you want to find one, most word processors and e-book readers these days, even the fifteen-year-old ones, can find text strings.
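The "chapter breaks are just searchable text" argument, as a small sketch. It assumes headings of the form `CHAPTER <numeral>` on a line of their own. That is a common layout, but the convention varies from text to text:

```python
import re

# Sample laid out the way many plain-text etexts are (an assumption; older files vary).
TEXT = """CHAPTER I

It is a truth universally acknowledged...

CHAPTER II

Mr. Bennet was among the earliest of those who waited on Mr. Bingley.
"""

# A line consisting only of "CHAPTER" plus a Roman or Arabic numeral, optional period.
HEADING = re.compile(r"^CHAPTER [IVXLC\d]+\.?$", re.MULTILINE)


def chapter_offsets(text):
    """Return (heading, character offset) for each chapter heading found."""
    return [(m.group(0), m.start()) for m in HEADING.finditer(text)]


for heading, offset in chapter_offsets(TEXT):
    print(offset, heading)
```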

--
Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
Re:XML please by Eloquence · 2003-07-04 06:50 · Score: 4, Insightful

I can't reasonably read a book that is filled with XML tags and if there is no longer software to parse them then it's not too useful.
This is complete bullshit. With a proper setup you would convert the source into multiple output formats, including TXT, but you would keep the source in a format that maintains meta information such as formatting, chapters and pages. XML is used in the entire industry exactly with the expectation that it will be around for decades. Even if it won't, the open source code that we have to parse it will not magically disappear -- PG would keep using it to generate output texts from the XML source through all these years. You might as well argue that ASCII will go away.
Re:XML please by GigsVT · 2003-07-04 06:55 · Score: 2, Insightful

With a proper setup you could read MS Word 2000 docs 100 years from now too. The whole point is to not make it reliant on any particular software, or any particular fad.

XML hasn't been around long enough to say whether it is a fad or not. ASCII has been around longer than most of us have existed.

--
I've had enough abrasive sigs. Kittens are cute and fuzzy.
Re:XML please by Teancum · 2003-07-04 06:56 · Score: 4, Informative

Michael Hart has repeatedly made mention that he does not want to get caught up into the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:

The HTML Writers Guild - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. They are now moving to a version of XML with some standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on this website.

Project Gutenberg XML - This is a group more dedicated to XML, but with a very similar purpose.

The point here is that once the data is put into ASCII text format, projects like this can be, and are being, done. If you really want to help with the effort, please join one of these. You can also take the Project Gutenberg files yourself at any time and do this, but at least these projects give you a forum to share your work once you are done.
Re:XML please by belbo · 2003-07-04 07:01 · Score: 3, Informative

The final ASCII version is also produced by hand. After two rounds of proofing, the text gets into a queue. From that queue, a 'post-processor' checks it out and reformats it according to the Gutenberg guidelines, along with any error corrections that might still be necessary. Then she or he uploads the final version to Project Gutenberg, where the 'whitewashers' check the text yet again before posting it to the archive.

About the XML: You are in fact welcome to produce an XML version, I believe some fellows at DP indeed do that already. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.

belbo, post-processor at DP

(Boy I do hope there are no spelling errors in this *g*)

--
"Just believe everything I tell you, and it will all be very, very simple."
Re:XML please by fm6 · 2003-07-04 07:02 · Score: 4, Insightful

The whole point of ASCII is that it can be accessed simply, by almost any machine.
Just because you store something in XML doesn't mean people have to use XML to read it. The whole point of XML is to have a format that you can easily transform. Transforming into ASCII is particularly easy.
XML markup that can be easily translated into LaTeX
If it's a good content-oriented XML app, it's easily transformed into LaTeX, or anything else. If it isn't a good content-oriented XML app (the StarOffice native format comes to mind) then it shouldn't be used for an online document repository.
I think the basic problem with the Gutenberg/DP people is that they've been doing things a certain way for so long that they don't want to retool. And I can see their point -- changing over to XML is a lot of work. And the core DP team already seems pretty busy keeping the web site going.
On the other hand, I do wish they'd make it a priority. Right now I'm a volunteer proofreader, concentrating on getting out the famous Britannica 11th edition. The amount of information that gets lost in scanning in Greek and other text with weird phonological conventions is just appalling. And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.
Then again, it wouldn't be that hard to go back and insert proper markup. For 90% of the text there's a simple transform between the Gutenberg conventions and a reasonable XML format. The other 10% probably need another look anyway, and wouldn't be hard to do if they've saved the scan images. I haven't had the heart to ask if they do.
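A sketch of the "simple transform" for two plain-text conventions. It assumes italics marked as `_like this_` and transliterated Greek in the `[Greek: ...]` brackets mentioned in this thread. The target element names (`emph`, `foreign`) and the `lang` value are illustrative choices, not an agreed schema:

```python
import re
from xml.sax.saxutils import escape

ITALIC = re.compile(r"_([^_\n]+)_")         # _text_ on a single line
GREEK = re.compile(r"\[Greek: ([^\]]+)\]")  # [Greek: transliteration]


def line_to_xml(line: str) -> str:
    """Escape &, <, > first, then turn the text conventions into markup."""
    xml = escape(line)  # escaping first keeps the tags we add from being escaped
    xml = GREEK.sub(r'<foreign lang="grc">\1</foreign>', xml)
    return ITALIC.sub(r"<emph>\1</emph>", xml)


print(line_to_xml("He read _The Times_ & quoted [Greek: logos]."))
```

Underscores that don't pair up are left alone. Those, like the mistyped Greek tags, belong in the "other 10%" that needs a human look.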
Re:XML please by Vann_v2 · 2003-07-04 07:07 · Score: 2, Insightful

With some works the layout itself is an important part of comprehending them. To blindly remove the formatting so that everyone can read it is an injustice to the original author.
Re:XML please by DrXym · 2003-07-04 07:10 · Score: 3, Insightful

Yeah but the entire point of XML is that it defines structure not presentation. If you want to go off and produce something which is readable in some other format (e.g. text), feed the document through some XSL transformation or perl script and it pops out the other end in any way you desire. Someone else can feed it through something that produces a PDF, someone else a Palm e-Book, someone else braille. And this can all be automated on the server. Everyone is happy.
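The "one structured source, many outputs" idea, sketched with Python's standard ElementTree in place of the XSL or Perl step mentioned above. The markup is hypothetical. Plain text uses the underscore-for-italics convention common in plain-text etexts, and HTML uses `<em>`:

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical structural markup: elements say what things are, not how they look.
SOURCE = """<chapter>
  <head>Loomings</head>
  <p>Call me <emph>Ishmael</emph>.</p>
</chapter>"""


def render(elem, emph_open, emph_close, escape):
    """Flatten mixed content, wrapping <emph> in format-specific delimiters."""
    out = [escape(elem.text or "")]
    for child in elem:
        inner = render(child, emph_open, emph_close, escape)
        out.append(emph_open + inner + emph_close if child.tag == "emph" else inner)
        out.append(escape(child.tail or ""))
    return "".join(out)


def to_text(chapter):
    # Plain text: no escaping needed (str is the identity here), italics as _..._
    head = render(chapter.find("head"), "_", "_", str).upper()
    paras = [render(p, "_", "_", str) for p in chapter.findall("p")]
    return "\n\n".join([head] + paras)


def to_html(chapter):
    head = render(chapter.find("head"), "<em>", "</em>", html.escape)
    paras = [f"<p>{render(p, '<em>', '</em>', html.escape)}</p>"
             for p in chapter.findall("p")]
    return "\n".join([f"<h2>{head}</h2>"] + paras)


chapter = ET.fromstring(SOURCE)
print(to_text(chapter))
print(to_html(chapter))
```

Adding another output format means writing one more small renderer. The source stays the same.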

As for XML being long dead, this is highly unlikely. XML is just structured data and is itself just text. It would be trivial 5, 10, or even 100 years from now to pull out the data from the xml format in any way you please. Unless the grammar is horribly mangled (MS Office), it would even be possible to infer it without even knowing the grammar. I would trust Gutenberg to collectively come up with a format which would be simple for proof readers and parsers alike.
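The claim that the grammar could be inferred without documentation can be tried directly. This sketch tallies element names and parent/child pairs in a document whose vocabulary is invented for the example. A reader who had never seen the schema could reconstruct its outline from that tally:

```python
from collections import Counter
import xml.etree.ElementTree as ET

# A document in a vocabulary we pretend not to know (element names are made up).
MYSTERY = """<work><front><docTitle>Walden</docTitle></front>
<body><div><head>Economy</head><p>When I wrote the following pages...</p>
<p>I should not obtrude my affairs...</p></div></body></work>"""


def infer_structure(xml_text):
    """Count element names and which parent/child pairs occur."""
    root = ET.fromstring(xml_text)
    tags = Counter(el.tag for el in root.iter())
    edges = Counter((parent.tag, child.tag)
                    for parent in root.iter() for child in parent)
    return tags, edges


tags, edges = infer_structure(MYSTERY)
print(tags.most_common())
for (parent, child), n in sorted(edges.items()):
    print(f"{parent} > {child} x{n}")
```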
Re:XML please by fm6 · 2003-07-04 08:59 · Score: 4, Insightful

... that plain old ASCII is one constant that hasn't needed changing.

I think you're a little unclear as to what ASCII is. As the "A" in "ASCII" indicates, it's oriented towards American applications. And it consists of a mere 128 characters, 33 of which are control characters that you don't use in text.
In point of fact, Project Gutenberg has long outgrown the 95 printable characters in ASCII, though I think they themselves are ignorant of the fact. They seem to have experimented with characters until they found a set that displays the same on "normal" Windows, Macs and Unix/Linux. The result is something they call "extended ASCII", but that's actually a subset of both ISO's Latin1 character set and Microsoft's Latin1 code page.
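A quick count of the 7-bit table backs up the size argument. The code uses Python's `str.isprintable`, which treats space as printable and the C0 controls plus DEL as not printable:

```python
codes = range(128)  # ASCII is a 7-bit code: values 0..127

printable = [c for c in codes if chr(c).isprintable()]    # space through '~'
control = [c for c in codes if not chr(c).isprintable()]  # 0-31 and DEL (127)

print(len(codes), len(printable), len(control))  # 128 95 33
print("".join(map(chr, printable)))
print("\u00b0".isascii())  # False: the degree sign is code point 176
```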
When is this an issue? Well, I'm a DP volunteer, and I'm concentrating on the Britannica 11th edition. Lots of geographic entries, all of which contain degree symbols. This symbol is not in ASCII! If you follow the DP instructions, you end up entering byte 186 (decimal). If you're using the ISO or Microsoft Latin1 set (and if your computer is localized for the U.S., Canada, or Western Europe, you probably are) then 186 does in fact display as a degree symbol. But if your system is localized for Eastern Europe, you're probably using Latin2, and this byte stands for an S with a cedilla accent!
In short, "ASCII" is actually less universal than well-formed HTML, in which you represent the degree symbol with a character entity (&deg;) that's the same everywhere.
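The byte-186 problem can be checked against Python's standard codec tables. Byte 186 (0xBA) decodes to U+00BA in Latin-1 and Windows-1252. Strictly, that is the masculine ordinal indicator º, which resembles a degree sign; the true degree sign is U+00B0. In Latin-2 the same byte is ş. The HTML entity resolves to the same character whatever the local code page:

```python
import html

byte = bytes([186])  # the byte DP volunteers were reportedly told to enter

for codec in ("latin-1", "cp1252", "iso8859-2"):
    ch = byte.decode(codec)
    print(f"{codec:10} -> {ch!r} U+{ord(ch):04X}")

# The character entity names the character itself, not a byte in some code page.
degree = html.unescape("&deg;")
print(degree, f"U+{ord(degree):04X}")
```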

Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine.

Hardly a representative example. The Declaration of Independence was hand-written, and thus doesn't include a lot of fancy fonts or formatting. A better example is a contemporary novel, such as 1984.
As it happens I just finished re-reading this one. I read a Plucker file that somebody had transformed from an HTML version, which in turn came from the Project Gutenberg "ASCII" version. Readable enough. But all the typographic niceties -- italics, boldface, etc. -- were reduced to ALL CAPS in the text version, and that was retained in the HTML version. Pretty distracting -- made me feel like somebody was shouting at me. Double Plus Ungood! Thoughtcrime!

...once the data is put into ASCII text format, projects like this [XML] can and are being done.

You make it sound easy. A lot of information is lost when your primary version is "ASCII". It all has to be put back by hand. There's no avoiding this for the large body of existing Gutenberg texts. And of course as recently as 5 years ago, there wasn't a real choice anyway. Even HTML had issues, and serious XML tools didn't exist.
But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML. If people still want "ASCII" copies, the XML is easily transformed into that. Though I suspect a lot more people will want the HTML version -- a format which is actually accessible to more people than "ASCII".
There are two reasons this won't happen soon.
The first is that somebody will have to design and implement the necessary XML apps for inputting and proofreading the texts. (Which would also eliminate a lot of the errors proofreaders make, like entering [Greek: Tau] when they mean [Greek: T].) A huge project. As it stands, the people who maintain the DP web site have their work cut out just to keep the existing software working. That's a vali
Re:XML please by jeremyp · 2003-07-04 09:29 · Score: 2, Insightful

Using ASCII presupposes that all the important texts you want to preserve are in American English. Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.
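The pound-sign point, demonstrated. ASCII has no code for it at all, while Latin-1 stores it as one byte and UTF-8 as two:

```python
text = "The fare was £5."

print(text.isascii())          # False
print(text.encode("latin-1"))  # one byte for the pound sign
print(text.encode("utf-8"))    # two bytes for the pound sign

try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("no ASCII code for", repr(text[err.start]))
```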
Further, authors often use devices like italics or bold to add emphasis to their work and nowadays even completely different fonts and typefaces. Translating these works to ASCII with no markup actually destroys some of the information in the original works.
I'm not an enthusiastic fan of XML - too many people advocate it as a silver bullet - but this sort of thing seems to be an ideal application.

--
All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
Re:XML please by Mr.+Piddle · 2003-07-04 09:31 · Score: 2, Insightful

You might as well argue that ASCII will go away.

ASCII is simply 127 or 255 characters or so. Writing software to translate it is trivial, and it can even be decoded by hand, if necessary.
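"Decoded by hand" really does mean little more than reading binary numbers. The bit strings here are a made-up example, written most significant bit first:

```python
def decode_ascii_bits(bit_string: str) -> str:
    """Turn space-separated binary groups into ASCII text."""
    chars = []
    for group in bit_string.split():
        value = int(group, 2)
        if value > 127:
            raise ValueError(f"{group} is outside the 7-bit ASCII range")
        chars.append(chr(value))
    return "".join(chars)


print(decode_ascii_bits("01000111 01110101 01110100"))  # Gut
```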

XML adds a lot of complexity beyond this, which hampers a person's ability to read a file with practically no software tools.

Also, XML is not as ubiquitous as you think, and huge numbers of people don't know how to use the tools to work with it.

--
Vote in November. You won't regret it.
Re:XML please by gotem · 2003-07-04 11:05 · Score: 2, Funny

I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

punchcards.. what? you mean you don't have your punchcard reader connected?
Re:XML please by dvdeug · 2003-07-04 12:28 · Score: 2, Informative

And the conventions for math and science formulas and equations produces a complex linear format I can't believe anyone would actually want to read.

It's basically TeX, the one true math typesetting system. Most mathematicians and many scientists know it quite well. It beats the heck out of MathML (one example in a MathML tutorial was 8 characters in TeX, and about 50 in MathML.)
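A comparable size gap shows up even on a trivial expression. This is not the tutorial example mentioned above, just x squared plus y squared. The MathML is a bare presentation-MathML fragment without the enclosing `<math>` element:

```python
tex = "x^2+y^2"
mathml = ("<msup><mi>x</mi><mn>2</mn></msup>"
          "<mo>+</mo>"
          "<msup><mi>y</mi><mn>2</mn></msup>")

print(len(tex), len(mathml))  # 7 76
```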

Oh, who reads books anymore anyway? by Faust7 · 2003-07-04 06:23 · Score: 4, Funny

I absorb all information directly through a USB link from my laptop to my head. Pretty nice, except for the typographical migraines. I always have ibuprofen in hand when visiting Slashdot.

--
The coolest voice ever.

it's all lost and stoof by shadowbearer · 2003-07-04 06:24 · Score: 3, Funny

I like what happens when you run across a title which isn't on the site.

Example: "It's not there, eh? -- Canadian"

Heh.

SB

--
It's old. The more humans I meet, the more I like my cats. At least they are honest.

Too bad... by Insurgent2 · 2003-07-04 06:26 · Score: 5, Interesting

Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there, unfortunately, get to sit collecting dust instead of benefitting mankind.

MS Reader is crapola by blair1q · 2003-07-04 06:41 · Score: 2, Interesting

"cannot open this title on a Terminal Services session"

What bollocks. Free software and free books but you can't read them over a network link to your own compute server? Microsoft, as usual, screws the pooch.

Now. How do I uninstall this without removing my adenoids?

Re:Really great work by the guys behind the projec by Cthefuture · 2003-07-04 06:45 · Score: 3, Insightful

Yes, they need something like that badly.

I remember poking around on PG not long ago but soon forgot about it.

If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.

And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.

--
The ratio of people to cake is too big

Greenstone by gmaestro · 2003-07-04 06:45 · Score: 4, Interesting

Great to see a project like this run on Free software. Read more at Greenstone's website.

We should all actually read this by tie_guy_matt · 2003-07-04 07:33 · Score: 4, Insightful

Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.

In this day and age, when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.

A red, white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.

So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

Speaking of XML markup by Moderation+abuser · 2003-07-04 07:40 · Score: 2, Interesting

http://www.conglomerate.org/

Lovely bit of kit.

--
Government of the people, by corporate executives, for corporate profits.

Thanks for support, plans for future by gbnewby · 2003-07-04 07:58 · Score: 5, Informative

Thanks to everyone who has helped contribute eBooks and other support to Project Gutenberg! If you haven't already, please visit Distributed Proofreaders and proof a page today!

Lots of plans for the future:

Post-#10000 formatting changes. We'll be rearranging our directories to make it easier to find things. Likely we'll go with something OAI (OpenArchives.org) compliant
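To illustrate a number-based directory scheme of the kind a post-#10000 rearrangement might use, here is a sketch. The layout is hypothetical and not PG's announced plan. Giving each leading digit of the eBook number its own directory level keeps every individual directory small, however many books there are:

```python
def ebook_dir(number: int) -> str:
    """Map an eBook number to a path with one directory level per leading digit.

    Hypothetical layout: 10001 -> "1/0/0/0/10001", 42 -> "4/42".
    Single-digit numbers go under "0/" (an arbitrary choice).
    """
    if number < 1:
        raise ValueError("eBook numbers start at 1")
    digits = str(number)
    if len(digits) == 1:
        return f"0/{digits}"
    return "/".join([*digits[:-1], digits])


print(ebook_dir(10001))  # 1/0/0/0/10001
```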
Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.
New ways to donate. "Sponsor a book"
More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (e.g., Doctorow's Down and Out)
Your ideas! Visit gutenberg.net to sign up for newsletters, find out how to get started producing an eBook, and find eBooks

Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org

First sale doctrine by yerricde · 2003-07-04 08:12 · Score: 3, Informative

The law specifically says you can not distribute a work that is copyrighted without the copyright holders permission.

True, 17 USC 106 says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109:

Notwithstanding the provisions of section 106(3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

fair use laws, but the DMCA removed most of those

From the DMCA: "Nothing in this section shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."

--
Will I retire or break 10K?

Re:Really great work by the guys behind the projec by Anonymous Coward · 2003-07-04 08:17 · Score: 2, Interesting

Want to know what's new, etc? The Project Gutenberg website admittedly sucks, and their ASCII adherence admittedly verges on dogma, but there is a good substitute:

The Online Books Page
http://digital.library.upenn.edu/books/

It currently has 20,000 FREE titles listed, from hundreds (at least!) of sources, in all subjects, beautifully categorized by title, author and subject, and topped off by an up-to-date what's new listing and a fine search engine. Much props to John Mark Ockerbloom and the University of Pennsylvania for supporting the site.

P.S. Won't one of you nice Slashdotters with time or interest in good works consider doing a complete redesign of the PG site, a full-text on-site search engine for the texts, a better categorization system and just a decent, half-respectable look? It don't get no respect lookin' as it does now. Among other things, the lack of internal organization means that individual texts get shafted in Google rankings.

Cheaper, but useful? by yerricde · 2003-07-04 08:21 · Score: 3, Insightful

A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.

It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.

they would realize that it would be cheaper in the longrun to get texts off Gutenberg, instead of buying pre-bound books elsewhere.

Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?

--
Will I retire or break 10K?

XML conversions look lacking. by CryptOntology · 2003-07-04 09:04 · Score: 2, Informative

I just looked over the links in earlier replies (PGXML and HTML-Writers) and was surprised: HTML-Writers has only converted 20-odd etexts, from Jan to Feb 2000; and PGXML can't even do valid HTML curled quotes.

Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to go adrift, due to too few people being involved (only _two_ people do PGXML) to round out the abilities (and future efforts of) XML uber-format-goodness.

One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders.)

Size by Beryllium+Sphere(tm) · 2003-07-04 09:20 · Score: 2, Insightful

My entire CD collection fits in my pocket with my iPod. If I could fit my entire book collection in my pocket, that would be a dream and a delight.

This is just wrong by Anonymous Coward · 2003-07-04 09:22 · Score: 2, Interesting

XML is not a character encoding. XML does not require the use of non-ASCII characters. What can be represented by an XML document is a superset of what can be represented by a plain ASCII document. XML is a human-readable markup.

MS Word 2000 .doc is a binary format.

I suspect that you have very little idea what you are talking about.

PG already uses XML-like markup to indicate an emphasized portion of a passage, among other things. If we were to accept your argument, then even this alone should be seen as a failure.

After all, what if over the course of 50 years we forget what "blahblahblah" means? What if in some impoverished country, while the people have the processing power to read these documents, they do not have the processing power to parse out the markup?!

Both of these worries are foolish. If you use an XML format for open content, you have an obligation to provide openly the strict and formal DTD or schema which describes your XML markup.

What if this DTD or schema becomes lost? This won't happen, because you can embed the DTD or schema in the distributed documents (the books) themselves.
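Embedding the schema is something XML supports directly, through the internal subset of the DOCTYPE declaration. A small Python sketch, using a made-up book document (the element and entity names are illustrative only), parsed with the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# A hypothetical book file that carries its own DTD in the internal subset
# of the DOCTYPE declaration, so the schema travels with the text itself.
BOOK = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, p+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT p (#PCDATA | emph)*>
  <!ELEMENT emph (#PCDATA)>
  <!ENTITY pound "&#163;">
]>
<book>
  <title>A Tale of Two Cities</title>
  <p>It was the <emph>best</emph> of times.</p>
  <p>It cost &pound;5.</p>
</book>
"""

# ElementTree reads the internal subset and expands the declared entity,
# but it does NOT validate against the DTD; a validating parser would.
root = ET.fromstring(BOOK)
print(root.find("title").text)  # A Tale of Two Cities
```

Even without a validating parser, the declarations sit at the top of the file in plain ASCII, readable by a person who has never seen the format before.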

What if we forget how to parse XML?

Yes, if there were a terrible war which left the entire planet in shambles for 100 years, then we might forget how to parse XML.

But this is no different than with ASCII. We could just as easily forget how to convert binary data (you know, '1's and '0's) to corresponding ASCII characters.

Now, even if there were such a catastrophe, you insult the human creature by suggesting that we would not be able to figure this out, and to figure out the XML DTD or schema. Have you ever read an XML document following a standard article or book DTD or schema? It is painfully obvious what the markup means, and what its use is.

However, all of this discussion is just silly, because there probably will not be such a catastrophe in the near future.

You are forgetting that change is gradual. If a new format becomes popular (and this is unlikely, because XML can describe any possible format), it will be a matter of an hour or two to convert the entire PG library to the new format.

And if the new format is as well defined (as we should hope) as the existing XML format, then this process will be painless.

You are welcome to continue to comment and complain from a position of clear ignorance, or you can admit that there might in fact be some things which you are not an expert on (surprise!), and that others understand better than you.

We are telling you that using a strictly defined XML format would in every sense be the better choice. It does not require the use of non-ASCII characters. It is human readable. It is well defined. Conversion of the XML document (which for your purposes would not be very complex) to plain (as in not XML-formatted) ASCII strings can be done by a 15-20-year-old processor, or by hand if needed.

In fact, since it is human readable, there is no need to do the conversion at all if we some day find ourselves in a situation where we cannot automate it (as in after a worldwide nuclear armageddon). The document can be read as is if needed, and the structuring afforded by XML will be just as clear.
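To show how little work the XML-to-plain-ASCII conversion takes, here is a minimal Python sketch. It assumes a simple document with hypothetical `<p>` and `<emph>` elements; rendering emphasis as `_word_` and replacing non-ASCII characters with `?` are design choices made for this illustration, not part of any PG format.

```python
import xml.etree.ElementTree as ET


def xml_to_plain(xml_text: str) -> str:
    """Flatten a simple book document to plain ASCII text.

    Assumes hypothetical <p> and <emph> elements; one paragraph per <p>.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter("p"):
        parts = [p.text or ""]
        for child in p:
            inner = "".join(child.itertext())
            # Render emphasis with underscores, the plain-text PG convention.
            parts.append(f"_{inner}_" if child.tag == "emph" else inner)
            parts.append(child.tail or "")
        # Collapse whitespace left over from the XML layout.
        paragraphs.append(" ".join("".join(parts).split()))
    text = "\n\n".join(paragraphs)
    # Lossy fallback: anything outside ASCII becomes "?".
    return text.encode("ascii", "replace").decode("ascii")
```

The whole job is one tree walk; the same logic could be done with a simple state machine on a very old machine, or by eye.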

Ah, that explains the "Midi-Sum and Nite Dream" by Pac · 2003-07-04 09:29 · Score: 3, Funny

It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.

A sterling mistake by fm6 · 2003-07-04 09:56 · Score: 2, Insightful

Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.

As a matter of fact, the DP web interface allows you to enter the pound sterling symbol even if you don't have it on your keyboard. It also has a lot of accented characters that aren't in English. The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

You made a similar mistake when you entered that character, since you just entered it from your keyboard. (A natural mistake if you have a British keyboard, as I assume you do.) On some web sites, this would only read correctly on systems similarly configured. However, Slashdot puts out the header:

Content-Type: text/html; charset=iso-8859-1

which should prevent that. Still, the character entity &pound; is more portable, and will work even when the web page doesn't specify a character set -- and most do not.

On the other hand, Slashcode sometimes mangles eight-bit characters when it archives them. So if you seek true immortality, use the character entity!
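The difference between the raw character and the entity is easy to see at the byte level. A short Python sketch using only standard codecs; the variable names are illustrative:

```python
from html.entities import codepoint2name

POUND = "\u00a3"  # the pound sterling sign

# The same character becomes different bytes in different encodings.
latin1_bytes = POUND.encode("latin-1")  # one byte: 0xA3
utf8_bytes = POUND.encode("utf-8")      # two bytes: 0xC2 0xA3

# Guessing the wrong encoding garbles it: UTF-8 bytes read as Latin-1
# come out as two characters, "A-circumflex" followed by the pound sign.
garbled = utf8_bytes.decode("latin-1")

# Plain ASCII cannot hold the character at all, but a character
# reference spelled in ASCII survives any ASCII-compatible encoding.
numeric_ref = POUND.encode("ascii", "xmlcharrefreplace").decode("ascii")
named_ref = "&" + codepoint2name[ord(POUND)] + ";"

print(numeric_ref, named_ref)  # &#163; &pound;
```

Because `&#163;` and `&pound;` are made entirely of ASCII bytes, a browser decodes them the same way whatever charset it assumes for the page.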

Re:A sterling mistake by dvdeug · 2003-07-04 12:15 · Score: 2, Informative

The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

Excuse me? The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books posted from DP are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1.
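Deciding which of the two names a file gets comes down to checking for high-bit bytes. A minimal Python sketch of that check; the helper name is hypothetical, not PG's actual tooling, and the 8-bit case simply assumes Latin-1, as described above:

```python
def etext_filename(base: str, data: bytes) -> str:
    """Pick a 7foo.txt / 8foo.txt style name for one etext file.

    Hypothetical helper: pure 7-bit ASCII gets the "7" prefix; anything
    containing bytes >= 0x80 gets "8" and is assumed to be Latin-1.
    """
    # bytes.isascii() is True only if every byte is below 0x80 (Python 3.7+).
    prefix = "7" if data.isascii() else "8"
    return f"{prefix}{base}.txt"
```

Note that the check can only say "not ASCII"; it cannot prove the 8-bit file is Latin-1 rather than some other 8-bit encoding.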

How to spread the word... by evilviper · 2003-07-04 15:27 · Score: 4, Insightful

Here's what I did...

A while back, I used wget to mirror all of Project Gutenberg's works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other, more efficient way to do things.)

Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire PG collection fit on one CD. Since most people don't have bzip2 support, I also included the free archiver Ultimate Zip on the CD. I also put a read-me on the CD (that would appear as the first file) with basic instructions on what to do.
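The per-file compression step can be sketched in Python with the standard `bz2` module, which writes ordinary `.bz2` files. This is an illustration, not the poster's actual procedure: the source and destination directories and the `*.txt` filter are assumptions.

```python
import bz2
from pathlib import Path


def compress_tree(src: Path, dest: Path) -> int:
    """Compress every .txt file under src into dest as .txt.bz2.

    compresslevel=9 corresponds to `bzip2 -9` (900k blocks), which is
    also the bz2 module's default. Returns the number of files written.
    """
    count = 0
    for path in sorted(src.rglob("*.txt")):
        # Mirror the source directory layout under dest.
        out = dest / path.relative_to(src)
        out = out.with_name(out.name + ".bz2")
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_bytes(bz2.compress(path.read_bytes(), compresslevel=9))
        count += 1
    return count
```

Compressing each file separately, rather than one big archive, keeps every book individually extractable, which matters when the reader only has a basic archiver.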

One of the great things about CDs is how easy they are to transfer... One stamp and a 5-cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).

Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies. Basically, they heard someone mention a subject related to one of the files on the CD, brought up the CD, and offered to make a copy. This happened a few times that I know of, and quite possibly many times that I don't. Quite an easy way to spread the word.

Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...

The second reason is that monitors are all backlit... That means reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes quickly become fatigued. The only screen I have that doesn't do that is the 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx. 6" (about the size of a book page), that is small, light, silent, compatible with everything, and, most importantly, has good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't really going to be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.
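The "saves your place automatically" wish is the easy part of such a reader. A minimal Python sketch of one possible design, in which the class name, the JSON file, and storing a character offset per book are all assumptions made for illustration:

```python
import json
from pathlib import Path


class Bookmarks:
    """Remember the last-read character offset of each etext.

    Hypothetical design: every position lives in one small JSON file,
    so the reader can reopen each book where it was left.
    """

    def __init__(self, store: Path):
        self.store = store
        self.positions = json.loads(store.read_text()) if store.exists() else {}

    def get(self, book: str) -> int:
        # Books never opened before start at the top.
        return self.positions.get(book, 0)

    def save(self, book: str, offset: int) -> None:
        self.positions[book] = offset
        # Rewrite the whole (tiny) file; calling this on every page turn
        # gives the "automatic" behaviour.
        self.store.write_text(json.dumps(self.positions))
```

A character offset survives changes of font size or window width, which a page number would not.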

--
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant

Re:How to spread the word... by Junkster+Julian · 2003-07-05 12:19 · Score: 2, Interesting

I wasn't listing the final specifications for a device in detail. Yes, it would have HTML support, and CSS would be useful to have as well. With HTML, people are going to want images supported, which means a few different libraries there as well.

OK, I'm gonna tone myself down a little... this should be less of a rant, so hang on. The point I was trying to make is that I think HTML is the one technology an ebook reader should be able to support properly, which even standard desktop browsers don't. I don't think it would be much of a stretch to see the "web browser" condensed into a hardware-streamlined product. SGML support would be great, but to implement SGML we must first master HTML, and if we can't deliver a machine dedicated to rendering HTML, how much chance would we have of implementing a technology with a smaller sample base? It's hard to match HTML in terms of demographic penetration, at least as far as actual text-based content goes. Contrast that with PostScript, PDF, and the like, which (for the most part) do not have human-readable source, something essential for "debugging" our ebooks.

Yeah, and the PDF reader for WinCE needs, uhh, "work". It is by no means comparable to its desktop cousins... a cheap knock-off from a huge company complaining about the limitations of PDAs. IMHO, AvantGo is a considerably better "ebook reader" that's easier to code for and far more compatible. HTML 3.2, that's it... can't go wrong. Visit my site and you'll know what I'm talking about: popnt.com. Keep in mind my work is still beta, but anyway.
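To give a feel for how small the text core of a dedicated HTML reader could be, here is a Python sketch built on the standard library's `html.parser`. It is an illustration of the idea only, not the poster's software or AvantGo's renderer: it keeps text and line breaks for a handful of HTML 3.2 block tags and ignores images, CSS and links.

```python
from html.parser import HTMLParser


class TextRenderer(HTMLParser):
    """Render a small subset of HTML 3.2 as plain text, one block per line.

    A sketch, not a conforming browser: tags in BLOCK_TAGS start a new
    line, every other tag is ignored, and only its text is kept.
    """

    BLOCK_TAGS = {"p", "br", "h1", "h2", "h3", "h4", "li", "hr", "div", "blockquote"}

    def __init__(self):
        # convert_charrefs=True (the default) decodes &amp;, &pound; and so on.
        super().__init__()
        self.blocks = [[]]

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.blocks.append([])

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.blocks.append([])

    def handle_data(self, data):
        self.blocks[-1].append(data)

    def text(self) -> str:
        # Collapse runs of whitespace inside each block, as a browser would,
        # and drop blocks that ended up empty.
        lines = (" ".join("".join(chunks).split()) for chunks in self.blocks)
        return "\n".join(line for line in lines if line)


def render(html: str) -> str:
    renderer = TextRenderer()
    renderer.feed(html)
    renderer.close()
    return renderer.text()
```

Because the input is human-readable, a badly rendered page can be "debugged" by opening the same file in any text editor, which is the advantage over PDF argued above.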

And about permanent media... well... I'm going to go way out on a limb here and suggest that print cannot truly be compared with your examples, although I do, in all seriousness, appreciate your debate. Just for the sake of argument, what distinguishes print from (at least) the three examples you listed (and please, I hope this does not escalate) is the following:
1. Stone tablets were never mass-produced in the same way as books (or paper media) were. Sure, there was sandscript, but what specifically distinguishes print as a breakthrough was its potential for industrial mass production via inventions like the printing press... ubiquity made the press permanent in many ways.
2. Music (and movies): because the gramophone and the things that make up a motion picture (the camera, film, etc.) are such recent inventions, I'm not sure these can be compared to print media. Note that I am not saying music is a new introduction, rather recorded music. So in that light, and given the whole MP3 hoopla we're having with the RIAA et al., I think the music and movie industries have a lot to learn from the print industry, not the other way around. Also, the music and movie industries use a concept closely tied to books, in that they are given data to process. I'm not sure music and movies can really compare, in all seriousness, to books; in all honesty, I'm not sure there is much out there that even CAN compare to the print industry. These are secondary industries which require processing that print media does not. Print is unique in that respect and is therefore really tough to beat! Even braille is a form of print which requires nothing whatsoever, not even a light source! What makes print so permanent is its ubiquity: the sheer volume of static copies whose content and information cannot and will not change over time. No other industry has this power.
