Project Gutenberg's 32nd Birthday
David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles.
Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. The real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofreaders and editing one page per day."
Seriously, awesome work people.
Ryan T. Sammartino
"Ancora imparo"
Now all we need is more people promoting this in schools and printing the books. Much like the IA Bookmobile. It seems like the people who could use this the most, don't even know it exists.
i am going to be teaching modern civ next year in high school (i have been at the junior high for 7 years), and have already gone to the site and gotten works from aristotle, plato, locke, montesquieu, et al. thanks guys. there is still something to be said for a classical education. glad somebody is doing all they can to preserve the classics, especially with all the assaults on them from the social reconstructionists.
My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
I knew it! This country was founded by COBOLers.
Table-ized A.I.
There's really a problem though about getting the word out to people, in pretty much the same way the popularity of libraries today has been dropping. A good idea would be a separate advocacy site to come up with lists of texts in the project (i.e. What's New?, Most Popular, etc.) to help people wade in immediately.
The Baen Free library has a number of titles available in several formats.
It's a great way to introduce readers to a series or a talented new author.
Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than the physical books.
I went to the MS Reader site and followed the links to the online publishers' sites (such as B&N and Amazon). In most cases, the Reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).
So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.
If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.
"...to anyone and everyone then on what later became the web..." What?? In 1971 http protocol was around? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for. Perhaps Im misreading the article.
Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.
I absorb all information directly through a USB link from my laptop to my head. Pretty nice, except for the typographical migraines. I always have ibuprofen in hand when visiting Slashdot.
The coolest voice ever.
I like what happens when you run across a title which isn't on the site.
Example: "It's not there, eh? -- Canadian"
Heh.
SB
It's old. The more humans I meet, the more I like my cats. At least they are honest.
Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there, unfortunately, get to sit collecting dust instead of benefiting mankind.
"cannot open this title on a Terminal Services session"
What bollocks. Free software and free books but you can't read them over a network link to your own compute server? Microsoft, as usual, screws the pooch.
Now. How do I uninstall this without removing my adenoids?
Yes, they need something like that badly.
I remember poking around on PG not long ago but soon forgot about it.
If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.
And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.
The ratio of people to cake is too big
Great to see a project like this run on Free software. Read more at Greenstone's website.
Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.
In this day and age, when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.
A red, white and blue flag isn't what makes this country great, nor is an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.
So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.
http://www.conglomerate.org/
Lovely bit of kit.
Government of the people, by corporate executives, for corporate profits.
Lots of plans for the future:
Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
The law specifically says you cannot distribute a work that is copyrighted without the copyright holder's permission.
True, 17 USC 106 says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109:
fair use laws, but the DMCA removed most of those
From the DMCA: "Nothing in this section shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."
Will I retire or break 10K?
Want to know what's new, etc? The Project Gutenberg website admittedly sucks, and their ASCII adherence admittedly verges on dogma, but there is a good substitute:
The Online Books Page
http://digital.library.upenn.edu/books/
It currently has 20,000 FREE titles listed, from hundreds (at least!) of sources, in all subjects, beautifully categorized by title, author and subject--and topped off by an up-to-date what's new listing and a fine search engine. Much props to John Mark Ockerbloom and the University of Pennsylvania for supporting the site.
P.S. Won't one of you nice Slashdotters with time or interest in good works consider doing a complete redesign of the PG site, a full-text on-site search engine for the texts, a better categorization system and just a decent, half-respectable look? It don't get no respect lookin' as it does now. Among other things, the lack of internal organization means that individual texts get shafted in Google rankings.
A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.
It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.
they would realize that it would be cheaper in the longrun to get texts off Gutenberg, instead of buying pre-bound books elsewhere.
Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?
Will I retire or break 10K?
Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to go adrift, due to too few people being involved (only _two_ people do PGXML) to round out the abilities (and future efforts of) XML uber-format-goodness.
One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders).
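To make the TXT-to-XML step concrete, here is a minimal sketch in Python. It assumes one particular convention (blank-line-separated paragraphs and headings of the form "CHAPTER I"), and the element names `book`, `front`, `chapter` and `p` are made up for illustration, not any actual PG or PGXML format. A real converter would need a rule set per etext revision, which is the pain point described above.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heading convention: a lone line such as "CHAPTER I" or "Chapter 12".
# Real etexts differ between revisions; this pattern is only an illustration.
CHAPTER_RE = re.compile(r"^\s*chapter\s+[0-9ivxlc]+\.?\s*$", re.IGNORECASE)


def txt_to_xml(text):
    """Turn blank-line-separated plain text into a simple <book> tree."""
    book = ET.Element("book")
    # Anything before the first chapter heading lands in <front>.
    current = ET.SubElement(book, "front")
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        if len(lines) == 1 and CHAPTER_RE.match(lines[0]):
            # A heading starts a new chapter element.
            current = ET.SubElement(book, "chapter", title=lines[0])
        else:
            # Re-join hard-wrapped lines into one paragraph.
            ET.SubElement(current, "p").text = " ".join(lines)
    return book


sample = "CHAPTER I\n\nIt was a dark\nand stormy night.\n\nCHAPTER II\n\nThe end."
print(ET.tostring(txt_to_xml(sample), encoding="unicode"))
```

Any book whose headings do not match the assumed pattern simply ends up as one long run of paragraphs, which is the kind of breakage the comment is talking about.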
My entire CD collection fits in my pocket with my iPod. If I could fit my entire book collection in my pocket, that would be a dream and a delight.
XML is not a character encoding. XML does not require the use of non-ASCII characters. What can be represented by an XML document is a superset of what can be represented by a plain ASCII document. XML is a human-readable markup.
.doc is a binary format.
MS Word 2000
I suspect that you have very little idea what you are talking about.
PG already uses XML-like markup to indicate an emphasized portion of a passage, among other things. If we were to accept your argument, then even this alone should be seen as a failure.
After all, what if over the course of 50 years we forget what "blahblahblah" means? What if in some impoverished country, while the people have the processing power to read these documents, they do not have the processing power to parse out ?!
Both of these worries are foolish. If you use an XML format for open content, you have an obligation to provide openly the strict and formal DTD or schema which describes your XML markup.
What if this DTD or schema becomes lost? This won't happen, because you can embed the DTD or schema in the distributed documents (the books) themselves.
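A small sketch of what "embed the DTD in the document" can look like, using an internal DTD subset. The element names (`book`, `title`, `p`, `em`) are hypothetical and not a real PG schema. Note that Python's standard `xml.etree.ElementTree` reads the internal subset but does not validate against it; a validating parser would be needed for that.

```python
import xml.etree.ElementTree as ET

# A self-describing, ASCII-only document: the DTD travels inside the file.
# Element names are illustrative assumptions, not an actual PG format.
DOC = """<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE book [
  <!ELEMENT book (title, p+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT p (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<book>
  <title>Sample</title>
  <p>This is <em>emphasized</em> text.</p>
</book>
"""

# encode("ascii") raises if any non-ASCII character slipped in.
raw = DOC.encode("ascii")

# ElementTree parses the document (without validating it against the DTD).
root = ET.fromstring(raw)
print(root.find("title").text)
print(root.find("p/em").text)
```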
What if we forget how to parse XML?
Yes, if there were a terrible war which left the entire planet in shambles for 100 years, then we might forget how to parse XML.
But this is no different than with ASCII. We could just as easily forget how to convert binary data (you know, '1's and '0's) to corresponding ASCII characters.
Now, even if there were such a catastrophe, you insult the human creature by suggesting that we would not be able to figure this out, and to figure out the XML DTD or schema. Have you ever read an XML document following a standard article or book DTD or schema? It is painfully obvious what the markup means, and what its use is.
However, all of this discussion is just silly, because there probably will not be such a catastrophe in the near future.
You are forgetting that change is gradual. If a new format becomes popular (and this is unlikely, because XML can describe any possible format), it will be a matter of an hour or two to convert the entire PG library to the new format.
And if the new format is as well defined (as we should hope) as the existing XML format, then this process will be painless.
You are welcome to continue to comment and complain from a position of clear ignorance, or you can admit that there might in fact be some things which you are not an expert on (surprise!), and that others understand better than you.
We are telling you that using a strictly defined XML format would in every sense be the better choice. It does not require the use of non-ASCII characters. It is human readable. It is well defined. Conversion of the XML document (which for your purposes would not be very complex) to plain (as in not XML formatted) ASCII strings can be done by a 15-20 year old processor or by hand if needed.
In fact, since it is human readable, there is no need to do the conversion at all if we some day find ourselves in a situation where we can not automate it (as in after a worldwide nuclear armageddon). The document can be read as is if needed, and the structuring afforded by XML will be just as clear.
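For a sense of how little work the XML-to-plain-text direction takes, here is a rough Python sketch. It assumes a made-up format in which paragraphs are `<p>` elements with optional inline markup such as `<em>`; a real book schema would need more cases (headings, footnotes, and so on).

```python
import xml.etree.ElementTree as ET


def xml_to_plain(xml_text):
    """Drop all markup and return paragraphs separated by blank lines."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter("p"):
        # itertext() yields the text of the element and of any inline children.
        text = "".join(p.itertext())
        # Collapse runs of whitespace left over from indentation.
        paragraphs.append(" ".join(text.split()))
    return "\n\n".join(paragraphs)


doc = "<book><p>Call me <em>Ishmael</em>.</p><p>Some years ago...</p></book>"
plain = xml_to_plain(doc)
print(plain)
```

The emphasis is simply lost here; a converter could just as easily map `<em>` to underscores or capitals if that information should survive in the plain version.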
It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.
You made a similar mistake when you entered that character, since you just entered it from your keyboard. (A natural mistake if you have a British keyboard, as I assume you do.) On some web sites, this would only read correctly on systems similarly configured. However, Slashdot puts out the header:
which should prevent that. Still, the character entity &pound; is more portable, and will work even when the web page doesn't specify a character set -- and most do not. On the other hand, Slashcode sometimes mangles eight-bit characters when it archives them. So if you seek true immortality, use the character entity!
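If you would rather not type entities by hand, the substitution is easy to automate. The sketch below uses Python's standard `html.entities.codepoint2name` table to swap each non-ASCII character for a named entity where one exists and a numeric reference otherwise; the helper name `to_ascii_entities` is just an illustration.

```python
from html.entities import codepoint2name


def to_ascii_entities(text):
    """Replace every non-ASCII character with an HTML character entity."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                           # plain ASCII passes through
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])  # named entity, e.g. pound
        else:
            out.append("&#%d;" % cp)                 # numeric fallback
    return "".join(out)


# "\u00a3" is the pound sign, written as an escape so this file stays ASCII.
print(to_ascii_entities("Price: \u00a35"))  # prints "Price: &pound;5"
```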
Here's what I did...
A while back, I used wget to mirror the entire Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other more efficient way to do things.)
Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support I also included the free archiver, Ultimate Zip on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions what to do.
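For anyone without the bzip2 command line handy, roughly the same per-file compression can be sketched with Python's standard `bz2` module. The function name `compress_tree` and the `.txt` filter are assumptions for illustration. Unlike `bzip2 -9`, which replaces each file with its compressed version, this sketch leaves the originals in place and writes a `.bz2` copy next to each one.

```python
import bz2
from pathlib import Path


def compress_tree(root):
    """Compress every .txt file under root at level 9.

    Writes name.txt.bz2 beside each original and returns the bytes saved.
    """
    saved = 0
    # Collect the file list first so new files cannot affect the walk.
    for src in sorted(Path(root).rglob("*.txt")):
        data = src.read_bytes()
        packed = bz2.compress(data, compresslevel=9)  # same level as bzip2 -9
        Path(str(src) + ".bz2").write_bytes(packed)
        saved += len(data) - len(packed)
    return saved


# Usage (the directory name is hypothetical):
#   print(compress_tree("gutenberg-mirror"), "bytes saved")
```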
One of the great things about CDs is how easy they are to transfer... One stamp, and a 5cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).
Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.
Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...
The second reason is that monitors are all backlit... That means reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is the 640x240 B&W LCD on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page), that is small, light, silent, compatible with everything, and most importantly, it needs to have good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant