Project Gutenberg's 32nd Birthday
David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles.
Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. The real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofreaders and editing one page per day."
Seriously, awesome work people.
Ryan T. Sammartino
"Ancora imparo"
Now all we need is more people promoting this in schools and printing the books. Much like the IA Bookmobile. It seems like the people who could use this the most, don't even know it exists.
They had AOL back then?
Download Error
You'll need to install and activate the current version of Microsoft Reader before you can download these Owner-Exclusive titles.
Click here to get started now.
No Linux version!? Gah.
i am going to be teaching modern civ next year in high school (i have been at the junior high for 7 years), and have already gone to the site and gotten works from aristotle, plato, locke, montesquieu, et al. thanks guys. there is still something to be said for a classical education. glad somebody is doing all they can to preserve the classics, especially with all the assaults on them from the social reconstructionists.
My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
I knew it! This country was founded by COBOLers.
Table-ized A.I.
There's really a problem though about getting the word out to people, in pretty much the same way the popularity of libraries today has been dropping. A good idea would be a separate advocacy site to come up with lists of texts in the project (i.e. What's New?, Most Popular, etc.) to help people wade in immediately.
The Baen Free library has a number of titles available in several formats.
It's a great way to introduce readers to a series or a talented new author.
Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than a physical book.
I went to the MS Reader site and followed the links to the on-line publishers sites (such as B&N and amazon). In most cases, the reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).
So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.
If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.
"...to anyone and everyone then on what later became the web..." What?? In 1971 http protocol was around? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for. Perhaps Im misreading the article.
Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.
I absorb all information directly through a USB link from my laptop to my head. Pretty nice, except for the typographical migraines. I always have ibuprofen in hand when visiting Slashdot.
The coolest voice ever.
way to go /. This publicity is sure to help the project. Those who haven't got accounts can start helping, or at least consider it. There are bound to be a few people who have extra time on their hands to kill, haven't heard of distributed proofreading, and are willing to do it.
I like what happens when you run across a title which isn't on the site.
Example: "It's not there, eh? -- Canadian"
Heh.
SB
It's old. The more humans I meet, the more I like my cats. At least they are honest.
Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there, unfortunately, gets to sit collecting dust instead of benefitting mankind.
On Slashdot? Sheesh.
2. Hard work to put them in computer form.
3. ????
4. Profit! (For all humanity.)
Hip-Hip-Hooray for a job well done!
One line blog. I hear that they're called Twitters now.
Nice..another opportunity to take an undeserved potshot at Microsoft for no apparent reason. Doesn't it ever get old?
Newsflash: Microsoft is not trying to promote literacy or freedom. They are trying to make money, like just about every other business.
If you want to criticize their Reader/ebook business go ahead, but it's rather petty that the submitter had to attach it to a completely unrelated story. Instead of more information and background about Project Gutenberg, we get this crap.
SIG:Slashdot: indymedia for nerds.
"cannot open this title on a Terminal Services session"
What bollocks. Free software and free books but you can't read them over a network link to your own compute server? Microsoft, as usual, screws the pooch.
Now. How do I uninstall this without removing my adenoids?
While we're on the subject of attaching criticism and potshots to unrelated stories, maybe you should check your sig.
This is why copyrights shouldn't be more than 25 years.
:}
I say, make 'em 10 years renewable up to 50 (and non-transferable).
If only there were more works there like, er, hmm, Roald "Charlie & the Chocolate Factory"/"Matilda"/"The Witches" Dahl.
Meh, well, better than nothing. Too bad though they don't have the Tomson New Testament of 1576.
-uso.
Dreams, dreams, don't doubt dreams, dreaming children's dreaming dreams. Sailor Moon SS
Yes, they need something like that badly.
I remember poking around on PG not long ago but soon forgot about it.
If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.
And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.
The ratio of people to cake is too big
Great to see a project like this run on Free software. Read more at Greenstone's website.
Heh, XML: The BSD of Markup Languages! :)
C - A language that combines the speed of assembly with the ease of use of assembly.
Just because your dad likes to dress like an archaic señorita, doesn't make him the Queen of Spain.
From the disclaimer/header on Project Gutenberg files:
If you have an FTP program (or emulator), please
FTP directly to the Project Gutenberg archives:
[Mac users, do NOT point and click. . .type]
... isn't this snarky instruction now more than a little dated?
Given that a) Macs, being Unix-based, have command-line FTP like everybody else and b) the idea of a point-and-click interface has now passed so far from being a bizarre and contemptible innovation that lots of people are trying hard to develop nice-looking Linux GUIs...
ASA
All employees must wash hands before seeking equitable relief.
Imagine! What if he reads out the Yangs' Holy text, and "We The People" isn't in at least bold text?
Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.
In this day and age, when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.
A red, white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.
So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.
One of the advantages of XML is that it's very easily transformable. If Project Gutenberg were to produce XML texts, it'd be trivial for them to automatically convert them to plain ASCII and make that version available as well.
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
http://www.conglomerate.org/
Lovely bit of kit.
Government of the people, by corporate executives, for corporate profits.
Lots of plans for the future:
Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.
Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org
Does there exist a decent FAST scanner using free software that runs on GNU/Linux or *BSD?
Especially when you only need to scan text, it seems that every scanner on the market takes > 10 seconds per page.
Where are the 1-3 second scanners? What do PG volunteers use?
Is that full-speed or hi-speed USB?
The law specifically says you cannot distribute a work that is copyrighted without the copyright holder's permission.
True, 17 USC 106 says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109:
fair use laws, but the DMCA removed most of those
From the DMCA: "Nothing in this section shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."
Will I retire or break 10K?
Want to know what's new, etc? The Project Gutenberg website admittedly sucks, and their ASCII adherence admittedly verges on dogma, but there is a good substitute:
The Online Books Page
http://digital.library.upenn.edu/books/
It currently has 20,000 FREE titles listed, from hundreds (at least!) of sources, in all subjects, beautifully categorized by title, author and subject--and topped off by an up-to-date what's new listing and a fine search engine. Much props to John Mark Ockerbloom and the University of Pennsylvania for supporting the site.
P.S. Won't one of you nice Slashdotters with time or interest in good works consider doing a complete redesign of the PG site, a full-text on-site search engine for the texts, a better categorization system and just a decent, half-respectable look? It don't get no respect lookin' as it does now. Among other things, the lack of internal organization means that individual texts get shafted in Google rankings.
A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.
It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.
they would realize that it would be cheaper in the long run to get texts off Gutenberg, instead of buying pre-bound books elsewhere.
Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?
So I try to go to http://www.pgdp.net/ - only to find out that the page won't load unless you enable JavaScript!
Um... I thought PG was all about not using the latest bells and whistles? (semi-facetious)
Wolde you bothe eate your cake, and have your cake?
How can XML have staying power? It isn't a file format?
It is essentially a meta-format. You can put any tags in there you want. And that's the problem with it. Same problem as TIFF. Anyone can generate one, but few can read others' files, because to do so you need to understand every tag that could possibly be in there.
And since the format is so flexible, people create new tags every day. So programs written a year ago have zero chance of understanding a file. Just like TIFF.
If Gutenberg were to switch to anything it should be RTF, it's been around 10 years and still going strong.
HTML preserves formatting and illustrations, and being an ascii format, it is recoverable even if one doesn't have an HTML browser.
The point is that many of us would prefer an XML version. The argument against this was that ASCII is a longer-lasting archive format. My counter-argument was that an ASCII version can trivially be produced from the XML both for archival purposes and for those who would prefer such a version.
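As a rough illustration of the "trivially produced" claim, here is a minimal Python sketch. The tag names (`book`, `title`, `chapter`, `heading`, `p`, `em`) are invented for the example and are not an actual Project Gutenberg schema; the sketch only assumes the text lives in a few known block-level elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are assumptions for illustration,
# not a real Project Gutenberg format.
SAMPLE = """<book>
  <title>A Sample Etext</title>
  <chapter>
    <heading>Chapter 1</heading>
    <p>It was a <em>dark</em> night.</p>
    <p>Cost: &#163;5.</p>
  </chapter>
</book>"""

BLOCK_TAGS = ("title", "heading", "p")


def xml_to_ascii(xml_text):
    """Flatten the block-level elements into plain ASCII paragraphs."""
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root.iter():
        if elem.tag not in BLOCK_TAGS:
            continue
        # itertext() also picks up text inside inline children such as <em>.
        text = " ".join("".join(elem.itertext()).split())
        if elem.tag == "title":
            text = text.upper()
        blocks.append(text)
    plain = "\n\n".join(blocks)
    # Anything outside ASCII is replaced with '?', the lossy step of the conversion.
    return plain.encode("ascii", "replace").decode("ascii")


if __name__ == "__main__":
    print(xml_to_ascii(SAMPLE))
```

The conversion only goes one way: the emphasis and the pound sign are lost in the ASCII output, which is the argument for keeping the XML as the master copy.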
10 PRINT CHR$(205.5+RND(1)); : GOTO 10
Where are the 1-3 second scanners?
Wouldn't it be possible to rig up a high-speed scanner based on digital video technology? Or are CCD and CMOS image sensors not fine enough yet?
There is a feature on DP right now that allows you to see the 10 latest projects posted to Project Gutenberg. It is in RSS format and at http://www.pgdp.net/c/feeds/backend.php?content=posted&type=rss
Joseph Gruber
DP Developer
Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is to go adrift, due to too few people being involved (only _two_ people do PGXML) to round out the abilities (and future efforts of) XML uber-format-goodness.
One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders).
Your comment implies that smart people refrain from having children. That is one of the most idiotic comments I have ever read.
My entire CD collection fits in my pocket with my iPod. If I could fit my entire book collection in my pocket, that would be a dream and a delight.
XML is not a character encoding. XML does not require the use of non-ASCII characters. What can be represented by an XML document is a superset of what can be represented by a plain ASCII document. XML is a human-readable markup.
.doc is a binary format.
MS Word 2000
I suspect that you have very little idea what you are talking about.
PG already uses XML-like markup to indicate an emphasized portion of a passage, among other things. If we were to accept your argument, then even this alone should be seen as a failure.
After all, what if over the course of 50 years we forget what "blahblahblah" means? What if in some impoverished country, while the people have the processing power to read these documents, they do not have the processing power to parse out ?!
Both of these worries are foolish. If you use an XML format for open content, you have an obligation to provide openly the strict and formal DTD or schema which describes your XML markup.
What if this DTD or schema becomes lost? This won't happen, because you can embed the DTD or schema in the distributed documents (the books) themselves.
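To make the embedding concrete, here is a small Python sketch of a document that carries its own DTD as an internal subset. The element names are made up for the example. Note that Python's `xml.etree.ElementTree` parses such a document but does not validate it against the DTD, so the check below is only a naive text scan, not real validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical book markup with its DTD embedded in the document itself.
DOC = """<!DOCTYPE book [
  <!ELEMENT book (title, p+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT p (#PCDATA | em)*>
  <!ELEMENT em (#PCDATA)>
]>
<book>
  <title>Sample</title>
  <p>Some <em>emphasized</em> text.</p>
</book>"""


def declared_elements(doc):
    """List element names declared in the internal DTD subset (naive scan)."""
    subset = doc[doc.index("[") + 1:doc.index("]>")]
    names = []
    for line in subset.splitlines():
        line = line.strip()
        if line.startswith("<!ELEMENT"):
            names.append(line.split()[1])
    return names


def used_elements(doc):
    """List the distinct element names actually used in the document body."""
    root = ET.fromstring(doc)
    return sorted({elem.tag for elem in root.iter()})


if __name__ == "__main__":
    print("declared:", sorted(declared_elements(DOC)))
    print("used:    ", used_elements(DOC))
```

Because the declarations travel inside the file, anyone who has the book also has the description of its markup.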
What if we forget how to parse XML?
Yes, if there were a terrible war which left the entire planet in shambles for 100 years, then we might forget how to parse XML.
But this is no different than with ASCII. We could just as easily forget how to convert binary data (you know, '1's and '0's) to corresponding ASCII characters.
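For what it's worth, the binary-to-ASCII step mentioned here is itself only a table lookup. A minimal Python sketch, assuming the bits arrive as space-separated 8-bit groups (an assumption made purely for the example):

```python
def bits_to_ascii(bits):
    """Decode space-separated 8-bit groups into ASCII characters."""
    return "".join(chr(int(group, 2)) for group in bits.split())


def ascii_to_bits(text):
    """Encode ASCII text as space-separated 8-bit groups."""
    return " ".join(format(ord(ch), "08b") for ch in text)


if __name__ == "__main__":
    encoded = ascii_to_bits("Hi")
    print(encoded)
    print(bits_to_ascii(encoded))
```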
Now, even if there were such a catastrophe, you insult the human creature by suggesting that we would not be able to figure this out, and to figure out the XML DTD or schema. Have you ever read an XML document following a standard article or book DTD or schema? It is painfully obvious what the markup means, and what its use is.
However, all of this discussion is just silly, because there probably will not be such a catastrophe in the near future.
You are forgetting that change is gradual. If a new format becomes popular (and this is unlikely, because XML can describe any possible format), it will be a matter of an hour or two to convert the entire PG library to the new format.
And if the new format is as well defined (as we should hope) as the existing XML format, then this process will be painless.
You are welcome to continue to comment and complain from a position of clear ignorance, or you can admit that there might in fact be some things which you are not an expert on (surprise!), and that others understand better than you.
We are telling you that using a strictly defined XML format would in every sense be the better choice. It does not require the use of non-ASCII characters. It is human readable. It is well defined. Conversion of the XML document (which for your purposes would not be very complex) to plain (as in not XML formatted) ASCII strings can be done by a 15-20 year old processor, or by hand if needed.
In fact, since it is human readable, there is no need to do the conversion at all if we some day find ourselves in a situation where we can not automate it (as in after a worldwide nuclear armageddon). The document can be read as is if needed, and the structuring afforded by XML will be just as clear.
It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.
Oops, my \\ tags weren't commented out, sorry about that :) Bah, I can't figure out how to prevent Slashdot from parsing them out. You get the idea, eh?
PG uses "italics" tags to designate emphasis in the text.
You made a similar mistake when you entered that character, since you just entered it from your keyboard. (A natural mistake if you have a British keyboard, as I assume you do.) On some web sites, this would only read correctly on systems similarly configured. However, Slashdot puts out the header:
which should prevent that. Still, the character entity &pound; is more portable, and will work even when the web page doesn't specify a character set -- and most do not. On the other hand, Slashcode sometimes mangles eight-bit characters when it archives them. So if you seek true immortality, use the character entity!
I get all the information I need (and more) from "reading" lamb livers (all the Universe is reflected in even its tiniest fragment, you only have to look hard enough). On most days though, I have to resort to using tea leaves (as there aren't too many sheep left in 20 mile radius) but tea leaves have lower bandwidth and they generate more errors (mostly typos, but when reading Slashdot, I occasionally experience a kind of deja vu). I post to Slashdot by using complicated black magic (it includes drawing several pentagrams and calling several names I dare not mention in the fear of accidentally calling their wrath upon me) to directly alter the state of the Universe.
Hell is not other people; it is yourself. - Ludwig Wittgenstein
I say make them 25 years and indefinitely renewable. That makes the corporates happy and lets other works go into public domain. Who cares if 200 years from now Mickey Mouse and Harry Potter are still owned by corporate interests, when Lord of the Rings and Star Trek are public domain in 10.
The German Projekt Gutenberg
http://www.gutenberg2000.de/
It attempts to put on the web all German literature whose copyrights have expired.
There are two rules for success:
1. Never tell everything you know.
Your comment implies that smart people refrain from having children. That is one of the most idiotic comments I have ever read
No, it implies that smart people have fewer children, which is true, generally speaking.
Why? Consider the following reasons why having a lot of children can make you stupid:
PowerDVD won't run under terminal server, even if you are on the console..
Rather silly..
---- Booth was a patriot ----
Ok all you Gutenberg fans.. here's a good one for y'all... I know this person who is trying to author a site that's right in line with exactly what Project Gutenberg stands for: making texts available to the electronic world.
So what could we suggest to someone who is donating their blood sweat and tears to an ubiquitous online resource so everyone, from Lynx, IE, Palm/Wince, WAP, and even print/fax "users" can have access to this ubiquitous resource?
And what if I said that resource was about Nanotechnology and that it beats Nanodot in terms of potential audience, readership, and just plain usability?
And what if I said that person was me, and that site was popnt.com, also called "Popular Nanotechnology"...? Would that tweak anyone's "interest"? Anyone...?
Please don't flame me folks, I'm doing this literally out of the kindness of my heart and need any help I can get.
All I ask is you check it out.. start at Volume 0 if anyone has a chance.
Here's what I did...
A while back, I used wget to mirror the entire Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other more efficient way to do things)
Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support I also included the free archiver, Ultimate Zip on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions what to do.
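For anyone wanting to repeat the compression step without a shell loop, here is a minimal Python sketch using the standard `bz2` module at level 9. The function name and the directory layout are assumptions for illustration, not what was actually run. Unlike the `bzip2` command, which replaces each file by default, this leaves the originals in place.

```python
import bz2
from pathlib import Path


def compress_texts(root):
    """Write a .bz2 copy (level 9) next to every .txt file under root.

    Returns the list of compressed files written. The original .txt
    files are left untouched.
    """
    written = []
    for path in sorted(Path(root).rglob("*.txt")):
        target = path.with_name(path.name + ".bz2")
        target.write_bytes(bz2.compress(path.read_bytes(), compresslevel=9))
        written.append(target)
    return written


if __name__ == "__main__":
    # "mirror" is a hypothetical directory holding the downloaded etexts.
    for out in compress_texts("mirror"):
        print(out)
```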
One of the great things about CDs is how easy they are to transfer... One stamp, and a 5cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).
Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.
Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...
The second reason is that monitors are all backlit... That means reading on a computer screen is like reading text on a fluorescent lightbulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is the 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page), that is small, light, silent, compatible with everything, and most importantly, has good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.
Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
I'd like to mention my freeware ebook reader - it's got a direct-downloader for the Gutenberg catalog and all texts therein. It also works on Linux, under Wine (ok, it would be nice to have a native version but I've been coding in VB/VC for over a decade. Old dog, new tricks, etc)
I specifically wrote it to take gutenberg text files and reformat them back into something you can actually read.
Cheers
Simon
Oh yeah, a URL: http://www.spacejock.com
Does this mean that the Declaration of Independence is the first spam?
-- Ed Avis ed@membled.com
I actually took the time to sit down and learn how to read punch cards from just their hole patterns (which isn't too difficult compared to reading data files directly in a hex editor when you have to dig into why a program isn't reading a certain file correctly).
I saw some punch card machines come into the local thrift store a couple of years ago; I think it would be hard to find one now.
The nice advantage that punch cards have over just about every other data storage medium is that as long as the cards are preserved in archival conditions, bit rot is almost impossible. And the archival conditions are no different from those for preserving old books, which has a long tradition and history.
The only problem with punch cards is that they take up so much room, especially compared to the amount of data actually stored.
If you've seen the paper that most paperbacks are printed on, which is basically newsprint, you'll know that printing is close to a zero-cost procedure. It only costs them $1-2 to actually print the book; it is the intellectual property that matters a lot more.
By purchasing a book, you are, in essence, "licensing" the work for your use from the publisher. I don't see why e-books should be more expensive, but if they are a dollar or two cheaper, that's probably because they don't actually have to print it.
Moreover, there are some advantages to using the electronic format versus using the paper format. You try finding a sentence in a 400 page book, and I'll try finding the same thing in an e-book. Needless to say, it will take a lot less time to find it in the electronic book format.
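The search advantage is easy to demonstrate on a plain-text etext. A minimal Python sketch follows; the function name and the three-line sample are only placeholders for a real book file.

```python
def find_phrase(text, phrase):
    """Return (line_number, line) pairs for lines containing phrase.

    Matching is case-insensitive; line numbers start at 1.
    """
    needle = phrase.lower()
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


# Placeholder text standing in for a downloaded etext.
SAMPLE = (
    "Call me Ishmael.\n"
    "Some years ago, never mind how long precisely,\n"
    "I thought I would sail about a little."
)

if __name__ == "__main__":
    for number, line in find_phrase(SAMPLE, "never mind"):
        print(number, line)
```

A phrase that happens to wrap across a line break in the file would be missed by this line-by-line scan, so a real tool would want to normalize whitespace first.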
I would have to agree that XML does offer some reasonable options that make it much superior to plain ASCII text (or Latin-1 as has been discussed in this thread).
A point I want to make is that:
I've been involved with the computer industry long enough myself that I feel the caution that Michael Hart has towards this issue is totally legitimate. Let's let XML prove itself and survive the next couple of rounds of new fads for data formatting, and if it makes it more than a decade (XML isn't that old), it might just make it longer.
I'm happy to announce that there's now a new version of Convert LIT which is compatible with the updated version of Microsoft Reader. You can find it on our website here.
And who enforces the Sherman Antitrust Act? Microsoft got off on Sherman Act charges with just a slap on the wrist.