Slashdot Mirror


Project Gutenberg's 32nd Birthday

David Moynihan writes "July 4th marks the 32nd anniversary of that day in 1971 when Michael Hart first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web, thus founding Project Gutenberg. Thanks to an army of volunteers and the Distributed Proofreaders, this is the last year PG will have fewer than 10,000 titles. Strangely, Microsoft picked this dual anniversary of literacy and freedom to re-launch their Reader product, with three free bestsellers a week, if you activate the new version with Passport, sign a EULA, etc. Real reason for the upgrade might be that the DRM on MS's old Reader was cracked. If you're not into giving away data, or are running a system other than Windows, maybe you could take the time to tell a friend about free books online, or even help out by visiting the Distributed Proofers and editing one page per day."

178 comments

  1. Must...avoid...Steve...Gutenberg...joke... by mikeophile · · Score: 4, Funny

    Seriously, awesome work people.

  2. You can't be serious by ryants · · Score: 5, Funny
    even help out by visiting the Distributed Proofers and editing one page per day.
    You can't seriously be asking Slashdotters to volunteer as proofreaders.
    --

    Ryan T. Sammartino
    "Ancora imparo"

    1. Re:You can't be serious by BabyDave · · Score: 4, Funny

      Could be worse - they could be asking the Slashdot editors!

    2. Re:You can't be serious by thinkninja · · Score: 4, Funny
      It was the best of times, it was the worst of times...
      -1 Redundant
      --
      "The number of Unix installations has grown to ten, with more expected." (Unix Programmer's Manual, 2nd ed.; june 1972)
    3. Re:You can't be serious by AndroidCat · · Score: 1
      Could be worser - they could be asking the Slashdot trolls!

      What all these Soviet Russia and All Your Base changes that have been marked in? And what is goatse?

      --
      One line blog. I hear that they're called Twitters now.
    4. Re:You can't be serious by psylent · · Score: 0, Troll

      "could be worser" ... go back to school dude.

    5. Re:You can't be serious by Anonymous Coward · · Score: 0

      >==**WHOOSH**==>

    6. Re:You can't be serious by Aldarondo · · Score: 5, Interesting
      As someone who has been involved with Distributed Proofreaders for the past 18 months: yes, we are serious about having Slashdot people proofread. The last time a story about DP ran, in November, thousands of new users joined us and helped us grow to our current size.

      Go and check it out; there is great work being done there. (I am a bit biased, though.) Click here for a history of DP.

    7. Re:You can't be serious by claudius0425 · · Score: 2, Funny

      Hey yo, some us /. homies aint got no grammer problems.

      --
      Phus. Sysiphus.
    8. Re:You can't be serious by MadCow42 · · Score: 2, Funny

      Well, it wouldn't be THAT bad, we'd just have 5 different versions of each book, each released about a day apart.

      MadCow.

      --
      I used to have a sig, but I set it free and it never came back.
    9. Re:You can't be serious by Anonymous Coward · · Score: 0

      "It was the best of times, it was the *BLURST* of times?!?"

    10. Re:You can't be serious by tommertron · · Score: 1, Interesting

      The thing is, this brings up a somewhat serious point. I've proofread professionally in the past, and I know that it's hard and nobody's perfect at it. An open approach might work with software, because anyone can easily test it and see whether there are bugs in the program. But without a wiki-type format (www.wikipedia.org), who is there to make sure a text is proofread properly? If texts are proofread incorrectly and distributed to schools and such, I have to worry about the quality of the texts students are learning from. I have in fact read a lot of public domain texts, and I find typos and grammatical errors to be fairly common in them. Would a wiki format help open texts? (Or maybe a moderated wiki format.)

      --
      Random rants about technology: http://technorants.blogspot.com
    11. Re:You can't be serious by obotics · · Score: 1

      On /., a lot of dumb posts are modded up to +5 funny. But this was truly funny, man. Good job. ^_^

    12. Re:You can't be serious by WindBourne · · Score: 1

      Actually, there are a few grammar nazis I would love to see work on it. While they are grammatically correct, they are obnoxious here. Worse, they are wasting space and time.

      --
      I prefer the "u" in honour as it seems to be missing these days.
    13. Re:You can't be serious by JDWTopGuy · · Score: 2, Funny

      How about a project to translate the books to 1337 speak?

      20,000 1346u35 und3r 7h3 534
      h4x0r3d by Ju135 V3rn

      7h3y w4s 0wn3d by 7h3 c4p74!n

      Or even worse, the Bible in 1337: (7h3 n3w h4x0r v3r510n?)

      7h0u 5h417 n0t k!11, d00d.

      --
      Ron Paul 2012
    14. Re:You can't be serious by bruthasj · · Score: 1

      They're inviting those who mock and scorn the bad spellers. Obviously, if you've read enough posting here, there appears to be a 20:1 ratio of bad spellers to good spellers. So, there are still some who they can extract sufficient proofreading capabilities from.

      Proofread that.

    15. Re:You can't be serious by Koushiro · · Score: 1

      The good thing about Distributed Proofreaders is that there's actually a system in which each scanned and OCRed page of text is proofread twice: once by anyone, and once by anyone who has more than fifty pages proofread. (For that matter, it's checked again by a post-proofreader when the separate pages are combined into one file, for extra safety.)

      Admittedly, a moderated wiki format would be difficult to beat for reliability, but with three separate checks, each (theoretically) more accurate than the last, there's not much that will be missed. Given the accuracy of the triple-check system and the speed of proofreading at DP (2,500-3,000 pages a day), I doubt any gain in reliability would justify changing the system.
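      To put rough numbers on the triple-check argument, suppose (purely as an illustration) that each pass independently catches a fraction p_i of the errors still present. The fraction r of the original errors that survives all three passes is then the product of the per-pass miss rates:

```latex
r = \prod_{i=1}^{3} (1 - p_i), \qquad
p_1 = p_2 = p_3 = 0.9 \;\Rightarrow\; r = (0.1)^3 = 0.001
```

      With those made-up rates, roughly one error in a thousand would slip through. Real passes are not independent (the errors one proofreader misses tend to be the hard ones the next also misses), so treat this as an optimistic estimate.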

      (Oh, and check the relevant section of the FAQ for a clear, step-by-step version of how DP works.)

      --
      Karma: Oldschool
    16. Re:You can't be serious by croddy · · Score: 2, Interesting

      well, that was fun. I think it would be more addictive if I got to do pages in order though...

    17. Re:You can't be serious by psylent · · Score: 1

      u guys really have no idea about grammar bad worse worst pls refer to elementary grammar books like the one written by Wren and Martin. And someone actually moderated me a troll.

    18. Re:You can't be serious by Nerull · · Score: 1

      Probably because 'worser' is a word. An old word that nobody uses anymore, but it's a word.

      Oh, and about the grammar thing...as they say, "People who live in glass houses shouldn't throw stones."

  3. Now for the marketing... by Blaine+Hilton · · Score: 4, Insightful

    Now all we need is more people promoting this in schools and printing the books, much like the IA Bookmobile. It seems like the people who could use this the most don't even know it exists.

    1. Re:Now for the marketing... by Googa · · Score: 1, Insightful

      Yes, I agree. We people here won't benefit from it half as much as needy school districts that could use the texts. Methinks what they really need is some kind of awareness program, distributing the books to teachers... or even just letting them know that such a resource exists. With more technology in the classroom, Gutenberg shouldn't be out of reach for many teachers.

    2. Re:Now for the marketing... by Anonymous Coward · · Score: 0

      Yes, because needy school districts have tons of computers and speedy internet connection.

    3. Re:Now for the marketing... by Googa · · Score: 0
      A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.

      My point is that if schools knew about this, they would realize that it would be cheaper in the long run to get texts from Gutenberg instead of buying pre-bound books elsewhere.

    4. Re:Now for the marketing... by AndroidCat · · Score: 2, Insightful
      and speedy internet connection

      The first Gutenberg books I came across were being passed around BBSs at 2400 bps or so. When they started 32 years ago, 110, maybe 300 bps. Who cares? Check the size of the files, these aren't Word documents, you know.

      --
      One line blog. I hear that they're called Twitters now.
    5. Re:Now for the marketing... by Anonymous Coward · · Score: 0

      The city of Atlanta spends $12,000 per student every year. If they choose to waste it instead of purchasing computers, then it is their own fault.

  4. All caps? by Anonymous Coward · · Score: 0, Funny

    They had AOL back then?

    1. Re:All caps? by squiggleslash · · Score: 1

      Of course, the other bit of evidence being that all the Declaration's signers signed it "ME TOO!!!!"

      --
      You are not alone. This is not normal. None of this is normal.
  5. doh by Anonymous Coward · · Score: 1, Funny

    Download Error

    You'll need to install and activate the current version of Microsoft Reader before you can download these Owner-Exclusive titles.

    Click here to get started now.


    No Linux version!? Gah.

    1. Re:doh by Anonymous Coward · · Score: 0

      try cat or less

  6. very timely for me by b17bmbr · · Score: 5, Interesting

    i am going to be teaching modern civ in high school next year (i have been at the junior high for 7 years), and have already gone to the site and gotten works from aristotle, plato, locke, montesquieu, et al. thanks guys. there is still something to be said for a classical education. glad somebody is doing all they can to preserve the classics, especially with all the assaults on them from the social reconstructionists.

    --
    My problem? I was perfectly gruntled, until some numbnuts came by and dissed me.
    1. Re:very timely for me by sketerpot · · Score: 1

      The Gutenberg people are doing all they can to preserve every book they can legally get their hands on. Personally, I'd like it if they could get their hands on some newer books.

  7. founding fathers by Tablizer · · Score: 4, Funny

    ...first sped an all-caps version of the Declaration of Independence to anyone and everyone then on what later became the web

    I knew it! This country was founded by COBOLers.

    1. Re:founding fathers by Anonymous Coward · · Score: 3, Funny

      ADD 1 TO POST-POINTS.
      MOVE "Funny" TO POST-STATUS.

      (That's Cobol, for those who don't know)

    2. Re:founding fathers by mayotte · · Score: 1

      I was thinking of something along the same lines. Imagine a public reading of that text, SHOUTED AT THE TOP OF HIS LUNGS.

    3. Re:founding fathers by Anonymous Coward · · Score: 0

      NO SHIT?

      GEE, HERE I THOUGHT IT WAS JAVA!

      Lameness filter encountered. Post aborted!
      Reason: Don't use so many caps. It's like YELLING.

  8. Really great work by the guys behind the project! by jaemark · · Score: 5, Interesting

    The real problem, though, is getting the word out to people, much as libraries today are struggling with declining popularity. A good idea would be a separate advocacy site with lists of texts in the project (e.g. What's New, Most Popular, etc.) to help people wade in immediately.

  9. More free books by Cruciform · · Score: 5, Informative

    The Baen Free Library has a number of titles available in several formats.

    It's a great way to introduce readers to a series or a talented new author.

    1. Re:More free books by Anonymous Coward · · Score: 0

      The best place (that I know of) to search for free books is The Online Books Page at the University of Pennsylvania. Also check out their list of some books (available online) that have been banned somewhere at some time - "Little Red Riding Hood" is one of the curiosities...

    2. Re:More free books by Night+Goat · · Score: 1

      Talented? I checked that out and all I saw were hacky sci-fi authors whose titles clutter the sci-fi section of my local bookstores. I wouldn't pay money for those books.

    3. Re:More free books by Anonymous Coward · · Score: 0

      I would recommend you read "Melancholy Elephants". I think it was written by Spider Robinson.
      That, and maybe get a soul.

  10. 'reader' books not much cheaper by Chmarr · · Score: 3, Insightful

    Just on a whim, I decided to see how much cheaper titles in Microsoft Reader format were than physical books.

    I went to the MS Reader site and followed the links to the on-line publishers sites (such as B&N and amazon). In most cases, the reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

    So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend.

    If they want to make the electronic formats more attractive, they need to make them a LOT cheaper than the corresponding paper version.

    1. Re:'reader' books not much cheaper by Jonathan · · Score: 2, Interesting

      So... why in the world would anyone want to use a format that ties them to the computer?? With a paperback, I can read it anywhere, read for as long as I want without having to change batteries, and even pass the book onto a friend

      Well, I don't use MS-Reader myself (For commercial e-books I like the cross-platform Mobipocket), but a major reason I like e-books is I like to read them on my PDA -- not to save money. I carry my PDA around anyway, and having e-books means less to carry. I would purchase all my books as e-books if they were available as such.

    2. Re:'reader' books not much cheaper by Joe+Tie. · · Score: 3, Interesting

      Someone else mentioned the fact that he's got a reader with him all the time anyway, which makes it pretty convenient to have a book or three in there. I'm not going to bring a book around with me everywhere I go just on the off chance that I might get stuck in a long line, or waiting for someone. But when such an event happens, having good reading material right at hand is very nice. Also nice is being able to have a selection of books in there at any one time, just in case I finish one book while waiting somewhere.

      Battery life isn't much of an issue for me. I've got an older ipaq, and even with that I can usually squeeze about ten hours out of it with the addition of an extra battery pack that's small enough to tote around with the pda. Hooking it up isn't much of an issue. Take out of pocket, plug into pda. And if at home, the power situation wouldn't be an issue.

      --
      Everything will be taken away from you.
    3. Re:'reader' books not much cheaper by donutello · · Score: 1

      I have never really used the Reader; however, one advantage of having the book electronically is being able to search it.

      --
      Mmmm.. Donuts
    4. Re:'reader' books not much cheaper by dissy · · Score: 1

      > and even pass the book onto a friend.

      Ahh, but you are forgetting: in the USA, you can't do that.
      Well, you can, but then you are violating copyright and are thus a criminal.

      The law specifically says you can not distribute a work that is copyrighted without the copyright holder's permission.

      The only reason it's not _illegal_ is because of fair use laws, but the DMCA removed most of those, and the next change in the law will no doubt remove most or all of the rest.

      It's only a matter of time if things don't start getting better soon.

      Software and music labels already go after people selling used copyrighted materials online (eBay and Amazon and such).
      Once it's in their best interests to do this to real-world stores, they will. And they will win there too.

      It's sad, and it sucks, and I hate it too.. but it's true :{

    5. Re:'reader' books not much cheaper by Anonymous Coward · · Score: 1, Informative

      Passing a hardcover or paperback book on to a friend is not a copyright violation in the U.S. and does not make you a criminal.

      The principle that protects you is not Fair Use, but First Sale Doctrine -- which says that once a copyright holder distributes a copy of a work, the copyright holder loses any right to control further redistribution of that copy.

    6. Re:'reader' books not much cheaper by Jerf · · Score: 1

      I went to the MS Reader site and followed the links to the on-line publishers sites (such as B&N and amazon). In most cases, the reader format is only $1 cheaper, and sometimes $2 more expensive, than the corresponding paper book (soft or hardcover).

      These facts being plainly obvious, the logical conclusion is either that A: The cost of setting up the Reader infrastructure is so high that these high prices must be charged to recoup them, or B: They want them to fail.

      I don't know which it is. But there comes a time where the choice "They don't realize how stupid this is" ceases to be an option, and I think this is one of those times. These are not stupid people, they are out to make a buck, and if they aren't making money directly, they either expect to make money in the future, or are making it indirectly.

    7. Re:'reader' books not much cheaper by mikeboone · · Score: 1

      I agree that the eBook prices are too high. I've settled for reading the classics on my Handspring Visor.

      Check out Plucker Books. These are Gutenberg books formatted for the Plucker reader.

      I still prefer a real book, but these come in handy when I'm feeding my infant son...bottle in one hand and Visor in the other.

    8. Re:'reader' books not much cheaper by Chmarr · · Score: 0

      Oh, I can certainly understand the convenience of having a good book stored on a device you're carrying around with you anyway, but... on a device specifically and exclusively designed for reading books?

      If you're carrying that around to read books... why not carry a paper book around? The ONLY advantage I see is being able to store more than one book on this device, but you then have all the disadvantages I've cited before. eBooks need to be a LOT cheaper to make them worthwhile over the convenience and flexibility of a paper book.

  11. Huh??? by lilricky · · Score: 2, Insightful

    "...to anyone and everyone then on what later became the web..." What?? In 1971 the HTTP protocol was around? Or is the author trying to suggest that the internet became the web? I thought the web was part of the internet, not a replacement for it. Perhaps I'm misreading the article.

    1. Re:Huh??? by dissy · · Score: 2, Insightful

      > "...to anyone and everyone then on what later became the web..." What??

      I think they are saying in 1971 it was distributed to anyone and everyone...
      Then, on what later became the web, they distributed it there too.

      Keep in mind the web ripped most of its ideas from gopher, and FTP before that, so the web wasn't a breakthrough idea out of nothingness.
      But I don't think they meant it as 'distributed on one medium which later turned into the web'.

      That's at least how I believe it was supposed to be read.. Hard to tell without commas and whatnot ;}

  12. XML please by DrXym · · Score: 3, Insightful

    Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult. Some format that preserved chapter headings, footnotes, illustrations etc. would be a massive step forward.

    1. Re:XML please by starseeker · · Score: 4, Informative

      I think they discuss this somewhere. The whole point of ASCII is that it can be accessed simply, by almost any machine. It is as stable a format as you will find for data storage, anywhere. They are committed to these books being widely readable, and ASCII is the best way to assure this.

      However, I agree that some books (most actually) lose something in ASCII. What I would like to see is a project which works off the basic Gutenberg texts and formats them in a readable way, preserves illustrations, etc. But it should be an add on to the project, not the main project. Also, remember that that level of preservation is much harder than just typing in and proofreading - you have to consider formatting and scanning images as well.

      As a temporary measure, it would be nice to see someone do an XML markup that can be easily translated into LaTeX, so people can have pdfs with nice fonts, table of contents, title page, etc. That would be a step up. But to do it properly would take a separate effort, and a very large scale one even by Gutenberg standards. Worthwhile, yes. But involved.

      --
      "I object to doing things that computers can do." -- Olin Shivers, lispers.org
    2. Re:XML please by Anonymous Coward · · Score: 0

      I totally agree. Once the format was standardized, developers could more easily create software to display or search the information in whichever way they choose. End users could then use different viewers depending on their intended use for the information.

    3. Re:XML please by DarkOx · · Score: 3, Informative

      The entire point of the project is to preserve the content in a format that is both human- and machine-readable. See, if fifteen years from now I don't have any of today's software and XML is long dead, I will still be able to read standard ASCII text, even if I am just cat(ing) it through less or printing it as is. I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful. I am not saying that it would be hard to write such software, but the concept is to make sure it's easy, and always easy, to get at the data. Also, they do put chapter breaks in as text, so if you want to find one, most word processors and e-book readers these days, even fifteen-year-old ones, can find text strings.
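      To illustrate the point about plain-text chapter breaks, a few lines of Python are enough to list them. This is a minimal sketch that assumes each heading sits on its own line in a form like "CHAPTER I" or "Chapter 12"; that convention is an assumption, since real Gutenberg etexts vary from book to book.

```python
import re

# Assumed heading convention: "CHAPTER" plus a Roman or Arabic numeral,
# optionally followed by a period, alone on its line (case-insensitive).
HEADING = re.compile(r"^\s*chapter\s+([ivxlcdm]+|\d+)\.?\s*$", re.IGNORECASE)


def find_chapters(text: str) -> list[tuple[int, str]]:
    """Return (line_number, heading) pairs for lines that look like chapter headings."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if HEADING.match(line):
            hits.append((lineno, line.strip()))
    return hits


sample = """THE TITLE

CHAPTER I

It was a dark and stormy night.

CHAPTER II.

The story continues here.
"""

for lineno, heading in find_chapters(sample):
    print(lineno, heading)
```

      A regex is only as good as the convention it assumes; headings like "BOOK ONE" or "Part the First" would need their own patterns.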

      --
      Repeal the 17th Amendment TODAY! Also Please Read http://www.gnu.org/philosophy/right-to-read.html
    4. Re:XML please by AndroidCat · · Score: 1
      a very large scale one even by Gutenberg standards

      I wonder. Does Gutenberg keep their sources in ASCII, or in something else that they run off to produce the final ASCII version? It might be that they already have formatting information that a smarter runoff process could use. (Heh, I can dream, right?)

      --
      One line blog. I hear that they're called Twitters now.
    5. Re:XML please by Eloquence · · Score: 4, Insightful
      I can't reasonably read a book that is filled with XML tags, and if there is no longer software to parse them then it's not too useful.

      This is complete bullshit. With a proper setup you would convert the source into multiple output formats, including TXT, but you would keep the source in a format that maintains meta information such as formatting, chapters and pages. XML is used in the entire industry exactly with the expectation that it will be around for decades. Even if it won't, the open source code that we have to parse it will not magically disappear -- PG would keep using it to generate output texts from the XML source through all these years. You might as well argue that ASCII will go away.

    6. Re:XML please by GigsVT · · Score: 2, Insightful

      With a proper setup you could read MS Word 2000 docs 100 years from now too. The whole point is to not make it reliant on any particular software, or any particular fad.

      XML hasn't been around long enough to say whether it is a fad or not. ASCII has been around longer than most of us have existed.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
    7. Re:XML please by Teancum · · Score: 4, Informative

      Michael Hart has repeatedly made mention that he does not want to get caught up into the fad of the moment with text formatting issues, and that plain old ASCII is one constant that hasn't needed changing. Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine. I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

      With that said, I believe that XML is perhaps going to have the staying power that ASCII text has had for the past many years. And there are many volunteer projects that you can get involved with that do this including:

      The HTML Writers Guild - Originally they were trying to convert all of the Gutenberg texts to HTML, which has admittedly been a reasonable standard for a good number of years. They are now moving to a version of XML with standard headings for titles, copyright info (or lack thereof), chapter headings and so forth. More is on their website.

      Project Gutenberg XML - This group is more dedicated to XML, but has a very similar purpose.

      The point here is that once the data is put into ASCII text format, projects like this can and are being done. If you really feel that you want to help with the effort, please join one of these. Also, at any time you can also take the Project Gutenberg files yourself and do this, but at least this gives you a forum to share your work once you are done.

    8. Re:XML please by belbo · · Score: 3, Informative

      The final ASCII version is also produced by hand. After two rounds of proofing, the text gets into a queue. From that queue, a 'post-processor' checks it out and reformats it according to the Gutenberg guidelines, along with any error corrections that might still be necessary. Then she or he uploads the final version to Project Gutenberg, where the 'whitewashers' check the text yet again before posting it to the archive.

      About the XML: You are in fact welcome to produce an XML version, I believe some fellows at DP indeed do that already. However, the main version is the simple text version, since you can read that with everything. But nothing keeps you from also posting an XML or PDF or TeX or whatever version.

      belbo, post-processor at DP

      (Boy I do hope there are no spelling errors in this *g*)

      --
      "Just believe everything I tell you, and it will all be very, very simple."

    9. Re:XML please by fm6 · · Score: 4, Insightful
      The whole point of ASCII is that it can be accessed simply, by almost any machine.
      Just because you store something in XML, doesn't mean people have to use XML to read it. The whole point of XML is to have a format that you can easily transform. Transforming into ASCII is particularly easy.
      XML markup that can be easily translated into LaTeX
      If it's a good content-oriented XML app, it's easily transformed into LaTeX, or anything else. If it isn't a good content-oriented XML app (the StarOffice native format comes to mind) then it shouldn't be used for an online document repository.

      I think the basic problem with the Gutenberg/DP people is that they've been doing things a certain way for so long, and they don't want to retool. And I can see their point -- changing over to XML is a lot of work. And the core DP team already seems pretty busy keeping the web site going.

      On the other hand, I do wish they'd make it a priority. Right now I'm a volunteer proofreader, concentrating on getting out the famous Britannica 11th edition. The amount of information that gets lost in scanning in Greek and other text with weird phonological conventions is just appalling. And the conventions for math and science formulas and equations produce a complex linear format I can't believe anyone would actually want to read.

      Then again, it wouldn't be that hard to go back and insert proper markup. For 90% of the text there's a simple transform between the Gutenberg conventions and a reasonable XML format. The other 10% probably need another look anyway, and wouldn't be hard to do if they've saved the scan images. I haven't had the heart to ask if they do.

    10. Re:XML please by Vann_v2 · · Score: 2, Insightful

      With some works the layout itself is an important part of comprehending them. To blindly remove the formatting so that everyone can read it is an injustice to the original author.

    11. Re:XML please by DrXym · · Score: 3, Insightful
      Yeah but the entire point of XML is that it defines structure not presentation. If you want to go off and produce something which is readable in some other format (e.g. text), feed the document through some XSL transformation or perl script and it pops out the other end in any way you desire. Someone else can feed it through something that produces a PDF, someone else a Palm e-Book, someone else braille. And this can all be automated on the server. Everyone is happy.


      As for XML being long dead, this is highly unlikely. XML is just structured data and is itself just text. It would be trivial 5, 10, or even 100 years from now to pull out the data from the xml format in any way you please. Unless the grammar is horribly mangled (MS Office), it would even be possible to infer it without even knowing the grammar. I would trust Gutenberg to collectively come up with a format which would be simple for proof readers and parsers alike.
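      A minimal sketch of the "one structured master, many outputs" idea, using Python's standard xml.etree.ElementTree in place of the XSL or Perl mentioned above. The element names (book, title, chapter, heading, p) are hypothetical, not any actual Gutenberg or DP schema, and the LaTeX output does no escaping of special characters such as & or %, which a real converter would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical master-copy markup; element names are invented for this sketch.
SOURCE = """<book>
  <title>A Sample Etext</title>
  <chapter>
    <heading>CHAPTER I</heading>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </chapter>
</book>"""


def to_plain_text(root: ET.Element) -> str:
    """Title in capitals, then each heading and paragraph, separated by blank lines."""
    blocks = [root.findtext("title", default="").upper()]
    for chapter in root.iter("chapter"):
        blocks.append(chapter.findtext("heading", default=""))
        blocks.extend((p.text or "").strip() for p in chapter.findall("p"))
    return "\n\n".join(blocks) + "\n"


def to_latex(root: ET.Element) -> str:
    """Same structure as a minimal LaTeX book (no escaping of special characters)."""
    lines = [
        r"\documentclass{book}",
        r"\title{" + root.findtext("title", default="") + "}",
        r"\begin{document}",
        r"\maketitle",
    ]
    for chapter in root.iter("chapter"):
        lines.append(r"\chapter*{" + chapter.findtext("heading", default="") + "}")
        for p in chapter.findall("p"):
            lines.append((p.text or "").strip())
            lines.append("")  # a blank line ends a LaTeX paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


root = ET.fromstring(SOURCE)
print(to_plain_text(root))
print(to_latex(root))
```

      The structure lives only in the XML; each output format is a small function over it, so adding a PDF, Palm or braille target would mean writing one more renderer rather than re-proofreading the text.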

    12. Re:XML please by DrXym · · Score: 1
      The thing is, XML is just plain ascii too (assuming you mandate not to use Unicode or some weird charset), so therefore you're not reducing the ability of people to read the text. At worst they'd be inconvenienced by extra tags if they tried to read it raw, but then again they wouldn't have to.


      The reason is that XML is easily translatable into just about anything else the grammar allows for. So I don't see that it would make any difference to the project goals if the 'master copy' of every document were in XML and a plain ASCII transform were immediately produced and kept in sync with it. People could still grab a .txt file if they wanted, but those of us who want to read something on a Palm Pilot, or comfortably in a browser, would be able to do just that.

    13. Re:XML please by AndroidCat · · Score: 1
      It seems a lot of work. (From here at a distance.) I know that even when they started there were tools for that sort of work. (I just found my DTSS RUNOFF*** manual, whee!)

      Ah well, if they have a standard way of formatting ASCII text, then producing an XML version from it shouldn't be too daunting. (Me, once again from a distance.)

      But an automated translation to Klingon, priceless! (I'm joking, that would be daunting!)

      --
      One line blog. I hear that they're called Twitters now.
    14. Re:XML please by andrewjjenkins · · Score: 1

      They scan anyways - the proofreaders compare the ASCII version to the scanned image of a page to make sure they match.

    15. Re:XML please by Anonymous Coward · · Score: 0
      In the works, and has been for a while. I have just released my vision paper as to where Distributed Proofreaders (DP) is headed and where we would like to take Gutenberg in the future.

      Conversion on the fly to various formats is a major goal.. but first we need a good source of high-quality marked up etexts. To create this source we are going to be doing some re-working of the processes at DP.

      You can read my paper here (http://www.pgdp.net/vision/)

      And comment on it in the DP forums (http://www.pgdp.net/phpBB2/viewforum.php?f=4) (yes, you must make an account to post).

      Charles Franks
      Founder, Distributed Proofreaders

    16. Re:XML please by Q+Who · · Score: 1

      Great idea!

      I am sure it will be seriously considered... after, say, 25 years.

      (If there is still XML)

    17. Re:XML please by fm6 · · Score: 4, Insightful

      ... that plain old ASCII is one constant that hasn't needed changing.

      I think you're a little unclear as to what ASCII is. As the "A" in "ASCII" indicates, it's oriented towards American applications. And it consists of a mere 128 characters, which include 33 control characters (codes 0-31 and 127) that you don't use in text.
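      For reference, the counts are easy to check. A quick sketch using only Python's standard library:

```python
# ASCII assigns code points 0 through 127: 128 characters in all.
ASCII = [chr(code) for code in range(128)]

# Control characters are codes 0-31 plus DEL (127). Python's isprintable()
# treats these (Unicode category Cc) as non-printable and treats the ASCII
# space as printable, so it splits the table the same way.
controls = [ch for ch in ASCII if not ch.isprintable()]
printable = [ch for ch in ASCII if ch.isprintable()]

print(len(ASCII), len(controls), len(printable))  # 128 33 95
print(repr(printable[0]), repr(printable[-1]))    # ' ' '~'
```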

      In point of fact, Project Gutenberg has long outgrown the 95 printable characters in ASCII, though I think they themselves are unaware of the fact. They seem to have experimented with characters until they found a set that displays the same on "normal" Windows, Macs and Unix/Linux. The result is something they call "extended ASCII", but that's actually a subset of both ISO's Latin1 character set and Microsoft's Latin1 code page.

      When is this an issue? Well, I'm a DP volunteer, and I'm concentrating on the Britannica 11th edition. Lots of geographic entries, all of which contain degree symbols. This symbol is not in ASCII! If you follow the DP instructions, you end up entering byte 186 (decimal). If you're using the ISO or Microsoft Latin1 set (and if your computer is localized for the U.S., Canada, or Western Europe, you probably are), then 186 displays as º, which looks just like a degree symbol. But if your system is localized for Eastern Europe, you're probably using Latin2, and this byte stands for an s with a cedilla!

      In short, "ASCII" is actually less universal than well-formed HTML, in which you represent the degree symbol with a character entity (&deg;) that's the same everywhere.
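      A small Python sketch of the byte-186 problem described above, using the standard library's codecs. Strictly speaking, byte 186 in Latin-1 is the masculine ordinal indicator (U+00BA), a look-alike; the real degree sign is byte 176:

```python
import html
import unicodedata

BYTE_186 = bytes([186])  # the byte the comment says the DP instructions call for

# Latin-1 and Windows-1252 (Western Europe) vs Latin-2 and Windows-1250
# (Central/Eastern Europe): same byte, different character.
CODECS = ("latin-1", "cp1252", "iso8859_2", "cp1250")
decoded = {codec: BYTE_186.decode(codec) for codec in CODECS}

for codec, ch in decoded.items():
    print(f"{codec:10} U+{ord(ch):04X} {unicodedata.name(ch)}")

# The actual degree sign is U+00B0, which is byte 176 in Latin-1.
DEGREE_SIGN = "\u00b0"
print(DEGREE_SIGN.encode("latin-1")[0])  # 176

# An HTML character entity names a character rather than a byte,
# so its meaning does not depend on the reader's code page.
print(html.unescape("&deg;") == DEGREE_SIGN)  # True
```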

      Indeed, you can open up the original Declaration of Independence document with your standard web browser, and you can still read it just fine.

      Hardly a representative example. The Declaration of Independence was hand-written, and thus doesn't include a lot of fancy fonts or formatting. A better example is a contemporary novel, such as 1984.

      As it happens I just finished re-reading this one. I read a Plucker file that somebody had transformed from an HTML version, which in turn came from the Project Gutenberg "ASCII" version. Readable enough. But all the typographic niceties -- italics, boldface, etc. -- were reduced to ALL CAPS in the text version, and that was retained in the HTML version. Pretty distracting -- made me feel like somebody was shouting at me. Double Plus Ungood! Thoughtcrime!

      ...once the data is put into ASCII text format, projects like this [XML] can and are being done.

      You make it sound easy. A lot of information is lost when your primary version is "ASCII". It all has to be put back by hand. There's no avoiding this for the large body of existing Gutenberg texts. And of course as recently as 5 years ago, there wasn't a real choice anyway. Even HTML had issues, and serious XML tools didn't exist.

      But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML. If people still want "ASCII" copies, the XML is easily transformed into that. Though I think a lot more people will want the HTML version -- a format which is actually accessible to more people than "ASCII".

      There are two reasons this won't happen soon.

      The first is that somebody will have to design and implement the necessary XML apps for inputting and proofreading the texts. (Which would also eliminate a lot of the errors proofreaders make, like entering [Greek: Tau] when they mean [Greek: T].) A huge project. As it stands, the people who maintain the DP web site have their work cut out just to keep the existing software working. That's a vali

    18. Re:XML please by jeremyp · · Score: 2, Insightful

      Using ASCII presupposes that all the important texts you want to preserve are in American English. Since a fair number of important works of literature come from mainland Europe (actually, even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.

      Further, authors often use devices like italics or bold to add emphasis to their work and nowadays even completely different fonts and typefaces. Translating these works to ASCII with no markup actually destroys some of the information in the original works.

      I'm not an enthusiastic fan of XML - too many people advocate it as a silver bullet - but this sort of thing seems to be an ideal application.

      --
      All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    19. Re:XML please by Mr.+Piddle · · Score: 2, Insightful

      You might as well argue that ASCII will go away.

      ASCII is simply 128 characters (256 or so with the 8-bit extensions). Writing software to translate it is trivial, and it can even be decoded by hand, if necessary.

      XML adds a lot of complexity beyond this, which hampers a person's ability to read a file with practically no software tools.

      Also, XML is not as ubiquitous as you think, and huge numbers of people don't know how to use the tools to work with it.

      --
      Vote in November. You won't regret it.
    20. Re:XML please by che.kai-jei · · Score: 1

      erm, i may sound stupid, not being literate or terribly clever, but has anyone mentioned the elibrary client? http://home.cfl.rr.com/ln3gs/ it's ideal for lazy slobs like myself. my downloaded data has never been sorted.. but all my bookish files are in one place... almost... well, it appears to use texty ascii files okay..

    21. Re:XML please by Zeinfeld · · Score: 1
      Gutenberg is great and all, but it really needs to dump the text format. So much information is lost that it makes reading some texts extremely difficult.

      The guy that runs the scheme is a bigot on this issue. He has some weird issue with the Web as a competitor to what he sees as his domain.

      Use of a lightweight markup of any sort would improve the value of the texts. Even if they invented their own markup it would be an improvement.

      Archeologists have managed to decipher the Mayan hieroglyphs, even Linear B. Yet we still have people making the idiotic claim that HTML should be avoided lest people forget how to read it.

      When I first used HTML I had to reverse engineer it from other web pages because there wasn't a manual yet. It took me all of about ten minutes to have H1..H6, P and UL figured out.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    22. Re:XML please by gotem · · Score: 2, Funny

      I dare you to try and find any other data format that was commonly used 32 years ago that you can still read with current equipment.

      punchcards.. what? you mean you don't have your punchcard reader connected?

    23. Re:XML please by dvdeug · · Score: 1

      In point of fact, Project Gutenberg has long outgrown the 95 printable characters in ASCII, though I think they themselves are unaware of the fact.

      Then I invite you to actually take a look at some of the texts. The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1, and the Gutenberg index file lists some posted in Unicode and CP850 (Polish) and other formats.
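      The 7foo.txt/8foo.txt split comes down to whether a file contains any byte above 127. A hypothetical Python helper for checking that (the function names and the "7"/"8" rule are illustrations, not PG's actual tooling):

```python
from typing import Optional


def first_non_ascii(data: bytes) -> Optional[int]:
    """Return the offset of the first byte above 127, or None if there is none."""
    for offset, value in enumerate(data):
        if value > 127:
            return offset
    return None


def edition_prefix(data: bytes) -> str:
    """Hypothetical naming rule: '7' for pure 7-bit ASCII, '8' for 8-bit text."""
    return "7" if data.isascii() else "8"


sample = "Caf\u00e9 au lait".encode("latin-1")
print(edition_prefix(sample), first_non_ascii(sample))  # 8 3
```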

      But now XML technology is pretty mature. It makes sense to store new Gutenberg texts in XML

      Except for the fact that everyone is familiar with plain text, but very few of our editors are familiar with XML.

      The second is that the old Gutenberg hands have been doing things a certain way for 32 years, and don't want to change. That's not a valid reason.

      I take it if they came out with DVD 2.0, which has 6 gigabyte disks and uses MPEG-4 instead of MPEG-2, you'd immediately run out to buy new drives, movies and players? Conversion from one format to another is a complex, time-consuming and problem-causing procedure; changing before you're completely ready and before there's a large proven benefit isn't good.

    24. Re:XML please by dvdeug · · Score: 2, Informative

      And the conventions for math and science formulas and equations produces a complex linear format I can't believe anyone would actually want to read.

      It's basically TeX, the one true math typesetting system. Most mathematicians and many scientists know it quite well. It beats the heck out of MathML (one example in a MathML tutorial was 8 characters in TeX, and about 50 in MathML.)
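      To make the size difference concrete, here is one illustrative formula (not the tutorial example mentioned above) written in TeX and in presentation MathML. The MathML namespace declaration is left out, which only flatters MathML's character count:

```python
import xml.etree.ElementTree as ET

# x squared plus y squared equals z squared.
TEX = "x^2+y^2=z^2"

MATHML = (
    "<math>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "<mo>+</mo>"
    "<msup><mi>y</mi><mn>2</mn></msup>"
    "<mo>=</mo>"
    "<msup><mi>z</mi><mn>2</mn></msup>"
    "</math>"
)

ET.fromstring(MATHML)  # raises ParseError if the markup is not well-formed
print(len(TEX), len(MATHML))  # 11 132
```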

    25. Re:XML please by Anonymous Coward · · Score: 0

      I've proofed about 300 pages on DP and have come to be disappointed that the extensive work involved in producing a text achieves a rather uninspiring, 'dumb' ASCII result. Sometimes, if you're lucky, a link to a zipped HTML version also appears. Why you'd ZIP an HTML file that you're offering on a web site is beyond me.

      I am not that interested in continuing DP'ing until an XML framework is implemented. You know, we're doing a lot of mark-up anyway--italicizing words with HTML that get translated back to _this_ most of the time--why not use standards-based mark-up??
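      A sketch of what standards-based mark-up could buy: one source paragraph rendered both to the _underscore_ plain-text convention and to HTML. The p/i element names are an assumption for illustration, not DP's actual format:

```python
import xml.etree.ElementTree as ET

SOURCE = "<p>It was <i>not</i> the best of times.</p>"


def render(element: ET.Element, italic_open: str, italic_close: str) -> str:
    """Recursively render an element, wrapping <i> children in the given markers."""
    parts = [element.text or ""]
    for child in element:
        inner = render(child, italic_open, italic_close)
        if child.tag == "i":
            parts.append(italic_open + inner + italic_close)
        else:
            parts.append(inner)  # other tags: keep the text, drop the tag
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


para = ET.fromstring(SOURCE)
print(render(para, "_", "_"))         # It was _not_ the best of times.
print(render(para, "<em>", "</em>"))  # It was <em>not</em> the best of times.
```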

      XML is a meta-format. It produces formats. The idea that ASCII is universal doesn't touch the argument in favor of XML. Really, is the XML 'source' ever going to be divorced from the ASCII version? This is like offering a distribution in source and binary--users will pick whichever they prefer.

      Sending all our work back to ASCII--losing all the meta-data we created or could have created--is really counterproductive and has dulled my interest in this.

    26. Re:XML please by Zeinfeld · · Score: 1
      Sometimes, if you're lucky, a link to a zipped HTML version also appears. Why you'd ZIP an HTML file that you're offering on a web site is beyond me.

      The point is to make the plaintext version the accessible one and hide the HTML that someone produced.

      We get exactly the same attitude from the IETF, which also has a weird plaintext fetish. You can submit drafts in plaintext and also PostScript, but not HTML.

      The reasons given make absolutely no sense. It is pretty easy to verify that an HTML text complies with a DTD or schema using well-known tools, so it is easy to make sure that documents do not use poorly supported features or appear differently in different browsers. The probability that knowledge of interpreting HTML will be lost is NIL.

      --
      Looking for an Information Security student project suggestion?
      Try http://dotcrimeManifesto.com/
    27. Re:XML please by Anonymous Coward · · Score: 0

      >> I can't reasonably read a book that is filled with XML tags and if there is no longer software to parse them then it's not too useful.

      >This is complete bullshit.

      No, it's not. Anything other than ASCII is a Huuuggee effort. "Huh?" you say?

      Is XML universally supported?

      Think so?

      Try opening XML in every word processor out there. Will it open it as tagged text or interpret it as metadata?

      Will all interpreters interpret the data the same way? If yes, are you sure about that?

      If the meaning behind the XML metadata is not universally understood, then XML is pointless. The fact that XML is not proprietary and easily changed is irrelevant: five years from now (when maybe they are 30% of the way through backporting old text to some now-obsolete spec of XML), something ELSE comes along and makes the whole thing redundant or obsolete. Oops.

      Mind you, if the texts that are "complete" are relatively static, I don't see anything wrong with maintaining external "XML diffs" that preserve otherwise lost metadata.
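      One way to read the "external XML diffs" idea is standoff annotation: leave the finished plain text untouched and keep the markup as character offsets in a separate file. A hypothetical sketch (the Span type and apply_spans helper are illustrations, and spans are assumed not to overlap):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # offset of the first marked character in the plain text
    end: int    # offset just past the last marked character
    tag: str    # element name to wrap the span in, e.g. "i"


def apply_spans(text: str, spans: list) -> str:
    """Rebuild a marked-up version of text from non-overlapping spans."""
    result = text
    # Work from the end of the text backwards so that inserting tags
    # never shifts the offsets of spans that have not been applied yet.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        result = (
            result[:span.start]
            + f"<{span.tag}>" + result[span.start:span.end] + f"</{span.tag}>"
            + result[span.end:]
        )
    return result


TEXT = "It was not the best of times."  # the archived plain text, unchanged
SPANS = [Span(7, 10, "i"), Span(0, 2, "b")]  # the external "diff"

print(apply_spans(TEXT, SPANS))  # <b>It</b> was <i>not</i> the best of times.
```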

      Are you questioning the storage format as a way of requesting an XML conversion? Or are you seriously going to contribute to such an effort?

      I don't disagree with your sentiment that XML would be "better", but there are things like legacy data and software maintenance that need to be considered.

      This wouldn't be a problem if they could get government funding... just hire someone to do the conversion work no one has time for.

      Political rant: The fact remains the US Government would rather give billions on anti-drug ads to for-profit networks, than target a few million to a non-profit media project like this. If only these guys could lobby, but that also takes money. And yes, it's no different under either party (chickenhawks or chickenshits they both love the drug pork money).

    28. Re:XML please by sketerpot · · Score: 1
      Ah well, if they have a standard way of formatting ASCII text, then producing an XML version from it shouldn't be too daunting. (Me, once again from a distance.)

      I would suggest reStructuredText, which doesn't look like markup but is.

    29. Re:XML please by dvdeug · · Score: 1

      Why you'd ZIP an HTML file that you're offering on a web site is beyond me.

      Because one common reason to do an HTML edition is pictures, and the system is set up to have one file per document.

      we're doing a lot of mark-up anyway

      Italics is not a lot of markup. XML calls for a lot of details that would take work. How many books have you post-processed? They accept XML; why don't you find out how hard it is to make an XML edition first hand?

  13. Oh, who reads books anymore anyway? by Faust7 · · Score: 4, Funny

    I absorb all information directly through a USB link from my laptop to my head. Pretty nice, except for the typographical migraines. I always have ibuprofen in hand when visiting Slashdot.

    1. Re:Oh, who reads books anymore anyway? by Stephen+Gilbert · · Score: 1

      Man, that's going to be a nasty upgrade to USB 2...

  14. cool by Anonymous Coward · · Score: 0

    way to go /. This publicity is sure to help the project. Those who haven't got accounts can start helping or at least consider it. There are bound to be a few people with extra time on their hands to kill who haven't heard of distributed proofreading and are willing to do it.

  15. it's all lost and stoof by shadowbearer · · Score: 3, Funny

    I like what happens when you run across a title which isn't on the site.

    Example: "It's not there, eh? -- Canadian"

    Heh.

    SB

    --
    It's old. The more humans I meet, the more I like my cats. At least they are honest.
    1. Re:it's all lost and stoof by Rassendyll · · Score: 1
      Yup! Quite amusing, my favourite is the "Newfinese" one:

      Lawrd tunderin' Jesus, bye, it tidn't dere!

      Hehheh...

      --
      An eye for an eye... leaves the whole world blind.
  16. Too bad... by Insurgent2 · · Score: 5, Interesting

    Unfortunately, with the copyright periods being extended so long, the material will only be of (ancient) historical interest. The 98 percent of copyrighted works that are unpublished and should be on there, unfortunately, get to sit collecting dust instead of benefitting mankind.

    1. Re:Too bad... by Anonymous Coward · · Score: 0

      This points up why we need to institute "Intellectual Property Taxes". Copyrights on intellectual property can be held out of the public domain as long as some minimal taxes are paid. Similar to real property, if there is no benefit to the owner, he will sell it or revert it to the public domain. Presently the owner has no incentive to revert material to the public domain. There is no excuse for material to languish unused.
      I would propose that the property tax schedule be based on a self appraisal of the value. The owner sets a price on its value and he must honor any buyer who meets the price.

  17. You can't even make an AOL joke? by Anonymous Coward · · Score: 0

    On Slashdot? Sheesh.

  18. Business Model by AndroidCat · · Score: 1, Interesting
    1. Gather great PD books.
    2. Hard work to put them in computer form.
    3. ????
    4. Profit! (For all humanity.)

    Hip-Hip-Hooray for a job well done!

    --
    One line blog. I hear that they're called Twitters now.
  19. What the?? by Pave+Low · · Score: 0, Flamebait

    Nice..another opportunity to take an undeserved potshot at Microsoft for no apparent reason. Doesn't it ever get old?

    Newsflash: Microsoft is not trying to promote literacy or freedom. They are trying to make money, like just about every other business.

    If you want to criticize their Reader/ebook business go ahead, but it's rather petty that the submitter had to attach it to a completely unrelated story. Instead of more information and background about Project Gutenberg, we get this crap.

    --
    SIG:Slashdot: indymedia for nerds.
  20. MS Reader is crapola by blair1q · · Score: 2, Interesting

    "cannot open this title on a Terminal Services session"

    What bollocks. Free software and free books but you can't read them over a network link to your own compute server? Microsoft, as usual, screws the pooch.

    Now. How do I uninstall this without removing my adenoids?

  21. pot...kettle...black... by Anonymous Coward · · Score: 0

    While we're on the subject of attaching criticism and potshots to unrelated stories, maybe you should check your sig.

    1. Re:pot...kettle...black... by Anonymous Coward · · Score: 0

      it looks like someone here doesn't know what a sig is.

  22. Ptui! by usotsuki · · Score: 1

    This is why copyrights shouldn't be more than 25 years.

    I say, make 'em 10 years renewable up to 50 (and non-transferable).

    If only there were more works there like, er, hmm, Roald "Charlie & the Chocolate Factory"/"Matilda"/"The Witches" Dahl. :}

    Meh, well, better than nothing. Too bad though they don't have the Tomson New Testament of 1576.

    -uso.

    --
    Dreams, dreams, don't doubt dreams, dreaming children's dreaming dreams. Sailor Moon SS
    1. Re:Ptui! by jc42 · · Score: 1

      I say, make 'em 10 years renewable up to 50

      Even better is the suggestion that anything out of print becomes public domain.

      Copyright holders shouldn't be able to use their copyright to make something inaccessible to the rest of us.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    2. Re:Ptui! by usotsuki · · Score: 1

      I was about to say that but my mind seized up. Anything out of print for 1 year should lose its copyright; use it or lose it.

      -uso.
      That may mean Messy Dog ;)

      --
      Dreams, dreams, don't doubt dreams, dreaming children's dreaming dreams. Sailor Moon SS
    3. Re:Ptui! by raymondbesse · · Score: 1

      ------
      I say, make 'em 10 years renewable up to 50

      Even better is the suggestion that anything out of print becomes public domain.
      ------

      "out of print" for how long? If the term is short, tell how you avoid this business model (for publishers):

      1. Acquire a work.
      2. Publish a limited quantity.
      3. Stop the presses.
      4. Wait for the work to go "out of print"
      5. At that moment, but before anybody else can set it to type, crank out as many as you can sell, with no obligation to pay anything to the author.
      -----
      Copyright holders shouldn't be able to use their copyright to make something inaccessible to the rest of us.
      -----

      But "out of print" is much more subject to abuse by copyright holders than "10 years renewable up to 50". Let's assume the copyright holder wants to prevent distribution of his work. His "business model" would be:

      1. Set up a dummy publisher.
      2. Print one copy just often enough to keep the work from becoming "out of print"
      3. Arrange for the private sale and destruction of the work.
  23. Re:Really great work by the guys behind the projec by Cthefuture · · Score: 3, Insightful

    Yes, they need something like that badly.

    I remember poking around on PG not long ago but soon forgot about it.

    If you're not looking for something specific then the site is kinda, meh. As you suggested, they need a news site, ratings, and other stats so you can see what's available.

    And sections. "Technical", "Poetry", etc. Otherwise it's not very useful to the casual browser.

    --
    The ratio of people to cake is too big
  24. Greenstone by gmaestro · · Score: 4, Interesting

    Great to see a project like this run on Free software. Read more at Greenstone's website.

  25. Are you saying XML will be dead in 15 years? by egg+troll · · Score: 1, Troll

    Heh, XML: The BSD of Markup Languages! :)

    --

    C - A language that combines the speed of assembly with the ease of use of assembly.
  26. Re:YOU FAIL IT!! by Anonymous Coward · · Score: 0

    Just because your dad likes to dress like an archaic señorita, doesn't make him the Queen of Spain.

  27. Mac disclaimer on PG files by ArsSineArtificio · · Score: 1, Troll

    From the disclaimer/header on Project Gutenberg files:

    If you have an FTP program (or emulator), please
    FTP directly to the Project Gutenberg archives:
    [Mac users, do NOT point and click. . .type]


    Given that a) Macs, being Unix-based, have command-line FTP like everybody else and b) the idea of a point-and-click interface has now passed so far from being a bizarre and contemptible innovation that lots of people are trying hard to develop nice-looking Linux GUIs... ... isn't this snarky instruction now more than a little dated?

    ASA

    --
    All employees must wash hands before seeking equitable relief.
  28. What would Captain Kirk say? by Anonymous Coward · · Score: 0

    Imagine! What if he reads out the Yangs' Holy text, and "We The People" isn't in at least bold text?

  29. We should all actually read this by tie_guy_matt · · Score: 4, Insightful

    Putting a flag on your front porch is a great way to celebrate the 4th of July. An even better way to celebrate the United States' birthday would be to go to this site and actually read the documents that define us as a country.

    In this day and age, when it seems everyone is a suspected terrorist and our liberties are stripped one by one in the name of homeland security, and in the name of the rights of large companies, I wish some of our elected officials would actually read these documents sometime.

    A red, white and blue flag isn't what makes this country great, nor does an extremely high gross domestic product -- it is the set of ideas that were written over 200 years ago that makes the USA great.

    So everyone go to this site and read those documents. Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

    1. Re:We should all actually read this by Anonymous Coward · · Score: 0
      Tell me who's the real patriots

      The Archie Bunker slobs waving flags?

      Or the people with the guts to work

      For some real change


      Read more.

    2. Re:We should all actually read this by Uncle+Dick · · Score: 0

      If anyone needs to read the Constitution of the United States, it's not our elected officials but rather those who have been appointed to the highest court in the land. Some of the recent majority opinions by the SCOTUS make me wonder how well versed activist judges are in the sacred documents that founded this country.

      --
      END OF LINE
    3. Re:We should all actually read this by Anonymous Coward · · Score: 0

      Even if you aren't American you should still read those documents because everyone has the right to the freedoms that our founding fathers wrote about.

      you Americans, you think you invented freedom :P

  30. using XML doesn't prevent using ASCII by Trepidity · · Score: 1

    One of the advantages of XML is that it's very easily transformable. If Project Gutenberg were to produce XML texts, it'd be trivial for them to automatically convert them to plain ASCII and make that version available as well.

    1. Re:using XML doesn't prevent using ASCII by Anonymous Coward · · Score: 0

      >One of the advantages of XML is that it's very easily transformable.
      >If Project Gutenberg were to produce XML texts, it'd be trivial for
      >them to automatically convert them to plain ASCII and make that
      >version available as well.
      >
      >
      If you're going to be converting the XML to plain ASCII why bother to use XML to begin with? Sounds like a huge waste of time and effort to produce the XML version.

  31. Speaking of XML markup by Moderation+abuser · · Score: 2, Interesting

    http://www.conglomerate.org/

    Lovely bit of kit.

    --
    Government of the people, by corporate executives, for corporate profits.
    1. Re:Speaking of XML markup by obotics · · Score: 1
  32. Thanks for support, plans for future by gbnewby · · Score: 5, Informative
    Thanks to everyone who has helped contribute eBooks and other support to Project Gutenberg! If you haven't already, please visit Distributed Proofreaders and proof a page today!

    Lots of plans for the future:

    • Post-#10000 formatting changes. We'll be rearranging our directories to make it easier to find things. Likely we'll go with something OAI (OpenArchives.org) compliant
    • Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.
    • New ways to donate. "Sponsor a book"
    • More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (e.g., Doctorow's Down and Out)
    • Your ideas! Visit gutenberg.net to sign up for newsletters, find out how to get started producing an eBook, and find eBooks


    Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.



    Dr. Gregory B. Newby

    Chief Executive and Director

    Project Gutenberg Literary Archive Foundation
    http://gutenberg.net

    A 501(c)(3) not-for-profit organization with EIN 64-6221541

    gbnewby@pglaf.org

    1. Re:Thanks for support, plans for future by femto · · Score: 1
      > We'll be putting eBooks into XML format

      This one has my vote. Good move! Thanks for running PG.

      I know it is complicated, but is it worth also publishing a style sheet for each work, which can be used to replicate the 'look and feel' of the original? It shouldn't interfere with the aims of readability, as one is free to ignore the style sheet and just read the raw XML or text file.

      (from a Distributed Proofreader)

  33. A decent fast scanner? by Anonymous Coward · · Score: 0

    Does there exist a decent FAST scanner using free software that runs on GNU/Linux or *BSD?

    Especially when you only need to scan text, it seems that every scanner on the market takes > 10 seconds per page.

    Where are the 1-3 second scanners? What do PG volunteers use?

  34. Is that full-speed or hi-speed USB? by Anonymous Coward · · Score: 0

    Is that full-speed or hi-speed USB?

  35. First sale doctrine by yerricde · · Score: 3, Informative

    The law specifically says you can not distribute a work that is copyrighted without the copyright holders permission.

    True, 17 USC 106 says that, but it limits itself "Subject to sections 107 through 121", such as 17 USC 109:

    Notwithstanding the provisions of section 106(3), the owner of a particular copy or phonorecord lawfully made under this title, or any person authorized by such owner, is entitled, without the authority of the copyright owner, to sell or otherwise dispose of the possession of that copy or phonorecord.

    fair use laws, but the DMCA removed most of those

    From the DMCA: "Nothing in this section shall affect rights, remedies, limitations, or defenses to copyright infringement, including fair use, under this title."

    --
    Will I retire or break 10K?
  36. Re:Really great work by the guys behind the projec by Anonymous Coward · · Score: 2, Interesting

    Want to know what's new, etc? The Project Gutenberg website admittedly sucks, and their ASCII adherence admittedly verges on dogma, but there is a good substitute:

    The Online Books Page
    http://digital.library.upenn.edu/books/

    It currently has 20,000 FREE titles listed, from hundreds (at least!) of sources, in all subjects, beautifully categorized by title, author and subject--and topped off by an up-to-date what's new listing and a fine search engine. Much props to John Mark Ockerbloom and the University of Pennsylvania for supporting the site.

    P.S. Won't one of you nice Slashdotters with time or interest in good works consider doing a complete redesign of the PG site, a full-text on-site search engine for the texts, a better categorization system and just a decent, half-respectable look? It don't get no respect lookin' as it does now. Among other things, the lack of internal organization means that individual texts get shafted in Google rankings.

  37. Cheaper, but useful? by yerricde · · Score: 3, Insightful

    A speedy internet connection and tons of computers wouldn't be needed to print out documents from Gutenberg.

    It still costs money to turn downloaded digital copies of works into printed copies for 100 students in a grade level.

    they would realize that it would be cheaper in the long run to get texts off Gutenberg, instead of buying pre-bound books elsewhere.

    Public domain etexts, such as those offered by Project Gutenberg, would be useful in schools only under limited circumstances. Though they would be useful in literature classes in high school (and possibly middle school), forget about them in elementary school, where most books are illustrated, because most PG editions leave out illustrations. Forget about them in science classes as well; the 1911 Encyclopaedia Britannica contains outdated views of anything scientific, and anything significantly newer is tied up forever in the Bono Act and its obligatory sequels. And what keeps a publisher from tying purchases of its science books to purchases of its literature books?

    --
    Will I retire or break 10K?
    1. Re:Cheaper, but useful? by Anonymous Coward · · Score: 0

      And what keeps a publisher from tying purchases of its science books to purchases of its literature books?

      The Sherman Act (anti-trust statute).

  38. Eh? The text is in ASCII but requires JavaScript? by WuphonsReach · · Score: 1

    So I try to go to http://www.pgdp.net/ - only to find out that the page won't load unless you enable JavaScript!

    Um... I thought PG was all about not using the latest bells and whistles? (semi-facetious)

    --
    Wolde you bothe eate your cake, and have your cake?
  39. XML is not a file format by Anonymous Coward · · Score: 0

    How can XML have staying power? It isn't a file format.

    It is essentially a meta-format. You can put any tags in there you want. And that's the problem with it. Same problem as TIFF. Anyone can generate one, but few can read others' files, because to do so you need to understand every tag that could possibly be in there.

    And since the format is so flexible, people create new tags every day. So programs written a year ago have zero chance of understanding a file. Just like TIFF.

    If Gutenberg were to switch to anything it should be RTF, it's been around 10 years and still going strong.

  40. how about HTML? by Anonymous Coward · · Score: 0

    HTML preserves formatting and illustrations, and being an ASCII format, it is recoverable even if one doesn't have an HTML browser.

  41. [sigh] by Trepidity · · Score: 1

    The point is that many of us would prefer an XML version. The argument against this was that ASCII is a longer-lasting archive format. My counter-argument was that an ASCII version can trivially be produced from the XML both for archival purposes and for those who would prefer such a version.
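    To illustrate the claim that an ASCII edition can be derived mechanically from XML, here is a minimal Python sketch. The element names (book, chapter, p, i) are hypothetical; no actual PG XML schema is assumed, and inline markup is simply discarded.

```python
import xml.etree.ElementTree as ET


def xml_to_plain_text(xml_source: str) -> str:
    """Flatten a simple book-like XML document into plain text.

    Assumes (hypothetically) that each <p> element is one paragraph and that
    inline markup such as <i> can simply be dropped.
    """
    root = ET.fromstring(xml_source)
    paragraphs = []
    for p in root.iter("p"):
        # itertext() yields the text of <p> and all nested elements, in order
        text = "".join(p.itertext())
        # Collapse runs of whitespace (including line breaks) to single spaces
        paragraphs.append(" ".join(text.split()))
    # Separate paragraphs with a blank line
    return "\n\n".join(paragraphs)


sample = """<book>
  <chapter>
    <p>It was the best of times,
       it was the <i>worst</i> of times.</p>
    <p>Call me Ishmael.</p>
  </chapter>
</book>"""

print(xml_to_plain_text(sample))
```

    A strictly 7-bit archive copy could then be made with .encode("ascii", errors="replace"), at the cost of losing any non-ASCII characters.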

  42. Something based on DV cameras? by yerricde · · Score: 1

    Where are the 1-3 second scanners?

    Wouldn't it be possible to rig up a high-speed scanner based on digital video technology? Or are CCD and CMOS image sensors not fine enough yet?

    1. Re:Something based on DV cameras? by Anonymous Coward · · Score: 0

      Of course they're fine enough. Didn't you see Natalie Portman's nipples in Episode 2?

    2. Re:Something based on DV cameras? by dvdeug · · Score: 1

      Wouldn't it be possible to rig up a high-speed scanner based on digital video technology?

      A large part of the speed problem is the page turning or moving the page past the sensors. In any case, digital cameras haven't shown enough detail for good scans, and planetary scanners (expensive digital cameras for scanning) cost several thousand dollars.

  43. Re:Really great work by the guys behind the projec by Anonymous Coward · · Score: 0

    There is a feature on DP right now that allows you to see the 10 latest projects posted to Project Gutenberg. It is in RSS format and at http://www.pgdp.net/c/feeds/backend.php?content=posted&type=rss

    Joseph Gruber
    DP Developer
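    Since the feed is ordinary RSS, any XML parser can read it. The Python sketch below parses a made-up sample in the standard RSS 2.0 shape (rss, channel, item) rather than the live feed, whose exact fields are not documented here; the titles and links are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder feed in the usual RSS 2.0 shape (rss > channel > item);
# the real DP feed's contents are not reproduced here.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Recently posted</title>
    <item><title>Book One</title><link>http://example.org/1</link></item>
    <item><title>Book Two</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""


def latest_items(feed_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in latest_items(SAMPLE_FEED):
    print(title, link)
```

    To read a live feed (assuming it is still served), the bytes returned by urllib.request.urlopen(url).read() can be passed to ET.fromstring in place of the sample string.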

  44. XML conversions look lacking. by CryptOntology · · Score: 2, Informative
    I just looked over the links in earlier replies (PGXML and HTML-Writers) and was surprised: HTML-Writers has only converted 20-odd etexts, all from Jan to Feb 2000; and PGXML hasn't even the ability to do valid HTML curled quotes.

    Both look like amateur do-gooders, and we need more of those; but these efforts should be folded back into the organisation of PG, where they may find a permanent home. The alternative is that they drift, because too few people are involved (only _two_ people do PGXML) to round out the abilities (and future efforts) of XML uber-format-goodness.

    One major reason why I'd be interested in a longer toolchain, from scans into TXT, and TXT into XML, is to make translation easier. All the older Gutenberg etexts are in different, revised formats. Try making a parser that automagically transforms the dozen or so revisions of the one true "TXT" into XML to see what I mean. (I have; there will always be some books that break important Gutenberg formatting placeholders.)
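    A TXT-to-XML converter of the kind described might start out like the Python sketch below. It assumes one simplified layout (paragraphs separated by blank lines, emphasis written as _like this_) and the made-up element names book, p and i. As the comment warns, real etexts vary enough that a converter like this will break on some of them.

```python
import re
from xml.sax.saxutils import escape

# Assumed, simplified conventions for this sketch: blank lines separate
# paragraphs, and _underscores_ mark emphasized text.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
EMPHASIS = re.compile(r"_([^_]+)_")


def txt_to_xml(text: str) -> str:
    """Convert a plain etext into a minimal (hypothetical) XML vocabulary."""
    out = ["<book>"]
    for para in PARAGRAPH_BREAK.split(text.strip()):
        if not para.strip():
            continue
        # Re-join hard-wrapped lines into a single line of text
        joined = " ".join(para.split())
        # Escape &, < and > first, then turn _..._ into <i>...</i>
        body = EMPHASIS.sub(r"<i>\1</i>", escape(joined))
        out.append(f"  <p>{body}</p>")
    out.append("</book>")
    return "\n".join(out)


etext = """It was the _best_ of times,
it was the worst of times.

Smith & Sons."""

print(txt_to_xml(etext))
```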

  45. Re:Really great work by the guys behind the projec by obotics · · Score: 1

    Your comment implies that smart people refrain from having children. That is one of the most idiotic comments I have ever read.

  46. Size by Beryllium+Sphere(tm) · · Score: 2, Insightful

    My entire CD collection fits in my pocket with my iPod. If I could fit my entire book collection in my pocket, that would be a dream and a delight.

  47. This is just wrong by Anonymous Coward · · Score: 2, Interesting

    XML is not a character encoding. XML does not require the use of non-ASCII characters. What can be represented by an XML document is a superset of what can be represented by a plain ASCII document. XML is a human-readable markup.

    MS Word 2000 .doc is a binary format.

    I suspect that you have very little idea what you are talking about.

    PG already uses XML-like markup to indicate an emphasized portion of a passage, among other things. If we were to accept your argument, then even this alone should be seen as a failure.

    After all, what if over the course of 50 years we forget what "blahblahblah" means? What if in some impoverished country, while the people have the processing power to read these documents, they do not have the processing power to parse out ?!

    Both of these worries are foolish. If you use an XML format for open content, you have an obligation to provide openly the strict and formal DTD or schema which describes your XML markup.

    What if this DTD or schema becomes lost? This won't happen, because you can embed the DTD or schema in the distributed documents (the books) themselves.

    What if we forget how to parse XML?

    Yes, if there were a terrible war which left the entire planet in shambles for 100 years, then we might forget how to parse XML.

    But this is no different than with ASCII. We could just as easily forget how to convert binary data (you know, '1's and '0's) to corresponding ASCII characters.

    Now, even if there were such a catastrophe, you insult the human creature by suggesting that we would not be able to figure this out, and to figure out the XML DTD or schema. Have you ever read an XML document following a standard article or book DTD or schema? It is painfully obvious what the markup means, and what its use is.

    However, all of this discussion is just silly, because there probably will not be such a catastrophe in the near future.

    You are forgetting that change is gradual. If a new format becomes popular (and this is unlikely, because XML can describe any possible format), it will be a matter of an hour or two to convert the entire PG library to the new format.

    And if the new format is as well defined (as we should hope) as the existing XML format, then this process will be painless.

    You are welcome to continue to comment and complain from a position of clear ignorance, or you can admit that there might in fact be some things which you are not an expert on (surprise!), and that others understand better than you.

    We are telling you that using a strictly defined XML format would in every sense be the better choice. It does not require the use of non-ASCII characters. It is human readable. It is well defined. Conversion of the XML document (which for your purposes would not be very complex) to plain (as in not XML formatted) ASCII strings can be done by a 15-20 year old processor or by hand if needed.

    In fact, since it is human readable, there is no need to do the conversion at all if we some day find ourselves in a situation where we can not automate it (as in after a worldwide nuclear armageddon). The document can be read as is if needed, and the structuring afforded by XML will be just as clear.

  48. Ah, that explains the "Midi-Sum and Nite Dream" by Pac · · Score: 3, Funny

    It probably accounts for Achilles shouting "First Greek" when he lands at Troy's beach and all those descriptions of statues of a girl suspiciously similar to a modern actress in Roman poems. And I was really sure Tom Sawyer's pet was not a penguin.

  49. Oops - tags by Anonymous Coward · · Score: 0

    Oops, my \\ tags weren't commented out, sorry about that :) Bah, I can't figure out how to prevent Slashdot from parsing them out. You get the idea, eh?

    PG uses "italics" tags to designate emphasis in the text.

    1. Re:Oops - tags by GigsVT · · Score: 1

      You get the idea, eh?

      Yes, and the ad hominem attacks were unnecessary.

      Another disadvantage to XML is that it creates a barrier to creation of the documents. It's not easy to write XML style documents, from a lay point of view.

      I've created a HOWTO with an eye on submitting it to the LDP eventually, and it wasn't something I'd expect just anyone to be able to do. Writing SGML/XML documents is difficult, especially at first, when you don't know the names of many of the tags yet.

      --
      I've had enough abrasive sigs. Kittens are cute and fuzzy.
  50. A sterling mistake by fm6 · · Score: 2, Insightful
    Since a fair amount of the important pieces of literature come from mainland Europe (actually even the British £ sign isn't in ASCII), it is clearly not up to the job and should be replaced.
    As a matter of fact, the DP web interface allows you to enter the pound sterling symbol even if you don't have it on your keyboard. It also has a lot of accented characters that aren't in English. The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

    You made a similar mistake when you entered that character, since you just entered it from your keyboard. (A natural mistake if you have a British keyboard, as I assume you do.) On some web sites, this would only read correctly on systems similarly configured. However, Slashdot puts out the header:

    Content-Type: text/html; charset=iso-8859-1
    which should prevent that. Still, the character entity &pound; is more portable, and will work even when the web page doesn't specify a character set -- and most do not.

    On the other hand, Slashcode sometimes mangles eight-bit characters when it archives them. So if you seek true immortality, use the character entity!
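    Converting text to character entities can be automated. Here is a minimal Python sketch using only the standard library's html module: it escapes &, < and >, then replaces every non-ASCII character with a named entity where HTML 4 defines one (such as &pound;) and a numeric reference otherwise, so the output is pure ASCII.

```python
import html
from html.entities import codepoint2name


def to_ascii_html(text: str) -> str:
    """Escape text for HTML so the result contains only ASCII characters."""
    escaped = html.escape(text, quote=False)  # handles &, < and >
    out = []
    for ch in escaped:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through untouched
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. U+00A3 -> &pound;
        else:
            out.append(f"&#{cp};")  # no HTML 4 name: use a numeric reference
    return "".join(out)


print(to_ascii_html("Fish & chips: £5"))  # Fish &amp; chips: &pound;5
```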

    1. Re:A sterling mistake by dvdeug · · Score: 2, Informative

      The fact is that the Gutenberg people think they're using ASCII, but are actually using Latin1. So Gutenberg texts will display correctly on any system that's localized for the U.S., Canada, or Western Europe. But not elsewhere.

      Excuse me? The Gutenberg people know quite well when they're using ASCII and when they're using Latin-1. If you'll look at the books that are posted, some of the books posted from DP are posted just in ASCII, and some in 7foo.txt and 8foo.txt files, where 7foo is ASCII and 8foo is Latin-1, and a few just in Latin-1.
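      The 7foo.txt/8foo.txt convention comes down to one check: does the file use any byte above 127? A minimal Python sketch follows. The pg_filename helper is hypothetical, not part of any PG tooling, and the last line shows why 8-bit files only display correctly when the reader's code page matches: the same byte decoded as Windows-1251 (Cyrillic) is not a pound sign.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0-127)."""
    return data.isascii()


def pg_filename(stem: str, data: bytes) -> str:
    """Hypothetical helper: pick a '7' (ASCII) or '8' (Latin-1) prefix."""
    return ("7" if is_seven_bit(data) else "8") + stem + ".txt"


latin1 = "A pound: £".encode("latin-1")
print(latin1[-1] == 0xA3)  # True: Latin-1 stores £ as the single byte 0xA3
print(pg_filename("foo", b"ASCII only"))  # 7foo.txt
print(pg_filename("foo", latin1))  # 8foo.txt
# The same byte read with a Cyrillic code page is a different character
print(latin1.decode("cp1251")[-1] == "£")  # False
```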

    2. Re:A sterling mistake by fm6 · · Score: 1
      I hadn't noticed that. But that convention isn't followed consistently. Of the last 10 files posted from DP, only 7 follow this convention. And I haven't seen it documented anywhere.

      I shouldn't have spoken categorically about the Gutenberg people. Somebody is aware of this issue, because recent posts from DP say "Character set encoding: ISO-Latin-1", which I guess is some help. My assumption of ignorance was based on the DP Proofing Guidelines, which refers to 8-bit characters as "Upper ASCII". But I guess all that means is that people tend to confuse ASCII with Latin1 -- a confusion that doesn't matter except when it does.

    3. Re:A sterling mistake by jeremyp · · Score: 1

      Well I did preview it and it looked OK so that was good enough for me although technically a mistake since I was using HTML mode.

      However, even latin-1 does not have the complete range of characters in use by all writing systems based on the Latin alphabet and you're totally screwed if you want to preserve the Iliad or the Bible (to pick two random texts) in the original. Also, to do bold and italics etc you need some sort of markup - so it might as well be XML or HTML.

      --
      All I want is a secure system where it's easy to do anything I want. Is that too much to ask ~~ Randall Munroe
    4. Re:A sterling mistake by fm6 · · Score: 1
      Actually, you didn't make any mistakes with your input, and I shouldn't have implied that you did.

      This all comes down to a simple misunderstanding: people use "ASCII" and "text" interchangeably. Nine times out of 10, when you hear somebody talking about ASCII, they're really talking about Latin1. Usually, this mistake doesn't really matter. But this time it did: The guy who was defending Gutenberg's use of "ASCII" managed to imply that Gutenberg uses an American character set. Which was why you flamed him -- and justifiably so. But in fact the core people at Gutenberg and DP do use Latin1 (I was wrong when I claimed they were ignorant about it). The problem is simply that not everybody involved with Gutenberg knows that ASCII and Latin1 are different character sets.

      As for things like bold face, Greek characters, etc.: Gutenberg has conventions for representing these. But they're not very carefully thought out, and really should be replaced by something more consistent and mistake proof. But Gutenberg seems to be dominated by ML-haters. (In particular, they don't want to change from TeX to ML for representing equations.) Probably not going to change.

      The only problem with your post was something that wasn't your fault: the slash code is likely to screw up any Latin1 character that isn't also an ASCII character.

      Actually, everybody seems ignorant of character set issues. I myself had some misconceptions about how HTML does 8-bit characters before I got into this argument and was forced to do some reading. And let's not even talk about the misunderstandings connected with Unicode!

  51. Modern technology... bleh! by bj8rn · · Score: 1

    I get all the information I need (and more) from "reading" lamb livers (all the Universe is reflected in even its tiniest fragment, you only have to look hard enough). On most days though, I have to resort to using tea leaves (as there aren't too many sheep left within a 20-mile radius) but tea leaves have lower bandwidth and they generate more errors (mostly typos, but when reading Slashdot, I occasionally experience a kind of deja vu). I post to Slashdot by using complicated black magic (it includes drawing several pentagrams and calling several names I dare not mention in the fear of accidentally calling their wrath upon me) to directly alter the state of the Universe.

    --
    Hell is not other people; it is yourself. - Ludwig Wittgenstein
  52. Feh! by Anonymous Coward · · Score: 0

    I say make them 25 years and indefinitely renewable. That makes the corporates happy and lets other works go into public domain. Who cares if 200 years from now Mickey Mouse and Harry Potter are still owned by corporate interests, when Lord of the Rings and Star Trek are public domain in 10.

  53. I thought you guys were talking about... by bursch-X · · Score: 1

    The German Projekt Gutenberg

    http://www.gutenberg2000.de/

    Which attempts to put on the web all German literature whose copyrights have expired.

    --
    There are two rules for success:
    1. Never tell everything you know.
  54. Re:Really great work by the guys behind the projec by Anonymous Coward · · Score: 0

    Your comment implies that smart people refrain from having children. That is one of the most idiotic comments I have ever read

    No, it implies that smart people have fewer children, which is true, generally speaking.

    Why? Consider the following reasons why having a lot of children can make you stupid:

    • You don't know that screwing makes babies. Stupid.
    • You know where babies come from but don't stop screwing anyway. Stupid.
    • You don't understand that babies cost money. Having lots will break the bank. Stupid.
    • Too many children will distract you from studying, so you can't learn. Stupid.
    • More children are more difficult to control. They will be more likely to end up delinquent. Stupid.
    • You and your 20 brats will probably wind up riding my bus, making me late and preventing me from getting any reading done. STUPID!
  55. DVD players barf on TS too by nurb432 · · Score: 1

    PowerDVD won't run under Terminal Server, even if you are on the console..

    Rather silly..

    --
    ---- Booth was a patriot ----
  56. 32 birthdays means a bucketload of knowledge! by Junkster+Julian · · Score: 1

    Ok all you Gutenberg fans.. here's a good one for y'all... I know this person who is trying to author a site that's right in line with exactly what Project Gutenberg stands for: making texts available to the electronic world.

    So what could we suggest to someone who is donating their blood sweat and tears to an ubiquitous online resource so everyone, from Lynx, IE, Palm/Wince, WAP, and even print/fax "users" can have access to this ubiquitous resource?

    And what if I said that resource was about Nanotechnology and that it beats Nanodot in terms of potential audience, readership, and just plain usability?

    And what if I said that person was me, and that site was popnt.com, also called "Popular Nanotechnology"...? Would that tweak anyone's "interest"? Anyone...?

    Please don't flame me folks, I'm doing this literally out of the kindness of my heart and need any help I can get.

    All I ask is you check it out.. start at Volume 0 if anyone has a chance.

  57. How to spread the word... by evilviper · · Score: 4, Insightful

    Here's what I did...

    A while back, I used wget to mirror the entire set of Project Gutenberg works. (I did it off-hours, and contacted them to see if it was a problem, or if there was some other, more efficient way to do things.)

    Anyhow, with my GBs of text, I used bzip2 -9 to compress each text file. In the end, the entire collection of PG was able to fit on one CD. Since most people don't have bzip2 support I also included the free archiver, Ultimate Zip on the CD as well. I also put a read-me on the CD (that would appear as the first file) with basic instructions what to do.
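    The per-file compression step can also be scripted. Here is a Python sketch using the standard bz2 module with compresslevel=9 (the same setting as bzip2 -9, and also the module's default). Unlike the bzip2 command, it keeps the original .txt files, and the temporary directory in the demo stands in for a real mirror.

```python
import bz2
import tempfile
from pathlib import Path


def compress_texts(root: Path) -> list[Path]:
    """Write a .txt.bz2 copy of every .txt file below root, at level 9."""
    written = []
    # sorted() materializes the file list before any .bz2 files are created
    for txt in sorted(root.rglob("*.txt")):
        target = txt.with_name(txt.name + ".bz2")
        # compresslevel=9 is the strongest setting, equivalent to bzip2 -9
        target.write_bytes(bz2.compress(txt.read_bytes(), compresslevel=9))
        written.append(target)
    return written


# Demo on a throwaway directory standing in for a mirror of the archive
original = b"It was the best of times.\n" * 100
with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "etext" / "sample.txt"
    book.parent.mkdir()
    book.write_bytes(original)
    archives = compress_texts(Path(tmp))
    archive_name = archives[0].name
    restored = bz2.decompress(archives[0].read_bytes())
    print(len(archives), restored == original)
```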

    One of the great things about CDs is how easy they are to transfer... One stamp and a 5-cent CD envelope, and you can send 2 CDs anywhere in the country (this predated Netflix AFAIK).

    Anyhow, I sent these CDs to two different people, and the next time I talked with them, I found out they'd made several copies of it. Basically, they heard someone mention some subject that related to one of the files on the CD, brought up the CD, and offered to make a copy for them. This happened a few times that I know of, and quite possibly many times that I don't know of. Quite an easy way to spread the word.

    Of course, with that said, I don't read the PG texts myself... There are two reasons. The first is that I have yet to come across decent software designed for long-term reading. Something that saves your place (automatically?), something with a legible font, and something with light colored text on a dark background, which brings me to my next point...

    The second reason is that monitors are all backlit... That means, reading on a computer screen is like reading text on a fluorescent light bulb. It's possible for a while, but your eyes are quickly fatigued. The only screen I have that doesn't do that is my 640x240 B&W LCD screen on my Psion handheld. As good as that is, it's just too small for effective reading. Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page) that is small, light, silent, compatible with everything, and most importantly, it needs to have good software that makes reading less work than it normally is on a computer... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.

    --
    Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
    1. Re:How to spread the word... by Junkster+Julian · · Score: 1
      The second reason is that monitors are all backlit... That means, reading on a computer screen is like reading text on a fluorescent light bulb. It's possible for a while, but your eyes are quickly fatigued.
      Actually, we need e-books (or e-resources in general) that can easily be printed, because if you like the book that much then you sure as heck would want a hard copy for your archives, no?
      Someone needs to create a non-backlit LCD screen, approx 6" (about the size of a book page) that is small, light, silent, compatible with everything, and most importantly, it needs to have good software that makes reading less work than it normally is on a computer

      And the content, who in their right mind would code content for such a generic device? Unless of course someone were to code something that was ubiquitous, except who has that kinda time on their hands, and better, what resource would merit that kind of application? There's nothing really that would need that level of compatibility, except, well, Nanotechnology...ahem.

      ... Until then, electronic reading isn't going to really be feasible. Screw electronic paper, just give me a screen that doesn't hurt my eyes, and I'm set to go.

      Make the e-book easily printed and your distribution, archival, and compatibility problems are all solved... anyone notice that little function in the IE [and only in IE, what's up with that?!] print dialog called "Print all linked documents"? One click and your ebook is printed from Title to Index.

      Ok I'll quit ranting now.

    2. Re:How to spread the word... by evilviper · · Score: 1
      because if you like the book that much then you sure as heck would want a hard copy for your archives, no?

      It's great that they CAN be printed, but I am mainly talking about reading them in the first place. Am I supposed to print out 500 pages of a Project Gutenberg text just to find out after 5 pages I really don't like it? Even if it was one I like, it's likely not one I want to spend the money to print. I have another system of making a hard copy... I burn files to multiple CDs, which I keep in various locations. If I was able to read them on something like the device I mentioned, I'd probably digitize most of my library, rather than keeping a lot of paperback books around. Who needs to when it's easier, cheaper, and more convenient to read it electronically? Unfortunately, it's not feasible yet.

      And the content, who in their right mind would code content for such a generic device?

      I am talking about a very low-end computer here... Give it TEXT and PDF software, and there is plenty of content to be had. I think it would be a good idea to give this device a CompactFlash port to hold the content, and to make transfers from PCs easy.

      Make the e-book easily printed and your distribution, archival, and compatibility problems are all solved...

      Printing is FAR, FAR too expensive, time-consuming, error-prone, wasteful, and provides you with inconveniently large output.
    3. Re:How to spread the word... by AlexCV · · Score: 1

      Vim does most of what you want. I use it in xterms with a black background, a slightly gray shade of white for the foreground, and comfortable fonts. Vim will save your position in files. It works well.

    4. Re:How to spread the word... by evilviper · · Score: 1

      Does it save your position when you move the file to a new location?

      Does it have good-looking fonts? (I haven't yet seen a terminal with good fonts)

      Does it make it easy to read without having to think much about operating it? (Easy page advance, line wrapping, page seeking)

      No, I didn't think so.

    5. Re:How to spread the word... by pHDNgell · · Score: 1

      Does it save your position when you move the file to a new location?

      Nah, but it'd be easy enough to make a vim macro to take care of this.

      Does it have good-looking fonts? (I haven't yet seen a terminal with good fonts)

      That's pretty much an opinion. I spend my work days in vim in terminal windows on my Mac and they look fine. Also remember: vim doesn't have to be run in a terminal window.

      Does it make it easy to read without having to think much about operating it? (Easy page advance, line wrapping, page seeking)

      Absolutely. I don't think about what I'm doing when I'm using vim, I just think about what I want to do (i.e. if I want the next page, I just think that and it happens...).

      This is probably the only reason I'm not using a Dvorak keyboard, in fact. I have no idea what most of the keys I use in vim are. My fingers just press them when I want something to happen. I was 100% Dvorak for about a month, and that was the hardest thing for me. Deleting a line is two taps of my middle finger, not ``dd.''

      --
      -- The world is watching America, and America is watching TV.
    6. Re:How to spread the word... by evilviper · · Score: 1
      but it'd be easy enough to make a vim macro to take care of this.

      Indeed, but after a few "modifications" like this, your solution looks rather hackish. If I wanted, I could write a small shell script and call it "reader.sh" or something like that, and simply have it store your position as the first line in the book, and restore you to that position next time. Of course, once again, things like that get pretty ugly after a few features get added.
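      The bookmark-in-the-first-line idea can be sketched in a few lines of Python. The marker text is made up for this sketch. Because the position travels inside the file, it survives moving or copying the book (the earlier question about relocated files); the cost is that the etext itself is modified, and a viewer would need to skip the marker line. Files are opened as Latin-1 with newline="" so arbitrary bytes and the original line endings are preserved.

```python
import tempfile
from pathlib import Path

MARKER = "#READER-POSITION "  # hypothetical bookmark format, not a standard


def load_position(book: Path) -> int:
    """Return the saved line number, or 0 if the book has no bookmark yet."""
    with book.open(encoding="latin-1", newline="") as f:
        first = f.readline()
    if first.startswith(MARKER):
        return int(first[len(MARKER):].strip())
    return 0


def save_position(book: Path, line: int) -> None:
    """Store the position as the first line, replacing any older bookmark."""
    with book.open(encoding="latin-1", newline="") as f:
        lines = f.readlines()
    if lines and lines[0].startswith(MARKER):
        lines = lines[1:]  # drop the previous bookmark line
    with book.open("w", encoding="latin-1", newline="") as f:
        f.write(f"{MARKER}{line}\n")
        f.writelines(lines)


with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "book.txt"
    book.write_text("Chapter 1\nIt was a dark and stormy night.\n", encoding="latin-1")
    before = load_position(book)
    save_position(book, 120)
    save_position(book, 250)  # replaces, rather than stacks, the bookmark
    after = load_position(book)
    body = book.read_text(encoding="latin-1")
    print(before, after)  # 0 250
```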

      I don't think about what I'm doing when I'm using vim, I just think about what I want to do

      That's not actually what I was talking about. What I mean is that page forward/page backward should be one (obvious) keystroke, not a command sequence, or anything like that. Also, movements shouldn't be cursor-based. When people hit "down" everything should go down a page. With a cursor-based system (which is pretty much every editor) you can never be sure what's going to happen. A web browser would make a better book-reader than vi (maybe lynx/links?).
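      The page-based behavior described here (wrapped lines, and a "down" key that always moves exactly one screenful, with no cursor) is easy to model. Below is a Python sketch using textwrap; the Reader class, its default 72-column by 24-line page size, and its method names are illustrative assumptions, not taken from any existing program.

```python
import textwrap


def paginate(text: str, width: int = 72, height: int = 24) -> list[list[str]]:
    """Wrap each paragraph to `width` columns and cut the lines into pages."""
    lines: list[str] = []
    for para in text.split("\n\n"):
        # Re-flow the paragraph; an empty paragraph still yields one blank line
        lines.extend(textwrap.wrap(" ".join(para.split()), width) or [""])
        lines.append("")  # blank line between paragraphs
    return [lines[i:i + height] for i in range(0, len(lines), height)]


class Reader:
    """Page-based viewer: there is no cursor, only a current page."""

    def __init__(self, text: str, width: int = 72, height: int = 24):
        self.pages = paginate(text, width, height)
        self.page = 0

    def down(self) -> list[str]:
        """Advance exactly one page (stopping at the last one)."""
        self.page = min(self.page + 1, len(self.pages) - 1)
        return self.pages[self.page]

    def up(self) -> list[str]:
        """Go back exactly one page (stopping at the first one)."""
        self.page = max(self.page - 1, 0)
        return self.pages[self.page]


book = "\n\n".join(f"Paragraph {n}." for n in range(6))
r = Reader(book, width=20, height=4)
print(len(r.pages), r.down()[0])  # 3 Paragraph 2.
```

      Wrapping is done once up front, so page boundaries stay stable; a real reader would re-paginate when the window size changes.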

      Besides, even though editing in VI can become automatic after a few seconds, it's not so easy when you are just occasionally entering a keystroke.... Especially when you have things memorized by finger position, you have to concentrate on what you're doing just to hit the one button you need.

      Deleting a line is two taps of my middle finger, not ``dd.''

      I know the feeling... It took me weeks upon weeks, but gradually, you learn new habits, and stop with the old. In many ways my new Dvorak habits are better than my QWERTY habits. The only thing I never got the hang of with Dvorak was the alphabetic direction keys... Mainly because I use the arrow keys instead, but sometimes I login to a terminal set to something odd, and I need to use them. Then, the process can be a bit slow. That's rather minor though. I'm quite a happy Dvorak user. (check out the typematrix keyboard, dvorak and qwerty-labeled version, and both are hardware switchable between the two layouts)

      (I prefer nvi to vim BTW, I just find vim to feel very rubbery, and just not very strict or consistent in the way it acts--I can live without vim's color formatting as well, dark colors on a black terminal just make a bad situation worse)
    7. Re:How to spread the word... by Junkster+Julian · · Score: 1
      I am talking about a very low-end computer here... Give it TEXT and PDF software, and there is plenty of content to be had. I think it would be a good idea to give this device a CompactFlash port to hold the content, and to make transfers from PCs easy.

      By "text" I suppose you mean, in particular, HTML -- seeing as anything put out on the market that cannot read HTML (v2.0, 3.2, or wow even 4.0) would simply be a total and utter waste of like a zillion web-pages.

      And wait, while we're on the subject of PDF, why would you want a document reader when what you're reading are long and lengthy texts? Have you seen the PDF reader for WinCE? Ugh.

      IMHO priority should be on HTML and relatives.. besides, we [as ebook readers and writers] don't have to battle with companies like adobe about patents and copyrights and whatnot to get in the way of delivering what we all really want -- something to read AND something we can all easily write with! Where are the freeware versions of distiller, everyone?! Geez.. HTML is perfectly fine for a vast majority of ebook implementations.

      Oh and given that you consider printing too error-prone and expensive, does that mean there are no e-texts you would consider worth printing and/or the expense? Anything that is not in print is quite frankly NOT permanent. What, I need batteries to jot down my bus schedule?! yeah.. RIGHT!

    8. Re:How to spread the word... by evilviper · · Score: 1
      By "text" I suppose you mean, in particular, HTML

      I wasn't listing the final specifications for a device in detail. Yes, it would have HTML support, and CSS would be useful to have as well. With HTML, people are going to want images supported, that means a few different libraries there as well.

      Then there are more document formats. SGML, TeX, info, Postscript, etc.

      why would you want a document reader when what you're reading are long and lengthy texts?

      Umm, because "document" is pretty much an all-encompassing word to refer to anything you might be reading. Personally, if I had something like this, I would send anything over a page to it... So it wouldn't just be for long texts.

      Have you seen the PDF reader for WinCE? Ugh.

      No I haven't. Should I for some reason?

      don't have to battle with companies like adobe about patents and copyrights and whatnot

      Postscript and PDF are both openly documented, patent and royalty free.

      AND something we can all easily write with! Where are the freeware versions of distiller, everyone?!

      What? You've never heard of ghostscript? That's exactly what it is, a full-featured free version of distiller (along with many more features that distiller doesn't have).

      does that mean there are no e-texts you would consider worth printing and/or the expense?

      Completely beside the point here. With the current devices, printing them out is the only option, although I would send it to a cheap print shop if it was longer than about 25 pages.

      If I had a good e-book reader as I imagine, I don't think I would find the need for hardcopies anymore. There is much to be said for making books just a little bit smarter. Searching an electronic version is infinitely faster, and that's the main thing I do with a large number of reference books I keep around.

      Anything that is not in print is quite frankly NOT permanent.

      Anything that is not etched in stone tablets is quite frankly NOT permanent.

      Any music that is not etched in vinyl is quite frankly NOT permanent.

      Any movies that are not recorded on film are quite frankly NOT permanent.
    9. Re:How to spread the word... by pHDNgell · · Score: 1

      That's not actually what I was talking about. What I mean is that page forward/page backward should be one (obvious) keystroke, not a command sequence, or anything like that. Also, movements shouldn't be cursor-based. When people hit "down" everything should go down a page. With a cursor-based system (which is pretty much every editor) you can never be sure what's going to happen. A web browser would make a better book-reader than vi (maybe lynx/links?).

      Well, it is a keystroke, ^F or page down. Those are both page-based.

      I know the feeling... It took me weeks upon weeks, but gradually, you learn new habits, and stop with the old. In many ways my new Dvorak habits are better than my QWERTY habits. The only thing I never got the hang of with Dvorak was the alphabetic direction keys... Mainly because I use the arrow keys instead, but sometimes I login to a terminal set to something odd, and I need to use them. Then, the process can be a bit slow. That's rather minor though. I'm quite a happy Dvorak user. (check out the typematrix keyboard, dvorak and qwerty-labeled version, and both are hardware switchable between the two layouts)

      Yeah, that could be a problem for me. I never use the arrow keys. The alpha keys always work, the arrow keys don't always work, so I just learn the one. I don't actually know what the keys are, though...I just know how to make the cursor go where I want it to go.

      The Mac has a keyboard mode that does Dvorak for typing, and qwerty when you hit command, which is cool, but not helpful for vi.

      (I prefer nvi to vim BTW. I just find vim very rubbery, and not very strict or consistent in the way it acts. I can live without vim's color formatting as well; dark colors on a black terminal just make a bad situation worse.)

      I don't think I've used nvi a lot, but I use a *lot* of vim's features nowadays. It started with color coding, but now I've got lots of nice scripts to help me with my programming or just general text writing. These include learning macros on the fly and reapplying an arbitrarily long sequence of commands (think of a really cool . command) and build-system awareness (my config knows that if I open a Java file, :make should use ant, and it properly handles all the errors it encounters, letting me flip through them and correct them). I also do emacs-style multi-buffer things a lot (opening lots of files simultaneously). I can do things like a *really* global search and replace, split screens, use navigation-enhancing scripts (taglist.vim is awesome), etc... vim is an incredible tool for a programmer.

      It also understands just about every language. When I'm learning a new programming language (which I try to do a lot), it tends to know more about it than I do. That's pretty helpful for knowing when you're making a stupid mistake or something.

      --
      -- The world is watching America, and America is watching TV.
    10. Re:How to sperad the word... by Junkster+Julian · · Score: 2, Interesting
      I wasn't listing the final specifications for a device in detail. Yes, it would have HTML support, and CSS would be useful to have as well. With HTML, people are going to want images supported, that means a few different libraries there as well.

      Ok, I'm gonna tone myself down a little... this should be a little less of a rant, so hang on. The point I was trying to make is that I think HTML should be the one technology an ebook reader should be able to support, unlike even standard desktop browsers. I'm not sure it would be such a stretch to see the "web browser" condensed into a hardware-streamlined product. SGML support would be great, but to implement SGML we must first master HTML, and if we can't deliver a machine dedicated to rendering HTML, then how much chance would we have of implementing a technology with a smaller sample base? It's hard to match HTML in terms of demographic penetration, at least as far as actual text-based content goes... contrast that with PostScript, PDF, and the like, which (for the most part) do not have human-readable source, and human-readable source is essential for "debugging" our ebooks.

      Yeah and the pdf reader for WinCE needs, uhh, "work". It is by no means comparable to its desktop cousins... a cheap knock-off from a huge company complaining about the limitations of PDAs. IMHO, avantgo is a considerably better "ebook reader" that's easier to code for and is far more compatible. HTML 3.2, that's it... can't go wrong. Visit my site and you'll know what I'm talking about: popnt.com Keep in mind my work is still beta, but anyways.

      And about permanent media... well, I'm going to go way out on a limb here and suggest that print cannot truly be compared with your examples, although I do in all seriousness appreciate your debate. Just for the sake of argument, what distinguishes print from (at least) the three examples you listed (and please, I hope this does not escalate) is the following:

      1. Stone tablets were never mass-produced in the same way as books (or paper media) were: sure, there were sandscript, but specifically what distinguishes print as a breakthrough was its potential for industrial mass production via inventions like the printing press. Ubiquity made the press permanent in many ways.
      2. Music (and movies): due to the very recent inventions of the gramophone and that which makes up a motion picture (the camera, film, etc), I'm not sure these can be compared to print media, specifically because of their very recent introductions to society.. note that I am not saying music is a new introduction, rather recorded music.. so in that light, and given the whole MP3 hoopla we're having with the RIAA et al, I think the music/movie industries would have a lot to learn from the print industry -- not the other way around. Also, the music and movie industries themselves use a concept very closely tied in with books in that they are given data to process. I'm not sure music/movies can really compare, in all seriousness to books.. in all honesty, I'm not sure there is much out there that even CAN compare to the print industry. These are secondary industries which require processing that print-media does not. Print is unique in that respect and is therefore again really tough to beat! Even braille is a form of print which requires nothing whatsoever, not even a light-source! What makes print so permanent is its ubiquity -- the sheer volume of static copies whose content and information cannot and will not change over time. No other industry has this power.
    11. Re:How to sperad the word... by evilviper · · Score: 1
      Yeah and the pdf reader for WinCE needs, uhh, "work". It is by no means comparable to its desktop cousins... a cheap knock-off from a huge company complaining about the limitations of PDAs. IMHO, avantgo is a considerably better "ebook reader" that's easier to code for and is far more compatible. HTML 3.2, that's it...

      Well, I do have a WinCE device that I paid several hundred dollars for, sitting around collecting dust. Instead, I use my Psion 5mx all the time, and it has a great PDF reader. It is, in fact, a port of the Unix xpdf to the Psion, so it is open source and GPLed. There is a good web browser for the Psion, but it is very, very large compared to xpdf, and with a PDF you can include everything, images included, in a single file. So if I want to read a web page on it, I typically just convert it to PDF on my desktop before I transfer it over.

      Stone tablets were never mass-produced in the same way as books (or paper media) were:

      Books have never been mass-produced in the same way as CDs are.

      If you wanted to, you could easily make a device that mass-engraves stone tablets. Unfortunately, it wouldn't be as economical as paper. But by the same token, mass-producing books isn't anywhere near as efficient as mass-producing CDs/DVDs.
      --
      Slashdot gets worse every day... Pipedot: News for nerds, without the corporate slant
  58. Re:XML please (plug) by Anonymous Coward · · Score: 0

    I'd like to mention my freeware ebook reader - it's got a direct-downloader for the Gutenberg catalog and all texts therein. It also works on Linux, under Wine (ok, it would be nice to have a native version but I've been coding in VB/VC for over a decade. Old dog, new tricks, etc)

    I specifically wrote it to take gutenberg text files and reformat them back into something you can actually read.
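    For anyone curious what that reformatting involves, here is a minimal Python sketch (not Simon's actual code, which the post doesn't show). It assumes that paragraphs in a Gutenberg plain-text file are separated by blank lines and that every other line break is just the hard wrap at roughly 70 columns; the function name `unwrap_paragraphs` is hypothetical.

```python
import re


def unwrap_paragraphs(text: str) -> list[str]:
    """Join hard-wrapped lines back into whole paragraphs.

    Assumes (hypothetically) that paragraphs are separated by one or
    more blank lines, as in most Project Gutenberg plain-text files.
    """
    # Normalize Windows line endings, which many Gutenberg files use.
    text = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    result = []
    for para in paragraphs:
        # Collapse the hard line breaks and runs of spaces into single spaces.
        joined = " ".join(para.split())
        if joined:
            result.append(joined)
    return result
```

    Each returned paragraph can then be rewrapped to whatever width the reader's screen needs, for example with `textwrap.fill(paragraph, width=40)`.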

    Cheers
    Simon
    Oh yeah, a URL: http://www.spacejock.com

  59. 1971 mass mailing by Ed+Avis · · Score: 1

    Does this mean that the Declaration of Independence is the first spam?

    --
    -- Ed Avis ed@membled.com
  60. Getting punch cards read by Teancum · · Score: 1

    I actually took the time to sit down and learn how to read punch cards from their hole patterns alone (which isn't too difficult compared to reading data files directly in a hex editor when you have to dig into why a program isn't reading a certain file correctly).
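    For the curious, the letters-and-digits part of a standard 12-row card fits in a few lines. Below is a Python sketch that decodes one column from its set of punched rows (12, 11, 0-9), using the common Hollerith assignments: a lone punch in rows 0-9 is a digit, zone 12 plus 1-9 gives A-I, zone 11 plus 1-9 gives J-R, and zone 0 plus 2-9 gives S-Z. Punctuation codes varied between keypunch models, so these hypothetical `decode_column`/`decode_card` helpers just return `?` for anything outside that subset.

```python
from collections.abc import Iterable

# Letters are a "zone" punch (row 12, 11 or 0) plus one digit punch.
ZONE_LETTERS = {12: "ABCDEFGHI", 11: "JKLMNOPQR"}  # with digit rows 1-9
ZERO_ZONE_LETTERS = "STUVWXYZ"                     # with digit rows 2-9


def decode_column(punches: Iterable[int]) -> str:
    """Decode one card column from its punched rows (12, 11, 0-9)."""
    rows = set(punches)
    if not rows:
        return " "  # an unpunched column is a blank
    if len(rows) == 1:
        (row,) = rows
        if 0 <= row <= 9:
            return str(row)  # a lone digit punch is that digit
    elif len(rows) == 2:
        for zone, letters in ZONE_LETTERS.items():
            if zone in rows:
                (digit,) = rows - {zone}
                if 1 <= digit <= 9:
                    return letters[digit - 1]
        if 0 in rows:
            (digit,) = rows - {0}
            if 2 <= digit <= 9:
                return ZERO_ZONE_LETTERS[digit - 2]
    return "?"  # punctuation and other multi-punch codes are not handled


def decode_card(columns: Iterable[Iterable[int]]) -> str:
    """Decode a card given as a sequence of per-column punch sets."""
    return "".join(decode_column(col) for col in columns)
```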

    I saw some punchcard machines come into the local thrift store a couple of years ago, but I think it would be hard to find one now.

    The nice advantage that punch cards have over just about every other data storage medium is that as long as the cards are kept in archival conditions, bit rot is almost impossible. And those archival conditions are no different from the ones used for preserving old books, which have a long tradition and history.

    The only problem with punch cards is that they take up so much room, especially compared to the amount of data actually stored.

  61. Printing is essentially zero-cost by Anonymous Coward · · Score: 0

    If you've seen the paper most paperbacks are printed on, which is newsprint, you know that printing is essentially a near-zero-cost procedure. It only costs them $1-2 to actually print the book; it is the intellectual property that matters a lot more.

    By purchasing a book, you are, in essence, "licensing" the work for your use from the publisher. I don't see why e-books should be more expensive, but if they are a dollar or two cheaper, that's probably because they don't actually have to print it.

    Moreover, there are some advantages to using the electronic format versus using the paper format. You try finding a sentence in a 400 page book, and I'll try finding the same thing in an e-book. Needless to say, it will take a lot less time to find it in the electronic book format.

  62. Using XML by Teancum · · Score: 1

    The point is that many of us would prefer an XML version. The argument against this was that ASCII is a longer-lasting archive format. My counter-argument was that an ASCII version can trivially be produced from the XML both for archival purposes and for those who would prefer such a version.


    I would have to agree that XML does offer some reasonable options that make it much superior to plain ASCII text (or Latin-1, as has been discussed in this thread).

    A point I want to make is that:
    1. XML versions are being generated anyway. Try it out, it is there right now. And if you don't like how the XML conversion has occured, change it. This is open source/public domain and I'm sure the project could use some of your help. Add to this effort if you know XML. There is a big backlog of older Project Gutenberg texts that could use this work.
    2. XML is a recent idea. It is new even compared to other formats and systems on the internet. As I said in my original post, I believe that XML is going to have the staying power necessary to outlast the next range of operating systems over the next 20-50 years, and this isn't a classic EBCDIC vs. ASCII issue. That said, Project Gutenberg was started well before XML (or SGML/HTML, etc.), and when a project reaches this sort of age it becomes necessary to look at changes in computing technology with a certain amount of jaundice.


    3. I've been involved with the computer industry long enough myself that I feel the caution Michael Hart has towards this issue is totally legitimate. Let's let XML prove itself and survive the next couple of rounds of data-formatting fads; if it makes it past a decade (XML isn't that old), it might just make it longer.
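    To illustrate the quoted claim that an ASCII version can trivially be produced from the XML, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The markup it assumes (a root element whose direct children are titles and paragraphs) is hypothetical and much simpler than any real Project Gutenberg XML; it only shows the general idea of flattening each block to text, rewrapping it, and forcing the result to ASCII.

```python
import textwrap
import xml.etree.ElementTree as ET


def xml_to_ascii(xml_text: str, width: int = 72) -> str:
    """Flatten a simple XML document into wrapped plain ASCII text.

    Assumes (hypothetically) that each direct child of the root element
    is one block of text, such as a title or a paragraph.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for elem in root:
        # itertext() yields the element's text plus that of nested tags
        # (e.g. <em>), so inline markup is dropped but its words are kept.
        words = " ".join("".join(elem.itertext()).split())
        if words:
            blocks.append(textwrap.fill(words, width=width))
    plain = "\n\n".join(blocks)
    # Replace any non-ASCII character (e.g. an accented letter) with "?".
    return plain.encode("ascii", errors="replace").decode("ascii")
```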
  63. Convert LIT updated to version 1.4 by Danj2k · · Score: 1

    I'm happy to announce that there's now a new version of Convert LIT which is compatible with the updated version of Microsoft Reader. You can find it on our website here.

  64. Enforced? by yerricde · · Score: 1

    And who enforces the Sherman Antitrust Act? Microsoft got off on Sherman Act charges with just a slap on the wrist.

    --
    Will I retire or break 10K?