Slashdot Mirror


Project Gutenberg Made Accessible

scishop writes "Mazarin is an open-source interface to Project Gutenberg's library. Mazarin increases the accessibility of Gutenberg's 10,000+ books as it formats the books for HTML display -- providing paginations in addition to generating table of contents and other advanced markup features -- along with enabling users to carry out full-text searches on the entire library."

26 of 214 comments

  1. Tested by mpost4 · · Score: 3, Interesting

    I cannot test the claim of all 10k works, but I tested what I thought would be most likely to be left out, and I found that they were there.

    I tested Martin Luther.

    (If it were not for the printing press, the Reformation would not have been as successful as it was.)

  2. Looks nice and dandy by tfbastard · · Score: 3, Interesting

    But did they have to make the tutorial presentation a fullscreen flash file?

  3. PG by ArbiterOne · · Score: 4, Informative

    Most of PG's more well-known texts are already formatted into HTML.

    1. Re:PG by Charles Franks · · Score: 5, Informative
      The promo.net address is an old one and is no longer maintained; please reference gutenberg.net instead.

      Charles Franks
      Founder, Distributed Proofreaders

    2. Re:PG by Charles Franks · · Score: 3, Informative
      For PDA-friendly formats of PG e-texts, try Blackmask and/or Pluckerbooks.

      Charles Franks
      Founder, Distributed Proofreaders

    3. Re:PG by flimnap · · Score: 5, Informative

      Indeed, there are many, many sites that do all sorts of wonderful things with Project Gutenberg eBooks. That's the wonderful thing about PG: you can do anything you like with the books.

      While personally I prefer the original and the best... hey, whatever floats your boat!

      It is very much worth noting that Project Gutenberg would have nowhere near as many eBooks as it does without the help of Distributed Proofreaders. Sign up there, and proof just a page a day to make your contribution to preserving literary history. You can proofread as little or as much as you like, and do something worthwhile! Distributed Proofreaders is a great way to spend some of your time.

    4. Re:PG by bbc · · Score: 4, Informative

      PG does accept other formats, gladly.

      However, it insists on at least a plain vanilla version of a text, as that format has proven to be the most durable and accessible.

      So next time you post a text version to PG, make sure you post HTML and PDF versions alongside.

      (Do read the rules for HTML in the PG FAQ first, though.)

  4. P2P / Library by Anonymous Coward · · Score: 5, Interesting

    Interesting idea, I can't get to the website but a feature I'd want is the content shared P2P so you don't have to rely on a central server for the content.

    A central webpage index could just have ed2k links to the files: sharereactor for books. When they update the book they release a new hash-link and the file onto the network.

    It being P2P, it could open it up to more than just public domain books too ;).

  5. Slashdotted - but nice error messages by twoshortplanks · · Score: 4, Interesting

    Hmm, nicely formatted error messages. Does anyone know what this is? I'm assuming it's a mod_perl handler of some sort.

    --
    -- Sorry, I can't think of anything funny to say here.
    1. Re:Slashdotted - but nice error messages by Zocalo · · Score: 4, Informative

      Judging by the snippet of Perl at the bottom of the error message I'd say it's part of the Mason CMS.

      --
      UNIX? They're not even circumcised! Savages!
  6. 10,000+ books? by Ronald Dumsfeld · · Score: 5, Funny

    10,000+ books. Right, so I've got to read all of them before I can post a comment?

    Oh wait, this is Slashdot.

    --
    Where's the Kaboom?
    There's supposed to be an Earth-shattering Kaboom.
  7. Gutenberg is totally inaccessible by Anonymous Coward · · Score: 5, Interesting

    This sounds like it just adds complexity and does not make gutenberg's data accessible.

    There were several research projects for which I used pg as a corpus. However, pg's a terrible hassle for the first-time researcher, since the format of the introductory text ("we're gutenberg, here's the copyright, blah blah") is inconsistent.

    You have to remove the introductory text to avoid bias in the corpus; however, there are so many pathological special cases (different formats, spelling, languages, words used, punctuation, case) that it requires several hours of Perl coding to successfully strip the header text from 75% of the documents with >99% accuracy. Yuk.

    If gutenberg is serious about making their work more accessible, they should think about the simple concern of ensuring consistency in the header text format.
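The header-stripping problem described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Perl the poster used, and the marker patterns are assumptions chosen for the example: real Project Gutenberg files vary far more than this short list covers, which is exactly the complaint. The function name `strip_boilerplate` is hypothetical.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Assumed marker patterns for illustration only; not an exhaustive or
# authoritative list of the header/footer formats found in PG texts.
START_MARKERS = [
    re.compile(r"^\*\*\*\s*START OF (THE|THIS) PROJECT GUTENBERG.*$", FLAGS),
    re.compile(r"^\*END\*?\s*THE SMALL PRINT.*$", FLAGS),
]
END_MARKERS = [
    re.compile(r"^\*\*\*\s*END OF (THE|THIS) PROJECT GUTENBERG.*$", FLAGS),
    re.compile(r"^End of (the )?Project Gutenberg.*$", FLAGS),
]


def strip_boilerplate(text):
    """Return (body, matched).

    matched is False when no known start marker was found, so the caller
    can set the file aside for manual inspection instead of guessing.
    """
    start = None
    for pattern in START_MARKERS:
        match = pattern.search(text)
        if match:
            start = match.end()  # body begins after the marker line
            break
    if start is None:
        return text, False

    end = len(text)
    for pattern in END_MARKERS:
        match = pattern.search(text, start)
        if match:
            end = match.start()  # body stops before the footer marker
            break
    return text[start:end].strip(), True
```

Returning the `matched` flag alongside the text is a deliberate choice here: files that fit none of the assumed patterns are reported rather than silently passed through, which makes it easier to see what fraction of a corpus was actually cleaned.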

  8. Text version by Whitecloud · · Score: 5, Informative

    since some seem to have trouble on the index page... here it is:

    Project Gutenberg is the brainchild of Michael Hart, who in 1971 decided that it would be a really good idea if lots of famous and important texts were freely available to everyone in the world. Since then, he has been joined by hundreds of volunteers who share his vision.
    Now, more than thirty years later, Project Gutenberg has the following figures (as of November 8th 2002): 203 new eBooks released during October 2002 and 1975 new eBooks produced in 2002 (up from 1240 in 2001), for a total of 6267 Project Gutenberg eBooks. 119 eBooks have been posted so far by Project Gutenberg of Australia.

    Click here for the full PG story and here for the latest News, and learn about the Stockholm Challenge Award recently won by Project Gutenberg in the category Culture.

    The key link is the search page.

    --

    Do you need a website upgrade?

    1. Re:Text version by jonathan_ingram · · Score: 3, Informative

      well, to those with computers & internet connection...

      One of the projects run by the Internet Archive is the Bookmobile, which creates, prints, and gives away (for a nominal production fee) books created from public domain sources. One of their most popular products is an illustrated edition of Alice in Wonderland.

      who can read English...

      Yes, PG's content is primarily English at the moment, but this is only because most of the volunteers up until now have been English. If you are confident in a language other than English, you can help us get more books in this language -- either by scanning them, or by proofing the books which other people have scanned by joining the Distributed Proofreading Project (or the new EU sister-project DP Europe). At the moment the main site has projects available for proofing in German, Latin, French, Spanish, Swedish, Finnish, Dutch, Hebrew, Danish, Italian, ancient Greek, and Gaelic. The EU site has, in addition, books available in Serbian, Slovenian, Romanian, Welsh, Hawaiian, Russian, Polish, Lithuanian, Ukrainian, modern Greek, and Bulgarian.

      if the copyright has expired...

      Yes, the vast majority of books in PG are copyright expired. This isn't a big problem, though, as we've only scratched the surface of the number of copyright expired books. Even at the current rate of growth, there's enough to keep us going until the US copyright regime starts letting new books into the public domain in 15 years or so.

  9. Best way to read online texts? by GGardner · · Score: 4, Interesting

    What's the best way to read online texts? There are a bunch of PG texts I might like to read, but reading them in a web browser, as a big text file, gets tiring after ten minutes or so. I'm not sure why I can read a book for hours but a screen for only minutes, but there you have it. I don't think that HTML will help this problem -- does anyone have recommendations for better ways to read these files?

    1. Re:Best way to read online texts? by gunne · · Score: 5, Informative

      If you have a Palm Pilot, I can recommend Weasel Reader.
      I've been using it for a couple of years on my Palm V, and despite its small screen size it works perfectly for reading ebooks.

  10. the tutorial talked to me by Lord Zerrr · · Score: 3, Funny

    I love sexy robot voice tutorials! mazarin tutorial

    --
    "If the facts don't fit the theory, change the facts." -Albert Einstein
    Karma? There's a serial modder out there.
  11. Straight HTML = archaic by Leobinus · · Score: 5, Interesting

    Bah. Posting HTML is so 1996. You can do so much more with these texts. One example is Open Source Shakespeare, which takes all of Shakespeare's texts, indexes them, presents them in an attractive manner, creates a concordance, provides a full-text search engine, organizes the lines by character, etc.

    All of the texts are open source, and you can download the database and source code from the site, too. Check it out.

  12. Bam by kunudo · · Score: 4, Funny

    Monday May 24, @03:14PM : Project Gutenberg made accessible
    Monday May 24, @03:15PM : Project Gutenberg made inaccessible

  13. Re:and then just think by mangu · · Score: 4, Interesting
    While, at first, one would classify your post as either "offtopic" or "flamebait", I think an interesting point can be raised here: the Lutheran reformation was an early consequence of the maxim "information wants to be free".


    It was very convenient for the Roman Church to have a practical monopoly on what was widely acknowledged at the time to be the main source of information, the Holy Bible. When the printing press was invented, this diluted that monopoly, since ordinary people could then afford their own copies of the Bible and became independent of the Church for information. Luther was one of the first to realize that when he urged people to read the Bible. A consequence of that was that people learned to read. Until early in the 20th century, literacy rates in countries which are mostly Lutheran, e.g. the Scandinavian countries and parts of Germany, were much higher than in southern Europe, where people were mostly Catholic.


    A modern analogy:

    Catholic Church --> RIAA

    Lutheranism --> P2P

  14. Slashdot'd by Bipedismaximus · · Score: 4, Funny

    "Project Gutenberg Made Accessible"

    Oh, the irony that is slashdot.

    --
    The way to a man's heart is through the left ventricle
  15. Gutenberg Disclaimer by Twinky · · Score: 5, Interesting
    What always struck me as odd is the enormous length of the disclaimer that Project Gutenberg attaches to every text. To me it seems to be the most obvious sign of a legal system that is ridiculously screwed. No book I ever read had a legal statement like this.

    Quote:

    LIMITED WARRANTY; DISCLAIMER OF DAMAGES

    But for the "Right of Replacement or Refund" described below, [1] the Project (and any other party you may receive this etext from as a PROJECT GUTENBERG-tm etext) disclaims all liability to you for damages, costs and expenses, including legal fees, and [2] YOU HAVE NO REMEDIES FOR NEGLIGENCE OR UNDER STRICT LIABILITY, OR FOR BREACH OF WARRANTY OR CONTRACT, INCLUDING BUT NOT LIMITED TO INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES, EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.

    If you discover a Defect in this etext within 90 days of receiving it, you can receive a refund of the money (if any) you paid for it by sending an explanatory note within that time to the person you received it from. If you received it on a physical medium, you must return it with your note, and such person may choose to alternatively give you a replacement copy. If you received it electronically, such person may choose to alternatively give you a second opportunity to receive it electronically.

    THIS ETEXT IS OTHERWISE PROVIDED TO YOU "AS-IS". NO OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, ARE MADE TO YOU AS TO THE ETEXT OR ANY MEDIUM IT MAY BE ON, INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

    Some states do not allow disclaimers of implied warranties or the exclusion or limitation of consequential damages, so the above disclaimers and exclusions may not apply to you, and you may have other legal rights.

    INDEMNITY

    You will indemnify and hold the Project, its directors, officers, members and agents harmless from all liability, cost and expense, including legal fees, that arise directly or indirectly from any of the following that you do or cause: [1] distribution of this etext, [2] alteration, modification, or addition to the etext, or [3] any Defect.
  16. Now might be a good time to consider by SolemnDragon · · Score: 5, Insightful

    ...donating to the good cause. If you don't want to donate money, volunteer to proofread. Writers out there might also consider a notation in their wills that either lets their works pass directly into the public domain or, as I have been discussing with lawyers, simply passes the copyright of their works on to Project Gutenberg. This gives them more work to publish, and if you're in a contract somewhere that allows for royalty collection, you can set it up so that those royalties switch to Project Gutenberg at the time of your death.

    Now might also be a good time to contribute an hour a week to a literacy project, or to make a donation there. Adult literacy is a serious issue all over the world, and that includes right here in the States, where there really are bright people out there who could have better lives if they could read. I can't think of a more on-topic subject than Project Gutenberg to discuss adult literacy and the need for both literacy teaching and support for free literature for the masses such as this project provides.

    Just my $0.02...

    solemndragon

  17. Funny definition of "accessible..." by dpbsmith · · Score: 3, Insightful

    At the risk of pointing out the obvious, Michael Hart's decision to make the basic format of PG texts "plain vanilla ASCII" has resulted in texts that are highly accessible by any meaning I can think of for that word. They are also compact, platform-agnostic, and durable. Texts contributed in the 1980s are fully usable today.

    While there have been constant complaints about PG using the "wrong" format, opinions on the "right" format have been the flavor-of-the-month (or at least several flavors per decade). Had PG decided to use a "better" format, all of their volunteer time would probably have been taken up converting (say) WordPerfect to RTF to HTML to SGML to XML, leaving relatively little time to digitize and proofread texts.

  18. Already very accessible... by ricky-road-flats · · Score: 4, Interesting
    Just last week I downloaded Project Gutenberg as an ISO - it has 9,500 books on it and weighs in at about 3.85 GB. All the books are as plain text within a ZIP file, accessed through a set of basic web pages also on the disc.

    It's great - I now have that on my laptop hard drive, mountable by Alcohol, so I'll never be short of anything to read, especially when the web's not available...

    I can't find the torrent file I got it through, but if it helps the filename is pgdvd.iso and the size is 4,139,646,976 bytes.
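As a quick check that the two size figures in this comment agree, the byte count can be converted both ways. This is only a sketch of the arithmetic; the variable names are illustrative. The "about 3.85 GB" figure lines up with binary gigabytes (GiB, 2^30 bytes), while in decimal gigabytes (10^9 bytes) the same file is a little over 4.1 GB.

```python
# Byte count quoted in the comment for pgdvd.iso.
size_bytes = 4_139_646_976

# Binary gigabytes (GiB): the unit most file managers of the time reported as "GB".
size_gib = size_bytes / 2**30

# Decimal gigabytes, as used on disc and drive packaging.
size_gb = size_bytes / 10**9

print(f"{size_gib:.3f} GiB, {size_gb:.3f} GB")
```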

  19. Re:and then just think by carlos_benj · · Score: 3, Funny

    Well, in my conversations with information I have determined that while some information does indeed want to be free, other information does not want to rock the boat. Some information simply wants to be left alone. There are also some sub-groups of information that are blissfully ignorant of their situation and do not realize that they are not already free.

    I have not had the time to speak with all information, so this is merely anecdotal evidence of the diversity of opinion among informations.

    --

    As a matter of fact, I am a lawyer. But I play an actor on TV.