Project Gutenberg Made Accessible
scishop writes "Mazarin is an open-source interface to Project Gutenberg's library. Mazarin increases the accessibility of Gutenberg's 10,000+ books as it formats the books for HTML display -- providing paginations in addition to generating table of contents and other advanced markup features -- along with enabling users to carry out full-text searches on the entire library."
I cannot test the claim of all 10k works, but I tested what I thought would be most likely to be left out, and I found that they were there.
I tested Martin Luther.
(If it were not for the printing press, the Reformation would not have been as successful as it was.)
But did they have to make the tutorial presentation a fullscreen flash file?
http://www.thewolfweb.com This website is faster than Slashdot for news.
frist psot?
System error
error: Can't call method "prepare" on an undefined value at /var/www/html/mazarin/search.cgi line 118.
context:
...
114:
115:
116: my $dbh=DatabaseConnect("books");
117: my $Search;
118: if ($mode eq "header"){
119: $Search=$dbh->prepare("SELECT author, title,id FROM book WHERE MATCH (title, author) AGAINST (? IN BOOLEAN MODE) LIMIT $start, $end");
120: $Search->execute($modQuery);
121: }elsif($mode eq "fulltext"){
122: $Search=$dbh->prepare("SELECT author,title,id FROM book WHERE MATCH (search_text) AGAINST (?) LIMIT $start, $end");
...
code stack:
/var/www/html/mazarin/search.cgi:118
/usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm:649
/var/www/html/mazarin/search.cgi:84
Most of PG's more well-known works are already formatted into HTML.
I searched on "oil" and came up with numerous passages from various versions of the Bible, and a few recipes from an Italian cookbook. Attempted to search again, but amazingly the site fails to respond...
Nothing but the finest in meaningless drivel
Interesting idea, I can't get to the website but a feature I'd want is the content shared P2P so you don't have to rely on a central server for the content.
;).
A central webpage index could just have ed2k links to the files: sharereactor for books. When they update the book they release a new hash-link and the file onto the network.
It being P2P, it could open it up to more than just public domain books too.
Fully slashdotted
can someone debug this ?
16: my $dbh=DatabaseConnect("translations");
17:
18: sub Prepare{
19: $dbh=DatabaseConnect("translations");
20: return $dbh->prepare($_[0])
21: or die "Couldn't prepare statement: " . $dbh->errstr;
22: }
23:
Free Web based FTP
Whatever I try to search for, all I get is a nice code dump... Does anyone else get the same?
error: Can't call method "prepare" on an undefined value at /var/www/html/mazarin/search.cgi line 118.
context:
...
114:
115:
116: my $dbh=DatabaseConnect("books");
117: my $Search;
118: if ($mode eq "header"){
119: $Search=$dbh->prepare("SELECT author, title,id FROM book WHERE MATCH (title, author) AGAINST (? IN BOOLEAN MODE) LIMIT $start, $end");
120: $Search->execute($modQuery);
121: }elsif($mode eq "fulltext"){
122: $Search=$dbh->prepare("SELECT author,title,id FROM book WHERE MATCH (search_text) AGAINST (?) LIMIT $start, $end");
...
code stack:
/var/www/html/mazarin/search.cgi:118
/usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm:649
/var/www/html/mazarin/search.cgi:84
Hmm, nicely formatted error messages. Does anyone know what this is? I'm assuming it's a mod_perl handler of some sort.
-- Sorry, I can't think of anything funny to say here.
I think the site is about to go down, it's already terribly slow...
Martin
Unfortunately, I am not Wil Wheaton
to be revealed (made excessofbull) buy corepirate nazi ?pr firm? clerics?
.asp on that won? lookout bullow.
don't bet yOUR
tell 'em robbIE? tell 'em how easy it was to become won of the soul DOWt payper billyonerrors?
...deader than Steve Gutenberg's career or even BSD.
And just think of all the problems which could have been avoided over the years if the Reformation had never happened...
It really is one of the most unfortunate twists in history (of course, Slashdotters will just jump on Christianity in general).
10,000+ books. Right, so I've got to read all of them before I can post a comment?
Oh wait, this is Slashdot.
Where's the Kaboom?
There's supposed to be an Earth-shattering Kaboom.
Well, the site failed the
raw error
Can't call method "prepare" on an undefined value at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 20.
Trace begun at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Exceptions.pm line 131
HTML::Mason::Exceptions::rethrow_exception('Can\'t call method "prepare" on an undefined value at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 20.^J') called at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 20
BookTools::Translator::Prepare('SELECT id FROM language WHERE language_code = ?') called at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 26
BookTools::Translator::SetLanguage('en') called at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 96
main::_('PROJECT_NAME') called at /var/www/html/mazarin/index.html line 3
HTML::Mason::Commands::__ANON__ at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Component.pm line 134
HTML::Mason::Component::run('HTML::Mason::Component::FileBased=HASH(0x9d64cb4)') called at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm line 1069
eval {...} at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm line 1068
HTML::Mason::Request::comp(undef, undef, undef) called at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm line 338
eval {...} at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm line 338
eval {...} at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/Request.pm line 297
HTML::Mason::Request::exec('HTML::Mason::Request::ApacheHandler=HASH(0x9e20d14)') called at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/ApacheHandler.pm line 134
eval {...} at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/ApacheHandler.pm line 134
HTML::Mason::Request::ApacheHandler::exec('HTML::Mason::Request::ApacheHandler=HASH(0x9e20d14)') called at /usr/lib/perl5/site_perl/5.8.3/HTML/Mason/ApacheHandler.pm line 792
HTML::Mason::ApacheHandler::handle_request('HTML::Mason::ApacheHandler=HASH(0x9d56f8c)', 'Apache=SCALAR(0x9e2045c)') called at (eval 34) line 8
HTML::Mason::ApacheHandler::handler('HTML::Mason::ApacheHandler', 'Apache=SCALAR(0x9e2045c)') called at /dev/null line 0
eval {...} at /dev/null line 0
"If the facts don't fit the theory, change the facts." -Albert Einstein
Karma? There's a serial modder out there.
A guess would be that the script is accessing the database remotely. Thus, if the server is getting slashdotted, there is no way it can talk to the remote database. Instead of die, they should have sent a small text message of "Remote database unreachable."
;)
Hindsight is 20/20
In a place beyond time and space, in a land far better than this, look for me there...
Bah - I already have a fully functional API to the books.
SIG: TAKE OFF EVERY 'CAPTAIN'!!
This sounds like it just adds complexity and does not make Gutenberg's data accessible.
There were several research projects for which I used PG as a corpus. However, PG is a terrible hassle for the first-time researcher, since the format of the introductory text ("we're gutenberg, here's the copyright, blah blah") is inconsistent.
You have to remove the introductory text to avoid bias in the corpus. However, there are so many pathological special cases (different formats, spelling, languages, words used, punctuation, case) that it requires several hours of Perl coding to successfully strip the header text from 75% of the documents with >99% accuracy. Yuk.
If Gutenberg is serious about making their work more accessible, they should think about the simple concern of ensuring consistency in the header text format.
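For anyone facing the same chore, here is a rough sketch of the marker-based approach, in Python rather than Perl. The marker patterns are assumptions based on header and footer lines seen in some PG files; they will certainly miss many of the special cases described above, and `strip_boilerplate` is just an illustrative name.

```python
import re

# Assumed marker patterns; real Project Gutenberg files vary widely,
# so treat these lists as a starting point, not a complete set.
START_MARKERS = [
    re.compile(r"^\*+\s*START OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE),
    re.compile(r"^\*END\*?\s*THE SMALL PRINT", re.IGNORECASE),
]
END_MARKERS = [
    re.compile(r"^\*+\s*END OF (THE|THIS) PROJECT GUTENBERG", re.IGNORECASE),
    re.compile(r"^End of (the )?Project Gutenberg", re.IGNORECASE),
]


def strip_boilerplate(text: str) -> str:
    """Return the text between the last start marker and the first end marker.

    If no marker is found on either side, that side is left untouched.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if any(p.search(stripped) for p in START_MARKERS):
            start = i + 1  # body begins after the most recent start marker
        elif any(p.search(stripped) for p in END_MARKERS):
            end = i  # footer begins here; stop scanning
            break
    return "\n".join(lines[start:end]).strip()
```

Files that match none of the patterns come back unchanged, so it is easy to log them and handle the stragglers by hand.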
since some seem to have trouble on the index page... here it is:
Project Gutenberg is the brainchild of Michael Hart, who in 1971 decided that it would be a really good idea if lots of famous and important texts were freely available to everyone in the world. Since then, he has been joined by hundreds of volunteers who share his vision.
Now, more than thirty years later, Project Gutenberg has the following figures (as of November 8th 2002): 203 New eBooks released during October 2002, 1975 New eBooks produced in 2002 (they were 1240 in 2001) for a total of 6267 Total Project Gutenberg eBooks. 119 eBooks have been posted so far by Project Gutenberg of Australia.
Click here for the full PG story and here for the latest News , and learn about the Stockholm Challenge Award recently won by Project Gutenberg in the category Culture.
The key link is the search page.
Do you need a website upgrade?
I guess this is about the time we take the Internet Archive's cache of PG eh? I wonder if they cache all 10k books. :)
-- Friends don't let friends buy Nokia.
What's the best way to read online texts? There are a bunch of PG texts I might like to read, but reading them in a web browser, as a big text file gets tiring after ten minutes or so. I'm not sure why I can read a book for hours, but the screen for minutes, but there you have it. I don't think that HTML will help this problem -- does anyone have recommendations for better ways to read these files?
I can't open the site with Mozilla. Simplified error is below:
error: Can't call method "prepare" on an undefined value at /usr/lib/perl5/5.8.3/BookTools/Translator.pm line 20.
context:
...
16: my $dbh=DatabaseConnect("translations");
17:
18: sub Prepare{
19: $dbh=DatabaseConnect("translations");
20: return $dbh->prepare($_[0])
21: or die "Couldn't prepare statement: " . $dbh->errstr;
22: }
23:
24: sub SetLanguage{
...
code stack:
/usr/lib/perl5/5.8.3/BookTools/Translator.pm:20
/usr/lib/perl5/5.8.3/BookTools/Translator.pm:26
/usr/lib/perl5/5.8.3/BookTools/Translator.pm:96
/var/www/html/mazarin/index.html:3
I love sexy robot voice tutorials! mazarin tutorial
"If the facts don't fit the theory, change the facts." -Albert Einstein
Karma? There's a serial modder out there.
...into Latin or some other dead language.
WTF is with people who say that "haha - the website is slashdotted, here is the error message". WE WILL FIGURE IT OUT OURSELVES OR READ THE OTHER TEN MESSAGES THAT SAY THAT. Thanks for your consideration.
I fear a source code dump would put me off my lunch faster than a goatsex link.
real creators suggest using newclear power vs. (Score:0)
by Anonymous Coward on Monday May 24, @09:39AM (#9237105)
unprecedented evile, whilst participating in the increasingly popular planet/population rescue initiative.
no contest. this stuff is unbreakable, & wwworks on several (more than 3) dimensions.
it's probably just a suggestion.
consult with/trust in yOUR creators.... with power to spare.
eye gas robbIE's fired up the pateNTdead PostBlock devise, wonce again?
Due to excessive bad posting from this IP or Subnet, anonymous comment posting has temporarily (forever, if we had some ept) been disabled. You can still login to post. However, if bad posting continues from your IP or Subnet that privilege could be revoked as well. If it's you, consider this a chance to sit in the timeout corner or login and improve your posting . If it's someone else, this is a chance to hunt them down (like with fuddles' phonIE bouNTy hunter scam). If you think this is unfair, we just don't care.
Bah. Posting HTML is so 1996. You can do so much more with these texts. One example is Open Source Shakespeare, which takes all of Shakespeare's texts, indexes them, presents them in an attractive manner, creates a concordance, provides a full-text search engine, organizes the lines by character, etc.
All of the texts are open source, and you can download the database and source code from the site, too. Check it out.
And that is why we are slashdotting it?
At least that's my experience after "testing" it now.
Not Buzzword 2.0 compliant. Please speak english.
Monday May 24, @03:14PM : Project Gutenberg made accessible
Monday May 24, @03:15PM : Project Gutenberg made inaccessible
RTFL
Read The F(ine) Library!
Unless you have read the entire body of work that makes up human civilization, you do not have the requisite knowledge to comment on any aspect of it.
"Project Gutenberg Made Accessible"
Oh, the irony that is slashdot.
The way to a man's heart is through the left ventricle
The 'DatabaseConnect' function didn't return anything.
Not a big deal, really, but they probably should have trapped that, as it could happen for any number of reasons (database down, authentication failed, etc).
I find that I'm getting much slower when I write programs these days -- because I'm checking errors for those things that I would've just blown off, or not have thought about in my earlier days.
[there's a few different things that could be done to this -- but I don't know why they're calling DatabaseConnect both at lines 16 and 19, so it would be careless of me to recommend a solution to code that I don't fully understand, and can't see the whole context]
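To illustrate the kind of trapping meant here without guessing at the site's actual code, here is a minimal sketch in Python using the standard sqlite3 module instead of Perl and DBI. The names `database_connect` and `search_titles`, the `book` table, and the message wording are all assumptions made up for the example.

```python
import sqlite3
from typing import Optional


def database_connect(path: str) -> Optional[sqlite3.Connection]:
    """Return a connection, or None if the database cannot be opened."""
    try:
        # mode=rw refuses to create a missing file, so a bad path fails here.
        return sqlite3.connect(f"file:{path}?mode=rw", uri=True)
    except sqlite3.Error:
        return None


def search_titles(path: str, term: str) -> str:
    """Search titles, returning a short message instead of a stack dump."""
    conn = database_connect(path)
    if conn is None:
        # Trap the failure rather than calling a method on a missing handle.
        return "Database unreachable, please try again later."
    try:
        rows = conn.execute(
            "SELECT title FROM book WHERE title LIKE ?", (f"%{term}%",)
        ).fetchall()
    except sqlite3.Error as exc:
        return f"Search failed: {exc}"
    finally:
        conn.close()
    return "\n".join(title for (title,) in rows) or "No matches."
```

The point is only that the caller checks the handle before using it, whatever the reason for the failure turns out to be.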
Build it, and they will come^Hplain.
I say kill all christians they all should be killed starting with this idiot.
Mazarin increases the accessibility of Gutenberg's 10,000+ books
In a related story, the Slashdot effect decreases the accessibility of Gutenberg's 10,000+ books.
Gotta turn a living you know...
And just think, the Chinese and Koreans both had movable type long before Gutenberg did.
Maybe if we had had it when they did, John Hus would have been the one to do the reforming instead of being burned: http://www.greatsite.com/timeline-english-bible-history/john-hus.html
Think of it: if Hus had been successful, instead of being a Lutheran I would be called a Husian (OK, maybe not).
Film at 11.
is that when the body of a text passes a certain length it should not all be in italics, because a long body of text written in italics becomes difficult to read, similar to the practice of writing a long document IN ALL CAPS. BOTH OF THESE FORMS OF WRITING A LONG DOCUMENT MAKE THE TEXT VERY HARD TO READ, BECAUSE OF THE WAY THE EYE VIEWS THE SERIF LETTERS. THIS FOLK BELIEF IS TRUE.
http://www.gutenberg.net/etext04/awbv110.txt
there in HTML.
The first volume was converted to HTML by hand by someone else and to pdf, by machine, I think, whereas my site simply has the e-text:
http://rjs.org/gutenberg/Stevens_Thomas/
So an automated process would be a boon. What I'd really like to see is an OS text-to-voice reader program. I wrote a wxPython program to assist conversion from scanned text to PG format: http://rjs.org/gutenberg/OCR2Gutenberg/, but I have never been able to find a free set of spoken word wave files or speech library.
Ray
http://rjs.org/ - biking, astronomy, photography
The site is written in Perl.
I always thought you would be called the Hussies ;)
Wouldn't it be great if Google were involved in Gutenberg in a major way?
What about how the Reformation tackled transubstantiation, Papal succession, the ability of priests to marry, idolatry, salvation by works, and the way that the church held (and still holds) Roman Catholic dogma over Holy Scripture? There are more, but I think this will do for now.
Really, Indulgences were a way of raising money for the church, and I see it as a symptom of the wider corruption that had occurred in the Roman Catholic church.
Quote:
...donating to the good cause. If you don't want to donate money, volunteer to proofread, or it might be worth it for writers out there to consider a notation in your will that will allow your works to pass either directly into the public domain, or, as I have been in contact with lawyers to discuss, simply passing the copyright of your own works on to Project Gutenberg. This allows them more work to publish, and if you're in a contract somewhere that allows for royalty collection, you can set it up so that those royalties switch to Project Gutenberg at the time of your death.
Now might also be a good time to contribute an hour a week to a literacy project, or to make a donation there. Adult literacy is a serious issue all over the world, and that includes right here in the States, where there really are bright people out there who could have better lives if they could read. I can't think of a more on-topic subject than Project Gutenberg to discuss adult literacy and the need for both literacy teaching and to support free literature for the masses such as this project provides.
Just my $0.02...
solemndragon
"I'd say 'Have a good time,' but arson is still illegal.
At the risk of pointing out the obvious, Michael Hart's decision to make the basic format of PG texts "plain vanilla ASCII" has resulted in texts that are highly accessible by any meaning I can think of for that word. They are also compact, platform-agnostic, and durable. Texts contributed in the 1980s are fully usable today.
While there have been constant complaints about PG using the "wrong" format, opinions on the "right" format have been the flavor-of-the-month (or at least several flavors per decade). Had PG decided to use a "better" format, all of their volunteer time would probably have been taken up converting (say) WordPerfect to RTF to HTML to SGML to XML, leaving relatively little time to digitize and proofread texts.
"How to Do Nothing," kids activities, back in print!
It's great - I now have that on my laptop hard drive, mountable by Alcohol, so I'll never be short of anything to read, especially when the web's not available...
I can't find the torrent file I got it through, but if it helps the filename is pgdvd.iso and the size is 4,139,646,976 bytes.
6. Celibacy - Pederasty
And not one mention of the screenplay for Police Academy.
Sorry, but you are a dismal failure. Better luck next time.
Support HP and Lexmark, print it. Unfortunately, I have never found any electronic reading medium that compares with a paper book. In my experience, there's no way you can lie down on a couch or bed and have the same experience with a computer as you can have with a book. I have used my very lightweight Sony Vaio, but it still generates a lot of heat, and has a ridiculous 1024x768 resolution.
What with Disney getting the copyright limit extended each time they risk losing the Mouse and the laws saying that a copyright exists on all works even if not formally copyrighted, unless otherwise stated by the author, there's a good chance that we won't have new books in the public domain for a long time...
This sig has absolutely no significance and serves only to take up screen space and waste the time of the reader.
Information wants you to give me a dollar.
I use a Fujitsu p1120 with SuSE Linux 9.1. It is very light, less than 1 kg, lighter than many books, and I can hold it in a single hand. It has an 8.9 inch panoramic screen; it is MUCH better than any PDA or dedicated book reader. It comes with a 30 GB drive, which I replaced with an 80 GB drive. Right now no PDA or book reader has 80 GB of storage! The Gutenberg DVD is only 4 GB, so I copied it to my p1120.
This site has done basically the same thing w/ Gutenberg text, and did it years ago. It provides searching, HTML format, bookmarks, etc. www.crankylibrarian.com
Hey I will indulge you for *FREE*
BTW, re: bsDaemon anyone else think this handle is hilarious?
I didn't know he had 10,000+ films under his belt! I need to update my "Three Men And A Little Bastard" collection!
(1) The late, lamented Newton.
(2) A high-resolution Palm screen with PalmReader Pro (now eReader) and the Bell 18-point serif font. Some, but not all, of their commercial books do paragraphing correctly (i.e., \n\t, not \n\n); almost all the non-commercial ebooks I've found get this uniformly wrong: A decade of low-resolution screens is no excuse for ignoring a millennium of reading tradition.
(3) Safari on an LCD screen with a custom style sheet, rendering <P> as \n\t:
P {text-indent: 1em; margin-bottom: 0; margin-top: 0;
padding-bottom: 0; padding-top: 0; }
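For plain-text files, where no style sheet applies, the same \n\n to \n\t conversion can be approximated with a few lines of Python. This is only a sketch: `indent_paragraphs` is an illustrative name, and it assumes paragraphs are separated by blank lines and that hard-wrapped lines inside a paragraph should be rejoined.

```python
import re


def indent_paragraphs(text: str) -> str:
    """Turn blank-line-separated paragraphs into tab-indented ones."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Rejoin hard-wrapped lines so each paragraph becomes a single line.
    unwrapped = [" ".join(p.split()) for p in paragraphs if p.strip()]
    return "\n".join("\t" + p for p in unwrapped)
```

Verse, tables, and anything else with meaningful line breaks would be mangled by the unwrapping step, so this suits ordinary prose only.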
---
Disclaimer: There are various complicated relations between some of my employers and the companies mentioned above.
> However, it insists on at least a plain vanilla version of a text, as that format has proven to be the most durable and accessible.
Sometimes the illustrations that accompany a text are crucial for its understanding.
How about using the Text Encoding Initiative's TEI XML format instead? Graphics can be included using its figure tag. Combine the TEI XML markup with Dublin Core metadata and people could search PG's library by author, publication date, publisher, etc.
The markup can be stored as ASCII text and edited with a simple text editor. This format can also be rendered to ASCII for legacy purposes...
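As a rough illustration of rendering such markup back to plain text, here is a small Python sketch. The sample fragment uses `figure`, `graphic`, and `figDesc` the way I understand TEI to use them, but it is a made-up snippet that has not been validated against the TEI schema, and `render_plain` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment, not validated against the TEI schema.
SAMPLE = """<text>
  <body>
    <head>Chapter I</head>
    <p>The wheelman set out at dawn.</p>
    <figure>
      <graphic url="images/fig01.png"/>
      <figDesc>A bicycle on a mountain road</figDesc>
    </figure>
    <p>By noon he had crossed the pass.</p>
  </body>
</text>"""


def render_plain(xml_source: str) -> str:
    """Render headings, paragraphs, and figures as plain text blocks."""
    root = ET.fromstring(xml_source)
    blocks = []
    for node in root.iter():  # document order, depth first
        if node.tag == "head":
            blocks.append(" ".join("".join(node.itertext()).split()).upper())
        elif node.tag == "p":
            blocks.append(" ".join("".join(node.itertext()).split()))
        elif node.tag == "figure":
            # Plain text cannot show the image, so keep only its description.
            desc = node.findtext("figDesc", default="").strip()
            blocks.append(f"[Illustration: {desc}]")
    return "\n\n".join(blocks)
```

The figure survives in the plain version only as a bracketed description, which is about the best a text-only rendering can do.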
I've created an RSS feed from the Project Gutenberg list of etexts. The RSS feed contains titles, authors, descriptions and links to the relevant page or file on http://www.gutenberg.net/
PGDB.rss PGDB.rss.gz
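For anyone curious how a feed like this can be put together, here is a minimal sketch in Python. It is not the script behind PGDB.rss; `build_rss`, the entry fields, and the sample values in the usage note are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, entries):
    """Build a small RSS 2.0 document from a list of entry dicts.

    Each entry is assumed to have 'title', 'author', 'link', 'description'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "List of etexts"
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{entry['title']} by {entry['author']}"
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry["description"]
    return ET.tostring(rss, encoding="unicode")
```

Calling `build_rss("PG etexts", "http://www.gutenberg.net/", [...])` with one dict per etext returns the feed as a string, ready to be written to a file and gzipped.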
I read French, Spanish and German in addition to English. These languages have diacriticals and special characters which are not covered in ASCII, because ASCII was created for English and English only.
So you can say that the use of ASCII prevented Project Gutenberg, in the earlier days, from considering working with any texts in any language other than English.
Now that Unicode (with encodings such as UTF-8) is a standard, it's possible for classic works in many languages to be represented in Gutenberg or affiliated projects.
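The limitation is easy to demonstrate. The following Python sketch checks whether a string survives a given encoding; `encodable` is just an illustrative helper name.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# An English title fits in ASCII; a French one with an accent does not.
print(encodable("Moby Dick", "ascii"))        # True
print(encodable("Les Misérables", "ascii"))   # False
print(encodable("Les Misérables", "utf-8"))   # True
```

A single accented letter is enough to make the ASCII check fail, which is why plain ASCII editions of such works have to drop or transliterate characters.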
If not, I get text files from Gutenberg and format them in HTML.
I purchased the iSilo program for the Palm for US $20 (http://isilo.com/), which comes with a free program (Windows, Mac and Linux!) to convert HTML pages into iSilo format. It works great and preserves graphics, hyperlinks, CSS, bookmarks and more. It uses color if your handheld supports it.
So I do all my Gutenberg reading on my Clie SJ22 with a 320 x 320 color display.
I guess some people on /. can't face the truth of this fellow's statement.
Even more interesting, they never did much of it, as with powder, the compass, long-ranging ships...
The same applies to Greeks and Romans with steam power.
Leandro Guimarães Faria Corcete DUTRA
DA, DBA, SysAdmin, Data Modeller
GNU Project, Debian GNU/Lin