pgdp.net · Domains · Slashdot Mirror

Re:Personally... by br0ck · 2004-08-12 08:27 · Score: 1 · on Where Did Affordable OCR Go?

Well, I've done quite a few pages at Project Gutenberg's Distributed Proofreaders, where you donate your proofreading time to clean up text scanned in from books, and the first draft from the computer is usually pretty dang close. I find that it is usually just a matter of cleaning up formatting for things like footnotes and scientific notation.

Re:Not-So-Sad Truth by jesterzog · 2004-07-13 12:20 · Score: 1 · on Alan Kay Decries the State of Computing

...but i'd say the way we communicate with each other has changed a lot since then - text messages, email, mobile phones are a different way of communicating than what it was.

That's certainly true, although they're all still more digital/electronic, probably faster and more efficient variants of things that already existed.

Personally I think the very interesting side of communication via computers is with the enhanced parallel collaboration that's becoming possible.

Consider something like Wikipedia, for instance. It's a product of many thousands of people who've used modern technology to collaborate and create something bigger, more adaptive and quite different from anything that could probably have been achieved a few decades ago.

Another fantastic example is something like Distributed Proofreaders. Through the whole collaboration effort, it's created a very efficient and effective way for people to interact with each other and pool their resources together to do something that really couldn't have been done a few years ago.

Of course, Alan Kay's comments are about commercialisation. He may have a point considering that both of these examples are voluntary efforts. I stand to be corrected, but to me it seems that the main things most commercial organisations have used computers for are the same things that they always did without computers (primarily publishing, processing of information, etc.).

This is a great question by jesterzog · 2004-07-12 12:04 · Score: 1 · on Ask Wikipedia Founder Jimmy Wales About Online Collaboration

Thanks for asking this question. I was recently at a conference where I presented an idea for a collaborative system for something where there doesn't seem to be one at the moment. I used a couple of examples to try and demonstrate my point of what genuine collaboration was on the web, more so than just sending emails. One example was Wikipedia and the other was Distributed Proofreaders.

If you look at either of these websites in any detail, there are scores of devices and virtual rewards used to keep people interested, and to keep the regulars coming back to keep taking part in the community and continue building it.

I'd be very interested to know what the Wikipedia engineers believe are the most important and successful devices that they use to encourage people to continue contributing to the community.

plugging my interests too by kippy · 2004-07-12 02:57 · Score: 2, Informative · on Mozilla Foundation Now IRS 501(c)(3) Approved

The Mars Society
Project Gutenberg and the Distributed Proofreaders
Wikipedia (sorta, soon it'll be 501(c)(3) )

Re:Its a good start by nautical9 · 2004-06-12 06:17 · Score: 4, Informative · on 19th Century News Coming Online

This is a good time to remind folks of the Distributed Proofreaders project, now the largest contributor to Project Gutenberg, where anyone can take a scanned page and compare it to the OCR output to check for errors. Sign up and give it a go - all browser based, and actually quite addictive. :)

Get involved and help keep out-of-print and out-of-copyright books around forever.

More than solitaire and programming by waynegoode · 2004-05-31 09:11 · Score: 1 · on Programming For Terrified Adults?

More than solitaire and programming

There are a lot of things more fulfilling than solitaire and less complicated than programming. I suggest she try something else. Options:

Computer games such as SimCity. I'd skip FPS and MMORPG.
Simple online games such as chess, checkers, board games, etc.
Reading and posting to BB sites on a topic of interest to her.
HTML to make a web site
Spreadsheets. This has some of the ideas of programming, but broken down into easy pieces. Plus, it is useful.
General computer training courses or books.
Project Gutenberg distributed proofreaders.

If you want to teach programming to someone who is just going to do it for fun, I would suggest VBA. It's simple and some programs will write macros for you for example code. Also, by interacting with applications, a little code can go a long way.

Then, if she's still interested, try a real language, maybe C.

Re:Funny definition of "accessible..." by jesterzog · 2004-05-24 10:00 · Score: 1 · on Project Gutenberg Made Accessible

What I meant was font size, bold, and italic (such as for emphasis or to make titles stand out). I don't want to debate the merits of including the font information, but if the original author included such formatting in the paper original, even just to add emphasis to dialogue he/she had written, I think it's accurate to say that "information" is lost in the plain ASCII version.

I'm not sure what the official PG archive line says, but the stated Primary Rule in the Distributed Proofreaders Proofreading Guidelines states that "the final electronic book seen by a reader, possibly many years in the future, should accurately convey the intent of the author".

It's true that some information may be lost, but I read this to mean that at the very least, some intelligence will be applied when deciding what information to keep and what to throw away. It's fortunate that most authoring is more about writing words than formatting them, and the majority of the works currently being preserved by PG would have originally been typed on either a typewriter or something with less accuracy.

The proofing guidelines also instruct on how to mark up bold and italics, and the proofing interface makes it relatively easy to enter non-ASCII characters, such as accented letters, which come up in books from time to time. Even if it's not official, there appears to be an attempt to save a lot of the important information so that at the very least it can be translated later.

It's still kept quite simple though, and I guess that's because for every extra complication, the participation would go down.

But all of that said, I don't think the main goal of Project Gutenberg is to provide output that will easily parse in people's perl scripts. The primary goal is to save as many out-of-copyright books as possible before they completely disappear, and make them reasonably accessible... which plain text files do remarkably well. If added complication causes things to go noticeably more slowly then it conflicts with that goal.

Perhaps instead, a separate project is needed with different goals, that would take works outputted from PG and standardise them to a format that's more accessible to technology. If it works out and they liaise well enough, they might end up merging in the end anyway. I think a good proof of concept would be needed first, though.

Re:Funny definition of "accessible..." by bbc · 2004-05-24 07:15 · Score: 1 · on Project Gutenberg Made Accessible

Of the 100 most recent etexts posted from Distributed Proofreaders to PG (DP is the main supplier of texts), 46 come in an HTML version. I would not be surprised if a large part of the remaining texts did not have any special needs to begin with.

Re:Funny definition of "accessible..." by bbc · 2004-05-24 05:29 · Score: 2, Insightful · on Project Gutenberg Made Accessible

a converter could easily be written

If you checked the volunteer mailing list of Project Gutenberg, you would see that every now and again somebody waltzes in and says: "Why don't you do such and so? It's easy! You guys must be idiots for not doing it my way."

Neglecting the fact that such people rarely have the decency to find out if this discussion has already been held, and what the arguments were, list members will then ask the question:

"If it's so easy, why don't you show us how it's done?"

That will usually shut up the it's-easy-sayers.

There are of course those who act, rather than talk. Those people have built the PG website, the PG database, the Distributed Proofreaders environment, all the 'easy' little things that are required to keep PG's library doubling in size every 18 months.

Next time you say "it's easy", try to have a system in place that shows you know what you are talking about.

P.S. I realize you did not say PG are idiots. Quite the contrary. However, such emotional outbursts are often the next step in the mode of discussion of those who think others should fulfill their desires at no cost and immediately. That's why I put it in the example.

Re:Text version by jonathan_ingram · 2004-05-24 01:59 · Score: 3, Informative · on Project Gutenberg Made Accessible

well, to those with computers & internet connection...

One of the projects run by the Internet Archive is the Bookmobile, which creates, prints, and gives away (for a nominal production fee) books created from public domain sources. One of their most popular products is an illustrated edition of Alice in Wonderland.

who can read English...

Yes, PG's content is primarily English at the moment, but this is only because most of the volunteers up until now have been English. If you are confident in a language other than English, you can help us get more books in this language -- either by scanning them, or by proofing the books which other people have scanned by joining the Distributed Proofreading Project (or the new EU sister-project DP Europe). At the moment the main site has projects available for proofing in German, Latin, French, Spanish, Swedish, Finnish, Dutch, Hebrew, Danish, Italian, ancient Greek, and Gaelic. The EU site has, in addition, books available in Serbian, Slovenian, Romanian, Welsh, Hawaiian, Russian, Polish, Lithuanian, Ukrainian, modern Greek, and Bulgarian.

if the copyright has expired...

Yes, the vast majority of books in PG are copyright expired. This isn't a big problem, though, as we've only scratched the surface of the number of copyright expired books. Even at the current rate of growth, there's enough to keep us going until the US copyright regime starts letting new books into the public domain in 15 years or so.

Re:PG by flimnap · 2004-05-24 01:57 · Score: 5, Informative · on Project Gutenberg Made Accessible

Indeed, there are many, many sites that do all sorts of wonderful things with Project Gutenberg eBooks. That's the wonderful thing about PG, you can do anything you like with the books.

While personally I prefer the original and the best... hey, whatever floats your boat!

It is very much worth noting that Project Gutenberg would have nowhere near as many eBooks as it does without the help of Distributed Proofreaders. Sign up there, and proof just a page a day to make your contribution to preserving literary history. You can proofread as little or as much as you like, and do something worthwhile! Distributed Proofreaders is a great way to spend some of your time.

Re:PG by Charles+Franks · 2004-05-24 01:42 · Score: 3, Informative · on Project Gutenberg Made Accessible

For PDA-friendly formats of PG e-texts try Blackmask and/or Pluckerbooks

Charles Franks
Founder, Distributed Proofreaders

Re:PG by Charles+Franks · 2004-05-24 01:32 · Score: 5, Informative · on Project Gutenberg Made Accessible

The promo.net address is an old one and no longer maintained, please reference gutenberg.net

Charles Franks
Founder, Distributed Proofreaders

Soon! by Tom7 · 2004-03-27 09:43 · Score: 4, Informative · on Boolean Logic : George Boole's The Laws of Thought

I helped proofread this one recently at pgdp.net. It's in post-processing now, so it will be in Gutenberg soon!

Re:Rich vs plain by bbc · 2004-03-18 13:46 · Score: 2, Informative · on Project Gutenberg 2 Raises Some Hackles

Just to give you folks some info about what's going on at PG.

First of all, PG is not against any other formats than plain vanilla text. However, because of the accessibility and future-proofness of that format, every text that PG will ever produce will also be published as plain vanilla text. It is the one format we will always produce, of many.

XML formats are being discussed. The idea is that we will produce XML files that will be used as storage format, from which at the very least the plain vanilla texts will be produced, and furthermore any format we care to support (most likely at least HTML and PDF).

The problem with these technologies is that they require volunteers to implement them.

Currently the biggest producer of ebooks for PG is Distributed Proofreaders (DP). This is a web-based, distributed application for the correction and formatting of ebooks. DP has a long list of guidelines of the sort of information that needs to be retained. At the moment, we keep more information than is required by PG, and a lot of this extra information runs the risk of being discarded. One of the solutions to this problem that volunteers have devised is producing their own HTML and XML etexts. Please read our newsletter article The Illustrated Masterpieces of Project Gutenberg to see some recent examples.

The Distributed Proofreaders would love to see a solution for the conservation problem. We want our ebooks to look good. It's the natural effect of putting ten thousand nit-pickers in the same room.

--Branko Collin

Re:Michael Hart == idiot by Anonymous Coward · 2004-03-17 04:40 · Score: 0 · on Project Gutenberg 2 Raises Some Hackles

I was briefly involved with Project Gutenberg back in '98. I wanted to do etexts of some books on which the copyright had never been renewed (which was a loophole, since they'd still be protected today). Knowing all of this, I did all of the necessary research, including getting a copyright attorney, who happened to be a fan of the author, to draft me a complicated email regarding the history of the stories in question.

Long story short, the stories cleared PG's legal review and got underway. Some time later, someone panicked because they didn't have a copy of the legal clearance. I'd lost the original email in a crash, but figured that it was no big deal since so many other eyes had touched it. Turns out that NO ONE -- including PG's lawyers -- bothered to keep a record of the project. It also turns out that (as of 1998, at least) Michael Hart didn't OWN a printer, and was therefore unconcerned about backing up his documentation.

Between that event and this article, I wrote the entire project off. There are a lot of dedicated volunteers spending many hours of their time to bring books to the rest of us, but the project itself is run by an utter idiot. Outside of the name recognition -- which is great -- anyone could do exactly the same thing, and with Charles Franks' Distributed Proofreaders technology, it wouldn't be hard.

Re:project for free distribution of written knowle by clonebarkins · 2004-03-17 01:58 · Score: 3, Informative · on Project Gutenberg 2 Raises Some Hackles

It is sad to see that projects lose their initial zeal and ideology.

The project has not lost its zeal and ideology. Project Gutenberg is alive and kicking, and even revolting to some extent against Michael's unilateral decision to partner with the World Ebook Library through the device of projectgutenberg.info (aka Project Gutenberg II). As an active volunteer of PG and DP, I have seen the discussions over the past few days, and the zeal has increased if anything. People are still holding true to the ideals of PG, even if its founder has made a bad decision.

Some open source projects, such as Linux, have understood that, and were GPL'ed. This safeguards against any commercialism that would destroy its very foundation.

Project Gutenberg is not an "open source project." It is a project to get public domain texts into electronic formats and distribute them to whoever wants them--including commercial enterprises. Linux and others are projects that work in copyrighted materials. Verbum Vanum requires specific licensing, which is very much against PG philosophy (yes, PG does have some copyrighted texts, but it does not require authors to give up any rights as the OLPA does, only to provide PG non-exclusive electronic distribution rights).

Yes, PG puts a license on every one of its texts. But it is the only license I know of that says you can remove the license altogether and redistribute however you desire. That is a benefit, not a detriment.

Gutenberg books are fine! by flimnap · 2004-03-17 01:02 · Score: 3, Informative · on Project Gutenberg 2 Raises Some Hackles

Project Gutenberg will accept any format of an ebook, as long as there is also a plain text version. So, many ebooks are available in plain text and HTML, and sometimes other formats (including PDF!!).

The major producer of PG ebooks, Distributed Proofreaders, ends up producing an illustrated HTML version of almost every book that would benefit from it.

As long as the public domain PDF ebooks are eventually added to the real Project Gutenberg, and PG2 pays the proper royalties to PG, I don't have a problem with this site.

Oh wait, I do... I think it's fishy that a friend of Michael Hart (the founder of PG) is awarded one of the domain names owned by the real Project Gutenberg. The "owner" of the domain is Greg Newby (the CEO of the Project Gutenberg Literary Archive Foundation). He does a fine job, and this isn't his fault ;).

PROJECTGUTENBERG.INFO Registrant: Newby, Greg (PROJECTGUTENBERG2-DOM)

A sad day... by wew · 2004-02-08 20:27 · Score: 5, Interesting · on Australia To Adopt U.S.-Style Copyright Laws

This is a sad day for public information in Australia--and just when it looked like the free trade agreement was not going to go through because of US intransigence over agriculture! Unfortunately, John Howard decided to sell out completely.

When this was first mentioned, I spent some time reading up on the topic: I might as well share some links here.

The only organisation that I could find actively lobbying against the dilution of Public Domain rights in Australia was Australian Library and Information Association, a professional organisation for librarians. They are following this issue, and may appreciate your input and support; their online journal also contains an insightful article by an Australian National University professor of law on copyrights and public domain.

As others have pointed out, the retrospective extension of copyrights from Life+50 to Life+70, which even those advocating a longer copyright term admitted had no justification, is of particular concern to Project Gutenberg of Australia (site seems to be down at present--anyone know why?), which had published a number of until now Public Domain works on their site (for instance, the works of George Orwell). There's already some discussion of this on Distributed Proofreaders (registration may be required)--if you're a DP'er, you might like to contribute, and if you're not a DP'er, you should be.

HTH

Re:If you really want to support... by Kphrak · 2004-02-05 11:24 · Score: 1 · on Grokster/Morpheus Hearing Recap

It's true, we have plenty of information on the Internet. But don't expect humanity to use it to educate itself in some blissful utopian dream. The Internet already has the main thing people want: PORN.

I like having so much information available on the Internet, and I'm a regular contributor to Distributed Proofreaders, which definitely fits the bill for huge collections of public domain. But I'm under no illusions that most people will not ignore my recently-proofread "Studies in Civics" or "Babylonian and Assyrian Literature" in favor of "Amateur Anal Action" and "Barely Legal Blowjobs".

Re:Let's be honest by ChaosDiscord · 2004-02-02 08:54 · Score: 1 · on DARPA-Funded Linux Security Hub Withers

Auditing is boring.

You know what else is boring? Proofreading. And yet Distributed Proofreaders manages to get about 5,000 pages of text proofread every day! The key is making it easy so that a little bit of my time can be useful. It also helps to get some popularity. I'd repeatedly heard about Distributed Proofreaders, but this is the first I've heard about Sardonix. Now that I've heard about it, it sounds interesting. Next time I decide to proof a page or two for Distributed Proofreaders, I'll take a look to see if I can help with Sardonix.

Re:Exhaustion of the public domain? by jonathan_ingram · 2004-01-27 14:04 · Score: 1 · on MusicXML DTD Hits 1.0; Browser Support Next?

They scale down their operations in the US (which is quite possibly never going to let anything else into the public domain ever again), and switch to sites operated in life+X countries, which release new material into the public domain every year. I'm currently working on scanning some 'newly freed' authors for the brand new European branch of Project Gutenberg's Distributed Proofreaders. These are books written by authors who died in 1933, and whose life+70 year copyright term has just expired.

No new works are due to be freed of copyright restrictions in the US for quite a while (2018 rings a bell -- it's around then). By then, everything written by anybody who died before 1948 will be public domain over here in the UK. In Canada and Australia, it will be everything written by anybody who died before 1968 (unless they follow the EU lead and retrospectively move from life+50 to life+70 for no good reason).

So don't worry, we're not going to run out of material :).
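The life+X arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the term runs to 31 December of the X-th year after the author's death, so the works become free on 1 January of the following year. The function names are made up for this example, and real copyright law has national exceptions that this ignores.

```python
def public_domain_year(death_year: int, term_years: int) -> int:
    """First calendar year in which the author's works are free.

    Assumption: the term ends on 31 December of death_year + term_years.
    """
    return death_year + term_years + 1


def latest_free_death_year(current_year: int, term_years: int) -> int:
    """Latest year of death whose works are already free in current_year."""
    return current_year - term_years - 1


# Authors who died in 1933 under a life+70 regime.
print(public_domain_year(1933, 70))

# Who is free by 2018 under life+70 and under life+50.
print(latest_free_death_year(2018, 70))
print(latest_free_death_year(2018, 50))
```

Under these assumptions, an author who died in 1933 is freed in 2004 under life+70, and by 2018 the cut-off is a death in 1947 (life+70) or 1967 (life+50), which is what "died before 1948" and "died before 1968" mean in the comment.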

There is no ASCII by fm6 · 2003-12-22 11:53 · Score: 3, Insightful · on Open eBook Forum Courts Controversy Over Formats

You're out of date.

Nobody actually uses ASCII any more. It's not adequate for internationalizable applications. It only contains a simple non-accented Latin alphabet, Arabic numerals, space, and 32 other characters. Oh, and 33 non-graphic control characters, only 2 of which are relatively safe to use in text files and streams. That's just not enough for any application that isn't specific to the U.S.

You say you use ASCII every day? No you don't. You probably use some variation of Latin 1 and/or UTF-8. Both have the same values as ASCII for their first 128 characters, so the difference is usually transparent. Not always.

Now you're saying, "All right, ASCII, Latin 1, whatever. What I mean is plain text. That's the universal format." No it's not. There isn't even a single Latin 1. Aside from ISO Latin 1 (which is supposed to be the default for web pages, but no widely-used browser makes that assumption), there's Microsoft Latin 1 and Macintosh Latin. Add in UTF-8 (which Slashdot supposedly uses, though most of their pages actually use ISO Latin1), and you have four different "plain text" encodings in wide use. The results when files are shared between these platforms are often pretty gross. And these are just the encodings used in the Americas and Western Europe!
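A short Python sketch makes the point concrete. It assumes that "Microsoft Latin 1" corresponds to Python's `cp1252` codec and "Macintosh Latin" to `mac_roman`; that mapping is this example's reading of the comment, not something the comment states. Pure ASCII text produces the same bytes in all four encodings, while a single accented letter does not.

```python
# Codec names are Python's; mapping them to the encodings named in the
# comment is an assumption made for this illustration.
ENCODINGS = ["latin-1", "cp1252", "mac_roman", "utf-8"]


def encode_all(text: str) -> dict:
    """Encode the same text with each codec and return the raw bytes."""
    return {name: text.encode(name) for name in ENCODINGS}


ascii_bytes = encode_all("plain text")   # ASCII-only input
accented_bytes = encode_all("caf\u00e9")  # "café", one non-ASCII letter

# Reading UTF-8 bytes as if they were Latin 1 gives the familiar garbage.
mojibake = "caf\u00e9".encode("utf-8").decode("latin-1")

for name in ENCODINGS:
    print(name, ascii_bytes[name], accented_bytes[name])
print(mojibake)
```

The ASCII-only string is byte-for-byte identical everywhere, which is why the difference usually goes unnoticed. The accented string differs between codecs, and decoding it with the wrong one produces the "pretty gross" results the comment describes.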

Even if there was a text encoding that absolutely everybody used, you wouldn't want to store all your books in it. You're throwing away too much data! That's why I gave up on Project Gutenberg and Distributed Proofreaders. When I downloaded a Gutenberg text, things like italics and boldface all appeared as ALLCAPS. VERY VERY IRRITATING! And when I helped proof DP's text scans, I wasn't given any proper way to record all the subtle typography that was in those old texts. One particular omission was the absence of any clear separation between encyclopedia articles. I found this particularly frustrating, because I joined DP to help bring the classic Britannica 11th Edition online. What's the point if you can't browse individual articles easily, or the Greek words are a mess, etc., etc.

What's the solution? Not HTML -- it's not general enough. Somebody needs to sit down and design a markup (probably an XML document type) that expresses the stuff you find in various kinds of books. I doubt that this "Open EBook" thing will do, because it will have very narrow objectives -- find a way to distribute the next Stephen King with proper DRM support. Not interesting to those of us who want to share a lot of public domain and Creative Commons stuff, and are mainly concerned with preserving the original character of the text. Maybe when I know more about writing DTDs and Schemas, I'll take a stab.

But doesn't that create files that aren't accessible to a lot of people? No, because the XML version isn't for distribution (except to those who really want it). Mostly you transform the XML into formats suitable for distribution: HTML, WML, ebook formats, and yes, "plain text".
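The "one XML master, many output formats" idea can be sketched with Python's standard library. Everything about the markup here is hypothetical: the `book`, `title`, `p` and `i` element names and the `_underscore_` convention for italics in plain text are invented for this example and are not a Project Gutenberg or Distributed Proofreaders format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical master document; the element names are made up.
SAMPLE = "<book><title>Alice</title><p>She was <i>very</i> tired.</p></book>"


def render_inline(elem, italic_open, italic_close, esc):
    """Flatten mixed content, wrapping <i> children in the given markers."""
    parts = [esc(elem.text or "")]
    for child in elem:
        inner = render_inline(child, italic_open, italic_close, esc)
        if child.tag == "i":
            inner = italic_open + inner + italic_close
        parts.append(inner)
        parts.append(esc(child.tail or ""))
    return "".join(parts)


def to_plain_text(root):
    """Plain-text output: title, blank line, one line per paragraph."""
    lines = [root.findtext("title", default=""), ""]
    for p in root.iter("p"):
        lines.append(render_inline(p, "_", "_", lambda s: s))
    return "\n".join(lines)


def to_html(root):
    """HTML output generated from the same source tree."""
    title = escape(root.findtext("title", default=""))
    body = "".join(
        "<p>" + render_inline(p, "<i>", "</i>", escape) + "</p>"
        for p in root.iter("p")
    )
    return "<h1>" + title + "</h1>" + body


root = ET.fromstring(SAMPLE)
print(to_plain_text(root))
print(to_html(root))
```

The emphasis is recorded once in the master and each output format decides how to show it, so the plain-text edition loses nothing that the XML does not still hold.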

Public domain E-Book museum needs your help! by Anonymous Coward · 2003-11-05 09:33 · Score: 1, Interesting · on E-Book Museum at Library of Congress?

Distributed Proofreaders is the main source of public domain electronic books. It is part of Project Gutenberg. DP consists of thousands of volunteers doing hundreds of books each month, and some of them are math books, for which DP is using LaTeX. Thus, the project needs savvy (La)TeX folk to correct the OCRed texts.

Thus, if you have a spare ten minutes now and then, you can make a significant contribution to public domain and mathematics. The finished e-books are free, downloadable, and computer-searchable. Sign up here!

The work is done through a web interface that lets you compare a scanned page image against OCRed text, and make any necessary changes to the text. The interface works with most browsers, from IE and Netscape to Mozilla and Opera. (I have proofread a couple pages myself, and can vouch for it being straightforward.) You can do one page whenever you have time or a hundred a day -- it's up to you. No commitments, no schedules.

If you'd like a change from mathematics, there are plenty of other books to do: from classics to pot-boilers, in English, French, German, Dutch, Finnish, Swedish, etc.

e-books are searchable by balamw · 2003-11-05 08:16 · Score: 1 · on E-Book Museum at Library of Congress?

I agree with you, but for the fact that unlike the dead tree or audio formats, the e-book has at least the potential to be full-text searchable. Which could be invaluable for the work in question.

If this flies we wouldn't need Distributed Proofreaders anymore. B

YOU CAN HELP!!! by clonebarkins · 2003-10-16 09:26 · Score: 1 · on Project Gutenberg Publishes 10,000th Free eBook

Go to Distributed Proofreaders to help out! They are a distributed effort to scan, OCR, proof, and post books to Project Gutenberg.

Congratulations! by apsmith · 2003-10-16 08:49 · Score: 1, Redundant · on Project Gutenberg Publishes 10,000th Free eBook

Now why was my story on this rejected earlier today? Oh well...

Go to Distributed Proofreaders if you'd like to help out!

Can't be said enough... by daeley · 2003-10-16 08:48 · Score: 1, Redundant · on Project Gutenberg Publishes 10,000th Free eBook

Come join the proofreaders that make Project Gutenberg possible!

Proofreading by Empiric · 2003-10-16 08:47 · Score: 4, Informative · on Project Gutenberg Publishes 10,000th Free eBook

Based on someone's post earlier, I gave Distributed Proofreaders a try. It's very straightforward to get started on a couple of pages done at your leisure (especially easy for those knowing basic HTML--like Slashdot posters--think standard bold and italic tags; the only mild ramp up is footnotes), and I found their scanned book choices interesting to be reading through in the process of proofing (well-done proofing interface as well).

If you're in the mood for browsing books, give it a try... you can find something interesting to read and do a little service for humanity at the same time.

Re:More like give it a few years and 5MP cams by jonathan_ingram · 2003-10-10 21:54 · Score: 2, Interesting · on Bubble Bursts for e-Books

Nobody is going to scan a book on a flatbed scanner. It's just not convenient.

I've done over 200 on my flatbed scanner in the last six months, for processing through Distributed Proofreaders. Once you get into the flow, a decently sized octavo book can be done in less than an hour. Holinshed's Chronicles (my current project) is obviously taking a little longer :).

The very high-end overhead document scanners are effectively fixed digital cameras with groovy software, so there's no real reason why an enthusiast couldn't jury rig a home-made digital camera document scanner. 5MP still isn't enough, though, for anything serious. To scan an A4 page at 400DPI requires around 15MP, and you'd need even more to get a decent DPI on folio volumes like Dugdale's Antiquities of Warwickshire.
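The 15MP figure is easy to check with a little arithmetic. Here is a rough sketch, assuming the standard ISO A4 dimensions of 210 x 297 mm; the helper name `megapixels_needed` is made up for illustration and is not from any scanner software.

```python
MM_PER_INCH = 25.4


def megapixels_needed(width_mm: float, height_mm: float, dpi: int) -> float:
    """Megapixels required to capture a page of the given size at the given DPI."""
    width_px = width_mm / MM_PER_INCH * dpi
    height_px = height_mm / MM_PER_INCH * dpi
    return width_px * height_px / 1_000_000


# Assumption: A4 is 210 mm x 297 mm (ISO 216).
a4_at_400 = megapixels_needed(210, 297, 400)
print(f"A4 at 400 DPI: {a4_at_400:.1f} MP")  # prints "A4 at 400 DPI: 15.5 MP"
```

By the same arithmetic, a 5MP sensor covers an A4 page at only about 227 DPI, and that is assuming the sensor's aspect ratio matches the page exactly, which it normally won't.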

Re:I think theres better distributed computing cau by De+Lemming · 2003-10-09 23:52 · Score: 2, Informative · on New Seti@Home Client to be Open to Other Projects

Indeed. And there are dozens of distributed computing projects, so everyone can find one to his liking.

Click here for an overview of active distributed computing projects. Also have a look at the lists at the bottom of the page: these are projects you donate some of your own time to, instead of spare CPU cycles (from Distributed Proofreaders to The Hunger Site).

Further info on distributed computing: Bottomquark has reviewed a number of projects.

Distributed Proofreaders by clonebarkins · 2003-09-18 04:23 · Score: 2, Funny · on What Do You Do at Work?

I do my one page-a-day (or more ;O)) at Distributed Proofreaders.

Oh, wait, did you mean what I'm supposed to do at work?

PNG Issues by fm6 · 2003-07-20 09:46 · Score: 1 · on dSVG - A New Kind of Programming?

A certain popular browser doesn't handle a lot of transparent PNGs correctly. (Can you guess which one?) Erik Arvidsson has a clever workaround, but I have to question whether kludging one's web pages that way is a good idea.

Another issue I have with PNG is that some software seems to generate illegal images. I volunteer at Distributed Proofreaders and every once in a while Mozilla chokes on a page image that's an illegal PNG file. The really irritating thing is that Mozilla is actually able to read the image itself, but if it's allowed to download the very end of the file it says, "This is an illegal PNG! I can't let you look at it anymore!" I have to download the page image and read it with a less intolerant image browser. A pain.

As if that weren't enough, IIS doesn't ship with a metadata entry for PNG. Which, predictably enough, doesn't matter to IE, but screws up more compliant browsers.

Of course, none of the problems are the fault of the PNG people, who have created some very sexy technology. But until PNG images can be displayed reliably, it makes no sense to insist that everybody should migrate away from GIFs.

There's also the little detail that the notorious LZW patent has expired. That removes one of the big motivations for creating the PNG standard in the first place. Yeah, GIFs don't have as many cool features, but they're still adequate for most people's needs.

Re:You can't be serious by Koushiro · 2003-07-04 16:51 · Score: 1 · on Project Gutenberg's 32nd Birthday

The good thing about Distributed Proofreaders is that there's actually a system in which each scanned and OCRed page of text is proofread twice: once by anyone, and once by anyone who has more than fifty pages proofread. (For that matter, it's checked again by a post-proofreader when the separate pages are combined into one file, for extra safety.)

Admittedly, a moderated wiki-format would be difficult to beat for reliability, but with three separate checks, (theoretically) increasing in accuracy each time, there's not much that will be missed. Really, the accuracy of the triple-check system in addition to the speed (2500-3000 pages a day) of the proofreading at DP would make it doubtful that any increase in reliability would be justified by the change in system.

(Oh, and check the relevant section of the FAQ for a clear, step-by-step version of how DP works.)

Re:A sterling mistake by fm6 · 2003-07-04 12:39 · Score: 1 · on Project Gutenberg's 32nd Birthday

I hadn't noticed that. But that convention isn't followed consistently. Of the last 10 files posted from DP, only 7 follow this convention. And I haven't seen it documented anywhere.

I shouldn't have spoken categorically about the Gutenberg people. Somebody is aware of this issue, because recent posts from DP say "Character set encoding: ISO-Latin-1", which I guess is some help. My assumption of ignorance was based on the DP Proofing Guidelines, which refers to 8-bit characters as "Upper ASCII". But I guess all that means is that people tend to confuse ASCII with Latin1 -- a confusion that doesn't matter except when it does.
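To make the ASCII/Latin-1 distinction concrete, here is a small sketch using only Python's built-in codecs. The sample word and the helper `is_pure_ascii` are illustrative assumptions, not anything taken from the PG or DP tools.

```python
text = "café"  # sample string with one character outside 7-bit ASCII

# In Latin-1 (ISO-8859-1) the é is the single byte 0xE9.
latin1_bytes = text.encode("latin-1")
print(latin1_bytes)  # b'caf\xe9'

# ASCII stops at 0x7F, so the same string cannot be encoded as ASCII at all.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")


def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (hypothetical helper)."""
    return all(b < 0x80 for b in data)


print(is_pure_ascii(latin1_bytes))  # False: there is no such thing as "upper ASCII"
```

A file that passes a check like `is_pure_ascii` reads the same under either label, which is why the confusion usually goes unnoticed until an accented character shows up.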

Eh? The text is in ASCII but requires JavaScript? by WuphonsReach · 2003-07-04 08:23 · Score: 1 · on Project Gutenberg's 32nd Birthday

So I try to go to http://www.pgdp.net/ - only to find out that the page won't load unless you enable JavaScript!

Um... I thought PG was all about not using the latest bells and whistles? (semi-facetious)

Re:XML please by Anonymous Coward · 2003-07-04 08:03 · Score: 0 · on Project Gutenberg's 32nd Birthday

In the works, and has been for a while. I have just released my vision paper as to where Distributed Proofreaders (DP) is headed and where we would like to take Gutenberg in the future.

Conversion on the fly to various formats is a major goal, but first we need a good source of high-quality marked up etexts. To create this source we are going to be doing some re-working of the processes at DP.

You can read my paper here (http://www.pgdp.net/vision/)

And comment on it in the DP forums (http://www.pgdp.net/phpBB2/viewforum.php?f=4) (yes, you must make an account to post)

Charles Franks
Founder, Distributed Proofreaders

Thanks for support, plans for future by gbnewby · 2003-07-04 07:58 · Score: 5, Informative · on Project Gutenberg's 32nd Birthday

Thanks to everyone who has helped contribute eBooks and other support to Project Gutenberg! If you haven't already, please visit Distributed Proofreaders and proof a page today!

Lots of plans for the future:

Post-#10000 formatting changes. We'll be rearranging our directories to make it easier to find things. Likely we'll go with something OAI (OpenArchives.org) compliant.
Conversion on the fly to many formats. We'll be putting eBooks into XML format (mostly using teixlite.dtd, we think) for conversion on the fly to many other formats.
New ways to donate. "Sponsor a book".
More contemporary content. We receive donations nearly every week from currently published authors who want to make their stuff available to a wider audience (e.g., Doctorow's Down and Out).
Your ideas! Visit gutenberg.net to sign up for newsletters, find out how to get started producing an eBook, and find eBooks.

Thanks especially to our main and backup distribution sites, iBiblio and The Internet Archive. And thanks to the THOUSANDS of volunteers who have brought us nearly to our 10,000th eBook.

Dr. Gregory B. Newby
Chief Executive and Director
Project Gutenberg Literary Archive Foundation
http://gutenberg.net
A 501(c)(3) not-for-profit organization with EIN 64-6221541
gbnewby@pglaf.org

Re:You can't be serious by Aldarondo · 2003-07-04 06:35 · Score: 5, Interesting · on Project Gutenberg's 32nd Birthday

As someone who has been involved with Distributed Proofreaders for the past 18 months: yes, we are serious about having Slashdot people proofread. The last time a story about D.P. ran, in November, thousands of new users joined us and helped us grow and expand to our current size.

Go and check it out, there is great work being done there. (I am a bit biased though). Click here for a history of DP.

Re:eBooks by jonathan_ingram · 2003-06-18 19:28 · Score: 5, Interesting · on Gemstar Ebook Crashes, Burns

And might I also mention that if you want to get involved in helping PG, we have a wonderful Distributed Proofreading project. It's now the main route through which books go to get onto PG, and we're almost up to 1500 books processed. Anyone can join -- we need all the proofreaders we can get!

Slashdot Mirror

Domain: pgdp.net

Comments · 147