Why Project Gutenberg Isn't There Yet

Speech recognition? by timeOday · 2003-01-28 13:22 · Score: 4, Interesting

That's crazy. OCR will always be faster than speech, even if speech recognition ever works, which it currently does not.

Re:Speech recognition? by Anonymous Coward · 2003-01-28 13:31 · Score: 1, Insightful

When thinking of free labour, volunteers are most likely able to type faster than they can speak. I'm wondering if it wouldn't be faster to scan it.
Re:Speech recognition? by aaza · 2003-01-28 15:24 · Score: 1

...even if speech recognition ever works, which it currently does not.
Watt do ewe mean? It works really well hear. I don't even need to cheque the spelling. The quay is to speak slowly and clearly so the computer can under stand ewe.

--
In theory there is no difference between theory and practice.
In practice, however, there is.
Re:Speech recognition? by Anonymous Coward · 2003-01-28 15:50 · Score: 0

For the lettace, what bar beach mean? Frank queue, small bean three.
Re:Speech recognition? by ckaminski · 2003-01-29 03:45 · Score: 1

Few people can type 120 words per minute. Many can't even type 60. Most can speak at least that if not more. I can certainly read aloud at least twice as fast as I can type. But I'm sure that speech recognizer won't be able to keep up. ;-)

-Chris
Re:Speech recognition? by lostboy2 · 2003-01-29 04:03 · Score: 1

More important than speed, I think, is the fact that speech recognition could not handle colloquialisms (like /. or CowboyNeal) well, and would barf on things like ee cummings' poetry. Such as:
Beautiful is the unmea ning of(sil ently)fal ling(e v er yw here)s Now
Even 0wnz0red could pose a problem.

Tex? by duckpoopy · 2003-01-28 13:22 · Score: 0, Insightful

Aren't many books typeset using LaTeX? Just post the source.

--
word.

Re:Tex? by ProgressiveCynic · 2003-01-28 13:25 · Score: 5, Insightful

Umm, Project Gutenberg can only legally use public domain works. If you know of any 100+ year old novels typeset in TeX, let's hear about it. Even if a modern reprint was done recently, do you think the publisher would really want to give away all that hard work so that everyone can get it for free instead of buying their spiffy new edition?

--
Delivering militantly anti-commercial music to all two people who care!
Re:Tex? by jonman_d · 2003-01-28 13:26 · Score: 4, Informative

That's not how Project Gutenberg works. Most everything that's on PG is public domain, which means the copyright has expired. Thus, most of the stuff is over 70 years old. They didn't exactly use LaTeX back in the 1930s.

Besides, what I generally use PG for is the classics: Greek/Roman literature, etc. I don't think Plato used UNIX.

It's all got to be somehow entered from dead-tree-format copy. Currently, that pretty much means typing up the entire book.

--
http://nemilar.net - Not your grandmother's soup kitchen
Re:Tex? by Anonymous Coward · 2003-01-28 13:27 · Score: 0

Heck, only computer science texts are written using LaTeX. Want an old copy of "Art of Computer Programming" so you can learn MIX? Same goes for [ntg]roff.
Re:Tex? by AndrewRUK · 2003-01-28 13:33 · Score: 2, Informative

That requires LaTeX source to post, which means a modern edition, which means it's copyrighted, which means you can't copy it (unless you have the publisher's permission*).
Project Gutenberg only does texts which are in the public domain, which currently means pre-1923 editions (PG of Australia has newer books), and, obviously, pre-1923 means that the only sources are print copies.

* pedantic point: it's the copyright holder's permission, which isn't necessarily the publisher, but usually is.
Re:Tex? by MulluskO · 2003-01-28 13:40 · Score: 3, Funny

Well, getting the source from the publishers certainly seems like a more feasible solution than copying the text out of books.

Now that I think about it, I imagine publishers face this very same problem when publishing a new edition of an old work, and I'd wager they have developed a few tricks to make the process easier.

If all else fails, I suggest we direct some time, effort, and old PCs into monasteries. Monks dig that tedious shit, hard style. Only one question remains, though... VI or Emacs?

--

Too busy staying alive... ~ R.A.
Re:Tex? by Matrix · 2003-01-28 13:45 · Score: 5, Informative

While this comment has been addressed, I'd like to point out that you can get pretty decent output from the Gutenberg texts by importing them into LyX. With just a little bit of work (basically setting up the chapters), LyX will allow you to create good looking PDF, Postscript, HTML, etc, along with the LaTeX source. Combine this with rbmake and you can even read them, complete with hyperlinks, on your eBook (if you have one!)
Re:Tex? by Angry+White+Guy · 2003-01-28 13:48 · Score: 0, Offtopic

If it were a toss up between MS and Unix, Plato would have used Unix.

I'm all about the ancient Greek Philosophers. Fo' Sho!

--
You think that I'm crazy, you should see this guy!
Re:Tex? by Anonymous Coward · 2003-01-28 14:54 · Score: 0

I don't think Plato used UNIX.

"Necessity, who is the mother of invention."
-- Plato(427?-347? B. C.) [The Republic. Book II. 369-C]
Re:Tex? by kwoo · 2003-01-28 15:27 · Score: 1

I don't think Plato used UNIX.

Nope, he strikes me more as a VMS kind of guy.

--
unixkb.com -- articles on practical Unix issues.
Re:Tex? by nels_tomlinson · 2003-01-28 15:58 · Score: 2, Informative

I've been marking up the Project Gutenberg etexts using LaTeX for several years now. I can typeset an Oz book, or one of the Tom Swift books, in about 15 minutes. I have put about a week into typesetting ``The Voyage of the Beagle'', with no end in sight. I was able to typeset a translation of the Bible in about one week, but it was sloppy work, and I wasn't satisfied.
LyX is nice, but I don't think that it really speeds things up. I can't imagine that LyX could speed things up at all on a Tom Swift.
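
For a sense of what the mechanical first pass of such markup can look like, here is a minimal sketch, assuming paragraphs are separated by blank lines and that a chapter heading is a one-line paragraph starting with "CHAPTER". That heading rule, the function names, and the choice of the `book` class with unnumbered `\chapter*` are all assumptions for illustration, not Project Gutenberg conventions; the slow part (the Beagle, the Bible) is everything this does not handle.

```python
import re

# Hypothetical heading rule: a one-line paragraph starting with "CHAPTER".
CHAPTER_RE = re.compile(r"^CHAPTER\b", re.IGNORECASE)

# Characters that must be escaped in LaTeX body text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def escape_latex(text: str) -> str:
    """Escape LaTeX special characters one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

def gutenberg_to_latex(plain: str) -> str:
    """Turn blank-line-separated plain text into a minimal LaTeX book."""
    # Split on runs of blank lines and drop empty chunks.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", plain) if p.strip()]
    body = []
    for para in paragraphs:
        # Join hard-wrapped lines into one line of running text.
        text = " ".join(line.strip() for line in para.splitlines())
        if "\n" not in para and CHAPTER_RE.match(text):
            # Unnumbered, so LaTeX does not add its own "Chapter 1".
            body.append(r"\chapter*{" + escape_latex(text) + "}")
        else:
            body.append(escape_latex(text))
    return "\n".join([
        r"\documentclass{book}",
        r"\begin{document}",
        "\n\n".join(body),
        r"\end{document}",
    ])
```

Quotes, italics, footnotes and front matter still need a human, which is presumably where the week goes.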

--
See what I've been reading.
Re:Tex? by dasunt · 2003-01-28 16:06 · Score: 1

Actually, they look fine to me in vim. :)

However, depending on your tastes, perhaps the utilities 'fold' and 'sed' could help. Each paragraph seems to have no first-line indent, and is separated from the other paragraphs by 1 blank line (at least in the file I grabbed at random from my downloaded 'library'). Chapter titles are buffered by several blank lines.

The beauty of text is that, if you know what you are doing and have a unix environment, you can manipulate it however you want and most of the transforms are rather trivial to do.

Unless images, obscure characters, or layout are important, nothing beats text.
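
The same kind of transform, sketched in Python rather than fold and sed, assuming the layout described above: paragraphs separated by one blank line, chapter titles with several blank lines on both sides. The threshold of three blank lines, the `=` underline, and the function name are assumptions for illustration. Wrapping uses the standard `textwrap` module, which behaves roughly like `fold -s` except that hard-wrapped lines are joined first.

```python
import re
import textwrap

def reflow(text: str, width: int = 60, chapter_gap: int = 3) -> str:
    """Re-wrap blank-line-separated paragraphs to `width` columns and
    underline blocks sitting between two runs of `chapter_gap` or more
    blank lines (assumed here to be chapter titles)."""
    # The capturing group keeps each separator (a run of blank lines), so
    # parts alternates: block, separator, block, separator, ..., block.
    parts = re.split(r"(\n(?:[ \t]*\n)+)", text.strip("\n"))
    blocks = parts[0::2]
    # "\n\n" is one blank line, "\n\n\n\n" is three.
    gaps = [sep.count("\n") - 1 for sep in parts[1::2]]
    # The start of the file counts as a big gap; the end does not, so the
    # last block is never mistaken for a title. (A one-paragraph chapter in
    # the middle of a file would still be misread; fine for a sketch.)
    gaps = [chapter_gap] + gaps + [0]
    out = []
    for i, block in enumerate(blocks):
        words = " ".join(block.split())  # join hard-wrapped lines
        if gaps[i] >= chapter_gap and gaps[i + 1] >= chapter_gap:
            out.append(words + "\n" + "=" * len(words))
        else:
            out.append(textwrap.fill(words, width=width))
    return "\n\n".join(out)
```

Calling `reflow(open(path).read(), width=72)` on a downloaded etext (path hypothetical) prints it at a fixed width with the titles marked.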
Re:Tex? by cel4145 · 2003-01-28 16:42 · Score: 0, Offtopic

nah. plato would have been a mac man.

it's the sophists who would have been the open source users!
Re:Tex? by Anonymous Coward · 2003-01-28 23:24 · Score: 0

Yes, it is literal copy and paste (the sticky type). Have you not noticed the bad quality of old reprints?
Re:Tex? by Anonymous Coward · 2003-01-29 04:54 · Score: 1, Funny

okay, I can see Gorgias and the sophists using Perl, but no way Plato would use a Mac. Mac is for artist types like Sophocles and Aeschylus. Plato is like totally *nix because quintessentially he's a Pythagorean, which rules out Microsoft because he has to have access to source and good hex editors, and the Socratic influence would lean towards adoption of Open Source.

gutenberg rocks by Anonymous Coward · 2003-01-28 13:23 · Score: 0

I really like project gutenberg. I have many of their texts....
I'd really like to see them succeed.

Cost of labor? by Anonymous Coward · 2003-01-28 13:24 · Score: 3, Insightful

What about the cost of the books? Unless the only books you have in this "universal library" are old enough to be without copyrights, won't there be a problem in finding funding to buy current day books?

Re:Cost of labor? by jonman_d · 2003-01-28 13:29 · Score: 5, Informative

That's pretty much it - most of the books are in the public domain. AFAIK, the rest are all donated by their authors.

From their FAQ:

What books will I find in Project Gutenberg?

We cannot publish any texts still in copyright. This generally means that our texts are taken from books published pre-1923. (It's more complicated than that, as our Copyright Page explains, but 1923 is a good first rule-of-thumb for the U.S.A.)

So you won't find the latest bestsellers or modern computer books here. You will find the classic books from the start of this century and previous centuries, from authors like Shakespeare, Poe, Dante, as well as well-loved favorites like the Sherlock Holmes stories by Sir Arthur Conan Doyle, the Tarzan and Mars books of Edgar Rice Burroughs, Alice's adventures in Wonderland as told by Lewis Carroll, and thousands of others.

These books are chosen by our volunteers. Simply, a volunteer decides that a certain book should be in the archives, obtains the book and does the work necessary to turn it into an e-text. If you're interested in volunteering, click here.

--
http://nemilar.net - Not your grandmother's soup kitchen
Re:Cost of labor? by Acidic_Diarrhea · 2003-01-28 13:33 · Score: 2, Insightful

That's exactly it. Check out their website. All the works they currently have and all the ones they want to get are public domain. So it's a big project but one that we can eventually finish since the age of intellectual property that never expires is upon us. Today's books won't ever be in the public domain if the current trend continues.

--
I hate liberals. If you are a liberal, do not reply.
Re:Cost of labor? by GammaTau · 2003-01-28 14:18 · Score: 5, Informative

Additionally translations might generate practical limitations. If a text was written in ancient Greece and translated to English or some other language in the 20th century, the translation might not be public domain even when the original work is. Of course you are free to read the original text or make a new translation. Anyway even if a piece of literature was public domain, the translation to your native language might not be.
Re:Cost of labor? by IronChef · 2003-01-28 14:35 · Score: 0

Yeah, how many free books can there be? 5 or 6?

Of course, we'll see this number explode in the centuries to come.
Re:Cost of labor? by Anonymous Coward · 2003-01-28 14:40 · Score: 0

youre welcome for the free karma :)

Oh yeah!
Re:Cost of labor? by Nekoi · 2003-01-28 15:33 · Score: 2, Insightful

Well... the project is a worthy cause. I don't see copyrights being the biggest problem; most books that are worth reading are older classics anyway. Plus, the project is probably aimed at providing people who have no access to a library with materials that are available in one. I mean, if you live in the middle of nowhere, how are you supposed to get new books, let alone a library? On the other hand, an e-library can provide that person with the same material, as long as he/she has an internet connection. As to cost of labor, there are always people who are willing to put in a little time for a worthy cause. You just need to advertise well.
Re:Cost of labor? by kalidasa · 2003-01-29 00:37 · Score: 1, Interesting

Additionally translations might generate practical limitations. If a text was written in ancient Greece and translated to English or some other language in the 20th century, the translation might not be public domain even when the original work is. Of course you are free to read the original text or make a new translation. Anyway even if a piece of literature was public domain, the translation to your native language might not be.

Exactly. What's worse, modern texts of an ancient work are not usually considered to be in the public domain, because the work to try to clean up the errors that inevitably creep into the manuscript tradition is commonly accepted to be a copyrightable contribution. However, I don't think this is something that has ever been tested in a court (IANAL).

And many texts from before 1923 aren't very good by modern standards (too many errors).

So the solution is to try to get those who have the necessary philological skills to make translations to agree to donate their services, something that has proven an uphill struggle so far, as some translations have scored big time as bestsellers (Ciardi's Dante, Fitzgerald's Homer, Pevear's translations from the Russian, Mitchell's translations, mediocre though they are, of Biblical/spiritual "classics") and a lot of translators secretly nourish the hope of being the next Arthur Waley.

And scholarly texts take years to produce. Again, the editors tend to nourish hopes they might supplement their income (slightly, here; we're not talking about Stephen King, or David Pogue, or even Simson Garfinkel type numbers; I doubt that editors of ancient texts even make 1/100 from their books what David Pogue makes) from the royalties from the 2,000 copies sold to libraries, or the 10,000+ copies sold to students.

Librarians? by Metallic+Matty · 2003-01-28 13:24 · Score: 4, Interesting

I'm not too informed about this topic; feel free to correct me.

If the goal is a universal library, and there is a need for a work force, wouldn't it make sense to start a program at the library level that uses librarians as a volunteer work force, perhaps as a side project they might be interested in helping along? I think of it as SETI in the library world... *shrug*

Re:Librarians? by qortra · 2003-01-28 13:36 · Score: 2, Interesting

That's a pretty good idea. If each public American library (and perhaps those of other nations for other languages) were to commit about 5 books to be typed by its volunteers and staff each year (a reasonable amount), the project could really take off. Estimate 5000 participants (conservative); 25000 books a year.
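
The estimate is plain multiplication; a small sketch that makes its inputs explicit, using the numbers above (5000 libraries, 5 books each per year). The function names are hypothetical, and the backlog used in the usage line is an invented input, not a real count of public-domain books.

```python
def books_per_year(libraries: int, books_each: int) -> int:
    """Books typed per year if every participating library commits
    the same number of books."""
    return libraries * books_each

def years_to_finish(backlog: int, libraries: int, books_each: int) -> float:
    """Years needed to clear a (hypothetical) backlog at that rate."""
    return backlog / books_per_year(libraries, books_each)

print(books_per_year(5000, 5))  # 25000
```

With an invented backlog of 100,000 titles, `years_to_finish(100_000, 5000, 5)` gives 4.0 years, assuming every library keeps it up.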
Re:Librarians? by Anonymous Coward · 2003-01-28 14:56 · Score: 1, Insightful

Yeah, right. They don't have anything to do. Let's have people with advanced degrees and public service management jobs doing typing for free.

Hey, this involves computers. Every person who has a computer should be expected to *volunteer* to each type, word for word, an entire book every year as a side project, because, I mean, hey, you ain't got nuthin' better to do with your life, right?

Actually a lot of librarians do contribute to the many, many digital library projects *you* take advantage of, without thinking of the amount of work invested, and without appreciation for their efforts.

Before volunteering the efforts of others, volunteer your own.
Re:Librarians? by Metallic+Matty · 2003-01-28 15:04 · Score: 1

Hey, this involves computers. Every person who has a computer should be expected to *volunteer* to each type, word for word, an entire book every year as a side project, because, I mean, hey, you ain't got nuthin' better to do with your life, right?

The key word there is volunteer: I just thought it might be a project that librarians might be interested in contributing to, no one is forcing them, nor expecting them to do so.

Actually a lot of librarians do contribute to the many, many digital library projects *you* take advantage of, without thinking of the amount of work invested, and without appreciation for their efforts.

I _do_ appreciate those efforts, and I commend them for volunteering their precious time.

Before volunteering the efforts of others, volunteer your own.

Lastly, I wasn't volunteering them, merely suggesting it as an idea. And I would gladly volunteer to type up a book for this project.
Re:Librarians? by zsmooth · 2003-01-28 15:21 · Score: 1

Well, here you go. Knock yourself out.
Re:Librarians? by olethrosdc · 2003-01-29 03:10 · Score: 1

Well, had it *been* a universal library, it would have had indexed sections on the books, relating to language, country, year of publication and 'genre'.

--
I miss my rubber keyboard.

The REAL Problem by echucker · 2003-01-28 13:26 · Score: 2, Insightful

So many of the things that people want to read are copyrighted, and won't be available until long after we're dead.

Re:The REAL Problem by Lenbok · 2003-01-28 13:34 · Score: 2, Insightful

If they become available at all, given the current copyright extension precedents.
Re:The REAL Problem by IvyMike · 2003-01-28 13:38 · Score: 1, Informative

No, the real REAL problem is that because of Disney, copyright lengths keep getting extended and extended. At the current rate, Mickey Mouse will never be public domain. This is actually unconstitutional, since Congress is enabled to grant exclusive rights for "limited times" only. But it's the way things are.
Re:The REAL Problem by Anonymous Coward · 2003-01-28 13:43 · Score: 0

Of course you can read. Just buy the book and support the author.
Re:The REAL Problem by The+Analog+Kid · 2003-01-28 14:01 · Score: 1

and we wouldn't have these problems if it weren't for conservative supreme court judges.
Re:The REAL Problem by Anonymous Coward · 2003-01-28 14:04 · Score: 5, Insightful

Mickey Mouse will never be public domain because MICKEY MOUSE IS A TRADEMARK/LOGO. That would be like forcing IBM to give up their IBM logo/colors/design.

However, *Copyrighted* works should eventually go into the public domain. The point is that after you are dead, anything, be it a movie, song, cartoon, book, poem, whatever, serves a greater good to mankind than it could to its dead creator. I think that a decade or two is too short of a limit for copyright. If I write a book when I'm 20 years old, I should still be allowed to make money off the sale of that book when I'm 40. But when I'm in the grave, it serves me no use.

Now, it could be said that a person who works hard to create pieces of work like movies or books or songs should be allowed to bestow the revenue from use of that material after the original author is dead. If I write a book that still sells well 20 years after my death, my son and daughter should be allowed to benefit from this copyrighted item in my 'estate'.

But I think that indefinite extensions are ridiculous. I would say that 100 years is bordering on ridiculous. I think that 75 years is reasonable. If I create something when I'm 25, the copyright will outlive me by as much as 25 years.

In fact, I would propose that copyright should be extended to the life of the creator plus 20 years **OR** 50 years, whichever is less (so if you die two years after creating the work, the copyright is still in effect for another 20 years).
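
The proposal is the minimum of two terms. Here is a sketch of it, measuring everything in whole years from the work's creation. That simplification and the function names are assumptions; the rule itself is the one stated above.

```python
def proposed_term(years_lived_after_creation: int) -> int:
    """Copyright length in years from creation under the proposed rule:
    life of the creator plus 20 years, or 50 years, whichever is less."""
    return min(years_lived_after_creation + 20, 50)

def years_after_death(years_lived_after_creation: int) -> int:
    """How long the copyright outlives the creator (0 if it expires first)."""
    term = proposed_term(years_lived_after_creation)
    return max(term - years_lived_after_creation, 0)
```

`years_after_death(2)` gives the 20 years from the example. Because it takes the lesser term, a creator who lives more than 50 years after making the work outlives the copyright (`years_after_death(60)` is 0).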
Re:The REAL Problem by Thomas+M+Hughes · 2003-01-28 15:14 · Score: 3, Informative

This is actually unconstitutional, since Congress is enabled to grant exclusive rights for "limited times" only.

As much as I wish you were right, you're actually wrong on this. The Supreme Court ruled on the case and found that what Congress did was constitutional, and since the Constitution grants the Supreme Court the right to interpret the Constitution, it is constitutional to do so. Unless the Supreme Court changes its ruling at a future date, or Congress amends the Constitution to make it unconstitutional, this issue remains constitutional, as unfortunate as it is.
Re:The REAL Problem by rodgerd · 2003-01-28 15:46 · Score: 1

That's only a problem in the US currently. Perhaps Gutenberg will need to have a Gutenberg-Berne (for countries with the Berne Convention minimum of life plus 50 years) and a Gutenberg-US (for the States).
Re:The REAL Problem by L.+J.+Beauregard · 2003-01-28 16:30 · Score: 3, Insightful

There is some good in letting a copyright extend beyond the author's death. An author may die with children still not yet grown, and his royalties can provide for them. Life plus 20 or maybe 25 should be enough for this.
Some posthumous works may come out under a life-plus-X term that might have been cast aside under a life-plus-zero term. Life plus 50 is probably more than enough.
Life plus 70 is absurd and our so-called elected officials should be ashamed of going along with it. And may Sonny Bono *not* rest in peace.

--
Ooh, moderator points! Five more idjits go to Minus One Hell!
Delendae sunt RIAA, MPAA et Windoze
Re:The REAL Problem by Anonymous Coward · 2003-01-28 17:12 · Score: 0

If I write a book when I'm 20 years old, I should still be allowed to make money off the sale of that book when I'm 40.
How about when you're 40 you have to pick up your damn pencil and write another fucking book?
Re:The REAL Problem by Anonymous Coward · 2003-01-28 22:16 · Score: 0

No, the REAL problem is that YOU do not have the talent to come up with something creative and worth publishing, and that can capture the world's heart and mind and captivate audiences of the future.

Don't worry, neither do I.

But I can respect artists who can. I respect their right to control how their works are used. If copyrights exist forever, so be it.

What works have you created that are so great that "humanity" deserves them after you die?

I thought so.

And if you ARE creative, then there's nothing to stop you from creating something even more popular than Mickey Mouse. And YOU will have the freedom to choose whether your creation can be released in the public domain at the time of YOUR choosing.

POWER TO THE ARTIST!
POWER TO THE PROGRAMMER!
POWER TO THE AUTHOR!
Re:The REAL Problem by vidarlo · 2003-01-29 00:40 · Score: 1

In Norway, you have copyright as long as you live, even if you (hypothetically) live to 200. Then the right is transferred to the person named in the will, and it endures for 70 years after the death of the creator. This means that if I write a novel when I am 20, I am assured that I and my relatives get the income for about the next 150 years... There are some problems with this, but in general it works fine.

--
Assembling etherkillers for fun and profit
Re:The REAL Problem by overunderunderdone · 2003-01-29 03:05 · Score: 1

As much as I wish you were right, you're actually wrong on this. The Supreme Court ruled on the case, and found that what the Congress did was constitutional, and since the constitution grants the Supreme Court the right to interpret the Constitution, it is constitutional to do so.

A couple of points. The supreme court is not infallible, so while as a legal matter their opinion decides a law's constitutionality, I think it is perfectly fair to point out constitutional flaws in a law even if the court has ruled that it IS constitutional.

Also the constitution does NOT explicitly grant the Supreme Court the right to interpret the constitution; they granted themselves that right in a very early case. ALL of our elected federal officials and judges swear the same oath to uphold the constitution. It is ALL of their responsibility to do so. The idea that the supreme court alone decides constitutionality has led to some real irresponsibility, particularly in congress, where many lawmakers are willing to KNOWINGLY pass laws that are unconstitutional.

The most glaring recent example was the campaign finance reform law. Most of its staunchest supporters don't think it will pass muster with the supreme court, which is fine if they really believed it should, but one got the impression they didn't really consider the question of constitutionality at all. The really irresponsible ones were those that truly believed it was unconstitutional but bowed to popular pressure and voted for it anyway, thinking the Supreme Court would clean up their mess. That borders on treachery; it is certainly breaking their oath of office.

Our government is meant to be balanced: congress should be interpreting and following the constitution when writing the law, the president should do so when signing or vetoing the law, and the supreme court should do so when deciding cases. And congress should do so if they ever decide to check the court's power through regulation and exemptions (check out article 3, section 2, the last line in the second paragraph, for a really fun constitutional crisis waiting to happen if the court ever REALLY pisses off congress).

Back to the topic at hand. I think, however, that in this case the court was unfortunately right. The constitution puts it in congress's hands to decide the length of copyrights as long as that time is limited. Now congress keeps increasing the limits, but the court can't rule that a limited time is "unlimited" simply because it's really long, or because they have written laws that keep making it longer. This is a case where I'm sure the court wanted to stamp the law "stupid, but constitutional". It's not the court's job to make sure the laws are good, or wise, only that they are constitutional. This law is constitutional on its face; it's also bad, stupid and unwise. But it is a GOOD THING that the court refrained from striking it down for those reasons. If they had, they would be putting themselves in the position of unelected, unaccountable, anti-democratic (if benign, for now) rulers. It is congress' job to write our laws and OUR job to elect people that will do so wisely. The court cannot save us from ourselves, and if they try they will only undermine democracy and balanced government.
Re:The REAL Problem by iuyterw · 2003-01-29 03:38 · Score: 1

And that leads back to the people
Nice try.
Disney doesn't give them (Congress) much wiggle room since it (Disney) assigns Congress the task to regulate copyrights to their (Disney's) heart's desire.
Seriously though, do you really think "the people" had anything to do with copyright being extended?
Representative democracy in the US hasn't been very "representative" of most people since before WWII.
Re:The REAL Problem by Squirrel+Killer · 2003-01-29 09:01 · Score: 1

If I write a book when I'm 20 years old, I should still be allowed to make money off the sale of that book when I'm 40.
How about when you're 40 you have to pick up your damn pencil and write another fucking book?
Interestingly enough, that is the fundamental argument surrounding the "(t)o promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries" clause. It's not unlike the "You can never water it too much" quandary. Does that mean you can pour gallons upon gallons of water, or should you be careful to pour only a little bit?
How do you promote the most progress? By giving a lifetime copyright protection, or by ending copyright during the author's lifetime? Would Salinger have written more had he not been able to live fat off of Catcher in the Rye royalties? It's a question of legitimate debate, but since the public domain doesn't have as much legislative, judicial, and monetary support as copyright holders, policy has veered strongly in their favor. The idea that copyright should only last a relatively short period of time is dismissed out of hand in mainstream circles.
-sk
Re:The REAL Problem by Anonymous Coward · 2003-01-29 09:04 · Score: 0

"An author may die with children still not yet grown, and his royalties can provide for them."

A reason for extending copyrights is for life insurance reasons....wonderful argument.

The children did not create the content. Copyrights are for the creator, not the wife, not the significant other, not his best friend, not his concubine or her love slave, not the family pet, for the creator. Only.

If you want to provide for your children after you die, there is an entire freakin industry (that's messed up enough already) just for that. I know of very few industries where you can work on something, pass away, and keep on earning for that previous work. It's not supposed to happen, so why should copyrights be granted this additional right?

Copyrights are not part of an estate. If you accept that they are or might be, you've bought into the whole intellectual "property" lie. It's a creative work. It is not property. Copyrights should remain as government-granted rights to create and sell with the end result of providing *societal* benefit. That is why there are limits. Financial incentive for limited times protects that societal benefit. After that, that's it. There is supposed to be a balance between creator and the masses.

Today, there is not. If anything, the Gutenberg Project easily demonstrates why--it's 2003, and, at least in the US, we have to go back as far as 1923 or so before we get works that have passively fallen into the public domain.

Then again, to use your example, I suppose people would want the 105 year old and his 73 year old son to be earning profits from an 80 year old work still. Ain't that just grand....
Re:The REAL Problem by Anonymous Coward · 2003-01-29 10:21 · Score: 0

So what do you define as a "limited" time? Life plus 70 years for an individual, or 95 years from first publication for a company? That's what it is now. For 99.999% of the people alive today, it might as well be forever.
Re:The REAL Problem by Anonymous Coward · 2003-01-29 10:25 · Score: 0

So, for the guy who wrote his one and only novel when he was 20 lives to 90, then according to you his great-great-great grandchildren should be able to be sustained because they aren't grown yet.

How about: 20 years from production + an optional 20 years renewable only by the original author? You know, kinda like it used to be? That would be fair. If the guy penned a great work, published and got hit by a truck the next day, his kids would be taken care of for 20 years, and they wouldn't be able to leech off their parent's legacy.

other issues... by Stanley+Feinbaum · 2003-01-28 13:30 · Score: 0, Troll

There are more serious technical issues for an online library. A real library has only a limited number of books, and they can only be lent to a limited number of people at once. An electronic library would have to follow the same principles, having time limits and user limits for each document. Otherwise they would be giving out works for free, and most are not intended to be distributed that way.

--

Stanley Feinbaum, professional journalist and master debater! God bless the USA!

Re:other issues... by Anonymous Coward · 2003-01-28 13:38 · Score: 0

Stanley Feinbaum, pompous shithead.
Re:other issues... by MBCook · 2003-01-28 13:39 · Score: 2, Informative

Most of the stuff on PG is public domain, IIRC. Until Poe, Melville (I know it's wrong, so sue me), Shakespeare, and others all climb out of their graves and form some kind of union (RIAA: Recently-undeceased Inkers of Aged Albums, etc.), nobody will complain that they're getting ripped off by these works being put on the web.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.
Re:other issues... by Anonymous Coward · 2003-01-28 13:51 · Score: 0

What the hell are you talking about?

Do you have any idea what this project is?

It IS giving away things for free. Jesus man, get with it.
Re:other issues... by failedlogic · 2003-01-28 15:38 · Score: 2, Insightful

If you can publish a Poe compilation in print runs of 50,000+, sell them all in a few years, reprint and repeat, making a lot of money at $20 a copy with no royalty payments, then publishers will keep making it.

And this is where PG is having volunteer problems. The old books still make publishers money, so their pre-press electronic versions won't be made available for free. So OCRing the text at no cost is painstaking, labour-intensive work for a limited readership. What motivation!

If only by Cookeisparanoid · 2003-01-28 13:31 · Score: 4, Interesting

It would make life so much easier if I could search an online library rather than searching the library index. Just think how much space we could save as well, rather than shelves full of books that are basically dead weight 95% of the time.
I think copyright has got to be the biggest hurdle; publishing houses aren't easily going to be persuaded to put, oh, say, the next Harry Potter book online for free and risk losing millions.

Re:If only by Anonymous Coward · 2003-01-28 13:47 · Score: 1, Interesting

Physical books are important because they offer a pretty good back-up system (not without their own archival problems of course). What happens if all your books are stored on a Control Data Corporation magnetic drum and it breaks?
Think I'm kidding? The U.S. National Archives has something of a crisis on its hands, as thousands of past government records are stored on computer media which cannot be read because the equipment that created them no longer exists. Worse yet, the data is often stored in a format which is no longer documented; even if you could read it from the media, you'd still have to decipher it!
Re:If only by Maudib · 2003-01-29 01:29 · Score: 1

A couple of years ago I downloaded all of Project Gutenberg (excluding the genome project, pi and some other texts that aren't 'readable' in the common sense) and wrote some code to search the whole thing. It was fun. It wasn't even all that slow, assuming I was the only one using it. Then one day I got stupid and put it on my school's intranet. It became unusable with 5 simultaneous users. This wasn't a slow server: dual PIIIs and a gig of RAM. Random texts were cached in memory.
My point is, unless you have the resources of Google, or spread the library across a P2P network, a fully searchable library with everything is kind of infeasible for public use.
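For a sense of what such a search involves, here is a minimal sketch (not the poster's code, which isn't shown) of an inverted index: each word maps to the set of book IDs containing it, so a query only looks up its own words instead of rescanning every text. The book IDs, sample snippets and the `tokenize` rule are illustrative assumptions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase and split on anything that isn't a letter, digit or apostrophe."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(books):
    """Map each word to the set of book IDs whose text contains it."""
    index = defaultdict(set)
    for book_id, text in books.items():
        for word in tokenize(text):
            index[word].add(book_id)
    return index

def search(index, query):
    """Return the IDs of books containing every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids adding empty entries to the defaultdict for unknown words.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

books = {
    "moby": "Call me Ishmael. Some years ago, never mind how long precisely...",
    "alice": "Alice was beginning to get very tired of sitting by her sister...",
    "raven": "Once upon a midnight dreary, while I pondered, weak and weary...",
}
index = build_index(books)
print(sorted(search(index, "weak weary")))  # ['raven']
```

The index trades memory and build time up front for fast queries, which is the part that matters once several users share one server; a library-sized index would also need to live on disk rather than in a Python dict.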

If you do want to help by Anonymous Coward · 2003-01-28 13:31 · Score: 5, Informative

Distributed Proofreaders. Recently discussed on /. as well.

Re:If you do want to help by adamjaskie · 2003-01-28 14:02 · Score: 2, Interesting

Why not modify that so that a scanned image of a single page of the book is available, along with an empty box to enter text? That way, people could work on ONE page at a time, while others work on other pages. A single book could be typed in by 547 different people, each typing up one page.

--
/usr/games/fortune
Re:If you do want to help by Unoriginal+Nick · 2003-01-28 14:18 · Score: 1

Did you look at the site? That's basically how it works already. It gives you the scanned image of one page and the text output of the OCR program. You then compare the two and make any corrections to the text and formatting for the page. Many people can be working on the same book at the same time.
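The compare-and-correct step can be illustrated with Python's standard `difflib`, assuming the OCR output and the corrected page are plain strings. This is only a sketch of how one might list what a proofreader changed; it is not how the Distributed Proofreaders site is actually implemented, and `changed_words` is a hypothetical helper.

```python
import difflib

ocr_text = "Tbe quick brown fox\njumped over the lazv dog."
proofed = "The quick brown fox\njumped over the lazy dog."

def changed_words(before, after):
    """List (old, new) word runs that differ between the OCR text and the proofed text."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    # Keep only the replace/insert/delete opcodes; 'equal' runs are untouched text.
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]

print(changed_words(ocr_text, proofed))  # [('Tbe', 'The'), ('lazv', 'lazy')]
```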
Re:If you do want to help by mrjive · 2003-01-28 14:19 · Score: 1

Yeah....except what happens when the version I have is 300 pages, and yours is 500?

This would probably work better on a per-chapter basis, not per-page.

--
If you can't beat them, arrange to have them beaten. -George Carlin
Re:If you do want to help by Anonymous Coward · 2003-01-28 14:21 · Score: 0

Well, if you checked out the link, you'd notice that what they actually do is OCR the text so no one has to type stuff in, and then people can look at one page at a time, and they proofread it. The whole point is that you DO have hundreds of people working on different pages of the same book at once.
Re:If you do want to help by adamjaskie · 2003-01-28 14:35 · Score: 1

Yeah, as a matter of fact, I looked closer, joined, and proofread a few pages. So HA!

--
/usr/games/fortune
Re:If you do want to help by adamjaskie · 2003-01-28 14:38 · Score: 3, Informative

They give you an image of the scanned page, along with the OCR'd text. I just looked closer, and did a few pages as well. It's pretty easy. Took me about 5-10 minutes/page. I had to remove a few end-of-line hyphenations, fix an OCR-mangled word, and replace single hyphens with double hyphens for em dashes a few times.
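Two of those fixes are mechanical enough to sketch. This assumes the OCR output renders an em dash as a spaced single hyphen, which may not hold for every scan; both functions are hypothetical helpers, and the dehyphenation rule is a heuristic that would also join a genuine compound split at its hyphen, so a human still has to check the result.

```python
import re

def rejoin_hyphenated(text):
    """Remove an end-of-line hyphen and the line break after it, joining the word halves.

    Heuristic: a real compound broken at its hyphen (e.g. "well-" / "known")
    would also be joined, which is why a proofreader still reviews the page.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

def double_dashes(text):
    """Turn a spaced single hyphen between words into '--' (assumed OCR form of an em dash)."""
    return re.sub(r"(?<=\w) - (?=\w)", "--", text)

page = "It was an impor-\ntant day - or so he thought."
print(double_dashes(rejoin_hyphenated(page)))  # It was an important day--or so he thought.
```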

--
/usr/games/fortune
Re:If you do want to help by Anonymous Coward · 2003-01-28 17:24 · Score: 0

A single book could be typed in by 547 different people, each typing up one page

I thought of that too, but then I realised that that would only work for books that are exactly 547 pages long...

barbor or barber by (rypto* · 2003-01-28 13:32 · Score: 4, Funny

The mechanics of a universal library are simple. The tricky part: hairdressing the free labor.

Karma: Barber

--
#3 pencils and quadrille pads.

Books Are Printed With Computers... by MBCook · 2003-01-28 13:32 · Score: 4, Interesting

... aren't they? I mean, even if I buy "Moby Dick", isn't all that text in a computer at the publisher somewhere? They format it to fit the pages, etc., and then send that file off to the printers, correct? So it is still on the publisher's computer, and it shouldn't be TOO hard to get it into simple text files instead of whatever odd format they might use. What about when books get published in Braille? A computer must do that, right? There isn't some guy poking dots in steel plates to emboss the pages with, right? I could be wrong; this is just my guess. Anyone in the publishing industry out there?

So the point of this post is: why not ask publishers for the material? If it's already public domain, it's not like they'll lose profits, and maybe Project Gutenberg could let them put a little

This text donated by Joe Bob Publishers Inc, of Walla Walla, Washington (www.joebobbooks.inc)

kind of thing at the top of each book they donate. Plus, maybe it's a tax write-off. I don't know. That said, I'd think it'd be much easier to just type things in than to OCR them or use speech-to-text.

--
Comment forecast: Bits of genius surrounded by a sea of mediocrity.

Re:Books Are Printed With Computers... by BJH · 2003-01-28 13:57 · Score: 5, Insightful

I used to be a book editor (at a Japanese publishing company). Let me give you a rundown of the process we followed (I'm sure there are more efficient places than the one I worked at - O'Reilly is well known for their high level of automation).

Get manuscript from author.
This could be either handwritten or typed. If typed, it's likely to be in either plain text or Word format, but with a lot of errors.

If the manuscript's handwritten, farm it out to a typist.
We used to pay 0.5 yen a letter for English, 1 yen a character for Japanese.

Once it's data, edit.
I used to do my editing on a Mac with BBEdit, but this varies a lot between editors - some do it in (shudder) Word, where all the formatting gets in the way.

Reformat it to pass it to the DTP firm.
When I say 'reformat', I don't mean making things bold or italic - I mean cleaning it up so it's easy to do the next step, which is...

Print out and insert format directions.
The manuscript is printed out, and you go through it one line at a time adding things like "Line break here" and "Use larger font for this".

Proofs arrive from the DTP firm.
You go through the proofs, making corrections by hand (i.e., "Move this down one line", etc.)

The DTP firm passes you back the formatted data.
QuarkXPress is king here. You get the data in a finished form and pass it to the printers.

The printer produces the final proofs.
You can still make corrections, but these have to be done by the DTP firm, who then give you the updated data.

Last-minute corrections are made.
This depends on the printer, but quite often these are done by pasting the changes over the top of the printer film (i.e., they're not reflected in the data).

The book is printed.
Corrections after printing are usually done as described above (pasting changes over the film).

The problem with this is that the text data held by the editor is now out-of-date in all sorts of ways:
- It doesn't have the corrections made by the DTP firm.
- It doesn't have the corrections made by the printer.
- It doesn't have any formatting.

QuarkXPress can output the data in other forms, but it's still missing the last-minute and after-printing changes. And quite frankly, once the book is on the market, most publishing companies aren't interested in reworking the data and keeping it as text for the next 90 years just so it can be released into the public domain.
Re:Books Are Printed With Computers... by mrjive · 2003-01-28 14:32 · Score: 1

I'd imagine this process mostly applies to new books or new editions of existing books. I would think that a public domain work wouldn't have any last-minute changes done to it by the DTP firm or printer. As for formatting...does that really matter for PG?

It'd be nice if this process could be expedited with help from those who might have already done the data-entry legwork (i.e., the publishers).

--
If you can't beat them, arrange to have them beaten. -George Carlin
Re:Books Are Printed With Computers... by Anonymous Coward · 2003-01-28 22:30 · Score: 1, Funny

If the manuscript's handwritten, farm it out to a typist.
We used to pay 0.5 yen a letter for English, 1 yen a character for Japanese.

Just reviewed my typing speed, and how much I get for spending an hour writing or debugging code.

Where do I get in?
Re:Books Are Printed With Computers... by jkrausyao · 2003-01-29 05:46 · Score: 1

Many of the books prepared for Project Gutenberg were created before computers were used for publishing. Therefore the first time many of these books exist as computer text is when they are scanned and proofed by a volunteer.

Once these books have been converted to computer text then other publishers could create new print or ebook editions for sale.

Many of the older books at Gemstar, http://www.gemstar-ebook.com/, used text from Project Gutenberg as the source. Search for books by publisher eBook Classics.
Re:Books Are Printed With Computers... by Anonymous Coward · 2003-01-29 20:17 · Score: 0

Not just funny, do the math.

0.5 yen is US$ 0.0042, so a one-finger typist with an average of 1 letter per second would make $15.12 per hour.

Is there anyone here that slow?
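The arithmetic, spelled out using the poster's own figures (0.5 yen per letter, 0.5 yen = US$0.0042, one letter per second):

```python
YEN_PER_LETTER = 0.5     # rate quoted upthread for English text
USD_PER_LETTER = 0.0042  # the poster's conversion of 0.5 yen
LETTERS_PER_SECOND = 1   # the hypothetical one-finger typist

letters_per_hour = LETTERS_PER_SECOND * 60 * 60
yen_per_hour = letters_per_hour * YEN_PER_LETTER
usd_per_hour = letters_per_hour * USD_PER_LETTER
print(letters_per_hour, yen_per_hour, round(usd_per_hour, 2))  # 3600 1800.0 15.12
```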

WiReD by IvyMike · 2003-01-28 13:33 · Score: 2, Interesting

I'd just like to point out that this is the third story from Wired to show up on slashdot today. And it's not even that bad of a story. I think this must mean Wired is cool again.

Re:WiReD by NerdSlayer · 2003-01-28 13:59 · Score: 2, Insightful

Seriously. And there were a couple of more earlier in the week, I believe. What's the deal? Slashdot has turned into Wired with trolls substituted for pictures and illustrations. Well, I guess there's the goatse guy...
Re:WiReD by rilister · 2003-01-28 14:45 · Score: 1

yup. About the only added value I got from buying the thing this month was the nifty Spike Jonze flipbook of some Jackass-style stunt.

-can someone *please* scan it and post an animated GIF to save me all the effort with scissors and tape?

--
'This writing business. Pencils and what-not. Over-rated if you ask me. Silly stuff. Nothing in it' - Eeyore

copyright information by Anonymous Coward · 2003-01-28 13:34 · Score: 5, Informative

Keep in mind the following copyright rules:

1. Works first published before January 1, 1923 with proper copyright notice entered the public domain no later than 75 years from the date copyright was first secured. Hence, all works whose copyrights were secured before 1923 are now in the public domain.
(This is the rule Project Gutenberg uses most often)
Works published from 1923-1977 retain copyright for 95 years. No such works will enter the public domain until 2019.
2. Works first created on or after January 1, 1978 enter the public domain 70 years after the death of the author if the author is a natural person.
(Nothing will enter the public domain under this rule until at least January 1, 2049.)
3. Works first created on or after January 1, 1978 which are created by a corporate author enter the public domain 95 years after publication or 120 years after creation whichever occurs first.
(Nothing will enter the public domain under this rule until at least January 1, 2074.)
4. Works created before January 1, 1978 but not published before that date are copyrighted under rules 2 and 3 above, except that in no case will the copyright on a work not published prior to January 1, 1978 expire before December 31, 2002. If the work is published before December 31, 2002, its copyright will not expire before December 31, 2047.
(This rule copyrights a lot of manuscripts that we would otherwise think of as public domain because of their age.)
5. If a substantial number of copies were printed and distributed in the U.S. prior to March 1, 1989 without a copyright notice, and the work is of entirely American authorship, or was first published in the United States, the work is in the public domain in the U.S.
6. (This rule is complicated and is seldom applied.) Works published before 1964 needed to have their copyrights renewed in their 28th year, or they'd enter the public domain. Some books originally published outside of the US by non-Americans are exempt from this requirement under GATT. Works from before 1964 were automatically renewed if ALL of these apply:
- At least one author was a citizen or resident of a foreign country (outside the US) that's a party to the applicable copyright agreements. (Almost all countries are parties to these agreements.)
- The work was still under copyright in at least one author's "home country" at the time the GATT copyright agreement went into effect for that country (January 1, 1996 for most countries).
- The work was first published abroad, and not published in the United States until at least 30 days after its first publication abroad.

This means that we can't simply take electronic versions of modern texts and put them in the archive, because only out-of-copyright books are in there.
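As a rough sketch of how rules 1-3 turn into dates, the hypothetical helpers below compute the first year a work is in the US public domain. They assume the simplifications stated in the comments: rules 4-6 are ignored entirely, and since US copyright terms run to the end of the calendar year, a work becomes public domain on January 1 of the following year. This illustrates the arithmetic only; it is not legal advice.

```python
# Hypothetical helpers sketching copyright rules 1-3 above (US law, as posted in 2003).
# Simplifications: rules 4-6 (unpublished works, missing notices, renewal/GATT) are
# ignored, and 1923-1963 works are assumed to have been renewed. Terms run to the
# end of the calendar year, so each function returns the year whose January 1
# is the first day the work is in the public domain.

def published_1923_to_1977(pub_year):
    """Rule 1, second part: published 1923-1977, protected for 95 years."""
    if not 1923 <= pub_year <= 1977:
        raise ValueError("this 95-year term only covers works published 1923-1977")
    return pub_year + 95 + 1

def natural_person_after_1977(death_year):
    """Rule 2: created on or after 1978 by a person; 70 years after the author's death."""
    return death_year + 70 + 1

def corporate_after_1977(pub_year, creation_year):
    """Rule 3: corporate author; 95 years from publication or 120 from creation, whichever is first."""
    return min(pub_year + 95, creation_year + 120) + 1

def is_public_domain(first_pd_year, current_year):
    return current_year >= first_pd_year

print(published_1923_to_1977(1923))      # 2019
print(natural_person_after_1977(1978))   # 2049
print(corporate_after_1977(1978, 1978))  # 2074
```

The three printed years match the earliest dates given in the rules above.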

Re:copyright information by Qinopio · 2003-01-28 13:36 · Score: 0

That's all assuming we don't nuke ourselves off the face of the planet before most of those books hit the public domain...

--
__________
[Big Brick Wall]
Re:copyright information by pgrote · 2003-01-28 14:38 · Score: 1

Thank you for posting this. I don't know why you posted it as an AC; it's great info. Since the Disney-led coup of the copyright act I've been looking for a crib sheet on when stuff will be available, and now I have it.

Thanks again even if it is depressing.
Re:copyright information by rodgerd · 2003-01-28 15:50 · Score: 1

Useful, but all US law, of course.
Re:copyright information by ColaMan · 2003-01-28 22:41 · Score: 2, Informative

Unless you visit some other, non-US version of Project Gutenberg, such as the Australian one, which I peruse every now and then.

From the .au front page:

Works in the 'public domain' in Australia
Under Australian copyright law, literary, dramatic, & musical work published, performed, communicated, or recorded and offered for sale in an author's lifetime are protected for the life of the author plus fifty years from the end of the year of the author's death. After this time they enter into the public domain. EBooks on this page may be still copyright in the US and are therefore not available from the US site.

So, at present, Australians can get works by authors who died before 1953. Seems a hell of a lot easier to follow than the mess of dates the parent posted.
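The Australian rule as quoted reduces to one line of arithmetic. A minimal sketch, assuming only the life-plus-fifty rule from the .au front page and ignoring any other conditions; the function names are hypothetical:

```python
def au_public_domain_from(death_year):
    """Year whose January 1 is the first day a lifetime-published work is PD in Australia.

    Per the quoted rule: life of the author plus fifty years from the end of
    the year of the author's death.
    """
    return death_year + 50 + 1

def au_is_public_domain(death_year, current_year):
    return current_year >= au_public_domain_from(death_year)

print(au_is_public_domain(1952, 2003))  # True: author died before 1953
print(au_is_public_domain(1953, 2003))  # False: not until 2004
```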

--

You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.
Re:copyright information by angle_slam · 2003-01-29 08:15 · Score: 1

This means that we can't simply take electronic versions of modern texts and put them in the archive, because only out-of-copyright books are in there.
Thanks for the very useful post. But the parent was talking about public domain books. Yeah, you can't take electronic versions of modern texts. But some publisher somewhere has Charles Dickens on the computer. Ask them to export the text to Project Gutenberg. Will they want to do so? Probably not. They don't want to hurt their own sales. But it won't hurt to ask.

So OCR everything anyway by Anonymous Coward · 2003-01-28 13:34 · Score: 0

and post them on Freenet or something. These IP 'laws' are just ridiculous.

Distributed Proofreaders, Copyright by dachshund · 2003-01-28 13:34 · Score: 4, Interesting

Didn't we just have a set of articles on Distributed Proofreaders? Those guys are harnessing technology to churn out books at a mad rate. Seems to me that Wired's reporter is maybe just a tad uninformed.

In any case, the real obstacle to a useful electronic library isn't labor. It's copyright.

Re:Distributed Proofreaders, Copyright by Anonymous Coward · 2003-01-28 13:39 · Score: 0

mod this parent up...
Re:Distributed Proofreaders, Copyright by Anonymous Coward · 2003-01-29 11:08 · Score: 0

Yep! Miss Informed and proud of it! (But quickly getting cured of this affliction.)

Brad DeLong

hmmmmmm by pummer · 2003-01-28 13:34 · Score: 3, Funny

which will be ready first, Project Gutenberg or Duke Nukem?

Re:hmmmmmm by yerricde · 2003-01-28 14:06 · Score: 2, Funny

Are you claiming that Duke Nukem Forever will not be released within the next ninety-five years?

--
Will I retire or break 10K?

Um, Distributed Proofreaders by volsung · 2003-01-28 13:35 · Score: 4, Interesting

Apparently the author of the article missed Distributed Proofreaders. They seem to have survived their Slashdotting and actually retained a good fraction of their new users. This month they've proofed 116,827 pages! (Cut that in half for unique pages, I think.) In their two (?) years of existence they have completed 918 books, and have another 317 being assembled. It really seems like they are only limited by what they can get their hands on in the public domain.

Re:Um, Distributed Proofreaders by mumkin · 2003-01-28 14:01 · Score: 1

Rah, Rah Distributed Proofreaders!

The Slashdot story added several thousand users to their rolls, myself included, and upped the output volume dramatically. Things have quieted down a bit in the months since the Slashdotting, but it's going *very* well over at DP. I encourage anyone who is remotely interested in helping to create a phat, free digital library to check it out and get involved.

It's truly amazing what you can accomplish with a large-enough group of volunteers, over a long-enough period of time. I've spent relatively little time proofing -- just a few pages whenever I've nothing else to do -- but over the course of several months it turns out I've proofed 551 pages... that's a decent-sized book that I, personally, have helped to bring to the masses. How cool is that?

It's off the homepage now, but I believe that a previous note from DP project management estimated that if it continues at its current pace, Distributed Proofreaders will manage to add ~2,000 books to the Project Gutenberg library this year alone!
Re:Um, Distributed Proofreaders by madfgurtbn · 2003-01-28 15:21 · Score: 4, Interesting

How cool is that?

Way cool. I've been working there once in a while since the first /. story, and I think it's one of the most important things happening on the web.

It's only a matter of time before someone with a relatively massive audience like NPR does a story on DP and then we'll see what it's really like to be slashdotted. I would like to see the international membership increase, as well.

I recommend it to anyone who reads. A page a day or a week or a month helps save another book. Most of these old books will become extinct if they are not saved to the web soon.

--
Send lawyers, guns, and money. Dad, get me out of this.
Re:Um, Distributed Proofreaders by dmoynihan · 2003-01-28 17:22 · Score: 1

Yeah, and I'd also add that one of the big sources for books on DP is (currently) large, grant-funded operations like MOA and, to a lesser extent, CWRU or Canadiana.org, which are completely useless to the average reader.

These other sites digitized tens of thousands of titles, but the stuff isn't even really available (think huge single-page images, slow-moving search engines, etc.).

However, these images download great through a second PC, get OCR'd and FTP'd, get copyright clearance, tons of volunteers help, another person manages the project: boom! DP's gonna easily do a million pages this year and should have their 1,000th book next month...

and they're growing.

Thanks!

(I'm the one responsible for Anatomy of Melancholy, but also lots of mysteries, Parkman, 'n stuff so please don't kill me.)
Re:Um, Distributed Proofreaders by madfgurtbn · 2003-01-29 00:19 · Score: 1

I'm the one responsible for Anatomy of Melancholy

You BASTARD!

Actually, I am a little ashamed to admit it, but I'm kind of a fair-weather proofer. If the first page I see is complicated I will do it, but I almost never go back to that book. I like the low-hanging fruit.

Anatomy was something else. It was like that relative who drops in unannounced and never leaves for a month.

--
Send lawyers, guns, and money. Dad, get me out of this.
Re:Um, Distributed Proofreaders by DeadSea · 2003-01-29 01:17 · Score: 1

It needs a domain name before you could get an NPR story about it. I'd like to see people try to type in H-T-T-P-colon-slash-slash-T-E-X-T-S-oh-one-dot-A-R-C-H-I-V-E-dot-O-R-G-slash-D-P from hearing it on the radio. It isn't a .com, it won't work with a "www", it has random numbers in it, and it is too long. Fine if you are clicking on a link from Slashdot, but too much to remember or even write down after hearing it on the radio.
Re:Um, Distributed Proofreaders by mumkin · 2003-01-29 11:52 · Score: 1

I agree. There's gotta be a catchy and apropos domain name that DP could redirect to its current location (definitely the best PR investment that could be made), as texts01.archive.org/dp is a bitch of a URL to publicize. (rummage rummage...) Freetext.net seems to be a stagnant site with a good name. Maybe they're done with it and wouldn't mind giving it up to an active project.

Applause and props to the folks responsible for DP's back end and for its (seemingly) well-managed processes, but it's rather a coy site in terms of self-promotion. I shudder to think what kind of proofing volume might be realized if they scored some serious exposure, and what that would mean for the pre- and post-processing ends of things. They might easily run out of scanned material to proof and be overwhelmed with a backlog of proofed works to post-process (there are 55 waiting to be "massaged" into final etexts at the moment).

Gutenberg by Eddy+Johnson · 2003-01-28 13:35 · Score: 3, Funny

Ah, good old Gutenberg, the German man who invented the printing press. I believe he was made Man of the Millennium in 2000. Not bad for a guy who's been dead for a few hundred years. The Library of Congress has a Gutenberg Bible on display (the Bible being, of course, the first book made with a printing press.)

And while we're discussing speech recognition for books, it wouldn't make sense for poetry, which sometimes uses alternate spellings. It also wouldn't make sense for at least one work that I can think of - Through the Looking-Glass by Lewis Carroll, which is already up there. When Alice first looks at the poem Jabberwocky, it's backwards. Try saying that backwards faster than you can type it!

--

Anonymous Coward: (n.) 1. nerd at school or library. 2. karmawhore in training. 3. embarrassed prep.

Re:Gutenberg by CharlieG · 2003-01-28 14:40 · Score: 4, Informative

Gutenberg did NOT invent the printing press - he invented movable type, a BIG difference.

Before Gutenberg, there were printing presses, BUT you had to carve the master (the plate) for each page, and it could NOT be changed. Other folks had the IDEA of movable type, but what Gutenberg did was figure out a way to make it work (what he did was figure out how to make all the type the same length, so that when you press down, all the type comes in contact with the paper)

Movable type gives you one huge advantage - you can make up a bunch of sets of letters, and reuse them for many pages.

The total irony of this is that movable type is almost never used anymore - we make up a plate for each page. Of course, we are doing it with electronic movable type, but that is neither here nor there. Movable type started to go away with the Linotype machine, which made up one LINE of type at a time.

I think I still have an ingot of linotype metal around somewhere

--
-- 73 de KG2V For the Children - RKBA! "You are what you do when it counts" - the Masso
Re:Gutenberg by MrOrn · 2003-01-29 00:16 · Score: 3, Informative

Actually, he didn't even invent moveable type. The Chinese did that with wooden blocks much earlier and there were existing printing presses that used moveable blocks.
Also, there were prior claimants to the "invention" in Europe, such as Laurens Coster in Haarlem, Netherlands, and others in Bruges, Flanders (Belgium), Avignon (Waldvogel, who is recorded as having "steel alphabets" in 1444) and Bologna.
BTW, Gutenberg's "invention" was not the length of the type. It was casting the movable type in metal using a matrix. As he was a goldsmith and his father was the Master of the Episcopal Mint in Mainz, this was a great instance of lateral thinking, adapting technology he knew well and applying it to a new field. He would have seen coins being minted and twigged that you could print books like that.
He also designed the press (adapted from existing wine presses) and came up with an ink that was suitable for the process of printing with this type of press (the ink had to be viscous, rather than the ink used for manuscripts).
His combination of the three things meant that he could successfully exploit printing commercially. So Gutenberg was probably the first to exploit it commercially, although he wasn't very successful (five years, 1450-1455, isn't a long time to have a revolutionary business). This fact has ensured that he is credited with the invention of modern printing.
Re:Gutenberg by kalidasa · 2003-01-29 01:03 · Score: 3, Interesting

Actually, he didn't even invent moveable type. The Chinese did that with wooden blocks much earlier and there were existing printing presses that used moveable blocks.

Are you sure about this? My understanding was that early (pre-Gutenberg) Chinese presses didn't have sorts, because with the sheer size of the Chinese writing system, they wouldn't have been efficient with the level of technology to produce wooden blocks. But I'm willing to be corrected (with a reference, preferably).

Anyway, see Elizabeth Eisenstein's The Printing Press as an Agent of Change for a lot of the information that Mr. Orn describes. A more accessible book, Adrian Johns's The Nature of the Book, discusses some of this in the earlier chapters.
Re:Gutenberg by Anonymous Coward · 2003-01-29 01:57 · Score: 0

Cool! The first book ever printed was sci-fi!
Re:Gutenberg by MrOrn · 2003-01-29 05:23 · Score: 1

Are you sure about this? My understanding was that early (pre-Gutenberg) Chinese presses didn't have sorts, because with the sheer size of the Chinese writing system, they wouldn't have been efficient with the level of technology to produce wooden blocks. But I'm willing to be corrected (with a reference, preferably).
Good call. As it stands what I wrote is misleading. I did not mean that the Chinese had a sort for each ideograph -- what I meant was that they had invented the idea of the forme containing movable page elements plus furniture.
Re:Gutenberg by MrOrn · 2003-01-29 17:08 · Score: 1

Actually, yes, I will correct you and give a reference. :-)
My previous comment was from memory, but after your comment I did go check my books. The source is McMurtrie, Douglas C. The Book: The Story of Printing & Bookmaking. New York: Oxford University Press, 1943. Third Ed. (rev.).
It's out of print, but there should be used copies around; it is (or was) a standard reference in printing history. Check B&N.
From p. 95ff.: Not only block printing but also movable types originated in China. The Chinese invention of separate types antedated the experiments of Gutenberg by more than four hundred years. The inventor was Pi Sheng, and his types were made of baked clay and not of metal. As the event is of major importance in cultural history, I am quoting the original record in full, as translated from the essays of Shen Kua, a Chinese writer who was contemporary with the invention and possibly a personal friend of the inventor...[here follows a very long quotation from Shen Kua.]
Other Chinese historians confirm the record of Pi Sheng's invention. Types are also reported to have been made of tin, but these, as well as the earthenware types, did not work well with the watercolour ink. So wooden types were made, in spite of the objections which Pi Sheng had raised against them. There is a record of the making of wooden types in 1314 by Wang Cheng, who first cut the characters on a block of wood and then sawed them apart. Wang is said to have arranged his types in a case in the form of a revolving table and to have provided something over sixty thousand types for the printing of a book on agriculture, and other works.
There is quite a bit more on earlier use of type in Asia, including Korea, but I won't quote it entirely as there is 15-pages-worth.
So you stand corrected and so do I. I had forgotten this, so thanks for making me look it up.
Re:Gutenberg by kalidasa · 2003-01-30 02:02 · Score: 1

Thanks! Thanks a lot for looking that up. That's different from what I had understood.
Re:Gutenberg by Sique · 2003-01-31 10:49 · Score: 1

The Library of Congress has a Gutenberg Bible on display (the Bible being, of course, the first book made with a printing press.)

The Bible being the first printed book makes a good legend, but the first book printed by Gutenberg and Faust, his partner, was the "Donatus", a Latin grammar book quite popular at the time. The 42-line Bible was done more as an advertisement for the printing press, showing off its capabilities. After more than 700 copies of the "Donatus" had been sold and Gutenberg and Faust had the printing process running smoothly, Gutenberg decided to put all that experience into an ambitious project. Gutenberg later made a new printing of the Bible with 36 lines per page.

As a matter of fact, Gutenberg's real name was Johannes Gensfleisch. Later, when he became a lackey at the court of Arnold of Hassia-Nassau, he got the name Johannes Gensfleisch zum Gutenberg.

--
.sig: Sique *sigh*

You can help by geyser · 2003-01-28 13:38 · Score: 3, Informative

The volunteer page is the place to start:
http://promo.net/pg/volunteer.html

Time to Request Digital Copies from Publishers by CaptCanuk · 2003-01-28 13:39 · Score: 4, Insightful

All digital versions of books that publishers have should be requested and maintained in a safe place till their respective patents expire so that they can be easily integrated into the public domain.... especially if OCR or speech recognition doesn't get any better any time soon.

--
---- The geek shall inherit the Earth.

Re:Time to Request Digital Copies from Publishers by kubrick · 2003-01-28 17:32 · Score: 1

till their respective patents expire

Patents on the e-book technology used might expire, but the content contained within will still be under copyright if it was written after 'Steamboat Willie' (1928), now and forever more, at least within the USA.

--
deus does not exist but if he does
Re:Time to Request Digital Copies from Publishers by mattax · 2003-01-28 20:06 · Score: 1

Of course, all books should be available for sale in electronic format right now. The only problem is piracy. But at least the book pirates would be fairly centralised (at printers and not in home).
Re:Time to request digital copies from publishers by I+am+Jack's+username · 2003-01-29 03:28 · Score: 1

All digital versions of books that publishers have should be requested and maintained in a safe place till their respective [copyrights] expire so that they can be easily integrated into the public domain.... especially if OCR or speech recognition doesn't get any better any time soon.
I'm pretty sure speech recognition and OCR will have been perfected before copyrights expire, which will be just before universal heat death.

plain text -- WHY?? by CoughDropAddict · 2003-01-28 13:42 · Score: 3, Interesting

I cannot believe that Project Gutenberg continues to use plain text as their source code! I can see why it would have been compelling in 1971, and it still may be true that there are systems out there that can only read 7-bit ASCII.

But that's absolutely no reason why the source shouldn't be marked up. Marked up source can always be converted to ASCII, but you cannot derive semantic markup from ASCII.
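To illustrate the one-way nature of the conversion, here is a minimal sketch that flattens a tiny subset of HTML into plain text using the standard library's `html.parser`. The tag subset and the choice of underscores for italics are assumptions for the example; going the other way would mean guessing which lines were headings and which words were emphasized.

```python
import re
from html.parser import HTMLParser

class PlainTextConverter(HTMLParser):
    """Flatten a small, assumed subset of HTML (h1-h3, p, br, i, em) into plain text."""

    BLOCKS = {"h1", "h2", "h3", "p"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("i", "em"):
            self.parts.append("_")   # mark italics with underscores
        elif tag == "br":
            self.parts.append("\n")  # explicit line break

    def handle_endtag(self, tag):
        if tag in ("i", "em"):
            self.parts.append("_")
        elif tag in self.BLOCKS:
            self.parts.append("\n\n")  # blank line after each heading or paragraph

    def handle_data(self, data):
        self.parts.append(data)

def to_plain_text(markup):
    converter = PlainTextConverter()
    converter.feed(markup)
    converter.close()
    text = "".join(converter.parts)
    # Collapse any run of 3+ newlines into one blank line, then normalize the ends.
    return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"

doc = "<h1>CHAPTER I</h1><p>Call me <i>Ishmael</i>.</p><p>Some years ago.</p>"
print(to_plain_text(doc))
```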

Re:plain text -- WHY?? by SoupIsGoodFood_42 · 2003-01-28 14:12 · Score: 1

Yes! As much as I admire the project, I also think that this is incredibly stupid. It's not like there are no tools: basic XML, or even HTML, would be useful. In fact, there is already an open ebook XML-type standard being created by a lot of the big corps like Adobe etc.
It doesn't have to be anything flash, just something to show chapters, titles, [P]s, [BR]s, [B]s, [I]s, etc.
It just seems like such a waste of effort to convert all these books, only to end up with something that has no semantic structure; I thought that would be half the reason for doing it.
Re:plain text -- WHY?? by Anonymous Coward · 2003-01-28 14:20 · Score: 0

more BOOKTITLE

Damn formatting, why couldn't they just leave it in ASCII? They can add the formatting later.
Re:plain text -- WHY?? by TC+(WC) · 2003-01-28 14:23 · Score: 1

What, exactly, do you want that needs some sort of markup? What's wrong with displaying plain text books in, surprisingly, plain text?
Re:plain text -- WHY?? by johnwroach · 2003-01-28 14:25 · Score: 2, Insightful

So it will be compatible with anything. Every computer can handle plain text (and darn near every program, too). The same isn't true with marked up source.
Re:plain text -- WHY?? by Anonymous Coward · 2003-01-28 14:28 · Score: 0

In fact, just by using LaTeX and sed, you can create a very readable ebook in PS/PDF etc. format. All you need to do for end users is a little GUI that lets the user search for a book from Project Gutenberg, download the associated conversion script (a bash script, no less...) and... voila, you have an easy-to-use ebook creator! Since each book differs in how it marks chapters and paragraphs, users will have to create those scripts, but it's a 5-10 minute job for people who use sed and LaTeX on a regular basis.
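A per-book script of the kind described might look roughly like this. It is a sketch in Python's `re` module rather than sed, and it assumes (hypothetically) that paragraphs are separated by blank lines and that chapter headings are paragraphs beginning with "CHAPTER". Real Gutenberg texts vary, which is exactly why each book would need its own tweaks.

```python
import re

# Characters that must be escaped before plain text can go into LaTeX.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def plain_to_latex(text: str) -> str:
    """Turn blank-line-separated paragraphs into a minimal LaTeX book.

    The 'CHAPTER' heading rule is an assumed convention for illustration.
    """
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Rejoin the hard-wrapped lines of one paragraph into a single line.
        para = " ".join(line.strip() for line in block.splitlines())
        if para.startswith("CHAPTER"):
            out.append(r"\chapter*{" + escape_latex(para) + "}")
        else:
            out.append(escape_latex(para))
    body = "\n\n".join(out)
    return "\\documentclass{book}\n\\begin{document}\n" + body + "\n\\end{document}\n"


print(plain_to_latex("CHAPTER I\n\nCall me\nIshmael. 100% sure.\n"))
```

The output can then be fed to pdflatex; escaping matters because a stray `%` in the prose would otherwise comment out the rest of a line.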
Re:plain text -- WHY?? by CoughDropAddict · 2003-01-28 14:28 · Score: 1

Hello? Did you read the message you are replying to? I specifically said that marked up source can be converted to plain text, and therefore still remain compatible with anything.
Re:plain text -- WHY?? by ChaosDiscord · 2003-01-28 14:29 · Score: 4, Informative

I cannot believe that Project Gutenberg continues to use plain text as their source code! I can see why it would have been compelling in 1971, and it still may be true that there are systems out there that can only read 7-bit ASCII.

That's exactly why. Since 1971 a wide variety of encodings and markup languages have come and gone. 32 years later the only format still trivial to read is plain old ASCII. Project Gutenberg is most interested in preserving the texts themselves. The texts are quite well preserved in ASCII. Sure, some formatting is missing, but it's relatively minor for the majority of books in question. And given the existence of this unformatted text, it's a lot easier to create formatted text than from scratch, so you even get a benefit there.
But that's absolutely no reason why the source shouldn't be marked up. Marked up source can always be converted to ASCII, but you cannot derive semantic markup from ASCII.

I think you're a bit confused on semantic markup. By and large publishers aren't interested in the semantics of the documentation, just the formatting.

--
Search 2010 Gen Con events
Re:plain text -- WHY?? by CoughDropAddict · 2003-01-28 14:33 · Score: 1

Books are not plain text. Chapter titles are usually typeset in bigger, heavier text. They italicize the names of terms or emphatically pronounced dialog. They often have diagrams and tables. These things can be approximated with plain text, but books definitely convey more information visually than plain text can accommodate.
Re:plain text -- WHY?? by CoughDropAddict · 2003-01-28 14:40 · Score: 1

The texts are quite well preserved in ASCII. Sure, some formatting is missing, but it's relatively minor for the majority of books in question. And given the existence of this unformatted text, it's a lot easier to create formatted text than from scratch, so you even get a benefit there.

Given a choice between more information and less, which would you choose? You can always throw away excess information, but you cannot create information that was not there to begin with.

I think you're a bit confused on semantic markup. By and large publishers aren't interested in the semantics of the documentation, just the formatting.

Congratulations, you just reinvented the monstrosity of CSS-less HTML.
Re:plain text -- WHY?? by nomadic · 2003-01-28 14:47 · Score: 1

Some of their works ARE in HTML.

I don't have much of a problem with plain text, though something with better formatting would be preferable. What annoys me are the pages and pages of copyright and Project Gutenberg information they put at the beginning of each file.
Re:plain text -- WHY?? by dachshund · 2003-01-28 14:48 · Score: 1

Project Gutenberg doesn't mind markup languages. They just like to have a copy of the plain ASCII as well.
There are a bunch of folks on the Gutenberg mailing lists who are ostensibly trying to settle on a markup language and a setup for quickly typesetting books out of it.
I don't know what kind of progress they've made. Last time I tuned in, they were getting nowhere because the idea of going through PG's back-catalog and adding markup was too daunting (though it's not gonna get any easier...)
Re:plain text -- WHY?? by johnwroach · 2003-01-28 15:09 · Score: 1

And why is going through an extra step a good idea?
Re:plain text -- WHY?? by _Chainsaw · 2003-01-28 15:28 · Score: 1

The header is being changed.... a few lines will appear at the top with the remainder appearing at the end of the e-text... of course this only affects e-texts added to the archive _after_ the header change.
Re:plain text -- WHY?? by madfgurtbn · 2003-01-28 15:29 · Score: 1

What's wrong with displaying plain text books in, surprisingly, plain text?

You would be surprised how few books are truly just text. Join Distributed Proofreaders today and you'll soon find yourself faced with everything but plain text. And most of those books are around 100 years old. Assuming anything else is ever allowed to become public domain, it will only get more difficult to get meaning out of a plain text book in PG.

--
Send lawyers, guns, and money. Dad, get me out of this.
Re:plain text -- WHY?? by FattMattP · 2003-01-28 15:43 · Score: 1

And why is going through an extra step a good idea?
For the same reason it's a good idea for projects like The Linux Documentation Project. If the text were marked up in some format then it would be easy to convert it to any other format: PostScript, PDF, HTML, plain text, or even formats we haven't started using yet.

--
Prevent email address forgery. Publish SPF records for y
Re:plain text -- WHY?? by dvdeug · 2003-01-28 15:44 · Score: 1

But that's absolutely no reason why the source shouldn't be marked up. Marked up source can always be converted to ASCII, but you cannot derive semantic markup from ASCII.

And you don't really need semantic markup for a couple of italics and titles here and there. Books that do need markup usually get it; but I'm working on a simple book in TeX, and it's taking me an order of magnitude longer than handling a book of prose or poetry of the same length, so we don't do many books that need the markup.
Re:plain text -- WHY?? by rodgerd · 2003-01-28 15:52 · Score: 3, Interesting

Amen.

It could at least shift to unicode, so we can write in languages other than English (and English-with-no-accents, at that!).
Re:plain text -- WHY?? by rodgerd · 2003-01-28 15:54 · Score: 1

Because ASCII is American. Much of the rest of the world uses different character sets. Gutenberg is completely impoverished with regard to, say, French, Norse, or countless other languages.

It's a dumb thing to retain.
Re:plain text -- WHY?? by mumkin · 2003-01-28 17:16 · Score: 1

While the Distributed Proofreading Document Guidelines do call for sacrificing many aspects of text formatting, DPed works retain italics (using HTML-style tags) and boldface (sort of: all caps, for some reason). Double carriage returns indicate paragraph breaks, and line lengths are preserved. Upper ASCII characters are required, too, so most of the accents and non-English standard characters are preserved. Some book projects have their own special guidelines that differ from the Doc Standards, and several of them include plans to distribute the finished work with high-res scans of illustrations, music staves, and what have you included.
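Because those conventions are regular, even "semi-plain" DP output can be turned into simple HTML mechanically. The sketch below assumes only what the guidelines above describe (blank lines between paragraphs, `<i>...</i>` for italics); the function name is hypothetical. The all-caps boldface cannot be recovered this way, since nothing distinguishes it from text that was genuinely capitalized.

```python
import html
import re


def dp_to_html(text: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> tags, keeping <i>...</i>.

    Everything else is HTML-escaped, so a stray '<' or '&' stays literal.
    """
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Preserved line lengths don't matter in HTML: join the lines.
        escaped = html.escape(" ".join(block.split()), quote=False)
        # Restore the italic tags that html.escape turned into entities.
        escaped = re.sub(r"&lt;(/?)i&gt;", r"<\1i>", escaped)
        paragraphs.append("<p>" + escaped + "</p>")
    return "\n".join(paragraphs)


print(dp_to_html("It was <i>very</i>\ncold & wet.\n\nNext."))
```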

How Gutenberg distributes them once DP hands off the text is up to Gutenberg. There are other repositories of texts online (see archive.org) that do things differently. What's perhaps most important, in terms of preservation, is that a fragile old book has been scanned for posterity. Distribution and portability benefit from its conversion to semi-plain and very plain text. As long as the original scanned images are preserved, they can be revisited if a richer document is required.

Now, if only E Ink would get a product to market!
Re:plain text -- WHY?? by dvdeug · 2003-01-28 19:06 · Score: 1

Much of the rest of the world uses different character sets.

And when posting texts in non-English languages, Project Gutenberg uses those character sets.

Gutenberg is completely impoverished with regard to, say, French, Norse, or countless other languages.

But Gutenberg has several books in French, in Latin-1. The problem Project Gutenberg has with non-English languages is that people tend to work with local e-text groups, like Projekt Gutenberg-DE or Project Runeberg, instead of working with PG.
Re:plain text -- WHY?? by ddimas · 2003-01-28 19:27 · Score: 1

Actually there is one, and only one, good reason to move away from plain vanilla ASCII text: non-Roman alphabets. A perfect example of this is the untranslated Bible. The OT is in Hebrew (with some Aramaic), the NT is in Greek. Other than that I agree with you.
Re:plain text -- WHY?? by jkrausyao · 2003-01-29 05:55 · Score: 1

The encoding and formatting of the books are decided by each volunteer. Each book can have multiple file copies. The only requirement is that one of these copies is plain text. Of the books that I have prepared, each has three file copies: plain text, UTF-8 or ISO-8859-1 text, and XHTML.
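When a volunteer already has the UTF-8 copy, a first draft of the plain-ASCII copy can be produced automatically. This is a minimal sketch, not PG's actual procedure: Unicode NFKD normalization splits an accented letter into its base letter plus a combining mark, and encoding to ASCII with `errors="ignore"` then discards the mark. Characters with no ASCII base, such as Greek, vanish entirely, which is the information loss this thread keeps coming back to, so the result would still need a human pass.

```python
import unicodedata


def ascii_fallback(text: str) -> str:
    """Approximate a plain-ASCII copy of `text`.

    NFKD turns 'é' into 'e' + a combining accent; the ASCII encode with
    errors='ignore' drops the accent and anything else non-ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


print(ascii_fallback("café naïve"))  # accents stripped
print(repr(ascii_fallback("λόγος")))  # Greek has no ASCII base: nothing survives
```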
Re:plain text -- WHY?? by Ajatollah · 2003-01-29 08:46 · Score: 1

I am in favor of plain text as a way to achieve simple and wide compatibility, but we are also aware of the limitations of ASCII encoding, at least in non-English languages. I think a wider encoding system could be fine. I'm not fond of Unicode, but I might suggest EUC as a good alternative; I believe there have been enough revisions to EUC to cover most symbols in modern language use.

Either alternative can be broad enough meesa thinks.

They need a name change by SystematicPsycho · 2003-01-28 13:43 · Score: 1

Project Gutenberg just doesn't come across as something interesting or the first thing you think of when you think "Free electronic library". Even "WikiLibrary" would be better (although not a wiki).

--
Analytic & algebraic topology of locally Euclidean meterization of infinitely differentiable Riemmanian manifold

just scan and compress by Anonymous Coward · 2003-01-28 13:45 · Score: 5, Interesting

The best and cheapest way to get existing books on the web is to scan them and compress the images. Compression technology for text images is so good (see DjVu), and storage so cheap nowadays that you are better off just distributing high resolution scans.

This is a much more efficient way to make books available on the web, much more efficient than having volunteers painstakingly transcribe the text or correct OCR mistakes.

OCR can be used for indexing scanned documents, but there is no need to do manual correction. DjVu can compress 300dpi black and white pages of text to 5-25KB. That's less than most HTML pages, and the images look just like the original book.
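To put the 5-25KB figure in perspective, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes a US letter page (8.5 x 11 inches) scanned bilevel at 300dpi, one bit per pixel, and takes 1 KB as 1000 bytes; other page sizes shift the numbers but not the order of magnitude.

```python
def raw_bilevel_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size in bytes of a 1-bit-per-pixel page scan."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels + 7) // 8  # 8 pixels per byte, rounding up


def compression_ratio(raw_bytes: int, compressed_kb: float) -> float:
    """How many times smaller the compressed page is (1 KB taken as 1000 bytes)."""
    return raw_bytes / (compressed_kb * 1000)


raw = raw_bilevel_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
for kb in (5, 25):
    print(f"{kb} KB per page is about {compression_ratio(raw, kb):.0f}x smaller than raw")
```

A raw page comes to roughly 1 MB, so 5-25KB per page means a reduction of roughly 40x to 210x.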

The Million Book Project at the Internet Archive uses DjVu (as well as other formats).

The open source implementation of DjVu is available on SourceForge.

Re:just scan and compress by Anonymous Coward · 2003-01-28 14:40 · Score: 0

I have to agree. Scanned book images are the closest thing to the real paper you can get (I read Akira for the first time this way), and in themselves convey subtleties of the publishing process and the author's intent that just cannot be conveyed in plain text.

What happens when we expand to paintings, graphic novels, manga, etc? I think it would be better to adopt images as our standard format.

B.
Re:just scan and compress by tealwarrior · 2003-01-28 14:53 · Score: 1

A similar technique has worked well for indexing academic articles. The scan is stored as a PDF file and the OCR is used to index the underlying text. This greatly simplifies converting tables etc. The OCR is good enough for indexing and can always be redone or corrected in the future, while providing a useful product almost immediately with far less work than typing/speaking/what have you. Check out http://acl.ldc.upenn.edu/ (not all at once) for an example.

--
In theory, there is no difference between theory and practice, in practice there is.
Re:just scan and compress by Suppafly · 2003-01-28 16:49 · Score: 1

Except you can't search in a JPEG, and you can't use computer software to automatically cross-reference and build a clickable table of contents and all the other stuff that having the text of the story actually saved as text allows you to do.
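As a rough illustration of what text enables, the sketch below builds a table of contents and a word search from a plain-text book in a few lines of Python. The "CHAPTER followed by a numeral" heading rule is a hypothetical heuristic; each book would need its own pattern.

```python
import re


def build_toc(text: str):
    """Return (line number, heading) pairs for lines that look like chapter headings.

    The 'CHAPTER <roman or arabic numeral>' pattern is an assumed convention.
    """
    pattern = re.compile(r"^\s*CHAPTER\s+[IVXLC\d]+\b.*$")
    return [(n, line.strip())
            for n, line in enumerate(text.splitlines(), start=1)
            if pattern.match(line)]


def find_word(text: str, word: str):
    """Line numbers containing `word` as a whole word, ignoring case."""
    rx = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [n for n, line in enumerate(text.splitlines(), start=1) if rx.search(line)]


book = "CHAPTER I\nThe whale.\n\nCHAPTER II\nMore whales and a Whale."
print(build_toc(book))
print(find_word(book, "whale"))
```

The line numbers could then become anchors in an HTML or PDF table of contents; none of this is possible on a bare page image without OCR first.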
Re:just scan and compress by Anonymous Coward · 2003-01-28 20:17 · Score: 0

You can't search a JPEG, but you can search a DjVu: you can run OCR and have the text embedded in the DjVu file. In fact, the on-line DjVu compression server does just that.

You could imagine having a clickable table of content. In fact, DjVu books that are done by hand have hyperlinked TOC (see DjVu Editions).
Re:just scan and compress by kinnell · 2003-01-28 22:58 · Score: 1

The whole point of using ASCII is that it is a simple, standardised and widely used format. Will any kind of image compression technology still be around in 100 years? Probably not. Will ASCII still be around in 100 years? Again, maybe not, but there is so much data stored in ASCII that there will almost certainly be tools around to translate it. There are already many problems with retrieving data stored in obsolete formats from even 20 years ago, and a project of this type will be pointless if, over time, the volunteers have to spend as much time keeping the data up to date as entering new books.

--
If I seem short sighted, it is because I stand on the shoulders of midgets
Re:just scan and compress by jefu · 2003-01-29 01:47 · Score: 1

If all you want to do is look at the images, this is true.
But I've used Project Gutenberg texts as sources for English text for various purposes - and images just wouldn't do it.
No reason the two can't coexist - it would even be useful to keep both the image and textual form together.

bookwarez? by Punto · 2003-01-28 13:46 · Score: 1

I always got the impression that there are more titles available on the bookwarez scene than on project gutenberg.

I might be wrong, or maybe some books are more '1337' than others, but I got the impression that there definitely are enough people willing to get the texts into digital format.

--
Stay tuned for some shock and awe coming right up after this messages!

Just daydreaming here. by eniu!uine · 2003-01-28 13:46 · Score: 3, Insightful

As someone pointed out, the real problem is the copyright issue. Most works are copyrighted and copyrights last for way too long. The Constitution states that copyright should be limited, but when it's life plus 70 years, it may as well be unlimited since we'll all be dead before they expire. There needs to be a grassroots movement to inspire a repeal of some seriously damaging legislation. I feel confident that most Slashdot readers agree about what needs to be done, but we seem too apathetic to actually do something about it. Sometimes I wish someone would post a link that says 'click here to vote for freedom'. If only it were that easy.

I think an interesting project would be public domain textbooks. Textbooks are grossly overpriced and contain information that is largely available for free. If a community of developers can create an OS like Linux, then the educational community should be able to come up with open textbooks.

--
My Blog

Re:Just daydreaming here. by _Chainsaw · 2003-01-28 15:33 · Score: 1

Actually, I was thinking about this the other day as I was scanning some history books for Distributed Proofreaders... history, for the most part, has _already been written_, so it should be fairly easy to cull through public domain history books, select the chapters/sections that are well written, and create an 'open' history textbook.
Re:Just daydreaming here. by evilmrhenry · 2003-01-28 15:41 · Score: 1

Public Domain textbooks?

http://www.nongnu.org/fcp/

Free Curriculum Project, textbooks free in GPL sense.

They don't actually have any books yet, just as a warning.
Re:Just daydreaming here. by rodgerd · 2003-01-28 15:56 · Score: 1

Nope. History evolves constantly. Interpretations change, and primary source material can be uncovered. Archaeological digs, you name it, all feed into our understanding. Sutton Hoo, for example, revolutionised our understanding of aspects of pre-Norman Britain, and it was dug up within the period commonly covered by infinitely extending copyright.
Re:Just daydreaming here. by circusnews · 2003-01-28 18:08 · Score: 1

Hey! We are already working on this! OK, it's essentially GPLed, not PD (blame the lawyers), and it's for circus arts, not general education, but we have our first book ready to be released this weekend!
Re:Just daydreaming here. by olethrosdc · 2003-01-29 03:20 · Score: 1

Well, this place does offer something in that respect

--
I miss my rubber keyboard.(Homepage)

The parent is "interesting"? by thac0 · 2003-01-28 13:48 · Score: 5, Informative

The article didn't say that OCR was faster than speech; it said that speech was faster than transcribing it.

Come on mods, read more carefully.

--
poliglut.org: they're still alive and fighting the man

Re:The parent is "interesting"? by timeOday · 2003-01-28 14:45 · Score: 3, Insightful

So what? Rowing across the ocean is faster than swimming. Most of us still fly.
Sure, for the best scanning speed you have to cut the binding off and use a sheet feeder. But even scanning 2 pages at a time will be far faster than reading the whole thing out loud.
So what is your point?
Re:The parent is "interesting"? by nautical9 · 2003-01-28 14:58 · Score: 4, Informative

Depending on the typist, I can't see reading a book out loud as being any faster than transcribing it - especially considering that the speech recognition software is unlikely to do the proper punctuation, paragraph breaks, people & place names, and general capitalization, so proofing the results would take a considerable amount of time.
But as the GP said - a moot point since OCR'ing it and proofreading/fixing minor typos would be far quicker than either.
Re:The parent is "interesting"? by Anonymous Coward · 2003-01-28 19:51 · Score: 1, Interesting

In addition, very few people can read a book aloud at the speed a trained typist can type it without making numerous mistakes. A skilled typist can transcribe a document they have never seen before at more than 120 words/minute. One of my roommates, during the mid-eighties, typed John Stuart Mill's "On Liberty" in a couple of hours. When I asked him about the speed, he said "Oh, that's nothing. This book is well edited. Normally when I do this I have to correct spelling, punctuation, and grammar." Of course he was a professional editor. BTW, the reason he was doing this was a project to distribute public domain texts on floppy along with a pretty cool Turbo Pascal software package for reading, searching, and indexing. Remember, this was the mid-eighties!

Control of one's hands and the processing of visual information are closely linked in the brain, more closely than the eyes and the mouth.
Re:The parent is "interesting"? by Anonymous Coward · 2003-01-28 23:13 · Score: 0

One of the issues for Project Gutenberg is that they spend a lot of time proofing the work. Their goal is a perfect text. I am sure that your roommate was good at what he did, but their work is of a higher standard than we can expect from any single person and a few hours' work.
Re:The parent is "interesting"? by ckaminski · 2003-01-29 03:48 · Score: 1

Does this mean perfect in that the final product will reflect the input perfectly, or perfect in that, say, the grammar, spelling and typos in the classics of Edgar Rice Burroughs or Marx are corrected and eliminated?
Re:The parent is "interesting"? by Anonymous Coward · 2003-01-29 05:37 · Score: 0

Your logic isn't quite on target. Flying via the Concorde is faster than a normal plane, but most of us take the normal plane. Just because something works better doesn't make it superior: you need to factor in other things, especially price. An OCR device that could scan a book might cost tens of thousands of dollars, while basic speech recognition software, if it ever gets truly decent, could be and likely would be much cheaper. That doesn't mean OCR isn't the right answer, but the simple fact that it's faster doesn't mean it's what would be used.
Re:The parent is "interesting"? by Anonymous Coward · 2003-01-29 08:11 · Score: 0

It's a matter of artistry. When transcribing a work of art, the character needs to be preserved. My friend and I had discussions about just this topic. That means preserving grammar, punctuation, and spelling, if it makes sense. When transcribing an author's manuscript for publication, those mistakes are eliminated unless they are what the author intended. As for typos, that depends on whether the intent is to preserve the published work or the author's writing. In the former case, EPS or TIFF would probably be better than plain ASCII.
Re:The parent is "interesting"? by Black+Copter+Control · 2003-01-29 15:30 · Score: 1

In addition, very few people can read a book aloud at the speed a trained typist can type it,
Yeah, true -- but few people can type at the speed a trained typist can. I consider myself reasonably lucky that -- on a good day -- I can type fast enough to transcribe the spoken word.
That having been said, I agree that OCR seems to be the best (general) case for mass transcription. There is, BTW, a Gutenberg-associated project that allows people to help correct the mistakes that an OCR makes (and remove the extra bits like page numbers, etc.).

--
OS Software is like love: The best way to make it grow is to give it away.

Huh? by Tyler+Eaves · 2003-01-28 13:48 · Score: 2, Insightful

Huh? I can type a good bit faster than I can speak.

--
TODO: Something witty here...

Re:Huh? by Call+Me+Black+Cloud · 2003-01-28 14:12 · Score: 2, Interesting

No you can't, unless you're impaired in some way.

Average speaking rate (in English) is 100-180 wpm. The world's fastest typist hit 212 wpm on a Dvorak keyboard. See also this

I took a quickie online typing test, one pass, 60 seconds, and here's my score. I'm a decent typist (better when coding). What's your score?

Percentage Accuracy : 100%
Percentage Inaccuracy : 0.8333333333333334%
Characters per minute : 360 cpm
Characters per second : 6 cps
Words per minute : 67 wpm
Words per second : 1 wps
Total Speed status : Too Good
Overall Accuracy : Absolutely Spot on
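Typing tests usually define a "word" as five characters, spaces included; by that convention 360 cpm is 72 wpm, so this particular test apparently counts actual words instead. The sketch below does that conversion and then estimates how long a book would take at a given rate. The 100,000-word book is a hypothetical round number, 150 wpm is simply a value inside the 100-180 wpm speaking range quoted above, and proofreading time is ignored entirely.

```python
def standard_wpm(chars_per_minute: float) -> float:
    """Typing-test convention: one 'word' is five characters, spaces included."""
    return chars_per_minute / 5


def hours_to_transcribe(words: int, wpm: float) -> float:
    """Raw input time only; no allowance for breaks or corrections."""
    return words / wpm / 60


print(standard_wpm(360))                                     # the test above, in 5-char words
print(round(hours_to_transcribe(100_000, 67), 1), "hours typing at 67 wpm")
print(round(hours_to_transcribe(100_000, 150), 1), "hours reading aloud at 150 wpm")
```

Roughly 25 hours of typing versus 11 of dictation: speech wins on raw input speed, before counting the correction pass the recognizer's output would need.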
Re:Huh? by be-fan · 2003-01-28 14:50 · Score: 1

Doh. The typing test would be better without the spelling and grammar mistakes in the text...

--
A deep unwavering belief is a sure sign you're missing something...
Re:Huh? by ColaMan · 2003-01-28 22:55 · Score: 1

Percentage Accuracy : 100%
Percentage Inaccuracy : 0.8333333333333334%

Maybe they should also add:
Probability of getting Accuracy+Inaccuracy to total 100% correctly : Low

--

You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.
Re:Huh? by Anonymous Coward · 2003-01-29 03:26 · Score: 0

Percentage Accuracy : 81.9047619047619%
Percentage Inaccuracy : 18.095238095238095%
Characters per minute : 525 cpm
Characters per second : 8 cps
Words per minute : 96 wpm
Words per second : 1 wps
Total Speed status : Too Good
Overall Accuracy : Okay

*grin*

The accuracy issue is because a) They have spelling /grammar errors, and b) The text was American (favorite/favourite).
Re:Huh? by mcjulio · 2003-01-29 13:16 · Score: 1

Percentage Accuracy : 98.99799599198397%
Percentage Inaccuracy : 1.002004008016032%
Characters per minute : 499 cpm
Characters per second : 8 cps
Words per minute : 91 wpm
Words per second : 1 wps
Total Speed status : Too Good
Overall Accuracy : Brilliant
Re:Huh? by Anonymous Coward · 2003-01-29 17:07 · Score: 0

Damn...you know, with a little practice you could compete in the typolympics or whatever typists hold...that's pretty impressive.

What's more. . . by kfg · 2003-01-28 13:49 · Score: 5, Informative

it is part of the philosophy of Project Gutenberg to publish all of their works in the lowest-level standard format, thus ensuring continued cross-platform, program-independent readability, ad infinitum.

That means *plain* ASCII. Plain ASCII means you could read it in edlin if you really had to.

This is a Good Thing.

This also means that if you wish to format any Project Gutenberg text in HTML or TeX for publication, you start with a blank slate and can immediately start to work your own will upon the raw text.

This is also a Good Thing.

KFG
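"Plain ASCII" has a precise, checkable meaning: every byte is in the 7-bit range 0-127. Here is a small Python sketch of such a check, the sort of thing one might run on a text before calling it plain; the function names are illustrative, not part of any PG tooling.

```python
def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range (0-127)."""
    return all(b < 128 for b in data)


def first_non_ascii(data: bytes):
    """Offset of the first byte outside 7-bit ASCII, or None if there is none."""
    for offset, b in enumerate(data):
        if b >= 128:
            return offset
    return None


print(is_plain_ascii(b"Call me Ishmael."))
print(first_non_ascii("café".encode("utf-8")))  # the UTF-8 'é' starts at offset 3
```

On a real file one would read the bytes with `open(path, "rb").read()`, where `path` is whatever text is being checked; Python 3.7 and later also provide `bytes.isascii()` for the yes/no question.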

Re:What's more. . . by rodgerd · 2003-01-28 16:04 · Score: 1

That means *plain* ASCII. Plain ASCII means you could read it in edlin if you really had to.

This is a Good Thing.

No, it's a bad thing, because it renders Gutenberg near useless for anything other than English, and it cripples it for creating PDFs, TeX files for printing, and the like.

One can take SGML (which came into being around the time of the Project) and create plain text. One cannot take the plain text and create SGML.
Re:What's more. . . by Sgs-Cruz · 2003-01-28 17:00 · Score: 3, Informative

It keeps it quite good for almost all European languages, thank you. Wouldn't you consider it better than nothing? Or would you prefer that Project Gutenberg supported the Unicode standard that is mired in controversy because it doesn't support all 10 to the freaking 24th ancient Chinese ideographs.
I'd prefer that the books be transcribed now and maybe later we can add some foreign-language books once we figure out a standard that can satisfy the world. Besides, English (European languages, anyway) are the real languages of the Internet.

--
Karma: pi (Mostly due to circular reasoning in posts).
Re:What's more. . . by commodoresloat · 2003-01-28 17:06 · Score: 1

Who the hell is going to read Plato's Symposium in edlin?
Re:What's more. . . by dvdeug · 2003-01-28 18:15 · Score: 2, Informative

No, it's a bad thing, because it renders Gutenberg near useless for anything other than English,

Have you ever taken an actual look at Project Gutenberg? It uses whatever character set is necessary for the language in question; Unicode, CP1251, and ISO-8859-1 have all been used.

Of course, so has DOS CP850, which is darn near unreadable unless you're a CS geek, which is why PG prefers ASCII.
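The CP850 problem is easy to reproduce: the same bytes mean different characters depending on which code page the reader assumes. This Python sketch encodes a French sentence as a DOS-era file might, then decodes it correctly and as Latin-1. In CP850 these accented letters sit in the 0x80-0x9F range, which Latin-1 reserves for control characters, so the mis-read version shows unprintable junk where the accents were. The sample sentence is invented for illustration.

```python
def reread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one code page, then decode the bytes as if they were another."""
    return text.encode(written_as).decode(read_as)


sample = "Ça coûte très cher"
print(reread(sample, "cp850", "cp850"))          # right code page: text survives
print(repr(reread(sample, "cp850", "latin-1")))  # wrong code page: control characters
```

Plain ASCII sidesteps this entirely, which is the trade-off being argued here: nothing to misinterpret, but nothing beyond English letters either.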
Re:What's more. . . by dvdeug · 2003-01-28 18:48 · Score: 1

It keeps it quite good for almost all European languages, thank you.

Did you read the page? "ASCII Extended Character Sets -- PC" are two of the various ways that ASCII was extended for non-English languages. Go down to the "ASCII Standard" link for the real ASCII standard, which only covers English.

Or would you prefer that Project Gutenberg supported the Unicode standard that is mired in controversy

Oh, yes, a standard that has been supported by official acts of Japan, China, Iran, India, the US, the EU, and various US states, and is a crucial part of Windows NT, Mac OS X, KDE, GNOME, HTML, XML, Ada, Java, and C9x, but is disliked by one ranting academic who never bothered to update his page in response to corrections; that obviously means the standard is fatally mired in controversy.

Besides, English (European languages, anyway) are the real languages of the Internet.

Nice ethnocentrism. Of course, PG in many ways is not an Internet group; it only does books 80 years old. And even if you're only talking about English culture, many of the books printed 100 years ago contained Greek, and many of the foundational books of English culture were written in Greek, so PG has to go beyond ASCII for those.

(Which it does; PG only uses ASCII for English texts, and uses CP1251, Big5, Latin-1 and yes, Unicode where appropriate.)
Re:What's more. . . by Anonymous Coward · 2003-01-28 23:42 · Score: 0

If the Asians want to make copies of their books, they are free to use whatever format they want, but where PG started, we read English.

As for going from text to SGML: so what? What are we trying to maintain with the SGML? The project is to maintain the words.

Why don't you start a Project Goldfarb to rescue the PG texts from ASCII? BTW, twenty years from now, when the tools to format your work are lost or forgotten, people will be returning your files to standard ASCII so they can be archived.
Re:What's more. . . by Doug+Merritt · 2003-01-29 05:50 · Score: 4, Informative

No, it's a bad thing, because it renders Gutenberg near useless for anything other than English, and it cripples it for creating PDFs, TeX files for printing, and the like
Strangely enough, people have actually addressed this, notably with the GutenMark program, which converts Gutenberg text into nicely formatted documents in a variety of markup formats (including PDF and TeX, using postprocessing filters).
See GutenMark home
It never ceases to amaze me that, when people see something that only addresses 90% of their own problem, they call it useless, rather than doing a web search to see whether someone has addressed the remaining 10% of their problem.
Gutenberg is an amazingly important project; I urge everyone to support it.

--
Professional Wild-Eyed Visionary

Project Gutenberg? by bezza · 2003-01-28 13:49 · Score: 2, Funny

Let it go guys....

No 'project' is going to get Steve Guttenberg back in Police Academy.

It is time to move on...

--
WARNING: This sig does not contain a joke

Transcribing? by bravehamster · 2003-01-28 13:51 · Score: 4, Insightful

Hah, try transcribing "Huckleberry Finn", or any Dr. Seuss, or better yet, try "Feersum Endjinn" by Iain M. Banks. I'd love to see what a transcriber would do to that one. Given the number of made-up words in literature, catching and correcting the mistakes a transcriber makes would make it less than useless.

--
---- El diablo esta en mis pantalones! Mire, mire!

I don't like reading online! by corvi42 · 2003-01-28 13:54 · Score: 4, Interesting

It seems paradoxical, but there it is. I spend a huge amount of time glued to the screen, reading articles, blogs, forums, FAQs, HOW-TOs, etc. But I don't like it, in fact I find it aggravating.

I am lured and lulled by the vast amount of easy information suitably tailored to my interests, all with an easy-to-use, intuitive associational (read: hypertextual) interface. But it is tiring, staring at a flickering, glaring screen for hours; my eyes get dry, and I strain and get tired picking out fuzzy objects when I try to focus at distance. It's nasty and annoying.

Here is my point about this project. Nobody wants to read books on their computers. Well maybe some do, but I think the vast majority don't. Paper books are easily available and cheap. If you can't find the one you want in a local library or bookstore there are a multitude of ways of ordering them. You don't get tired looking at them, they are actually enjoyable. So why should there be a desire amongst the majority for e-books?

Don't get me wrong, I think it's a good idea, but not one that I, nor I think the majority, will go in for until a better way of presenting them is developed. LCDs are an improvement, but they are still shabby. I don't think a project like this is going to see much public interest until better presentation media are found. E-paper will be needed before the e-book becomes a reality for most people: some kind of little book-sized unit that you can hold, with a matte, non-glaring, non-luminous display surface.

--

There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin

Re:I don't like reading online! by gidds · 2003-01-28 14:15 · Score: 1

But we're not just talking about large CRTs or LCDs attached to desktop computers. For example: I do a lot of reading from the screen of my pocket computer (Psion 5mx). It has many advantages over dead trees, as well as some disadvantages: you can pack an entire library in your pocket, never lose bookmarks, convert from US to UK spelling, and search; but don't drop it in the bath! One of the unexpected advantages is that I can read in the dark (using the backlight), e.g. in bed. I don't find it any more tiring on the eyes than dead trees, and once I get into a story I'm only subliminally aware of the medium anyway.
And of course there are many other possibilities for reading ebooks, some of which won't make it to market for years.
Of course, ebooks aren't for everyone, and I doubt they'll ever entirely replace dead tree editions. But don't dismiss them just because they're not right for you now.

--
Ceterum censeo subscriptionem esse delendam. (Furthermore, I hold that the signature must be destroyed.)
Re:I don't like reading online! by Mochatsubo · 2003-01-28 14:19 · Score: 1

So you think we should wait until technology produces a more paper-like reading experience before putting time and energy into getting these texts into electronic form? Try thinking past today and spend a little time thinking about tomorrow. Electronic texts are important now, and as technology improves they will become more accessible in the future.
Re:I don't like reading online! by Gholam · 2003-01-28 14:46 · Score: 2, Insightful

The importance of having literature available in digital format extends far beyond just the ability to read it on your computer.

For a start, digital copies are easier and cheaper to store than paper-based documents. For older documents, keeping a digital reproduction may be the only way to ensure the continuing existence of the work.

The "plain ASCII" restriction on all the documents in PG is a boon for usability in areas other than screen-based reproduction. For instance, you can print the document in a variety of formats, or have it played to you as sound. Quoting and searching digital material is also significantly faster than with paper documents.
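
A minimal sketch of why plain text makes searching fast: assuming the etext has already been read into a Python string, a few lines find every line containing a phrase, ignoring case. The function name `find_lines` and the sample text are made up for illustration.

```python
def find_lines(text: str, phrase: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose text contains phrase, ignoring case."""
    needle = phrase.lower()
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]


sample = """CHAPTER I
The traveller sat by the fire.
Nobody mentioned the machine."""

for number, line in find_lines(sample, "machine"):
    print(number, line)  # prints: 3 Nobody mentioned the machine.
```

Doing the same with a paper copy means flipping pages and hoping you remember roughly where the line was.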

Reading documents on your computer may be the most obvious, but it's certainly not the only benefit of digital literature.

--
-- Matt Ryall
Re:I don't like reading online! by koreth · 2003-01-28 15:19 · Score: 1

But it is tiring, staring at a flickering glaring screen for hours, my eyes get dry, and I strain and get tired picking out fuzzy objects when I try to focus at distance.

If you can see your screen flickering (assuming you're using a CRT) you really owe it to yourself to try a higher refresh rate. I get horrible eyestrain when I have to use a 60Hz monitor for any length of time, but set the same monitor to 75Hz or higher and I can use it for hours with no ill effects.
Also make sure the brightness and contrast aren't set too high; excessive contrast can cause "blooming" (where the bright areas bleed into the dark ones) which makes text a lot more work to read.
Obviously none of this changes the fact that some people just don't like reading from screens, even perfectly calibrated ones, but a well-set-up screen is a lot more pleasant to look at than a bad one.
And don't get me started on how loud most computers are, not exactly ideal reading conditions...
Re:I don't like reading online! by evilmrhenry · 2003-01-28 15:46 · Score: 1

There is another purpose to having this available than just reading it on a screen.

Let's say you are a book publisher looking to make money. You can take this text, print it, sell it cheaply, and still make a profit.

Take a look at some of the classics on your shelf (if you have any). Some of them will undoubtedly be priced in the range of $0.50 or $1.00. Cheap editions like these are exactly the kind that would benefit from free etexts.
Re:I don't like reading online! by rodgerd · 2003-01-28 15:59 · Score: 1

Paper books aren't readily available. My wife and I, between us, own a fairly large number of 19th century and earlier volumes that aren't in print any more. If you're interested in those volumes for some reason, an etext version is the only way to go.

And there's nothing to stop you printing Gutenberg texts out. If they weren't so impoverished (ASCII only), they'd even look real pretty (although I'm sure that people have written Gutenberg-to-TeX programs, which would help).
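
For the curious, here is a rough sketch of what a minimal Gutenberg-to-TeX converter might look like in Python. It is not any existing program; the function names are hypothetical, and it assumes paragraphs are separated by blank lines and that line breaks inside a paragraph are only wrapping that TeX can redo.

```python
import re

# Characters with special meaning in LaTeX, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape special characters in one pass so replacements are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def etext_to_latex(etext: str) -> str:
    """Turn a plain-ASCII etext into a minimal LaTeX document."""
    paragraphs = re.split(r"\n\s*\n", etext.strip())
    # Collapse the hard-wrapped lines of each paragraph into one line of text.
    body = "\n\n".join(
        escape_latex(" ".join(p.split())) for p in paragraphs if p.strip()
    )
    return (
        "\\documentclass{book}\n"
        "\\begin{document}\n"
        f"{body}\n"
        "\\end{document}\n"
    )
```

A real converter would need much more (chapter headings, typographic quotes, italics), but escaping LaTeX's special characters and letting TeX re-flow the paragraphs is the core of it.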
Re:I don't like reading online! by jpt.d · 2003-01-28 16:19 · Score: 1

What would you think of an interactive medium that was like a shiny piece of paper? On the TV show "Andromeda" they have these things that are basically shiny pieces of paper that work like an interactive screen; hell, you can play video on them :p

--
What we see depends mainly on what we look for. -- John Lubbock. Now search for that bug, slave!
Re:I don't like reading online! by corvi42 · 2003-01-28 16:31 · Score: 1

I use an NVIDIA GeForce2 on a Samsung 955 DynaFlat, running 1280x1024 @ 85 Hz. I still get sore eyes.

Re:I don't like reading online! by Anonymous Coward · 2003-01-28 18:23 · Score: 0

When the movie came out (which I have not seen), I went and got "The Time Machine" from Project Gutenberg. Reading on the screen annoyed me as well, but I've got this magical device attached to my computer called a "printer" that solved that little issue. It did take 23 pages, though...
Re:I don't like reading online! by pm8e072 · 2003-01-28 19:21 · Score: 1

Yes, a typeset version of a Gutenberg etext makes for a nicer reading experience. It's a good complement to the plain ASCII. I did a few Austen and Dickens books last year; PDFs, TeX source, etc. can be found here:
http://www.pmonta.com/etext
Re:I don't like reading online! by DudeG · 2003-01-28 19:35 · Score: 1

You're right, ebooks won't replace paper books. But there are genuine advantages to them.
I store lots of books (40+) on my Palm; some of these are really huge texts (1000+ pages) in print. But on a Palm it makes no difference at all.
And I can read a little at a time, whenever there's a spare moment; in a long queue at the supermarket, waiting for something to compile etc. It's like recycling all those otherwise wasted bits of time.
The other advantages are psychological. In normal books, I have a real problem with keeping my focus on the text I'm actually reading. As soon as I turn a page, my eyes flick down to the end of the next page, breaking my flow. Because of the small screen, I've got only a few sentences visible at any time so I can't inadvertently skip forward.
The other psychological benefit is a bit harder to explain. With a normal big paper book, it's quite a daunting feeling to see how little progress I've made after reading for quite some time. This can end up deterring me from reading the rest of the book. But on the Palm (and particularly with iSilo) there's a sense of progress through the chapter, not the whole book. So even a couple of minutes reading feels like it's made a dent.
There's a huge collection of ebooks available at Memoware, many of them converted by me! :-)
Re:I don't like reading online! by corvi42 · 2003-01-29 03:40 · Score: 1

Well I'd prefer them not to be "shiny" - I find that is part of the problem, but this is the sort of thing that would be great for reading books.

Re:I don't like reading online! by corvi42 · 2003-01-29 03:48 · Score: 1

Paper books aren't readily available.

No - books ARE relatively available. Specific books are not. When specific texts go out of print, that's an issue of market demand. If there were significant demand for these texts, they would still be in print; the fact that they are no longer in print indicates that there is not enough demand for publishers to print them. If you are looking for any rare or antique item, you have to accept as a given that it is going to be difficult to find. I agree that this is a good reason for having etexts. But my arguments were about the role of etexts in the market as a whole, and why there is no demand for them on the part of the majority. If there is no demand for these specialty books, there will be just as little demand for them in etext form.

Re:I don't like reading online! by corvi42 · 2003-01-29 03:50 · Score: 1

Supply & demand, simple as that. There is no demand for ebooks, so the supply is low. As soon as there is greater demand, there will be more interest, and then they will appear. It will be a small effort for publishers to convert paper texts into etexts when there is money to be made in it. When the appetites of the market finally become attuned to them, there will be a supply with little or no delay. That's basic Economics 101.

Re:I don't like reading online! by angle_slam · 2003-01-29 08:20 · Score: 1

I always thought e-books were a silly idea as well... until I got a Sony Clie. Since I carry my Clie with me most of the time, if I have downtime, I can read books that I've stored on it. For example, I was shopping last weekend with my wife. She was taking longer than expected, and I was bored, so I just pulled out my Clie and started reading a novel. I have the complete Sherlock Holmes works with me everywhere I go. Try carrying the big print edition of the Complete Sherlock Holmes with you.
Re:I don't like reading online! by Anonymous Coward · 2003-02-01 12:34 · Score: 0

There's a huge collection of ebooks available at Memoware [memoware.com], many of them converted by me!
I also have a large electronic library. Much of it is stuff I've worked on; proofreading, reformatting, fixing the typography, adding italics, &c &c. Trouble is, much of that is under copyright, so I can't just post it on a site like Memoware. (The vast majority is stuff that I already own in dead-tree form, so I don't feel too guilty.) Other than running a P2P app for long periods, is there any good way to share this work with others without putting myself at risk?

Online Interface.. by slashkitty · 2003-01-28 13:58 · Score: 2, Informative

While I like the project, I think the biggest problem is the interface to use the books. They end up in this crappy .txt format. The searching and browsing is slow and painful. If they just spent a little time on the website, they might get more support!

--
-- these are only opinions and they might not be mine.

Re:Online Interface.. by Squirrel+Killer · 2003-01-29 04:50 · Score: 1

While I like the project, I think the biggest problem is the interface to use the books. They end up in this crappy .txt format. The searching and browsing is slow and painful. If they just spent a little time on the website, they might get more support!
I don't mind the .txt format, since I can import it into virtually any other format I would want, but I would echo your comments on Project Gutenberg's web site interface. The site's insistence on constantly opening new windows is aggravating enough, but then there's no obvious way to browse the collection. It's all very user-hostile.
My personal beef with PG is the classification of multiple editions. There are four versions of most of Shakespeare's plays. Which one should I use? Give me a little hint..."This is the one to use just to read.", "This one is the canonical version, straight from the Folio.", "This version is in playscript form." Give me a reason to use 1ws2610.txt instead of 2ws2610.txt instead of 1ws2611.txt instead of 0ws2610.txt. I know that "Project Gutenberg has avoided requests, demands, and pressures to create 'authoritative editions.'", but come on: I can deal with the archaic file names, just give me a little help.
All that said, I still love PG and I think it's one of the more valuable resources on the internet. Keep up the good work guys!
-sk
Re:Online Interface.. by msouth · 2003-01-29 14:42 · Score: 1

If only all their work was in the public domain, you could make up such a front end yourself! :)

--
Liberty uber alles.

speech recog works by SHEENmaster · 2003-01-28 13:59 · Score: 2, Offtopic

for short command sets. Mac OS X has excellent speech recognition for example. What we are lacking is a way to differentiate a larger vocabulary.

I can see PG's next release now:
Welcome to the audiotape version of.......

--
You can't judge a book by the way it wears its hair.

Re:speech recog works by The+Notorious+ASP · 2003-01-28 15:54 · Score: 1

Speech recognition for commands is far different than speech recognition of this nature. It's not difficult to differentiate "print" from open, close, next, etc... The difficulty is in differentiating "two" from "too" and "to" or "smile" from "mile" or "small" (the examples go on and on...)
Re:speech recog works by einer · 2003-01-28 16:22 · Score: 4, Funny

Best title for any paper, book or article on the subject: How to wreck a nice beach.
Re:speech recog works by Anonymous Coward · 2003-01-28 23:05 · Score: 3, Funny

so, speech recognition works, it just can't recognise much speech.

Just that one feature left to add then.
Re:speech recog works by Paul+Lamere · 2003-01-29 03:02 · Score: 0, Flamebait

nor can it wreck a nice beach.
Re:speech recog works by Molt · 2003-01-29 07:35 · Score: 2, Funny

So, all we need to do now is get all the great works of literature converted into Mac OS X commands?

Gutenberg has a more difficult challenge than I'd anticipated.

--
404 Not Found: No such file or resource as '.sig'

In Search of the Perfect Library by drmofe · 2003-01-28 14:00 · Score: 5, Interesting

There seems to be an interesting recurring theme in human history - we constantly strive to build libraries but we have never yet built one that is quite "good enough".

The Great Library in Alexandria was a wonder of the ancient world until it got burned down as part of a domestic dispute between Mark Antony and Cleopatra. I was amused to note that the local university recently received funding approval to rebuild it - grants committees move slowly.

In mediaeval times, monks were the guardians of knowledge, and the monasteries dotted around Europe were oases of learning. Knowledge was restricted to the few.

The original Gutenberg made it possible to create huge volumes (literally) of knowledge and disseminate it on a wide scale. Ever since, people in power have sought to control this technology - either through censorship, copyright, or even education (you have to be able to read before a book is of greatest use to you.)

In Victorian England, the mark of a scholarly gentleman was in the breadth of works he maintained in his private library.

Perhaps a new initiative might be Gutenberg@Home whereby any reader made an electronic copy of physical works by some convenient, nondestructive means. By keeping such a personal library private, one would not have to worry about copyright laws, even as currently framed.

How much of what is holding us back from building the perfect library is simply our insistence on monetary restrictions? How long will it take us to realize that lengthy (in time) and complex or intensive (in resources consumed) PHYSICAL processes are the only ones to which we need to attach a value; that whatever happens in the electronic world should be free; and that the collation, assembly, verification, dissemination and application of the sum of human knowledge is one of the most important things that we could achieve?

STF

Re:In Search of the Perfect Library by Allen+Varney · 2003-01-28 16:49 · Score: 2, Informative

The Great Library in Alexandria was a wonder of the ancient world until it got burned down as part of a domestic dispute between Mark Antony and Cleopatra.

Uh... what? For centuries people have blamed the burning of Alexandria's Great Library on the Romans, the Christians, or the Muslims, depending on which ones they disliked. But Mark Antony? Cleopatra? That's a new one. Maybe you're thinking of Julius Caesar, who gets the blame according to this fellow (a self-proclaimed Christian apologist).

Books online are not as good as books on paper .. by staaktdenarbeid · 2003-01-28 14:00 · Score: 2, Interesting

Storing books online is one thing, but Gutenberg also needs readers to be successful. How many readers are willing to read .txt or .pdf files instead of printed material? Several times I have downloaded Gutenberg books intending to read them on my laptop or screen later on. It turned out to be too inconvenient compared to a printed copy.
If only electronic paper cost 1c a page...

More than words by sammyo · 2003-01-28 14:01 · Score: 1

Look in the back of a good book: the credits for the font, the markup design, the basic look. Formatting an ASCII book into something that is a pleasure to read is a lot of work. A book is more than just words. But go, Gut! We love ya.

Yes, but they don't... by dachshund · 2003-01-28 14:03 · Score: 1

So the point of this post is: why not ask publishers for the material? If it's already public domain, it's not like they'll lose profits, and maybe Project Gutenberg could let them put a little

Yup. Except that the vast majority of publishers won't give out their digital masters, even if the work in question is public domain. The formatting and page layout cost them money, and they (rightly or wrongly) feel that such a release would undercut their sales.

And even if you could get hold of the digital representation, it'd very likely be copyrighted as a "derivative work" (due to the layout info, page numbers, and even spelling corrections).

#bookz --- bookwarez anyone? by Slashdotess · 2003-01-28 14:04 · Score: 1, Informative

#bookz on irc.undernet is an excellent place for ebooks, with, of course, a little illegality behind it. Many of these are the same ones that have been floating around on alt.binaries.ebooks since the stone age, but I think this unrestricted database is probably the best library ever created.

And not going anywhere soon.... by Captain+Beefheart · 2003-01-28 14:06 · Score: 4, Interesting

Floppy disks get magnetized, hard drives crash, optical disks get scratched...A book can take a beating, man. All the OCR and voice rec in the world won't change this until we can get widespread, cheap cartridged optical media.

I think this take on media longevity also prevents progress WRT Project Gutenberg. Too many people don't see the point, when they can have the Library of Congress backed up on disk one day but be looking at a screen full of garbage characters the next because someone accidentally yanked the power supply on the server or whathaveyou.

A single $5 paperback book can be propagated more reliably than tens of thousands of dollars worth of networks and storage, although the latter system can admittedly hold a whole library's worth of that single book. But think about the infrastructure required to maintain the latter system. Until we have better media, the costs aren't justifiable, IMHO. It's an idea whose time has not yet come.

Re:And not going anywhere soon.... by dvdeug · 2003-01-28 15:32 · Score: 5, Insightful

Floppy disks get magnetized, hard drives crash, optical disks get scratched...A book can take a beating, man. All the OCR and voice rec in the world won't change this until we can get widespread, cheap cartridged optical media.

One small book also takes up the space of a hard drive, and can't be redownloaded, or backed up. If my roof leaks, or I have a fire, it will cost me thousands of dollars to replace my books, and some will be hard to impossible to replace. If my hard drive crashes, I redownload the files from Gutenberg, and/or restore them from my backups.

Technological Breakthrough ( funny ) by corvi42 · 2003-01-28 14:07 · Score: 2, Funny

From the bookmarks that a local bookstore (bookcity) gives out with purchases:

Introducing the new Bio-Optic Organized Knowledge device - BOOK.
BOOK is a revolutionary breakthrough in technology; no wires, no electric circuits, no batteries, nothing to be connected or switched on. It's easy to use. Even a child can operate it.
Compact and portable, it can be used anywhere - even sitting in an armchair by the fire - yet is powerful enough to hold as much as a CD-ROM.
[...]
BOOK never crashes nor requires rebooting. The 'browse' function allows instant movement to any sheet, forward or back, as one wishes. Many come with an 'index' feature, which pinpoints the exact location of any selected information for instant retrieval.
Portable, durable and affordable, BOOK is being hailed as a precursor of a new entertainment wave. BOOK's appeal seems so certain that thousands of content creators have committed to the platform and investors are reportedly flocking to the medium.


Re:Technological Breakthrough ( funny ) by Anonymous Coward · 2003-01-28 14:45 · Score: 0

I believe this is stolen from Isaac Asimov's "Report from $FUTURE_DATE" (I can't remember the FUTURE_DATE used...), which talks in similar terms about this amazing portable storage device.

Stupid article. Project Gutenberg doing great. by ChaosDiscord · 2003-01-28 14:08 · Score: 5, Insightful

Thus Project Gutenberg has inched ahead at a snail's pace. In its 32nd year of existence, the collection has only 6,267 etexts.

I prefer to phrase it, "Thus Project Gutenberg has raced ahead at an amazing rate. In its 32nd year of existence, the collection has 6,267 etexts, averaging almost 200 etexts per year. That works out to about one book every other day. This is all the more impressive given that for the first twenty years of the project's existence the Internet didn't exist in anything like the form we take for granted today. The popularization of the Internet has only accelerated the rate at which Project Gutenberg grows. With the help of Distributed Proofreaders, a project that allows average people to donate small amounts of time to proofread just one page at a time, Project Gutenberg can expect to add over 400 etexts per year. Clearly Project Gutenberg is thriving."
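
A quick check of the arithmetic, assuming the full 32 years and the 6,267 count quoted above:

```python
etexts = 6267
years = 32

per_year = etexts / years                 # about 195.8 etexts per year
days_per_etext = 365.25 * years / etexts  # about 1.87 days between etexts

print(f"{per_year:.1f} etexts per year, one every {days_per_etext:.2f} days")
# prints: 195.8 etexts per year, one every 1.87 days
```

So "almost 200 a year" and "about one book every other day" both hold up.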

--
Search 2010 Gen Con events

No. Boycott Dr. Seuss. by yerricde · 2003-01-28 14:10 · Score: 2, Informative

Hah, try transcribing "Huckleberry Finn", or any Dr. Seuss

No. Boycott Dr. Seuss. His estate submitted an amicus brief in favor of the Bono Act. Now that Project Gutenberg uses distributed proofreading, the Bono Act is the biggest barrier to the growth of PG.

--
Will I retire or break 10K?

If one demands that the library be born. . . by kfg · 2003-01-28 14:13 · Score: 4, Insightful

full grown, like Athena springing from the head of Zeus, this criticism is largely valid.

Patience, however, is a virtue. Libraries of public domain works *grow.* Every work added remains. Although it may take many years, even generations, as did the construction of the Giza plaza, over time the pyramid grows toward its apex, another pyramid joins it, a temple is added to the side, and so on.

That's part of the point of Project Gutenberg: not just to provide an online library, but to do so in an immutable manner that only grows over time.

Adding only *one page* to the project is valuable, and that addition remains and is added to by others.

Even brick and mortar libraries can take generations to build. A two hundred year plan only requires patience to complete.

That said, I'm going to take an even more contrarian point of view to the Wired article. The amazing thing I find about Project Gutenberg is how much is already in there. It's already at the point where I think few people could manage to read half of the texts available in their lifetime, and finding a project to contribute to is complicated by the fact that the hardest part may not be performing the labor, but simply finding a work that interests you that *hasn't already been done.*

It's already a remarkable collection, and on occasion I've had to resort to it because my local library didn't have a lending copy of the work I wanted, while Project Gutenberg could give me free ownership of it.

KFG

Re:If one demands that the library be born. . . by Anonymous Coward · 2003-01-30 06:56 · Score: 0

>>If one demands that the library be born full grown, like Athena springing from the head of Zeus, this criticism is largely valid.

Well, yes. I do. I'm greedy. And impatient. :-)

Brad DeLong

Gutenberg is fine by Anonymous Coward · 2003-01-28 14:16 · Score: 0

Wired is just looking for content. Gutenberg is a lot better than nothing, and having 2627 texts in one easy-to-download place is good by me. Books like "A Young Girl's Diary" are fascinating, and it is very likely one of many texts in their collection that would be very difficult to find otherwise, or that would be completely forgotten.

Sure, who needs searchable text... by DeHar · 2003-01-28 14:27 · Score: 2, Insightful

Scanned documents might be fine for readers, but what if you're looking for "oh, you know, that one line in the book, where the dude was talking about melons."

A computer is NOT a glowing piece of paper with scrollbars.

Re:Sure, who needs searchable text... by trb · 2003-01-28 18:05 · Score: 1

Scanned documents might be fine for readers, but what if you're looking for "oh, you know, that one line in the book, where the dude was talking about melons."
Use the force.

That's part of what DP does by smiff · 2003-01-28 14:29 · Score: 5, Informative

Why not modify that in such a way as to have available a scanned image of a single page of the book, along with an empty box to enter text?

That's basically what Distributed Proofers does, except that they OCR the book first, so the proofreaders just need to fix the OCR errors. Every page goes through two passes. Then the entire book goes into post-processing, where a single person puts all the pages together and checks for problems that the proofers didn't know how to solve (marked with an asterisk). Once Distributed Proofers finishes the book, they pass it on to Project Gutenberg, where somebody reviews the whole text again.
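
The workflow described above (OCR, two proofing passes per page, then a single post-processor who assembles the book and handles asterisk-flagged problems) can be sketched as a small Python pipeline. This is an illustration under assumptions, not DP's actual code: the two rounds are stand-in functions, and every name here is hypothetical.

```python
from typing import Callable

# A proofing round takes one page of text and returns a corrected page.
ProofRound = Callable[[str], str]


def run_pipeline(ocr_pages: list[str], rounds: list[ProofRound]) -> tuple[str, list[str]]:
    """Pass every page through each proofing round, then post-process.

    Post-processing here just joins the pages into one text and collects
    the lines that proofers flagged with an asterisk for a human to resolve.
    """
    pages = list(ocr_pages)
    for proof in rounds:
        pages = [proof(page) for page in pages]

    book = "\n".join(pages)
    flagged = [line for line in book.splitlines() if "*" in line]
    return book, flagged


def first_round(page: str) -> str:
    # Hypothetical fix for a common OCR confusion ("tbe" for "the").
    return page.replace("tbe", "the")


def second_round(page: str) -> str:
    # Hypothetical: a proofer unsure about an odd word marks it with "*".
    return page.replace("Fnord", "Fnord*")


book, flagged = run_pipeline(["tbe first page", "Fnord was here"], [first_round, second_round])
print(flagged)  # ['Fnord* was here']
```

Because each page is handled independently until post-processing, many volunteers can work on the same book at once, which is the point of doing it one page at a time.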

Distributed Proofers currently has a problem. After the previous Slashdot announcement, they were overwhelmed with volunteers. The volunteers processed books so fast, they were running out of material to work on. Three or four people scan in most of the books. They have been slaving away trying to keep up with the proofers.

Distributed Proofers is also working on a standard to mark up the books to better preserve tables, illustrations, bold text, math, etc. I suspect that effort is being slowed due to the priority of keeping material on the site.

Re:That's part of what DP does by kalidasa · 2003-01-29 00:44 · Score: 3, Informative

Distributed Proofers is also working on a standard to mark up the books to better preserve tables, illustrations, bold text, math, etc. I suspect that effort is being slowed due to the priority of keeping material on the site.

Three Little Letters:

T E I

TEI is to literature as DocBook is to documentation.
Re:That's part of what DP does by smiff · 2003-01-29 04:42 · Score: 1

TEI is to literature as DocBook is to documentation.
I haven't been following it recently, but I believe DP is basing their markup on TEI. They don't want to use TEI itself because it is big and complicated. DP would prefer a simpler markup that is easy for volunteers to learn.
Re:That's part of what DP does by kalidasa · 2003-01-29 05:50 · Score: 1

I haven't been following it recently, but I believe DP is basing their markup on TEI. They don't want to use TEI itself because it is big and complicated. DP would prefer a simpler markup that is easy for volunteers to learn.

Good call. My understanding is that's what most projects do: use a subset of TEI (for instance, TEI Lite, or even so-called Bare Bones TEI, which while it is not ideal for a scholarly edition, is better than nothing.)
Re:That's part of what DP does by Anonymous Coward · 2003-01-29 09:08 · Score: 0

I can understand the need for a good markup language for this and other projects.

What I don't understand is why they are used in lieu of, instead of in addition to, the scans.

Offer the original scans in an accepted, compressed format. Maybe it's a bandwidth issue?

The Wired article misses the point by Anonymous Coward · 2003-01-28 15:06 · Score: 2, Insightful

The author makes a good observation, but misses the point afterwards. The Web is curiously devoid of primary subject matter. There are book reviews, but few books; movie reviews, but not the movies; music commentary but little music. It's a web of opinion, not knowledge.

But the problem isn't volunteers, it's litigation. Copyright law, DMCA, etc. The sources aren't there because the greedy owners won't allow them to be put there. The ebook-list over the last week has been publishing notes from various authors (real authors, not corporations like Disney) that read, "You'll get my copyrights when you pry them from my cold dead hands (and even then I'd like to leave them to my children!)."

If Project Gutenberg could publish modern texts, there would be an explosion of interest and activity, and a more or less immediate on-line library. But since it can only digitize books written before 1923, more or less, there's mainly interest from historians, English majors, and True Believers.

Downside to that method: by Anonymous Coward · 2003-01-28 15:07 · Score: 4, Insightful

Like probably many others here, I like to read Project Gutenberg books on my Palm/Pocket PC. Whenever I have a little downtime I can take it out and choose from a dozen "classic" books to read. You can't do that when the "book" is an 800x600 image and your screen can only do 320x320 (Sony Clies, Palm Tungsten), 320x240 (PocketPCs, Handera), or 160x160 (almost all Palm and Handspring PDAs).

Plain text, HTML, or XML are much more portable than compressed images, which is at least partly why Gutenberg uses plain ASCII text: it's readable on literally anything with an alphanumeric display, and by all signs will be for decades, if not centuries or millennia. Good luck finding a GIF or BMP reader in 100 years, let alone one for formats nobody's even heard of. I have plenty of pictures I made only a few years ago on an Apple II that can't be read by anything, even when I get them off the 5.25" floppies. Yet I've read code and other things written on computers from the 70s and 80s. ASCII Just Doesn't Die.
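
A minimal sketch of that portability point: because plain text carries no fixed layout, it can be re-wrapped to any screen width, which an 800x600 page image cannot. This uses Python's standard `textwrap` module; the function name and the choice of column width per device are assumptions.

```python
import re
import textwrap


def rewrap_for_screen(etext: str, columns: int) -> str:
    """Re-wrap each blank-line-separated paragraph to the given column width."""
    paragraphs = [p for p in re.split(r"\n\s*\n", etext) if p.strip()]
    # Join each paragraph's old hard-wrapped lines, then wrap to the new width.
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=columns) for p in paragraphs
    )


text = ("Plain text can be poured into any width of screen\n"
        "without losing a single word of the book.\n\n"
        "A second paragraph follows.")
print(rewrap_for_screen(text, 20))
```

The same file serves a desktop window and a 160x160 PDA screen; only the `columns` argument changes.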

Re:Downside to that method: by Anonymous Coward · 2003-01-28 16:09 · Score: 0

Well, the DjVu viewer has been ported to the Sharp Zaurus.
Also, some people have proposed to reformat image-based text for narrow screens. It's easy enough to find all the lines and words in a document image and reformat the document into shorter lines.

I agree that ASCII is the most future-safe of formats, but since DjVu has been open-sourced, it's pretty safe too.
Re:Downside to that method: by dvdeug · 2003-01-28 19:04 · Score: 1

ASCII Just Doesn't Die.

Interestingly enough, one of the problems with PG is files in odd charsets. The PG copy of the Swedish bible is basically in CP437 (that gets the accents right), but I have no idea what character set the quotes are in.
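When a file's character set is unknown, a common heuristic is to try strict decoders in order and fall back to a single-byte code page that accepts every byte. The order below, and falling back to CP437 specifically, are assumptions suited to the Swedish bible case; a file that merely decodes without error is not proof of the right charset, and this won't identify the mystery quotes:

```python
def decode_etext(data: bytes) -> tuple[str, str]:
    """Decode raw e-text bytes, returning (text, codec_used).

    Tries strict ASCII, then UTF-8, and finally falls back to CP437,
    which assigns a character to every byte value, so it never fails.
    """
    for codec in ("ascii", "utf-8"):
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            continue  # not valid in this codec; try the next one
    return data.decode("cp437"), "cp437"


if __name__ == "__main__":
    # In CP437, byte 0x94 is the Swedish letter o-umlaut.
    print(decode_etext(b"f\x94r"))
```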

OCR might have a problem... by salimma · 2003-01-28 15:07 · Score: 2, Interesting

.. when the font used is different from fonts it is programmed to recognise. I tried scanning a 40-year-old book - a drama script written in Indonesian - and the combination of unusual font *and* unrecognised language was enough to make the OCR software's output 50% rubbish.

Hmm, imagine scanning a 500-year-old book hand-written in Cyrillic... forgetting for one second the damage that scanning might do to the book in the first place.

--
Michel
Fedora Project Contribut

Re:OCR might have a problem... by dvdeug · 2003-01-28 18:58 · Score: 1

I tried scanning a 40-year-old book - a drama script written in Indonesian - and the combination of unusual font *and* unrecognised language was enough to make the OCR software's output 50% rubbish.

ABBYY FineReader - which is a very popular OCR program in Project Gutenberg circles - will let you train the program for that font. It's not a panacea, but you can usually get decent text out of it. It also supports Indonesian - it supports just about every modern language written in Latin, Cyrillic, Greek, Georgian or Armenian.

imagine scanning a 500-year-old book hand-written in Cyrillic...

Cyrillic's no big deal; in theory, it's no harder than Latin, and ABBYY is produced by Russians, so it should handle it well enough. 500-year-old is a little worse, but not unsolvable; I've OCR'ed copies of books that were nearly that old. The handwritten part is going to be the killer, though.
Re:OCR might have a problem... by salimma · 2003-01-29 00:28 · Score: 1

ABBYY FineReader - which is a very popular OCR program in Project Gutenberg circles - will let you train the program for that font

Nice. Will check that out. The scanner's on the PC back home and it's mostly my sister who uses it; to be honest, I can't remember what software came with it. Probably some cheap-ish lite version, since it was an old, since-discontinued consumer-model parallel-port scanner that the credit card company gave out as a 'reward' for loyal customers :p

--
Michel
Fedora Project Contribut
Re:OCR might have a problem... by fataugie · 2003-01-29 01:43 · Score: 1

Not trying to be a smart ass (this time), but how do you scan something that old if it has a binding? I have a flatbed scanner, and I hate scanning books because I almost always end up breaking the binding.

I can understand flat sheets, but if it has a binding I am stumped how you could scan without damaging. Hand scanner maybe? I thought those were out of style 5 yrs ago. I had a black and white one that was OK, but man did you have to have a steady pull or else it would fuck up big time. I wasted more time with that freakin thing...

--

WTF? Over?

Agreed! Enough with the WiReD articles! by Anonymous Coward · 2003-01-28 15:08 · Score: 0

You can get a subscription for $12.

PLEASE, someone find something more interesting to submit!

All Hail The Text! by Jason+Scott · 2003-01-28 15:18 · Score: 2, Informative

Well, until it's free, there's always textfiles.com.

Actually, a while ago I copied a lot of the Project Gutenberg library, along with some others, and created etext.textfiles.com.

In my experience, the reason a lot of people don't donate free time to transcription or other similar drudge work is because a lot of sites that encourage it steal it. Witness CDDB, and just wait to see how long before you pay for IMDB.

I tend to differ... by joto · 2003-01-28 15:20 · Score: 2, Interesting

I think Gutenberg is very much there...

Have you ever looked at the amount of material in Gutenberg's archives? When it comes to public-domain books and material written in English, I have to say that Gutenberg already offers almost everything of interest.

The reason the Gutenberg project isn't hugely successful is not the lack of text. Part of it might be the lack of formatting. Nobody wants to read 600 pages of a classic work on a computer screen in ASCII. Some may be masochistic enough to do it if it was in HTML. Personally, I still prefer it in book form.

But even if it was properly formatted in several formats (including PDFs in several sizes), it's still a lot of work to print it out and find a decent way to keep it together (no, ring binders aren't very appropriate for something you are going to read).

The main reason Gutenberg isn't successful is that it is not what people want. People don't want to read or print out old literature in the public domain. They either want a nice edition that looks good on the shelf, or a cheap paperback to carry around with them. And most likely they aren't particularly into really old books (with a few possible exceptions, which the Gutenberg project has long since covered).

It's not like the work the Gutenberg people are doing isn't important, or isn't of good enough quality or anything else. The simple reason it's not hugely successful is that very few people are really interested. I'm sure much of the work the Gutenberg people have done will become important as soon as on-demand printing is more common and affordable.

Re:Books online are not as good as books on paper by Anonymous Coward · 2003-01-28 15:27 · Score: 0

I am.
I have read many of the PG texts on my Palm III using the Weasel reader. Easy to carry. Something to do anywhere: waiting for an appointment or at an airport. I have gained enormously from their work.
I might never have read anything by Zane Grey without them.
Burroughs on the go.

Why doesn't anyone do it? by blair1q · 2003-01-28 15:31 · Score: 1

There's no money in it.

If there was, someone would do it.

But there isn't, so hardly anyone tries.

Get it?

Problems with speech recognition by mactari · 2003-01-28 15:31 · Score: 2, Interesting

Though it doesn't go into technology much, I expect there's a lot of potential in mass OCR tech and good speech recognition (faster to read a book aloud than to transcribe it correctly).

Was thinking about voice recognition today while lamenting that I haven't done more to type in my copy of The Queen's Necklace by Alexandre Dumas, copyright 1910.

Here are two problems that came to mind explaining why I probably won't be able to use voice recog soon:
1.) Works that have been lucky enough to actually have their copyright lapse are often pretty old works. Their English (let's use English just b/c it's the lang I'm using) isn't exactly today's English, and sometimes even spellings, etc., change. Try reading anything from the 1800s and before.

2.) Names (so any protracted dialog) and other tough-to-translate stuff is going to be a pain to proofread. My book in particular has quite a bit of French in it (lots of "Parbleu" and French names with crazy accents all over the place).

I'd like to say voice recog could produce a "new version" with "updated spellings", but I just don't think that'd fly.

So once voice recog is commonplace for, say, office use (still quite a ways off) and affordable (not sure there, but I haven't heard of a friend using it yet, even just to play) we'll still have a ways to go before we can get true literature into PG simply by reading.

As an aside, at the same time I've been thinking about simply taping me reading the book and donating *that* via mp3 (or Ogg or whatever the heck). For the time being anyone who wants can listen in the car, and as soon as voice recog is up to snuff, voila. Just run it on my recording, proofread (easier said than done), and you're ready to go!

--

It's all 0s and 1s. Or it's not.

Re:Problems with speech recognition by dvdeug · 2003-01-28 18:32 · Score: 1

I haven't done more to type in my copy of The Queen's Necklace by Alexandre Dumas, copyright 1910.

Scan it in, send it to me, and I can OCR it and send it through Distributed Proofreaders. That's the quickest and easiest way, and we tend to produce better copy than typing it in.

I've been thinking about simply taping me reading the book and donating *that* via mp3

I'm sure we'd prefer the text version first, but PG takes horrible computer audio versions, so I'm sure we'd be happy to take a decent human-read version.
Re:Problems with speech recognition by fgb · 2003-01-29 01:24 · Score: 1

I recently discovered audiobooks. I hardly ever listen to music in my car anymore. It seems to me that in addition to the textual form, it would be great to have a lot of these works available in audio form too.

Lousy Frivolous Patents by ediron2 · 2003-01-28 15:37 · Score: 0, Offtopic

Sorry to hear the project's in trouble. Man, it sucks that big companies keep enforcing these frivolous patents.

If it helps in proving prior art, some guy invented something similar about 500 years ago, but I can't remember his name...

Re:Lousy Frivolous Patents by Anonymous Coward · 2003-01-28 15:45 · Score: 0

project's not in trouble
Re:Lousy Frivolous Patents by ediron2 · 2003-01-28 16:42 · Score: 1

Sorry to hear the project's in trouble. Man, it sucks that big companies keep enforcing these frivolous patents.
If it helps in proving prior art, some guy invented something similar about 500 years ago, but I can't remember his name...
project's not in trouble

Man, I was going for an obscure joke, but not so obscure that I wanted to screw my karma and get corrected... Maybe I should have gone with the other joke that came to mind: Feature creep is killing the project. Who needs a printing press with OCR and voice recognition? Disdainfully, -- advaitavedanta

I read lots of stuff off there! by neurostar · 2003-01-28 15:39 · Score: 4, Interesting

...things that people want to read are copyrighted, and won't be available until long after we're dead.

Actually, I've found the most value from the project in downloading and reading classics. I've downloaded works by people such as Adam Smith, Nietzsche, Aristotle, Plato, Karl Marx, Oscar Wilde, Thomas More, and various other classic writers. I've found this resource indispensable. It provides high-quality texts for free. I probably wouldn't read many works by these authors if I had to purchase them. Unfortunately, I don't have the money to spend on many small works such as these (they're short, but sometimes cost $10-15). I also don't have easy access to a library, and I like keeping a copy for my own personal use.

So I find that Project Gutenberg is a very useful resource.

neurostar

Re:I read lots of stuff off there! by Anonymous Coward · 2003-01-28 17:00 · Score: 1, Interesting

Agreed. I find Project Gutenberg very useful. Right now I'm "reading" Boswell's "Life of Johnson." I put "reading" in quotation marks because I convert my etexts with TextAloud MP3 and AT&T Natural Voices sound fonts to listen to them in my car. I find it humorous that the article mentions that there are "only" 6000+ books transcribed on Project Gutenberg. I doubt I'll live long enough to listen to all 6000.

I should also say that I'm one of those who contributed labor to the effort. It took a few weeks, but with a program like ABBYY FineReader, it's actually not too hard. The problem isn't scanning, it's the proofreading.

The one thing I don't get is how come no big time philanthropists have hooked onto this idea. I mean, free knowledge to the masses? It's a no-brainer.

Re:Stupid article. Project Gutenberg doing great. by Anonymous Coward · 2003-01-28 15:39 · Score: 0

I was about to say it but you said it better..MOD IT UP FOLKS

and yes, the article sucks...

bookwarez by majcher · 2003-01-28 15:45 · Score: 2, Interesting

I love Project Gutenberg, and I've used and supported it since the pre-web days. However, I don't think they go far enough.

There are plenty of places on the net where one can find and download copyrighted works. Web sites, mail servers, IRC networks, and so on. I've used them extensively, myself. Many of the books I've downloaded, I own, and I got the electronic format for searching, reading on pocket devices, and so on. I think that this is fair - I've paid for the information once, and my sense of Fair Use tells me that it's okay to get these bits in this way.

I've also downloaded many, many books that I do not, and never will, own. (Some of these, I will probably never read.) Is this a copyright violation? Almost definitely. Is it ethically wrong? I don't think so. I would probably never buy a new copy of these works. If I hadn't downloaded them, I would have borrowed them from a friend, or a library, or bought a used copy, and sold it back later. None of these legal methods would have earned the author or publishers a cent. So, how are they different from downloading an electronic version? In my eyes, they are not.

I buy plenty of books - hundreds of dollars' worth every year. I love to read. I support local authors, and independent publishers. I do not think my actions are criminal. If someone disagrees, tough. You won't stop me, or the legions of other electronic book traders. Ever. Sorry. If it helps, think of us as the "books" in Fahrenheit 451, keeping a distributed library available for public use, in the event that something terrible should happen someday. Eventually, one way or the other, copyright will go away, and the words will be truly free again.

(And anyway, I was just joking. I'd never knowingly violate copyright law. What am I, stupid?)

Project Gutenberg by Anonymous Coward · 2003-01-28 15:52 · Score: 0

is going great and my thanks to those involved.

Re:Sure, who needs searchable text...We Do! by Anonymous Coward · 2003-01-28 15:52 · Score: 0

Well as I pointed out DjVu is a good format, and it is indexable. I recommend looking at the examples posted on the site. There are browser plugins for all the platforms. I've even recommended it to my local library.

Comment removed by account_deleted · 2003-01-28 15:55 · Score: 3, Insightful

Comment removed based on user account deletion

There are constraints like copyright too ! by phanki · 2003-01-28 16:07 · Score: 1

Even though the project did not take off as expected, one has to realise that copyright problems are prevalent. The laws are different in different countries, and enforcement is equally varied. So in this setting, I think that Gutenberg has done a decent job. If only copyright laws were less strict, maybe more people would be interested in converting texts.

Of course it's not there yet by Savatte · 2003-01-28 16:16 · Score: 2, Funny

Rescuing Steve Gutenberg's career requires more than just planning. Hell, a generous donation from Bill Gates probably couldn't even do it.

Why does this always happen? by vizualizr · 2003-01-28 16:17 · Score: 1

Here's a prediction. My next issue of WIRED will be filled with interesting articles. I'll read the whole thing, then two weeks later, half the stories in the magazine will be submitted as /. stories.

Some month, I'm gonna go through story by story and submit the whole damn magazine the day I get it.

--
anything i tell you will cloud your opinion.

Re:OCR & 500 year old Cyrillic by No+Such+Agency · 2003-01-28 16:17 · Score: 1

On the other hand, if you need to input a 500-year old work in Cyrillic, it just might be worth doing it by hand, or hiring a Russian typist to input it if you have a bunch of hot dates that week or something. After all, this hypothetical Cyrillic book must be pretty important, huh?

--
Freedom: "I won't!"

Whoops by corvi42 · 2003-01-28 16:29 · Score: 1

Whoops my mistake - that link is to the wrong bookcity - doh! I meant bookcity in Toronto, Canada.

--

There are a thousand forms of subversion, but few can equal the convenience and immediacy of a cream pie -Noel Godin

Distributed Proofreading by Suppafly · 2003-01-28 16:40 · Score: 1

This distributed proofreading group looks like they might have the answer for helping PG get closer to being 'there'. Having people proofread one page at a time, comparing the OCR'd text to the original scan, is an excellent idea for speeding up the proofreading process as well as improving the quality.
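The page-at-a-time check can be illustrated by diffing the OCR output for one page against the proofread version, word by word, with Python's standard `difflib`. This is a hypothetical sketch of the idea (`page_corrections` is a made-up name), not how Distributed Proofreaders itself is implemented:

```python
import difflib


def page_corrections(ocr_text: str, proofread_text: str) -> list[tuple[str, str]]:
    """List (ocr_words, corrected_words) pairs wherever a proofreader changed the page."""
    ocr_words = ocr_text.split()
    fixed_words = proofread_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join multi-word spans so insertions and deletions stay visible.
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


if __name__ == "__main__":
    print(page_corrections("It was the besl of tirnes", "It was the best of times"))
```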

More about Gutenberg copyright restrictions by Allen+Varney · 2003-01-28 17:05 · Score: 1

My wife heard about Project Gutenberg a couple of years ago and thought of OCRing and editing an English translation of Machiavelli's 1518 Italian play La Mandragola. She briefly corresponded with PG Executive Director Michael Hart, who was extremely kind and helpful. Had that been all there was to getting involved, she certainly would have put in the weekend or less of work the project required. But to avoid copyright issues with a translation that might not be public domain, Hart asked my wife to snail-mail a photocopy of the title page or copyright page of her chosen translation, so that PG could legally verify the work's availability.

Fair enough. But we were flakes, the library was waaay downtown, her work deadlines loomed.... She let the idea fade. I wonder how many other volunteers lose interest in the same way? By the way, Gutenberg still doesn't show a text of Mandragola.

Re:More about Gutenberg copyright restrictions by dvdeug · 2003-01-28 18:39 · Score: 1

Hart asked my wife to snail-mail a photocopy of the title page or copyright page of her chosen translation, so that PG could legally verify the work's availability.

Now you can email scans.
Re:More about Gutenberg copyright restrictions by clonebarkins · 2003-01-29 01:55 · Score: 1

By the way, Gutenberg still doesn't show a text of Mandragola.

Contact somebody at Distributed Proofreaders. I'm sure they would be happy to help. This sounds like a great work that I'm sure people would love!

--

"The evil of the world is made possible by nothing but the sanction you give it." -- Ayn Rand

Tax funded... by silverhalide · 2003-01-28 17:09 · Score: 3, Interesting

Why isn't a project like this tax funded? It would be trivial for Congress to put aside a million or two to pay some schlubs to sit around doing data entry all day. Heck, create a department to do it. Almost all brick 'n' mortar libraries are tax funded, so why shouldn't a public electronic library be tax funded? You could (theoretically) crank up production of the conversions to save even more rare works, on top of the fact that ideally the project could work directly with major libraries around the USA, or even the world. Of course, realistically such a project would turn into some bureaucracy that gets barely more done than the volunteer version, but it would at least look like someone cares.

Really, information is the most important thing humanity has, and the people literally "Saving" the world are doing it on their free time.

Re:Tax funded... by fgb · 2003-01-29 01:15 · Score: 2, Insightful

I think it's better that the work is done by people who really care about the project rather than some poorly paid "schlubs" who couldn't care less. The transcriptions are going to be much more accurate.
Re:Tax funded... by Joey7F · 2003-01-29 15:48 · Score: 1

How about users fund it? The government doesn't have to spend your money on your behalf. If you want money to go to PG, give them a donation.

--Joey

Distributed Proofreaders by Amata · 2003-01-28 17:34 · Score: 5, Informative

I just found this site a few days ago. Essentially, volunteers can proofread one page at a time, so that huge time commitments of doing an entire book yourself are not required. Worth checking out.

http://texts01.archive.org/dp/

Re:Distributed Proofreaders by Benetech · 2003-01-29 22:51 · Score: 1

We've had a lot of success at www.bookshare.org with distributed scanning and proofreading with volunteers. We're adding a couple hundred books a month. Our major goal is a very large library of the books people want to read. However, due to copyright law, we can only do this for people with qualifying disabilities in the U.S. when we're looking at books still in copyright.

What's wrong with Wired Magazine... by raytracer · 2003-01-28 17:41 · Score: 4, Interesting

They obviously publish articles written by people with their head up their asses.

Honestly, just what is Mr. J. Bradford DeLong thinking? To characterize Project Gutenberg as a failure is just imbecilic. From PG's own pages, 203 ebooks were released in October 2002. 1975 new books in 2002 (1240 in 2001). It's a lot of work to produce even one book, and PG is churning them out at a pretty good clip for an entirely volunteer effort.

Even as it is, I've found PG to be pretty damned useful. It's kind of nice to be able to grep the collected works of Shakespeare. Or Darwin. Or Conan Doyle. Or H. G. Wells. Or Jules Verne. Or Charles Dickens. Or L. Frank Baum.

Despite advances in technology, scanning, OCRing and proofreading books remains a very labor-intensive process, and it is a boring, often thankless process as well. The Million Book Project wants to take a somewhat different approach to providing digital books: they actually scan the books and store them in DJVU format (a very nice format similar to PDF). They can do OCR on it to provide searchable text, but such text doesn't have to be 100% accurate to be effective. Most of the time you print and read the original scans. After all, some publisher went to the trouble of carefully typesetting the book and proofreading it once, so why bother to do it all again?

I first became aware of this project and technology when I met Brewster Kahle as he drove the Internet Bookmobile around the U.S., going to libraries and schools trying to drum up interest in Eldred v. Ashcroft. A compressed version of Alice in Wonderland in DJVU format is about 5 megabytes (about the size of a single MP3), including the illustrations and fancy typesetting. He could print and bind a copy of it on demand for about $2 in materials, using an HP laser printer in the back of the Bookmobile. The binding isn't amazing, but consider the possibility of having literally any book in any small-town library in any place in the world. It's an exciting idea, and one that technology is only making easier and cheaper. You can get a decent scanner for $100 (even one small enough to hook to a laptop and take to a library). You can scan a book in an evening. And after you do, the file can be converted to a simple, easy-to-use format that everyone can use. Forever. One evening. One person. One book.

Despite the setback of Eldred v. Ashcroft, more and more books are going to be made available by the true philanthropists of the world: the volunteers who give something of their own time to make the world a better place. I wonder what Mr. DeLong has done to make the world a better place...

--

There is much pleasure to be gained in useless knowledge.

Re:What's wrong with Wired Magazine... by Anonymous Coward · 2003-01-29 11:24 · Score: 0

>>They obviously publish articles written by people with their head up their asses. Honestly, just what is Mr. J. Bradford DeLong thinking? To characterize Project Gutenberg as a failure is just imbecilic. From PG's own pages, 203 ebooks were released in October 2002. 1975 new books in 2002 (1240 in 2001). It's a lot of work to produce even one book, and PG is churning them out at a pretty good clip for an entirely volunteer effort.

Reread that last clause: "PG is churning them out at a pretty good clip for an entirely volunteer effort." That's my point. The social engineering task is immense. 1240 books a year is a *very* good clip for an entirely volunteer effort.

But I want the Universal Online Free Library of Humanity. I want it last year. I am greedy.

Brad DeLong

Semi-official response from Project Gutenberg by gbnewby · 2003-01-28 17:46 · Score: 5, Insightful

Michael Hart and I are working on a written response that we'll send to Wired and other media, but by then this /. article will be off the front page. So, allow me to make a few comments.

Projecting back to 1971, Project Gutenberg has tracked Moore's Law quite precisely. January 2003 will be our most productive month ever, and we are looking forward to continuing to double our rate of new eBooks every 18 months.
Project Gutenberg has received some big donations, and we're working on grants and other funding. However, when you do the math you realize that there's essentially no hope for paying for content -- it takes thousands and thousands of people. The hope for "someone" to do it is naive -- the only answer is to figure out ways for "everyone" to work on digitization.
While the author makes 6200 books sound like small potatoes, in fact it represents about 1/3 of all eBooks listed in places like the Internet Public Library. Not bad, and it certainly explains why some random book the author wants isn't part of the collection -- there just aren't that many projects working on digitizing literature.
Where did the author come up with $750 million, and for what? Over 30 million printed books were registered for copyright in the last 100 years (this doesn't count magazines, recordings, etc.). The notion that $25/book could pay for digitization is not unreasonable. But where do you get the books, and what about copyright? If there's a plan, I'd like to hear it.
One more point, to keep this short: We have just under 7000 eBooks (up about 800 from whenever the author did his research!). We have over 1000 active volunteers. The books are in over 20 languages, dozens of formats and, if printed, would fill a small library. We're on track to reach #10,000 in 2003. Via Distributed Proofreading, as mentioned here and in a previous /. story, we can and frequently do complete digitizing a 300 page book in just a few hours. Mr. DeLong, I don't feel apologetic about these numbers at all.
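Newby's numbers can be checked with simple arithmetic. Reading the $750 million as 30 million books at $25 each is an inference from the figures above, not something the article is quoted as saying. For "doubling every 18 months", the 2002 count of 1975 new books quoted elsewhere in this thread serves only as an illustrative starting rate $R_0$:

```latex
\[
  30\times 10^{6}\ \text{books} \times \$25/\text{book} = \$7.5\times 10^{8} = \$750\ \text{million}
\]
\[
  R(t) = R_0 \cdot 2^{t/18}, \qquad t \text{ in months}
\]
\[
  R_0 = 1975\ \text{books/year} \;\Rightarrow\; R(18) = 2 \times 1975 = 3950\ \text{books/year}
\]
```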

That's all for now. Thanks to all the supportive comments in this thread, and to all the constructive criticism. And remember, a page a day is all it takes to contribute!

Greg Newby, Director and CEO, The Project Gutenberg Literary Archive Foundation, www.gutenberg.net

Re:Semi-official response from Project Gutenberg by Anonymous Coward · 2003-01-29 11:21 · Score: 0

>>Mr. DeLong, I don't feel apologetic about these numbers at all.

Don't feel apologetic. I think that the magnitude of the social engineering task is immense, and that the project is a wonderful thing.

But I'm greedy: given the human race's collective powers, I want the Universal Online Library of Humanity last year...

Brad DeLong
Re:Semi-official response from Project Gutenberg by msouth · 2003-01-29 14:18 · Score: 1

>>Mr. DeLong, I don't feel apologetic about these numbers at all.

Don't feel apologetic. I think that the magnitude of the social engineering task is immense, and that the project is a wonderful thing.

But I'm greedy: given the human race's collective powers, I want the Universal Online Library of Humanity last year...

Brad DeLong

Yeah, well, you can have them as soon as they get done working on my pet projects. :) (See website)

--
Liberty uber alles.

Speech = absurdly inefficient by Tuxinatorium · 2003-01-28 17:49 · Score: 1

Anyone with a quick scanner and a bit of good software could turn book pages into formatted text at a rate of 10 ppm (pages per minute) or more. The question is, are there many good programs out there for doing that?
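At that rate, the scanning time alone is easy to estimate. Taking a 300-page book as an illustrative size (an assumption), and ignoring page turning, OCR, and proofreading:

```latex
\[
  \frac{300\ \text{pages}}{10\ \text{pages/min}} = 30\ \text{min}
\]
```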

--
Repeal the DMCA!

You are correct on all points by kfg · 2003-01-28 18:06 · Score: 2, Insightful

In fact, ASCII text can even be translated by a human (though not really read by one) if all you have is the *binary*.
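The decoding really is mechanical: each 8-bit group is a number, and the ASCII table maps it to a character, which is what a person with a printed ASCII chart would do. The same mapping in Python (the function names are illustrative):

```python
def bits_to_ascii(bits: str) -> str:
    """Decode space-separated 8-bit groups, e.g. '01001000 01101001', to text."""
    return "".join(chr(int(group, 2)) for group in bits.split())


def ascii_to_bits(text: str) -> str:
    """Inverse: render each ASCII character as an 8-bit binary group."""
    return " ".join(format(ord(ch), "08b") for ch in text)


if __name__ == "__main__":
    print(bits_to_ascii("01001000 01101001"))  # Hi
```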

The poster to whom you reply seems to have missed the essential point.

I would give you one caveat, though. English may well be the language of the internet (and I'll leave the argument as to whether that's a good or bad thing to the students), but it isn't the language of *literature.*

It would certainly be a Good Thing to be able to store the Vedas and Sun-Tzu, in the original script, in the lowest-level human-readable electronic form possible.

This, however, as you note, will apparently have to wait for some future time.

KFG

If people insist on abusing the subject line... by Anonymous Coward · 2003-01-28 18:28 · Score: 1

full grown, like Athena springing from the head of Zeus

What about "full grown"?

Subject does not mean "first half of your first sentence." I usually skim over the subjects because they're mixed in with meaningless stuff (poster, date, etc). Keep that in mind, be nice to people when you want them to read what you've written. (and if you don't, why post?)

Enough Already! by bangzilla · 2003-01-28 18:40 · Score: 1

That's about six articles from the most recent Wired that have been covered on /. Hey - if we're that interested we can always go buy a copy. Is /. that hard up for articles....?

--
Rich people are eccentric. Poor people are strange. Me, I'd be happy with odd.

Ummmmmmmm? by kfg · 2003-01-28 19:17 · Score: 1

"it cripples it for creating PDFs, TeX files for printing"

You seem to have gone completely doofey here.

Wanting to produce printable documents from an ASCII terminal was kinda the reason Knuth invented TeX. N'est-ce pas?

If I wanted to use TeX to print Walden the very first step to take would be . . . what?

Firing up vim. That's right.

Now here I'll quote from Adobe's pdf page:

Adobe® Portable Document Format (PDF) is the open de facto standard for electronic document distribution worldwide. Adobe PDF is a universal file format that preserves all the fonts, formatting, graphics, and color of any source document, *regardless of the application and platform used to create it.*

Yes, I added the emphasis myself.

I'll refrain from mentioning how weird it would be to produce a pdf document from ASCII text, though, since ASCII already perfectly duplicates ASCII, and in a nonproprietary format and smaller file size.

Instead I'll simply point out that to convert an ASCII file into pdf one would *first* format it into the finished product of your choice and then convert *that* to pdf.
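The "format first, then convert" step might look like this: wrap the plain-text paragraphs in a minimal LaTeX document, escaping LaTeX's special characters, and then run it through a tool such as pdflatex to get the PDF. A sketch only; the function name, the `book` document class, and the assumption that paragraphs are separated by blank lines are choices made for this example:

```python
# Characters that are special to LaTeX, mapped to safe replacements.
# str.translate handles each character once, so nothing is escaped twice.
LATEX_ESCAPES = str.maketrans({
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
})


def text_to_latex(plain: str, title: str) -> str:
    """Wrap blank-line-separated paragraphs of ASCII text in a minimal LaTeX document."""
    paragraphs = [p.strip() for p in plain.split("\n\n") if p.strip()]
    body = "\n\n".join(p.translate(LATEX_ESCAPES) for p in paragraphs)
    return (
        "\\documentclass{book}\n"
        f"\\title{{{title.translate(LATEX_ESCAPES)}}}\n"
        "\\begin{document}\n"
        "\\maketitle\n"
        f"{body}\n"
        "\\end{document}\n"
    )
```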

Why one would want to distribute Walden as a pdf file I'll leave as an exercise for the student (mostly because I'd be interested to see the answer myself. It beats the hell out of me. Maybe you're a font Nazi and don't believe in letting the reader use a font that *they* find pleasant to read?).

Your SGML comment is doofey beyond comprehension. SGML grew out of work at IBM as a markup language for plain text. HTML is an application of SGML, and XML is a simplified subset of it. The *point* of SGML is to take the plain text and create a document. I do it all the frikkin' time. So do millions of others. In vim.
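The same plain-text-to-markup idea in HTML terms: a minimal sketch that turns blank-line-separated paragraphs into `<p>` elements, using the standard library's `html.escape` so that `&`, `<` and `>` in the text don't break the markup. The function name is hypothetical:

```python
import html


def text_to_html(plain: str, title: str) -> str:
    """Turn blank-line-separated plain-text paragraphs into a minimal HTML page."""
    paragraphs = [p.strip() for p in plain.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )
```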

You can find the author's recollection of its development here:

http://www.sgmlsource.com/history/roots.htm

I also use ASCII to write in more than one language. It's true that I don't write Chinese ideograms in it, but how one would go about it is trivial and obvious, although one *would* need an interpreting display layer, such as SGML/HTML/XML, to make it conveniently human-readable. There the trivial and obvious work has already been done, although not to everyone's satisfaction.

Forgive me if this post seems a bit bluff, but I'm truly baffled by your post.
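One concrete form of that display layer is the numeric character reference used by HTML and XML: any Unicode character can be written in pure ASCII as `&#NNNN;`. A minimal sketch using Python's built-in `xmlcharrefreplace` error handler (the function names are made up):

```python
import html


def to_ascii_markup(text: str) -> str:
    """Escape &, <, > and encode every non-ASCII character as &#NNNN;."""
    escaped = html.escape(text, quote=False)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def from_ascii_markup(markup: str) -> str:
    """Resolve character references back to Unicode text."""
    return html.unescape(markup)


if __name__ == "__main__":
    print(to_ascii_markup("中文"))  # &#20013;&#25991;
```

The stored file stays 7-bit ASCII; a browser renders the ideograms from the references.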

KFG

Priceless by curiuz · 2003-01-28 19:22 · Score: 0

Everyone's into the technical difficulties, but, hey, think big: Bringing most classical works online would be one of the greatest achievements of the internet. It totally dwarfs technical difficulties and another 50 megabucks on the expense account of the world community. Makes me wonder if we're sometimes too hung up on the medium to remember the message...

Ah, but you read what I wrote. Didn't you? by kfg · 2003-01-28 19:29 · Score: 1

Perhaps I could have written "Fire!" in the subject box. That would have been attention getting, although false. "My comment on the story" would have been factual, but pointless.

Sometimes "Subject" means "Write something here to provide the reader a reason for proceeding".

My approach seems to have worked.

If, in future, you wish to skip my posts, go ahead. I won't be offended. To each his own.

KFG

You can search DJVU files... by raytracer · 2003-01-28 19:44 · Score: 2, Informative

Scanned documents might be fine for readers, but what if you're looking for "oh, you know, that one line in the book, where the dude was talking about melons."

It might help to actually understand what you are talking about before you are so quick to dismiss it. DJVU does support searchable text, which can be inserted automatically via OCR. The advantage of this is that the OCR need not be 100% accurate to still be useful (vastly more useful and accurate than the indices in most books, for instance).
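To make the "OCR need not be 100% accurate" point concrete, here is a minimal Python sketch of approximate search over noisy OCR text. It is not how DjVu viewers implement search; the sliding-window approach, the `fuzzy_find` name and the 0.8 similarity cutoff are assumptions for illustration, built only on the standard `difflib` module.

```python
import difflib


def fuzzy_find(query, ocr_lines, cutoff=0.8):
    """Return (line_number, line) pairs containing a stretch of text whose
    similarity to `query` is at least `cutoff` (0..1), so that OCR slips
    such as 'rn' for 'm' do not hide a match. Hypothetical helper."""
    query = query.lower()
    width = len(query)
    hits = []
    for line_no, line in enumerate(ocr_lines, start=1):
        text = line.lower()
        best = 0.0
        # Slide a query-sized window across the line; keep the best score.
        for start in range(max(1, len(text) - width + 1)):
            window = text[start:start + width]
            best = max(best, difflib.SequenceMatcher(None, query, window).ratio())
        if best >= cutoff:
            hits.append((line_no, line))
    return hits


pages = [
    "and the dude was talking about rnelons again.",  # OCR misread 'm' as 'rn'
    "Nothing to see here.",
]
print(fuzzy_find("about melons", pages))
```

A cutoff near 1.0 behaves like exact search; lowering it tolerates more OCR errors at the cost of more false hits.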

--

There is much pleasure to be gained in useless knowledge.

...humanity wrote some ok books in its first 3000 years (-ish) of literacy. The Koran, the Bible, Shakespeare... yeah there's some ok books out there not covered by the stupid copyright situation we are now in. Hopefully Gutenberg can bring some pressure on the ridiculous copyright fiasco, but in the meantime, there's a whole store of amazing works of learning and literature out there.

Re:copyright sucks but... by kalidasa · 2003-01-29 00:53 · Score: 2, Insightful

...humanity wrote some ok books in its first 3000 years (-ish) of literacy. The Koran, the Bible, Shakespeare... yeah there's some ok books out there not covered by the stupid copyright situation we are now in.

Unfortunately, Bevington's, Taylor's, Kermode's, and even Muir's texts of Shakespeare are still under copyright. (Compare an Arden of Shakespeare to a facsimile of the First Folio some time: the printers of the First Folio were considered good in their day, but not in ours). Too bad most English translations of the Bible (the KJV and the Tyndale are two obvious exceptions) are still under copyright. Too bad most of the good translations of the Koran still are.

Yes, there's plenty of good lit before 1923, but sometimes you need to look at a more modern edition to see what the original author most likely really wrote.

Why Gutenberg... by Anonymous Coward · 2003-01-28 22:11 · Score: 1, Interesting

Why project Gutenberg won't succeed?

Two reasons: (one) their "universally readable" text format sucks mud, and (two) the US Government, eh, I mean Disney decided to extend copyright duration beyond any reasonable length, so no recent texts are available.

Harnessing free labor is easy enough: just stop by in alt.binaries.e-book on usenet.

Realize that for many people this is /not/ a warez group: a lot of the regulars there just want e-versions of books they already own so they can read them on their PDAs or computers, and would be willing to pay for them if they were DRM-free and the price was decent. "Decent" means not like the Star Trek e-books, for example, which cost more than the hardcover edition (which is probably the primary reason why they're DRMed).

Text availability, ASCII to PDF conversion by harmonica · 2003-01-28 22:35 · Score: 1

I think Gutenberg is very much there... Have you ever looked at the amount of material in Gutenberg's archives? When it comes to books and material written in English that are in the public domain, I have to say that Gutenberg already offers almost everything of interest.

The 'vision' that the author of the Wired article had was somewhat different: to be able to access all texts electronically. It's something everybody who has had to hunt down old magazine articles has dreamt of (I still have nightmares from that one dark and dusty university library cellar, *shudder*). While Gutenberg is a great project, coming closer to full availability of all texts via electronic media will take initiative from governmental organizations as well as commercial entities. Obviously, not all texts will be available for free. But even a somewhat unified way of searching and finding these texts will be a huge task.

There is CiteSeer for articles on computer science, there is IEEExplore if you happen to be looking for something from IEEE. But you have to know these places. Even with better search engines like Google it's still quite a task to get your hands on a text, even if you have some time to do the search and are willing to spend money.

A large database of text references (maybe including abstracts) would also be nice to just see what's available while you are still doing research.

The reason the Gutenberg project isn't hugely successful is not the lack of text. Part of it might be the lack of formatting. Nobody wants to read 600 pages of a classic work on a computer screen in ASCII.

GutenMark does that (almost) automatically. Uses LaTeX.

Are you kidding? by MisterSquid · 2003-01-28 22:57 · Score: 1

I'm not going to hunt this down, but I will point out that standard, easy-to-understand speech is about 150-200 wpm. That *handily* outstrips all but the most blindingly fast of typists.
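A back-of-the-envelope comparison of those rates, as a small Python sketch. The 100,000-word novel and the 60 wpm typing speed are assumed round numbers, not figures from the post, and the calculation ignores any time spent correcting recognition errors.

```python
NOVEL_WORDS = 100_000  # assumed length of a typical novel


def hours_to_enter(words, words_per_minute):
    """Hours needed to enter `words` words at a steady rate."""
    return words / words_per_minute / 60


for label, wpm in [("speaking, slow", 150), ("speaking, fast", 200), ("typing", 60)]:
    print(f"{label}: {hours_to_enter(NOVEL_WORDS, wpm):.1f} h")
# speaking, slow: 11.1 h
# speaking, fast: 8.3 h
# typing: 27.8 h
```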

--
blog

Re:Are you kidding? by Anonymous Coward · 2003-01-29 08:32 · Score: 0

I just had eight of my more literate buddies cold-read a page from Asshole Niven's and MS-FUD-suckin' Pournelle's "The Burning City". No one read it perfectly (min 4, max 8, mean 6 mistakes). The reading speed was min 90, max 130, mean 100 wpm. On the one hand, Niven and Pournelle are both highly accomplished authors with a clear, concise and smooth style; on the other, the story involves bizarre names. I will need to run more experiments with different types of writing.

Scanned Images are not Accessible by joyjoy · 2003-01-28 23:09 · Score: 2, Insightful

Another side benefit of good old ASCII - text to speech! Or braille displays! Heck, you can read it on any device, changing it to any resolution you want quickly and easily.

What's even more, more by kfg · 2003-01-29 00:01 · Score: 1

PG doesn't restrict itself to the written word, and its works include MIDI and MP3 files.

Perhaps I should have been more exact and explicit in my original statements. PG *tries* to provide whatever encoding results in the lowest-level human-understandable output, preferably in a nonproprietary format.

Obviously, for Kanji or Vedic Sanskrit (or recorded music) this is not plain ASCII (by which I actually mean extended ASCII, not "teletype" ASCII).
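The difference between 7-bit "teletype" ASCII and an 8-bit "extended ASCII" can be checked directly in Python. This sketch uses ISO 8859-1 (Latin-1) as one example of an extended-ASCII code page (there are many others), and the `fits` helper is hypothetical.

```python
def fits(text, encoding):
    """True if every character in `text` can be stored in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "Walden",
    "French": "café",
    "Sanskrit (Devanagari)": "संस्कृतम्",
}
for name, text in samples.items():
    print(f"{name}: ascii={fits(text, 'ascii')}, "
          f"latin-1={fits(text, 'latin-1')}, utf-8={fits(text, 'utf-8')}")

# 'é' is a single byte in Latin-1, but its value (233) is above 7-bit ASCII's 127.
print("é".encode("latin-1"), "é".encode("latin-1")[0])
```

Plain English fits everywhere, French needs at least an 8-bit code page, and Devanagari needs Unicode (here UTF-8).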

KFG

My housemate did this by Goonie · 2003-01-29 00:49 · Score: 1

He knocked up a program to convert PG etexts into LaTeX. It's not difficult to do and get something that looks quite good. I'm sure I could write something similar in a day or two, if there isn't something on freshmeat already.
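For a sense of how small such a converter can be, here is a minimal Python sketch. It is not the housemate's program (or GutenMark); it assumes paragraphs are separated by blank lines and that chapter headings begin with the word CHAPTER, which real etexts do not always follow. The function names are hypothetical.

```python
import re

# LaTeX special characters and safe replacements for running text.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape character by character so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def etext_to_latex(raw, title):
    """Turn a hard-wrapped plain-text etext into a minimal LaTeX book.

    Assumes blank lines separate paragraphs and that chapter headings
    are paragraphs beginning with the word CHAPTER.
    """
    body = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        # Re-join the hard-wrapped lines of one paragraph.
        para = escape_latex(" ".join(line.strip() for line in block.splitlines()))
        if para.startswith("CHAPTER"):
            body.append(r"\chapter*{" + para + "}")
        else:
            body.append(para)
    return "\n".join([
        r"\documentclass{book}",
        r"\title{" + escape_latex(title) + "}",
        r"\begin{document}",
        r"\maketitle",
        "\n\n".join(body),
        r"\end{document}",
    ])


print(etext_to_latex("CHAPTER I\n\nI went to the woods because I wished\n"
                     "to live deliberately.", "Walden"))
```

A real converter would also need to handle quotation marks, italics, front matter and the many headings that don't start with CHAPTER.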

--

Any sufficiently advanced technology is indistinguishable from a rigged demo
--Andy Finkel (J. Klass?)

Why not OCR@Home? by luismunoz · 2003-01-29 01:19 · Score: 1

Just going out on a limb here... Do you guys think it is possible to scan the books and store them somewhere so that an open source client such as Seti@Home's can work on the pages?

I guess this would tend to deal with the most expensive part of the process IMO, the typing. Of course, storage and scanning of the pages is still an issue.

Just my 2 cents...
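The idea can be sketched as a small work queue in the style of volunteer-computing projects: each page is handed to more than one client, and a result is trusted only when enough independent copies agree. Everything here (`PageQueue`, the quorum of 2, exact-match voting) is a hypothetical illustration in Python, not an existing project's design.

```python
from collections import Counter, defaultdict


class PageQueue:
    """Hypothetical 'OCR@Home' coordinator: every scanned page goes to
    several clients, and a transcription is accepted only once `quorum`
    of them return identical text."""

    def __init__(self, page_ids, quorum=2):
        self.page_ids = list(page_ids)
        self.quorum = quorum
        self.results = defaultdict(list)  # page id -> texts returned so far
        self.accepted = {}                # page id -> agreed transcription

    def next_page(self):
        """Hand out the unfinished page with the fewest results, or None."""
        open_pages = [p for p in self.page_ids if p not in self.accepted]
        if not open_pages:
            return None
        return min(open_pages, key=lambda p: len(self.results[p]))

    def submit(self, page_id, text):
        """Record one client's text; accept it once enough clients agree."""
        self.results[page_id].append(text.strip())
        best, votes = Counter(self.results[page_id]).most_common(1)[0]
        if votes >= self.quorum:
            self.accepted[page_id] = best


queue = PageQueue(["p001", "p002"])
page = queue.next_page()  # "p001": no results for either page yet
queue.submit(page, "It was a dark and stormy night.")
queue.submit(page, "It was a dark and storrny night.")  # one bad OCR run
queue.submit(page, "It was a dark and stormy night.")
print(queue.accepted)
print(queue.next_page())
```

Exact-match voting is crude; output from different OCR engines would rarely match character for character, so a practical version would need a fuzzier comparison or a human tie-breaker. The same structure would apply if the "client" were a human proofreader.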

Re:Why not OCR@Home? by clonebarkins · 2003-01-29 01:51 · Score: 1

OCR is probably the quickest part of the process. It's the human interactive portions that require a long time. In addition to scanning, which you've already mentioned, you've got to proofread multiple times, and then format the final work. That's not something that can be done by a program.

--

"The evil of the world is made possible by nothing but the sanction you give it." -- Ayn Rand

Speaking is faster than transcribing? by fataugie · 2003-01-29 01:37 · Score: 1

Bullshit,

Don't you remember in English class having to go around the room reading? Do you remember some of those retards in your class?

Ro ro ro ro Romeo, Romeo, wa wa wa were ffffor art thou Ro ro ro Romeo.

Oh yeah, that ought to be mu mu much faster

--

WTF? Over?

Re:Speaking is faster than transcribing? by kirkjobsluder · 2003-01-29 03:15 · Score: 1

Of course, here you are comparing skilled transcriptionists to unskilled speakers. Due to a bad case of RSI, I have been using speech recognition for most of the last six months. For composing papers, it is about the same speed as typing. The biggest problem would be if the text is very heavy with technical jargon, but if you add a word once, it is in the dictionary forever. Actually, transcribing using speech recognition is faster than composing using speech recognition, because the accuracy of speech recognition improves if you give it much longer phrases.
Re:Speaking is faster than transcribing? by fataugie · 2003-01-29 04:42 · Score: 1

Actually, I know. I was kinda trying to be funny. But there is a kernel of truth in both what I say and what you say. I know a lot of ummmm and ahhhh would not be good. I remember using OS/2's speech recognition program back in the day. It was painful to speak...like...this...to...get...it...to...work. However, recognition based not only on the sound of the word but also on its placement in the sentence and its usage helped it achieve a surprisingly high rate of success. I have used that Dragon speech-to-text thingie and found it even better.

So, while I was trying to be funny, I do think that if you can't read out loud well, the recognition would be poor, but the more you do it, the higher the rate of success the program would have.

--

WTF? Over?
Re:Speaking is faster than transcribing? by kirkjobsluder · 2003-01-30 17:42 · Score: 1

Actually, I know. I was kinda trying to be funny. But, there is a kernel of truth in both what I say as well as you. I know a lot of ummmm and ahhhh would not be good.

Actually, I just read a blurb saying that linguistic researchers have discovered that ums and ahhs serve as rhythmic placeholders that are important for human comprehension of speech.

Distributed Proofing by jefu · 2003-01-29 01:41 · Score: 1

I've been doing the distributed proofing for a while now, and it's a relatively painless way to spend a few minutes and put a few pages into the public domain.

Mostly relatively painless, anyway. I've spent some time working on "The Anatomy of Melancholy", which is a bear to do. Many English texts I've proofed here I can proof at a rate of a few minutes per page; "Melancholy" is more like half an hour per page.

Most of the works are nowhere near that bad though and this is a good way to make all that cool (or not so cool) stuff available and usable electronically.

Re:Distributed Proofing by Anonymous Coward · 2003-01-29 05:09 · Score: 0

Ugh-- "Melancholy" was too tough for me. You are a brave soul.

DP really requires a bit of a commitment. Not too much, but a bit. I suppose if it were absolutely painless you wouldn't feel good about doing it.

This is one of the first Apps I loaded on my Z! by MrJerryNormandinSir · 2003-01-29 01:51 · Score: 1

This is one of the first apps I loaded on my Zaurus. Gutenberg rocks! I have to disagree with this post. There are many great works available for download. Moby Dick was taken down; I don't know why. But other than that, it's definitely worth a compile and try! Reference books are available too!

Sugestions for formatting the ASCII by himself · 2003-01-29 02:22 · Score: 1

How do people format the ASCII texts? That is, if you don't just open the .txt file in an editor, but instead mark it up as HTML (or whatever) to improve its readability, how do you do it? Got any scripts or filters to share? (If so, maybe the PG folks can post them on their web site.)
After all, the ASCII should just be a starting point -- you take that, add a little layout, and have yourself as pretty a book as you'd like to read onscreen or print out.
I have tried this in the past, adding sparse HTML tags to, say, a Willa Cather book, but it was too distracting to read while I marked up, and just too dull to mark up an entire novel. That's why I think borrowing a script or filter would be cool.

Re:Sugestions for formatting the ASCII by Anonymous Coward · 2003-01-29 03:06 · Score: 0

I read a lot of Gutenberg texts on my Visor using Plucker. I got tired of marking-up Gutenberg texts to HTML by hand, using Emacs macros, so I wrote gut.
It's a perl script that does most of the drudge work of marking-up ASCII text to HTML-- its use is not limited to Gutenberg texts. Works under Unix, GNU/Linux and Windows.
Enjoy!
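gut's own source isn't reproduced here. As a separate, minimal Python sketch of the same kind of filter (not gut's actual logic), the following marks up blank-line-separated paragraphs as HTML. The chapter-heading and _underscore_ italics conventions it recognizes are assumptions about the input, and `etext_to_html` is a hypothetical name.

```python
import html
import re


def etext_to_html(raw, title):
    """Minimal plain-text-to-HTML filter.

    Assumptions: paragraphs are separated by blank lines, chapter headings
    start with the word CHAPTER, and _underscores_ mark italics.
    """
    parts = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head><body>",
        f"<h1>{html.escape(title)}</h1>",
    ]
    for block in re.split(r"\n\s*\n", raw.strip()):
        # Join hard-wrapped lines and escape <, >, & before adding our own tags.
        text = html.escape(" ".join(line.strip() for line in block.splitlines()))
        text = re.sub(r"_([^_]+)_", r"<em>\1</em>", text)
        tag = "h2" if text.startswith("CHAPTER") else "p"
        parts.append(f"<{tag}>{text}</{tag}>")
    parts.append("</body></html>")
    return "\n".join(parts)


print(etext_to_html("CHAPTER I\n\nShe was _very_\nfond of the prairie.", "My Ántonia"))
```

The output is plain HTML, so it can be read in a browser or fed to a converter such as Plucker's.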

Formatting vs. semantic markup. by androse · 2003-01-29 02:22 · Score: 1

Sure, some formatting is missing, but it's relatively minor for the majority of books in question. And given the existence of this unformatted text, it's a lot easier to create formatted text than from scratch, so you even get a benefit there.

P.G. is a worldwide project, not only a North American one. I'm living in France at the moment and have a pile of old books that contain ancient French, Latin and ancient Greek. Transcribing the characters to ASCII would be absurd. It would be a huge loss of information.
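A quick Python illustration of that loss, using one common naive approach to "ASCII transcription": Unicode NFKD decomposition followed by dropping every non-ASCII character. The `fold_to_ascii` helper is hypothetical; the point is what it throws away.

```python
import unicodedata


def fold_to_ascii(text):
    """Naive ASCII transcription: split accented letters into base letter
    plus combining mark (NFKD), then drop everything that isn't ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


for original in ["Où est la forêt?", "Arma virumque canō", "μῆνιν ἄειδε θεά"]:
    print(repr(original), "->", repr(fold_to_ascii(original)))
```

French and Latin keep their letters but lose the diacritics (so "où", *where*, collapses into "ou", *or*), and the Greek disappears entirely, leaving only the spaces.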

Secondly, you seem to say that basic formatting, like that described in the document guidelines, is good enough. That raises two problems:
Because this is digital media, you do not want to use formatting; you want to use semantic markup: the reader could be blind, or deaf, or using a PDA, etc. Formatting is static; semantic markup can be reinterpreted again and again to best suit the reader. This is where the W3C is going.
The idea is not to make some alphabet soup that can be used to create formatted text. The idea is to provide books that are directly usable and readable. The document guidelines only specify one level of chapter heading, for example. Why?

I was getting all excited about contributing to this project, but the current guidelines are just too weak. Using unicode encoded xml documents with a specific DTD (or Schema) seems to be a good solution, but I'm no specialist.
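A minimal Python sketch of the semantic-markup idea, using the standard `xml.etree.ElementTree` module: the book is stored once as structured Unicode XML, and each presentation is just a different rendering of the same tree. The element names and helper functions are invented for illustration and do not come from any existing PG DTD or schema.

```python
import xml.etree.ElementTree as ET


def build_book(title, chapters):
    """Build a semantic XML tree from (heading, [paragraphs]) pairs.
    Element names (book, title, chapter, heading, p) are hypothetical."""
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    for heading, paragraphs in chapters:
        chapter = ET.SubElement(book, "chapter")
        ET.SubElement(chapter, "heading").text = heading
        for text in paragraphs:
            ET.SubElement(chapter, "p").text = text
    return book


def render_plain(book, underline="="):
    """One possible rendering of the same tree: plain text for a small screen."""
    title = book.findtext("title")
    lines = [title, underline * len(title)]
    for chapter in book.iter("chapter"):
        lines += ["", chapter.findtext("heading").upper(), ""]
        lines += [p.text for p in chapter.iter("p")]
    return "\n".join(lines)


book = build_book("Les Misérables",
                  [("Livre premier : Un juste",
                    ["En 1815, M. Myriel était évêque de Digne."])])
print(ET.tostring(book, encoding="unicode"))  # Unicode XML, accents intact
print(render_plain(book))
```

Nesting further `section` elements inside `chapter` would give the extra heading levels the guidelines lack, and an HTML, LaTeX or speech renderer could walk the same tree.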

Project Gutenberg is good anyway... by Kjella · 2003-01-29 02:23 · Score: 1

It's not the books, it's not the format... it's the fact that 99% of the people don't have a good way to read it. PCs suck. Laptops suck. PDAs & tablets maybe, certainly not the old ones I've used.

However, having Project Gutenberg there is still good, even if no one actually reads it from there. Why? Because when you want to buy a reprint (yes, I still like the dead-tree version), there's no room for an extravagant price markup. If the markup is too big, it's very simple for a publisher to take the Gutenberg text, format it as a book and sell it for less.

For those of you who have studied economics, it acts like an undifferentiated Bertrand duopoly: price is pushed down to cost (in theory). It's an outcome every businessman hates: all the surplus value goes to the consumers, nothing to the company.
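A toy Python illustration of that Bertrand result: two sellers of an identical product keep undercutting each other until price reaches cost. The prices, the cost and the one-cent step are made-up assumptions, and this is an undercutting simulation, not a formal equilibrium calculation.

```python
def bertrand_undercut(cost, start_price, step=1):
    """Toy model of two sellers of an identical product (prices in cents).
    They take turns pricing one `step` below the rival whenever that still
    covers `cost`; returns the prices once no further undercut pays."""
    if step <= 0:
        raise ValueError("step must be positive")
    prices = [start_price, start_price]
    mover = 0
    while prices[1 - mover] - step >= cost:
        prices[mover] = prices[1 - mover] - step
        mover = 1 - mover
    return prices


print(bertrand_undercut(cost=500, start_price=1500))  # ends at cost or one step above
```

With whole-cent prices the process stops with one seller at cost and the other one step above it, which is as close to "price equals cost" as discrete prices allow.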

That's why I'm proofreading on DP (top 1000, but not exactly devoting my life to it), even though I never have and probably never will read a book directly from PG. Unless there's a miracle cheap electronic paper break-through or something at least...

Kjella

--
Live today, because you never know what tomorrow brings

Re:Project Gutenberg is good anyway... by cmpalmer · 2003-01-29 06:13 · Score: 2, Informative

I recently bought a Franklin eBookman ($39.95 at Costco!) and then, more recently, got an iPAQ through work. The last five books I've read have been on one or the other of these PDAs (I had the Baen CD-ROM of Honor Harrington books and others). It still isn't quite as good as a paper book, but it is the best way I've found yet to read in bed.

I've been using the Mobipocket reader on both devices and the autoscroll feature is really cool -- you can prop up the device, turn on the backlight, and adjust the autoscroll to your reading speed. Hands-free, no reading lamp, no cramps from trying to prop up and turn pages.

One thing that strikes me is how much typography and formatting matter, which is, as others have pointed out, the problem with Gutenberg texts. I have read quite a few PG texts in the past (or at least used them for reference when I was looking for particular quotes or needed a big text file to test something :-), and the formatting leaves a lot to be desired. On the PDAs, weird page and line breaks, or even bad justification or extremely ragged edges, are very disconcerting when reading.
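Line-break problems like these are at least partly fixable in software. A minimal Python sketch using the standard `textwrap` module, assuming an etext with hard line breaks inside paragraphs and blank lines between them: join each paragraph back into one line, then re-wrap it to the device's width. The `reflow` name and the 32-column default are assumptions.

```python
import re
import textwrap


def reflow(etext, width=32):
    """Remove hard line breaks inside paragraphs, then re-wrap each
    paragraph to `width` columns for a narrow screen.
    Assumes paragraphs are separated by blank lines."""
    paragraphs = [" ".join(block.split())
                  for block in re.split(r"\n\s*\n", etext.strip())]
    return "\n\n".join(textwrap.fill(p, width=width) for p in paragraphs if p)


sample = ("It was the best of times, it was the worst\n"
          "of times, it was the age of wisdom.\n\nSecond paragraph.")
print(reflow(sample, width=20))
```

This only addresses line breaks; justification, hyphenation and page breaks would still be up to the reader software.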

--
-- stream of did I lock the front door consciousness

Not quite by Duds · 2003-01-29 02:35 · Score: 2, Interesting

So, at present Australians can get up to the beginning of 1953. Seems a hell of a lot easier to follow than the mess of dates the parent posted.

Not quite.

Up to 50 years after the end of the year of the author's death

i.e., they can get stuff up to the end of 1952, assuming the author also died that year.

I wonder though. What if they wrote something in 1951, died in 1952, but it was only discovered (and published) in 1973. What applies?
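The end-of-year rule quoted above reduces to simple arithmetic. A minimal Python sketch, assuming a plain "50 years after the end of the year of the author's death" term; the function names are hypothetical, and the sketch does not attempt to answer the posthumous-publication question, which would depend on the specific provisions of the law.

```python
def first_free_year(death_year, term_years=50):
    """First calendar year a work is out of copyright under a rule of
    'term_years after the end of the year of the author's death'."""
    return death_year + term_years + 1


def is_free(death_year, current_year, term_years=50):
    """True if, under that rule, the work is free during `current_year`."""
    return current_year >= first_free_year(death_year, term_years)


print(first_free_year(1952))                      # 2003
print(is_free(1952, 2003), is_free(1953, 2003))   # True False
```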

Re:Not quite by ColaMan · 2003-01-30 09:02 · Score: 1

Pedantic bastard ;-)

Yes, by "beginning of 1953", I meant the end of 1952. Of course, you only realise these ambiguities *after* you post.

--

You are in a twisty maze of processor lines, all alike.
There is a lot of hype here.
Re:Not quite by Duds · 2003-01-30 20:21 · Score: 1

I guessed but I had to be pedantic since I already had the more important "death" not "written" point to address.

Re:Stupid article. Project Gutenberg doing great. by Overt+Coward · 2003-01-29 02:57 · Score: 2, Informative

I'll point out that at the end of 2000, there were only roughly 2000 etexts in the entire PG library (I copied them all to a single CD)... So if they're up well over 6,000, then they've made amazing progress in two years!

That's what Adobe used to think. by Aquitaine · 2003-01-29 03:14 · Score: 1

...and they made their PDF format completely inaccessible to many types of disabled people. Since it's bad business to have sites offer a PDF version and then an alternate 'accessible' version of things, they're correcting the situation. Similarly, a scanned image is impossible for a screen reader to comprehend or for a text editor to search.

Speech Recognition. Yeah. Right. by edbarrett · 2003-01-29 03:35 · Score: 1

How many times have you read something you've never seen before to someone else with 100% accuracy? With no "ums" or "uhs"? With no corrections at all? You'd still have to go back and correct the transcription, because it's not going to be 100% accurate to what you said anyway.

TELL YOUR CONGRESSMAN! by donutz · 2003-01-29 04:54 · Score: 1

This is an excellent set of "copyright guidelines" to forward to your local congressman. Highlight the dates! 2074? Most of your congressmen will be long dead by this time, so really why are they being corporate whores to the point of making copyrights practically indefinite? Tell your congressman why it's good to put works in the public domain. Tell him why he should support shortening copyright periods to something reasonable! Tell him to stop whoring to the companies and start representing the people of his district! Corporations are just in it for the money: corporations are some fake "entity" which is basically just human greed embodied. Help people, not greed!

OT: Confidential to Brad DeLong by msouth · 2003-01-29 14:24 · Score: 1

Brad:

If you want anyone to see this, you should get an account and log in. I would wager that most people don't read stuff at moderation level 0, which is where your posts are because they are anonymous.

>>They obviously publish articles written by people with their head up their asses. Honestly, just what is Mr. J. Bradford DeLong thinking? To characterize Project Gutenberg as a failure is just imbecilic. From PG's own pages, 203 ebooks were released in October 2002. 1975 new books in 2002 (1240 in 2001). It's a lot of work to produce even one book, and PG is churning them out at a pretty good clip for an entirely volunteer effort.

Reread that last clause: "PG is churning them out at a pretty good clip for an entirely volunteer effort." That's my point. The social engineering task is immense. 1240 books a year is a *very* good clip for an entirely volunteer effort.

But I want the Universal Online Free Library of Humanity. I want it last year. I am greedy.

Brad DeLong

--
Liberty uber alles.

OT: Personally... by msouth · 2003-01-29 14:35 · Score: 1

..I think posting like that is fun, and reading a well done one is fun. Although I do like it when they put the continuation ... at the front so I know I was supposed to read the subject.

--
Liberty uber alles.

Your understanding is extremely limited by dachshund · 2003-01-29 16:37 · Score: 1

The constitution doesn't give them much wiggle room since it assigns Congress the right to regulate copyrights to their hearts' desire.

The Constitution does no such thing. Read the damned thing before you comment. The Copyright Clause requires copyright terms to have "limited times". A term that can be repeatedly extended (retroactively) at the whim of Congress is not limited.

The issue here is: does the Supreme Court have the right to enforce this sort of Constitutional limitation on Congress, or does that responsibility fall to Congress itself? The traditional answer is "Congress can do whatever it wants and the Court must restrain itself". In 1995, Justices Rehnquist, Scalia, Thomas, O'Connor and Kennedy turned this understanding on its head and declared that they have the power to enforce the restrictions on enumerated powers (the right to grant copyrights is one of these).

The plaintiffs in Eldred asked the Court to apply this plain-stated logic with respect to the copyright clause, and the Court's response was... well, nothing. They didn't even bother to explain why they would do it in some circumstances and not in this one.

Eldred & Co had a strong Constitutional argument for limiting Congressional power if the Court was willing to obey its own precedent. This Court just didn't feel the need to do so, or even to explain why this case was different.

Re:Your understanding is extremely limited by Bambi_72 · 2003-01-29 22:05 · Score: 1

All this just applies to the U.S. There are other countries, you know. I know this may come as a shock to some of you, but the world doesn't end at the US borders. Now, as far as this copyright thing goes, M. 'Wacko' Jackson owns a fair few copyrights of Beatles songs. Now if he owns them for 'ever', nobody would ever be able to re-record some global classics. Can you imagine, 400 years down the road, a class of music students wanting to study ancient music from the 20th century, and Sgt. Pepper's wasn't available 'cos some madman in the 21st century decided he was the only person allowed to perform it?

It has nothing to do with protecting artists from copyright theft; it has everything to do with large multinational companies making as much cash as possible from a minimal outlay. Where would we be if some record company ended up owning all the copyrights of all this world's classic music and just recycled it every 5 years with the latest 'poop idol'? No new music, just the same songs rehashed every 5 years. OK, so I may have gone off the topic a little, but FUCK the 'establishment'; all they are interested in is money, profit and controlling what you 'want to listen to / read'.

Imagine being taken to court for singing a song you didn't own the copyright to, or even humming a vague tune. Oops, I believe I just used a copyrighted song name there; best send Mr Jackson some money. After all, he spent all that time thinking up the title... oh wait a minute, no, he just had enough money to buy it. Makes me mad that a bunch of suits with cash feel they have the right to own the creative work of another person FOREVER. Anyway, I'll stop now, probably best....

Some details of Project Gutenberg... by Anonymous Coward · 2003-01-29 20:25 · Score: 0

About 10 years ago, I volunteered to do some work for Project Gutenberg. The way it worked then (and I'm pretty sure it still works this way) is that they would OCR a particular edition of a particular book. Then they would get volunteers (more than one per book) to read through the OCR-ed version alongside the actual printed edition from which it was OCR-ed, to verify that the OCR didn't make any mistakes.

This is a very dull, volunteer-intensive task for even interesting books.
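Part of that dull task, spotting where two independent passes disagree, is easy to automate. A minimal, hypothetical Python sketch using the standard `difflib` module (the post doesn't say how PG compared the passes); it only flags differing lines, and a person still has to check them against the printed edition.

```python
import difflib


def disagreements(pass_a, pass_b):
    """List the places where two independent proofreading passes of the
    same page differ, as (first_line_number, lines_a, lines_b) tuples,
    so a third person can check just those lines against the print."""
    a_lines, b_lines = pass_a.splitlines(), pass_b.splitlines()
    # autojunk=False: otherwise, in long texts, frequently repeated lines
    # (e.g. blank lines) could be ignored by SequenceMatcher's heuristic.
    matcher = difflib.SequenceMatcher(None, a_lines, b_lines, autojunk=False)
    return [(i1 + 1, a_lines[i1:i2], b_lines[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


first = "Call me Ishmael.\nSome years ago\nnever mind how long"
second = "Call me Ishmael.\nSorne years ago\nnever mind how long"
print(disagreements(first, second))
```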

Scanning, typing, and other dumb stuff by Anonymous Coward · 2003-01-30 06:26 · Score: 0

Why are we wasting our time bitching about this?

At the very least, PG is better than nothing at all. Free books is free books.

Alternative to the Wired article by Anonymous Coward · 2003-01-30 13:40 · Score: 0

"Project Gutenberg is in the cross hairs of J. Bradford DeLong, a Berkeley professor and Wired Magazine contributor, who accuses PG of failing to 'achieve any form of critical mass.' I'll get to Gutenberg in time. But first a few words on the DeLong column and then plenty more on his former employer, the Clinton Administration..."

Read the rest at http://www.teleread.org/blog/index.html.

Justify your argument! by mulp · 2003-01-30 14:19 · Score: 1

The provisions in the Constitution were written to address the problems the colonies were having with trade secrets. The colonies, and then what became the USA, were forced to go back to Britain for all sorts of machines and goods because no one could easily produce them here.

New England became a manufacturing center for hundreds of years because its residents became good at "reinventing the wheel", or "reverse engineering", the kinds of things that the USA accuses the Far East of doing.

To foster the process of duplicating Britain's and Europe's manufacturing technology here, the patent system was created. For a short period of time you got exclusive rights AS LONG AS YOU TOLD EVERYONE YOUR SECRET AS SOON AS YOU THOUGHT IT UP.

Copyright was given similar status as part of a program to ensure that ideas flowed freely - the best way to protect your ideas was to publish them, rather than just "speak" them.

At no time was it intended to create a welfare program where you could work for a week or year and then live off that for the rest of your life. The idea was that you could think up something and tell the world and benefit from that just as much as working cutting down a tree and selling it or raising and butchering a cow.

In one regard, your exclusive rights to an idea should last no longer than it took you to produce it. If you watched an event and wrote up an article on it, then your exclusive copyright should be a day. But a couple hundred years ago, what was required was observing, writing, setting the type, running the press, shipping the copies around the world, and so on, so the process of making money from a new event might take a year. Books took even more time.

As the issues were complex, Congress was given the job of figuring out the tradeoffs.

The situation today is the opposite of what it was several hundred years ago. The intellectual giants and the influential politicians are saying "secrecy is good", "exclusivity is good", "restriction in the flow of ideas is good", "prevention of reverse engineering is good", "unfettered innovation and creativity is EVIL".

I'd like just one explanation of how extending patents or copyrights will make you more creative than you are?

I'd like one explanation of why you think that you should have the right to restrict the free flow of ideas for an indefinite period of time?

If you can convince me that it takes so much work to write something that it will take you decades to make the money back, then you have to explain how it is that people write things with absolutely no expectation of getting any monetary value from it, and why you are so unique that you deserve special treatment so that we might be blessed with your special ideas?

How about using kids in school? by Wolfrider · 2003-01-30 19:26 · Score: 1

--Get them to volunteer for PG, and give them added credit for their English classes.

--
.
== WolfriderV6 == I'm willing to admit that *I just might* be wrong... Are you??

Slashdot Mirror

334 comments