Proposal: Put Library of Congress' Contents Online
Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."
Pardon me for sounding like an eegnoramoose, but isn't at least some of the material in the Library of Congress copyrighted material? Putting it all online would let people get copies of it for *gasp* FREE.
Can't have that, now can we?
This would violate the publishers' god-given right to milk their "creations" until the heat-death of the Universe.
and to those who can't, they can copy and paste the text into a translator.
So yes, it would benefit society as a whole.
Grump.
Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.
wanted to do something really important and contributive, he would fund this.
a Library of Congress jokes will be on topic.
How much data storage would this require? Could someone give it to me in layman's terms?
Since Congress and the President can so easily pull out a hundred billion dollars to bomb the hell out of another country, I see no reason we can't come up with a wimpy $260 million for something as worthwhile as this.
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
...just to put the whole dictionary on line. Figure it this way- you can use it to make all the other books out of it, and it gets around the whole copyright thingie, too (if you use a sufficiently old dictionary).
Someone, please.. how much is that in LOC?
Karma: -2147483648 (Mostly affected by integer overflow)
The government has proposed recently. I would also suggest that they put in place requirements that all future material submitted for copyright include a copy in machine-readable form, so this will be cheaper in the future.
If you folks want out of state donations from non-taxpayers, I'll stump up happily from Canada!
well, at least to those who can read English
Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.
Javascript + Nintendo DSi = DSiCade
It would probably pay for itself too since FBI agents would no longer have to travel to libraries to secretly gather records of who borrowed what. They can just use Carnivore to do it instead.
Should work... it's a lot of information, though: about 1 Library of Congress's worth.
The Canadian gun registry cost is exceeding 2 billion dollars and climbing - 1.9 billion of which is probably wasted on corruption, but that 260 million sounds like a lowball.
Oh well, what the hell...
Finally, Slashdot can establish that for official purposes:
1 Library of Congress = $260M
And the 2004 US Federal budget can be spec'd at 0.000243754522 LoC:s (Libraries of Congress per second).
--
make install -not war
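The per-second figure above can be checked with a quick sketch. It assumes an annual federal budget of roughly $2 trillion and a 365.25-day year; neither number is stated in the thread, but together they appear to be what the quoted rate implies. The function name is made up for illustration.

```python
# Rough check of the "Libraries of Congress per second" figure.
# Assumptions (not stated in the thread): a $2 trillion annual budget
# and a 365.25-day year.
LOC_COST_USD = 260e6                    # proposed scanning cost, from the summary
BUDGET_USD = 2e12                       # assumed annual federal budget
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length in seconds


def loc_per_second(budget_usd: float = BUDGET_USD) -> float:
    """Return annual spending expressed in Libraries of Congress per second."""
    return budget_usd / LOC_COST_USD / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{loc_per_second():.12f} LoC/s")
```

Plugging in a different budget figure only means passing another value to `loc_per_second`.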
OK, US$260 million to scan... $60,000 of space (a terabyte) according to the article... put it online... BANDWIDTH cost estimates? Oops... forgot about that, I guess!
"Brewster Kahle's idea is to scan as many books as possible and put them online so everyone has access to that huge amount of knowledge."
The plan IS to put it online, after all...
Eureka Science News - automatically updated
Since the Library of Congress contains mostly copyrighted data, and Amazon is already doing this for profit, this is really just a good way to market Amazon's A9 search engine and the products it sells.
--
make install -not war
How about the whole world, who can find any online translation service that goes from English to the local dialect?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Soon we'll see the PIAA (Publishing Industry Association of America) forming up and talking about how it's ideas like this that cost them millions a year in lost revenue. And then start in on how libraries are nothing more than book pirate hangouts....
At long last, we shall finally know just how much one unit of Libraries of Congress is. This could quite possibly have profound effects on how we understand the universe. For example, for many years we have known that the universe is approximately 42 Libraries of Congress. Now we can fully understand its meaning.
Putting the LoC on-line is only the first step. How long before those Internet book printing stations that can create an entire book for you from an electronic image in a deciminute for $1 tap into this? I'd have to think that this would be good for everyone except B&N who are busy reprinting old classics under their own label right now.
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
they can copy and paste the text into a translator.
So yes, it would benefit society as a whole.
Subjecting the world to the Babelfish translator would actually detract from knowledge considering the horrible linguistic bastardizations that people would then take as fact.
You've perused the Library of Congress, but have you perused the Library of Congress Online?
I would reserve that honor for Andrew Carnegie, who basically sold his empire for $485M and spent the rest of his life giving away all his money to good causes. Bill Gates is a far cry from that so far.
Just as a point of /. interest, what is the conversion factor between ACMs (Andrew Carnegie Millions) and BGBs (Bill Gates Billions)?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
In a traditional library it's not really easy to...
...all within 30 minutes.
1. walk in and pick up a book
2. strike the author's name from it and replace it with your own
3. replace the copyright notice with your own
4. Make one thousand perfect copies
5. Offer it for sale, start taking orders, and PROFIT!
I could easily do that on the internet.
Right now, Internet2 can download the entire Library of Congress in about 20 seconds.
I'm not aware of any PIAA for publishers, but somebody is going to have a problem with this. And by the time this actually happens, I bet there will be an Internet4 that can do it all in 20ms.
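For a sense of scale, here is a minimal sketch of the link speed those transfer times would imply. It assumes the collection is 1 decimal terabyte, the size the article is said to claim elsewhere in this thread; the function name is hypothetical.

```python
# Throughput implied by "the whole Library of Congress in N seconds".
# Assumption: the collection is 1 decimal terabyte (10**12 bytes), the
# figure attributed to the article elsewhere in this thread.
LOC_BYTES = 10**12


def required_gbit_per_s(seconds: float, size_bytes: int = LOC_BYTES) -> float:
    """Link speed in gigabits per second needed to move size_bytes in `seconds`."""
    return size_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    for label, secs in [("20 s", 20.0), ("20 ms", 0.020)]:
        print(f"{label}: {required_gbit_per_s(secs):,.0f} Gbit/s")
```

Under that size assumption, 20 seconds needs about 400 Gbit/s, and 20 ms needs a thousand times that.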
Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of its letters; see 'leet'
Otherwise how will they stop me from getting every book that I want to have? The only possible way would be for publishers to not send data to the LoC.
If this is such a wonderful idea why doesn't he get a bunch of artists, musicians and writers to donate their own work to this project and actually prove the concept works?
I'm tired of all the rhetoric about business models failing and how the web is going to transform the way society learns, works, and entertains itself. The dotcom era should have taught these so-called visionaries one thing: you actually have to have a business plan before you can transform business models.
If these business models are so full of potential he should start one, with his own intellectual property, and prove that the old-economy intellectual property businesses are extinct. If his ideas work then the dinosaurs of the MPAA and RIAA will either have to adapt to the new economy or die. Forcing them to risk their entire business on a gamble like this is wrong from any perspective.
The article claims that the LOC stored as image data would take up 1 TB.
That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. That works out to roughly 0.4 KB per image. That's some REAL good compression for an image as large as a full page of text.
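A minimal sketch of this estimate, assuming "1 TB" means a decimal terabyte (10**12 bytes) and using the book and page counts from the comment. The function name is made up for illustration.

```python
# Back-of-the-envelope storage budget per page image.
# Assumption: "1 TB" means a decimal terabyte (10**12 bytes).
TOTAL_BYTES = 10**12      # storage claimed by the article
BOOKS = 26_000_000        # books in the LOC, per the comment
PAGES_PER_BOOK = 100      # conservative page count, per the comment


def bytes_per_page(total_bytes: int = TOTAL_BYTES,
                   books: int = BOOKS,
                   pages: int = PAGES_PER_BOOK) -> float:
    """Average number of bytes available for each scanned page image."""
    return total_bytes / (books * pages)


if __name__ == "__main__":
    per_page = bytes_per_page()
    print(f"{per_page:.0f} bytes per page (~{per_page / 1000:.2f} KB)")
```

By this arithmetic the budget comes to a few hundred bytes per page, which supports the point that 1 TB is far too little for page images.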
and to those who can't, they can copy and paste the text into a translator.
With that who can't, they can reproduce and stick inside their language teacher.
D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
I said could understand - not necessarily write.
DAMN YOU WORD PROCESSING FOR MAKING IT SO EASY TO REVISE A STATEMENT!
Slashdot comments... splitting hairs since 1997.
Relatively successful? By my assessment, you must be incredibly prolific - a large proportion of the comments on Slashdot are by Anonymous Coward. I can only assume that you are some relation to Noel?
Don't go to a brothel if you want to buy broth
Oh yeah, put 'em all online. I have a hard enough time already in libraries and book stores! If I could read any book I wanted to (even if they're only the ones already out of copyright) online, I'd probably not leave my computer until I passed out!!
For all intents and purposes a car is free. What do I mean? It's free in the sense that it's possible with time and effort to find out how a car works, take the bare components of a car, and make a duplicate. The total cost is your time + free blueprints (or cheap blueprints) + the cost of parts. The cost to duplicate a book? If digitized, less than a dollar for the time and the media. I know that's not what you were trying to say, but my point is that it's not even that people want cars or books to be free. They just wish the price was much closer to the duplication cost. For virtually all copyrighted works, that's amazingly small.
Eurohacker European paranoia, gun rights, and h
I don't normally resort to this sort of language, but I entreat you sir, to fuck off asshole.
If you had actually read my post, you would have realized that I was referring to any non-English works (foreign written or US written) that specifically sought out US copyright protection. Because, you know, it's not like foreign authors never register their copyrights in other countries.
<sarcasm>You may now begin hailing my amazing wisdom and knowledge.</sarcasm> OR, you could just be a nice guy and apologize for your rude behavior.
Javascript + Nintendo DSi = DSiCade
This will be great! You know all those ads that claim such and such can transmit the Library of Congress in so and so seconds?
Now we'll be able to test their notions!
A 0-rated post noted that this type of free access is a big deal to people who make an honest living publishing their creations.
This raises a big, important question. The rise and flourishing of the information age has provided, and will continue to provide, unbelievable freedom of access to unbelievable amounts of information. Where and how do we draw the line between the freedom of the consumers and the rights of the creators?
I'm a software developer who loves movies: I'm a creator and a consumer, so I see both sides of this coin. And I think there needs to be a compromise between consumers and creators.
Consumers need to realize that at a certain point, amassing more music, or more books, or more movies, or more whatever, becomes a luxury, not a right. So if the price of music prevents you from having a 10,000 song collection, I'm sorry but, "so sad too bad." That's how it's always been for just about every other purchasable product. Sometimes you have to sacrifice what you merely want to get what you really desire.
Creators need to understand that the information they produce is a drop in the bucket compared to, for example, the estimated yottabyte (1x10^24 bytes) of information on the Internet. So if you want to make money off your creation, it had better stand out, because there's a lot of noise out there to drown it out. Simply put, if you want to get paid, make something people are willing to pay for.
Sorry for responding to myself, but I just came up with a more valid point to why copyright in general is a bad idea.
Copyright has basically turned into charging relatively lots of money for the embodiment of an idea (daily writers to newspapers and the like excluded). In truth, most manufactured goods have been outsourced to other countries. Why? To make more money because there's an inherent small mark-up available when companies turn a good into a commodity. But, copyright is different and can enjoy gross mark-ups by comparison. Now, eventually this too will be outsourced (since it too will make more money), but in the meantime, the economy of the US will be based less on trading physical goods and more on trading ideas as currency (strictly speaking, copyright is the embodiment of ideas, but it's more succinct to say it that way).
But you can't trade ideas or the embodiment of ideas as some sort of actually stable basis for an economy. It's just asking for the economy to crumble and/or be taken over by countries which actually *do* make *real* goods. But, it's the obvious economic outcome as people will obviously turn to whatever is the largest profit maker per unit. The outcome is that the entertainment market has ballooned well beyond what is healthy; this isn't good for the quasi-free market of the US. But with other countries without strong copyright terms yet able to copy the idea of mass entertainment, the entertainment bubble is sure to collapse in the future to foreign markets. And what else does the US have to trade with other countries except maybe agricultural goods (some of which most of Europe now has banned)?
How much is that in terms of Space Shuttle Fuel Tanks? What about in Weapons of Mass Destruction?
I might inherit a portion of his farm. But that's a result of money that he saved at the time. I do not collect royalties on the *work* that he did 70 years ago.
If an author or musician wants to leave an inheritance, then they should save the money they make during a reasonable copyright term, and give that to their children. They can leave their typewriters, musical instruments, and other tools of the trade (analogous to a farm) as well.
They might have to actually forgo blowing everything they earn on cocaine and refrain from signing away most of their income on bad contracts to actually achieve this, but then so do the rest of us.
Isn't the size of the Library of Congress what people used to use as a quantifier for the speed of high-bandwidth connections? I remember several years ago that companies would brag that they can transfer the entire Library of Congress to England or wherever in less than 2 seconds and what have you. I suppose a statement like that would indicate that there are already digital versions of the Library of Congress out there somewhere meaning it will take virtually nothing dollar-wise to put it online (since I guess it's been flowing back and forth for years).
I didn't really read TFA, but how do they propose to actually do the scanning? There seem to be a lot of books in there, is there some sort of book-scanning machine that I've never heard of?
Doing all of this by hand would be insane, even if it's by a large team of volunteers. Maybe 1000 monkeys on 1000 scanners copying all the books will get it done in a few months?
Maybe I'm the odd duck here, but somehow waay back in the early net days... the '90s... I thought that this was such an obvious application of internet technology that it must be part of the original design purposes for the internet (darpanet and all that funding of course)
So the only surprise to me is that we're just now hearing a proposal to do this??? sheesh, if I hadn't thought it so completely obvious to every netizen at those old public library terminals I woulda lost so much sleep making it happen!!!
So now who's going to do it? And while it's limboing through congress can we just put together a consortium to visit the library we already own with our digital cameras and OCR the thing into existence... how many of us would need to donate our gmail 1g accounts to store it all?
I just downloaded the LoC.ps.tgz from the local WPI Internet2 tap using gnutella and my printer just ran out of ink....
...that to OpenOffice.org Text Format...much more compressed, and it natively uses XML.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
There's a Vulcan saying: "The needs of the many outweigh the needs of the few."
I would say, scupper copyrights for all volumes owned by LoC. Scan and put every volume on the internet.
Within a few years we would witness a Renaissance of sorts once again in human knowledge and education.
"Doing what i can, with what i have." ~ Burt Gummer
Ask newspapers. They print in bulk at $0.50/paper, often with many sections. They recycle paper, use pretty bad ink except on Sunday, and they don't give tons of royalties to anyone. Yet, newspapers still make enough money to stay in business (especially the bigger ones) and they share things like AP reports. The copyright on most newspapers isn't worth the paper it's printed on, as few people would buy a previous day's newspaper (now, 30 years, maybe..). So, it's clear that it's possible to survive on such a system. Of course, for something that takes months to make and can be easily copied, you'd give your customer various "perks" without charging much above what "pirates" sell new stuff at.
Maybe people will still be d/ling things for free. But if you know you can sell a few copies at $0.10 and the publisher is selling it at $0.15, you'll sell it at $0.10 (more than likely). And the publisher will likely still get a lot of buyers because it's a more trusted source and $0.05 is piddly for that guarantee. Of course, I don't really know the price the free market will offer. But I find it hard to believe that absolutely all books and the like will fail to be produced. If anything, someone like Samuel Clemens believed that it would just mean more profit for publishers and less for authors. But with the internet, the author can be the publisher..though maybe it won't be better for the creator without the huge available mark-up.
Are you an attack dog from the GNAA or something?
Look up the Berne convention. The US didn't join it until 1989. That leaves over a hundred years of work that had to be explicitly copyrighted in the US.
And FYI, non-English works account for less than 2% of the total volume of the Library of Congress
Which is neither here nor there. And by making this statement, you are agreeing with me that the LOC has a substantial selection of non-English works.
unlike your delusional and paranoid rantings and ravings.
I'm sorry. Did I hurt your feelings? Grow up. If you flame someone, you can expect them to respond less kindly than if you had attempted to make a point in an intelligent fashion.
Now I will repeat. Please either act like a man (or woman as the case may be) and apologize for your uncalled for behavior, or get the hell out of this discussion.
Good day to you, sir.
Javascript + Nintendo DSi = DSiCade
Compared to the $200 billion to kill and maim tens (hundreds?) of thousands of people in the name of "terrorism", $260 million to create, essentially, a Library of Alexandria is a fucking bargain.
I don't respond to AC's.
Not only the Library of Congress of the United States of America, we should also scan every big library in the world to create a pool of human work to freely share and preserve.
What's in a sig?
>>Copyright has basically turned into charging relatively lots of money for the embodiment of an idea
>And knives have turned into tools to kill people. but no one would argue that the abusing of knives is a reason to eliminate them. So why is the abusing of copyright seen differently?
Actually, that's not really an abuse of copyright. An abuse would be violating copyright or turning it into something that doesn't still meet the definition of copyright. I was merely pointing out what copyright is now for various companies (again, daily newspapers and the like excluded).
>>Why? To make more money because there's an inherent small mark-up available when companies turn a good into a commodity. But, copyright is different and can enjoy gross mark-ups by comparison.
>I would argue that the creation costs of copyrighted "goods" are more hidden (either out of ignorance, or simple apathy) than the creation costs of physical goods.
Last I checked, physical goods aren't created. They're constructed from raw goods. Creation would involve violating some conservation of matter/energy laws (given matter is energy of a different form, even claiming energy -> matter is creation seems as absurd to me as claiming you create iron by converting liquid iron to its solid form). The closest analogy to creation for a physical good is really the design of a good. The fact is, the design for most goods is so blatantly obvious in the good no trade secret would cover it, the trade secret for the good was lost, or the trade secret for the good still exists. But given that the copyright and the trade secret for a copyrighted work are the same, it seems clear that all copyrighted works would fall into the "blatantly obvious" category and there'd be no damages for redistribution.
>The main difference between the two is duplication costs, but economics dictates that both costs have to be made up.
Economics might dictate that costs have to be paid for, but it never dictated that copyrighted works as we know them had to exist. So, the Constitution makes an exception to allow many more copyrighted works to exist than conceivably would exist in a free market. That's not so bad, if copyright's length were proportional to the reward cycle which is proportional to the rate of communication. Copyright has gone so badly in the direction *opposite* of this, that having a breather of no or minimal copyright might be the right direction to give some perspective on what actually is good for our economy.
>>But you can't trade ideas or the embodiment of ideas as some sort of actually stable basis for an economy.
>Remember talk of the knowledge economy? And yes as long as certain things are true, we'll need physical goods.
We'll need physical goods, but if our country doesn't produce many physical goods and uses its IP as the main export for trade, then when other countries make their own IP we'll be in the situation that we no longer have a viable export nor anyone in the country to make the physical goods we need. Of course, eventually we might start making physical goods again. But the truth is that most other countries will probably still be more cost effective at it than we are. That seems to mean that the only main export we have is things like crops. I'll be glad to look more carefully, though, if you can come up with more specific examples of what other physical goods we make.
I want to know why the hell it would cost so much. I realize it's an incredible amount of material, but $260 million!? I'll do it for a cool $1 million no questions asked. It's not like it's hard or requires much technical expertise. Just lots of monotonous labor.
^^vv<><>BA
One thing... Advertising.
Look at your standard paperback novel. You don't see a lot of advertisements in there, do you? (Okay, maybe at the end of the book, a plug for the next book by the same author or other books from the same publisher...)
Look at your standard newspaper. Full-page advertisements, or half-page ones, or quarter-page ones. Sure, the front page of each section might be free of them, but second page on is fair game.
That pays a lot of the costs involved.
Next, the classifieds. I don't think the paper here lets you put _anything_ in the classifieds for free. That also pays for a lot of the costs involved.
Kierthos
Mr. Hu is not a ninja.
Sorry but NO
Digital content means that any piece of information is just a number, and that means that a perfect copy is not only possible but simply natural.
If you really want to 'protect' your work, don't go digital; it's that simple. There's no way to stop math.
What's in a sig?
Only five? Would that mean that Klingon and 1337 are included? Would the books have to be written in that annoying overly abbreviated dialect that some AOL users speak? Sure, it is faster to write, but it takes forever to make the ascii art with the tilde and degree marks.
Someone googles for a paragraph from your book, and comes up with two different results...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
As an author, I wonder how much of your valued craft was honed by reading the work of others for education and inspiration. How many books did you buy in elementary school, or high school? Yet that's where you learned your precious language skills you now market.
Knowledge, even the limited knowledge of an author, does not exist in a vacuum. You read, you learn, you practice, then you create. You could not have done this without the beneficence of others who aren't making a dime off the education they provided you.
To unleash the vast amounts of knowledge stored up in the LOC to the world would be one of the single best things this country could do for mankind. One book, one reader my hairy ass. Why not open the floodgates so everyone can benefit?
I understand the motivation of monetary incentives, but I also know a lot of great authors who died penniless. And they were at least brave enough to sign their names to their ideas.
If we are talking about text documents in the LOC, then they should be scanned as plain text (or Rich Text at the very most) to at least preserve the contents in a format that is pretty much basically standard to every computer for at least the last 30 years (not counting unicode).
Of course, I'm sure the Copyright Police would have something to say about preserving the LOC in such an open format...
"Empathise with stupidity, and you're halfway to thinking like an idiot." - Iain M. Banks
The AC may be being an asshole, but I believe he's (at least approximately) right. As a British author, I don't need to register my copyrights in the US to receive the same protection that US authors do. I can easily sue a US citizen for infringing my copyright in a US court and would have the full scope of possible damages available; a US citizen would have to register their copyright to gain this privilege. It's a crazy system, I know, but that's copyright registration for you...
Yeah, just what we need to keep the spiral of information addiction we all have going.
Like the Wikipediaholic who reads articles to find answers to questions no one asked, we're all writhing addicts to information systems.
I would imagine a database of documents would be easier to translate than their physical counterparts.
And there were magazines that used to include either short stories or a chapter of a novel in the past (do they still?). My understanding is that's how Charles Dickens did at least a few of his novels. But you bring up an interesting point. People might be willing to pay less for books if they included full-page advertisements.
Of course, that idea really cheapens my idea of books. I'm sure at least *some* people would agree with that. And why can't there be ad-ware books and non-ad-ware books? It works for something like Opera. I certainly wouldn't mind paying a little extra to not get fed advertisements in my books. I certainly don't believe that that will kill all books because if nothing else the local shopkeep might have a machine to print your own ad-free book after the short extent of copyright.
I'm hopeful, though, that electronic books develop far enough that I'll rarely have a need for real paper. I like a solid book from time to time, but having a single pad with a ton of books is more practical. And to that, I'll probably keep my analog copy of HHGttG. In between, maybe the whole "micropayment" plan will work. Then I can get lots of books for relatively cheap and stick them on my ePad (or whatever it ends up being called).
Gambling is no way to run an economy.
An Education is the Font of All Liberty
well, at least to those who can read English
So that leaves out most Americans. Thanks from the rest of the world!
(tongue firmly in cheek)
Screw you all! I'm off to the pub
it wasn't really such a problem in the end, since in those days they didn't have as much knowledge as we do now. i would say it is much more important to protect today's internet than books in those days. and definitely the renaissance stemmed from galileo because he broke the power of the church.
I live in Washington and often go to the LOC on Saturdays (it's closed Sundays) -- it has a large collection of books in lots of different languages -- even Esperanto and Volapuk!
WOW..from Amazon's open job page it seems most developer jobs are in Washington state. If we assume a salary of $100,000 per developer, this results in Amazon spending 6.5 billion dollars per year, just on developers.
Or maybe this is a typo.
Of course you could do some cost cutting and move it off-shore for less than $500,000,000.00/yr. Anyway, they seem to have a lot of openings.
-- www.globaltics.net
Political discussion for a new world
In this way a website can be set up to disseminate digital copies of out-of-print works, and taking the thing down should not cost much if the work comes back into print (a dead-tree version of this may be risky, since printing has a considerable up-front cost, but it would at least become legal, whereas it is probably illegal under current law), and far fewer valuable works will get lost. As for the "authorized" publisher, if sales have fallen so low that it has become unprofitable to keep the printer running, they should already have made most of the profit they can ever gain, and even if the work happens to become popular again they can still restart printing and gain most of the profit from the exclusive copyright.
I think that they are talking about developers who use Amazon's webservice.
You can do this at the French national library (see http://www.bnf.fr/pages/zNavigat/frame/accedocu.htm, yes it's in French)....
The endowment of Bill Gates's philanthropic foundation is currently more than $20 billion. As of 2003, he had already donated more than $5 billion, mostly to global health organizations and education causes. He has been saying for years that he plans to eventually give away 95% of his wealth. While $5 billion (so far) in 2004 dollars is still less than Carnegie's lifetime philanthropy (about 1/2, after adjusting for inflation), I wouldn't call it a "far cry", as if $5 billion is pocket change and Gates doesn't still have many more years to donate his money.
Hang out the flags -- this is a brilliant project. It would be a huge benefit to society, even if works still in copyright were not made freely available. And the benefits are not limited to the English-speaking world, just as that world has benefited from plenty of material not written in English originally.
I hope the Library of Congress is already scanning all books where there is reason to suspect it has the only copy. There are plenty more where this is our best hope of being able to read the contents in a reasonable timeframe. There are millions of books which are not 'rare' but for which this will provide the most convenient form of access. I hope they work out that cooperation with Project Gutenberg should multiply the good effects of both projects.
A good companion project would be a campaign to honor those authors who voluntarily put their works into the public domain before the last tendrils of copyright law relinquish their grip.
It is good that the UK's equivalent, the British Library is involved in a project which will preserve and copy endangered archives around the world. The budget is not on the same scale though! http://www.bl.uk/cgi-bin/press.cgi?story=1418 (and the story mentions that the charity part-funding the programme is also trying to preserve endangered languages).
No, we can't... it would not be fair to lots of people whose copyrights haven't yet lapsed.
Let us scan only things for which the copyright has lapsed. This has several advantages.
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
What a cool idea and, even "if" the dollar estimate is too low, who cares? $260M is chump change for our gov't.
Right now, the only way to access the stuff in LoC is to go there in person. Anyone can do it but you have to travel to WashDC and pass through security and so forth to get into the LoC public reading room. Then you have to ask the librarian to pretty-please bring you the book that you want.
Now imagine that you can access any item in the LoC by simply entering the building and using a public kiosk with a browser. LoC's software would only permit use within the copyright so that is OK. But you don't have to mess with as much security because LoC isn't handing over the physical book.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
My opinion... skip the buy on the next couple of cruise missiles and digitize LoC's books instead.
Oh yeah, before I forget, LoC already has tons of seriously neat stuff online. My favorite is this collection of tons of photos from Russia. These were taken between about 1907 and 1915! I don't know about you, but I never dreamed that I would see color photos that are almost 100 years old.
Cheers,
-- Art Z.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
That's the idea of Project Gutenberg. It's been around for quite some time now, and everybody is free to join their distributed proofreading network!
cpghost at Cordula's Web.
I walk to the librarian and pay the purchase price. She fires up a local print run on the library's new laser book printer
I keep seeing stuff about print-on-demand coming to bookstores some time in the not too distant future (but it never seems to get here). Using LC as a source for at least some books (public domain, out of print) would be a nice extension. UMI already prints copies of theses on demand: if you want to order a copy of a PhD thesis, you give them a credit card number and they shoot you a freshly printed and perfect-bound copy of the thesis in the mail.
www.lib.ru is a project started as a hobby by Maxim Moshkow in 1994. Today it's a comprehensive online library containing over 20,000 text and 37,000 other files. Uzbek. (they did have some issues with copyrights!)
If you get an inheritance? You effectively do.
No, he doesn't. He's getting money earned by his grandfather before he died. Authors can save their earnings and pass that money on to their descendents just like the rest of us.
If a job's not worth doing, it's not worth doing right.
Once it's up, just put it online and let Google cache it!
According to the LOC website, they have 119 million items in the library.
They tell us that there are:
4.5 million maps.
14 million 'images'
...so I guess we assume the rest are books and newspapers.
So in round numbers, let's say there are 50 million books and 50 million newspapers, periodicals, comic books, etc.
$260 million to scan all that stuff? $2.60 per book or newspaper? That seems a little unlikely. The book would have to be carried off the shelf to the scanning machine, mounted in the machine (which would clearly have to turn the pages and scan and index them 100% automatically), the title and such would probably have to be typed in manually, then the book carried back to the shelf and placed back in the correct place.
I find it hard to believe that a machine for scanning newspapers could be devised that could turn the pages automatically...but even without that, the project is still possible. At minimum wage, that budget means a person would have to scan a complete newspaper in maybe 20 minutes.
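The per-item budget above is easy to check. Here is a rough sketch in Python using the round numbers from this comment; the hourly wage is my own assumption (the 2004 US federal minimum wage), not something the proposal states.

```python
# Back-of-the-envelope labor budget per item, using the comment's round numbers.
BUDGET_USD = 260_000_000       # proposed project cost
ITEMS = 100_000_000            # ~50M books + ~50M newspapers etc. (the comment's guess)
MIN_WAGE_USD_PER_HOUR = 5.15   # ASSUMPTION: 2004 US federal minimum wage

# Dollars available per book or newspaper.
cost_per_item = BUDGET_USD / ITEMS

# Minutes of minimum-wage labor that money buys if nothing else is paid for.
minutes_per_item = cost_per_item / MIN_WAGE_USD_PER_HOUR * 60

print(f"Budget per item: ${cost_per_item:.2f}")
print(f"Minimum-wage labor it buys: {minutes_per_item:.0f} minutes")
```

If every cent went to wages that is roughly half an hour per item, so a 20 minute target leaves only a little slack for machines, storage and overhead.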
Then some significant fraction of the collection would probably be too fragile for the automatic page turning machines...the cost of hand-scanning those would be FAR more than the bulk of the books. Some books would be *so* fragile and valuable that scanning them would be a considerable expense.
Then there is the cost of the storage media. Suppose those 100 million books and newspapers had just 100 pages each on average. To get a readable image of the page you're going to need to scan at maybe 2000 x 2000 resolution. So we'll have something like 10^16 pixels, let's be generous and allow 100:1 compression ratios - and one byte per pixel. So we have 1000 terabytes. That's a lot - but to put it in context, it's only about a fifth of the amount that Google is estimated to have in their main cluster. Google spent $250 mil to buy that - so maybe only 20% of the LOC's budget needs to be for storage.
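For anyone who wants to redo the storage arithmetic, here is a minimal sketch. Every input is one of the guesses from this comment (item count, pages per item, resolution, one byte per pixel, 100:1 compression); none of them are measured figures.

```python
# Rough image-storage estimate from the comment's stated assumptions.
ITEMS = 100_000_000            # books + newspapers (round guess)
PAGES_PER_ITEM = 100           # assumed average
PIXELS_PER_PAGE = 2000 * 2000  # assumed scan resolution
BYTES_PER_PIXEL = 1            # assumed, before compression
COMPRESSION_RATIO = 100        # the "generous" 100:1

total_pixels = ITEMS * PAGES_PER_ITEM * PIXELS_PER_PAGE
compressed_bytes = total_pixels * BYTES_PER_PIXEL // COMPRESSION_RATIO
terabytes = compressed_bytes / 10**12

print(f"Total pixels:     {total_pixels:.1e}")
print(f"Compressed size:  {terabytes:.0f} TB")
```

Taken literally, these inputs give about 4 x 10^16 pixels and roughly 400 TB, which is the same order of magnitude as the round 1000 TB figure, so the conclusion about storage being a modest slice of the budget doesn't change.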
OCR'ing and indexing all that data would be an incredibly valuable thing - the extra storage is trivial and the cost can be low if you aren't in a hurry to get the project done. Just stick a few thousand PC's in a room and wait!
Dunno - $260 mil sounds like a low end estimate to me - but it seems do-able.
www.sjbaker.org
Bullshit. Thomas Jefferson lived to be 83. Average life expectancy was pulled drastically downward by infant and child mortality, but if you made it to age 21, you could expect to easily live into your 60's. Here's a quickie on the reasons why the "avg life expectancy was 25 in the stone age" theory is a load of crap based on statistical oversimplification.
If a job's not worth doing, it's not worth doing right.
Some wild assumptions flying around here. Even if the LOC could get funding for the project, and even if the publishers did not tie it up in the courts for decades, there is still the questionable assumption everyone seems to be making -- that LOC would make the digitized texts available free. There is no reason why they should be expected to do that. In fact, they would most likely have to charge for use of copyrighted materials. The fee would necessarily include some sort of negotiated reimbursement to the copyright owner/publisher. Otherwise, publishers would just stop contributing their books to the LOC.
But, there's an even bigger fantasy involved. Does anyone really think that the right-wing protectors of our morality who are running our government would stand by and allow a government agency like the LOC to spend tax dollars scanning gothic novels, adult literature, subversive tracts, revolutionary polemics, treatises on abortion rights, non-christian religious texts, pagan and satanic epistles, books critical of the administration, etc, and making them available to the country at large? The Shrub would burst into a Burning Bush instantaneously at the idea.
The only possibility is digitization of "suitable" and "defensible" public domain items, which is already under way in piecemeal fashion.
> If this is such a wonderful idea why doesn't he
> get a bunch of artists, musicians and writers to
> donate their own work to this project and actually
> prove the concept works?
It's been done, and the idea worked -- the United States convinced a bunch of artists, musicians and writers to donate their own works to the public domain AFTER A PERIOD OF TIME, THE COPY-RIGHT period.
Well, perhaps not convinced -- but promised and protected the copy-right, so artists, musicians and writers were encouraged to work and publish in the US.
Of course during the same first century or so, the US completely ignored the copyrights of other countries -- arguing it's the right of a poor developing nation to take whatever it needs of the intellectual source material to assure its survival. Only the large, established nations honor copy-rights, historically speaking.
You'd transcribe the data into plain old ASCII, perhaps UTF-8 if you wanted to preserve the original characters. Maybe make a version available that's marked up in XML if you want computers to parse/reason about the data within. Cryptographically sign the data, so that people can verify that their copy hasn't been modified by some prankster, and make it available for download!
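To make the "verify your copy" part concrete, here is a small sketch. Note that it is a simpler stand-in, not a real digital signature: it only compares a SHA-256 digest against one the library would publish, using Python's standard hashlib. A true signature would need a public-key tool on top of this. The sample text and the function names are made up for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def copy_is_intact(data: bytes, published_digest: str) -> bool:
    """True if this copy hashes to the digest the library published."""
    return sha256_hex(data) == published_digest


# Hypothetical example: the library publishes the text and its digest.
original = "Call me Ishmael.".encode("utf-8")
published = sha256_hex(original)

# A prankster changes one word in a mirrored copy.
tampered = "Call me Isabel.".encode("utf-8")

print(copy_is_intact(original, published))  # the untouched copy
print(copy_is_intact(tampered, published))  # the modified copy
```

The catch is that the digest itself has to come from somewhere you trust, which is exactly the problem a proper signature solves.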
No matter how self-descriptive you make the bitstream, how would you suggest to preserve it against a catastrophe at the Physical level?
[0] That's another really annoying thing. The word "blue" has an E!! Damned marketing departments...
The thinking is that misspellings create more distinctive trademarks, and governments deem more distinctive trademarks worthy of a larger scope of exclusive rights.
I'm a relatively successful independent author. Part of my success is due to the fact that if you want my content, the easiest way to get it is by buying it from me. Take away my control over my content and I'll make less profit, and ultimately produce less content.
Let's see. You produce less content, which I don't read anyway, because I don't pay for books. Maybe now that the content you've already created is free, I can start reading it. Meanwhile, with your new free time, you get a job fixing potholes or curing cancer, or installing operating systems, or whatever else it is you can do.
Ya know what, sounds good to me.
Barnes and Noble has a right to make money without having to compete with government-subsidized pseudo-businesses.
Copyright itself is a government subsidy. Doesn't that make The Walt Disney Company into what you call a pseudo-business?
If you want to OWN the book, go to a bookstore.
And if neither Barnes & Noble nor BN.com carries the title I want, then what?
So how much space would that be, in LOCs?
Settling scores with Disney this way does evil to most other copyright holders... two wrongs don't make a right.
Even if suddenly putting things into the public domain isn't acceptable, wouldn't phasing out the scope of exclusive rights on a sliding scale over the term of a copyright limit the damage?
The Disney problem should be addressed separately, and through laws.
Except who both makes laws and is not part of the problem?
More specifically, this is probably developers who have signed up for their (EULA-laden) API.
Not very much incentive for us developers to "plunder information on [Amazon's site for our] own ends".
You've seen Them boast about creating JOBS when they're really just creating WORK. This project creates jobs and REDUCES work. Very nice.
A remarkably good proposal. =)
"Forgive us our trespasses, as we forgive those who trespass against us." -Jesus Christ The Lord's Prayer
Consider if your grandfather, for instance, had traded in corn contracts that matured in 70 years (not very likely, but just for example)
So in your analogy, where do you get the federal government repeatedly extending the term of futures from 50 to 70 to 90 years?
And then when Europe goes to life plus 90 in the 2010s, then what will happen? You get a leapfrogging effect.
In order to sue an alleged copyright infringer in the United States, you have to register the copyright in the work in question first, as part of the procedure for establishing evidence of copyright ownership. Thus, the LoC has a copy of every work whose copyright has been enforced.
Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.
If you go to the LOC's site, you'll notice American Memory on the front page.
American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.
This idiot's proposal in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. That's 100 KB to 5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.
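Here is the text-only arithmetic from the figures above, as a quick sketch. It assumes one byte per character (plain ASCII) and ignores images entirely, so it is a floor, not an estimate of what archival scans would take.

```python
# Plain-text size range for the book collection, one byte per character.
BOOKS = 26_000_000
MIN_CHARS_PER_BOOK = 100_000
MAX_CHARS_PER_BOOK = 5_000_000
BYTES_PER_CHAR = 1  # ASSUMPTION: plain ASCII, no markup, no images

low_tb = BOOKS * MIN_CHARS_PER_BOOK * BYTES_PER_CHAR / 10**12
high_tb = BOOKS * MAX_CHARS_PER_BOOK * BYTES_PER_CHAR / 10**12

print(f"Text only: between {low_tb:.1f} TB and {high_tb:.0f} TB")
```

Even the low end comes out above 1 TB, and the high end is over a hundred times that, before a single page image is stored.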
Death and danger are my various breads and various butters.
I'm not convinced that OCR quality is good enough today to store the books as ASCII text. You're going to be doing a lot of work making the scan
A lot of work by Distributed Proofreaders?
Yeah, just what we need to keep the spiral of information addiction we all have going.
We're all addicted to air, water, and food, but nobody complains that those addictions are always unhealthy.
(Frankly, I'm more concerned about my 18 month old cousin's addiction to Winnie the Pooh videos.)
Should we give a monopoly to one soft drink company? One car manufacturer? They all have to fight with each other and deal with complexity to make money. Yet they all are still able to prosper. The free market gives a fair due by guaranteeing an optimal societal and supplier surplus. And the amount of money earned decides if such a company should exist or not. In the free market, the price is a combination of the supplier's willingness to sell and the consumer's valuation of the product. What is more fair than these two forces working together to set a price? To act like copyright is the only exception and should be elevated is your overvaluation of copyrighted works, which merely overprices them for the rest of us, decreasing consumer surplus. Unless you can actually *prove* that copyright is an exception and is worth more in external effects than the free market accounts for, I don't see why leaving copyrighted works companies/people to find a complex system to sustain themselves is a bad thing.
Eurohacker European paranoia, gun rights, and h
The 3 petabyte one is way inflated. 2 petabytes of that is 3.5M sound recordings. Doing a little math, they're assuming that each of those 3.5 million recordings takes up 600MB. I guess they consider each sound recording to be a full CD. First, most audio CDs don't fill up the entire 600 MB. Second, most of those recordings probably aren't entire CDs. Third, you can compress audio very well. Even losslessly, you can compress AIFF files at least 2x.
I'd say the upper bound on the Library of Congress is about 1 petabyte.
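The audio figures can be checked the same way. This sketch uses only the numbers quoted above (3.5 million recordings, 600 MB each, 2x lossless compression); the real average size per recording is unknown and probably smaller.

```python
# Sanity check on the sound-recording share of the 3 PB estimate.
RECORDINGS = 3_500_000
BYTES_PER_RECORDING = 600 * 10**6  # a full audio CD, as the estimate seems to assume
LOSSLESS_RATIO = 2                 # "at least 2x" lossless compression

raw_pb = RECORDINGS * BYTES_PER_RECORDING / 10**15
compressed_pb = raw_pb / LOSSLESS_RATIO

print(f"Uncompressed audio: {raw_pb:.2f} PB")
print(f"After 2x lossless:  {compressed_pb:.2f} PB")
```

So the 2 PB audio figure does match 600 MB per recording, and lossless compression alone halves it, before accounting for recordings that are shorter than a full CD.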
Do this the cheap way.
Give infinite monkeys access to the internet, and allow them to type in documents.
Eventually, you'll have every item in the library of congress at your disposal, and searchable via pigeonrank.
Hazzah!
To break it down more explicitly (I am agreeing with you but am warning that the devil may be in the details) you would need:
With enough (massive) redundancy maybe a future Alexandria-style fire event can be avoided. It may be cheaper in the long run just to produce quality products with the redundancy built in to the technology, but the current distribution infrastructure seems to favor only the survival of commodity vendors.
Either way, it would be a great great boon to research to put so much wisdom at everyone's fingertips (now if only we can get Congress itself to use it...)
Wouldn't one of the chief advantages of scanning it be OCR'ing it, and then being able to translate it using software assistance?
It sounds like anyone could benefit from this.
$260 million is roughly $1 per US citizen. A bargain if ever there was one. I suspect that this estimate is extremely low.
The hard part is, of course, proofreading. See distributed proofreading at http://www.pgdp.net/c/default.php
Let's get started on the out-of-copyright stuff NOW. Maybe by the time it is online, people will see the benefit of making everything available.
Thank You Kindly.
project gutenberg.
i like their audio books burn to cd and listen while i drive.
Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.
Ever been an artist? The fact of the matter is that most artists don't get paid, or are very poorly paid, now. It sucks but IP laws haven't made many artists rich and haven't even made very many a decent living. The system is already dysfunctional before you even start considering what technology is going to do with it.
The only thing we can do to make it better is convince people that if they like an artist and want them to keep producing then they need to make an effort to support that artist. It's a social issue and not a technical issue and not something that can easily be forced by passing laws. If you're an average person then pick out those artists you like and donate money to them. If you're rich then sponsor artists to create new works of art. Possibly a not-for-profit organization that takes donations for artists in general and uses that to fund young artists would be a good idea. Both consumers and well off artists could donate towards sponsoring new generations of artists.
The same issue exists for programmers, whom I consider a type of artist, in that we often are not well paid for our work. Especially those of us that give away our code for the public good could use more support from our users. If you use an opensource program you should consider donating to the developers. If you see a developer that looks promising you should consider donating to them. Pick one project a month and donate $10 to them. If even a fraction of the users of opensource software did this then there would be much more, and higher quality, opensource code available.
I imagine the same idea works for supporting the providers of any free service. Websites, etc.
It's the honor system. You can copy anything you want but if you continue to use it then you should make a donation. Yes, you can be cheap and not make your donation but by doing so you're hurting yourself too. Give what you can afford and convince others to do likewise.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
IANAL, but if I recall correctly you must register copyright in the US to sue in the US. However if you have registered copyright in any other (Berne) country you can register in the US based on the date of that.
You automatically have a copyright when you create something. If someone copies it you can sue them for damages, but you must register it with the copyright office first ($25 last I checked). If you register it before the violation, then you can sue for triple damages, even if you only registered in some other country. You still have to register the copyright in the US to sue, but having the copyright elsewhere counts as registering it before the violation.
Could someone explain what exactly is in this library of congress? Is it just a big library of stuff?
Based solely on the name, I would infer that it would contain a lot of US historical documents, government stuff, and what not. In which case, scanning would NOT "benefit" society as a whole. Perhaps US society. Not the rest of the world. Nobody else would really care.
Recently someone mentioned to me that it is possible to put hundreds of books on a CD using DjVu. It looks like DjVu is the MP3 of books!
Take a look at http://www.djvuzone.org/
The curious thing is that there is great support for it under Linux and KDE in particular.
Woo hoo - I finally got my cherry popped and was modded down as a troll. I wonder if the modder recognizes the irony of branding "troll" on a post that was branding the original post a troll...
Slashdot comments... splitting hairs since 1997.
I do similar work on military tech manuals, and believe me, they've way underestimated the labor part. They'll never make it, unless the entire city of Bangalore decides to go for 17 cents an hour, then maybe.
A couple of years ago the Harvard University Centre for Astronomy had one of its collections of technical publications scanned in order to be put online. But to make the material actually usable they had to launch a program over the net for volunteers (predominantly amateur astronomers) to view the scanned pages and enter, by hand, the necessary bibliographical information (authors, paper titles, etc), as well as to QC things (look for duplicated pages, missing pages, work out which of several scans of fold-out drawings is the best image, etc).
The scanning step was trivial (probably lots of bored students on minimum wages, getting brownie points from their professors); the INDEXING process has been going on for over 2 years now and is not yet finished.
NASA ADS at SAO: Historical scans currently in the ADS
Birds are not dinosaur descendants; birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
He uploads it to the CIC database--the Library, formerly the Library of Congress, but no one calls it that anymore. Most people are not entirely clear on what the word "congress" means. And even the word "library" is getting hazy. It used to be a place full of books, mostly old ones. Then they began to include videotapes, records, and magazines. Then all of the information got converted into machine-readable form, which is to say, ones and zeroes. And as the number of media grew, the material became more up to date, and the methods for searching the Library became more and more sophisticated, it approached the point where there was no substantive difference between the Library of Congress and the Central Intelligence Agency. Fortuitously, this happened just as the government was falling apart anyway. So they merged and kicked out a big fat stock offering.
--Neal Stephenson, Snow Crash
Hey, you try to find an open nick these days!
While we are at it, let's scale back the copyright limits back to life of creator + 20 years (or even farther back as far as I'm concerned), and bring back more of the booty which the corporations have plundered from us, the public.
I hear that the LoC has one of the best Playboy collections in the world! This will put playboy.com right out of biz!
I've started a fund raising co-op at http://www.ideacradle.com/givesupport_virtual.php?currentIdea=11/
I concede this is a long shot but we will see.
I'm building co-operatives right now at http://www.ideacradl
While I can appreciate some of what the Gates Foundation does, the majority of those three items (immunization, AIDS Research, and anti-poverty work) is far more about opening India and China as markets and sources of cheap labor, than it is about actual philanthropy. It's a clever thing to do with the foundation to look like Bill is helping people when he's really just building a bigger user base for Microsoft. But then again, Bill's object and purpose in life isn't to be a billionaire- and he's not going to be leaving his children with anything other than a legacy and maybe a $100,000 loan (or the equivalent in 2030 dollars to his 1970 dollars that started Microsoft) to start their own legacies- the money is beside the point for him. While I don't respect what he's done or his own technological ability- I do respect him for his REAL purpose behind Microsoft; a computer on every desk running an operating system that is as easy for the end user as a TV set. The billions? They're just what comes from realizing that dream in a totally unethical way.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
Considering all things, I think the dollar amount is too low...way too low.
I do like the idea, I can just get a very good visual of it running 2.5-3 times above the projected budget.