Proposal: Put Library of Congress' Contents Online
Mark_Uplanguage writes "The idea to scan in all materials available at the U.S. Library of Congress was presented at the Web 2.0 conference this week (as just one of many ideas presented). The proposed cost of $260 million would create a huge benefit to society (well, at least to those who can read English)."
Pardon me for sounding like an eegnoramoose, but isn't at least some of the material in the Library of Congress copyrighted material? Putting it all online would let people get copies of it for *gasp* FREE.
Can't have that, now can we?
This would violate the publishers' god-given right to milk their "creations" until the heat-death of the Universe.
and to those who can't, they can copy and paste the text into a translator.
So yes, it would benefit society as a whole.
Grump.
Is it true that more people vote for the winner of American Idol, than vote for the president? -Ali G.
wanted to do something really important and contributive, he would fund this.
a Library of Congress jokes will be on topic.
How much data storage would this require? Could someone give it to me in layman's terms?
Since Congress and the President can so easily pull out a hundred billion dollars to bomb the hell out of another country, I see no reason we can't come up with a wimpy $260 million for something as worthwhile as this.
I want a new quote. One that won't spill. One that don't cost too much. Or come in a pill.
...just to put the whole dictionary online. Figure it this way: you can use it to make all the other books out of it, and it gets around the whole copyright thingie, too (if you use a sufficiently old dictionary).
Someone, please.. how much is that in LOC?
Karma: -2147483648 (Mostly affected by integer overflow)
The government has proposed recently. I would also suggest that they put in place requirements that all future material that is to be copyrighted present appropriate copies in machine readable form so this will be cheaper in the future.
If you folks want out of state donations from non-taxpayers, I'll stump up happily from Canada!
well, at least to those who can read English
Correct me if I'm wrong, but doesn't the LOC contain all materials registered with the US copyright office? In which case it would have any foreign materials registered for copyright protection.
Javascript + Nintendo DSi = DSiCade
It would probably pay for itself too since FBI agents would no longer have to travel to libraries to secretly gather records of who borrowed what. They can just use Carnivore to do it instead.
Should work... it's a lot of information, though: about one Library of Congress's worth.
The Canadian gun registry cost is exceeding 2 billion dollars and climbing - 1.9 billion of which is probably wasted on corruption, but that 260 million sounds like a lowball.
Oh well, what the hell...
Finally, Slashdot can establish that for official purposes:
1 Library of Congress = $260M
And the 2004 US Federal budget can be spec'd at 0.000243754522 LoC:s (Libraries of Congress per second).
--
make install -not war
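The LoC-per-second figure above can be reproduced under two assumptions the comment doesn't state: a round $2.0 trillion for the 2004 federal budget and a 365.25-day year. A minimal Python sketch (the budget figure is an assumption, not an official number):

```python
# Reproduce "Libraries of Congress per second" under stated assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year: 31,557,600 s (assumption)
LOC_COST_USD = 260e6                      # 1 Library of Congress = $260M, per the proposal

def locs_per_second(annual_budget_usd: float) -> float:
    """Convert an annual spending figure into Libraries of Congress per second."""
    locs_per_year = annual_budget_usd / LOC_COST_USD
    return locs_per_year / SECONDS_PER_YEAR

# Assumption: a round $2.0 trillion for the 2004 US federal budget
rate = locs_per_second(2.0e12)
print(f"{rate:.12f} LoC/s")
```

Plug in a different budget figure and the rate scales linearly.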
I'd like to see all available newspapers online. Old Scientific Americans are great. The NYC Public Library has them. Interesting to follow the building of the elevated railroads and later the subways. I did that years ago.
OK, US$260 million to scan... $60,000 of space (a terabyte) according to the article... put it online... BANDWIDTH cost estimates? Oops... forgot about that, I guess!
"Brewster Kahle's idea is to scan as many books as possible and put them online so everyone has access to that huge amount of knowledge."
The plan IS to put it online, after all...
Eureka Science News - automatically updated
Since the Library of Congress contains mostly copyrighted data, and Amazon is already doing this for profit, this is really just a good way to market Amazon's A9 search engine and the products it sells.
Huzzah! I can finally calibrate my scale to Libraries of Congress and set up a conversion factor to Bytes.
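For the conversion factor, a tiny sketch, assuming the article's estimate of 1 TB (taken here as 10**12 bytes) of image data for the whole LoC; other estimates in this thread differ wildly, so treat the constant as hypothetical:

```python
# Hypothetical unit conversion: assumes 1 LoC = 1 TB (10**12 bytes),
# the article's image-data estimate.
BYTES_PER_LOC = 10**12

def bytes_to_loc(n_bytes: int) -> float:
    """How many Libraries of Congress fit in n_bytes."""
    return n_bytes / BYTES_PER_LOC

def loc_to_bytes(n_loc: float) -> float:
    """How many bytes a given number of Libraries of Congress take up."""
    return n_loc * BYTES_PER_LOC

# A 250 GB hard drive holds a quarter of a Library of Congress
print(bytes_to_loc(250 * 10**9))
```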
How about the whole world, or at least anyone who can find an online translation service that goes from English to their local dialect?
"It's the height of ridiculousness to say for those 9 lines you get hundreds of millions."
Soon we'll see the PIAA (Publishing Industry Association of America) forming up and talking about how it's ideas like this that cause them millions a year in lost revenue. And then start in on how libraries are nothing more than book pirate hangouts....
I bet there are solutions for this, such as making that proposed LOC website accessible from public terminals within LOC premises only. Just because it's on the Internet doesn't mean you have to provide access to anyone, does it?
At long last, we shall finally know just how much one unit of Libraries of Congress is. This could quite possibly have profound effects on how we understand the universe. For example, for many years we have known that the universe is approximately 42 Libraries of Congress. Now we can fully understand its meaning.
Putting the LoC on-line is only the first step. How long before those Internet book printing stations that can create an entire book for you from an electronic image in a deciminute for $1 tap into this? I'd have to think that this would be good for everyone except B&N who are busy reprinting old classics under their own label right now.
they can copy and paste the text into a translator.
So yes, it would benefit society as a whole.
Subjecting the world to the Babelfish translator would actually detract from knowledge considering the horrible linguistic bastardizations that people would then take as fact.
You've perused the Library of Congress, but have you perused the Library of Congress Online?
I would reserve that honor for Andrew Carnegie, who basically sold his empire for $485M and spent the rest of his life giving away all his money to good causes. Bill Gates is a far cry from that so far.
Just as a point of /. interest, what is the conversion factor between ACMs (Andrew Carnegie Millions) and BGBs (Bill Gates Billions)?
I will not allow no Canuck to use a resource that many generations of hard-working Americans have paid for.
YOU LOVE US OR HATE US MAKE UP YOUR MIND
In a traditional library it's not really easy to...
1. walk in and pick up a book
2. strike the author's name from it and replace it with your own
3. replace the copyright notice with your own
4. make one thousand perfect copies
5. offer it for sale, start taking orders, and PROFIT!
...all within 30 minutes.
I could easily do that on the Internet.
This is a dumb idea. I'm happy to have copies of my books in libraries (I even donate copies to libraries). One book equals one reader at a time. A library has the same rights as you or I have to lend out *the* copy they own - not to reproduce it en masse. Having all my stuff put online (against my will) would damage the value of my content.
I hear lots of arguments from people who don't create content on why it should be free. The only content producers who seem to think it should be free seem to be ones who produce content with limited commercial value.
I'm a relatively successful independent author. Part of my success is due to the fact that if you want my content, the easiest way to get it is by buying it from me. Take away my control over my content and I'll make less profit, and ultimately produce less content. Read Atlas Shrugged to see what that leads to...
Right now, Internet2 can download the entire Library of Congress in about 20 seconds.
I'm not aware of any PIAA for publishers, but somebody is going to have a problem with this. And by the time this actually happens, I bet there will be an Internet4 that can do it all in 20ms.
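Claims like "the whole LoC in 20 seconds" only mean something relative to an assumed size. A quick sketch, assuming the article's 1 TB figure, of the sustained throughput each claim implies:

```python
def required_gbit_per_s(size_bytes: float, seconds: float) -> float:
    """Sustained throughput in gigabits per second to move size_bytes in `seconds`."""
    return size_bytes * 8 / seconds / 1e9

LOC_BYTES = 1e12  # assumption: the article's 1 TB estimate for one LoC

print(required_gbit_per_s(LOC_BYTES, 20))     # 400.0
print(required_gbit_per_s(LOC_BYTES, 0.020))  # the "Internet4 in 20 ms" joke
```

A bigger size estimate scales the required rate up proportionally.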
Punctanym: alternate spelling of words using punctuation or numerals in place of some or all of their letters; see 'leet'
"Soon we'll see the PIAA (Publishing Industry Association of America) forming up and talking about how it's ideas like this that cause them millions a year in lost revenue. And then start in on how libraries are nothing more than book pirate hangouts...."
*sigh*
Welcome to slashdot, who's busily outfoxing FOX.
The Publishers vs the Libraries territory has already been raked over. Guess who won?
Anyway, as someone else suggested, they don't have to put it on the Internet; an intranet would work.
Just like you have to go to the physical library to access certain resources.
Otherwise how will they stop me from getting every book that I want to have? The only possible way would be for publishers to not send data to the LoC.
If this is such a wonderful idea why doesn't he get a bunch of artists, musicians and writers to donate their own work to this project and actually prove the concept works?
I'm tired of all the rhetoric about business models failing and how the web is going to transform the way society learns, works, and entertains itself. The dotcom era should have taught these so-called visionaries one thing: you actually have to have a business plan before you can transform business models.
If these business models are so full of potential, he should start one with his own intellectual property and prove that the old-economy intellectual property businesses are extinct. If his ideas work, then the dinosaurs of the MPAA and RIAA will either have to adapt to the new economy or die. Forcing them to risk their entire business on a gamble like this is wrong from any perspective.
The article claims that the LOC stored as image data would take up 1 TB.
That's wildly underestimated IMO. The LOC has 26 million books. If we conservatively assume that they each have at least 100 pages, that is 2.6 billion images. That works out to roughly 400 bytes (about 0.4 KB) per image. That's some REAL good compression for an image as large as a full page of text.
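The per-image arithmetic can be checked directly. This sketch uses the comment's own numbers (26 million books, 100 pages each) and treats 1 TB as 10**12 bytes; with a binary terabyte the answer only rises to about 423 bytes:

```python
# Back-of-the-envelope check of the article's 1 TB figure.
BOOKS = 26_000_000        # LoC book count quoted in the comment
PAGES_PER_BOOK = 100      # the comment's conservative assumption
TOTAL_BYTES = 10**12      # 1 TB, decimal (the article's estimate)

images = BOOKS * PAGES_PER_BOOK
bytes_per_image = TOTAL_BYTES / images
print(f"{images:,} images, {bytes_per_image:.0f} bytes each")
```

A few hundred bytes per page image is far too small for a scanned page.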
and to those who can't, they can copy and paste the text into a translator.
With that who can't, they can reproduce and stick inside their language teacher.
D6 63 0D 70 89 81 BB 8E 7B 7C 5F 5D 54 EA AB 73
All authors seeking American copyright had to submit two copies of the work to the Library.
Authors in foreign countries don't copyright their works in America -- they copyright their books in their country, and the copyright is automatically valid in (nearly) every other country thanks to the Berne Convention.
Die of shame now, dumbass.
"This is one more reason that the whole basis behind IP law needs to be reevaluated. Although we do want authors, inventors, and other creative types to be rewarded for their efforts, it is also true that what they create becomes more valuable the more it gets out into the world."
Free the cars. Just look at how many people have seen a car? Been inspired by a car? From how-tos to "I love my car" poems.
"Creating primarily for money is shortsighted when a work has the chance to impact the larger culture. "
Copyright really isn't about one's motivation for why one creates. Copyright is about what's permissible with copies (voluntary or involuntary), i.e. someone swiping the original from me and making copies, as well as when I voluntarily make copies.
I said could understand - not necessarily write.
DAMN YOU WORD PROCESSING FOR MAKING IT SO EASY TO REVISE A STATEMENT!
Slashdot comments... splitting hairs since 1997.
"My grandpa was a farmer who died over 50 years ago. Since I don't get to collect royalties on the corn he grew in the 1930s, I've had to work to produce my own income. Imagine that."
If you get an inheritance? You effectively do.
Oh yeah, put 'em all online. I have a hard enough time already in libraries and book stores! If I could read any book I wanted to (even if they're only the ones already out of copyright) online, I'd probably not leave my computer until I passed out!!
Keeping track of millions of guns across the country is a very different proposition from just sticking however many books in an auto-scanner. The registry requires an entire force of people to talk to the owners, enter their data, check the weapons themselves.... don't they even have to physically store two test bullets from each gun? Then there's getting the data to the police all over the country, which required a far more intense data network to be put in place.
The gun registry is much more like setting UP a Library of Congress. Also, sadly, Canadian government programs usually include funding for producing television programming telling people how to use the service. Don't know about this one.
The books at the LoC are already in one place, so that's easy. They're already catalogued, so that's easy. There's already a staff of librarians going through them all the time for research purposes, so you've got access.
The biggest part of the task is just moving the books to the auto-scanners and then back. Some of it is being careful to preserve the original as it's done, but not much... most of the books are in excellent condition right now. It's a very good library.
An auto-scanner is a robotic arm that turns the pages with puffs of air and takes a photo of each. Such a machine can scan a book in almost no time.
There is of course the programming effort and such... that's why it's $260 million. Just scanning the books would probably cost a tenth of that.
Again I stress: There's no reason to suspect they're lowballing because there's already a whole organization there devoted to doing things with those books. They're old hands at this. They know how hard it will be.
This will be great! You know all those ads that claim such and such can transmit the Library of Congress in so and so seconds?
Now we'll be able to test their notions!
"The cost to duplicate a book? If digitized, less than a dollar for the time and the media. "
Since you brought it up: how much did it cost to actually bring that book, movie, or music into existence?
How do you also keep the cost of copies reasonable, while not leaving any unpaid debts behind?
How do you have enough left over for future endeavours?
How much is that in terms of Space Shuttle Fuel Tanks? What about in Weapons of Mass Destruction?
Isn't the size of the Library of Congress what people used to use as a quantifier for the speed of high-bandwidth connections? I remember several years ago that companies would brag that they can transfer the entire Library of Congress to England or wherever in less than 2 seconds and what have you. I suppose a statement like that would indicate that there are already digital versions of the Library of Congress out there somewhere meaning it will take virtually nothing dollar-wise to put it online (since I guess it's been flowing back and forth for years).
I didn't really read TFA, but how do they propose to actually do the scanning? There seem to be a lot of books in there, is there some sort of book-scanning machine that I've never heard of?
Doing all of this by hand would be insane, even if it's by a large team of volunteers. Maybe 1000 monkeys on 1000 scanners copying all the books will get it done in a few months?
"Just as a point of /. interest, what is the conversion factor between ACMs (Andrew Carnegie Millions) and BGBs (Bill Gates Billions)? "
http://www.westegg.com/inflation/
http://www.jsc.nasa.gov/bu2/inflateCPI.html
From what I could find, Carnegie donated $450M from 1898 to 1911. That would be around $11 billion today.
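Taking the $11 billion estimate above at face value, the ACM-to-BGB question reduces to one multiplier. A sketch, assuming that estimate and treating "Bill Gates Billions" as already in current dollars:

```python
CARNEGIE_NOMINAL = 450e6   # donated 1898-1911, per the comment
CARNEGIE_TODAY = 11e9      # the comment's inflation-adjusted figure (an estimate)

multiplier = CARNEGIE_TODAY / CARNEGIE_NOMINAL   # today's dollars per 1898-1911 dollar
acm_in_today_dollars = 1e6 * multiplier          # one "Andrew Carnegie Million"
bgb_in_today_dollars = 1e9                       # one "Bill Gates Billion", already current
acm_per_bgb = bgb_in_today_dollars / acm_in_today_dollars

print(f"multiplier ~{multiplier:.1f}, 1 BGB ~ {acm_per_bgb:.1f} ACM")
```

Under these assumptions one BGB comes to roughly 41 ACMs; a different inflation index shifts the multiplier.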
Maybe I'm the odd duck here, but way back in the early net days (the '90s) I thought this was such an obvious application of Internet technology that it must have been one of the original design purposes of the Internet (ARPANET and all that DARPA funding, of course).
So the only surprise to me is that we're just now hearing a proposal to do this??? Sheesh, if I hadn't thought it so completely obvious to every netizen at those old public library terminals, I would've lost so much sleep making it happen!!!
So now who's going to do it? And while it's limboing through Congress, can we just put together a consortium to visit this library we already own with our digital cameras and OCR the thing into existence... how many of us would need to donate our 1 GB Gmail accounts to store it all?
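The Gmail question is a ceiling division, so the answer depends entirely on the size estimate. A sketch comparing the article's 1 TB figure with a hypothetical 50 KB per page image (an assumed figure, not from the thread):

```python
GMAIL_QUOTA_BYTES = 10**9  # "1 GB" per account, treated as 10**9 bytes

def accounts_needed(total_bytes: int) -> int:
    """Ceiling division: a partly filled account still counts as one."""
    return -(-total_bytes // GMAIL_QUOTA_BYTES)

# Scenario A: the article's 1 TB estimate
print(accounts_needed(10**12))  # 1000
# Scenario B (hypothetical): 26 million books x 100 pages x 50 KB per page image
print(accounts_needed(26_000_000 * 100 * 50_000))  # 130000
```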
*smile* I'm going to be busy tonight. I only have 10 posting slots you know.
"Sorry for responding to myself, but I just came up with a more valid point to why copyright in general is a bad idea."
"Copyright has basically turned into charging relatively lots of money for the embodiment of an idea "
And knives have turned into tools to kill people, but no one would argue that the abuse of knives is a reason to eliminate them. So why is the abuse of copyright seen differently?
"Why? To make more money because there's an inherent small mark-up available when companies turn a good into a commodity. But, copyright is different and can enjoy gross mark-ups by comparison."
I would argue that the creation costs of copyrighted "goods" are more hidden (either out of ignorance, or simple apathy) than the creation costs of physical goods.
The main difference between the two is duplication costs, but economics dictates that both costs have to be made up.
"But you can't trade ideas or the embodiment of ideas as some sort of actually stable basis for an economy. "
Remember talk of the knowledge economy? And yes as long as certain things are true, we'll need physical goods.
"And what else does the US have to trade to other countries with except maybe agricultural goods (some of which most of Europe now has banned)?"
You're short sheeting the US. Look more carefully.
Authors in foreign countries don't "seek US copyright protection", because global copyright protection is guaranteed by the Berne Convention as soon as a work is copyrighted in any country.
And FYI, non-English works account for less than 2% of the total volume of the Library of Congress so the "well, at least to those who can read English" comment was entirely appropriate, unlike your delusional and paranoid rantings and ravings.
I just downloaded the LoC.ps.tgz from the local WPI Internet2 tap using gnutella and my printer just ran out of ink....
I'm hacking the planet!
...that to OpenOffice.org Text Format...much more compressed, and it natively uses XML.
There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
"You must mean currently. But we all know that as soon as anything major (like Steamboat Willie) comes close to coming out of copyright, we'll see Congress extend the term of copyright yet again, thanks to 'encouragement' from Disney."
So our intrepid hero is faced with two doors.
Behind door number one, our brave hero squares off against those who would try to stifle the public domain, rallying legions to his cause. Pretty maidens throw rose petals at his feet.
Behind door number two, our simpering hero goes on a popular geek site known as Slashdot and bemoans how weak he is against the MIGHTY *rolling echo* business and congressional interests, and everything is business as usual. Legions throw rotten tomatoes at our fallen hero.
Which will he pick? Place your bets.
"Copyright terms are nigh on infinite in fact, if not in law."
Actual examples will do just fine.
There's a Vulcan saying: "The needs of the many outweigh the needs of the few."
I would say, scupper copyrights for all volumes owned by the LoC. Scan and put every volume on the Internet.
Within a few years we would witness a Renaissance of sorts once again in human knowledge and education.
"Doing what i can, with what i have." ~ Burt Gummer
is if it is a billion-dollar award to Halliburton.
"I'm a software developer who loves movies: I'm a creator and a consumer, so I see both sides of this coin. And I think there needs to be a compromise between consumers and creators."
Well this is going to be my last post before turning in.
There already is a compromise, and has been for decades (copyright).
However what has brought us to our present state is individuals on BOTH sides (consumer and business) that have chosen to not honor this old agreement.
There are also innocents (much like Iraqis caught between Americans and insurgents).
The only way out is for the innocents to recognize that neither side represents their interests (extremists never do), and to drive them out. Going back to what originally worked.
Compared to the $200 billion to kill and maim tens (hundreds?) of thousands of people in the name of "terrorism", $260 million to create essentially, a Library of Alexandria is a fucking bargain.
I don't respond to AC's.
Not only the Library of Congress of the United States of America: we should also scan every big library in the world to create a pool of human work to freely share and preserve.
What's in a sig?
I want to know why the hell it would cost so much. I realize it's an incredible amount of material, but $260 million!? I'll do it for a cool $1 million no questions asked. It's not like it's hard or requires much technical expertise. Just lots of monotonous labor.
^^vv<><>BA
now, let's just take this a little over the edge and see what happens.
First, we need to get the library of congress online
Next, we will merge it with archive.org and other relevant places
Then we buy out google and add that to it also
Now that we control all of this data, we set it up so that when you use information from it, you automatically pay for it
oh, don't forget to allow people to be able to upload stuff to it and get paid if it is looked at
next, let corporations control the government
oh, wait, we already do
alright, well, we will just let the corporations arm their employees
and become independent states
now, all you have to do is make the mafia deliver pizzas
want to try some Snow Crash?
what makes you think all the contents of the library of congress are in english?
i am convinced that "/.ers" are homosexuals and imma make that my "sig"
Some of the contents are written in American "English".
If every person goes into the Library of Congress, borrows a book and scans it, the job is done! :-)
Only five? Would that mean that Klingon and 1337 are included? Would the books have to be written in that annoying overly abbreviated dialect that some AOL users speak? Sure, it is faster to write, but it takes forever to make the ASCII art with the tilde and degree marks.
Someone googles for a paragraph from your book, and comes up with two different results...
"There is more worth loving than we have strength to love." - Brian Jay Stanley
As an author, I wonder how much of your valued craft was honed by reading the work of others for education and inspiration. How many books did you buy in elementary school, or high school? Yet that's where you learned your precious language skills you now market.
Knowledge, even the limited knowledge of an author, does not exist in a vacuum. You read, you learn, you practice, then you create. You could not have done this without the beneficence of others who aren't making a dime off the education they provided you.
To unleash the vast amounts of knowledge stored up in the LOC to the world would be one of the single best things this country could do for mankind. One book, one reader my hairy ass. Why not open the floodgates so everyone can benefit?
I understand the motivation of monetary incentives, but I also know a lot of great authors who died penniless. And they were at least brave enough to sign their names to their ideas.
If we are talking about text documents in the LOC, then they should be stored as plain text (or Rich Text at the very most) to preserve the contents in a format that has been basically standard on every computer for at least the last 30 years (not counting Unicode).
Of course, I'm sure the Copyright Police would have something to say about preserving the LOC in such an open format...
"Empathise with stupidity, and you're halfway to thinking like an idiot." - Iain M. Banks
Yeah, just what we need to keep the spiral of information addiction we all have going.
Like the Wikipediaholic who reads articles to find answers to questions no one asked, we're all writhing addicts to information systems.
I would imagine a database of documents would be easier to translate than their physical counterparts.
As can be seen here at the bottom of page 1: http://www.loc.gov/fsd/fin/pdfs/fy03.pdf
the library part of the LOC costs $ 353 million annually.
So in two years max, the proposed operation could be done budget-neutral.
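The budget comparison is a single ratio. A sketch using the two figures cited here (the $260M proposal and the $353M annual library budget from the FY03 document); whether that money could actually be redirected is the commenter's assumption, not something the arithmetic shows:

```python
SCAN_COST = 260e6            # proposed one-time cost
LIBRARY_BUDGET = 353e6       # annual cost of the library part of the LOC, per the FY03 PDF

years = SCAN_COST / LIBRARY_BUDGET
print(f"{years:.2f} years of the library budget")
```

The one-time cost comes to about three-quarters of a single year's library budget.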
Anybody want to buy some spacey, sturdy storage room?
Hey, doesn't the LOC have foreign languages?? The Chicago public library sure does, so I would expect LOC to have some too.
They even have a copy of the Gutenberg Bible, don't they? It wasn't in English. Which brings me to...
Project Gutenberg! This LOC project would be wholly redundant to Project Gutenberg's work, and might save time by cooperating with them.
well, at least to those who can read English
So that leaves out most Americans. Thanks from the rest of the world!
(tongue firmly in cheek)
Screw you all! I'm off to the pub
It wasn't really such a problem in the end, since in those days they didn't have as much knowledge as we do now. I would say it is much more important to protect today's Internet than books in those days. And definitely the Renaissance stemmed from Galileo, because he broke the power of the church.
I live in Washington and often go to the LOC on Saturdays (it's closed Sundays) -- it has a large collection of books in lots of different languages -- even Esperanto and Volapuk!
WOW... from Amazon's open job page it seems most developer jobs are in Washington state. If we assume a salary of $100,000 per developer, this results in Amazon spending $6.5 billion per year, just on developers.
Or maybe this is a typo.
Of course you could do some cost cutting and move it off-shore for less than $500,000,000.00/yr. Anyway, they seem to have a lot of openings.
-- www.globaltics.net
Political discussion for a new world
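Working backwards from the Amazon numbers shows why a typo was suspected. A sketch using only the comment's assumed salary and resulting total:

```python
ASSUMED_SALARY = 100_000   # the comment's assumed salary per developer, $/yr
IMPLIED_SPEND = 6.5e9      # the comment's resulting total, $/yr

implied_openings = IMPLIED_SPEND / ASSUMED_SALARY
print(f"{implied_openings:,.0f} developer openings")  # 65,000 developer openings
```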
In this way a website can be set up to disseminate digital copies of out-of-print works, and taking down the thing should not cost much in case the thing become in-print again (a dead-tree version of this may be risky, since printing has some considerable up-front cost, but this has at least become legal, while it is probably illegal under current law), and much much fewer valuable works will get lost. As for the "authorized" publisher, if the sales have fallen so low that it has become unprofitable to keep the printer running, they should have got most of the profit they can ever gain anyway, and even if the work happens to become popular again they can still restart printing and gain most of the profit from the exclusive copyright.
I think that they are talking about developers who use Amazon's webservice.
You can do this at the French national library (see http://www.bnf.fr/pages/zNavigat/frame/accedocu.htm; yes, it's in French)....
The endowment for the Bill Gates's philanthropic foundation is currently more than $20 billion. As of 2003, he had already donated more than $5 billion, mostly to global health organizations and education causes. He has been saying for years that he plans to give eventually 95% of his wealth away. While $5 billion (so far) in 2004 dollars is still less than Carnegie's lifetime philanthropy (about 1/2, after giving effect to inflation), I wouldn't call it a "far cry", as if $5 billion is pocket change and Gates doesn't still have many more years to donate his money.
Hang out the flags -- this is a brilliant project. It would be a huge benefit to society, even if works still in copyright were not made freely available. And the benefits are not limited to the English-speaking world, just as that world has benefited from plenty of material not written in English originally.
I hope the Library of Congress is already scanning all books where there is reason to suspect they have the only copy. There are plenty more where this is our best hope of being able to read the contents in a reasonable timeframe. There are millions of books which are not 'rare' but for which this will provide the most convenient form of access. I hope they work out that cooperation with Project Gutenberg should multiply the good effects of both projects.
A good companion project would be a campaign to honor those authors who voluntarily put their works into the public domain before the last tendrils of copyright law relinquish their grip.
It is good that the UK's equivalent, the British Library is involved in a project which will preserve and copy endangered archives around the world. The budget is not on the same scale though! http://www.bl.uk/cgi-bin/press.cgi?story=1418 (and the story mentions that the charity part-funding the programme is also trying to preserve endangered languages).
No, we can't... it wouldn't be fair to lots of people whose copyrights haven't yet lapsed.
Let us scan only things for which the copyright has lapsed. This has several advantages.
"We reject as false the choice between our safety and our ideals." --The American President (20.1.2009)
The French National Library has been scanning its archives (only books whose copyright has expired) and now has 70,000 books and 80,000 images scanned and available in PDF and TIFF formats for free download. The homepage for their "Bibliothèque numérique" is http://gallica.bnf.fr/
Some Jules Verne anyone?
What a cool idea and, even "if" the dollar estimate is too low, who cares? $260M is chump change for our gov't.
Right now, the only way to access the stuff in the LoC is to go there in person. Anyone can do it, but you have to travel to Washington, DC and pass through security and so forth to get into the LoC public reading room. Then you have to ask the librarian to pretty-please bring you the book that you want.
Now imagine that you can access any item in the LoC by simply entering the building and using a public kiosk with a browser. LoC's software would only permit use within the copyright so that is OK. But you don't have to mess with as much security because LoC isn't handing over the physical book.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
My opinion... skip the buy on the next couple of cruise missiles and digitize LoC's books instead.
Oh yeah, before I forget, the LoC already has tons of seriously neat stuff online. My favorite is this collection of photos from Russia. These were taken between about 1907 and 1915! I don't know about you, but I never dreamed that I would see color photos that are almost 100 years old.
Cheers,
-- Art Z.
Now imagine that, from any web browser, you can access any book in the LoC for which the copyright has expired. I like that idea!
That's the idea of Project Gutenberg. It's been around for quite some time now, and everybody is free to join their distributed proofreading network!
cpghost at Cordula's Web.
I walk to the librarian and pay the purchase price. She fires up a local print run on the library's new laser book printer
I keep seeing stuff about print-on-demand coming to bookstores some time in the not-too-distant future (but it never seems to get here). Using LC as a source for at least some books (public domain, out of print) would be a nice extension. UMI already prints copies of theses on demand-- if you want to order a copy of a PhD thesis, you give them a credit card number and they shoot you a freshly printed and perfect-bound copy of the thesis in the mail.
"Side note: this was back when copyright lasted 14 years, and *could* be extended another 14. But that was it. None of this "milking the work of your great-great-grandfather nonsense.""
Another side note: life expectancy was lower back then. 28 years was effectively one's life.
I used to work in the music department of the Library of Congress, and there was already scanning going on. In fact, there were about three terabytes of books all ready to go up on the web, and pictures of a collection of over a thousand flutes kept at the library were also being prepared to go up. They also already have a large number of audio files up for grabs: http://memory.loc.gov/ammem/browse/ListSome.php?category=Performing%20Arts,%20Music
As well, on the topic of how many books actually get in to the LOC, the copyright office is in the same building as one of the three LOC buildings, so it's pretty easy for stuff to make its way there.
-Julius
As usual technology isn't the problem (Echelon), but human nature. Would we really be having this long drawn out discussion if everyone played by the rules?
"The people that will be able to figure out what the _real_ answers are to these issues are the ones that will do really well. Think about it. "
How to bend human nature away from our baser side? Truly a Herculean task. Look at how well we've done elsewhere (the drug war).
www.lib.ru is a project started as a hobby by Maxim Moshkow in 1994. Today it's a comprehensive online library containing over 20,000 text and 37,000 other files. Uzbek. (they did have some issues with copyrights!)
Once it's up, just put it online and let Google cache it!
Compare the complexity of all the schemes to avoid paying what's due (ad-this, no-ad-that, e-paper, etc.) versus the simplicity of the present system that works.
According to the LOC website, they have 119 million items in the library.
They tell us that there are:
4.5 million maps.
14 million 'images'.
...so I guess we assume the rest are books and newspapers.
So in round numbers, let's say there are 50 million books and 50 million newspapers, periodicals, comic books, etc.
$260 million to scan all that stuff? $2.60 per book or newspaper? That seems a little unlikely. The book would have to be carried off the shelf to the scanning machine, mounted in the machine (which would clearly have to turn the pages and scan and index them 100% automatically), the title and such would probably have to be typed in manually, then the book carried back to the shelf and placed back in the correct place.
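The arithmetic behind the $2.60 figure can be checked with a short Python sketch. The counts are the ones quoted from the LOC website above; treating everything that is not a map or an image as books and serials is the comment's own simplifying assumption, and the item total is rounded to 100 million as in the comment.

```python
# Back-of-envelope check of the per-item scanning budget.
BUDGET_USD = 260e6  # proposed budget from the article

# Item counts quoted from the LOC website, in millions.
ITEMS_MILLIONS = {"total": 119.0, "maps": 4.5, "images": 14.0}


def text_items(counts):
    """Items left after maps and images; assumed to be books, newspapers, etc."""
    return (counts["total"] - counts["maps"] - counts["images"]) * 1e6


def budget_per_item(budget, n_items):
    """Dollars available per item if the whole budget goes to scanning."""
    return budget / n_items


remaining = text_items(ITEMS_MILLIONS)
print(f"{remaining / 1e6:.1f} million text items")
print(f"${budget_per_item(BUDGET_USD, 100e6):.2f} per item at 100 million items")
```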
I find it hard to believe that a machine for scanning newspapers could be devised that could turn the pages automatically... but even without that, the project is still possible. At minimum wage, you'd need people who could scan a complete newspaper in maybe 20 minutes.
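A rough sketch of the labor side, in the same back-of-envelope spirit. The $5.15/hour wage is an assumption (the US federal minimum wage around the time of this discussion), the 20 minutes per newspaper is the commenter's guess, and the 50 million newspapers come from the round-number split above. It counts wages only, not benefits, equipment or supervision.

```python
# Wages only: no benefits, equipment, supervision or page-turning machines.
MIN_WAGE_PER_HOUR = 5.15    # assumption: US federal minimum wage, circa 2004
MINUTES_PER_NEWSPAPER = 20  # the commenter's guess
NEWSPAPERS = 50e6           # round-number split from the comment
BUDGET_PER_ITEM = 2.60      # $260 million / 100 million items


def labor_cost(minutes, wage_per_hour):
    """Dollar cost of `minutes` of work at an hourly wage."""
    return wage_per_hour * minutes / 60.0


per_paper = labor_cost(MINUTES_PER_NEWSPAPER, MIN_WAGE_PER_HOUR)
total = per_paper * NEWSPAPERS
# How many minutes of minimum-wage labor the per-item budget would buy.
break_even_minutes = BUDGET_PER_ITEM / MIN_WAGE_PER_HOUR * 60

print(f"${per_paper:.2f} per newspaper, about ${total / 1e6:.0f} million in total")
print(f"${BUDGET_PER_ITEM:.2f} buys about {break_even_minutes:.0f} minutes of labor")
```

By this reckoning, 20 minutes per paper fits inside the per-item budget with roughly a third left over for everything else.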
Then some significant fraction of the collection would probably be too fragile for the automatic page turning machines...the cost of hand-scanning those would be FAR more than the bulk of the books. Some books would be *so* fragile and valuable that scanning them would be a considerable expense.
Then there is the cost of the storage media. Suppose those 100 million books and newspapers had just 100 pages each on average. To get a readable image of the page you're going to need to scan at maybe 2000 x 2000 resolution. So we'll have something like 4 x 10^16 pixels; let's be generous and allow 100:1 compression ratios and one byte per pixel. That comes to about 400 terabytes; round it up to 1000 terabytes to be safe. That's a lot, but to put it in context, it's at most about a fifth of the amount that Google is estimated to have in their main cluster. Google spent $250 mil to buy that, so maybe only 20% of the LOC's budget needs to be for storage.
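Multiplying out the storage assumptions above (100 million items, 100 pages each, 2000 x 2000 pixels per page, one byte per pixel, 100:1 compression) gives roughly 4 x 10^16 pixels and about 400 decimal terabytes, so 1000 TB is a comfortable upper bound. A minimal Python check:

```python
# Storage estimate using the assumptions stated in the comment.
ITEMS = 100e6          # books + newspapers
PAGES_PER_ITEM = 100   # average pages per item
WIDTH = HEIGHT = 2000  # scan resolution in pixels per side
BYTES_PER_PIXEL = 1    # 8-bit grayscale
COMPRESSION = 100      # the "generous" 100:1 ratio

pixels = ITEMS * PAGES_PER_ITEM * WIDTH * HEIGHT
stored_bytes = pixels * BYTES_PER_PIXEL / COMPRESSION
terabytes = stored_bytes / 1e12  # decimal terabytes

print(f"{pixels:.1e} pixels -> {terabytes:.0f} TB")
```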
OCR'ing and indexing all that data would be an incredibly valuable thing - the extra storage is trivial and the cost can be low if you aren't in a hurry to get the project done. Just stick a few thousand PCs in a room and wait!
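To put a number on "just wait", here is a sketch of the wall-clock time. The 10 seconds of OCR per page is a hypothetical figure, not from the comment; the page count reuses the 100 million items x 100 pages estimate above, and 3000 machines stands in for the few thousand PCs. Pages are independent, so the work divides evenly.

```python
PAGES = 100e6 * 100      # items x average pages per item
SECONDS_PER_PAGE = 10.0  # hypothetical OCR time per page on one PC
MACHINES = 3000          # "a few thousand" PCs


def wall_clock_days(pages, sec_per_page, machines):
    """Days to OCR every page when pages are split evenly across machines."""
    return pages * sec_per_page / machines / 86400


print(f"about {wall_clock_days(PAGES, SECONDS_PER_PAGE, MACHINES):.0f} days")
```

Doubling the machine count halves the time, which is why the comment treats speed as a matter of patience rather than cost.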
Dunno - $260 mil sounds like a low end estimate to me - but it seems do-able.
www.sjbaker.org
Some wild assumptions flying around here. Even if the LOC could get funding for the project, and even if the publishers did not tie it up in the courts for decades, there is still the questionable assumption everyone seems to be making -- that LOC would make the digitized texts available free. There is no reason why they should be expected to do that. In fact, they would most likely have to charge for use of copyrighted materials. The fee would necessarily include some sort of negotiated reimbursement to the copyright owner/publisher. Otherwise, publishers would just stop contributing their books to the LOC.
But, there's an even bigger fantasy involved. Does anyone really think that the right-wing protectors of our morality who are running our government would stand by and allow a government agency like the LOC to spend tax dollars scanning gothic novels, adult literature, subversive tracts, revolutionary polemics, treatises on abortion rights, non-christian religious texts, pagan and satanic epistles, books critical of the administration, etc, and making them available to the country at large? The Shrub would burst into a Burning Bush instantaneously at the idea.
The only possibility is digitization of "suitable" and "defensible" public domain items, which is already under way in piecemeal fashion.
What most people don't know is that other countries send their material here for safekeeping.
Say what one will, the US is relatively stable compared to some other places.
> If this is such a wonderful idea why doesn't he
> get a bunch of artists, musicians and writers to
> donate their own work to this project and actually
> prove the concept works?
It's been done, and the idea worked -- the United States convinced a bunch of artists, musicians and writers to donate their own works to the public domain AFTER A PERIOD OF TIME, THE COPY-RIGHT period.
Well, perhaps not convinced -- but promised and protected the copy-right, so artists, musicians and writers were encouraged to work and publish in the US.
Of course during the same first century or so, the US completely ignored the copyrights of other countries -- arguing it's the right of a poor developing nation to take whatever it needs of the intellectual source material to assure its survival. Only the large, established nations honor copy-rights, historically speaking.
*sigh* I posted before I saw this...
You'd transcribe the data into plain old ASCII, or perhaps UTF-8 if you wanted to preserve the original characters. Maybe make a version available that's marked up in XML if you want computers to parse/reason about the data within. Cryptographically sign the data, so that people can verify that their copy hasn't been modified by some prankster, and make it available for download!
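A minimal Python sketch of the transcribe-markup-verify pipeline, using only the standard library. The <book>/<title>/<body> element names are hypothetical. Note that a bare SHA-256 digest only proves a copy is unmodified if the digest itself comes from a trusted source; a true cryptographic signature would additionally sign that digest with the publisher's private key (for example with GnuPG), which is not shown here.

```python
import hashlib
import xml.etree.ElementTree as ET


def make_record(title: str, text: str) -> bytes:
    """Wrap a transcribed text in a minimal XML record, encoded as UTF-8.

    The <book>/<title>/<body> element names are hypothetical, not a LOC schema.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "body").text = text
    return ET.tostring(book, encoding="utf-8")


def digest(data: bytes) -> str:
    """SHA-256 fingerprint that a library could publish next to the file."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, published_digest: str) -> bool:
    """True if a downloaded copy matches the published fingerprint."""
    return digest(data) == published_digest


record = make_record("Война и мир", "Eh bien, mon prince. Gênes et Lucques...")
published = digest(record)
print(verify(record, published))                                   # unmodified copy
print(verify(record.replace(b"prince", b"prankster"), published))  # tampered copy
```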
No matter how self-descriptive you make the bitstream, how would you suggest to preserve it against a catastrophe at the Physical level?
[0] That's another really annoying thing. The word "blue" has an E!! Damned marketing departments...
The thinking is that misspellings create more distinctive trademarks, and governments deem more distinctive trademarks worthy of a larger scope of exclusive rights.
Barnes and Noble has a right to make money without having to compete with government-subsidized pseudo-businesses.
Copyright itself is a government subsidy. Doesn't that make The Walt Disney Company into what you call a pseudo-business?
If you want to OWN the book, go to a bookstore.
And if neither Barnes & Noble nor BN.com carries the title I want, then what?
So how much space would that be, in LOCs?
Settling scores with Disney this way does evil to most other copyright holders... two wrongs don't make a right.
Even if suddenly putting things into the public domain isn't acceptable, wouldn't phasing out the scope of exclusive rights on a sliding scale over the term of a copyright limit the damage?
The Disney problem should be addressed separately, and through laws.
Except who both makes laws and is not part of the problem?
More specifically, this is probably developers who have signed up for their (EULA-laden) API.
Not very much incentive for us developers to "plunder information on [Amazon's site for our] own ends".
You've seen Them boast about creating JOBS when they're really just creating WORK. This project creates jobs and REDUCES work. Very nice.
A remarkably good proposal. =)
"Forgive us our trespasses, as we forgive those who trespass against us." -Jesus Christ The Lord's Prayer
And then when Europe goes to life plus 90 in the 2010s, then what will happen? You get a leapfrogging effect.
In order to sue an alleged copyright infringer in the United States, you have to register the copyright in the work in question first, as part of the procedure for establishing evidence of copyright ownership. Thus, the LoC has a copy of every work whose copyright has been enforced.
Of course this instantly deteriorates into a discussion about the shameful state of IP and copyright laws, the need to pool all human knowledge, and how crappy the US budget deficit is.
If you go to the LOC's site, you'll notice American Memory on the front page.
American Memory is where you can get a good portion of the public domain stuff (books, letters from immigrants to their families back home, photos of civil war enlistees, audio, Edison-era short movies) for free in a low-quality format. Archival quality copies and custom scans/recordings are available for $$$. Almost any work in the LOC can be scanned on request (3 week waiting time or so); this is how they manage to continue adding scans to their collection without requiring public or private funding. It's underfunded as it is and needs more bandwidth.
The proposal from this idiot in the article is completely unrealistic. Books can contain 100,000 to 5,000,000 characters. At one byte per character, that's 100 KB-5 MB per book, times 26,000,000 books. That's not including the images and illustrations in some of these works. Many of the texts have value beyond the words they contain. We may be talking about image scanning the pages to preserve the look of the type, paper, and images. Archival TIFFs, since that's what the LOC uses.
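The comment's plain-text range works out as follows, assuming one byte per character (true for ASCII; UTF-8 needs two or more bytes for non-Latin scripts). Even before any page images, the text alone runs from a few terabytes to over a hundred:

```python
BOOKS = 26_000_000
MIN_CHARS, MAX_CHARS = 100_000, 5_000_000  # characters per book, from the comment
BYTES_PER_CHAR = 1                         # assumption: plain ASCII text

low_tb = BOOKS * MIN_CHARS * BYTES_PER_CHAR / 1e12   # decimal terabytes
high_tb = BOOKS * MAX_CHARS * BYTES_PER_CHAR / 1e12

print(f"plain text alone: {low_tb:.1f} to {high_tb:.0f} TB")
```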
The article also mentions $60 thousand to 'store' this data (per month?, per year?, just once???, what about access?, searching?, redundant backups?). Another unrealistic number, even working off of the 1TB estimate.
Death and danger are my various breads and various butters.
I'm not convinced that OCR quality is good enough today to store the books as ASCII text. You're going to be doing a lot of work making the scan
A lot of work by Distributed Proofreaders?
Yeah, just what we need to keep the spiral of information addiction we all have going.
We're all addicted to air, water, and food, but nobody complains that those addictions are always unhealthy.
(Frankly, I'm more concerned about my 18 month old cousin's addiction to Winnie the Pooh videos.)
The 3 petabyte one is way inflated. 2 petabytes of that is 3.5M sound recordings. Doing a little math, they're assuming that each of those 3.5 million recordings takes up 600MB. I guess they consider each sound recording to be a full CD. First, most audio CDs don't fill up the entire 600 MB. Second, most of those recordings probably aren't entire CDs. Third, you can compress audio very well. Even losslessly, you can compress AIFF files at least 2x.
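The per-recording figure the comment derives can be reproduced directly; the 2x lossless ratio is the comment's own rough figure for AIFF, used here as an assumption:

```python
AUDIO_BYTES = 2e15      # the 2 PB attributed to sound recordings
RECORDINGS = 3.5e6      # number of sound recordings
LOSSLESS_RATIO = 2.0    # assumption: the comment's "at least 2x" for AIFF

per_recording_mb = AUDIO_BYTES / RECORDINGS / 1e6  # decimal megabytes
compressed_pb = AUDIO_BYTES / LOSSLESS_RATIO / 1e15

print(f"{per_recording_mb:.0f} MB per recording; {compressed_pb:.1f} PB compressed")
```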
I'd say the upper bound on the Library of Congress is about 1 petabyte.
Do this the cheap way.
Give infinite monkeys access to the internet, and allow them to type in documents.
Eventually, you'll have every item in the library of congress at your disposal, and searchable via pigeonrank.
Hazzah!
To break it down more explicitly (I am agreeing with you but am warning that the devil may be in the details) you would need:
With enough (massive) redundancy, maybe a future Alexandria-style fire event can be avoided. It may be cheaper in the long run just to produce quality products with the redundancy built in to the technology, but the current distribution infrastructure seems to favor only the survival of commodity vendors.
Either way, it would be a great, great boon to research to put so much wisdom at everyone's fingertips (now if only we could get Congress itself to use it...)
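One simple reading of "massive redundancy" is keeping independent copies at several sites and repairing a damaged copy from the ones that still agree. The sketch below is hypothetical (the restore function and the strict-majority rule are illustrative, not any real archive's protocol): each replica is fingerprinted with SHA-256 and the content held by a strict majority wins.

```python
import hashlib
from collections import Counter


def restore(copies):
    """Return the content held by a strict majority of replicas, else None.

    A corrupted copy is outvoted by intact ones; a site lost entirely
    simply contributes no copy to the list.
    """
    if not copies:
        return None
    digests = [hashlib.sha256(c).hexdigest() for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes <= len(copies) // 2:
        return None  # no strict majority: cannot tell which copy is intact
    return copies[digests.index(winner)]


sites = [b"Call me Ishmael.", b"Call me Ishmael.", b"Call me Ishma\x00l."]
print(restore(sites))
```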
Wouldn't one of the chief advantages of scanning it be OCR'ing it, and then being able to translate it using software assistance?
It sounds like anyone could benefit from this.
$260 million is $1 per US citizen. A bargain if ever there was one. I suspect that this estimate is extremely low.
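The per-person figure, assuming a US population of roughly 293 million around 2004 (an approximation, not from the comment), comes out a little under a dollar:

```python
BUDGET_USD = 260e6
US_POPULATION = 293e6  # assumption: approximate US population, circa 2004

per_person = BUDGET_USD / US_POPULATION
print(f"${per_person:.2f} per person")
```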
The hard part is, of course, proofreading. See distributed proofreading at http://www.pgdp.net/c/default.php
Let's get started on the out-of-copyright stuff NOW. Maybe by the time it's online, people will see the benefit of making everything available.
Thank You Kindly.
Project Gutenberg.
I like their audio books: burn them to CD and listen while I drive.
The copyright/IP issues with this are huge and significant. It is simply a bad idea to unleash all of this material without addressing this issue. That said I think I have a good idea for dealing with this problem and addressing another major issue as well, the deteriorating state of public libraries.
First, we spend whatever it takes to get all this information online. Then we make it accessible ONLY at libraries via a secure library network. You must visit a terminal at a public library (and have a library card) in order to access books over the LOC web. This way the information is made widely available but the number of people accessing is still limited. Perhaps 50k people can be using the network at any time, depending on the number of access points in local libraries throughout the country.
This IMMEDIATELY expands the value of small branch libraries throughout the country. They can all continue to operate much as they do now with physical books while simultaneously offering this great new service. It will dramatically increase support for libraries and also control the information.
It may even be possible to create a check-out policy. You select an LOC text and a disc is burned with the data for you to take home that must be returned to the library much as other things are. I know the DRM will be cracked quickly but the fact remains that THE VAST MAJORITY prefer a book to reading on a screen.
Until there is the ability to mass produce pirated versions that RESEMBLE THE ORIGINAL book, like happens with cd's and dvd's, piracy on a mass scale will not be a serious problem. Certainly not much greater than it is now.
Anyway, the idea here is to use the network of local libraries to control and distribute this information.
All of this of course requires repealing and SPECIFICALLY PROHIBITING the FBI from snooping in library records.
The Library of Congress doesn't need more warehouses to store more books. Everything can be scanned in, stored on DVD/hard drive, and put online for everyone to search. The best thing: the library doesn't need expensive air conditioning and humidifiers to control how books age. It may cost $260 million, but it would save billions in other costs.
Ever been an artist? The fact of the matter is that most artists don't get paid, or are very poorly paid, now. It sucks, but IP laws haven't made many artists rich and haven't even given very many a decent living. The system is already dysfunctional before you even start considering what technology is going to do with it.
The only thing we can do to make it better is convince people that if they like an artist and want them to keep producing, then they need to make an effort to support that artist. It's a social issue, not a technical issue, and not something that can easily be forced by passing laws. If you're an average person, then pick out the artists you like and donate money to them. If you're rich, then sponsor artists to create new works of art. Possibly a not-for-profit organization that takes donations for artists in general and uses them to fund young artists would be a good idea. Both consumers and well-off artists could donate towards sponsoring new generations of artists.
The same issue exists for programmers, whom I consider a type of artist, in that we often are not well paid for our work. Especially those of us who give away our code for the public good could use more support from our users. If you use an open source program, you should consider donating to the developers. If you see a developer who looks promising, you should consider donating to them. Pick one project a month and donate $10 to it. If even a fraction of the users of open source software did this, then there would be much more, and higher quality, open source code available.
I imagine the same idea works for supporting the providers of any free service. Websites, etc.
It's the honor system. You can copy anything you want but if you continue to use it then you should make a donation. Yes, you can be cheap and not make your donation but by doing so you're hurting yourself too. Give what you can afford and convince others to do likewise.
At what price learning? At what cost wisdom? The price is a man's peace of mind, and the cost is his life.
There are some people on the board who seem to remember what copyright and English are!!!
IANAL, but if I recall correctly, you must register copyright in the US to sue in the US. However, if you have registered copyright in any other (Berne) country, you can register in the US based on the date of that.
You automatically have a copyright when you create something. If someone copies it you can sue them for damages, but you must register it with the copyright office first ($25 last I checked). If you register it before the violation, then you can sue for statutory damages and attorney's fees, even if you only registered in some other country. You still have to register the copyright in the US to sue, but having the copyright elsewhere counts as registering it before the violation.
Could someone explain what exactly is in this library of congress? Is it just a big library of stuff?
Based solely on the name, I would infer that it would contain a lot of US historical documents, government stuff, and whatnot. In which case, scanning would NOT "benefit" society as a whole. Perhaps US society. Not the rest of the world. Nobody else would really care.
Recently someone mentioned to me that it is possible to put hundreds of books on a CD using DjVu. It looks like DjVu is the MP3 of books!
Take a look at http://www.djvuzone.org/
The curious thing is that there is great support for it under Linux and KDE in particular.
Woo hoo - I finally got my cherry popped and modded down as a troll. I wonder if the modder recognizes the irony of branding "troll" on a post that was branding the original post a troll...
Slashdot comments... splitting hairs since 1997.
I do similar work on military tech manuals, and believe me, they've way underestimated the labor part. They'll never make it, unless the entire city of Bangalore decides to go for 17 cents an hour, then maybe.
Don't be silly -- LOC has hundreds of thousands of books in more than a dozen languages
A couple of years ago the Harvard University Centre for Astronomy had one of its collections of technical publications scanned in order to be put online. But to make the material actually usable they had to launch a program over the net for volunteers (predominantly amateur astronomers) to view the scanned pages and enter, by hand, the necessary bibliographical information (authors, paper titles, etc.), as well as to QC things (look for duplicated pages, missing pages, work out which of several scans of fold-out drawings is the best image, etc.).
The scanning step was trivial (probably lots of bored students on minimum wages, getting brownie points from their professors); the INDEXING process has been going on for over 2 years now and is not yet finished.
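Part of the QC work described above (finding missing and duplicated pages) is mechanical once each scan has a page number attached, whether from OCR or a volunteer. A hypothetical Python sketch; page_qc is an illustrative name, and fold-outs or unnumbered plates would need separate handling:

```python
from collections import Counter


def page_qc(scanned, first, last):
    """Compare page numbers read from scans against the expected range.

    Returns (missing, duplicated), both as sorted lists of page numbers.
    """
    counts = Counter(scanned)
    missing = [p for p in range(first, last + 1) if counts[p] == 0]
    duplicated = sorted(p for p, n in counts.items() if n > 1)
    return missing, duplicated


print(page_qc([1, 2, 2, 3, 5], 1, 5))
```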
NASA ADS at SAO: Historical scans currently in the ADS
Birds are not dinosaur descendants;birds are dinosaurs, for all useful meanings of "birds", "are" and "dinosaurs"
He uploads it to the CIC database--the Library, formerly the Library of Congress, but no one calls it that anymore. Most people are not entirely clear on what the word "congress" means. And even the word "library" is getting hazy. It used to be a place full of books, mostly old ones. Then they began to include videotapes, records, and magazines. Then all of the information got converted into machine-readable form, which is to say, ones and zeroes. And as the number of media grew, the material became more up to date, and the methods for searching the Library became more and more sophisticated, it approached the point where there was no substantive difference between the Library of Congress and the Central Intelligence Agency. Fortuitously, this happened just as the government was falling apart anyway. So they merged and kicked out a big fat stock offering.
--Neal Stephenson, Snow Crash
Hey, you try to find an open nick these days!
While we are at it, let's scale back the copyright limits back to life of creator + 20 years (or even farther back as far as I'm concerned), and bring back more of the booty which the corporations have plundered from us, the public.
I hear that the LoC has one of the best Playboy collections in the world! This will put playboy.com right out of biz!
I've started a fund raising co-op at http://www.ideacradle.com/givesupport_virtual.php?currentIdea=11/
I concede this is a long shot but we will see.
I'm building co-operatives right now at http://www.ideacradl
While I can appreciate some of what the Gates Foundation does, the majority of those three items (immunization, AIDS research, and anti-poverty work) is far more about opening India and China as markets and sources of cheap labor than it is about actual philanthropy. It's a clever thing to do with the foundation to look like Bill is helping people when he's really just building a bigger user base for Microsoft. But then again, Bill's object and purpose in life isn't to be a billionaire, and he's not going to be leaving his children anything other than a legacy and maybe a $100,000 loan (or the 2030-dollar equivalent of the 1970 dollars that started Microsoft) to start their own legacies; the money is beside the point for him. While I don't respect what he's done or his technological ability, I do respect him for his REAL purpose behind Microsoft: a computer on every desk running an operating system that is as easy for the end user as a TV set. The billions? They're just what comes from realizing that dream in a totally unethical way.
SJW: a person who perceives an injustice, and while correcting it, commits a greater injustice.
Does the Church of Scientology register theirs?
Considering all things, I think the dollar amount is too low... way too low.
I do like the idea; I can just very easily picture it running 2.5-3 times over the projected budget.