Brewster Kahle & The Largest Library In History
BorgiaPope writes "WAIS creator and Alexa founder Brewster Kahle is interviewed by Feed. Kahle talks about the 30 terabytes of 'net content stored in Alexa's Linux servers, a data store he calls the 'largest library the world has ever known.' Some fascinating observations about how sites move in and out of the top traffic tier. He also claims that the top ten Web sites have the 'greatest worldwide concentration of power since the Roman Empire.'"
A professor of mine, a number of other students, and I are doing some in-depth research on language and how it changes over time. One of our biggest problems at this point is finding sufficient samples of text data from strict editorial sources, so we have had to resort to using photocopied->scanned->OCR'ed National Geographic articles. However, now that we're moving on to a new phase of the project, we need ten times as much data to verify the accuracy of our results. As of now, sources of digital text are few and far between, with none going back very far. Why is it that organizations in our society haven't invested the money and time into, say, digitizing the Library of Congress? I realize it's incredibly expensive and time-consuming - that's what we discovered - but it would be oh so useful to be able to read publications from a hundred years ago in my web browser. It's also great to see modern material produced by our society being archived, but there's a lot of ancient history that should be put into a format that will last forever as well.
Much as I like InfoTech, I don't like the Roman Empire analogy. Information can influence people, but it is NOT military power.
Perhaps a better analogy would be to 400-1400 when the Popes and the Roman Catholic Church did hold a monopoly on religious information in the West. That ended with Gutenberg and the Reformation.
That does seem like a very questionable statement to me. The top ten web sites are potentially powerful, but it depends what content they are serving up. If they are selling things, like Amazon, would that be so powerful? Sure, you can push certain things, but ultimately it's up to the buyer. Of course portals like Yahoo are powerful, but only when it comes to the content they are providing. Do they really have any power over my everyday life? What about people and cultures without so much internet access? Are they not even considered in this discussion?
Besides, power is fleeting.
Spooon!
-N
This line was inexplicably removed from the final interview: Q: "Thirty Terabytes? That's a lot, isn't it?" A: "Well, once we've taken out all the Spam, 'Make Money Fast' schemes, Pr0n, "w3 0\/\/N j00" homepages, Natalie Portman fansites, 'USS Enterprise vs. Star Destroyer' discussions, links to goatse.cx, and Jon Katz articles, we can fit it all onto a floppy."
-------
CAIMLAS
~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
For example: In three hundred years, pornography is viewed as a valuable cultural resource. A historian wishes to study the subject of pornography over the ages and relate it to the prevailing attitudes in those ages. The historian will be stuffed, because to a librarian now, pornography is clearly not suitable for inclusion.
The history we have is much more a history of the rich and powerful, and not a history of the poor, because nobody wrote anything about the poor. Today, big scientific tomes are kept, but Joe Blogg's Geocities page (with exciting photos of him and his family and his cat) gets binned. In three hundred years this might be interesting historical evidence, the same as Joe Chimney Sweep's diary from 1800 or something.
The technology to do this effectively might not really be here yet, but it will probably arrive in those three hundred years. (Unless we're all too busy looking at porn instead ;) )
Here's a reality check - for me, anyway: I honestly thought slashdot.org would be somewhere in the top 500. I was going to make a joke about "You know it's time to move on to kuro5hin when slashdot makes it into the top 100". Nope. Slashdot isn't in the top 1000.
Linux doesn't show up in any of the top 1000 domain names, but windows does - once - in windowsmedia.com, which is about as TV-like a site as you can get, and a subsite of MSN.
Google was 21st, cnn.com was 37th, and wired.com was 970. Other than that, none of the sites I've bookmarked are in the top 1000.
I guess I shouldn't be surprised that the web I see is nothing like the web most of the world sees. I am a little disconcerted though. No wonder the general public doesn't care about software freedom, DeCSS, software patents, privacy, etc. The awful truth is that for most people, the internet is like TV.
What a depressing way to start a Friday.
Torrey Hoffman (Azog)
"HTML needs a rant tag" - Alan Cox
His assumption of power concentration would be true, if the net was the major medium for all, which it is not. That crown, for better or for worse, is still television.
However, that makes by definition the American media & Hollywood the #1 social power on the planet, not those sites. Sites will come and go. It's not the hits that count. There are countries with no web access or very restricted access (Chad, Syria, almost anywhere in the 3rd world), yet these countries get much more "Americanization" via movies & print literature.
So I'd say that he's on the mark with the content idea, and the web itself is a powerful distributor of knowledge and information. But the most concentrated since the Roman Empire? Almost. That's still the press/media.
46. The Hobo smiles, his eyes glaze over, and he burps. "Beware the man who has lived longer than the Wasteland."
I always thought Brewster's neatest trick was getting his company this amazing space in San Francisco's leafy and spacious retired military base, the Presidio. It was reserved for non-profit firms, so he said that Alexa was archiving the web. Then, lo and behold, he found some commercial application of that library (does anyone actually use that "context" bar thing?) and sold the company to Amazon for a bazillion dollars. And kept his space!
Here:
And here:
Part of the reason I don't like that notion is because it starts a level of accountability that I wouldn't be comfortable seeing. Where would the tracking begin - or end, for that matter - so that the proper payment balance could be provided? Which ISP - the one the surfer is using to view the content, or the one hosting the content? I imagine he means the latter - and that's bothersome. If an ISP can be held financially liable for content that a user provides - regardless of who the copyright holder/content owner is - then how long before said ISP decides to host only content that's marketable and profitable? Draw your own conclusions about where the picking and choosing would go from there.
Another reason I don't like it - not necessarily a valid one, but definitely a personal one - is that it commercializes the web that much further. There's already enough corporate-owned and profit-driven crap here. It's not like we need more like that.
Kahle mentions that something like ASCAP is needed, but he himself talks about the nasty history behind his example's development. He also throws out AOL as an example of a company in the "best position" to implement such a thing. Like we didn't have enough concerns about content ownership/control/marketing without an endorsement like that...
Karma: Excellent, but still won't get you laid.
The Internet will be useless as a repository of knowledge until it is quite ruthlessly edited. I doubt any posts on this thread (including this one) would survive in a proper library.
-- the most controversial site on the Web