Slashdot Mirror


Brewster Kahle & The Largest Library In History

BorgiaPope writes "WAIS creator and Alexa founder Brewster Kahle is interviewed by Feed. Kahle talks about the 30 terabytes of 'net content stored in Alexa's Linux servers, a data store he calls the 'largest library the world has ever known.' Some fascinating observations about how sites move in and out of the top traffic tier. He also claims that the top ten Web sites have the 'greatest worldwide concentration of power since the Roman Empire.'"

6 of 88 comments

  1. Digital archives... by Gendou · · Score: 4

    A professor of mine, a number of other students, and I are doing some in-depth research on language and how it changes over time. One of our biggest problems at this point is finding sufficient samples of text data from strict editorial sources, so we have had to resort to using photocopied->scanned->OCR'ed National Geographic articles. However, now that we're moving on to a new phase of the project, we need ten times as much data to realize the accuracy of our results. As of now, sources of digital text are few and far between, with no sources going back very far. Why is it that organizations in our society haven't invested the money and time into, say, digitizing the Library of Congress? I realize it's incredibly expensive and time-consuming - that's what we discovered - but it would be oh so useful to be able to read publications from a hundred years ago on my web browser. It's also great to see modern material produced by our society being archived, but there's a lot of ancient history that should be put into a format that should last forever as well.

  2. Roman Empire and Power by CAIMLAS · · Score: 4
    Power is the ability to conform the will of others to your own. The Romans did this by killing their opponents, and by the threat of such things. These sites don't have such power - anything that people submit to is submitted to out of free will. Unless, of course, you count things like the ability to sell personal information. :)

    -------
    CAIMLAS

    --
    ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers

  3. Re:Library? I think not by Harri · · Score: 4
    ...and the disadvantage of a library is that the stuff is selected by a librarian, according to a view of what is interesting that is specific to the age in which he is living and the culture to which he belongs. (Or she, or it). Thus I believe there is a good deal of value in the idea of a library which is not filtered at the time of collection, but which can be filtered at the time of reading according to the interests of the reader.

    For example: In three hundred years, pornography is viewed as a valuable cultural resource. A historian wishes to study the subject of pornography over the ages and relate it to the prevailing attitudes in those ages. The historian will be stuffed, because to a librarian now, pornography is clearly not suitable for inclusion.

    The history we have is much more a history of the rich and powerful, and not a history of the poor, because nobody wrote anything about the poor. Today, big scientific tomes are kept, but Joe Blogg's Geocities page (with exciting photos of him and his family and his cat) gets binned. In three hundred years this might be interesting historical evidence, the same as Joe Chimney Sweep's diary from 1800 or something.

    The technology to do this effectively might not really be here yet, but it will probably arrive in those three hundred years. (Unless we're all too busy looking at porn instead ;) )

  4. Re:Alexa's top50 list by Azog · · Score: 5

    Here's a reality check - for me, anyway: I honestly thought slashdot.org would be somewhere in the top 500. I was going to make a joke about "You know it's time to move on to kuro5hin when slashdot makes it into the top 100". Nope. Slashdot isn't in the top 1000.

    Linux doesn't show up in any of the top 1000 domain names, but windows does - once - in windowsmedia.com, which is about as TV-like a site as you can get, and a subsite of MSN.

    Google was 21st, cnn.com was 37th, and wired.com was 970. Other than that, none of the sites I've bookmarked are in the top 1000.

    I guess I shouldn't be surprised that the web I see is nothing like the web most of the world sees. I am a little disconcerted though. No wonder the general public doesn't care about software freedom, DeCSS, software patents, privacy, etc. The awful truth is that for most people, the internet is like TV.

    What a depressing way to start a Friday.

    Torrey Hoffman (Azog)

    --
    "HTML needs a rant tag" - Alan Cox

  5. Here's the part I'm not sure I like... by hiryuu · · Score: 4

    Here:

    And I think the right place to tax is the ISPs.

    And here:

    Right now, people are paying all of their money to use ISPs but the ISPs don't have to pay for the content.

    Part of the reason I don't like that notion is because it starts a level of accountability that I wouldn't be comfortable seeing. Where would the tracking begin - or end, for that matter - so that the proper payment balance could be provided? Which ISP - the one the surfer is using to view the content, or the one hosting the content? I imagine he means the latter - and that's bothersome. If an ISP can be held financially liable for content that a user provides - regardless of who the copyright holder/content owner is - then how long before said ISP decides to host only content that's marketable and profitable? Draw your own conclusions about where the picking and choosing would go from there.

    Another reason I don't like it - not necessarily a valid one, but definitely a personal one - is that it commercializes the web that much further. There's already enough corporate-owned and profit-driven crap here. It's not like we need more like that.

    Kahle mentions that something like ASCAP is needed, but he himself talks about the nasty history behind his example's development. He also throws out AOL as an example of a company in the "best position" to implement such a thing. Like we didn't have enough concerns about content ownership/control/marketing without an endorsement like that...

    --
    Karma: Excellent, but still won't get you laid.

  6. Library? I think not by streetlawyer · · Score: 4
    He misunderstands the concept of a "library". A library, in its historic definition, is not just a heap of information and publications; it represents someone's selection and preservation of worthwhile knowledge. A massive cache of shitty Geocities sites, corporate bumph and pathetically precious weblogs is not a library by any stretch of the imagination. A library isn't a library unless it has a librarian, deciding what needs to be preserved and, importantly, editing out the dross. His servers might have three times as many bytes as the Library of Congress has letters, but I know which one I'd rather spend an afternoon with.

    The Internet will be useless as a repository of knowledge until it is quite ruthlessly edited. I doubt any posts on this thread (including this one) would survive in a proper library.