Slashdot Mirror


Interview with Brewster Kahle

Netmonger writes "A fascinating interview with the man behind The Wayback Machine. Some specs from the article: 'It's 150-odd standard PC cases, with four drives in each... Over 100 terabytes... As plain text in book form, that'd be over 3000 miles of shelf space...' All I can say is... Wow!"

195 comments

  1. How many by FunkSoulBrother · · Score: 4, Funny

    How many miles of shelf space equal one Library of Congress? Let's use standard units here, people!

    1. Re:How many by diesel_jackass · · Score: 2

      How many miles of shelf space equal one Library of Congress? Let's use standard units here, people!

      5.6603773584905660377358490566038 LOCs

    2. Re:How many by illsorted · · Score: 2

      According to this page, that's 5 LOCs (give or take).

    3. Re:How many by Kong+the+Medium · · Score: 1

      The comment directly above yours links directly to the LOC.

      There it states: It is also the largest library in the world, with more than 120 million items on approximately 530 miles of bookshelves.

      So you are wrong by at least two orders of magnitude

      Shame on you, and shame on me for nitpicking

      --
      ... whenever a text is transmitted, variation occurs. This is because human beings are careless, fallible, and occasiona
    4. Re:How many by Asprin · · Score: 2


      Let's use standard units here, people!

      YEAH! What is it in FPS?

      --
      "Lawyers are for sucks."
      - Doug McKenzie
    5. Re:How many by Anonymous Coward · · Score: 0

      He actually did the conversion. The wayback archive is about 5.66 times the size of the Library of Congress.

    6. Re:How many by EngMedic · · Score: 1

      1 LOC = 530 miles shelf space.

      This guy has 3000 mi × (1 LOC / 530 mi) ≈ 5.66 LOCs of storage space.
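
The conversion in this sub-thread is a single division. A minimal sketch, assuming the 530-mile figure quoted from the LOC page and the 3000-mile figure from the article summary; the constant and function names are illustrative, not from any source:

```python
# Figures quoted in this thread; names are illustrative assumptions.
LOC_SHELF_MILES = 530        # shelf length of one Library of Congress
ARCHIVE_SHELF_MILES = 3000   # article's estimate for the archive as books


def miles_to_locs(miles: float) -> float:
    """Convert miles of shelf space into Libraries of Congress (LOCs)."""
    return miles / LOC_SHELF_MILES


archive_locs = miles_to_locs(ARCHIVE_SHELF_MILES)
print(f"{archive_locs:.2f} LOCs")
```

The result agrees with the long decimal posted earlier in the thread once it is rounded to two places.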

      --
      filter: +3. Hey, look! all the trolls went away!
    7. Re:How many by gabec · · Score: 2

      I've always wondered how companies like this made money to keep going... surely it's expensive to keep all that going...

  2. Wow! by Orkin · · Score: 1

    Does this thing still exist? I thought he had to kill it....

    1. Re:Wow! by cybermace5 · · Score: 3, Funny

      Yes. Yes it does still exist. That will be $5.00.

      --
      ...
    2. Re:Wow! by Anonymous Coward · · Score: 0

      No, no, you're thinking of the Terminator.

  3. A lot of internet information is crap... by nofx_3 · · Score: 2, Interesting

    So why would you want to preserve all of it? Why not just get the good stuff and maybe he won't need so many computers. I understand that just choosing the good stuff would be very subjective, but do we really need archives of pr0n sites and popups?

    --
    Visualize Whirled Peas
    1. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 5, Interesting

      We're not qualified to judge what "good stuff" is.

      For example, a couple of centuries ago, old household accounts would have been considered valueless. But today's historians find a wealth of social data in them - what did people eat? how much did they get paid? did families tend to enter service together? how often did servants get new clothes?

      Disc space is cheap. Keep everything, let future historians sort it out.

    2. Re:A lot of internet information is crap... by JJAnon · · Score: 1, Redundant

      The whole point of the Internet Archive project is to document the growth of the Internet - in all its glory or lack thereof. There has never been an opportunity like this before. To be able to study the growth and maturity of a massive social phenomenon like the internet - something that affects the way humanity communicates on an elemental scale - is the dream of every social scientist.

      Picking and choosing what goes into the archive does not serve the purpose of the archive in any way.

    3. Re:A lot of internet information is crap... by t0qer · · Score: 2

      How dare you! The first web page I ever made was "crap"

      But I went to the wayback machine and checked it out. It was cool to see how far along I've come in my web design skills. Now I can show my friends the very first web site I ever made, and crap or not, it's there on the wayback machine and brings with it a lot of nostalgia. That's just the way sentimental crap goes: no matter how ugly or whatever, just the fact that you can go back and look at it makes it "cool".

    4. Re:A lot of internet information is crap... by tanveer1979 · · Score: 2
      A lot of internet information is crap.

      One man's crap is another's holy grail,
      what you think is interesting for another may be pale,
      For example this limerick for you is shit,
      But I would be happy if it gets in the archive bit.

      --
      My Aurora : http://www.youtube.com/watch?v=o91ZsGwJYyg
      FB : https://www.facebook.com/TanveersPhotography
    5. Re:A lot of internet information is crap... by Vellmont · · Score: 2, Insightful

      He brings up this point in the article. It's important to archive everything because we never know what's going to be useful information in the future.
      In other words, perspective and context are a huge part in determining value and meaning. At some point these annoying popup ads may be important for someone studying the evolution of advertising on the net. In fact, popups, or the frequency or timing of them, might be something that's missing from the archive.
      Most of the culture is invisible to most of us most of the time. The things we take for granted are the most ingrained into us, and possibly the most interesting to someone after the culture has changed.

      --
      AccountKiller
    6. Re:A lot of internet information is crap... by 0xdeadbeef · · Score: 4, Funny

      You don't consider the archiving of pr0n a noble cause? Don't be so selfish, man, think of future generations!

      I mean, hell, forget pr0n, just imagine the blackmail value for the kids of 2020, to be able to dig up pictures of their parents on amihotornot.

    7. Re:A lot of internet information is crap... by digitalsushi · · Score: 2

      A limerick has 5 lines... drink less ale!

      --
      slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    8. Re:A lot of internet information is crap... by kscguru · · Score: 2, Insightful
      50 years from now when historians dig through 2002 e-mail logs, they'll probably think the most heavily consumed product in the country was (insert random spam product here).

      Ah, the legacies we'll leave... based on YOUR e-mail, what will YOU be remembered as?

      --

      A witty [sig] proves nothing. --Voltaire

    9. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 0

      I can guarantee to you that some day some sociologist will get a PhD for analysing the troll "community". And they'll spend hours searching old Slashdot archives for the earliest mention of That Picture.

      Whether that's a useful thing to do or not... *shrug*.

    10. Re:A lot of internet information is crap... by ChaosDiscord · · Score: 3
      Why not just get the good stuff and maybe he won't need so many computers.

      Identifying "good stuff" is very hard and certainly not something that can be automated. Furthermore, "good stuff" is in the eye of the beholder. Perhaps Jane's web page dedicated to her kittens is useless to almost everyone in the world. However, to Jane's great-great granddaughter who hasn't been born yet, it might provide a fascinating look into her own past. A historian a hundred years from now analyzing the first twenty years of the web would certainly want to know that porn and popups were so pervasive.

    11. Re:A lot of internet information is crap... by garcia · · Score: 4, Insightful

      I think that storing everything on computers will make historians' jobs MUCH less difficult but a lot less fun.
      Doing historical research is fun b/c you get to get your hands dirty (literally). I spent 6 hours a day for three weeks researching crime rates in Toledo, OH during prohibition (before, during, and after) and b/c the books were all handwritten and they were so old, my hands turned black for days at a time...
      It would have been MUCH easier if all the information was sorted and easily found. I guess it would make future historians' jobs easier, but what fun would that be?

      Just my worthless .02

    12. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 1, Funny

      Fuckery duckery doo
      Fuck shit ass hell damn poo
      A cock and nutsack
      And boobs and asscrack
      Fuckery duckery doo

    13. Re:A lot of internet information is crap... by aengblom · · Score: 3, Insightful

      I think that storing everything on computers will make historians' jobs MUCH less difficult but a lot less fun.

      I think it's more that it will be different people. Understanding most of history is constrained by the lack of data about that time. Our age is precisely the opposite. We try and save EVERYTHING we can possibly afford--because we know that crap will be valuable to many people later on. For next century's historians it will be about data sampling and extracting the gold nuggets from all the crap we have saved.

      It will be the folks who built Google. Not the current type of folks.

      That said. It's better to have too much than too little.

      --


      So close and yet so far from the world's perfect ID number
    14. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 0

      If we store everything the future historians will see it is 96.733% pr0n and blame the Internet for their overpopulation.

    15. Re:A lot of internet information is crap... by digitalsushi · · Score: 2

      "Humans of our time do not actually consume the products our country is most advertised in random spams, which is email we did not want."

      There. Now this will get archived in a few weeks, and will cancel out your worries. Move along :)

      --
      slashdot: where everyone yells sarcastic metaphors to themselves to understand the issue
    16. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 0

      A historian a hundred years from now analyzing the first twenty years of the web would certainly want to know that porn and popups were so pervasive.

      Well porn, obviously :o)

    17. Re:A lot of internet information is crap... by efatapo · · Score: 1

      Interesting, but isn't it all relative? By the time anyone cares what was on the internet in the 80's, don't you think these machines will all be antiquated and probably incompatible? Doing 'hand dirtying' research will now involve finding an operating CD ROM drive (HA, what a piece of crap, it was that large and not flexible, and you couldn't touch one side). Or, going into a vault of old computer storage crap. It's all relative. :)

    18. Re:A lot of internet information is crap... by jez9999 · · Score: 2

      Well I just went back to the earliest archive they had of my first ever website, and all I got was a message saying 'page not in archive'. The later snapshots they had were a completely different, more modern design. Impressive indeed.

    19. Re:A lot of internet information is crap... by nofx_3 · · Score: 1

      True, like I said in the original comment, we can't know who a qualified judge is, so how can we ethically say what is crap and what is not? But on the other hand, you can't possibly tell me that all the hamsterdance pages need to be stored for the future or that countless porn pages and penis enlargement gimmicks could be of any serious use to historians in the future. If you have seen one, you have seen them all, so save one and throw the rest away.

      --
      Visualize Whirled Peas
    20. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 0

      part of the goal is to keep the current culture of the internet society intact as it is, or was when it was captured.

    21. Re:A lot of internet information is crap... by garcia · · Score: 2

      You mean in an age where we have CDROM and DVD-ROM and searchable indexes and I was using Microfilm and Fiche (like I do every day in my job)?

      I work w/ records that are 30+ years old. They are on Microfiche (sometimes really bad copies).

      They will have the tools and they will know how to use them, believe me.

    22. Re:A lot of internet information is crap... by Anonymous Coward · · Score: 0
      I understand that just choosing the good stuff would be very subjective, but do we really need archives of pr0n sites and popups?

      OK, I'll let you drop the popups.

  4. What a great way... by Randolpho · · Score: 2, Funny

    to get old pr0n! :D

    --
    "Times have not become more violent. They have just become more televised."
    -Marilyn Manson
  5. Transient Moments by szyzyg · · Score: 5, Interesting

    It's a shame that some of the more interesting moments in Internet history are so transient the wayback machine can't catch them.

    e.g. The Ded Kitty picture we put up when Napster shut down at the start of September; it was only there for a few hours, but it will be lost.

    Of course, some of the more interesting transient events are websites that are hacked, but there exist dedicated archives for this kind of event, so you can relive the hilarity of RIAA.org being repeatedly defaced.

    1. Re:Transient Moments by Randolpho · · Score: 1

      A very good point. I suppose transient and dynamic web pages aren't really as important to an archive of webpages... They're looking to save the RFCs and other important articles. Or so I suppose.

      --
      "Times have not become more violent. They have just become more televised."
      -Marilyn Manson
    2. Re:Transient Moments by WatertonMan · · Score: 2
      I wonder what all it catches. As someone else mentioned, dynamically created sites might not fare too well. There are obvious ways to make a crawler that will crawl many pages generated with CGI. Take a site like AintItCoolNews. Crappy website with everything done with CGI. However, you could still write a crawler to crawl most of its links without any trouble. (I wrote one that did.)

      I also wonder how well it does with Flash or other multimedia. I don't care about it not crawling commercial sites as much. There are much bigger copyright issues on that. Plus, let's be honest, most of those sites are just porn anyway. I don't think we need a historic archive of porn sites.

      My big question though is whether they back up their data regularly. After all, even hard drives wear out. . .

    3. Re:Transient Moments by Anonymous Coward · · Score: 0

      Anyone have a site with the napster dead kitty picture?

    4. Re:Transient Moments by SeanAhern · · Score: 1

      Why can't we just have the wayback machine make exceptions for "interesting moments"? I mean, if the machine can be set to take a snapshot every 60 days or so, I fully believe that it could also be told, "Go take a snapshot of www.some.site.com right now." It would just require a human to identify what are "interesting moments".

  6. stupid Joe Six-Pack metaphors by p_rotator · · Score: 4, Funny


    As plain text in book form, that'd be over 3000 miles of shelf space.."

    Huh? How about "If all data was spoken at once, it would be as loud as 674 jet engines!" Or "If this archive were a planet, it would be as large as Jupiter!"

    1. Re:stupid Joe Six-Pack metaphors by Angry+White+Guy · · Score: 1

      What about just leaving it at 100 Terabites?

      --
      You think that I'm crazy, you should see this guy!
    2. Re:stupid Joe Six-Pack metaphors by swb · · Score: 2

      3000 miles of shelf space -- at what type size?

      Would the Large Type Edition be [cue Mr. Evil voice]:

      One Million Miles!?!

    3. Re:stupid Joe Six-Pack metaphors by Anonymous Coward · · Score: 0

      Perhaps because they are Terabytes

    4. Re:stupid Joe Six-Pack metaphors by Anonymous Coward · · Score: 1, Funny
      What about just leaving it at 100 Terabites?


      What is that, Pacman's system of measuring storage?

    5. Re:stupid Joe Six-Pack metaphors by jaraxle · · Score: 2, Funny

      That's "Doctor" Evil. I didn't go to seven years of medical school to be called "Mister"... thank you.

    6. Re:stupid Joe Six-Pack metaphors by Dephex+Twin · · Score: 1

      That's evil medical school, thank you very much.

      --

      If you want to make an apple pie from scratch, you must first create the universe. -- Carl Sagan
    7. Re:stupid Joe Six-Pack metaphors by Anonymous Coward · · Score: 0

      That's *Doctor* Evil. He didn't go to Evil Medical School to be called mister.

    8. Re:stupid Joe Six-Pack metaphors by Kaz+Riprock · · Score: 1

      Joe Six-Pack metaphors are fun! Don't buzzkill the fun, man!

      3000 miles of shelf space...why, that'd be like enough shelf space to go from LA to NYC! Enough shelf space to almost reach the molten core of the Earth! Enough shelf space to circle the Earth (at the Arctic Circle)!

      Enough shelf space to go 1/80th of the way to the MOON!!!!

      --
      Mordor...a magical, mythical land where women are more rare than dragons--but where every man would rather find a dragon
  7. I met him by Anonymous Coward · · Score: 0

    At CFP 97... he scared me.

    1. Re:I met him by Anonymous Coward · · Score: 0

      Why/how?

    2. Re:I met him by Anonymous Coward · · Score: 0

      I don't like to tell this story, but...

      He was giving a talk on the wayback machine when, about 3/4 of the way through, he suddenly stopped and clutched his stomach, as if he was in severe pain. He then let out a monstrous fart and then continued as if nothing happened. I guess it was a way to keep the audience's attention. But I found it in poor taste.

      Any one else at that seminar who could comment?

    3. Re:I met him by Anonymous Coward · · Score: 0

      Hahahaa typical geek

  8. Wait wait by PygmyTrojan · · Score: 2, Insightful

    Now how many Libraries of Congress is that?

    --

    Trying is the first step towards failure.

  9. Move over Borges by doogieh · · Score: 2, Interesting

    As Borges once said about the Library of Babel, way back now...

    The universe (which others call the Library) is composed of an indefinite and perhaps infinite number of hexagonal galleries, with vast air shafts between, surrounded by very low railings.
    Looks like he wasn't too far off...

    ...The Library is a sphere whose exact center is any one of its hexagons and whose circumference is inaccessible.

    Well, maybe not...

  10. Is this thing backed up? by TheSHAD0W · · Score: 3, Funny

    I'd hate to see the history of the net destroyed if the sprinkler system goes off in their server room...

    1. Re:Is this thing backed up? by Anonymous Coward · · Score: 1, Informative

      Read the article: it is backed up in two separate locations, as well as all their old disks.

    2. Re:Is this thing backed up? by Anonymous Coward · · Score: 0
      As the article said, they used to use magtape but now they just archive the old drives.

      But just how long will a switched-off drive reliably last as a backup? In the Bad Old Days, disk drives that were left off for more than even a few months tended to get "stiction" and be unable to spin up. Things have no doubt improved, but I'd still be amazed if a drive that's spent ten years in storage would still work reliably.

      Punchcards are pretty robust. I suppose they could be transcribed into that, but we'd run out of forest pretty fast.

  11. Kahle? by Prince_Ali · · Score: 3, Funny

    Who is this Kahle guy? I know for a fact that it is Mr. Peabody who is behind the way-back machine. I was with him when he visited Nobel.

  12. Maybe we can help them to get this info for us? by Ninja+Programmer · · Score: 2, Interesting

    Perhaps we need to propose an extension to the robots.txt file to tell certain classes of search crawler to visit more frequently or at specific times?

    1. Re:Maybe we can help them to get this info for us? by Anonymous Coward · · Score: 0

      oh ya, like that won't ever be abused. :)

    2. Re:Maybe we can help them to get this info for us? by Ninja+Programmer · · Score: 1

      The activities of the robot are ultimately up to the robot. What I am suggesting is that there could be hinting which would allow the robot to make better guesses about how to react. The robot could also *learn* that a site's hints are not very useful.

  13. On a related note, look up the Long Now Foundation by JJAnon · · Score: 3, Interesting

    Here. They seek to create physical items (clocks and libraries are two items they name) that will last for very, very long periods of time. This diagram shows what is meant by the "long now", and this is a link to their first prototype clock that is on display in the Science Museum in the UK (the second clock on the page).

  14. 100 Terabytes! by insanecarbonbasedlif · · Score: 3, Informative

    I did a quick price check and for 100 terabytes of data on 80GB drives (Best price/size ratio I could find), that's about $111,250 worth of storage. Of course, I guess they would get bulk discounts :).

    --
    Just because I doubt myself does not mean I find your position compelling.
    1. Re:100 Terabytes! by dougmc · · Score: 3, Informative
      The math (100 terabytes, 150 computers, 4 drives per computer) works out to an average of 171 GB/drive. Of course, they said `over 100 TB' so it's actually higher than that.

      Obviously they're using IDE drives. Modern ones. And they must have replaced almost everything at once -- there could be a mixture of 200 GB and 120 GB drives, but it would have to be mostly 200 GB drives.

      Pretty neat, but still doesn't hold a candle to Google's massive setup :)

      (Google must have a *team* of people whose sole job is finding failed computers/drives and replacing them :)
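
The per-drive average in this comment can be checked directly. A minimal sketch, assuming exactly 150 machines with 4 drives each (the article says "150-odd") and exactly 100 TB; the names are illustrative. The 171 GB figure corresponds to counting 1 TB as 1024 GB, while counting 1 TB as 1000 GB gives a slightly lower average:

```python
# Assumed round figures from the article summary; names are illustrative.
MACHINES = 150
DRIVES_PER_MACHINE = 4
TOTAL_TB = 100


def gb_per_drive(total_tb: float, gb_per_tb: int) -> float:
    """Average capacity each drive must supply, in GB."""
    drives = MACHINES * DRIVES_PER_MACHINE
    return total_tb * gb_per_tb / drives


binary_avg = gb_per_drive(TOTAL_TB, 1024)   # 1 TB counted as 1024 GB
decimal_avg = gb_per_drive(TOTAL_TB, 1000)  # 1 TB counted as 1000 GB
print(f"binary: {binary_avg:.1f} GB/drive, decimal: {decimal_avg:.1f} GB/drive")
```

Either way the average sits above 160 GB, so the conclusion that the drives must be recent, large ones does not depend on which convention is used.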

    2. Re:100 Terabytes! by jasoncart · · Score: 2, Funny

      Let's hope they aren't using those IBM disks

    3. Re:100 Terabytes! by gid · · Score: 2

      They use 160 GB drives btw. And they claim 200 GB drives should be coming out any day now.

      Plus they have 3 separate facilities, I'm assuming each is a "complete set".

      I personally have two 120 GB drives, which is way more than I need, but hey, I got a deal on 'em, so I bought two so one could be a backup (I put it in a separate machine, albeit at the same location). rsync is a wonderful tool. :) I'm a digital pack rat as well, what can I say?

    4. Re:100 Terabytes! by Anonymous Coward · · Score: 0

      Smurf. He's using 180G drives. Read the article.

    5. Re:100 Terabytes! by product+byproduct · · Score: 3, Funny

      For comparison I did a quick price check and for 3000 miles of shelf space on 5x26.25" bookcases (best price/size ratio I could find), that's about $29M worth of bookcases. Using harddisk drives was a smart decision.

    6. Re:100 Terabytes! by capnjack41 · · Score: 1
      over 100 TB

      Wait, do they mean 100 trillion bytes, or 100 * 2**40 bytes? That's how these sneaky hard drive manufacturers get you!
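
The gap between the two readings is easy to quantify. A minimal sketch using only the two definitions named in the comment above; the variable names are illustrative:

```python
# Two readings of "100 TB"; variable names are illustrative.
decimal_bytes = 100 * 10**12   # 100 trillion bytes (drive-maker convention)
binary_bytes = 100 * 2**40     # 100 * 2**40 bytes

shortfall = binary_bytes - decimal_bytes
shortfall_pct = 100 * shortfall / decimal_bytes

print(f"decimal:   {decimal_bytes:,} bytes")
print(f"binary:    {binary_bytes:,} bytes")
print(f"shortfall: {shortfall:,} bytes ({shortfall_pct:.2f}%)")
```

At this scale the two conventions differ by nearly ten trillion bytes, which is why the distinction matters more for terabytes than it did for megabytes.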

    7. Re:100 Terabytes! by awptic · · Score: 2

      Actually, Google uses some kind of flash memory to store all their data; the initial cost is much more, but the savings from not having to replace drives constantly pay off in the end. This was brought up in an interview a while back (sorry, can't find URL). And seeing as how their entire archive is refreshed every few days, it's not all that important that the data storage be reliable after a power failure or whatever.

  15. According to... by Cyclopedian · · Score: 4, Informative
    this, the LOC pales in comparison to "3000 miles of shelf space".

    -Cyc

  16. Obligatory by Pastor+Fluff · · Score: 1, Offtopic

    ...Beowulf Cluster reference.

    --
    Bubble, bubble, toil and trouble... can't we just go to Starbuck's for coffee?
    1. Re:Obligatory by Anonymous Coward · · Score: 0

      No, it's not obligatory. shut up.

  17. I don't understand terabytes.... by nebenfun · · Score: 4, Funny

    "Over 100 terabytes.. As plain text in book form, that'd be over 3000 miles of shelf space.."

    I don't understand terabyte or the shelf space analogy...
    I need to know how many bananas.

    nbfn

    1. Re:I don't understand terabytes.... by BeeShoo · · Score: 2

      I believe the actual standard for this is supposed to be the number of Libaries of Congress that would fit.

    2. Re:I don't understand terabytes.... by BeeShoo · · Score: 2

      Ahem... Libraries... Oops.

    3. Re:I don't understand terabytes.... by gid · · Score: 4, Funny

      Well, since bananas can't directly hold data that well since they rot so quickly, we'll have to use those bananas to store data by some other indirect means.

      So, how many bananas would it take to feed all the monkeys needed to store the data? Monkeys aren't that smart, so let's approximate that each monkey can hold 4 KB worth of data.

      100 TB = 100 * 1024 * 1024 * 1024 KB = 107374182400 KB

      107374182400 KB / 4 = 26843545600 monkeys

      Now we'd want redundancy, so let's have triplicate monkeys for all our data, in case one dies, or runs away, or simply forgets.

      26843545600 * 3 = 80530636800 monkeys

      But now we want to figure out how many bananas they're gonna eat. Let's say 5 bananas a day per monkey?

      80530636800 * 5 = 402653184000 bananas to feed all monkeys per day

      402653184000 * 365 = 146968412160000 bananas to feed all monkeys per year

      146,968,412,160,000, or about 147 trillion bananas per year, which is probably just slightly over the national debt.

      Overall, I think your method of using bananas to store all this data is quite ridiculous. The latency and dataloss would be unbearable. Plus think of all the poop these monkeys would create, and you'd NEVER be able to get PETA off your back.
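
The banana arithmetic above can be reproduced step by step. A minimal sketch that takes the comment's own assumptions at face value (100 TB counted in powers of 1024, 4 KB per monkey, three monkeys per block, 5 bananas per monkey per day); the variable names are illustrative:

```python
# All parameters come from the comment above; names are illustrative.
KB_PER_TB = 1024 ** 3            # TB -> GB -> MB -> KB
KB_PER_MONKEY = 4
REPLICAS = 3                     # triplicate monkeys for redundancy
BANANAS_PER_MONKEY_PER_DAY = 5
DAYS_PER_YEAR = 365

total_kb = 100 * KB_PER_TB
monkeys = total_kb // KB_PER_MONKEY
redundant_monkeys = monkeys * REPLICAS
bananas_per_day = redundant_monkeys * BANANAS_PER_MONKEY_PER_DAY
bananas_per_year = bananas_per_day * DAYS_PER_YEAR

print(f"{redundant_monkeys:,} monkeys")
print(f"{bananas_per_year:,} bananas per year")
```

Each intermediate value matches the corresponding line of the original calculation.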

    4. Re:I don't understand terabytes.... by Anonymous Coward · · Score: 0

      Plus think of all the poop these monkeys would create

      So you sell fertilizer; the monkeys practically pay for themselves!

      and you'd NEVER be able to get PETA off your back.

      If you had eighty billion trained monkeys, I think you could get the whole U.S. Army off your back.

    5. Re:I don't understand terabytes.... by RealAlaskan · · Score: 1
      Better to have PETA on your back than all those monkeys and their poop.

      If we assume moderate-sized monkeys, at 20 pounds each (about 10 kilograms), that's around 536.9(10^9) pounds of monkey, generating too many pounds of monkey poop. On the other hand, think of all the gardens we could fertilize with all that zoo poo.

    6. Re:I don't understand terabytes.... by spinlocked · · Score: 2

      ...let's have triplicate monkeys for all our data, in case one dies, or runs away, or simply forgets...

      Although they have a much larger footprint than monkeys, I'm told elephants have better data retention characteristics and the peanut has a much smaller form factor than the banana.

      --
      # init 5
      Connection closed.


      Oh... ...bugger.
    7. Re:I don't understand terabytes.... by elfkicker · · Score: 1

      Funniest thing I've read in months. Thank you, sir.

    8. Re:I don't understand terabytes.... by Anonymous Coward · · Score: 0

      ...and you'd NEVER be able to get PETA off your back.

      You're telling me! They're still bitching about this little incident and it was over a year and a half ago!

    9. Re:I don't understand terabytes.... by x98chn · · Score: 1

      Well, since bananas can't directly hold data that well since they rot so quickly, we'll have to use those bananas to store data by some other indirect means.

      So, would this be what they call bit rot??
      Sorry...

  18. Wayback technology by watchful.babbler · · Score: 5, Informative

    There's an excellent interview with Kahle on technical details at O'Reilly's own archive -- here.

    --
    "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
  19. Like all those crappy old buildings... by FreeUser · · Score: 3, Insightful

    A lot of internet information is crap... So why would you want to preserve all of it? Why not just get the good stuff and maybe he won't need so many comptuers.

    And of course, you're going to decide what is "good" and what "isn't?" He is providing the resource for, among other things, scholarly researchers. Of what use is the data if it has been hand edited according to one person's aesthetics or another's?

    Indeed, your comment reminds me of one that was heard quite often, shortly before beautiful and irreplaceable old buildings were razed to make way for a new strip mall, or, in downtown Chicago, a couple of new government buildings whose architectural style is best described as "Federal Drab." Preserving as much as possible is a good thing, because none of us can tell what will be valuable, and what will not, in another 20 or 30 years, and no one's aesthetic should be dictating such a decision to entire generations to come.

    --
    The Future of Human Evolution: Autonomy
    1. Re:Like all those crappy old buildings... by nofx_3 · · Score: 1

      You are taking the comment out of context; the very next line says "I understand that just choosing the good stuff would be very subjective." I'm not saying that I or anyone I know is a competent judge. I'm just saying we really need to analyze whether this data is necessary or even useful before we commit resources to gathering it that could be better used for other projects. I mean, 150 boxes could be used to crunch a lot of scientific data that could be more useful than ads, porn, and people's personal pages.

      --
      Visualize Whirled Peas
  20. Who's to say what's crap and what's not? by C.U.T.M. · · Score: 1

    Something you may think is crap may not be crap to someone else... I'm sure someone out there is interested in the millions of Britney Spears or *NStank picture galleries.

    1. Re:Who's to say what's crap and what's not? by Anonymous Coward · · Score: 0

      What, so we shouldn't have septic tanks when there might be a perfectly good watch or ring in one of them somewhere? Let it all run out into the street for all to pick through! There might be something useful in there!

    2. Re:Who's to say what's crap and what's not? by Anonymous Coward · · Score: 0

      STFU you stupid son of a bitch, that's not what he meant and you fucking well know it. You put the "gobblin'" in "Ass-Goblin". Go to hell. You goddamn make me sick. Mod parent down as the WORST FUCKING POST EVER WRITTEN ON SLASHDOT. I never though things could get this shitty.

    3. Re:Who's to say what's crap and what's not? by Anonymous Coward · · Score: 0

      you're new to the whole AC shit-talking thing, aren't you?

      it's all about quality, not quantity..

  21. Sounds good... by C0LDFusion · · Score: 2, Insightful

    ...except for the fact that he allowed the Church of Scientology to bend him over and use him like a toy. Why doesn't he get some Google backbone and refuse to bow to their DMCA threats?

    Oh, I forget that honor is dead on the internet.

    --
    Only in slashdot are posts of solidarity modded at -1 Redundant, while posts of antagonism are modded as -1 Flamebait.
    1. Re:Sounds good... by Anonymous Coward · · Score: 0

      I wonder how quickly you'd bend over when there's an army of Scientologist lawyers at your door and they tell you to remove some pages, many of which contain the horror stories about how the CoS ruins the lives of people who dare to stand up to them.

      My guess is faster than you can say, "Yessir, right away sir, just call me Mr. Goatse!"

    2. Re:Sounds good... by MushMouth · · Score: 2

      First of all, Brewster doesn't have the resources to deal with the "Church" of Scientology. If you want to put up your money to defend against them, feel free, but don't go and tell the Archive what to do. Secondly, most of this stuff still exists at the Bibliotheca Alexandrina's copy of the Wayback Machine. Check out http://archive.bibalex.org. That is unlikely to change: Brewster was smart enough to send copies to places with different copyright laws.

    3. Re:Sounds good... by C0LDFusion · · Score: 1

      I wonder how quickly you'd bend over when there's an army of Scientologist lawyers at your door and they tell you to remove some pages, many of which contain the horror stories about how the CoS ruins the lives of people who dare to stand up to them.

      My guess is faster than you can say, "Yessir, right away sir, just call me Mr. Goatse!"


      You know, that would've been an interesting and concise comment if I hadn't already been dealing with the shit that the Church of Scientology cranks out. Start picketing in front of an Org and be a regular poster to alt.religion.scientology. Then act like I don't know the shit they throw around.

      --
      Only in slashdot are posts of solidarity modded at -1 Redundant, while posts of antagonism are modded as -1 Flamebait.
    4. Re:Sounds good... by Anonymous Coward · · Score: 0

      I have it on pretty good authority that the info that the CoS wanted censored is still in there - it's just been blocked from public view for the time being. I know that's not a perfect solution, but it will have to do until there is enough funding to stand up to those bastards.

      So the library isn't burning the books, they're just in the back room for now. It's easy to say you'd stand up to a 200 pound gorilla when it's knocking on someone else's door.

    5. Re:Sounds good... by C0LDFusion · · Score: 1

      I have it on pretty good authority that the info that the CoS wanted censored is still in there - it's just been blocked from public view for the time being. I know that's not a perfect solution, but it will have to do until there is enough funding to stand up to those bastards.

      As I recall, Google didn't spend a dime standing up to the CoS. Why? Because CoS knows its DMCA complaints are bogus, especially when applied to other people's sites. It's only applicable if they are handing out the "secret teachings" that are actually still copyrighted.

      --
      Only in slashdot are posts of solidarity modded at -1 Redundant, while posts of antagonism are modded as -1 Flamebait.
  22. Silly Me! by Cap'n+Canuck · · Score: 3, Funny

    And here I thought it was Mr. Peabody that invented the Wayback Machine. No, hang on, it was Al Gore...

    But seriously, unless you know about this project, and the fact that you can ask to remove data from the archives (though there's no reference as to how to actually do it), it means that your Internet past can haunt you forever.

    Or at least until simultaneous attacks occur on Cairo and San Francisco...

  23. Another site, with pics by RhBaby · · Score: 5, Informative

    http://www.mindjack.com/feature/archive.html

    In the interest of full disclosure, I wrote it, so be gentle.

    1. Re:Another site, with pics by Anonymous Coward · · Score: 0


      Dude, I wouldn't own up to the fact that you wrote that piece of shit. You could smear dog shit on your face too and walk down the middle of the mall yelling "look at my creation!!" I guess if you want to, but I sure as fuck ain't gonna do it.

    2. Re:Another site, with pics by RhBaby · · Score: 1

      Thanks for the constructive criticism. I am shamed by your skills.

  24. Robots.txt - That was how the RIAA was hacked by szyzyg · · Score: 3, Interesting

    Hint: Don't list security pages that aren't supposed to be linked in your robots.txt... or at least secure them with a password.

    http://www.zone-h.org/en/news/read/id=894/
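
A minimal sketch of why the hint above matters, using Python's standard urllib.robotparser. The robots.txt content, the paths in it, and the "ExampleBot" user agent are all made up for illustration; nothing here is taken from the RIAA incident. The point is only that robots.txt is a public file: a polite crawler obeys the Disallow lines, but anyone else can read the same lines as a list of places to look.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; these paths are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /secret-login.html
"""


def disallowed_paths(robots_txt):
    """Return every path named in a Disallow line, as any reader could."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths


# A well-behaved crawler asks the parser and stays out...
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())
polite_crawler_may_fetch = parser.can_fetch("ExampleBot", "http://example.com/admin/")

# ...but the file itself advertises exactly what the owner wanted hidden.
advertised = disallowed_paths(ROBOTS_TXT)
print(polite_crawler_may_fetch, advertised)
```

Listing a path in robots.txt is a request to crawlers, not access control, which is why the password part of the hint is the part that actually protects anything.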

  25. Picture of a Picture by paughsw · · Score: 4, Funny

    I put www.archive.org into the wayback machine and my computer exploded!

    1. Re:Picture of a Picture by Xtraneous · · Score: 2

      Au contraire

      The first *working* link from http://archive.org
      October 11th 1997

      Notice how it says The Archive will provide historians, researchers, scholars, and others access to this vast collection of data (reaching ten terabytes), and ensure the longevity of this information.

      Oh how the times have changed.
      BTW: Considering the importance of the archive, be gentle! Slashdotting archive.org == bad!

      --
      .noitacidem deen uoy siht daer nac uoy fI
  26. See also by danlyke · · Score: 4, Informative

    For other Brewster Kahle interviews, see also the Slashdot story that pointed to the O'Reilly interview and the Slashdot story that pointed to the Feed magazine interview (which is currently inaccessible from my machine).

    1. Re:See also by Orne · · Score: 3, Informative

      Hehe, that's what the Wayback Machine is for!

      Feed magazine interview, back from the grave...

  27. Not just Transient by Cap'n+Canuck · · Score: 1

    This sounds great, but there are a lot of limitations. It's not just that the archive is transient (every 60 days), it's also static. Any web pages that access pay sites are not found. Any cool database links that you put into your Web Page and accessed through a cgi - I'm guessing that they are toast.

    1. Re:Not just Transient by Anonymous Coward · · Score: 0
      Any cool database links that you put into your Web Page and accessed through a cgi - I'm guessing that they are toast


      a good web architect takes this into consideration when designing a site. that's why there's more to web design than just knowing how dreamweaver works - you need to know how diff search engines work if the search engine will be part of your audience, and you need to be able to create pages that those engines will be able to find (among a myriad of things)

  28. Please mod the parent up by Anonymous Coward · · Score: 0

    For some reason, it got modded offtopic, but it seems relevant - for an archive of the internet to remain physically intact for long periods of time, reliable storage media have to be created - media that will survive unmaintained for long periods of time.

  29. Re:These days ... by Anonymous Coward · · Score: 0


    What in the hell is wrong with you guys today? Is there a bug in the slashcode giving mod points out to trolls?! I don't see how the parent could be more on-topic unless it was written by Brewster Kahle himself, and who knows maybe it was! Jeeze, get a clue!

  30. Re:100 TB by Anonymous Coward · · Score: 0

    Yeah, that's a shitload of CD-Keys and Serial #s.

  31. OH CRAP!!!! by Cap'n+Canuck · · Score: 1

    I just realized - if terrorists blow up Cairo and the Bay area, I'm going to be the first one on the suspect list!

    Damn! Now I'm really interested in how to remove stuff from their archive!

  32. Odd, no copyright questions by dsanfte · · Score: 5, Insightful

    I was curious as to how the Wayback Machine's operators view its legal status... I mean, it's not really a search engine in the broadly accepted meaning of the term. It doesn't just search what's out there, it archives entire pages of old information. And while search engine sites do this (Google), this is ALL the Wayback Machine site does.

    Surely they must know they're treading on untested legal ground. All it might take is one offended copyright holder to bring the whole thing to its knees. Basing it in a country other than the USA might have been smarter, then, given the existence of laws like the DMCA which could serve to shut the site down.

    --
    occultae nullus est respectus musicae - originally a Greek proverb
    1. Re:Odd, no copyright questions by Wesley+Felter · · Score: 3, Interesting

      In presentations, Brewster says his policy is to take out the complainers. So if you think having your site in the Wayback Machine is a copyright infringement, he'll just take it out. Meanwhile he's taking the Napster approach: assume what you're doing is legal until someone tells you to stop. Hopefully that day won't arrive.

    2. Re:Odd, no copyright questions by Anonymous Coward · · Score: 0
    3. Re:Odd, no copyright questions by RealAlaskan · · Score: 2
      In presentations, Brewster says his policy is to take out the complainers.

      Does this mean that he has the Godfather send Guido around to take them out, or does he merely mean that he removes their data from the data base? For the sake of future social scientists, I sort of hope it's the first choice.

    4. Re:Odd, no copyright questions by Obiwan+Kenobi · · Score: 3, Interesting
      Or, as the buddhists say:


      "It is easier to ask for forgiveness than permission."

    5. Re:Odd, no copyright questions by Wesley+Felter · · Score: 2

      I meant that he takes the data out, but I got a good laugh out of the alternative.

      "Brewster says you want out. He also says nobody goes against the archive..."

  33. Re:100 TB by Anonymous Coward · · Score: 0

    um all the cd keys and serials ever invented by anyone wouldn't fill up a large fraction of that space. I'm talking about isos and divx/svcd

  34. True story and a small thanks.... by Anonymous Coward · · Score: 3, Interesting

    Small personal thanks from me. I had put an online exhibit of my artwork up a few years ago, but unfortunately lost all of it to a hard drive failure. Much to my surprise I was able to find nearly all of my site, http://www.gpapassavas.com, online and backed up on the WBM.

    1. Re:True story and a small thanks.... by Anonymous Coward · · Score: 0

      You are welcome. We live for notes like yours. Fortunately we get a bunch of them. We get about 20k different IP addresses using the wayback machine each day. I think it is that high because the wayback machine's urls are woven back into the live web. kind of surreal to think of the past and present living side by side in this way.

      Anyway, thank you for the positive note.

      -brewster

  35. Why only four? by pla · · Score: 4, Insightful

    Out of curiosity, why only four drives per PC?

    With a simple $10 PCI IDE card (per additional 4), you could have gotten at *least* 8 drives, possibly as many as 16, per case. Granted, not many cases will let you *mount* that many, but I would expect paying a few bucks extra for the IDE cards and a better case would save quite a bit of money (and physical space) by halving or quartering the number of PCs you need ($100 extra to save $1500 per $2000, not counting the drives themselves?).

    88lf of machines vs 22lf. One requires an entire room, one would fit on a standard sized 3-or-4-tier storage rack. Of course, speaking of racks (of a different sort)... What on earth made you go with an array of standard PCs rather than a raid-in-a-rack?

    1. Re:Why only four? by stratjakt · · Score: 1

      I'd assume they'd be SCSI cases, in which case, he should be able to get at least 16. SCSI stops being hyper-expensive in quantity.

      --
      I don't need no instructions to know how to rock!!!!
    2. Re:Why only four? by jandrese · · Score: 5, Informative

      Probably the limiting factor there is the PCI bus. Modern ATA HDDs tend to saturate vanilla PCI busses (which is why most chipsets have custom busses between the north and southbridge these days). Add ATA cards and your PCI bus quickly becomes saturated and not very good for serving webpages. Worse, since the NIC probably sits on the PCI bus as well, you can easily starve your NIC with too many ATA devices on PCI ATA controllers.

      I know, I have a fileserver at home that has this exact problem, but I don't care if my fileserver is slow so it's not a problem.

      --

      I read the internet for the articles.
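
A rough back-of-the-envelope version of the bus argument above. The PCI figure is the standard theoretical peak for a 32-bit, 33 MHz bus; the per-drive throughput and the 100 Mbit/s NIC are assumptions picked for illustration, not measurements from the Archive's machines or from the poster's fileserver.

```python
# Plain 32-bit / 33 MHz PCI: 4 bytes per clock, shared by every card on the bus.
PCI_WIDTH_BYTES = 4
PCI_CLOCK_MHZ = 33.33
PCI_PEAK_MB_S = PCI_WIDTH_BYTES * PCI_CLOCK_MHZ  # roughly 133 MB/s, theoretical

# Assumptions for illustration only.
ASSUMED_DRIVE_MB_S = 40.0  # sustained rate of one ATA drive on a PCI ATA card
ASSUMED_NIC_MB_S = 12.5    # 100 Mbit/s Ethernet card on the same bus


def bus_demand(drives_on_pci_cards,
               drive_mb_s=ASSUMED_DRIVE_MB_S,
               nic_mb_s=ASSUMED_NIC_MB_S):
    """Peak traffic (MB/s) the PCI bus is asked to carry."""
    return drives_on_pci_cards * drive_mb_s + nic_mb_s


def is_saturated(drives_on_pci_cards, **kwargs):
    """True when drives plus NIC ask for more than the bus can deliver."""
    return bus_demand(drives_on_pci_cards, **kwargs) > PCI_PEAK_MB_S


for drives in (2, 4, 8):
    print(drives, "drives:", bus_demand(drives), "MB/s wanted,",
          "saturated" if is_saturated(drives) else "ok")
```

Drives hanging off the motherboard's own controller are left out on purpose, since the comment notes that many chipsets route those over a separate link; only drives on add-in PCI ATA cards compete with the NIC in this sketch.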
    3. Re:Why only four? by Wesley+Felter · · Score: 2

      The Wayback Machine uses IDE. Even in quantity, SCSI drives aren't as cheap as IDE drives in the same quantity.

    4. Re:Why only four? by Anonymous Coward · · Score: 0
      Because the CPUs (boxes) are really cheap. HP gave the Archive a killer deal (pennies on the dollar) on their desktop system. We did at one time use 12-drive IDE systems, but they were very flaky, and it was hard to get to the drives. The Archive and Alexa boxes have 2 internal drives, and 2 drives which are in $6 removable trays, so that all replacing them requires is a shutdown.


      BTW I find it comical that slashdot purposely slows down the connection from Alexa's office subnet, not even the crawler subnet. It takes 10 seconds to load a slashdot page from the office subnet, but less than half a second from our server subnet. Talk about throwing out the baby with the bath water.

  36. Re:These days ... MOD PARENT UP!! by Anonymous Coward · · Score: 0


    well, indeed I guess that was not the case and a typo or a slip of the mouse and not censorship. Still, I plan on ramming the broken Tobasco bottle up Taco's ass, it will be a blast for all parties involved.

  37. "That's X Pages!" analogies are silly. by tambo · · Score: 2, Interesting

    I always have to chuckle when I see these analogies. "If you printed all of the data on a CD-ROM, it would reach Mars!"... that's super.

    There are at least two problems with such analogies:

    1) People use them to comment on the marvelous efficiency of technology - but in reality, it's only a comment on the hideous inefficiency of print. It doesn't say much at all about technology. It might be useful to convince people to digitize/OCR their printed matter - but is anyone *not* doing this? Even the Library of Congress is scanning its texts now.

    2) In this case it's a particularly bad analogy, because it assumes that all data is printed as hex. Example: images, which are obviously a huge, huge chunk of the Wayback archive. Virtually all website images are small enough to print on a printed page at full resolution. But consider a 500x500-pixel image, at 16 bits (2 bytes per pixel, 2 chars to represent each byte)... that's 1,000,000 characters, or 1,000 pages!

    Basically the analogy is good for wildly inflating some numbers to stun the 0.00001% of the population that doesn't already realize these things.

    - David Stein

    --
    Computer over. Virus = very yes.
    1. Re:"That's X Pages!" analogies are silly. by Anonymous Coward · · Score: 0

      First of all, there are at least 2500 characters on a 12 point single spaced printed page. Second, any 500x500 pixel image on the web will be compressed using one of the many formats into many fewer than a million characters. So I would guess that at a 4:1 compression ratio, you would have more like 100 pages of text for this image. Talk about inflating numbers!
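
The two estimates in this exchange can be reproduced with a few lines of arithmetic. This is only a sketch of the posters' own assumptions: the parent's figure implies about 1,000 characters per page with no compression, while the reply assumes 2,500 characters per page and a guessed 4:1 compression ratio. Neither set of numbers is a measurement of the archive.

```python
def pages_for_image(width, height, bytes_per_pixel, chars_per_page,
                    compression_ratio=1.0, hex_chars_per_byte=2):
    """Pages needed to print an image as hex text.

    compression_ratio is stored-size shrinkage (4.0 means 4:1);
    each stored byte prints as two hex characters.
    """
    raw_bytes = width * height * bytes_per_pixel
    stored_bytes = raw_bytes / compression_ratio
    characters = stored_bytes * hex_chars_per_byte
    return characters / chars_per_page


# Parent comment: 500x500 pixels, 16-bit, uncompressed, ~1,000 chars per page.
parent_estimate = pages_for_image(500, 500, 2, chars_per_page=1000)

# Reply: same image, 4:1 compression, 2,500 chars per page.
reply_estimate = pages_for_image(500, 500, 2, chars_per_page=2500,
                                 compression_ratio=4.0)

print(parent_estimate, reply_estimate)
```

The tenfold gap between the two answers comes entirely from the page density and compression assumptions, which is the reply's point.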

    2. Re:"That's X Pages!" analogies are silly. by Anonymous Coward · · Score: 0

      Gee, thanks for taking the fun out of it all Cliff.

      Why do fsuktards feel the need to pontificate useless information and/or opinions.

      How the fsuk is print inefficient? How would you, the insightful Mr Clavin, improve upon it?

    3. Re:"That's X Pages!" analogies are silly. by SeanAhern · · Score: 2

      Uh, you're a troll, but here goes.

      Print is inefficient because it's a waste of physical space. In terms of information density, it's hard to come up with a more inefficient use of space than setting 12 point text. I can't speak for Mr. Clavin, but I'd use something more efficient like, oh, say, a hard drive with compressed text and images.

      Radical, yes I know... But we gotta move out of the 19th century some time. The 21st is as good a time as any.

  38. EWWWW! by Anonymous Coward · · Score: 0

    What would you give for a video clip of your great-grandmother? I'd give a lot.

    What a perv! What does he want with my grandmother?

  39. Fascinating? by mgkimsal2 · · Score: 1

    It's OK, but what's so fascinating about it? Honestly, I don't get it. I get the archive's idea - I use it myself. I just don't understand what was so 'fascinating' about the interview...

  40. Ya But... by _ph1ux_ · · Score: 2

    ...Who's going to archive the archive?

    1. Re:Ya But... by Anonymous Coward · · Score: 0

      What happens if Malkovich goes through his own door?

  41. Netscape Behind? by witort · · Score: 1

    I knew I'd read this Slashdot story before!

    http://web.archive.org/web/19971221012817/http://slashdot.org/
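
For anyone wondering how to read a link like the one above: the long number in a Wayback Machine URL is the capture time written as YYYYMMDDhhmmss, followed by the original address. A small sketch that pulls the two apart; the function name is made up for illustration and is not part of any Archive tooling.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_wayback_url(url):
    """Split a web.archive.org/web/<timestamp>/<original> URL into its parts."""
    path = urlparse(url).path            # "/web/19971221012817/http://slashdot.org/"
    prefix, stamp, original = path.lstrip("/").split("/", 2)
    if prefix != "web":
        raise ValueError("not a Wayback Machine snapshot URL")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return captured, original


captured, original = parse_wayback_url(
    "http://web.archive.org/web/19971221012817/http://slashdot.org/")
print(captured.isoformat(), original)
```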

  42. Imagine a cluster of these! by BestNicksRTaken · · Score: 0

    mwahahahaha

    --
    #include <sig.h>
  43. Vaguely uncomfortable by Anonymous Coward · · Score: 1, Interesting
    I understand what the Internet Archive is meant to do, and in a lot of ways I admire Brewster Kahle. But...they are archiving and republishing millions of pages that were never intended to last forever. And without permission at that. I don't mean this from a legal perspective, as I have no idea what the laws are on this, but something seems at least slightly wrong about that.

    If there is a way to permanently erase pages from the archive, I would be a little less worried. But I can never tell if they let you delete stuff, or just "block" it. "Blocking" is crap, we all know what that will be worth if somebody really wants the info someday and knows the Archive has it.

    1. Re:Vaguely uncomfortable by Maul · · Score: 3, Interesting

      I disagree completely.

      If you put something on the web, you have put it up for the world to see. The whole point of putting information on the web is making that information available to lots of people.

      What the Internet Archive is doing is no different than libraries storing old copies of newspapers and magazines. With an increasing amount of things being published online, we need an archive of those things.

      Years from now archives of web pages will be quite useful for those doing research on the events of today.

      Say you are a student in the year 2050 and are doing a report on the "history of the web." Wouldn't it be nice to have copies of the web pages from the 1990s to show what the "early" web looked like?

      --

      "You spoony bard!" -Tellah

    2. Re:Vaguely uncomfortable by Anonymous Coward · · Score: 0
      What the Internet Archive is doing is no different than libraries storing old copies of newspapers and magazines.

      You might be totally right legally. I guess that you probably are.

      But newspaper and magazine writers know that they are 'on the record' when they write. Most internet users have no idea that everything they say and do will last forever. If our current law makes it so that a 14 year old putting up a geocities page for a few weeks is equivalent to printing an editorial in the LA Times, then something is wrong with the law.

  44. Damn Machines by ccollao · · Score: 0

    They'll tell my academic grades to all my future generations.

    Or even worse, my children or grandchildren may realize that I used to be a Porn star!

  45. Typical Question... by wykkyd · · Score: 1
    ...whenever faced with interesting PC configurations (such as the "150-odd PCs with 4 drives in each")...

    "Yeah, but can it make coffee?"

    Response being:

    Of course it can! But since it's the Wayback Machine, it's yesterday's coffee... old, cold, and slightly burnt (but when you gotta, you gotta)...

    --
    ... there is no spoon ...
  46. Underfunded. by PerlPunk · · Score: 1

    In spite of the fact they seem to get a good amount of funding for this project, it seems the equipment they can afford cannot nicely handle many, if not most, of the page requests. I tried to access a website on a date I know for certain it was up, and their proxy server timed out.

  47. Re:How many....answer by ubugly2 · · Score: 0

    Question: How many miles of shelf space equal one Library of Congress? Lets use standard units here people!

    Answer: 1 L.O.C. = 13.2 MS EULAs.

  48. What if John Malkovich tried the same thing? by Anonymous Coward · · Score: 0

    Malkovich. Malkovich Malkovich Malkovich.

    Malkovich... Malkovich?

    Malkovich? Malkovich!

    MalkovichMalkovichMalkovichMalkovichMalkovich

    ~~Malkovich Malkoviiiich...~~

    Malkovich!!!

  49. Vannevar Bush by Mannerism · · Score: 3, Informative

    Technologists have promised the digital library for decades. In 1945, Vannevar Bush, who was technology adviser to several US presidents, wrote an article in The Atlantic magazine outlining how computers might one day augment libraries.

    Those who find this subject interesting, but who may not be familiar with Vannevar Bush's work, might want to read the paper to which Brewster Kahle refers.

  50. Foundation of our intellectual society by Anonymous Coward · · Score: 0
    The average life of a Web page is 100 days. After that either it's changed or it disappears. So our intellectual society is built on sand.


    The whole point of comprehensive library collections is that you can't tell in advance what will be important.



    It's not the life of the web page that is the shifting sands of our intellectual society, it's what we hold to be important as we live. As in all the previous thousands of years of human history, the important stuff will be remembered and copied down.

    Whether or not people preserve Visa Card or MasterCard receipts long after the bill is paid will not make their lives better.

  51. Re:all i can say is... by Anonymous Coward · · Score: 0

    What is your phone number?

    As a woman, I find it increasingly difficult these days to find a man who is willing to have sex with me. I'm attractive and buxom. I am not interested in a whole lot of relationship baggage, just a man who is willing to let me stick my finger into his anus right when he is at the moment of climax. I know men like it, so what is their problem?

    Oh well, guess I'm stuck with Ladybird for now (that's what I call my dildo).

  52. But, what if I want to see..... by andawyr · · Score: 1

    ....a prior version of the Wayback machine?

    Who has one of those, hmmmm?

    If the current archive has 100TB, and is growing by 10TB a month, imagine what Wayback-Wayback would need to have.....

  53. Speaking of shit by Anonymous Coward · · Score: 0

    I just almost shit myself because that was so funny!

  54. You are a dipshit by Anonymous Coward · · Score: 0

    Why you decided to broadcast that to the Slashdot community, I have no idea.

  55. You mean Mr. Kahle is Jay Ward's father? by JThaddeus · · Score: 2

    Really? And where is Mr. Ward these days?

    --
    "Love is a familiar; Love is a devil: there is no evil angel but Love." --William Shakespeare ('Love's Labors Lost')
  56. The Wayback machine is a lie by corebreech · · Score: 5, Insightful

    Try accessing news stories immediately prior to and after the September 11 attack and you'll see just how valuable this website is... or rather, isn't.

    I have also personally run a website which contained fairly controversial material (based on this story) that I saw listed on their website and then removed shortly thereafter. Tell me, why would a service like this ever have occasion to remove material once it's been archived, especially if there are *NO* copyright issues and the webmaster of the archived site never asked them to remove it?

    The answer is simple: the powers-that-be saw how dangerous it was to make all this information available to anyone on demand so they took control. It would be a great service were it allowed to operate unfettered, but the reality is quite different.

    And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.

    1. Re:The Wayback machine is a lie by Anonymous Coward · · Score: 0

      He does say that after 911 that, "Yes, there were things that had to be taken off..."

      The reader is left with the impression that it is only things like information about nuclear power plants that were spiked, but it could be much more... things is a very big word.

    2. Re:The Wayback machine is a lie by watchful.babbler · · Score: 2, Informative
      I have also personally run a website which contained fairly controversial material (based on this story) that I saw listed on their website and then removed shortly thereafter. ...

      And I'm the first to mention this here so far? You should all be modded down -1 for naiveté.

      Hm. And yet the WayBack Machine has the Project Censored page here, and even the AlterNet story linked therein. Ah, but yes, it must be a conspiracy by the Big Eye In The Pyramid -- someone call Hagbard Celine. Fnord.

      -1, Delusional.

      --
      "Freedom is kind of a hobby with me, and I have disposable income that I'll spend to find out how to get people more."
    3. Re:The Wayback machine is a lie by corebreech · · Score: 1

      The Big Eye In The Pyramid?

      And you call me delusional?

      No effort to explain why my site would appear and then disappear. No effort to explain why so many news stories before and after 911 were taken down. All you do is tell us of two pages they do archive, despite my never suggesting the contrary.

      You must be one of these retards I keep hearing about. Tell me, how long did it take you to develop this ability, or are you naturally gifted?

    4. Re:The Wayback machine is a lie by Anonymous Coward · · Score: 0

      What's the URI of your site, so I can investigate?

    5. Re:The Wayback machine is a lie by corebreech · · Score: 1

      http://www.holocaustnow.org/

      But it's been down for a month or so... waiting for a new server. :(

  57. Re:How many....answer by Kong+the+Medium · · Score: 2, Funny

    = 23.42 Bills of constitution? + div. amend.

    --
    ... whenever a text is transmitted, variation occurs. This is because human beings are careless, fallible, and occasiona
  58. What do they use for backup? by Proc6 · · Score: 1

    I mean seriously, it's gonna take more than some CDRW's baby.

    --

    I'm Rick James with mod points biatch!

  59. Any bets.... by MDX-F1 · · Score: 5, Interesting

    on how long before a politician has to resign because of some over the top statements he/she made in a flamewar back in college? Or maybe that webpage of ethnic jokes that seemed so hilarious back in high school.

    I have a feeling we are either going to have to become way more forgiving, or we're going to be stuck with only faceless boring types with no opinions as our leaders (no wisecracks, it could be much worse than it is now).

  60. They also host PG DP by jonathan_ingram · · Score: 1

    That's Project Gutenberg's Distributed Proofreading effort (much more fun than that *other* DP).

  61. Duuuhhhhhh!~ by Anonymous Coward · · Score: 0

    Because it is understood that the pages are *archived* and are not the Wayback Machine's property or copyright, but that of the original page.

  62. Why 150 PCs? by swb · · Score: 2

    Wouldn't it make more economic and performance sense to cut the number of PCs by a third and take the $50k and invest in a more high-performance and space-conservative disk subsystem?

    Something like this.

    Would give you far better disk performance and scalability than trying to add another 200 PCs with IDE disks.

  63. IDEA... by suprnova · · Score: 1

    don't pick on the wayback machine...it's great for when a site gets /.'ed and it's down...use the wayback machine to see it :)

    --
    --"The revolution will be simulcast..."--
  64. All Hail the Wayback Guy! by CommieBozo · · Score: 0
    I had no idea the Wayback guy was the inventor of WAIS and a co-founder of Thinking Machines. I thought he was just some guy, like all of those other people.

    All Hail the Wayback Guy!

  65. inside the internet archive by donald · · Score: 1

    Mindjack had an in-depth report on the Internet Archive a few weeks ago, with pictures from the inside.

  66. Great archive by jez9999 · · Score: 2

    http://web.archive.org/web/19971221012817/http://slashdot.org/

    With quality website snapshots like this, I can see how it will be a great resource for future historians!

  67. Scientology-censored web history by Moorlock · · Score: 2

    A site I run (sniggle.net - formerly found at syntac.net) was removed from the wayback machine when the church of Scientology complained about an image of L. Ron Hubbard on one of the site's pages.

    Now, not only all of the pages on my site, but all of the pages at syntac.net have vanished from the wayback machine.

    Oh yeah, and they can't be found at the Bibliotheca Alexandrina either, so that's no solution.

    Brewster's going to have to turn down his rhetoric about the wayback machine a bit until he gets the resources to fight back. Otherwise people might get the impression that he really is keeping the history of the web, even the parts of the web that entities like the church of Scientology don't like, alive.

    --
    Quiquid latine dictum sit altum viditur
  68. Link by Anonymous Coward · · Score: 1, Informative

    Sigh.

    Didn't you mean this?

  69. Re:Link - WARNING GOATSE LINK by Anonymous Coward · · Score: 0


    This is a goatse link - don't click it unless you're into that kind of thing.

  70. 150 is even, not odd.. by Bubblesculpter · · Score: 1



    He said the Wayback machine is 150-odd computers. 150 is EVEN, not odd.

    Not like it really matters, though.

    Just one more comment to be archived forever and nitpicked from the future.

    --
    www.Beyond7.com Insane modern art water sculpture.
    1. Re:150 is even, not odd.. by Anonymous Coward · · Score: 0

      I take it English isn't your first language. Are you in the habit of stating whether a number is even or odd after the number? If not, why would you assume 150-odd means what you said? It's around 150 machines.

  71. Oooh! Localhost! by Xtraneous · · Score: 2

    Well, it seems as if the wayback machine has indeed created a paradox. A simple lookup of 127.0.0.1 gives some fairly interesting results.

    The most interesting of these is the one from October 19th 2000. See for yourself!

    --
    .noitacidem deen uoy siht daer nac uoy fI
  72. Archive architecture by yppiz · · Score: 3, Informative

    I worked on some projects with the Internet Archive from 1998 - 2000.

    The Archive's first storage device (circa 1996) was a large StorageTek tape robot with a multi-gigabyte disk cache to handle user requests for archived pages. As drives and processors became cheaper, it became more interesting to use them instead of tape. The cost penalty of using drives over tape is only 2x - 3x, with the enormous win of increased bandwidth and decreased latency (when the request queue for the bot got large, the wait time for a page could be 16 hours. With disk, it's a fraction of a second).

    The first hard-drive based Archive storage used multiple 4U and 5U 12-20 drive Linux/FreeBSD boxes with ~80G IDE drives and Promise cards.

    Drive density is greater now - you can get 200G IDE drives and 320G IDEs are on the way, so you can use regular PCs as opposed to custom or niche-market (rackable server) boxes.

    --Pat / zippy@cs.brandeis.edu

  73. Woops... by peter446 · · Score: 0, Offtopic

    I would imagine doing a rm -fR / might not go down too well, even with the backups

  74. should be moderated "miss-informative" by Anonymous Coward · · Score: 0

    The Archive chose four-drive boxes because they are cheaper than getting higher drive density boxes. Also it allows them to have more CPUs for data mining. The cost of space taken by the machines is negligible. Brewster is one of the cheapest guys you will ever meet and in this case cost was everything. People may think they could get it done for less, but in reality this was done about as low as it goes.

  75. Thinking Machines Corp. by Dr.+Cody · · Score: 1

    Whether you know it or not, this man was responsible for starting a company which, to this day, gives case-modders wet dreams...
    Case in point.

    1. Re:Thinking Machines Corp. by yppiz · · Score: 1

      Brewster did the router hardware for the Thinking Machines CM-2. I think Tamiko Thiel did the case design for the CM-2 and the CM-5.

      http://mission.base.com/tamiko/cm/index.html

      --Pat / zippy@cs.brandeis.edu

  76. My head spins by doru · · Score: 1

    Does the Wayback Machine hold snapshots of the Wayback Machine repository as it was n months ago?

    If it does, wouldn't that be a recursive Beowulf cluster?

  77. Have you been to the site? by popeydotcom · · Score: 2

    Copyright Policy
    10 March 2001

    The Internet Archive respects the intellectual property rights and other proprietary rights of others. The Internet Archive may, in appropriate circumstances and at its discretion, remove certain content or disable access to content that appears to infringe the copyright or other intellectual property rights of others. If you believe that your copyright has been violated by material available through the Internet Archive, please provide the Internet Archive Copyright Agent with the following information:

    Identification of the copyrighted work that you claim has been infringed;
    An exact description of where the material about which you complain is located within the Internet Archive collections;
    Your address, telephone number, and email address;
    A statement by you that you have a good-faith belief that the disputed use is not authorized by the copyright owner, its agent, or the law;
    A statement by you, made under penalty of perjury, that the above information in your notice is accurate and that you are the owner of the copyright interest involved or are authorized to act on behalf of that owner; and
    Your electronic or physical signature.

  78. Moderators on drugs? (Was:150 is even) by Anonymous Coward · · Score: 0

    Why was the parent moderated upward? In every story, it looks like someone mods up a few posts just to cause trouble. Why do those people have, and keep, moderator points anyway?