Slashdot Mirror


Archive.org Celebrates Its 20th Anniversary (sfchronicle.com)

20 years ago this week, Archive.org started with just 500,000 sites. An anonymous reader quotes the San Francisco Chronicle: Now, the nonprofit San Francisco organization -- which celebrated the milestone with a party Wednesday night -- curates a vast digital archive that includes more than 370 million websites and 273 billion pages, many captured before they disappeared forever. It's more than an archive of Internet sites. The organization, founded by computer scientist and entrepreneur Brewster Kahle, now has a virtual storehouse ranging from digitally converted books and historic film to funny memes and audio recordings of Grateful Dead concerts...

The Internet Archive has survived through community donations and by working with about 1,000 libraries around the world that pay the group to help digitize books and other material. But the site itself remains free.

We've written about Archive.org over the years, and its collection of 2,400 DOS games, over 10,000 Amiga games (and other software) and a massive collection of arcade machine emulators. And here's what Slashdot looked like back in 1998. But what's your favorite page on Archive.org?

42 comments

  1. robots.txt by Anonymous Coward · · Score: 5, Informative

    One thing I greatly dislike about archive.org is that they retroactively apply current robots.txt contents to archived versions of a site.

    I had a website that I sold years ago which now has a no crawl directive so the entire history is gone from the archive. Why would they remove archived versions which permitted crawling?

    1. Re:robots.txt by VicVegas · · Score: 2

      I cannot agree more. It is baffling to me that they would do such a thing.

    2. Re:robots.txt by JustAnotherOldGuy · · Score: 1

      I cannot agree more. It is baffling to me that they would do such a thing.

      I'm guessing it's tied to some sort of legal liability issue or something like that. If it's not, then I'd love to hear their reasoning on why they do this.

      --
      Just cruising through this digital world at 33 1/3 rpm...
    3. Re:robots.txt by negRo_slim · · Score: 1

      It's odd, I see plenty of discussion about the robots.txt nonsense:

      https://archive.org/post/10194...
      https://news.ycombinator.com/i...
      https://archive.org/post/18880...


      But no solid answers as to why.

      --
      On the Oregon Coast born and raised, On the beach is where I spent most of my days
    4. Re:robots.txt by Anonymous Coward · · Score: 0

      Seems straightforward enough. Take a website that, in ignorance, doesn't set robots.txt. They get archived. They discover that they should've set robots.txt, and then immediately set it.
      Should archive.org take this to mean that the owner meant the site to be indexed before and only wants indexing to stop from then on? Or should they assume that a mistake was made and the owner never wanted it indexed in the first place?
      Realize, YOU may know exactly what you want. Archive.org doesn't, and would probably prefer to err on the side of caution.

    5. Re: robots.txt by Anonymous Coward · · Score: 0

      I agree. Years ago I emailed the contact address, and the person I corresponded with seemed unable to comprehend that a site can, and often does, change owners, and that robots.txt should therefore only apply if it was present at the time the site was crawled. Unfortunately I gave up, as trying to get him to understand seemed to be a lost cause.

    6. Re:robots.txt by Gavagai80 · · Score: 1

      Robots.txt isn't supposed to be a tool for de-indexing yourself anyway -- it's only supposed to control spidering. Archive.org should specify a different file to set if you want to be de-indexed, and that file can specify a retroactive date if desired. Or just make people fill out a form on the site to de-index.

      --
      This space intentionally left blank
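The two policies argued over in this thread (apply today's robots.txt to every archived copy, or apply only the robots.txt that was in effect when each copy was captured) can be sketched as below. This is a minimal illustration, not Archive.org's actual implementation: the `RobotsRecord` and `Snapshot` types, the function names, and the example dates are all hypothetical, and robots.txt is reduced to a single "disallow everything" flag.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RobotsRecord:
    """Hypothetical record: the site's robots.txt state from `since` onward."""
    since: date
    disallow_all: bool


@dataclass
class Snapshot:
    """Hypothetical archived capture of a page."""
    captured: date
    url: str


def rule_in_effect(history: List[RobotsRecord], when: date) -> Optional[RobotsRecord]:
    """Return the latest robots record dated on or before `when`, if any."""
    current = None
    for rec in sorted(history, key=lambda r: r.since):
        if rec.since <= when:
            current = rec
    return current


def visible_retroactive(snapshots: List[Snapshot],
                        history: List[RobotsRecord],
                        today: date) -> List[Snapshot]:
    """Policy complained about above: today's robots.txt hides all history."""
    rec = rule_in_effect(history, today)
    if rec is not None and rec.disallow_all:
        return []
    return list(snapshots)


def visible_at_crawl_time(snapshots: List[Snapshot],
                          history: List[RobotsRecord]) -> List[Snapshot]:
    """Alternative: each capture is judged by the rule in effect when it was made."""
    visible = []
    for snap in snapshots:
        rec = rule_in_effect(history, snap.captured)
        if rec is None or not rec.disallow_all:
            visible.append(snap)
    return visible


# Example: a site crawled in 2005 and 2010, then sold; the new owner
# adds a no-crawl directive in 2014 (all dates are made up).
snapshots = [
    Snapshot(date(2005, 1, 1), "http://example.com/"),
    Snapshot(date(2010, 6, 1), "http://example.com/"),
]
history = [RobotsRecord(date(2014, 3, 1), disallow_all=True)]

print(len(visible_retroactive(snapshots, history, date(2016, 10, 30))))  # 0
print(len(visible_at_crawl_time(snapshots, history)))                    # 2
```

Under the crawl-time policy, an owner who really did want old captures hidden would need a separate opt-out mechanism, which is roughly what the last comment proposes.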
  2. Gimp by Anonymous Coward · · Score: 2, Insightful

    Saw the Slashdot screenshot's article on Gimp and realized that it still sucks as badly now as it did 18 years ago.

    1. Re:Gimp by Tablizer · · Score: 1

      Why is the Gimp icon the only one that works?

  3. And somehow TPB by Anonymous Coward · · Score: 0

    is bad? How does A get away with it?

  4. One cool thing I found... by Zontar The Mindless · · Score: 1

    Audio of back-to-back Jefferson Airplane concerts from October 1966--Signe Anderson's farewell show with the band, followed by Grace Slick's first one on the following evening, both at the Fillmore. Part of the Anderson gig was eventually (sometime in the 2000s, I think) released commercially.

    Anderson died earlier this year--on the same day as Paul Kantner, IIRC.

    --
    Il n'y a pas de Planet B.
  5. Archive.org is getting so old by FudRucker · · Score: 1

    that they will have to make an archive of archive.org

    --
    Politics is Treachery, Religion is Brainwashing
    1. Re:Archive.org is getting so old by Anonymous Coward · · Score: 1

      There are two different archives of the Internet Archive. One is at the Library of Alexandria and the other is a distributed system that is done by many of the same people who are a part of Archive Team.

    2. Re: Archive.org is getting so old by Anonymous Coward · · Score: 0

      What about all those web pages that people saved for personal use but are not on archive.org? In the days of web browsers provided by AT&T (which sent your system username along with your login account name), web browser windows would close the minute a dial-up connection was lost. It became more efficient to download and save webpages for reading offline than to take your chances online. Thus vast libraries would be built up.

  6. this is my favorite section by FudRucker · · Score: 1

    http://archive.org/details/old...

    if I owned a shortwave broadcasting station i would play those old radio shows exclusively

    --
    Politics is Treachery, Religion is Brainwashing
  7. i remember by Nick · · Score: 2

    i remember reading /. that day and those articles / headlines are still in my memory. feels weird.

    --
    Fuck Ajit Pai
    1. Re:i remember by caferace · · Score: 1

      Doesn't it though?

  8. SHUT THEM DOWN by Anonymous Coward · · Score: 0, Troll

    Calling it 'archiving' doesn't change the fact that these are simply thieves who just want to deprive hard working creators of their well deserved profits.

    Disgusting leeches of society .

    1. Re:SHUT THEM DOWN by Anonymous Coward · · Score: 0

      /s ?

  9. Oh god you mentioned their Amiga archive by Anonymous Coward · · Score: 0

    now you've awoken Cloanto to send DMCA notices and try to take down Archive.org. Will there be a 21st birthday?

  10. Didn't Work. by DMFNR · · Score: 3, Funny

    I tried to register on the 1998 Slashdot so I could get one of those nifty "low UIDs" that apparently denote a programmer of great skill and wisdom around here but it didn't work and I'm still a 12 year old cut and paste Python programmer.

    1. Re:Didn't Work. by Nick · · Score: 5, Funny

      time machine dude, it's what most of us did.

      --
      Fuck Ajit Pai
    2. Re:Didn't Work. by hawk · · Score: 1

      I refused to register at first once it was required to post--over those pesky, untrustworthy, cookies.

      It took a while before I wanted to post something enough to both overcome my distrust and to allow anything to set a cookie . . .

    3. Re:Didn't Work. by antdude · · Score: 2

      I use Apple's Time Machine. It doesn't work.

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    4. Re:Didn't Work. by Anonymous Coward · · Score: 0

      fascinating.... truly fascinating. The babbling of a neckbeard never fails to inform.

    5. Re:Didn't Work. by WallyL · · Score: 2

      You were holding it wrong!

    6. Re:Didn't Work. by antdude · · Score: 2

      Steve Jobs, is that you? Aren't you dead? :P

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
    7. Re:Didn't Work. by WallyL · · Score: 2

      Time machine, remember?

    8. Re:Didn't Work. by antdude · · Score: 1

      Prove it. :P

      --
      Ant(Dude) @ Quality Foraged Links (AQFL.net) & The Ant Farm (antfarm.ma.cx / antfarm.home.dhs.org).
  11. 10th anniversary was better! by Gravis Zero · · Score: 1

    On their 10th anniversary, their front page had things you wanted to read about and things you cared about: conspiracies!

    9/11 Revisited: Scientific and Ethical Questions

    September 11th Revisited - Were explosives used?

    ;)

    --
    Anons need not reply. Questions end with a question mark.
  12. A few other good things by bobstreo · · Score: 1

    The Old Time Radio archive, the Public Domain Movies and some kodi addons

  13. Re: Other archive: Hillary's Yoga schedule found by Anonymous Coward · · Score: 0

    :)

  14. 10,000 Amiga games GONE by citizenr · · Score: 1

    All of the Amiga games were taken down after a ~week.

    --
    Who logs in to gdm? Not I, said the duck.
  15. Robot reading of "The Book of Urantia" by doom · · Score: 1

    I like the audio recordings of the entire "Book of Urantia" in a computer generated Robot Voice, like for example "The Paradise Sons of God" from "The Central and Superuniverses": https://archive.org/download/U...

  16. archive.org is blocked in Turkey by Anonymous Coward · · Score: 0

    I would love to link to some favourite pages but, for a few weeks now, archive.org has been blocked. And not just at the DNS level either (like most of the 114,000 currently blocked sites here in Turkey) but they firewall-of-china it, timing out http connections and breaking https connections.

    Probably to do with 14GB of buying-oil-from-ISIS emails leaked recently.

  17. Thousands of games by iampiti · · Score: 1

    99% of which they don't have the rights to publish.
    Most of the current rights holders won't care much but guys, this isn't right. And surely there's someone at Archive.org who knows this.

    1. Re:Thousands of games by Anonymous Coward · · Score: 0

      It's better to beg for forgiveness than ask for permission in these cases.

      *IF* the companies involved still exist, it's simply easier to say 'no' than to get lawyers involved in playing IP archeologist and drafting up license agreements for retro software.

  18. Suspicious Treatment of Domain Drop Catching by Baldrson · · Score: 2

    Archive.org plays it dumb when archived content becomes unavailable due to a domain drop catcher placing a robots.txt archiving exclusion on the domain.

    This would not be quite so suspicious if it were not for the fact that when the original author of the material "memory holed" by archive.org pays the extortion to the domain drop catcher and requests that archive.org restore the content for the public, archive.org will frequently (always?) fail to do so.

    Archive.org's motive?

    What is Google's motive for making its Usenet archives virtually unusable?

    He who controls the past...

  19. Bravo, it did not save the mp3 files by syntotic · · Score: 1

    Nor did it traverse the links to find the attached Geocities sites. Seemingly it did not save Geocities, either. But at least I can document I had a site then. As an algorithm it is lacking. Pity I did not go open source and upload ALL my code, it did save the pure text files... But funny iexplorer is asking to play the automatic mp3 file though it does not understand the **format**...

    1. Re:Bravo, it did not save the mp3 files by syntotic · · Score: 1

      WOW! Now I go to the machine and... IT CHANGED! IT CHANGED 2001 CONTENTS! I still have a copy of that site they copied. I do not care who is the thief, I want it DOWN. They are not getting life saving fame from it but only the money. Bad that it did not save the content files, but now the site disappeared! What are they scared of? I have no idea who it is!!! I do NOT KNOW who stole the computer in 2004, only it was Ledezma who confessed about the computer stolen in 1999. Anyway, I have confidence the criminals and their associates will be caught soon, because of their systematicity. And I still am the one who started reading from Science at five or earlier by myself, so... Hope the archive is reading this, someone does not understand whatever happens to me it happens to Occident. That important I am in se.

  20. I love archive.org by k6mfw · · Score: 1

    Really, I use it when I come across a URL for a site that went down, or a website that is interesting or has certain info I am looking for (e.g. a story or spec sheet for equipment) but whose owners either went out of business or died. I even donate money to IA; I sure wish I had known about their party. I've been to a few "Lost Landscapes" of Prelinger's collection. Of course, don't expect IA to archive everything, it just can't be done. Their neoclassical, Greek-columned home on Funston matches their logo, but that was a coincidence. The logo came years earlier; then they got a good deal to locate on Funston Ave and were pleasantly surprised that the front of the building matched their logo. In the rear of the main seating area are racks of servers. When a light blinks, it means someone is visiting archive.org.

    --
    mfwright@batnet.com