Slashdot Mirror


Archive Team Is Busy Saving Geocities

jamie found this note from Jason Scott, who organizes the Archive Team. They are busy downloading as much of Geocities as they can before it vanishes from the Net after Yahoo pulled the plug. (Note: that textfiles.com link is a good candidate for Readability.) "..after 48 hours of work, Archive Team has saved over 200,000 Geocities sites. We're now pulling in new sites at the rate of something like 5 a second. Is that fast enough? We'll see, won't we. ... A side-effect of the whole process is I now know way, way, way too much [sic] about Geocities than I ever expected to. We've had to dissect every aspect of how the site functions to understand how to mirror things, from its history through how it does crazy javascript ads. Some of it is stupid and some is hilarious... We think we have most every site from 1999 and before on Geocities that was left. ... It is more important to me to grab the data than to figure out how to serve it later. People who have been talking about copyright and stuff seem to think I'm going to sell it or take credit or some crap. I don't see how the final collection won't end up online, but how is elusive — maybe a torrent of a bunch of zip files, or as a curated collection, or as a bunch of hard drives. However it is, I'll make sure people can get it, somehow."

13 of 267 comments

  1. Re:And nothing of value was archived by Ilgaz · · Score: 3, Interesting

    I think some Yahoo suits are thinking exactly as you joked, but here is a message for them: it is history they will be rm -rf'ing, and you look like a company that can't even afford to host idle web pages for historical purposes, a company in bad shape with no future.

    They will be deleting (or at least considering deleting) the web pages of people who have passed away and who have no chance to reply to their lame mails or "click here" things. They did the very same thing with Yahoo Briefcase: 10 MB of highly compressible data, for God's sake. At most!

  2. Re:At that rate... by Anonymous Coward · · Score: 3, Interesting

    They'll be broke in only 40 years.

    I wonder if you were thinking the same thing I was when you said this.

    There is a part in Citizen Kane where his editor is telling Kane, as a publisher, 'you're losing hundreds of thousands of dollars a month,' or words to that effect, and Kane says 'you're right, at that rate I'll have to close the doors in 20 years,' or thereabouts.

    I am too lazy to login or google the exact quote.

  3. Re:And nothing of value was archived by jlarocco · · Score: 5, Interesting

    There was a time, I'd put it somewhere between 1996 and 1998, when Geocities wasn't half bad. Few people were really "up" on the technology, so they'd use Geocities to host real, actual pages that didn't suck. Granted it didn't last very long, and practically overnight everybody was using real hosting options for anything serious. But for a little while, seeing a search engine return a link to Geocities wasn't automatically a bad thing.

    Then again, maybe there just wasn't much to compare to back then. Or maybe it just seemed neat because I was only 14.

  4. And how many of them will find other hosting? by TheModelEskimo · · Score: 3, Interesting

    There was an awesome amount of amateur research on Geocities. Some of my favorite reference sites are therefore just about toast (most of them containing first-hand military history).

    And just because someone asked, I saved all ~300 of my Youtube favorites to my HDD last weekend, when I realized how much I rely on them for my own hobby research projects, teaching classes, etc. Most of it was stuff that will never be on DVD. Some of it is stuff that the owners have *already* deleted in the last week, due to perfectionism or whatever.

    I was a Boy Scout, and relying on some free service without thinking of contingencies just doesn't make sense.

  5. Re:Who do I bribe? by N3Roaster · · Score: 2, Interesting

    It might already be gone. I, too, once had a page on GeoCities, so I decided to look into it. Searching for it, Google couldn't find it (but it seems Google Books likes to interpret the old long s as an f). Fine-tuning my search pulled up one hit: a Usenet post with a link to the page in the .sig. So, I take this, and I go to the wayback machine. Put in the URL, and I get two versions, both from the year 2000 (well after I had stopped updating the site). Clicking the links, both were unavailable. The content at the URL itself, of course, is long gone. I looked in a couple other places as well and as near as I can tell, that set of pages is fully and permanently gone from the Internet and this project can do nothing to change that.

    Okay, it turns out that I do have a full copy on an old computer. If I hooked a pair of modems up to it and a more modern machine, I could get it back and theoretically put it back on the Internet, but that won't be happening any time soon. So take a Google. You might not have to write that check out after all.

    --
    Remember RFC 873!
  6. Re:And nothing of value was archived by PopeRatzo · · Score: 2, Interesting

    There might be a lot of important information there to archive and we should help them if we can.

    Can you give us an example?

    I'm not doubting that there's something culturally crucial that's on a Geocities page somewhere that's never been moved elsewhere, but I'd like an example before I get too exercised.

    --
    You are welcome on my lawn.
  7. Re:And nothing of value was archived by PopeRatzo · · Score: 4, Interesting

    you shouldn't fix what isn't broken.

    That would eliminate a whole lot of what we call "progress" in technology and culture.

    Sometimes, you don't realize something is "broken" until somebody comes along and "fixes" it.

    Know what? I like people who fix what isn't broken.

    --
    You are welcome on my lawn.
  8. Re:And nothing of value was archived by darkstar949 · · Score: 2, Interesting

    Agreed. In fact, there is still some good content up on Geocities that I just recently discovered. Case in point would be a fairly inclusive reference to the Cokin Filter System. I'm not sure if it is still being updated, but it would be a loss if it is the only site like it on the internet.

  9. Thank god that somebody is archiving it by TinBromide · · Score: 3, Interesting

    I posted earlier about how Geocities was the early web 2.0 in practice, where anybody could post anything and contribute to the community. I'm sure that there is a wealth of information on Geocities about obscure topics that *might* come in handy if you were to let your true inner geek reign supreme. For example, I have BIOS ROMs of early Macs that I found on Geocities sites that couldn't be found anywhere else, and I'm sure that if they were posted nowadays, they would be subject to lawsuits or take-down notices by Apple.

    I think that our generation will leave less of a mark than that which came before it because nobody is writing on paper. Geocities is the closest thing that we have to shoe-boxes full of letters and diaries for the period spanning the late 90s (in the form of websites about Star Trek and software, and pointless articles posted by ambitious young proto-webdesigners). In the future, there will be a similar scramble to preserve Facebook and MySpace correspondence for future generations.

    --
    Is it sad that I am more likely to recognize you and your posts by your sig than your name or UID?
  10. Re:We should not let this happen. by merreborn · · Score: 2, Interesting

    Is there a productive way to scream? A petition of some kind? An attorney to be addressed?

    Petitioning Yahoo to continue hosting an antiquated service that is likely bleeding money isn't likely to be productive, obviously.

    But it would be awfully nice of them to .tar everything up and .torrent it. There are thousands of us who'd be more than happy to do our part to keep those bits from disappearing into the ether.

  11. angelfire's open directories by British · · Score: 2, Interesting

    Angelfire was fun to snoop around on, since the image subdirectories were open for the browsing. Sometimes you found images not meant for the public.

  12. Re:We should not let this happen. by mike2R · · Score: 2, Interesting

    Maybe not the Colosseum itself, but maybe the contemporary graffiti scrawled on it. See (although these are from Pompeii).

    It's actually quite an apt comparison, and shows how little we have changed as a species :) eg:

    I.4.5 (House of the Citharist; below a drawing of a man with a large nose); 2375: Amplicatus, I know that Icarus is buggering you. Salvius wrote this.

    --
    This sig all sigs devours
  13. Re:And nothing of value was archived by Sancho · · Score: 2, Interesting

    Searching in pages: pleaaze... that has nothing to do with dhtml based pages!
    You can search within pages as long as you are document centric and dont have a rich client application running!

    I will give an example: most of the stuff mentioned can be done by applying a hash value which represents some kind of application state (a hash because it is alterable from the script without causing page refreshes).

    I think you're both coming to the discussion with a different set of assumptions. You're absolutely right that for a web application, many of his gripes don't make sense. Realistically, though, many companies use DHTML for content which is static.

    http://digg.com/ is a perfect example. Disable Javascript and go to the comments on one of their stories. Now turn on Javascript. There's actual content which is inaccessible unless you have Javascript turned on. Slashdot has a similar system, except it gracefully falls back when Javascript isn't available. However it's still troublesome to bookmark certain things like a specific comment if you're using the Web 2.0 version.

    Think that's too close to an application? Try http://www.toyota.com./ The site ostensibly provides information on the company and their product--relatively static content compared to a lot of the Internet--but the site isn't navigable without Javascript. It's barely a Web 2.0 site, yet it's horribly difficult to navigate.

    I'm not just complaining about Javascript. Just about any time that Javascript is required for navigation, the site is not going to be screen-reader accessible.

    Anyway, the point is that lots of sites unnecessarily use DHTML and make interacting with the site in a conventional way difficult, even if they're serving static content and not providing a web application. I suspect that it's these sites that the grandparent is complaining about.
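
The hash-based approach quoted in comment 13 (keeping application state in the URL's hash so that it can be bookmarked) can be sketched roughly as follows. This is only an illustration, assuming the "hash value" means the URL fragment after the `#`. The function names `with_state` and `read_state`, the example URL, and the state keys are all hypothetical, and a real DHTML page would do the equivalent in browser script rather than in Python.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_state(url: str, state: dict[str, str]) -> str:
    """Return url with the given state encoded into its fragment (hypothetical helper)."""
    parts = urlsplit(url)
    # Sort the keys so the same state always produces the same bookmarkable URL.
    fragment = urlencode(sorted(state.items()))
    return urlunsplit(parts._replace(fragment=fragment))


def read_state(url: str) -> dict[str, str]:
    """Recover the state dictionary from a URL's fragment (hypothetical helper)."""
    return dict(parse_qsl(urlsplit(url).fragment))


# Assumed example: a story page that remembers which comment is open.
bookmark = with_state(
    "http://example.com/story/42",
    {"view": "threaded", "comment": "1337"},
)
restored = read_state(bookmark)
```

Because only the fragment changes, the path and query that the server sees stay the same, which is why a script can update this part of the URL without forcing a page refresh. Content that exists only behind such state is still invisible to a client that does not run the script, which is the complaint raised in the reply.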