Digital Archaeology Show Reveals 'Lost' Web Sites
Stoobalou writes "The world's first ever 'archaeological dig' of the internet is set to begin this week in London's über-trendy Shoreditch. The exhibition, entitled Digital Archaeology, kicks off today to mark the 20th anniversary of the first stirrings of the world wide web. According to its organisers, valuable evidence from the interweb's early days is at risk of being lost forever. Digital Archaeology is an attempt to kick-start a wider attempt to archive the web in Britain's first 'digital archive'."
This has to be one of the best written and well thought out posts on /. I've ever seen. The TFA is also top notch.
When they started the dig, the scientists were amazed to see the old now defunct web has buried in it the perfect tool to do the digging! Gophers!
sed -e 's/Chuck Norris/Rajnikant/g' joke > fact
So when are we going to get torrents for the Internet Archive's Wayback Machine?
Hasn't this been done before?
From TFA:
"In five years' time or so, I doubt websites will exist and I expect the vast majority of sites from the first twenty years of the web to be gone forever," says Jim Boulton, curator of Digital Archaeology.
This seems a pretty bold prediction. I know things in The Land Of The Tubes change pretty quickly, but no more websites in five years? Am I misreading that?
"You cannot simultaneously prevent and prepare for war." -- Albert Einstein
Jason Scott of textfiles.com is more of a "digital archeologist" than this wanker, because he might have all that stuff you posted to BBSs back in the 70s/80s.
Plus, he's got an awesome speech on the history of electronic porn, going back to tickertape machines and ham radio (think about that).
http://laughingsquid.com/jason-scott-on-the-atomic-level-of-porn-at-arse-elektronika-2009/
This can be daunting. First, how would they store stuff in a way resistant to bit rot? For example, data stored on 5.25" floppies needs to be imaged and stored on other media. It also would need to be stored with plenty of error correction so that in the future, archivists can put in the relatively ancient hard disk and check to see if there is any irreparable damage. What comes to mind would be a CAS system that automatically copies data from older drives and checks it for errors when newer drives are put in.
It would be nice to develop a format whose sole purpose in life is long term archiving with decent byte capacities. Perhaps a cube where three lasers heat up epoxy, forming a microscopic bubble at x,y,z coordinates for "1"s?
Second, emulation. Back then, it was essentially HTML with some basic LiveScript, then JavaScript and animated GIFs. Will we have a way to translate those add-ons in the future if everyone's Web browser consists of Flash and JavaScript is the code of the land, not HTML?
We have archive.org .. but it's not great.
Not saying I could do any better, it's a pretty damn hard problem, and I think resources are a big issue for them.
And a lot of it isn't their fault. One big problem I see is a _lot_ of really good content is behind registration walls. Massive forums packed with loads of useful information vanish.. and services like archive.org (and search engines) can't get at it.
I think a big issue comes with who gets to decide if you can keep data around. Archive.org will retroactively (or last I checked they did this) disable access to a domain's snapshots if there is a robots.txt file on the live site restricting access. This makes sense in theory, letting a site owner prevent access to data retroactively, however it causes problems when the domain dies and an ad page gets put up (or someone else buys the domain and actually puts up a legitimate site).
For example when jumpedtheshark got bought... the new owners put a robots.txt file (and eliminated all old content from their site) preventing anyone from accessing the vast amount of user provided content.
And then there is the issue of a site that actually did want to block its own content for whatever reason. When the site goes down, so does the robots.txt file, and all the old content becomes accessible again!
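The robots.txt behaviour described above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration of the policy as the comment describes it, not the Archive's actual code: the helper name `snapshots_visible` is made up, and treating `ia_archiver` as the relevant user agent is an assumption.

```python
from urllib.robotparser import RobotFileParser


def snapshots_visible(live_robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Hypothetical helper: would old snapshots of `url` stay visible,
    given the robots.txt currently served by the live domain?"""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines.
    parser.parse(live_robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://example.com/index.html"

# The original owner only hid a private directory.
old_owner = "User-agent: *\nDisallow: /private/\n"
# A new owner (or a parked ad page) blocks everything.
new_owner = "User-agent: *\nDisallow: /\n"

print(snapshots_visible(old_owner, url))
print(snapshots_visible(new_owner, url))
```

Under this model the second call is the jumpedtheshark case: nothing about the old content changed, only the file on the live domain.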
What, you mean high contrast animated gif backgrounds on barely-visible text?
It's like an Archaeologist is having a conversation with a layman:
Archaeologist: You see this dirt?
Layman: Yep, that's nice dirt, what's so special about it?
Archaeologist: This dirt is FOUR BILLION years old!
Layman: Wow, that's pretty old! So how does that make it different than this dirt I'm standing on?
Archaeologist: Well, for one, if you were to grow marijuana with it, you'd be smoking some ancient shit, man.
Layman: *just stares*
Archaeologist: Seriously, it's OLD!
Layman: I'm sure.
Hmm. I think I need my morning coffee.
I do not respond to cowards. Especially anonymous ones.
http://www.theonion.com/video/internet-archaeologists-find-ruins-of-friendster-c,14389/
Digiboard, Inc. website (http://www.dgii.com) predated the Internet Archive's Wayback Machine. When I started the website, only ~200 sites existed.
Smithsonian Digital Archeology Museum exhibits: Hello My Future Girlfriend Mahir All Your Base Supergreg
FTA: "In five years' time or so, I doubt websites will exist and I expect the vast majority of sites from the first twenty years of the web to be gone forever," says Jim Boulton, curator of Digital Archaeology.
Websites won't exist in 5 years? What will we be using, direct neural interfaces?
Trolling is a art,
I can't wait to visit the "punch the monkey and win!" exhibit.
Maybe something like this?
When you're afraid to download music illegally in your own home, then the terrorists have won!
Three or four standard hard disk backups is the equivalent to the cube you speak of, but a heck of a lot cheaper.
Why OpalCalc is the best Windows calc
"Uber-trendy" is about right. The only reason a digital project like this requires a physical exhibition is so the people conducting it have somewhere to stare at their own moustaches growing in each other's 20-inch square glasses.
FTFA: Many of the now-defunct sites will no longer run on modern hardware, so the exhibition's organisers have assembled a veritable PC junkyard of old kit so you can make like it's 1996 again.
Puh-lease.
Oh good more stuff stored in a 'library' no one except researchers will look at.
Let the blink tags and "Under Construction" GIF files stay buried!
Did Apple recently put up a robots.txt to block all robots, including the Internet Archive's crawler?
http://web.archive.org/web/*/http://apple.com
It would be sad to miss out on their logos in the future.
There needs to be a custodian (or agency) which looks after the data.
I don't think coming up with some magical storage medium is the answer..
The current generation stores the data using whatever methods are used for ensuring data integrity (multiple copies, RAID, checksums, tape.. whatever).
The next generation should still have means to access data from one generation previous (just as I can still access stuff that was put on floppy disk/tape without too much difficulty.. I would have a hard time accessing stuff on punchcard or papertape). And it would be their responsibility to migrate it to the newest storage medium.
The problem of actually accessing and making sense of the data though, I agree, that is daunting. With older data it's not so bad, as most of it was plain text or very basic binary formats that could be boiled back down to text.. at least enough to get the raw content. With all the newfangled formats being used to store our data.. "what the hell is a .flac file" becomes a very real issue.
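The generation-to-generation migration described above (copy to the new medium, then prove the copy is intact) might look something like this. It is a minimal sketch: SHA-256 is an arbitrary choice of checksum, and `sha256_of` and `migrate` are illustrative names, not part of any archiving tool.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(src: Path, dst: Path) -> str:
    """Copy src to dst and verify the copy; return the checksum to store with it."""
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    if sha256_of(dst) != expected:
        raise IOError(f"checksum mismatch after copying {src} to {dst}")
    return expected


# Demo with throwaway files standing in for the old and new media.
with tempfile.TemporaryDirectory() as workdir:
    old_copy = Path(workdir) / "floppy.img"
    new_copy = Path(workdir) / "migrated.img"
    old_copy.write_bytes(b"index.html from 1996")
    print(migrate(old_copy, new_copy))
```

Keeping the returned checksum next to the data lets the next custodian repeat the same check before migrating again.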
The archive will be filled with lots of Geocities pages hosting shrines to random anime characters
If they don't do justice to B1FF, then why bother? :(
This is the NSA, we're gonna geet U h@x0r5! Also, what is a h@x0r5?
If there is a bad batch, then the archivists would have four dead hard disks on their hands, and no data.
Optical had the promise of near infinite life. However as time went on, oxidation and bit rot showed that often this would not be true. I am sure there is a way to do burned CDs that have a long archival life, but it would require far better manufacturing tolerances and processes than we have now to ensure that oxygen doesn't seep in along the edge of a layer, or even UV "weld" rings so if oxygen got in from the hub or edge, it wouldn't propagate to the other parts of the disk.
LOCKSS as mentioned by one poster is one idea, but really, life of data begins and ends with the lowest layer. If the way stuff is physically stored is not stable and long-lived, there are only so many bandaids that can be applied, and so much error correction code that can be slapped on.
I remember holographic storage being touted for this, but it seems that we hear an announcement, then nothing. There has yet to be a holographic storage product. Tamarack tried in the early 1990s. InPhase Technologies had products announced, but never shipped a single drive and got faceplanted last February.
So, essentially we are where we were 20 years ago. We have hard disks, flash memory, optical storage, and magnetic tape. Yes, all four technologies have matured, but there hasn't been anything revolutionary.
Long term archival needs more than just shuttling data between formats and making sure the data moved is intact. We need to be able to decode formats. For example, .MOD files. Who has a player for those these days? Does one find an A500 in an attic, and analog hole any files like that? Essentially, we need a PDF/A-like format for not just text, but audio and video.
That should be Yahoo's Cool Site of the Day.
Well, get the hard disks from different sources. The more you get, the closer the chance of non-recovery gets to 0. If you want a one in a 1,000,000,000,000,000,000 chance you can always get another couple of HDs. If even that is too risky, add another couple of HDs to make it another million times less risky.
Why OpalCalc is the best Windows calc
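The arithmetic behind the "add another couple of HDs" argument can be written out. This assumes each disk fails independently with the same probability, which is exactly what a shared bad batch would break; the 1-in-1,000 per-disk figure is an assumption chosen because it reproduces the "million times less risky per pair" claim.

```python
def loss_probability(p_single: float, copies: int) -> float:
    """Chance that every copy is lost, assuming independent failures."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be a probability")
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_single ** copies


P_SINGLE = 1e-3  # assumed per-disk failure chance, for illustration only

for copies in (2, 4, 6, 8):
    print(copies, loss_probability(P_SINGLE, copies))
```

With that assumed figure, six disks give the one-in-a-quintillion odds quoted above, and each extra pair multiplies the risk by one millionth. Buying from different sources is what makes the independence assumption less unrealistic.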
Geocities, banner ads and popups. Nothing to see here folks. Want some real content? Look at the old BBS network! ASCII graphics at 300bps R KEWL!
Does this also mean they are going to scour around the early days of /b/? That is a little scary
The world is how you make it
CD-Rs don't erase themselves because of oxygen. They erase for the same reason why my carpets and paintings fade - the dye loses its color.
>>>MOD files. Who has a player for those these days?
I do. MOD never died as a format. I also have a HAM viewer for those old por... er, photos. 4000 colors baby. ;-)
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
>>>I don't think coming up with some magical storage medium is the answer..
What about books filled with barcodes? Not very efficient for space, but it will still be readable ~5000 years from now and convertible back to databits/audio/video.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
Sure, but it's useless in that format.
The idea behind having an archive is it can be browsed periodically. If you can't easily access the data, what is the point.
When did Interweb become a non-sarcastic word?
I was surprised when I clicked the link and saw that the original article actually used "Interweb" as a serious word, but then I did a quick search and found that there are lots of actual companies that call themselves "Interweb something": Interweb Designs, Interweb Solutions, etc.
What's next? Luser Ltd.? AOHELLpdesk.com? n00btech Inc.?
Let us all mourn the death of satire.
I thought the purpose of an archive was to back up text/audio/video so that if the primary source (DVD) turns to rust, some future ~4000 AD generation can go back to the book and scan it page-by-page to reconstruct it.
One of the tragic things about Greco-Roman culture is that almost all their music was lost. If someone had simply written it down on paper, we'd still have the yellowed sheets to reconstruct the songs, but nobody ever bothered. The same will happen to our culture if we fail to convert our audio/video into a permanent format. Like books with barcodes.
"I disapprove of what you say, but I will defend to the death your right to say it." - historian Evelyn Beatrice Hall
They'll probably quit after they dig up goatse.cx.
At first, I got really excited because I thought there was a TV show about digital archaeology that was revealing new information about the ABC show 'Lost', through interconnected web sites. 'Lost' used "args" (alternate reality games) via hidden web sites, to keep up fan interest between seasons.
Then I realized I must still be too addicted to 'Lost' even though it's now off the air. I should find something else to do. :-)
Some nice stuff there.
Damn! I thought I was going to finally find out why they were on the island.
Digital Archaeology Show Reveals 'Lost' Web Sites
Yeah, they're not hard to find...
Speaking as someone who worked in Whitechapel for ten years, it's somewhere you move away from, not to. Trust me on this, the only people that think it's trendy to live somewhere like that are journalists.
Attention, all honor students will be rewarded with a trip to an archeological dig.
Conversely, all detention students will be punished with a trip to an archeological dig.
Even then there existed something called copyright. So do they have explicit permission? If no explicit exceptions are given, copyright is implied. The word "copyright", the sign and the year are a nice extra and make it easier to prove, but they are not needed.
Don't fight for your country, if your country does not fight for you.
For those who aren't familiar with Shoreditch, this music video provides a quick primer.
Interesting exhibition, though the article's reference to the UK's "first digital archive" is unclear. I'm guessing that they are referring to the UK Web Archive, which has been accessible for a number of years, but only officially launched at the start of 2010.
Oh MAN, fuck you very much for reminding me of Supergreg. I've met people back then who thought he was serious.
http://www.youtube.com/watch?v=V8wrIgbiCgs
DUDE. It's Sacha Baron Cohen. So obviously that was the first thing I've ever seen from him. ^^
Who is General Failure and why is he reading my hard disk?