The Internet Archive Has Saved Over 10,000,000,000,000,000 Bytes of the Web
An anonymous reader writes "Last night, the Internet Archive threw a party; hundreds of Internet Archive supporters, volunteers, and staff celebrated that the site had passed the 10,000,000,000,000,000 byte mark for archiving the Internet. As the non-profit digital library, known for its Wayback Machine service, points out, the organization has now saved 10 petabytes of cultural material."
The announcement coincided with the release of an 80-terabyte dataset for researchers and, for the first time, the complete literature of a people: the Balinese.
How much of that is porn, I wonder.
Well, I guess they didn't have time to write much, being busy dealing with Orcs and Balrogs.
What about the Thorinim?
And nothing of value was saved...
If you want news from today, you have to come back tomorrow.
That's what they're for.
Counting zeroes is a chore.
I need a car analogy about the Library of Congress before I can understand that number.
hookers and grits.
Should there be a gigantic catastrophe, none of this will be useful to the survivors. Chiseled on stone is the only way.
How do you choose a date? Whose company would you enjoy?
Well, one thing you can consider is looks. Woody thought of Janice and how good looking she was. He'd really have to rate to date her. Yes, he'd enjoy that, except... Well, it's too bad Janice always acts so superior. She'd make a fellow feel awkward and bored.
Well, perhaps someone who doesn't feel so superior. There's Betty. And yet, it just doesn't seem as if she'd be much fun.
What about Anne? She knows how to have a good time, and how to make the fellow with her relax, too. Yes, that's what a boy likes.
Yes, the Internet now provides everything you ever needed to know but were afraid to ask.
10,000,000,000,000,000 is 8.88178 Petabytes. Remember a kilobyte is 1024 bytes not 1000 bytes.
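For anyone who wants to check the arithmetic, here is a minimal Python sketch of the two conventions. The function name `to_petabytes` is made up for illustration. It assumes "decimal" means 10^15 bytes per petabyte (the SI prefix) and "binary" means 2^50 bytes, which is strictly a pebibyte (PiB).

```python
def to_petabytes(n_bytes: int, binary: bool = False) -> float:
    """Convert a byte count to petabytes (10**15) or pebibytes (2**50)."""
    divisor = 2**50 if binary else 10**15
    return n_bytes / divisor


total = 10_000_000_000_000_000  # the figure from the headline

# Decimal (SI) petabytes: prints 10.0
print(to_petabytes(total))

# Binary pebibytes, rounded to five places: prints 8.88178
print(round(to_petabytes(total, binary=True), 5))
```

Both numbers describe the same pile of bytes; which one counts as "10 petabytes" depends only on which prefix convention you pick.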
...since the TOS specifically prohibits copying data from the site:
"Our terms of use specify that users of the Wayback Machine are not to copy data from the collection. If there are special circumstances that you think the Archive should consider, please contact info at archive dot org."
Warrick hasn't been taking new requests for months (and I'm sure it's more of a research tool than an actual service for the public), and the site effectively blocks attempts to back up data using wget. It makes me wonder who (or what) this archive really serves, because it's most certainly not the general public.
99% of this is probably porn.
>ls
The Internet Archive Has Saved Over 10,000,000,000,000,000 Bytes of the Web
>ls -h
The Internet Archive Has Saved Over 9095 Terabytes of the Web
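The joke above can be roughly reproduced with a small formatter. This is only a sketch loosely in the spirit of `ls -h`, not a reimplementation of its rounding rules. The names `UNITS`, `human_readable`, and `max_unit` are hypothetical, and it assumes powers of 1024.

```python
UNITS = ["B", "K", "M", "G", "T", "P", "E"]


def human_readable(n_bytes: int, max_unit: str = "E") -> str:
    """Scale a byte count by powers of 1024, stopping at max_unit."""
    value = float(n_bytes)
    for unit in UNITS:
        # Stop once the value fits in this unit, or the caller's cap is reached.
        if value < 1024 or unit == max_unit:
            break
        value /= 1024
    # Plain bytes are shown as a whole number, larger units with one decimal.
    return f"{value:.0f}{unit}" if unit == "B" else f"{value:.1f}{unit}"


print(human_readable(10**16))                # 8.9P
print(human_readable(10**16, max_unit="T"))  # 9094.9T
```

Capping the unit at terabytes gives 9094.9T, which rounds to the 9095 in the headline above.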
I have never understood why the few archive sites I have visited never back up an entire website, rather than just a few important pages and images. I can understand not accessing pages that are supposed to be secure, but all other pages should be fair game. This matters most for product knowledge: sometimes a company takes down its site and images, and it would be nice to have an archive to go to.
testing out my trending skills
There is a Beatles reference here somewhere.
10,000,000,000,000,000 Bytes = 8.88 Petabytes
I don't know if they have done anything about this recently, but there was a problem with domain parking sites putting up a robots.txt that instructs Archive.org to delete or suppress any archives of the site that was there previously. I have run into a few sites like that. If someone dies and their site goes with them, it isn't right for some squatter to remove their work from history.
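To make the mechanism concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL, and the helper name `is_blocked` are made up for illustration, and it assumes the Archive's crawler identifies itself as `ia_archiver`. It only shows how a crawler reads the rule; hiding captures that were already made is the Archive's own policy, which this code does not model.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a parked domain might serve.
PARKED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


page = "http://example.com/old-page.html"
print(is_blocked(PARKED_ROBOTS_TXT, "ia_archiver", page))   # True
print(is_blocked(PARKED_ROBOTS_TXT, "SomeOtherBot", page))  # False
```

The point is that two lines from whoever currently holds the domain are enough to single out one crawler, no matter who wrote the original pages.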
And I wish I could pull up historic copies of the original altavista.digital.com.
How nice of them to do the archiving and release such a large dataset.
Where can I download the file?
It looks like they've copied my website and are therefore infringing my copyright.
But I won't be suing them because I don't mind, because I'm not Apple.
I should use this sig to advertise my book ISBN-13 : 978-1501515132.
Is that 10,000,000,000,000,000 bytes created or saved?
On one hand, it's a HORRIBLE violation of everyone's privacy. The more you know about how things work, the worse it gets. It's way worse than you think at first.
On the other hand, it's amazingly nice to be able to look at some old site that you thought was forever forgotten. But they may have no idea it's still there and want it gone, etc.
What are they using for backups?
If you want to keep something private, maybe you shouldn't make it available to everyone on the Web?
Dilbert RSS feed
Maybe you fail to grasp basic psychology?
I know the prefix invokes unpleasant connotations, but it also means 10^15.
If this was "News for Nerds" the title would read:
The Internet Archive Has Saved Over 10^16 Bytes of the Web!
Do you want to impress us on how looong your zeros* are?
* Hint: I'm talking about your penis.
Sorry, I can't comprehend that number, could some young newspaper hack put it in terms of olympic sized swimming pools and/or UK double decker buses for me please...
They should print it all off, for safekeeping.
With all those pages stored, why does it always tell me that the page can't be found?
They should have written it in bits: 80,000,000,000,000,000 !!
Shame about the lack of images*, archive.org is the only remaining evidence of Cliff Bleszinski's Cat-Scan.com. The site doesn't have the same comedy value without all the scans of squished cats.
*Yes, yes, I know that archiving images would require many extra fucktons of storage, but it would be worth it in some cases.
http://web.archive.org/web/20040202004210/http://www.cs.auckland.ac.nz/~pgut001/links.html
http://web.archive.org/web/20040206214035/http://www.cs.auckland.ac.nz/~pgut001/links/archives.html
http://web.archive.org/web/20060831063210/http://faculty.ncwc.edu/toconnor/reform.htm
http://web.archive.org/web/20060831063224/http://faculty.ncwc.edu/toconnor/data.htm
http://web.archive.org/web/20060831081811/http://faculty.ncwc.edu/toconnor/thnktank.htm
http://web.archive.org/web/20070207050215/http://faculty.ncwc.edu/toconnor/sources.htm
http://web.archive.org/web/20070217052232/http://faculty.ncwc.edu/TOConnor/427/427links.htm
http://web.archive.org/web/20100528020113/http://milw0rm.com/
http://web.archive.org/web/20040215020827/http://www.linux-mag.com/2003-09/acls_01.html
http://web.archive.org/web/20041031074320/http://sun.soci.niu.edu/~rslade/secgloss.htm
http://web.archive.org/web/20041125131921/http://tips.linux.com/tips/04/11/23/2022252.shtml?tid=100&tid=47&tid=35
http://web.archive.org/web/20041231085409/http://www.cs.auckland.ac.nz/~pgut001/links.html
http://web.archive.org/web/20050306035558/http://www.spitzner.net/linux.html
http://web.archive.org/web/20060712182215/http://linuxgazette.net/128/saha.html
http://web.archive.org/web/20090109020415/http://www.securityfocus.com/print/infocus/1414
http://web.archive.org/web/20100529035423/http://www.cert.org/current/services_ports.html
http://web.archive.org/web/20070717124745/http://www.tldp.org/linuxfocus/English/Archives/lf-2003_01-0278.pdf
http://web.archive.org/web/20060712151452/http://jbd.zayda.net/enscribe/
http://web.archive.org/web/20040608141549/http://all.net/journal/netsec/1997-12.html
http://web.archive.org/web/20060220113124/http://www.dss.mil/training/salinks.htm
http://web.archive.org/web/20080222191230/http://the.jhu.edu/upe/2004/03/23/about-van-eck-phreaking/
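Each of the links above follows the same pattern: a 14-digit capture timestamp (YYYYMMDDhhmmss) followed by the original URL. A minimal sketch for pulling the two apart, with `WAYBACK_RE` and `parse_wayback_url` as made-up names and the pattern inferred only from the links in this list:

```python
import re
from datetime import datetime
from typing import Tuple

# Pattern inferred from the links above: /web/<14 digits>/<original URL>
WAYBACK_RE = re.compile(r"^https?://web\.archive\.org/web/(\d{14})/(.+)$")


def parse_wayback_url(url: str) -> Tuple[datetime, str]:
    """Split a Wayback Machine link into (capture time, original URL)."""
    match = WAYBACK_RE.match(url)
    if match is None:
        raise ValueError(f"not a recognised Wayback Machine URL: {url}")
    timestamp, original = match.groups()
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S"), original


captured, original = parse_wayback_url(
    "http://web.archive.org/web/20100528020113/http://milw0rm.com/"
)
print(captured.isoformat(), original)
# 2010-05-28T02:01:13 http://milw0rm.com/
```

This is handy if you want to sort a list like the one above by capture date, or check whether the original address is still alive.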
What's a Wetback Machine?
It's great that archive.org is doing this, but it's such an important part of history that I thought I would do a mini-version for the pages I visit, just to be able to refer back to stuff. I've been using the Firefox addon called Shelve to save all pages I visit on my home computer for about 2 months now (at most one version for each day). It's a total of 5.8 GB. It's not useful for browsing, though. I'd love it if it were better integrated with Firefox so that I could choose among all versions of each page. There's sometimes some excellent information on university pages or cheap hosting that could be 10 years old, and you never really know how long it's going to stay up.
Anyway, this may give some perspective too: 2 months of daily snapshots of slashdot, other news, some tech stuff and a little Facebook takes just 5.8 GB.
What OS and file system are they using to store all that data?
Trolls *trying* to bury this http://tech.slashdot.org/comments.pl?sid=3213635&cid=41795713 fail again.
Trolls *trying* to "bury" this fail yet again (keep blowing your mod points, you'll run out soon) http://tech.slashdot.org/comments.pl?sid=3213635&cid=41795713
Imagine how many bits they have saved!
10 Petabytes of information is insignificant. My corporate network has that much data, and backs up several hundred Terabytes nightly.
It's great of Archive.org to do this, but I wish they'd pay more attention to the quality of some of their material. The ePub and Kindle conversions of some of their ebooks are truly abysmal. It's a double shame that many of those are available nowhere else.
11258999068426240 bytes. This is 10PB
Bitch about how hard it is to context switch the normal meaning of the metric prefixes when you see the bytes keyword next to it, indicating a number denoted in the base 2 number system.
Score -3: Nonconformist.