Slashdot Mirror


The Internet Archive Has Saved Over 10,000,000,000,000,000 Bytes of the Web

An anonymous reader writes "Last night, the Internet Archive threw a party; hundreds of Internet Archive supporters, volunteers, and staff celebrated that the site had passed the 10,000,000,000,000,000 byte mark for archiving the Internet. As the non-profit digital library, known for its Wayback Machine service, points out, the organization has thus now saved 10 petabytes of cultural material." The announcement coincided with the release of an 80-terabyte dataset for researchers and, for the first time, the complete literature of a people: the Balinese.

16 of 135 comments

  1. Relevance of byte count by Anonymous Coward · · Score: 5, Funny

    How much of that is porn, I wonder.

    1. Re:Relevance of byte count by martin-boundary · · Score: 4, Funny

      If only one of those files is an MP3, the RIAA is going to have an orgasm.

    2. Re:Relevance of byte count by Xtifr · · Score: 4, Insightful

      They have over 1.5 million unique audio files in the Live Music Archive alone. I know because I helped them count. (That's unique files, not counting the duplicates in different formats.) If the RIAA has anything to say about it, they're seriously slacking.

    3. Re:Relevance of byte count by GofG · · Score: 5, Funny

      There is a torrent on thepiratebay of every single geocities site. It's an archive, but i've downloaded it. What was your site? I'll rar it up for you.

      --
      GFA/M/S d-- s: a--- C++++ UBL++$ P+ L+++ !E- W++ N+ !o K- w--- !O !M !V PS++ PE Y+ PGP+ t+++ 5- X+ R tv@ b++ DI++++ D+ G
    4. Re:Relevance of byte count by GofG · · Score: 5, Interesting

      No, go ahead and mod me down. Every time i post, I look at my user ID and think "GOD FUCKING DAMNIT IF I HAD WAITED LIKE TEN MINUTES I WOULD HAVE HAD A PALINDROME AUAUUUUUUGGGHHH"

      i deserve all the downmods i get, accidental or otherwise.

      --
      GFA/M/S d-- s: a--- C++++ UBL++$ P+ L+++ !E- W++ N+ !o K- w--- !O !M !V PS++ PE Y+ PGP+ t+++ 5- X+ R tv@ b++ DI++++ D+ G
    5. Re:Relevance of byte count by Raenex · · Score: 5, Funny

      when copyright runs out

      Thanks for the laugh.

  2. Yes, but... by Lordfly · · Score: 4, Funny

    I need a car analogy about the Library of Congress before i can understand that number.

    --
    hookers and grits.
    1. Re:Yes, but... by Squeeself · · Score: 4, Interesting

      I know this was in jest, but in this case, unlike so many other times this joke is made, it's slightly relevant. A quick Google turned up the following incomplete info, http://www.quora.com/Library-of-Congress/How-much-data-does-the-library-of-congress-actually-represent, which puts the tape storage capacity of the Library of Congress circa 2011 at 4.5 petabytes. The answer, then, is that this is approximately 2 Libraries of Congress of data, which is just a tad much to fit in the trunk of your car. It's going to take a few trips to the Library and back to move that data around.
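
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the 4.5 petabyte figure is the one quoted from the linked Quora answer, the 10 petabyte figure is from the story summary, and the names `ARCHIVE_BYTES`, `LOC_TAPE_BYTES`, and `libraries_of_congress` are made up for illustration.

```python
# Figures taken from the thread; both are treated as decimal petabytes (10**15 bytes).
ARCHIVE_BYTES = 10 * 10**15            # 10 PB, per the story summary
LOC_TAPE_BYTES = 4_500_000_000_000_000  # 4.5 PB, per the quoted Quora answer


def libraries_of_congress(archive_bytes, loc_bytes=LOC_TAPE_BYTES):
    """Return how many Library-of-Congress tape stores fit in archive_bytes."""
    return archive_bytes / loc_bytes


ratio = libraries_of_congress(ARCHIVE_BYTES)
print(f"{ratio:.2f} Libraries of Congress")  # prints "2.22 Libraries of Congress"
```

So "approximately 2" rounds down a little; under these assumptions the ratio is closer to 2.2.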

  3. Indispensable reference for slashdotters by guttentag · · Score: 5, Insightful

    For instance, note the archived film "Dating: Do's and Don'ts" (1949). It begins thus:

    How do you choose a date? Whose company would you enjoy?

    Well, one thing you can consider is looks. Woody thought of Janice and how good looking she was. He'd really have to rate to date her. Yes, he'd enjoy that, except... Well, it's too bad Janice always acts so superior. She'd make a fellow feel awkward and bored.

    Well, perhaps someone who doesn't feel so superior. There's Betty. And yet, it just doesn't seem as if she'd be much fun.

    What about Anne? She knows how to have a good time, and how to make the fellow with her relax, too. Yes, that's what a boy likes.

    Yes, the Internet now provides everything you ever needed to know but were afraid to ask.

  4. All of which is rather useless... by pongo000 · · Score: 4, Interesting

    ...since the TOS specifically prohibits copying data from the site:

    "Our terms of use specify that users of the Wayback Machine are not to copy data from the collection. If there are special circumstances that you think the Archive should consider, please contact info at archive dot org. "

    Warrick hasn't been taking new requests for months (and I'm sure it's more of a research tool than an actual service for the public), and the site effectively blocks attempts to back up data using wget. It makes me wonder who (or what) this archive really serves, because it's most certainly not the general public.

  5. looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 4, Insightful

    10,000,000,000,000,000 Bytes = 8.88 Petabytes

    1. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 5, Informative

      You have that backwards. Kilo, mega, giga, tera and so forth are base ten prefixes, and have been for quite a bit longer than people have been misusing them to refer to base 2 numbers. As such, it made more sense to leave them consistent with everything else and make new prefixes for the binary numbers.
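
The two figures in this exchange differ only in which prefix is applied to the same byte count. A minimal sketch, assuming the decimal petabyte is 10**15 bytes and the binary pebibyte is 2**50 bytes (the function names are made up for illustration):

```python
TOTAL_BYTES = 10**16  # the 10,000,000,000,000,000 bytes from the headline


def to_petabytes(n_bytes):
    """Decimal (SI) prefix: 1 PB = 10**15 bytes."""
    return n_bytes / 10**15


def to_pebibytes(n_bytes):
    """Binary (IEC) prefix: 1 PiB = 2**50 bytes."""
    return n_bytes / 2**50


print(f"{to_petabytes(TOTAL_BYTES):.2f} PB")   # prints "10.00 PB"
print(f"{to_pebibytes(TOTAL_BYTES):.2f} PiB")  # prints "8.88 PiB"
```

Read this way, the 8.88 in the parent comment is a count of pebibytes, while the headline's 10 is a count of petabytes.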

  6. Domain parkers deleting archives by linebackn · · Score: 5, Informative

    I don't know if they have done anything about this recently, but there was a problem with domain parking sites putting up a robots.txt that instructs Archive.org to delete or suppress any archives of the site that was there previously. I have run into a few sites like that. If someone dies and their site goes with them, it isn't right for some squatter to remove their work from history.

    And I wish I could pull up historic copies of the original altavista.digital.com.
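
To make the mechanism concrete, here is a rough sketch of how a single robots.txt rule shuts out one crawler while leaving others alone, using Python's standard `urllib.robotparser`. The rule text, the `example.com` URL, and the helper name `build_parser` are hypothetical, and `ia_archiver` is assumed here to be the user-agent name the archive's crawler responded to. How the archive applied such a rule to snapshots it had already taken is its own policy and is not modelled here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of the kind a domain parker might publish.
PARKED_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(PARKED_ROBOTS_TXT)

# The named crawler is excluded from the whole site...
print(parser.can_fetch("ia_archiver", "http://example.com/old-page.html"))
# ...while a crawler with no matching rule is still allowed.
print(parser.can_fetch("SomeOtherBot", "http://example.com/old-page.html"))
```

The point of the complaint is that the rule is written by whoever controls the domain today, not by the author of the pages that used to live there.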

  7. Download Link? by mysidia · · Score: 4, Interesting

    How nice of them to do the archiving and release such a large dataset.

    Where can I download the file?

  8. My Poor Infringed Copyright!! by TechyImmigrant · · Score: 4, Funny

    It looks like they've copied my website and are therefore infringing my copyright.

    But I won't be suing them because I don't mind, because I'm not Apple.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
  9. What the hell by nuckfuts · · Score: 5, Interesting

    are they using for backups?