Slashdot Mirror


The Internet Archive Has Saved Over 10,000,000,000,000,000 Bytes of the Web

An anonymous reader writes "Last night, the Internet Archive threw a party; hundreds of Internet Archive supporters, volunteers, and staff celebrated that the site had passed the 10,000,000,000,000,000 byte mark for archiving the Internet. As the non-profit digital library, known for its Wayback Machine service, points out, the organization has thus now saved 10 petabytes of cultural material." The announcement coincided with the release of an 80-terabyte dataset for researchers and, for the first time, the complete literature of a people: the Balinese.

135 comments

  1. Relevance of byte count by Anonymous Coward · · Score: 5, Funny

    How much of that is porn, I wonder.

    1. Re:Relevance of byte count by martin-boundary · · Score: 4, Funny

      If only one of those files is an MP3, the RIAA is going to have an orgasm.

    2. Re:Relevance of byte count by Xtifr · · Score: 4, Insightful

      They have over 1.5 million unique audio files in the Live Music Archive alone. I know because I helped them count. (That's unique files, not counting the duplicates in different formats.) If the RIAA has anything to say about it, they're seriously slacking.

    3. Re:Relevance of byte count by girlintraining · · Score: 1

      If only one of those files is an MP3, the RIAA is going to have an orgasm.

      Correction: Evilgasm.

      --
      #fuckbeta #iamslashdot #dicemustdie
    4. Re:Relevance of byte count by Anonymous Coward · · Score: 1

      Sweet! Did you guys save my Geocities page too?

    5. Re:Relevance of byte count by pongo000 · · Score: 1

      They have over 1.5 million unique audio files in the Live Music Archive alone.

      Since they can't be copied per the terms of the TOS, what good do they serve? Why bother counting something you technically can't access?

    6. Re:Relevance of byte count by Anonymous Coward · · Score: 0

      Exclusive-Or-gasm. FTFY.

    7. Re:Relevance of byte count by dohzer · · Score: 1

      4kB should be enough for anyone.

    8. Re:Relevance of byte count by Anonymous Coward · · Score: 2, Insightful

      Because eventually they WILL be accessible when copyright runs out. But if nobody other than the 'rightsholders' has copies, that wouldn't matter: they could trivially remaster them, then have copyright over the remasters for another century after destroying the originals so they could never get out.

    9. Re:Relevance of byte count by GofG · · Score: 5, Funny

      There is a torrent on thepiratebay of every single geocities site. It's an archive, but i've downloaded it. What was your site? I'll rar it up for you.

      --
      GFA/M/S d-- s: a--- C++++ UBL++$ P+ L+++ !E- W++ N+ !o K- w--- !O !M !V PS++ PE Y+ PGP+ t+++ 5- X+ R tv@ b++ DI++++ D+ G
    10. Re:Relevance of byte count by Anonymous Coward · · Score: 1

      Sorry my finger slipped when I went to mod this up. Can I undo!

    11. Re:Relevance of byte count by Xtifr · · Score: 1

      I think you must be looking at the wrong part of the Archive. Everything in the Live Music section and the Netlabels section is public domain or licensed under a CC license or equivalent. The media collections are separate from the Wayback Machine.

    12. Re:Relevance of byte count by Xtifr · · Score: 2

      Probably. I found a copy of my first-ever homepage, which actually predated Geocities, and was probably even more useless than your average Geocities page. :)

    13. Re:Relevance of byte count by GofG · · Score: 5, Interesting

      No, go ahead and mod me down. Every time i post, I look at my user ID and think "GOD FUCKING DAMNIT IF I HAD WAITED LIKE TEN MINUTES I WOULD HAVE HAD A PALINDROME AUAUUUUUUGGGHHH"

      i deserve all the downmods i get, accidental or otherwise.

      --
      GFA/M/S d-- s: a--- C++++ UBL++$ P+ L+++ !E- W++ N+ !o K- w--- !O !M !V PS++ PE Y+ PGP+ t+++ 5- X+ R tv@ b++ DI++++ D+ G
    14. Re:Relevance of byte count by Raenex · · Score: 5, Funny

      when copyright runs out

      Thanks for the laugh.

    15. Re:Relevance of byte count by Mikkeles · · Score: 2

      They're exaggerating; I know there are only 256 bytes, so I think they're counting duplicates!

      --
      Great minds think alike; fools seldom differ.
    16. Re:Relevance of byte count by amanaplanacanalpanam · · Score: 0

      At least a couple pedobytes.

    17. Re:Relevance of byte count by Anonymous Coward · · Score: 0

      Maybe not the RIAA...but there's also the "shareware CD collection" section which contains Russian software piracy compilation CDs, over 50 GB worth of 94-99 ripped commercial games.

      I don't know who's perverting the archive with illicit material but it's there, and sure is tainting the place.

    18. Re:Relevance of byte count by jc42 · · Score: 1

      How much of that is porn, I wonder.

      Actually, only about half. The other half is lolcats.

      --
      Those who do study history are doomed to stand helplessly by while everyone else repeats it.
    19. Re:Relevance of byte count by maxwell demon · · Score: 1

      You probably wouldn't have gotten a palindrome. Instead, you'd be even more angry for having missed the palindrome even more closely.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    20. Re:Relevance of byte count by GofG · · Score: 1

      I'm only one off of a palindrome. I don't think it's possible to be any closer.

      --
      GFA/M/S d-- s: a--- C++++ UBL++$ P+ L+++ !E- W++ N+ !o K- w--- !O !M !V PS++ PE Y+ PGP+ t+++ 5- X+ R tv@ b++ DI++++ D+ G
    21. Re:Relevance of byte count by bikubarat · · Score: 1

      70%

    22. Re:Relevance of byte count by mcgrew · · Score: 1

      Since they can't be copied per the terms of the TOS

      What are you talking about? A lot of friends of mine host their music on Archive.org.

    23. Re:Relevance of byte count by mcgrew · · Score: 1

      Nope, as good as Archive.org is, most of the pre-2000 stuff is gone forever. Much of my old gaming site is there, but not all of it. The only surviving page of Janet "Kneel" Harriott's Yello There is one I posted on my gaming site. Very little of mcgrew.info survives.

    24. Re:Relevance of byte count by Anonymous Coward · · Score: 0

      Any chance you could put up a rar of "Imre_Leader_Appreciation_Society". The internet seems to have lost this site.

  2. Balinese, huh? by Anonymous Coward · · Score: 2, Funny

    Well, I guess they didn't have time to write much, being busy dealing with Orcs and Balrogs.

    What about the Thorinim?

    1. Re:Balinese, huh? by K. S. Kyosuke · · Score: 1

      Hey, be glad it's not written in Palinese. Now *that* would have been nasty.

      --
      Ezekiel 23:20
  3. Indeed! by Frosty Piss · · Score: 3, Funny

    And nothing of value was saved...

    --
    If you want news from today, you have to come back tomorrow.
    1. Re:Indeed! by Anonymous Coward · · Score: 0

      How many users are not aware they are not being paid by the Alexa Internet Archive?

    2. Re:Indeed! by Anonymous Coward · · Score: 0

      Come on, you stupid asshats, it's a joke based on the "and nothing of value was lost..."

      What a bunch of overly sensitive juveniles.

    3. Re:Indeed! by Anonymous Coward · · Score: 0

      Come on, you stupid asshats, it's a joke based on the "and nothing of value was lost..."

      What a bunch of overly sensitive juveniles.

      Captain Obvious to the rescue.

  4. Use prefixes. by Anonymous Coward · · Score: 0, Funny

    That's what they're for.
    Counting zeroes is a chore.

    1. Re:Use prefixes. by Anonymous Coward · · Score: 0

      Dammit, where's my prefix flamewar! This is slashdot, how am I supposed to root for the anti-"mebi" crowd when both sides are a no-show? I mean, look at this story! It's an obvious cue for that old chestnut to be brought out and fought over.

        Maybe the pro-SI guys just can't bring themselves to defend "pebibytes?"

        Hey, that would mean we win by default! Alright! 1024! 1024! 1024! Whoooo!

  5. Yes, but... by Lordfly · · Score: 4, Funny

    I need a car analogy about the Library of Congress before i can understand that number.

    --
    hookers and grits.
    1. Re:Yes, but... by MangoCats · · Score: 2

      It's like the Library of Congress stuffed floor to ceiling with Service Manuals?

    2. Re:Yes, but... by Squeeself · · Score: 4, Interesting

      I know this was in jest, but in this case, unlike so many other times this joke is made, it's slightly relevant. A quick Google turned up the following incomplete info http://www.quora.com/Library-of-Congress/How-much-data-does-the-library-of-congress-actually-represent which states tape storage capacity of the Library of Congress circa 2011 at 4.5 petabytes. The answer, then, is that this is approximately 2 Libraries of Congress of data, which is just a tad bit much to fit in the trunk of your car. It's going to take a few trips to the Library and back to move that data around.

    3. Re:Yes, but... by thygate · · Score: 1

      Well it's about 30 libraries of congress and 3 SUV's, plus or minus a minivan.

    4. Re:Yes, but... by deblau · · Score: 2

      If you live in Vancouver, it's roughly the number of nanometers you would cover on a round trip drive to the Library of Congress.

      --
      This post expresses my opinion, not that of my employer. And yes, IAAL.
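As a rough check of the nanometer comparison above, the headline figure can be read as a length and converted to kilometers. This is only a minimal sketch: the constant names are made up for illustration, and no actual driving distance is assumed or verified here.

```python
# Treat the headline byte count as a length in nanometers and convert it.
HEADLINE_COUNT = 10_000_000_000_000_000  # 10^16, the number in the headline
NM_PER_KM = 10**12                       # 10^9 nm per meter * 10^3 m per km

round_trip_km = HEADLINE_COUNT // NM_PER_KM
one_way_km = round_trip_km // 2

print(round_trip_km)  # 10000
print(one_way_km)     # 5000
```

So the comparison holds for any round trip of about 10,000 km, or roughly 5,000 km each way.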
    5. Re:Yes, but... by jimmydevice · · Score: 1

      At what tape density?

    6. Re:Yes, but... by Anonymous Coward · · Score: 0

      I need a car analogy about the Library of Congress before i can understand that number.

      Most of it is a bunch of lubricated tubes, usually sets of two.

    7. Re:Yes, but... by oodaloop · · Score: 1

      It's roughly 13,000 VW Beetles filled with telephone books.

      --
      Tic-Tac-Toe, Global Thermonuclear War, and relationships all have the same winning move.
    8. Re:Yes, but... by Voyager529 · · Score: 1

      this is approximately 2 Libraries of Congress of data, which is just a tad bit much to fit in the trunk of your car. It's going to take a few trips to the Library and back to move that data around.

      In books, yes. In 32GByte MicroSD cards, it might be possible to do it in one trip with a large enough vehicle.
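For a sense of scale on the MicroSD idea, here is a small sketch of the card count. It assumes decimal gigabytes (32 GB = 32 * 10^9 bytes) and ignores filesystem overhead; the variable names are illustrative only.

```python
ARCHIVE_BYTES = 10**16    # the headline figure
CARD_BYTES = 32 * 10**9   # assumption: a "32 GB" card holds 32 * 10^9 bytes

# Ceiling division, so a partially filled last card still counts as a card.
cards_needed = -(-ARCHIVE_BYTES // CARD_BYTES)

print(cards_needed)  # 312500
```

Whether 312,500 cards fit in one vehicle depends on packaging, which this sketch does not try to estimate.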

  6. Future Generations by Anonymous Coward · · Score: 0

    Should there be a gigantic catastrophe, none of this will be useful to the survivors. Chiseled on stone is the only way.

    1. Re:Future Generations by metalmaster · · Score: 0

      You mean like hurricane sandy?

    2. Re:Future Generations by Anonymous Coward · · Score: 0

      Hey smilin' strange...

      You're lookin happily deranged.

    3. Re:Future Generations by ixnaay · · Score: 1

      The First Council of the Druids will find a way to recover the data.

    4. Re:Future Generations by Voyager529 · · Score: 1

      The First Council of the Druids will find a way to recover the data.

      And when they do, they will be known as the Disk Druids.

  7. Indispensable reference for slashdotters by guttentag · · Score: 5, Insightful
    For instance, note the archived film "Dating: Do's and Don'ts" (1949) It begins thus:

    How do you choose a date? Whose company would you enjoy?

    Well, one thing you can consider is looks. Woody thought of Janice and how good looking she was. He'd really have to rate to date her. Yes, he'd enjoy that, except... Well, it's too bad Janice always acts so superior. She'd make a fellow feel awkward and bored.

    Well, perhaps someone who doesn't feel so superior. There's Betty. And yet, it just doesn't seem as if she'd be much fun.

    What about Anne? She knows how to have a good time, and how to make the fellow with her relax, too. Yes, that's what a boy likes.

    Yes, the Internet now provides everything you ever needed to know but were afraid to ask.

    1. Re:Indispensable reference for slashdotters by Anonymous Coward · · Score: 1

      How do you choose a date? Whose company would you enjoy? ... Well, it's too bad Janice always acts so superior. She'd make a fellow feel awkward and bored. ... What about Anne? She knows how to have a good time, and how to make the fellow with her relax, too. Yes, that's what a boy likes.

      Yes Janice, get your head out of your ass. You could take a few tips from Anne -- she's a pro.

  8. The more you know. by Anonymous Coward · · Score: 0

    10,000,000,000,000,000 is 8.88178 Petabytes. Remember a kilobyte is 1024 bytes not 1000 bytes.

    1. Re:The more you know. by Aldanga · · Score: 1, Informative

      Incorrect. A kibibyte is 1024 bytes, while a kilobyte is 1000 bytes.

      I don't usually care enough to point out the distinction, but since you did, I figured a correction was appropriate.
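The two readings in this subthread can be compared directly. The sketch below divides the headline figure by the SI petabyte (10^15 bytes) and by the IEC pebibyte (2^50 bytes); the constant names are chosen here for illustration.

```python
PETA = 10**15   # SI prefix: 1 petabyte (PB)
PEBI = 2**50    # IEC binary prefix: 1 pebibyte (PiB)

archive_bytes = 10_000_000_000_000_000  # figure from the headline

in_petabytes = archive_bytes / PETA
in_pebibytes = archive_bytes / PEBI

print(in_petabytes)            # 10.0
print(round(in_pebibytes, 5))  # 8.88178
```

The 8.88178 figure quoted above is therefore the pebibyte value, while the summary's "10 petabytes" is the SI value.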

    2. Re:The more you know. by Anonymous Coward · · Score: 0

      You were around in the 50s and 60s for marketing meetings for delay line memory when some of them decided to use powers of tens as they were not constructed from a power of 2 number of discrete elements and instead based more on communication conventions which also tended to powers of ten bits at the time?

    3. Re:The more you know. by Anonymous Coward · · Score: 0

      Bob, is that you?

    4. Re:The more you know. by 91degrees · · Score: 2

      If you are referring to storage sizes in relation to computers, be it RAM, disk sizes, etc., it is correct to express them in powers of 2.

      No it's not. It's sometimes convenient to do so, especially for RAM, but the prefixes used are defined by the SI and recognised by a large number of international organisations including the IEEE.

      Yes, marketing people find this useful. But it's also recognised as correct by many engineers. It's actually quite useful. Using a certain type of modulation, a 1 kHz signal can transfer 1 kilobit in 1 second, and this will take 8 seconds to transfer 1 kilobyte of data. Why does it make any sense to worry about binary addressing here?
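The transfer-time arithmetic in this comment can be sketched as follows. It assumes a modulation that carries exactly one bit per cycle (the comment only says "a certain type of modulation"), and the function name is hypothetical.

```python
SIGNAL_HZ = 1_000     # 1 kHz carrier
BITS_PER_CYCLE = 1    # assumption: one bit transferred per cycle
BITS_PER_BYTE = 8

def transfer_seconds(n_bytes: int) -> float:
    """Seconds needed to send n_bytes at SIGNAL_HZ * BITS_PER_CYCLE bits per second."""
    bit_rate = SIGNAL_HZ * BITS_PER_CYCLE
    return n_bytes * BITS_PER_BYTE / bit_rate

print(transfer_seconds(1000))  # 8.0   (kilobyte = 1000 bytes)
print(transfer_seconds(1024))  # 8.192 (1024-byte reading)
```

With the decimal definition the result is a round 8 seconds; with 1024 bytes it is not, which seems to be the point being made.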

    5. Re:The more you know. by Anonymous Coward · · Score: 0

      SI prefixes are standardized for a reason.

    6. Re:The more you know. by Anonymous Coward · · Score: 0

      prefixes used are defined by the SI

      For storage sizes they're defined based on powers of two, which overrides the SI definition because more specific rules always override more general ones.

    7. Re:The more you know. by 91degrees · · Score: 1

      For storage sizes they're defined based on powers of two, which overrides the SI definition because more specific rules always override more general ones.

      Which rules? Where is a kilobyte defined as 1024 bytes by any organisation with any influence?

      And why base it on powers of two? It's illogical. The only time you're forced into a power of two is in the address space available to a CPU.

  9. All of which is rather useless... by pongo000 · · Score: 4, Interesting

    ...since the TOS specifically prohibits copying data from the site:

    "Our terms of use specify that users of the Wayback Machine are not to copy data from the collection. If there are special circumstances that you think the Archive should consider, please contact info at archive dot org. "

    Warrick hasn't been taking new requests for months (and I'm sure it's more of a research tool than an actual service for the public), and the site effectively blocks attempts to back up data using wget. It makes me wonder who (or what) this archive really serves, because it's most certainly not the general public.

    1. Re:All of which is rather useless... by Anonymous Coward · · Score: 0

      It serves Alexa, part of Amazon. Anyone who wants content excluded from the archive should notify them.

    2. Re:All of which is rather useless... by Anonymous Coward · · Score: 0

      Archive.org also seems to be more than willing to comply with foreign requests to delete website content they have archived. ex:
      http://nspcanada.nfshost.com/ is another website that Archive.org will most likely remove when requested to do so by the Canadian court.
      ref: http://www.whitenewsnow.com/paul-fromms-cafe/34485-judge-ponders-sending-dissident-prison-not-shutting-down-his-website.html

    3. Re:All of which is rather useless... by happyscientist · · Score: 1

      Besides the fact that it is not open, how are you supposed to compute against it? Download the 80TB and run it on your private data center? It should be fully open and available on a platform like AWS so people can actually use it.

    4. Re:All of which is rather useless... by Xtifr · · Score: 2

      A) You can read it just like you can read normal webpages on the main web, most of which also don't allow you to copy them.
      B) The Archive is more than just the Wayback Machine. They also have what is almost certainly the world's largest digital collection of public domain and CC-licensed media files in their media collections.

    5. Re:All of which is rather useless... by Anonymous Coward · · Score: 1

      On A: reading webpages IS copying them. Any attempt at distinction, given the technical details, is INSANE.

    6. Re:All of which is rather useless... by Anonymous Coward · · Score: 0

      Exactly my thoughts and experience too.

      We've tried contacting them with queries about access to URL lists for specific countries... //sound of crickets//

  10. Great! by Anonymous Coward · · Score: 0

    99% of this is probably porn.

  11. -h by Anonymous Coward · · Score: 0

    >ls
    The Internet Archive Has Saved Over 10,000,000,000,000,000 Bytes of the Web
    >ls -h
    The Internet Archive Has Saved Over 9095 Terabytes of the Web

    1. Re:-h by maxwell demon · · Score: 1

      No, -h always takes the largest applicable unit. Thus it would report 9 Petabytes. No wait, 8 Petabytes, because it always rounds down.

      --
      The Tao of math: The numbers you can count are not the real numbers.
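The "largest applicable unit" idea from this subthread can be sketched with a small formatter. This is an illustration written for this thread, not the algorithm any particular `ls` implementation uses; the function name, the unit letters, and the `round_up` switch are assumptions, and which rounding direction a given `ls` build applies is not verified here.

```python
UNITS = ["B", "K", "M", "G", "T", "P", "E"]

def human_readable(n_bytes: int, base: int = 1024, round_up: bool = False) -> str:
    """Scale n_bytes to the largest unit whose value is still at least 1."""
    idx = 0
    divisor = 1
    # Step up one unit at a time while the next unit would still fit.
    while n_bytes >= divisor * base and idx < len(UNITS) - 1:
        divisor *= base
        idx += 1
    whole, remainder = divmod(n_bytes, divisor)
    if round_up and remainder:
        whole += 1
    return f"{whole}{UNITS[idx]}"

print(human_readable(10**16))                 # 8P
print(human_readable(10**16, round_up=True))  # 9P
print(human_readable(10**16, base=1000))      # 10P
```

Integer arithmetic is used throughout so the result does not depend on floating-point rounding.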
  12. They Should Copy All Of The Web Site by eugene ts wong · · Score: 2

    I have never understood why the few archive sites that I have been to never back up the entire web site, instead of just a few important pages and images. I can understand not accessing pages that are supposed to be secure, but all other pages should be fair game. This is most important for product knowledge. Sometimes a company takes down its site and images. It would be nice to have an archive to go to.

    1. Re:They Should Copy All Of The Web Site by Anonymous Coward · · Score: 0

      As someone who has been looking for a variety of old pages linked to from websites, I have to agree.

      I've gotten linked to archive.org looking for both Sun technical PDFs, which were totally eliminated after the Oracle purchase, as well as various old DOS-era apps linked to from old Geocities websites and such. The files in question have basically fallen off the face of the internet, but the pages linking to them haven't.

      The utility of the latter is being severely inhibited by the lack of archiving of the former.

  13. They had to count them all. by Anonymous Coward · · Score: 0

    There is a Beatles reference here somewhere.

    1. Re:They had to count them all. by fustakrakich · · Score: 1
      --
      “He’s not deformed, he’s just drunk!”
  14. looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 4, Insightful

    10,000,000,000,000,000 Bytes = 8.88 Petabytes

    1. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 3, Informative

      looks like you forgot to spell pebibytes correctly

    2. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      Pedobytes.

    3. Re:looks like you forgot to add '-h' switch by AmiMoJo · · Score: 0

      They should have made a pebibyte 1,000,000,000,000,000. Trying to redefine petabyte was stupid.

      --
      const int one = 65536; (Silvermoon, Texture.cs)
      SJW, n: "Someone I don't like, and by the way I'm a fuckwit" - AC
    4. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 5, Informative

      You have that backwards, kilo, mega, giga, tera and so forth are base ten prefixes and have been for quite a bit longer than people have been misusing them to refer to base 2 numbers. As such it made more sense to leave it consistent with everything else and make a new prefix for the binary numbers.

    5. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      As such it made more sense to leave it consistent with everything else and make a new prefix for the binary numbers.

      In the context of storage sizes they were well-established with the binary-based definitions, so changing them to be decimal-based isn't "leaving" anything.

    6. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      No. If they were to rename the decimal prefixes they would have to call it peDEbyte. Bi stands for binary after all. Incidentally, pede is French for "gay" in the sense of homosexual.

    7. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      Why would you want to use binary-based definitions for storage anyway? 1024 bytes in a KB, 5280 feet in a mile -- both are archaic. You shouldn't have to use a calculator to convert 1,000,000,000,000,000 into petabytes. Use metric! Your life will be easier.

    8. Re:looks like you forgot to add '-h' switch by swillden · · Score: 2

      As such it made more sense to leave it consistent with everything else and make a new prefix for the binary numbers.

      In the context of storage sizes they were well-established with the binary-based definitions, so changing them to be decimal-based isn't "leaving" anything.

      Not true. In the context of storage sizes they were well-established to be base-10 definitions from the dawn of the computer age up until the 1980s or so. Only in the last 30 years or so have we started using powers of two units, and then only for RAM. Up until then, RAM was measured in powers-of-10 words, and in disk-based storage base 10 was and still is the norm. Network data rates likewise are and always have been in powers-of-10 units.

      This is why it's useful to be careful to use the proper prefix. 10 petabytes is approximately 8.8 pebibytes. See? No confusion.

      --
      Note to ACs: I usually delete AC replies without reading them. If you want to talk to me, log in.
    9. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      The reason kilo, mega and friends are properly base 2 numbers when used with computers are due to the underlying hardware constraints.
      Using RAM as an example, the capacity is typically in powers of two. The number of memory locations accessible by 10 address lines is
      2 (the lines are either in a "0" or a "1" state) raised to the power of the number of address lines - id est 2 to the tenth power, or 1024 in base 10.

      Anyone who uses kilo as 1000 in a presentation has my immediate suspicion that he only knows marketing speak and has limited technical knowledge.
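The address-line arithmetic in this comment can be checked directly: each line is either 0 or 1, so n lines select one of 2^n locations. A minimal sketch, with a function name made up for illustration:

```python
def addressable_locations(address_lines: int) -> int:
    """Each address line is 0 or 1, so n lines can select 2**n distinct locations."""
    return 2 ** address_lines

for lines in (10, 16, 20):
    print(lines, addressable_locations(lines))
# 10 1024
# 16 65536
# 20 1048576
```

This is where the 1024 comes from for memory; it says nothing either way about disks or network links, which are not addressed by a fixed set of binary lines in the same sense.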

    10. Re:looks like you forgot to add '-h' switch by maxwell demon · · Score: 1

      No. If they were to rename the decimal prefixes they would have to call it peDEbyte. Bi stands for binary after all. Incidentally, pede is French for "gay" in the sense of homosexual.

      Is there another meaning of "gay"?

      --
      The Tao of math: The numbers you can count are not the real numbers.
    11. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      There's a reason too why there are 5280 feet in a mile -- that doesn't mean it's a good idea to use that multiplier. The fraction of people who have to deal with "the number of memory locations accessible by 10 address lines" is small. The benefits of using a round multiplier like 1000 is large, so people should use that by default. Technical people can still use kibibytes in a technical context if they really need to.

    12. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      No. If they were to rename the decimal prefixes they would have to call it peDEbyte. Bi stands for binary after all. Incidentally, pede is French for "gay" in the sense of homosexual.

      Is there another meaning of "gay"?

      When you hear someone does something with gay abandon, what's the picture that comes to your mind?

    13. Re:looks like you forgot to add '-h' switch by Anonymous Coward · · Score: 0

      Yes.
      "Happy".

    14. Re:looks like you forgot to add '-h' switch by mcgrew · · Score: 1

      Is there another meaning of "gay"?

      Within living memory (my own memory in fact) "Gay" didn't mean "homosexual", it meant happy and carefree. The Christmas song "Deck the Halls" isn't about transvestites ("Don we now our gay apparel").

    15. Re:looks like you forgot to add '-h' switch by sysrammer · · Score: 1

      Yeah, when I read through my old Heinlein, I get a chuckle nowadays. He tended to use gay fairly often...the old definition, of course.

      --
      His ignorance covered the whole earth like a blanket, and there was hardly a hole in it anywhere. - Mark Twain
  15. Domain parkers deleting archives by linebackn · · Score: 5, Informative

    I don't know if they have done anything about this recently, but there was a problem with domain parking sites putting up a robots.txt that instructs Archive.org to delete or suppress any archives of the site that was there previously. I have run into a few sites like that. If someone dies and their site goes with them, it isn't right for some squatter to remove their work from history.

    And I wish I could pull up historic copies of the original altavista.digital.com.

    1. Re:Domain parkers deleting archives by Anonymous Coward · · Score: 0

      That is so retarded, such a thing shouldn't be possible.
      Robots.txt shouldn't ever be retroactive without direct request from the owner. (with proof and all)

    2. Re:Domain parkers deleting archives by Anonymous Coward · · Score: 0

      Yeah, a startup I used to work for had some actual useful stuff in the Wayback Machine, but a squatter took over the domain name and all the archived material went away. It seems like vandalism to me, but I can't imagine what could be done about it.

    3. Re:Domain parkers deleting archives by Anonymous Coward · · Score: 0

      *.digital.com/* and *.dec.com/* appear to be blocked from the wayback machine at the request of Compaq or HP (some subdomains like altavista.digital.com were available a few years back but disappeared with the new version)

  16. Download Link? by mysidia · · Score: 4, Interesting

    How nice of them to do the archiving and release such a large dataset.

    Where can I download the file?

    1. Re:Download Link? by Anonymous Coward · · Score: 0

      Their site is utterly useless in this regard - there is no download link, contact attempts for information about anything go unanswered, etc, etc.

      I have no fucking idea who they're supposed to be serving.

  17. My Poor Infringed Copyright!! by TechyImmigrant · · Score: 4, Funny

    It looks like they've copied my website and are therefore infringing my copyright.

    But I won't be suing them because I don't mind, because I'm not Apple.

    --
    I should use this sig to advertise my book ISBN-13 : 978-1501515132.
    1. Re:My Poor Infringed Copyright!! by Anonymous Coward · · Score: 0

      It looks like they've copied my website and are therefore infringing my copyright.

      But I won't be suing them because I don't mind, because I'm not Apple.

      Good thing you don't mind, they are starting to look at "siteless websites" http://blog.archive.org/2012/10/22/siteless-website-possible-if-bittorrent-is-a-fileserver-without-a-server-what-about-a-website-without-a-site/ which would be even harder to complain about:
       

    2. Re:My Poor Infringed Copyright!! by archen · · Score: 1

      I'd be very interested in that. One thing I've started to wonder about is what will happen to my website after my death. Archive.org stopped archiving changes on my site in 2005 and it only did a so-so job of capturing things anyway. Ages after I'm gone, it's likely http websites may simply have gone away. I've started looking into services that will preserve my site for historical reasons, but I'd feel a lot better having it among a dedicated catalogue in a historical preservation.

  18. Clarification by Anonymous Coward · · Score: 0

    Is that 10,000,000,000,000,000 bytes created or saved?

  19. So split about what to feel about Archive.org... by Anonymous Coward · · Score: 0

    On one hand, it's a HORRIBLE violation of everyone's privacy. The more you know about how things work, the worse it gets. It's way worse than you think at first.

    On the other hand, it's amazingly nice to be able to look at some old site that you thought was forever forgotten. But they may have no idea it's still there and want it gone, etc.

  20. What the hell by nuckfuts · · Score: 5, Interesting

    are they using for backups?

    1. Re:What the hell by rubycodez · · Score: 2

      more disks, and they send a copy to euroarchive and the Library of Alexandria. in 2006, that copy & verify process to remote site took two weeks.

      http://www.enterprisestorageforum.com/technology/features/article.php/3633256/The-Wayback-Machine-From-Petabytes-to-PetaBoxes.htm

    2. Re:What the hell by Anonymous Coward · · Score: 0

      A very large cluster of 1.44 MB floppy drives. And 20,000 interns swapping them out every 10 seconds.

    3. Re:What the hell by SlashDread · · Score: 1

      Why, the internet of course.

    4. Re:What the hell by Trogre · · Score: 1

      Isn't that a bit like doing a Google search for "google" ?

      --
      "Nine times out of ten, starting a fire is not the best way to solve the problem." - my wife
  21. Re:So split about what to feel about Archive.org.. by icebraining · · Score: 1

    If you want to keep something private, maybe you shouldn't make it available to everyone on the Web?

  22. Re:So split about what to feel about Archive.org.. by Anonymous Coward · · Score: 0

    Maybe you fail to grasp basic psychology?

  23. Just fucking say Petabytes. by Arancaytar · · Score: 2

    I know the prefix invokes unpleasant connotations, but it also means 10^15.

    1. Re:Just fucking say Petabytes. by Arancaytar · · Score: 1

      (This is in reference to the headline.)

    2. Re:Just fucking say Petabytes. by tehcyder · · Score: 1

      I know the prefix invokes unpleasant connotations, but it also means 10^15.

      When I see the word "peta" I think of naked supermodels in public protesting about animals, or something. Call me superficial but I'm prepared not to worry about the animals they're insulting if I get to see more naked supermodels.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  24. ZeroZeroZeroZeroZero... by Anonymous Coward · · Score: 0

    If this was "News for Nerds" the title would read:

    The Internet Archive Has Saved Over 10^16 Bytes of the Web!

    Do you want to impress us on how looong your zeros* are?

    * Hint: I'm talking about your penis.

  25. Big Numbers hurt by Anonymous Coward · · Score: 0

    Sorry, I can't comprehend that number, could some young newspaper hack put it in terms of olympic sized swimming pools and/or UK double decker buses for me please...

  26. Hardcopy by Anonymous Coward · · Score: 0

    They should print it all off, for safekeeping.

    1. Re:Hardcopy by tehcyder · · Score: 1

      They should print it all off, for safekeeping.

      We would then be able to get a more realistic Libraries of Congress measurement.

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  27. Where's my page then? by AndyKron · · Score: 3, Informative

    With all those pages stored why does it always tell me that page can't be found?

  28. Re:f1rst by Anonymous Coward · · Score: 0

    They should have written it in bits: 80,000,000,000,000,000 !!

  29. Moar Pics! by CodeheadUK · · Score: 1

    Shame about the lack of images*, archive.org is the only remaining evidence of Cliff Bleszinski's Cat-Scan.com. The site doesn't have the same comedy value without all the scans of squished cats.

    *Yes, yes, I know that archiving images would require many extra fucktons of storage, but it would be worth it in some cases.

  30. You're Welcome. by Anonymous Coward · · Score: 0

    http://web.archive.org/web/20040202004210/http://www.cs.auckland.ac.nz/~pgut001/links.html
    http://web.archive.org/web/20040206214035/http://www.cs.auckland.ac.nz/~pgut001/links/archives.html
    http://web.archive.org/web/20060831063210/http://faculty.ncwc.edu/toconnor/reform.htm
    http://web.archive.org/web/20060831063224/http://faculty.ncwc.edu/toconnor/data.htm
    http://web.archive.org/web/20060831081811/http://faculty.ncwc.edu/toconnor/thnktank.htm
    http://web.archive.org/web/20070207050215/http://faculty.ncwc.edu/toconnor/sources.htm
    http://web.archive.org/web/20070217052232/http://faculty.ncwc.edu/TOConnor/427/427links.htm
    http://web.archive.org/web/20100528020113/http://milw0rm.com/
    http://web.archive.org/web/20040215020827/http://www.linux-mag.com/2003-09/acls_01.html
    http://web.archive.org/web/20041031074320/http://sun.soci.niu.edu/~rslade/secgloss.htm
    http://web.archive.org/web/20041125131921/http://tips.linux.com/tips/04/11/23/2022252.shtml?tid=100&tid=47&tid=35
    http://web.archive.org/web/20041231085409/http://www.cs.auckland.ac.nz/~pgut001/links.html
    http://web.archive.org/web/20050306035558/http://www.spitzner.net/linux.html
    http://web.archive.org/web/20060712182215/http://linuxgazette.net/128/saha.html
    http://web.archive.org/web/20090109020415/http://www.securityfocus.com/print/infocus/1414
    http://web.archive.org/web/20100529035423/http://www.cert.org/current/services_ports.html
    http://web.archive.org/web/20070717124745/http://www.tldp.org/linuxfocus/English/Archives/lf-2003_01-0278.pdf
    http://web.archive.org/web/20060712151452/http://jbd.zayda.net/enscribe/
    http://web.archive.org/web/20040608141549/http://all.net/journal/netsec/1997-12.html
    http://web.archive.org/web/20060220113124/http://www.dss.mil/training/salinks.htm
    http://web.archive.org/web/20080222191230/http://the.jhu.edu/upe/2004/03/23/about-van-eck-phreaking/

  31. Re:f1rst by Anonymous Coward · · Score: 0

    What's a Wetback Machine?

  32. Private archive by fa2k · · Score: 3, Interesting

    It's great that archive.org is doing this. It's such an important part of history that I thought I would do a mini-version for the pages I visit, just to be able to refer back to stuff. I've been using the Firefox addon called Shelve to save all pages I visit on my home computer for about 2 months now (at most one version for each day). It's a total of 5.8 GB. It's not useful for browsing though; I'd love it if it were better integrated with Firefox, so that I could choose among all versions of each page. There's sometimes excellent information on university pages or cheap hosting that could be 10 years old, and you never really know how long it's going to stay up.

    Anyway, this may give some perspective too; 2 months of daily snapshots of slashdot, other news, some tech stuff and a little Facebook takes just 5.8 GB.

    1. Re:Private archive by fa2k · · Score: 1

      It's a total of 5.8 GB.

      Seems I forgot the most important part: It's a total of over 6,000,000,000 bytes!!1

  33. What file system are they using by QuietLagoon · · Score: 1

    What OS and file system are they using to store all that data?

    1. Re:What file system are they using by maxwell+demon · · Score: 1

      Given how incomplete the stored sites are, I guess most of the data is stored on /dev/null.

      --
      The Tao of math: The numbers you can count are not the real numbers.
    2. Re:What file system are they using by Anonymous Coward · · Score: 0

      Or worse, someone else comes along, 'buys' the .com name, and puts a rule in robots.txt saying they do not want the site archived. Poof, all the old history is gone...
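
For anyone wondering what such a rule looks like, here is a rough sketch using Python's standard `urllib.robotparser`. The `ia_archiver` user-agent name, the example URLs, and the `is_archivable` helper are assumptions made for illustration; this only shows how a blanket `Disallow` reads to a crawler that honors it, not how archive.org itself processes exclusions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a new domain owner might publish.
# "ia_archiver" is assumed to be the agent name the archive crawler answers to.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_archivable(robots_txt, url, agent="ia_archiver"):
    """Return True if robots_txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The blanket Disallow blocks every path for the named agent only.
    print(is_archivable(ROBOTS_TXT, "http://example.com/old/page.html"))
    print(is_archivable(ROBOTS_TXT, "http://example.com/", agent="othercrawler"))
```

Two lines from whoever holds the domain today are enough to exclude the whole site for that one agent, while other crawlers are unaffected.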

  34. Re:1st speedup & security guide for Windows by Anonymous Coward · · Score: 0

    Trolls *trying* to bury this http://tech.slashdot.org/comments.pl?sid=3213635&cid=41795713 fail again.

  35. Re:LMAO @ trolls & their "effete" moddowns... by Anonymous Coward · · Score: 0

    Trolls *trying* to "bury" this fail yet again (keep blowing your mod points, you'll run out soon) http://tech.slashdot.org/comments.pl?sid=3213635&cid=41795713

  36. "10,000,000,000,000,000 Bytes" by Anonymous Coward · · Score: 0

    Imagine how many bits they have saved!

  37. Insignificant by Anonymous Coward · · Score: 0

    10 Petabytes of information is insignificant. My corporate network has that much data, and backs up several hundred Terabytes nightly.

    1. Re:Insignificant by tehcyder · · Score: 1

      10 Petabytes of information is insignificant. My corporate network has that much data, and backs up several hundred Terabytes nightly.

      data!=information

      --
      To have a right to do a thing is not at all the same as to be right in doing it
  38. Better ebooks please by Anonymous Coward · · Score: 0

    It's great of Archive.org to do this, but I wish they'd pay more attention to the quality of some of their material. The ePub and Kindle conversions of some of their ebooks are truly abysmal. It's a double shame that many of those are available nowhere else.

  39. How many bytes by Anonymous Coward · · Score: 0

    11258999068426240 bytes. This is 10PB
    Bitch about how hard it is to context switch the normal meaning of the metric prefixes when you see the bytes keyword next to it, indicating a number denoted in the base 2 number system.
    Score -3: Nonconformist.
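
For what it's worth, the numbers thrown around in this thread can be checked with a few lines of Python. This is only a sketch: the constant names and the `describe` helper are made up for illustration, with "PB" read as the decimal prefix (10^15) and "PiB" as the binary one (2^50).

```python
PETA = 10**15   # decimal petabyte (SI prefix)
PEBI = 2**50    # binary pebibyte

HEADLINE_BYTES = 10**16  # the figure from the headline


def describe(n_bytes):
    """Express a byte count in decimal PB, binary PiB, and bits."""
    return {
        "PB": n_bytes / PETA,
        "PiB": n_bytes / PEBI,
        "bits": n_bytes * 8,
    }


if __name__ == "__main__":
    print(describe(HEADLINE_BYTES))  # 10 PB, a bit under 9 PiB
    print(10 * PEBI)                 # byte count of 10 binary "petabytes"
```

Under the decimal reading the headline is exactly 10 PB; under the binary reading, 10 units come to 11258999068426240 bytes, which is the figure quoted above.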