Slashdot Mirror


Info Glut - Five Exabytes of Data Created in 2002

securitas writes "If you had any doubts that you are overwhelmed by the volume of information in your life, a new Berkeley study (PDF) shows that five exabytes of data were created in 2002, twice the 1999 total. That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet. Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future. The study was conducted by University of California-Berkeley's School of Information Management and Systems professors Peter Lyman and Hal Varian. More at CNet, Infoworld, ByteAndSwitch and The Register."
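
The unit conversions in the summary can be checked with a short Python sketch. It assumes decimal units (1 exabyte = 10^18 bytes), which the summary does not state, and the variable names are made up for illustration.

```python
# Rough check of the summary's figures, assuming decimal (SI) units.
EXABYTE = 10**18
TERABYTE = 10**12
MEGABYTE = 10**6

total_bytes = 5 * EXABYTE          # "five exabytes of data"
population = 6_300_000_000         # "6.3 billion people"
libraries_of_congress = 500_000    # "500,000 Libraries of Congress"

terabytes = total_bytes // TERABYTE
tb_per_library = terabytes / libraries_of_congress
mb_per_person = total_bytes / population / MEGABYTE

if __name__ == "__main__":
    print(f"{terabytes:,} TB in total")
    print(f"{tb_per_library:.0f} TB implied per Library of Congress")
    print(f"{mb_per_person:.0f} MB per person")
```

Under these assumptions the per-person figure lands a little under 800 MB, and the Library of Congress comparison implies about 10 TB per library.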

284 comments

  1. And about 1% was worthwhile by XNuke · · Score: 4, Insightful

    It looks like they are counting every tiny email about "going to lunch". Lots of DATA, little INFORMATION.

    1. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0, Funny

      And if you don't count all the troll and flamebait comments on Slashdot, we're down to 1 exabyte!

    2. Re:And about 1% was worthwhile by bcolflesh · · Score: 0, Funny

      There has to be a special section in that report on Nigerian email!

    3. Re:And about 1% was worthwhile by uberdave · · Score: 3, Interesting

      I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?

      For that matter, how much of the data is real, and how much is virtual? If two sites point to the same download, is that data counted twice, or once?

    4. Re:And about 1% was worthwhile by Jason1729 · · Score: 3, Interesting

      That's a good point. How much of that was spam?

      ProfQuotes

    5. Re:And about 1% was worthwhile by tachin · · Score: 4, Insightful
      Lots of DATA, little INFORMATION.
      From data you can extract "information", take a lot of those "going to lunch" mails and you can see what groups of people lunch together and at what time....
    6. Re:And about 1% was worthwhile by Carnildo · · Score: 1

      In the article, they said they did their best to filter out duplicate data.

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    7. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      Sir Haxalot is really in a class all its own, and so should not be included in that estimate...

    8. Re:And about 1% was worthwhile by Pieroxy · · Score: 1

      Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future

      I don't really think historians and archaeologists are ever going to be able to dig through Five Exabytes of Data. Maybe the magnetic storage is a blessing then...

    9. Re:And about 1% was worthwhile by Tenebrious1 · · Score: 4, Informative

      I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?

      The blurb said 92% was stored on magnetic media; curious about the rest, I glanced through the article. Surprisingly, a large part, 7%, is FILM! The reason film makes up such a large percentage is that each film reel is duplicated thousands of times to be sent to theaters around the world.

      So if they're counting duplicates in film, I'd guess they'd count duplicates in magnetic media.

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
    10. Re:And about 1% was worthwhile by iturbide · · Score: 1

      Duplicate data is a _good thing_. It saves your ass when the unthinkable happens, anything between the dog eating your CD-R and a plane hitting.. Oh well, you get the idea.
      Trust me, the nicest thing about stored data is its own copy safely guarded somewhere else, at least 10 km away, and so on.

    11. Re:And about 1% was worthwhile by MosesJones · · Score: 2, Insightful

      It's actually all my fault...

      I left this script running on the unix farm which did the following on each box

      while true; do
        rm -f filename
        echo "Whose the Daddy" > filename
      done

      It's a big farm, and it's been running all year. The net result is about 100k of files on the farm total... but terabytes during the year.

      In other words, what I mean is...

      How much of this "created" information was transient?

      --
      An Eye for an Eye will make the whole world blind - Gandhi
    12. Re:And about 1% was worthwhile by CrimeDoggy · · Score: 1

      With the amount of spam, cracked software, pr0n, illegally copied material, etc. out there, I'd be shocked if even 1% is information worth saving. The question I have is whose responsibility is it to save it? Do we just hope that 100 years down the road the folks trying to sift through it all can separate the wheat from the chaff? Is there a market now for information archivists who can benefit and profit from the preservation of data?

    13. Re:And about 1% was worthwhile by 4of12 · · Score: 1

      historians and archaeologists are ever going to be able to dig

      A. They'll use machines to do the heavy digging.

      B. Or, the historians and archaeologists will be machines.

      A big problem will be that those 5 EB of data describing 5 years near Y2K will be dissolved in a much larger ocean of data by that time.

      --
      "Provided by the management for your protection."
    14. Re:And about 1% was worthwhile by kfg · · Score: 5, Funny

      "I wonder how much of that was duplicate data."

      3% was [AOL] Me Too! [/AOL] posts.

      1% was In Soviet Russia jokes.

      0.5% Profit!!!

      So I guess there was a fair amount of duplication.

      KFG

    15. Re:And about 1% was worthwhile by dolo666 · · Score: 1

      "I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?"

      Not to mention all the websites online that only have keywords aimed to hack google, and nothing else, but maybe links to OTHER void pages by the same author/group/company!!

    16. Re:And about 1% was worthwhile by palutke · · Score: 1

      Don't forget MILLIONS of AOL CDs sent to mailboxes worldwide. That's a lot of ones and zeros.

      --
      'I ain't a liar, baby, and I ain't proud I just want what I'm not allowed.' -- Violent Femmes, 36-24-36
    17. Re:And about 1% was worthwhile by SiliconBateman · · Score: 1

      I wonder how they count film too. As film is not a digital medium, did they MPEG2 it a la DVD, or take the raw footage from the cameras (as long as it wasn't a direct analog to analog transfer)... and what about photographs - did they count them and if so at the molecular level of the photo paper?

      Unfortunately the site is /.ed so I may never know!

      --
      -- Alchohol is a hard drug. Cannabis is a soft drug.
    18. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      Just 1%? I was going to read the article to see what percentage of it was porn, but I'm glad I read the comments first. I'm going to have to get a faster internet connection.

    19. Re:And about 1% was worthwhile by Illbay · · Score: 1

      How much of it was redundant?

      --
      Any technology distinguishable from magic is insufficiently advanced.
    20. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      Some might think that the pr0n is worth saving.

    21. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      There is a future for you in the Office of Homeland Security. We like the way you think.

    22. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      I can't wait until we get to a yottabyte of data.

    23. Re:And about 1% was worthwhile by Shark · · Score: 1

      ... and 25% of it was porn, what does that leave future generations?

      --
      Mind the frickin' laser...
    24. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      I wonder how much of that was duplicate data.

      Five Exabytes..

      I wonder how much of that data was duplicate slashdot stories.

    25. Re:And about 1% was worthwhile by mabinogi · · Score: 1

      if you're talking about data and film, then microfilm and microfiche is probably what is meant.

      It's kind of bizarre the way that digital media is replacing these two technologies for archival purposes...since no one knows if they'll still be able to read digital media in even 20 years, but with microfiche, as long as you can magnify light, you can get the data.

      However, who am I to complain? people pay us to make microfiche...people pay us to write CDs....some people even pay us to scan the microfiche we produced for them years ago, and put it on CDs.

      --
      Advanced users are users too!
    26. Re:And about 1% was worthwhile by paroneayea · · Score: 1

      Jesus, we're counting film, and film duplicates, and god knows how many billion blogs shooting around the internet?
      Christ, this doesn't say anything about more "info" or "data" existing in our world at all! Look, every day around you there are people talking, there are smells, there are notes passed from kid to kid. Data has existed in our world forever. It isn't that our world has taken on more data than ever before, because it hasn't. If anything, there is better worldwide connectivity to said data. But more of it? Look, I bet in my own single body there's more data than exists on the internet. We've got the graphical coordinates of all my molecules, exactly what type they are taking upon, where they are going... the amount of data here is practically infinite. Once again, I restate my point: there is not a ridiculously overwhelming amount of information or data in the world anymore. It's data in a particular form.
      I realize that this is probably not what the authors meant, but people need to think about this. Data exists outside the electronic form!

      --
      http://mediagoblin.org/
    27. Re:And about 1% was worthwhile by Tenebrious1 · · Score: 1

      if you're talking about data and film, then microfilm and microfiche is probably what is meant.

      That's what I thought when I saw the "film"... which is why I read further, didn't seem possible that so much had been archived to microfilm/fiche in a single year. But no, they were indeed talking about the movie industry and all the movies they distribute.

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
    28. Re:And about 1% was worthwhile by Tenebrious1 · · Score: 2, Funny

      I wonder how they count film too. As film is not a digital medium, did they MPEG2 it a la DVD, or take the raw footage from the cameras (as long as it wasn't a direct analog to analog transfer)... and what about photographs - did they count them and if so at the molecular level of the photo paper?

      I only glanced through the numbers, but couldn't find any place that said "for our purposes pictures are considered HxV resolution". For film (studio movies), they did say each frame was considered a picture and that sound contained a lot of data, but well again, I don't know how they sampled it.

      Maybe they just used "a picture is worth 1000 words". Hmm... no, at 5 characters an average word, that's only 5K per picture, way too low.
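
      The back-of-the-envelope figure above is easy to reproduce. This is a minimal sketch that assumes one byte per character, which the comment does not state; the names are illustrative only.

```python
# "A picture is worth 1000 words", priced in bytes.
WORDS_PER_PICTURE = 1000   # from the proverb
CHARS_PER_WORD = 5         # "5 characters an average word"
BYTES_PER_CHAR = 1         # assumption: plain single-byte text


def picture_bytes() -> int:
    """Size of one 'picture' if it were stored as 1000 average words."""
    return WORDS_PER_PICTURE * CHARS_PER_WORD * BYTES_PER_CHAR


if __name__ == "__main__":
    print(f"{picture_bytes() / 1000:.0f}K per picture")
```

      Counting the spaces between words would raise the figure slightly, but not enough to change the conclusion that it is far too low for an image.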

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
    29. Re:And about 1% was worthwhile by mabinogi · · Score: 1

      The other thing I've never understood is the whole "Information Overload" thing.

      all this "data" floating around in the world doesn't go jumping up in your face yelling "I'm data, assimilate me!" you have to look for it.

      so if you're getting overwhelmed by information, it's because you keep looking for the bloody stuff, and probably with badly designed queries. People would save themselves some pain if they figured out how to make better use of tools available so that they only had to deal with relevant data.

      --
      Advanced users are users too!
    30. Re:And about 1% was worthwhile by Valdrax · · Score: 1

      0.5% Profit!!!

      Now we know why the "New Economy" failed.

      --
      If it's for-profit but free, you're not the customer -- you're the product (e.g., the Slashdot Beta's "audience").
    31. Re:And about 1% was worthwhile by DJ+Spencer · · Score: 1
      Well, that, I think, just gets too philosophical. I mean, that's like taking the scientific approach that there are the same number of atoms in the universe, it's just a matter of how dense you pack them into one space.

      With that in mind, my question to you is this:

      Has the data always been there, but just represented in a different way? Or does the data actually change as the molecular structure changes?

      Don't get me wrong - I agree that it doesn't have to be digital to be data, but the focus of this story seemed to lean towards collective media, not data.

      Another reply here stated that you could count a blank 700MB CDR as data. I suppose, but to me that's just a media form waiting for data to be written to it - unless you consider it to be a statistic of 1 more blank CD in the pool of blank media.

      Are we confused yet?

    32. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      Hey, and what if a site has a meta tag for refresh to itself - that page would spin in an infinite loop and create infinite information!

      And hey - I hope they didn't count /. - I mean, talk about useless information!

    33. Re:And about 1% was worthwhile by Pathetic+Coward · · Score: 1

      One hundred years from now, PhD theses will be written about the early 21st century obsession with penis size.

    34. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      BOOO-RIIING

    35. Re:And about 1% was worthwhile by Fembot · · Score: 1

      Sadly "BSD is Dying" was nowhere to be seen in that list, so who knows.... perhaps "BSD is Dying" is in itself dying too :-)

    36. Re:And about 1% was worthwhile by Anonymous Coward · · Score: 0

      Exactly. Ever clean out your basement? Did you ever find yourself wishing you hadn't gotten rid of that green & yellow shoulder warmer? I think not.

      Hmm... Are the historians or packrats worried about this?

  2. Sounds about right. by Matey-O · · Score: 4, Insightful

    That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 Gb*. Move 100 of your CD's to MP3s and you're consuming/creating roughly 3.5 Gb* (or more if you're using higher than 128kb MP3's). And I'm not even commenting on pr0n.

    (*I said roughly...comment on the comment, not the mathematical precision of the statement.)

    --
    "Draco dormiens nunquam titillandus."
    1. Re:Sounds about right. by mobby_6kl · · Score: 0

      >Move 100 of your CD's to MP3s

      And if you move 100 of your MP3 CDs to CD-Audio, that would be a different number!

    2. Re:Sounds about right. by zwoelfk · · Score: 1

      Actually I think it probably undershoots the mark...

      By the article: The researchers relied on existing data such as ISBN numbers to count books and journals, as well as industry reports about data handled by enterprise servers for things such as supermarket sales and airline bookings. They performed surveys to estimate how much unique information exists on each type of hard drive.

      I don't think they attempted to collect information on more ephemeral data... For example, artists that go through many versions of a 3d model or movie or textures (each of which is data that is "created" but not all is stored), or hell, in core files alone the numbers must be staggering!

      On the other hand, this would probably be lowered quite a bit if they were looking for unique information. A lot of the data farms are compilations of other data. But it would be a major undertaking just to define what "unique" meant. I wonder how big the "file" would be if you compressed all the world's data with various compression algorithms.

      Ah... Interesting anyway!

    3. Re:Sounds about right. by Anonymous Coward · · Score: 0

      5Exa of pr0n...

      Someone needs a wipe...

    4. Re:Sounds about right. by Anonymous Coward · · Score: 0
      And I'm not even commenting on pr0n.

      I will. I just finished leeching 20 gigs of porn from a website last weekend. If I keep accumulating this shit at this rate where will I put it?

    5. Re:Sounds about right. by LittleBigLui · · Score: 1
      Move 100 of your CD's to MP3s and you're consuming/creating roughly 3.5 Gb


      look in the mirror and you're creating roughly 100 kg of matter.
      --
      Free as in mason.
    6. Re:Sounds about right. by Anonymous Coward · · Score: 0

      Now factor in the amount of emails to increase the size of your tadgy, make it stay up longer, and valium to get over the depression of being scammed.

      Yep, their numbers seem to accurately reflect my email inbox this week :-)'s

    7. Re:Sounds about right. by SiliconBateman · · Score: 1

      You could open it to Kazaa etc, delete it when a few have uploaded it but have a backup stored on the net for a pretty long time... I encrypted some data 3 years ago, stuck it in an .AVI but kept the hash of that .AVI, searched for it, and it is still there!

      Not that I would recommend this as a particularly reliable backup method. But I think it is one that could conceal data in a nice opaque way if ever searched/raided etc.

      --
      -- Alchohol is a hard drug. Cannabis is a soft drug.
    8. Re:Sounds about right. by Anonymous Coward · · Score: 0

      Buy a DVD burner...

    9. Re:Sounds about right. by Anonymous Coward · · Score: 0

      Well, if you are putting porn online have the common courtesy not to encrypt it - it's not like all of us are going to get raided :)

    10. Re:Sounds about right. by SiliconBateman · · Score: 1

      'Tis not the porn that is encrypted; what seems like 'error' noise/fuzz in the video is in fact a somewhat randomly placed and strongly encoded set of data. A modern version of the micro-text full stop.

      All you get is a slightly degraded video... but lossy encryption makes up the vast portion of the fuzz you get.

      --
      -- Alchohol is a hard drug. Cannabis is a soft drug.
    11. Re:Sounds about right. by Dun+Malg · · Score: 1
      On the other hand, this would probably be lowered quite a bit if they were looking for unique information. A lot of the data farms are compilations of other data. But it would be a major undertaking just to define what "unique" meant.

      Yeah, that opens up a really messy can of worms. I guess it all depends on what you define as a discrete unit of information. Taking it to the extreme, one could say there are only two bits worth of unique digital data: 1 and 0 -- they're just combined in various orders in variable length sets...

      --
      If a job's not worth doing, it's not worth doing right.
  3. Yeah... by the_mad_poster · · Score: 4, Funny

    ...and most of it is still sitting in my Inbox at work right now.

    --
    Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
    1. Re:Yeah... by GMontag · · Score: 1

      Of that, how many PFUs of spam does it contain?

    2. Re:Yeah... by Anonymous Coward · · Score: 0

      If you can't bother to spell States right, you're not exactly qualified to comment on world affairs that are of much greater importance than either you or I. Next time you're comfortably sitting in your house watching your tv on your couch, ask yourself if the Iraqis were better off living under a mass-murderer where they lived in fear of being carried off to torture over their political views, as opposed to now where the have the freedom to speak their minds.

    3. Re:Yeah... by Anonymous Coward · · Score: 0

      ...and most of it is still sitting in my Inbox at work right now.

      Why, did Cowboy Neal accidently mail you his grocery list?

    4. Re:Yeah... by darkweasel · · Score: 0

      It's not a question of whether they are better off now than before. It's that we were giving money and weapons to Saddam then, and we didn't really care that he was killing people. Now we need an easy demon, and the fact that he was killing people suddenly matters.

      --
      .sig.
    5. Re:Yeah... by the_mad_poster · · Score: 1

      Because we all know that typing skills equate directly to a person's capability to understand complex world affairs! It doesn't matter how much you've looked into and thought about the matter, if you happen to have large hands (and we all know what that means, don't we small hands?) and can't type particularly well - YOU'RE AN IDIOT!

      Of course, you don't even know why I have that in my signature. As a result, your response is ridiculously off base from the purpose of the sig or my feelings on the subject of the war in Iraq. Therefore, you must not even be competent enough to comment on Slashdot posters' sigs, much less world affairs.

      Some people post AC because they actually have to. Some to troll. You do it because you're an idiot, apparently.

      And, of course, I'm not going to fix it now just to be a pest.

      Oh, by the way. You're not qualified to comment on the situation because you typoed:

      ...now where the have the freedom...

      The have the freedom? What the hell does that mean? Did someone beat you with the idjit stick as a child?

      --
      Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
    6. Re:Yeah... by SiliconBateman · · Score: 1

      I dislike your sig.

      "Memorize the books before burning" and what do you get? Certainly you open the possibility to differences when it comes to writing it down again. This is witnessed in the Bible (numerous translations) and many other religious texts. Then what do you get? You get pizza donated to those who are fighting people who have a particular interpretation of how this book should have been written while causing pain, death and disruption to the majority of people (both the ordinary Palestinian and the Israeli) in a very valuable part of the ancient world.

      --
      -- Alchohol is a hard drug. Cannabis is a soft drug.
    7. Re:Yeah... by Anonymous Coward · · Score: 0

      Apparently you are not very well read.

    8. Re:Yeah... by SiliconBateman · · Score: 1

      Someone with a sig about the Middle East troubles (donate pizza) then saying it is OK to burn books. Much of the trouble in the Middle East is caused by different interpretations of the same theme from the same period in history (books have been burnt in Islamic history and more recently flags are burnt in the Middle East troubles)

      I have read Fahrenheit 451, I have seen Equilibrium etc etc. That is a different topic. I am disappointed someone doesn't have the insight to see the hypocrisy in the 'donate pizza' campaign and blatantly promoting something which promotes misunderstanding.

      Popular media is not what everything is about, you know?

      --
      -- Alchohol is a hard drug. Cannabis is a soft drug.
    9. Re:Yeah... by WNight · · Score: 1

      That's the issue of Bush being a liar, I'd say it's pretty obvious. And yes, it has nothing to do with the Iraq war.

      However, for the people of Iraq, the war will probably be a good thing in the long run. They didn't have much freedom (look at things like Saddam's 100% mandate from the people, that's obviously not real) and the mass graves. Hell, expatriate Iraqis (who, you must assume, know the truth) urged the USA to remove Saddam back in '91, and he hasn't gotten better.

      Similarly, the people of Afghanistan are free of the Taliban and while there are other brutal warlords looking to step in, at least the world is watching and peacekeepers are trying to stop them. Afghani women were being brutally oppressed and now they have a hope of freedom. The religions over there are even more freakishly stupid than the religions over here and the people in charge are just as quick to impose their religion on everyone possible.

      Hopefully we keep a first-world presence there long enough to help the people, instead of simply letting another set of creeps move in.

  4. Ugh. by DrEldarion · · Score: 1

    With all the time I spend at work, it seems like I've created about half of that.

  5. Obligatory pr0n comment by avkillick · · Score: 0

    That's an awful lot of pr0n!

    --
    OpenOffice tips:richhillsoftware.com
  6. Damn by Judg3 · · Score: 1, Funny

    That's a lot of porn. Though I think their stats are off a bit, as I have 800gb of porn, not mb. Oh well, better luck next year!

    --
    Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
    1. Re:Damn by Carnildo · · Score: 3, Funny

      You've got a thousand times your allotment of porn! Think of all the poor people in Africa who you are depriving of their annual allowance!

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    2. Re:Damn by Short+Circuit · · Score: 1

      You generated 800 GB of porn?

      That's just scary...

    3. Re:Damn by pmz · · Score: 1

      Think of all the poor people in Africa who you are depriving of their annual allowance!

      Well, if they aren't wearing any clothing to begin with, they already have all the soft-core they could hope for.

    4. Re:Damn by Judg3 · · Score: 1

      Well, think about it. Unemployed geek + broadband + DV Camera + Insomnia = 800 gb of fat geek love!

      ermmm, maybe it's best to forget I wrote that.

      --
      Looking for hardware (Currently need: Large Etch-a-Sketch) Have one? See my journal!
  7. The report... by Lidless+Eye · · Score: 0, Funny

    IS IN PDF! Now we know who to blame...

  8. Dissertation by BWJones · · Score: 2, Funny

    a new Berkeley study (PDF) shows that five exabytes of data were created in 2002,

    Shoot, it felt like my doctoral dissertation was responsible for at least 2 of those 5 exabytes. :-)

    --
    Visit Jonesblog and say hello.
    1. Re:Dissertation by Anonymous Coward · · Score: 0

      Ewuuu, look at me! I wrote a doctoral dissermatation! -- Onion Counterpoint Guy

  9. This artcical says 23 exabytes by SirJaxalot · · Score: 3, Informative
    1. Re:This artcical says 23 exabytes by Vaevictis666 · · Score: 3, Informative
      Your article states:

      They found that new information flowing across televisions, radios, telephones, Web sites and the Internet had increased by 3 1/2 times to a total of 18 exabytes as of 2002. The amount of new but stored (non-transmitted) information in 2002 was determined to be about five exabytes.

      This jibes with the other articles. 5 exabytes generated content, 18 exabytes transferred content - still one heck of a lot of bits floating around :)

    2. Re:This artcical says 23 exabytes by mopslik · · Score: 1

      This artcical says 23 exabytes.

      Indeed, the numbers are correct, but they represent two different things.

      The researchers concluded that the amount of new information produced last year was about 23 exabytes... The amount of new but stored (non-transmitted) information in 2002 was determined to be about five exabytes.

    3. Re:This artcical says 23 exabytes by Anonymous Coward · · Score: 0

      This just shows how wasteful our society has become. We have to get the most data or the biggest SUV without any thought to the impact on our planet. I hope we all realize that we are only shooting ourselves in the foot.

    4. Re:This artcical says 23 exabytes by Anonymous Coward · · Score: 0

      posting ban is 72 hours. moderation won't help.

    5. Re:This artcical says 23 exabytes by lanswitch · · Score: 1
      If you piss someone off who has a lot of modpoints you just have to bear the consequences. Sit it out, cool down. That's what I did, and currently i'm posting at +1 again.

      I cite myself: "Think".

      Why do you think you provoke such reactions?

      I cite myself: "Think".

      FYI: i did not mod you down.

  10. Woah ... by rosewood · · Score: 1

    Subject says it all

  11. Well... by Pingular · · Score: 1

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.
    In 70, 60, maybe even 50 years it might be difficult accessing today's hard disks with the future's technology. And of course (as always) it brings about the problem of how long the data lasts before it's corrupted.

    --

    When anger rises, think of the consequences.
    Confucius (551 BC - 479 BC)
    1. Re:Well... by Excen · · Score: 1

      Yeah, but how much of that information stored on hard disks is not pertinent to archaeologists, i.e. pr0n and "Wanna Big Dick? Click Here!" ads? That is the true question. I can think of only a minor percentage. CERN's supercollider only would produce about .000001% of the information produced last year.

      --
      "No beer until you finish your tequila!" -Leela's Dad
    2. Re:Well... by Prof.Phreak · · Score: 1

      In 70, 60, maybe even 50 years it might be difficult accessing today's hard disks with the future's technology. And of course (as always) it brings about the problem of how long the data lasts before it's corrupted.

      I thus propose to transcribe all that data to clay tablets...

      --

      "If anything can go wrong, it will." - Murphy

    3. Re:Well... by Pxtl · · Score: 2, Interesting

      Amen - I'm surprised the government or companies have not encouraged the development of some sort of long-term storage system for archival purposes. What happens when you crack open that 5-year old archive of the source to see what a long-forgotten client is running, and find out the CD has skipped a few bits? Or old government documents?

      Maybe more research could be done into a marketable multi-century (millennial?) storage.

      For corporate purposes, several decades of fidelity, perhaps a century or two, would be fine - but government will need better than that.

      Can anyone think of good media to store digital data that would last a few thousand years? Optical or otherwise, everything decays, but what goes slowest? Engraved graphite maybe? Etched titanium disks?

    4. Re:Well... by Luveno · · Score: 1
      Yeah, you'd think with the data storage people saying "it decays too quickly" and the environmentalists saying "it doesn't decay quickly enough", someone, somewhere would find a solution.

      Ok, so I'm not funny.

  12. Nothing surprising.. by attackiko · · Score: 0

    .. just slashdotters copying porn

  13. No problem here. by FrankoBoy · · Score: 2, Funny

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    Well, why won't they just print it ? Sheesh...

    1. Re:No problem here. by GaelenBurns · · Score: 5, Interesting

      I wonder how many pages of paper an exabyte of data would take up? We're talking about gigantic masses, here. Why not figure it out? I'm guessing, based on character counts from Open Office, that you can get about 2kB of data on a single sheet. That's 4kB if you use both sides. And you get around 125 sheets per pound... So, based on some guesses, it looks like it will take 2,251,799,813,685 pounds of paper to print one exabyte of this data. For all 5 exabytes, we're looking at a weight 122 times that of the Great Pyramid. Not as much as I'd suspected... but still fun!
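
      The per-exabyte paper estimate above can be reproduced in a few lines of Python. This is a minimal sketch under the comment's own guesses (2 kB per side, two sides per sheet, 125 sheets per pound), and it assumes a binary exabyte of 2^60 bytes, which is the convention that makes the quoted figure come out. The constant and function names are made up for illustration. The Great Pyramid comparison is left out because the comment does not say what pyramid weight it used.

```python
# Guesses taken from the comment; names are illustrative only.
BYTES_PER_SIDE = 2 * 1024      # "about 2kB of data on a single sheet"
SIDES_PER_SHEET = 2            # "4kB if you use both sides"
SHEETS_PER_POUND = 125         # "around 125 sheets per pound"
EXABYTE = 2**60                # assumption: binary exabyte


def pounds_of_paper(num_bytes: int) -> int:
    """Whole pounds of paper needed to print num_bytes as text."""
    bytes_per_pound = BYTES_PER_SIDE * SIDES_PER_SHEET * SHEETS_PER_POUND
    return num_bytes // bytes_per_pound


if __name__ == "__main__":
    print(f"1 EB: {pounds_of_paper(EXABYTE):,} pounds")
    print(f"5 EB: {pounds_of_paper(5 * EXABYTE):,} pounds")
```

      With a decimal exabyte (10^18 bytes) the result would be somewhat smaller, so the exact digits depend on the unit convention.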

    2. Re:No problem here. by Smidge204 · · Score: 1

      You can probably compress that using smaller font sizes and narrower page margins!

      =Smidge=

    3. Re:No problem here. by indianajones428 · · Score: 5, Funny


      So 122 Great Pyramids = 500,000 Libraries of Congress?

      Great, another conversion factor to remember...

      --
      When a thing has been said, and said well, have no scruple. Take it and copy it. --Anatole France
    4. Re:No problem here. by Anonymous Coward · · Score: 0

      Consider this though, you can print a several megabyte picture on one page, taking up less paper and being in a more useful form than the digital code for the picture. Movies could be represented as a series of pictures (24/second), so I think in all you could use a great deal less paper for all this data than by just assuming that it's all text.

    5. Re:No problem here. by Anonymous Coward · · Score: 0

      But how much is that in elephants? That's all I want to know.

    6. Re:No problem here. by DeadSea · · Score: 1
      On the other hand, if you are printing images with a 1200 dpi laser printer, you can print 16 MB/side:

      (1200*1200)bits/in^2 * (11in*8.5in/side) * byte/8 bits * KB/1024 bytes * MB/1024 KB = 16 MB/side

      If we were printing on both sides of 20lb paper we get about 800 MB/pound of paper:

      16MB/side * 2 sides/page * 500pages/20pounds = 800 MB/pound

      Now that comes to 6 billion pounds, or only one per person.
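
      The same unit chain can be written out as a quick check. This is only a sketch of the comment's own assumptions: one bit per printed dot, US letter paper, a 500-sheet ream weighing 20 pounds, and a binary exabyte for the five-exabyte total.

```python
# Image-printing estimate, following the figures in the comment above.
DPI = 1200
PAGE_AREA_IN2 = 11 * 8.5                      # US letter, one side, in square inches
BITS_PER_SIDE = DPI * DPI * PAGE_AREA_IN2     # assumes one bit per printed dot

# bits -> bytes -> KB -> MB
mb_per_side = BITS_PER_SIDE / 8 / 1024 / 1024

# The comment rounds to 16 MB/side and assumes 500 sheets weigh 20 pounds.
mb_per_pound = round(mb_per_side) * 2 * 500 / 20

# Five binary exabytes expressed in pounds of double-sided printed paper.
pounds_for_5_eb = 5 * 2**60 / (mb_per_pound * 2**20)

print(f"{mb_per_side:.2f} MB per side")
print(f"{mb_per_pound:.0f} MB per pound")
print(f"{pounds_for_5_eb:,.0f} pounds for five exabytes")
```

      Under these assumptions the total lands a little under seven billion pounds, in line with the rough "one pound per person" conclusion.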

    7. Re:No problem here. by indianajones428 · · Score: 1


      African or European?
      Eh, what's that? (whisperwhisperwhisper)
      African or Asian? When'd they kill off all the elephants in Europe? (whisperwhisperwhisper)
      Oh. Sorry 'bout that. Now then, where was I...
      Ah, right

      Well, according to this site (the first one I found on Google), Asian elephants weigh around 5000kg/11,000lbs and African elephants weigh around 6000kg/13,000lbs (conversions courtesy of Google calculator).

      At 2,251,799,813,685 lbs, that's almost 205 million Asian elephants or over 173 million African elephants. Hope you've got a shovel, some nose plugs, and a lot of time on your hands, 'cause I sure as hell ain't helpin' you clean up after 'em.

      --
      When a thing has been said, and said well, have no scruple. Take it and copy it. --Anatole France
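
      For anyone who wants to redo the elephant conversion, a minimal sketch follows. It simply takes the paper weight and the per-elephant weights quoted in the comment as given; none of the figures are independently checked here.

```python
# Paper-to-elephant conversion using the weights quoted in the comment above.
PAPER_LB = 2_251_799_813_685    # pounds of paper per exabyte, from the parent estimate
ASIAN_ELEPHANT_LB = 11_000      # figure quoted in the comment
AFRICAN_ELEPHANT_LB = 13_000    # figure quoted in the comment

asian_elephants = PAPER_LB / ASIAN_ELEPHANT_LB
african_elephants = PAPER_LB / AFRICAN_ELEPHANT_LB

print(f"{asian_elephants / 1e6:.1f} million Asian elephants")
print(f"{african_elephants / 1e6:.1f} million African elephants")
```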
    8. Re:No problem here. by log0 · · Score: 1

      That's progress. The largest library in the world can only store 0.0244% as much information as something built thousands of years ago to store one man's corpse.

  14. Huzzah! by GaelenBurns · · Score: 4, Interesting

    Hooray for exponential curves! It is daunting, though. As an illustration of this, I read that the White House has already turned over 2 million pages of documents relating to 9/11 to the independent investigation panel.

    1. Re:Huzzah! by HungWeiLo · · Score: 1

      I read that the White House has already turned over 2 million pages of documents relating to 9/11 to the independent investigation panel

      Security by obfuscation?

      --
      There are a huge number of yeast infections in this county. Probably because we're downriver from the bread factory.
    2. Re:Huzzah! by GaelenBurns · · Score: 1

      Absolutely. I was thinking the exact same thing. How is a 10 person panel (with maybe a hundred staffers assisting) supposed to wade through that type of material?

    3. Re:Huzzah! by Fallen_Knight · · Score: 1

      Maybe that's why it takes government forever to do anything?

  15. Cool by Anonymous Coward · · Score: 0

    I guess I can buy that new HardDrive I was looking at now.

  16. Temporary data? by Devil's+BSD · · Score: 1

    How about temporary and ephemeral data, like SSH keys and data passed through X11, used for short point-to-point transfers? It might be just me, but if this doesn't take into account that data, the total could be much higher...

    --
    I'm the Devil the Windows users warned you about.
  17. letters from nigeria by simpl3x · · Score: 1

    as i just received another couple of letters asking for assistance from the war torn regions of africa, how much of this is spam and related garbage?

    oddly enough the most useful information is often the most concise. duck!

  18. Hmmm.... by Asprin · · Score: 1


    Hmmmmm.... I think I might know where all that 'new data' came from.

    --
    "Lawyers are for sucks."
    - Doug McKenzie
  19. How much is bloatware. by Polly_was_a_cracker · · Score: 1

    Subject says it all for me but since this requires a body...

    For those curious the dictionary's definition of data is as follows.
    Factual information, especially information organized for analysis or used to reason or make decisions.
    Computer Science. Numerical or other information represented in a form suitable for processing by computer.
    Values derived from scientific experiments. Plural of datum.

    --
    I have a Cig, but do you have a light?
  20. Slashdot's small contribution by kefoo · · Score: 0

    Glad to see Slashdot is contributing to the glut by reporting on it...

  21. Let's get the standard jokes out of the way by BabyDave · · Score: 1
    That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet.

    But how many {VW Beetles, encyclopedias, football fields, Coke cans, DVDs, hours of porn} is that?

    1. Re:Let's get the standard jokes out of the way by Pingular · · Score: 1

      That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet.
      That's about 7 billion CDs, or more than one for each of the 6.3 billion people on the planet.

      --

      When anger rises, think of the consequences.
      Confucius (551 BC - 479 BC)
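
      Both the per-person figure and the CD count are easy to sanity-check. The sketch below assumes a decimal exabyte (10^18 bytes) and a 700 MB CD; the CD capacity is an assumption, since the comment does not say which size it used.

```python
# Per-person share and CD count for five exabytes of data.
TOTAL_BYTES = 5 * 10**18        # five decimal exabytes (assumption)
WORLD_POPULATION = 6.3e9        # figure from the story summary
CD_BYTES = 700 * 10**6          # assumed 700 MB CD-R capacity

mb_per_person = TOTAL_BYTES / WORLD_POPULATION / 10**6
cds_needed = TOTAL_BYTES / CD_BYTES

print(f"{mb_per_person:.0f} MB per person")
print(f"{cds_needed / 1e9:.2f} billion CDs")
```

      With these inputs the share works out to just under 800 MB per person and a little over 7 billion CDs.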
    2. Re:Let's get the standard jokes out of the way by Anonymous Coward · · Score: 0

      And they all can be bought pirated in Shanghai.

    3. Re:Let's get the standard jokes out of the way by uberdave · · Score: 1

      That's about 7 billion CDs, or more than one for each of the 6.3 billion people on the planet.

      Ah! So it's AOL's fault.

    4. Re:Let's get the standard jokes out of the way by NumLk · · Score: 3, Funny
      You forgot these jokes:

      I for one welcome our new data generating overlords!

      With all that data you'd think that my conne3^&#5$ATDT01[NO CARRIER]

      In Soviet Russia data generates YOU!

      Homer: I see they have the Internet on computers now.

      --
      Children in the backseats don't cause accidents. Accidents in the back seats cause children.
  22. Not data, multimedia by mblase · · Score: 0, Redundant

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    That's because about 57 percent of it was internet porn.

  23. quote by CGP314 · · Score: 5, Interesting

    All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value. --Carl Sagan

    1. Re:quote by Anonymous Coward · · Score: 0

      If Carl Sagan is so smart, why is he dead?

    2. Re:quote by DrEldarion · · Score: 1

      Well, a picture IS worth a thousand words (or ten thousand, if the words are from a Tolkien book). I wonder how many words a motion picture is worth?

    3. Re:quote by Anonymous Coward · · Score: 0

      I have to say, that made me laugh...

      It's just so inane...

    4. Re:quote by kfg · · Score: 1

      Well, a picture IS worth a thousand words

      Please express this concept with a picture.

      KFG

    5. Re:quote by danila · · Score: 0

      Actually, there is something more to that. Carl Sagan should have been smart enough to at least get a cryonic suspension. Too bad that many great minds are destroyed by death and will never be with us again. Sagan, Feynman, many others... :(

      Even if you are not as valuable for humanity, you can still get a contract.

      --
      Future Wiki -- If you don't think about the future, you cannot have one.
    6. Re:quote by danila · · Score: 1

      Here you are.

      P.S. It was extremely difficult, practically impossible, to ignore another brilliant example

      --
      Future Wiki -- If you don't think about the future, you cannot have one.
    7. Re:quote by kfg · · Score: 1

      Thank you. Here we have a perfect example of the true meaning of the phrase "begs the question."

      KFG

    8. Re:quote by Anonymous Coward · · Score: 0

      yeah, but the video broadcasts compress to almost nothing once you take out all the Ben and J'Lo stories

    9. Re:quote by plumby · · Score: 1

      But there's considerably less than 1,000 words in the picture. You need the text to explain what the picture is attempting to show, so surely this is disproving the argument.

  24. If you're not part of the solution.... by 44BSD · · Score: 1

    I hope that Varian, et al. realize that by publishing this study, they are adding to the problem.

    In the long run, the second law of thermodynamics will take care of this.

    1. Re:If you're not part of the solution.... by Anonymous Coward · · Score: 0

      ... you're part of the precipitate!!!

      {rimshot}

  25. Effective use of data by dmomo · · Score: 1

    From the article, Varian (an economist) states:
    ``We're producing all this information, but we don't necessarily have the tools to use it most effectively,'' he said.

    What does it mean to use data "effectively", and is the "We" producing the data the same "We" using it? My first instinct on not having the tools to use this data most effectively is "that's good". My second instinct tells me that data is already being used TOO effectively. Personally, I hope that cross-reference of mass data stores containing personal information does NOT become more effective.

  26. Create a Yottabyte in seconds by Anonymous Coward · · Score: 0

    dd if=/dev/random of=/dev/zero bs=89458905980359804890448 count=403908538905980358904895983

  27. that's a LoC per minute, almost. by sulli · · Score: 3, Funny
    525,600 minutes per year. Impressive.

    But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site?

    --

    sulli
    RTFJ.
    1. Re:that's a LoC per minute, almost. by Anonymous Coward · · Score: 0

      525,600 minutes per year. Impressive. But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site? A godzillion?

    2. Re:that's a LoC per minute, almost. by demonbug · · Score: 1

      Okay, let's see, we would need 3472222222222 1.44 meg floppies to store it all. Assuming we are looking at a New Beetle, we've got 18 cubic feet of cargo space (according to their website). Since the volume of a single floppy (assuming they are completely rigid) is .295 ft * .308 ft * .0108 ft = .000981 cubic feet, we can fit 18348 of them in each VW. So, it would take about 189,242,546 VWs to transport all of the data on floppies. That's quite a few. I'd try and figure out how many trips per New Beetle that would be, but I don't seem to be able to find numbers for total sales. And yes, I'm bored.
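
      The floppy and Beetle numbers can be reproduced as follows. This sketch follows the comment's own inputs: five decimal exabytes, 1.44 million bytes per floppy, a floppy volume rounded to .000981 cubic feet, and 18 cubic feet of cargo space. All of these are the commenter's figures rather than verified specifications.

```python
# Floppies-per-Beetle estimate using the inputs from the comment above.
TOTAL_BYTES = 5 * 10**18          # five decimal exabytes
FLOPPY_BYTES = 1_440_000          # "1.44 meg" taken as 1.44 million bytes
FLOPPY_VOLUME_FT3 = 0.000981      # .295 * .308 * .0108, rounded as in the comment
CARGO_VOLUME_FT3 = 18             # New Beetle cargo space quoted in the comment

# Whole floppies, rounded down to match the figure in the comment.
floppies = TOTAL_BYTES // FLOPPY_BYTES

# Only whole floppies fit in a car.
floppies_per_vw = int(CARGO_VOLUME_FT3 / FLOPPY_VOLUME_FT3)

# Ceiling division: the last, partly filled car still counts.
vws_needed = -(-floppies // floppies_per_vw)

print(f"{floppies:,} floppies")
print(f"{floppies_per_vw:,} floppies per VW")
print(f"{vws_needed:,} VWs")
```

      Using the unrounded floppy volume would lower the per-car count slightly, so the final figure shifts by a small fraction of a percent.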

    3. Re:that's a LoC per minute, almost. by damiam · · Score: 1

      Approximately 44 million, so long as you don't leave any room for the driver to sit.

      --
      It's hard to be religious when certain people are never incinerated by bolts of lightning.
  28. a problem that solves itself? by pomakis · · Score: 1

    So what the writeup is saying is that there's a whole lotta data, which is a problem, and that 92% of that data probably won't survive that long, which is a problem. It sounds like these two problems cancel each other out! (That is, as long as the 8% that does survive is the useful stuff.)

  29. Re:Should I kill myself? by Anonymous Coward · · Score: 0

    I'm 25, I've never had a girlfriend and I have no prospects

    Heck, you sound like the average Linux developer. Grab some code and start hacking.

  30. Data loss by grocer · · Score: 1

    I think more needs to be done to preserve the important e-mails of government for posterity. The DoD and other agencies do not back up or retain e-mails in any meaningful way, nor does the White House or National Archives have any kind of e-mail policy, AFAIK. Hard disks and, by extension, e-mail suffer from the time limit of magnetic media...eventually all those ions disappear and there is no *magnetic* in the media.

    CDs have the translation problem...what happens in 150 years when the standards are corrupted or lost and nobody can acknowledge the binary code in any meaningful format?

    1. Re:Data loss by yelligsc · · Score: 1

      CDs have the translation problem...what happens in 150 years when the standards are corrupted or lost and nobody can acknowledge the binary code in any meaningful format?

      OR: What happens in 150 years when no one knows English anymore? Even if you could decode the binary into ASCII, you could not read the words!

      For those of you who missed the point: ALL data has to be coded SOMEHOW.

      Scott.

    2. Re:Data loss by jdhutchins · · Score: 1

      Most languages end up being decoded by future civilizations. We can mostly read the Egyptian writings, and can read most of the ancient languages. We can read pretty much any language from AD 0 to present date. People remember languages, still speak them, or someone can go and decode it.

      Whether or not someone speaks it is pretty much irrelevant; the language will last MUCH longer than the media will.

  31. Storage by 3Suns · · Score: 3, Interesting

    I work at EMC, and this fact (along with projections for similar growth in the future) is a big marketing strategy for the company, especially toward investors. The storage market grows with the amount of information produced... it's gotta be stored somewhere!

    --

    -3Suns

    ~~~~
    The Revolution will be Slashdotted
  32. 5 Exabytes by tadas · · Score: 1

    Is that 5 Exabyte 8505's or did they use 8505XL's?

    --
    This page accidentally left blank
  33. Re:Should I kill myself? by FrankoBoy · · Score: 1

    People ALWAYS have prospects somehow. You just have to think about it some more and get some help from friends or professionals if you have problems figuring out what to do.

    ...of course, if you still wanna kill yourself, jumping off of some very high thing is the most beautiful way out... but still, don't do it :)

  34. Contribution by Yanray · · Score: 1

    I gave my 200 GB. What did you give.

    --
    --"Sorry for the inconvience." Gods Last Words to his Creation
    DNA, So Long and Thanks for all the Fish
  35. The year 10008 by Anonymous Coward · · Score: 0

    Today we have dug up another spinning thing in a case made of platters and heads. While we are not sure what exactly they were used for, we think they were used to play with the "charge" of Milky Wayan matter. The "charge" is simply a property of the particles that make up the matter in that some have simple attraction or repulsion. This appears to have been harnessed in a primitive way as to allow simple machines to be created solely out of that matter. These spinning things appear to have been used to store small amounts of information - the heads seem to have been made to transfer this "charge" property to a stronger "charge" property in other parts of the component. This may have been used in some of their simple machines to "display" the information in another place at a much later time.

    1. Re:The year 10008 by Anonymous Coward · · Score: 0

      Makes you wonder how advanced ancient civilizations actually were, if they suffered from similar data storage degradation. . .

      If the reason we don't find any cool records before a certain time in ancient Egypt was a result of cultural stagnation and dying technology...

      Mmmmm sounds like the beginning of a good work of fiction!

    2. Re:The year 10008 by Orne · · Score: 1

      Well, common papyrus wasn't exactly the most durable material. That's why most of our information on Egypt comes from carved stone sources, and remnants of painted symbols on stone. Copying and re-copying literature was an important job in the middle ages, which is one thing that gave the Church so much power... the monopoly on reading data.

      Check out Frederik Pohl's Gateway series... humans find a remnant of an alien outpost on Venus, and a ship on autopilot that takes them to a hangar of spaceships on an asteroid. On the stations, they find that the aliens were in an awful rush to abandon the place, and they left behind all these metallic folding fans and other widgets. The humans said "wow, neat, these must have been their toys" and went and sold them as novelty items to the public. It wasn't until the third book or so that the humans "discover" that the fans are solid state storage devices, and that the Heechee had left behind all of the manuals to their machines when they left, and that the fans were newspapers, books, videos, art, etc... everything about their culture... but because the humans had nothing to read the data, they had no concept what the devices represented.

  36. Not long-term data by micromoog · · Score: 2, Interesting
    That's a big-sounding number, but most of this is not going to be useful or stored long term. Examples:
    • Many large companies are building VERY large data warehouses, to capture and analyze every iota of information about every transaction. In a year or two, much of today's data will be largely irrelevant, and will likely be summarized and deleted.
    • People send a lot of email, and post a lot of messages, about day-to-day stuff that has no long-term value.
    • Surveillance video is used more than ever. This is not going to be stored long-term, except perhaps in the most security-sensitive areas.
    Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.
  37. It just occurred to me... by Vaevictis666 · · Score: 1
    What with all the (expected) porn jokes out, keep in mind that the goal is to count new data generated this year, without duplicates.

    You only get to count data you have generated yourself, anything you got from somewhere else (99% of porn, everything on P2P apps) doesn't count.

    As such, I think I'm under my one-cd-per-person (800mb) limit for the year, but I do know a few friends (artists) that would definitely be over :P

    Another interesting question is whether data conversion counts - If I copy a CD to oggs, or a DVD to Divx, does that count as new data created for the purposes of this study?

  38. Turn your little 0 into a big 1 by paiute · · Score: 1

    http://www.wired.com/wired/archive/11.09/full.html

    --
    If Slashdot were chemistry it would look like this:Cadaverine
  39. Kids by nightsweat · · Score: 1

    How much of that was in kids' artwork for the refrigerator door? Cause that would store a lot better in a vector file format...

    --

    the major advances in civilization are processes which all but wreck the societies in which they occur - A.N. White
  40. Quality vs Quantity by Anonymous Coward · · Score: 0

    Like the previous posters mentioned, it is really about quality not quantity. Who cares if all of this so called important information is on magnetic media. The constitution was written on shredded tree pulp that was compressed and dried to an unstable piece of paper; somehow we've managed to go all these years without losing track of *that* important piece of information.
    How *did* we do it???

  41. I can't comprehend such numbers! by TrollBridge · · Score: 1
    "That's five million terabytes of data, or 500,000 Libraries of Congress"

    But how many football fields long is that?? Let's try to put that in some context that Joe Sixpack like me can understand!

    --
    There's a Mercedes gap too. I want one and can't afford one, but it's not government's job to do anything about it.
    1. Re:I can't comprehend such numbers! by Anonymous Coward · · Score: 0

      It's exactly 12 trillion Volkswagen Beetles worth of data or a stack of one dollar bills reaching to Mars and back.

  42. Mass replication by binaryDigit · · Score: 2, Interesting

    I think the more interesting thing to study would be to determine how much unique data is being generated. I mean who cares if two million people have the latest Britanny Spears song in mp3 format? And that's not even talking about "information", but just simply raw "data". I also wonder if they took into account "data in transit" (being transmitted over the ethernet) and temporary data (caches, etc).

    1. Re:Mass replication by dmomo · · Score: 1

      From the article I was under the impression that they WERE talking about unique data.

      "They performed surveys to estimate how much unique information exists on each type of hard drive."

      Still, it seems like it would be a difficult thing to discern.

    2. Re:Mass replication by Anonymous Coward · · Score: 0

      Unique data....

      Well....as it turns out, only two pieces of unique data were created.....0 and 1. Everything else was a copy of that.

    3. Re:Mass replication by Anonymous Coward · · Score: 0
      You misspelled my girlfriends name; it's Britney Spears.

      And yeah, she's good in bed.

  43. That brings a point on storage durability by tekiegreg · · Score: 0

    On Slashdot, there have been topics on digital media durability in the past (run your own searches, I'm too lazy). It really couldn't hurt to start archiving stuff on to material that can last hundreds of years if not longer. Is there any digital media that could do that? It couldn't be magnetic because that deteriorates over time, and it couldn't be CD etched as the scratches tear away at it piece by piece....thoughts?

    --
    ...in bed
  44. True it's a lot of info to create, but... by The+Jonas · · Score: 4, Insightful

    ...how much info is destroyed each year to offset these numbers. I mean shredded files, stuff thrown in trash, bills, deleted data files, discarded/lost storage media, etc... In the end (of each year), I wonder, what is the actual increase in stored information?

  45. It's only going to get worse... by mengel · · Score: 3, Interesting

    At Fermilab where I work, the larger experiments are expecting to generate 1PB/year of data in around 2005, up from somewhere around 300TB/year currently.

    --
    - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
    1. Re:It's only going to get worse... by FTL · · Score: 1
      At Fermilab where I work, the larger experiments are expecting to generate 1PB/year of data in around 2005, up from somewhere around 300TB/year currently.

      The remarkable thing is that after analysis is complete, all that data is reduced to just two bytes: "42"

      --
      Slashdot monitor for your Mozilla sidebar or Active Desktop.
    2. Re:It's only going to get worse... by Anonymous Coward · · Score: 0

      I doubt that Fermilab expresses numerical results in BCD.

  46. Exabyte? by Kedder · · Score: 1

    Wow, that sounds even more than Gazillion!

    My new harddrive will be no less than 1.2EB...

  47. Speak English!!! by sirgoran · · Score: 1

    Tera, Giga, Exa, Don't give it to me in those terms. Put it in terms I can understand!

    Just how much of that was porn?

    -Goran

    --
    Carpe Scrotum - The only way to deal with your competition.
  48. five exabytes? by Horny+Smurf · · Score: 0, Troll
    How many were first posters and gnaa crapfloods?


    Methinks the word "Data" may be used more loosely than Kathleen Fent-Malda's pussy.

  49. Libraries of Congress by Entropy248 · · Score: 2, Insightful

    500,000 Libraries of Congress, huh? I've always had several problems (SI questions aside) with this unit of measurement. The Library of Congress is constantly expanding & adding new material. What year Library of Congress do they mean? I imagine they aren't working w/ up to the minute data and that the library is expanding much faster now. Not to mention the fact that everyone always makes exabytes ~2.4% smaller than they really are (and with numbers this big, it actually makes a difference!)... So call me the new number nazi troll already and get it over with...

    1. Re:Libraries of Congress by Mudd+Guy · · Score: 1

      It's a lot more than 2.4%! (1024/1000)^5=1.13 => 13% error.
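
      The gap between decimal and binary prefixes compounds with each step up, which a few lines make concrete. This is a sketch, not part of the original comment. Note that the fifth power corresponds to peta; exa is the sixth power of the base, which puts the gap for exabytes at roughly 15%.

```python
# How much larger a binary prefix (powers of 1024) is than its decimal
# counterpart (powers of 1000), for each prefix from kilo up to exa.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]


def binary_excess(power):
    """Fractional amount by which 1024**power exceeds 1000**power."""
    return (1024 / 1000) ** power - 1


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name:>4}: {binary_excess(power) * 100:.1f}% larger")
```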

  50. Units of measure by camusflage · · Score: 1

    Why is it that everything that is data is related to either/or x Libraries of Congress or y Encyclopaedia Britannicas, as if either of those is actually an approachable figure. I want to lobby for a new measure, such as x two hour porn DVDs or y illegally downloaded songs.

    --
    The truth about Scientology, Xenu, and you: Operation Clambake
  51. easy to believe by 514x0r · · Score: 1

    pr0n + spam + kazaa

    --

    !(^((ri)|(mp))aa$)
  52. Written a program to break this record... by public_class_name_ex · · Score: 1


    It repeatedly calls malloc() and free(), storing information in RAM, which may create an interesting problem for historians and archaeologists of the future.

    1. Re:Written a program to break this record... by Carnildo · · Score: 1

      How long do you estimate it will take for your program to break the record? Did you remember to include a counter to display how much data is being generated?

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
    2. Re:Written a program to break this record... by public_class_name_ex · · Score: 1


      The counter will be available in RecordBreaker XP, which will allow me to call malloc() more times than I call free(). I suspect this will significantly improve the performance, since I will be generating much more information.

  53. And if that number is about right... by saskboy · · Score: 1

    Then think of how many bytes of that number are actually backed up if they are irreplaceable?

    I'd bet not much. And what is backed up may only have a shelf life of about 20 months if on poor CD-R or Floppies.

    --
    Saskboy's blog is good. 9 out of 10 dentists agree.
  54. 800 MB per person by TrippTDF · · Score: 1

    Damn- that puts some stuff in perspective... 800 MB per person is really not that much... just over one CD per person on the planet.

    I personally burned over 500 CDs last year, filled a couple of hard drives, and sent God knows how much email...

    I think this goes to show what a wealthy little world we computer people live in.

    1. Re:800 MB per person by Anonymous+Crowhead · · Score: 5, Funny

      I personally burned over 500 CDs last year

      Congrats, you balanced out 1 medium-sized tribe in Africa.

    2. Re:800 MB per person by IM6100 · · Score: 2, Insightful

      What did you burn on those 500 CDs?

      Do you run your own particular pseudo-random number generator and store the results? Do you go out with a digital camcorder and record tons and tons of images of the world? Do you write that much prose or poetry in a year?

      Or are you just talking about 500 CDs of data that you or somebody else 'ripped' from existing media and are shuffling around?

      --
      A Good Intro to NetBS
    3. Re:800 MB per person by ITman75 · · Score: 1

      800MB per person. Man, that's a lot of p0rn

  55. Just one question ... by DaneelGiskard · · Score: 1

    ... how much of it was porn? :)

  56. Info Glut by Zygote-IC- · · Score: 1

    Hey, way to add another 800k to the glut with this pdf file!!

  57. Re:Should I kill myself? by Anonymous Coward · · Score: 0
    I could introduce you to a girl I know. Meet her once and you'll realize there are worse things than being lonely,

    Try going out to a bar/dance club and getting shit drunk some night.

  58. My figures by robogun · · Score: 3, Interesting

    I just did another backup, so the figures are right at hand.
    I'm a news photographer, shooting digital.
    In 2002 I saved 78,742 photos to disk. (Bad images were not saved.)
    That worked out to 122 gig. The output was transferred from the CF cards and archived to DVDs.
    But how much of that 122 gig is really information? The image file saved by the Canon 1d is mostly empty air, as far as I can tell. There is also EXIF data and IPTC, and who knows how much hidden BS is included a'la Microsoft Word documents?
    Simple compression was able to whittle that down to 33.2 gig. So that's my contribution.
    The main beneficiary is the DVD-R blank disc makers and Western Digital, I guess.

    1. Re:My figures by imsabbel · · Score: 1

      If "simple compression" (probably a Burrows-Wheeler transform followed by Huffman coding) can only compress your data by a factor of 4, you have totally normal entropy for camera sensor data.

      It's just your fault for taking pictures of uninteresting things.

      --
      HI O WISE PRINCE. WHT TOOK U SO DAM LONG?
    2. Re:My figures by dcobbler · · Score: 2, Insightful

      I think other parts of this discussion are probably already arguing about "data vs. information" but this post, I think, points out one of the reasons for that argument: between 1999 and 2002, how many more digital cameras are around and how much larger (in pixels/bits) are the images? Just because there are more digital pics with more pixels each, doesn't mean that there are more actual pictures being taken. And for each new digital camera that is being used, how many fewer film cameras are being used. I suspect that there *are* more pictures being taken but this study doesn't necessarily prove that.

      Cheers,

      Dcobbler.

    3. Re:My figures by robogun · · Score: 2, Interesting

      Believe me, there are many more pictures being taken. The main reason is the limitation of film cost and processing has been removed.

      I never had that limitation and I still shoot 2-3 times as much as I did in 1999.
      Probably the main reason is the good cameras, like the Canon 1d, shoot 8 frames a second. A 1G CF card holds 420 shots. The largest roll of film is 36 frames.

      I shot digital starting in 1996, but still primarily used film until decent digital SLRs came out. I moved over entirely to digital in 2001.

      In 1996 I shot maybe 100 photos with digital (and they were small >10 kb each). That was an early Kodak.

      In 1998 I shot advertising using an Olympus D620L. That thing shot images maybe 80kb. In 2000 I shot 1,643 digital images occupying 250 mb or so, against 4,000 or 5,000 frames of film. Of the film, only the frames for publication needed to be scanned to disk. The total amount of disk space used wasn't much.

      In 2001 the Nikon D1 came out. I shot 56,066 that year (got it in March). 22 gigs worth, spanned across lots of CDRs.

      So far in 2003, with the Canon 1d and 1ds, shot 50,261 frames, taking up about 32 gig, archived to DVD.

      I would expect these increases to continue for the near future.

    4. Re:My figures by dcobbler · · Score: 1

      Okay, yeah. I was kind of making a rhetorical point about the original article/post but it didn't come off too well.

      I've been an amateur since the seventies. I've only had a digital for about 18 months and, yes, I'm taking *way* more pictures. I had to get more memory for the iMac because iPhoto was bogging-down from all the pictures in my library and now I'm scheming to trade in my coolpix 995 along with my Canon Elan and lenses to get a much better digital. When I do that, the only film camera I'll have left will be my RB67 and I realize that, one day, that will go too; except I'll probably have to give it away and I'll likely have found that I can barely remember how to use it by then!

  59. Redundant Data by Furan · · Score: 1

    It really makes you wonder how much of that data is just redundant waste.

  60. Disk space is cheap - and other myths by rivaldufus · · Score: 1, Interesting

    How many other sysadmins out there are tired of hearing this? Every time I go to a company and even suggest quotas on the file server, the engineering group always says, "Disk space is cheap," or "you can get an 80GB disk for cheap."

    Of course, this never takes into account backup media and the whole backup infrastructure (anyone price decent commercial backup software recently?).

    I'm surprised it's only five exabytes. The admins of the world should go ahead and put a 400MB Quota on all 6.3 Billion people. That way, we'd be down to 1999's storage levels....

    1. Re:Disk space is cheap - and other myths by borgdows · · Score: 1, Funny

      Of course, this never takes into account backup media and the whole backup infrastructure (anyone price decent commercial backup software recently?).

      take this :

      Real Men don't make backups. They
      upload it via ftp and let the world mirror it.

  61. My Beer Gut is bigger than the Info Glut by hey · · Score: 1

    It's about 6 Exabytes.

  62. AOL doom day. by twitter · · Score: 2, Funny
    I've got more than my share of data, enough to discard the 800MB or so that AOL likes to mail me. 800MB/person is not shocking when I think of all the CDs I've stumbled across in the field - literally grass fields in the middle of nowhere.

    It's a joke..

    --

    Friends don't help friends install M$ junk.

  63. Question... by Short+Circuit · · Score: 1

    If we used analog computers instead of digital, how would this be measured?

    1. Re:Question... by silentbozo · · Score: 1

      If we used analog computers instead of digital, how would this be measured?


      With one huge abacus?

    2. Re:Question... by tiled_rainbows · · Score: 1

      Abaci are digital, though, aren't they? The bead's either on the left or the right of the bead-stick-thingy; no in-betweens allowed.

      Better to use one huge set of scales, or even better, a big thermometer-looking thing that rings a bell at the top if it gets to 10 Exabytes. Yup, that'd be cool.

      Whatever happened to analogue computing anyway?

  64. dudes, by Anonymous Coward · · Score: 0

    that's about two weeks' worth of linux-kernel traffic.

  65. For your edification by Anonymous Coward · · Score: 0
    Because no one here knew what that meant, admit it.

    efferent adj.
    1. Directed away from a central organ or section.
    2. Carrying impulses from the central nervous system to an effector.

  66. What about the data from nuclear colliders? by hey · · Score: 1

    Doesn't just one experiment produce 45 zillion
    megabytes? (Don't quote me on that.)

  67. compression by Twillerror · · Score: 1

    An MP3 is usually about 1 meg a minute. But a raw WAV file is several times more. The same goes for raw video versus MPEG-2 or QuickTime.

    I suppose the number could be much larger if you expand data before counting it.
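
    A rough sketch of that ratio, assuming a 128 kbit/s MP3 and CD-quality stereo WAV (44.1 kHz, 16-bit). Both sets of parameters are assumptions picked for illustration, not figures from the study:

```python
# Per-minute audio sizes. The bitrate and the CD-audio parameters are
# illustrative assumptions, not values taken from the Berkeley report.
MP3_BITRATE_BPS = 128_000   # assumed MP3 bitrate in bits per second
SAMPLE_RATE_HZ = 44_100     # CD-quality sample rate
BYTES_PER_SAMPLE = 2        # 16-bit samples
CHANNELS = 2                # stereo


def mp3_bytes_per_minute(bitrate_bps=MP3_BITRATE_BPS):
    """Bytes used by one minute of constant-bitrate MP3."""
    return bitrate_bps // 8 * 60


def wav_bytes_per_minute(rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
                         channels=CHANNELS):
    """Bytes used by one minute of uncompressed PCM audio."""
    return rate * width * channels * 60


mp3 = mp3_bytes_per_minute()
wav = wav_bytes_per_minute()
print(f"MP3: {mp3 / 1e6:.2f} MB/min")
print(f"WAV: {wav / 1e6:.2f} MB/min")
print(f"expanded/compressed ratio: {wav / mp3:.1f}x")
```

    Under those assumptions the uncompressed audio is roughly eleven times larger, so counting expanded data would inflate the audio share by about an order of magnitude.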

  68. Sorry... by blackmonday · · Score: 1

    I don't understand, how many elephants does an exabyte weigh?

  69. words/motion picture by siskbc · · Score: 1
    I wonder how many words a motion picture is worth?

    Looks like 599, assuming said motion picture is a complete rotting turd. Thanks for gems like this one, MPAA!

    Review: 'Gigli' is really, really bad

    It's better than 'Swept Away,' for what it's worth.

    By Paul Clinton

    CNN Reviewer

    Saturday, August 2, 2003 Posted: 12:13 AM EDT (0413 GMT)

    OK, so "Gigli" is not the worst film in years. That dubious title still goes to "Swept Away," or maybe "Freddy Got Fingered." But "Gigli" is still a huge waste of celluloid. In Hollywood, it's all about "what have you done lately," and despite such successes as "Scent of a Woman," "Midnight Run" and "Beverly Hills Cop," writer/director/producer Martin Brest has done nothing that can make up for this ill-conceived mess.

    If miscasting was a crime, this movie would be proof of a felony. Ben Affleck and Jennifer Lopez fit their characters like a glove -- if the glove in question belonged to O.J. Simpson. Affleck plays a low-level mob enforcer named Larry Gigli (pronounced like "really") assigned to kidnap a mentally challenged young man, Brian (think Raymond in "Rainman"), played amazingly well by Justin Bartha in his feature film debut. Affleck's real-life lady love, Lopez (they met during the filming of this movie), plays Ricki, another mob enforcer hired to keep an eye on Gigli.

    Insult upon insult

    It seems Brian's brother is a powerful federal prosecutor who is after a mob boss, played by Al Pacino. The plan is for the prosecutor to drop the charges against the gangster in order to get his brother back safe and sound.

    Say what? In what universe?

    Of course, Ricki and Larry fight like cats and dogs and hate each other from the get-go -- a sure sign that they'll be under the sheets by the second reel. And they are, despite the fact that Ricki is a lesbian.

    Yes, ladies and gentlemen, Ben Affleck, who already did this in "Chasing Amy," is at it again. He's become "Benny the lesbian changer," the new secret weapon for the religious right. In all fairness, the ending was changed at the last minute after massive negative audience reactions in test screenings. This, however, is only the final insult after a film full of them.

    There were obviously many changes made during the making of this cinematic train wreck. The story is all over the place: there is one really strange scene with Christopher Walken playing a cop, and then we never see him again. He's on the cutting room floor.

    Wishing he were there too is Pacino, who appears in only one embarrassing scene.

    Beyond the cringe

    But the award for the most cringe-inducing moment goes to Lopez, for a scene in which she stretches out on the floor in every sexual position known to man while debating the pros and cons of female and male anatomy. I know, it sounds hot on paper, doesn't it?

    The bad guy characters become good guys with no motivation, nor any visible cause or effect. None of the scenes seem to be connected to each other in any way; the entire film feels like it was edited on an assembly line, without feeling for rhythm or nuance.

    At one point, Ricki's lesbian lover breaks into their "hideout" and tries to commit suicide. After comforting her in the hospital, Ricki runs back and jumps in the sack with the "lesbian changer."

    This is a comedy?

    Brest showed such great promise in the 1980s with hit after hit, as mentioned above. Then, in 1998, he gave us "Meet Joe Black." Now he's given us "Gigli." He should remember that California is a "three strikes and you're out" state for criminal offenders.

    --

    -Looking for a job as a materials chemist or multivariat

    1. Re:words/motion picture by Surt · · Score: 1

      Aww ... I was believing this review until it criticized freddy got fingered. That was the most brilliant film I saw last year.

      --
      "Who is the Journal of Quantum Physics going to believe?" --Stephen Hawking
    2. Re:words/motion picture by Threni · · Score: 1

      I prefer this review:

      http://www.bigempire.com/filthy/gigli.html

      (I really hope the filthy critic is only pretending to be dead).

  70. Magnetic Media by Anonymous Coward · · Score: 0

    "Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future."

    I'm going to start investing in rock quarry companies right away. I predict all future data will be chiseled on rock like in the olden days!!!

  71. Google Calculator Sucks by LPetrazickis · · Score: 1

    :: Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.

    Even though it knows the Answer to Life, the Universe, and Everything and the number of feet in 10 metres, it can't convert 10 Libraries of Congress into feet of books. :(

    I demand that this be fixed immediately! ;)

    --
    Is this a sigs-optional kind of place? 'Cause I am totally down with that if you know what I mean.
  72. almost exhausting a 64bit address space. by Merlin42 · · Score: 1

    log2(5 exabytes) is a little over 62!
    (62.3 for RAM style exabytes or 62.1 for HD style exabytes).
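
    A minimal check of those two figures, assuming "RAM style" means 1 EB = 2**60 bytes and "HD style" means 1 EB = 10**18 bytes (that reading of the terms is an assumption):

```python
import math

# Two readings of "5 exabytes"; which one the study uses is not stated here.
BINARY_5_EB = 5 * 2**60     # "RAM style": binary prefix
DECIMAL_5_EB = 5 * 10**18   # "HD style": decimal prefix

bits_binary = math.log2(BINARY_5_EB)
bits_decimal = math.log2(DECIMAL_5_EB)

# Share of a full 64-bit byte-addressed space taken up by the binary figure.
fraction_of_64bit = BINARY_5_EB / 2**64

print(f"binary:  {bits_binary:.1f} bits")
print(f"decimal: {bits_decimal:.1f} bits")
print(f"share of 2**64 bytes: {fraction_of_64bit:.1%}")
```

    Either way the total needs 63 address bits, and the binary figure fills a little under a third of a 64-bit address space.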

  73. Relevance? by BorgCopyeditor · · Score: 2, Funny
    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    Not least for those historians who want to know what my Amazon.com session ID was on the day that my Runescape character hit mining level 33.

    --
    Shop as usual. And avoid panic buying.
  74. Five Exabytes by jpetts · · Score: 1

    What's the big deal? That's only five 8mm tapes, isn't it?

    --
    Call me old fashioned, but I like a dump to be as memorable as it is devastating - Bender
  75. PLEASE SHIT IN SIR HAXALOT'S MOUTH! by Anonymous Coward · · Score: 0

    For he talks shit!

  76. MOD PARENT UP! by Anonymous Coward · · Score: 0

    So.. what will the archeologists *really* think when they dig up our hard drives?

  77. LOL, GRABOULOUS! by Anonymous Coward · · Score: 0
    "So I'm asking for your help. Please mod up all my old posts so I'll be able to post again. Thanks in advance, Sir Haxalot",

    That is FUCKING AWESOME!
    Let me know if that works, so I can try it!

  78. You forgot... by siskbc · · Score: 1
    But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site?

    ...how many golf balls falling on said stack it would take to knock it over. And if you laid all the bits in the data side by side, I wonder how many times it would go around the earth?

    --

    -Looking for a job as a materials chemist or multivariat

  79. Archiving Digital Formats by JonBuck · · Score: 1

    I'm a library science student. I'll have my MLS in December, and I've found a lot about this topic. In fact, I'm sitting in the library science library right now.

    For books, the standard is that any book should last for at least 500 years (Though this is a problem, what with all the acidic wood pulp paper publishers have used since the mid-1800s). The much-hated microfilm has that same lifespan.

    But we are nowhere close to finding a viable archival format for electronic information.

    This is a problem. There is so much important stuff, but digital formats change so fast we can't keep up. And the reliability of computer hardware is another can of worms.

    Libraries and Archives would bow down to anyone who found a format that remains viable, readable, and usable for perhaps the next century.

    1. Re:Archiving Digital Formats by anubi · · Score: 1
      Maybe the deterioration of data is a good thing? The good stuff will be retained, and the stuff no one took the care to preserve fades away to oblivion.

      I take it that without death, life would suffocate, as there soon would be no room for new life. Hence no room for improvements.

      I would think that at least 99% of the data mentioned will have no use in as little as one year.

      I have often contemplated a world where nothing died. Anything once created kept going. You yell, and the echoes continue forever without dampening themselves into oblivion. Can you imagine the cacophony which would result in just a few minutes? If plants never died, would there be any room left on earth for a seedling?

      I have come to the belief that death is just another word for reboot.

      As a system, we experience all sorts of random events, a very few notable, most best forgotten. The information destruction ( death ) is just the way to purge the buffers of noise so that things worth the space can be loaded into it.

      In the case of information, I think we are bumping into a very elastic "limit" on how much information we can keep track of. This limit will increase as the number of individuals who take an interest in keeping this information increases.

      The exact DNA of a tree living several centuries ago may be lost forever, but its progeny continue if they were fit for survival.

      We may lose a lot of data through deterioration, but I think we will always have a way to keep the important stuff. I have no doubt that one day we will understand DNA well enough that we too can encode information on a molecular lattice, and have it copy or execute itself whenever needed.

      Ok.. I'll finish this one with a little troll.. I love Science classes. I hated studying English Literature. You don't know how much as a kid I would have appreciated it if all the works of Dickens and Chaucer went to rot before they copied all that stuff and made us study it.

      --
      "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]

  80. Do the evolution by FrankoBoy · · Score: 2, Interesting
    So this means 1.126 gigatons of paper. According to this research paper, the world's major nuclear arsenals are equal to about 5 gigatons of TNT.

    Now, here's a little math for you :
    • Print every single bit of information the whole world produced last year.
    • Copy all of the output four times.
    • Replace all this paper by TNT...
    ...and the result, my friends, is the perfect recipe for global annihilation. Conventional weapons sold separately.
    1. Re:Do the evolution by FrankoBoy · · Score: 1

      My little math sucked : you should copy the output three times, not four. Guess I should get back to my books a bit ;)
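
      A quick sanity check of the copy count, using the 1.126 Gt and 5 Gt figures from the parent post. It assumes that "copy n times" means the original printout plus n extra copies, which is only one possible reading:

```python
PAPER_GT = 1.126    # mass of one full printout, from the parent post
ARSENAL_GT = 5.0    # TNT equivalent quoted in the parent post


def stack_mass(extra_copies):
    """Mass in gigatons of the original printout plus extra_copies copies."""
    return PAPER_GT * (1 + extra_copies)


for copies in (3, 4):
    print(f"{copies} copies: {stack_mass(copies):.3f} Gt")
```

      Under that reading, three copies give about 4.5 Gt and four give about 5.6 Gt, so the 5 Gt figure falls between the two.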

  81. But how much is that... by Anonymous Coward · · Score: 0

    ...in Volkswagen vehicles?

  82. Re:Should I kill myself? by Anonymous Coward · · Score: 0

    I recommend something messy and embaressing involving autoerotic asphysixiation while speeding down the highway.
    You know, something we'll get a laugh at when we read about it on annanova.

  83. Re:Should I kill myself? by Anonymous Coward · · Score: 0

    Been there.. tried that. A change of scenery helps. Try getting out, and doing something radically different in/with your life.

  84. DUH. It's called limited copyright. by Thud457 · · Score: 1

    Don't worry about being able to read old legacy data formats. If there's any interest in the data, there's somebody somewhere who will write an interpreter / converter / emulator for it. Just look at the 8-bit emulation scene.

    --

    the preceding comment is my own and in no way reflects the opinion of the Joint Chiefs of Staff

  85. Reminds me of this observation: by targo · · Score: 4, Funny

    5 billion files are created every day.
    3 billion of them will never be found again.
    Poor files...

  86. Do your fair share of the work. by rock_climbing_guy · · Score: 1
    OK, if we've only created that much data, it's time to get to work. Screen-saver authors, please add the following to the main() segment of your code:

    long x;
    for (;;) {                  /* loop forever */
        x = rand();             /* rand() is declared in <stdlib.h> */
        send_to_info_glut(x);
    }

    Please send the data created to Info Glut, and while you're at it, send it to all the spammers and to SCO. With some luck, you might DDOS them off the internet.

    --
    Wh47 d1d j00 541, 31337 15n't t3h r0xor5 ne m0r3???
  87. it's gotta be stored somewhere! by Confused · · Score: 1

    > ...it's gotta be stored somewhere!

    For most of it /dev/null is the prime choice of storage medium. This should really be an opportunity for companies producing high speed, high capacity null-devices.

    Where are the VC when one needs them?

  88. hmmm .... by Anonymous Coward · · Score: 0

    they seemed to have missed my massive collection of porn

  89. I'm sure my math is wrong, but... by srcosmo · · Score: 1
    5 exabytes = 1048576 terabytes = 1099511627776 MB = 763549741511.11 floppy disks (assuming 1.44MB per disk)

    And,

    Floppy disk volume: 0.0889m * 0.0889m * 0.015875m = 0.00012546345875m^3
    VW Jetta Cargo capacity: 368.119 liters = 0.368119m^3 (assuming all seats in place, and NOT the wagon model)

    So, 763549741511.11 floppies * 0.00012546345875m^3 = 95797591.4976523121517125m^3
    divide that by the 0.00012546345875m^3 per Jetta, and we get:

    ~7.635 x 10^11 Jettas required to ferry the floppy disks to the dump site!


    And all I want is a VW minibus. makes me seem quite modest..
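
    For anyone who wants to rerun the numbers, here is a sketch that reuses the floppy dimensions and Jetta cargo volume quoted above. It assumes a binary exabyte (2**60 bytes) and a 1440 KiB floppy, and it divides the stack volume by the cargo volume rather than by the floppy volume. Note that 1048576 TB is one binary exabyte, so the chain above covers 1 EB, not 5:

```python
# Inputs copied from the post above; the byte conventions are assumptions.
FLOPPY_BYTES = 1440 * 1024                      # "1.44 MB" floppy = 1440 KiB
FLOPPY_VOLUME_M3 = 0.0889 * 0.0889 * 0.015875   # dimensions as quoted above
JETTA_CARGO_M3 = 0.368119                       # cargo capacity as quoted above


def jettas_needed(total_bytes):
    """Return (floppy count, stack volume in m^3, Jetta loads)."""
    floppies = total_bytes / FLOPPY_BYTES
    volume_m3 = floppies * FLOPPY_VOLUME_M3
    return floppies, volume_m3, volume_m3 / JETTA_CARGO_M3


floppies, volume_m3, jettas = jettas_needed(5 * 2**60)
print(f"floppies: {floppies:.3e}")
print(f"volume:   {volume_m3:.3e} m^3")
print(f"Jettas:   {jettas:.3e}")
```

    With those assumptions the answer comes out on the order of a billion Jetta loads, ignoring any wasted space when packing.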

    --
    free speach
    Did you mean: free speech
    1. Re:I'm sure my math is wrong, but... by alcmena · · Score: 1

      Silly person, the answer was 1. It just had to make lots of trips.

  90. What this means for the average bozo by WillWare · · Score: 1

    I'm 173.205 percent sure these numbers are not very accurate. I'm 314.159 percent sure that they won't affect how I sleep. And I'm 628.318 percent sure that the funding for this kind of "research" has an upper bound.

    --
    WWJD for a Klondike Bar?
  91. Some data from 1996 by va3atc · · Score: 1

    I'm still attempting to figure out how to hook up the 20MB hard drive from my first computer (it's not IDE) and get one very small (less than 100k) file.

    Being the usual procrastinator, it gets more and more difficult to retrieve this file.

    The hard drive was hooked up to a 286 through two cables (one small, one large, not including power of course) that went to a daughter board.
    If anyone has any suggestions on how to retrieve this data that would be super :) :)

    -Steve

    --
    Candle burns its brightest in the dark
    1. Re:Some data from 1996 by valkraider · · Score: 1

      Go to Goodwill and buy an old 286, then network it? Heck you can network a Commodore 64... The 286 should be easy. Or you could go to an Oregon Public School - they are likely still using 286s... ;)

  92. But.. by Anonymous Coward · · Score: 0

    How will the robots ever survive without reliable data to recreate our world accurately?

    I don't want chicken to taste like everything :(

  93. Oooops! by Sumbody · · Score: 1


    I just downloaded a WinXP "patch" - better chalk up another exabyte.

  94. 800MB, that's it by Anonymous Coward · · Score: 0

    hell i know i'm personally responsible for several gigs, at least 500. so does that mean there are a bunch of people running dataless? also if it's only 800MB per person, hell all we would need is for EVERYONE to have a PC with a 1GB drive and make a gigantor-sized cluster out of them all.

    1. Re:800MB, that's it by hastings14 · · Score: 1

      Of course there are people running dataless. A fifth of the planet gets by on less than $2 per day... Those people are not storing much data... To make up for it, the rest are actually storing several gigs, at least... Likely much more if you're reading /.

  95. 800 MB per capita by Viking+Coder · · Score: 1

    I just want to point out that 800 MB per person works out to 1,600 slices of 512x512 CT data (the standard size of CT slices at 16 bits per voxel) - which means that this amount of data is roughly the same thing as about a 1mm * 1mm * 1mm CT scan of every human on the planet.
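
    The slice count can be checked with a few lines, assuming "800 MB" means 800 * 2**20 bytes and the slices are spaced 1 mm apart (both are assumptions made for the sake of the arithmetic):

```python
SLICE_VOXELS = 512 * 512             # voxels in one CT slice
BYTES_PER_VOXEL = 2                  # 16 bits per voxel
PER_CAPITA_BYTES = 800 * 2**20       # assumed meaning of "800 MB"
SLICE_SPACING_MM = 1                 # assumed slice spacing

slice_bytes = SLICE_VOXELS * BYTES_PER_VOXEL
slices = PER_CAPITA_BYTES // slice_bytes
height_m = slices * SLICE_SPACING_MM / 1000

print(f"bytes per slice: {slice_bytes}")
print(f"slices per person: {slices}")
print(f"scan height at 1 mm spacing: {height_m} m")
```

    At 1 mm spacing, 1,600 slices cover 1.6 m, which is close to the height of an adult.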

    --
    Education is the silver bullet.
  96. 800MB each? by dmnic · · Score: 1

    in 2002 I personally created about 400-500GB of data.
    sometimes, I really have to wonder about studies like these and where they get their info from. . .

  97. What a load of twaddle. by crazyphilman · · Score: 1

    Statistics like this only serve to amaze and astound pointy haired boss types. Oh my God! They shriek. Do we REALLY??? Meanwhile, the world keeps turning, we all keep getting up in the morning, and I keep wishing I could get laid. Just once. I mean, REALLY!

    Seriously, though, I bet the breakdown is something like this:

    1. Most of the "information" is probably composed of music and film. We all know how much bandwidth and disk space music and film take up. Here's another thing: different sites might have different copies of a film, so there's probably a lot of duplication. Not to mention the zillion copies of any given song that are being passed around. I really don't think of this stuff as "information". It's more "entertainment" than anything else. Some of it may be interesting for archival purposes (news footage, for instance) but the news companies already do this. THIS AIN'T A PROBLEM, FOLKS.

    2. Another large chunk of the "information" they're kvetching about is probably (almost certainly) composed of transitory messages like emailed messages and IM. This stuff was never meant to be hoarded. And it doesn't matter. It's used, it disappears, that's it.

    3. Yet another large chunk of this "info" is probably control messages passed around the web as internal controls (ICMP, etc). Again, this stuff is transitory, like emailed memos. Who cares?

    4. Getting into the "real stuff", you have all the ecommerce going on. But each company handles its own backup and storage. This is not a societal problem, this is an individual problem. Companies can deal with their own information storage problems. If they design their applications well, they won't have to store so much. But this isn't even that serious a problem there; it's just part of doing business.

    5. Then you have informational web sites, and personal sites, and blogs, etc. They come and go -- they always have. Everything interesting gets cached or mirrored anyway. This isn't much of a problem either.

    6. Finally, you have real paper documents, like those used by the bank and the government. Ok, some of this might add up. But they've got procedures in place (and they've had them for hundreds of years) to deal with this. Digital technology is actually making this easier, not harder, so that's a good thing, right?

    Overall, who cares how much information is generated? It's a useless statistic, like the tonnage of toilet paper people use annually. It might work as filler for, say, a "Ripley's Believe it or Not" strip in the sunday paper, but that's about it. Who cares? If someone started screaming "OH MY GOD, do you know how many TONS of TOILET PAPER America uses in a single YEAR??? IT'S A CRISIS!" wouldn't you slap that person? I would. Unless she was a hot chick (see paragraph 1).

    --
    Farewell! It's been a fine buncha years!
  98. 1984 by CGP314 · · Score: 1

    Maybe more research could be done into a marketable multi-century (millenial?) storage. For corporate purposes, several decades of fidelity, perhaps a century or two, would be fine - but government will need better than that.

    Yeah right. The government wants all historical data destroyed as soon as it is created.

  99. And what kind of data are we creating? by Pedrito · · Score: 2, Funny

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    They fail to mention that also of note is that 99% of that information is in the form of pr0n! That's a lot!

  100. Fuzzy Metric System by robolemon · · Score: 1
    They found twice as much new information had been created in 2002 as in 1999, the last year they studied. This time, they even had to employ a new term of measurement: the exabyte, or a million terabytes. (A terabyte is a million megabytes.)
    How did they measure it last year if this "new" measurement didn't exist yet? How would they have measured the sum of all information ever created?

    If I say zettabyte and yottabyte did I just create new measurement terms?

    Silly reporters!

    --

    I design user interfaces for a free network management application,

  101. hun hun hun? by after · · Score: 0

    Forget spam...

    How large is a usual 5 minute MPEG file with stereo sound in a medium resolution? Lots of those were created, way more than spam.

  102. Dangit! by stienman · · Score: 1

    Dangit, Cowboyneal! I told you to turn off that packet sniffer at MAE East!

    Now look what you've done.

    -Adam

  103. there are a few more considerations by abhisarda · · Score: 1

    What if you take a page with text and scan it? It can take a size anywhere between 30-1000 KB. The same text can be written in a text editor in 5-6 KB. In MS Word in 60 KB.
    2 years back, CD-Rs were the in thing. Everyone and anyone was storing data on them. Since their size was 700 MB, files were generally smaller and compressed. Now that faster broadband connections and DVD recorders (along with faster processors) are becoming common, people don't care so much about file sizes.

    Regarding duplicate data- ask five people to compare what files take up how much of their hard disk.

    Maybe slashdot could do a poll on this, asking what percentage of space do movies, music etc take up on the hard disk. This would give a rough guide as to how much data duplication takes place.

    If you go to IRC servers, you will see bots with uploading speeds of 2-5-10 Mb/s..
    Lots of people download files from there.
    Stuff that is interesting to one might be interesting to millions of others on the net.
    Similarly, if you check the files downloaded from download.com, you might see a 15 MB application downloaded millions of times.
    That is a lot of data duplication.
    If the data on the web is say 1 exabyte, then there must be a corresponding amount on the hard drives/backups of people, organisations... who put this stuff on the web in the first place.

  104. Not quite by imnoteddy · · Score: 1
    500,000 Libraries of Congress

    If the poster had read the report carefully, they would have seen that the comparison is to the print collection of the Library of Congress. If you add in their audio and film collections they have at least two orders of magnitude more data. Even the LOC doesn't seem to be sure how much their entire collection is.

    --
    No electrons were harmed creating this post, though some may have been subjected to electrical and/or magnetic fields.
    1. Re:Not quite by Anonymous Coward · · Score: 0

      This post makes you a loser.

      Sorry. Maybe next life.

  105. Information Doubling--Will we fall off the edge? by anomalous+cohort · · Score: 1

    In a weird way, this reminds me of the Jumping Jesus Phenomenon

  106. Nice, but what does it REALLY mean? by Trolling4Dollars · · Score: 1

    Five exabytes of data is a meaningless figure if you consider that probably 52% of that was pr0n. The other 35% was source code (non-human readable data). And the remaining 13% was made up of spam, web logs, and e-mail to grandmaw.

  107. Biz opportunity by Anonymous Coward · · Score: 0

    "Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future."

    Nevertheless, I still think that if historians are to analyze those 8% of the data of the last decade alone, history will become a booming business for the coming centuries.

  108. Interesting statistics regarding porn by SiliconJesus · · Score: 1
    According to the study...

    Regarding web pages:
    Porn. 2,743 sites (or 28%) appeared to contain pornographic content. To generate this statistic, we matched a list of 94 pornographic stopwords to terms in the associated URL and the index page.
    You read that right, 28% of the internet sampled appears to be porn. Anyone surprised? Read on...

    Regarding P2P networks:
    The largest file types are .AVI video files, followed by archival .ZIP files. AVI files are video files playable on a computer. The range of these in our sample is 82 bytes to 2GB, with most being in the 100-200 MB range. Pornography seems to be a major contributor to this traffic, according to user identification of genre types.
    This follows my general idea that part of the reason that the internet is as large as it is, is due to the fact that it allows anonymous connection to taboo material.
    --
    Clinton made me a Republican. Bush made me a Libertarian. Trump is making me question reality.
  109. HDTV by sykt · · Score: 1

    just wait until the HDTV porn files start swapping...200 exabytes here we come (no pun intended)

  110. Magnetic media isn't such a bad choice by Eric+Smith · · Score: 1
    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.
    Many nine-track magtapes from the 1960s are still readable. For those that aren't, typically the problem is not with the magnetic coating, but the substrate. By now the properties of the substrate materials are much better understood, so this should be less of a problem with modern magnetic media.

    Most optical media does not have any better longevity than magnetic media, and in many cases is actually worse. There are a multitude of problems. For stamped discs, the most insidious is oxidation of the aluminum reflective layer, which reduces the contrast ratio between the pits and lands to a level too low for normal drives to read the discs.

    For dye-based writable discs (e.g. CD-R) there is the same problem (though with regard to the pregroove and general reflectivity rather than data pits and lands), and the dye will eventually undergo the same chemical reaction used to write the disc due to ambient temperature and aging.

    For phase-change discs (e.g. CD-RW) I expect the temperature and aging problems to be reduced due to the higher activation energy needed for the phase change. However, I am not aware of any actual studies on longevity of phase-change media.

    Discs with a gold reflective layer are basically immune to the oxidation problem, but how much of the 8% of data that is not on magnetic media is actually on gold phase-change discs? Probably only a trivial percentage of it.

  111. hmmm by MoFoQ · · Score: 1

    hmmm....p0rn.

    reminds me of that one ep of the Simpsons where Bart starts drawing Angry Dad cartoons and Lenny says "It's the number 1 non-porn site on the web; 1 trillionth overall"

  112. Perhaps . . . by lavaface · · Score: 1
    The storage market grows with the amount of information produced

    it's the other way around.

  113. most of this... by Anonymous Coward · · Score: 0

    most of this new 'information' is cryptography-related...I mean, did everyone but me write a book last year?

    And most of the rest of it is spam.

    As far as I'm concerned, the only new information available to me this year is Stephenson's 'Quicksilver' and the movie 'LotR:RotK'. The rest is just wash.

  114. Only 800 Megabytes/Year?? by ka9dgx · · Score: 1
    I've 3.15 Gigabytes of photos from 2002 on my laptop... and that's AFTER I weeded them out. So far for this year, I'm at 7.13 Gigabytes of photos, and it's not even Christmas season yet!

    I admit I take more pictures than most, but I haven't gotten a video camera yet... just think of the Terabytes I'll consume with that bad boy.

    --Mike--

  115. Girlfriends are overrated by Anonymous Coward · · Score: 0
    You don't want a girlfriend. You want a hot sex slave. Hot sex slaves will satisfy you sexually, but a girlfriend will just suck up your free time and demand attention and drain your mental and financial resources. Don't give up yet. I'm 33, and I haven't had a girlfriend for 2 years. She was about 230 pounds, and didn't really excite me more than my pr0n collection could. But she still needed all sorts of attention. If you're going to kill yourself over not having THAT, I'd have to question your judgement. Just buy some adult DVDs and a big bottle of silicone lube. You get much better results, for a lot less money. You're just suffering from high levels of hormones. Either just do the DVD thing (recommended), or go to Mexico and have a bilateral orchiectomy (not recommended).

    But if you've really decided to end it all (I've had these thoughts), consider cashing in all of your assets, and going to maybe the Chicken Ranch or somewhere like that, and finding a girl who excites you, and negotiate a full weekend sex session with all the money you have. Or maybe with two or more girls, if you have the money. You may want to bring some v1agra with You'll end up talking during all that time, and you can tell them your story, about how this is your blaze of glory, and that they'll be the only women you've ever done it with, and they'll think it's so romantic, sort of like the "Leaving Las Vegas" movie, and maybe one of them will fall in love with you, and decide to quit the business and marry you, and support you by only doing lesbian porn. If not, and when your time is up, and they kick you out on your ass, then you will probably have no trouble finding a way to kill yourself at that point. But of course, maybe at that point, you'll decide that the lack of sex with a female isn't worth killing yourself over. You'll have hit rock bottom, and realized that it isn't that bad. You'll rebuild your life and with your newfound clarity and attitude, you will naturally attract women, and you'll live happily ever after.

    This is all a big "maybe." Personally, I just recommend the DVD and lube thing.

  116. Re:Should I kill myself? by Superfreaker · · Score: 1

    Helium is the preferred method.
    Search for hemlock society.

    No GF is no reason to kill oneself anyway.

  117. Old English by BuilderBob · · Score: 1

    From the 400 or so years that are classed as Old English (up to about 1150 AD), we have a total of 5 million words in texts. That would probably fit on fewer floppy disks than Windows 3.11 and its DOS. Or in my telephone. It's true that not all bits are equal.

  118. Library of Congress Measurement by SandSpider · · Score: 1

    Now, are you using the current Library of Congress Measurement, or are you using an old one? I mean, new books must be coming in. I presume that's not just the ASCII, but scans of the pictures at a decent resolution.

    How will I ever do the proper conversions if you aren't using the up-to-date standards?

    =Brian

    --
    There is nothing so good that someone, somewhere, will not hate it.
  119. The world farted 6 billion times today. by op00to · · Score: 1

    Holy crap! There's a lot of everything in the world. Why is data much more exciting?

  120. Your math is wrong. by sulli · · Score: 1
    You are assuming one floppy per Jetta in this case. (look closely at your math.) As a Jetta owner I can assure you the cargo capacity is better than that.

    Dividing 95,797,591m^3 of floppies by 0.368119m^3 per Jetta, the requirement is 260,235,389 Jettas to transport them all there. Or one Jetta, preferably one more reliable than my old thing, 260,235,389 times.

    (Is the cargo capacity really that little? I would think it's over a cubic meter. Maybe they reduced the capacity in newer models.)
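
    For anyone who wants to check the division above, here is a quick sketch in Python. The two volumes are simply the figures quoted in this comment, and the variable names are made up for illustration.

```python
import math

# Figures quoted in the comment above (both in cubic metres).
floppy_volume_m3 = 95_797_591   # total volume of floppies
jetta_cargo_m3 = 0.368119       # assumed cargo capacity of one Jetta

# Number of full Jetta loads, rounded up because a partial load
# still needs a trip.
jettas_needed = math.ceil(floppy_volume_m3 / jetta_cargo_m3)

print(f"{jettas_needed:,} Jettas (or trips in one Jetta)")
```

    If the real cargo capacity is closer to a cubic metre, as the parenthetical suspects, swapping in a larger jetta_cargo_m3 shrinks the fleet in proportion.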

    --

    sulli
    RTFJ.
  121. It should be noted..... by ziggy_zero · · Score: 2, Interesting

    That there can't be an accurate representation of the data in the Library of Congress because THEY don't know how much stuff they have. My cousin worked there this past summer, and he said they still have a large portion of the basement filled up with (unorganized, mind you) stacks of CDs that they haven't even put into their database yet. Same goes for books. It'll be a while until anybody knows how much data the LoC has.

    --
    I belong to the ______ generation.
  122. Five Exabytes.. by Anonymous Coward · · Score: 0

    I wonder how much of that data was duplicate slashdot stories.

  123. Anyone have a BitTorrent link? by An+Onerous+Coward · · Score: 1

    Cuz I know the guy hosting this file is going to have a huge bandwidth bill.

    --

    You want the truthiness? You can't handle the truthiness!

  124. Ummm.. That's not data... by DJ+Spencer · · Score: 1
    The reason film comprised such a large percentage is that each film reel is duplicated thousands of times to be sent to theaters around the world.

    Okay, call me... A dork, but wouldn't a film reel technically be media and not data?

    I mean, come on, why not count all the stuff kids write on paper... Oh wait... Nevermind that comment.

    How about the little, itsy-bitsy electric impulses running around in my brain? That's data.. Kinda-sorta... Okay, okay... Most of it is cobwebs, but still.. If a duplicated film reel (aka MEDIA) is counted, then you have to start adding other things to the mix.

  125. Historians? by blair1q · · Score: 1

    Historians/anthropologists/archaeologists are interested in the ways in which the past created its future.

    They're not interested in analyzing every lump of dung a past civilization created.

    If they have 3 lumps of dung from a million individuals, it's something they'll study. If they have a million lumps of dung from 3 individuals, no.

    Just how many copies of the goatse.cx picture do you need to archive, anyway?

  126. Re:Ummm.. That's not data... by uberdave · · Score: 2, Informative

    The incredibly long thin strip of plastic with the tiny holes running along the edges is the media. The sequence of pictures is the data. What they did was figure out how big an MPEG-2 file would be needed to encode the movie. A lot of what this study measures is not so much how much data was generated, but how much new data storage capacity was generated. For example, if the industry produced 1 million blank CDs, the study would show 700 million megabytes of new data.

  127. Re:Ummm.. That's not data... by DJ+Spencer · · Score: 1
    Ohhh... Okay, now I get it.. Still seems like a silly figure then. I mean, that's like me saying that I created 10 new CDs last night, when the reality was that I hadn't updated to XP SP1 and my USB 2.0 burner was throwing a fit... So in reality I created 1 CD of data, and 9 CDs of recycled material.

    Rambling in my head: "Maybe I should have read the article first.... Boy do I look stupid!"

  128. Re:Ummm.. That's not data... by uberdave · · Score: 1

    They built fudge factors in for this. I read through some of the methods they used. For their internet figures, for example, they sampled 9800 websites of the supposed 61 million URLs compiled by the Internet Archive (enough to get a 95% confidence level), wget/mirrored them to their own servers (dropping links to other domains), and then analyzed the files for creation date, size, and uniqueness. For television: "We estimate about 1/4 of the programs are 'original.'" For CDs, they estimate that 1 in 20 gets trashed. Presumably, these figures are statistically based.
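
    As a rough sanity check on that sample size, here is a small Python sketch of the textbook margin-of-error formula for a proportion. It assumes simple random sampling and the worst case p = 0.5; the study may well have used a different method, and the function name is made up for illustration.

```python
import math

def margin_of_error(sample_size, z=1.96, p=0.5):
    """Margin of error for an estimated proportion.

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the
    worst case. With 61 million URLs the finite-population correction
    is negligible, so it is left out.
    """
    return z * math.sqrt(p * (1 - p) / sample_size)

moe = margin_of_error(9800)
print(f"margin of error for 9800 sites: {moe:.2%}")
```

    Under those assumptions, 9800 sites gives a margin of roughly one percentage point either way at 95% confidence.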

  129. Oh no!!! by Anonymous Coward · · Score: 0

    Well it's a good thing the universe keeps expanding, cause otherwise we might run out of places to put all this data.

  130. Some Percentage by Treacle+Treatment · · Score: 0

    In a related report, 4.7 exabytes of that data was swapfiles being written by Windows XP.

    --
    TT
  131. Since this is obviously an estimate... by soft_guy · · Score: 1

    Couldn't you know how many hard drives have been shipped, their capacities, estimate how long they last, and then take some random samples of how full people's hard drives are and then make an estimate?

    Is that what they did?

    --
    Avoid Missing Ball for High Score
  132. Digital Cameras ? by SirFlakey · · Score: 1

    It seems wherever you go these days, someone is taking photos of it, usually digital ones. Having become the (proud) owner of a Canon 300D 6MP camera in the last few days, I am amazed: in the good old days of the 8086, WordPerfect 4.2 and DOS 3.0 didn't quite fill a 10MB drive, and today that same drive would hold only ~3 JPG photos from the camera ...

    and then there is the old saying that junk *will* fill the space provided.

    --
    Jon - TheSpork
  133. What they DON'T tell you... by njord · · Score: 1

    ... is that 2 of those exabytes were just data created by the researchers to discern the amount of data made in 2001.

  134. cat /dev/urandom > ~/foo by Anonymous Coward · · Score: 1, Funny

    does that count?

    Some days at my job I create gigs of test data every few minutes.

  135. And how much of it will archaeologists find ... by code4fude · · Score: 1

    ... 5000 years from now?

    in their eyes, this century will hardly exist.

  136. though much is taken, little abides by danny · · Score: 2, Interesting
    I used to think in 7-bit ascii, but the digital camera changed all that... In the last year I've taken over 5000 photos - 5gig of data - as well as writing my usual couple of megabytes.

    But only a fraction of that will make it onto my web site - I have maybe 60 megabytes of photos (cut-down to around 100k each) online and 10 megabytes of text on my web sites, and would be adding less than 40 megabytes a year to that.

    Maybe I'll get a video camera, though, or put up some MP3s of my gamelan group...

    Though much is taken, much abides; and though We are not now that strength which in old days Moved earth and heaven, that which we are, we are.

    Danny.

    --
    I have written over 900 book reviews
  137. silly people by tabby · · Score: 1

    and how much was lost due to people not creating backups?

    --
    I've experiments to run, there is research to be done on the people who are still alive.
  138. WTF does "Create data" mean? by brunes69 · · Score: 1

    1. Get 500 TB RAID array
    2. Mount at /data
    3. cat /dev/urandom > /data/file.dat
    4. Wait a while

    Does this count as "creating" 500 TB of data? I don't think so. Similarly, many of these comments about Kazaa and P2P are stupid... just because there's 500 TB of data on Kazaa doesn't mean there's 500 UNIQUE TB.. probably over 90% of it is duplicates of other data, after all that's how P2P functions.
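
    The total-versus-unique distinction can be sketched in a few lines of Python. This is only an illustration with made-up sample data and a hypothetical helper name, not how the study (or Kazaa) counts anything: it hashes each blob with SHA-256 and counts a blob's bytes as unique only the first time its hash is seen.

```python
import hashlib

def total_and_unique_bytes(blobs):
    """Return (total bytes, bytes belonging to distinct blobs)."""
    seen = set()
    total = unique = 0
    for blob in blobs:
        total += len(blob)
        digest = hashlib.sha256(blob).digest()
        if digest not in seen:      # first copy of this content
            seen.add(digest)
            unique += len(blob)
    return total, unique

# Made-up example: nine copies of one file plus one different file.
shared = [b"same song" * 1000] * 9 + [b"different song" * 1000]
total, unique = total_and_unique_bytes(shared)
print(f"total: {total} bytes, unique: {unique} bytes")
```

    Note that random output from /dev/urandom would count as entirely unique under this measure, which is why uniqueness alone does not settle whether anything meaningful was "created".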

  139. tiny program = massive data! by Anonymous Coward · · Score: 0

    i dinna believe it ...
    i got this tiny fractal generator (it's like 15 kb in size) but it can easily generate ... what? ... 5 exa-whatever-bytes of data!

    so you must have been kidding.

    as for these "huge" amounts of data they're "producing" in these physical collision tests, well, very simple actually: they haven't figured out how to make a "small" experiment. nevermind.

    oh and don't forget the "massive" amounts of redundant data that is being produced on Internet Relay Chat everyday.

  140. Re:Should I kill myself? by Anonymous Coward · · Score: 0

    Make sure there is no one standing on the ground before jumping then.. but still, don't do it =)