Slashdot Mirror


Info Glut - Five Exabytes of Data Created in 2002

securitas writes "If you had any doubts that you are overwhelmed by the volume of information in your life, a new Berkeley study (PDF) shows that five exabytes of data were created in 2002, twice the 1999 total. That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet. Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future. The study was conducted by University of California-Berkeley's School of Information Management and Systems professors Peter Lyman and Hal Varian. More at CNet, Infoworld, ByteAndSwitch and The Register."
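The per-person figure in the summary can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 exabyte = 10^18 bytes, 1 MB = 10^6 bytes); the variable names are illustrative and do not come from the study.

```python
# Rough check of the summary's numbers, assuming decimal (SI) units.
EXABYTE = 10**18    # bytes (assumption: decimal exabyte)
TERABYTE = 10**12   # bytes
MEGABYTE = 10**6    # bytes

total_bytes = 5 * EXABYTE
population = 6_300_000_000  # 6.3 billion, as quoted in the summary

terabytes = total_bytes // TERABYTE
mb_per_person = total_bytes / population / MEGABYTE

print(f"{terabytes:,} TB in total")
print(f"{mb_per_person:.0f} MB per person")
```

Under these assumptions the result lands just under the 800 MB quoted in the summary.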

41 of 284 comments

  1. And about 1% was worthwhile by XNuke · · Score: 4, Insightful

    It looks like they are counting every tiny email about "going to lunch". Lots of DATA little INFORMATION.

    1. Re:And about 1% was worthwhile by uberdave · · Score: 3, Interesting

      I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?

      For that matter, how much of the data is real, and how much is virtual? If two sites point to the same download, is that data counted twice, or once?

    2. Re:And about 1% was worthwhile by Jason1729 · · Score: 3, Interesting

      That's a good point. How much of that was spam?

      ProfQuotes

    3. Re:And about 1% was worthwhile by tachin · · Score: 4, Insightful
      Lots of DATA little INFORMATION.
      From data you can extract "information", take a lot of those "going to lunch" mails and you can see what groups of people lunch together and at what time....
    4. Re:And about 1% was worthwhile by Tenebrious1 · · Score: 4, Informative

      I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?

      The blurb said 92% was stored on magnetic media; curious about the rest, I glanced through the article. Surprisingly, a large part, 7%, is FILM! The reason film makes up such a large percentage is that each film reel is duplicated thousands of times to be sent to theaters around the world.

      So if they're counting duplicates in film, I'd guess they'd count duplicates in magnetic media.

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
    5. Re:And about 1% was worthwhile by MosesJones · · Score: 2, Insightful

      It's actually all my fault...

      I left this script running on the Unix farm, which did the following on each box:

      # Delete and recreate the same small file, forever.
      while true; do
        rm -f filename
        echo "Who's the Daddy" > filename
      done

      It's a big farm, and it's been running all year. The net result is about 100k of files on the farm total... but terabytes during the year.

      In other words, what I mean is...

      How much of this "created" information was transient?

      --
      An Eye for an Eye will make the whole world blind - Gandhi
    6. Re:And about 1% was worthwhile by kfg · · Score: 5, Funny

      "I wonder how much of that was duplicate data."

      3% was [AOL] Me Too! [/AOL] posts.

      1% was In Soviet Russia jokes.

      0.5% Profit!!!

      So I guess there was a fair amount of duplication.

      KFG

    7. Re:And about 1% was worthwhile by Tenebrious1 · · Score: 2, Funny

      I wonder how they count film too, as film is not digital medium did they MPEG2 it a la DVD, or take the raw footage from the cameras (as long as it wasn't a direct analog to analog transfer)... and what about photographs - did they count them and if so at the molecular level of the photo paper?

      I only glanced through the numbers, but couldn't find any place that said "for our purposes pictures are considered HxV resolution". For film (studio movies), they did say each frame was considered a picture and that sound contained a lot of data, but well again, I don't know how they sampled it.

      Maybe they just used "a picture is worth 1000 words". Hmm... no, at 5 characters an average word, that's only 5K per picture, way too low.

      --
      -- If god wanted me to have a sig, he'd have given me a sense of humor.
  2. Sounds about right. by Matey-O · · Score: 4, Insightful

    That's a believable number. Consider the amount of published data on Kazaa, or that 45 minutes of raw DV video is roughly 12.5 GB*. Move 100 of your CDs to MP3s and you're consuming/creating roughly 3.5 GB* (or more if you're using higher than 128 kbps MP3s). And I'm not even commenting on pr0n.

    (*I said roughly...comment on the comment, not the mathematical precision of the statement.)

    --
    "Draco dormiens nunquam titillandus."
  3. Yeah... by the_mad_poster · · Score: 4, Funny

    ...and most of it is still sitting in my Inbox at work right now.

    --
    Alito: A vote for Alito is a punch in the eye to put that bitch back in her place!
  4. Dissertation by BWJones · · Score: 2, Funny

    a new Berkeley study (PDF) shows that five exabytes of data were created in 2002,

    Shoot, it felt like my doctoral dissertation was responsible for at least 2 of those 5 exabytes. :-)

    --
    Visit Jonesblog and say hello.
  5. This article says 23 exabytes by SirJaxalot · · Score: 3, Informative
    1. Re:This article says 23 exabytes by Vaevictis666 · · Score: 3, Informative
      Your article states:

      They found that new information flowing across televisions, radios, telephones, Web sites and the Internet had increased by 3 1/2 times to a total of 18 exabytes as of 2002. The amount of new but stored (non-transmitted) information in 2002 was determined to be about five exabytes.

      This jibes with the other articles. 5 exabytes of generated content, 18 exabytes of transferred content - still one heck of a lot of bits floating around :)

  6. No problem here. by FrankoBoy · · Score: 2, Funny

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    Well, why won't they just print it ? Sheesh...

    1. Re:No problem here. by GaelenBurns · · Score: 5, Interesting

      I wonder how many pages of paper an exabyte of data would take up? We're talking about gigantic masses here. Why not figure it out? I'm guessing, based on character counts from Open Office, that you can get about 2kB of data on a single sheet. That's 4kB if you use both sides. And you get around 125 sheets per pound... So, based on some guesses, it looks like it will take 2,251,799,813,685 pounds of paper to print one exabyte of this data. For all 5 exabytes, we're looking at a weight 122 times that of the Great Pyramid. Not as much as I'd suspected... but still fun!
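The pounds-per-exabyte estimate above can be reproduced as follows. This is a sketch under the poster's own guesses (4 kB per double-sided sheet, 125 sheets per pound), plus the assumption that a binary exabyte (2^60 bytes) was used, since that is what matches the quoted figure. The constant names are illustrative.

```python
# Reproduces the paper estimate using the guesses stated in the comment.
BYTES_PER_EXABYTE = 2**60    # assumption: binary exabyte, which matches the quoted figure
BYTES_PER_SHEET = 4 * 1024   # 2 kB per side, both sides used
SHEETS_PER_POUND = 125

sheets = BYTES_PER_EXABYTE // BYTES_PER_SHEET
pounds_per_exabyte = sheets // SHEETS_PER_POUND
pounds_total = 5 * sheets / SHEETS_PER_POUND

print(f"{sheets:,} sheets per exabyte")
print(f"{pounds_per_exabyte:,} pounds per exabyte")
print(f"{pounds_total:,.0f} pounds for 5 exabytes")
```

With a decimal exabyte (10^18 bytes) the totals would come out roughly 13% lower.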

    2. Re:No problem here. by indianajones428 · · Score: 5, Funny


      So 122 Great Pyramids = 500,000 Libraries of Congress?

      Great, another conversion factor to remember...

      --
      When a thing has been said, and said well, have no scruple. Take it and copy it. --Anatole France
  7. Huzzah! by GaelenBurns · · Score: 4, Interesting

    Hooray for exponential curves! It is daunting, though. As an illustration of this, I read that the White House has already turned over 2 million pages of documents relating to 9/11 to the independent investigation panel.

  8. Re:Damn by Carnildo · · Score: 3, Funny

    You've got a thousand times your allotment of porn! Think of all the poor people in Africa who you are depriving of their annual allowance!

    --
    "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  9. quote by CGP314 · · Score: 5, Interesting

    All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value. --Carl Sagan

  10. that's a LoC per minute, almost. by sulli · · Score: 3, Funny
    525,600 minutes per year. Impressive.

    But if these data were recorded on floppies, and stacked up to the moon n times, how many VWs would it take to carry those floppies to the stack site?

    --

    sulli
    RTFJ.
  11. Storage by 3Suns · · Score: 3, Interesting

    I work at EMC, and this fact (along with projections for similar growth in the future) is a big marketing strategy for the company, especially toward investors. The storage market grows with the amount of information produced... it's gotta be stored somewhere!

    --

    -3Suns

    ~~~~
    The Revolution will be Slashdotted
  12. Not long-term data by micromoog · · Score: 2, Interesting
    That's a big-sounding number, but most of this is not going to be useful or stored long term. Examples:
    • Many large companies are building VERY large data warehouses, to capture and analyze every iota of information about every transaction. In a year or two, much of today's data will be largely irrelevant, and will likely be summarized and deleted.
    • People send a lot of email, and post a lot of messages, about day-to-day stuff that has no long-term value.
    • Surveillance video is used more than ever. This is not going to be stored long-term, except perhaps in the most security-sensitive areas.
    Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.
  13. Mass replication by binaryDigit · · Score: 2, Interesting

    I think the more interesting thing to study would be to determine how much unique data is being generated. I mean, who cares if two million people have the latest Britney Spears song in mp3 format? And that's not even talking about "information", but just simply raw "data". I also wonder if they took into account "data in transit" (being transmitted over the ethernet) and temporary data (caches, etc).

  14. True it's a lot of info to create, but... by The+Jonas · · Score: 4, Insightful

    ...how much info is destroyed each year to offset these numbers? I mean shredded files, stuff thrown in the trash, bills, deleted data files, discarded/lost storage media, etc... In the end (of each year), I wonder, what is the actual increase in stored information?

  15. It's only going to get worse... by mengel · · Score: 3, Interesting

    At Fermilab where I work, the larger experiments are expecting to generate 1PB/year of data in around 2005, up from somewhere around 300TB/year currently.

    --
    - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
  16. Libraries of Congress by Entropy248 · · Score: 2, Insightful

    500,000 Libraries of Congress, huh? I've always had several problems (SI questions aside) with this unit of measurement. The Library of Congress is constantly expanding & adding new material. What year Library of Congress do they mean? I imagine they aren't working w/ up to the minute data and that the library is expanding much faster now. Not to mention the fact that everyone always makes exabytes ~13% smaller than they really are (and with numbers this big, it actually makes a difference!)... So call me the new number nazi troll already and get it over with...
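The decimal-versus-binary gap behind that complaint can be tabulated directly. The sketch below compares powers of 1000 with powers of 1024 at each prefix step. The helper name `shortfall` is made up for illustration.

```python
# Compare decimal (SI) and binary prefixes at each step.
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa"]

def shortfall(step: int) -> float:
    """Fraction by which 1000**step falls short of 1024**step."""
    return 1 - (1000**step) / (1024**step)

for step, name in enumerate(prefixes, start=1):
    print(f"{name:>4}: decimal unit is {shortfall(step):.1%} smaller than binary")
```

The gap is a little over 2% at the kilo step and compounds with every larger prefix, so it is considerably wider by the time you reach exabytes.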

  17. My figures by robogun · · Score: 3, Interesting

    I just did another backup, so the figures are right at hand.
    I'm a news photographer, shooting digital.
    In 2002 I saved 78,742 photos to disk. (Bad images were not saved.)
    That worked out to 122 gig. The output was transferred from the CF cards and archived to DVDs.
    But how much of that 122 gig is really information? The image file saved by the Canon 1d is mostly empty air, as far as I can tell. There is also EXIF data and IPTC, and who knows how much hidden BS is included à la Microsoft Word documents?
    Simple compression was able to whittle that down to 33.2 gig. So that's my contribution.
    The main beneficiaries are the DVD-R blank disc makers and Western Digital, I guess.
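For a sense of scale, the figures in this post can be reduced to an average file size and a compression ratio. This is a rough sketch that assumes "gig" means 1000 MB. The variable names are illustrative.

```python
# Figures as quoted in the post above.
photos = 78_742
raw_gb = 122.0
compressed_gb = 33.2

avg_mb_per_photo = raw_gb * 1000 / photos  # assumption: 1 gig = 1000 MB
ratio = compressed_gb / raw_gb             # compressed size as a fraction of raw

print(f"average of {avg_mb_per_photo:.2f} MB per photo")
print(f"compressed archive is {ratio:.1%} of the raw size")
```

So simple compression kept a bit over a quarter of the original bytes, which is one crude measure of how much of the raw data was redundant.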

    1. Re:My figures by dcobbler · · Score: 2, Insightful

      I think other parts of this discussion are probably already arguing about "data vs. information" but this post, I think, points out one of the reasons for that argument: between 1999 and 2002, how many more digital cameras are around and how much larger (in pixels/bits) are the images? Just because there are more digital pics with more pixels each doesn't mean that there are more actual pictures being taken. And for each new digital camera that is being used, how many fewer film cameras are being used? I suspect that there *are* more pictures being taken, but this study doesn't necessarily prove that.

      Cheers,

      Dcobbler.

    2. Re:My figures by robogun · · Score: 2, Interesting

      Believe me, there are many more pictures being taken. The main reason is the limitation of film cost and processing has been removed.

      I never had that limitation and I still shoot 2-3 times as much as I did in 1999.
      Probably the main reason is the good cameras, like the Canon 1d, shoot 8 frames a second. A 1G CF card holds 420 shots. The largest roll of film is 36 frames.

      I shot digital starting in 1996, but still primarily used film until decent digital SLRs came out. I moved over entirely to digital in 2001.

      In 1996 I shot maybe 100 photos with digital (and they were small, <10 kb each). That was an early Kodak.

      In 1998 I shot advertising using an Olympus D620L. That thing shot images of maybe 80kb. In 2000 I shot 1,643 digital images occupying 250 MB or so, against 4,000 or 5,000 frames of film. Of the film, only the frames for publication needed to be scanned to disk. The total amount of disk space used wasn't much.

      In 2001 the Nikon D1 came out. I shot 56,066 that year (got it in March). 22 gigs worth, spanned across lots of CDRs.

      So far in 2003, with the Canon 1d and 1ds, shot 50,261 frames, taking up about 32 gig, archived to DVD.

      I would expect these increases to continue for the near future.

  18. Re:800 MB per person by Anonymous+Crowhead · · Score: 5, Funny

    I personally burned over 500 CDs last year

    Congrats, you balanced out 1 medium-sized tribe in Africa.

  19. AOL doom day. by twitter · · Score: 2, Funny
    I've got more than my share of data, enough to discard the 800MB or so that AOL likes to mail me. 800MB/person is not shocking when I think of all the CDs I've stumbled across in the field - literally grass fields in the middle of nowhere.

    It's a joke..

    --

    Friends don't help friends install M$ junk.

  20. Re:800 MB per person by IM6100 · · Score: 2, Insightful

    What did you burn on those 500 CDs?

    Do you run your own particular pseudo-random number generator and store the results? Do you go out with a digital camcorder and record tons and tons of images of the world? Do you write that much prose or poetry in a year?

    Or are you just talking about 500 CDs of data that you or somebody else 'ripped' from existing media and are shuffling around?

    --
    A Good Intro to NetBS
  21. Relevance? by BorgCopyeditor · · Score: 2, Funny
    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    Not least for those historians who want to know what my Amazon.com session ID was on the day that my Runescape character hit mining level 33.

    --
    Shop as usual. And avoid panic buying.
  22. Do the evolution by FrankoBoy · · Score: 2, Interesting
    So this means 1.126 gigatons of paper. According to this research paper, the world's major nuclear arsenals are equal to about 5 gigatons of TNT.

    Now, here's a little math for you :
    • Print every single bit of information the whole world produced last year.
    • Copy all of the output four times.
    • Replace all this paper by TNT...
    ...and the result, my friends, is the perfect recipe for global annihilation. Conventional weapons sold separately.
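The 1.126 figure appears to be the per-exabyte paper weight estimated earlier in the thread, converted at 2,000 pounds per short ton. That is an assumption about the poster's method, not something the post states. A quick sketch of the conversion, with illustrative names:

```python
# Convert the thread's pounds-of-paper estimate into gigatons.
POUNDS_PER_EXABYTE = 2_251_799_813_685  # figure quoted earlier in the thread
POUNDS_PER_SHORT_TON = 2_000            # assumption: US short tons

gigatons_per_exabyte = POUNDS_PER_EXABYTE / POUNDS_PER_SHORT_TON / 1e9
original_plus_four_copies = gigatons_per_exabyte * 5

print(f"{gigatons_per_exabyte:.3f} gigatons of paper per exabyte")
print(f"{original_plus_four_copies:.2f} gigatons for the original plus four copies")
```

On that reading, the original plus four copies is what brings the total up to roughly the 5-gigaton arsenal figure.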
  23. Re:Let's get the standard jokes out of the way by NumLk · · Score: 3, Funny
    You forgot these jokes:

    I for one welcome our new data generating overlords!

    With all that data you'd think that my conne3^&#5$ATDT01[NO CARRIER]

    In Soviet Russia data generates YOU!

    Homer: I see they have the Internet on computers now.

    --
    Children in the backseats don't cause accidents. Accidents in the back seats cause children.
  24. Re:Well... by Pxtl · · Score: 2, Interesting

    Amen - I'm surprised the government or companies have not encouraged the development of some sort of long-term storage system for archival purposes. What happens when you crack open that 5-year old archive of the source to see what a long-forgotten client is running, and find out the CD has skipped a few bits? Or old government documents?

    Maybe more research could be done into a marketable multi-century (millennial?) storage.

    For corporate purposes, several decades of fidelity, perhaps a century or two, would be fine - but government will need better than that.

    Can anyone think of good media to store digital data that would last a few thousand years? Optical or otherwise, everything decays, but what goes slowest? Engraved graphite maybe? Etched titanium disks?

  25. Reminds me of this observation: by targo · · Score: 4, Funny

    5 billion files are created every day.
    3 billion of them will never be found again.
    Poor files...

  26. And what kind of data are we creating? by Pedrito · · Score: 2, Funny

    Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future.

    They fail to mention that also of note is that 99% of that information is in the form of pr0n! That's a lot!

  27. It should be noted..... by ziggy_zero · · Score: 2, Interesting

    That there can't be an accurate representation of the data in the Library of Congress because THEY don't know how much stuff they have. My cousin worked there this past summer, and he said they still have a large portion of the basement filled up with (unorganized, mind you) stacks of CDs that they haven't even put into their database yet. Same goes for books. It'll be a while until anybody knows how much data the LoC has.

    --
    I belong to the ______ generation.
  28. Re:Ummm.. That's not data... by uberdave · · Score: 2, Informative

    The incredibly long thin strip of plastic with the tiny holes running along the edges is the media. The sequence of pictures is the data. What they did was figure out how big an MPEG-2 file would be needed to encode the movie. A lot of what this study measures is not so much how much data was generated, but how much new data storage capacity was generated. For example, if the industry produced 1 million blank CDs, the study would show 700 million megabytes of new data.

  29. though much is taken, little abides by danny · · Score: 2, Interesting
    I used to think in 7-bit ascii, but the digital camera changed all that... In the last year I've taken over 5000 photos - 5gig of data - as well as writing my usual couple of megabytes.

    But only a fraction of that will make it onto my web site - I have maybe 60 megabytes of photos (cut-down to around 100k each) online and 10 megabytes of text on my web sites, and would be adding less than 40 megabytes a year to that.

    Maybe I'll get a video camera, though, or put up some MP3s of my gamelan group...

    Though much is taken, much abides; and though We are not now that strength which in old days Moved earth and heaven, that which we are, we are.

    Danny.

    --
    I have written over 900 book reviews