Slashdot Mirror


Info Glut - Five Exabytes of Data Created in 2002

securitas writes "If you had any doubts that you are overwhelmed by the volume of information in your life, a new Berkeley study (PDF) shows that five exabytes of data were created in 2002, twice the 1999 total. That's five million terabytes of data, or 500,000 Libraries of Congress, which works out to about 800 MB of data for each of the 6.3 billion people on the planet. Of note is that 92 percent of the new information was stored on magnetic media, which may create an interesting problem for historians and archaeologists of the future. The study was conducted by University of California-Berkeley's School of Information Management and Systems professors Peter Lyman and Hal Varian. More at CNet, Infoworld, ByteAndSwitch and The Register."
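
The unit conversions in the summary can be checked with a few lines of Python. This is only a rough sketch: it assumes decimal (SI) units, and the variable names are made up for illustration. The per-Library-of-Congress size is not stated in the summary; it is simply what the quoted 500,000 figure implies.

```python
# Figures quoted in the summary; decimal (SI) units are an assumption.
TOTAL_BYTES = 5 * 10**18            # five exabytes
WORLD_POPULATION = 6_300_000_000    # 6.3 billion people
LIBRARIES_OF_CONGRESS = 500_000     # count quoted in the summary

TB = 10**12
MB = 10**6

# Five exabytes expressed in terabytes.
total_tb = TOTAL_BYTES // TB

# Share of the total for each person on the planet, in megabytes.
per_person_mb = TOTAL_BYTES / WORLD_POPULATION / MB

# Size of one "Library of Congress" implied by the quoted count, in terabytes.
implied_loc_tb = TOTAL_BYTES / LIBRARIES_OF_CONGRESS / TB

print(f"{total_tb:,} TB in total")
print(f"{per_person_mb:.0f} MB per person")
print(f"{implied_loc_tb:.0f} TB per Library of Congress (implied)")
```

Under these assumptions the per-person share comes out slightly under the "about 800 MB" quoted above.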

16 of 284 comments

  1. Huzzah! by GaelenBurns · · Score: 4, Interesting

    Hooray for exponential curves! It is daunting, though. As an illustration of this, I read that the White House has already turned over 2 million pages of documents relating to 9/11 to the independent investigation panel.

  2. quote by CGP314 · · Score: 5, Interesting

    All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value. --Carl Sagan

  3. Re:And about 1% was worthwhile by uberdave · · Score: 3, Interesting

    I wonder how much of that was duplicate data. How many copies of the Matrix are floating around online? Did they count FTP mirror sites as separate data?

    For that matter, how much of the data is real, and how much is virtual? If two sites point to the same download, is that data counted twice, or once?

  4. Re:And about 1% was worthwhile by Jason1729 · · Score: 3, Interesting

    That's a good point. How much of that was spam?

    ProfQuotes

  5. Storage by 3Suns · · Score: 3, Interesting

    I work at EMC, and this fact (along with projections for similar growth in the future) is a big marketing strategy for the company, especially toward investors. The storage market grows with the amount of information produced... it's gotta be stored somewhere!

    --

    -3Suns

    ~~~~
    The Revolution will be Slashdotted
  6. Not long-term data by micromoog · · Score: 2, Interesting

    That's a big-sounding number, but most of this is not going to be useful or stored long term. Examples:
    • Many large companies are building VERY large data warehouses, to capture and analyze every iota of information about every transaction. In a year or two, much of today's data will be largely irrelevant, and will likely be summarized and deleted.
    • People send a lot of email, and post a lot of messages, about day-to-day stuff that has no long-term value.
    • Surveillance video is used more than ever. This is not going to be stored long-term, except perhaps in the most security-sensitive areas.
    Either way, I highly commend the article's author for using both "Libraries of Congress" and "feet of books" as measurement units.
  7. Mass replication by binaryDigit · · Score: 2, Interesting

    I think the more interesting thing to study would be to determine how much unique data is being generated. I mean, who cares if two million people have the latest Britney Spears song in MP3 format? And that's not even talking about "information", but just simply raw "data". I also wonder if they took into account "data in transit" (being transmitted over the ethernet) and temporary data (caches, etc).

  8. It's only going to get worse... by mengel · · Score: 3, Interesting

    At Fermilab where I work, the larger experiments are expecting to generate 1PB/year of data in around 2005, up from somewhere around 300TB/year currently.

    --
    - "History shows again and again how nature points out the folly of men" -- Blue Oyster Cult, 'Godzilla'
  9. Re:No problem here. by GaelenBurns · · Score: 5, Interesting

    I wonder how many pages of paper an exabyte of data would take up? We're talking about gigantic masses, here. Why not figure it out? I'm guessing, based on character counts from Open Office, that you can get about 2kB of data on a single sheet. That's 4kB if you use both sides. And you get around 125 sheets per pound... So, based on some guesses, it looks like it will take 2,251,799,813,685 pounds of paper to print one exabyte of this data. For all 5 exabytes, we're looking at a weight 122 times that of the Great Pyramid. Not as much as I'd suspected... but still fun!
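
The paper-weight estimate above can be reproduced with a short Python sketch. The inputs are the commenter's own guesses (2 kB per side, two sides per sheet, 125 sheets per pound); treating "kB" and "exabyte" as binary units is an assumption, chosen because it matches the quoted per-exabyte figure. The function and constant names are hypothetical.

```python
# Guesses taken from the comment; binary units (1 kB = 1024 bytes,
# 1 EB = 2**60 bytes) are an assumption that matches the quoted result.
BYTES_PER_SIDE = 2 * 1024
SIDES_PER_SHEET = 2
SHEETS_PER_POUND = 125
EXABYTE = 2**60


def pounds_of_paper(num_bytes: int) -> float:
    """Pounds of double-sided paper needed to print num_bytes of data."""
    bytes_per_pound = BYTES_PER_SIDE * SIDES_PER_SHEET * SHEETS_PER_POUND
    return num_bytes / bytes_per_pound


one_exabyte_lb = pounds_of_paper(EXABYTE)
five_exabytes_lb = pounds_of_paper(5 * EXABYTE)

print(f"{one_exabyte_lb:,.0f} pounds for one exabyte")
print(f"{five_exabytes_lb:,.0f} pounds for five exabytes")
```

The comparison with the Great Pyramid depends on which weight is assumed for the pyramid, so it is left out of the sketch.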

  10. My figures by robogun · · Score: 3, Interesting

    I just did another backup, so the figures are right at hand.
    I'm a news photographer, shooting digital.
    In 2002 I saved 78,742 photos to disk. (Bad images were not saved.)
    That worked out to 122 gig. The output was transferred from the CF cards and archived to DVDs.
    But how much of that 122 gig is really information? The image file saved by the Canon 1d is mostly empty air, as far as I can tell. There is also EXIF data and IPTC, and who knows how much hidden BS is included, à la Microsoft Word documents?
    Simple compression was able to whittle that down to 33.2 gig. So that's my contribution.
    The main beneficiary is the DVD-R blank disc makers and Western Digital, I guess.

    1. Re:My figures by robogun · · Score: 2, Interesting

      Believe me, there are many more pictures being taken. The main reason is that the limitation of film cost and processing has been removed.

      I never had that limitation and I still shoot 2-3 times as much as I did in 1999.
      Probably the main reason is that the good cameras, like the Canon 1d, shoot 8 frames a second. A 1G CF card holds 420 shots. The largest roll of film is 36 frames.

      I shot digital starting in 1996, but still primarily used film until decent digital SLRs came out. I moved over entirely to digital in 2001.

      In 1996 I shot maybe 100 photos with digital (and they were small, <10 kb each). That was an early Kodak.

      In 1998 I shot advertising using an Olympus D620L. That thing shot images of maybe 80kb. In 2000 I shot 1,643 digital images occupying 250 mb or so, against 4,000 or 5,000 frames of film. Of the film, only the frames for publication needed to be scanned to disk. The total amount of disk space used wasn't much.

      In 2001 the Nikon D1 came out. I shot 56,066 that year (got it in March). 22 gigs worth, spanned across lots of CDRs.

      So far in 2003, with the Canon 1d and 1ds, I've shot 50,261 frames, taking up about 32 gig, archived to DVD.

      I would expect these increases to continue for the near future.

  11. Disk space is cheap - and other myths by rivaldufus · · Score: 1, Interesting

    How many other sysadmins out there are tired of hearing this? Every time I go to a company and even suggest quotas on the file server, the engineering group always says, "Disk space is cheap," or "You can get an 80GB disk for cheap."

    Of course, this never takes into account backup media and the whole backup infrastructure (anyone price decent commercial backup software recently?).

    I'm surprised it's only five exabytes. The admins of the world should go ahead and put a 400MB quota on all 6.3 billion people. That way, we'd be down to 1999's storage levels....

  12. Do the evolution by FrankoBoy · · Score: 2, Interesting

    So this means 1.126 gigatons of paper. According to this research paper, the world's major nuclear arsenals are equal to about 5 gigatons of TNT.

    Now, here's a little math for you:
    • Print every single bit of information the whole world produced last year.
    • Copy all of the output four times.
    • Replace all this paper by TNT...
    ...and the result, my friends, is the perfect recipe for global annihilation. Conventional weapons sold separately.
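
For anyone who wants to check the tonnage, here is a rough Python sketch. It assumes the pounds-per-exabyte figure quoted earlier in the thread (2,251,799,813,685 pounds) and assumes short tons of 2,000 pounds, which is the conversion that reproduces 1.126 gigatons. The names are hypothetical.

```python
# Pounds of paper per exabyte, as quoted earlier in the thread.
POUNDS_PER_EXABYTE = 2_251_799_813_685
# Assumption: "ton" means a short ton of 2,000 pounds.
POUNDS_PER_SHORT_TON = 2_000


def gigatons(pounds: float) -> float:
    """Convert pounds to billions of short tons."""
    return pounds / POUNDS_PER_SHORT_TON / 1e9


one_exabyte_gt = gigatons(POUNDS_PER_EXABYTE)
five_exabytes_gt = gigatons(5 * POUNDS_PER_EXABYTE)

print(f"{one_exabyte_gt:.3f} gigatons for one exabyte")
print(f"{five_exabytes_gt:.2f} gigatons for five exabytes")
```

Under these assumptions, 1.126 gigatons corresponds to a single exabyte, so the full five exabytes would come to roughly 5.6 gigatons of paper before any copies are made.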
  13. Re:Well... by Pxtl · · Score: 2, Interesting

    Amen - I'm surprised the government or companies have not encouraged the development of some sort of long-term storage system for archival purposes. What happens when you crack open that 5-year-old archive of the source to see what a long-forgotten client is running, and find out the CD has skipped a few bits? Or old government documents?

    Maybe more research could be done into a marketable multi-century (millennial?) storage.

    For corporate purposes, several decades of fidelity, perhaps a century or two, would be fine - but government will need better than that.

    Can anyone think of good media to store digital data that would last a few thousand years? Optical or otherwise, everything decays, but what goes slowest? Engraved graphite maybe? Etched titanium disks?

  14. It should be noted..... by ziggy_zero · · Score: 2, Interesting

    That there can't be an accurate data representation of the data in the Library of Congress because THEY don't know how much stuff they have. My cousin worked there this past summer, and he said they still have a large portion of the basement filled up with (unorganized, mind you) stacks of CDs that they haven't even put into their database yet. Same goes for books. It'll be a while until anybody knows how much data the LoC has.

    --
    I belong to the ______ generation.
  15. though much is taken, little abides by danny · · Score: 2, Interesting

    I used to think in 7-bit ascii, but the digital camera changed all that... In the last year I've taken over 5000 photos - 5gig of data - as well as writing my usual couple of megabytes.

    But only a fraction of that will make it onto my web site - I have maybe 60 megabytes of photos (cut down to around 100k each) online and 10 megabytes of text on my web sites, and would be adding less than 40 megabytes a year to that.

    Maybe I'll get a video camera, though, or put up some MP3s of my gamelan group...

    Though much is taken, much abides; and though We are not now that strength which in old days Moved earth and heaven, that which we are, we are.

    Danny.

    --
    I have written over 900 book reviews