Slashdot Mirror


Digital Big Bang — 161 Exabytes In 2006

An anonymous reader tips us to an AP story on a recent study of how much data we are producing. IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information. The last time anyone tried to estimate global information volume, in 2003, researchers at UC Berkeley came up with 5 exabytes. (The current study tries to account for duplicating data — on the same assumptions as the 2003 study it would have come out at 40 exabytes.) By 2010, according to IDC, we will be producing far more data than we will have room to store, closing in on a zettabyte.

49 of 176 comments

  1. XXX by daddyrief · · Score: 5, Funny

    And half of that is porn...

    --
    "Banking establishments are more dangerous than standing armies." -Thomas Jefferson
    1. Re:XXX by the-amazing-blob · · Score: 4, Funny

      Way to clog the tubes up, guys. Seriously. :P

    2. Re:XXX by daddyrief · · Score: 2, Funny

      That's just what happens after a 'digital big bang.'

      --
      "Banking establishments are more dangerous than standing armies." -Thomas Jefferson
    3. Re:XXX by maxume · · Score: 4, Funny

      They upgraded to mpeg4.

      --
      Nerd rage is the funniest rage.
  2. It was only 9 megs by noewun · · Score: 5, Funny

    Without Slashdot dupes.

    --
    I am a believer of momentum and curves.
    1. Re:It was only 9 megs by cmacb · · Score: 4, Insightful

      HA!

      But seriously, I wonder what percentage of this data is text. I'd guess it is a very very small amount. When I had a film camera, in twenty years I bet I took less than 100 rolls of film. With digital cameras I've taken thousands of pictures, sometimes taking a dozen or more of the same subject, just because the cost to me is practically zero. Now there are vendors that will let me upload large numbers of these amateurish photos for free, and let's pretend that there are enough people interested in seeing my pictures that these companies can pay for this storage with advertising. That's scary.

      Excluding attachments, I think it would be practically impossible for anyone to use up Google's 2 gig of storage, but I've heard of people using it up in little more than a week by mailing large attachments back and forth (oh yeah, I HAVE to have every single iteration of that Word document, sure I do!).

      But what's scarier is that for some nominal fee (like $20 a year) they place no limit at all on my ability to hog a disk drive somewhere. I know people who are messed up in the head enough to want to test these claims. Give them 5 gig for photos and they've filled it up in a week, give them "unlimited" and they upload pure junk to see if they can break the thing.

      Like any house of cards, this thing is gonna come down sooner or later. I just hope that people who are making sensible use of these online services don't lose everything along with the abusers.

  3. Finally, an excuse... by bigforearms · · Score: 5, Funny

    The furry porn gets deleted first.

  4. How many... by Looce · · Score: 3, Funny

    ... times does the Library of Congress fit in that? Exabytes simply don't speak to me.

    Alternatively, you can also answer in anime episodes, or mp3 files.

    1. Re:How many... by LighterShadeOfBlack · · Score: 5, Funny

      That'd be 1,191,400 Libraries of Congress.

      Honestly, I don't know why the /. editors allow these "scientific articles" that only provide data in these obscure and archaic "byte" measurements. Absurd!

      --
      Spelling mistakes, grammatical errors, and stupid comments are intentional.
    2. Re:How many... by franksands · · Score: 5, Informative
      Since you asked:

      Oh, the equivalents! That's like 12 stacks of books that each reach from the Earth to the sun. Or you might think of it as 3 million times the information in all the books ever written, according to IDC. You'd need more than 2 billion of the most capacious iPods on the market to get 161 exabytes.

      I don't have anime estimates, but I can make a Heroes analogy. A hi-def episode is more or less 700 MB. Considering the first season has 23 episodes, that would make 16.1 GB. So 161 exabytes would be 10,000,000,000 (ten billion) seasons of Heroes. Since the earth currently has around 6.6 billion people, this would mean that you would have 1 season for each person on the planet, and all the people of China, India and the US would have a second season. That's how big it is.
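
      A quick way to check the Heroes arithmetic above, as a rough sketch. It assumes decimal units (1 MB = 10^6 bytes) and takes the 700 MB, 23 episodes and 6.6 billion people straight from the comment; the variable names are made up for illustration.

```python
# Back-of-the-envelope check, assuming decimal units (1 MB = 10**6 bytes).
EPISODE_BYTES = 700 * 10**6          # ~700 MB per hi-def episode (figure from the comment)
EPISODES_PER_SEASON = 23             # first season of Heroes, per the comment
TOTAL_BYTES = 161 * 10**18           # 161 exabytes
WORLD_POPULATION = 6_600_000_000     # rough figure quoted in the comment

season_bytes = EPISODE_BYTES * EPISODES_PER_SEASON
seasons = TOTAL_BYTES // season_bytes        # whole seasons that fit in 161 EB
spare_seasons = seasons - WORLD_POPULATION   # left over after one season per person

print(season_bytes / 10**9, "GB per season")
print(seasons, "seasons in 161 EB")
print(spare_seasons, "seasons left after one per person")
```

      Under those assumptions there is one full season per person, with a few billion seasons to spare.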

      Regarding the storage space, I call shenanigans. We already have HDDs that store terabytes. A couple of years from now, MS Office will require that much space to be installed.

    3. Re:How many... by Anonymous Coward · · Score: 5, Funny

      760 billion episodes of anime. In other words, about half the length of a typical Dragonball Z fight scene.

  5. Sorry, my fault... by slobber · · Score: 5, Funny

    I left cat /dev/urandom running

    --
    "You mortals are so obtuse." -Q
    1. Re:Sorry, my fault... by product+byproduct · · Score: 4, Interesting

      Amazingly it would take 1,600,000 years for /dev/urandom to produce 161 exabytes (assuming 3.2 MB/s, YMMV)
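
      The estimate can be reproduced with a few lines. This is only a sketch: it assumes decimal units, a 365.25-day year, and the 3.2 MB/s rate quoted above (which will vary by machine).

```python
# How long one /dev/urandom reader would need to emit 161 exabytes.
TOTAL_BYTES = 161 * 10**18            # 161 EB, decimal
RATE_BYTES_PER_SEC = 3.2 * 10**6      # 3.2 MB/s, the rate assumed in the comment
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = TOTAL_BYTES / RATE_BYTES_PER_SEC / SECONDS_PER_YEAR
print(f"{years:,.0f} years")
```

      The result lands just under 1.6 million years, in line with the figure above.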

    2. Re:Sorry, my fault... by Dirtside · · Score: 4, Funny

      Yes, but what he didn't say is that he left it running on every computer on earth.

      --
      "Destroy science and religion. Science would re-emerge exactly the same; but not religion." - Penn Jillette, paraphrased
    3. Re:Sorry, my fault... by T-Ranger · · Score: 2, Funny

      Checking quickly, your comment just showed up in my /dev/urandom at 74629629165936 blocks of 1k. It may be in there again.

  6. And here I thought Malthus was dead by Anonymous Coward · · Score: 5, Insightful

    We won't be running out of space just like we didn't run out of food. New technology will allow us to store ever more data.

    1. Re:And here I thought Malthus was dead by LighterShadeOfBlack · · Score: 3, Informative

      As the article notes, the amount we produce is not the same as the amount we would actually want to store. Since that 161EB includes duplications such as broadcasting, phone calls, and all manner of temporary or real-time data it's not really relevant to compare that number with storage capabilities as the summary implies.

      --
      Spelling mistakes, grammatical errors, and stupid comments are intentional.
    2. Re:And here I thought Malthus was dead by Spunkee · · Score: 2, Insightful

      The mass of the Earth would increase as the number of humans increases up to a theoretical limit approaching infinity.

      They wouldn't fit comfortably, and you'd certainly have to stack them. The acceleration of gravity would increase as more humans were added to the mass of the Earth.

      Possibly, you'd have to import food from throughout the universe... I'm not sure if conservation of mass applies to a planet and all living (or not living) entities on it... Debris from space that enters Earth's atmosphere may or may not be useful in reproducing humans.

      Of course, as the mass approaches infinity the universe would begin to be pulled toward Earth eventually ending in one big-assed mass. Maybe. Smoke some weed, drop some acid, and figure it out for yourself. Most of you are so left-brained a little mental exploration would probably be good for you.

    3. Re:And here I thought Malthus was dead by Patrik_AKA_RedX · · Score: 2, Funny

      Exactly. My company is developing a new storage medium based on penistechnology. If you don't have enough space, just play with it and it gets bigger. We're close to commercial release, just one more critical bug to iron out: it tends to burst out data if you try to enlarge it for too long.

  7. What's an exabyte? by Anonymous Coward · · Score: 2, Informative

    Simply put, a lot

    10^18 bytes, or One million terabytes

    1. Re:What's an exabyte? by springbox · · Score: 2, Informative

      Did they measure in exabytes or exbibytes (2^60 bytes)? The difference between 161 exabytes and 161 exbibytes is 24,620,362,241,702,363,136 bytes - about 21.35 exbibytes. Kind of important since the margin of error will only increase as the measured data grows. (Let's stop using the SI units when we don't actually mean it.)
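
      The gap between the two units is easy to verify with exact integer arithmetic. A minimal sketch, where EXA and EXBI are just illustrative names for 10^18 and 2^60:

```python
# Difference between 161 exabytes (SI) and 161 exbibytes (binary).
EXA = 10**18    # SI exabyte
EXBI = 2**60    # exbibyte

diff_bytes = 161 * (EXBI - EXA)
diff_in_exbibytes = diff_bytes / EXBI

print(f"{diff_bytes:,} bytes")
print(f"about {diff_in_exbibytes:.2f} exbibytes")
```

      At the exa scale the binary unit is roughly 15% larger than the SI one, so the ambiguity is not a rounding detail.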

  8. What if ISP's are forced to retain data? by cryfreedomlove · · Score: 4, Interesting

    I imagine that a lot of this is web traffic logs. What if the US government really does force ISP's to keep records detailing the sites visited by their customers? Will my ISP rates increase to pay for all of that disk space?

    1. Re:What if ISP's are forced to retain data? by garcia · · Score: 2, Funny

      Will my ISP rates increase to pay for all of that disk space?

      No, of course not. Any law or regulation that the government comes up with doesn't have any hidden costs.

    2. Re:What if ISP's are forced to retain data? by daeg · · Score: 2, Funny

      Costs be damned when you're The Decider and, much to the dismay of IT budgets everywhere, can change time itself on a whim!

  9. Must be the space donuts by Anonymous Coward · · Score: 5, Funny

    So the sum total of data has increased by a factor of more than 30 since 2003? I knew Brent Spiner was putting on weight, but damn.

  10. The awesome information we retain by iPaul · · Score: 5, Insightful

    Web server log files with the history of people clicking around. My address stored by everybody I ever bought anything on line from. It's more an information land-fill than an information warehouse.

    --
    Leave the gun, take the cannoli -- Clemenza, The Godfather
  11. And there used to be so little on-line data by Animats · · Score: 5, Interesting

    What's really striking is how little data was available in machine-readable form well into the computer era. In the 1970s, the Stanford AI lab got a feed from the Associated Press wire, simply to get a source of machine-readable text for test purposes. There wasn't much out there.

    In 1971, I visited Western Union's installation in Mahwah, NJ, which was mostly UNIVAC gear. (I worked at a UNIVAC site a few miles away, so I was over there to see how they did some things.) I was shown the primary Western Union international gateway, driven by a pair of real-time UNIVAC 494 computers. All Western Union message traffic between the US and Europe went through there. And the traffic volume was so small that the logging tape was just writing a block every few seconds. Of course, each message cost a few dollars to send; these were "international telegrams".

    Sitting at a CRT terminal was a woman whose job it was to deal with mail bounces. About once a minute, a message would appear on her screen, and she'd correct the address if possible, using some directories she had handy, or return the message to the sender. Think about it. One person was manually handling all the e-mail bounces for all commercial US-Europe traffic. One person.

  12. "closing in on a zettabyte" by Supreme+Dragon · · Score: 3, Funny

    Is that the size of the next MS OS?

  13. Supply and demand by rufty_tufty · · Score: 4, Insightful

    I'm sorry, how stupid is this?
    "producing far more data than we will have room to store"

    That's like saying, for the last 2 months, my profit has increased by 10%. If my profit keeps increasing at 10% per month, then pretty soon I'll own all the money in the world, and then I'll own more money than exists! Damn I must stop making money now before I destroy the world economy!!!

    Who are these people who draw straight lines on growth curves? Why do people print the garbage they write and why weren't they the first against the wall after the dot com bust?
    The only things that seem certain are death, taxes, entropy and stupid people...

    --
    "The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
    1. Re:Supply and demand by Looce · · Score: 4, Insightful

      Actually, you're spending some of the money you earn, in investments. You are neither a sink nor a source of money.

      Though with data, some people, or even companies, are merely sinks. They store huge amounts of data, mostly for auditing purposes. Access logs for webservers. Windows NT event logs. Setup logs for Windows Installer apps. For ISPs, a track record of people who got assigned an IP address, in case they get a subpoena. Change logs for DoD documents. Even CVS for developers, to keep track of umpteen old versions of software. Even the casual Web browsing session replicates information in your browser cache. Many more of these examples could be given.

      We also need to produce more and more hardware to store these archived data, the most ubiquitous of which is the common hard drive. In the end, we'll need more metal and magnetic matter than the Earth can provide.

      Martian space missions, anyone?

  14. We won't produce more data than can be stored. by ProfessionalCookie · · Score: 4, Funny

    Data that cannot be stored will not be produced because all data that is produced must be stored. Data that is not stored (for however short a time) is not really produced.

    Then again the past no longer exists anyway, the future doesn't exist yet and the present has no duration, so maybe the data never existed anyway. Maybe you don't exist?!?! Aw man, maybe I *~/ disappears in a puff of logic*
    ----
    Kudos to Augustine and Adams

    1. Re:We won't produce more data than can be stored. by istartedi · · Score: 4, Funny

      disappears in a puff of logic

      Great. Now we're all going to be inhaling second-hand logic. There ought to be a law...

      --
      For all intensive purposes, "whom" is no longer a word. That begs the question, "who cares"?
  15. Internet | uniq by Duncan3 · · Score: 2, Insightful

    The problem is, everything is duplicated, a LOT. All those copies need to be stored tho, so here we are swimming in data.

    My work machine that I backed up a couple weeks ago was a 30MB zip file, and 3/4 of that was my local CVS tree. So out of 30GB, less than 1/3000th was not OS, software, or just copied locally from a data store.

    At home, I've saved every email, every picture, everything from my Windows, Linux, OSX and every other box I've ever had since ~1992, and that's barely a few GB uncompressed.

    The amount of non-duplicate useful material is far, far smaller than you would think.

    --
    - Adam L. Beberg - The Cosm Project - http://www.mithral.com/
  16. Internet a product of biology? by blubadger · · Score: 2, Interesting

    In River Out of Eden Richard Dawkins traces the data explosion of the information age right back to the big bang.

    "The genetic code is not a binary code as in computers, nor an eight-level code as in some telephone systems, but a quaternary code with four symbols. The machine code of the genes is uncannily computerlike."
  17. How much is actually used? by basic0 · · Score: 4, Interesting

    Ok, so we generate some staggering amount of computerized data every year. This is one of those stories where I can't remember hearing about it before, but it really doesn't feel like "news".

    My question is how much of this data is actually being used? I'm horrible for constantly downloading e-books, movies, software, OSes, and other stuff that I'm *intending* to do something with, but often don't get around to. I end up with gigabytes of "stuff" just sucking up disc space or wasting CDs. I burned a DivX copy of Matt Stone and Trey Parker's popular pre-South Park indie film "Orgazmo" in about 2001. I've since seen the film 2 or 3 times on TV. I STILL haven't watched the DivX version I have, and now I can't find the CD I put it on. I know I'm not the only one who does this either, as many of my friends are using up loads of storage space on files they've just been too busy to have a look at.

    Right now I'm on a project digitizing patient files for a neurologist. We're going up to 10 years deep with files for over 18,000 patients. Most of this is *just* for legal purposes and nobody is EVER going to open and read the majority of these files. The doctor does electronic clinics where he consults the patient and adds new pages to their file, which will probably sit there undisturbed until the Ethernet Disk fails someday.

    I think a more interesting story (although probably MUCH more difficult to research) would be "How much computerized data is never used beyond its original creation on a given storage medium?"

  18. Comment removed by account_deleted · · Score: 5, Funny

    Comment removed based on user account deletion

  19. Of course we will by PIPBoy3000 · · Score: 2, Interesting

    Think about scientific instruments that gather gigabytes of data per second. They hold on to that for as long as they have to, pulling out interesting data, summarizing it, and throwing out the rest. I track all the web hits for our corporate Intranet. The volume is so huge that the SQL administrators come and have a little heart-to-heart chat with me if I let it build up over a few months. I don't really care about the raw information past a month or so. Instead, I want to see running counts of which pages are being viewed, which people are big utilizers of our network, and so on.

    A good analogy is the human brain. We gather in huge amounts of information per second via touch, sight, and so on, but throw out the vast majority of the information. The key is to have good filtering systems so that things that are interesting and relevant are held onto.

  20. Exabyte tapes by Roger+W+Moore · · Score: 2, Funny

    So at this rate it won't be long before we will need real Exabyte tapes. I always thought the original ones should qualify for the award of world's most misleading name since their capacity was 500 million times less than what their name suggested.

  21. Google Says: by nbritton · · Score: 2, Interesting

    (161 exabytes) / 6,525,170,264 people = 26.4931682 gigabytes per person.

  22. Google Says: by nbritton · · Score: 2, Interesting

    (161 exabytes) / 1,093,529,692 people[1] = 158.086639 gigabytes per person and 19.6380918 gigabytes per person if you don't count the duplicate data.

    [1] Total est. of people on the Internet:
    http://www.internetworldstats.com/stats.htm
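
    The per-person figures in these two "Google Says" posts can be reproduced if the prefixes are read as binary (1 exabyte = 2^60 bytes, 1 gigabyte = 2^30 bytes), which appears to be what the calculator did. The sketch below rests on that assumption; the 20-exabyte value for non-duplicate data is inferred from the 19.64 result and is not stated in the post.

```python
# Per-person data volume, assuming binary prefixes (an inference, see lead-in).
EXBI = 2**60
GIBI = 2**30

TOTAL_BYTES = 161 * EXBI
WORLD_POPULATION = 6_525_170_264    # figure used in the first post
INTERNET_USERS = 1_093_529_692      # figure used in the second post
NON_DUPLICATE_BYTES = 20 * EXBI     # inferred, not stated in the post


def gib_per_person(total_bytes, people):
    """Return the share of total_bytes per person, in units of 2**30 bytes."""
    return total_bytes / people / GIBI


per_capita = gib_per_person(TOTAL_BYTES, WORLD_POPULATION)
per_user = gib_per_person(TOTAL_BYTES, INTERNET_USERS)
per_user_unique = gib_per_person(NON_DUPLICATE_BYTES, INTERNET_USERS)

print(round(per_capita, 2), round(per_user, 2), round(per_user_unique, 2))
```

    With decimal prefixes instead, the per-capita figure would come out a little under 25 GB.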

  23. The number is way too low! by EmbeddedJanitor · · Score: 2, Insightful
    OK, OK, I didn't RTFA and did not really RTFSummary (that's not the point of /.).

    If we consider all digital data, not just the stuff that flows over the internet, then this is way too low. Consider the data in all the DTVs, GPS receivers etc.

    A top-end GPS is grinding over 10^9 bits per second in its correlators (about 50 correlator channels x 20Mbps or so sampling rate). That ends up being approx 4x10^15 bytes per year per GPS... or 40,000-odd top-end GPSs would be grinding 1.61x10^20 bytes per year. There are far more than 40k high end GPSs in the world, so the budget is already blown...
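
    Worked through as a sketch, taking the 50 channels and 20 Mbps from the paragraph above and assuming a 365.25-day year and a receiver that runs continuously:

```python
# Raw correlator throughput of one hypothetical top-end GPS receiver.
CHANNELS = 50                      # correlator channels (figure from the comment)
SAMPLE_RATE_BPS = 20 * 10**6       # ~20 Mbps per channel (figure from the comment)
SECONDS_PER_YEAR = 365.25 * 24 * 3600
TOTAL_BYTES = 161 * 10**18         # 161 EB

bits_per_sec = CHANNELS * SAMPLE_RATE_BPS
bytes_per_year = bits_per_sec / 8 * SECONDS_PER_YEAR
receivers_needed = TOTAL_BYTES / bytes_per_year

print(f"{bits_per_sec:.1e} bits/s")
print(f"{bytes_per_year:.2e} bytes per receiver per year")
print(f"{receivers_needed:,.0f} receivers to reach 161 EB")
```

    That gives roughly 4x10^15 bytes per receiver per year, so a bit over 40,000 always-on receivers would match the IDC total.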

    --
    Engineering is the art of compromise.
    1. Re:The number is way too low! by Simon80 · · Score: 2, Informative

      RTFS - it's not about bandwidth, it's about unique data, knowledge, ideas, information.

    2. Re:The number is way too low! by Dun+Malg · · Score: 3, Insightful

      Again, RTFS: "IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information."

      If a NMEA lat-lon string gets spit out of the serial port of a GPS and there's nothing there to capture it, it is not part of their count. They're not counting bitrate on data generators and multiplying times bandwidth. They're counting discrete blocks of saved data. You cannot arrive at the latter from the former, just like you can't tell how much water is behind Hoover Dam on average during the year by measuring the average daily flow rate of the Colorado river and multiplying by 365.

      --
      If a job's not worth doing, it's not worth doing right.
  24. Low SNR by Jekler · · Score: 5, Insightful

    As interesting as the sheer volume is, most of it is garbage. I'd rather have 50 terabytes of organized and accurate information than 500 exabytes of data that isn't organized, and even if it were, its accuracy is questionable at best. In essence, even if you manage to find what you want, the correctness of that information is likely to be very low.

    I've long said we are not in the information age, we are in the data age. The information age will be when we've successfully organized all this crap we're storing/transmitting.

  25. Malthus has just gone down to the shops by roesti · · Score: 2, Insightful

    We won't be running out of space just like we didn't run out of food. New technology will allow us to store ever more data.

    I remember when software came on cassettes and when food came from close to where you live.

    When floppy disks were too small, we made higher-density floppy disks, and we still needed a whole box of them.
    When there wasn't enough of a particular food, we got it shipped from further away.

    When CD-ROMs came out, we still ended up not only filling them but spreading things over multiple CDs.
    When the imported food got too expensive, we started using chemical fertilisers to grow more of them closer to home and more cheaply.

    We had to invent bigger CDs. DVD became HD-DVD and Blu-Ray. People are already complaining that they're not big enough.
    We got bigger trucks and bigger boats to cover food with more preservatives and ship it here from further away, and the more of this we bought, the cheaper it got.

    You got that bigger hard disk, so you could amass data and store it forever. Remember how you said you'd never fill it up? Then broadband happened, and P2P happened, and fill it up you did.
    You didn't worry about it, though, the same way you didn't worry about not having enough food, either. Your supermarket is awash with thousands of varieties of food, from wherever it's cheap, and you can eat as much as you want of whatever you want.

    Because everything is more available, more quickly and more easily, you now have more stuff than you could ever use. Nowadays, people don't think twice about Tivo-ing or downloading something that they're never even going to watch. As the technology gets better - as disks get bigger, and as networking gets faster - this is only going to become more prevalent.

    But there is a physical limit to what can be done. Do you need a new hard drive, or a new router? What metals and chemicals are required to make them? How much energy is required? Where are they built, and how do they get to you? There's only a finite amount of this stuff in the ground, and none of this is invincible to exponential growth. The people who think this can go on forever, or even for the rest of their natural lives, are kidding themselves.

    Eventually, these materials will be harder to get, things will start to become more difficult to make and more expensive, and everyone will be complaining about how expensive their last computer was. Really, though, I don't even want to know these people. They've gotten their priorities all wrong.

    The parent poster says we won't be running out of anything. All that's really happened is that we haven't run out yet. The planet simply can't sustain the 6.5 billion of us there are now, let alone the billions more to be born in the next few decades. The problem is that when there isn't enough to go around, some of us will be lining up for new video games and iPods, and some of us will be lining up for food, water and fuel.

    I should warn you to choose wisely, but really, what do I care? Choose unwisely, and leave more for the rest of us.

  26. Yes...but is it useful by stoicio · · Score: 4, Interesting

    It's well and fine to have a statistic like 161 exabytes
    of data, but what's the point? Is that data any more useful
    to people than the selective data that was used to run the world
    50, 60 or 100 years ago?

    We as individuals are only capable of assimilating a limited amount
    of information so most of those exabytes are just rolling around
    like so many gears in an old machine. If they are minimally used or
    never used they simply become a storage liability.

    As an example, the internet has not made *better* doctors.
    Even with all the latest information at their fingertips
    professionals are still only the sum of what they can
    mentally absorb. Too much data, or wrong data (ie: wikipedia)
    can lead to the same levels of inefficiency seen prior to
    the 'information age'. What would a single doctor do with
    160 exabytes of reading material, schedule it into the work day?

    Also, if the amount of information is rated purely on bytes
    but not in *useful content* the stats get skewed. Things like
    movies and music should be ranked by the length of script
    and/or notation. That would make the numbers much less than
    160 exabytes.

    Saying that the whole world produced 160 exabytes of information
    is like saying the whole world used 50 billion tonnes of water.
    ...was that water just running down the pipe into the sewer or
    did somebody actually drink it to sustain life?

    Mechanistic stats are stupid.

  27. Dr Evil by steveoc · · Score: 4, Funny

    So Dr Evil, after emerging from his suspended animation, would demand a computer big enough to store 100 Megabytes of evil data.

  28. Re:stupid by OneoFamillion · · Score: 2, Funny

    What do you think happens every single time you pick up a telephone and call someone? Pray tell, where is that data stored? Uhh... Department of Homeland Security?
  29. 50 Exabytes for $30.5 Billion by nbritton · · Score: 4, Informative

    50 Exabytes = (50)1024 petabytes = (50)1048576 terabytes:

    RAID6 (24 Drives -2{Parity} -1{Hot Spare} = 21) 750GB, 13.48TB ZFS/Solaris:
      93,345,048 750GB Hard Drives:     $17,735,559,120
       3,889,377 Areca ARC-1280ML:       $4,317,208,470
       1,944,689 Motherboards/Mem/CPU:     $766,207,466
       1,944,689 5U Rackmount Chassis:   $4,546,682,882
         194,469 4 Post 50U Racks:          $45,700,215
           3,684 528-port 1Gbps Switches:  $374,294,400
              40 96-port 10Gbps Switches:   $11,424,000
       1,948,935 Network Cables:             $2,020,812
               ? Assembly Robots/Misc.     $111,000,000

    Sub Total:                          $27,910,097,365
    Tax/Shipping:                        $2,645,915,779
    Grand Total:                        $30,556,013,144

    $470 billion cheaper than the Iraq war.
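
    The drive count and totals in the table can be re-derived from its own figures. This is a sketch only: it takes 13.48 TB usable per 24-drive array and binary terabytes as given, and the dollar amounts are copied from the table rather than checked against any price list.

```python
import math

# Capacity side: how many 24-drive arrays cover 50 exabytes?
TARGET_TB = 50 * 1024 * 1024       # 50 EB expressed in TB (binary, as in the post)
USABLE_TB_PER_ARRAY = 13.48        # usable capacity per array, from the post
DRIVES_PER_ARRAY = 24              # 21 data + 2 parity + 1 hot spare

arrays = math.ceil(TARGET_TB / USABLE_TB_PER_ARRAY)   # one RAID controller each
drives = arrays * DRIVES_PER_ARRAY

# Cost side: line-item totals copied from the table above.
line_items = {
    "hard drives": 17_735_559_120,
    "raid controllers": 4_317_208_470,
    "motherboards/mem/cpu": 766_207_466,
    "rackmount chassis": 4_546_682_882,
    "racks": 45_700_215,
    "1Gbps switches": 374_294_400,
    "10Gbps switches": 11_424_000,
    "network cables": 2_020_812,
    "assembly robots/misc": 111_000_000,
}
TAX_AND_SHIPPING = 2_645_915_779

subtotal = sum(line_items.values())
grand_total = subtotal + TAX_AND_SHIPPING

print(f"{arrays:,} arrays, {drives:,} drives")
print(f"subtotal ${subtotal:,}, grand total ${grand_total:,}")
```

    The implied drive price is the hard-drive line divided by the drive count, which works out to $190 per 750GB drive.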