Digital Big Bang — 161 Exabytes In 2006
An anonymous reader tips us to an AP story on a recent study of how much data we are producing. IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information. The last time anyone tried to estimate global information volume, in 2003, researchers at UC Berkeley came up with 5 exabytes. (The current study tries to account for duplicating data — on the same assumptions as the 2003 study it would have come out at 40 exabytes.) By 2010, according to IDC, we will be producing far more data than we will have room to store, closing in on a zettabyte.
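As a rough check on the figures in the summary, the implied yearly growth rates can be worked out directly. This is only a sketch: the function name `annual_growth` is made up for illustration, it assumes smooth compound growth, it takes the summary's years at face value, and it treats "closing in on a zettabyte" as 1,000 exabytes, which is an assumed upper bound rather than IDC's stated forecast.

```python
def annual_growth(start, end, years):
    """Constant yearly growth rate that turns `start` into `end` over `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Figures from the summary, in exabytes.
# 2003 study (5 EB) vs. 2006 recomputed on the same assumptions (40 EB).
like_for_like = annual_growth(5, 40, 2006 - 2003)
# 2006 estimate (161 EB) vs. an ASSUMED 1000 EB standing in for "a zettabyte".
to_zettabyte = annual_growth(161, 1000, 2010 - 2006)

print(f"2003 -> 2006, same assumptions: {like_for_like:.0%} per year")
print(f"2006 -> 2010, up to a zettabyte: {to_zettabyte:.0%} per year")
```

Going from 5 to 40 exabytes in three years is an eightfold increase, i.e. a doubling every year; reaching a full zettabyte from 161 exabytes in four years would need roughly 58% growth per year under these assumptions.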
We won't be running out of space, just like we didn't run out of food. New technology will allow us to store ever more data.
Web server log files with the history of people clicking around. My address stored by everybody I ever bought anything online from. It's more an information landfill than an information warehouse.
Leave the gun, take the cannoli -- Clemenza, The Godfather
I'm sorry, how stupid is this?
"producing far more data than we will have room to store"
That's like saying, for the last 2 months, my profit has increased by 10%. If my profit keeps increasing at 10% per month, then pretty soon I'll own all the money in the world, and then I'll own more money than exists! Damn I must stop making money now before I destroy the world economy!!!
Who are these people who draw straight lines on growth curves? Why do people print the garbage they write and why weren't they the first against the wall after the dot com bust?
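For what it's worth, the naive extrapolation being mocked here is easy to put into numbers. A minimal sketch, assuming 10% compound growth per month holds forever; the function name and the sample figures are hypothetical and only there to illustrate the arithmetic.

```python
def months_until_exceeds(start, limit, monthly_growth=0.10):
    """Count the months of compound growth needed for `start` to exceed `limit`."""
    if start <= 0 or limit <= 0 or monthly_growth <= 0:
        raise ValueError("start, limit and monthly_growth must be positive")
    value = start
    months = 0
    while value <= limit:
        value *= 1.0 + monthly_growth  # blindly extend the same growth rate
        months += 1
    return months


# HYPOTHETICAL figures: a 10,000 monthly profit against a 100 trillion total.
print(months_until_exceeds(10_000, 100_000_000_000_000))
```

The loop always terminates, which is the joke: any fixed growth rate extended indefinitely overtakes any finite limit, so the extrapolation says more about the assumption than about the world.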
The only things that seem certain are death, taxes, entropy and stupid people...
"The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
HA!
But seriously, I wonder what percentage of this data is text. I'd guess it is a very, very small amount. When I had a film camera, in twenty years I bet I took less than 100 rolls of film. With digital cameras I've taken thousands of pictures, sometimes taking a dozen or more of the same subject, just because the cost to me is practically zero. Now there are vendors that will let me upload large numbers of these amateurish photos for free, and let's pretend that there are enough people interested in seeing my pictures that these companies can pay for this storage with advertising. That's scary.
Excluding attachments, I think it would be practically impossible for anyone to use up Google's 2 gig of storage, but I've heard of people using it up in little more than a week by mailing large attachments back and forth (oh yeah, I HAVE to have every single iteration of that Word document, sure I do!).
But what's scarier is that for some nominal fee (like $20 a year) they place no limit at all on my ability to hog a disk drive somewhere. I know people who are messed up in the head enough to want to test these claims. Give them 5 gig for photos and they've filled it up in a week, give them "unlimited" and they upload pure junk to see if they can break the thing.
Like any house of cards, this thing is gonna come down sooner or later. I just hope that people who are making sensible use of these online services don't lose everything along with the abusers.
As interesting as the sheer volume is, most of it is garbage. I'd rather have 50 terabytes of organized and accurate information than 500 exabytes of data that isn't organized and whose accuracy, even if it were organized, is questionable at best. In essence, even if you manage to find what you want, the correctness of that information is likely to be very low.
I've long said we are not in the information age, we are in the data age. The information age will be when we've successfully organized all this crap we're storing/transmitting.
Again, RTFS: "IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information."
If an NMEA lat-lon string gets spit out of the serial port of a GPS and there's nothing there to capture it, it is not part of their count. They're not taking the bitrate of data generators and multiplying by time. They're counting discrete blocks of saved data. You cannot arrive at the latter from the former, just like you can't tell how much water is behind Hoover Dam on average during the year by measuring the average daily flow rate of the Colorado River and multiplying by 365.
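The flow-versus-stock point can be shown with a toy reservoir. This is a sketch with made-up numbers, not real Hoover Dam or Colorado River data; `stored_levels` and `average` are hypothetical helper names. Two scenarios get exactly the same inflow over a year, yet end up holding very different amounts because the releases differ.

```python
def stored_levels(initial, inflows, outflows, capacity):
    """Track the stored volume day by day, clamped between 0 and `capacity`."""
    level = initial
    levels = []
    for inflow, outflow in zip(inflows, outflows):
        level = min(capacity, max(0.0, level + inflow - outflow))
        levels.append(level)
    return levels


def average(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# MADE-UP units: same daily inflow in both scenarios, different releases.
days = 365
inflows = [10.0] * days
balanced = stored_levels(100.0, inflows, [10.0] * days, capacity=1000.0)
hoarding = stored_levels(100.0, inflows, [9.0] * days, capacity=1000.0)

print("total inflow in both cases:", sum(inflows))
print("average stored, balanced releases:", average(balanced))
print("average stored, smaller releases:", average(hoarding))
```

The total inflow is identical in both runs, so it cannot by itself tell you what is sitting behind the dam. The same goes for data: what gets generated and what ends up saved are different quantities.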
If a job's not worth doing, it's not worth doing right.