Digital Big Bang — 161 Exabytes In 2006
An anonymous reader tips us to an AP story on a recent study of how much data we are producing. IDC estimates that in 2006 we created, captured, and replicated 161 exabytes of digital information. The last time anyone tried to estimate global information volume, in 2003, researchers at UC Berkeley came up with 5 exabytes. (The current study tries to account for duplicating data — on the same assumptions as the 2003 study it would have come out at 40 exabytes.) By 2010, according to IDC, we will be producing far more data than we will have room to store, closing in on a zettabyte.
And half of that is porn...
"Banking establishments are more dangerous than standing armies." -Thomas Jefferson
Without Slashdot dupes.
I am a believer of momentum and curves.
The furry porn gets deleted first.
... times does the Library of Congress fit in that? Exabytes simply don't speak to me.
Alternatively, you can also answer in anime episodes, or mp3 files.
I left cat /dev/urandom running
"You mortals are so obtuse." -Q
We won't be running out of space just like we didn't run out of food. New technology will allow us to store ever more data.
Simply put, a lot
10^18 bytes, or one million terabytes
I imagine that a lot of this is web traffic logs. What if the US government really does force ISPs to keep records detailing the sites visited by their customers? Will my ISP rates increase to pay for all of that disk space?
So the sum total of data has increased by a factor of more than 30 since 2003? I knew Brent Spiner was putting on weight, but damn.
Web server log files with the history of people clicking around. My address stored by everybody I ever bought anything online from. It's more an information landfill than an information warehouse.
Leave the gun, take the cannoli -- Clemenza, The Godfather
What's really striking is how little data was available in machine-readable form well into the computer era. In the 1970s, the Stanford AI lab got a feed from the Associated Press wire, simply to get a source of machine-readable text for test purposes. There wasn't much out there.
In 1971, I visited Western Union's installation in Mahwah, NJ, which was mostly UNIVAC gear. (I worked at a UNIVAC site a few miles away, so I was over there to see how they did some things.) I was shown the primary Western Union international gateway, driven by a pair of real-time UNIVAC 494 computers. All Western Union message traffic between the US and Europe went through there. And the traffic volume was so small that the logging tape was just writing a block every few seconds. Of course, each message cost a few dollars to send; these were "international telegrams".
Sitting at a CRT terminal was a woman whose job it was to deal with mail bounces. About once a minute, a message would appear on her screen, and she'd correct the address if possible, using some directories she had handy, or return the message to the sender. Think about it. One person was manually handling all the e-mail bounces for all commercial US-Europe traffic. One person.
Is that the size of the next MS OS?
I'm just one person and I have 20GB just of OS and applications code. Plus another 20GB of MP3's. 161 billion /40 is about 4 billion 'gelfling people units'. Doesn't seem like a lot.
I'm sorry, how stupid is this?
"producing far more data than we will have room to store"
That's like saying, for the last 2 months, my profit has increased by 10%. If my profit keeps increasing at 10% per month, then pretty soon I'll own all the money in the world, and then I'll own more money than exists! Damn I must stop making money now before I destroy the world economy!!!
Who are these people who draw straight lines on growth curves? Why do people print the garbage they write and why weren't they the first against the wall after the dot com bust?
The only things that seem certain are death, taxes, entropy and stupid people...
"The weirdest thing about a mind, is that every answer that you find, is the basis of a brand new cliche" -
Data that cannot be stored will not be produced because all data that is produced must be stored. Data that is not stored (for however short a time) is not really produced.
Then again the past no longer exists anyway, the future doesn't exist yet and the present has no duration, so maybe the data never existed anyway. Maybe you don't exist?!?! Aw man, maybe I... *disappears in a puff of logic*
----
Kudos to Augustine and Adams
An anonymous reader tips us to an AP story on a recent study of how much data we are producing. IDC estimates that in 2010 we created, captured, and replicated close to a zettabyte of digital information. The last time anyone tried to estimate global information volume, in 2006, researchers at IDC came up with 161 exabytes. (The current study tries to account for duplicating data -- on the same assumptions as the 2006 study it would have come out at 250 exabytes.) By 2012, according to IDC, we will be producing far more data than we will have room to store, closing in on 6 zettabytes.
The problem is, everything is duplicated, a LOT. All those copies need to be stored tho, so here we are swimming in data.
My work machine that I backed up a couple weeks ago, was a 30MB zip file, and 3/4 of that was my local CVS tree. So out of 30GB, less than 1/3000th was not OS, software, or just copied locally from a data store.
At home, I've saved every email, every picture, everything from my Windows, Linux, OSX and every other box I've ever had since ~1992, and that's barely a few GB uncompressed.
The amount of non-duplicate useful material is far far smaller than you would think.
- Adam L. Beberg - The Cosm Project - http://www.mithral.com/
In River Out of Eden Richard Dawkins traces the data explosion of the information age right back to the big bang.
"By 2010, according to IDC, we will be producing far more data than we will have room to store, closing in on a zettabyte."
So I guess in the future you can't buy more hard drives or something...
A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
Ok, so we generate some staggering amount of computerized data every year. This is one of those stories where I can't remember hearing about it before, but it really doesn't feel like "news".
My question is how much of this data is actually being used? I'm horrible for constantly downloading e-books, movies, software, OSes, and other stuff that I'm *intending* to do something with, but often don't get around to. I end up with gigabytes of "stuff" just sucking up disc space or wasting CDs. I burned a DivX copy of Matt Stone and Trey Parker's popular pre-South Park indie film "Orgazmo" in about 2001. I've since seen the film 2 or 3 times on TV. I STILL haven't watched the DivX version I have, and now I can't find the CD I put it on. I know I'm not the only one who does this either, as many of my friends are using up loads of storage space on files they've just been too busy to have a look at.
Right now I'm on a project digitizing patient files for a neurologist. We're going up to 10 years deep with files for over 18,000 patients. Most of this is *just* for legal purposes and nobody is EVER going to open and read the majority of these files. The doctor does electronic clinics where he consults the patient and adds new pages to their file, which will probably sit there undisturbed until the Ethernet Disk fails someday.
I think a more interesting story (although probably MUCH more difficult to research) would be "How much computerized data is never used beyond its original creation on a given storage medium?"
I'd take this study's fearmongering with a grain of salt. It probably came from one of those deletionist Think-Tanks.
- RG>
Hey pal, this isn't a pleasantforest, so don't waste my time with pleasantries!
Think about scientific instruments that gather gigabytes of data per second. They hold on to that for as long as they have to, pulling out interesting data, summarizing it, and throwing out the rest. I track all the web hits for our corporate Intranet. The volume is so huge that the SQL administrators come and have a little heart-to-heart chat with me if I let it build up over a few months. I don't really care about the raw information past a month or so. Instead, I want to see running counts of which pages are being viewed, which people are big utilizers of our network, and so on.
A good analogy is the human brain. We gather in huge amounts of information per second via touch, sight, and so on, but throw out the vast majority of the information. The key is to have good filtering systems so that things that are interesting and relevant are held onto.
Make sure you watch out for giant crickets, especially if you visit Superior Japan.
So at this rate it won't be long before we will need real Exabyte tapes. I always thought the original ones should qualify for the award of world's most misleading name since their capacity was 500 million times less than what their name suggested.
Seriously, forget about storing all this data, what exactly are we going to do with it?? How are we going to process and manage zettabytes worth of data? What tools are we going to use to sift through that much data and get what we need? Should we even be keeping it?? Hell, 90% of it may well be porn. The more data we produce the more urgent it will become to ask these sorts of questions, and find the answers to them.
(161 exabytes) / 6,525,170,264 people = 26.4931682 gigabytes per person.
....and in the year of our lord 2012 our data became self aware.....
(161 exabytes) / 1,093,529,692 people[1] = 158.086639 gigabytes per person and 19.6380918 gigabytes per person if you don't count the duplicate data.
[1] Total est. of people on the Internet:
http://www.internetworldstats.com/stats.htm
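For anyone who wants to check the per-person figures in this thread, here is a minimal sketch. It assumes (this is an inference from the numbers, not something either poster stated) that the calculator used treated an exabyte as 2**60 bytes and a gigabyte as 2**30 bytes. The decimal (SI) reading is printed alongside for comparison. The function and variable names are made up for illustration.

```python
# Per-person share of IDC's 161 exabytes under two unit conventions.
# Assumption: the quoted results used binary units (EB = 2**60, GB = 2**30).

EXABYTES = 161
WORLD_POPULATION = 6_525_170_264   # figure quoted in the thread
INTERNET_USERS = 1_093_529_692     # figure quoted in the thread


def per_person_gb(exabytes: float, people: int, binary: bool = True) -> float:
    """Return data per person in GiB (binary=True) or GB (binary=False)."""
    if binary:
        return exabytes * 2**60 / people / 2**30
    return exabytes * 10**18 / people / 10**9


world_share = per_person_gb(EXABYTES, WORLD_POPULATION)
online_share = per_person_gb(EXABYTES, INTERNET_USERS)
online_share_si = per_person_gb(EXABYTES, INTERNET_USERS, binary=False)

print(f"whole world, binary units:    {world_share:.2f} per person")
print(f"internet users, binary units: {online_share:.2f} per person")
print(f"internet users, SI units:     {online_share_si:.2f} per person")
```

Under the binary assumption the two quoted results (about 26.49 and 158.09) are reproduced; with SI units the internet-user figure comes out closer to 147 GB.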
Text (code, misc letters) IS very small. Up until just a couple of years ago, all the "good stuff" would fit on a CD-R or two.
Now, I have several full DVD-Rs with copies of digital photos, and I just finished making 30+ more DVD's of (compressed data) that hold 60+ hours of old home video before the tapes rot.
By-and-large, there is a lot of crap that I personally don't feel a need to save (because I can always get it from somewhere else, if need be) but even "personal" stuff is adding up to 100's of Gigabytes.
Still, data is smaller than boxes of pictures and video tapes.
I didn't RTFA, but I don't see what the big deal is. From where I sit, I have computer power and data storage equivalent to what cost millions and millions of dollars at one time, in my own lifetime.
And it keeps getting cheaper...
I suspect, just like in the physical realm, "important" digital items will survive, thru sheer duplication and media updates, far more often than "unimportant" items... like my family photos, but at least they have a shot.
Upload all your snapshots to a royalty free photo site and gain digital immortality via file hoarders. The only rub is that you can't let your pix be personal enough to trace back to you or the stalkers will get you.
This issue is a bit more complicated than you think.
If we consider all digital data, not just the stuff that flows over the internet, then this is way too low. Consider the data in all the DTVs, GPS receivers etc.
A top-end GPS is grinding over 10^9 bits per second in its correlators (about 50 correlator channels x 20Mbps or so sampling rate). That ends up being approx 3x10^15 bytes per year per GPS... or 40,000-odd top-end GPSs would be grinding 1.61x10^20 bytes per year. There are far more than 40k high end GPSs in the world, so the budget is already blown...
Engineering is the art of compromise.
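The GPS estimate above can be sanity-checked with a few lines. This is only a sketch using the poster's own round numbers (50 correlator channels, 20 Mbps sampling); the 365.25-day year and the variable names are assumptions added here.

```python
# Rough check of the GPS correlator estimate, using the round numbers
# given in the comment above. The year length is an added assumption.

CHANNELS = 50                      # correlator channels per receiver
SAMPLE_RATE_BPS = 20e6             # bits per second per channel
SECONDS_PER_YEAR = 365.25 * 24 * 3600
IDC_TOTAL_BYTES = 161e18           # 161 exabytes, decimal

bits_per_second = CHANNELS * SAMPLE_RATE_BPS
bytes_per_year = bits_per_second * SECONDS_PER_YEAR / 8
receivers_needed = IDC_TOTAL_BYTES / bytes_per_year

print(f"bits per second per receiver: {bits_per_second:.2e}")
print(f"bytes per year per receiver:  {bytes_per_year:.2e}")
print(f"receivers to reach 161 EB:    {receivers_needed:,.0f}")
```

With these inputs one receiver processes a little under 4x10^15 bytes a year, so the "40,000-odd" receiver count holds up.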
As interesting as the sheer volume is, most of it is garbage. I'd rather have 50 terabytes of organized and accurate information than 500 exabytes of data that isn't organized, and even if it were, its accuracy is questionable at best. In essence, even if you manage to find what you want, the correctness of that information is likely to be very low.
I've long said we are not in the information age, we are in the data age. The information age will be when we've successfully organized all this crap we're storing/transmitting.
Run!
I remember when software came on cassettes and when food came from close to where you live.
When floppy disks were too small, we made higher-density floppy disks, and we still needed a whole box of them.
When there wasn't enough of a particular food, we got it shipped from further away.
When CD-ROMs came out, we still ended up not only filling them but spreading things over multiple CDs.
When the imported food got too expensive, we started using chemical fertilisers to grow more of them closer to home and more cheaply.
We had to invent bigger CDs. DVD became HD-DVD and Blu-Ray. People are already complaining that they're not big enough.
We got bigger trucks and bigger boats to cover food with more preservatives and ship it here from further away, and the more of this we bought, the cheaper it got.
You got that bigger hard disk, so you could amass data and store it forever. Remember how you said you'd never fill it up? Then broadband happened, and P2P happened, and fill it up you did.
You didn't worry about it, though, the same way you didn't worry about not having enough food, either. Your supermarket is awash with thousands of varieties of food, from wherever it's cheap, and you can eat as much as you want of whatever you want.
Because everything is more available, more quickly and more easily, you now have more stuff than you could ever use. Nowadays, people don't think twice about Tivo-ing or downloading something that they're never even going to watch. As the technology gets better - as disks get bigger, and as networking gets faster - this is only going to become more prevalent.
But there is a physical limit to what can be done. Do you need a new hard drive, or a new router? What metals and chemicals are required to make them? How much energy is required? Where are they built, and how do they get to you? There's only a finite amount of this stuff in the ground, and none of this is invincible to exponential growth. The people who think this can go on forever, or even for the rest of their natural lives, are kidding themselves.
Eventually, these materials will be harder to get, things will start to become more difficult to make and more expensive, and everyone will be complaining about how expensive their last computer was. Really, though, I don't even want to know these people. They've gotten their priorities all wrong.
The parent poster says we won't be running out of anything. All that's really happened is that we haven't run out yet. The planet simply can't sustain the 6.5 billion of us there are now, let alone the billions more to be born in the next few decades. The problem is that when there isn't enough to go around, some of us will be lining up for new video games and iPods, and some of us will be lining up for food, water and fuel.
I should warn you to choose wisely, but really, what do I care? Choose unwisely, and leave more for the rest of us.
Attack its weak point for massive damage!
Packrats are awesome:
http://en.wikipedia.org/wiki/Packrat_midden
http://www.google.com/search?q=packrat+middens
Watch what you say about them!
I try not to keep junk around; keyword is try. I set my 5MP camera to 3MP and don't feel like I am losing anything. Stuff I consider important fits on a DVD; stuff I will bother moving to a new drive is more like 120GB.
Nerd rage is the funniest rage.
Yotta, yotta, yotta...
What?
A lot of people are going to get burned by offsite file storage services - and I place this in two categories:
- people who get burned by AOL/xdrive/Amazon/S3 when the privacy of their data is ruined - either by sharing information to enable advertisements or by rolling over like cowards to any government agency that even picks up the phone
- people who get burned because the wacko distributed custom database-driven filesystem in the sky that they trust their data to has a glitch and goes down for days or weeks if there is ever any significant disruption in Internet connectivity and/or routing.
This is of special importance to me because regulatory concerns _require_ that I store critical data offsite, so I had to bite the bullet
... I'm pleased so far with my choice (rsync.net) for offsite backup of my linux data, but it took a lot of research and a lot of reassurances before I would take the plunge. I concluded that I was far better off with a provider that put my data on real, rational unix filesystems (like exavault, strongspace, rsync.net, etc.) and then the decision came down to who had the most reassuring privacy and search warrant policy.
We'll see...
I wonder if it would explain the amount going up as paranoid companies are heavily auditing everything they do and all financial data
http://saveie6.com/
It's well and fine to have a statistic like 161 exabytes
of data, but what's the point? Is that data any more useful
to people than the selective data that was used to run the world
50, 60 or 100 years ago?
We as individuals are only capable of assimilating a limited amount
of information so most of those exabytes are just rolling around
like so many gears in an old machine. If they are minimally used or
never used they simply become a storage liability.
As an example, the internet has not made *better* doctors.
Even with all the latest information at their fingertips
professionals are still only the sum of what they can
mentally absorb. Too much data, or wrong data (ie: wikipedia)
can lead to the same levels of inefficiency seen prior to
the 'information age'. What would a single doctor do with
160 exabytes of reading material, schedule it into the work day?
Also, if the amount of information is rated purely on bytes
but not in *useful content* the stats get skewed. Things like
movies and music should be ranked by the length of script
and/or notation. That would make the numbers much less than
160 exabytes.
Saying that the whole world produced 160 exabytes of information
is like saying the whole world used 50 billion tonnes of water.
...was that water just running down the pipe into the sewer or
did somebody actually drink it to sustain life?
Mechanistic stats are stupid.
So DR Evil, after emerging from his suspended animation, would demand a computer big enough to store 100 Megabytes of evil data.
The article implies there is a bunch of new data. The fact is much of the data is simply format shifted into the new medium. Examples of this are:
1 Photography
2 Letters and correspondence
3 Filing and records
4 Music
5 Telephone calls & faxes
6 Newspapers and magazines
7 Novels and books
8 Board games and puzzles
9 Movies
10 Radio and TV broadcasts
11 ??
All these forms of data existed before. None of them was digital before. The numbers represent a format shift, not new content. Not many people archived every newspaper, phonograph record, photo, magazine, telephone call, TV show, etc. Even in the 1950's there was not enough space to archive all the data.
The truth shall set you free!
What's the difference between the Library of Congress and the House of Representatives?
In the Library of Congress, you're not allowed to lick the pages.
I belong to a philosophical group who assert that existence IS identity, that for something to exist it must have physical characteristics which are defined and obey causality, because this is the very essence of what it means to exist. I personally follow the classic Pythagorean belief that "everything is number", and that all that truly exists is information. In this sense we not only are producing information but are in turn made up from it. If you believe that the universe is finite, then there may be a total sum of all the information, but I doubt it is in the exabit range. Nature uses qubits, so there is presumably some type of wave-like uncertainty to all information, which may mean that nature itself could have a signal-to-noise ratio approaching 50%.
Kharma is like a boomerang. Mine is broken.
30 years ago, the Exabyte Stringy Floppy (see http://en.wikipedia.org/wiki/Exabyte_Corporation ) was invented to solve our storage problems http://en.wikipedia.org/wiki/Stringy_floppy
As no one has actually seen a Dragonball Z fight conclude, we have no way of knowing how long a typical fight scene would go on for. Personally my bet is that after a few trillion episodes the vibration from the copious yelling of the same three words over and over causes the scene to collapse in on itself like a Moebius strip. This suggests the unpleasant possibility that the typical Dragonball Z fight scene is infinite in length. The truth of this conjecture is a hotly debated unsolved problem in mathematics, and unraveling it has already cost seventeen grad students their lives, and burned out the department's DVD player. The DVD player was deeply mourned.
Help poke pirates in the eyepatch, arr.
It's amazingly simple to produce and duplicate more data than you have room to store. You simply don't permanently store it all.
Is this a concept that is so hard to understand? Many replies above don't seem to grasp the concept of data not actually being kept.
50 Exabytes = (50)1024 petabytes = (50)1048576 terabytes:
RAID6 (24 Drives -2{Parity} -1{Hot Spare} = 21) 750GB, 13.48TB ZFS/Solaris:
93,345,048 750GB Hard Drives: $17,735,559,120
3,889,377 Areca ARC-1280ML: $4,317,208,470
1,944,689 Motherboards/Mem/CPU: $766,207,466
1,944,689 5U Rackmount Chassis's: $4,546,682,882
194,469 4 Post 50U Racks: $45,700,215
3,684 528-port 1Gbps Switches: $374,294,400
40 96-port 10Gbps Switches: $11,424,000
1,948,935 Network Cables: $2,020,812
? Assembly Robots/Misc. $111,000,000
Sub Total: $27,910,097,365
Tax/Shipping: $2,645,915,779
Grand Total: $30,556,013,144
$470 billion cheaper than the Iraq war.
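The hardware counts in the bill of materials above can be reproduced with a short sketch. The capacity and drives-per-array figures come from the post. The two-arrays-per-server ratio, ten-servers-per-rack ratio, and the $190 drive price are not stated anywhere and are inferred here by dividing the posted totals, so treat them as assumptions.

```python
import math

# Reproduces the unit counts from the 50-exabyte shopping list above.
# Stated in the post: 50 EB = 50 * 1048576 TB, 13.48 TB usable per
# 24-drive RAID6 array. Inferred (assumptions): 2 arrays per server,
# 10 servers per rack, $190 per 750GB drive.

TARGET_TB = 50 * 1024 * 1024
USABLE_TB_PER_ARRAY = 13.48
DRIVES_PER_ARRAY = 24
ARRAYS_PER_SERVER = 2        # inferred from the posted totals
SERVERS_PER_RACK = 10        # inferred from the posted totals
DRIVE_PRICE_USD = 190        # inferred from the posted totals

arrays = math.ceil(TARGET_TB / USABLE_TB_PER_ARRAY)
drives = arrays * DRIVES_PER_ARRAY
servers = math.ceil(arrays / ARRAYS_PER_SERVER)
racks = math.ceil(servers / SERVERS_PER_RACK)
drive_cost = drives * DRIVE_PRICE_USD

print(f"arrays/controllers: {arrays:,}")
print(f"drives:             {drives:,}")
print(f"servers:            {servers:,}")
print(f"racks:              {racks:,}")
print(f"drive cost:         ${drive_cost:,}")
```

The drive line alone is well over half of the subtotal, so the estimate is most sensitive to the per-drive price.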
As everyone can read on wikipedia; Bonwick wrote: ...Thus, fully populating a 128-bit storage pool would, literally, require more energy than boiling the oceans.
ZFS can "only" store 16 exabytes. These 161 exabytes would need at least 11 ZFS storage pools to hold them, and most of those would be full. Sounds like, bye bye fishies to me. Frying them in their ocean 10 times would surely kill most of them.
Besides; 3.4x10^27 J would be needed to boil the oceans.
"Just to propel a ship to Mars, requires 1.4e8 joules per kilogram. This includes leaving Earth, making the transfer orbit insertion, and matching velocities with Mars at the end of the Trip."
We could transport 2.43*10^19 kilograms to Mars on this energy. Would this be enough to transport the Netherlands there?
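The Mars payload figure is a single division. A minimal check, taking both energy numbers exactly as quoted above (neither is verified here):

```python
# Mass that could be sent to Mars on the energy said to boil the oceans.
# Both inputs are the figures quoted in the comment, taken at face value.

OCEAN_BOIL_ENERGY_J = 3.4e27
MARS_TRANSFER_J_PER_KG = 1.4e8

mass_kg = OCEAN_BOIL_ENERGY_J / MARS_TRANSFER_J_PER_KG
print(f"payload: {mass_kg:.3e} kg")
```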
Actually, when /dev/urandom is read at a higher rate than the Entropy Gathering Device (EGD) can deliver randomness, it reverts to a pseudorandom number generator, which will NEVER generate Slashdot text, as its periodicity is too low. (Although it may be argued that Slashdot's periodicity is even lower.)
Reading from /dev/random would theoretically generate such text (eventually), but its data rate is severely limited, absent a dedicated hardware random-number generator.
Even a hardware random-number generator may not generate TRULY random numbers (i.e. sufficiently random that they are mathematically guaranteed to eventually include all possible bit patterns, including inane Slashdot chatter). The proof is left as an exercise for the reader.
The Web is like Usenet, but
the elephants are untrained.
Actually ZFS can store 16 exbibytes, so you only need 9 of those (ceil(8.72782749)) to store 161 exabytes.
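The exabyte versus exbibyte difference is what moves the pool count. A small sketch, assuming the 16-unit limit quoted in this thread (the limit itself is taken from the comments, not checked against ZFS documentation) and reading IDC's 161 exabytes as decimal:

```python
import math

# Pools needed for 161 decimal exabytes under the two readings of "16".
# The 16 EB / 16 EiB limit is the figure quoted in the thread.

DATA_BYTES = 161 * 10**18
LIMIT_DECIMAL = 16 * 10**18      # 16 exabytes
LIMIT_BINARY = 16 * 2**60        # 16 exbibytes

ratio_decimal = DATA_BYTES / LIMIT_DECIMAL
ratio_binary = DATA_BYTES / LIMIT_BINARY

pools_decimal = math.ceil(ratio_decimal)
pools_binary = math.ceil(ratio_binary)

print(f"16 EB limit:  {ratio_decimal:.4f} -> {pools_decimal} pools")
print(f"16 EiB limit: {ratio_binary:.8f} -> {pools_binary} pools")
```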
Seriously, they are one of the few who have insanely large amounts of data with meaning and powerful tools. Wonder how many megabytes that disk is.
Like the previous poster said, we all produce tons of data daily without storing it. Mobile phone calls. VOIP. Video conferencing. IM without history. etc. That's countless gigabytes daily worldwide...
To get a feel for the size and scale of 50 exabytes using today's technology,
* Two copies of the entire Library of Congress, 6000 TB[1], can be stored in the collective cache buffers of the RAID controllers.
* It would need a 1,712 MW (peak) power source, a typical PWR nuclear power station produces 2,000 MW. Tack on another $5 billion for the construction of a nuclear power station.
* You would likely need to employ an entire team (in 3 shifts) to replace defective drives every day.
* You would need 1,684,804 sq. feet to house all the racks, a building the size of the John Hancock Center would be needed. Add another $385 million to the bill.
[1] http://www.lesk.com/mlesk/ksg97/ksg.html
1.1 Million Libraries of Congress
Like the previous poster said, we all produce tons of data daily without storing it. Mobile phone calls. VOIP. Video conferencing. IM without history. etc. That's countless gigabytes daily worldwide...
I'm sorry that I have to clarify myself, but neither I nor the survey includes temporary data in the discussion. We're talking data that is stored and represents archived information.
Otherwise where do we stop? Do we count copies of the programs in RAM, swap files, temporary caches and so on? It'll become pointless pretty soon... That said, the whole study is pointless anyway.
99% of which will be cell phone videos of college students holding beer bottles and yelling "Woooo!"
Good point. I was thinking in terms of new data, not maintained data, but anyway.