27 Billion Gigabytes to be Archived by 2010
Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."
In other words, 27 Exabytes?
Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
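For the record, the conversion is easy to check. Here is a minimal sketch in Python, assuming SI (decimal) prefixes throughout; the `SI_EXPONENT` table and the `convert` helper are names made up for this illustration:

```python
# SI (decimal) byte units expressed as powers of ten.
# Assumption: the summary's figures use SI prefixes, not binary ones.
SI_EXPONENT = {"GB": 9, "TB": 12, "PB": 15, "EB": 18}


def convert(value, from_unit, to_unit):
    """Convert a quantity between SI byte units (hypothetical helper)."""
    shift = SI_EXPONENT[from_unit] - SI_EXPONENT[to_unit]
    if shift >= 0:
        return value * 10 ** shift   # going to a smaller unit: multiply
    return value / 10 ** -shift      # going to a larger unit: divide


archive_pb = 27_000  # petabytes, from the summary

print(convert(archive_pb, "PB", "GB"))  # 27000000000 (27 billion gigabytes)
print(convert(archive_pb, "PB", "EB"))  # 27.0 (27 exabytes)
```

So "27,000 petabytes", "27 billion gigabytes", and "27 exabytes" are the same quantity, roughly 2.7 × 10^19 bytes.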
"Live as if you'll die tomorrow." Ridiculous. You could die later today.
From the summary:
"E-mail growth accounts for much of that figure."
We're archiving spam?
Ubiquitously - A Ubiquity Developer Community
Some big projects are generating so much data that they have trouble dealing with it all.
For example, Folding@home is implementing a distributed storage mechanism for its data, and we'll likely have a new @home project soon: Storage@home.
http://en.wikipedia.org/wiki/Storage@home
http://www.stanford.edu/~beberg/Storage@home2007.pdf
http://folding.stanford.edu/English/Papers#ntoc7
E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.
Today's Sesame Street was brought to you by the number e.
article summary:
Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times because each user keeps their own copy. In the next few years products like SharePoint will alleviate some of that, though storage is cheap enough that it may not be worth the cost to both reeducate users and build the infrastructure for it. A SAN can hold an awful lot of Word documents and PDFs, after all...
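To put a rough number on the duplication, here is a small Python sketch of single-instance storage: hash each attachment and count the bytes only once per distinct hash. It is an illustration under my own assumptions, not how any particular mail server or SharePoint actually does it; `dedup_stats` and the sample mailbox contents are made up.

```python
import hashlib


def dedup_stats(attachments):
    """Return (total_bytes, unique_bytes) for an iterable of bytes objects.

    Hypothetical helper: identical attachments are detected by their
    SHA-256 digest and counted once in unique_bytes.
    """
    size_by_digest = {}
    total = 0
    for blob in attachments:
        total += len(blob)
        digest = hashlib.sha256(blob).hexdigest()
        # Record the size only the first time this content is seen.
        size_by_digest.setdefault(digest, len(blob))
    return total, sum(size_by_digest.values())


# Made-up example: one 1,000-byte report mailed to 50 people, plus one memo.
report = b"x" * 1000
memo = b"y" * 200
mailbox = [report] * 50 + [memo]

total, unique = dedup_stats(mailbox)
print(total, unique)  # 50200 1200
```

Whether the savings justify the infrastructure is the open question; the sketch only shows where the savings would come from.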
All these archives are yours except Europa. ATTEMPT NO WRITINGS THERE.
Things like Libraries of Congress, Libraries of Alexandria, Spams per Square Inch. You know, the units that people have become familiar with. Besides which, are they power-of-two gigabytes or SI gigabytes? Also, how much bandwidth is needed to shift all that data? In the standard Imperial units of Clay Tablets per German Juggernaut per unit of French motorway, naturally.
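The power-of-two question is not entirely a joke; at this scale the two readings differ by about 7%. A quick Python sketch, where the function names are my own and the summary does not say which convention it uses:

```python
SI_GB = 10 ** 9        # SI gigabyte: 1,000,000,000 bytes
BINARY_GIB = 2 ** 30   # binary gigabyte (GiB): 1,073,741,824 bytes


def si_gb_to_gib(si_gigabytes):
    """Express a count of SI gigabytes in binary gigabytes (GiB)."""
    return si_gigabytes * SI_GB / BINARY_GIB


def gib_to_exabytes(binary_gigabytes):
    """Express a count of binary gigabytes in SI exabytes (10**18 bytes)."""
    return binary_gigabytes * BINARY_GIB / 10 ** 18


figure = 27 * 10 ** 9  # "27 billion gigabytes" from the summary

# Reading 1: the figure is in SI gigabytes, so it is fewer GiB.
print(f"{si_gb_to_gib(figure) / 1e9:.2f} billion GiB")

# Reading 2: the figure is in GiB, so it is more than 27 SI exabytes.
print(f"{gib_to_exabytes(figure):.2f} EB")

# Relative size of the gap between the two conventions.
print(f"{(BINARY_GIB / SI_GB - 1) * 100:.1f}% difference")
```

Conversion to Clay Tablets per German Juggernaut is left as an exercise.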
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:
Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech...
Only 27,000 petabytes? n00b!
My pr0n collection takes at least 3 Internets* to store, archived.
*(sorry, forgot the conversion rate for Libraries of Congress)
Just -1, Troll talking to another.
How do you figure that storage needs driving the increase in disk capacities and creating jobs is "a huge drain on the economy"?
And what do data-archiving rules have to do with welfare for programmers? Maybe for disk manufacturing firms or data admins, but programmers?
FRYS isn't an acronym... :)
and yes I do.
Does it bother you that much that these journalists want to make it easier for the general public to understand how much data storage they are talking about?
I agree. However, I would go even further: instead of using geekish bytes and bits, we should use something like 400 billion MP3s. You know, so that the MySpace users out there can understand TFA. They clearly have an interest in this sort of news.
We're talking storage (sorry DASD) here... It's all about...
Hooking up a pair of EMC DMXs (or IBM ESSes, or HDS USPs) over a pair of OC48s for SRDF/PPRC/USR, unless you are a z/OS shop, in which case you could run XRC. Since this is a BC/DR plan, we'll run it over FCIP protected by IPSec over a DWDM leased line, which must be protected by a UPSR/BLSR; otherwise, in the event of a link failure, the R1s will split from the R2s.
Then you're SOL.
Here is my helpful reference page for big numbers. I love big numbers. I'm actually working on a site right now which will help people to visualize big numbers. I can't give out the url yet because it'll be another month or two before it's ready to be seen. But, it'll have many fun options like Cow Stacking and Hamster Canyon.
Cow stacking is where you select cow as the animal and from earth to moon as the place and you'll see a graphic of cows being stacked to the moon and the number of cows which would be required to complete that stack.
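The arithmetic behind a cow stack is just distance divided by height, rounded up. A rough Python sketch, where the 1.5 m cow height is purely my assumption and 384,400 km is the average Earth-Moon distance; `stack_count` is a made-up name:

```python
import math

EARTH_MOON_DISTANCE_M = 384_400_000  # average Earth-Moon distance, in metres
COW_HEIGHT_M = 1.5                   # assumed height of one standing cow


def stack_count(distance_m, item_height_m):
    """Number of items needed to span a distance when stacked end to end."""
    return math.ceil(distance_m / item_height_m)


print(stack_count(EARTH_MOON_DISTANCE_M, COW_HEIGHT_M))  # 256266667
```

Swap in a different animal height or a different distance and the same function covers the other stacking options.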
Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.
Cow Cube