27 Billion Gigabytes to be Archived by 2010
Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."
In other words, 27 Exabytes?
Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
"Live as if you'll die tomorrow." Ridiculous. You could die later today.
From the summary:
"E-mail growth accounts for much of that figure."
We're archiving spam?
Ubiquitously - A Ubiquity Developer Community
Some big projects are generating so much data that they have trouble dealing with it all.
For example, Folding@home is implementing a distributed storage mechanism for its data, and we'll likely have a new @home project soon: Storage@home.
http://en.wikipedia.org/wiki/Storage@home
http://www.stanford.edu/~beberg/Storage@home2007.pdf
http://folding.stanford.edu/English/Papers#ntoc7
E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.
Today's Sesame Street was brought to you by the number e.
article summary:
Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times as each user has their own copy. In the next few years products like SharePoint will alleviate some of that, though storage is cheap enough that it may not be worth the cost to both reeducate users and build the infrastructure for it. A SAN can hold a real lot of Word documents and PDFs, after all...
hmm, I can believe this. I ran an e-mail server for the last company I worked for, and it was amazing how fast space got taken up just due to residual e-mails.
Since I'm the type to do the same thing, I can't be critical, so I left no quotas.
All these archives are yours except Europa. ATTEMPT NO WRITINGS THERE.
With corporations getting sued and having their own emails used against them in court, shouldn't they be destroying old email, not saving it?
Things like Libraries of Congress, Libraries of Alexandria, Spams per Square Inch. You know, the units that people have become familiar with. Besides which, are they power-of-two gigabytes or SI gigabytes? Also, how much bandwidth is needed to shift all that data? In the standard Imperial units of Clay Tablets per German Juggernaut per unit of French motorway, naturally.
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:
Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech,...
That's exactly the message of this book. Email, although widely used, is neither practical nor effective as a means of divulging information in a company. And duplication of information is the lesser problem.
For instance, suppose someone leaves the company, either permanently or on vacation, and somebody else takes over the job. How do you transfer the relevant information to the substitute? Forward several dozen emails and hope it makes sense? What if Alice forwards an email to Bob but not to Charlie? How do you make sure everybody in the project has access to all the relevant information?
Email and http are widely used because they are widely available, but neither of them is a very good solution for information handling.
Just delete the crap.
Deleted
That's my pr0n collection all right.
printf($randomline(sigs.txt) \n "-- "$randomline(authors.txt));
-- myself
I suppose if I was crazy enough, I'd post my address here on slashdot to see if we can slashdot Pitt's email servers,... maybe we can turn 30 million messages into 60 million messages. On second thought, I don't want 30 million messages,... ;-)
So now, SOX and new discovery rules have created welfare for programmers. What value is all of this e-mail? The bulk of it is worthless and the cost of this is a huge drain on the economy. How many disk drives does it take to store 27 EB, and how many people will it take to manage it all?
This is my sig.
And a great deal of video archive from CCTV as well I expect.
The question that arises is how would you index all this?
politicians are like babies' nappies: they should both be changed regularly and for the same reasons
Only 27,000 petabytes? n00b!
My pr0n collection takes at least 3 Internets* to store, archived.
*(sorry, forgot the conversion rate for Libraries of Congress)
Just -1, Troll talking to another.
NetApp is a great company and makes a great product aimed at a specific market segment: file services (NFS/CIFS). I don't see many customers tossing out the EMC DMX, HDS Tagmastore or IBM Shark for an FC-enabled NetApp array. I also don't see a lot of FICON shops asking NetApp to support FICON.
Now the phase storage mgmt is entering is the 'good enough' phase. Does my organization need the current generation of "high end" arrays? Maybe not. The current generation of midrange, with its better or cheaper $/GB and a feature set increasingly on par with the high-end arrays, is starting to look more attractive to many customers.
That's what I will be doing!
www.IBuyMacs.com
and 26.7 exabytes are dedicated to porn storage!
thank you! i'll be here all week!
hey was that rotten fruit! HEY! SECURITY!
VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!
Does it bother you that much that these journalists want to make it easier for the general public to understand how much data storage they are talking about?
I agree. However, I would go even further: instead of using geekish bytes and bits, we should use something like 400 billion MP3s. You know, so that the MySpace users out there can understand TFA. They clearly have an interest in this sort of news.
I wonder how much of this data is really redundant--copies of other data. How many emails can really be unique? How many employees download the same video a hundred times on the company's server? As network speeds increase, it will be less necessary for multiple users to store the same thing (think streaming those videos), so could this really be an exaggeration of future storage requirements? Could a better system be designed to minimize redundancy?
If designers still optimized their images down from 50k to 15k instead of flirting with the design hotties and smearing poop on other people's keyboards, this might not be a problem.
personally I prefer nibbles (4bits each or 1/2 a byte used with old parallel ports)
to make the numbers look bigger
working under the assumption of 1024 to the power of 6
2,305,843,009,213,693,952 nibbles of information
now that's a lot a chewin
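The nibble arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes two 4-bit nibbles per byte and, like the comment, a binary exabyte of 1024 to the power of 6 bytes; the function name `nibbles` is made up for illustration.

```python
# Two 4-bit nibbles make up one 8-bit byte.
NIBBLES_PER_BYTE = 2

def nibbles(exabytes, binary=True):
    """Return the number of nibbles in the given number of exabytes.

    binary=True uses 1024**6 bytes per exabyte (the assumption in the comment);
    binary=False uses the SI value of 10**18 bytes.
    """
    bytes_per_exabyte = 1024**6 if binary else 10**18
    return exabytes * bytes_per_exabyte * NIBBLES_PER_BYTE

print(f"{nibbles(1):,}")                 # one binary exabyte
print(f"{nibbles(27):,}")                # the 27 exabytes from the summary
print(f"{nibbles(27, binary=False):,}")  # the same figure in SI units
```

Run as written, the first line matches the 2,305,843,009,213,693,952 quoted above, which is the count for a single binary exabyte; the 27 exabytes from the summary come out 27 times larger.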
Shit, is it all Pr0n ? :-P
that will be lost or stolen as company employees fail to properly encrypt back-ups, leave laptops in their car while running in for a latte or some such? Seriously, though, the article says storage is corporations' number 2 concern. What's number one from this survey? Is it security?
Microsoft implemented something called Single Instance Storage (SIS) with Windows 2000 and 2003 (http://research.microsoft.com/sn/Farsite/WSS2000.pdf).
If it weren't quite so cryptic to implement and use, it would probably help reduce some of the problem.
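The general idea of keeping one copy of identical content can be sketched with a content-hash index. This is not how Microsoft's SIS works internally; it is a toy illustration of the concept, and the class and method names are hypothetical.

```python
import hashlib

class SingleInstanceStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self):
        self._blobs = {}   # SHA-256 digest -> payload bytes
        self._names = {}   # logical name -> digest

    def put(self, name, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the payload only if this exact content has not been seen before.
        self._blobs.setdefault(digest, payload)
        self._names[name] = digest

    def get(self, name):
        return self._blobs[self._names[name]]

    def logical_bytes(self):
        """Bytes that would be used if every name held its own copy."""
        return sum(len(self._blobs[d]) for d in self._names.values())

    def stored_bytes(self):
        """Bytes actually kept after deduplication."""
        return sum(len(b) for b in self._blobs.values())


# Three users receive the same attachment; only one copy is stored.
store = SingleInstanceStore()
attachment = b"quarterly report " * 1000
for user in ("alice", "bob", "charlie"):
    store.put(f"{user}/inbox/report.doc", attachment)

print(store.logical_bytes(), store.stored_bytes())
```

Whole-file hashing like this only catches exact duplicates. A document that has been edited by one character hashes differently and is stored again in full.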
Here is my helpful reference page for big numbers. I love big numbers. I'm actually working on a site right now which will help people to visualize big numbers. I can't give out the url yet because it'll be another month or two before it's ready to be seen. But, it'll have many fun options like Cow Stacking and Hamster Canyon.
Cow stacking is where you select cow as the animal and from earth to moon as the place and you'll see a graphic of cows being stacked to the moon and the number of cows which would be required to complete that stack.
Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.
Cow Cube
First of all, almost all the elements used to build these laptops come from somewhere else.
The components are possibly Chinese. The ideas are possibly brewed from open source (a good concept, but a salad in the end, for the very argument explained here). Much of the "teaching content" isn't local (and, you know, "local" in every place means a different culture, animals, religions, traditions, dictators, martyrs, heroes, ECONOMY).
After all, you start out trying to give people a better education, but in the process you transform them into aliens, individuals separated from their own reality and context (I think their context has been abused for centuries, and robbed, and used, and treated as the last defecating end of giga-planet monopolies and mafias).
So, what happens if you "create" a "global" child in that medium? Usually chaos (some of those lands are in chaos at this moment), and the need for "global people" to rescue them. (In the end you generate a Trojan, more chaos, and local monsters that defend their land from the foreigners, whom they see as attackers.)
So, in the end, the most OLPC can do is provide a "transparent structure" that every land would fill with its own history and what it has in its blood.
BUT HOW?
How can you override the material from which the laptop is made? A necessary evil, some will say?
Most directly, people in these countries DON'T need (nor did they need in the past) computers.
They need peace.
They need the time and space to learn from their elders, to heal, and to cultivate the land; to learn from it, to recall what it is that this land produces, and how you should take care of it.
All that is not in a computer (although you can document it, it's not advised); it is in their will.
Introducing a big factory, the market economy, into these lands that CAN'T TAKE THEM, that don't have that history,
or SENDING THEM WEAPONS, WON'T HELP them achieve the reconciliation, the healing, or the sustainable growth their own land needs.
Even complaining and cursing, saying they aren't good people to do business with, is NOT WHAT THEY NEED. I mean, that only does harm.
In the end, interventionism generates a monster.
But... why is the aggressive reaction occurring in these lands? Why are people "hunting" each other there?
Is it because of interventionism and the aliens that "global culture" generates? (read: we are all living in America)
Is it because of the big factories placed in these poor lands? (poor in currency)
Is it because of the narrow social stairs that the big factories/market economy create for anyone who wants to "participate" in this economy?
Is it because of mafias/monopolies interfering in order to consolidate and consume those lands/people?
Is it because they are CONSUMING PRODUCTS THAT ARE NOT FROM THEIR OWN LAND? (which generates another type of alien).
In part, those are things Negroponte didn't take into account when laying out his plan,
and they are things the market economy will never think about. If it did think about them, it couldn't destroy and colonize new lands. (Read some resentment there.)
Google.
That's at least 62,257,761,200,000,000,000 nybbles! (roughly) You may need extra floppies.
just compress it with 7ZA and the 27 exab's should come down to about 640KB or so.
Just ZIP up the data to a smaller zip file. Then zip the zip file to an even smaller zip file. Repeat until all your data is compressed into a couple of megs. :-)
Come On People ... That was Plus 5 Funny.
I mean, gigabytes, what kind of unit is that? Is this some sort of Star Trek reference? I need to know how much data that is in *real* terms, like songs, pictures and libraries of congress.
Unfortunately the idea was crushed by a ruthlessly greedy band of small-minded bloodsuckers with large legal staffs.
They called it "Napster."
Ignoring even the spam issue, there's also the issue that Outlook encourages people to include the previous message in its entirety, causing an O(n^2) effect for legitimate message chains; that is, every message in a conversation tends to include all previous messages. This not only increases archival size, but it also causes mailboxes to approach their seemingly arbitrary upper bound on mailbox size much more rapidly than seems necessary.
It's a good example of how a single bad design decision can have amazingly multiplied consequences. If nothing else, you'd think Microsoft and the makers of other email tools could explore better ways of noticing and offering to remove the redundancy.
Kent M Pitman
Philosopher, Technologist, Writer
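The quadratic effect described in that comment can be made concrete with a little arithmetic. If each reply quotes everything before it, message k carries about k message bodies, so a thread of n messages archives 1 + 2 + ... + n = n(n+1)/2 bodies instead of n. A rough sketch, assuming every message adds the same amount of new text (the 2 KB figure and the function name are made up for illustration):

```python
def thread_bytes(n_messages, new_text_bytes, quote_all=True):
    """Total bytes archived for one conversation thread.

    With quote_all=True, message k contains its own text plus the full
    text of the k-1 messages before it. With quote_all=False, each
    message contains only its own new text.
    """
    total = 0
    for k in range(1, n_messages + 1):
        copies = k if quote_all else 1
        total += copies * new_text_bytes
    return total

for n in (10, 50, 100):
    quoted = thread_bytes(n, 2048)
    trimmed = thread_bytes(n, 2048, quote_all=False)
    # The overhead ratio grows linearly with thread length: (n + 1) / 2.
    print(n, quoted, trimmed, quoted / trimmed)
```

Under these assumptions the archive for a thread is (n + 1) / 2 times larger than it needs to be, so the longer the conversation, the worse the waste.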
You know, a SaganByte of storage. It would have to store Billions of billions of bytes.
DID SOMEBODY SAY PRECISELY 88 MILES PER HOUR!?// Im going back in time baby!
Sorry but that number seems quite small to me. I bet quite a few exabytes slipped through the cracks.
For those of you still thinking in the present and near-future (2010 is considered near-future in this case), stop it. It's bad for personal welfare and certainly a negative characteristic to have in the tech industry. I, on the other hand, prefer to operate 7-10 years ahead of the present and offer the following for your own edification:
http://gizmodo.com/gadgets/bell-curve/google-sees-the-world-in-an-ipod-by-2020-333439.php
Certainly with the ability to save all the world's content, archiving all of your org's data to your iPod won't be a problem.
In Charles Stross' book Glasshouse, the early 21st century is considered by the future one of the "Dark Ages" because of our use of proprietary formats and ephemeral storage media.
I suspect he's onto something!
Silly
... most of this will be documents in formats older than Office 2003.
Have gnu, will travel.
...immediately after you delete the email or file.
Business data really is pretty small. It really is just text for the most part.
Even if you start to scan every document, 500 gigabytes is going to be a lot of documents.
Most servers, I bet, are pretty small compared to what people are using at home. You just don't need to store video or even a lot of audio in most businesses.
Of course this doesn't apply to video production houses, print shops, or any place that actually deals with a lot of media data.
I know that my company's customer database is under one gigabyte in size. The accounting data is probably not a lot more, and our document management system is under 100 gigabytes. So yes, most of our data could fit on a 160 gigabyte iPod.
See my blog http://ilovecookes.blogspot.com/ for lighthearted technical information.
It's not ALL companies as you state in your post. Regulations requiring e-mail archives are only for publicly traded companies (i.e., those on the stock exchanges). Private companies have no such requirement.
...and to think the human genome is just a puny 800MB.
45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
So how many Libraries of Congress is the above figure? ME FAILED
So that's why Sun created ZFS. Doesn't even begin to fill up a Zettabyte file system.