27 Billion Gigabytes to be Archived by 2010

Posted by timothy on Tuesday January 1, 2008 @10:02AM from the if-not-sooner dept.

Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."

We have the prefixes, why not use them? by Valacosa · 2008-01-01 10:07 · Score: 5, Informative

In other words, 27 Exabytes?

Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
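For anyone who wants to check the arithmetic, here is a minimal sketch of the conversion, assuming the summary's figures are decimal (SI) units. The helper name `to_si` is made up for illustration.

```python
# SI (decimal) prefixes for bytes, each a factor of 1000 apart.
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def to_si(num_bytes: int) -> str:
    """Express a byte count with the largest SI prefix that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(SI_PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {SI_PREFIXES[index]}bytes"

petabytes = 27_000                  # figure from the summary
total_bytes = petabytes * 10**15    # assumes SI petabytes

print(total_bytes // 10**9)   # number of gigabytes: 27000000000
print(to_si(total_bytes))     # 27 exabytes
```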

--
"Live as if you'll die tomorrow." Ridiculous. You could die later today.
1. Re:We have the prefixes, why not use them? by mincognito · 2008-01-01 10:35 · Score: 4, Funny
  
  Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
  Exactly! For the thousandth time, let's cut out the exaggerated and sensational writing Slashdot! If I had a dollar for every sensational headline I've read here, not to mention the gazillion overstated comments I read here per day, I'd be a billionaire by now!
  
  --
  
  Ludwig Wittgenstein
2. Re:We have the prefixes, why not use them? by phoebusQ · 2008-01-01 10:37 · Score: 4, Insightful
  
  SI does exist for a reason: to allow for short, precise, descriptive, standardized measurements. However, the point of the numbers in this article is to show how absurdly large this amount of data really is. This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.
3. Re:We have the prefixes, why not use them? by thomasdz · 2008-01-01 10:42 · Score: 5, Insightful
  
  Yeah, but before the 1985 "Back to the Future" movie came out, how many "general public" people knew the prefix "Giga"? That's when I started hearing regular people start to use it.
  We gotta start using the prefixes before they start to become common. I'd rather see "27 Exabytes" followed by a parenthetical comment saying (27 Billion GigaBytes)
  
  --
  Karma: Excellent. 15 moderator points expire sometime.
4. Re:We have the prefixes, why not use them? by Anonymous Coward · 2008-01-01 10:50 · Score: 5, Funny
  
  No, you'd only be a thousand millionaire.
5. Re:We have the prefixes, why not use them? by mdwh2 · 2008-01-01 11:56 · Score: 5, Insightful
  
  Yes, but in Back to the Future, there wasn't a real need to explain how large "giga" really was, it was just there as a scientific-sounding buzzword. So whilst using the term in this article might have made people become familiar with the word, they wouldn't have any idea what size it actually meant.
  
  People didn't become familiar with Gigabyte because of Back to the Future anyway; they are familiar with it because that's what they now buy hard drives and iPods in. When they are sold in Exabytes, you'll see the term used in journalism too.
6. Re:We have the prefixes, why not use them? by SeaFox · 2008-01-01 16:59 · Score: 4, Insightful
  
  Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible.
  
  Joe Sixpacks digest technobabble at a rate that is relevant to them. While few would know what an Exabyte is, most would know what a Gigabyte is since they deal with numbers that size in relation to their own computing systems. I think it's less writing for sensationalism than it is writing in a language your audience will understand.
So, in other words... by thesymbolicfrog · 2008-01-01 10:07 · Score: 5, Interesting

From the summary:
"E-mail growth accounts for much of that figure."

We're archiving spam?
1. Re:So, in other words... by 4D6963 · 2008-01-01 10:19 · Score: 4, Insightful
  
  We're archiving spam?
  Which raises a question I find interesting: do we check for redundancy when archiving mail, so that we can save a hell of a lot of space on spam (and on legitimate automated messages)? Spam is, by definition, essentially the same message sent to many people. Also, couldn't correlating stored mail for redundancy allow for better spam identification? (It would be no silver bullet, though, since legitimate automated messages are often redundant too.)
  
  --
  You just got troll'd!
2. Re:So, in other words... by goodtim · 2008-01-01 10:59 · Score: 5, Interesting
  
  Actually, I have a partial answer to this question. As a sysadmin for a Novell GroupWise email system, I can tell you that the actual message data for duplicate incoming messages (such as spam sent to many people at the same time) is only stored on disk once. Some sort of "pointer" is used to reference the message from the individual users' mailboxes. Check out the docs if you are interested.
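The "store once, point to it from each mailbox" idea can be sketched roughly like this. This is not how GroupWise actually implements it (the comment doesn't say); it is a toy illustration that uses a content hash as the "pointer", and the class and method names are hypothetical.

```python
import hashlib

class SingleInstanceStore:
    """Toy mail store: each distinct message body is kept once; mailboxes hold references."""

    def __init__(self):
        self.bodies = {}      # digest -> message body bytes
        self.mailboxes = {}   # user -> list of digests (the "pointers")

    def deliver(self, user: str, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        self.bodies.setdefault(digest, body)               # stored only the first time
        self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def read(self, user: str):
        return [self.bodies[d] for d in self.mailboxes.get(user, [])]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.bodies.values())

store = SingleInstanceStore()
spam = b"Buy cheap storage now!"
for user in ["alice", "bob", "carol"]:
    store.deliver(user, spam)
store.deliver("alice", b"Quarterly report attached.")

# Four deliveries, but only two distinct bodies are kept on "disk".
print(len(store.bodies), store.stored_bytes())
```

Exact-match hashing only catches byte-identical messages; spam with per-recipient tweaks would need fuzzier matching.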
  
  That said, with about 1400 users (spread across multiple post offices), we have probably about 400 GB of email data. We are able to keep it low by having a 120-day retention policy. After that point, email can be archived locally; otherwise it's deleted. Independent of that, and to comply with regulations and disaster recovery scenarios, email data is backed up and replicated offsite using disk-to-disk backup (eVault, in case anyone is interested).
  This gives us the ability to archive email for up to 27 years or something like that (with relatively low storage costs because the disk-to-disk is incremental, storing changes at the per-block level).
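To show why per-block incrementals stay small, here is a rough sketch of block-level change detection. It is not eVault's actual mechanism; the 4096-byte block size and the function names are assumptions for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only

def block_digests(data: bytes, block_size: int = BLOCK_SIZE):
    """Hash each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def changed_blocks(previous: bytes, current: bytes, block_size: int = BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ from the previous backup.

    Truncation (current shorter than previous) is not handled in this sketch.
    """
    old = block_digests(previous, block_size)
    delta = {}
    for index, digest in enumerate(block_digests(current, block_size)):
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            delta[index] = current[start:start + block_size]
    return delta

base = bytes(BLOCK_SIZE * 4)      # four zero-filled blocks
modified = bytearray(base)
modified[5000] = 1                # touch one byte inside the second block
delta = changed_blocks(base, bytes(modified))

# Only the block containing the change needs to be shipped offsite.
print(sorted(delta), sum(len(b) for b in delta.values()))
```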
  As for Microsoft Exchange, I have not the slightest clue how data is stored.
  
  --
  "Flee at once, all is discovered."
E-mail growth... by Urger · 2008-01-01 10:07 · Score: 5, Funny

E-mail growth accounts for much of that figure.
They should have that looked at. A good dermatologist could remove it.

--
Ubiquitously - A Ubiquity Developer Community
Distributed Storage by Anonymous Coward · 2008-01-01 10:08 · Score: 3, Informative

Some big projects are generating so much data that they have trouble dealing with it all.
For example, Folding@home is implementing a distributed storage mechanism for its data, and we'll likely have a new @home project soon: Storage@home.
http://en.wikipedia.org/wiki/Storage@home
http://www.stanford.edu/~beberg/Storage@home2007.pdf
http://folding.stanford.edu/English/Papers#ntoc7
How Much do We Need to Store? by Zordak · 2008-01-01 10:12 · Score: 4, Insightful

E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.

--

Today's Sesame Street was brought to you by the number e.
duh...users store their files in their email! by Maskirovka · 2008-01-01 10:13 · Score: 4, Informative

article summary:

Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times as each user has their own copy. In the next few years products like SharePoint will alleviate some of that, though storage is cheap enough that it may not be worth the cost to both reeducate users and build the infrastructure for it. A SAN can hold a whole lot of Word documents and PDFs after all...
1. Re:duh...users store their files in their email! by Znork · 2008-01-01 10:53 · Score: 4, Insightful
  
  Better article summary:
  
  Storage vendors want to sell expensive solutions to gullible execs, pay analysts to produce credible-sounding FUD scenarios.
  
  "monthly e-mail traffic at more than 30 million messages, vs. 17 million just one year ago."
  
  Like, wow. In the meantime 500GB disks cost the same or less than 250GB disks did a year ago.
  
  "The university settled on an IBM storage infrastructure that will afford the institution 350TB of capacity"
  
  350TB? 350 disks? Half that in a year and a quarter in 2? That's not really a huge amount of storage. Anymore. It's an amount of storage I could go order from my friendly online computer store and get delivered tomorrow.
  
  The fact is, corporate storage isn't driving the market anymore; the consumer market is. Most people I know have more storage in their home PC than the average server requires. Companies want to save video? Consumers want their PVRs to save the cable TV stream.
2. Re:duh...users store their files in their email! by leenks · 2008-01-01 11:27 · Score: 3, Insightful
  
  More like 1000 or 2000 disks, not 350. 1TB drives haven't really hit the enterprise yet. The biggest SAS drives in use are still 300GB.
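The disk counts being thrown around here are easy to sanity-check. A minimal sketch, assuming decimal units and ignoring RAID, hot spares, and formatting overhead (the function name is made up):

```python
import math

def disks_needed(capacity_tb: float, disk_gb: float) -> int:
    """Raw disk count for a target capacity, with no redundancy or overhead."""
    return math.ceil(capacity_tb * 1000 / disk_gb)

# 350 TB on 1 TB, 500 GB, and 300 GB drives respectively.
for disk_gb in (1000, 500, 300):
    print(disk_gb, disks_needed(350, disk_gb))
```

With 300 GB drives the raw count already lands above 1000, and any redundancy or spares would push it higher still.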
2010 by Anonymous Coward · 2008-01-01 10:15 · Score: 5, Funny

All these archives are yours except Europa. ATTEMPT NO WRITINGS THERE.
Use standard units people understand. by jd · 2008-01-01 10:19 · Score: 3, Funny

Things like Libraries of Congress, Libraries of Alexandria, Spams per Square Inch. You know, the units that people have become familiar with. Besides which, are they power-of-two gigabytes or SI gigabytes? Also, how much bandwidth is needed to shift all that data? In the standard Imperial units of Clay Tablets per German Juggernaut per unit of French motorway, naturally.
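On the power-of-two question, the gap is not trivial at this scale. A quick sketch, assuming the article's figure is 27 SI exabytes (the article doesn't say which it means):

```python
GB = 10**9    # SI gigabyte
GIB = 2**30   # power-of-two gigabyte (gibibyte)

total_bytes = 27 * 10**18   # 27 exabytes, taken as SI units (an assumption)

in_gb = total_bytes / GB
in_gib = total_bytes / GIB

print(f"{in_gb:.3e} GB vs {in_gib:.3e} GiB")
print(f"difference: {(1 - in_gib / in_gb) * 100:.1f}%")
```

So the same pile of bytes is roughly 7% fewer "gigabytes" if you count in powers of two.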

--
It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
Surprising . . . by cashman73 · 2008-01-01 10:20 · Score: 3, Insightful
That 90% of that 27,000 petabyte figure isn't for archiving p0rn,... Although I guess, from the corporate IT perspective, they're not worried about backing up p0rn, since most people probably don't do that at work.
But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:
1. spam
2. Email forwards that have been sent 1,000 times that still have all the original message headers attached
3. Non-business-related multimedia emails sent by administrative assistants using the company's email and time to send and receive cutesy messages from/to their family & friends
4. Business-related powerpoint and multimedia emails by non-techie PHBs that don't know how to transfer such files via FTP, and who are too damn lazy to use a thumbdrive
Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech,...
This is starting to be Mandatory by Smordnys+s'regrepsA · 2008-01-01 10:54 · Score: 3, Funny

Only 27,000 petabytes? n00b!

My pr0n collection takes at least 3 Internets* to store, archived.

*(sorry, forgot the conversion rate for Libraries of Congress)

--
Just -1, Troll talking to another.
Re:Wow, welfare for programmers... by phoebusQ · 2008-01-01 11:04 · Score: 3, Interesting

How do you figure that storage needs driving the increase in disk capacities and creating jobs is "a huge drain on the economy"?

And what do data-archiving rules have to do with welfare for programmers? Maybe for disk manufacturing firms or data admins, but programmers?
Re:Moving away from Big Iron? by HockeyPuck · 2008-01-01 11:25 · Score: 3, Funny

FRYS isn't an acronym... :)

and yes I do.
will someone think of the kids! by metamorfoza · 2008-01-01 12:09 · Score: 5, Funny

Does it bother you that much that these journalists want to make it easier for the general public to understand how much data storage they are talking about?

I agree. However, I would go even further: instead of using geekish bytes and bits, we should use something like 400 billion MP3s. You know, so that MySpace users out there can understand TFA. They clearly have an interest in this sort of news.
Re:Moving away from Big Iron? by HockeyPuck · 2008-01-01 12:36 · Score: 3, Funny

We're talking storage (sorry DASD) here... It's all about...

Hooking up a pair of EMC DMX's (or IBM ESSes, or HDS USPs) over a pair of OC48s for SRDF/PPRC/USR unless you are a zOS shop, then you could run XRC. Since this is a BC/DR plan, we'll run it over FCIP protected by IPSec over a DWDM leased line, which must be protected by a UPSR/BLSR, otherwise in the event of a link failure, the R1s will split from the R2s.

Then you're SOL.
a helpful reference page for large numbers by HappyEngineer · 2008-01-01 13:59 · Score: 4, Interesting

Here is my helpful reference page for big numbers. I love big numbers. I'm actually working on a site right now which will help people to visualize big numbers. I can't give out the url yet because it'll be another month or two before it's ready to be seen. But, it'll have many fun options like Cow Stacking and Hamster Canyon.

Cow stacking is where you select cow as the animal and from earth to moon as the place and you'll see a graphic of cows being stacked to the moon and the number of cows which would be required to complete that stack.

Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.

--
Cow Cube
1. Re:a helpful reference page for large numbers by Anonymous Coward · 2008-01-02 00:08 · Score: 3, Funny
  
  Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.
  That's much better than Libraries of Congress. Most people haven't even seen the Library of Congress, but who hasn't seen huge piles of hamsters?