Slashdot Mirror


27 Billion Gigabytes to be Archived by 2010

Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."

18 of 178 comments

  1. How Much do We Need to Store? by Zordak · · Score: 4, Insightful

    E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.

    --

    Today's Sesame Street was brought to you by the number e.
    1. Re:How Much do We Need to Store? by Naturalis+Philosopho · · Score: 2, Insightful

      In the U.S., it's the law that a company must retain all electronic documents just in case they do ever have to go to court, for whatever reason. IMO, this is one of those very poorly thought out laws: 1) how do you punish a company for contempt when it can't hand over its e-mails, given that 2) almost nobody currently archives all of their e-mails? Also, how do you prove that you've not deleted any? Plus, how does anybody ever sort through them all during discovery? I pity that law clerk.

  2. Re:So, in other words... by 4D6963 · · Score: 4, Insightful

    We're archiving spam?

    Which raises a question I find interesting: do we check for redundancy when archiving mail, so that we can save a hell of a lot of space on spam (and on legitimate automated messages), since spam is by definition essentially the same message sent to many people? Also, couldn't correlating stored mail for redundancy allow for better spam identification? (It would be no silver bullet, though, since legitimate automated messages are often redundant too.)
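
    The redundancy check asked about here can be sketched as storing each distinct message body once, keyed by a content hash. This is only an illustration: the function names, the whitespace/case normalization, and the sample messages are assumptions, not a description of any real archiving product.

```python
import hashlib

def body_fingerprint(body: str) -> str:
    """Hash a message body after collapsing whitespace and case (assumed normalization)."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def archive(messages):
    """Store each distinct body once.

    Returns (store, index): store maps fingerprint -> body,
    index maps message id -> fingerprint.
    """
    store = {}
    index = {}
    for msg_id, body in messages:
        fp = body_fingerprint(body)
        store.setdefault(fp, body)  # keep only the first copy of each body
        index[msg_id] = fp
    return store, index

# Hypothetical sample data: two copies of one spam message and one real mail.
messages = [
    ("a1", "Buy cheap watches now!"),
    ("a2", "buy  cheap watches NOW!"),
    ("a3", "Quarterly report attached."),
]
store, index = archive(messages)
print(len(messages), "messages,", len(store), "stored bodies")
```

    An exact hash like this only catches copies that are identical after normalization; a message that differs by even one word gets a new fingerprint.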

    --
    You just got troll'd!
  3. Surprising . . . by cashman73 · · Score: 3, Insightful
    That 90% of that 27,000 petabyte figure isn't for archiving p0rn,... Although I guess, from the corporate IT perspective, they're not worried about backing up p0rn, since most people probably don't do that at work.

    But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:

    1. spam
    2. Email forwards that have been sent 1,000 times that still have all the original message headers attached
    3. Non-business-related multimedia emails sent by administrative assistants using the company's email and time to send and receive cutesy messages from/to their family & friends
    4. Business-related powerpoint and multimedia emails by non-techie PHBs that don't know how to transfer such files via FTP, and who are too damn lazy to use a thumbdrive

    Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech,...

    1. Re:Surprising . . . by houghi · · Score: 2, Insightful

      About 4. I do not understand management where I am.

      I make several Excel files every week for reporting. They are located on a shared drive. Only extra data is added every Monday, yet instead of putting a link to the files or the directory, management wants me to send them by email every week to several people.

      Utterly stupid, if you ask me.

      --
      Don't fight for your country, if your country does not fight for you.
  4. For Fucks sake by Colin+Smith · · Score: 2, Insightful

    Just delete the crap.

    --
    Deleted
  5. Re:We have the prefixes, why not use them? by N+Nomad · · Score: 1, Insightful

    Does it bother you that much that these journalists want to make it easier for the general public to understand how much data storage they are talking about? Please, get off your high horse, nerd. Find something better to do with yourself.

  6. Re:We have the prefixes, why not use them? by phoebusQ · · Score: 4, Insightful

    SI does exist for a reason: to allow for short, precise, descriptive, standardized measurements. However, the point of the numbers in this article is to show how absurdly large this amount of data really is. This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.

  7. Re:We have the prefixes, why not use them? by thomasdz · · Score: 5, Insightful

    Yeah, but before the 1985 "Back to the Future" movie came out, how many "general public" people knew the prefix "Giga"? That's when I started hearing regular people start to use it.
    We gotta start using the prefixes before they start to become common. I'd rather see "27 Exabytes" followed by a parenthetical comment saying (27 Billion GigaBytes)
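
    The figures being argued over are the same quantity written with different prefixes, which a few lines of arithmetic can confirm. A minimal sketch, assuming decimal (SI) prefixes where each step is a factor of 1,000; the `convert` helper is made up for illustration.

```python
# Decimal (SI) byte prefixes; binary prefixes (GiB, PiB, ...) are not used here.
PREFIX = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

def convert(value, src: str, dst: str) -> float:
    """Convert a quantity between two decimal byte units."""
    return value * PREFIX[src] / PREFIX[dst]

archive_pb = 27_000  # the article's figure, in petabytes
print(convert(archive_pb, "PB", "GB"), "GB")
print(convert(archive_pb, "PB", "EB"), "EB")
```

    Under that assumption, 27,000 petabytes comes out to 27 billion gigabytes, or 27 exabytes.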

    --
    Karma: Excellent. 15 moderator points expire sometime.
  8. Re:So, in other words... by Smordnys+s'regrepsA · · Score: 2, Insightful

    Good spammers use multiple methods of fooling spam scans.

    ~They use pictures of text, instead of text, so it takes more effort to filter based on content.

    ~They use random text at the bottom of their message to give the filter something to read.

    ~They generate random noise to superimpose over the picture. Every batch has a different noise layer.


    I'm sure they do more [IANASB - spam bot - so I wouldn't know the details] but the slight differences between what WE would perceive as the same message foil both the spam filters and your plan of reducing redundancy. If you find a way to implement your idea, please release it as FOSS! I'm sure you could get a Nobel Peace prize out of it, or at least some free (as in beer) drinks! :)

    --
    Just -1, Troll talking to another.
  9. Re:duh...users store their files in their email! by Znork · · Score: 4, Insightful

    Better article summary:

    Storage vendors want to sell expensive solutions to gullible execs, pay analysts to produce credible-sounding FUD scenarios.

    "monthly e-mail traffic at more than 30 million messages, vs. 17 million just one year ago."

    Like, wow. In the meantime 500GB disks cost the same or less than 250GB disks did a year ago.

    "The university settled on an IBM storage infrastructure that will afford the institution 350TB of capacity"

    350TB? 350 disks? Half that in a year and a quarter in 2? That's not really a huge amount of storage. Anymore. It's an amount of storage I could go order from my friendly online computer store and get delivered tomorrow.

    The fact is, corporate storage isn't driving the market anymore, the consumer market is. Most people I know have more storage in their home PC than the average server requires. Companies want to save video? Consumers want their PVRs to save the cable-tv stream.

  10. Re:Moving away from Big Iron? by phoebusQ · · Score: 2, Insightful

    FTFA, RAID, TFA, COTS, CPU, FC, GigE, FRYS, JBOD, CFS, CIFS, EMC, DMX, HDS, IBM, FC, FICON... 17+ acronyms in one post...that's pretty impressive. Do you kiss your mother with that mouth? :)

  11. Re:So, in other words... by Smordnys+s'regrepsA · · Score: 2, Insightful

    I'm simply saying, the same thing that stops spam from being blocked in the first place stops your idea from coming to fruition. Millions of almost, but not quite, the exact same Nigerian scams are sent/stored without us having the ability of accurately checking for redundancy. With ~95% of all email being spam, you could make millions if you developed a program/process for correctly identifying multiple emails that are almost, but not quite, the exact same email CORRECTLY as spam, instead of let's say... a forwarded quiz with answers about yourself that is almost, but not exactly, the same email as the original quiz with your friend's answers (or, insert your example here). Do that, and you not only have found a way to check for redundancy in email storage, you have found a way to stop the redundancy (or, ~95% of the redundancy) from happening in the first place (I'm sorry, the lameness filter has kicked in, please stop attempting to send spam through this email address).

    So, no, I'm not rejecting your idea outright. I'm saying that by the time it is possible, it won't be AS needed.
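
    The "almost, but not quite, the exact same email" problem can be illustrated with one common near-duplicate measure: Jaccard similarity over word shingles. This is a sketch of that general technique, not what any particular spam filter does; the shingle size and the sample texts are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    return len(sa & sb) / len(sa | sb)

# Hypothetical messages: the two spam copies differ only in a random trailing token.
spam_1 = "dear friend i am a prince and need your help moving funds xk3j"
spam_2 = "dear friend i am a prince and need your help moving funds q9zz"
report = "quarterly report attached for review today"

print(round(jaccard(spam_1, spam_2), 3))
print(jaccard(spam_1, report))
```

    The two spam copies score high and the unrelated message scores zero, but a score alone cannot tell spam from a legitimate forward that shares most of its text with the original, which is the false-positive risk raised above.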

    --
    Just -1, Troll talking to another.
  12. Re:duh...users store their files in their email! by leenks · · Score: 3, Insightful

    More like 1000 or 2000 disks, not 350. 1TB drives haven't really hit the enterprise yet. The biggest SAS drives in use are still 300GB.
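
    The disk counts being traded in this thread follow from dividing the 350TB target by the drive size. A rough sketch, assuming decimal units and raw capacity only; RAID overhead, hot spares, and formatting losses are ignored, so real counts would be higher.

```python
GB = 10**9
TB = 10**12

def disks_needed(capacity_bytes: int, disk_bytes: int) -> int:
    """Smallest number of disks whose raw capacity covers the target (ceiling division)."""
    return -(-capacity_bytes // disk_bytes)

target = 350 * TB  # the capacity quoted from the article
for size_gb in (300, 500, 1000):
    print(f"{size_gb} GB drives: {disks_needed(target, size_gb * GB)}")
```

    With 300GB drives the raw count is 1,167, and with 1TB drives it is 350, so both posters' numbers hold under their own drive-size assumptions.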

  13. Re:We have the prefixes, why not use them? by mdwh2 · · Score: 5, Insightful

    Yes, but in Back to the Future, there wasn't a real need to explain how large "giga" really was, it was just there as a scientific-sounding buzzword. So whilst using the term in this article might have made people become familiar with the word, they wouldn't have any idea what size it actually meant.

    People didn't become familiar with Gigabyte because of Back to the Future anyway; they are familiar with it because that's what they now buy hard drives and iPods in. When they are sold in Exabytes, you'll see the term used in journalism too.

  14. So that's about 20 billion gigabytes of data... by LordHuggington · · Score: 2, Insightful

    that will be lost or stolen as company employees fail to properly encrypt back-ups, leave laptops in their car while running in for a latte or some such? Seriously, though, the article says storage is corporations' number 2 concern. What's number one from this survey? Is it security?

  15. Re:We have the prefixes, why not use them? by SeaFox · · Score: 4, Insightful

    Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible.


    Joe Sixpacks digest technobabble at a rate that is relevant to them. While few would know what an Exabyte is, most would know what a Gigabyte is since they deal with numbers that size in relation to their own computing systems. I think it's less writing for sensationalism than it is writing in a language your audience will understand.
  16. Re:We have the prefixes, why not use them? by SharpFang · · Score: 2, Insightful

    This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.

    No, standard != wrong.

    In this case, there's precisely the same thing wrong that is wrong with all of journalism: using specific language constructs to push certain emotional messages along with information. AKA manipulation.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2