Slashdot Mirror


27 Billion Gigabytes to be Archived by 2010

Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone, electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."

178 comments

  1. We have the prefixes, why not use them? by Valacosa · · Score: 5, Informative

    In other words, 27 Exabytes?

    Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
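A quick sanity check of the conversion, as a minimal Python sketch. It assumes decimal SI prefixes (each step a factor of 1,000); the helper `si_format` is hypothetical, written only for this illustration.

```python
# Decimal (SI) prefixes: each step up is a factor of 1,000.
SI_PREFIXES = ["", "kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def si_format(num_bytes: int) -> str:
    """Hypothetical helper: express a byte count with the largest SI prefix
    that keeps the leading number at or above 1."""
    value = float(num_bytes)
    index = 0
    while value >= 1000 and index < len(SI_PREFIXES) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {SI_PREFIXES[index]}bytes"


PETABYTE = 10**15
GIGABYTE = 10**9

archive = 27_000 * PETABYTE                # the summary's 27,000 petabytes
assert archive == 27 * 10**9 * GIGABYTE    # ...which is 27 billion gigabytes
print(si_format(archive))                  # 27 exabytes
```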

    --
    "Live as if you'll die tomorrow." Ridiculous. You could die later today.
    1. Re:We have the prefixes, why not use them? by N Nomad · · Score: 1, Insightful

      Does it bother you that much that these journalists want to make it easier for the general public to understand how big data storage they are talking about? Please, get off your high horse, nerd. Find something better to do with yourself.

    2. Re:We have the prefixes, why not use them? by mincognito · · Score: 4, Funny

      Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
      Exactly! For the thousandth time, let's cut out the exaggerated and sensational writing Slashdot! If I had a dollar for every sensational headline I've read here, not to mention the gazillion overstated comments I read here per day, I'd be a billionaire by now!
    3. Re:We have the prefixes, why not use them? by failedlogic · · Score: 0, Redundant

      I've read reports that journalists have a million billion* words in their vocabularies. Exabyte seems to be one of the few missing.

      *Sorry, had to. ;)

    4. Re:We have the prefixes, why not use them? by phoebusQ · · Score: 4, Insightful

      SI does exist for a reason: to allow for short, precise, descriptive, standardized measurements. However, the point of the numbers in this article is to show how absurdly large this amount of data really is. This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.

    5. Re:We have the prefixes, why not use them? by thomasdz · · Score: 5, Insightful

      Yeah, but before the 1985 "Back to the Future" movie came out, how many "general public" people knew the prefix "Giga"? That's when I started hearing regular people use it.
      We gotta start using the prefixes so they can become common. I'd rather see "27 Exabytes" followed by a parenthetical comment saying (27 Billion GigaBytes).

      --
      Karma: Excellent. 15 moderator points expire sometime.
    6. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      You must be new here.

    7. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 5, Funny

      No, you'd only be a thousand millionaire.

    8. Re:We have the prefixes, why not use them? by redalien · · Score: 1

      I'd settle for knowing if they mean 27 exabytes or 27 exbibytes
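Which one they mean matters more the bigger the prefix gets. A minimal sketch, assuming the IEC binary prefixes (kibi-, mebi-, ..., exbi-, each a power of 1,024) as the alternative to SI's powers of 1,000; `binary_excess` is a hypothetical helper:

```python
# SI prefixes step by 1000; IEC binary prefixes step by 1024.
LEVELS = ["kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi", "peta/pebi", "exa/exbi"]


def binary_excess(level: int) -> float:
    """Percent by which the binary unit exceeds the SI unit (level 1 = kilo)."""
    return (1024**level / 1000**level - 1) * 100


for level, name in enumerate(LEVELS, start=1):
    print(f"{name:<10} {binary_excess(level):5.1f}%")

eb = 27 * 1000**6   # 27 exabytes
eib = 27 * 1024**6  # 27 exbibytes
print(f"27 EiB - 27 EB = {eib - eb:,} bytes")
```

At the kilo level the gap is about 2.4%; by exa it has grown to about 15.3%, so 27 EiB is roughly 4.1 exabytes more than 27 EB.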

    9. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 1, Informative

      I'm no nerd (well, I guess I'm a wannabe nerd since I'm reading Slashdot), and true, I wouldn't have known how much an exabyte is. But a billion is such a large number that I can't really comprehend that either. I agree with the OP, though, that exabyte should have been used.

      kilobyte
      megabyte
      gigabyte
      terabyte
      petabyte
      exabyte

      Seeing it like that, when you can relate it to the other ones, makes it easier to understand than "a billion, gajillion, fafillion bytes!"

    10. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      Only?! He could withdraw ten million $100 bills from the bank!

    11. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      Exactly! For the thousandth time, let's cut out the exaggerated and sensational writing Slashdot! If I had a dollar for every sensational headline I've read here, not to mention the gazillion overstated comments I read here per day, I'd be a billionaire by now!

      Twice over, no doubt.

    12. Re:We have the prefixes, why not use them? by mdwh2 · · Score: 5, Insightful

      Yes, but in Back to the Future, there wasn't a real need to explain how large "giga" really was, it was just there as a scientific-sounding buzzword. So whilst using the term in this article might have made people familiar with the word, they wouldn't have had any idea what size it actually meant.

      People didn't become familiar with Gigabyte because of Back to the Future anyway; they are familiar with it because that's the unit they now buy hard drives and iPods in. When those are sold in exabytes, you'll see the term used in journalism too.

    13. Re:We have the prefixes, why not use them? by callmetheraven · · Score: 1

      Only we all thought they were "Jigawatts," a word that sounds politically incorrect but is actually meaningless.

      --
      You can have my SIG when you pry it from my cold, dead hands.
    14. Re:We have the prefixes, why not use them? by platykurtic · · Score: 1

      Are you sure you wouldn't be a million-thousandaire?

    15. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      Then why not just say 20615843020800 floppy disks, or 42409734214.21 CDs?

      That would even be easier for the general public to understand.

      Even easier, about 29686813949952 pornographic images.
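The AC's figures check out under one set of assumptions: 27 EiB of data (binary exbibytes), 1.44 MiB per floppy, 700 MiB per CD and 1 MiB per image. Those per-item sizes are inferred from the numbers rather than stated in the comment, so treat them as assumptions. A quick check:

```python
from fractions import Fraction

MiB = 2**20
TOTAL = 27 * 2**60  # assumption: 27 exbibytes (binary), not 27 exabytes

# Assumed per-item sizes, inferred from the comment's numbers.
FLOPPY = Fraction(144, 100) * MiB  # "1.44 MB" taken as 1.44 MiB
CD = 700 * MiB
IMAGE = 1 * MiB

floppies = Fraction(TOTAL) / FLOPPY  # exact rational arithmetic
cds = Fraction(TOTAL) / CD
images = TOTAL // IMAGE

print(floppies)               # 20615843020800
print(f"{float(cds):.2f}")    # 42409734214.22
print(images)                 # 29686813949952
```

One wrinkle: a real "1.44 MB" floppy holds 1,474,560 bytes (1,440 × 1,024), slightly less than 1.44 MiB, so the true floppy count would be a little higher, about 2.11 × 10^13.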

    16. Re:We have the prefixes, why not use them? by ILuvRamen · · Score: 1

      but it sounds so much cooler. In fact, they should have counted it in bytes. I believe that would be 27 fuckingbigillion bytes. Now that sounds impressive!

      --
      Google's Super Secret Search Algorithm: SELECT @search_results FROM internet WHERE @search_results = 'good'
    17. Re:We have the prefixes, why not use them? by Yez70 · · Score: 1

      Journalists must write at an eighth grade level or the majority of their readers would not be able to understand them. Of course, for an arrogant intellect, such as yourself, maybe you should just stop reading so you can be happy.

    18. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0, Troll

      Slashdot isn't the general public, jackass.

    19. Re:We have the prefixes, why not use them? by dasmoo · · Score: 1

      I'm fairly sure he was saying 1.21 Jigawatts anyway

    20. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0
      In other words, 27 Exabytes?

      Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.

      Holy screaming shit -- that's more than one-quarter pornobyte.

    21. Re:We have the prefixes, why not use them? by Slashdot Suxxors · · Score: 1

      Calling someone 'nerd' on /. is a bit redundant, wouldn't you say?

    22. Re: Re: We have the prefixes, why not use them? by Moodie-1 · · Score: 1

      Yeah, but before the 1985 "Back to the Future" movie came out, how many "general public" people knew the prefix "Giga"?
      Yeah, even though Professor Brown should have known better than to pronounce 'gigawatts' as 'jiggawatts'. I guess you can't expect scriptwriters to know much about science. So where was the science advisor on this flick?
    23. Re:We have the prefixes, why not use them? by SeaFox · · Score: 4, Insightful

      Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible.


      Joe Sixpacks digest technobabble at a rate that is relevant to them. While few would know what an Exabyte is, most would know what a Gigabyte is since they deal with numbers that size in relation to their own computing systems. I think it's less writing for sensationalism than it is writing in a language your audience will understand.
    24. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      No, you'd only be a thousand millionaire.

      That will make him a milliardaire, according to SI.

    25. Re:We have the prefixes, why not use them? by infidel13 · · Score: 0

      I want a one exabyte iPod!

      --
      quia potentia mens mentis
    26. Re:We have the prefixes, why not use them? by WaroDaBeast · · Score: 2, Informative

      Among ordinary people, at least, SI only seems to exist outside the UK and the US.

      --
      "The body may heal, but the mind is not always so resilient." -- Deus Ex: Human Revolution
    27. Re:We have the prefixes, why not use them? by SharpFang · · Score: 2, Insightful

      This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.

      No, standard != wrong.

      In this case, what's wrong is precisely what's wrong with all of journalism: using specific language constructs to push emotional messages along with the information. AKA manipulation.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
    28. Re:We have the prefixes, why not use them? by joost · · Score: 1

      Yes, good idea. Let's all dumb down every "teh diffucult" words because we need the lowest common idiot to understand it all! Especially tech articles on a tech website!

    29. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      Isn't this site Slashdot: News for nerds, stuff that matters? I think it's funny that somebody even brought this up. The intended audience of this site is nerds, so using the word nerd as an insult is interesting, to say the least.

    30. Re:We have the prefixes, why not use them? by mdwh2 · · Score: 2, Informative

      I'm fairly sure he was saying 1.21 Jigawatts anyway

      That's just a different way of pronouncing Gigawatts :)

    31. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 1, Funny

      Would you yanks please learn to count! Million Millionaire.

    32. Re:We have the prefixes, why not use them? by CarpetShark · · Score: 1

      there wasn't a real need to explain how large "giga" really was, it was just there as a scientific-sounding buzzword.


      Jiggawhats is scientific-sounding? Are you sure? ;)
    33. Re:We have the prefixes, why not use them? by Anonymous Coward · · Score: 0

      the point of the numbers in this article is to show how absurdly large this amount of data really is
      In that case, why not use the well-introduced and unambiguous term gazillion? Slashdot needs to be bolder in its quest to become n00z 4 n00bz.
    34. Re:We have the prefixes, why not use them? by ogminlo · · Score: 1

      Indeed. Describing a binary counting system (bytes) with base-ten numbers is dishonest and confusing. It only washes here since this is an estimate, but this is the same reason we buy storage with an asterisk on it telling us that, for the purposes of bogus marketing, a GB = 1000 MB. The general public is not doing the math in their heads to comprehend the scale this headline describes, so PB or EB is more appropriate and more accurate.

    35. Re:We have the prefixes, why not use them? by bobbocanfly · · Score: 0, Flamebait

      The worrying thing here is not that you are being completely retarded (last time I checked, 1-exabyte arrays were a little bigger than an iPod) but that you actually want to buy an iPod.

    36. Re:We have the prefixes, why not use them? by argStyopa · · Score: 1

      Is that 1024-millions, or only 1000?

      --
      -Styopa
    37. Re:We have the prefixes, why not use them? by poot_rootbeer · · Score: 1

      Does it bother you that much that these journalists want to make it easier for the general public to understand how big data storage they are talking about?

      Scientific notation makes that goal extremely simple to obtain. Or at least, it would, if journalists could trust that their audiences had the basic high-school level understanding that they ought to have.

      Concepts like "million" and "billion" are hard to visualize and even harder to distinguish, and that's without the regionalization issue over whether 1 billion means 1 thousand million or 1 million million.
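To put numbers on the regional ambiguity: read with a short-scale billion (10^9), the headline's "27 billion gigabytes" is 27 exabytes; read with a long-scale billion (10^12), it would be 1,000 times larger. A minimal sketch, assuming decimal gigabytes:

```python
GIGABYTE = 10**9  # assumption: decimal gigabytes

SHORT_SCALE_BILLION = 10**9   # a thousand million (US usage, and UK usage today)
LONG_SCALE_BILLION = 10**12   # a million million (older British usage, much of Europe)

short_reading = 27 * SHORT_SCALE_BILLION * GIGABYTE
long_reading = 27 * LONG_SCALE_BILLION * GIGABYTE

print(f"short scale: {short_reading:.1e} bytes, i.e. 27 exabytes")
print(f"long scale:  {long_reading:.1e} bytes, i.e. 27 zettabytes")
print(f"the readings differ by a factor of {long_reading // short_reading}")
```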

    38. Re:We have the prefixes, why not use them? by wed128 · · Score: 1

      I think you mean

      !standard = wrong

      or maybe even

      ~standard = wrong

      but

      (standard != wrong) == wrong

    39. Re:We have the prefixes, why not use them? by NASA NERD · · Score: 1

      I agree, it's stupid: we have so many words but no one uses them. You don't say "I have 1024 gigabytes" or "I have 1048576 megabytes", you say "I have 1 terabyte." We may have these words and abbreviations, but people are just scared of them.

      --
      Scotty thats not funny! Beam down my clothes RIGHT NOW!-Capt. Kirk
    40. Re:We have the prefixes, why not use them? by madprof · · Score: 1

      Isn't that a type of French shop?

    41. Re:We have the prefixes, why not use them? by SharpFang · · Score: 1

      standard != right.

      --
      45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
  2. So, in other words... by thesymbolicfrog · · Score: 5, Interesting

    From the summary:
    "E-mail growth accounts for much of that figure."

    We're archiving spam?

    1. Re:So, in other words... by 4D6963 · · Score: 4, Insightful

      We're archiving spam?

      Which raises a question I find interesting: do we check for redundancy when archiving mails, in a way so that we can save a hell of a lot of space on spam (and other legitimate automated messages), since spam is by definition essentially the same message sent to a number of people? Also, couldn't correlating stored mails for redundancy allow for better spam identification (although it would be no silver bullet, since legitimate automated messages are often redundant)?

      --
      You just got troll'd!
    2. Re:So, in other words... by webmaster404 · · Score: 1

      Exactly. Outside of companies, e-mail use is declining in favor of IM and text messaging. Because of spam and other factors, I strongly disagree that e-mail will grow that much. With cross-platform IM clients such as Pidgin, the OS is no obstacle to IM, and among young people, IM and text messaging have made e-mail unnecessary.

      --
      There is no "disagree" moderation, and troll, flamebait and overrated are not valid substitutes
    3. Re:So, in other words... by wizardforce · · Score: 1

      homeland security.

      --
      Sigs are too short to say anything truly profound so read the above post instead.
    4. Re:So, in other words... by Anonymous Coward · · Score: 0

      That's a shitload of v1agr7 and c1ali5. It will keep future digital archaeologists up all night (if it's longer than 4 hours they should see their holographic doctor).

    5. Re:So, in other words... by Smordnys s'regrepsA · · Score: 2, Insightful

      Good spammers use multiple methods of fooling spam scans.

      ~They use pictures of text, instead of text, so it takes more effort to filter based on content.

      ~They use random text at the bottom of their message to give the filter something to read.

      ~They generate random noise to superimpose over the picture. Every batch has a different noise layer.


      I'm sure they do more [IANASB (I am not a spam bot), so I wouldn't know the details], but the slight differences between what WE would perceive as the same message foil both the spam filters and your plan for reducing redundancy. If you find a way to implement your idea, please release it as FOSS! I'm sure you could get a Nobel Peace Prize out of it, or at least some free (as in beer) drinks! :)

      --
      Just -1, Troll talking to another.
    6. Re:So, in other words... by Anonymous Coward · · Score: 0

      U yoots myt tink IM n txtmsgng mak emil needlesh, buht ish jst duh saim ting.

      DOIK!

    7. Re:So, in other words... by goodtim · · Score: 5, Interesting

      Actually, I have a partial answer to this question. As a sysadmin for a Novell GroupWise email system, I can tell you that the actual message data for duplicate incoming messages (such as spam sent to many people at the same time) is stored on disk only once. Some sort of "pointer" is used to reference the message from each user's mailbox. Check out the docs if you are interested.

      That said, with about 1400 users (spread across multiple post offices), we have about 400 GB of email data. We are able to keep it low by having a 120-day retention policy. After that point, email can be archived locally; otherwise it's deleted. Independent of that, and to comply with regulations and disaster recovery scenarios, email data is backed up and replicated offsite using disk-to-disk backup (eVault, in case anyone is interested).

      This gives us the ability to archive email for up to 27 years or something like that (with relatively low storage costs because the disk-to-disk is incremental, storing changes at the per-block level).

      As for Microsoft Exchange, I have not the slightest clue how data is stored.
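The pointer scheme described above is generally called single-instance storage. The sketch below is not GroupWise's actual storage format (which the comment doesn't describe); it is a generic illustration, assuming SHA-256 digests as the pointers, reference counts to know when a shared copy can be freed, and a purge modelled on the 120-day retention policy mentioned above. All names are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SingleInstanceStore:
    """Hypothetical single-instance mail store: each distinct message is kept
    once, and every mailbox holds only a pointer (its SHA-256 digest)."""

    blobs: dict = field(default_factory=dict)      # digest -> message bytes
    refcounts: dict = field(default_factory=dict)  # digest -> number of pointers
    mailboxes: dict = field(default_factory=dict)  # user -> [(digest, received)]

    def deliver(self, users, body: bytes, received: datetime) -> str:
        """Store the body once (if new) and add a pointer to each mailbox."""
        digest = hashlib.sha256(body).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = body
            self.refcounts[digest] = 0
        for user in users:
            self.mailboxes.setdefault(user, []).append((digest, received))
            self.refcounts[digest] += 1
        return digest

    def purge_older_than(self, now: datetime, days: int = 120) -> None:
        """Drop pointers older than the retention window; free orphaned bodies."""
        cutoff = now - timedelta(days=days)
        for user, items in list(self.mailboxes.items()):
            kept = []
            for digest, received in items:
                if received >= cutoff:
                    kept.append((digest, received))
                    continue
                self.refcounts[digest] -= 1
                if self.refcounts[digest] == 0:  # last pointer gone
                    del self.refcounts[digest]
                    del self.blobs[digest]
            self.mailboxes[user] = kept

    def stored_bytes(self) -> int:
        return sum(len(body) for body in self.blobs.values())


# One spam message delivered to 1,400 mailboxes is stored once.
store = SingleInstanceStore()
spam = b"Dear friend, I am a prince..." * 100
store.deliver([f"user{i}" for i in range(1400)], spam, datetime(2008, 1, 1))
print(store.stored_bytes() == len(spam))  # True

# Five months later the retention purge frees it entirely.
store.purge_older_than(datetime(2008, 6, 1))
print(store.stored_bytes())  # 0
```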

      --
      "Flee at once, all is discovered."
    8. Re:So, in other words... by 4D6963 · · Score: 1

      OK, so basically you're dismissing my entire idea (which was partly a question; I mean, why wouldn't it already be done to a certain extent?) just because some unknown (to you and me) fraction of the spam data isn't redundant.

      That would be kind of like saying "Why bother implementing compressed file systems? Most people fill their disks with files that can't be significantly compressed anyway!" Sure, but you've still got millions of copies of the exact same Nigerian scams out there, stored without any redundancy check, or so I presume.

      --
      You just got troll'd!
    9. Re:So, in other words... by multipartmixed · · Score: 1

      I suspect that the behaviour you're describing is only for the case when multiple deliveries occur via a single SMTP transaction (i.e. multiple RCPT TO commands before DATA) rather than the general case of messages-which-happen-to-be-identical, which is what the OP was positing.

      Either that, or when the sending system sends the same message in multiple transactions (i.e. a poor mailer, or a mailer interrupted by a 452 response code) and the messages have the same Message-ID header.

      That said, the original poster makes an assumption that identical-looking messages are likely to be indistinguishable, which in fact they are not, unless generated by a non-compliant mailer and probably received by a non-compliant mailer as well. Message-ID must vary from message to message, and the Date and Received: headers are extremely likely to vary from message to message.

      So the OP then faces a HUGE search problem which will only "hit" when the sending MTA, and probably the receiving MTA, are non-conformant. This is unlikely to occur with any great frequency, making that search heuristic non-productive. He'd have better luck archiving large message fragments with some Huffman-coding variant (and surely much better could be done with a little thought).

      --

      Do daemons dream of electric sleep()?
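The point about headers can be demonstrated with Python's standard `email` package: two copies of the same spam differ in their Message-ID, Date and Received headers, so hashing the raw messages finds no duplicate, while hashing only the parsed body does. A minimal sketch; the sample messages and hostnames are made up.

```python
import hashlib
from email import message_from_string

TEMPLATE = """Received: from relay{n}.example.net by mx.example.org; {date}
Message-ID: <{n}@spam.example.net>
Date: {date}
From: prince@example.net
Subject: Business proposal

Dear friend, I have a business proposal for you.
"""

copies = [
    TEMPLATE.format(n=1, date="Mon, 14 Jan 2008 10:00:00 +0000"),
    TEMPLATE.format(n=2, date="Mon, 14 Jan 2008 10:00:07 +0000"),
]


def sha(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


raw_hashes = {sha(c) for c in copies}
body_hashes = {sha(message_from_string(c).get_payload()) for c in copies}

print(len(raw_hashes))   # 2: headers make the raw messages distinct
print(len(body_hashes))  # 1: the bodies are identical
```

This only catches bodies that are byte-for-byte identical; as noted earlier in the thread, spammers deliberately vary the body with random text and image noise.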
    10. Re:So, in other words... by Smordnys s'regrepsA · · Score: 2, Insightful

      I'm simply saying, the same thing that stops spam from being blocked in the first place stops your idea from coming to fruition. Millions of almost, but not quite, identical Nigerian scams are sent and stored without our being able to accurately check for redundancy. With ~95% of all email being spam, you could make millions if you developed a program/process for CORRECTLY identifying as spam multiple emails that are almost, but not quite, the same email, instead of, let's say... a forwarded quiz with answers about yourself that is almost, but not exactly, the same email as the original quiz with your friend's answers (or insert your example here). Do that, and you have not only found a way to check for redundancy in email storage, you have found a way to stop the redundancy (or ~95% of it) from happening in the first place (I'm sorry, the lameness filter has kicked in, please stop attempting to send spam through this email address).

      So, no, I'm not rejecting your idea outright. I'm saying that by the time it is possible, it won't be AS needed.

      --
      Just -1, Troll talking to another.
    11. Re:So, in other words... by 4D6963 · · Score: 1

      That said, the original poster makes an assumption that identical-looking messages are likely to be indistinguishable

      No, I make the assumption that identical-looking messages have most of their data in common, and that this common data, even if only a chunk of the message starting and stopping at an arbitrary point, could be stored efficiently.

      That means cutting messages into blocks, if it is found that some part has something in common with another one, to store common blocks of data all in one place. This way, a personalised message with only a few words varying from one copy to another would have all of its redundant data stored only once.

      Here's how it would be stored: once messages are correlated and all the similarities identified, messages would be cut into blocks depending on whether those blocks are unique or redundant. Every block would be given an ID and stored under that ID, and mails would be stored as lists of these IDs.

      --
      You just got troll'd!
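The block scheme described above (cut messages into blocks, give each block an ID, store each mail as a list of IDs) can be sketched in Python. The comment doesn't say how block boundaries are chosen, so this sketch swaps in content-defined chunking: a rolling hash over a small window decides where blocks end, so a few changed words only disturb the blocks around them. The window size, divisor, minimum chunk size and the names `chunk` and `BlockStore` are all illustrative assumptions.

```python
import hashlib
import random

WINDOW = 16      # bytes covered by the rolling hash (illustrative)
DIVISOR = 64     # a boundary fires on roughly 1 in 64 positions (illustrative)
MIN_CHUNK = 32   # avoid very small blocks (illustrative)
BASE = 257
MOD = (1 << 31) - 1


def chunk(data: bytes) -> list:
    """Content-defined chunking with a Rabin-Karp style rolling hash.

    A block ends wherever the hash of the last WINDOW bytes is divisible by
    DIVISOR, so boundaries follow the content rather than fixed offsets.
    """
    chunks, start, h = [], 0, 0
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 - start >= MIN_CHUNK and h % DIVISOR == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class BlockStore:
    """Hypothetical store: each unique block is kept once under its SHA-256 ID,
    and a message is just the list of its block IDs."""

    def __init__(self):
        self.blocks = {}

    def put(self, message: bytes) -> list:
        ids = []
        for block in chunk(message):
            block_id = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(block_id, block)
            ids.append(block_id)
        return ids

    def get(self, ids) -> bytes:
        return b"".join(self.blocks[block_id] for block_id in ids)


# Two "personalised" copies of the same long message.
rng = random.Random(42)
words = ["invoice", "meeting", "quarterly", "report", "attached",
         "please", "review", "budget", "deadline", "thanks"]
body = " ".join(rng.choice(words) for _ in range(600)).encode()
store = BlockStore()
ids_alice = store.put(body + b"\n-- Regards, Alice")
ids_bob = store.put(body + b"\n-- Regards, Bob")

raw = 2 * len(body) + len(b"\n-- Regards, Alice") + len(b"\n-- Regards, Bob")
stored = sum(len(block) for block in store.blocks.values())
print(f"raw: {raw} bytes, stored: {stored} bytes")
```

With fixed-size blocks, inserting one byte near the start would shift every later block boundary and defeat the deduplication; content-defined boundaries avoid that, which is why deduplication systems commonly use them.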
    12. Re:So, in other words... by 4D6963 · · Score: 1

      I see, but my idea is more focused on solving the storage problem, and to get around the "95% redundancy" problem my idea was based on cutting messages into blocks depending on whether they're redundant or unique, as described here.

      --
      You just got troll'd!
    13. Re:So, in other words... by TapeCutter · · Score: 1

      "That means cutting messages into blocks, if it is found that some part has something in common with another one, to store common blocks of data all in one place."

      Substitute "words" for "blocks" and you will find you have invented a dictionary.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    14. Re:So, in other words... by 4D6963 · · Score: 1

      Substitute "words" for "blocks" and you will find you have invented a dictionary.

      Duh, of course by blocks I mean blocks of a significant threshold size. You're just nitpicking ;-)

      --
      You just got troll'd!
    15. Re:So, in other words... by LoudMusic · · Score: 2, Interesting

      From the summary:
      "E-mail growth accounts for much of that figure."

      We're archiving spam?

      No, we have associates using their email as a file storage device, sending documents to each other through email rather than just sending an email that says "Your 38MB file is on the file server in /X/here/where/there/document.type".
      --
      No sig for you. YOU GET NO SIG!
    16. Re:So, in other words... by TapeCutter · · Score: 2, Informative

      "You're just nitpicking ;-)"

      Ummm, no. I have CS degree and 20yrs experience. What you are talking about is attacking the problem of redundant information by comparing blocks; this has already been 'solved'. ;)

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    17. Re:So, in other words... by damn_registrars · · Score: 1

      we check for redundancy when archiving mails, in a way so that we can save a hell of a lot of space on spam
      I could see that helping if the same spam is sent to the clients on your network, but it doesn't account for all the subsequent iterations of the spam.

      YMMV, but I see a lot of spam carrying highly varied introductory garbage (to attempt to fool spam filtering software, of course). Some of my email accounts easily receive 10x as much spam as legitimate email, which would make a redundancy check difficult to apply.

      But if it works for you, then more power to you.
      --
      Damn_registrars has no butt-hole. Damn_registrars has no use for a butt-hole.
    18. Re:So, in other words... by Anonymous Coward · · Score: 0

      Are you two fellas having fun jerking each other off over this stupid fucking topic?

    19. Re:So, in other words... by igny · · Score: 1

      We're archiving spam?

      Archiving is the best way to deal with any unnecessary and unneeded information, spam included. So many times I archived my workfiles with the thought that if I don't open that archive in 12 months, it is all junk and I can just toss it away. I believe my brain is working the same way only faster. What are we talking about again?

      --
      In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
    20. Re:So, in other words... by buggycoder · · Score: 1

      Data deduplication is the term used to describe techniques that avoid storing multiple copies of the same bit streams. Avamar (www.avamar.com), since acquired by EMC, and Data Domain (www.datadomain.com) have products based on this technology.

    21. Re:So, in other words... by Anonymous Coward · · Score: 0

      In this scenario, you have to first hash every email on your server, then submit that hash to every other email server in the whole world to compare against the emails stored on their end.

      You're talking about a problem with a complexity of at least O(n^2). While not exactly intractable, there are a hell of a lot of email servers and messages out there to check for redundancy!

      Correlating on a local level only (might) be doable.
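Locally, at least, finding exact duplicates need not be pairwise: keying a hash table by each message's digest finds all identical messages in one pass, with an expected constant-time lookup per message instead of comparing every pair. A minimal sketch; `group_duplicates` and the sample inbox are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_duplicates(messages):
    """Group identical messages in one pass: expected O(n) hash-table
    operations instead of comparing every pair (O(n^2))."""
    groups = defaultdict(list)
    for index, body in enumerate(messages):
        groups[hashlib.sha256(body).digest()].append(index)
    return [indices for indices in groups.values() if len(indices) > 1]


inbox = [b"cheap pills", b"meeting at 3", b"cheap pills", b"lunch?", b"cheap pills"]
print(group_duplicates(inbox))  # [[0, 2, 4]]
```

Hashing cost grows linearly with the total number of bytes, and SHA-256 collisions are not a practical concern here. What this does not address is the cross-server scenario above, which is a coordination problem rather than a comparison-cost problem.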

    22. Re:So, in other words... by 4D6963 · · Score: 1

      Ummm, no. I have CS degree and 20yrs experience.

      And? You were nitpicking anyways... Yay, a Wikipedia link that's barely even relevant! Anyways, maybe that's already been 'solved', but the question is not whether this has ever been solved but if it's ever been implemented as such for e-mail storage. But maybe you can tell me what's flawed with my idea of (large) block redundancy detection for e-mail storage to begin with instead of rubbing your credibility in my face.

      --
      You just got troll'd!
    23. Re:So, in other words... by 4D6963 · · Score: 1

      you have to first hash every email on your server, then submit that hash to every other email server in the whole world

      I didn't talk about hashing entire e-mails but parts of e-mails (which makes the problem more complicated) and then, who talked about other e-mail servers in the rest of the world? Why would you wanna do that?

      --
      You just got troll'd!
    24. Re:So, in other words... by wed128 · · Score: 1

      So, basically what you're saying is that in South Korea, e-mail is for old people?

    25. Re:So, in other words... by TapeCutter · · Score: 1

      "But maybe you can tell me what's flawed with my idea of (large) block redundancy detection for e-mail storage to begin with"

      Cost, you pimple-faced little shit.

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
    26. Re:So, in other words... by 4D6963 · · Score: 1

      Cost, you pimple-faced little shit.

      Care to elaborate, you decrepit cocky douchebag?

      --
      You just got troll'd!
    27. Re:So, in other words... by TapeCutter · · Score: 1

      No, but I got a good laugh out of your reply. :)

      --
      And did you exchange a walk on part in the war for a lead role in a cage? - Pink Floyd.
  3. E-mail growth... by Urger · · Score: 5, Funny

    E-mail growth accounts for much of that figure.

    They should have that looked at. A good dermatologist could remove it.
  4. Distributed Storage by Anonymous Coward · · Score: 3, Informative

    Some big projects are generating so much data that they have trouble dealing with it all.
    For example, Folding@home is implementing a distributed storage mechanism for its data, and we'll likely have a new @home project soon: Storage@home.
    http://en.wikipedia.org/wiki/Storage@home
    http://www.stanford.edu/~beberg/Storage@home2007.pdf
    http://folding.stanford.edu/English/Papers#ntoc7

    1. Re:Distributed Storage by danwat1234 · · Score: 0, Flamebait

      Storage@home!? That's hilarious. I thought that was a joke at first... Sounds like a good alternative to spending more money on the project just to store data.

  5. How Much do We Need to Store? by Zordak · · Score: 4, Insightful

    E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.

    --

    Today's Sesame Street was brought to you by the number e.
    1. Re:How Much do We Need to Store? by Naturalis Philosopho · · Score: 2, Insightful

      In the U.S., it's the law that a company must retain all electronic documents just in case they ever have to go to court, for whatever reason. IMO, this is one of those very poorly thought out laws, as 1) how do you punish a company for contempt when they can't hand over their e-mails because 2) almost nobody currently archives all of their e-mails? Also, how do you prove that you haven't deleted any? Plus, how does anybody ever sort through them all during discovery? I pity that law clerk.

    2. Re:How Much do We Need to Store? by phoebusQ · · Score: 1

      In the US (and I'm sure other places as well), companies are required to archive electronic data.

    3. Re:How Much do We Need to Store? by Bluehorn · · Score: 1

      This reminds me of my data loss nightmare back in 2004. While I was still a student, I lost the disk in my workstation and, shortly afterwards, the server. Of course, I had a backup of the really important data, which did not include email archives at that time (silly me).

      I was bothered that I had lost some ten thousand emails due to that double disk failure.

      Actually, I never thought about that accident again until I read this Slashdot story just now... Seems like no important data was lost.

      Anyway, my backups now include email ;)

    4. Re:How Much do We Need to Store? by kestasjk · · Score: 1

      This already happened when MS lost a bunch of e-mail relating to the IE case, didn't it?

      --
      // MD_Update(&m,buf,j);
    5. Re:How Much do We Need to Store? by ZorbaTHut · · Score: 1

      Every once in a while I need to dig out an ancient email from my email repository. I don't have any way of knowing which one ahead of time - sometimes it's something obviously important, sometimes it turns out to be something incredibly unimportant (one of my friends deleted an important Livejournal entry once accidentally, but I'd responded to the entry with a mostly-unimportant comment and Livejournal emails me with the entire entry text when I do that. Surprise! It's important!)

      On top of that, the sheer effort involved in figuring out which emails are important and which aren't simply isn't worth it. I've got around 400 MB of email, containing at least 50,000 individual messages - it's cheaper, in terms of time and effort, to keep it all.

      --
      Breaking Into the Industry - A development log about starting a game studio.
    6. Re:How Much do We Need to Store? by kent_eh · · Score: 1

      I seem to recall several recent articles about new data retention laws requiring companies to do just that: store potentially incriminating e-mails for absurdly long periods of time.

      So, to answer your question:

      What if we just stored less of it?

      You might get fined or jailed.

      --

      ---
      "I can't complain, but sometimes still do..." Joe Walsh
    7. Re:How Much do We Need to Store? by Zordak · · Score: 1

      I know of no such law. I know that the Federal Rules of Civil Procedure require litigants to produce archived data, and I know that litigants can be sanctioned for destroying data in bad faith. The Rules also provide a safe harbor for data destroyed in good faith in accordance with a reasonable data retention policy. So what's reasonable? What is the real probability that a business will have non-litigation problems?

      --

      Today's Sesame Street was brought to you by the number e.
    8. Re:How Much do We Need to Store? by Hillgiant · · Score: 1

      How likely is it that you're going to need some random Word document from 1998?

      Extremely likely in my case. My industry involves a lot of repeat work for existing customers. It is very handy when researching a new job to have access to everything done on previous jobs for the same customer. We tried several different methods of organizing all the different types of information. Because Outlook really doesn't have a good way of storing emails as files (dumb), we end up just storing it all in the email archive. We use a public archive to limit the amount of duplication on the mail server, but it is still all there in Outlook.

      --
      -
    9. Re:How Much do We Need to Store? by Anonymous Coward · · Score: 0
      Most legitimate email is garbage? I haven't a clue what you're talking about?!

      ps. check out this site, it's soooooo funny!!! LOL http://slashdot.org/

      Anonymous s.d. Coward
      Go Home Team!
      (555) 555-5555
      "A commune is where people join together to share their lack of wealth." -- R. Stallman
      "Your reasoning powers are good, and you are a fairly good planner."

      On Tuesday January 01, @05:12PM Zordak wrote:

      E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.
    10. Re:How Much do We Need to Store? by humpy101 · · Score: 0

      You'd be surprised at what people like to keep. In a previous life I was sysadmin for a (smallish) research centre. About 90 users, 75% of them PhDs. One guy (nice, intelligent fellow, PhD in maths or something like that) had a *set* of about ten 5 GB mail files (PSTs). When I asked him what was in them (what could he possibly want with all this data?!), his reply, which I will never forget, was: "I keep all email that I ever send or receive!" Yes, this included spam. He never *ever* deleted anything.

      --
      Wherever you go There you are
  6. duh...users store their files in their email! by Maskirovka · · Score: 4, Informative

    article summary:

    Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times as each user has their own copy. In the next few years products like SharePoint will alleviate some of that, though storage is cheap enough that it may not be worth the cost to both re-educate users and build the infrastructure for it. A SAN can hold a really large number of Word documents and PDFs, after all...

    1. Re:duh...users store their files in their email! by webmaster404 · · Score: 1

      I don't get it. Most large companies have servers that store documents and such; on top of that, most computers have 40-120 GB hard drives, and drives of up to 1 TB or so can be bought cheaply. How are we running out of space in a large company? And why "archive" e-mail that's stored on the computer AND an e-mail server?

      --
      There is no "disagree" moderation, and troll, flamebait and overrated are not valid substitutes
    2. Re:duh...users store their files in their email! by Znork · · Score: 4, Insightful

      Better article summary:

      Storage vendors want to sell expensive solutions to gullible execs, so they pay analysts to produce credible-sounding FUD scenarios.

      "monthly e-mail traffic at more than 30 million messages, vs. 17 million just one year ago."

      Like, wow. In the meantime 500GB disks cost the same or less than 250GB disks did a year ago.

      "The university settled on an IBM storage infrastructure that will afford the institution 350TB of capacity"

      350TB? 350 disks? Half that in a year and a quarter in 2? That's not really a huge amount of storage. Anymore. It's an amount of storage I could go order from my friendly online computer store and get delivered tomorrow.

      The fact is, corporate storage isn't driving the market anymore; the consumer market is. Most people I know have more storage in their home PC than the average server requires. Companies want to save video? Consumers want their PVRs to save the cable TV stream.

    3. Re:duh...users store their files in their email! by Slippy. · · Score: 1

      Ah, the simple questions. A Civic is nice, and maybe you've souped it up to pull a little trailer, but that doesn't mean a tractor-trailer is going to be cheap too.

      Unreliable, slow desktop storage is cheap. Reliable, redundant, fast, networked storage - not cheap.

      You can dump a TB on your local disk. Now copy it. Boy-howdy, that's a looooonnnngggg wait. Now let 1000 people all do this at the same time.

        - redundant storage,
        - fast bus,
        - redundant controllers,
        - redundant locations maybe,
        - redundant power,
        - cooling,
        - redundant wiring,
        - and expert management ('cause somebody has to be blamed!)

      *All this* is still expensive. Storage prices (hard drives) drop quickly, but not as fast as usage. And consumer drives - not so reliable in large storage arrays.

      --
      -- Life is good. Tastes like chicken.
    4. Re:duh...users store their files in their email! by leenks · · Score: 1

      Go and work for a large company and find out. You can't use the hard drive in a workstation to store anything other than applications - the machine will (out of necessity) be a standard image that will get blasted from time to time with updates, or when something breaks on the Windows install.

      For enterprise storage, hard drives are not cheap. Yes, you can buy domestic IDE drives for cheap, but check the prices on SAS or "enterprise grade" storage. A large company will have potentially petabytes of data - backups for that amount of data aren't cheap, let alone archiving.

      Emails are "archived" because most companies age off old emails. Any sensible company will archive emails that users delete (look at Enron as an example of why you'd want to do this).

    5. Re:duh...users store their files in their email! by leenks · · Score: 3, Insightful

      More like 1000 or 2000 disks, not 350. 1TB drives haven't really hit the enterprise yet. The biggest SAS drives in use are still 300GB.
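      The back-of-the-envelope disk counts in this subthread can be checked with a few lines of Python. This is a sketch under stated assumptions: decimal units, whole drives only, and a redundancy overhead chosen by the caller (the `raid_overhead` parameter and the function name are hypothetical, not from either post). With 300 GB drives, 350 TB comes to about 1,167 drives before redundancy and about 2,334 mirrored, which is in the range of the 1,000-2,000 estimate above.

```python
import math


def drives_needed(capacity_tb: float, drive_gb: float, raid_overhead: float = 0.0) -> int:
    """Number of drives needed for `capacity_tb` terabytes of usable space.

    Decimal units throughout (1 TB = 1000 GB), as drive vendors use.
    raid_overhead is an assumed fraction of raw space lost to redundancy:
    0.5 for mirroring, 0.25 for a 3+1 RAID 5 set, and so on.
    """
    usable_per_drive = drive_gb * (1.0 - raid_overhead)
    return math.ceil(capacity_tb * 1000 / usable_per_drive)


for size_gb in (1000, 300):
    raw = drives_needed(350, size_gb)
    mirrored = drives_needed(350, size_gb, raid_overhead=0.5)
    print(f"{size_gb} GB drives: {raw} with no redundancy, {mirrored} mirrored")
```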

    6. Re:duh...users store their files in their email! by hjf · · Score: 1

      ZFS.

    7. Re:duh...users store their files in their email! by Torque · · Score: 1

      Shockingly, this is one area that Exchange does a reasonable job. Since we know the behavior is "send files via email", you want an email system that doesn't croak under that kind of load. Exchange, with single-instance storage, actually gets this right.

      If I send, via Exchange, the same email to 30 users, with an attachment to it, that email (and attachment) are stored once. With any other mail system I get 30 copies of it. THAT is a huge improvement.

      (Zimbra may actually do single-instance storage, but I haven't done investigation enough yet to be sure)
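      Single-instance storage boils down to storing each unique attachment once and giving every mailbox a reference to it. Below is a toy sketch of the idea in Python. The class and method names are made up for illustration, and keying blobs by SHA-256 content hash is an assumption, not a description of how Exchange actually implements it.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store: each unique attachment body is kept once,
    keyed by its SHA-256 digest; mailboxes hold only references."""

    def __init__(self):
        self.blobs = {}      # digest -> attachment bytes, stored once
        self.mailboxes = {}  # user -> list of digests (references)

    def deliver(self, recipients, attachment: bytes) -> str:
        digest = hashlib.sha256(attachment).hexdigest()
        self.blobs.setdefault(digest, attachment)  # only the first copy is stored
        for user in recipients:
            self.mailboxes.setdefault(user, []).append(digest)
        return digest

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = SingleInstanceStore()
attachment = b"x" * 5_000_000  # a hypothetical 5 MB attachment
store.deliver([f"user{i}" for i in range(30)], attachment)
print(store.stored_bytes())
```

      Thirty recipients of one 5 MB attachment cost 5 MB of blob storage here instead of 150 MB; the per-mailbox reference lists are the only per-user overhead.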

    8. Re:duh...users store their files in their email! by Slippy. · · Score: 1

      Yeah, I'm hoping ZFS works out too. Looks pretty good in some dev environments.

      It doesn't solve the back-end physical costs of decent performance, though; it just shaves some savings on file system licensing and simplifies some admin (perhaps; I'll believe it when it happens). And ZFS isn't production-ready yet.

      Did you know you can't remove storage from a ZFS pool on Solaris yet? Makes migrating more of a pain. Soon to be solved, I'm told.

      --
      -- Life is good. Tastes like chicken.
    9. Re:duh...users store their files in their email! by Anonymous Coward · · Score: 0

      Perhaps you should look up the term RAID

      Redundant array of INEXPENSIVE disks ..

      if you are using "enterprise" grade disks .. you are simply wasting your money.

      Also, I have yet to see a real study (not produced by a drive manufacturer) that shows an "enterprise" disk lasts longer than a non-enterprise one.

    10. Re:duh...users store their files in their email! by TooMuchToDo · · Score: 1

      I've got an array of web servers we just brought up (10 of them) that are Supermicro boxes with 1TB drives in a RAID1 configuration (2 drives). They're SATA2 instead of SAS, but they're still quick as hell and have to deal with a LOT of daily log files.

    11. Re:duh...users store their files in their email! by bmgoau · · Score: 1

      I work for a wholesale company in Sydney, Australia, and we ship terabytes a week; if we had the order, it wouldn't be too big a jump for us to provide 350 terabytes of hard drives. At the same time we're seeing huge sales surges in NAS/SAS units of 1 terabyte and up. For consumers, the 1TB Western Digital My Book World Edition is the best example; we sell a lot of those to media and entertainment stores already.

      In only a year the size and value of hard disk drives has increased monumentally, and at work tomorrow I expect to see no sign of that ceasing.

    12. Re:duh...users store their files in their email! by fifedrum · · Score: 1

      mod parent up.

      Hard drives aren't cheap. RAID arrays aren't cheap (heck, 4 hard drives for the capacity of 3). Backups of all this data aren't cheap, especially when you want to keep a year of it offsite, with monthly and weekly rotations worked into the mix. Offsite storage in secured facilities with automatic rotation and retention isn't cheap. Auditing the security of this data isn't cheap. Hiring people to manage all this shouldn't be cheap either.
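      The "4 hard drives for the capacity of 3" remark is the parity overhead of a four-drive RAID 5 set. A minimal Python sketch of usable capacity for a few common RAID levels, assuming equal-sized drives and ignoring hot spares and filesystem overhead (the function name is hypothetical):

```python
def raid_usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity in TB for equal-sized drives (no hot spares,
    no filesystem overhead)."""
    if level == "raid0":            # striping only, no redundancy
        return drives * drive_tb
    if level == "raid1":            # every drive holds a full copy
        return drive_tb
    if level == "raid10":           # striped mirrored pairs
        return (drives // 2) * drive_tb
    if level == "raid5":            # one drive's worth of parity
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb
    if level == "raid6":            # two drives' worth of parity
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb
    raise ValueError(f"unknown RAID level: {level}")


for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, raid_usable_tb(4, 1.0, level), "TB usable from 4 x 1 TB")
```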

    13. Re:duh...users store their files in their email! by hjf · · Score: 1

      Yes, and you can't add more devices to a raidz; that's something I'd like. But you can keep adding devices (single or RAID) to a zpool and make it grow more and more. I have 4x500GB, and when I run out of space I'll add 4x1TB (if my calculations are correct, I'll fill up my 4x500GB, or 1.3TB, by the time TB drives are cheap enough). The older drives could be used for backing up sensitive data from the array (not everything is worth backing up). And with ZFS's incremental "send", they will come in very handy (sadly, tape backup is far too expensive for me...)

      *When* ZFS allows removing storage from a zpool, I'll be able to remove the old 4x500GB and replace them with something larger (4x2TB?) and keep repeating (whenever usage reaches 2/3 of the capacity, remove the smaller set of disks and replace it with disks of 2x the capacity of the larger ones).

      Now... if zpool ever allows replacing disks one by one, I'll be able to fill up the whole array and change the disks one at a time. When all disks are replaced, zpool should detect that every disk is larger than before and automagically expand itself to use all the available space. But I doubt they will ever implement that. Not for technical reasons, but because who would want to do that besides the geeky home user?
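      The upgrade plan described above can be simulated in a few lines. This is a hypothetical sketch: it assumes a 4-disk raidz1 set gives up one disk to parity, and it assumes a whole set can be removed from a pool, which the poster points out ZFS could not yet do. All names are made up for illustration.

```python
def raidz1_usable_gb(disks: int, disk_gb: int) -> int:
    # raidz1 gives up roughly one disk's worth of space to parity
    return (disks - 1) * disk_gb


def pool_capacity_gb(vdev_disk_sizes, disks_per_vdev: int = 4) -> int:
    # total usable space across all raidz1 sets in the pool
    return sum(raidz1_usable_gb(disks_per_vdev, size) for size in vdev_disk_sizes)


def needs_upgrade(used_gb: int, capacity_gb: int) -> bool:
    # trigger at 2/3 full; integer arithmetic avoids float rounding
    return 3 * used_gb >= 2 * capacity_gb


def upgrade_step(vdev_disk_sizes):
    """Retire the smallest set (hypothetical: needs vdev removal) once there
    are two, then add a set whose disks are twice the largest ones."""
    sizes = sorted(vdev_disk_sizes)
    if len(sizes) > 1:
        sizes.pop(0)
    sizes.append(2 * sizes[-1])
    return sizes


sizes = [500]  # start: 4 x 500 GB
for _ in range(3):
    sizes = upgrade_step(sizes)
    print(sizes, pool_capacity_gb(sizes), "GB usable")
```

      For reference, 3 x 500 GB of usable raidz1 space is 1,500 GB in decimal units, or about 1.36 TiB, which roughly matches the 1.3TB figure in the post.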

  7. shocking? by Anonymous Coward · · Score: 0

    hmm, I can believe this. I ran an e-mail server for the last company I worked for, and it was amazing how fast space got taken up just due to residual e-mails.

    Since I'm the type to do the same thing, I can't be critical, so I left no quotas.

  8. 2010 by Anonymous Coward · · Score: 5, Funny

    All these archives are yours except Europa. ATTEMPT NO WRITINGS THERE.

    1. Re:2010 by Atele · · Score: 1

      But in 2061 it's okay.

    2. Re:2010 by pandrijeczko · · Score: 1

      Yep, "The Amazing Wallet Of Arthur C Clarke" and his propensity for sequels.

      --
      Gentoo Linux - another day, another USE flag.
    3. Re:2010 by AndroidCat · · Score: 2, Funny

      Oh my God, it's full of pr0n.

      --
      One line blog. I hear that they're called Twitters now.
    4. Re:2010 by ediron2 · · Score: 1

      My God, it's full of (porn)STARS

      There, I fixed that typo for you.

  9. corporate email storage by Anonymous Coward · · Score: 0

    With corporations getting sued and having their own emails used against them in court, shouldn't they be destroying old email, not saving it?

  10. Use standard units people understand. by jd · · Score: 3, Funny

    Things like Libraries of Congress, Libraries of Alexandria, Spams per Square Inch. You know, the units that people have become familiar with. Besides which, are they power-of-two gigabytes or SI gigabytes? Also, how much bandwidth is needed to shift all that data? In the standard Imperial units of Clay Tablets per German Juggernaut per unit of French motorway, naturally.
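    The SI-versus-binary question does change the headline number. A quick sketch, reading the summary's "27 billion gigabytes" both ways and expressing the result in SI exabytes (10^18 bytes) and binary exbibytes (2^60 bytes):

```python
GB_SI = 10**9      # SI gigabyte
GIB = 2**30        # binary "gigabyte" (gibibyte)
EB_SI = 10**18     # SI exabyte
EIB = 2**60        # exbibyte

archive_gb = 27 * 10**9  # "27 billion gigabytes" from the story summary

for label, unit in (("SI gigabytes", GB_SI), ("binary gigabytes", GIB)):
    total_bytes = archive_gb * unit
    print(f"{label}: {total_bytes / EB_SI:.2f} EB = {total_bytes / EIB:.2f} EiB")
```

    Read as SI units it is exactly 27 EB (about 23.4 EiB); read as binary gigabytes it is about 29 EB (about 25.1 EiB).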

    --
    It's a small world and it smells funny; I'd buy another if it wasn't for the money; Take back what I paid (SoM)
  11. Surprising . . . by cashman73 · · Score: 3, Insightful
    That 90% of that 27,000 petabyte figure isn't for archiving p0rn,... Although I guess, from the corporate IT perspective, they're not worried about backing up p0rn, since most people probably don't do that at work.

    But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:

    1. spam
    2. Email forwards that have been sent 1,000 times that still have all the original message headers attached
    3. Non-business-related multimedia emails sent by administrative assistants using the company's email and time to send and receive cutesy messages from/to their family & friends
    4. Business-related PowerPoint and multimedia emails by non-techie PHBs who don't know how to transfer such files via FTP, and who are too damn lazy to use a thumb drive

    Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech,...

    1. Re:Surprising . . . by houghi · · Score: 2, Insightful

      About 4. I do not understand management where I am.

      I make several Excel files every week for reporting. They are located on a shared drive. Only extra data is added every Monday, yet instead of me sending a link to the files or the directory, management wants me to email them to several people every week.

      Utterly stupid, if you ask me.

      --
      Don't fight for your country, if your country does not fight for you.
    2. Re:Surprising . . . by ZorbaTHut · · Score: 1

      The directory is backed up and version-controlled, right?

      Because if not, that might be an (admittedly crummy) attempt at a backup system.

      --
      Breaking Into the Industry - A development log about starting a game studio.
    3. Re:Surprising . . . by igny · · Score: 1

      My way to archive my email is of course version controlled. Every month I just archive my inbox, date it, and send it to myself via email.

      --
      In theory there is no difference between theory and practice. In practice there is. - Yogi Berra
  12. Practical Internet Groupware by mangu · · Score: 1

    Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times

    That's exactly the message of this book. Email, although widely used, is neither practical nor effective as a means of sharing information in a company. And duplication of information is the lesser problem.


    For instance, suppose someone leaves the company, either permanently or on vacation, and somebody else takes over the job. How do you transfer the relevant information to the substitute? Forward several dozen emails and hope it makes sense? What if Alice forwards an email to Bob but not to Charlie? How do you make sure everybody in the project has access to all the relevant information?


    Email and http are widely used because they are widely available, but neither of them is a very good solution for information handling.

    1. Re:Practical Internet Groupware by TooMuchToDo · · Score: 1

      Email is good for communication, a company wiki (or other sort of document/information management system that is web-based) is good for knowledge storage/retrieval/transfer.

  13. For Fucks sake by Colin+Smith · · Score: 2, Insightful

    Just delete the crap.

    --
    Deleted
    1. Re:For Fucks sake by Smordnys+s'regrepsA · · Score: 1

      Old, but little known axiom - "One Man's Crap is Another Man's...Fetish?"

      --
      Just -1, Troll talking to another.
  14. Yep by krunchyfrog · · Score: 0

    That's my pr0n collection allright.

    --
    printf($randomline(sigs.txt) \n "-- "$randomline(authors.txt));
    -- myself
  15. 30 million emails? by cashman73 · · Score: 1
    30 million emails went through the pitt.edu email servers last year, and my account there didn't get squat during the Christmas break! I wonder where all the email is going? Although the university is closed anyway, so that might have something to do with it,...

    I suppose if I was crazy enough, I'd post my address here on slashdot to see if we can slashdot Pitt's email servers,... maybe we can turn 30 million messages into 60 million messages. On second thought, I don't want 30 million messages,... ;-)

    1. Re:30 million emails? by JustNiz · · Score: 1

      It sounds like someone might be using your servers for sending/forwarding spam. Your system might be telling us all how we can "improve ur p3n1s size" or "help Dr. mbongo from Burkina-faso move $99999999 into your account".

  16. Wow, welfare for programmers... by tjstork · · Score: 1

    So now, SOX and new discovery rules have created welfare for programmers. What value is all of this e-mail? The bulk of it is worthless, and the cost of keeping it is a huge drain on the economy. How many disk drives does it take to store 27 EB, and how many people will it take to manage it all?
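    The drive-count question has a direct answer, at least for raw capacity. A sketch assuming 27 EB in SI units, whole drives, and an optional number of full copies (the `copies` parameter and function name are assumptions; real archives add RAID overhead, backups and spares on top):

```python
ARCHIVE_BYTES = 27 * 10**18  # 27 EB (27 billion SI gigabytes)


def drives_for(total_bytes: int, drive_bytes: int, copies: int = 1) -> int:
    # ceiling division in integers; copies = assumed number of full replicas
    needed = total_bytes * copies
    return (needed + drive_bytes - 1) // drive_bytes


for size_gb in (500, 1000):
    print(f"{size_gb} GB drives: {drives_for(ARCHIVE_BYTES, size_gb * 10**9):,}")
```

    That is 54 million 500 GB drives or 27 million 1 TB drives for a single copy, before any redundancy.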

    --
    This is my sig.
    1. Re:Wow, welfare for programmers... by phoebusQ · · Score: 3, Interesting

      How do you figure that storage needs driving the increase in disk capacities and creating jobs is "a huge drain on the economy"?

      And what do data-archiving rules have to do with welfare for programmers? Maybe for disk manufacturing firms or data admins, but programmers?

    2. Re:Wow, welfare for programmers... by tjstork · · Score: 1

      How do you figure that storage needs driving the increase in disk capacities and creating jobs is "a huge drain on the economy"?

      We wouldn't need to store the data except for government intervention. So, instead of companies investing in their actual products, such as making better cars and airplanes, they are investing in something that adds no value to the product whatsoever. The result is a transfer of wealth to computer people, but without any consumer benefit. In other words, it's welfare for computer people.

      --
      This is my sig.
    3. Re:Wow, welfare for programmers... by phoebusQ · · Score: 1

      I disagree.

      We don't have a complete enough picture of the effects of data storage requirements. First, they may have some economic benefits. Second, it seems unlikely that the costs are so massive that they have any serious impact on bottom-line product development. Third, welfare would imply that there was no productive benefit caused by these "computer people", which we know is untrue.

    4. Re:Wow, welfare for programmers... by tjstork · · Score: 1

      We don't have a complete enough picture of the effects of data storage requirements. First, they may have some economic benefits. Second, it seems unlikely that the costs are so massive that they have any serious impact on bottom-line product development. Third, welfare would imply that there was no productive benefit caused by these "computer people", which we know is untrue

      Assuming a cost of $2 per gigabyte-year, based on a rough quote from an online storage service, what we're really talking about is a sunk cost of at least 55 billion dollars a year to store this data, and to what productive end? You are talking about spending 50 billion dollars a year, or, to put it another way, a trillion dollars over 20 years, to store a bunch of old emails, solely because they might be worth something. Truth be told, the actual cost would probably be more like 100 billion a year, if not more, because you are going to need a lot of services to ensure that the data is properly searchable, has established program practices and audit trails, and so forth. So, we're basically talking about as much as we're spending on the War in Iraq, just to keep junk email around. I'm not seeing the utility. This is welfare, pure and simple: a ton of money for no benefit.
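      The poster's annual figure follows from multiplying the archive size by the quoted price. A sketch that reproduces it, assuming the archive stays flat at 27 billion GB and the $2/GB-year rate holds (both are the poster's assumptions, not measured costs):

```python
ARCHIVE_GB = 27 * 10**9       # 27 billion GB, from the story summary
PRICE_PER_GB_YEAR = 2.0       # USD, the poster's rough online-storage quote

annual = ARCHIVE_GB * PRICE_PER_GB_YEAR
twenty_years = annual * 20

print(f"per year: ${annual / 1e9:.0f} billion")
print(f"over 20 years: ${twenty_years / 1e12:.2f} trillion")
```

      That comes to about $54 billion a year and roughly $1.08 trillion over 20 years, consistent with the "55 billion" and "a trillion dollars" figures in the post.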

      --
      This is my sig.
    5. Re:Wow, welfare for programmers... by phoebusQ · · Score: 1

      First, 55 billion is only about 0.4% of the GDP, even if that were accurate (which it isn't, as your numbers assume 100% increase or more each year). Second, your numbers assume that there are no offsetting benefits, which again isn't accurate. Third, your definition of welfare is incorrect. I agree that storing much of this email is probably a waste of time and money, but that doesn't make it "welfare".
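      The GDP comparison is a single division. A sketch, where the roughly $14 trillion U.S. GDP figure is an assumption (approximately the 2007 value) rather than something stated in the thread:

```python
annual_cost = 55e9   # USD per year, the figure under debate
us_gdp = 14e12       # assumption: U.S. GDP of roughly $14 trillion (circa 2007)

share = annual_cost / us_gdp
print(f"{share:.2%} of GDP")
```

      Under that assumption the share is just under 0.4%, in line with the figure quoted in the reply.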

    6. Re:Wow, welfare for programmers... by tjstork · · Score: 1

      (which it isn't, as your numbers assume 100% increase or more each year)

      No, my numbers factor in annual maintenance costs. You can't just look at the cost of a hard drive. You have to look at the annual cost of getting a service of storage in $/GB. I chose an online storage provider to see the costs, and used their largest bulk rate for enterprise level storage services. I would expect that businesses will actually pay even more than this, as, they won't have the economies of scale to match what a storage provider can do. However, the existence of businesses charging a certain rate for a storage service is perhaps the best idea of its genuine cost.

      Second, your numbers assume that there are no offsetting benefits, which again isn't accurate.

      It's very accurate, particularly, since you have not given any yourself!

      Third, your definition of welfare is incorrect. I agree that storing much of this email is probably a waste of time and money, but that doesn't make it "welfare".

      Welfare is transfer payments from a productive end to an unproductive end. Therefore, mandating email storage is as much welfare as a railroad being required to pay to have a fireman (the guy who shovels coal) on a diesel engine.

      --
      This is my sig.
    7. Re:Wow, welfare for programmers... by Repossessed · · Score: 1

      But storage isn't a linear cost. It doesn't cost me substantially more to store 10 GB versus 400 GB. All of the time, labor, and other associated issues with backing up are fairly linear. The only difference between storing everything I need and everything I have is the cost of a 500 GB hard drive, less the cost of an 80 GB hard drive.

      I'm in the process of putting together backup functions for my home network right now. Almost all of my backup costs are incurred getting networking and servers (And power, though green computing tech is helping there). Now, all of this is going to average me less than a dollar a gig, though I'm not incurring the labor costs or redundancy of a business network, so this isn't quite the same as corporate backup. But if I made a bare minimum system that could only backup the things I need instead of want (or maybe just isn't trying to hold my FLAC collection) I'd only save maybe 80 dollars. If I was buying high reliability and redundant drives, that extra 400 GB (Which is what a post mentioned as the size of the email archives for his company) might run as high as 400 dollars.
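      The argument here is that backup cost is mostly fixed infrastructure plus a small per-gigabyte term. A minimal sketch of that model; the dollar figures are purely hypothetical placeholders, not numbers from the post:

```python
def backup_cost(gb: float, fixed: float, per_gb: float) -> float:
    """Fixed infrastructure (servers, networking, power) plus a marginal
    per-GB cost for the disks themselves."""
    return fixed + gb * per_gb


FIXED = 1000.0   # hypothetical one-time infrastructure cost, USD
PER_GB = 0.25    # hypothetical consumer-drive cost per GB, USD

need = backup_cost(80, FIXED, PER_GB)
have = backup_cost(500, FIXED, PER_GB)
print(f"80 GB: ${need:.0f}, 500 GB: ${have:.0f}, ratio {have / need:.2f}")
```

      Under these made-up numbers, storing about six times as much data costs only about 10% more, which is the shape of the argument; raising `per_gb` toward enterprise-drive prices makes the marginal term matter more.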

      --
      Liberte, Egalite, Fraternite (TM)
    8. Re:Wow, welfare for programmers... by Bryansix · · Score: 1

      How do you figure that storage needs driving the increase in disk capacities and creating jobs is "a huge drain on the economy"?

      We are not talking about storage needs here. We are talking about stuff you have to keep around because of regulations on big business but which has no value, adds no value to the company, and most of all adds no value to the product or service the company sells. The point is that all of this takes away from profit margins, and so it is a "Drain on the Economy".
  17. how much is surveillance data? by petes_PoV · · Score: 2, Interesting
    E-mail growth accounts for much of that figure

    And a great deal of video archive from CCTV as well I expect.
    The question that arises is how would you index all this?

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:how much is surveillance data? by statemachine · · Score: 1

      And a great deal of video archive from CCTV as well I expect.
      The question that arises is how would you index all this?


      By time. And then you can go by difference and then by motion.

      You could even have a second pass running that picks out faces and objects. These can then be compared to another database of similar faces and objects. All of these would then also be stored with references back to the original video.

      It can be as simple or as complicated as you want. The technology exists today (and I'm sure is being used somewhere).
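      The "index by time, then by difference and motion" idea can be sketched with a sorted timestamp list and simple frame differencing. This is an illustrative Python sketch, not any real CCTV product: frames are assumed to be flat lists of grayscale pixel values, and the class name and the `motion_threshold` default are made up.

```python
from bisect import bisect_left, bisect_right


class FrameIndex:
    """Time-first index for archived video, with a motion flag computed by
    mean absolute difference against the previous frame."""

    def __init__(self, motion_threshold: float = 10.0):
        self.times = []    # timestamps, kept sorted (frames must arrive in order)
        self.motion = []   # True if the frame differs enough from the previous one
        self._prev = None
        self.motion_threshold = motion_threshold

    def add(self, t: float, pixels: list) -> None:
        if self.times and t < self.times[-1]:
            raise ValueError("frames must be added in time order")
        moved = False
        if self._prev is not None:
            diff = sum(abs(a - b) for a, b in zip(pixels, self._prev)) / len(pixels)
            moved = diff > self.motion_threshold
        self.times.append(t)
        self.motion.append(moved)
        self._prev = pixels

    def motion_between(self, start: float, end: float) -> list:
        """Timestamps in [start, end] whose frames were flagged as motion."""
        lo, hi = bisect_left(self.times, start), bisect_right(self.times, end)
        return [self.times[i] for i in range(lo, hi) if self.motion[i]]


idx = FrameIndex()
idx.add(0.0, [0] * 16)
idx.add(1.0, [0] * 16)     # unchanged: no motion
idx.add(2.0, [200] * 16)   # large change: motion
idx.add(3.0, [200] * 16)
print(idx.motion_between(0.0, 3.0))
```

      A second pass for faces or objects would hang further lookups off the same timestamps, keeping references back to the original video.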

  18. This is starting to be Manditory by Smordnys+s'regrepsA · · Score: 3, Funny

    Only 27,000 petabytes? n00b!

    My pr0n collection takes at least 3 Internets* to store, archived.


    *(sorry, forgot the conversion rate for Libraries of Congress)

    --
    Just -1, Troll talking to another.
    1. Re:This is starting to be Manditory by Kjella · · Score: 1

      I must say that's really impressive without connecting to the Internet. After all, if you did you'd be part of Internet and thus it'd be bigger than you.

      --
      Live today, because you never know what tomorrow brings
  19. Moving away from Big Iron? by HockeyPuck · · Score: 2, Funny
    FTFA:

    Mounting interest in these approaches highlights a pronounced shift away from "big-iron storage" - traditional storage arrays typically composed of custom application-specific integrated circuits, RAID controllers, and fixed-disk and cache-scalability ceilings.

    Now TFA goes on to say customers are turning towards Network Appliance as a company that uses COTS parts and software. They use an Intel CPU and FC/GigE adapters from other vendors, but I wouldn't call them 100% COTS. It's not like it's a generic PC built from FRYS with JBOD on the back.

    NetApp is a great company and makes a great product aimed at a specific market segment: file services (NFS/CIFS). I don't see many customers tossing out the EMC DMX, HDS TagmaStore or IBM Shark for an FC-enabled NetApp array. I also don't see a lot of FICON shops asking NetApp to support FICON.

    The phase storage management is now entering is the 'good enough' phase. Does my organization need the current generation of "high-end" arrays? Maybe not. The current generation of midrange arrays, with better $/GB and a feature set increasingly on par with the high-end arrays, is starting to look more attractive to many customers.
    1. Re:Moving away from Big Iron? by phoebusQ · · Score: 2, Insightful

      FTFA, RAID, TFA, COTS, CPU, FC, GigE, FRYS, JBOD, CFS, CIFS, EMC, DMX, HDS, IBM, FC, FICON... 17+ acronyms in one post...that's pretty impressive. Do you kiss your mother with that mouth? :)

    2. Re:Moving away from Big Iron? by HockeyPuck · · Score: 3, Funny

      FRYS isn't an acronym... :)

      and yes I do.

    3. Re:Moving away from Big Iron? by DMUTPeregrine · · Score: 1

      From the F*cking article, Redundant Array of Inexpensive Disks, The F*cking article, Commercial Off-The-Shelf, Central Processing Unit, Fiber Channel, Gigabit Ethernet, Fry's Electronics (not an acronym), Just a Bunch Of Disks, Caching File System, Common Internet File System, Electro-Magnetic Compatibility, DataMining Extensions, Hierarchical Data System, International Business Machines, Fiber Channel (again), FIber CONnectivity. Didn't Read The F*cking Article (RTFA) yet, so some acronyms with more than 1 definition may be using the wrong one....

      --
      Not a sentence!
    4. Re:Moving away from Big Iron? by HockeyPuck · · Score: 3, Funny

      We're talking storage (sorry DASD) here... It's all about...

      Hooking up a pair of EMC DMX's (or IBM ESSes, or HDS USPs) over a pair of OC48s for SRDF/PPRC/USR unless you are a zOS shop, then you could run XRC. Since this is a BC/DR plan, we'll run it over FCIP protected by IPSec over a DWDM leased line, which must be protected by a UPSR/BLSR, otherwise in the event of a link failure, the R1s will split from the R2s.

      Then you're SOL.

    5. Re:Moving away from Big Iron? by nprz · · Score: 1

      EMC != Electro-Magnetic Compatibility in this context. EMC DMX as in http://www.emc.com/products/systems/DMX_series.jsp

    6. Re:Moving away from Big Iron? by Awod · · Score: 1

      Every morning when I emerge from the basemen.............. erm, go visit her for the holidays..

    7. Re:Moving away from Big Iron? by chuckymonkey · · Score: 1

      (YMMBH)You make monkey's brain hurt. (ITTTEMA)I take time to explain my acronyms.

      --
      "Some books contain the machinery required to create and sustain universes."-Tycho
  20. time to buy EMC stock! by mgranit11 · · Score: 1

    That's what I will be doing!

  21. and 26.5 exabytes are porn by plasmacutter · · Score: 1

    and 26.7 exabytes are dedicated to porn storage!

    thank you! i'll be here all week!

    hey was that rotten fruit! HEY! SECURITY!

    --
    VLC FOR MAC IS DYING! IF YOU DEVELOP, PLEASE SAVE IT!!
  22. will someone think of the kids! by metamorfoza · · Score: 5, Funny

    Does it bother you that much that these journalists want to make it easier for the general public to understand how big data storage they are talking about?

    I agree. However, I would go even further: instead of using geekish bytes and bits, we should use something like 400 billion MP3s. You know, so that MySpace user out there can understand TFA. They clearly have an interest in this sort of news.

    1. Re:will someone think of the kids! by fellip_nectar · · Score: 1

      Or, for the Slashdot reader, how many billion choked chickens of pr0n it equates to.

      --
      Worst. Signature. Ever.
    2. Re:will someone think of the kids! by Anonymous Coward · · Score: 0

      Yes but could you somehow translate that into the standard unit of measure, the football field? Like say, 10 trillion football fields laid edge to edge with pages from the Library of Congress?

    3. Re:will someone think of the kids! by Deliveranc3 · · Score: 1

      Back in my day we had a few, some and lots.

      And by golly, that's a few words; it's almost some words!

  23. Redundant Data by tm8992 · · Score: 2, Interesting

    I wonder how much of this data is really redundant--copies of other data. How many emails can really be unique? How many employees download the same video a hundred times on the company's server? As network speeds increase, it will be less necessary for multiple users to store the same thing (think streaming those videos), so could this really be an exaggeration of future storage requirements? Could a better system be designed to minimize redundancy?

  24. If designers still optimized their images by Anonymous Coward · · Score: 0

    If designers still optimized their images down from 50k to 15k instead of flirting with the design hotties and smearing poop on other people's keyboards, this might not be a problem.

  25. nibbles! by garlicbready · · Score: 1

    Personally, I prefer nibbles (4 bits each, or 1/2 a byte; used with old parallel ports) to make the numbers look bigger. Working under the assumption of 1024 to the power of 6 bytes per exabyte:

    2,305,843,009,213,693,952 nibbles of information per exabyte.
    Now that's a lot of chewin'.

    1. Re:nibbles! by broggyr · · Score: 2, Funny

      I thought half of a byte was called a nybble...

      --
      Irony? Yea, it's like goldy and bronzy, only it's made of iron!
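The nibble figure above can be reproduced with a few lines of Python. This is a sketch assuming binary units (1024**6 bytes per exabyte, as the comment states) and 2 nibbles per byte; `nibbles` is a hypothetical helper name, and the 27-exabyte line simply scales the same arithmetic to the summary's figure.

```python
# Nibble (nybble) arithmetic for the figures in this thread.
# Assumption: an "exabyte" is the binary unit 1024**6 = 2**60 bytes,
# and a nibble is 4 bits, i.e. 2 nibbles per byte.

NIBBLES_PER_BYTE = 2
BINARY_EXABYTE = 1024 ** 6  # 2**60 bytes


def nibbles(exabytes: int) -> int:
    """Return the number of 4-bit nibbles in `exabytes` binary exabytes."""
    return exabytes * BINARY_EXABYTE * NIBBLES_PER_BYTE


print(f"{nibbles(1):,}")   # 2,305,843,009,213,693,952 (one exabyte)
print(f"{nibbles(27):,}")  # 62,257,761,248,769,736,704 (27 exabytes)
```

Python integers are arbitrary precision, so the results are exact rather than floating-point approximations.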
  26. it must be Seagate by Anonymous Coward · · Score: 0

    Shit, is it all pr0n? :-P

  27. So that's about 20 billion gigabytes of data... by LordHuggington · · Score: 2, Insightful

    that will be lost or stolen as company employees fail to properly encrypt back-ups, leave laptops in their car while running in for a latte or some such? Seriously, though, the article says storage is corporations' number 2 concern. What's number one from this survey? Is it security?

  28. Single Instance Storage by Anonymous Coward · · Score: 0

    Microsoft implemented something called Single Instance Storage (SIS) with Windows 2000 and 2003 (http://research.microsoft.com/sn/Farsite/WSS2000.pdf).

    If it weren't quite so cryptic to implement and use it would probably help reduce some of the problem.
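As an illustration of the general single-instance idea only (not a description of how Windows implements SIS), here is a minimal sketch that keeps each distinct file body once, keyed by its SHA-256 content hash, while many logical paths point at it. `SingleInstanceStore` and its methods are hypothetical names.

```python
import hashlib


class SingleInstanceStore:
    """Toy file-level single-instance store (hypothetical, not Microsoft's SIS).

    Each distinct file body is stored once under its SHA-256 digest;
    every logical path only records which digest it refers to.
    """

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # digest -> content, stored once
        self.index: dict[str, str] = {}    # logical path -> digest

    def put(self, path: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.index[path] = digest
        return digest

    def get(self, path: str) -> bytes:
        return self.blobs[self.index[path]]

    def logical_bytes(self) -> int:
        """Bytes users think they are storing."""
        return sum(len(self.blobs[d]) for d in self.index.values())

    def physical_bytes(self) -> int:
        """Bytes actually kept after single-instancing."""
        return sum(len(b) for b in self.blobs.values())


# Example: 100 mailboxes each receive the same attachment.
store = SingleInstanceStore()
attachment = b"quarterly-report " * 1000
for user in range(100):
    store.put(f"/mail/user{user}/report.doc", attachment)
print(store.logical_bytes(), store.physical_bytes())
```

Hashing whole files is the simplest design; it catches identical attachments but not files that differ by a single byte, which is where block-level deduplication schemes come in.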

  29. a helpful reference page for large numbers by HappyEngineer · · Score: 4, Interesting

    Here is my helpful reference page for big numbers. I love big numbers. I'm actually working on a site right now which will help people to visualize big numbers. I can't give out the URL yet because it'll be another month or two before it's ready to be seen. But it'll have many fun options like Cow Stacking and Hamster Canyon.

    Cow stacking is where you select cow as the animal and from earth to moon as the place and you'll see a graphic of cows being stacked to the moon and the number of cows which would be required to complete that stack.

    Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.

    1. Re:a helpful reference page for large numbers by Anonymous Coward · · Score: 3, Funny

      Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.

      That's much better than Libraries of Congress. Most people haven't even seen the Library of Congress, but who hasn't seen huge piles of hamsters?
    2. Re:a helpful reference page for large numbers by HappyEngineer · · Score: 1

      *laugh* That's actually a good quote. I think I'll use that on the site.

  30. Foreigners in their own country by Anonymous Coward · · Score: 0

    First of all, almost all the elements used to build these laptops come from somewhere else.

    The components are probably Chinese. The ideas are possibly brewed from open source (a good concept, but a salad in the end, for the very argument explained here). Much of the "teaching content" isn't local (and, you know, "local" means something different in every place: a different culture, animals, religions, traditions, dictators, martyrs, heroes, ECONOMY).

    After all, you set out to give people a better education, but in the process you transform them into aliens, individuals separated from their own reality and context (I think their context has been abused for centuries, robbed, exploited, and made the last defecating end of giga-planet monopolies and mafias).

    So what happens if you "create" a "global" child in that environment? Usually chaos (consider that some of those lands are in chaos at this moment), and then the need for "global people" to rescue them. In the end you create a Trojan horse: more chaos, and local monsters who defend their land from the foreigners (from what they see as attacks).

    So, in the end, the most OLPC can do is provide a "transparent structure" that every land would fill with its own history and with what it has in its blood.

    BUT HOW?

    How can you get past the material the laptop is made from? A necessary evil, some will say?

    Put most directly, people in those countries DON'T need computers (nor did they in the past).

    They need peace.

    They need the time and space to learn from their elders, to heal, and to cultivate the land; to learn from it, to recall what this land produces and how it should be cared for.

    None of that is in a computer (although you can document it, it's not advised); it is in their will.

    Introducing big factories and the market economy into these lands that CAN'T TAKE THEM, that don't have that history, or SENDING THEM WEAPONS, WON'T HELP them achieve the reconciliation, the healing, or the sustainable growth their own land needs.

    Even complaining and cursing, saying they aren't good people to do business with, is NOT WHAT THEY NEED. That only does harm.

    In the end, interventionism generates a monster.

    But why is the aggressive reaction occurring in these lands? Why are people "hunting" each other there?
    Is it because of interventionism and the aliens that "global culture" generates? (read: we are all living in America)
    Is it because of the big factories placed in these poor lands? (poor in currency)
    Is it because of the narrow social stairs that the big factories/market economy impose on anyone who wants to "participate" in this economy?
    Is it because of mafias/monopolies intervening to consolidate and consume those lands/people?
    Is it because they are CONSUMING PRODUCTS THAT ARE NOT FROM THEIR OWN LAND? (which generates another type of alien)

    In part, these are things Negroponte didn't take into account when drawing up his plan.

    And they are things the market economy will never think about. If it did, it couldn't destroy and colonize new lands. (Read some resentment there.)

  31. Why, the answer is simple: by Anonymous Coward · · Score: 0

    Google.

  32. For you C64 users. by Joe+U · · Score: 1

    That's at least 62,257,761,200,000,000,000 nybbles! (roughly) You may need extra floppies.

  33. no need to check for redundancy by Anonymous Coward · · Score: 1, Funny

    Just compress it with 7ZA and the 27 exabytes should come down to about 640KB or so.

  34. The solution is data compression by careysb · · Score: 2, Funny

    Just ZIP up the data to a smaller zip file. Then zip the zip file into an even smaller zip file. Repeat until all your data is compressed into a couple of megs. :-)

  35. Not Plus 2 Informative by Anonymous Coward · · Score: 0

    Come on, people... That was Plus 5 Funny.

  36. Summary is too technical. by Anonymous Coward · · Score: 0

    I mean, gigabytes, what kind of unit is that? Is this some sort of Star Trek reference? I need to know how much data that is in *real* terms, like songs, pictures and libraries of congress.

  37. The solution was available a decade ago by commodoresloat · · Score: 1

    Unfortunately, the idea was crushed by a ruthlessly greedy band of small-minded bloodsuckers with large legal staffs.

    They called it "Napster."

  38. Email Squared by NetSettler · · Score: 1

    From the summary:
    "E-mail growth accounts for much of that figure."
    We're archiving spam?

    Ignoring even the spam issue, there's also the issue that Outlook encourages people to include the previous message in its entirety, causing an O(n^2) effect for legitimate message chains; that is, every message in a conversation tends to include all previous messages. This not only increases archival size, but it also causes mailboxes to approach their seemingly arbitrary upper bound on mailbox size much more rapidly than seems necessary.

    It's a good example of how a single bad design decision can have amazingly multiplied consequences. If nothing else, you'd think Microsoft and the makers of other email tools could explore better ways of noticing and offering to remove the redundancy.

    --

    Kent M Pitman
    Philosopher, Technologist, Writer
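A rough sketch of the O(n^2) quoting effect described above, under the simplifying assumption (not from the comment) that every reply adds the same amount of new text and quotes the whole previous message. Message k then carries k bodies, so a thread of n messages stores n(n+1)/2 bodies in total. `thread_storage` and `thread_storage_no_quotes` are hypothetical helper names.

```python
def thread_storage(num_messages: int, body_size: int) -> int:
    """Bytes stored for a reply chain in which each message quotes the
    entire previous message (and therefore all earlier ones too).

    Simplifying assumption: every reply adds exactly `body_size` bytes of
    new text, so message k is k * body_size bytes long.
    """
    return sum(k * body_size for k in range(1, num_messages + 1))


def thread_storage_no_quotes(num_messages: int, body_size: int) -> int:
    """Bytes stored if each message carried only its own new text."""
    return num_messages * body_size


# Example: a 20-message thread with 2,000 bytes of new text per reply.
quoted = thread_storage(20, 2000)
unquoted = thread_storage_no_quotes(20, 2000)
print(quoted, unquoted, quoted / unquoted)  # 420000 40000 10.5
```

The ratio between the two is (n + 1) / 2, so the waste grows linearly with thread length while the total grows quadratically.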

    1. Re:Email Squared by ConceptJunkie · · Score: 1

      Too bad Microsoft doesn't have a research department where some of the many boffins who work for them could solve some of these interesting problems and provide useful technology. Oh, wait, they do. Too bad nothing from Microsoft research ever seems to see the real world. MS Management would never go for selling software that people actually want. Only loser companies who aren't monopolies do that.

      The biggest problem I found with Outlook is that its performance is O(n^2) based on the number of messages in the folder. Or maybe the bug where it starts randomly losing and corrupting data after the mail store gets over something like 1.5GB? Did they ever fix that? Did they ever even acknowledge that problem? I have never seen software that works so hideously and yet is considered a real product that people would actually choose to use. Oh, wait, yes I have, it's called "Word".

      Microsoft only cares about maintaining their monopoly. They stopped being a software company about 10 years ago.

      --
      You are in a maze of twisty little passages, all alike.
  39. Re: I need a SB of storage by Douglas+Goodall · · Score: 2, Funny

    You know, a SaganByte of storage. It would have to store billions and billions of bytes.

  40. jigga what!? by jmickle · · Score: 1

    DID SOMEBODY SAY PRECISELY 88 MILES PER HOUR?! I'm going back in time, baby!

  41. Only??? by Anonymous Coward · · Score: 0

    Sorry but that number seems quite small to me. I bet quite a few exabytes slipped through the cracks.

  42. No worries... by tdj114 · · Score: 1

    For those of you still thinking in the present and near future (2010 is considered near future in this case), stop it. It's bad for personal welfare and certainly a negative characteristic to have in the tech industry. I, on the other hand, prefer to operate 7-10 years ahead of the present and offer the following for your own edification:

    http://gizmodo.com/gadgets/bell-curve/google-sees-the-world-in-an-ipod-by-2020-333439.php

    Certainly, with the ability to save all the world's content, archiving all of your org's data to your iPod won't be a problem.

  43. The 21st Century Dark Ages by StCredZero · · Score: 1

    In Charles Stross' book Glasshouse, the early 21st century is considered by the future one of the "Dark Ages" because of our use of proprietary formats and ephemeral storage media.

    I suspect he's onto something!

  44. p0rn 365/24 for everyone on the planet by peter303 · · Score: 1

    Silly

  45. Even Worse ... by PPH · · Score: 2, Funny

    ... most of this will be documents in formats older than Office 2003.

    --
    Have gnu, will travel.
  46. The probability approaches 1... by jabelli · · Score: 1

    ...immediately after you delete the email or file.

  47. You probably already can. by LWATCDR · · Score: 1

    Business data really is pretty small. For the most part it's just text.
    Even if you start to scan every document, 500 gigabytes is going to be a lot of documents.
    Most servers, I bet, are pretty small compared to what people are using at home. You just don't need to store video or even much audio in most businesses.
    Of course, this doesn't apply to video production houses, print shops, or any place that actually deals with a lot of media data.
    I know that my company's customer database is under one gigabyte in size. The accounting data is probably not much more, and our document management system is under 100 gigabytes. So yes, most of our data could fit on a 160 gigabyte iPod.

    --
    See my blog http://ilovecookes.blogspot.com/ for light hearted technical information.
  48. Wrong. by tacokill · · Score: 1

    It's not ALL companies as you state in your post. Regulations requiring e-mail archives apply only to publicly traded companies (i.e., those listed on stock exchanges). Private companies have no such requirement.

  49. genome... by SharpFang · · Score: 1

    ...and to think human genome is just a puny 800MB.

    --
    45 5F E1 04 22 CA 29 C4 93 3F 95 05 2B 79 2A B2
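A back-of-envelope check on the ~800MB genome figure. The numbers here are assumptions, not from the comment: roughly 3.1 billion base pairs in a haploid human genome, packed at 2 bits per base (four possible bases: A, C, G, T), with no metadata, headers, or further compression.

```python
# Rough size of one human genome stored as packed raw bases.
# Assumptions: ~3.1 billion base pairs (haploid), 2 bits per base,
# decimal megabytes (10**6 bytes), no metadata or compression.

BASE_PAIRS = 3_100_000_000
BITS_PER_BASE = 2  # four symbols -> log2(4) = 2 bits

size_bytes = BASE_PAIRS * BITS_PER_BASE // 8
print(size_bytes // 10**6, "MB")  # 775 MB
```

That lands at roughly 775 MB, consistent with "just a puny 800MB"; storing one ASCII character per base would instead take about four times as much.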
  50. Help me ... I don't understand ... by OricAtmos48K · · Score: 1

    So how library of congress is above figure ME FAILED

  51. ZFS by kildurin · · Score: 1

    So that's why Sun created ZFS. Doesn't even begin to fill up a Zettabyte File System.
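For scale, the summary's figure can be converted with decimal SI prefixes and compared with one zettabyte. This sketch assumes decimal units (10**18 bytes per exabyte, 10**21 per zettabyte), matching the summary's "27,000 petabytes (27 billion gigabytes)"; it says nothing about ZFS's actual capacity limits. `Fraction` keeps the ratio exact.

```python
from fractions import Fraction

# Decimal SI prefixes, in bytes (assumption: decimal, not binary, units).
GB, PB, EB, ZB = 10**9, 10**15, 10**18, 10**21

archive = 27 * 10**9 * GB  # "27 billion gigabytes"
assert archive == 27_000 * PB == 27 * EB  # i.e. 27 exabytes

share_of_zettabyte = Fraction(archive, ZB)
print(share_of_zettabyte, float(share_of_zettabyte * 100), "%")  # 27/1000 2.7 %
```

So the projected archive is 27 exabytes, or 2.7% of a single zettabyte.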