Slashdot Mirror


To Purge Or Not To Purge Your Data

Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
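
For anyone who wants to check the summary's storage figure, here is a rough sketch of the arithmetic. The inputs are the numbers quoted above; the function name is made up for illustration, and the model assumes each gigabyte is paid for once rather than re-billed for every year it is retained.

```python
# Back-of-the-envelope check of the summary's storage figure.
# Inputs come from the summary; backup_cost is a hypothetical helper name.

def backup_cost(employees: int, gb_per_employee_year: float,
                dollars_per_gb: float, years: int) -> float:
    """Cost of backing up `years` of yearly output, each gigabyte paid for once."""
    return employees * gb_per_employee_year * dollars_per_gb * years


if __name__ == "__main__":
    total = backup_cost(employees=5_000, gb_per_employee_year=10,
                        dollars_per_gb=5, years=5)
    print(f"${total:,.0f}")  # $1,250,000
```

That works out to $250,000 per year, which is where the five-year $1.25 million comes from.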

190 comments

  1. Easier to keep by Geoffrey.landis · · Score: 5, Insightful

    The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping.

    --
    http://www.geoffreylandis.com
    1. Re:Easier to keep by Daimanta · · Score: 5, Insightful

      True, proper archiving takes huge amounts of time since it adds overhead to your operation.

      In an ideal world, everything that you store is automatically labeled and old data will automagically be purged. But storing all kinds of shit is just that much easier. It also doesn't help that data storage is so dirt cheap. 1TB can be bought for around $100 if I am not mistaken. It doesn't pay to kill old useless stuff you have floating on your hard disk.

      --
      Knowledge is power. Knowledge shared is power lost.
    2. Re:Easier to keep by Sobrique · · Score: 4, Insightful
      Add to that legal requirements of retention - you'll need to filter your 'customer communications' from your 'shopping lists'. That's what actually makes this a nuisance - the possibility that there will be legal action in 5 years time, that you'll need to fight.

      Yes, less data needs to be kept, but first there needs to be a _massive_ re-education of the 'data packrat' culture that the users of it have.

    3. Re:Easier to keep by sunking2 · · Score: 4, Insightful

      Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored. Storage continues to get cheaper, salaries typically don't. Sure, that $1.25M is a big scary number. But nothing compared to the salaries/benefits at a 5000 person company. Now you can argue the cost of data retrieval goes way up because chances are it'll take a hell of a lot longer to find, but that's a different argument altogether and you can just as easily question what the cost of not being able to recover something that was cleaned by accident is.

    4. Re:Easier to keep by zappepcs · · Score: 2, Insightful

      The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary.

      There, fixed that for you. Meta-tags and other efforts might change this in the future, but until there is a generalized understanding of things that should be archived and things that should not, and a better way to store, find, retrieve, and utilize company data, there will be tons of data saved that really should not be. Humans are like that.

    5. Re:Easier to keep by daeg · · Score: 5, Insightful

      The bigger problem is that you will fight different battles. If you're fighting a sales rep that sold your clients to a competitor, you want as much ammunition as possible. If a client is suing you for incorrect information relayed 8 years ago and you're probably guilty, you want as little information as possible.

    6. Re:Easier to keep by COMON$ · · Score: 3, Interesting
      What I want to know is how these numbers are broken down. $5 per gigabyte to back up? Maybe if you factor in the cost of a robotic library. Considering that tapes currently run about $30 a pop for 800GB and that I am on a 12 month rotation, I still don't come NEAR that price. 1.25 million for a 5000 person company? What kind of company? 10GB average is about 9GB over my average user here. Even when I worked at a larger company, we still weren't even breaching 700MB average INCLUDING e-mail.

      Lovely scaremongering, but what did they mean by legal e-discovery? The time it takes to sort through the data or what?

      --
      CS: It is all sink or swim...oh and did I mention there are sharks in that water?
    7. Re:Easier to keep by hesaigo999ca · · Score: 1

      I agree. I used to work for a company that had kept all its documents for the past 7 years and had a warehouse full of them (importing and exporting) on paper; they wanted to digitize it all. However, they had the usual "archive it" attitude, so they had documents about everything from everyone in duplicate and triplicate. To say the least, they probably would never have needed to do this had they kept a better handle on organizing what was being kept, even if it was in paper format.

    8. Re:Easier to keep by Lord+Ender · · Score: 1

      Exactly. If it takes me two hours per week to sort through every bit of my data and decide what to pitch, that cost has to be compared to the archival cost to decide whether it is a worthwhile endeavor.

      Of course, at my office, we just bought a server and a controller with 16 SATA ports, filled the sucker up with off-the-shelf 500GB disks, and built a 7TB RAID6 using Linux software RAID. The whole job only cost about $2k, and we no longer waste any time deciding what to delete and what to keep.

      --
      A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
    9. Re:Easier to keep by Chrisq · · Score: 3, Interesting

      We went paperless, and when application forms, etc. arrive they are scanned and stored. Examination of the data showed that very often people would print out all the existing information on a customer and add it to the pile sent for scanning.

      Result, look up a customer and you would find some files scanned half a dozen times.

    10. Re:Easier to keep by TheRaven64 · · Score: 2, Insightful
      The $5 presumably includes the physical media, the backup operator's time spent configuring the system, the hardware for performing the backup, and the safe, secure, off-site storage costs. 10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2) - the bzip2'd repository of my book (including all text and code examples) is only 4.6MB. My mail folder is only 3GB, and that contains over ten years of email messages (and would compress very well).

      On the other hand, I don't use Word, which manages to make single-page documents that are more or less plain text take up a few MBs. If you're in a company where everyone sends Word document attachments as emails instead of plain text (I've seen it done[1]) then you could probably generate 10-20MB of data per day from around 5KB of actual content, and backing this up might be cheaper than educating your users. Assuming some other work as well as emails this can easily get to 10GB.

      [1] Even worse was my publisher, who sent me a scanned version of a contract as a Word document. A PNG of the same image was around 100KB, while the Word document was 5MB and contained nothing other than the image. A lot of people just treat Word documents as a default container format for any content.

      --
      I am TheRaven on Soylent News
    11. Re:Easier to keep by BobMcD · · Score: 3, Interesting

      you'll need to filter your 'customer communications' from your 'shopping lists'

      Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

      "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

      Does anyone know that this is no longer the case?

    12. Re:Easier to keep by vvaduva · · Score: 3, Insightful

      Well, I did not RTFA in detail but it does not seem to address key regulations like HIPAA and SOX which put hard numbers on data retention. So whether or not it's expensive, you have to do it if you want to be legit. If the issue is discovery, a sound archival system will eliminate expenses related to discovery and would allow one to provide requested information very quickly and efficiently. I say let the legal people fight discovery requests and unless you have something to hide, stick with the requirements for archival and retention. The argument "the less you keep the less they ask for" is simply stupid. In certain SOX-related situations, even the appearance of impropriety will come back to bite you, so I always tell folks to do the right thing by running the business properly, identifying document types correctly and sticking to regulatory requirements as much as possible.

    13. Re:Easier to keep by Geoffrey.landis · · Score: 2, Interesting

      The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary.

      There, fixed that for you.

      According to the original article, ("The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up ") the cost of backups is fifty dollars a year per employee.

      So if an average employee costs the company $100 per hour (including overhead), then if "training staff to organize their data and retain only that which is necessary" takes more than half an hour per year, it's more cost effective to archive the junk than it is to train the employees to sort it.
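
The break-even point in that argument can be sketched in a few lines. The $100 per hour figure is the commenter's own assumption, and the function name is hypothetical.

```python
# Break-even training time: how many hours per year an employee can spend
# sorting data before that costs more than simply backing everything up.
# The figures are the ones used in the comment; the helper name is made up.

def break_even_hours(gb_per_year: float, dollars_per_gb: float,
                     hourly_cost: float) -> float:
    """Hours of employee time that cost the same as a year of backups."""
    yearly_backup_cost = gb_per_year * dollars_per_gb  # $50 with the quoted figures
    return yearly_backup_cost / hourly_cost


if __name__ == "__main__":
    hours = break_even_hours(gb_per_year=10, dollars_per_gb=5, hourly_cost=100)
    print(f"{hours:.1f} hours")  # 0.5 hours
```

A cheaper employee or a higher per-gigabyte cost moves the break-even point up, so the conclusion depends heavily on those two inputs.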

      --
      http://www.geoffreylandis.com
    14. Re:Easier to keep by Anonymous Coward · · Score: 0

      1TB can be bought for around $100 if I am not mistaken.

      Whoa there! Yeah, you can probably score a 1 TB disk on sale for $100 for your desktop but not for enterprise level stuff. Even if I can get a 1 TB disk for a server or storage device I still need to get another 2 or 3 for RAID and possibly an online spare.

    15. Re:Easier to keep by cmause · · Score: 5, Interesting
      There used to be a sort of gentlemen's agreement between attorneys to not dig in to electronically stored information (ESI). That was back when everything important ended up on paper anyway, which was discoverable.

      As time went on, fewer things ended up on paper, but the rules of discovery didn't evolve. That was the time of backing up a U-Haul full of printed out copies of every file, e-mail, etc. that a company had. The opposition then had to dig through mounds of trash in the hope of finding that one incriminating document.

      Then attorneys got more savvy, and in the so-called Rule 26 conference (the name refers to Rule 26 of the Federal Rules of Civil Procedure), the attorneys would agree on the format of ESI to be exchanged. In December 2006, the Federal Rules of Civil Procedure changed to directly address ESI and electronic discovery.

      Now, in litigation, parties may still get obnoxious amounts of data, but it's electronic. Once it's processed and converted (usually to TIFFs with extracted text, but sometimes PDF), attorneys can do what amounts to a Google search through the files and find what they want pretty quickly. In fact, paper documents are usually scanned and OCRed so they can be handled and searched in the same manner.

      Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

      "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

      Does anyone know that this is no longer the case?

      So no, it's no longer the case. But the first guy who did it must have thought he was pretty funny.

    16. Re:Easier to keep by euri.ca · · Score: 1

      Yeah, I was a little skeptical of the "at $5 a gigabyte" line.

      Ignoring any cost savings in the future, if 1T=$100 (which is pretty close in USD) then they are planning on replicating their data 50 times, which is redundantly redundant.

      (Not to mention that most of that 10GB figure will be sending the same PowerPoint presentation back and forth 100 times and will compress fantastically. Users aren't actually typing 2 billion words every year in emails.)
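
Both numbers in this comment can be reproduced with a small sketch. It assumes 1 TB = 1000 GB and roughly 5 bytes per typed word; neither figure is stated in the thread, and the function names are made up.

```python
# Reproduces the "50 copies" and "2 billion words" figures from the comment.
# Assumptions (not from the thread): 1 TB = 1000 GB, about 5 bytes per word.

GB_PER_TB = 1000
BYTES_PER_GB = 10**9


def implied_copies(quoted_dollars_per_gb: float, disk_dollars_per_tb: float) -> float:
    """How many raw-disk copies the quoted per-GB backup price would buy."""
    return quoted_dollars_per_gb * GB_PER_TB / disk_dollars_per_tb


def words_in(gigabytes: int, bytes_per_word: int = 5) -> int:
    """Number of words that would fill `gigabytes` of plain text."""
    return gigabytes * BYTES_PER_GB // bytes_per_word


if __name__ == "__main__":
    print(f"{implied_copies(5, 100):.0f}x")  # 50x
    print(f"{words_in(10):,} words")         # 2,000,000,000 words
```

Raw disk price is only one part of what a backup costs, which is the objection other posters raise further down.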

    17. Re:Easier to keep by PietjeJantje · · Score: 1

      I find it surprising that this issue is simplified to cost of storage. As others noted, who cares about the cost/employee for storage. What's much more important is the cost of information retrieval. I'd like to make a comparison with paper storage, because much research has been done there to cut costs. So putting aside physical storage costs, if you store all crap for a while, the storage just becomes a black hole where nothing can be found again. Storing crap is human nature, a "what if I need this document in two years?" even if the chance is very small. However, it turned out it's -much- more cost efficient to indiscriminately nuke stuff that wasn't labeled vital after not being referenced for three months (your mileage may vary). In those few cases where old documents were needed, the combined cost to reproduce them is much lower. In the meantime, you have a relatively clean and light information store where you can easily find things that are relevant to what you're doing and have recently been doing.

    18. Re:Easier to keep by Anonymous Coward · · Score: 0

      Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored.

      The Messiest Home in the Country 2

    19. Re:Easier to keep by guruevi · · Score: 2, Informative

      1) This is the average. Your company might have 700MB/user, in my organization, it's close to 1TB/user/year that gets added. We're doing medical imaging.

      2) It's not just tape libraries. The cost for D2D2T or D2D2D (what we're doing) goes way up compared to a 'simple' backup scheme. Especially if you're like us and require multiple gigabit streams, disk storage can't be just 4 cheap SATA disks in RAID5. We have 2 storage arrays with 14 drives each for general access and another storage array with 10 SATA disks for primary backup and those things don't come very cheap especially since you need multiple servers to handle the load.

      3) Encryption, tape rotation or multiple locations add to the costs.

      4) If you're buying a solution, e.g. from IBM (Tivoli), you need to pay for a consultant and/or another employee to get that stuff running. We're doing what we're doing with open source and it's going well, but if you can't and need to pay for software, it adds up (especially for Windows systems).

      --
      Custom electronics and digital signage for your business: www.evcircuits.com
    20. Re:Easier to keep by corbettw · · Score: 0

      If you're cleaning up that much data by hand, you're doing it wrong. Set an expiration date on the data, and purge it from your systems automatically.
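
A minimal sketch of what such an automatic expiry could look like, assuming "expiration" simply means "not modified for N days". The function names are hypothetical, and a real policy would also need exemptions for anything under a legal or regulatory retention requirement, which other posters in this thread bring up.

```python
# Age-based purge sketch: list files whose modification time is older than a
# cutoff, then delete them (or only report them in a dry run).
# find_expired and purge are hypothetical names, not part of any product.
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_expired(root, max_age_days, now=None):
    """Return files under `root` not modified within the last `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def purge(paths, dry_run=True):
    """Delete `paths` unless dry_run is set; return the paths that were selected."""
    selected = []
    for path in paths:
        if not dry_run:
            path.unlink()
        selected.append(path)
    return selected


if __name__ == "__main__":
    # Dry run over the current directory: report what a one-year policy would remove.
    for expired in purge(find_expired(os.curdir, max_age_days=365), dry_run=True):
        print("would delete", expired)
```

Defaulting to a dry run is a deliberate choice here, since the cost of deleting something by accident is exactly what sunking2 was worried about.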

      --
      God invented whiskey so the Irish would not rule the world.
    21. Re:Easier to keep by geekoid · · Score: 1

      Jeez, when did you last work at a large company?
      We easily get close to 10 GB per person, and we are reasonably vigilant about it.
      Then you have the Total Cost of the backup. The drive (not as cheap as a home drive, but still cheap), the person receiving, the people to put a drive in, the process of managing the disk arrays, the NAS, the backing up, and insurance. Plus normal overhead.

      Legal e-discovery is time consuming because it needs humans involved. People may be trying to hide what they are doing in a manner that a computer search can't find.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    22. Re:Easier to keep by COMON$ · · Score: 1
      I know it is average, what I was getting at was is this an average for all businesses or what?

      When I worked in enterprise environments, my cost went up for backups but cost per GB went down. In general that is the rule I have found, in larger environments my cost per MB goes down significantly not up.

      My point boils down to this: general stats like they have above are useless because we have environments like yours where you do medical imaging, and environments like mine where we do a mixture of marketing and data processing. Even with that, your price per MB varies extraordinarily based on what you are doing with that data and where it is going. On a Texas Memory SAN your price per MB is through the roof. If you are backing up to a magneto-optical solution it goes up more, and if you are paying for a dedicated pipe to a DR location even more.

      But it stands to reason that in similar environments you will get a consistent reduction per MB as data gets larger.

      --
      CS: It is all sink or swim...oh and did I mention there are sharks in that water?
    23. Re:Easier to keep by afidel · · Score: 1

      You really have no idea how a business operates, do you? Between robots and tapes a decent backup policy probably costs way MORE than $5/GB/year. A single 1TB drive is not a business backup solution, even for a small business. Heck, looking at my last quote my enterprise storage costs ~$10/GB of usable space and my vendor is one of the cheaper ones.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    24. Re:Easier to keep by euri.ca · · Score: 2, Interesting

      Let's not get snippy here, but I think the consensus is that:

      • $5/GB is reasonable (or low) for hardcore backups like the source tree, accounting records (anything where you have a person verifying that it's there is super expensive by default)
      • 90-100% of what any typical user makes (the 10GB/year figure) doesn't (or at least shouldn't) make its way into the expensive storage. But it might anyway, because your options for backing up email easily are limited.

      Of the 30 gigs of things I've put on this laptop this year, maybe 100 megs have been checked in to CVS (and the expensive backups). I doubt accounting and HR have generated another 9.9 gigs on me this year.

    25. Re:Easier to keep by Malevolyn · · Score: 1

      Wouldn't it be cheaper to just buy a company-wide subscription to all those porn sites than it would be to spend $1.25 million every year backing up all the saved pictures and videos from preview pages?

      --
      Your ad here.
    26. Re:Easier to keep by jgrahn · · Score: 1

      10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2)

      210MB is a lot. That's as large as my CVS repository, which I have added to daily for ten years or so, and which contains lots of external data too (a copy of The Great Gatsby in troff format is in there somewhere).

      On the other hand, I don't use Word, which manages to make single-page documents that are more or less plain text take up a few MBs. If you're in a company where everyone sends Word document attachments as emails instead of plain text (I've seen it done[1]) then you could probably generate 10-20MB of data per day from around 5KB of actual content, and backing this up might be cheaper than educating your users.

      Yes; all those Windows formats are a plague. Look into them and find immense, desolated areas of zero bytes. Another source of mail bloat in Windows+Exchange+Outlook places is BMP images. If you cut&paste a screenshot or something in a mail, it tends to become megabytes of uncompressed 24-bit image data.

      All this would be no problem at all if mail was compressed when sent. I cannot understand why Outlook (and other mailers) don't do that. IIRC, it's even supported by the MIME RFCs. You'd typically save 95% of the space, and the CPU time needed for compression and decompression is negligible these days.

    27. Re:Easier to keep by kmac06 · · Score: 1

      attorneys can do what amounts to a Google search

      No, they can do a search. Why compare it to Google?

    28. Re:Easier to keep by Eivind · · Score: 1

      Even that is only true if data-storage costs are constant -- or employee data grows in parallel with falling costs. Which seems unlikely.

      Storing something for a year costs more like half of storing it forever, because storage costs drop like a lead balloon and data grows.

      If I were to delete EVERY file in my home-directory that is more than 3 years old -- I'd save 15% of the space used. If I were to delete every file more than 5 years old, I'd save 4% of the space used.

      Which frankly ain't worth it.
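
Anyone curious about their own numbers could measure them with something like the sketch below. It assumes file modification time is a fair proxy for age and treats a year as 365 days; the function name is made up.

```python
# Measures what fraction of the bytes under a directory sits in files older
# than a given number of years (by modification time).
# share_older_than is a hypothetical helper, written for illustration only.
import time
from pathlib import Path

SECONDS_PER_YEAR = 365 * 86_400


def share_older_than(root, years, now=None):
    """Fraction (0.0 to 1.0) of total file bytes last modified over `years` ago."""
    now = time.time() if now is None else now
    cutoff = now - years * SECONDS_PER_YEAR
    total_bytes = old_bytes = 0
    for path in Path(root).rglob("*"):
        if path.is_file():
            info = path.stat()
            total_bytes += info.st_size
            if info.st_mtime < cutoff:
                old_bytes += info.st_size
    return old_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    home = Path.home()
    for years in (3, 5):
        print(f"older than {years} years: {share_older_than(home, years):.0%}")
```

If the result is as small as the 15% and 4% reported above, purging by age alone frees very little space.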

    29. Re:Easier to keep by Anonymous Coward · · Score: 0

      Wish I'd been practicing when those "gentlemen" were alive.

      "Google searching" works best for cases where there's a simply worded smoking gun -- "Let's kill the witness," or "I don't care what the Clean Water Act says, dump it in the river."

      In other cases, though, particularly corporate crime or regulation, it's page by page by page.

      That's where you get the huge e-discovery fees. An army of lawyers, going through piles of documents, for months on end at $200, $300, or $500 an hour...

    30. Re:Easier to keep by afabbro · · Score: 1

      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.

      Cutesy, but there's no particular reason that "jury" has to come after "ballot". Needs a rewrite if you want it to retain its cuteness.

      --
      Advice: on VPS providers
    31. Re:Easier to keep by badspyro · · Score: 1

      Because Google actually supplies servers for this use in corporate entities.

    32. Re:Easier to keep by Ihmhi · · Score: 1

      Anyone consider that the real gentleman's agreement is "Don't worry about who wins or loses, let's help each other's firms rack up as many billable hours as we can?"

      Phila Lawyer helped me learn a lot about what practicing law is really like from a no bullshit standpoint, and it helped me realize that most case law is just glorified secretary work.

    33. Re:Easier to keep by mabhatter654 · · Score: 1

      But if you get sued, you need to know what you're sending out. It's to prevent "fishing" expeditions as much as anything else; ugly stuff that might not be illegal is fun to be "leaked" to make you look bad. If something is not legally required or needed for business purposes you shouldn't keep it. It's like keeping dirty laundry around... some stuff like customer complaints and employees' "mistakes" you don't want to keep on your system any longer than absolutely necessary.

    34. Re:Easier to keep by aurtherdent2000 · · Score: 1

      Hmm, the directory dump shows I've used about 50 GB of space and still have a year of PhD left. I think you're not counting the inflationary growth of data and the decrease in the cost of storing it. The 1 million cost quoted for 5000 people was perhaps what it would have cost over the last 5 years.

    35. Re:Easier to keep by Sobrique · · Score: 1
      Well, you can pick out key areas/keywords, and search based around timeframe or involved people, that kind of thing.

      But it's a pretty big business, especially if they take the approach we did - store all tapes and hard drives indefinitely, in a big warehouse.

      It's like the truckload of paper effect in many ways but even more inconvenient, as they can't visually inspect any more.

  2. hmm by solraith · · Score: 0

    Seems to me that companies would keep all that data just in case a legal issue came up, in order to have a leg to stand on. Lawsuits are unpredictable that way.

    1. Re:hmm by NoisySplatter · · Score: 3, Insightful

      It's not so much that you want your company to have a leg to stand on, it's that you don't want your legal opposition to get their foot in the door. Innocent until proven guilty, remember?

      --
      In Soviet Russia meme tires of you!
    2. Re:hmm by MrMr · · Score: 5, Interesting

      The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up.
      They called it 'desk cleanout day', and unless you were an official dedicated contact on a particular subject you were to wipe all correspondence of more than a year old.
      (There were also other grades of information, but erase after a year was the default).

    3. Re:hmm by Smidge204 · · Score: 1

      "Innocent until proven guilty" only applies in criminal cases. In civil cases - the kind a business is most likely to encounter - the exact opposite is typically true.

      =Smidge=

    4. Re:hmm by FourthLaw · · Score: 1

      Seems to me that companies would keep all that data just in case a legal issue came up, in order to have a leg to stand on. Lawsuits are unpredictable that way.

      Unless they have an electronic data retention policy that states data will be kept for five years, at which point it must be purged.

      Problem is, what if some manager sexually harassed an employee six years ago via email. Is it in the company's business interests for that data to be discovered?

      On the other hand, if people are exceeding company policy and keeping a personal mail archive on their user volume, and that data can be demonstrated to have a history of over five years (the policy limit), then that company is in violation of their own policy. If that can be demonstrated, then they will be hammered legally for destroying evidence. In other words, they can no longer claim a policy for destroying records in five years.

      So it is a two-edged sword. Better have a policy and better be sure that it is being followed.

      --
      Skilled in differentiating ravens from a writing desks.
    5. Re:hmm by NoisySplatter · · Score: 1

      It definitely still applies in civil cases unless the plaintiff already has overwhelming evidence to the contrary. However, since we're talking about pretrial discovery here in the case of backups, if the other party can't find the data to support their case they may just drop it.

      But then again I'm not a lawyer.

      --
      In Soviet Russia meme tires of you!
    6. Re:hmm by Kjella · · Score: 1

      The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up. They called it 'desk cleanout day',

      Enron?

      --
      Live today, because you never know what tomorrow brings
    7. Re:hmm by Anonymous Coward · · Score: 0

      I worked for a year at AT&T's MIS department. The one that handles all the customer T1s to OCx lines for private use (ie, John Smith's Widget company wants a T3 to service their business).

      AT&T had a rather strict policy regarding email (what you could and could not do, etc.); they also had the Exchange system automagically delete mail after 60 days.
      So every Monday you would come in and all the emails on the threshold would be moved to a "Pending deletion" folder. If you didn't copy them out to the local PC by the next Monday, POOF.

    8. Re:hmm by Lumpy · · Score: 2, Interesting

      That was a common company-wide AT&T policy: wipe everything after 60 days. All email was to be deleted after 60 days. It was a fireable offense to create a pst file on your desktop, and we swept corporate PCs for pst files on a regular basis.

      It really did not stop anyone from keeping info. Many managers simply printed out the emails and kept them in files; one IT manager we let go had 3 years of email printed and stored in file cabinets in his office. It was insane.

      --
      Do not look at laser with remaining good eye.
    9. Re:hmm by JohnFluxx · · Score: 1

      What's a pst file?

    10. Re:hmm by afidel · · Score: 1

      That explains why I can never get accurate information out of MIS!

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    11. Re:hmm by MrMr · · Score: 1

      Higher.
      And they claim to be ethical.

    12. Re:hmm by tonyyarusso · · Score: 1

      MS Outlook-format dump.

    13. Re:hmm by ACMENEWSLLC · · Score: 1

      >>unless you were an official dedicated contact on a particular subject you were to wipe all correspondence of more than a year old.

      So I keep track of how much I paid for these $10,000+ software packages by looking at my e-mails with the bids from the last year, and previous year. Sometimes the software cost jumps quite a bit and I use that to get costs back down.

      If I have to delete this correspondence, how am I to do that? I can't stick that information in an Excel document or write it down, as that is retaining the correspondence.

      I really don't understand why I should need to delete e-mails of me asking for prices from my vendors. How is that ever going to come up in a lawsuit?

      I understand the why -- IT legal doesn't want to have to sift through that data. But that's legal's problem that the legal profession created. I'm not going to suffer to make their job easier.

      I do understand that you keep e-mail to just the facts. Keep opinion out of any electronic document. Anything where having a written record would be questionable gets done verbally. Not that I'm doing anything wrong, but that someone could put it in an improper context and make it look like it means something it doesn't.

      It's a screwed up world we live in. A lot of people think this only applies to e-mail. eDiscovery applies to any electronic file you have. If your user saved off that e-mail as a .MSG from your IMAP server to his personal drive on his work PC, it's still discoverable. Sucks to be legal.

    14. Re:hmm by Boricle · · Score: 1
      Your employer must think they are more exposed to having to provide compensation, than they are to being able to prevent it.

      I've mentioned this here before....

      If you do wholesale deletion (sorry, "archiving/not retaining") of documents, it is also likely you are destroying documents that will prevent you from having to pay out - maybe this should be called the "smoking airbag". You might end up deleting documents where third parties accepted risks, provided evidence of good faith behaviour, or would otherwise have eliminated the exposure.

      The only rational reason to run the risk of removing the documents that would protect you, that I can think of, is that the organisation feels that they are more likely to be exposed, rather than protected by the contents.

      Which is enough to make someone worry about what they're up to!

    15. Re:hmm by mabhatter654 · · Score: 1

      I believe Microsoft had that policy for a long time as well until SOX was put into place. The idea is that if it is your policy to destroy information, you can't be held in contempt when it's gone. Of course in cases like Microsoft, which gets sued all the time, it was quite handy, as the court cases to sue them might take years, and the records of contract negotiations that failed are long since wiped out by the time you can legally "reach" the information. But the purge is a "system function" so it's "plausible deniability" that stuff happens automatically unless the proper judge signs the exact, proper paperwork in time. Again, this is why SOX was put into place to allow the SEC to adjust "best practices" more quickly to stop abuse of the system.

    16. Re:hmm by mabhatter654 · · Score: 1

      This is where things like Sharepoint come in, so that you can identify projects, and attach important information for retention by project instead of in the email system. Companies have to be vigilant because if some lawyer finds even 1 email outside the time limit then they open the scope of discovery even more, which may mean hauling in individual PCs for "dumps", and that gets messy fast.

    17. Re:hmm by syousef · · Score: 1

      The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up.
      They called it 'desk cleanout day', and unless you were an official dedicated contact on a particular subject you were to wipe all correspondence of more than a year old.
      (There were also other grades of information, but erase after a year was the default).

      Can I get a job working as a coder in this organization?

      Boss: Have you finished your work on XYZ?
      Me: Yes boss.
      Boss: Can we deliver it next monday?
      Me: Sorry but I started that code more than a year ago. Cleaned it up this morning for desk cleanout day
      Boss: How long before we can have it then?
      Me: Well I'll have to put together a project plan but we should be able to deliver in 366 days. 367 tops.

      --
      These posts express my own personal views, not those of my employer
  3. Purging is bad. by PsyberS · · Score: 1, Funny

    Data bulimia is a serious problem. If you know someone affected, make sure to get them the help they need asap.

    1. Re:Purging is bad. by hcpxvi · · Score: 1

      A well-known IT professional has been advocating this policy for some years now. http://bofh.ntk.net/Bastard.html

  4. Huh? by qoncept · · Score: 4, Insightful

    $250k a year for a 5000 employee company? To put it in perspective, if the average employee at this company is making $60k a year, this company will be paying $1.5 billion in salaries over the same 5 years. To be fair, I think the cost estimate from the article is much too low. But while corporate storage costs more than you'd think, and companies are definitely storing a whole bunch of data they don't need, what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep. And time costs money. More than storage.

    --
    Whale
    1. Re:Huh? by fast+turtle · · Score: 1

      and those costs are even higher when done by a law team during a discovery process. Gets quite expensive when law teams are billing $1k per hour to do discovery.

      I myself have found it far cheaper as a small business owner to have a written document retention policy along with a written policy that all business docs have a VCS Date and Number. In fact after I discussed the matter with another local small biz owner, I'm damn glad I've got such a policy in place as they're already going through the distraction and resulting loss of business while attempting to archive to CD/DVD several years worth of email archives (Outlook/Outlook Express/AOL).

      As far as having a written policy goes, that's not enough. You also have to follow it, otherwise the court will hang your ass for destruction of records along with contempt, possibly costing you the case. So once the policies are in place follow them.

      --
      Mod me up/Mod me down: I wont frown as I've no crown
    2. Re:Huh? by TubeSteak · · Score: 1

      what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep.

      Right now, the-way-things-are-done is to save it all and pay for it.
      You can train employees to change the-way-things-are-done.

      The learning curve is expensive, but the general idea (aspirational, as with anything corporate) is that once everyone figures out the policies, time is used more efficiently and the 'cost' goes down.

      And time costs money. More than storage.

      Can I see the report that verifies your assertions?
      You did have someone study the long term costs and give you hard numbers, didn't you?
      A company isn't going to fsck around their multi-million IT budget without someone in-house or consultant studying the matter.

      --
      [Fuck Beta]
      o0t!
    3. Re:Huh? by John+Hasler · · Score: 1

      > and those costs are even higher when done by a law team during a discovery
      > process. Gets quite expensive when law teams are billing $1k per hour to do
      > discovery.

      This is a very good point. The more data you have and the more poorly organized it is the more it costs you to honor discovery requests whether or not anything relevant is found. Thus there exists incentive to index your archive and minimize its size even if you are confident that it contains nothing that could be used against you.

      --
      Warning: this article may contain humor, sarcasm, parody, and perhaps even irony. Read at your own risk.
    4. Re:Huh? by Archangel+Michael · · Score: 1

      To put this into perspective, we have PRA requests for all sorts of "data" that we are supposed to keep. It has become almost a full time job going through all the crap to find what the PRA requests are asking for.

      And we're a SMALL school district.

      --
      Agent K: A *person* is smart. People are dumb, stupid, panicky animals, and you know it.
    5. Re:Huh? by qoncept · · Score: 1

      "Can I see the report that verifies your assertions?"

      I'm a slashdot poster and not an analyst, but I'll bite. We'll use the numbers from the study above. Let's assume for a minute that we're going to make a team of 5 people to create and implement this policy, and it costs each of the 5000 employees a modest 5 minutes a day to fulfill their requirements, and we're going to result in needing half the storage.

      Half the storage means we're saving $675k for that storage over 5 years.
      We'll say those 5 guys are going to get paid $50k a year. That's... the original cost of the storage we were trying to reduce, $1.25m over 5 years.
      Everyone else getting paid $60k a year is now wasting 60,000 / 52(weeks) / 40(hours) / 12(5 minutes) * 5000(employees) ~= $12k. A drop in the bucket I suppose.

      Pay 3 guys $45k a year to implement this program and you'll break even. That said, I have a vague idea of what my company pays for storage and I don't think the numbers above are very realistic. Storage costs more, or at least a company that isn't a start up will have much more data than that. Still, my company's motivation for data retention policies has much less to do with storage costs and much more with the Sarbanes-Oxley Act, as made apparent by our training materials.
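
      A rough sketch of the arithmetic above in Python, using only the figures from the summary and this post. The 260 working days per year is an assumption that is not in the post. Run as written, half the storage comes to $625k rather than $675k, and the roughly $12k for the 5 minutes is a per-working-day figure, so it adds up quickly over 5 years.

```python
EMPLOYEES = 5000
GB_PER_EMPLOYEE_YEAR = 10    # from the summary
COST_PER_GB = 5              # dollars, from the summary
YEARS = 5
SALARY = 60_000              # dollars per year, from the post
WORK_DAYS_PER_YEAR = 260     # assumption: 52 weeks x 5 days

# Storage bill over five years, and the saving if it is halved.
storage_cost = EMPLOYEES * GB_PER_EMPLOYEE_YEAR * COST_PER_GB * YEARS
storage_saved = storage_cost / 2

# Five-person policy team at $50k a year each.
team_cost = 5 * 50_000 * YEARS

# Five minutes per employee per working day, priced at the hourly rate.
hourly_rate = SALARY / 52 / 40
five_min_cost_per_day = hourly_rate / 12 * EMPLOYEES
five_min_cost_total = five_min_cost_per_day * WORK_DAYS_PER_YEAR * YEARS

print(f"storage over {YEARS} years: ${storage_cost:,.0f}")
print(f"saved by halving it:    ${storage_saved:,.0f}")
print(f"policy team:            ${team_cost:,.0f}")
print(f"5 min/day, per day:     ${five_min_cost_per_day:,.0f}")
print(f"5 min/day, {YEARS} years:     ${five_min_cost_total:,.0f}")
```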

      --
      Whale
  5. 10GB of data!! by PinkyDead · · Score: 0, Offtopic

    Maybe it'd be cheaper for the companies to buy the employees an annual subscription to Penthouse.

    Jeez lads, would you lay off the porn?!

    --
    Genesis 1:32 And God typed :wq!
  6. More data is good by Anonymous Coward · · Score: 0

    I work for a storage company, stop messing with my job security.

  7. 10 GB user data? Not likely by arth1 · · Score: 5, Insightful

    10 GB of data per user, sure.
    10 GB of user data, no way.
    Assuming 300 work days per employee at 8 hours each, that would mean that the average employee creates about 1.2 kB of data per second.

    The only way this could be true is if you count data that isn't user generated, and they count the total data storage for the company and divide it by employees.
    If so, users deleting their e-mails won't have much of an effect.
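
    For anyone who wants to check the rate, a minimal sketch in Python. The 8-hour work day and decimal gigabytes are assumptions; the 10 GB and 300 days come from the numbers above.

```python
def bytes_per_second(gb_per_year, work_days, hours_per_day=8):
    """Average data rate if gb_per_year is produced only during working hours."""
    total_bytes = gb_per_year * 10**9           # decimal GB (assumption)
    working_seconds = work_days * hours_per_day * 3600
    return total_bytes / working_seconds


rate = bytes_per_second(10, 300)
print(f"{rate / 1000:.2f} kB/s")
```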

  8. It's not the storage... it's the apps by paulhar · · Score: 4, Insightful

    Apps aren't really designed with this in mind. They don't come at the problem from a "document lifecycle" perspective but instead from a "document creation" one.

    This is generally because data has a variable lifespan. Let's take an email as part of a project as an example. As the author I may decide that the email isn't needed after a week so set an expiry of 1 week. But you, as the recipient, may take that email and turn that into several tasks so for you the email is much more important and thus want to keep it for much longer.

    Users aren't really going to be good at making these decisions unless some application continually bombards them with "go check the status of these 1000 documents you've got".

    1. Re:It's not the storage... it's the apps by ubercam · · Score: 3, Informative

      Users aren't meant to be making those decisions, the Records Management department should be... that is if you even have one! If you leave everything up to the users, you WILL have a cluster fuck of records.

      I work in Records Management at a large company with many different divisions in diverse fields. RM is completely left up to us. We manage well over 10,000 boxes and there's only 3 of us. We alone determine when something is to be destroyed (but require authorization from dept heads to be shredded), how long it's kept, etc.

      Disclaimer: We work mainly with paper records, but the exact same principles apply to electronic records.

      You need a retention schedule. Look at your national, state/provincial and municipal laws to determine the minimum legally required length of time each TYPE of record is to be kept. Employee time cards are different from pension plans, sales invoices and legal files. It's not *always* 7 years either. Some are less, some are more, some are permanent. Also, you don't have to shred when the law says it's time if there's a valid business reason to keep that set of records. I mean, let's get this straight. You don't HAVE TO shred at all, but you're digging yourself a deep hole if you never do... "You can get in just as much trouble by keeping records too long as you can by destroying them too quickly." - Dr. Mark Langemo

      If this was all left up to individuals, they would just keep everything. I've seen what this is like, and it's pathetic, maddening and counter productive. Things must be properly named and catalogued down to the file level when put in storage, or you will NEVER find ANYTHING without an exhaustive search EVERY time. It might be alright when it's on your desk or in your local filing area and you know what's where, but when you archive it, you can't assume the guy looking for the file you need knows anything about it. We need explicit details or else we can't help you. At my company we require everyone to fill out a nice sheet detailing the contents of their box, the type of records, dates (most remember dates above all else), sender's name, dept, etc.

      We are by no means a perfect operation here, but we're far better than 90% of other companies out there.

      There is a series of excellent seminars done by Dr. Mark Langemo (sorry no links) to teach you how to deal with records. Also check out ARMA International if you're looking to get in touch with other Records Managers in your area. They have local chapters all over the place.

      To summarize, if your company doesn't have a Records Manager, HIRE ONE NOW and give him/her the resources to get your records under control! Check out ARMA, they have jobs posted on their site. There are also many companies out there that will help you clean up your stuff and get you started on the right track.

    2. Re:It's not the storage... it's the apps by paulhar · · Score: 1

      While I agree that for records / data that are structured it may be possible to implement a better regime, a lot of the data that flows around companies isn't structured and is stored either in email that goes back and forth, or in group shared folders in Word\Excel style documents.

      I pity the poor Records Manager who would have to go through everyone's email to subjectively decide if an email can be deleted without the context that surrounds it.

    3. Re:It's not the storage... it's the apps by ubercam · · Score: 1

      It's not the Records Manager's job to go through their email, but rather together with Legal and/or IT, implement an email and electronic records retention policy, and do periodic (preferably annual) audits to see to what degree people are complying, what areas can be improved on, etc. You will get the people who hate change, but most will take to the new system and those that simply won't are dealt with eventually by time.

      To do this successfully, you absolutely need someone high up in Legal who has the ear of the Board to back you 100%, and give your bark some bite. When Legal speaks, everyone listens. You also need IT to be backing you 100%. Without them helping users as much as possible to ease into the new systems and ways of doing things, your users will be bitter about the changes and be less likely to comply.

      Never mind the electronic storage, you'd be surprised at how many people PRINT their emails "JUST IN CASE!" It's unbelievable that one person does it, let alone more than one. People also photocopy things incessantly and keep 100 copies of random crap they were working on 10 years ago in their desk drawers "JUST IN CASE!" Just in case of what exactly? If there was ever a legal discovery process, those copies could be incredibly useful to the opposition, especially if the originals had already been destroyed according to your retention schedule. There are a fuck load of ticking legal time-bombs in desks just like yours in offices everywhere...

    4. Re:It's not the storage... it's the apps by sledge_hmmer · · Score: 1

      The problem I find is that even emails you think are not particularly important end up saving your ass when shit hits the fan.

      A few months ago we faced massive delays and significant costs on a product field trial because a supplier didn't build their product to the specs we gave and then they denied ever being told about it. Of course our fantastic email retention policy of only 300MB or 60 days had ensured that almost everyone except me had lost the email - I had archived it to my hard drive. Saved us from a huge falling out with them, since they refused to admit fault and we were footing the bill.

  9. It depends upon business by William+Robinson · · Score: 2, Informative

    For example, financial institutions are required to keep data for a longer period for legal purposes as well as traceability (during investigation of fraud or other kinds of crimes). The banks I worked for had a legal requirement of keeping data at 2 places at least 15 km apart, with all kinds of protection against fire and intrusion.

    A good manufacturing company would keep data for a longer period not only to comply with ISO standards, but to trace manufacturing defects and as good evidence of past history for the insurance company against theft/fire and other kinds of problems.

    We used to keep daily changes of source code of only previous releases, and purge the rest of the releases (we kept the final source code and patches of all previous releases, but purged the daily changes).

    In a nutshell, it depends upon your type of business.

    1. Re:It depends upon business by PainKilleR-CE · · Score: 3, Insightful

      Additionally, there are many businesses that don't understand their data retention requirements beyond 'we need to keep some data for 10 years', so instead of compartmentalizing their data and saying 'keep this for 10 years, that for 5 years, and purge this every year and that every 3 months', they just keep everything. Further, if they have a data retention requirement for 3 years or 10 years, they might wait longer before purging it just because it's easier to keep it than it is to go find and remove the 5 or 12 year old data.

      I only recently organized some data being maintained by the company I work for that was basically divided into 'archived' and 'live' data, logs generated by a many-user application. The 'archived' data went back 4 or 5 years with no easy distinction between data that was many years old and data that was generated in the most recent archive. Now at least the data is sorted by date (and being archived by date), so that when someone decides on how long we want to keep it (they can't seem to make up their mind, and while everyone seems to agree that we don't need data from 2005 and earlier, no one's willing to say I can delete it, either), it won't be hard to dump the older data at least on an annual or semi-annual basis.

      --
      -PainKilleR-[CE]
  10. Too Much Cost? by Kuriomister · · Score: 1

    So $50 a year is now too much for a large company to tag onto employee costs? If someone is making $30,000 a year, what's another $50? The problem might be in multi-year retention, in which a 2 year employee will require $100 of storage and so on, but this does not account for the diminishing price of memory or other associated costs. Maintaining a 10 year archive at that price, and assuming that employees were putting out 10GB of data 10 years ago, would cost $500 an employee; scale that up to a larger company, and you can see data storage prices in the millions. This is assuming that the data is: 1) stored serverside, and 2) not kept only as a physical backup after, let's say, 3 years. It would be cheaper in the long run to move everything to hard storage after some point x and keep it offline, only to be used in the case of lawsuits and other archival needs. Using a model like this allows for near unlimited storage time with minimal costs. If a new format of storage comes about, the biggest pain might be updating these records, but in terms of memory costs for such an operation, look at the advancements in storage space; 10 years ago, people thought 10GB was large.

  11. Choosing your battles by ka9dgx · · Score: 1
    It just doesn't make sense to expend the limited political capital of the IT department to nag people into cleaning up their folders. If you're in a small company, and can more than double your server storage for $1000, instead of pissing off 25 people, you'll spend the money, and so will the CFO. I should know, we've done it more than once over the past 10 years.

    It's far better to spend a few $K than to waste literally weeks of time trying to sort things out, especially when you need sales to be selling and not worried about their computers.

    --Mike--

    1. Re:Choosing your battles by cowscows · · Score: 1

      Exactly. I've worked at my current company for about three years. It'd take me a few days at least to go through all the documents that I've created since I've been here. The cost of storing all those documents is significantly less than the billable hours that my company would have to give up for me to spend those days sorting paper. Not to mention the fact that I can't imagine having the luxury of a few days without having to worry about projects/clients/etc and having the time to focus on sorting through stacks of documents and emails.

      Coincidentally, I generally feel the same way about all my "life" paperwork that I get at home (bills and receipts and such). My wife is a bit obsessive about filing things and having it all very organized, while I'm perfectly happy to just throw everything in a box and forget about it. Sure, when I have to find something, I'll spend five minutes digging compared to my wife who could just walk over to the file cabinet and find it in about 30 seconds. But I so rarely need to actually go retrieve something that a few five minute searching sessions per year adds up to significantly less time than I'd require to consistently file everything.

      --

      One time I threw a brick at a duck.

    2. Re:Choosing your battles by geekoid · · Score: 1

      Well, if you have done it more than once at some tiny shit hole company, I guess that's the way to do it...

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
  12. Email Attachments by whisper_jeff · · Score: 4, Insightful

    I don't know what most major companies' policies are regarding backing up emails (just back up the text or back up emails plus attachments) but, as one example, I'm sure this would be an easy spot for most companies to dramatically reduce the amount of storage space required. Most business communications I see from corporate personnel have various attachments on every email - things like logos, custom backgrounds, etc. Forget getting rid of all the unnecessary attachments - getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, and probably make corporate communications look more, you know, professional. So many emails are filled with unnecessary garbage and, if that's being backed up, that garbage can get costly.

    Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...

    1. Re:Email Attachments by xgr3gx · · Score: 1

      Hmm, maybe I should stop doing my weekly email of the "Monkey Drinking his own pee" video to 200 people in my department.
      I guess that might explain all the SAN storage requests for our email archive servers.

      --
      Shameless plug alert: Game server control panel
    2. Re:Email Attachments by daniel_newby · · Score: 1

      "... getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, ...

      So parse the MIME headers, separate the files, and store them in a content-addressable filesystem. A content-addressable filesystem hashes each file, then indexes the file under its hash instead of its name. Duplicates are automatically consolidated into a single file.

      Folks who are especially aggressive could even diff each email against all recent emails and extract common fragments. It wouldn't even be especially slow if implemented right.
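
      The hash-and-index idea above can be sketched in a few lines of Python. This is only a toy, not how any real mail server or filesystem does it: `ContentStore`, `archive_email` and `make_mail` are made-up names, the "filesystem" is an in-memory dict, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
from email import message_from_bytes
from email.message import EmailMessage


class ContentStore:
    """Toy content-addressable store: each blob is indexed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content hashes to the same key, so it is kept only once.
        self.blobs.setdefault(digest, data)
        return digest


def archive_email(raw: bytes, store: ContentStore) -> list:
    """Parse a MIME message and store every leaf part under its hash."""
    msg = message_from_bytes(raw)
    digests = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        payload = part.get_payload(decode=True) or b""
        digests.append(store.put(payload))
    return digests


def make_mail(body: str, logo: bytes) -> bytes:
    """Build a sample message with a text body and one image attachment."""
    m = EmailMessage()
    m["Subject"] = "status"
    m.set_content(body)
    m.add_attachment(logo, maintype="image", subtype="png", filename="logo.png")
    return m.as_bytes()


store = ContentStore()
logo = b"\x89PNG pretend this is the company logo"
first = archive_email(make_mail("hello", logo), store)
second = archive_email(make_mail("goodbye", logo), store)
print(len(store.blobs), "blobs stored for 2 emails with 4 parts in total")
```

      Each archived message then only needs to keep its list of digests; the two sample emails share a single stored copy of the logo.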

    3. Re:Email Attachments by euri.ca · · Score: 1

      I've been getting stupid emails like that for years and years... you'd think that Outlook would've dealt with that by storing includes by size & hash.

      But no. Actually, I think they still store them inline in the .pst file. (fun Microsoft fact/theory: it's not the devs/PMs' fault, they could fix it easily if enough people weren't buying Outlook because of the way it stores files.)

    4. Re:Email Attachments by Vancorps · · Score: 1

      For those of us with NetApp SAN storage we use A-SIS to dedup all of those files so we only store them once even if they are referenced in one hundred locations. This dramatically reduces storage requirements at the cost of cpu cycles at night while it scans all the new files and determines if any are duplicates.

    5. Re:Email Attachments by Anonymous Coward · · Score: 0

      Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...

      Can't be a sign of age. I'm 26 and I think the same thing. My boss who is 50+ thinks quite the opposite.

    6. Re:Email Attachments by Anonymous Coward · · Score: 0

      Since you mention "pure text"...

      Those same people that doll up email with graphics would do something similar in a "pure text" environment.
      When scanning the content of a message, imagine how much more difficult it becomes with text garbage.

      It's much easier to just block images by default.

    7. Re:Email Attachments by afidel · · Score: 1

      Wow, you just explained how single instance storage works in Exchange, I wonder why it's one of the most popular email packages....

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    8. Re:Email Attachments by gatesvp · · Score: 1

      Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...

      No, just a sign that you're communicating inefficiently. Font sizing, font highlighting, font types and nested bullets are used because they are effective tools for communication.

      Pure text e-mail is like using a monotone voice. It simply doesn't encapsulate the richness of human communication. If I'm stuck with pure text e-mail I end up doing stuff like *this* anyways.

      In fact, you used the bold-facing in your own post to highlight every e-mail. Why you desire pure text e-mail but are more than happy to highlight your forum post kind of baffles me, not really a great sell.

  13. Poor Decision Making and Follow Through... by RyansPrivates · · Score: 0

    "...most organizations hang on to more data than they need, for much longer than they should."

    As an infrastructure consultant, I see this EVERYWHERE. At the average client, I find the same INSTALL MEDIA (O/S ISOs) in three or more locations, all of which are being backed up. WHY??? You already have a TRUE backup, it's called the CD the software came on, or the electronic source you downloaded it from. Just gigs upon gigs of wasted space.

    And don't even get me started on email limits and policies. I can't think of a single company that actually enforces the mail limits. And even those that do will "extend" or exempt nearly anyone who requests it.

    We in the IT field need to create better policies and actually follow through on them...

    --
    If at first you don't succeed... How does that go again? Ah, forget it.
    1. Re:Poor Decision Making and Follow Through... by Atrox666 · · Score: 1

      I'm guilty as hell here.
      The fact is that I do backup certain installs on the network.
      My reasons for doing this are:
      1) I work with idiots and they lose stuff all the time. The original disks for half the software in the company are probably under someone's coffee mug somewhere.
      2) I work with thieves. People "borrow" my disks all the time. I've fixed this by writing "virus infected", "Damaged", "old version do not use" or "Corrupt" on most of my important stuff.
      3) The archiving / version control system we bought is too expensive so we aren't allowed to put most things we need to preserve in it.
      4) Our archivist doesn't really do anything or know how and until her ass starts to sag she won't get fired.
      5) Using drive shares and active directory for file storage is inefficient and results in a lot of duplication of files due to the lack of maintainable granular security control. Even Sharepoint would be a better idea.
      6) If I save them money they never share any of it with me so what the hell do I care if they make any profit? I actually get a big smile on my face every time I see massive waste because these are not good people and bad people deserve to have bad things happen to them. If I were to try and fix it I would get all the blame if it went bad and none of the credit if it went well.

  14. Data Discovery Woes by Anonymous Coward · · Score: 1, Informative

    I work for a few lawyers and we just began running into issues with "data discovery". Two recent examples:

    1.
    They are a medium sized law firm and they were involved in a lawsuit with another law firm. The other law firm (much smaller) required a copy of all the data from the firm.

    Data from encrypted laptops = 80GB x 6 users
    2 hours per laptop to decrypt and image (12 hours)
    Data from 4 servers and email = 65GB (2 hours)
    That's now over 500GB and 14 billable hours of support.

    2.

    The law firm was involved in a lawsuit where they were doing discovery and had to review evidence.
    They were going to get data from 10 laptops (800GB total) that will require backups of the data and archival for X years (so far it is 1 year and indefinite).

    Quickly the data discovery is getting expensive - and annoying on a technical level.

  15. I'm 500% better than average! by Simonetta · · Score: 1

    average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up...

    I cry nonsense in the statement above.

    I put a 25 cent blank DVD into the DVDwriter of my PC. Then I copy the entire contents of my 'C:\backup' folder onto this DVD. I start the program, and go do something else. Total dedicated time: 2 minutes

    When the DVD write is done, I write a label code on the DVD (date, employee, backup number) and put the disk back on the stack in the file cabinet. Total dedicated time: 2 minutes

    My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

    So if I'm an average marginally competent employee, why can I do backup 500% more efficiently than the average?
    This statistic must be junk.

    1. Re:I'm 500% better than average! by confused+one · · Score: 1

      Large corporations back up servers on tape. Good tapes and tape drives are expensive. Including support, maintenance and replacement costs, $5 per gigabyte probably isn't that bad.

    2. Re:I'm 500% better than average! by Chris+Mattern · · Score: 2, Insightful

      Unfortunately, writable DVDs are not an acceptable archive medium, and a stack of disks with written labels is not an indexing solution that will scale beyond one person.

    3. Re:I'm 500% better than average! by Vellmont · · Score: 1


      My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

      And you backed it up a total of once. The cost of $5 is likely a yearly cost (as the volume is yearly), Backups are usually done 1/day. Your yearly costs would be in the hundreds of dollars per gigabyte.
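
      As a rough sketch of that scaling in Python: take the parent's $1 per gigabyte per backup and multiply by the number of backups in a year. It assumes every backup is a full copy that costs the same and is kept, which is a simplification; the 250 working days is also an assumption.

```python
def yearly_cost_per_gb(cost_per_backup_per_gb, backups_per_year):
    """Yearly cost per gigabyte if every backup is a full, retained copy."""
    return cost_per_backup_per_gb * backups_per_year


every_day = yearly_cost_per_gb(1.0, 365)      # one backup every calendar day
working_days = yearly_cost_per_gb(1.0, 250)   # assumption: about 250 working days

print(f"daily backups:       ${every_day:.0f} per GB per year")
print(f"working-day backups: ${working_days:.0f} per GB per year")
```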

      --
      AccountKiller
    4. Re:I'm 500% better than average! by Anonymous Coward · · Score: 0

      I put a 25 cent blank DVD into the DVDwriter of my PC. Then I copy the entire contents of my 'C:\backup' folder onto this DVD. I start the program, and go do something else. Total dedicated time: 2 minutes

      When the DVD write is done, I write a label code on the DVD (date, employee, backup number) and put the disk back on the stack in the file cabinet. Total dedicated time: 2 minutes

      My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

      So if I'm an average marginally competent employee, why can I do backup 500% more efficiently than the average?
      This statistic must be junk.

      Because your "backup" will go poof in a number of cases that a real backup or archive will survive easily. Just to name the top three:

      • If the DVD with your backup develops more defects than ECC can cope with, your data is gone.
      • If the building where the file cabinet is located burns down, your data is gone.
      • If you need your data in 10 years time and discover that DVD drives are as obsolete as 8" floppy disks are nowadays, your data will still be there but you won't be able to read it, which is only slightly better than gone.

      If you want to avoid only these three common pitfalls for archiving, you'd have to:

      • Write every backup to at least two DVDs
      • Drive to a different location to store the second one every time you make a backup
      • Check all accumulated backups for media defects on a regular basis, and copy those where the defect rate increases to new media.
      • Copy all accumulated backups to different media in time when the current backup media starts going out of fashion.

      which will take considerably more time than four minutes a day...

    5. Re:I'm 500% better than average! by Geoffrey.landis · · Score: 1

      My salary and benefits: @ $18/hr time used on backup: 0.067 hrs My cost per gigabyte of backup: $1

      You haven't counted overhead. First, there is your personal overhead. Do you talk to your co-workers in the hall? Get coffee on company time? Go to the bathroom? Fill out time sheets to account for what you do all day? Read memos telling you that you have to fill out time sheets? Read your e-mail? Post comments to slashdot at 10:47AM on a workday? Only robots are 100% efficient in their use of time.

      And then there is company overhead-- your computer, pens, paper, copy machine, office, lighting, secretaries, payroll, all that stuff. Even if you don't notice it, even if you don't use it, it's there.

      $18/hour?? You'd be lucky if you cost the company less than $100/hour.
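
      To see how much the hourly rate matters, here is a rough sketch of the DVD arithmetic. The 25 cent disc and the four minutes of labor come from the post being replied to; the 4.7 GB capacity (a full single-layer disc) is an assumption, and the function name is made up for illustration.

```python
DVD_CAPACITY_GB = 4.7   # assumption: a completely full single-layer DVD
MEDIA_COST = 0.25       # from the post: 25 cent blank DVD
LABOR_MINUTES = 4       # from the post: 2 + 2 minutes of dedicated time


def cost_per_gb(hourly_rate):
    """Labor plus media cost per gigabyte for one DVD backup run."""
    labor = hourly_rate * LABOR_MINUTES / 60
    return (labor + MEDIA_COST) / DVD_CAPACITY_GB


# Compare the quoted wage with a fully loaded rate.
for rate in (18, 100):
    print(f"${rate}/hr -> ${cost_per_gb(rate):.2f} per GB")
```

      The per-gigabyte figure scales almost linearly with the hourly rate, and it gets worse if the disc is not full, since the labor is the same either way.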

      --
      http://www.geoffreylandis.com
    6. Re:I'm 500% better than average! by geekoid · · Score: 1

      That is a completely ignorant example of needing to back up 1000's of people, and billions of transactions.

      --
      The Kruger Dunning explains most post on /. http://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect
    7. Re:I'm 500% better than average! by toddestan · · Score: 1

      Well, considering that the cost you quote is only $4 less than doing it properly, I can't say from your post that the statistic is bunk.

  16. It was easier with paper... by swm · · Score: 1

    Used to be records were kept on paper,
    paper was kept in boxes,
    and boxes were dated MM/YY.

    I came into the office one fine 1998 January 02,
    and the hallway was stacked full of boxes dated 01/94,
    02/94, 03/94, etc.

    Company policy was discard records after three years,
    so all records from 1994 were on their way to the dumpster.

    1. Re:It was easier with paper... by cashman73 · · Score: 1
      Used to be records were kept on paper, paper was kept in boxes, and boxes were dated MM/YY.

      So THAT explains why they kept moving Milton's desk (image)! I guess all those TPS reports take up space!

    2. Re:It was easier with paper... by Anonymous Coward · · Score: 0

      >> were on their way to the dumpster.

      To the dumpster huh? Can you tell me the company name so I don't do business with them?

  17. keep, but not on the high-performance disk arrays by petes_PoV · · Score: 1

    The major cost of purging is the manpower and downtime. Therefore it's easier to keep the stuff, possibly with occasional housekeeping if your schema isn't as scalable as it should be. While the legal and tax requirements (which vary from country to country) have a limited lifetime, there are always possibilities, such as legal defences, where old data may be needed. These uses will not require the performance (and cost) of enterprise class storage: speed, redundancy, administration, warranties. So migrate it to a few 1TB drives in someone's desk. That way if subpoenaed you can plausibly have "lost" it, whereas if it's in your interests, it can miraculously be found.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
  18. Yeah this whole thing seems a little fishy... by Smeagel · · Score: 1

    On top of what you said - $5 a gigabyte? What is this 1998? Even if you get WD's highest quality consumer hard drives they're about $1 a gigabyte, plus if you buy them in bulk they're probably considerably cheaper. You can use 2 or 3 of them for data redundancy, and it's still significantly cheaper. I question where they got that number.

    1. Re:Yeah this whole thing seems a little fishy... by jimicus · · Score: 1

      On top of what you said - $5 a gigabyte? What is this 1998? Even if you get WD's highest quality consumer hard drives they're about $1 a gigabyte, plus if you buy them in bulk they're probably considerably cheaper. You can use 2 or 3 of them for data redundancy, and it's still significantly cheaper. I question where they got that number.

      As soon as you say that I can be reasonably sure that you've never factored in storage costs for anything fancier than a desktop PC.

      SAS disks are typically 3-5 times more expensive per drive. Factor in RAID (level 5 if you want capacity, 10 if you want performance, 6 if you want a compromise of both) and you can potentially double the cost per gigabyte. But you can't get 15,000 RPM SATA disks and you can't bond SATA channels together for performance.

      Secondly, seeing as the subject is archiving, they're probably talking about tape rather than hard disks. Tapes have the big advantage that you can handle them a lot more roughly, transport them more easily than disks, and they can be archived for longer because they don't suffer from stiction.

      Thirdly, I don't think the cost of media is the biggest factor by a long way. They've probably also factored in cost of a contract with Iron Mountain, cost of robotic tape library, licensing costs for TSM (or similar) and a proportion of the wages involved in paying someone to swap the tapes out and hand them over to Iron Mountain every day.

    2. Re:Yeah this whole thing seems a little fishy... by RyansPrivates · · Score: 0

      We are not talking about consumer hard drives here, we're talking about enterprise storage solutions. You can't just throw 100 terabyte consumer SATA disks in a closet and expect to have a "storage solution".

      An enterprise solution comprises the MUCH pricier SAS and FC disks inside of a SAN. Just at this first level, you've already spent more than your $1/GB.

      Then, throw in the associated SAS and SATA disks for backup, as well as tape for archiving, and all of the infrastructure to support it and labor required to make it work.

      There is nothing fishy about it. Enterprise storage solutions are PRICEY beasts...

      --
      If at first you don't succeed... How does that go again? Ah, forget it.
    3. Re:Yeah this whole thing seems a little fishy... by Smeagel · · Score: 1
      Yes there is something fishy.

      One of us is making a wrong assumption:

      1) I was assuming they couldn't have been talking about long term storage, because no way an *average* user produces 10GB a year that needs long term storage.

      2) You were assuming that somehow the average user produces 10GB a year that requires long-term storage.

      There is no way that the average user generates 10GB of data that makes it into long-term storage in a year.

      That's approximately 200MB of data a week. Most corporate users generate a few megs in email and a few megs in spreadsheets in a week.

    4. Re:Yeah this whole thing seems a little fishy... by RyansPrivates · · Score: 1, Insightful

      I definitely see where you're coming from, and you SHOULD be right. However, this goes to the heart of the article: most companies are OVER-retaining their data. Backing up things that shouldn't be backed up, and retaining things beyond legal requirements or indefinitely.

      Additionally, even though we may not agree on the figures, we definitely agree that storage costs have exponentially decreased. This has led to the trend to just keep adding storage, as opposed to actually going through what is being stored and for how long.

      Like I stated in another post, this problem needs to be attacked from a business policy angle, not merely from a technological capacity (pun fully intended).

      --
      If at first you don't succeed... How does that go again? Ah, forget it.
    5. Re:Yeah this whole thing seems a little fishy... by landonf · · Score: 1

      Thirdly, I don't think the cost of media is the biggest factor by a long way. They've probably also factored in cost of a contract with Iron Mountain, cost of robotic tape library, licensing costs for TSM (or similar) and a proportion of the wages involved in paying someone to swap the tapes out and hand them over to Iron Mountain every day.

      Indeed -- the cost is in offsite storage and archival. I've previously used Amazon S3. They charge $0.15 per gigabyte-month for redundant online storage, and if you want redundancy against bit-flip failures on their end, you can also employ something like Reed-Solomon error correction on uploaded data.

      When I set up uploading of (encrypted) backup archives, the total overhead was approximately $102/month in data transfer costs (1 terabyte amortized over a month) and $307/month in data storage (2 terabytes/month), with minimal month over month growth. This is a yearly cost of $4908, or $2.39 per gigabyte per year.
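
      For anyone checking the arithmetic, a minimal sketch using only the monthly figures quoted above. Treating 2 terabytes as 2 x 1024 GB is an assumption; the variable names are just for illustration.

```python
TRANSFER_PER_MONTH = 102.0   # USD per month, from the post
STORAGE_PER_MONTH = 307.0    # USD per month, from the post
STORED_GB = 2 * 1024         # assumption: 2 TB counted as binary terabytes

yearly = (TRANSFER_PER_MONTH + STORAGE_PER_MONTH) * 12
per_gb_year = yearly / STORED_GB
implied_storage_rate = STORAGE_PER_MONTH / STORED_GB  # USD per GB-month

print(f"yearly cost: ${yearly:.0f}")
print(f"per GB per year: ${per_gb_year:.3f}")
print(f"implied storage rate: ${implied_storage_rate:.3f} per GB-month")
```

      The yearly total matches the $4908 above, the per-gigabyte figure comes out just under $2.40, and the storage line works out to roughly $0.15 per gigabyte-month.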

      --
      http://plausible.coop
    6. Re:Yeah this whole thing seems a little fishy... by Vancorps · · Score: 1

      You're also making an assumption that all data is user generated and not automated which could include log files for access times and in the case of my company 100gigs a day of security footage which we don't retain longer than 30 days. Whenever there is an incident that video is retained though.

      The cost per gig is on par with what I've seen after deploying a 30TB SAN alongside my 60TB SAN. 168 drives in the large SAN, a good number of them are fibre channel drives too which are quite costly.

      Add in the cost of fabric switching, HBAs for servers and the cost per gig rises pretty fast in an enterprise environment. Now factor in the cost of tape backup which may or may not be contracted with iron mountain for a year long cycle and the cost rises even more with the cost of a tape library for a 60TB. There is also storage system redundancy if data access is critical to the company which is why I have two SANs instead of one but that doubles the current cost per gig.

      $5/gig is actually pretty decent from estimates I've seen, $15/gig wouldn't necessarily be out of the question for those that need to stay up.

      Don't forget, there is the cost of cooling, electricity in general, switching equipment, alternate pathing, offsite live backup, offsite offline backup like Iron mountain, and any number of other factors including the cost of backup software and other data management software combined with enterprise antivirus and encryption strategies.

    7. Re:Yeah this whole thing seems a little fishy... by Kuriomister · · Score: 1

      The cost of storage here is assuming "live" storage. "Cold" storage is 10 times cheaper than live storage, and has the ability to not require an upkeep cost other than to keep it compatible with current technology.

    8. Re:Yeah this whole thing seems a little fishy... by Vancorps · · Score: 1

      If it's all live storage then it makes all the more sense, as 10GB total would be 5GB across both units in my case.

    9. Re:Yeah this whole thing seems a little fishy... by afidel · · Score: 1

      Hmm, my full backups for an 800 employee company are ~7.5TB, pretty damn close to 10GB/user. This includes email, database, and file servers. So yeah I think the number is definitely in the right ballpark. It's at least close enough to get an order of magnitude calculation for costs.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  19. My last job by dj245 · · Score: 2, Interesting

    My last job had some files from the 1890's. The company had moved from New York to New Jersey to Houston in all that time. I can't imagine that material would ever need to be used, or would be called up during a legal investigation. Even if it were, would the authorities penalize a company for files that were that old??? At some point, everything is trashable or museum material.

    This company occasionally needed blueprints from the 1930s/1940s (great lakes ships), but none of their ships went back much further than that.

    --
    Even those who arrange and design shrubberies are under considerable economic stress at this period in history.
  20. Re:10 GB user data? Not likely by cashman73 · · Score: 1
    Only 10 GB?!?! Pfft! Amateurs,...

    I've been in my current position almost a year now, and I've already generated about 1/2 a terabyte of data; and that's only the stuff I've decided is worth keeping (I've probably generated several terabytes in reality),... Of course, I'm probably not your average office worker -- my data is mostly monte carlo simulations of proteins, on the order of millions (some in the billions) of steps long. Some of the largest trajectories are 45 GB (yes, that's one file).

  21. Communicate less by Yvanhoe · · Score: 2, Interesting

    In a world where backup takes money, a law that says to companies "keep every communication backuped" is saying essentially the same thing as "communicate less".

    --
    The Wise adapts himself to the world. The Fool adapts the world to himself. Therefore, all progress depends on the Fool.
    1. Re:Communicate less by BobMcD · · Score: 1

      Or communicate less in writing - I personally have had this policy for a long time. If I worry that a question, comment, concern, etc might not reflect well on me in the future, I walk into my boss's office and ask out loud (with the door shut.) If I want the communication to be recorded for all eternity I use email...

    2. Re:Communicate less by NoisySplatter · · Score: 1

      Little do you know he has a microphone recording everything that goes on in his office to .wav format. Your little conversations are costing millions to back up.

      --
      In Soviet Russia meme tires of you!
    3. Re:Communicate less by Anonymous Coward · · Score: 0

      Backuped? Is that like hiccuped but through your butt?

  22. easy solution by circletimessquare · · Score: 2, Funny

    put everything on one disk drive, unRAIDed. when it fails, problem solved. voila, built in obsolescence

    --
    intellectual property law is philosophically incoherent. it is your moral duty to ignore it or sabotage it
  23. Re:10 GB user data? Not likely by confused+one · · Score: 1

    You're obviously not writing software, doing CAD work, or any kind of computational modeling. It's easy to have that much data -- my source tree alone is 2GB.

  24. Re:10 GB user data? Not likely by Anonymous Coward · · Score: 0

    Only 10 GB?!?! Pfft! Amateurs,...

    I've been in my current position almost a year now, and I've already generated about 1/2 a terabyte of data; and that's only the stuff I've decided is worth keeping (I've probably generated several terabytes in reality),... Of course, I'm probably not your average office worker -- my data is mostly monte carlo simulations of proteins, on the order of millions (some in the billions) of steps long. Some of the largest trajectories are 45 GB (yes, that's one file).

    At least that's what you tell your boss so he won't find your porn.

  25. "Let's keep it all" is not a solution by PotatoSan · · Score: 1

    Conversely, keeping all of that data also opens you up to legal trouble. Different types of records should be kept for different lengths of time, in accordance with your company's records schedule.

    If you have too many records, you may have to turn over information that could be damaging to your case in any litigation against you - information you aren't even required to keep in the first place. Confidential information may be leaked, stolen, or lost, and the probability of that happening only goes up with time. Additionally, if you have a ton of records that you don't need and won't use, your ability to find the information you do need is severely hampered.

    While high storage costs may be a factor for disposing of unneeded data, it is not the reason for doing so. You shouldn't be keeping more data just because storage is getting cheaper.

  26. Slaw? by Anonymous Coward · · Score: 0

    I can't see how wanting "more slaw" is on topic, or why it would be spelled with two Os. Oooh .. Moore's Law. Then there's the sound-alike Mooer's Law which was summarized as "focus[ing] on [the] idea that people may not want information, as it obliges them to study the information, and come to an understanding about it."

    Storage vs Study, Moore vs Mooer - fight!

  27. Yes and... by Xest · · Score: 1

    ...whilst policies and procedures often solve a lot of things in a cleaner, more common sense manner there are unfortunately far too many people lacking common sense.

    Throwing hardware at it guarantees it'll be done, expecting people to follow policies and procedures will likely leave you with a 50% success rate in ensuring the correct data is kept/binned and that's if you're lucky.

    The world as a whole would be so much more efficient if we could get people to follow policies and procedures or at least the common sense, good practice ones.

  28. Re:10 GB user data? Not likely by value_added · · Score: 2, Funny

    If assuming 300 work days per employee, that would mean that the average employee creates 1.2 kB of data per second.

    Top posting and absence of editing by Microsoft Outlook users engaged in a brief inter-departmental discussion could easily account for that volume.

    Is that what you meant by "isn't user generated"?
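
    For reference, the quoted per-second figure can be reproduced with a quick sketch. The 300 work days come from the quote; the 8-hour day and decimal gigabytes are assumptions, and the function name is invented for illustration.

```python
def bytes_per_second(gb_per_year=10, work_days=300, hours_per_day=8):
    """Average data rate if the yearly volume is spread over working hours."""
    total_bytes = gb_per_year * 10**9              # assumption: decimal GB
    working_seconds = work_days * hours_per_day * 3600
    return total_bytes / working_seconds


rate = bytes_per_second()
print(f"{rate / 1000:.1f} kB/s")
```

    Under those assumptions the result rounds to the 1.2 kB per second in the quote.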

  29. Future BI. by jellomizer · · Score: 1

    Business Intelligence software just may make use of the data. While a lot of businesses are STUPID in their use of BI software, there may be some point where either the company dies or it gets a clue and does some BI analysis on its data.
    You actually can do some amazing things with BI. Say, for example, you are storing time card data from employees, and you want to check the effectiveness of managers. With, say, 20 years of time card data and employee records of which manager is which, you just may find a correlation between different managers and how long people take their breaks or how many sick days they take. Factor out differences of age and experience in the company, then possibly create a correlation of how much value one department makes over the next.... And you will have, in nice number form, proof that Manager A sucks while Manager B is effective, even if people may not like Manager B as much, or the people under him like him but his managers don't... (as they may have been found to be bad managers by the same calculations).

    Oddly enough, computers are really good at doing a lot of complex math... Imagine that... So they can far more easily handle crunching 20 years of data and finding correlations, far better than many people's gut feeling.

    --
    If something is so important that you feel the need to post it on the internet... It probably isn't that important.
  30. Re:10 GB user data? Not likely by Lord+Ender · · Score: 1

    They count more than just the stuff you typed as "user data." For example, Linux admins download ISOs, lawyers download PDFs, Windows admins download patches, service packs, and malware cleaning tools, and sales people download porn. All this data is used by the users and must be archived.

    --
    A slashdotter who didn't build his own computer is like a Jedi who didn't build his own lightsaber.
  31. The funny thing is it depends on your MTA by brunes69 · · Score: 1, Interesting

    My 10GB mail box in outlook, when mirrored to my local hard drive in MBOX format, automagically becomes 2 GB - and that's before compression and attachment pruning.

    I have no idea what the hell Outlook is doing on the server, if it is just storing things in multiple formats at once or if it is just mis-calculating all the space, but that is one hell of a difference.

    1. Re:The funny thing is it depends on your MTA by ckaminski · · Score: 1

      What do you use to MBOX your outlook data, if I may ask?

    2. Re:The funny thing is it depends on your MTA by jgrahn · · Score: 1

      What do you use to MBOX your outlook data, if I may ask?

      I cannot say what he does, but if the Sexchange server is open for IMAP, you can telnet to it and pass an IMAP command to dump everything in RFC 822 format. It ends up very close to mbox format; it might even have a _From line.
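
      If you would rather script it than telnet, here is a minimal sketch of the second half of that job: taking raw RFC 822 messages and appending them to an mbox file with Python's standard `mailbox` module. It assumes you already have the messages as bytes (for example from `imaplib`'s `fetch(num, "(RFC822)")`); the function name `write_mbox` is made up for illustration.

```python
import mailbox


def write_mbox(raw_messages, mbox_path):
    """Append raw RFC 822 messages (bytes) to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)   # creates the file if it does not exist
    count = 0
    try:
        for raw in raw_messages:
            box.add(raw)            # mailbox writes the "From " separator line
            count += 1
        box.flush()
    finally:
        box.close()
    return count
```

      The module takes care of the "From " separator lines, so the raw messages do not need one.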

  32. By your interesting math... by Smeagel · · Score: 1
    It should be $10-20 a gigabyte....

    Drive is 3-5x more expensive than $1 a gigabyte...raid level 5 means 2+ drives, we're to $6-10 already, then you say the majority of the cost wouldn't be in the media...

    From working at a large university, three fortune 500 companies, and now the small business I work for, I don't think it's even suggestible that most user data is backed up in an out-sourced tape data center. That's an absurd suggestion. The vast majority of data never makes it off either a local hard drive or a temporary, lightly backed up network "drive".

    No matter how you skew it, the numbers they came up with in the original post are absurd. Be it either how much the data costs to store, or how much data is being stored - one of those is way out of whack with reality.

    1. Re:By your interesting math... by jimicus · · Score: 1

      Drive is 3-5x more expensive than $1 a gigabyte...raid level 5 means 2+ drives, we're to $6-10 already, then you say the majority of the cost wouldn't be in the media...

      RAID 5 means 3+ drives, and means that you lose 1 drive worth of capacity.

      You probably wouldn't use RAID 5 with drives that size because rebuilding the array would take too long. And I don't think you can get 1TB SAS drives yet.
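
      To put numbers on the RAID overhead, a small sketch of usable capacity for the three levels mentioned in this thread, assuming equal-sized drives and no hot spares. The function names are invented for illustration.

```python
def usable_drives(level, drives):
    """Number of drives' worth of capacity left after redundancy."""
    if level == 5 and drives >= 3:
        return drives - 1          # one drive's worth of parity
    if level == 6 and drives >= 4:
        return drives - 2          # two drives' worth of parity
    if level == 10 and drives >= 4 and drives % 2 == 0:
        return drives // 2         # every drive is mirrored
    raise ValueError(f"unsupported RAID {level} layout with {drives} drives")


def cost_multiplier(level, drives):
    """Raw capacity bought per unit of usable capacity."""
    return drives / usable_drives(level, drives)


for level, drives in ((5, 3), (5, 8), (6, 8), (10, 8)):
    print(f"RAID {level}, {drives} drives: x{cost_multiplier(level, drives):.2f}")
```

      Only the mirrored layout doubles the cost per gigabyte; the parity levels get cheaper per gigabyte as the array grows.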

      From working at a large university, three fortune 500 companies, and now the small business I work for, I don't think it's even suggestible that most user data is backed up in an out-sourced tape data center. That's an absurd suggestion. The vast majority of data never makes it off either a local hard drive or a temporary, lightly backed up network "drive".

      I'm sorry, but that is so far at odds with all my experience that it's not even worth my time to discuss it.

      "Local hard drive"??!

  33. Yes--deleting costs money! by mkcmkc · · Score: 4, Insightful

    I did a back-of-the-envelope calculation on just this question in 2004, and estimated that file deletion was not productive unless we could do it at a rate of at least 17MB per minute (of labor). Four years later the threshold is probably at least 45MB per minute.

    Generally, this means that if we can blow away whole disks or huge directories of data, it may pay off. Users going through their files one by one is usually an absolute waste.
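
    The shape of that calculation, as a rough sketch: deletion pays off only if a minute of labor frees more storage than that minute costs. The inputs behind the 17MB figure are not given here, so the numbers below are placeholders, not a reconstruction, and the function name is made up.

```python
def break_even_mb_per_minute(labor_cost_per_hour, storage_cost_per_gb):
    """MB that must be freed per minute of labor for deletion to pay off."""
    labor_cost_per_minute = labor_cost_per_hour / 60
    storage_cost_per_mb = storage_cost_per_gb / 1024
    return labor_cost_per_minute / storage_cost_per_mb


# Placeholder inputs: $60/hour of labor, $5 per gigabyte of storage.
print(f"{break_even_mb_per_minute(60, 5):.1f} MB per minute")
```

    As storage gets cheaper the threshold rises, which is why going through files one by one keeps getting less worthwhile.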

    --
    "Not an actor, but he plays one on TV."
    1. Re:Yes--deleting costs money! by ckaminski · · Score: 2

      Currently filesystems track the following:

            Creation time
            Last Access Time
            Last Modified Time

      If we also had a

            Last backed up time/scanned time

      that virus scanners and backup software could use instead, then you can track last-access to eliminate files that haven't been opened by end-users in a particular time period for permanent offsiting or removal. Making today's complex HSM architectures easier to implement or not necessary at all.
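
      With today's attributes, the last-access part of this can be sketched as follows. This is a minimal example, assuming access times are actually being recorded on the filesystem and that nothing else (a backup or virus scan, say) has touched the files; `stale_files` is a hypothetical helper name.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Yield paths under root whose last access time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue               # file vanished or is unreadable
            if info.st_atime < cutoff:
                yield path
```

      Something like `list(stale_files("/srv/share", 365))` would then give the candidates for offsiting; the path is only an example.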

    2. Re:Yes--deleting costs money! by Anonymous Coward · · Score: 0

      On any filesystem that supports arbitrary extended attributes, this is trivial to do. The thing we need is for solution providers (AV people, backup people, ...) to agree on a particular xattr format and that's all you need.

    3. Re:Yes--deleting costs money! by whoever57 · · Score: 1

      Currently filesystems track the following:

      Creation time
      Last Access Time

      Access time tracking is routinely turned off to improve performance of filesystems.

      --
      The real "Libtards" are the Libertarians!
    4. Re:Yes--deleting costs money! by Anonymous Coward · · Score: 1, Informative

      Afaik one of the Windows file systems has an Archive bit in order to represent this.

    5. Re:Yes--deleting costs money! by SETIGuy · · Score: 1

      If we also had a "Last backed up time/scanned time" that virus scanners and backup software could use instead.

      Then all virus writers would be sure to adjust the "last access time" and "last modified time" of files they infect to be before the "last scanned time."

      Seriously, virus scanners should never believe file attributes, and serious backup software should do hash checks to determine whether a file has changed.
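
      A minimal sketch of what hash-based change detection looks like, using SHA-256 from the standard library. The function names and the in-memory dictionary of known digests are assumptions for illustration; a real backup tool would persist that index somewhere.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, known_digests):
    """Return paths whose contents differ from the recorded digest.

    known_digests maps path -> hex digest and is updated in place.
    """
    changed = []
    for path in paths:
        current = file_digest(path)
        if known_digests.get(path) != current:
            changed.append(path)
            known_digests[path] = current
    return changed
```

      The trade-off is that every file has to be read in full on every pass, which is exactly the cost that trusting timestamps avoids.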

    6. Re:Yes--deleting costs money! by Anonymous Coward · · Score: 0

      you're dumb. that's not what he was talking about.

  34. litigation hold by Benjamin_Wright · · Score: 2, Informative

    Any record destruction policy must include a "litigation hold". A litigation hold means that record destruction must stop when litigation is anticipated or pending. But in a complex enterprise, it is tricky to know what litigation the enterprise anticipates. It was the trickiness of litigation hold that led to the demise of Arthur Andersen. The risks associated with litigation hold give enterprises incentive to store lots more records. --Ben http://hack-igations.blogspot.com/2008/07/document-discovery-litigation-hold.html

    --
    Benjamin Wright, Dallas, Texas, benjaminwright.us
  35. Who decides what to delete? by HockeyPuck · · Score: 1

    Look at how people deal with email. I've got coworkers that have every single email (including mailing lists they've subscribed to) they've ever sent or received since they started (~8yrs ago). They've probably got 20GB of email on their laptop. Now we only allow 100MB of server based email storage, so that helps on the server side, but we're still backing up this guy's laptop.

    On the datacenter side, we had a database corruption about 10 years ago so we implemented snapshots, and then snapshots of those snapshots... we actually now carry about seven copies of our database. Why still seven? Because nobody wants to have to recommend that we have fewer copies of data in case we have another problem again. The funny part is that nobody in operations was around at the time of this outage.

    At least de-duplication technology is being adopted, which gives us an excuse to hoard even more data. However, from a legal standpoint, telling the judge we don't retain data older than X years is easier than recalling 50k tapes you sent offsite 8 years ago.

    Bottom Line: It's just easier to store it than to be the one that "Recommended we delete XYZ files, that's why we don't have the data."

  36. Re:10 GB user data? Not likely by Anonymous Coward · · Score: 0

    Big whoop.

  37. Re:keep, but not on the high-performance disk arra by NoisySplatter · · Score: 1

    Holy Enron Batman! I hope you aren't suggesting perjury is better than accountability.

    --
    In Soviet Russia meme tires of you!
  38. Mod parent way up! by khasim · · Score: 3, Interesting

    Congratulations. You're the first person I've seen who understands that.

    Accounting understands the need to close one year and open the next. They have processes for what is carried over and how it is identified.

    Yet no other department (or application) understands the need to close old data and archive it.

    1. Re:Mod parent way up! by radarsat1 · · Score: 1

      Yet no other department (or application) understands the need to close old data and archive it.

      Is this significantly different from tagging a release in a version control system?

    2. Re:Mod parent way up! by Anonymous Coward · · Score: 0

      No other department works on such an arbitrary timeline. Projects may be 3-4 years long, and you need that documentation for the entire period. 'Accounting' is simply prepping summaries and reports of numbers that 'finance' actually uses.

    3. Re:Mod parent way up! by inKubus · · Score: 1

      Well, ERP solutions try to assign other units to "resources" (not just money) and store them in a subledger somewhere. And BPM systems are trying to do that with everything else.

      --
      Cool! Amazing Toys.
    4. Re:Mod parent way up! by afidel · · Score: 1

      Ha, we are working towards an archiving solution for our ERP system and accountants are just as bad as anyone. A simple date based approach will NOT work in an ERP system, you need a tool which understands the relationship between objects in the system and which only performs an archive if all related objects fall into the archive period. Plus books are relatively simple from an archive perspective, they have a legally defined life, most ad hoc data is not so neatly categorized.
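
      A toy sketch of that "all related objects" rule, to show why a plain date filter is not enough. The object IDs, the relation pairs and the function name are all made up; a real ERP tool would derive the relationships from the system's own schema.

```python
from datetime import date


def archivable_groups(last_activity, relations, cutoff):
    """Return groups of related object IDs in which every member predates cutoff.

    last_activity maps object ID -> date; relations is a list of (id, id)
    pairs, each of which must refer to IDs present in last_activity.
    """
    neighbours = {obj: set() for obj in last_activity}
    for a, b in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in last_activity:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                       # walk everything linked to start
            obj = stack.pop()
            if obj not in group:
                group.add(obj)
                stack.extend(neighbours[obj] - group)
        seen |= group
        if all(last_activity[obj] < cutoff for obj in group):
            groups.append(sorted(group))
    return groups
```

      An old purchase order linked to a recent invoice stays put, even though the order on its own would pass a date check.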

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    5. Re:Mod parent way up! by Anonymous Coward · · Score: 0

      The next time my CEO comes to me looking for a fact, image or video that was used years ago as part of some obscure meeting or project, I'll just explain that to him then.

      Oh wait, no I won't. I'll just keep saving everything so that I can deliver on the random requests that frequently come up from senior executives.

    6. Re:Mod parent way up! by im_thatoneguy · · Score: 1

      Or maybe you accountants just work with a completely different kind of data which actually does operate on a predictable expiration schedule. A lot of our data never expires.

      We actually have 6 file classifications.

      1) Working (highspeed)
      2) Permanent (Persistent data)
      3) Tools and executables
      and
      4) Archive (Where the highspeed data moves when it's no longer needed. Which has to be sorted by hand)
      5) Temporary (Gets deleted instead of archived from the highspeed server)
      6) Local instance (copy from highspeed to even higher speed local raids) of #1 using PowerToy

      How are our applications supposed to know which is which? We often can't even tell.

  39. What about History? by jumbomojo · · Score: 1

    Although I agree that the inertia of keeping records trumps the work of evaluating them, the large financial services company I work for is turning with the tide, starting to focus on deletion and destruction, mainly for potential liability reasons. Not just aged documents, but prior versions, drafts, notes, etc. It makes me wonder what the historians of the future will have left for primary sources--besides the final, signed-off Establishment-sanctioned records of events. Are we on the road to compromising their ability to determine and describe What Really Happened, and thus our own ability to understand our past? Could John M. Blair write "The Control of Oil", or Ron Chernow "Titan: the Life of John D. Rockefeller Sr." fifty years hence?

  40. skipped the article, loved the tag by Anonymous Coward · · Score: 0

    mmmmmmmmmmmmmmmmmmmmmmmmmm, mooreslaw.

    1. Re:skipped the article, loved the tag by Anonymous Coward · · Score: 0

      Moore slaw sounds Swedish. Is that the salad they have on pizza?

  41. Re:10 GB user data? Not likely by euri.ca · · Score: 1

    Nope that sounds pretty typical :)

    After all, most coders come in every day and re-copy the source tree, libraries and all to a new folder, in case they make a mistake and need to go back to a previous version.

    No? Really? They told me that this was industry standard practice.

  42. This is what Retention Policies are for by Phrogman · · Score: 1

    IANAL. This is why most companies spend some money developing a retention policy and planning its implementation. It requires a bit of time from every employee to decide if a piece of information is something that requires short term, long term or permanent storage but if you get people into the habit of sorting things like email into folders that reflect the company retention policies (which need to be pretty clear and well planned both from an IT and a legal perspective) then you can reduce the cruft you retain considerably.

    With clear policies on when the various categories of information can be safely and legally deleted you can reduce the storage costs and simplify the e-discovery phase if it comes up.

    Likewise you need good planning and employee training on what to do when a Hold is placed. I.e., if your company enters litigation, you will place a hold on data deletion and *NOTHING* gets deleted so that the courts can't find you guilty of attempting to hide information from them in litigation.

    Any company that doesn't come up with a retention policy that takes everything into consideration, doesn't train its employees on those policies and doesn't practice what it has decided will be its policy is in for a world of hurt when suddenly it's in court and has to produce emails from a specific individual or individuals from 3 years ago etc.

    If your employees can generate 10GB of data during the course of a year, then they can learn how to apply retention principles to it while they do so. It's just one more aspect of the job.

    Now there are various attempts at software to automatically filter and organize your data - email and documents etc - according to key words and phrases, email addresses etc. I believe some of them are pretty well evolved and take a lot of the burden off your employees - and cover you when those employees can't be bothered to do what they should be doing according to the rules, but I have no experience with how well these work.

    Here's an article on email retention (from a quick google search, no idea how well it's written):
    http://searchstorage.techtarget.com/tip/0,289483,sid5_gci1212767,00.html

    --
    "The first time I got drunk, I got married. The second time I bought a chimpanzee, after that I stayed sober" Arian Seid
  43. Re:10 GB user data? Not likely by Profane+MuthaFucka · · Score: 1

    That's nothing. I work for a computer consultancy and I have half a terabyte of attachments and meeting invites alone.

    --
    Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  44. Re:10 GB user data? Not likely by BrokenHalo · · Score: 1

    At least that's what you tell your boss so he won't find your porn.

    Bullshit. Proteins are MUCH more interesting than porn. ;-)

    Actually, I am only half joking - I waste far too much time here on Slashdot, and from time to time I have to give myself a nudge to get on with my job, only to find that the work is more interesting...

  45. Re:10 GB user data? Not likely by BrokenHalo · · Score: 1

    They told me that this was industry standard practice.

    No. The industry standard practice is to store the source code for every program you have ever written on punch-cards in a locked filing cabinet.

    Or didn't you know that? ;-D

    (Just to spell it out for the irony-impaired: if this slips under the radar of your world view, google "Real Programmers Don't Eat Quiche".)

  46. Throwing Policies at a Technology Problem? by LittleBigScript · · Score: 1

    What about throwing company policies at technology problems?

    Hypothetically (never happens in the real world of course), what if there was a document management server, samba dropbox, where all documentation for deliverables is kept in portable excel 2003 format? What if content identification is done by creating folders with "project" and "project"_old naming conventions, hyperlinking is done in excel (because html is complicated), and so on ad nauseam for the automated process called "company policy"?

  47. Re:10 GB user data? Not likely by cashman73 · · Score: 1
    That's nothing. I work for a computer consultancy and I have half a terabyte of attachments and meeting invites alone.

    It sounds like you work with a lot of old people that have to send attachments to themselves because they haven't heard of this new-fangled thing called "FTP",...

  48. 1TB enterprise storage is more like $8000 and more by as400master · · Score: 1

    It is true that a single 1TB IDE desktop drive can be bought for around a hundred bucks, but in the enterprise world most companies use SCSI drives. The largest capacity SAS drive you can buy now is 146GB, so adding in RAID etc. it will be $8000 later before you walk out of the store with 1TB of usable storage.

    From TFA, e-discovery on 1TB of data costs around $1 to $3 million, so suddenly that 1TB IDE drive's "cost" to the company is way more than just $100.

    So simply buying another $100 1TB drive before considering other options is not very wise.

  49. Sarbanes Oxley by Anonymous Coward · · Score: 1, Informative

    I'm not sure if most of you understand what is really being written about here. There are laws in place that REQUIRE that companies retain EVERY document according to a certain set of rules. These rules change depending on the type of company, but a good rule of thumb is 2 years of document retention. Publicly traded companies are under even more extremely strict guidelines including Sarbanes Oxley.

    Exchange servers alone will generate huge amounts of data in no time at all. When these companies go into litigation (and they almost always do), all of this data is considered discoverable and can cost the legal department huge fees.

    When involved in litigation, these documents can not simply be pulled out of archive and made available for review. There is a very strict set of rules that require these documents are produced in a non-editable, read-only image format (usually tif) and then put into discovery review platforms such as Concordance or Summation. This costs tons of money to have produced because they typically do not produce them in house.

    The cost of producing the documents is only the beginning though. After they are produced, the legal fees of having them reviewed is where the really steep fees come into play. Lawyer fees can run upwards of 250 - 400 dollars per hour. That means that an email that took someone 10 minutes to type out might be reviewed for 30 min- 1 hr by the Legal Team (depending on relevance). So that single email could end up costing several hundred dollars between document production and review.

    Now, if there are suspicious documents that have links to files that no longer exist, the opposing counsel has the right to do a forensic investigation on the system to look for deleted files. If they are found to be deleted when they should have been kept, the court can actually sanction the company in question... not a good position to be in!

    Electronic Discovery is huge business these days and only grows as more and more companies enter litigation each year.

  50. Re:10 GB user data? Not likely by Profane+MuthaFucka · · Score: 1

    FTP? Is that a new mainframe thing? Will it work with Lotus Notes? I guess that gives too much away about where I work.

    --
    Fascism trolls keeping me up every night. When I starts a preachin', he HITS ME WITH HIS REICH!
  51. The major cost is getting tapes offsite by as400master · · Score: 1

    The media itself is a small cost, the major cost is in the sending, storing and retrieving the media from off site vendors like Iron Mountain.

    With the amount of data growth it's no wonder Warren Buffett has a position in Iron Mountain.

  52. Re:10 GB user data? Not likely by Apoorv+Khatreja · · Score: 1

    You moron, they counted the BitTorrent data transfers too.

    --
    RutSum.com
  53. Far more than 10GB of original data by AliasMarlowe · · Score: 1

    10GB of original data is easy, and it doesn't take a year, just a week or two. Today and yesterday, I measured physical properties of a lot of output from a particular industrial process (just one plant in a factory, and I only recorded measurements of a few instruments). This only gave me a few hundred MB of raw data, but it will result in several GB of data after analysis. This is all original data, and this is a normal amount of output. I regularly fill several DVDs with this sort of archive data.

    Of course, this kind of data (spectroscopic, radiometric, structural anisotropy, etc.) is probably not what the lawyers would be interested in. It would take far longer to explain it to them than to collect and analyze it.

    FWIW, the data in question does not involve recording videos or images. We don't have the bandwidth or storage space for that, as it could result in TB per day of raw data in industrial contexts. Camera-based instruments analyze the images in real time. The images are discarded; only the analysis results are recorded.

    --
    Those who can make you believe absurdities can make you commit atrocities. - Voltaire
    1. Re:Far more than 10GB of original data by arth1 · · Score: 1

      10GB of original data is easy, and it doesn't take a year, just a week or two.

      Sure, but that's an exception. You have to produce a LOT of data to make up for how little data receptionists, managers, janitors, canteen personnel and all your average desk slaves produce.
      And even if you do, not all companies have the need for super-producers like that. (And when they do, they tend to provide them with back-up systems that are separate from the rest of the company)

      At a guess, I would estimate that the average worker produces 7-8 e-mails a day, 3-4 word and excel documents a week, and half a line of code. Most produce less, some produce more, and a very few will produce a lot more.

      One typical problem is when some brain dead HR employee sends an e-mail to all 5000 employees, there will be 6000 copies of it ('cause 1000 of the recipients are idiots who'll forward it on or reply to it including the text), that ALL are backed up. Usually in a horrible Microsoft format that makes a single e-mail several hundred k or even megabytes, when it contains less text than this post.

      Another problem is people who consistently save their documents with a new name every time, instead of using a version control system. Seeing hundreds of copies of big documents isn't uncommon. Each of these documents isn't "generated data". It's 99% copied data, with a few modifications. The whole document shouldn't count towards how much an employee produces.
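
      To put a number on how much of a pile of near-identical copies is actually new, one rough approach is to hash fixed-size chunks and count each distinct chunk once. This Python sketch is only an estimate under a simplifying assumption: fixed-size chunking misses matches when an edit shifts everything after it, which is why real deduplicating backup products tend to use content-defined chunking instead.

```python
import hashlib

def unique_bytes(blobs, chunk_size=4096):
    """Return (total_bytes, unique_bytes) across blobs, counting each distinct chunk once."""
    seen = set()
    total = 0
    unique = 0
    for blob in blobs:
        total += len(blob)
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                # First time this chunk's content has appeared anywhere.
                seen.add(digest)
                unique += len(chunk)
    return total, unique
```

      Feed it the bytes of each saved copy (for example via pathlib's Path.read_bytes()) and the gap between the two numbers is the copied data that shouldn't count as "produced".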

  54. Identifying useful data is still manual by billstewart · · Score: 1

    Sure, it's easy to just discard everything older than X. The problem is that there _is_ data you need to keep for a long time, so that crude an approach isn't very effective. (For instance, for financial records you need to keep a record of everything you've bought until you sell it or declare it fully depreciated, and then keep those records for N years longer for tax purposes.)

    For my work, I don't usually need files much older than 2-3 years, but occasionally I do need to drag out something 10 years old (typically standards documents or RFCs, though), and one of my customers has an access ring that we installed 4-5 years ago and occasionally need to look up things about. In the telecom business, you regularly need to drag up design documents and database schemas for anything that's still in the field, which is sometimes quite antique. (For instance, the data format of a T1 hasn't changed much since the early 80s, and it's mostly the same as the ~1960 original, even though the implementation hardware and software have changed radically over the decades, and robbed-bit has mostly been abandoned. The European E1 standards were more flexible, since they learned some lessons from T1's signalling limitations, but that means that each different telco does some ugly unique cruft that you have to look up.)

    Of course, there are extreme cases - my wife had a summer job in college converting a several-year-old database from a hand-rolled format into a then-current IBM database format, just in case the data got subpoenaed in a regulatory lawsuit of some sort (it never was, AFAIK.) But there's still telco data out there that predates the practical viability of the Relational Database...

    --

    Bill Stewart
    New Fast-Compression-only CPR http://preview.tinyurl.com/dy575ks
  55. Yes local hard drive by Smeagel · · Score: 1
    Not even worth your time to discuss it..hahahahaha. OK, you're right, every company in the world uses network-drive-only setups, and bans their users from any writing to either linux scratch drives or C:\ (usually the location of MY DOCUMENTS). That's completely accurate...dws.

    We're not talking SQL servers here, or customer information databases, we're talking average employees performing their business duties.

    1. Re:Yes local hard drive by jimicus · · Score: 1

      Not even worth your time to discuss it..hahahahaha. OK, you're right, every company in the world uses network-drive-only setups, and bans their users from any writing to either linux scratch drives or C:\ (usually the location of MY DOCUMENTS). That's completely accurate...dws.

      Which can be trivially redirected to a network drive through Group Policy.

      Hell, it can be trivially redirected to a network drive using a Windows NT 4 domain policy.

      Unless you're a very small company indeed, this is the only sensible thing to do unless you plan to backup every PC individually.

    2. Re:Yes local hard drive by Smeagel · · Score: 1
      Every place I've ever worked at had a network drive that you knew would keep your data safe and backed up, and then full access to your normal hard drive (including locally stored my documents) which was used to store non-critical information, and your own knowledge of its potential for loss. That's not a rare setup at all, unlike what you might pretend.

      And FYI, I worked at one of the largest business software companies in the states, and one of the largest pharma companies in the states - so no, it's not limited to tiny companies.

  56. Once again by Anonymous Coward · · Score: 0
    Your definition of "data" is far different than an average corporate user. The average user in a 5000 person company is not going to be performing sysadmin duties (or backing up security footage), they're going to be creating documents, spreadsheets, powerpoint presentations. In a software company they're going to be writing code, and compiling it to the local tmp directory that certainly isn't backed up.

    It cracks me up all the "experts" of backup on this site suggesting that an average corporate user is generating insane amounts of data. I understand you generate absurd amounts of data, I understand I generate insane amounts of data. I also understand the vast majority of corporate users wouldn't be able to generate 10 gigs of data without a file sharing program and an mp3 player.

  57. Sarbanes Oxley vs. History by argent · · Score: 1

    The end result of Sarbanes-Oxley, on top of the increasing amount of encryption and the use of high-density short-lived storage, is going to be a frustrating gap in the historical record for future generations.

  58. Store Smarter, Not Just More by Doc+Ruby · · Score: 2, Interesting

    Let's say your corp is more than 50% likely to go through "e-discovery" once every 10 years. Each worker will generate 10GB * 10 years = 100GB, backing up all the increasing data pile is (pairing the balancing ends of the accumulation for half the accumulation years) (10GB + 100GB) * 5 = 550GB, at $5:GB is $2750, plus about $2M:TB * 550GB = $1.1M, for a total of $1,102,750 per worker, times at least 0.50 probability is at least $551,375 average predictable cost per employee.
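
    A rough Python sketch of that arithmetic, for anyone who wants to plug in their own numbers. The constants are assumptions: 10GB per worker per year and $5 per GB per year come from the summary, $2M per TB is the midpoint of the quoted $1-3M range, and the e-discovery charge is applied to the accumulated GB-years, same as in the estimate above. Pairing the ends of the accumulation gives (10 + 100) * 5 GB-years.

```python
GB_PER_YEAR = 10                 # data generated per worker per year
YEARS = 10                       # assumed interval between e-discoveries
BACKUP_COST_PER_GB = 5.0         # dollars per GB per year
EDISCOVERY_PER_TB = 2_000_000.0  # dollars; assumed midpoint of $1M-$3M
PROBABILITY = 0.5                # assumed chance of e-discovery in the interval

def gb_years(gb_per_year, years):
    """Total GB-years backed up as the pile grows: 10 + 20 + ... per year."""
    return sum(gb_per_year * y for y in range(1, years + 1))

def expected_cost_per_worker():
    """Probability-weighted backup plus e-discovery cost for one worker."""
    stored = gb_years(GB_PER_YEAR, YEARS)
    backup = stored * BACKUP_COST_PER_GB
    ediscovery = EDISCOVERY_PER_TB * stored / 1000  # 1TB taken as 1000GB
    return PROBABILITY * (backup + ediscovery)
```

    Either way the backup line item is a rounding error next to the e-discovery term, which is the whole argument.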

    One approach is to keep much less data. But when you keep less data, you have to guess right every time what data you'll need later. If your process discards data that's valuable later (but lost) it better be worth less than the amount you save. That's too hard to know, which is one reason companies keep all the data, and figure it out later.

    A better approach is just to cut that $1-3M:TB e-discovery cost. Of course, the best way is to avoid being investigated, but one has less than 100% control over that, especially from inside the IT department. A much better way to do it is to better inventory the data stored as you go along accumulating it, in the terms in which a later e-discovery would search it. Which also can have the benefit of making the info in the data more available in the normal course of business, which can make that data's increased value (and lowered costs of searching it) worth the entire process. The cheaper possible e-discovery would be just a bonus.

    What really gets me is how these economics are the true cost of storage. A 1TB drive costs $120, and maybe a better 1TB in a 100% redundant RAID costs $250. But it really costs something like $300,000 over its lifetime (probably replaced every 3 or so years, across the 10 years I analyzed). If IT spent a few hundred hours a year streamlining the navigation of all that data, at a cost of a few dozens of thousands of dollars, divided across all those employees, the entire org's IT operations would be much more economical, when the large cumulative risk of e-discovery costs are factored into the true cost.

    --
    make install -not war

    1. Re:Store Smarter, Not Just More by Anonymous Coward · · Score: 0

      listen asshat, we don't need to hear more crap from you.

  59. Does it matter? by SanityInAnarchy · · Score: 1

    There should be enough local cache for every user to have access to every document they could possibly create, unless you are working at a movie company. Given proper indexing, it should be possible for users to find what they need.

    Storage is cheap enough for this to work, even if some documents are slow (compressed, maybe combined as deltas with other very similar documents) or very slow (have to pull from tape or something). But again, all of that which an average user needs should be cacheable on their own local hard drive.

    Granted, the tech isn't really there, especially for desktop apps. But how much is it costing not to purge? How much would it cost to write software to make it easier to purge (and train users on that software), vs writing software to better archive (and just taking the hit on hardware)?

    --
    Don't thank God, thank a doctor!
  60. Although I agree with you in principle.... by Degrees · · Score: 2, Informative

    I've become the e-discovery guy (at least for email) where I work. Our lawyers told me that the latest revision of FRCP (Federal Rules of Civil Procedure) requires an entity to keep evidence, even if automatic purging systems are in place.

    Rule 37 of FRCP says that if you are ordered to hand over the evidence, and you cannot, then the judge can order that "designated facts be taken as established for purposes of the action, as the prevailing party claims". In other words, if the person suing you claims you sent them an email offering a million dollars to not go to court, and you auto-purge your email (taking away the ability to prove you didn't send the email), the judge has the option of deciding that yes you did make an offer of a million dollars via email. 'Twould suck to be you.

    It even gets a little worse. Although you must keep evidence after being told you are being taken to court, it turns out you need to keep all evidence in case you are taken to court. I'm told that the criterion here is "reasonable expectation that the matter will go to court". It's reasonable (for example) to expect to end up in court if an employee dies while on the job (and it wasn't due to natural causes). The point here is that if a person dies, you'd better keep any email about the situation that led to the death - 60 day auto-purging email expiration practice be damned.

    Auto-purging is a fine thing, as long as you have the ability to except items out, in case they become evidence.
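
    In code terms the exception is just one extra check in the purge loop. A minimal Python sketch, assuming each message is a dict with hypothetical "id" and "received" keys and that the set of held IDs comes from whoever manages the holds; no real mail system's API is being modeled here.

```python
from datetime import date, timedelta

def purge(messages, today, max_age_days=60, held_ids=frozenset()):
    """Split messages into (kept, purged); anything on hold is kept regardless of age.

    Each message is a dict with hypothetical keys "id" and "received" (a date).
    """
    cutoff = today - timedelta(days=max_age_days)
    kept, purged = [], []
    for msg in messages:
        expired = msg["received"] < cutoff
        if expired and msg["id"] not in held_ids:
            purged.append(msg)
        else:
            # Either still within the retention window or excepted by a hold.
            kept.append(msg)
    return kept, purged
```

    The hard part is not the check, it's getting the right IDs into the held set before the purge job runs.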

    --
    "The most sensible request of government we make is not, "Do something!" But "Quit it!"
  61. Re:10 GB user data? Not likely by afabbro · · Score: 1

    You're obviously not writing software, doing CAD work, or any kind of computational modeling. It's easy to have that much data -- my source tree alone is 2GB.

    And what about our colleagues in the porn production industry? I mean, one hour of hi-res MPEG is a lot of megabytes. Multiply it by the number of, ah, employees...

    --
    Advice: on VPS providers
  62. mainframe retention period by Anonymous Coward · · Score: 0

    The dataset in OS/390 had a metadata field for its retention period. The file system would delete the file if it was older than X time. Pity that UNIX and Windows do not have such a concept.
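
    One way to approximate this from outside the file system is a scheduled job that looks for files past a given age. A minimal Python sketch, assuming the modification time is an acceptable stand-in for the retention clock; it only reports candidates and deletes nothing.

```python
import os
import time

def expired_files(root, max_age_days, now=None):
    """Yield paths under root whose modification time is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path
```

    Unlike a per-dataset retention field, this applies one age limit to a whole tree, so different limits mean different directories.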

  63. ratmail sux by Anonymous Coward · · Score: 0

    Folders (no, you can't make your own folders):

    # New contains all un-opened mail.
    # Read contains opened mail.
    # Inbox contains mail held via the Hold button.
    # Outbox contains draft and future mail.
    # Sent contains mail you sent.*
    # Old contains deleted mail.

                *If you include yourself as a recipient on a message it appears in your New folder rather than your Sent folder.

    Message Retention
    # New, Read, Inbox and Outbox messages will be available for 60 days from the original date.
    # Sent messages will be available for 7 days from the original date.
    # Old messages will be available for 3 business days from the date of deletion.

    I hate it.

  64. Purging data can be disastrous by Anonymous Coward · · Score: 0

    Later on you may need that data:
    * To prove you used or thought of or invented something before someone else later applied for a patent on it, or
    * To prove a claim that some agreement was made, or
    * To show you acted in good faith in some matter

    Toss the email and you may lose trails of evidence about such things. Also, you can't always tell at the time you get/send mail which mail will need to be kept, nor for how long. I've had patent relevant stuff that was needed 10+ years later, for example. Toss the spam, but the rest should be mostly kept. (Attachments might be trimmed a bit more than your own words.)

  65. yawn by Anonymous Coward · · Score: 0

    Predict score -5

    why do i care? slow 'news' day?

    My score:
    9/10 on the WGAS (who gives a) scale, bigger is worse

  66. One solution... by FlyByPC · · Score: 1

    Back up everything and hide the media somewhere. If they subpoena it, deny everything.

    I'd probably do something like this -- assuming I ever back up my data. Which, as far as you know, I don't.

    --
    Paleotechnologist and connoisseur of pretty shiny things.
    1. Re:One solution... by Degrees · · Score: 1

      Actually this strategy could cause you to auto-lose your case. Obviously, it depends on the evidence the person has that is taking you to court. If they don't have much, you might get away with it. If they do have even a little evidence, and you give the finger to the court, the results won't be pretty.

      --
      "The most sensible request of government we make is not, "Do something!" But "Quit it!"
  67. JUST ENCRYPT IT GODDAMNIT by scientus · · Score: 1

    They can't get the password/private keys unless you give them up. It's called the Fifth Amendment, protection against self-incrimination.

  68. Re:1TB enterprise storage is more like $8000 and mo by pixr99 · · Score: 1

    in the enterprise world most companies use scsi drives.

    Or fibre channel.

    The largest capacity SAS drive you can buy now is 146GB

    I just bought a bunch of 300GB SAS drives for one of my NAS. Actually, it looks like 1TB drives are available now too!

    Technicalities aside, you're completely right. Folks don't seem to understand that the hard drives you walk out of Best Buy with are not the same that plug into your CLARiiON. And then of course, there is backing up that data because the enterprise backs up more of its data than it doesn't. You need a backup platform, tape systems, tape, backup licenses, off site storage and people to manage all of it.

  69. Out of curiosity by Anonymous Coward · · Score: 0

    The "experts" cited in this, they wouldn't happen to be professionals affiliated with companies that offer services to comb through your data and help you get rid of all the old stuff, would they?

  70. Fucking LIES by Anonymous Coward · · Score: 0

    $5 per gigabyte to back it up? Fuck YOU. Liar. Try 50 cents per gig or less! This whole article's credibility is nothing...

  71. First thing first by martin_dk · · Score: 1

    It usually turns out that a few files take up the most space.

    SELECT filename, size FROM allDisks WHERE size > 2 MB ORDER BY size DESC

    Add your filter of choice to remove important files from the result, and go ahead and erase a bunch of useless huge files.
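
    There is no real allDisks table to query, of course, but the same idea can be approximated on an actual file system. A minimal Python sketch, with the 2 MB threshold taken from the query above and everything else (function name, skipping unreadable files) an assumption of the example; it only lists files and erases nothing.

```python
import os

def large_files(root, min_bytes=2 * 1024 * 1024):
    """Return (size, path) pairs for files larger than min_bytes, biggest first."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if size > min_bytes:
                found.append((size, path))
    return sorted(found, reverse=True)
```

    Filtering out the important files is still a manual step on the resulting list.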