Slashdot Mirror


Dell Says 90% of Recorded Business Data Is Never Read

Barence writes "According to a Dell briefing given to PC Pro, 90% of company data is written once and never read again. If Dell's observation about dead weight is right, then it could easily turn out that splitting your data between live and old, fast and slow, work-in-progress versus archive, will become the dominant way to price and specify your servers and network architectures in the future. 'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'" As the writer points out, the "90 percent" figure is ambiguous, to put it lightly.

224 comments

  1. Coincidence? by Hognoxious · · Score: 5, Funny

    90% - just like the percentage of statistics that are made up on the spot.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:Coincidence? by dov_0 · · Score: 5, Funny

      Or is Dell about to make a press release about faulty storage in their servers resulting in about 90% data loss?

      --
      sudo mount --milk --sugar /cup/tea /mouth /etc/init.d/relax start
    2. Re:Coincidence? by espiesp · · Score: 5, Funny

      Or having developed a new memory technology.

      "Dell releases a new drive based on their patented WORN architecture. Because this device forgoes the need to read your data, it can be made lighter, faster and more power efficient than even the latest SSD technology."

    3. Re:Coincidence? by Anonymous Coward · · Score: 0

      90% of Slashdot comments are written, never having read TFA.

    4. Re:Coincidence? by Anonymous Coward · · Score: 0

      I never leave comments on Slashdot, but I couldn't resist the urge to commend you on summing up your insight in one sentence with a piercing poetic humor that puts the fun into being a technologist. Bravo!

      90% - just like the percentage of statistics that are made up on the spot.

    5. Re:Coincidence? by Jeremiah+Cornelius · · Score: 1

      I have an Outlook Folder on my work machine, labeled "Things I Will Never Read". Last count, there were 4,700+ unread items in that location...

      Of course, they are available for CYA through the search capability.

      --
      "Flyin' in just a sweet place,
      Never been known to fail..."
    6. Re:Coincidence? by Anonymous Coward · · Score: 0

      I prefer to store all my data on the infinite storage medium known as /dev/null.

    7. Re:Coincidence? by bennomatic · · Score: 1

      Makes you wonder why so much of this data is even written, if it's never going to be read.

      --
      The CB App. What's your 20?
    8. Re:Coincidence? by hairyfeet · · Score: 3, Insightful

      Probably SOX and other data required for CYA. I have set up small business networks for quite a few businesses, and while I don't know about 90% I'd say a good 70% of the data they had me set up backup solutions for was stuff they would never break out unless a CYA situation came up like an IRS audit. The simple fact is you have to keep a LOT of stuff to CYA nowadays, and most of that stuff won't be used in any other situation.

      So while I'm not sure about the 90% part at least from my own experience I can believe 70-80% easy. With the possibility of lawsuits (both you suing them for unpaid bills or them suing you because they decide they don't like the work) IRS audits, SOX, there is a whole lot of data that unless a specific set of circumstances come up will be WORN. That is just a part of doing business in the digital age.

      --
      ACs don't waste your time replying, your posts are never seen by me.
    9. Re:Coincidence? by theCoder · · Score: 1

      Dude, if you're never going to read it, you should just unsubscribe from LKML.

      --
      "Save the whales, feed the hungry, free the mallocs" -- author unknown
    10. Re:Coincidence? by Jeremiah+Cornelius · · Score: 1

      You have no idea, the numbing internal lists that I have been subscribed to through role-policy.

      Pissing matches about zero-copy would be a welcome substitute.

      --
      "Flyin' in just a sweet place,
      Never been known to fail..."
    11. Re:Coincidence? by turbidostato · · Score: 1

      "Or is dell about to make a press release about faulty storage in their servers resulting in about 90% data loss?"

      Funny. A bit more on the serious side, they might be preparing the ground for an announcement that they are entering the tiered storage market for small and mid-sized companies. The idea seems to be flourishing among people who never considered it before SSDs came along. SSDs brought them the idea that there is fast storage that is small and expensive, and large, (relatively) cheap storage that is slow. Couple that with brightly coloured brochures "demonstrating" that you don't need most of your data to be there, first line, and hop! a new market is opened.

      Of course, tiered storage is no news for the big folks, but Dell is not focused on them anyway.

    12. Re:Coincidence? by skids · · Score: 1

      Exactly, you hope you never have to read it, but will absolutely need to under some contingencies.

      Kinda like a large percentage of condoms go unused.

    13. Re:Coincidence? by kmoser · · Score: 1

      So the suits can include it in their TPS reports. Seriously, you can always erase data; you can't un-erase data you never had to begin with. Therefore, if you're in doubt as to whether you'll need it in the future, you might as well save it.

  2. Which 90% ? by mbone · · Score: 5, Insightful

    I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which ?

    1. Re:Which 90% ? by eldavojohn · · Score: 5, Insightful

      I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which ?

      Yeah, as someone who has implemented a few auditing solutions where I work, I must confess that it seems to be 99% of the data we archive is never looked at again. A lot of it is due to policies and is only used after something goes dreadfully wrong. If they are well thought out, the metrics can be collected as the data is written instead of needing to search across the data.

      I think their "90% dead-weight rule" is really a misnomer as you could probably claim that 90% of Google's indexing is never read but we all know that it's the potential that data holds that makes it so valuable and necessary. If Google knew every future possible search then they could delete the data they will never use ... but how do they know they will never use it? How do I know that the auditing data will never have a use--by new metric or incident investigation? The truth is simply that you don't.

      --
      My work here is dung.
    2. Re:Which 90% ? by AnonymousClown · · Score: 1

      Name, address, phone #, and shit purchased.

      Anything else is a waste.

      --
      RIP America

      July 4, 1776 - September 11, 2001

    3. Re:Which 90% ? by Anonymous Coward · · Score: 2, Interesting

      I work for a large resource company and we collect loads of data... some of which is valuable today and some of which is valuable tomorrow... interestingly, what is of value tomorrow depends on how mature our data consumption is today...

      so we collect the data not because it's of value today, but because we might analyse it tomorrow in a new way.

    4. Re:Which 90% ? by sco08y · · Score: 2, Insightful

      I think their "90% dead-weight rule" is really a misnomer as you could probably claim that 90% of Google's indexing is never read but we all know that it's the potential that data holds that makes it so valuable and necessary.

      Another problem is figuring out _why_ data isn't used before archiving it. Is it not useful, or are the tools not in place to use it?

      If companies decide that the x% least used data will be shoved away in the attic, then "x% of data isn't useful" becomes a self-fulfilling prophecy.

    5. Re:Which 90% ? by shentino · · Score: 1

      Just like 90 percent of the time you don't need to file an insurance claim, but when you do, you really do need it.

      It's just insurance.

      Sorta like how we have a big military that is spending more time in training than actual combat.

    6. Re:Which 90% ? by camperdave · · Score: 2, Funny

      People change addresses and phone numbers at the drop of a hat, so recording that would be pointless.

      --
      When our name is on the back of your car, we're behind you all the way!
    7. Re:Which 90% ? by alexhs · · Score: 3, Interesting

      If each piece of data has 90% probability of not being read again...

      You discard only 10 pieces out of 100, or out of 1 billion, whatever...

      The probability that none of these 10 pieces of data would have ever been needed again is 0.9^10 ≈ 0.349 = 34.9%

      Which means that you keep all of your data.

      Caveats :

      • This assumes that all pieces have equal interest (but maybe you store a field that the interface doesn't allow you to retrieve).
      • Assuming random access to the 10% that is used, removing 10 pieces out of 100 gives a much higher retrieval failure rate than removing 10 out of a billion. Some retrieval failure rate could be acceptable.
      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    8. Re:Which 90% ? by Anonymous Coward · · Score: 0

      I worked for a company like this once. All incoming, outgoing faxes, and fax confirmations (physical copies) were stored for 6+ months on the off chance something needed to be verified. There were lots of errors in our system and our vendor's system, so who had the oldest archived copy usually ended up not having to pay for that particular mistake. My job was to archive and label all these, and then rotate out the oldest and fill the old, now emptied boxes with newer papers. Sales quotes (on paper!) were saved for 3-5 years. This was a full time position, rotating paper, so we could save about a thousand dollars a week in fees and clerical errors. What we saved on clerical errors almost paid for my salary+benefits. I ended up engineering myself out of the job when I suggested we could just scan in all the documents for the day that I was archiving, and then have a monthly checklist to delete anything older than 5-7 years if we ran out of hard disk space. The job went from a full time salaried position to a 3 hour a week job that a college intern in a short skirt did on friday mornings scanning in papers and making coffee. Shockingly, this happened in 2008, not 1958.

    9. Re:Which 90% ? by ta+bu+shi+da+yu · · Score: 1

      HIPAA laws say differently. And fair enough too - they should definitely be retaining your medical records for up to 7 years!

      It's funny that the last sentence in this slashdot piece asks why didn't people do this before. They did, and indeed they still do! People have secondary and even tertiary backup - in fact I happen to know that even stodgy old EMC have made a mint out of their Centera storage devices for this sort of thing. It's called Content Addressable Storage, and despite a particularly brain-dead mechanism for addressing stored data (the content becomes the address - it gets hashed!), it's been pretty popular in the marketplace.

      --
      XML is like violence. If it doesn't solve the problem, use more.
    10. Re:Which 90% ? by bryonak · · Score: 1

      The same applies to backups as well.
      Almost all of the backups you make are never actually needed, but that's a weak argument to forgo them.

    11. Re:Which 90% ? by CAIMLAS · · Score: 1

      A big part of it is: how are they quantifying "data"?

      We keep machine backups. They are each anywhere from 3GB up through 20GB and averaging around 10GB, and each host has 2-3 copies (taken weekly). Then we've got database backups which, likewise, are taken multiple times a month. These databases aren't pure data, but are instead part of larger systems - transactional tables, but also the application's "we need this data to run" tables which, from what I've seen, rarely get used much at all. The subset of data within the database which is actually used is relatively small (though I can't offer a percentage).

      We also archive all incoming and outgoing mail for compliance issues. This ends up taking a lot of storage - far, far more than the users' actual inbox.

      Add to that the many, many desktop files (Word, Excel, etc.) which are created, used once or twice for a project, and never touched again - they get put in a project folder and archived. You've still got to store those files, but as the data is usually time-contextual, it's useless except for reference. And the likelihood of reference is negligible.

      How long is your data retention plan? Weeks? Months? Years? I suspect that the larger the organization (and the more highly regulated) the longer the period of time is that you (have to) keep your data.

      "Read once" data is certainly a very high percentage, though I suspect that it's probably higher than 90%, personally. Checking the ctime and atime on one of our backup systems, the percentage is actually pretty close to 100%.
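
      A rough way to reproduce that kind of check is sketched below. It is only an illustration: it compares access time against modification time (rather than ctime), and it assumes the filesystem actually records access times, which mounts using noatime or relatime may not do reliably. The function name and the one-day slack are hypothetical choices, not anything from the post.

```python
import os

def never_read_fraction(root, slack_seconds=86400):
    """Return the fraction of files under root whose last access is no
    later than slack_seconds after their last modification."""
    total = 0
    unread = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total += 1
            # Treat the file as "written once, never read" if nothing
            # touched it after the write (plus a little slack).
            if st.st_atime <= st.st_mtime + slack_seconds:
                unread += 1
    return unread / total if total else 0.0
```

      Calling never_read_fraction on a backup directory gives a number between 0 and 1; treat it as a hint rather than a measurement, since backup tools and virus scanners also update access times.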

      --
      ~/ssh slashdot.org ssh: connect to host slashdot.org port 22: too many beers
    12. Re:Which 90% ? by jeffmeden · · Score: 2, Insightful

      Bingo. The first thing I thought of is "sure 90% goes to waste but you don't know *which* 90% until after the fact"...

      Is Dell working on a patent to send information back from the future about what stored data is never used again? I just hope they don't stumble on the Slashdot comment archives, the future-tubes would be clogged indefinitely.

    13. Re:Which 90% ? by Anonymous Coward · · Score: 2, Insightful

      the metrics can be collected as the data is written instead of needing to search across the data.

      Yet if you are only going to ever look at it once then why bother optimizing to that case? I have also seen cases where doing this you lose some other piece of information. Like my example below, you may right now only care about total time at a drop off. But maybe at some future point you care more about the time it started and ended? So be careful what you prune.

      Having implemented a few systems myself, one of the first questions I ask is 'how do you want to archive the data?'. Most people get a deer-in-the-headlights look. Large databases affect performance in the long run. So prune your data. In many cases this is worth doing. You made bad decisions in the past. The data is gone after the regulatory period. No records of what happened. In many cases pruning data is a good decision, as there is no data to support you doing something wrong 20 years ago, even though you have fixed the issue now. Is it morally right? No. Good business sense, sometimes.

      Many companies are data hoarders. They gloom onto data and never let it go. They *might* make a report someday. But their culture will never let it happen as they really do not care to improve. They merely want to give the impression that they do. Hence the hoarding of data. Unless there is a cultural shift toward actually wanting to improve the way things are, that data is useless. In fact I would say a waste of resources.

      I have seen businesses that truly use these data warehouses to great effect. Then I have seen *MANY* others that collect the data but then don't really do anything with it. It's just a report they can hand to their manager to show 'they are doing something'. Sure you can measure things. But are you going to do anything with it?

      It is also about knowing what to ask. Like one I saw: 'my drivers are always way late to their last drop off'. Yet the right question was 'why is my driver not able to get out of the yard in the morning'. The drivers were making up most of the time at the drop offs during the day, but eventually got way behind by the end of the day. The root cause was that 200 people were all starting their shift at the same time and there were not enough dock spaces for trailers. So many drivers stood around waiting to be loaded up. But it took someone in the yard looking at the reports and saying 'what if we shifted half the guys' work shift by a half hour'. It worked. My point? You can sample the hell out of things. Have petabytes of data. Yet if you do not have people willing and able to ask the right questions to look at the data it is useless. Many companies are not willing to do this, as many people see their jobs as 'essential' and do not want to jeopardize that essentiality in any way.

    14. Re:Which 90% ? by iwaybandit · · Score: 1

      Do they track the number of times an indexed page's link has been clicked? Or perhaps, the number of times a page was linked on the first results page. It might be interesting to look at some of the zeroes, just don't include these accesses in the stats. It's almost like an anti-zeitgeist.

    15. Re:Which 90% ? by obarthelemy · · Score: 2, Funny

      as someone once said: "50% of my advertising budget is wasted... only I don't know which 50%"

      --
      The Cloud - because you don't care if your apps and data are up in the air.
    16. Re:Which 90% ? by Mspangler · · Score: 4, Insightful

      Note that I'm working from a process control perspective in a chemical plant, but 90% of data written is never read again sounds about right for when things are going well. It's when something goes wrong and you have to figure out what went wrong at exactly what time and what the regulatory consequences were that having all that previously unread data suddenly becomes very interesting indeed.

      And also when you start looking at a system in detail to see if you can increase output, or change a composition, all that usually ignored data becomes very valuable.

    17. Re:Which 90% ? by Planesdragon · · Score: 1

      If each piece of data has 90% probability of not [being] read again...

      Each piece of data has a 10% chance of being necessary. For any given sample, 1/10th of them will be necessary.

      Now, a MUCH more useful set of data is probability over time. 1/10 within 10 years? 5 years? 1 week?

    18. Re:Which 90% ? by Cylix · · Score: 2, Insightful

      I'm afraid they will run into issues if they do. There are already storage providers that will determine what data you are accessing frequently and move said data chunk to the faster storage area. Conversely it will move less frequently accessed data to the slower and cheaper bulk disks.

      It's a nifty optimization/shuffle technique that allows you to mix ssd, sas and sata disks for their various needs. The best part is it is rather auto-magic.

      We used to do something similar in a very manual process by keeping the most frequently accessed Oracle data on the leading edge of the disk platters.

      The problem with all of these approaches is the data may not be needed now. Hell, I would certainly say that 90% of the data I store is useless. Except when they want to roll back to a certain period in the archive's life or we lose a chunk of data. The other half of the time is just legal requirements that necessitates storing EVERYTHING.

      --
      "You should always go to other people's funerals; otherwise, they won't come to yours." -- Yogi Berra
    19. Re:Which 90% ? by BrokenHalo · · Score: 3, Interesting

      Another problem is figuring out _why_ data isn't used before archiving it.

      The problem is that so much data is made available without anyone ever considering how useful it might be. At least we've come some way in the last 20 years:

      Back in the '70s and '80s I worked at many sites where mainframe ops used to clear tonnes of fanfold paper every day. This is why we had separate printer rooms: a bank of 6 or 8 barrel-printers belting out 132 columns of text at 1800 lines/minute created sacksful of dust.

      Most of that rubbish was never read in any depth - it was physically impossible to do so before it became out of date, so most of that paper went straight to the shredders, which often shared space with the printers that created the stuff in the first place. I used to have fantasies about lining up the shredders directly behind the printers to save everybody the trouble of distributing the printouts.

    20. Re:Which 90% ? by BrokenHalo · · Score: 2, Informative

      We used to do something similar in a very manual process by keeping the most frequently accessed Oracle data on the leading edge of the disk platters.

      I haven't really kept up to date with HDD technology in recent years, but there was a time when some operating systems (Data General's AOS/VS, for example) allowed you to keep your most frequently accessed files (or even records in a database) around the middle of the disk platter, on the principle that the heads spent more time on average around the middle than at the extremities. Bear in mind that this was in the days when such a drive would typically hold 700MB of data, and of course that this principle has no value if you partition that drive.

      Having said that, I remember testing this at the time when I was sysmgr at a large DG site, and didn't find any conclusive evidence as to the value of this concept, so ended up ditching it as more trouble than it was worth.

    21. Re:Which 90% ? by BrokenHalo · · Score: 1, Funny

      Shockingly, this happened in 2008, not 1958.

      Not that shocking when you remember that the flatbed scanner was almost (but not quite) an object of science fiction in 1958.

    22. Re:Which 90% ? by alexhs · · Score: 3, Informative

      For any given sample, 1/10th of them will be necessary.

      I'm sorry but you're wrong. That's not how statistics work.

      Let's play heads or tails.
      Each toss has a 50% chance of being heads.
      According to you, for any number of tosses, 50% of them will be heads. In other words, you're saying that there is a 100% chance that half of them will be heads.

      For a sample of two tosses, that would mean a 100% probability of one head(s) and one tail(s).
      I hope that you see how this is wrong. You would actually have 50% probability of one head and one tail, 25% probability of two heads, 25% probability of two tails.

      For a sample of size n, with a 10% probability for each piece of data to be necessary, the correct formula says that the probability for at least one element of the sample to be necessary is 1-(0.9^n), which quickly approaches 1 (100%) as n increases.
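
      For anyone who wants to check the arithmetic, here is a small sketch. It assumes each piece of data is needed independently and with the same probability, which real data almost certainly violates; the function names are made up for illustration.

```python
from math import comb

def p_at_least_one_needed(n, p_needed=0.1):
    """Probability that at least one of n independent items is needed."""
    return 1 - (1 - p_needed) ** n

def p_exactly_k(n, k, p=0.5):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Two coin tosses: one head and one tail, two heads, two tails.
print(p_exactly_k(2, 1), p_exactly_k(2, 2), p_exactly_k(2, 0))

# Chance that at least one item in a sample of n turns out to be needed.
for n in (1, 10, 50, 100):
    print(n, round(p_at_least_one_needed(n), 4))
```

      The same function with n=10 gives the complement of the 0.9^10 figure from the earlier post.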

      Now, a MUCH more useful set of data is probability over time. 1/10 within 10 years? 5 years? 1 week?

      It depends on what you mean by probability over time. What I can tell you is that as more time elapses, the probability of an element being necessary (more correctly, having been necessary) increases. The 90% never read is supposedly for an infinity of time (that's what "never" means, right ?).

      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    23. Re:Which 90% ? by Anonymous Coward · · Score: 0

      Not that shocking when you remember that the flatbed scanner was almost (but not quite) an object of science fiction in 1958.

      Excuse me, why is this considered flamebait?

    24. Re:Which 90% ? by OnePumpChump · · Score: 1

      You identify criteria by which you can divide data into activity categories. Say data from within the warranty period for whatever you're selling has a 30 percent likelihood of being needed again (if, for instance, you're selling Xbox 360s), but things from outside that period have less than 1 percent, you keep them both available, but the least likely data to be needed gets stored by cheaper means. Occasionally, you will have slower access to data you need right now, but most of the time what you need will be available by the quickest means. It isn't about knowing with absolute certainty which data is important and which is not.

    25. Re:Which 90% ? by Anonymous Coward · · Score: 0

      Did you piss off someone with an unrelated post, and they're spending all their mod points to mod down any post of yours they can find? /. moderation at its finest.

    26. Re:Which 90% ? by arctan1701 · · Score: 1

      There are already storage providers that will determine what data you are accessing frequently and move said data chunk to the faster storage area.

      Could you please list what products do this? Is there a solution for this for OSX/OSX Server?

    27. Re:Which 90% ? by kage.j · · Score: 1

      This is database architecture and IS used. It works like this: You tier where your data is stored based on how often it is read. Depending on your database load, you could use a tier similar to.. 1+ reads per second 1+ reads per day 1+ reads per month 1+ reads per 6 months 1+ reads per year (this is your archive/old data level) ..Based on your needs, you make data more available; data that isn't used is eventually shelved further back. Eventually unused data migrates toward 'old data' and takes increasingly more time/energy to read, but is still available when needed. Whereas your frequently accessed data is available immediately.

      --
      he demonstrated by A plus B minus C divided by Z that the sheep must be red, and die of the rot
    28. Re:Which 90% ? by kage.j · · Score: 1

      This is database architecture and IS used.

      It works like this: You tier where your data is stored based on how often it is read.

      Depending on your database load, you could use a tier similar to..
      1+ reads per second
      1+ reads per day
      1+ reads per month
      1+ reads per 6 months
      1+ reads per year (this is your archive/old data level)

      Based on your needs, you make data more available; data that isn't used is eventually shelved further back. Eventually unused data migrates toward 'old data' and takes increasingly more time/energy to read, but is still available when needed. Whereas your frequently accessed data is available immediately.
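
      A minimal sketch of that classification, assuming you already log read timestamps per record. The tier names and thresholds simply mirror the list above, expressed as reads over the last 365 days; the function and constant names are hypothetical and none of this is a real database feature.

```python
from datetime import datetime, timedelta

# (tier name, minimum reads in the last 365 days), hottest tier first.
TIERS = [
    ("per-second", 365 * 86400),
    ("per-day", 365),
    ("per-month", 12),
    ("per-6-months", 2),
    ("archive", 0),  # about one read a year, or none at all
]

def pick_tier(read_times, now):
    """Choose a storage tier from the reads seen in the 365 days before now."""
    cutoff = now - timedelta(days=365)
    recent_reads = sum(1 for t in read_times if cutoff <= t <= now)
    for name, threshold in TIERS:
        if recent_reads >= threshold:
            return name
    return TIERS[-1][0]
```

      Re-running pick_tier periodically is what makes unused data drift toward the archive tier: once old reads fall out of the window, the record qualifies only for a colder tier.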

      --
      he demonstrated by A plus B minus C divided by Z that the sheep must be red, and die of the rot
    29. Re:Which 90% ? by kage.j · · Score: 1

      Sorry..I clicked "submit" instead of "preview." I had to clarify this a little bit because there was some redundancy . Also, "HTML Formatting" was selected instead of Plain old text. I replied again below which is more readable. =)

      --
      he demonstrated by A plus B minus C divided by Z that the sheep must be red, and die of the rot
    30. Re:Which 90% ? by h4rr4r · · Score: 1

      It's called hylafax and it is free. No paper ever! You can find something else for the intern in the short skirt to do I am sure.

    31. Re:Which 90% ? by Anonymous Coward · · Score: 0

      You're missing the point a little when you say:

      If Google knew every future possible search then they could delete the data they will never use

      It's not about deleting data. It's about setting up a hierarchy. This data is accessed every minute. Put it on the fastest SSD we have. This data is accessed once a day. Put it on the next tier down - not quite as fast as the SSD, but still pretty damn fast. This data is accessed once a month. Put it on SATA. This data is accessed once a year. Stick it in the HSM system, on tape that's reasonably quick to access. This data doesn't seem to be ever accessed. Stick it on whatever tape system (or even WORM media, such as DVD-R) is cheapest on a per-gigabyte basis, and the hell with the access speed.

      Generally, storage comes at a cost: the faster and easier it is to access data, the more expensive it is. If you get faster storage that's cheaper than another, slower storage mechanism, then that slower storage mechanism will die a painful death in the marketplace. Think, for example, of drum memory, or bubble memory. Ideally, your storage should be tuned to match the business needs; if you put everything on the fastest SSDs you can buy, you're wasting money.

    32. Re:Which 90% ? by afidel · · Score: 2, Informative

      Look for auto tiering, most of the newer products from EMC now support it. The technology is OS agnostic because it is done at the block level. Compellent and Isilon are two other vendors I'm familiar with that do auto-tiering.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
    33. Re:Which 90% ? by dropadrop · · Score: 1

      as someone once said: "50% of my advertising budget is wasted... only I don't know which 50%"

      He probably never tried some data warehousing and analysis of what happens to product sales during different kinds of ad campaigns...

    34. Re:Which 90% ? by jgrahn · · Score: 1

      Back in the '70s and '80s I worked at many sites where mainframe ops used to clear tonnes of fanfold paper every day. This is why we had separate printer rooms: a bank of 6 or 8 barrel-printers belting out 132 columns of text at 1800 lines/minute created sacksful of dust. Most of that rubbish was never read in any depth - it was physically impossible to do so before it became out of date, so most of that paper went straight to the shredders, which often shared space with the printers that created the stuff in the first place. I used to have fantasies about lining up the shredders directly behind the printers to save everybody the trouble of distributing the printouts.

      "They give birth astride a grave, the light gleams an instant, then it's night once more."
      -- "Waiting for Godot" by Samuel Beckett

      (The connection between office papers and the quote is not mine; I heard it as a paraphrase somewhere.)

    35. Re:Which 90% ? by warriorpostman · · Score: 1

      Back in the '70s and '80s I worked at many sites where mainframe ops used to clear tonnes of fanfold paper every day. This is why we had separate printer rooms: a bank of 6 or 8 barrel-printers belting out 132 columns of text at 1800 lines/minute created sacksful of dust. Most of that rubbish was never read in any depth - it was physically impossible to do so before it became out of date, so most of that paper went straight to the shredders, which often shared space with the printers that created the stuff in the first place. I used to have fantasies about lining up the shredders directly behind the printers to save everybody the trouble of distributing the printouts.

      "They give birth astride a grave, the light gleams an instant, then it's night once more." -- "Waiting for Godot" by Samuel Beckett

      (The connection between office papers and the quote is not mine; I heard it as a paraphrase somewhere.)

      "There you go again, blaming on your printers the faults of your business processes".

    36. Re:Which 90% ? by im_thatoneguy · · Score: 1

      Yep that's our approach as well. 100% of our data has to be stored because the need to access that 10% is completely random and beyond our control. "Hey remember that job you guys did 3 years ago. Can you do something just like it but change a few words?"

      Our primary storage system is three tiered:

      'Local' RAID0 for >200 MB/s with no backup or redundancy.
      'Online' working storage on a daily mirrored server and RAID15'ed NAS for ~70 MB/s. This also goes offsite every day on a portable RAID.
      and
      'Nearline', which is a JBOD server for easy access to all the archives but is only periodically backed up. It is however on multiple LTO tapes offsite so we could restore it within about 24 hours.

      After a job is finished it goes into a 'hopper' on the Online server. The hopper is periodically emptied onto the Nearline storage and backed up at the time to Offline Tapes and drives.

      In this arrangement about 90% of our data is on a JBOD server and tape. Both of which are incredibly easy to increase in capacity. You could buy 40TB of storage and keep it accessible to everybody for less than $5k. It won't be speedy but it doesn't need to be. Put 90% of your data on something which is only accessed 10% of the time. Then put everything that's "active" onto the main server.

      If someone needs something to be responsive then they can move it back to the main server until it's no longer needed and put back into the archive hopper.
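      The hopper cycle described above can be sketched roughly as follows. This is only an illustration: the directory names, the `empty_hopper` and `restore_job` helpers, and the use of plain directory moves are assumptions, and the tape backup step is left out entirely.

```python
import shutil
import tempfile
from pathlib import Path


def empty_hopper(hopper: Path, nearline: Path) -> list[str]:
    """Move every finished job directory from the hopper to nearline storage.

    Returns the names of the jobs that were moved. Backing the jobs up to
    tape is out of scope here and would happen as a separate step.
    """
    nearline.mkdir(parents=True, exist_ok=True)
    moved = []
    for job in sorted(hopper.iterdir()):
        shutil.move(str(job), str(nearline / job.name))
        moved.append(job.name)
    return moved


def restore_job(name: str, nearline: Path, online: Path) -> Path:
    """Bring an archived job back to the fast server when someone needs it."""
    online.mkdir(parents=True, exist_ok=True)
    target = online / name
    shutil.move(str(nearline / name), str(target))
    return target


def demo() -> tuple[list[str], bool]:
    """Run the cycle once in a temporary directory (a stand-in for real mounts)."""
    with tempfile.TemporaryDirectory() as root:
        base = Path(root)
        hopper, nearline, online = base / "hopper", base / "nearline", base / "online"
        for job in ("job_a", "job_b"):
            (hopper / job).mkdir(parents=True)
            (hopper / job / "notes.txt").write_text("finished")
        moved = empty_hopper(hopper, nearline)
        restored = restore_job("job_a", nearline, online)
        return moved, (restored / "notes.txt").exists()
```

      In a real setup the three paths would be mount points on different hardware, and the hopper would presumably be emptied from a scheduled job rather than by hand.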

    37. Re:Which 90% ? by Hognoxious · · Score: 1

      Yeah, as someone who has implemented a few auditing solutions where I work, I must confess that it seems to be 99% of the data we archive is never looked at again.

      Presumably you only archive data that's more than a few years old. A customer who bought a few times and then never called again. That supplier who wasn't up to scratch. Discontinued products. And don't forget any orders/invoices/shipments etc involving them.

      I'd think it's an overestimate even in that case, but to say that 90% of data is never looked at again seems ridiculous.

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    38. Re:Which 90% ? by Hognoxious · · Score: 1

      They gloom onto data

      They fill it with despair and make it depressed and despondent?

      --
      Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    39. Re:Which 90% ? by lordlod · · Score: 1

      The discussion isn't about discarding the data, rather pushing it back to slower and cheaper storage with less frequent backup.

      You could implement this like a cache. The front server interacts with the clients and holds 1TB of data, the back server(s) hold 10TB of data. You interact with the front server. When you read a file if the server has it you receive it quickly. If the server doesn't you get a 'cache miss', the server pulls it from the back store, caches it and provides it but takes 30 seconds to do so. Modified data is routinely pushed back to the back server with dirty flags etc. like most other caches.

      From a user's perspective the 30 second delay is annoying but they wouldn't notice if it only happened once a week. Keeping the incident rate of that down is where stats like the 90% come in handy.

      Back to looking at your probability statistics. We don't have equal interest, that's what the entire article is about. Having 35% of this data needed again is fine, a quick reflection on standard work practices would suggest that these would be spread through time.
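
      A minimal sketch of the front/back cache idea, assuming both servers can be modelled as in-memory dictionaries and that the front server evicts its least recently used file when full. The `FrontServer` class and its method names are made up for illustration; eviction policy is not specified in the description above.

```python
from collections import OrderedDict


class FrontServer:
    """Small fast store in front of a large slow one (both are dicts here)."""

    def __init__(self, back_store: dict, capacity: int):
        self.back = back_store
        self.capacity = capacity
        self.cache = OrderedDict()   # name -> data, least recently used first
        self.dirty = set()           # names modified but not yet pushed back
        self.misses = 0

    def read(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)      # hit: mark as recently used
            return self.cache[name]
        self.misses += 1                      # miss: the slow 30-second path
        data = self.back[name]
        self._store(name, data)
        return data

    def write(self, name, data):
        self._store(name, data)
        self.dirty.add(name)

    def flush(self):
        """Routine push of modified data to the back server."""
        for name in self.dirty:
            self.back[name] = self.cache[name]
        self.dirty.clear()

    def _store(self, name, data):
        self.cache[name] = data
        self.cache.move_to_end(name)
        while len(self.cache) > self.capacity:
            old, old_data = self.cache.popitem(last=False)
            if old in self.dirty:             # never drop unsaved changes
                self.back[old] = old_data
                self.dirty.discard(old)
```

      The `misses` counter is the number you would watch: the fewer files outside the hot set get requested, the less often anyone sees the slow path.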

    40. Re:Which 90% ? by vuffi_raa · · Score: 1

      I think that you are missing the big one- the chances that your company has to pay out overtime or that your boss goes ballistic when you lose a vital document because it was erroneously deleted are right around 100%...

    41. Re:Which 90% ? by Anonymous Coward · · Score: 0

      In a similar vein, why can't cars have an indicator that lights up to tell you if you'll be needing your seatbelts on that particular journey?

  3. Hope They Don't Want the Z-Series by eldavojohn · · Score: 2, Insightful
    From the article:

    Opportunity too good to pass up

    It was just about then that one of my favourite bargain-hunting websites turned up a device called the CORAID EtherDrive. Take a look at the product range at CORAID, but don’t spend too long on it.

    That's the same device from a story I submitted yesterday. I hope they don't plan on getting a Z-Series running ZFS.

    --
    My work here is dung.
  4. It's true and pretty well known by Anonymous Coward · · Score: 0

    Or at least I've certainly heard it before, that in large storage systems, the average number of times that a file is accessed during its lifetime is less than one. That is, some files are accessed lots of times, but most are never accessed.

    That's certainly true of lots of the files I use. For example, I shoot a lot of digital photos and upload them to my computer. A few of them get a lot of use, the rest just sit there occupying some tens of GB of disk space (which is pretty cheap these days) and are never accessed except for the occasional migration to a new disk drive.
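
    To make the "less than one" point concrete: if the average number of reads per file is below one, then at least a fraction of (1 - average) of the files were never read at all, since every file that was read contributes at least one read. A small sketch, with a hypothetical `access_summary` helper and read counts that are invented purely for illustration:

```python
def access_summary(read_counts: list[int]) -> tuple[float, float]:
    """Return (mean reads per file, fraction of files never read)."""
    total = len(read_counts)
    mean = sum(read_counts) / total
    never = sum(1 for c in read_counts if c == 0) / total
    return mean, never


# Invented example: nine files never read, one file read five times.
example_counts = [0] * 9 + [5]
mean_reads, never_read = access_summary(example_counts)
```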

  5. which 90% by marmusa · · Score: 3, Insightful

    Which 90% though? Like the Coca Cola exec who remarked that he was pretty sure half of his advertising budget was wasted, he just wasn't sure which half.

    1. Re:which 90% by dohzer · · Score: 1

      Exactly. You don't know what you need until you need it, which is why you record all that you can so in the odd case that you do require it, it's where you need it.
      Deciding how quickly and frequently you will require the data is a separate problem.

    2. Re:which 90% by Koby77 · · Score: 5, Informative

      I worked in a call center, and I can definitely believe that 90% of the data is never read again. However, when a customer is calling back (and is angry!), you don't have time on a live call to wait to see what's up with the account. Also there can be some litigious aspects, and a lot of information was recorded for C.Y.A. purposes. Again, you never know which part is needed for C.Y.A. purposes, but that 10% sure is valuable.

      So yeah, we needed to store ALL the account information, and we needed fast access to ALL of it ALL the time.

    3. Re:which 90% by bwintx · · Score: 2, Informative

      Like the Coca Cola exec who remarked that he was pretty sure half of his advertising budget was wasted, he just wasn't sure which half.

      FWIW, and pointing this out only because I've seen this quote referenced so many times over the years...

      John Wanamaker, a 19th century entrepreneur, Lord Leverhulme, founder of consumer goods giant Unilever, and Franklin Winfield Woolworth, the founder of Woolworth's, have all been credited with the quote: "I know that half of my advertising is wasted. I just don't know which half."

      -- Citation
      -- Google search

      --
      Discussion System prefs link: http://slashdot.org/users.pl?op=editcomm
    4. Re:which 90% by itwerx · · Score: 2, Insightful

      "...we needed to store ALL the account information, and we needed fast access to ALL of it ALL the time."

      Which is why decent needs analysis is critical. In other situations that would not be the case.

      I must say this line at the end of the article does more to reflect the ignorance of the author than anything else, "...why on earth did we squander so much money by not thinking this way until now?"
      Who is this "we", kemosabe? Smart IT people have been thinking this way since the dawn of computers. Think of the huge storage rooms of archive, (not backup!), tapes that were around back in the mainframe days. We might store a higher percentage of it online nowadays but there's still a brisk market in optical storage arrays, high-speed tape libraries, various utilities for automatic email and database record archiving etc etc

    5. Re:which 90% by guyminuslife · · Score: 1

      I also worked in a call center, and while we had the same needs, we didn't get anything like that.

      --
      I don't believe in time. It's a grand conspiracy designed to sell watches.
    6. Re:which 90% by dbcad7 · · Score: 1

      The urgency in dealing with an angry customer is yours.. You want to get them off the phone. The majority of "angry" calls are BS, and a ploy to get something free. Companies rewarding people for yelling at them is wrong.. I am not saying that compensating someone for inconvenience is wrong, that should happen.. However someone calling in yelling and screaming and cussing does not help resolve the problem, and usually wastes time that could be spent on solving the problem, so that you can finish and help someone else. Access to previous caller information can be useful, but it really depends on how thorough the previous agent's notes and troubleshooting are. Often just a few quick questions to the customer are faster than looking up the notes on the previous calls.. but yes it's good to have them if you need them. I have developed a pretty thick skin in dealing with the angry customers.. All you can do is let it run its course until either the customer is ready to start solving the problem or crosses the line into abuse and you disconnect them.. I have not yet had to disconnect someone.. so that is why I say the majority are BS.

      --
      waiting for ad.doubleclick.net
    7. Re:which 90% by turbidostato · · Score: 1

      "However, when a customer is calling back (and is angry!), you don't have time on a live call to wait to see what's up with the account."

      Exactly. That's the stupid part of tiered storage systems based on access frequency. It is not access frequency but access *urgency* that should establish what goes on the faster systems. Yes, I know that usually there's a correlation between frequency and urgency, if only from the efficiency standpoint, but it's still the wrong way to look at it (after all, to the extent that frequency and urgency correlate, data accessed more frequently will end up on the faster devices anyway).

    8. Re:which 90% by arkane1234 · · Score: 1

      I also worked in a call center, and while we had the same needs, we didn't get anything like that.

      You obviously worked at another call center, then.

      --
      -- This space for lease, low setup fee, inquire within!
    9. Re:which 90% by Anonymous Coward · · Score: 0

      I've worked in a Dell call center and they are by far the biggest culprits when it comes to the 90% statistic. Each entry in their system is regarded as a legal document and they require a lot of information in every entry.

      For a while Dell was offering to replace the GX270 motherboards with bad capacitors for free, even if the system was out of warranty. If someone would call in with 15 or more of those systems, it could take you hours to enter and set up the replacements in their system. The amount of information you would have to generate for each system was astounding. Your employee performance metrics are also heavily affected by the amount of information you generate within their system, the more you make the better your metrics would look.

    10. Re:which 90% by guyminuslife · · Score: 1

      Most likely. I'm just venting. A great crash course in what not to do in terms of UI, and a nice thought experiment in terms of trying to figure out WTF made interfacing with their database so miserable.

      --
      I don't believe in time. It's a grand conspiracy designed to sell watches.
  6. It's like Office features by drinkypoo · · Score: 5, Informative

    People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which you will need in the future. Data warehousing is the same way. If you start taking data offline you'll just need that data. That's why analyses of very large data sets are performed before archiving.

    But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    1. Re:It's like Office features by 1u3hr · · Score: 2, Insightful
      People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which you will need in the future.

      Bullshit. True only if you've never used a wordprocessor in your life before. If you have, you know what you use. And you can read the description of other features to decide if you want them.

      And this is a pointless analogy because if in the future you decide you do need the 3D porn embedding, you can upgrade to get it. If you don't backup some of your data, you can never change your mind if you find you need it 10 years later.

    2. Re:It's like Office features by icebraining · · Score: 2, Insightful

      No, I think Office features are different; everyone only uses 5%, but each person uses a different 5%.

    3. Re:It's like Office features by Anonymous Coward · · Score: 1, Insightful

      But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.

      George Santayana: "Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it."

      The concept and implementations of hierarchical storage (http://en.wikipedia.org/wiki/Hierarchical_storage_management) are several decades old in the mainframe world. Why did "we squander so much money by not thinking this way until now"? Because "we" are savages/infants who refuse to retain experience.

    4. Re:It's like Office features by Anonymous Coward · · Score: 0

      That isn't really expanding on anything to be honest.
      5% is still 5%, it isn't 100%, which is what they are paying for.

      I remember a time when Microsoft used to go on about snap-ins and modular programs. Yeah, that was a good time.
      Now we just have monolithic crapware, that, yep, you guessed it, barely anybody uses!
      What happened to that awesome modular Windows 7 they went on about for a year+? Oh yeah, that's right, never existed since they took the lazy and fast route and just made a service pack to Vista and renamed it. Maybe Windows 8, maybe. We will get our Modular Windows one day. /mini-rant

      It's not like it is hard to make modular software. They could even add trials in for each of the features so people can download it on the spot and try it out.
      Microsoft are too stupid to realize just how BIG a market this could make for them if they sold Office cheaper and add in paid-for upgrades.

    5. Re:It's like Office features by drinkypoo · · Score: 2, Insightful

      Bullshit. True only if you've never used a wordprocessor in your life before. If you have, you know what you use. And you can read the description of other features to decide if you want them.

      It doesn't make it unreasonable to purchase a lighter word processor with less features, but I for one would not want to support a word processor where you buy access to toolbar buttons. And if I'm doing database reporting (for which I have been paid in the past) I would not want to have to request that pieces of data be reloaded into the database so I can perform analyses. And further, if I have to do a year-by-year analysis, I do not want to have to load and unload data sets, crunching one year at a time. I want to build one report that goes forth and executes subreports to produce year-by-year reports without me having to sit at my desk and watch Crystal Reports grinding.

      --
      "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
    6. Re:It's like Office features by Anonymous Coward · · Score: 0

      No, I think Office features are different; everyone only uses 5%, but each person uses a different 5%.

      I like these phrases with a punch, that deliver the idea clearly and shortly. But if it was true, then 100%/5% = 20 total Office users that use different 5%.

      Things get further complicated by the fact Office does have multiple tiers (Starter, Home, Edu, Enterprise), so it looks like Microsoft wouldn't agree with the above.

    7. Re:It's like Office features by Anonymous Coward · · Score: 0

      Not just mainframes. Windows used to support this but the feature got removed because nobody used it.

    8. Re:It's like Office features by Anonymous Coward · · Score: 0

      A modular Windows will never be titled under the Windows family; it will be too different. It may be a new operating system entirely, possibly underneath Singularity. Old programs won't work (by necessity) unless they run in a VM.

      I look forward to this day... Microsoft has the unique capability to create the only good user-oriented OS since BeOS. Apple could be argued as being another contender, but they don't have the experience to ground-up design a massive-scale project in this field IMO. Although, lately I begin to question the future of the general-purpose OS entirely.

    9. Re:It's like Office features by Anonymous Coward · · Score: 0

      I could take the nickel-and-diming if it was ultimately going to cost less. The problem is, I'd probably have to use that feature *at work*, where the purchase process is typically a complex arcane bureaucratic process that is best summed up as "you'll be dead first".

      (captcha, apropos enough, is "keeled")

    10. Re:It's like Office features by Vellmont · · Score: 1


      People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which you will need in the future.

      Heh. Individuals use about 5% of Office's features. 80% (as a group) use 20% of Office's features. 50% of Office's features are never or rarely used by anyone, and exist solely as marketing and justification to buy the thing again. (Numbers all made up on the spot to illustrate a point).

      MS Office reached "good enough" status about 10 years ago. The additional crap they've packed in over the last 10 years is simply gold plating and whiz-bang marketing to get enough people to buy the product again so everyone else has to buy the product again when MS changes the file format to be incompatible with the previous version.

      But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.

      Maybe. There are still a lot of cost considerations, since that's what this is ultimately about. Is it cheaper to implement this relatively complex strategy than it is to just buy more fast storage? Does the long term maintenance of a more complicated solution eat up any savings in cheaper storage? Does past history of usage indicate future data access?

      --
      AccountKiller
    11. Re:It's like Office features by Vellmont · · Score: 1


      Why did "we squander so much money by not thinking this way until now"? Because "we" are savages/infants who refuse to retain experience.

      Or.. maybe because when a resource expands at exponential rates it's cheaper to just get more resources than try to conserve resources.

      The article seeks to vastly over simplify a complex problem. Spitting out figures like 90% and then going down the road of making a huge number of hidden assumptions about the rest of the unanswered questions is just as stupid as not remembering the past.

      --
      AccountKiller
    12. Re:It's like Office features by Rob_Bryerton · · Score: 1

      I'd guess it's more like 1% use a different 5% of Office features, and 99% use the same 5%.

    13. Re:It's like Office features by cgenman · · Score: 1

      The other question is: why would not selling you certain features in Microsoft Office reduce what the consumer has to pay?

      1. The additional software features have zero per-unit cost. They don't save anything by not shipping it to you.
      2. Microsoft already charges individual markets more or less whatever it thinks the market will bear.
      3. Microsoft wants you to try out and get locked into the advanced features.

      Now, there are end-user experience and programmatic reasons to kill the bloat. But from a business standpoint, selling you 5% of the functionality of Office would not actually reduce the cost.

    14. Re:It's like Office features by 1u3hr · · Score: 1
      It doesn't make it unreasonable to purchase a lighter word processor with less features, but I for one would not want to support a word processor where you buy access to toolbar buttons.

      You're talking about what you want to support, I'm talking about what the user wants. Which may be simplicity and speed; some people prefer that to 20 tool bars and the need for a 6-core processor to open a memo. Anyway, MS doesn't give you any such choice: you take the whole multi-gigabyte package, or nothing. So the original analogy is even more flawed. People buy MS Office for compatibility and inertia; hardly anyone knows what the new features are and fewer ever use them. I deal with documents all the time and hardly any users know how to enter a pagebreak or set the spellcheck language, let alone anything more complex. They just type and use the formatting buttons and hit ENTER a dozen times to start a new page.

    15. Re:It's like Office features by icebraining · · Score: 1

      I like these phrases with a punch, that deliver the idea clearly and shortly. But if it was true, then 100%/5% = 20 total Office users that use different 5%.

      Wrong. Two "five percents" are still different even if they intersect in part (for example, if 3% of each is shared).

      The set [A, B, C] is different than the set [A, B, F], even though a part of each set intersects with a part of the other.
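
      The same point as a quick sketch. The two feature sets are the ones from the example above; the 100-feature product is a made-up round number, used only to show how many distinct 5-feature subsets exist, so "each person uses a different 5%" does not cap the user count at 20.

```python
from math import comb

features_a = {"A", "B", "C"}
features_f = {"A", "B", "F"}

overlap = features_a & features_f      # what both users share
different = features_a != features_f   # sets differ if any element differs

# Number of distinct 5-feature subsets of a hypothetical 100-feature product.
distinct_subsets = comb(100, 5)
```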

    16. Re:It's like Office features by FoolishOwl · · Score: 1

      Word processors aren't a great example. When I've worked as a word processor, I found that almost all the work could be done, more quickly and conveniently, in a simple text editor. The only things for which a word processor was needed were setting the margins, the line spacing, and the font. The only reason that Microsoft Word would be needed, rather than Wordpad, was so that you could read other Microsoft Word files.

  7. Just like /. by mtmra70 · · Score: 1

    Wow, this percentage is the same as /. articles! Well, at least I assume - I haven't read the article.

    1. Re:Just like /. by sco08y · · Score: 1

      Wow, this percentage is the same as /. articles!

      Much more than 90%. When it comes to uselessness, /. has a rock solid 5 9's methodology.

  8. Actually, there's one more question: by AnonymousClown · · Score: 1

    It’s an odd statistic. How is that data measured? 90% of all documents? 90% of stored bytes? When they said “ever again” did they mean explicitly retrieved by name, or should we include free text searches in that statistic? How long an interval needs to pass before some piece of data is clearly identified as belonging to the 90%, so that steps can be taken to reflect its reduced importance?

    Why is so much data being collected? They should go back and review what data they're collecting and why.

    --
    RIP America

    July 4, 1776 - September 11, 2001

  9. The problem is "Write-only" applications by shoppa · · Score: 4, Insightful

    Interesting that this seems to have been written up as a "hardware" or "storage" topic.

    The problem is, that IT people dream up all these "write only" applications that record data, without any rational plan for what the data might actually be used for in the business.

    For example, some people worry about privacy when they go to the grocery store and know that all their purchases are being tracked by their loyalty card, or worry that the big bad US government is tapping all the E-mail.

    In fact, I'm 100% sure that some IT geek had some wet dream years ago about recording everybody's purchases and E-mail and phone calls, and it's being done every which way.

    The true "IT application" issue is that there is no real business need for this data 99.999% of the time. It gets recorded, probably gets staged off to tape, maybe indexed in some giant table, and then ... sits there for years with no actual need for it.

    I'm sure the IT geeks who dreamed up the technical ability to record all this stuff, thought they were hot shit when they came up with it. Oh, man, those IT architects were just having a big go-round whipping this problem in scalability. In their heads, they were gonna record everything on disk, then go home and fuck the prom queen.

    1. Re:The problem is "Write-only" applications by mikael_j · · Score: 5, Insightful

      The problem is, that IT people dream up all these "write only" applications that record data, without any rational plan for what the data might actually be used for in the business.

      These plans mostly come into being because us "IT people" (read: developers) know that the "business people" love changing the specs and they'll blame us if they want to start using data they didn't ask us to save and we tell them we can't save data retroactively (really, they'll basically blame the developers for not being able to time-travel). This is why we'd rather save everything than not save enough.

      --
      Greylisting is to SMTP as NAT is to IPv4
    2. Re:The problem is "Write-only" applications by DerekLyons · · Score: 2, Insightful

      The problem is, that IT people dream up all these "write only" applications that record data, without any rational plan for what the data might actually be used for in the business.

      Seems to me that the IT folks shouldn't be making these decisions (what data to capture and store) any more than they should be deciding what to stock for the Memorial Day sale.

    3. Re:The problem is "Write-only" applications by zarzu · · Score: 1
      yes, exactly! don't we all know this scenario where the management talks to the it head and is all like:

      "we need a system to save customer data. what data is really no concern to us, you decide and we see that our customers provide it! you just go ahead and implement whatever kind of sophisticated system you see fit, we'll pay for everything, tell us when it's done and we'll launch it! there's no need to check back with anyone here, we're sure you're gonna find the most profitable solution to this! we love you it guys!"

    4. Re:The problem is "Write-only" applications by CharlyFoxtrot · · Score: 1

      Who exactly is going to say no in the decision chain? The vendors wine and dine the managers because they all got lots of stuff to push: Sun the hardware, EMC the storage, IBM the hardware, storage and backup solution, Oracle the database and analysis tools, etc. The manager wants to justify his position and this stuff sounds nice and science-fictiony and "pro-active" and really, really expensive so you know it's good. The IT guys get an increased budget, lots of new shiny toys to play with and a couple of problems to solve along the way to make it interesting. Nobody loses.

      It's not IT's job to come up with the business case either, that's the business analysts' job.

      --
      If all else fails, immortality can always be assured by spectacular error.
    5. Re:The problem is "Write-only" applications by Anonymous Coward · · Score: 0

      Software with many features is an easier sell than software that only has one feature. Whether those features are used or not, the customer wants to feel they are getting more for their money. This is magnified greatly when trying to get the purchasing dept to spring for some new program.

    6. Re:The problem is "Write-only" applications by jthill · · Score: 1

      Say what? You really think that's an IT idea?

      Take a look at the discounts supermarkets offer you to use their card. Notice that they don't tie it to how often you use it, there aren't any rewards for using it a lot, no loyalty points, nothing. Just use the card, get hefty price breaks and deals.

      They're letting real money, lots of it, walk out the door, and they're getting nothing but data in return.

      --
      As always, all IMO. Insert "I think" everywhere grammatically possible.
    7. Re:The problem is "Write-only" applications by Bungie · · Score: 1

      In the real world the manager justifies his job by cutting costs wherever he can. I've never seen a manager get rewarded for being proactive and spending a lot of money. If anything they'll ask him to reduce the IT budget this quarter because none of them know what the IT guy actually does.

      --
      The clash of honour calls, to stand when others fall.
  10. This isn't a 'new way of thinking' by sirwired · · Score: 5, Insightful

    Automated Hierarchical Storage Management has literally been around for decades. It may be new-ish on low-end crap x86 servers, but for say, mainframe users, it isn't new at all.

    What is new is available implementation choices. When your tier choices are between enterprise disk and enterprise tape, you are biased towards keeping data on disk; there's still use cases for HSM with only high-end disk and tape, but they aren't as great. Now with lower-cost disk available, you have a cheap disk choice too, with fairly reasonable access time.

    SirWired

  11. In other news by Anonymous Coward · · Score: 0

    ...Dell/EMC have several products available to help with this problem.

    Coincidence?

  12. Perfect by Andreaskem · · Score: 4, Funny

    A perfect application for my patented write-only memory.

    1. Re:Perfect by Dachannien · · Score: 1

      A perfect application for my patented write-only memory.

      Bob Pease, is that you?

    2. Re:Perfect by Anonymous Coward · · Score: 0

      You too? I built a hard drive with a write time in the nanoseconds, no matter how much data you throw at it. It has almost unlimited bandwidth. But it's write only.

      It's also considerably lighter than a normal hard drive because with all these advances in technology I was finally able to remove the platters and the motor.

    3. Re:Perfect by mysidia · · Score: 1

      Sorry, every DOS and Linux install already has an unlimited amount of write only memory, it came with the computer.

      In DOS, it can be accessed using the NUL special file.

      On Linux, the included write-only memory is the character device /dev/null.
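
      For the curious, a small sketch of the "write-only memory" in action. It relies on Python's `os.devnull`, which names the null device on the current platform; the `write_only_roundtrip` helper is made up for illustration.

```python
import os


def write_only_roundtrip(payload: bytes) -> tuple[int, bytes]:
    """Write payload to the null device, then try to read it back."""
    with open(os.devnull, "wb") as sink:
        written = sink.write(payload)   # the write is accepted in full
    with open(os.devnull, "rb") as source:
        recovered = source.read()       # nothing ever comes back
    return written, recovered
```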

  13. Human brain by Anonymous Coward · · Score: 0

    Only a small part of the human brain is active at any given time, but you just try to think without the rest of it...

    1. Re:Human brain by sco08y · · Score: 1

      Only a small part of the human brain is active at any given time, but you just try to think without the rest of it...

      I'll bet I'd still get credit card offers.

  14. University Text by Anonymous Coward · · Score: 0

    I'm taking Business Systems units as electives to my degree and they always push the assumption that "more data is always a good thing because it could help you later in decision support etc.". I always found that assumption to be poorly grounded, guess I was right.

  15. So, how many of us have 2 hard drives? by Golbez81 · · Score: 1

    A fast SSD or 10,000 RPM'er for your OS or critical apps, and a larger 7200 or 5400 drive for all your other "media"? Personally I've been doing that setup since like 1997...

    1. Re:So, how many of us have 2 hard drives? by jabuzz · · Score: 1

      You have the idea right, but what you want to do is automate the process. For example, why should, say, a Brazilian keyboard layout or a driver for a printer I don't own be on fast disk just because it happens to be part of the OS? Why does the Word document I am working on today get to be on slow disk?

      That is, you have some fast disk that new stuff gets written to, and then after a period of time, if it is not accessed, it gets moved to slower disk. You can even add in an extra layer so stuff that has not been used for a long period of time is moved to tape. If I access the file on tape, I want it to come back automatically, and if I start using that Word document I wrote last year, I want it to come back to fast disk.
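
      A rough Python sketch of that policy, just to make the idea concrete. The tier names, the 30-day and 365-day thresholds, and the FileRecord type are all assumptions for illustration; this is not how any particular vendor implements tiering.

```python
from dataclasses import dataclass

DAY = 86400  # seconds

# Hypothetical thresholds; a real product would make these tunable.
SLOW_AFTER = 30 * DAY
TAPE_AFTER = 365 * DAY


@dataclass
class FileRecord:
    name: str
    last_access: float  # seconds since the epoch
    tier: str = "fast"  # new data always lands on fast disk


def choose_tier(last_access: float, now: float) -> str:
    """Pick a tier purely from how long the file has been idle."""
    idle = now - last_access
    if idle >= TAPE_AFTER:
        return "tape"
    if idle >= SLOW_AFTER:
        return "slow"
    return "fast"


def rebalance(files, now):
    """Move every file to the tier its idle time calls for.

    Returns a list of (name, old_tier, new_tier) for the files that moved.
    """
    moves = []
    for f in files:
        target = choose_tier(f.last_access, now)
        if target != f.tier:
            moves.append((f.name, f.tier, target))
            f.tier = target
    return moves


def touch(f, now):
    """An access resets the idle clock and brings the file back to fast disk."""
    f.last_access = now
    f.tier = "fast"
```

      Note the policy only looks at access time, not at whether a file "belongs" to the OS, which is the whole point.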

      All the major storage vendors are introducing block-based storage tiering to their lineups as we speak. The other option is to build it into the file system, like IBM's GPFS, so you can have more control to begin with, such as forcing all ISO images onto slow storage from the get-go, along with all those MP3s. You also get the option of tape here as well, which you don't get with block-based tiering.

      It is one of the reasons why ZFS simply does not cut the mustard. If a file system does not have storage tiering then it sucks; period. Of course, the fan boys who think ZFS is the greatest thing since sliced bread just don't have a clue about enterprise storage.

    2. Re:So, how many of us have 2 hard drives? by afidel · · Score: 1

      ZFS DOES have auto tiering: ARC in RAM, L2ARC and ZIL on tier0 SSD, and then bulk storage on either tier1 or tier2 storage.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  16. Better safe than sorry? by Allnighte · · Score: 1

    I imagine it's more a matter of "better safe than sorry" when management asks whether or not something should be kept.

    Chances are, this "business data" is somewhat financially related. This also means there's a fair chance the government can/has/will tax something in those documents. And how long are we supposed to keep our records in case of audits?

    1. Re:Better safe than sorry? by Xacid · · Score: 1

      "And how long are we supposed to keep our records in case of audits?"

      7 years if gov.

  17. This is new? by rapturizer · · Score: 4, Interesting

    I saw this over a decade ago when I was working as an IT consultant in the advertising industry. They regularly used only 5% - 10% of their information (and that's being generous). The systems I designed included a server for active work, an archive server for information used in the last 24 months, and then an archive solution (Magneto Optical at the time) that allowed for the information to be available, just not on demand. This idea has been working since for the clients that are still in business.

    1. Re:This is new? by daeglo · · Score: 1

      This idea has been working since for the clients that are still in business.

      Seems as though it is working REAL well.

    2. Re:This is new? by rapturizer · · Score: 1

      The advertising industry in the 2000s went through the worst recession since the Great Depression. Companies that specialized in print media and ones that were marginally profitable either went out of business or were purchased by other companies. Of all the clients I had, only one closed its doors (they specialized in quick-turnaround newspaper advertising). I did have three acquired by other companies, one of them purchased by another client of mine. One of the points that sealed the deal was compatible data structures that made the acquisition mostly seamless.

    3. Re:This is new? by Bill,+Shooter+of+Bul · · Score: 1

      Yes, I've also been doing this forever. It only makes sense. And to be fair, large enterprises know this too. It's not just us smart people ;)

      --
      Well.. maybe. Or Maybe not. But Definitely not sort of.
    4. Re:This is new? by dbIII · · Score: 1

      Google is an advertising company and seems to be doing extremely well to me.

  18. We already know about it by Wolfraider · · Score: 0

    Businesses already know that most data is stored once and never looked at again. The simple solution would be to offer multiple locations to store data, one for frequently accessed data and one for archival, etc. The problem comes down to training. There are a lot of people who can barely use a computer, and the whole concept of multiple folders would confuse them. Another solution is to solve the issue with software. There are several archival solutions that will look at the file accessed date and move the file to either cheaper disk or even tape. The software leaves a stub file in place in the original location, and if a user tries to access the file, it will pop up a box saying "please wait while the file is restored". This solution is nice in that the users don't have to change how they save data, but it is harder to manage. You have your data spread across multiple systems instead of one, and backups could become harder. Overall, it just depends on which direction you want to go with your data and what makes the most sense.
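
    A minimal Python sketch of the stub-file approach, under some loud assumptions: the ".stub" suffix, the function names, and the flat archive directory are all invented for illustration, and it trusts the file system's access time, which many systems mount with noatime or relatime and so may not update reliably. Real archival products hook into the file system rather than wrapping open().

```python
import os
import shutil
import time

STUB_SUFFIX = ".stub"  # hypothetical marker left behind for archived files


def archive_if_idle(path, archive_dir, max_idle_seconds, now=None):
    """Move path into archive_dir if it has not been accessed recently.

    Leaves a small stub next to the original location that records where
    the data went. Returns True if the file was archived.
    """
    now = time.time() if now is None else now
    if now - os.stat(path).st_atime < max_idle_seconds:
        return False
    os.makedirs(archive_dir, exist_ok=True)
    # Simplification: files with the same base name would collide here.
    target = os.path.join(archive_dir, os.path.basename(path))
    shutil.move(path, target)
    with open(path + STUB_SUFFIX, "w", encoding="utf-8") as stub:
        stub.write(target)
    return True


def open_with_restore(path, mode="rb"):
    """Open path, pulling it back from the archive first if only a stub exists."""
    stub_path = path + STUB_SUFFIX
    if not os.path.exists(path) and os.path.exists(stub_path):
        with open(stub_path, encoding="utf-8") as stub:
            target = stub.read()
        shutil.move(target, path)  # the "please wait while the file is restored" step
        os.remove(stub_path)
    return open(path, mode)
```

    The management headache mentioned above shows up even in this toy: the stub and the archived copy have to be backed up and kept consistent together.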

  19. Signetics invented the needed chip back in the 70s by ve3id · · Score: 3, Funny

    FINALLY !!! AN APPLICATION FOR THE WOM!!!! http://www.national.com/rap/files/datasheet.pdf Bob Pease sure was fore-sighted, since this memory chip was invented back in the seventies!

  20. In my experience by AbbyNormal · · Score: 1

    In my experience with small businesses, it may be never read but will absolutely need to be found for some type of emergency presentation/proposal.

    --
    Sig it.
  21. Good argument for tape? by mlts · · Score: 3, Interesting

    This is one reason I like tape: The drives are expensive, but the tapes are $30-$50 (LTO-4 is $30 on mail-order). So having an autochanger moving all the rarely used data into storage is likely the most efficient way of moving data to long term archiving. Even better is making sure that 2-3 sets of tapes are used (one onsite, one offsite.)

    Of course, hard disks by themselves may seem cheaper, but they are not a true archival medium. There are so many moving parts in a HDD and each of them (bearings, heads, spindles, motors, controller card) are a point of failure.

    With HDD capacities starting to not grow as exponentially as they did last decade, it would be nice if tape companies would not just catch up with 2-3TB native tape offerings, but be able to offer drives at a lower price so home and SOHO users can use them for long term storage. I'm sure that if someone offered a consumer level tape drive for $500 with a decent capacity, that a lot of small businesses would buy it, especially if it came with decent backup software (Retrospect, Backup Exec, Amanda, bru, or another utility that is similar.) Since some tape drives are even bootable (some HP offerings have a section of the tape to emulate a boot CD or DVD), it would be ideal for bare metal recoveries even by nontechnical users. Pop in the tape, boot the machine, type in the encryption key, select where the data should be restored to, walk off for a bit and it is done.

    Even though the SAN companies have said tape is going to die, until another form of media (perhaps super-inexpensive flash media [1]) is as reliable as tapes and can be put in the Iron Mountain case and sent offsite for safekeeping for decades on end, tape will be with us. Only optical comes close to tape for long term archiving abilities.

    [1]: I can see someone making flash media that is semi-smart where it is put in a specific case, shipped to an offsite warehouse, and that warehouse plugs the cases into 5-12VDC. Then over time, the circuitry on the flash drives periodically checks the stored flash media for damage or bit rot, corrects errors by rewriting blocks, and periodically moves good blocks to ensure that there is a high signal-to-noise level on all media. Of course, this requires power, while tapes can happily sit in a climate-controlled warehouse and still be recoverable.

    1. Re:Good argument for tape? by Anonymous Coward · · Score: 0

      There are so many moving parts in a HDD and each of them (bearings, heads, spindles, motors, controller card)

      Controller cards don't move during normal operation.

    2. Re:Good argument for tape? by mbone · · Score: 3, Informative

      Tapes are not archival storage either. In either case, archival storage is a system, not a medium.

      I hope you are reading all of those tapes on a 5 year cycle, and writing new ones with the recovered data. I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

    3. Re:Good argument for tape? by Vellmont · · Score: 1


      I hope you are reading all of those tapes on a 5 year cycle, and writing new ones with the recovered data. I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

      Not everyone has the same standards as to data retention. Believe it or not, some people actually couldn't care less if a 6 year old version of a document they last touched 5 years ago can't be recovered!

      In my experience, this kind of extreme level of data retention has more to do with the tendency of IT folks to think they can create "the perfect system" than it has to do with actual business requirements. Sometimes laws and auditing come into play, but even that is partially influenced by the IT perfection syndrome.

      --
      AccountKiller
    4. Re:Good argument for tape? by vrmlguy · · Score: 2, Informative

      I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

      That's why the OP said to use Iron Mountain. They maintain the humidity and temperature at all times in their storage rooms.

      It costs a little extra, but if you want long term storage, rent some underground space. According to http://mic.imtc.gatech.edu/preservationists_portal/presv_costcompare.htm, underground storage costs can get as low as $2/year per cubic foot (not including relocation, initial filing charges, retrieval & re-file charges) if you're buying four delivery trucks worth of space.

      --
      Nothing for 6-digit uids?
    5. Re:Good argument for tape? by Anonymous Coward · · Score: 0

      They are still a point of failure. With tape, all the logic is on the drive. If that fails, buy another tape drive and continue reading/writing tapes. If a drive controller fails, all the data that is on that drive may be permanently gone, depending on how much someone is willing to spend for a clean room decoding.

    6. Re:Good argument for tape? by mlts · · Score: 2, Informative

      5 year cycles are close enough. In business, with laws like Sarbanes Oxley, FERPA, HIPAA, PCI-DSS, and many others, if a business puts it on tape (where the maker says the archival life is in decades), drops it off at Iron Mountain, and has a documentable chain of custody system, should an audit happen and some tapes are not readable, they are off the hook. Management can look at the auditor and say that any missing data was stored in multiple places, and if anything is lost due to tape failures/bit rot over time, shit happens. The audit ends with the company passing, and life goes on. Fifty year audits are different (anything aerospace related needs a 50 year audit trail), but tape drives are more than enough to deal with the 7 years that most regulations require.

      Things are different if the data is worth keeping, versus sticking it on a tape to languish in a bucket offsite until the 7 years are up. For data worth keeping, it needs to be stored multiple places, and checked for issues every so often. Most businesses have multiple SANs, one at the main data center, one offsite and both are synced to deal with this. It is expensive, but it ensures that data doesn't "rot".

    7. Re:Good argument for tape? by MoralHazard · · Score: 1

      Of course, hard disks by themselves may seem cheaper, but they are not a true archival medium. There are so many moving parts in a HDD and each of them (bearings, heads, spindles, motors, controller card) are a point of failure.

      First of all, when a hard drive is turned off and unplugged, the moving parts aren't moving. So the mechanical wear and stress of constant operation aren't really a problem. If you dump a backup set to an HDD, seal it up like from the factory, and stick it back in a box on the shelf, the differences between the construction of a tape cartridge and a hard drive are irrelevant.

      And even if stored hard drives were less reliable: Ever heard of RAID mirroring? Or how about just making a second copy? I bought an external tray-less eSATA hot-swap dock (http://www.newegg.com/Product/Product.aspx?Item=N82E16817153112&cm_re=esata_hot_swap_dock-_-17-153-112-_-Product) for about $55. Combined with a hotplug script and BTRFS mirroring, it's just easy-peasy. Sure, my 1 TB backup set costs me $160 (2x 1 TB SATA drives at $80)--but I didn't have to spend a couple of grand on an over-priced tape drive.

      Every time I hear another IT guy parrot this line about moving parts, a little part of me dies. It's a nonsense assertion, pretty thinly supported. Its real function is to mentally justify, ex post facto, a lack of mental effort to consider better ways.

    8. Re:Good argument for tape? by Anonymous Coward · · Score: 0

      Reading this thread and your reply makes me wonder: Have you worked in IT in a production environment where one sysadmin mistake would cost millions? There is a big difference between what it takes to have enterprise-grade items versus something for a home, SOHO, or even a small SMB.

      btrfs is NOT a production filesystem, and there is NO way in hell any sysadmin who is worth their salt would trust a company's valuables to it. Development/experimental filesystems don't count. Nor does a SATA dock. That is great for backing up your World of Warcraft screenshots, but that isn't anywhere near production work in the enterprise.

      RAID is to keep data fine against disks killing themselves. But what about malware? One piece of malware will destroy the data off a drive no matter how many times it has been mirrored. Even if a SAN supported snapshots of the LUN, that won't help against a site loss like a fire or earthquake. Tapes do.

      A lack of better ways? Trust me, IT guys have seen all kinds of stuff offered that has purported to replace tape. Vendors have paraded by the IT people and their PHBs all kinds of VTLs, trays of hard disks, Flash drive robots, optical drives, and God knows what else. However, can you find a hard disk rated as archival grade, made to store data for 10 years, with the maker guaranteeing that lifetime or replacing the drive? Go, do it. Find a hard disk that the maker says will keep data on it 10 years or more with a guarantee. If you can, I'll piss on a spark plug. With tapes, it is virtually guaranteed that data you slap on them will be there. If someone made a device that could replace tape with a removable medium that is inexpensive, has a form factor that is easy to hold (both by humans and by mechanical grippers), has a high capacity (greater than 2TB native for starters), is insanely reliable (pick one up in 20 years and have a high chance of getting data from it), has immediate availability in large quantities, and comes with technology to read it that will still be around 10-20 years from now, I'm sure there would be a lot of IT people who would happily fellate the developer of such a product. Tapes are nowhere near perfect, but there is absolutely nothing on the enterprise front that is going to replace them, other than having multiple SANs in multiple locations which automatically sync up.

      Of course, nothing is 100%, which is why you have multiple tape sets and check your data. But it is a heck of a lot more reliable than hard disks. Of the thousands of backup tapes I worked with this year, I have had one tape... yes, one, go bad. And the data was completely recoverable from it, it just exceeded a soft error count too many times for the backup program to bother using it.

    9. Re:Good argument for tape? by Anonymous Coward · · Score: 0

      Tape is not a cure-all. Tape is a contact medium and therefore wears out over time. Most tape systems do NOT put the tape on a shelf for years at a time. Instead they are part of rotating catalogs of tapes and get regular use.

      My experience is that the average tape is far more prone to failure than the average hard drive. Furthermore, with modern RAID arrays, drives are typically ganged together, so the failure of any single drive becomes a minor event. On the other hand, how many RAIT systems have you ever seen? I've never seen one. What I have heard about is MAID systems.

      So yes, you can put a tape on a shelf for years, but when the time comes and you need that data, will you even have a drive capable of reading that tape?

      There's even another issue. Sure, 90% of the data is never read, and that's partly a function of the fact that people cannot find that data.

      All archiving systems I know of separate the primary data store from the archive data store. This automatically creates a barrier to using the archive*. Often some manual step, either by the user or the IT department, must be performed to make the archive available to the client.

      * Someone else mentioned mainframe HASM systems, which are an exception. HASM systems leave a stub file in place on the live data store. The client accesses the stub file, which looks normal to them, but there's a delay while the HASM system automatically retrieves the bulk of the data from the archive. This is great, but it implicitly only works with document-based systems. When the need is for archiving records in a database system, HASM can't help you (unless HASM has been upgraded way, way, way beyond where it was the last time I used one, which admittedly was a long time ago now).

    10. Re:Good argument for tape? by dbIII · · Score: 1

      Disk drives die even in storage because they are not designed for a long life.
      A common point of failure is that the lubricant in the bearings breaks down over time. Another is that highly polished surfaces in close contact diffuse together and parts get stuck. Corrosion is also more of a problem than with tapes due to there being a lot of different materials in there.
      After things like that happen the moving parts don't move anymore.


      I came from a materials science background (starting back before the first web browser was written) then moved into IT some years later, so I'm not blindly parroting anything and neither is the other poster.

    11. Re:Good argument for tape? by Anonymous Coward · · Score: 0

      To be honest, I don't actually recall seeing an HSM system in production recently. The last ones I saw were one based on TSM running on AIX, and another back in the days of Windows 2000 Server attached to an autochanger.

      The features of SAN storage just outweigh the benefits of having a tape mounted as a filesystem. SANs can offer direct to tape backups of LUNs, snapshotting, mirroring via WAN (which *can* be considered a tape alternative for some uses), deduplication, encryption, and a ton of other kinky things. I'm sure you can slap a NAS head on a SAN and perhaps use one as a Time Machine data store.

      I have yet to see a RAIT, much less a RAIL system. I'm sure they are out there. Instead, I see people use backup rotations and multiple tapes, or D2D2T and the tapes head off to Iron Mountain for offsites. I have seen some people drop 2-3 tape sets off for indefinite archiving though. Tape isn't forever. However, what tape does, it does well, and that is getting massive amounts of data onto a medium that is reliable, transportable, fairly inexpensive, and at very good access speeds. Plus, all tape formats support a hardware read/write switch, so setting a tape read-only ensures that it won't be touched by malware, unless something is able to get into the tape drive's firmware and override the setting.

      Just to make sure: Tape is a tool in the IT toolbox. It isn't a one-size-fits-all solution. Ideally, it is combined with disk, and used for an offsite rotation scheme (some backup programs do synthetic fulls. Others do full/incremental/differential. Others do a combination.) For SOHO users, a tape drive would be overkill, as it would require a computer with dedicated I/O to prevent shoe-shining. For SMBs, maybe a backup server running Backup Exec with a Drobo Pro and a tape drive or small library attached would be the best. For larger firms, it depends. Some like hitting the LUNs on a SAN directly and backing up images to tape. Other organizations like installing a backup agent and backing machines up to disk, or directly to tape. Every organization is different.

    12. Re:Good argument for tape? by Bungie · · Score: 1

      First of all, when a hard drive is turned off and unplugged, the moving parts aren't moving. So the mechanical wear and stress of constant operation aren't really a problem.

      Hard disk parts can also become seized and have trouble starting if the drive is inactive for a lengthy amount of time. I'm not sure how long it takes or how often it happens though.

      --
      The clash of honour calls, to stand when others fall.
    13. Re:Good argument for tape? by MoralHazard · · Score: 1

      If you're concerned about the drives breaking down over time, you can periodically test the copies. My data is all checksummed by BTRFS, but you can do it manually with any number of common tools on any filesystem.
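
      For the manual route, something like this Python sketch would do. It is only an illustration: the function names are made up, SHA-256 is an arbitrary choice of hash, and it assumes you keep the manifest somewhere other than the drive being tested.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

      Build the manifest when you write the backup set, then run verify against each stored copy whenever you pull it off the shelf. This only detects damage; repairing it still needs the second copy.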

      See what I mean? There's a perfectly simple, perfectly obvious solution to your complaints, but you refuse to even try to think about how it could work. You're willfully stupid, which is pretty sad for someone who claims to have a science background.

      Oh, and if you're dumb enough to think that tape media is particularly long-lived, I'd suggest you read a few of the real-world studies about stored tape lifetimes. Unless you carefully control the storage environment's temperature, humidity, etc., the failure rates will eat you alive within a few years.

      As for your supposed failure probabilities of stored drives: I've stored somewhere around 40-50 PATA and SATA drives in boxes under my bed, in my closet, etc. for years--in some cases, as long as 7 years. In that whole time, I've never had a single stored drive fail to spin up, or even fail within the first 24 hours of use.

    14. Re:Good argument for tape? by MoralHazard · · Score: 1

      You're not sure because you have no firsthand experience with it. You don't even have any engineering papers with original research to cite, or even a single blog post with somebody else's anecdotes.

      In short, you've got jack shit, and you're just parroting the same crap received wisdom as everybody else. Thank you for proving my point, for me.

    15. Re:Good argument for tape? by MoralHazard · · Score: 1

      God, what a turd! You premise your whole fucking stupid post on the argument that because your tape backups are offline and offsite, they're superior to my mirrored RAID drives.

      Too bad you didn't bother to read my post, where I mentioned that I TAKE THE MIRRORED RAID DRIVES OFFLINE AFTER MAKING A BACKUP SET and store one of the mirrored copies offsite.

      Untreated ADHD fucks like you are the reason the Internet sucks. Go make some Youtube comments, you'll fit in over there.

  22. "Once" may be pushing it by jayhawk88 · · Score: 1

    But if you revised this to say, "Never accessed again a week after its creation", I'd believe it.

    1. Re:"Once" may be pushing it by lorg · · Score: 1

      Not so sure, there is always the "what the heck is this thing ..." **accessing data** "oh it's this bollocks .. nevermind ...", some period of time passes and you come back to it and repeat this procedure again.

  23. We aren't thinking this way until now? by al-ahlex · · Score: 1

    So why do all serious RDBMS systems have functionality for dynamically partitioning data based on the relevance of the data? Big databases are often set up to (for instance) have the last month's data on fast storage, and older data on slower/cheaper storage.

    1. Re:We aren't thinking this way until now? by theshowmecanuck · · Score: 1

      We have been. It is called data warehousing, data marting, operational data stores, etc (and they aren't the same things). People have been doing this for a long time. That is why there are analysts who specialize in these areas. They help the business identify the things that are used regularly, things not used often, and things that are nice to keep somewhere, and things that you can throw out after a few weeks. And the most ideal storage mechanisms (but not necessarily the specific technology).

      Whenever I've seen these issues, it is when managers assume people who are SMEs in hardware management are also SMEs in data management just because they know how to set up the hardware the data resides on. Adding more drives is easy... to a point. From the little of the bio on the author of the article, it looks like maybe he is an SME on the hardware side of things...

      Seriously this kind of smells to me like maybe this is the beginning of a new marketing push. Maybe Dell wants to start marketing solutions for Data Warehousing or similar.

      --
      -- I ignore anonymous replies to my comments and postings.
  24. this is actionable: think of the storage savings by rubycodez · · Score: 4, Funny

    this helps me to be a better employee. From now on I'll only save 25% of the data I acquire, because the odds are the other 75% would only be needed 7.5% of the time. In other words, 92.5% chance not likely to be needed at all.

  25. Much, much higher - probably 99% +++ by petes_PoV · · Score: 3, Funny

    If you're talking about blog entries. Almost all of them (well, almost all of *mine* :-) are written once and never read, unless you count spiders as reading them.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
    1. Re:Much, much higher - probably 99% +++ by BikeHelmet · · Score: 1

      Yeah, mine too! Zero comments on all of them, except from myself.

  26. I only read the headline... by erroneus · · Score: 1

    ...I didn't bother to read any further because I felt it was probably useless data anyway.

  27. dell's new line of fire extinguishers coming soon! by drfireman · · Score: 5, Insightful

    Over 92% of fire extinguishers will never be used, we could probably save a bit of space by having the unneeded ones stored off-site, or in less accessible corners of the garage.

    Slightly more seriously, we can certainly answer this question posed by the linked article easily: "why on earth did we squander so much money by not thinking this way until now?" The answer is: because you are a moron. Anyone who has given even a moment's thought to storage has known this, either implicitly or explicitly, for a long time. So whoever's included in your "we," Steve Cassidy, is just profoundly stupid. I think that quite easily explains why you all squandered so much money by not thinking about this. Next question?

  28. In other news... by argStyopa · · Score: 2, Interesting

    ...at least 70% of the crap you store in your house isn't really needed, either. Do you really ever LOOK at the pictures hanging on the walls? Are you sure you're going to read every book you own, again?

    --
    -Styopa
    1. Re:In other news... by Anonymous Coward · · Score: 0

      Are you going to fap over that ever again?

      Don't answer that.

  29. Use Plan 9 file system? by Anonymous Coward · · Score: 0

    Plan 9 from Bell Labs, an OS they released in the early 90s, had a file system for this. Hard drives as cache with WORM drives for bulk storage.

    It did some interesting things. cd /2009/12/25/ puts you in the root of the file system as it existed last Christmas.

  30. Because storage is easy. by Yaos · · Score: 1

    Why go to all the trouble of setting up two different systems for live data and archived data when you can spend half the money on just one system for both and more storage space?

  31. Databases should handle this automagically by Proudrooster · · Score: 1

    Anyone who manages large systems knows that this is very true, yet the data piles up. I've often wished that databases would allow us to make a view or some other type of abstraction which would allow you to make the decision whether or not to join an archive table. Right now, everything needs to be handled on a program-by-program or query-by-query basis. Hey, maybe I should quickly patent this idea, then I can license it to Oracle. :)
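
    To show the kind of abstraction I mean, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and helper functions are all hypothetical, and a plain UNION ALL view is only a stand-in: it makes the archive visible to queries but does nothing about putting it on cheaper storage.

```python
import sqlite3


def make_db():
    """Create an in-memory database with active and archive tables plus a combined view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders_active  (id INTEGER PRIMARY KEY, placed TEXT, total REAL);
        CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, placed TEXT, total REAL);
        -- Programs that need history query the view; everything else
        -- keeps hitting the small active table.
        CREATE VIEW orders_all AS
            SELECT id, placed, total FROM orders_active
            UNION ALL
            SELECT id, placed, total FROM orders_archive;
        """
    )
    return conn


def archive_before(conn, cutoff):
    """Move rows placed before cutoff (ISO date string) into the archive table.

    Both statements run in one transaction; returns the number of rows moved.
    """
    with conn:
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders_active WHERE placed < ?",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM orders_active WHERE placed < ?", (cutoff,)
        ).rowcount
    return moved
```

    With this in place, the choice of whether to include the archive is just a choice between querying orders_active and querying orders_all.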

    1. Re:Databases should handle this automagically by afidel · · Score: 2, Informative

      Oracle's way ahead of you; they've had programmatically partitioned tables for quite some time. Queries don't need to be altered: if they call for data outside of the active table's range, then the archive table(s) are automatically used.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  32. This should be obvious... by jridley · · Score: 1

    to anyone aware of Sturgeon's Law. 90% of everything is crud.

  33. If you weren't thinking this way by Anonymous Coward · · Score: 0

    You should have been. This isn't new.

    Many of the SANs I've worked on automatically migrate data off to slower storage or even tape as it ages without modification. If you do a read to something that's offline, it has to get fetched from the tape juke and it takes a bit.

  34. So what? by davidbrit2 · · Score: 4, Insightful

    And if you didn't have that 10% that is eventually needed, you'd be totally screwed. Do we really need to play the 20/20 hindsight game every time somebody thinks of something like this?

    1. Re:So what? by DaveGod · · Score: 1

      And if you didn't have that 10% that is eventually needed, you'd be totally screwed. Do we really need to play the 20/20 hindsight game every time somebody thinks of something like this?

      I know /. summaries are traditionally highly unreliable and jumping to obvious conclusions after picking up on a couple of key words is often a safer bet, but this time we have a good one. It goes straight (perhaps too straight) to the point that some data is in use that needs to be on expensive servers, and there is data that is not in use and can be stored on much slower and cheaper systems. There is no suggestion at all in TFA that the other 90% should be deleted or not collected in the first place - a debate worth having at individual companies perhaps, but that's another story.

      There's nothing new in TFA except that the unused data is as high as 90%, and that there are a few gizmos on the way to facilitate this, so the cost savings may be much more significant now than previously perceived.

    2. Re:So what? by aaarrrgggh · · Score: 1

      We are legally required to keep every file for 7 years from project completion. We will never erase anything, because even after that timeframe your ability to defend yourself from lawsuit is much better if you have everything. ...and we aren't even in a high risk business for lawsuits!

      We could easily archive 70% of our data, but there really isn't an incentive, as last year's data is 20% smaller than this year's, and the pattern always continues.

  35. Only 90% by flyingfsck · · Score: 1

    Many businesses work with a customer file a few times and then never again - for example lawyers and realtors. I'd like to see a file system that will auto archive data and shift it transparently into long-term storage, and then transparently undo it when needed again.

    --
    Excuse me, but please get off my Pennisetum Clandestinum, eh!
    1. Re:Only 90% by IrquiM · · Score: 1

      Dell delivers servers with this system. It's nothing new. We've used it the last 3-4 years.

      --
      This is blinging
  36. Health Insurance? by Kozz · · Score: 1

    I also wonder if +90% of all health insurance benefits go unused each year. And you probably have business data and insurance for some of the same reasons: it's better to have it and not need it than need it and not have it. amirite?

    --
    I only post comments when someone on the internet is wrong.
  37. Question by chazzf · · Score: 1

    Is there a reliable metric as to which 10% will be needed again at the time the data is written? If not then I don't see what this buys us.

    --
    No statement is true, not even this one.
  38. If Dell is talking about its failure rate by christoofar · · Score: 2, Funny

    If the data was recorded by Dell computers... then yeah I would expect that 90% of business customers aren't able to read it back.

    1. Re:If Dell is talking about its failure rate by Anonymous Coward · · Score: 0

      Yep. We know the data on those bad capacitors is in that 90%.

  39. It's written for a purpose by davidwr · · Score: 1

    I create a lot of business data, and 90% of "never read again" or "never read again after 2-3 days" is not far from true.

    However, the data serves a purpose. I frequently do searches on the data, and you never know when something you wrote months or years ago will turn out to be just the document you need. Keeping records for years instead of days has more than paid for itself in the long run.

    Now, will these records be useful 5 or 10 years from now? Probably not except to an archivist or someone researching how we did business during 2010 and earlier. Or perhaps to a lawyer *groan*.

    --
    Knowledge is how to play a game, intelligence is how to win, wisdom is knowing what game to play.
  40. what about law enforcement data policy? by Anonymous Coward · · Score: 0

    There are some laws that say you must keep some data for a period of time, five years or more.
    Even if you know that you almost certainly won't be using it in the future, the law is the law and you must obey.

  41. About 90% of slashdot posts are also WORN by aapold · · Score: 1

    At least 90% are Write Once Read Never.

    Wonder if you could go into business archiving never-read data. I mean you could guarantee privacy....

    --
    "Waste not one watt!" - CZ
  42. Datenbrief by Anonymous Coward · · Score: 0

    The German Chaos Computer Club (CCC) was addressing this with its draft law "Datenbrief". Businesses would have to report all types of data collected about a person to that person via snail mail or e-mail on an annual basis. The time and effort spent on this would constitute a 'fee' for data and thus force them to hold as little data as possible. See http://www.ccc.de/de/datenbrief (German)

  43. Exactly. by brusk · · Score: 3, Insightful

    I wasted money on a dictionary that has tens of thousands of words but have only ever looked up a few hundred. I should have bought one that just had the words I would actually need.

    --
    .sig withheld by request
    1. Re:Exactly. by ksd1337 · · Score: 1
      Ever heard of dictionary.com?

      (ducks)

  44. Solutions: by drolli · · Score: 5, Interesting

    a) Forbid *unmanaged* storage of documents. If the question: "where is the most up-to-date version of this document stored?" is systematically and easily answered then people can delete the crap from their laptops.

    b) Forbid in-company attachments to mails. If the last version can be easily found, including the revision history, a link to this revision is worth *more* than the current state of the document. Most of the space in my inbox is taken up by totally useless attached documents.

    c) Forbid the use of formats unsuitable for storing a certain kind of information. (Where I work, they use powerpoint/word files for electronic forms)

    d) Provide a good archiving and backup service. Besides the quality improvement from using a service, this also prevents the 100th copy of some data being made in some unsystematic way (forbid this explicitly)

    e) Thin clients. Store the data on a server. Deduplicate.

    f) I would expect that most of the documents in a company can (and should) be stored in a database.

  45. It's not like this is something new by IrquiM · · Score: 1

    Dell has been doing this for our company the last 3-4 years now.

    --
    This is blinging
  46. I think they're already implementing this . . . by bedouin · · Score: 0, Offtopic

    Made an order for 6 computers and received 11 in January. Returned the extra 5 and they refunded me for all 11, then took 6 months to realize their mistake.

    All this after I called them trying to tell them about their error, and getting some script/screen-reading Indian who didn't understand me.

    Imagine what it would have been like if the situation was reversed . . . yikes.

    Fuck Dell. This is the kind of dumb thinking that will lead to their inevitable downfall. Welcome to Gateway Country Part Deux.

  47. The more things change... by Anonymous Coward · · Score: 0

    Back in the day I ran (operations manager) a very large mainframe shop with three supercomputers churning out enough numbers to consume several tons of paper each month. The scientists would make their way to the output distribution area each morning, pick up a 6" stack of 11"x17" paper, and flip to the last page to find an eigenvalue. All too frequently they'd shake their heads and say something like "Should have been higher" and drop the whole stack into the recycling bin strategically posted near the exit.

    A purely statistical analysis might suggest that we have the trucks delivering the paper just drop it off at the recycling center, saving wear and tear on the printers, printer ribbon costs, and scientists' time, as they would no longer have to come by to pick up their output. Could probably have cut a couple of staff as well. And the loss of information would be negligible, statistically speaking. The scientists failed to see the humor in this, however, so we continued killing trees at an alarming rate.

  48. In related news... by GodfatherofSoul · · Score: 1

    Backup snapshots are wasting space 99% of the time!

    --
    I swear to God...I swear to God! That is NOT how you treat your human!
  49. Re:dell's new line of fire extinguishers coming so by mbone · · Score: 2, Insightful

    Well over 99% of all lifeboats are never used.

  50. Cost of storage by mangu · · Score: 1

    it's the potential that data holds that makes it so valuable and necessary

    What matters is the cost/benefit ratio.

    The potential for the data being valuable may be very low, but the cost of storing it is going down all the time. Disk space today is a dime a gigabyte, so let's keep it just in case.

    1. Re:Cost of storage by vivian · · Score: 1

      The real cost is not storing it - but rather the cost in recording all that info in the first place. Someone has to type in all that data to start with, and possibly someone else has to at least glance at the resulting reams of reports that are produced from it.

      It is all too tempting to create database apps to record all sorts of information "just in case", but more often than not all you end up doing is making the system more complex than it has to be, and more time consuming in maintenance of both the application and the data.

    2. Re:Cost of storage by Chris+Mattern · · Score: 2, Interesting

      Someone has to type in all that data to start with

      Not true; a lot of data is harvested automatically these days. And if you're getting the data by having the customer fill something out, then you're not paying for the typing.

    3. Re:Cost of storage by Anonymous Coward · · Score: 0

      then you're not paying for the typing.

      Not quite true. Any task the company gives the customer is going to cost the company in terms of attractiveness to the customer. Sometimes necessary but there's no free lunch.

  51. I can easily believe it by onyxruby · · Score: 1

    Most people don't understand the nature of large amounts of data like that. They think "I want more, more, more" and never beyond that. Getting data is easy, getting useful data is far more important and for that you need to have your customers spend some time with the database where they can tell you everything that they don't need or want. Once you can confirm the accuracy of that information you can then purge your data of the clutter.

    What people really fail to understand though is that getting rid of data is just as important. Unless you're dealing with something like scientific research data, or have a compelling legal reason (SEC etc), or another really good reason (manufacturing plans), then your data needs to have a planned lifecycle just like any other asset. You need to have a date for end of life for data (SQL Data, documents, etc) just like you would for emails or other documents. As a rule of thumb, set up an end of life asset policy for your data, notify the stakeholders and users and from that point forward - every chance you have to destroy that data, do so.

    If you destroy data when you had a subpoena, knew a subpoena was coming or knew a criminal investigation was coming you can end up a felon. Any data that isn't destroyed can be used against you in a court of law. However - if your data is destroyed via policy on a given date and that destruction doesn't violate something like an SEC requirement, then you are safe. Yes, I do speak as someone that has at times been heavily involved in litigation (the technical expert that has prepared data for use in court and explained what everything means to lawyers) more than once.
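
    The policy check above can be sketched roughly like this. The parameter names (legal_hold, regulatory_min_days) are illustrative assumptions, not terms from any statute or product:

```python
from datetime import date


def may_destroy(created, today, retention_days,
                legal_hold=False, regulatory_min_days=0):
    """Return True if a record may be destroyed under a written policy.

    legal_hold covers a subpoena or investigation that is known or expected;
    regulatory_min_days covers requirements such as an SEC retention period.
    """
    if legal_hold:
        # Never destroy anything that is, or is about to be, under legal hold.
        return False
    required_days = max(retention_days, regulatory_min_days)
    return (today - created).days >= required_days


# Hypothetical record: created 1 Jan 2010, checked on 1 Jun 2013.
print(may_destroy(date(2010, 1, 1), date(2013, 6, 1), retention_days=3 * 365))
```

    The point of keeping the rule this mechanical is that destruction then happens by date and policy, not by someone's judgment about a particular file.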

  52. 90% is reasonable value by stanlyb · · Score: 1

    Let's face it, in practice, you are making a backup every Friday, of almost everything: Database, SVN, CVS, builds, release, etc... And there is a good reason for it, like computer burned, sysadmin left without giving up the password ;).... But in reality, this backup data is almost never used. In my long long practice I never had the chance to see the need for these backups, nevertheless, you just have to have it. Period.

  53. Observation by halcyon1234 · · Score: 1

    And now that Dell's looked at the files, they've been read. There goes that theory.

  54. Data warehousing by v1x · · Score: 1

    I have serious doubts about how they came up with that number. Data captured once can be stored in a data warehouse and analyzed and reused in many different ways for analytics and reporting, so I am not sure how they estimate that 90% of data is never used again (unless, of course they meant that it is not pulled up again on the frontend application side, which would still make no sense at all).

    At our hospital, they have replaced the inpatient electronic medical records system at least 3 times in the last 20 years, and our data warehouse, which has been around for more than 15 years, contains a large percentage of that clinical data from the different (current & historical) systems. A lot of this data is still used pretty actively for retrospective research, recruitment of patients for clinical trials, operational and financial resource planning, forecasting, cost-accounting, etc. In other words, at our institution, most of our data is used all the time, but for different purposes.

  55. Why is this reported as news? by Anonymous Coward · · Score: 1, Interesting

    Folks, hierarchical storage has been discussed in one form or another since the 70s (probably much earlier, but I'm not that old). Everybody and their mothers already have some implementation of archival media.

    As for 90% of the data never being read, I beg to differ. Data is sliced and summarized many times in its lifetime (and sometimes those summaries need to be refreshed to include new dimensions or details), even if there's nobody really looking closely at the finest grain. But if you throw away the oft-unused detail, how can you re-summarize?

    And one warning to all (mainly Dell): try to tell the judge that you deleted that important evidence of your wrong doing because it was "dead weight" and let me know how that goes.

    Having said all that, vendors are apparently just recently becoming aware that there's a need for automated deprecation, for moving unused data to slower/cheaper storage and fetching it back efficiently when needed. From memory to local disk to network storage to slower/cheaper network storage to tape.

  56. In a non-IT field by KenSeymour · · Score: 1

    I work on communications and control systems for subway and light-rail.

    A lot of stuff is recorded in case there is an incident or accident that they want to investigate. Even phone calls to the control center and radio transmissions are recorded. CPUC and FRA regulators come by, especially during construction and early service, and poke around, ask questions, pull records and so on.

    There is a regulatory retention period. If nothing happens for that period, the stuff gets deleted. But a lot of minor stuff gets investigated. Supervisors check reports of safety rule violations and such.

    I think financial auditing is similar. The auditor wants to be able to randomly select some
    set of transactions during the audit, but is not going to look at the totality of records.

    On another note, I do remember a story about a system admin who worked for the legislature.
    He was asked to destroy some backup tapes. Instead, he handed them over to the FBI.

    http://articles.latimes.com/1988-09-23/news/mn-2790_1_legislative-counsel

    In the case of IT, perhaps they should have whatever number of meetings it takes to come up with a written retention policy. That way, you are covered when you delete something according to written policy.

    --
    "We can't solve problems by using the same kind of thinking we used when we created them." -- Albert Einstein
  57. How to handle telemarketers by Seahawk · · Score: 0, Offtopic

    I just answer the phone, ask them to hang on for a second, and then just deposit the phone somewhere silent, like my bedroom, and wait for them to hang up. My current record is 7 minutes and 49 seconds.

    This makes it as expensive for them to call me as possible with me just spending 5 seconds of my time.

  58. Dell Blown Capacitors by ogfomk · · Score: 1

    That would explain why Dell did not pay attention to the blown capacitor issue.

  59. This isn't surprising. by asdf7890 · · Score: 1

    For the major app that I work on for my company, I would say that a lot of the data is write-only until something goes wrong. There is a lot of data that is recorded simply for auditing purposes. The system keeps a copy of every version of a form that it has seen and in ideal situations these data rows, and sometimes entire documents that someone has written, are not looked at again - they are there so that if a problem is found or a complaint made everything can be tracked down to the source and procedures updated (and/or wrists slapped) so the problem is less likely to happen again in future.

    I suspect that less than 1% of data is read a week after it is generated. There will be a lot of information out there, be it full documents or rows of stats in a data table, that is generated, made available to people by some means, read (or just skimmed) once by those people, and then "filed" for future reference. It was nearly the same with paper based systems, why should it be any different for electronic storage - the ease of storing and searching through the data (assuming it is well indexed) encourages more data to be stored like this because you don't have quite the same logistical problems associated with massive paper filing systems.

    And yes, it will affect how people purchase and use storage. It has done for years, at least for large databases (main active store and transaction logs on fancy drives in a RAID 10 array possibly of SSDs these days, archive data pushed off (by data partitioning inside the one DB or by actually migrating data to another DB) to a slower array of spinning disks, backups to tape and moved off-site) and home users (active content on one drive, gobs of video on recordable media - though with large drives as cheap as they are these days most people don't need offline storage unless they want a proper backup).

  60. Too expensive to separate it by hawguy · · Score: 1
    From the summary:

    'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'"

    The reason is that for 90% of businesses, the software and processes that could actually manage migrating unused data off primary storage while keeping it accessible are so expensive and complex as to not be worth it.

    My company has about 10TB of corporate fileserver data. It's all sitting on SATA disks in a big NAS (well, used to be "big", nowadays it's "small"). Much of that data is important to the company, but may rarely be needed. While I could purchase a tiered storage system with fast FC drives for recently used data, slower SATA for nearline storage and tape for rarely used data plus software to manage it all, in reality it's cheaper to just keep adding SATA shelves to keep it all online. No one wants to pay for a librarian to manage Documentum or other such product to enable us to move unused data offline.

    Plus there's the fact that personally, I don't trust tape for "offline" storage -- if it's not spinning and scrubbed, then it may not be readable. I still use tape for offsite disaster recovery, but would feel better if I could replicate data offsite.

    In the article they don't even seem to advocate a tape backend, just SAS disks on the front end and SATA for nearline, but I don't see the point in that -- I've got a measured 98% read to write ratio -- SATA with a good NAS gives me more than enough performance for corporate fileserver needs, why would I want to pay the price premium to put a fast SAS cache in front of my SATA disks? SAS or FC gives me about 3X the IOPS at 8X the cost of SATA.
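
    Back-of-the-envelope version of that last comparison, using only the 3X and 8X figures quoted above (the function name is just for illustration):

```python
def relative_cost_per_iops(iops_multiple, cost_multiple):
    """Cost per IOPS of the faster tier, relative to the SATA baseline."""
    return cost_multiple / iops_multiple


# 3X the IOPS at 8X the cost: each IOPS costs roughly 2.67 times as much.
print(round(relative_cost_per_iops(3, 8), 2))
```

    So the premium only pays off if the workload is actually short of IOPS, which a mostly idle fileserver is not.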

    Perhaps in a large company it makes more sense to move to tiered storage, but in the 1000 user range, I just don't see the benefit.

  61. Rate of access does not equal importance of... by Dcnjoe60 · · Score: 2, Insightful

    Rate of access does not equal importance of data. How important are, say, dental records or DNA? To the majority of people, probably not too important. However, in law enforcement, they could be very important. The US military has DNA records on all of its members. However, unless you are dead and they are trying to identify your body, 99% of it is just stored and never used.

    Medical records are stored and unlikely to be used on a regular basis; however, for someone coming into the emergency room at the local hospital with chest pains, access to those records in a quick and timely manner may be important.

    What the author seems to be proposing, however, is that records be stored on the basis of how often they will be needed (needed frequently - high speed storage, once in a blue moon, slow or offline storage). In reality, data should be stored based on the cost associated with it not being available when needed.

    Using the medical example, it seems that patient data would have a high cost of not being available when needed (death). Payroll information, however, which is needed somewhat frequently, has a lower cost if not available (employee having to wait for the information). As such, the metric should not be on how often the data is accessed, but instead on how vital quick access is.
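
    One way to put rough numbers on that metric, as a sketch only. Every figure here is invented for illustration; the idea is simply to add the expected cost of waiting to the cost of storage and pick the cheaper tier:

```python
def cheapest_tier(accesses_per_year, delay_cost_per_hour, tiers):
    """Return the name of the tier with the lowest expected yearly cost.

    tiers: iterable of (name, yearly_storage_cost, retrieval_hours).
    """
    def yearly_cost(tier):
        _name, storage_cost, retrieval_hours = tier
        waiting_cost = accesses_per_year * retrieval_hours * delay_cost_per_hour
        return storage_cost + waiting_cost

    return min(tiers, key=yearly_cost)[0]


# Hypothetical tiers: fast storage is instant, slow storage takes a day.
TIERS = [("fast", 100.0, 0.0), ("slow", 10.0, 24.0)]

# Medical record: rarely read, but every hour of delay is very expensive.
print(cheapest_tier(0.1, 1000.0, TIERS))  # fast
# Payroll record: read monthly, but waiting a day costs almost nothing.
print(cheapest_tier(12, 0.1, TIERS))      # slow
```

    The rarely read record lands on fast storage and the frequently read one on slow storage, which is the opposite of what a pure access-frequency rule would do.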

  62. The problem is-- by Chris+Mattern · · Score: 2, Insightful

    If you can't figure out which 10% you'll need later, you can't use this fact to cut down on your data storage.

  63. WORN storage by captain_dope_pants · · Score: 1

    Rather than using WORM (Write Once Read Many) storage perhaps Dell should invent Write Once Read Never and put 90% of their info on that. It should be cheap to produce, testing would be a doddle too :-)

    --
    while (true != false) process_more_stupid_code();
    1. Re:WORN storage by Arimus · · Score: 1

      Nah, Seagate have too many patents and experience in that market place ;)

      --
      --- Users are like bacteria -> Each one causing a thousand tiny crises until the host finally gives up and dies.
  64. Going full circle? by bkeahl · · Score: 1

    I developed custom manufacturing tracking systems until the market died (between ongoing 50 year exodus from the US and "Enterprise Solutions") and I tended to store the data in two sets for two reasons.

    First, data retrieval for the end-user was faster in the live system if old data were kept elsewhere. Second, it made daily backups of 24/7 systems easier because there was less data to copy.

    The "live" system kept recent data (for some companies that was measured in weeks, for others months). The "Archive" system kept it for years (often legally required). Data would be moved from the live to archival system if the last time it had been touched exceeded some time limit we set AND the data wasn't related to some material still sitting on a shelf or moving through the production process.
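
    The migration rule above boils down to something like the following sketch. The record fields (last_touched, has_open_material) are hypothetical names, not the schema of the original systems:

```python
from datetime import datetime, timedelta


def should_archive(last_touched, has_open_material, now, max_idle):
    """Archive only if the record is stale AND no related material is
    still sitting on a shelf or moving through production."""
    idle_too_long = (now - last_touched) > max_idle
    return idle_too_long and not has_open_material


def split_records(records, now, max_idle):
    """Partition records into (live, archive) lists using the rule above."""
    live, archive = [], []
    for rec in records:
        stale = should_archive(rec["last_touched"], rec["has_open_material"],
                               now, max_idle)
        (archive if stale else live).append(rec)
    return live, archive
```

    The max_idle limit would be weeks for some companies and months for others, as described above.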

    I doubt I would have abandoned the model despite the speed/storage improvements made over the years.

  65. Hadoop by Anonymous Coward · · Score: 0

    This is why systems like Hadoop are taking hold. Working with unstructured and semi-structured data is hard and there is a heck of a lot of it out there. The fact that TFA starts going into storage sort of misses the point--storing it isn't hard. It is the processing... and Hadoop can handle both quite efficiently.

  66. Not surprised by this figure by rcarovano · · Score: 1

    There are a number of tools available to analyze how NAS is being used. Here's one free tool--I'm sure there are others, too. http://www.f5.com/products/data-manager/

  67. Oracle already has that by Moraelin · · Score: 1

    1. Oracle already has that, under partitioning. If there's a column you can define intervals on, you can have your database partitioned like that. E.g., you can have the database sliced by year, and move the old tablespaces to another HDD.

    Probably DB/2 too, though I don't have that much experience with that one.

    2. _But_ as Oracle itself points out, if you're doing it because of some delusions of gaining speed, you're doing it wrong. Even if 90% is never read, don't forget that in a well indexed database lookup time will only increase with the logarithm of the number of records. So don't be surprised if dumping 9 million records out of 10 into another partition results in much less speed gain than you'd think.

    And frankly that's the most common reason I see invoked for partitioning. Someone who has no clue thinks that he'll gain the uber-speed by splitting the data. Some of them are even people from the IT department who should know better. Sometimes you can't move them from there no matter what, because the poor dumb beast already promised some PHB that as the great performance optimization and would lose face if he admitted he was wrong.

    And especially data which is as in TFA just dead weight and never read, actually has very little impact.
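
    To put a rough number on the logarithm argument: assuming an idealized index where lookup cost is proportional to log(rows), the base of the logarithm (the fanout) cancels out of the ratio. This is a simplified model, not a measurement of any real database:

```python
import math


def lookup_saving(rows_before, rows_after):
    """Fraction of index lookup steps saved by shrinking the table,
    assuming lookup cost is proportional to log(rows)."""
    return 1 - math.log(rows_after) / math.log(rows_before)


# Moving 9 million of 10 million rows out of the live partition.
print(f"{lookup_saving(10_000_000, 1_000_000):.1%}")  # 14.3%
```

    So removing 90% of the rows buys about a 14% shorter index descent under this model, and in a real B-tree, where depth is a whole number of levels, it may buy one level or none.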

    So basically do a proper analysis first.

    - are you splitting the data thinking that 1/10 of the data will be 10 times faster to access? Think again.

    - are you splitting because of HDD costs? How much data do you have there? I mean, sure, if you're Google or Amazon, it adds up. Otherwise, exactly how much more would the extra complexity cost you? Very few meals are free, and sometimes the extra couple of hundred or even thousands of dollars in just buying fast hard drives can be easily cheaper than the cost of such an overhaul or just the extra admin overhead in the long run.

    - do you actually need all that stuff? I don't read that 90% figure in just number of records, but probably most of it is in columns or whole tables which really aren't needed for anything, but are dutifully stored out of some delusion that some day they might be important. Do you actually need all that trivia? Or would you be better off just dropping a few tables instead of partitioning?

    E.g. just think about how many details some sites want to know about you just to let you download a patch for a game you've bought. And I don't mean to ship it to you, or check your credit card, but just to register.

    Really, there is a difference between data even for mining and pointless trivia. As a trivial example: team's averages for the last season are data that can be used meaningfully, but "which team won the most games (as in, a whole two of them) on a rainy Tuesday night under artificial lighting" is trivia. The thing is so finely sliced that you're seeing statistical flukes. In the case of those registration sites I mentioned, demographics by age intervals are meaningful data, but by exact birthday is pure pointless trivia. Statistics by region are useful data, but statistics by street name and number are pointless trivia. No, seriously, you won't hit some jackpot that allows you to create a genre specifically for gamers on the even side of the road, nor specifically for people born on a day of the year divisible by 3.

    At any rate, that's what that 90% figure is really about. And partitioning won't solve that. Even dividing the database between old records and new records won't change the fact that you still have 90% of any given partition consisting of trivia nobody really needs.

    Or, hey, if you absolutely can't let go of any byte of trivia, how about just moving those tables wholesale on a slower HDD? I mean, they're never read anyway.

    --
    A polar bear is a cartesian bear after a coordinate transform.
    1. Re:Oracle already has that by lgw · · Score: 1

      Good advice, but you missed a key idea. Do you need that data in a database at all? Databases need expensive storage, relative to what you can get away with on a flat filesystem, and the cost difference can be quite significant. If most of your data (byte-wise) is in unindexed columns, you can probably store the data in those columns in simple files instead, and only keep track of where it is in the DB.

      The cost savings from this vary, but can be impressive if it reduces your database size to the point where you can use a cheaper category of storage. Data moved to files can go from big-box storage to commodity CIFS storage, reducing the cost to 10% of what it was before. If you can get your DB storage small enough, you can move it from big-box to simple DAS, and reduce the cost to 10-30% of the big-box price, depending on redundancy needs (if you need enough redundancy, this will add enough management hassle to not be worth it for the DB storage - not everything needs 5 9s though).

      If you think all your data must live in a DB, you probably have blinders on.
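
      A minimal sketch of that files-plus-pointer layout, using SQLite purely as a stand-in for the real database. The table and column names (documents, blob_path) are made up for illustration:

```python
import hashlib
import os
import sqlite3


def open_db(db_path):
    """Open the database and make sure the pointer table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS documents "
                 "(id INTEGER PRIMARY KEY, blob_path TEXT NOT NULL)")
    return conn


def store_payload(conn, blob_dir, record_id, payload):
    """Write the bulky bytes to a flat file; keep only its path in the DB."""
    os.makedirs(blob_dir, exist_ok=True)
    # Name the file by content hash so identical payloads share one file.
    path = os.path.join(blob_dir, hashlib.sha256(payload).hexdigest())
    with open(path, "wb") as f:
        f.write(payload)
    conn.execute("INSERT INTO documents (id, blob_path) VALUES (?, ?)",
                 (record_id, path))
    conn.commit()
    return path


def load_payload(conn, record_id):
    """Look up the path in the DB, then read the bytes from the file."""
    row = conn.execute("SELECT blob_path FROM documents WHERE id = ?",
                       (record_id,)).fetchone()
    if row is None:
        raise KeyError(record_id)
    with open(row[0], "rb") as f:
        return f.read()
```

      The database then holds only small indexed rows, and blob_dir can sit on whatever cheap storage is available. Files are written once and never updated in place here, which sidesteps the concurrent-write problem with shared files.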

      --
      Socialism: a lie told by totalitarians and believed by fools.
    2. Re:Oracle already has that by Moraelin · · Score: 1

      Well, the savings aren't that huge actually, given that basically you could save the same thing in BLOBs or the new XML types, and really it's the same space on the hard drive regardless of whether it's connected to a database machine or to a NAS. Still, that's not a bad idea on the whole, but it's not one I would give someone before seeing their actual use case or trusting them to know what they're doing. You'd be surprised what can go wrong or inefficient when you have people do random writes in a file over NFS from several cluster members, because at some point someone wanted to also update this or that field in the file.

      --
      A polar bear is a cartesian bear after a coordinate transform.
    3. Re:Oracle already has that by afidel · · Score: 1

      Partitioning helps by making the indexes in the main table smaller meaning that the number of objects in the data structure is that much smaller and hence faster. My main financials database has significantly more space in indexes than it does in table data because different applications need different sets of indexes. Also if you could shrink your main table to a small enough size that it will fit into an affordable amount of SSD space you really could see a 10-100x speedup in data access.

      --
      There are 4 boxes to use in the defense of liberty: soap, ballot, jury, ammo. Use in that order. Starting now.
  68. Legally required to document and store data by Anonymous Coward · · Score: 0

    I work in a lab that does analysis of tissue samples from clinical trials. We're required by federal law to document an absurd amount of data. And electronic documentation for the most part isn't feasible with the way the tests are performed and the way the laws are written. End result is that I have to print out around 100 pages a day, just for myself, and initial and date documents several hundred times a day. Add in the other 15 analysts that work in my lab, and yeah, the amount of hand-written data is huge. All of this of course gets copied on a photocopier (extra physical copy stored in a separate offsite location), scanned into a pdf for electronic distribution, and then audited (and it's very rare for any errors in the audit to actually have any impact on the test results). The thing is that those pages and documents get boiled down to just a few numbers. From a 100-page stack of testing info, we typically derive about 20 numbers, and a margin of error for each. Chances of anyone going back to investigate any single sample is vanishingly small, and yet all the data and copies of that data have to be retained.
     
    TL;DR - I can certainly believe that 90% figure, my personal experience would indicate it to be much much higher. I'd like to reduce the amount of data that needs stored; but the law prevents that from happening.

  69. Has any progress been made... by iamacat · · Score: 1

    On identifying the 10% which will be needed ahead of time? I think the focus should be the opposite - to preserve MORE data and index it better. It's not hard to imagine that an additional 10% could have been used if made available at the point of need in a relevant format, effectively doubling productivity. How many employees in a company with 10K+ developers are still coding hashtables? Sure there are variations in languages and needs, but someone HAS already written JUST what you need and if you had access to company owned code with the same ease as browsing GPL code on Google you would benefit.

  70. But which 10%? by tcgroat · · Score: 1

    The devil is in the details: figuring which part is the 90% that you'll never need again, and which is the 10% that will be needed. Some of that "write-only" data is stuff that companies are legally obligated to retain, some is CYA records that you hope you'll never need again. In both cases when the court, IRS, etc. orders you to produce the documents, you'd better have them.

  71. Two words by jav1231 · · Score: 1

    Purge policy. This is not news, though the figure may be ambiguous. Any SA can tell you, if asked, how long the data has remained untouched. You see this in database backups where they go un-queried for years. We give it 3 years then it's gone. Storing data for 3 years isn't going to break the bank, per se.

    1. Re:Two words by Sulphur · · Score: 1

      On line for six months, and archived indefinitely afterwards. Phone company policy, so I am told.

  72. Not ambiguous by Anonymous Coward · · Score: 0

    What's ambiguous about 90%? It's inaccurate, not very precise, but completely unambiguous.

  73. Compliance and Lawsuits by bagsc · · Score: 1

    Probably 95% of the records are for compliance and things legal wants saved for CYA purposes. This is more a function of the legal environment, where everyone wants to sue every business that looks at them funny, and how courts expect tons of documents on everything you've ever done. It'd be an interesting analysis to see what the costs of excess records retention are compared to the legal losses, and more importantly, the losses consumers incur because they can't afford to fight well documented machines or what consumers lose because companies are under-documented.

    --
    http://www.accountkiller.com/removal-requested
  74. Useless stats by Anonymous Coward · · Score: 0

    It is like saying that 99.99% of backups are never restored. You never know in advance which 10% you'll need later.

  75. delete all power point files? by DMoylan · · Score: 1

    massive files. zero content. that will get your useless content below 50% right there.

  76. What a coinkydink! by meburke · · Score: 1

    One advantage of being an old fart is that sometimes you can remember the way things were...back in the 70's. For some reason, during the last 2 weeks I've had the coincidental task of explaining what we used to call "Data Structured Systems Development", specifically for people who were concerned about data overload.

    Ken Orr and Jean-Dominique Warnier used to say, "Data that is not used is not accurate." DSSD started from the outputs, and systems were designed to capture and use only data that were necessary for those outputs.

    Research on information has pretty much debunked the idea that data and information are the same. Even back in the 70's, we had programmers who wanted to capture everything, "..because we might need it some day." The good news is that capturing tons of data allowed us to use statistics to streamline business behavior and do scientific and systems research on the behavior of the businesses. The bad news is that this isn't done very often, and data deteriorates; it never gets to make the transition from bits to information.

    --
    "The mind works quicker than you think!"
  77. Old is new again: HSM by swordgeek · · Score: 1

    When I first got into enterprise computing 15 years ago, HSM (Hierarchical Storage Management) was making a comeback. At that time, the vendors were promoting it as something new in their promo cruft, but if you read the technical manuals, they claimed that it was an old idea that wasn't effective before but was now.

    Here it is again, and I'm sure it had a resurgence in the middle as well, with NetApp and others coming online. HSM waxes and wanes every handful of years, but it ultimately only makes sense in very rigidly structured realms. It rarely works well for average users in an average mixed-use environment, and any attempt I've seen to implement it in a case like that has failed.

    Now in the mainframe world, it used to work brilliantly. I used to work for a telco that had an IBM S/390 with HSM managing the storage. Data would be written to fast directly accessible disk, then gradually automatically migrated off to slower secondary disk, then to tape, and then booted out of the tape library. If you tried to access any file that had been ignored for 'x' and was moved to tape, you would get a pop-up message letting you know that it was being retrieved, and would be available in a few minutes. The tape would be mounted, the file retrieved to fast 1st tier storage, and you were in business. If you were looking at a file older than that, then you'd get a service call ticket number, the ops guys would get a 'mount this tape' request, and you'd get a notification when it was available.
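
    The tiering described above can be sketched roughly as follows. The tier names and idle thresholds are invented for illustration (the comment only says 'x'), and this is not how any particular HSM product is configured; it just shows the age-based placement and the different recall paths.

```python
# Hypothetical tiers: (name, minimum idle days before data lands there).
# The thresholds are assumptions, not values from any real HSM setup.
TIERS = [
    ("fast_disk", 0),
    ("slow_disk", 30),
    ("tape_library", 180),
    ("offsite_tape", 730),
]


def tier_for(idle_days: int) -> str:
    """Return the coldest tier whose idle threshold has been reached."""
    chosen = TIERS[0][0]
    for name, min_idle in TIERS:
        if idle_days >= min_idle:
            chosen = name
    return chosen


def recall(idle_days: int) -> str:
    """Describe what a user would see when opening a file idle this long."""
    tier = tier_for(idle_days)
    if tier in ("fast_disk", "slow_disk"):
        return "open immediately"
    if tier == "tape_library":
        return "automatic tape mount, available in a few minutes"
    return "service ticket raised, operator mounts the tape"
```

    For instance, recall(200) takes the automatic tape-mount path, while recall(1000) falls through to the operator ticket.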

    Part of the problem though, is trying to keep track of filesystem sizes. With tape on the back-end, you can have effectively infinite filesystems, parts of which are slower than others. The OS has to understand it at a fairly low level to make it work. And the users need to understand it as well.

    --

    "People who do stupid things with hazardous materials often die." -- Jim Davidson on alt.folklore.urban
  78. Just In Case by DynaSoar · · Score: 1

    As with White House emails, to cite a prominent example, business keeps as much as it does Just In Case they get audited/investigated/sued. They want to be able to provide a (non-)paper trail.

    90% is probably high for most. In my experience 50% is probably low. I'd figure they crank out twice as much digital effluvia as they use. And nothing makes the other side sit back like telling them 'You want an audit? Fine, we've got elevnty jigglebytes to go through, conservative estimate to go through it, twentyteen months'. Such occurrences aren't common except in knowledge, so they're saving the trash for a rainy day.

    --
    "I may be synthetic, but I'm not stupid." -- Bishop 341-B
  79. opportunity cost by yyxx · · Score: 1

    Someone doesn't understand opportunity cost.

    Or, in different words, 99% of tax audit-related data is saved but never read. But if you get audited and you don't have it, you're in trouble, and your costs likely far exceed 100x the cost of having kept the records in the first place.
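
    A back-of-envelope version of that argument, as a sketch: the 1% audit chance and the 100x multiple come from the sentence above, while the function name and the sample figures are made up for illustration.

```python
def expected_costs(storage_cost: float,
                   audit_probability: float,
                   penalty_multiple: float) -> tuple:
    """Compare the cost of keeping records with the expected cost of discarding them.

    keep    = what you pay for storage no matter what
    discard = probability of being audited * penalty if the records are gone
    """
    keep = storage_cost
    discard = audit_probability * penalty_multiple * storage_cost
    return keep, discard


# Illustrative numbers only: 1% of records ever needed, penalty 250x storage cost.
keep, discard = expected_costs(1000.0, 0.01, 250.0)
cheaper = "keep" if keep < discard else "discard"
```

    At exactly 100x the two sides break even; anything "far exceeding" 100x makes keeping the records the cheaper bet, even though 99% of them are never read.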

    90% of backup data is also never read. That doesn't make it useless or "dead weight".

  80. Data Storage by helix2301 · · Score: 1

    What I wonder now is, if the data is never read again, how is it stored? Is it archived, or sitting on a drive in a server in the back room? The other question is what kind of data it is: old PC specs, support documentation, or HR and customer data? And if 90% is never read again, how secure is that data, especially if it is confidential? Just things to wonder about. This is why most companies don't leave old data lying around: so they don't have to worry about storage and safety concerns.

  81. Nearly 200 posts.... by ruinevil · · Score: 1

    ... and not a single mention of the Pareto Principle.

  82. EMC Did This Work Many years ago by Anonymous Coward · · Score: 0

    There is nothing really new in this analysis. EMC has for the past 10 years made an entire business out of Information Lifecycle Management. The basic tenet of ILM is that information is most likely to be accessed immediately after it is created, and that the likelihood that it will be accessed drops off as time advances.

    The standard example is that of a banking transaction. The transaction record will be most active in the first month: it is created, backed up, and then later read for the statement. After the statement is printed, it is unlikely ever to be resurrected.

    So ILM says to progressively migrate data from active, online storage to nearline, and then offline storage as time progresses.

    Actually, the whole thing mirrors the way human memory works. The most active memories we have are of the past few days; after that they migrate to long-term memory, and the experiences are summarised and become experience.

  83. It's not really news by _Shad0w_ · · Score: 1

    I'm not sure this is really ground-breaking or startling. A DBA I worked with pointed out that the vast majority of data in a database is written and never read - something you have to take into account when deciding whether or not to place an index on a table (indexes slow down your inserts). It doesn't take much effort to extrapolate that to include any form of data.

    Most of the data we store on disk at work is never read, it just sits there taking up space. If we actually thought about it we could put in place a mechanism to move it to an offline storage mechanism of some variety.

    --

    Yeah, I had a sig once; I got bored of it.

  84. Higher Pay by Dun+Kick+The+Noob · · Score: 1

    Data is always needed from manufacturing, operations, sales, marketing, administration, outsourcing etc. Realistically speaking, 10% usage is very high; I'm tending towards 5% usage. But of course it depends on how in-depth the researcher went; there are always unofficial sources of data that are used in corporations. Anyway, what's so bad? Big waste means more need, more staff, more employment and higher wages. You always appoint a new rep to oversee data, and monitor changes when new information is required. It shows management incompetency, but hey, I get better pay, so why is anyone complaining? I

  85. It's a feature not a bug? by Hognoxious · · Score: 1

    is dell about to make a press release about faulty storage in their servers resulting in about 90% data loss?

    Turns out we were doing you a favor. You didn't really need it anyway!

    I don't think anybody would believe that. Now if it had been Apple...

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
  86. Need a records retention policy... by a-freeman · · Score: 1

    (IAAL) Most of the comments here relating records retention to audits or compliance purposes are dead on.

    That said, nearly every company that I've worked at could benefit from a new or updated records retention policy. Typically, the mentality is to just keep everything until someone finally realizes that certain records are 27 years old and starts asking around: "hey, can we just delete this?"

    The better approach is to actively look for stuff to toss, and do so on a monthly or quarterly basis. Not only is it cheaper to store, but for most industries, *not* having the records available for many common causes of litigation is beneficial.

  87. "never"????? by seekertom · · Score: 1

    the tagline says "90%... is NEVER read..." yet the article doesn't agree: ALL of it is read, just maybe not more than onct! makes me wonder whether /. folks, among others, only see what they want to see in the headline, and get awlkindza frothy at the mouth to explain their point on what they 'thought' they read. no wonder washington only has to say something in a headline to get folks to believing it! thanks fer lis'nin' seekertom

  88. Data Progression by Anonymous Coward · · Score: 0

    Keep your data and have it automatically moved to lower cost SATA storage on your SAN.

    http://www.compellent.com/Products/Software/Automated-Tiered-Storage.aspx

    We have been using a compellent SAN for about a year now and I couldn't be happier!

  89. Do what you want but disks are fragile by dbIII · · Score: 1

    Not dumb enough, experienced enough and I've read enough instead of taking a seat of the pants guess - I'll back my approach over yours anytime.

    The obvious fucking answer to your strawman is that everyone that stores tapes for a long time keeps them dry. Storing disks for long periods of time is not as simple since they are not designed to last as long and have multiple time based modes of failure, and they are far more fragile so with your strawman poor storage example you would see far more problems with the disks.
    The second answer is 7 years and your tiny sample size is nothing, I've had drives spinning for that long - some survived some didn't - stored cold for that long with silica gel - some survived some didn't. See what I wrote about lubricant problems in long term storage in the comment above, there is no perfect environment to store hard drives. They will die over time if it's dry or if it's humid.
    The third answer is I've seen a huge number of drives and a much larger number of tapes - guess what experience and the literature say has the larger number of failures. While I don't have 40 dead drives yet the number is starting to mount up and I should destroy them properly some time.
    It also annoys me a great deal when people pre-emptively insult others by accusing them of their own faults and invent results that are the opposite of what they actually are. Review what you've written, compare it to what reality exhibits and you'll see I'm not being wilfully ignorant here.