Slashdot Mirror


Dell Says 90% of Recorded Business Data Is Never Read

Barence writes "According to a Dell briefing given to PC Pro, 90% of company data is written once and never read again. If Dell's observation about dead weight is right, then it could easily turn out that splitting your data between live and old, fast and slow, work-in-progress versus archive, will become the dominant way to price and specify your servers and network architectures in the future. 'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'" As the writer points out, the "90 percent" figure is ambiguous, to put it lightly.

27 of 224 comments

  1. Coincidence? by Hognoxious · · Score: 5, Funny

    90% - just like the percentage of statistics that are made up on the spot.

    --
    Confucius say, "Find worm in apple - bad. Find half a worm - worse."
    1. Re:Coincidence? by dov_0 · · Score: 5, Funny

      Or is Dell about to make a press release about faulty storage in their servers resulting in about 90% data loss?

      --
      sudo mount --milk --sugar /cup/tea /mouth /etc/init.d/relax start
    2. Re:Coincidence? by espiesp · · Score: 5, Funny

      Or having developed a new memory technology.

      "Dell releases a new drive based on their patented WORN architecture. Because this device forgoes the need to read your data, it can be made lighter, faster, and more power-efficient than even the latest SSD technology."

    3. Re:Coincidence? by hairyfeet · · Score: 3, Insightful

      Probably SOX and other data required for CYA. I have set up small business networks for quite a few businesses, and while I don't know about 90% I'd say a good 70% of the data they had me set up backup solutions for was stuff they would never break out unless a CYA situation came up like an IRS audit. The simple fact is you have to keep a LOT of stuff to CYA nowadays, and most of that stuff won't be used in any other situation.

      So while I'm not sure about the 90% part at least from my own experience I can believe 70-80% easy. With the possibility of lawsuits (both you suing them for unpaid bills or them suing you because they decide they don't like the work) IRS audits, SOX, there is a whole lot of data that unless a specific set of circumstances come up will be WORN. That is just a part of doing business in the digital age.

      --
      ACs don't waste your time replying, your posts are never seen by me.
  2. Which 90% ? by mbone · · Score: 5, Insightful

    I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which?

    1. Re:Which 90% ? by eldavojohn · · Score: 5, Insightful

      I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which?

      Yeah, as someone who has implemented a few auditing solutions where I work, I must confess that it seems to be 99% of the data we archive is never looked at again. A lot of it is due to policies and is only used after something goes dreadfully wrong. If they are well thought out, the metrics can be collected as the data is written instead of needing to search across the data.

      I think their "90% dead-weight rule" is really a misnomer as you could probably claim that 90% of Google's indexing is never read but we all know that it's the potential that data holds that makes it so valuable and necessary. If Google knew every future possible search then they could delete the data they will never use ... but how do they know they will never use it? How do I know that the auditing data will never have a use--by new metric or incident investigation? The truth is simply that you don't.

      --
      My work here is dung.
    2. Re:Which 90% ? by alexhs · · Score: 3, Interesting

      If each piece of data has 90% probability of not being read again...

      You discard only 10 pieces out of 100, or out of 1 billion, whatever...

      The probability that none of these 10 pieces of data would have ever been needed again is 0.9^10 ≈ 0.349 = 34.9%

      Which means that you keep all of your data.
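
      As a quick check of that arithmetic, here is a minimal Python sketch. It assumes, as the post does, that each piece goes unread independently with probability 0.9; the function name `prob_none_needed` is made up for illustration.

```python
def prob_none_needed(p_unread: float, discarded: int) -> float:
    """Chance that none of `discarded` independent pieces is ever needed again.

    Assumption: every piece is unread with the same probability `p_unread`,
    independently of the others.
    """
    return p_unread ** discarded


if __name__ == "__main__":
    # The more pieces you throw away, the less likely you got away with it.
    for k in (1, 10, 50):
        print(f"discard {k:>2} pieces: {prob_none_needed(0.9, k):.3f}")
```

      With 10 discarded pieces the result is about 0.35, so roughly two times out of three you would have thrown away something you later wanted.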

      Caveats :

      • This assumes that all pieces have equal interest (but maybe you store a field that the interface doesn't allow you to retrieve).
      • Assuming random access to the 10% used, if you remove 10 out of 100, you get a much higher retrieval failure rate than if you remove 10 out of a billion. Some retrieval failure rate could be acceptable.
      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
    3. Re:Which 90% ? by Mspangler · · Score: 4, Insightful

      Note that I'm working from a process control perspective in a chemical plant, but 90% of data written is never read again sounds about right for when things are going well. It's when something goes wrong and you have to figure out what went wrong at exactly what time and what the regulatory consequences were that having all that previously unread data suddenly becomes very interesting indeed.

      And also when you start looking at a system in detail to see if you can increase output, or change a composition, all that usually ignored data becomes very valuable.

    4. Re:Which 90% ? by BrokenHalo · · Score: 3, Interesting

      Another problem is figuring out _why_ data isn't used before archiving it.

      The problem is that so much data is made available without anyone ever considering how useful it might be. At least we've come some way in the last 20 years:

      Back in the '70s and '80s I worked at many sites where mainframe ops used to clear tonnes of fanfold paper every day. This is why we had separate printer rooms: a bank of 6 or 8 barrel-printers belting out 132 columns of text at 1800 lines/minute created sacksful of dust.

      Most of that rubbish was never read in any depth - it was physically impossible to do so before it became out of date, so most of that paper went straight to the shredders, which often shared space with the printers that created the stuff in the first place. I used to have fantasies about lining up the shredders directly behind the printers to save everybody the trouble of distributing the printouts.

    5. Re:Which 90% ? by alexhs · · Score: 3, Informative

      For any given sample, 1/10th of them will be necessary.

      I'm sorry but you're wrong. That's not how statistics work.

      Let's play heads or tails.
      Each toss has a 50% chance of being heads.
      According to you, for any number of tosses, 50% of them will be heads. In other words, you're saying that there is a 100% chance that half of them will be heads.

      For a sample of two tosses, that would mean a 100% probability of one head(s) and one tail(s).
      I hope that you see how this is wrong. You would actually have 50% probability of one head and one tail, 25% probability of two heads, 25% probability of two tails.

      For a sample of size n, with a 10% probability for a piece of data to be necessary, the correct formula says that the probability for at least one element of the sample to be necessary is 1-(0.9^n), which quickly approaches 1 (100%) as n increases.
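
      Both the coin-toss figures and the 1-(0.9^n) formula can be checked with a short Python sketch. It assumes independent trials with a fixed probability; the function names `prob_exactly` and `prob_at_least_one` are illustrative only.

```python
from math import comb


def prob_exactly(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_one(n: int, p: float) -> float:
    """Probability that at least one of n independent trials succeeds."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    # Two coin tosses: two tails, one of each, two heads.
    print([prob_exactly(k, 2, 0.5) for k in range(3)])
    # Chance that at least one of n pieces (10% each) turns out to be needed.
    for n in (1, 10, 50, 100):
        print(n, round(prob_at_least_one(n, 0.1), 5))
```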

      Now, a MUCH more useful set of data is probability over time. 1/10 within 10 years? 5 years? 1 week?

      It depends on what you mean by probability over time. What I can tell you is that as more time elapses, the probability that an element is necessary (more correctly, has been necessary) increases. The 90% never read is supposedly for an infinity of time (that's what "never" means, right?).

      --
      I have discovered a truly marvelous proof of killer sig, which this margin is too narrow to contain.
  3. which 90% by marmusa · · Score: 3, Insightful

    Which 90% though? Like the Coca Cola exec who remarked that he was pretty sure half of his advertising budget was wasted, he just wasn't sure which half.

    1. Re:which 90% by Koby77 · · Score: 5, Informative

      I worked in a call center, and I can definitely believe that 90% of the data is never read again. However, when a customer is calling back (and is angry!), you don't have time on a live call to wait to see what's up with the account. Also there can be some litigious aspects, and a lot of information was recorded for C.Y.A. purposes. Again, you never know which part is needed for C.Y.A. purposes, but that 10% sure is valuable.

      So yeah, we needed to store ALL the account information, and we needed fast access to ALL of it ALL the time.

  4. It's like Office features by drinkypoo · · Score: 5, Informative

    People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which you will need in the future. Data warehousing is the same way. If you start taking data offline you'll just need that data. That's why analyses of very large data sets are performed before archiving.

    But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.

    --
    "You're right," Fisheye says. "I should have set it on 'whip' or 'chop.'"
  5. The problem is "Write-only" applications by shoppa · · Score: 4, Insightful

    Interesting that this seems to have been written up as a "hardware" or "storage" topic.

    The problem is, that IT people dream up all these "write only" applications that record data, without any rational plan for what the data might actually be used for in the business.

    For example, some people worry about privacy when they go to the grocery store and know that all their purchases are being tracked by their loyalty card, or worry that the big bad US government is tapping all the E-mail.

    In fact, I'm 100% sure that some IT geek had some wet dream years ago about recording everybody's purchases and E-mail and phone calls, and it's being done every which way.

    The true "IT application" issue is that there is no real business need for this data 99.999% of the time. It gets recorded, probably gets staged off to tape, maybe indexed in some giant table, and then ... sits there for years with no actual need for it.

    I'm sure the IT geeks who dreamed up the technical ability to record all this stuff, thought they were hot shit when they came up with it. Oh, man, those IT architects were just having a big go-round whipping this problem in scalability. In their heads, they were gonna record everything on disk, then go home and fuck the prom queen.

    1. Re:The problem is "Write-only" applications by mikael_j · · Score: 5, Insightful

      The problem is, that IT people dream up all these "write only" applications that record data, without any rational plan for what the data might actually be used for in the business.

      These plans mostly come into being because us "IT people" (read: developers) know that the "business people" love changing the specs and they'll blame us if they want to start using data they didn't ask us to save and we tell them we can't save data retroactively (really, they'll basically blame the developers for not being able to time-travel). This is why we'd rather save everything than not save enough.

      --
      Greylisting is to SMTP as NAT is to IPv4
  6. This isn't a 'new way of thinking' by sirwired · · Score: 5, Insightful

    Automated Hierarchical Storage Management has literally been around for decades. It may be new-ish on low-end crap x86 servers, but for say, mainframe users, it isn't new at all.

    What is new is available implementation choices. When your tier choices are between enterprise disk and enterprise tape, you are biased towards keeping data on disk; there's still use cases for HSM with only high-end disk and tape, but they aren't as great. Now with lower-cost disk available, you have a cheap disk choice too, with fairly reasonable access time.

    SirWired

  7. Perfect by Andreaskem · · Score: 4, Funny

    A perfect application for my patented write-only memory.

  8. This is new? by rapturizer · · Score: 4, Interesting

    I saw this over a decade ago when I was working as an IT consultant in the advertising industry. They regularly used only 5% - 10% of their information (and that's being generous). The systems I designed included a server for active work, an archive server for information used in the last 24 months, and then an archive solution (Magneto Optical at the time) that allowed for the information to be available, just not on demand. This idea has been working since for the clients that are still in business.
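
    The three-tier layout described above could be sketched in Python roughly as follows. This is only an illustration: the 24-month archive window comes from the post, while the 30-day "active" window, the tier names, and the function `choose_tier` are assumptions.

```python
from datetime import date, timedelta

# Assumption: the post does not say how recent "active work" is.
ACTIVE_WINDOW = timedelta(days=30)
# Roughly the 24 months mentioned in the post.
ARCHIVE_WINDOW = timedelta(days=730)


def choose_tier(last_used: date, today: date) -> str:
    """Pick a storage tier from how long ago the data was last used."""
    age = today - last_used
    if age <= ACTIVE_WINDOW:
        return "active"    # fast server for work in progress
    if age <= ARCHIVE_WINDOW:
        return "archive"   # slower archive server, still online
    return "offline"       # removable media, available but not on demand


if __name__ == "__main__":
    today = date(2024, 1, 15)
    for used in (date(2024, 1, 1), date(2023, 1, 1), date(2020, 1, 1)):
        print(used, "->", choose_tier(used, today))
```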

  9. Signetics invented the needed chip back in the 70s by ve3id · · Score: 3, Funny

    FINALLY !!! AN APPLICATION FOR THE WOM!!!! http://www.national.com/rap/files/datasheet.pdf Bob Pease sure was fore-sighted, since this memory chip was invented back in the seventies!

  10. Good argument for tape? by mlts · · Score: 3, Interesting

    This is one reason I like tape: The drives are expensive, but the tapes are $30-$50 (LTO-4 is $30 on mail-order). So having an autochanger moving all the rarely used data into storage is likely the most efficient way of moving data to long term archiving. Even better is making sure that 2-3 sets of tapes are used (one onsite, one offsite.)

    Of course, hard disks by themselves may seem cheaper, but they are not a true archival medium. There are so many moving parts in a HDD and each of them (bearings, heads, spindles, motors, controller card) is a point of failure.

    With HDD capacities starting to not grow as exponentially as they did last decade, it would be nice if tape companies would not just catch up with 2-3TB native tape offerings, but be able to offer drives at a lower price so home and SOHO users can use them for long term storage. I'm sure that if someone offered a consumer level tape drive for $500 with a decent capacity, that a lot of small businesses would buy it, especially if it came with decent backup software (Retrospect, Backup Exec, Amanda, bru, or another utility that is similar.) Since some tape drives are even bootable (some HP offerings have a section of the tape to emulate a boot CD or DVD), it would be ideal for bare metal recoveries even by nontechnical users. Pop in the tape, boot the machine, type in the encryption key, select where the data should be restored to, walk off for a bit and it is done.

    Even though the SAN companies have said tape is going to die, until another form of media (perhaps super-inexpensive flash media [1]) is as reliable as tapes and can be put in the Iron Mountain case and sent offsite for safekeeping for decades on end, tape will be with us. Only optical comes close to tape for long term archiving abilities.

    [1]: I can see someone make flash media that is semi-smart where it is put in a specific case, shipped to an offsite warehouse, and that warehouse plugs in the cases into 5-12VDC. Then over time, the circuitry on the flash drives periodically checks the stored flash media for damage or bit rot, corrects errors by rewriting blocks, and good blocks it would periodically move to ensure that there is a high signal to noise level on all media. Of course, this requires power, while tapes can happily sit in a climate controlled warehouse and be still recoverable.

    1. Re:Good argument for tape? by mbone · · Score: 3, Informative

      Tapes are not archival storage either. In either case, archival storage is a system, not a medium.

      I hope you are reading all of those tapes on a 5 year cycle, and writing new ones with the recovered data. I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

  11. this is actionable: think of the storage savings by rubycodez · · Score: 4, Funny

    this helps me to be a better employee. From now on I'll only save 25% of the data I acquire, because the odds are the other 75% would only be needed 7.5% of the time. In other words, 92.5% chance not likely to be needed at all.

  12. Much, much higher - probably 99% +++ by petes_PoV · · Score: 3, Funny

    If you're talking about blog entries. Almost all of them (well, almost all of *mine* :-) are written once and never read, unless you count spiders as reading them.

    --
    politicians are like babies' nappies: they should both be changed regularly and for the same reasons
  13. dell's new line of fire extinguishers coming soon! by drfireman · · Score: 5, Insightful

    Over 92% of fire extinguishers will never be used, we could probably save a bit of space by having the unneeded ones stored off-site, or in less accessible corners of the garage.

    Slightly more seriously, we can certainly answer this question posed by the linked article easily: "why on earth did we squander so much money by not thinking this way until now?" The answer is: because you are a moron. Anyone who has given even a moment's thought to storage has known this, either implicitly or explicitly, for a long time. So whoever's included in your "we," Steve Cassidy, is just profoundly stupid. I think that quite easily explains why you all squandered so much money by not thinking about this. Next question?

  14. So what? by davidbrit2 · · Score: 4, Insightful

    And if you didn't have that 10% that is eventually needed, you'd be totally screwed. Do we really need to play the 20/20 hindsight game every time somebody thinks of something like this?

  15. Exactly. by brusk · · Score: 3, Insightful

    I wasted money on a dictionary that has tens of thousands of words but have only ever looked up a few hundred. I should have bought one that just had the words I would actually need.

    --
    .sig withheld by request
  16. Solutions: by drolli · · Score: 5, Interesting

    a) Forbid *unmanaged* storage of documents. If the question: "where is the most up-to-date version of this document stored?" is systematically and easily answered then people can delete the crap from their laptops.

    b) Forbid in-company attachments to mails. If the last version can be easily found, including the revision history, a link to this revision is worth *more* than the current state of the document. Most space in my inbox are totally useless attached documents.

    c) Forbid the use of formats unsuitable for storing a certain kind of information. (Where I work, they use PowerPoint/Word files for electronic forms)

    d) Provide a good archiving and backup service. Besides the quality improvement from using a service, this also prevents the 100th copy of some data being made in some unsystematic way (forbid this explicitly)

    e) Thin clients. store the data on a server. Deduplicate.

    f) I would expect that most of the documents in a company can (and should) be stored in a database.
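
    For point e), one common way to deduplicate is to key stored content by a hash of its bytes, so identical attachments are kept only once. The Python sketch below is a toy, in-memory illustration of that idea, not a description of any particular product; the function name `deduplicate` and the sample file names are invented.

```python
import hashlib


def deduplicate(blobs: dict) -> tuple:
    """Store each distinct content once, keyed by its SHA-256 digest.

    `blobs` maps a document name to its bytes. Returns (store, index):
    `store` maps digest -> bytes, `index` maps document name -> digest.
    """
    store = {}
    index = {}
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy
        index[name] = digest            # every name still resolves
    return store, index


if __name__ == "__main__":
    docs = {
        "inbox/report.doc": b"quarterly numbers",
        "laptop/report_copy.doc": b"quarterly numbers",
        "server/plan.doc": b"next year's plan",
    }
    store, index = deduplicate(docs)
    print(len(docs), "documents,", len(store), "stored copies")
```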